parse keywords??

TazzyTazzy Registered Users Posts: 32 Big grins
I could really use some help with parsing keywords from SmugMug using PHP. I'm really stumped and have been working on this WAY too long.

Keywords come in all kinds of styles: comma separated, semicolon separated, and with quotes around keywords of two or more words. How do I parse all of this? I need a code snippet that accepts a keyword string from the SmugMug API and returns an array of keywords. I thought I had one, but it's just not cutting it.

I'm tinkering around with keywords for the Drupal module for SmugMug. You can see the keywords sorta working: http://dev.warmy.com/gallery/allkeywords

San francisco should be one keyword, not two, and there are a few more examples like it. San francisco is enclosed in double quotes (").

I think I have the keyword search/display down. http://dev.warmy.com/gallery/keyword/cody

Then under the picture, you can click on "+jerrad" to narrow down the keywords displayed, much like SmugMug adds on keywords. It's starting to look good.

I just need some help parsing the keywords properly.

One more twist, at least for me, is that keywords are often supposed to be CSV, but SmugMug tacks a trailing ; (semicolon) onto the end of the CSV when I upload a file. The PHP code needs to account for that too.
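
For illustration, here is a minimal sketch of the kind of function being asked for, written in Python rather than PHP (the same regular expressions should port to PHP's preg_* functions). The function name `split_keywords` and the rules it applies are assumptions, not SmugMug's documented behavior: double-quoted phrases are kept whole, the rest is split on commas and semicolons if any are present and on whitespace otherwise, and empty pieces (such as the one left by a trailing semicolon) are dropped.

```python
import re


def split_keywords(raw):
    """Split a raw keyword string into a list of keywords.

    Assumed rules (a guess, not SmugMug's documented behavior):
    double-quoted phrases are kept whole; the rest is split on commas
    and semicolons if any are present, otherwise on whitespace.
    """
    # Pull out the double-quoted phrases first so their spaces survive.
    quoted = re.findall(r'"([^"]*)"', raw)
    rest = re.sub(r'"[^"]*"', " ", raw)

    if re.search(r"[;,]", rest):
        parts = re.split(r"[;,]", rest)
    else:
        parts = rest.split()

    # Collapse inner whitespace in every piece.
    cleaned = [" ".join(p.split()) for p in quoted + parts]
    # A trailing ';' only produces an empty piece, which is dropped here.
    return [k for k in cleaned if k]
```

With the made-up input `cody, jerrad, "San francisco";` this returns `['San francisco', 'cody', 'jerrad']`: the quoted phrase stays together and the trailing semicolon adds nothing.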

Any help would be much appreciated!

Comments

  • darryl Registered Users Posts: 997 Major grins
    edited October 15, 2007
    Heya -- I'm just encountering this with my script to convert comments -> keywords.

    So far I've found that:

    - Apparently certain photo tagging tools (Photoshop, etc.) use semi-colons
    - SmugMug doesn't have a very elegant way of dealing with this
    - Double-quotes are your friend

    Here's the test case:
    keywords (typed directly into SmugMug):
    phrase one; onetag twotag "phrase two" some other words

    Here are the keywords that SmugMug ends up getting:
    - onetag twotag some other words
    - phrase one
    - phrase two

    Whoa, wacky, huh?

    phrase one, sure that makes sense. Delimiters are ; and the ^ (beginning of string). phrase two, right -- double-quotes.

    But WTF? double-quotes designate beginning and end of a keyword, but don't act as delimiters? That's stupid.

    I get it, I get it. Your docs say, "space-delimited keywords":
    http://www.smugmug.com/help/keywords-tags

    And then all of these pros came along with their fancy tagging programs and screwed you up with their semi-colons. So uh, ok, semi-colons are delimiters too.

    But engh. If you have a mixed case like me, with tags coming from both IPTC and end-users, you're kind of screwed.

    Worst case though (with the above) is that I should've gotten:
    - phrase one
    - onetag twotag
    - phrase two
    - some other words

    I mean, then I could at least see what SmugMug was doing. But concatenating keywords on either side of a double-quoted string? That's nutty.

    Ok, I guess my solution will be: double-quote everything, and hope for the best.
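
As a rough illustration of the "double-quote everything" approach, here is a small Python sketch. The helper name `quote_keywords` is made up, and two things are assumptions rather than confirmed SmugMug behavior: that a single space between quoted keywords is an acceptable separator, and that there is no way to escape a literal double quote inside a keyword (so embedded quotes are simply dropped).

```python
def quote_keywords(keywords):
    """Join keywords into one string with every keyword double-quoted."""
    quoted = []
    for kw in keywords:
        # Assumption: no known escape for a literal double quote,
        # so embedded quotes are removed and whitespace is collapsed.
        kw = " ".join(kw.replace('"', "").split())
        if kw:
            quoted.append(f'"{kw}"')
    # Assumption: a single space between quoted keywords.
    return " ".join(quoted)
```

For example, `quote_keywords(["phrase one", "onetag", "twotag", "phrase two"])` gives `"phrase one" "onetag" "twotag" "phrase two"`, so no keyword depends on semicolons or spaces being read as delimiters.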
  • darryl Registered Users Posts: 997 Major grins
    edited October 15, 2007
    One other reason to use double-quotes and eschew semi-colons:
    Keywords under 4 letters and numeric keywords are invalid unless double-quoted.
  • devbobo Registered Users, Retired Mod Posts: 4,339 SmugMug Employee
    edited October 15, 2007
    darryl wrote:
    - Apparently certain photo tagging tools (Photoshop, etc.) use semi-colons
    - SmugMug doesn't have a very elegant way of dealing with this
    - Double-quotes are your friend

    Using a semi-colon as a delimiter for keywords works fine with SmugMug, including via the API.
    David Parry
    SmugMug API Developer
  • darryl Registered Users Posts: 997 Major grins
    edited October 15, 2007
    My head hurts
    I just replicated SmugMug's ridiculous keyword parser in Perl. So now I determine keywords the same way SmugMug does. My method for testing was: enter keywords, click Save, see how SmugMug splits it up. Run script to grab same keywords from API, bang head against wall until my script parses it the same as SM.

    Ugh, my original code was embarrassingly bad (not that anybody was looking). Here's my rewritten function:
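    #!/usr/bin/perl
    
    use strict ;
    use warnings ;
    use HTML::Entities ;
    
    # Split a raw SmugMug keyword string into a sorted list of keywords
    sub parsekeys {
            my $oldkeywords = shift ;
            $oldkeywords = '' unless defined $oldkeywords ;
            my (@allkeywords, @quotedkeywords, @splitkeywords) ;
    
    # Decode the HTML entities
            $oldkeywords = decode_entities($oldkeywords) ;
    
    # Let's pluck out the quoted strings, one at a time
            while ($oldkeywords =~ s/"(.+?)"//) {
                    push (@quotedkeywords, $1) ;
            }
    
    # Trim the ends, collapse whitespace, drop whitespace after semi-colons
            $oldkeywords =~ s/^\s+// ;
            $oldkeywords =~ s/\s+$// ;
            $oldkeywords =~ s/\s\s+/ /g ;
            $oldkeywords =~ s/;\s+/;/g ;
    
    # Semi-colons win as the delimiter if there are any; otherwise split on spaces
            if ($oldkeywords =~ /;/) {
                    @splitkeywords = split (/;/, $oldkeywords) ;
            } else {
                    @splitkeywords = split (/\s/, $oldkeywords) ;
            }
    
    # Drop empty strings (e.g. from a leading semi-colon)
            @allkeywords = grep { length } (@quotedkeywords, @splitkeywords) ;
    
    # sort in void context does nothing, so assign the sorted list back
            @allkeywords = sort @allkeywords ;
            return @allkeywords ;
    }
    
    1;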
    
  • darryl Registered Users Posts: 997 Major grins
    edited October 15, 2007
    I had to come up with the code above because I really do (ok, I really *might*) end up with keywords like this:

    "foo" phrase one; onetag twotag "phrase two" wtftag "other words" crash & burn

    Which SmugMug parses into this (| is my delimiter):

    foo|phrase one|phrase two|other words|onetag twotag wtftag crash burn
  • scottV Registered Users Posts: 354 Major grins
    edited October 18, 2007
    Ack, interesting stuff... I'm not looking forward to tackling this portion in my app. Seems to me like the API should return the list of keywords already parsed and using one common delimiter. They must have the keywords stored individually already to create the keyword cloud on the homepage and for searching. Instead of returning whatever junk the user typed in, we should get back the SM-interpreted list. Make sense?
  • darryl Registered Users Posts: 997 Major grins
    edited October 18, 2007
    f00sion wrote:
    Ack, interesting stuff... I'm not looking forward to tackling this portion in my app. Seems to me like the API should return the list of keywords already parsed and using one common delimiter. They must have the keywords stored individually already to create the keyword cloud on the homepage and for searching. Instead of returning whatever junk the user typed in, we should get back the SM-interpreted list. Make sense?

    Yeah, I'd love that. It's also a pain in the ass because if they ever do add an option to allow guests (password-authenticated for the site or gallery) to add/edit keywords, there's a high possibility that people will screw it up when they see semi-colons, commas, and double-quotes all being treated differently.

    Oh yeah, I need to fix this because apparently they now accept commas as delimiters. Crap.
  • darryl Registered Users Posts: 997 Major grins
    edited October 18, 2007
    devbobo wrote:
    using a semi-colon as a delimiter for keywords works fine with SmugMug, including via the api.

    Well, I guess my problem is that I want to allow multiple people to tag photos, a la Flickr, but if they're all sharing the same text-entry field, there's a lot of potential for screw-ups if different delimiters are used. And a lot of these users are probably not that technically savvy.

    Flickr allows for:
    - adding tags one at a time.
    - deleting tags one at a time.
    - immediate parsing and display of how tag was parsed.

    This seems like a good model. Also, there is *often* a delay and lag in (I guess) the SmugMug AJAX posts that go from the Keyword UI to the database. I recently tried testing different delimiter combinations, and three separate "save keyword" attempts ended up getting concatenated together. I suppose it's good that nothing got lost, except that there wasn't any visual feedback of this until the third or fourth attempt.
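
To make the one-tag-at-a-time model a bit more concrete, here is a minimal Python sketch. The class name `TagList` and its methods are hypothetical, and it illustrates neither Flickr's nor SmugMug's actual implementation. It assumes tags are compared case-insensitively and that the stored form is a string of double-quoted, space-separated keywords.

```python
class TagList:
    """Keeps keywords as separate items instead of one free-text string."""

    def __init__(self, tags=()):
        self._tags = []
        for tag in tags:
            self.add(tag)

    def add(self, tag):
        """Add one tag; return its normalized form so it can be shown back."""
        # Drop double quotes and collapse whitespace before storing.
        normalized = " ".join(tag.replace('"', "").split())
        known = [t.lower() for t in self._tags]
        # Assumption: duplicates are detected case-insensitively.
        if normalized and normalized.lower() not in known:
            self._tags.append(normalized)
        return normalized

    def remove(self, tag):
        """Remove one tag (case-insensitive); return True if it was present."""
        target = " ".join(tag.split()).lower()
        for existing in self._tags:
            if existing.lower() == target:
                self._tags.remove(existing)
                return True
        return False

    def as_keyword_string(self):
        """Serialize with every tag double-quoted (an assumed format)."""
        return " ".join(f'"{t}"' for t in self._tags)
```

Because `add` returns the normalized tag, a UI built on something like this could echo back exactly what was stored after each entry, which is the immediate feedback described above.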