Keyword weirdness (catastrophic)

Comments

  • ChancyRat Registered Users Posts: 2,141 Major grins
    edited January 29, 2014
    Allen, I thought it was double-quote-separated strings that aren't permitted? That commas and semis are permitted?
    My keyword page appears correct, yet I know I used commas quite a bit in some of them. So I'm wondering what happens if one opens the gallery keywords in the organizer - is that when the comma-separated keywords break?

    I guess I'm asking, what if I leave everything alone and don't even look at the keywords, and definitely do not go into gallery keyword/captions to edit them. Would leaving things be, be a good strategy to see if SM makes more changes?
  • Allen Registered Users Posts: 10,013 Major grins
    edited January 29, 2014
    The new rules seem to be:
    Use " " if you like but I stopped using them and just separate every keyword with a semicolon.

    xxxx; "xxx xx"; xxx; "xxxxx xx"; xxxxx
    xxxx; "xxx xx"; xxx; "xxxxx xx";
    or
    xxxx; xxx xx; xxx; xxxxx xx; xxxx
    xxxx; xxx xx; xxx; xxxxx xx;

    At least until they change the rules again.
    Al - Just a volunteer here having fun
    My Website index | My Blog
  • denisegoldberg Administrators Posts: 14,383 moderator
    edited January 29, 2014
    ChancyRat wrote: »
    I guess I'm asking, what if I leave everything alone and don't even look at the keywords, and definitely do not go into gallery keyword/captions to edit them. Would leaving things be, be a good strategy to see if SM makes more changes?
    Right now if I go to your keyword page and click on some of the longer keywords - the ones that appear to be combinations - I am taken to a "gallery is empty" or "photo not found" page. To see this behavior, click on the keyword "problem body wrap problem bandage photo gallery problem bandages" from your keyword page.

    Also - I'd suggest you remove the images you placed on the side of your keyword page. If I click on a keyword the results page also has those side bars - but they are above the keyword results. That is, in order to see the results of clicking the keywords I need to scroll down to pass the 3 vertical images before I see the results.

    --- Denise
  • pbandj Registered Users Posts: 237 Major grins
    edited January 30, 2014
    Allen wrote: »
    You are right, this is not a bug. Smugmug intentionally DESTROYED every multi-keyword
    that was separated with a comma which was how we were instructed to do it by Smugmug. They
    could at least have searched and replaced every comma with a semicolon. Maybe part of the indexing code.

    Notice how they removed all the commas? Should be five in this but all gone.
    "mingo national wildlife refuge" autumn mingo "missouri parks" missouri landscapes

    It sounds like Smugmug thinks your keyword string never had commas, but they just used the quotations and spaces as keyword delimiters. Whether that's the case, or whether they removed the commas you think were there, it was very irresponsible of them to simply stop supporting spaces without first running some sort of data conversion to separate keywords with semicolons. Makes me think they're perhaps not the best company to trust with your metadata. What a nightmare for you to have to fix!
  • denisegoldberg Administrators Posts: 14,383 moderator
    edited January 30, 2014
    pbandj wrote: »
    ...Whether that's the case, or whether they removed the commas you think were there, it was very irresponsible of them to simply stop supporting spaces without first running some sort of data conversion to separate keywords with semicolons.
    Agreed.

    Beyond the non-conversion of old to new, there was no notice to us that this was happening (I know, not a surprise). I spent hours fixing my keywords after seeing the problem. But if Allen hadn't started this thread I wouldn't have known to look. That's scary given that it caused severe problems with keywords I had carefully added over the years.

    --- Denise
  • ChancyRat Registered Users Posts: 2,141 Major grins
    edited January 30, 2014
    denisegoldberg wrote: »
    Right now if I go to your keyword page and click on some of the longer keywords - the ones that appear to be combinations - I am taken to a "gallery is empty" or "photo not found" page. To see this behavior, click on the keyword "problem body wrap problem bandage photo gallery problem bandages" from your keyword page.

    Also - I'd suggest you remove the images you placed on the side of your keyword page. If I click on a keyword the results page also has those side bars - but they are above the keyword results. That is, in order to see the results of clicking the keywords I need to scroll down to pass the 3 vertical images before I see the results.

    --- Denise

    Thank you for the catch on the problem with graphics in the keyword gallery, duh on me for not thinking to check that function.

    Now I do see the problem with some of the merged/broken keywords. I hadn't scrutinized the list carefully enough to spot them. I checked several of the other multi-keyword keywords, and they do work properly. So I take it I can use the keyword page - what I see on it - as the proofreading tool? I don't need to go to every gallery and review every keyword for every image (in the caption/keyword area)? In a way finding the problem words by what I see on the keyword page (which is a product of Lamah's code, in case that matters - I have no idea what one would see in the normal SM keyword page or cloud) makes proofreading semi manageable. At least for me who doesn't have thousands of keywords.
  • denisegoldberg Administrators Posts: 14,383 moderator
    edited January 30, 2014
    ChancyRat wrote: »
    Now I do see the problem with some of the merged/broken keywords. I hadn't scrutinized the list carefully enough to spot them. I checked several of the other multi-keyword keywords, and they do work properly. So I take it I can use the keyword page - what I see on it - as the proofreading tool? I don't need to go to every gallery and review every keyword for every image (in the caption/keyword area)? In a way finding the problem words by what I see on the keyword page (which is a product of Lamah's code, in case that matters - I have no idea what one would see in the normal SM keyword page or cloud) makes proofreading semi manageable. At least for me who doesn't have thousands of keywords.
    You can use the keyword page (using Lamah's code) as a starting point but you can't fix the problem from there. You need to edit the keywords in the galleries.

    You could try setting up a smart gallery based on the invalid keyword and do the bulk edit from there. I tried that, worked sometimes but not others - I ended up editing the keywords in every gallery. Very tedious.

    --- Denise
  • agallia Registered Users Posts: 541 Major grins
    edited January 30, 2014
    So glad I ran across this thread! As I get back into my site using 'new' Smugmug I find that I not only need to address redesign of my theme, galleries (pages, folders) and content, but now screwed up keywords. Bummer. I am sure this problem cuts across almost all members. Seems like we lost the comfortable support connection we used to have. Have a good day.

    Just looked at New SM Keyword Help and it says, "Enter your keywords separated by spaces. To group them together you can separate the terms with a comma or enclose the term in quotes." Also addresses semi-colons in examples?
    Acadiana Al
    Smugmug: Bayou Oaks Studio
    Blog: Journey to the Light
    "Serendipity...the faculty of making happy, unexpected discoveries by accident." .... Horace Walpole, 1754 (perhaps that 'lucky shot' wasn't really luck at all!)
  • Allen Registered Users Posts: 10,013 Major grins
    edited January 30, 2014
    agallia wrote: »
    ....
    Just looked at New SM Keyword Help and it says, "Enter your keywords separated by spaces. To group them together you can separate the terms with a comma or enclose the term in quotes." Also addresses semi-colons in examples?
    On that help page there is no mention of semicolons, only colons. We were at one time instructed by
    Smugmug to use commas then later to use semicolons. No mention of having to fix all the earlier
    keywords to comply with the latest rules.

    Smugmug, what do I do with all the semicolons in my thousands of keywords?
  • guy Registered Users Posts: 191 Major grins
    edited January 30, 2014
    devbobo wrote: »
    Allen,

    This isn't a bug, we are no longer supporting space separated keyword strings.

    Please use commas or semi-colons.

    Cheers,

    David
    Allen wrote: »
    On that help page there is no mention of semicolons, only colons. We were at one time instructed by
    Smugmug to use commas then later to use semicolons. No mention of having to fix all the earlier
    keywords to comply with the latest rules.

    Smugmug, what do I do with all the semicolons in my thousands of keywords?

    I see that keywords in my early albums are now screwed up too!

    Tough to believe they would make such a change knowing how this would screw up long term users with old galleries without even telling us or caring about the amount of extra work it would take their customers to fix things.

    Is this really the case, or do the people working on the site now have no knowledge of, or interest in, how things worked in the past and what we were instructed to do before?
  • agallia Registered Users Posts: 541 Major grins
    edited January 30, 2014
    Allen wrote: »
    On that help page there is no mention of semicolons, only colons. We were at one time instructed by
    Smugmug to use commas then later to use semicolons. No mention of having to fix all the earlier
    keywords to comply with the latest rules.

    Smugmug, what do I do with all the semicolons in my thousands of keywords?
    Corrected on semicolons. Thanks, Allen. Help page says, "...Each word in your quick entry will be separated by commas." and "...Separate keywords with commas, colons, or enclose the group in quotes. We'll interpret them the same way." Assuming this is now the way it is, lots of work ahead to 'fix' lots of keywords.
  • pilotdave Registered Users Posts: 785 Major grins
    edited January 31, 2014
    guy wrote: »
    I see that keywords in my early albums are now screwed up too!

    Tough to believe they would make such a change knowing how this would screw up long term users with old galleries without even telling us or caring about the amount of extra work it would take their customers to fix things.

    Is this really the case, or do the people working on the site now have no knowledge of, or interest in, how things worked in the past and what we were instructed to do before?

    FYI, I noticed on my site that it isn't finished screwing things up yet. I have many old galleries where keywords were separated by spaces (which was acceptable at the time). Many of the newly created combined keywords haven't been indexed yet. I can go to the galleries and see the ruined keywords, but they don't show up in the keywords list yet and no photos are found when clicking on them. So even if I fix all the smugmug-induced errors showing on the keywords page, I know I'm not even close to cleaning up smugmug's mess.

    Smugmug, you are costing me many hours of work to fix a mistake you made. I'm not happy about this.

    Dave
  • thenickdude Registered Users Posts: 1,302 Major grins
    edited January 31, 2014
    Is it just me, or has the behaviour now been fixed? If I enter this keyword string in the UI, which is a horrifying mix of styles:
    one "twenty three" "fourty five" six seven, eight, ninety nine; ten
    

    It parses out into a faultless:
    one | twenty three | fourty five | six | seven | eight | ninety nine | ten
    

    Which in the Edit Photo Info menu renders nicely normalised:
    one; twenty three; fourty five; six; seven; eight; ninety nine; ten
    
  • denisegoldberg Administrators Posts: 14,383 moderator
    edited January 31, 2014
    Lamah wrote: »
    Is it just me, or has the behaviour now been fixed? If I enter this keyword string in the UI, which is a horrifying mix of styles:
    one "twenty three" "fourty five" six seven, eight, ninety nine; ten
    

    It parses out into a faultless:
    one | twenty three | fourty five | six | seven | eight | ninety nine | ten
    
    The problem on my site was in old galleries, not in newly created keywords.
    I'm still finding messed up keywords that I missed when I was correcting keywords on my site.

    --- Denise
  • yaypie Registered Users Posts: 46 Big grins
    edited February 5, 2014
    Thanks for all the feedback about this change, everyone. You're right: this was not a good idea, and we shouldn't have done it. But given that we did do it, we should at least have provided some warning and a better explanation of what was changing and why.

    We're going to revert this change and start supporting space-delimited keywords again soon (once we finish testing the new changes and ensuring that everything works). You don't need to go back and update your old keywords; just leave them be, and once the new changes go live, everything should be back to normal.

    Read on for some (fairly lengthy) technical details if you're interested.

    As a result of your feedback here, we took a deep dive into our keyword parsing code and rewrote it from scratch. We analyzed the keywords of every single image and album SmugMug hosts, identified the most commonly used delimiters, and created a new parser that we think will do a much better job of handling the vast range of keyword formats users (and cameras, and editing software) use.

    (Fun fact: commas are the most popular delimiter character on SmugMug, followed by semicolons, and then spaces, and then pipes.)

    The truth is that our keyword parsing code was some of the oldest and cruftiest code we have, dating from the very earliest days of SmugMug. Any change to it is fraught with peril because keywords are fundamental metadata used throughout SmugMug's interface and backend systems, so over the years the code just got more and more complex and fragmented as it was gradually adapted to handle new requirements, but it was never fully revamped or cleaned up due to the risk involved. Now it has been, and we'll launch the revamped code as soon as we're through testing it.

    These are big changes so we're testing them very thoroughly, but the end result is that keywords (including all existing space-delimited keywords) should soon be parsed more consistently throughout the site. You will be able to continue entering keywords separated by spaces, or by commas, or by semicolons, or by pipes, or by a mix of commas, semicolons, or pipes. Existing keywords that you've already entered should be handled correctly without any changes on your part.

    I hope you'll forgive us for this misstep.
  • Allen Registered Users Posts: 10,013 Major grins
    edited February 5, 2014
    yaypie wrote: »
    ....
    Can you include in the new code to allow single quotes ' ?

    These are used by many people and some words/names look funny without them. It was this way
    on legacy originally, they allowed them.

    example: rosss goose; > ross's goose;

    Double quotes " are still an option for grouping multi-word keywords.

    BTW, I went thru 100's of keywords removing the single quotes but will be happy to add them back.

    Parsing by spaces really scares me, as an Engr with a logic thinking mind all my life. Every multi-word KW has spaces.
  • yaypie Registered Users Posts: 46 Big grins
    edited February 5, 2014
    Allen wrote: »
    Can you include in the new code to allow single quotes ' ?

    Great question! As a matter of fact, yes, the new code does allow single quotes.
    Allen wrote: »
    Parsing by spaces really scares me, as an Engr with a logic thinking mind all my life. Every multi-word KW has spaces.

    You're not the only one!

    We've tried to reduce the ambiguity of parsing space-separated keywords by implementing the following rules in the new parser (once again, these changes aren't live yet, but will be once we finish testing):
    • If the keyword string contains a comma, semicolon, or pipe, then only use those characters as delimiters. That means that [zoo, San Diego, San Diego Zoo] becomes three keywords: "zoo", "San Diego", and "San Diego Zoo". We don't treat the spaces as delimiters in this case because other delimiters are clearly being used.

    • If the keyword string doesn't contain a comma, semicolon, or pipe, then we treat spaces as delimiters unless the space is enclosed in quotes. So [zoo San Diego San Diego Zoo] becomes "zoo", "San", "Diego" after duplicates are removed, but [zoo "San Diego" "San Diego Zoo"] becomes "zoo", "San Diego", "San Diego Zoo".

    • Double quotes have no effect when commas, semicolons, or pipes are used as delimiters. They're not necessary, and are simply removed.
    One caveat of this approach is that you can't mix spaces with other delimiters, because when any other delimiter is used, spaces are always treated as part of a keyword. But we think that's a small price to pay for more consistent parsing overall.
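
The three rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described in the post, not SmugMug's actual code: the function name `parse_keywords`, the whitespace trimming, and the case-insensitive duplicate check (inferred from the "zoo" / "Zoo" example) are all assumptions.

```python
import re

# Explicit delimiters named in the post: comma, semicolon, pipe.
DELIMITERS = ",;|"


def parse_keywords(raw: str) -> list[str]:
    """Hypothetical sketch of the parsing rules described in the post."""
    if any(ch in raw for ch in DELIMITERS):
        # Rules 1 and 3: explicit delimiters win, spaces stay inside
        # keywords, and double quotes are simply dropped.
        parts = re.split(r"[,;|]", raw.replace('"', ""))
    else:
        # Rule 2: spaces delimit, except inside double quotes.
        parts = [
            quoted or bare
            for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', raw)
        ]

    seen = set()
    keywords = []
    for part in parts:
        keyword = " ".join(part.split())  # trim and collapse whitespace (assumed)
        key = keyword.lower()             # case-insensitive dedup (assumed)
        if keyword and key not in seen:
            seen.add(key)
            keywords.append(keyword)
    return keywords


print(parse_keywords('zoo, San Diego, San Diego Zoo'))
# ['zoo', 'San Diego', 'San Diego Zoo']
print(parse_keywords('zoo San Diego San Diego Zoo'))
# ['zoo', 'San', 'Diego']
print(parse_keywords('zoo "San Diego" "San Diego Zoo"'))
# ['zoo', 'San Diego', 'San Diego Zoo']
```

The caveat mentioned in the post shows up directly: `parse_keywords('one two, three')` yields `['one two', 'three']`, because the comma switches off space splitting for the whole string.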
  • Allen Registered Users Posts: 10,013 Major grins
    edited February 5, 2014
    I have found a bunch of old KW boxes with the commas missing but the "..." are still there.
    I have just about tracked down all the messed up keywords and fixed them after many hours of work.
    Been checking full KW list for combined KW's, some don't show on the list but they do under a gallery photo?
    Some don't even register in Smart gallery settings by KW, some do.

    Great about the single quotes.
    http://www.dgrin.com/showthread.php?t=239449

    Other than a few obvious not allowed characters (system usage + - , ; : . | " ) in keywords all the others
    are just keyboard keys. Why screen for them?
  • yaypie Registered Users Posts: 46 Big grins
    edited February 5, 2014
    Allen wrote: »
    I have found a bunch of old KW boxes with the commas missing but the "..." are still there.

    Not sure what you mean exactly. Can you post a screenshot?
    I have just about tracked down all the messed up keywords and fixed them after many hours of work.
    Been checking full KW list for combined KW's, some don't show on the list but they do under a gallery photo?
    Some don't even register in Smart gallery settings by KW, some do.

    I recommend not trying to fix old keywords right now. The new keyword code is not yet live, and it fixes quite a few bugs that you might as well not waste time on.
    Other than a few obvious not allowed characters (system usage + - , ; : . | " ) in keywords all the others
    are just keyboard keys. Why screen for them?

    The list of characters that are allowed in keywords is actually pretty huge (literally thousands and thousands, from a wide variety of languages -- far more than are on any keyboard!), but we do remove delimiter characters and certain punctuation for a variety of reasons -- in some cases because those characters just aren't useful, in others because they make search indexing more difficult.

    You'd be surprised how much junk data gets dumped into keywords in image metadata by buggy camera firmware or poorly written image editing tools. That's another reason we strip certain characters out of keywords.
  • denisegoldberg Administrators Posts: 14,383 moderator
    edited February 5, 2014
    yaypie wrote: »
    We've tried to reduce the ambiguity of parsing space-separated keywords by implementing the following rules in the new parser (once again, these changes aren't live yet, but will be once we finish testing):
    ....
    Thank you!

    --- Denise
  • Allen Registered Users Posts: 10,013 Major grins
    edited February 5, 2014
    yaypie wrote: »
    ...The list of characters that are allowed in keywords is actually pretty huge (literally thousands and thousands, from a wide variety of languages -- far more than are on any keyboard!), but we do remove delimiter characters and certain punctuation for a variety of reasons -- in some cases because those characters just aren't useful, in others because they make search indexing more difficult.....
    Is there a special character somewhere that I can use to simulate a dash?

    Or like the KW list hack, where the dashes were added back in for display only by replacing the _ I used in the actual KW.
  • yaypie Registered Users Posts: 46 Big grins
    edited February 5, 2014
    Allen wrote: »
    Is there a special character somewhere that I can use to simulate a dash?

    Not at the moment, unfortunately. I'll see what I can do about this, but no promises. :)
  • pilotdave Registered Users Posts: 785 Major grins
    edited February 5, 2014
    yaypie wrote: »
    Thanks for all the feedback about this change, everyone. You're right: this was not a good idea, and we shouldn't have done it. But given that we did do it, we should at least have provided some warning and a better explanation of what was changing and why.

    Thanks for listening! I fixed about 5 galleries or so the other day. It's a very slow, frustrating process. Glad I gave up!

    Dave
  • Allen Registered Users Posts: 10,013 Major grins
    edited February 5, 2014
    yaypie wrote: »
    Not at the moment, unfortunately. I'll see what I can do about this, but no promises. :)
    Thanks
  • ChancyRat Registered Users Posts: 2,141 Major grins
    edited February 5, 2014
    What about "~" for dashes?
  • Allen Registered Users Posts: 10,013 Major grins
    edited February 5, 2014
    ChancyRat wrote: »
    What about "~" for dashes?
    Thanks, just plugged one in and will see what shows in the KW list.

    Nope, it removes it after save.
  • beardedgit Registered Users Posts: 854 Major grins
    edited February 5, 2014
    Will Alt key code symbols work?

    Alt+22 is ▬
    Alt+196 is ─

    Yippee ki-yay, footer-muckers!
  • Allen Registered Users Posts: 10,013 Major grins
    edited February 5, 2014
    beardedgit wrote: »
    Will Alt key code symbols work?

    Alt+22 is ▬
    Alt+196 is ─
    Both removed when saved.
    alt+95 works but it's just an underline

    What I really need is the required hyphen. &#8209;
  • beardedgit Registered Users Posts: 854 Major grins
    edited February 5, 2014
    Allen wrote: »
    What I really need is the required hyphen. &#8209;
    Yeah, that would be the best solution.
    Until then, how about Unicode? Is there anything useful at http://en.wikipedia.org/wiki/General_Punctuation_%28Unicode_block%29 or at http://en.wikipedia.org/wiki/Dash#Common_dashes ?
  • Allen Registered Users Posts: 10,013 Major grins
    edited February 5, 2014
    I can add the single quote using the alt method and it shows. But it won't show on the keyword results page.