How to uniquely identify an image

wellman Registered Users Posts: 961 Major grins
First, a little background... When I first got serious about photography, I processed my RAW files in Canon's DPP, saved the JPGs, uploaded to SmugMug, and then did my captioning and keywording within SmugMug. Nowadays, I use Adobe Lightroom. My captioning and keywording are done locally, the metadata is embedded in the JPGs, and I upload to SmugMug.

My older photos are all imported into Lightroom now, but of course w/o the metadata I entered within SmugMug.

Essentially, I'd like to sync the metadata between SmugMug and Lightroom. My album organization is completely different, so I can't depend on a hierarchy to help me find photos (nor do I want to). So how would you uniquely identify a photo? I wish photos had some sort of GUID I could latch onto, but I don't believe that's the case. Any thoughts? Thanks.

Comments

  • pe2smugmug Registered Users Posts: 53 Big grins
    edited January 28, 2008
    This isn't exactly the same problem, but I worked through some similar questions in another thread about a syncing app (see http://dgrin.com/showthread.php?t=81541).

    Once you add metadata to the JPG, the file's MD5 hash changes, so you can't match the local file to the uploaded one by hashing the whole file.

    Here are some things to look at that might make your life easier.
    1) Filenames? If you haven't changed filenames since uploading, this is a GREAT first step.
    2) File size - again, easy to determine, and it can narrow down your comparisons if you use different cameras.
    3) Compare some metadata; if the original DateTime of the picture matches on both sides, that is a safe comparison.
    4) The slowest mechanism is to compare the actual image data (assuming you still have a copy of the exact image you uploaded).
    4b) While quite slow (you have to download every image from SM), you could even expand on that to cover all comparisons: convert each image to raw image data only (such as BMP format), take the MD5 sum of those files, and put the sums into a hash table for easy lookup.
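
    As a rough sketch of how steps 1 to 3 could be chained, assuming you have already pulled the name, pixel dimensions, and original DateTime for each image on both sides: the `Photo` record and `find_match` below are made up for illustration and are not a SmugMug or Lightroom API. Each check narrows the remaining candidates, and a match is only reported when exactly one candidate is left.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Photo:
    """Hypothetical record; the fields are illustrative, not a real schema."""
    name: str             # file name, e.g. "IMG_1234.jpg"
    width: int            # pixel dimensions
    height: int
    taken: Optional[str]  # original DateTime string from EXIF, if any


def find_match(remote: Photo, local_photos: List[Photo]) -> Optional[Photo]:
    """Narrow the local candidates check by check; return a unique match or None."""
    candidates = list(local_photos)
    checks = [
        # 1) same file name, ignoring case
        lambda p: p.name.lower() == remote.name.lower(),
        # 2) same pixel dimensions
        lambda p: (p.width, p.height) == (remote.width, remote.height),
        # 3) same original DateTime (only usable if the remote image has one)
        lambda p: remote.taken is not None and p.taken == remote.taken,
    ]
    for check in checks:
        narrowed = [p for p in candidates if check(p)]
        if len(narrowed) == 1:
            return narrowed[0]
        if narrowed:
            candidates = narrowed
        # If a check rules out every candidate (e.g. the file was renamed),
        # ignore that check and keep the current candidate list.
    return None  # ambiguous or no match: leave these for a manual look
```

    Anything that comes back as `None` is a candidate for the slower image-data comparison in step 4.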

    A couple of other questions:
    1) I assume this is a one-off job? You want to sync once, and from then on your Lightroom DB is the master copy? If so, then you just need to hack something together.
    2) How many images do you have? Meaning, it's a lot easier to brute force through 1000 pics than 10,000 or 100,000 :D
    3) I think a major obstacle will be putting the updated metadata back into Lightroom. It's based on a sqlite3 DB, so it's easy to read from and write to, but it can be tough to decipher the underlying structure. What I mean is that while it's not too hard to read from the DB, writing is more challenging because you might miss an entry, or break an assumption Lightroom makes that is not apparent from the DB structure. I hope that made sense?
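
    For the read side, a minimal sketch using Python's built-in `sqlite3` module, assuming you point it at a copy of the catalog file rather than the live one. The function name `list_tables` is made up for illustration, and it deliberately assumes nothing about Lightroom's actual table layout: it only lists whatever tables the file contains, opened read-only so nothing can be changed by accident.

```python
import sqlite3
from pathlib import Path


def list_tables(catalog_path: str) -> list:
    """Open a SQLite file read-only and return the names of its tables."""
    # mode=ro makes SQLite refuse any write through this connection.
    uri = Path(catalog_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()
```

    From there you can poke at individual tables to work out the structure before even thinking about writing anything back.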

    Whew, let me know if I can help, this sounds like a "fun" problem.
  • wellman Registered Users Posts: 961 Major grins
    edited January 28, 2008
    Thanks for your reply. I'll need to read through your post a few times to consider your points.

    Off the top of my head, however, is this... My Lightroom library currently has three main types of images:
    1. New RAWs with captions/keywords
    2. Old RAWs without captions/keywords
    3. Old JPGs without captions/keywords
    I have both the RAWs and JPGs of the old stuff because the files were developed in DPP, and I didn't want to lose 2 years' worth of "developing history", since LR can't interpret my DPP edits on the original RAW.

    I'm looking to sync metadata between SmugMug JPGs and both the LR RAW and the LR JPG if it exists. So my problem is even more complex than I first let on. I can't really look at things like filesize, MD5, etc, because the RAW will obviously be much different than the JPG.

    I suppose priority number one would be to push SmugMug data into the LR database for the old RAWs and JPGs. In the future, it would be nice to have some general sync capability as well, but the argument could be made that as long as I'm good about keywording before I upload, I shouldn't need this. (Although, the more I think about it, the nicer it would be to have new LR captions/keywords pushed to SmugMug w/o having to re-upload images.)

    Sorry for the rambles. This is obviously very sketch-level thinking. Thanks again for responding.
  • pe2smugmug Registered Users Posts: 53 Big grins
    edited January 28, 2008
    Sorry, I should have been clearer: when I said file size, I really meant image dimensions.

    Regarding the rest of your post, I think you are trying to solve two different problems.
    1) Syncing metadata from SM->Lightroom for old pictures
    2) Syncing metadata from Lightroom->SM in the future, for times when metadata is updated after the initial upload.

    #2 I think is easier, and it's the problem I recently tackled for PE2SmugMug. It's easier because you should be able to reduce the number of images you are comparing against and (more importantly) you only have to read from your LR DB (or deal with exported files), not write to it, which can be more challenging (and dangerous).

    #1 - Tagging both the RAW and processed JPGs shouldn't be too much harder. You know that the RAW files could not have been uploaded to SM, so you can eliminate those from your search/comparison. Then it's just a matter of pairing the RAW and JPG images locally, so that when you update the metadata for the JPG, you update its paired RAW image as well.
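
    One way the local pairing could look, as a sketch only: it assumes the developed JPG kept the same base name as its RAW (e.g. `IMG_0001.CR2` and `IMG_0001.JPG`) and that base names don't repeat across folders, neither of which is guaranteed. The function name and the extension sets are my own choices for illustration.

```python
from collections import defaultdict
from pathlib import PurePath

RAW_EXTS = {".cr2", ".crw"}   # assumption: Canon RAW files
JPG_EXTS = {".jpg", ".jpeg"}


def pair_raw_and_jpg(paths):
    """Return {stem: (raw_path, jpg_path)} for base names that have both files."""
    by_stem = defaultdict(dict)
    for p in map(PurePath, paths):
        ext = p.suffix.lower()
        if ext in RAW_EXTS:
            by_stem[p.stem.lower()]["raw"] = str(p)
        elif ext in JPG_EXTS:
            by_stem[p.stem.lower()]["jpg"] = str(p)
    # Keep only the base names where both a RAW and a JPG were seen.
    return {
        stem: (found["raw"], found["jpg"])
        for stem, found in by_stem.items()
        if len(found) == 2
    }
```

    RAWs without a JPG partner, and the reverse, simply drop out of the result, so they can be handled separately.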

    There are a number of ways to do comparisons using the metadata (again, I found the original DateTime to work best), and that should narrow you down to the matching image.

    The "gold standard" for image comparison would be to compare the actual image data.
    I understand that neither the actual files nor the MD5 sums of the actual files will be perfect matches (due to metadata and JPG compression options). However, JPG->BMP (or any other uncompressed, image-data-only format) should be a deterministic, repeatable conversion, at least when the same decoder is used for both copies. You can then do a binary compare of those BMP images, which should match; or take the MD5 sum of those BMP images, which should also match, allowing you to build a mini-DB of your image hashes. I'm not explaining this well, so if it doesn't make sense, let me know.
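
    A related idea, sketched with the standard library only. Note this is not the JPG->BMP conversion described above but a cheaper stand-in: it hashes the JPEG byte stream while skipping the APPn segments (where EXIF, IPTC, and XMP live) and comment segments, so two copies that differ only in embedded metadata should get the same hash. It will not help if the image was re-compressed, and skipping every APPn segment is a simplification. The function name is made up for illustration.

```python
import hashlib


def jpeg_content_hash(data: bytes) -> str:
    """MD5 of a JPEG byte stream, ignoring APPn and COM (metadata) segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    digest = hashlib.md5()
    pos = 2  # skip the SOI marker
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xDA:
            # Start of scan: the compressed image data runs to the end of file.
            digest.update(data[pos:])
            return digest.hexdigest()
        # The 2-byte big-endian length counts itself but not the marker bytes.
        length = int.from_bytes(data[pos + 2:pos + 4], "big")
        if length < 2:
            raise ValueError(f"bad segment length at offset {pos}")
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE
        if not is_metadata:
            digest.update(data[pos:pos + 2 + length])
        pos += 2 + length
    raise ValueError("no start-of-scan marker found")
```

    Usage would be along the lines of `jpeg_content_hash(open(path, "rb").read())` for each file on both sides, with the results stored in a dict keyed by hash.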

    Anyway, if you can solve the issue of writing to the LR DB, I think this is a totally solvable problem (though it might take a little crunching from the computer) :D
  • cmason Registered Users Posts: 2,506 Major grins
    edited January 28, 2008
    You know, for a quick fix, you could always give SmugDAV a try: it will let you easily download your images from SmugMug, presumably with all the captions and keywords embedded. That way, you can get this info back.

    Lightroom will read those from the embedded JPEG metadata just fine. The only downside is that you will have to put up with the SmugMug names for your images, as the downloaded files will need to replace your local ones.