Synchronization, Auto-Rotation & MD5Sum's

pe2smugmug Registered Users Posts: 53 Big grins
I've been working on getting synchronization working from Photoshop Elements, and I've gotten pretty far. One problem I have encountered relates to files that use the EXIF rotate tag.

A little while back SmugMug added a feature to auto-rotate images based on their EXIF orientation flag, which I LOVE!

However, it has the consequence that the MD5Sum returned by SmugMug no longer matches the file that was uploaded. While this is technically the "correct" thing to do, since SmugMug is modifying the file, it's making synchronization very difficult. All of my vertical photos come up as having different MD5sums, meaning my code wants to delete the image already uploaded and send a new copy of it (it is a one-way sync right now, with the local copy as the master). While this technically works, and things come out correct in the end, it's a huge waste of time and bandwidth.
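For anyone following along, here is a minimal sketch of the naive comparison described above, assuming the remote MD5 has already been fetched as a hex string. The function names `file_md5` and `needs_reupload` are made up for illustration and are not part of any SmugMug library.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in fixed-size chunks so large photos are never fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_reupload(local_path, remote_md5):
    """Naive one-way sync test: any hash mismatch triggers a re-upload.

    This is exactly the check that misfires once the server has rotated
    the image, because the stored bytes no longer match the local file.
    """
    return file_md5(local_path) != remote_md5.lower()
```

With this check alone, every auto-rotated vertical photo would be reported as changed, which is the wasted upload described above.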

Has anyone experienced this?

Possible solutions I have come up with are:
1) Live with it - ugly
2) After uploading a rotated image the first time, download the new one from SmugMug and save that locally. - also ugly, I want to keep my original files untouched.
3) Have a secondary means for checking matching files after MD5Sum fails: filename, date, image dimensions (should be flipped), and try to match all of that up. - Not quite as nice as plain MD5sum, but should work most of the time.
4) Try to replicate SmugMug's rotation, then rotate the image and hash it when syncing. - Slow to rotate and hash locally. Also requires temp disk usage, and vulnerable to breaking if SM changes the algo they use.
5) Have SM save the original uploaded hash in their DB somewhere. - For this problem, it seems like the best solution, but I'm sure it will cause plenty of problems I haven't thought about.

Any suggestions? Thoughts?

Comments

  • rkalla Registered Users Posts: 108 Major grins
    edited January 16, 2008
    pe, since the rotation op can change the file size of the image, I guess that rules out the two most accurate potential matches to use (md5sum and file size)... using dimension is a bust only because if you constantly shoot with the same camera, it would be easy to have 1000s of pictures with the same dimensions, flipped.

    I guess you could just fall back to a combination of the filename, date and maybe 1 piece of detailed EXIF data just to make sure it's "most probably" the same image, then sync it locally with "_rotated" or something appended to file name?
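A rough sketch of that fallback idea, assuming the filename, capture date and one extra EXIF field have already been read for both copies. The `Fingerprint` type and the choice of camera model as the extra field are assumptions made for illustration only.

```python
from typing import NamedTuple


class Fingerprint(NamedTuple):
    """Hypothetical "most probably the same image" key."""
    filename: str
    taken_at: str      # EXIF capture time, kept as the raw string
    camera_model: str  # one extra piece of EXIF data, chosen arbitrarily here


def probably_same(local: Fingerprint, remote: Fingerprint) -> bool:
    """True when every field of the two fingerprints matches exactly."""
    return local == remote
```

Since none of these fields depend on the pixel data, the comparison is unaffected by the server-side rotation, but it cannot prove the two files are identical.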
  • pe2smugmug Registered Users Posts: 53 Big grins
    edited January 16, 2008
    Yeah, so I ended up hacking together option 3, because I needed to do some bulk updating of keywords.

    You are correct that the filesize does change, so it is no longer a valid test. For a two-way sync (or any sync from smugmug->local), adding _rotated to the end of the filename would certainly be a valid approach.
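As a small illustration of the "_rotated" naming idea, here is one way the local filename could be derived. The helper name `rotated_name` and the default suffix are assumptions, not anything the upload tool actually does.

```python
from pathlib import Path


def rotated_name(path, suffix="_rotated"):
    """Return a sibling path with the suffix inserted before the extension."""
    p = Path(path)
    # p.stem is the filename without its last extension; p.suffix is that extension.
    return p.with_name(p.stem + suffix + p.suffix)
```

For example, `rotated_name("photos/IMG_0042.jpg")` gives a path ending in `IMG_0042_rotated.jpg`, so the untouched original and the downloaded rotated copy can sit side by side.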


    The algorithm I used last night to test whether two files were the same was pretty much as follows:
    i. First (before the loop) I grabbed all of the image info from albums.getImages(heavy=1), including the MD5 sum.

    1) Check whether the MD5 sums are the same. If yes, return that it's the same file.

    2) Check whether the filenames are the same. If not, return as different files.

    3a) Load the EXIF data for the local file (slow, so I wait until now).

    3b) Check that the EXIF orientation is a vertical orientation. If not, return as different files.

    4) Check local height == remote width && local width == remote height. If not, return as different files.

    5a) Fetch the remote EXIF data with smugmug.images.getExif. Again, wait as long as possible because it's slow.

    5b) Check that the OriginalDateTime stamp is the same for both local and remote. If not, return as different files. If the times are the same, return that they are the same image.
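The steps above could be sketched roughly like this. It is not the actual code from the tool: the `LocalImage` and `RemoteImage` records, the dictionary keys, and the `fetch_remote_exif` callback are all hypothetical stand-ins for the real API calls, and treating EXIF orientation values 5 to 8 as "vertical" is an assumption about what the check means.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: "vertical" means the EXIF orientation values that involve a
# 90-degree turn, so width and height swap once the image is rotated.
VERTICAL_ORIENTATIONS = {5, 6, 7, 8}


@dataclass
class RemoteImage:
    """Hypothetical record built from the bulk image listing (step i)."""
    image_id: int
    filename: str
    md5: str
    width: int
    height: int


@dataclass
class LocalImage:
    """Hypothetical description of a local file."""
    filename: str
    md5: str
    # Slow call returning a dict with the keys "orientation", "width",
    # "height" and "datetime_original"; only invoked when needed.
    load_exif: Callable[[], dict]


def same_image(local, remote, fetch_remote_exif):
    """Cheap checks first, slow EXIF lookups last."""
    # 1) Identical hash: definitely the same file.
    if local.md5 == remote.md5:
        return True
    # 2) Different filenames: treat as different files.
    if local.filename != remote.filename:
        return False
    # 3) Only now pay for the local EXIF read; the hash mismatch is only
    #    explainable by auto-rotation if the orientation is vertical.
    exif = local.load_exif()
    if exif.get("orientation") not in VERTICAL_ORIENTATIONS:
        return False
    # 4) After rotation the dimensions should be swapped.
    if exif["height"] != remote.width or exif["width"] != remote.height:
        return False
    # 5) Last and slowest: fetch the remote EXIF and compare capture times.
    remote_exif = fetch_remote_exif(remote.image_id)
    taken = exif.get("datetime_original")
    return taken is not None and taken == remote_exif.get("datetime_original")
```

Passing the slow lookups in as callables keeps the ordering explicit: neither EXIF read happens unless every cheaper test has already passed.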

    I thought about checking the camera model etc., but I figured that the exact time of original creation was good enough. The downside is that it might not detect a local file whose image was modified in some way and saved under the same filename. This was not much of a concern in my case because I know Photoshop Elements will rename the file if I save a new version (and it goes into a version stack, which I'm detecting elsewhere in the DB code).

    Thanks,
    Evan
  • rkalla Registered Users Posts: 108 Major grins
    edited January 16, 2008
    Evan, looks like a very thorough implementation (and good job ordering the expensive operations later in the cycle).

    I think you have all your bases covered here except for maybe some obscure corner case that I can't even think of right now but I'm sure a user will find :D
  • pe2smugmug Registered Users Posts: 53 Big grins
    edited January 16, 2008
    Yeah, I'm kind of debating whether I even want to release the syncing part in my next release. I probably will, but I'll make it show a little annoying popup saying to use it at your own risk :D
  • flyingdutchie Registered Users Posts: 1,286 Major grins
    edited January 16, 2008
    If I were you, I would try to detect the EXIF orientation flag of an image. If it's set to rotate, then warn the user that syncing this image won't work unless the user rotates the image themselves (using Photoshop or whatever).
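A minimal sketch of that pre-upload check, assuming the orientation value has already been read from each file by whatever EXIF library the app uses (Pillow's `Image.getexif()` is one option). The function names are hypothetical, and whether SmugMug actually transforms every value from 2 to 8 is an assumption.

```python
ORIENTATION_TAG = 0x0112  # standard EXIF/TIFF tag id for Orientation


def will_be_auto_rotated(orientation):
    """True if the EXIF orientation asks for the image to be transformed.

    1 (or a missing tag) means "display as stored"; 2 through 8 mean the
    image is mirrored and/or rotated. Anything else is treated as unset.
    """
    return orientation in range(2, 9)


def split_uploads(files):
    """Split (path, orientation) pairs into (safe, flagged) path lists."""
    safe, flagged = [], []
    for path, orientation in files:
        (flagged if will_be_auto_rotated(orientation) else safe).append(path)
    return safe, flagged
```

The flagged list is what the app would warn about, or refuse to upload, until the user has rotated those images physically.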

    Too much of a hassle :)
  • pe2smugmug Registered Users Posts: 53 Big grins
    edited January 16, 2008
    Interesting idea... basically "block" uploading of files that have the EXIF orientation flag set. Try to prevent the problem at the initial upload stage, rather than trying to fix it afterwards when we are syncing.

    It's an interesting solution.
    I have to think about how I would work that into my existing program. It was originally only an upload app, but I added syncing because I had misspelled people's names in my tags/keywords inside of PSE and needed to correct hundreds of photos.