Writing a file sync app for SmugMug -- colorspace conversion is killing my MD5s!

horizon180 Registered Users Posts: 5 Beginner grinner
I have built a basic, functional Windows app that scans folders on your computer and finds the files you have not yet uploaded to SmugMug. It does this by comparing the MD5 hashes of the files. However, SmugMug automatically converts the colorspace of certain files, which changes their bytes and therefore their MD5 hashes.

I've now enabled SmugVault archiving for minor photo corrections, so the exact original is archived in SmugVault.

The last piece of the puzzle is this: I need to be able to get the MD5 sum of the SmugVault file in order to match it to the file on disk.

This is especially important because many users will have photos with identical filenames, depending on how they import the files, so filenames alone cannot identify a match.

Right now, I don't see a way to get the SmugVault file's checksum. I would love it if I could get this information back from the smugmug.images.get API function (with heavy = true).

Comments

  • horizon180 Registered Users Posts: 5 Beginner grinner
    edited November 5, 2011
    See attached for what my app looks like right now. This could be really useful if only I could get the md5 checksums for the files in smugvault! Without them, I get a ton of false mismatches!
  • SamirD Registered Users Posts: 3,474 Major grins
    edited November 6, 2011
    Where are you getting the MD5s for the existing images on SM?

    I manually download each file after I've uploaded to my archive section and do a bit-by-bit compare, but an md5 would definitely be faster.
    Pictures and Videos of the Huntsville Car Scene: www.huntsvillecarscene.com
    Want faster uploading? Vote for FTP!
  • horizon180 Registered Users Posts: 5 Beginner grinner
    edited November 6, 2011
    SamirD wrote: »
    Where are you getting the MD5s for the existing images on SM?

    I manually download each file after I've uploaded to my archive section and do a bit-by-bit compare, but an md5 would definitely be faster.

    If you pass 'true' to the 'Heavy' parameter of this API call, you get a whole bunch of the metadata associated with each file, including the MD5 sum.

    http://wiki.smugmug.net/display/API/show+1.3.0?method=smugmug.images.get

    ... except it won't match your local copy if smugmug has converted the colorspace or rotated the image.
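
    The matching step described here can be sketched roughly as follows, in Python rather than the app's own language. This is only an illustration: the record shape and the "MD5Sum" field name are assumptions about what the heavy response contains, and find_not_uploaded is a hypothetical helper, not part of any SmugMug library.

```python
def find_not_uploaded(local_md5s, remote_images):
    """Return local paths whose MD5 does not appear in the remote records.

    local_md5s: dict mapping a local file path to its hex MD5 digest.
    remote_images: iterable of dicts, each with an "MD5Sum" string
    (assumed field name, standing in for the heavy API response).
    """
    # Normalize case so "ABC..." and "abc..." count as the same digest.
    remote = {image["MD5Sum"].lower() for image in remote_images}
    return sorted(
        path for path, digest in local_md5s.items()
        if digest.lower() not in remote
    )
```

    Any file SmugMug has rotated or converted will show up in the result even though it was uploaded, which is exactly the false mismatch this thread is about.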
  • SamirD Registered Users Posts: 3,474 Major grins
    edited November 7, 2011
    Thank you for the reply. How are you generating your local md5? This would be a much faster way for me to compare my images versus downloading them all again.
  • rainforest1155 Registered Users Posts: 4,566 Major grins
    edited November 8, 2011
    horizon180 wrote: »
    Right now, I don't see a way to get the SmugVault file's checksum. I would love it if I could get this information back from the smugmug.images.get API function (with heavy = true).
    I'm not an API expert, but at this point I don't think access to SmugVault content via the API is possible at all yet.

    Maybe it would be possible for you to do an on-the-fly conversion to sRGB on your end and then, in case the initial checksum doesn't match, compare the MD5 of your converted file to the one on SmugMug to see if it's a match? This is just an idea and I have no clue if it's practically possible.

    One other thing that you have to plan for is that we try to reduce files that are above our 24MB file limit to make them fit within our specs.
    Sebastian
    SmugMug Support Hero
  • SamirD Registered Users Posts: 3,474 Major grins
    edited November 8, 2011
    One other thing that you have to plan for is that we try to reduce files that are above our 24MB file limit to make them fit within our specs.
    Does this happen automatically? I've seen users here have issues where SM will reject the image.
  • horizon180 Registered Users Posts: 5 Beginner grinner
    edited November 8, 2011
    SamirD wrote: »
    Thank you for the reply. How are you generating your local md5? This would be a much faster way for me to compare my images versus downloading them all again.

    It's quite simple to do in C# using System.Security.Cryptography.MD5 if you are familiar with programming. See http://msdn.microsoft.com/en-us/library/system.security.cryptography.md5.aspx
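
    For readers who don't use C#, the same local hashing step can be sketched with Python's standard hashlib module. This is an illustrative equivalent, not the app's actual code; file_md5 and its chunk_size parameter are names made up for the example.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

    Reading in chunks keeps memory use flat even for large originals, and the digest covers every byte of the file, so any change to the pixels or the embedded metadata produces a different value.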
  • horizon180 Registered Users Posts: 5 Beginner grinner
    edited November 8, 2011
    I'm not an API expert, but at this point I don't think access to SmugVault content via the API is possible at all yet.

    Maybe it would be possible for you to do an on-the-fly conversion to sRGB on your end and then, in case the initial checksum doesn't match, compare the MD5 of your converted file to the one on SmugMug to see if it's a match? This is just an idea and I have no clue if it's practically possible.

    One other thing that you have to plan for is that we try to reduce files that are above our 24MB file limit to make them fit within our specs.

    I don't know anything about colorspace conversion, such as whether it requires recompression. I'm also not interested in changing any aspect of my files on disk. I just want to be able to back up the originals without thinking about it.

    I really need access to the smugvault file metadata to do this cleanly. Otherwise, I may be stuck implementing more hackish methods of ensuring files are equivalent, like building a white-list for files that have been verified manually (ugh).
  • SamirD Registered Users Posts: 3,474 Major grins
    edited November 8, 2011
    I'm not a programmer, but some quick searches led me to some command-line programs that can generate an MD5, as well as some theory behind MD5. For the type of bit errors I'm trying to find, I don't think MD5 would be better than a bit-by-bit compare. Oh well.
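
    When both copies are already on disk, as in the download-and-verify workflow described here, a direct byte comparison needs no hashing at all. A minimal Python sketch, where same_bytes is a hypothetical helper name:

```python
import filecmp


def same_bytes(path_a, path_b):
    """Return True only if the two files have identical contents."""
    # shallow=False forces a content comparison instead of trusting
    # file size and modification time alone.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

    A hash only saves work when one side of the comparison is not on hand, for example when the service reports a checksum and the file would otherwise have to be downloaded.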
  • SamirD Registered Users Posts: 3,474 Major grins
    edited November 8, 2011
    horizon180 wrote: »
    I don't know anything about colorspace conversion, such as whether it requires recompression. I'm also not interested in changing any aspect of my files on disk. I just want to be able to back up the originals without thinking about it.

    I really need access to the smugvault file metadata to do this cleanly. Otherwise, I may be stuck implementing more hackish methods of ensuring files are equivalent, like building a white-list for files that have been verified manually (ugh).
    What I'd do is have your program create an MD5 of each image file that's been uploaded. Then you could quickly compare the MD5s as well as file attributes such as size and date/time.

    Another option is to generate the MD5 and store it in a local database as each file is uploaded. Then, as you scan for new files, you simply compare their MD5s with those in your local database, and SM's MD5s no longer need to match the originals.
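
    The local-database idea could look something like the sketch below, using Python's built-in sqlite3 module. The table layout and the function names (open_db, record_upload, already_uploaded) are invented for illustration. It assumes the app itself performs the upload, so it can record the original file's MD5 at that moment along with whatever MD5 the service later reports.

```python
import sqlite3


def open_db(db_path=":memory:"):
    """Open (or create) the upload log; defaults to an in-memory database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS uploads ("
        " original_md5 TEXT PRIMARY KEY,"
        " path TEXT NOT NULL,"
        " remote_md5 TEXT)"
    )
    return conn


def record_upload(conn, path, original_md5, remote_md5=None):
    """Remember that the file with this original MD5 has been uploaded."""
    conn.execute(
        "INSERT OR REPLACE INTO uploads (original_md5, path, remote_md5)"
        " VALUES (?, ?, ?)",
        (original_md5, path, remote_md5),
    )
    conn.commit()


def already_uploaded(conn, original_md5):
    """Return True if a file with this original MD5 was recorded earlier."""
    row = conn.execute(
        "SELECT 1 FROM uploads WHERE original_md5 = ?", (original_md5,)
    ).fetchone()
    return row is not None
```

    Because the lookup key is the MD5 of the untouched original, a later colorspace conversion or rotation on the server does not cause a mismatch. The limitation is that files uploaded before the log existed, or through another tool, are not covered and would still need some other check.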