Question about detecting duplicates

stevex Registered Users Posts: 4 Beginner grinner
Hi guys - so now that API keys are back I want to start writing an uploader. One of the things I want to do is quickly detect images that have already been uploaded, without having some sort of local database. I'd like to do this using the MD5 on the photo - I know this isn't the most reliable since any change will invalidate it, but it will be good enough for what I want to do.

So if I have an MD5 of a local photo, is there a way to quickly determine if that photo has been uploaded already or not, and where it is if so?

I see there's a getURLs in 1.2.1, but I don't see docs on it. If I could pass it an MD5 and get a URL back, I think that'd do what I'm looking for.

Thanks
--Steve

Comments

  • David PL Registered Users Posts: 80 Big grins
    edited March 18, 2008
    stevex wrote:
    Hi guys - so now that API keys are back I want to start writing an uploader. One of the things I want to do is quickly detect images that have already been uploaded, without having some sort of local database. I'd like to do this using the MD5 on the photo - I know this isn't the most reliable since any change will invalidate it, but it will be good enough for what I want to do.

    So if I have an MD5 of a local photo, is there a way to quickly determine if that photo has been uploaded already or not, and where it is if so?

    I see there's a getURLs in 1.2.1, but I don't see docs on it. If I could pass it an MD5 and get a URL back, I think that'd do what I'm looking for.

    Thanks
    --Steve

    One way is to call smugmug.images.get with the heavy setting for the album you are uploading to. This will return the MD5 for all the images in the album, which you can then compare to the MD5 of your local files to determine photos that have already been uploaded. However, like you said, using the MD5 is not always the best way since even a simple change in the metadata will change the MD5. You can also use a combination of the filename, date, photo dimensions, etc (which are also returned by using the above method) as an alternate way to identify duplicates.
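A minimal sketch of the comparison step described above, in Python. It assumes you have already collected the MD5 strings from the album listing into `remote_md5s`; fetching that listing is not shown, and the helper names `file_md5` and `split_by_uploaded` are hypothetical, not part of the SmugMug API.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def split_by_uploaded(local_paths, remote_md5s):
    """Partition local files into (already_uploaded, new) by MD5.

    remote_md5s is assumed to hold the MD5 strings taken from the
    album listing; how they were fetched is outside this sketch.
    """
    known = {md5.lower() for md5 in remote_md5s}
    uploaded, new = [], []
    for path in local_paths:
        (uploaded if file_md5(path) in known else new).append(Path(path))
    return uploaded, new
```

Normalizing both sides to lowercase avoids false mismatches if the two sources format the hex digest differently.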
  • devbobo Registered Users, Retired Mod Posts: 4,339 SmugMug Employee
    edited March 18, 2008
    David PL wrote:
    One way is to call smugmug.images.get with the heavy setting for the album you are uploading to. This will return the MD5 for all the images in the album, which you can then compare to the MD5 of your local files to determine photos that have already been uploaded. However, like you said, using the MD5 is not always the best way since even a simple change in the metadata will change the MD5. You can also use a combination of the filename, date, photo dimensions, etc (which are also returned by using the above method) as an alternate way to identify duplicates.

    Also, using the modified date and LastUpdated along with the MD5Sum will help determine whether the file has been modified locally or on SmugMug.
    David Parry
    SmugMug API Developer
    My Photos
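One way the modified date, LastUpdated, and MD5Sum could be combined is sketched below in Python. It is an illustration only: `classify_change` and the `last_sync` timestamp (when you last uploaded or compared the file) are assumptions introduced here, and it presumes that LastUpdated advances whenever the copy on SmugMug changes, which this thread does not confirm.

```python
def classify_change(local_md5, local_modified, remote_md5,
                    remote_last_updated, last_sync):
    """Guess which side changed since last_sync (all times are datetimes).

    last_sync is a hypothetical value the uploader would have to record;
    it is not something the API is known to provide.
    """
    if local_md5.lower() == remote_md5.lower():
        return "in sync"
    local_changed = local_modified > last_sync
    remote_changed = remote_last_updated > last_sync
    if local_changed and remote_changed:
        return "conflict"
    if local_changed:
        return "changed locally"
    if remote_changed:
        return "changed on SmugMug"
    # Digests differ but neither timestamp moved: cannot tell which side changed.
    return "unknown"
```

Matching digests short-circuit the timestamp checks, since identical content needs no further action regardless of dates.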
  • stevex Registered Users Posts: 4 Beginner grinner
    edited March 19, 2008
    Problem is, if I have a few photos and I need to determine whether they're duplicates or not, I need to download all the albums to determine if the photo has already been uploaded or not.

    It'd be great to have a way to ask if a particular photo has already been uploaded without needing to do all that.
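Until such a lookup exists, one possible workaround is to pay the listing cost once per session and keep the result in memory rather than in a persistent local database. The Python sketch below assumes the album listings have already been fetched into a dict; the `"id"` key and both function names are hypothetical, while `"MD5Sum"` follows the field name mentioned earlier in the thread.

```python
from collections import defaultdict


def build_md5_index(albums):
    """Map each MD5 to the (album_id, image_id) pairs that carry it.

    albums: dict of album_id -> list of image records, where each
    record is a dict with an assumed "id" key and an "MD5Sum" key.
    """
    index = defaultdict(list)
    for album_id, images in albums.items():
        for image in images:
            index[image["MD5Sum"].lower()].append((album_id, image["id"]))
    return index


def find_uploaded(index, local_md5):
    """Return every known location of a photo, or [] if not uploaded."""
    return index.get(local_md5.lower(), [])
```

Each later check is then a single dictionary lookup, and the returned pairs also answer where the photo already lives.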
  • devbobo Registered Users, Retired Mod Posts: 4,339 SmugMug Employee
    edited March 19, 2008
    stevex wrote:
    Problem is, if I have a few photos and I need to determine whether they're duplicates or not, I need to download all the albums to determine if the photo has already been uploaded or not.

    It'd be great to have a way to ask if a particular photo has already been uploaded without needing to do all that.

    Steve,

    Once I finish the current round of API enhancements I am doing, I will probably be working on some methods that make syncing easier.

    Cheers,

    David
    David Parry
    SmugMug API Developer
    My Photos
  • dound Registered Users Posts: 72 Big grins
    edited March 19, 2008
    devbobo wrote:
    Steve, once I finish the current round of API enhancements I am doing, I will probably be working on some methods that make syncing easier.

    Ooh, now that sounds handy!
  • dound Registered Users Posts: 72 Big grins
    edited March 20, 2008
    devbobo wrote:
    Once I finish the current round of API enhancements I am doing, I will probably be working on some methods that make syncing easier.

    Out of curiosity, any hints as to what form these methods might take? Perhaps something as simple as last-modified timestamps on galleries, or hooks to get a list of photos modified since a certain timestamp?

    On another note, does the "LastUpdated" get updated whenever any change is made to the photo (e.g. rotated, keyword added, price changed ...)?

    Thanks!
  • devbobo Registered Users, Retired Mod Posts: 4,339 SmugMug Employee
    edited March 20, 2008
    dound wrote:
    Out of curiosity, any hints as to what form these methods might take? Perhaps something as simple as last-modified timestamps on galleries, or hooks to get a list of photos modified since a certain timestamp?

    On another note, does the "LastUpdated" get updated whenever any change is made to the photo (e.g. rotated, keyword added, price changed ...)?

    Thanks!

    I haven't given it a lot of thought, but I would think that there would be two types of methods: one that provides some sort of sync log of updates to galleries and photos from a given date, and another set of methods that help to locate a file given the filename, modified date and MD5Sum.

    I haven't investigated any of this to see how feasible it is.
    David Parry
    SmugMug API Developer
    My Photos