File Upload - Original File MD5Sum Changes
mjohnsonperl
In testing an app I'm building, I came across a behavior that seems odd. I'm curious why it's happening, and also how to handle the difference in my code.
I have my program get all the stats for the file on disk, _MG_6644.JPG, and then build my HTTP PUT request to upload the file to SmugMug. I get a good response back with an ImageID, so my query then checks that ImageID, retrieves the MD5Sum and Size, and compares them against the local file that was just uploaded. It turns out that for this particular file they differ. The image uploads fine and looks the same, so it doesn't APPEAR to be altered, but the MD5Sum and Size tend to indicate otherwise.
The MD5Sums and sizes are identical and unaltered on several other files I've tested.
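A minimal Perl sketch of the verification step described above. It assumes the server's Size and MD5Sum values have already been pulled out of the API response. The helper names `local_file_stats` and `diff_stats` are hypothetical and do not come from the original script. The file is read through the `:raw` layer so the digest covers exactly the bytes on disk.

```perl
use strict;
use warnings;
use Digest::MD5;

# Size and MD5 of a local file. The ':raw' layer guarantees the digest is
# computed over the exact bytes on disk, with no newline translation.
sub local_file_stats {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return (-s $path, $md5);
}

# Compare local stats with the Size and MD5Sum the server reported for the
# uploaded ImageID. Returns the names of the fields that differ (empty if none).
sub diff_stats {
    my (%args) = @_;
    my @diff;
    push @diff, 'Size'   if $args{local_size} != $args{remote_size};
    push @diff, 'MD5Sum' if lc $args{local_md5} ne lc $args{remote_md5};
    return @diff;
}
```

If you feed it the figures reported in this thread (local 4042638 bytes with MD5 5b6e9656da0ac71aa5abf599277e2017, server 3937814 bytes with MD5 81f5dbba917c1cea764ca2feb07e3c25), `diff_stats` returns both `Size` and `MD5Sum`.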
I tested this thoroughly to confirm that the problem is with this particular file and not some typo in my code. The file I was uploading had been edited in Photoshop (and, I believe, only Photoshop), so I pulled the original file to test it, and the original shot from my camera uploads fine. The fact that it was edited in Photoshop still doesn't explain the problem, though. I can imagine the image might have some sort of file corruption, but it opens just fine in every application I've tried. That still doesn't explain why it gets altered when uploaded to SmugMug.
I also tested this with several other tools that upload images to SmugMug (Simple, Drag & Drop, Olde Faithful, Windows SmugMug Uploader), and all of them had the same effect: the file got altered somehow.
Here are the results of my file upload:
Uploading (File Size: 4042638) (Image ID: 212263417)
HTTP RESPONSE:
<?xml version='1.0' encoding="utf-8" ?>
<rsp stat="ok">
<method>smugmug.images.upload</method>
<ImageID>212263417</ImageID>
</rsp>
filename: _MG_6644.JPG
non-binary size: 4042595
-s size: 4042638
md5: 5b6e9656da0ac71aa5abf599277e2017
state->size: 4042638
smug filename: _MG_6644.JPG
smug size: 3937814
smug md5: 81f5dbba917c1cea764ca2feb07e3c25
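The thread doesn't say how the "non-binary size" line was computed. One plausible explanation for the 43-byte gap (4042638 minus 4042595) is that the file was read in text mode. In that mode Perl's `:crlf` layer, which is the default on Windows, collapses every CR LF byte pair in the JPEG data into a single newline. This is an assumption, but the effect is easy to demonstrate with a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: how many characters a read returns through a given
# PerlIO layer. ':raw' passes every byte through; ':crlf' turns each
# CR LF pair into a single "\n", so binary data can appear shorter.
sub bytes_read_with_layer {
    my ($path, $layer) = @_;
    open my $fh, "<$layer", $path or die "Cannot open $path: $!";
    local $/;                 # slurp mode: read the whole file at once
    my $data = <$fh>;
    close $fh;
    return length($data // '');
}
```

For this reason, any byte count or MD5 used for comparison should come from a `:raw` (or `binmode`) read.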
I have the script I used to verify this, and the actual file, so if you want to reproduce the results, or find some situation where it doesn't happen, please dig in; just don't bury yourself.
http://digitalmediashelf.com/temp/upload_difference.zip
Any answers or direction would be gladly appreciated. My primary concern is the sync program I'm building. The original image gets altered, and if that can't be prevented, I'm going to need to decide what to do when the uploaded image ends up being different from the file on the local machine.
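Here is one possible way for a sync program to cope. This sketch assumes the local catalog stores both the local MD5 and the MD5Sum the server reported right after upload. The function `sync_action` and its return labels are hypothetical. Because the pair is recorded, a file the server converted (like _MG_6644.JPG) still counts as in sync on the next run and is not re-uploaded forever.

```perl
use strict;
use warnings;

# Hypothetical sync decision for one image.
#   $entry - hashref from the local catalog: { local_md5, server_md5 }
#            recorded right after the last successful upload
#   $now   - hashref with the same keys as observed on this run
sub sync_action {
    my ($entry, $now) = @_;
    return 'upload'         unless defined $entry->{server_md5};     # never uploaded
    return 'reupload'       if $now->{local_md5} ne $entry->{local_md5};  # edited locally
    return 'missing_remote' unless defined $now->{server_md5};        # gone from server
    return 'server_changed' if $now->{server_md5} ne $entry->{server_md5};
    return 'in_sync';       # server copy may differ from local, but as recorded
}
```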
Comments
As a side note, have you considered using the JSON or XML::Simple Perl modules? Both provide a way to convert the SmugMug responses into a hash, which makes parsing so much nicer than this...
$sm_image_info_root->findvalue('/rsp/Image/attribute::FileName')
Cheers,
David
SmugMug API Developer
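For comparison, here is a minimal sketch of the JSON route using JSON::PP, a pure-Perl module. The response body below is hypothetical. Its field layout is assumed to mirror the `/rsp/Image/attribute::FileName` XML path above, so check it against a real JSON response before relying on it. The helper name `parse_image_info` is also hypothetical.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Turn a JSON response into a Perl hash and return its Image part.
# The field names used here are assumptions, not confirmed API output.
sub parse_image_info {
    my ($body) = @_;
    my $rsp = decode_json($body);
    die 'API call failed (stat=' . ($rsp->{stat} // 'missing') . ")\n"
        unless ($rsp->{stat} // '') eq 'ok';
    return $rsp->{Image};
}

# Hypothetical response body for the image discussed in this thread.
my $body = '{"stat":"ok","Image":{"FileName":"_MG_6644.JPG",'
         . '"Size":3937814,"MD5Sum":"81f5dbba917c1cea764ca2feb07e3c25"}}';
my $image = parse_image_info($body);
printf "%s %d %s\n", $image->{FileName}, $image->{Size}, $image->{MD5Sum};
```

The XPath call becomes a plain hash access such as `$image->{FileName}`.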
I glanced at JSON a bit, and it looks interesting, but I didn't find anything to get me started with it quickly, and REST seemed like a very natural approach using XML. I also wasn't sure how well established JSON was in Perl, or what documentation and modules I would be able to find for working with it.
I am also storing a local catalog of the images my program is syncing, and I decided to use XML for the local settings and catalog data. I figure if iTunes uses it to store my entire .mp3 library, I should be able to use it for this. Plus, this way I'm dealing with a single module that lets me parse the REST responses I get from SmugMug and also read, write, and manage the local data in my catalog.
As for XML vs. some other way of storing the local data, I just thought XML was a cool approach and wanted to get more familiar with it. I also figured that if this was the approach I was going to take, I'd try to find the BEST solution I could for it.
I had read some discussions about the high memory usage and slower performance of other XML parsing approaches, and that's why I chose XML::LibXML. It also has methods for writing the XML back to a file.
I came across this article on PerlMonks by Randal Schwartz:
http://www.perlmonks.org/?node_id=287656
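XML::LibXML handles both the API responses and the local catalog, so the catalog side might look something like this sketch. The `<catalog>`/`<image>` element names and the `build_catalog`/`catalog_md5` helpers are hypothetical. The document is serialized with `toString` and parsed back with `load_xml` to show the round trip. The document's `toFile` method would write the same output to disk.

```perl
use strict;
use warnings;
use XML::LibXML;

# Build a catalog document from a list of hashrefs; each hash becomes an
# <image> element whose keys are written as attributes.
sub build_catalog {
    my @images = @_;
    my $doc  = XML::LibXML::Document->new('1.0', 'UTF-8');
    my $root = $doc->createElement('catalog');
    $doc->setDocumentElement($root);
    for my $img (@images) {
        my $el = $doc->createElement('image');
        $el->setAttribute($_, $img->{$_}) for sort keys %$img;
        $root->appendChild($el);
    }
    return $doc;
}

# XPath lookup in the same findvalue() style used for the API responses.
# (A FileName containing a double quote would break this simple query.)
sub catalog_md5 {
    my ($doc, $filename) = @_;
    return $doc->findvalue(qq{/catalog/image[\@FileName="$filename"]/\@MD5Sum});
}

my $doc = build_catalog(
    { FileName => '_MG_6644.JPG', MD5Sum => '5b6e9656da0ac71aa5abf599277e2017' },
);
my $xml      = $doc->toString(1);                  # indented XML text
my $reloaded = XML::LibXML->load_xml(string => $xml);
print catalog_md5($reloaded, '_MG_6644.JPG'), "\n";
```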
If I download the "original" file that's on SmugMug, it has both a different file size and a different MD5Sum than the image that was uploaded. The file is actually modified in some way, shape, or form after it gets uploaded to SmugMug's servers.
The MD5 calculation is correct. If the MD5Sum is provided in the upload request, the image is still processed, and if the MD5 sum calculated by the server were different from the MD5 sum provided in the upload request, the upload would fail because the MD5s didn't match.
I have verified this fact using some upload tools.
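The sketch below illustrates that reasoning. It is not SmugMug's code, just the general technique: the receiver hashes the bytes it actually got and rejects the upload if that digest doesn't match the MD5 the client declared. The name `accept_upload` is hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Illustration of server-side upload verification (hypothetical helper).
#   $received_bytes - the file content as it arrived
#   $declared_md5   - the MD5 sent with the upload request, or undef
sub accept_upload {
    my ($received_bytes, $declared_md5) = @_;
    return 1 unless defined $declared_md5;    # nothing declared, nothing to check
    return md5_hex($received_bytes) eq lc($declared_md5) ? 1 : 0;
}
```

An upload carrying the local MD5 was accepted, so the bytes arrived intact. That points to the change happening during later processing on the server.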
That is very unexpected. devbobo, do you guys do ANY parsing of the file? Maybe trimming off unused EXIF data? Or possibly changing the embedded thumbnail?
Perhaps the OP can give this image to devbobo to take a look at?
I uploaded probably 100 images last night, and none of them had any MD5Sum differences, meaning nothing changed in the original files I uploaded. This only appears to happen to certain images, and I'm not sure yet what pattern causes SmugMug's servers to alter the original file after it's uploaded.
I put together a nice little package in a .zip file, linked in my first post. The package includes the Perl script and two image files that can be used to reproduce the results. One is the image on which I discovered this problem, and the other uploads just fine.
OK, I figured it was something like that. It makes sense now, and it gives me something to look for: if I find an image that's not in sRGB, I should expect it to be converted when uploaded.
Thanks for getting an answer on this one.
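Following that answer, one rough way to flag candidates before uploading is to look for an embedded ICC profile. This is a heuristic sketch that rests on several assumptions. The thread doesn't say how SmugMug decides an image isn't sRGB, treating untagged images as sRGB is a guess, and matching the text "sRGB" inside the profile is crude. The helper names are hypothetical. Under the JPEG ICC embedding convention, each APP2 segment starts with "ICC_PROFILE\0", a sequence number and a chunk count. The code assumes the chunks appear in order.

```perl
use strict;
use warnings;

# Heuristic sketch: return the ICC profile embedded in a JPEG's APP2
# "ICC_PROFILE" segments, or undef if the file carries no profile.
sub jpeg_icc_profile {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                                   # slurp the whole file
    my $data = <$fh> // '';
    close $fh;
    return unless substr($data, 0, 2) eq "\xFF\xD8";   # not a JPEG (no SOI)

    my ($pos, $profile) = (2, undef);
    while ($pos + 4 <= length $data) {
        my ($ff, $marker, $len) = unpack 'C C n', substr($data, $pos, 4);
        # Stop at the image data (SOS), end of image (EOI) or anything malformed.
        last if $ff != 0xFF || $marker == 0xDA || $marker == 0xD9 || $len < 2;
        my $payload = substr($data, $pos + 4, $len - 2);
        # APP2 layout: "ICC_PROFILE\0", 1-byte sequence no., 1-byte chunk count,
        # then profile bytes. Chunks are assumed to appear in sequence order.
        if ($marker == 0xE2 && length($payload) >= 14
            && substr($payload, 0, 12) eq "ICC_PROFILE\0") {
            $profile = ($profile // '') . substr($payload, 14);
        }
        $pos += 2 + $len;                       # marker (2 bytes) + segment length
    }
    return $profile;
}

# Crude guess: untagged images, or profiles whose text mentions "sRGB".
sub probably_srgb {
    my ($profile) = @_;
    return 1 unless defined $profile;
    return $profile =~ /sRGB/ ? 1 : 0;
}
```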