Smugmug uploader makes a big mess (170 duplicates)
jfriend
Registered Users Posts: 8,097 Major grins
The @#$%^& Smugmug uploader made a big mess today. I'm attempting to upload 1615 images to a single gallery (this will be the master gallery from which I make many smart galleries that pull images from this). To summarize, I use the default Smugmug uploader on Chrome. I have duplicate protection on. I drop 1615 images into the uploader. When I'm all said and done, I have 1785 images in the gallery. That's 170 duplicates. Now I've got a giant mess to sort out to get rid of all the duplicates. I will probably have to write a script just to find all the dups and remove them.
I simply can't believe you guys can't design a reliable uploader. I've been encouraging you to fix this for 6 years now. You made some attempts with the latest uploader you released a little while ago, but under stress, it's just chock full of bugs that can make a giant mess.
Now, the upload sequence of events wasn't 100% normal (I'll describe in a bit), but still with duplicate protection on, I have no idea why your uploader would give me 170 dups. How difficult is it to do that part right?
Anyway, here's the sequence of events:
- Create new gallery
- Bring up default Smugmug uploader on a brand new computer using latest version of Chrome
- Drop 1615 JPEGs into the uploader
- Uploading starts
- After several hundred images, there's a neighborhood power outage. I'm using my laptop so the computer is not affected by the power outage, but the internet connection goes down.
- Power is down about 3-4 minutes and then comes back on.
- The uploader does not appear to recover, even after the internet connection has been restored: it still appears to be trying to upload several images, and they aren't going anywhere.
- I wait several minutes. Uploader doesn't seem to be recovering.
- I close the uploader.
- Reopen the uploader, redrop all 1615 files into the uploader figuring that duplicate protection is on so it should be OK.
- It starts uploading again. After another few hundred files, I notice that two uploading images (it seems to normally like to do 3 at a time) are permanently stuck on the "verifying..." step that normally happens at the end of each image upload. I wait and wait and they never proceed on beyond that step.
- Because uploading one image at a time is significantly slower than the usual three at a time, I close the uploader again, reopen it, and drop all 1615 files in again.
- Uploads start going again, 3 at a time.
- A few hundred images later, it again shows two images stuck on the verifying step, so it's going slower again. I'm using borrowed time on someone else's internet connection for this upload at their house, so I can't just walk away and go about doing other things. So, after waiting for a while, I again close the uploader, reopen it, and drop all 1615 files in again.
- Finally, it finishes.
- I check the gallery and see that somehow my gallery has 1785 images in it from 1615 original images - apparently at least 170 dups. Now I don't even trust that all 1615 images are actually there. There could be more than 170 dups and some missing images.
- Big mess.
List of bugs/problems encountered:
- When the internet connection dropped and then recovered, the uploader did not resume properly.
- When uploading lots of images, some of the images get stuck on the verifying step. This happened on four images. It does not recover from that and once it gets stuck, the upload throughput slows down (there are valid reasons to do three at once).
- When stopping the uploader and restarting with all images again, the uploader makes LOTS of duplicates.
I can't believe you guys can't make a reliable uploader. This single mess will probably cost me 4-8 hours to clean up and I've got to write code (beyond the means of most customers) just to figure out how many unique images really are there and which dups to get rid of.
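For anyone facing the same cleanup, the first check (are all originals present, and how many extra copies exist) can be sketched roughly as below. This is only an illustration, not the script used here: `compare_upload` is a hypothetical helper, the two filename lists are assumed to have been collected already (one from the local folder, one from a gallery listing), and it assumes that distinct images never share a filename.

```python
from collections import Counter


def compare_upload(local_names, gallery_names):
    """Compare the filenames dropped into the uploader with those in the gallery.

    Assumption: a filename identifies an image uniquely.
    """
    local = set(local_names)
    in_gallery = Counter(gallery_names)

    # Originals that never made it into the gallery.
    missing = sorted(local - set(in_gallery))
    # Every copy beyond the first of a given filename counts as one extra.
    extras = {name: n - 1 for name, n in in_gallery.items() if n > 1}

    return {
        "unique_in_gallery": len(in_gallery),
        "missing": missing,
        "extra_copies": sum(extras.values()),
        "duplicated_names": sorted(extras),
    }


# Made-up sample data, only to show the shape of the result.
report = compare_upload(
    ["a.jpg", "b.jpg", "c.jpg"],
    ["a.jpg", "b.jpg", "a.jpg"],
)
print(report)
```

If `missing` is empty and `unique_in_gallery` equals the number of originals, the only remaining work is removing the extra copies.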
FYI, for those of you who know I usually use StarExplorer for uploads like this, I was using a new laptop at a remote location, didn't have StarExplorer installed on that laptop yet (license file complications), had simplified my upload into one gallery and falsely believed that Smugmug has made their uploader a lot more robust for big uploads. Apparently, it isn't yet up to the task.
Edit: I have verified via a script that all 1615 files got uploaded and there are 170 extra duplicates (no two dups are the same - there are 170 unique dups - so it's not like one image caused the duplication). Here's a sampling of the filenames and upload times when there are dups:
--John
Homepage • Popular
JFriend's javascript customizations • Secrets for getting fast answers on Dgrin
Always include a link to your site when posting a question
Comments
I'm very sorry about your uploading troubles. I assume you are using the default html5 uploader.
Can you tell how many photos were already uploaded in the gallery when the power went out?
Would you be able to give the Simple uploader a try instead and see if that works better in your case?
Looking at your jfriend site, I cannot see any recent uploads in the last 7 days. Which account / gallery were you uploading to so we could take a closer look?
SmugMug Support Hero
May I suggest a desktop solution made by us, exactly for situations like this. It not only uploads images to Smugmug (it supports uploading to picasa, flickr, skydrive, dropbox, box and facebook), it checks for duplicates too, and be assured you won't get any duplicate images in your gallery :-) It can handle thousands of images with ease. Give it a try I am sure you will like it :-) you can download it from www.picbackman.com or from cnet http://download.cnet.com/PicBackMan/3000-13455_4-75650267.html?tag=mncol;1
Thanks.
While we look into the issue for you, why not simply delete the gallery and upload again? While the files are uploading, you can be doing other things, so there's no 4-8 hours of clean up.
I'm happy to have a Hero work on your gallery and delete the duplicates, if you don't want to do that, just email us at the help desk ATTN: Andy.
Portfolio • Workshops • Facebook • Twitter
I'll contact you directly Andy about whether it makes sense to have a hero fix this or whether I should write the code to do so. I've written code that has identified the 170 duplicates, but haven't yet written code to delete them (that appears to involve using oauth with the API which I haven't done before).
Sebastian, the gallery is in my friend.smugmug.com account.
And it's in an unlisted gallery which I don't want Google to find at /Sports/Palo-Alto-Rowing-Club-2012/All-Regattas/22955010_vCZHfV.
Andy or Sebastian, do you guys want to have anyone look at the results in that gallery before I start fixing/changing it?
Sebastian, I'm not going to try this upload with the simple uploader. On my home bandwidth, this is about a 24 hour upload with our home internet access compromised the whole time - it's not a popular thing in the house to do. That's why I had arranged to go over to a friend's house with much faster upload access to do this upload. If I were going to upload again, I'd use StarExplorer which has been more reliable to me in the past. The uploader was whatever the default uploader would be in the Chrome browser (I assumed the default is the HTML5 uploader), but if you tell me how to identify one uploader from another, I could confirm which one it was. I always thought your uploaders should have some sort of visible name on them for this type of troubleshooting.
If you guys want to experiment with different uploaders, I'll give you a DVD with the 1615 images on it, but I don't think there's anything special about these images. They are just JPEGs 3-6MB in size (depending upon how much cropping there was). Just run some tests yourself with long/large uploads.
My Website index | My Blog
Right now I'm working on what's the best way to fix the gallery that's screwed up.
I haven't gotten any acknowledgement from Smugmug about any of the three bugs I observed in the uploader though.
We offered to fix the duplicate issue for you.
I also stated that I'd have people looking into this. You haven't heard back from me because I don't have anything to say yet about the issue you filed - except what I did reply to you about, which is we're accepting millions and millions of files and haven't ever seen the issue you wrote us about. Thanks John!
Portfolio • Workshops • Facebook • Twitter
On your response, I guess I was expecting to hear something like: "Thanks for reporting those issues and putting together the detailed sequence of events - we'll file those three issues as bugs and have our sorcerers look into them. When I get more info, I'll post back."
As a piece of feedback to you, when you say: "we're accepting millions and millions of files and haven't ever seen the issue you wrote us about", that offers me zero comfort. In fact, it makes me think you don't think what happened to me has a very high priority or maybe you don't even think it's a credible issue. I don't know if you intended it that way, but put yourself in my shoes. That statement does not make me feel better at all. I'd rather hear that you will file these as issues and have people look into them.
The customer (me in this case) generally doesn't care that it doesn't happen to a lot of other people. If it happens to them, it's real and it seems important to them, and trying to deflect the importance of the issue by saying it isn't happening to anyone else just feels like you're telling me my issue isn't very important. These bugs wasted a lot of my time because your uploader can't reliably handle some circumstances. Those are facts.
If you don't intend for it to handle 1600 files at a time or don't intend for it to handle an occasional internet connection hiccup or don't intend for the duplicate protection to be reliable, then just let us know so we can lower our expectations and continue to bother you about how you should have a more robust uploader. But, I was under the impression that you thought you now had a more robust uploader and when I finally tried to use it as such, it failed me significantly.
Thanks for the feedback, John! You and I communicate by email and have for years. I told you I'd make sure the team saw this, and we'll certainly investigate. Until we can replicate internally we can't do anything - so we'll try and do just that, replicate it. Thanks again for the valuable feedback.
Portfolio • Workshops • Facebook • Twitter
It's hard to know exactly what circumstance triggered the duplicates problem. I would theorize that a smart developer doing a thorough code review of the parts of the code that handle duplicates could probably identify several likely causes in a few hours of code inspection, and could likely find issues more thoroughly and more quickly than someone trying to reproduce the issue with no inspection of the code. Said another way, some issues are fixed much more effectively via whitebox code inspection/review than via blackbox testing. There are bugs worth filing even when you don't have a reproducible case: you challenge the developer to figure out how the code could fail by examining the code, designing a test case that exploits that code weakness, and then fixing the code weakness.
It's also hard to know exactly what circumstance triggered the images that got stuck on the "verifying" step. It could have been a momentary connection issue where something got lost or it could have been a hiccup on the upload server that failed to send it or it could have been some sort of failure to see the right event in the client. Again, code review on the client could identify places where the client isn't properly protected against any of these issues or doesn't have a fallback code path if the verifying event is never received.
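To make the "fallback code path" idea concrete, here is a minimal sketch of a verification wait that gives up after a deadline instead of hanging forever. It is not SmugMug's uploader code: `wait_for_verification` and its `check_verified` callback are hypothetical names, and the timeout values are arbitrary.

```python
import time


def wait_for_verification(check_verified, timeout_s=30.0, poll_s=1.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll check_verified() until it returns True or timeout_s has elapsed.

    Returns True if the upload was verified, False on timeout. The caller
    can then free the upload slot and report the file as unverified
    instead of leaving it stuck on "verifying...".
    """
    deadline = clock() + timeout_s
    while True:
        if check_verified():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


# Usage: a check that never succeeds times out instead of blocking the slot.
if not wait_for_verification(lambda: False, timeout_s=0.2, poll_s=0.05):
    print("verification timed out; slot released, file flagged for re-check")
```

A timed-out file should be re-checked against the gallery before it is sent again, since blindly re-uploading it is one way to end up with duplicates.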
http://www.huntsvillecarscene.com/smug/duplicates.php
If you don't mind sharing your script, I think it would help a lot of us. A fellow SM'r local to me called me up after something similar happened to him on a 1600+ image gallery.
Want faster uploading? Vote for FTP!
And just a word of thanks to the guys at Smugmug for the uploaders.
I had no idea how good they were until I tried to upload a couple of thousand thumbnails totaling just 50 MB to Facebook. It's been over 12 hours and they're still not uploaded. Duplicates? Yes. Missing ones? Yes. It's a complete nightmare. Facebook is the most alpha-level website in production I've ever seen. I'm not looking forward to using it at all.
Want faster uploading? Vote for FTP!
When I'm done, I loop through and remove all the items from the dups array using the API.
Note: this algorithm assumes that different images will never have the same filename. I know that is true for my images because I have a date/time code in my filenames along with the original camera-generated file number, but filenames are not necessarily unique for other people's images, so a more careful algorithm would probably also check the image size and perhaps the last modified time.
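A rough sketch of that approach, for illustration only: group the gallery's images by filename, keep the earliest upload in each group, and put the rest in a dups list. The `GalleryImage` record and its fields are hypothetical stand-ins for whatever an image listing returns, and the deletion step through the API is deliberately left out. The `strict` flag shows the more careful variant that also compares file size.

```python
from collections import defaultdict
from typing import NamedTuple


class GalleryImage(NamedTuple):
    image_id: str      # hypothetical identifier used later for deletion
    filename: str
    size_bytes: int
    uploaded_at: str   # ISO 8601 text in one time zone, so it sorts by time


def find_duplicates(images, strict=False):
    """Return every image that is a later copy of an earlier upload.

    Default key: filename only (assumes filenames are unique per image).
    strict=True: filename plus file size must both match.
    """
    groups = defaultdict(list)
    for img in images:
        key = (img.filename, img.size_bytes) if strict else (img.filename,)
        groups[key].append(img)

    dups = []
    for copies in groups.values():
        copies.sort(key=lambda i: i.uploaded_at)
        dups.extend(copies[1:])  # keep the earliest copy, flag the rest
    return dups


# Made-up sample records.
images = [
    GalleryImage("1", "a.jpg", 100, "2000-01-01T10:00:00"),
    GalleryImage("2", "b.jpg", 200, "2000-01-01T10:01:00"),
    GalleryImage("3", "a.jpg", 100, "2000-01-01T11:00:00"),
    GalleryImage("4", "b.jpg", 999, "2000-01-01T11:05:00"),
]
print([d.image_id for d in find_duplicates(images)])
print([d.image_id for d in find_duplicates(images, strict=True)])
```

With the filename-only key, image 4 is flagged even though its size differs from image 2; the strict key leaves it alone, which is the safer choice when filenames are not guaranteed to be unique.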
It takes into account image file name, file size, image dimensions and original date, if available.
You can activate it from the context menu of both albums list and Category/Subcategory tree (so you can check the entire Category for dups:-)
FWIW, it was this thread that brought this problem to my attention, so hopefully somebody else will not suffer through what John and Samir did.
HTH
Nikolai
But you're absolutely right about development of tools like this. Lots of base cases when you expand the use to many users compared to individual use. Thank you very much for the update Nikolai! It's been a while since I've visited SE, and now that I've got some newer computers, I'll have to take a look at it again.
Want faster uploading? Vote for FTP!
This is really getting way beyond annoying.
Hey, I don't know if it's OK to write here, but we have a product, PicBackMan, which helps you upload thousands of pictures without duplicates. You can download it at www.picbackman.com
Sujit
Disclosure : I am involved with PicBackMan as a developer.
What uploader are you using?
If only 1000 pics are read from the gallery and checked against the uploaded pics, that's fine, but I need to know it.
Are the other uploaders checking for duplicates (I guess I'll just try for myself anyway)?
Note that the duplicate detection of our web uploaders only works for photos already in the gallery when the uploader is opened up. So to have it consider photos you just uploaded, go to the gallery and open the uploader again.
It won't take any photos into account that you just uploaded with the same or any other window.
SmugMug Support Hero
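The behavior described above can be pictured with a toy model. This is not the actual uploader code, only a sketch of a duplicate check that works from a snapshot taken when the uploader opens. `UploaderSession`, its `track_own_uploads` flag, and the plain list standing in for the gallery are all hypothetical.

```python
class UploaderSession:
    """Toy model of an uploader window with snapshot-based duplicate checks."""

    def __init__(self, gallery, track_own_uploads=False):
        self.gallery = gallery              # list standing in for the gallery
        self.known = set(gallery)           # snapshot taken at open time
        self.track_own_uploads = track_own_uploads

    def drop(self, filenames):
        """Upload every file not in the known set; return how many were sent."""
        uploaded = 0
        for name in filenames:
            if name in self.known:
                continue                    # skipped as a duplicate
            self.gallery.append(name)
            uploaded += 1
            if self.track_own_uploads:
                self.known.add(name)        # remember what this session sent
        return uploaded


files = ["a.jpg", "b.jpg"]

# Same window, files dropped twice: the snapshot never learns about them.
gallery = []
session = UploaderSession(gallery)
session.drop(files)
session.drop(files)
print(len(gallery))                               # 4: two duplicates

# Reopening the uploader takes a fresh snapshot, so the re-drop is skipped.
print(UploaderSession(gallery).drop(files))       # 0
```

In this model, setting `track_own_uploads=True` makes the second drop in the same window a no-op, which is the behavior a user re-dropping the same files would probably expect.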
My Star*Explorer (http://www.starexplorer.com) does that and then some... And you don't have to break it down in batches by 100... :-) It has a free 30-day trial, so as long as you're on Windows, you may try it and see if it meets your needs...
But really for my limited use it is too much, in more ways than one.
I'm testing now the "simple" (Java) uploader, it is much slower (more than one second per duplicate skipped, on a quad no less...) but that isn't particularly bad. What's worse is that first time it crashed, then I checked my Java version and it seems it was behind (and it couldn't update automatically). So I updated Java and now it seems to be even slower (and it crashed just when starting first time).
Not giving up yet, if I find one way that works it's fine (I really don't want much, just basic functionality) but if not next year I'll have to move on from smugmug (oh, and get something with real nested categories support, oh, that would be a breath of fresh air).
SmugMug Support Hero
I wouldn't worry about this part (crashing) yet.
Last try yesterday it worked for hours (I said it takes more than one second to skip one duplicate - it's in fact way more than 1s). It did manage to skip everything fine (at least at first sight) yesterday but then I didn't have time to let it upload completely. Hopefully it will run through today and anyway even if it does I plan to run it once more to see if it skips everything as expected. I'll report back in any case.
Can somebody with a largish (1500+ pics) gallery and access to the original folder make a test in the meantime? Just drag+drop all files in the HTML5 uploader and see if it says (for example) 1600 duplicates or it starts again uploading something like 600 pics (if it starts uploading you can cancel it fast enough before it uploads anything, just keep an eye on it and kill it so you don't end up with duplicates as well).
I always have to check the number of files uploaded though as the uploader can miss some or upload dupes. Finding a single dupe in a batch of 2000 is a real pain, so I developed a tool that can help me find it in a second. I used to use a technique that used the uploader, but that just takes waaaay too long on large galleries.
Want faster uploading? Vote for FTP!