Uploading status update

onethumb Administrators Posts: 1,269 Major grins
edited October 9, 2006 in SmugMug Support
First of all, let me apologize for all of the problems yesterday and today with uploading (well, ok, so mostly it was with processing, not uploading, but to you, the customer, it's pretty much one and the same). We don't have any excuse, since this really shouldn't have happened, but I can at least fill you in on what's going on behind the scenes here.

Our upload queue typically hovers around 50 images pending, and we do parallel processing of images to the tune of dozens every second, on average, depending on the resolution, whether their colorspace needs to be converted to sRGB, etc. On a really heavy day, it may get as high as 1000.

Tonight it peaked at well over 60,000 images waiting to be processed.

We caught it "early", well below 10,000, around noon or so today. I sighed and moaned, then settled in to find whichever stupid server was lagging, determined to be done quickly so I could spend the day with my twins.

A few hours later, the queue was at 30,000 and I was no closer to finding the source of the problem. All servers looked good, things were processing, and I couldn't detect anything too major.

My first thought was that our new software releases on Thursday and Friday were the cause of the problem, especially because a major part of Thursday's release was an overhaul of our uploading backend to provide better error logging, processing, and reliability to our customers. Of course, the whole goal was to make things better, not worse, and our extensive internal testing had shown a dramatic improvement. Nonetheless, I spent quite a bit of time mucking around in the new release, benchmarking things and looking for obvious pitfalls. After a few hours, I came up empty - things seemed to be working as designed.

One of our master databases was under fairly heavy stress, but that's been on our radar for a while now. We have two new boxes nearly ready to go to take the load, and even loaded like it was, it shouldn't cause these sorts of issues. I spent a few hours logging, tuning, and adjusting code just in case. It helped unload the DB box some, but image processing was still bogging down. It helped get the queue down from 32,000 to about 22,000, which was nice, but certainly nowhere near acceptable.

Then BOOM! Someone or someones, I haven't gotten a chance to find out who yet, uploaded more than 30,000 new images in the space of a few minutes, and the queue went north again - up to 60,000! This happens periodically: someone at Google or Yahoo or Microsoft with a fast connection can shove stuff down the pipe in a hurry. Usually it only takes us a few minutes to handle the load and move on - but not tonight.

By now, it was after 10pm and I felt no closer to our goal. The Master DB box was basically unloaded by this time, which validated that it wasn't the root cause of the problem. A possible contributing factor, still, sure, but not the root.

And then it hit me. I was looking at the problem all wrong - we'd benchmarked the new code as best we could on our internal test servers, but we didn't have a load like this. More than 300,000 images had passed through our uploading queue today. It could be a teeny, tiny slowdown that, when multiplied by hundreds of thousands, turned out to be huge.

It was. Just like the proverbial hackers who steal just fractions of a cent out of everyone's bank accounts, but still manage to get rich, we were dealing with fractions of a second here. I made one tiny, stupid, silly mistake and it caused a tenth of a second or so of extra delay in processing. Do some quick math, and a tenth of a second per photo for 300,000 photos is more than 8 hours of wasted CPU time. Yikes!
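For anyone who wants to check that quick math, here is a minimal sketch. The tenth of a second and the 300,000 photos are the rough figures from above; the function and constant names are made up purely for illustration.

```python
# Rough figures quoted in the post; both are estimates, not measurements.
EXTRA_DELAY_SECONDS = 0.1   # about a tenth of a second of extra delay per photo
PHOTOS_PROCESSED = 300_000  # images that passed through the queue that day


def wasted_hours(delay_seconds: float, photo_count: int) -> float:
    """Total extra time, in hours, added by a fixed per-photo delay."""
    return delay_seconds * photo_count / 3600


total = wasted_hours(EXTRA_DELAY_SECONDS, PHOTOS_PROCESSED)
print(f"{total:.2f} hours")  # prints "8.33 hours"
```

That is 30,000 seconds of delay in total, which is where "more than 8 hours" comes from.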

What was it? It was the simplest thing. The worst ones always are. Instead of reading the newly uploaded Original from our local, fast in-house storage, I was accidentally reading it from our storage cloud at Amazon using S3 first. Worse, since it was a brand new upload, it hadn't been stored at Amazon yet. Basically, our servers were going all the way to Seattle, asking for a photo, being told it wasn't on Amazon yet, and then they finally turned around and asked the server two feet away here in Silicon Valley.
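To make the mistake concrete, here is a small sketch of the lookup order, not our actual code. The in-memory dictionaries stand in for the local storage and the remote store, and every name in it (`read_original`, `fetch_local`, `fetch_remote`, `photo-1`) is hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A fetcher returns the photo bytes, or None if that store does not have it.
Fetcher = Callable[[str], Optional[bytes]]


def read_original(photo_id: str, sources: List[Fetcher]) -> Optional[bytes]:
    """Try each source in order and return the first hit."""
    for fetch in sources:
        data = fetch(photo_id)
        if data is not None:
            return data
    return None


# Hypothetical stand-ins: a brand new upload exists locally but not remotely yet.
local_store: Dict[str, bytes] = {"photo-1": b"original bytes"}
remote_store: Dict[str, bytes] = {}
calls: List[str] = []


def fetch_local(photo_id: str) -> Optional[bytes]:
    calls.append("local")
    return local_store.get(photo_id)


def fetch_remote(photo_id: str) -> Optional[bytes]:
    calls.append("remote")  # in real life this is the slow network round trip
    return remote_store.get(photo_id)


# The mistaken order: ask the remote store first, miss, then fall back to local.
calls.clear()
buggy_result = read_original("photo-1", [fetch_remote, fetch_local])
buggy_calls = list(calls)

# The intended order: local first, so a new upload never touches the network.
calls.clear()
fixed_result = read_original("photo-1", [fetch_local, fetch_remote])
fixed_calls = list(calls)

print(buggy_calls)  # ['remote', 'local']
print(fixed_calls)  # ['local']
```

Both orders return the same photo. The only difference is the wasted remote miss on every new upload, which is exactly the kind of small per-photo cost that adds up.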

So I believe it's fixed. We have a huge queue still (it was at 60,000 when I started writing this post, and it's now down to 40,000, so we're making fast progress), so I'm afraid you'll have to wait a little bit longer for all your photos to finish, but it looks like we're well on our way.

I'm not going to discount the possibility that I simply got lucky and everyone suddenly stopped uploading at the exact same instant I found my supposed fix, but it makes so much sense I'm hopeful. :) We'll find out for sure tomorrow. Back to the drawing board, if not - so keep those fingers crossed.

As a nice side-effect, searching is now much much faster than it was (go give it a whirl), and some other portions of the site got some optimizations too.

Thanks for being so patient, I know how frustrating it can be not to have something "just work." We truly do have the best customers in the world.

I promise, even if similar problems do crop up in the future, we'll do everything humanly possible to work on a fix and get things running smoothly again - weekends, holidays, whatever it takes.

Don

Comments

  • SteveM Registered Users Posts: 482 Major grins
    edited October 9, 2006
    Great news, Don! I'm hopeful. I have to say, you guys certainly have some broad shoulders. Sorry your weekend sucked.
    Steve Mills
    BizDev Account Manager
    Image Specialist & Pro Concierge

    http://www.downriverphotography.com
  • Dna Registered Users Posts: 435 Major grins
    edited October 9, 2006
    Thanks for the update and thanks for the hard work you put in.

    Dna
  • wailly Registered Users Posts: 1 Beginner grinner
    edited October 9, 2006
    Thanks for keeping us all informed. Although I'm still waiting, I feel much better now, knowing you guys are on top of things thumb.gif
  • wslam Registered Users Posts: 277 Major grins
    edited October 9, 2006
    This is the kind of post that made me sign up with SmugMug in the first place. The CEO proactively made a post to keep customers informed!
  • mhilbush Registered Users Posts: 70 Big grins
    edited October 9, 2006
    Yes, it appears that keyword searches are working again. This is good news.
    Thanks!
    Mark
  • JohnR Registered Users Posts: 732 Major grins
    edited October 9, 2006
    Thanks for the update! It's nice to get something like this (explanation) when it's not expected. clap.gif
  • ivar Registered Users Posts: 8,395 Major grins
    edited October 9, 2006
    Hi guys, thank you all for your patience. The queue has been back to normal for a while now, and everything seems to work fine. We're sorry for the inconvenience.
  • Cameron Registered Users Posts: 745 Major grins
    edited October 9, 2006
    As always, thanks for your honesty and continued dedication. I haven't associated with many companies that are as up-front about issues or as prompt to fix them - especially when it's obviously inconvenient for you!
    thumb.gif
  • gilbert Registered Users Posts: 177 Major grins
    edited October 9, 2006
    Thanks
    Thanks for keeping us posted here...I was all ready to upload the bulk of my pictures to the site yesterday but was trying to be patient...Now I know I won't get frustrated by slow processing thanks to the update :D

    I should have realized there was a problem when 30 images took hours to process...headscratch.gif Took another few hours for me to head over to DGrin to check out what the problem was! (not the brightest, am I??) eek7.gif
  • DJKennedy Registered Users Posts: 555 Major grins
    edited October 9, 2006
    onethumb wrote:

    So I believe it's fixed.... We'll find out for sure tomorrow. Back to the drawing board, if not - so keep those fingers crossed.

    Don

    Thanks for the information, Don. I know for me at least, the frustration levels diminish with information as to what's causing the frustration.

    But I think it's back to the drawing board for you - I uploaded a file this morning (278 kb) and so far I've waited about 18 mins and I don't even show any evidence that I uploaded it yet. Nothing in the upload log.

    ne_nau.gif
    http://www.djkennedy.com

    What did Cinderella say when she left the photo shop? "One day my prints will come."

  • onethumb Administrators Posts: 1,269 Major grins
    edited October 9, 2006
    DJKennedy wrote:
    Thanks for the information, Don. I know for me at least, the frustration levels diminish with information as to what's causing the frustration.

    But I think it's back to the drawing board for you - I uploaded a file this morning (278 kb) and so far I've waited about 18 mins and I don't even show any evidence that I uploaded it yet. Nothing in the upload log.

    ne_nau.gif

    I hate to be the bearer of bad news, but it doesn't look like we even think we got a photo from you this morning.

    The upload queue is completely normal, around 50 images pending at any given second, and moving fast.

    What uploader did you use? Even if we get a corrupted file or an empty one, we log it in your upload log, so this is a new one on me. Let's see if we can't figure it out.

    Don
  • DJKennedy Registered Users Posts: 555 Major grins
    edited October 9, 2006
    onethumb wrote:
    I hate to be the bearer of bad news, but it doesn't look like we even think we got a photo from you this morning.

    The upload queue is completely normal, around 50 images pending at any given second, and moving fast.

    What uploader did you use? Even if we get a corrupted file or an empty one, we log it in your upload log, so this is a new one on me. Let's see if we can't figure it out.

    Don
    I tried a different uploader than I normally use. It's the one that opens another window with a status bar. It ran its course....so dunno. I couldn't wait any longer as I had to go to work.

    I will try again in about 15 mins.


    EDIT: It shows now, using the drag n drop uploader (my default), with an upload time of 0s and a processing time of n/a, but the image is in the gallery.

    I have no idea why that other uploader didn't work, but from my end, all indications show that it uploaded (except nothing in the log, but I assumed this meant the same problem as yesterday).

    Derek
    http://www.djkennedy.com

    What did Cinderella say when she left the photo shop? "One day my prints will come."

  • wellman Registered Users Posts: 961 Major grins
    edited October 9, 2006
    onethumb wrote:
    I promise, even if similar problems do crop up in the future, we'll do everything humanly possible to work on a fix and get things running smoothly again - weekends, holidays, whatever it takes.

    Don

    While I was unaffected by the issue, I'd still like to voice my appreciation for your candor and transparency.

    I just watched "Thank You for Smoking" over the weekend, and you folks are a wonderful respite from our culture of spinning the truth. Keep telling it like it is, and you'll certainly have no worries about my business.

    Kudos, and I hope you get some rest. thumb.gif