Victory! The evil outage monster has been slain!

onethumb Administrators Posts: 1,269 Major grins
edited October 14, 2005 in SmugMug Support
I'm going to tentatively declare victory.

That was pretty sucky. I've been feverishly working non-stop trying to stop the site from being dog slow since early this morning. 8 hours later, I think the fire-breathing outage monster is dead for good.

For those who like the gory details, here's what's up:

One of our database servers was getting old and crusty. We bought it more than two years ago, and it lasted far longer than I could have imagined. It was one of the very first AMD Opteron servers sold on the market and contained two Opteron 240 CPUs and 8GB of RAM.

The beast we got to replace it is a 4-way dual-core (8 CPUs!) Opteron 875 monster with 32GB of RAM. It's attached to faster disks, too.

On paper, that means at *least* a 4X speed increase, and very probably more, right? Well, yes, unless you (accidentally!) misconfigure the dang thing.

We've been running it in the cluster for a few weeks and it's been screaming along. After weeks of you and I using it, it seemed like all the kinks had been worked out. So during our standard maintenance window this morning, I swapped the two servers and watched it merrily speed the site up. A lot. Off I went to bed, dreaming of how happy our customers would be to find their beloved photos cruising across the net at blazing speeds.

Hours later, awakened by my awesome customer support reps, I discovered something was wrong. Horribly wrong! The shiny new beast was crawling along far slower than the crusty old box it had replaced.

I spent the next 8 hours wrangling with the server, the operating system, the database software, and everything else in between. I called up the reserves in the form of our Enterprise-level support subscriptions for all the various pieces of software we run. Top engineers all over the world were pulling their hair out alongside me, and no doubt, my poor customers.

I managed to get the site to behave enough to provide a painfully slow version of the site for a few hours while we worked to figure out what had gone wrong.

Long story short, even the authors of said software were stumped. Finally, I stumbled across (after 8 hours of trial and error) the solution (it was a single line of text in a configuration file).

We're back, baby!

I'm so sorry we had such a lousy performance record this morning. I know how painful it was for you, your friends, and your customers.

Thanks for being so patient. You truly are the best customers in the world.

Silver lining moment: The site should be considerably faster than it's ever been, and will continue to improve.

Don

Comments

  • luke_church Registered Users Posts: 507 Major grins
    edited October 11, 2005
    onethumb wrote:
    I'm going to tentatively declare victory.
    Good going. I don't envy you, I've been there in the past... Has the adrenaline gone yet? Well done for getting there anyhow.

    You can sleep again now :)

    Luke

    PS. The outage warning message was very helpful. Good marks for communications there from me. Other people's mileage may vary.
  • Gator Registered Users Posts: 192 Major grins
    edited October 11, 2005
    Thanks so much for all the hard work! You are appreciated very much!!
  • lynnma Registered Users, Retired Mod Posts: 5,207 Major grins
    edited October 11, 2005
    Thanks dearie.. I knew it was slow but I thought it was me.. I'd much rather it be you being slow than me. Awesome service and support!
  • flyingdutchie Registered Users Posts: 1,286 Major grins
    edited October 11, 2005
    Good work, Don.

    I'm a software engineer/architect myself.
    I remember, years ago, I was assigned the task of finding a resource leak in the OS/2 version of our product. It took me 6 weeks and a trip to the USA (I lived in the Netherlands then) to fix the damn thing.
    The fix was adding 2 lines of code! No more!
    In Holland we have the proverb: "An accident sits in a little corner". A very little one indeed.
    -- Anton.
    I can't grasp the notion of time.

    When I hear the earth will melt into the sun,
    in two billion years,
    all I can think is:
        "Will that be on a Monday?"
    ==========================
    http://www.streetsofboston.com
    http://blog.antonspaans.com
  • Cindy Registered Users Posts: 542 Major grins
    edited October 11, 2005
    YIPPIE!!! Thanks bunches & bunches! I'm sooooooooo glad we're back.
    Time to call the school so the superintendent can see now (timing wasn't so great for being down but you all are fantastic and forgiven :)
    It scared me this morning thinking I'd done something to mess up (an advance warning of maintenance & possible problems would have been great... maybe next time - please - thank you).

    Thanks,
    Cindy
    Cindy Colbert (Utterback) • Wishing You Co-Bear Love, Hugs & Laughter!!!
  • ginger_55 Registered Users Posts: 8,416 Major grins
    edited October 11, 2005
    Gosh, and I am busy right now, can't do my photos, gotta watch the d movie before it burns up.

    Yeah, I got 2 GB of RAM installed yesterday, up from 700, and everything slowed down on smugmug. I was ignoring the fact that I was getting slower stuff after spending so much on memory. SO GLAD it was YOUR fault, not mine. Smile.

    thanks,
    ginger
    After all is said and done, it is the sweet tea.
  • Ric Grupe Registered Users Posts: 9,522 Major grins
    edited October 11, 2005
    onethumb wrote:
    Finally, I stumbled across (after 8 hours of trial and error) the solution (it was a single line of text in a configuration file).
    Don
    Thanks, Don.
  • jfriend Registered Users Posts: 8,097 Major grins
    edited October 11, 2005
    Testing at scale is the hardest thing in software
    onethumb wrote:
    I'm going to tentatively declare victory.
    I know from building a carrier/large enterprise-class online service that the hardest thing in software engineering and network operations is to be able to test things at scale in the lab before going online with real traffic.

    We ended up investing nearly 50% of our ongoing engineering resources devoted to the service aspect of our business in the ability to test stuff at scale before customers saw it. On the one hand, it really slowed down our ability to develop new features for the service, but on the other hand, it's what our customers wanted us to do and it's paid off for us. It sounds like you're trying to do the right thing, but it's definitely hard. Good luck.

    --John
    Homepage • Popular
    JFriend's javascript customizations • Secrets for getting fast answers on Dgrin
    Always include a link to your site when posting a question
  • Techman1 Registered Users Posts: 155 Major grins
    edited October 11, 2005
    Don,

    Thanks to you and the Smugmug Team for all the hard work getting this back up and running again. It is much faster than earlier today and seems to be running as it should.

    Thanks again!

    Fred
  • Barb Administrators Posts: 3,352 SmugMug Employee
    edited October 11, 2005
    Super support from a super site. Appreciate your hard work :)
    Barb
    Smug since 2006
    SmugMug Help
    PhotoscapeDesign
  • ppuga Registered Users Posts: 100 Major grins
    edited October 11, 2005
    Hello guys!

    Well, my site has looked wrong since the Maintenance Window disappeared.
    All the things are to the left and with no order. If you click on any gallery they appear the same way, the photos to the left one on top of each other, etc.

    :cry

    PLEASE HELP!

    Check it out:
  • jfriend Registered Users Posts: 8,097 Major grins
    edited October 11, 2005
    Looks OK to me
    ppuga wrote:
    Hello guys!

    Well, my site has looked wrong since the Maintenance Window disappeared.
    All the things are to the left and with no order. If you click on any gallery they appear the same way, the photos to the left one on top of each other, etc.

    :cry

    PLEASE HELP!

    Check it out:
    It looks OK to me:

    39651057-O.jpg
    --John
  • Mike Lane Registered Users Posts: 7,106 Major grins
    edited October 11, 2005
    jfriend wrote:
    It looks OK to me:

    39651057-M.jpg
    15524779-Ti.gif
    Y'all don't want to hear me, you just want to dance.

    http://photos.mikelanestudios.com/
  • flyingdutchie Registered Users Posts: 1,286 Major grins
    edited October 11, 2005
    ppuga wrote:
    Hello guys!

    Well, my site has looked wrong since the Maintenance Window disappeared.
    All the things are to the left and with no order. If you click on any gallery they appear the same way, the photos to the left one on top of each other, etc.

    :cry

    PLEASE HELP!

    Check it out:
    I tried your site on Mozilla, Mozilla Firefox, IE6.0, Opera 8, Netscape 7 and up: your site looks fine. It seems to be a problem only in your browser.
    The image I see in your post, is that Safari 1.2 or IE5 for Mac? If it is IE5 for Mac, forget about this browser... Even IE5 on Windows is no longer (officially) supported by Smugmug.
    -- Anton.
  • ppuga Registered Users Posts: 100 Major grins
    edited October 11, 2005
    Now my site is OK!

    I'm using Safari 1.3.1 on my office computer, and on my laptop I have 2.0.1. A few minutes ago my site looked like it did in my first post on both, but now it's OK!

    Thanks for your answers!
  • asd Registered Users Posts: 115 Major grins
    edited October 11, 2005
    I missed the outage, but I'm loving how lightning fast my site is now - thanks for the speedup!!
  • Mac Write Registered Users Posts: 208 Major grins
    edited October 11, 2005
    I concur, I had the exact same problem. Dumped the cache and now it's fine.
    My Photos | Use this referral code and get $5 off your first year of Smugmug! PIKZSgEQUVtu2 or just click here
    Get busy living or get busy dying
    --Stephen King
  • gus Registered Users Posts: 16,209 Major grins
    edited October 12, 2005
    And here I was thinking you blokes just used a headless Fedora 4 box...
  • luke_church Registered Users Posts: 507 Major grins
    edited October 12, 2005
    ppuga wrote:
    Hello guys!

    Well, my site has looked wrong since the Maintenance Window disappeared.

    Check it out:
    This happened to me immediately after the end of maintenance. In Internet Explorer, Ctrl+R forces a reload. I did that suspecting a daft problem and it went away.

    Try a refresh; if that doesn't work, purge your temporary cache and try again. It'll probably go away.

    Cheers,

    Luke
  • gus Registered Users Posts: 16,209 Major grins
    edited October 12, 2005
    Actually, I've noticed it's a lot faster over here in the never never also.
  • Andy Registered Users Posts: 50,016 Major grins
    edited October 12, 2005
    Humungus wrote:
    Actually, I've noticed it's a lot faster over here in the never never also.

    gus, isn't it amazing how fast we can make those hamsters run??
  • gus Registered Users Posts: 16,209 Major grins
    edited October 12, 2005
    andy wrote:
    gus, isn't it amazing how fast we can make those hamsters run??
    I don't know how you do it with such small hamsters... let me know when you're ready and I will send some real ones over.
  • Andy Registered Users Posts: 50,016 Major grins
    edited October 12, 2005
    Humungus wrote:
    I don't know how you do it with such small hamsters... let me know when you're ready and I will send some real ones over.


    y'see, gus - even you can help onethumb in building a yet faster smugmug
  • kwalsh Registered Users Posts: 223 Major grins
    edited October 12, 2005
    I concur that the site is now wicked fast. Keep up the good work!

    Ken
  • costantinidis Registered Users Posts: 5 Big grins
    edited October 12, 2005
    ppuga wrote:
    Hello guys!

    Well, my site has looked wrong since the Maintenance Window disappeared.
    All the things are to the left and with no order. If you click on any gallery they appear the same way, the photos to the left one on top of each other, etc.
    I had the same problem. I think my browser had cached some corrupt style sheets or something. Restarting the browser made everything appear just fine.
  • RichS Registered Users Posts: 32 Big grins
    edited October 12, 2005
    Problems are back for me
    Everything had cleared up late yesterday, and things were very fast.

    Now I'm back with the same problem, small thumbnails for each forum lining up on the left one-eighth of the page.

    I've tried multiple browsers, cleared the caches, rebooted, etc.

    richs.smugmug.com
  • Andy Registered Users Posts: 50,016 Major grins
    edited October 12, 2005
    RichS wrote:
    Everything had cleared up late yesterday, and things were very fast.

    Now I'm back with the same problem, small thumbnails for each forum lining up on the left one-eighth of the page.

    I've tried multiple browsers, cleared the caches, rebooted, etc.

    richs.smugmug.com

    hi there rich -- please email this to help@smugmug.com thanks very much. sorry for your troubles... very strange as your site is showing up normally for me (safari, firefox)
  • RichS Registered Users Posts: 32 Big grins
    edited October 12, 2005
    Hmmm - I didn't change anything and now it's back to normal.

    As a sometimes-software testing engineer, I hate non-reproducible problems that resolve without a known fix.

    As a user, I'm happy again... :)
  • jamescalder Registered Users Posts: 61 Big grins
    edited October 12, 2005
    RichS wrote:
    Hmmm - I didn't change anything and now it's back to normal.

    As a sometimes-software testing engineer, I hate non-reproducible problems that resolve without a known fix.

    As a user, I'm happy again... :)
    did any IP addresses change as a result of the work done in the past few days? if so, then that might explain the occasional regurgitation of the dud links, due to a DNS server somewhere not updating... or possibly a Proxy server problem if you're on a LAN? of course i don't know what kind of clustering may be happening on the SM servers, so that's a whole nother level of interference that could explain it.

    anyone who really understands these things wanna back me up here... or alternatively expose me for the pseudo-techie i really am and shoot my theory down in nice, toasty, orange flames?

    :hide

    j
  • galla47 Registered Users Posts: 100 Major grins
    edited October 12, 2005
    The fix
    I just emailed help about this (I was having the same problem).

    The trick they gave me was to hold shift and then press reload. This fixes a corrupt stylesheet.

    It worked for me!!!

    PS... Just noticed this is my 15th post, and Geico wants to remind you that a 15 minute call can save you a bunch of money on your car insurance.