Victory! The evil outage monster has been slain!
onethumb
Administrators Posts: 1,269 Major grins
I'm going to tentatively declare victory.
That was pretty sucky. I've been feverishly working non-stop trying to stop the site from being dog slow since early this morning. 8 hours later, I think the fire-breathing outage monster is dead for good.
For those who like the gory details, here's what's up:
One of our database servers was getting old and crusty. We bought it more than two years ago, and it lasted far longer than I could have imagined. It was one of the very first AMD Opteron servers sold on the market and contained two Opteron 240 CPUs and 8GB of RAM.
The beast we got to replace it is a 4-way dual-core (8 CPUs!) Opteron 875 monster with 32GB of RAM. It's attached to faster disks, too.
On paper, that means at *least* a 4X speed increase, and very probably more, right? Well, yes, unless you (accidentally!) misconfigure the dang thing.
We've been running it in the cluster for a few weeks and it's been screaming along. After weeks of you and me using it, it seemed like all the kinks had been worked out. So during our standard maintenance window this morning, I swapped the two servers and watched it merrily speed the site up. A lot. Off I went to bed, dreaming of how happy our customers would be to find their beloved photos cruising across the net at blazing speeds.
Hours later, awakened by my awesome customer support reps, I discovered something was wrong. Horribly wrong! The shiny new beast was crawling along far slower than the crusty old box it had replaced.
I spent the next 8 hours wrangling with the server, the operating system, the database software, and everything else in between. I called up the reserves in the form of our Enterprise-level support subscriptions for all the various pieces of software we run. Top engineers all over the world were pulling their hair out alongside me, and no doubt, my poor customers.
I managed to get the site to behave enough to provide a painfully slow version of the site for a few hours while we worked to figure out what had gone wrong.
Long story short, even the authors of said software were stumped. Finally, after 8 hours of trial and error, I stumbled across the solution (it was a single line of text in a configuration file).
We're back, baby!
I'm so sorry we had such a lousy performance record this morning. I know how painful it was for you, your friends, and your customers.
Thanks for being so patient. You truly are the best customers in the world.
Silver lining moment: The site should be considerably faster than it's ever been, and will continue to improve.
Don
Comments
You can sleep again now.
Luke
PS. The outage warning message was very helpful. Good marks for communications there from me. Other people's mileage may vary.
SmugSoftware: www.smugtools.com
I'm a software engineer/architect myself.
I remember, years ago, I was assigned the task of finding a resource leak in the OS/2 version of our product. It took me 6 weeks and a trip to the USA (I lived in the Netherlands then) to fix the damn thing.
The fix was adding 2 lines of code! No more!
In Holland we have the proverb: "An accident sits in a little corner". A very little one indeed.
-- Anton.
When I hear the earth will melt into the sun,
in two billion years,
all I can think is:
"Will that be on a Monday?"
==========================
http://www.streetsofboston.com
http://blog.antonspaans.com
Time to call the school so the superintendent can see now (timing wasn't so great for being down, but you all are fantastic and forgiven).
It scared me this morning thinking I'd done something to mess up (an advance warning of maintenance & possible problems would have been great... maybe next time - please - thank you).
Thanks,
Cindy
Yeah, I got 2 GB of RAM installed yesterday, up from 700, and everything slowed down on SmugMug. I was ignoring the fact that I was getting slower stuff after spending so much on memory. SO GLAD it was YOUR fault, not mine. Smile.
thanks,
ginger
Thanks, Don.
I know from building a carrier/large enterprise-class online service that the hardest thing in software engineering and network operations is to be able to test things at scale in the lab before going online with real traffic.
We ended up investing nearly 50% of our ongoing engineering resources devoted to the service aspect of our business in the ability to test stuff at scale before customers saw it. On the one hand, it really slowed down our ability to develop new features for the service, but on the other hand, it's what our customers wanted us to do and it's paid off for us. It sounds like you're trying to do the right thing, but it's definitely hard. Good luck.
--John
Homepage • Popular
JFriend's javascript customizations • Secrets for getting fast answers on Dgrin
Always include a link to your site when posting a question
Thanks to you and the SmugMug Team for all the hard work getting this back up and running again. It is much faster than earlier today and seems to be running as it should.
Thanks again!
Fred
Smug since 2006
SmugMug Help
PhotoscapeDesign
Well, my site is wrong till the Maintenance Window disappears.
All the things are to the left and with no order. If you click on any gallery they appear the same way, the photos to the left one on top of each other, etc.
:cry
PLEASE HELP!
Check it out:
www.pablopuga.com
It looks OK to me:
Homepage • Popular
JFriend's javascript customizations • Secrets for getting fast answers on Dgrin
Always include a link to your site when posting a question
http://photos.mikelanestudios.com/
The image I see in your post: is that Safari 1.2 or IE5 for Mac? If it is IE5 for Mac, forget about this browser.... Even IE5 on Windows is no longer (officially) supported by SmugMug.
-- Anton.
Now my site is OK!
I'm using Safari 1.3.1 on my office computer, and on my laptop I have 2.0.1. On both, a few minutes ago, my site looked the way I described in my first post. But now it's OK!
Thanks for your answers!
www.pablopuga.com
Try a refresh; if that doesn't work, purge your temporary cache and try again. It'll probably go away.
Cheers,
Luke
SmugSoftware: www.smugtools.com
gus, isn't it amazing how fast we can make those hamsters run??
Portfolio • Workshops • Facebook • Twitter
y'see, gus - even you can help onethumb in building a yet faster smugmug
Portfolio • Workshops • Facebook • Twitter
Ken
Everything had cleared up late yesterday, and things were very fast.
Now I'm back with the same problem: small thumbnails for each forum lining up on the left one-eighth of the page.
I've tried multiple browsers, cleared the caches, rebooted, etc.
richs.smugmug.com
hi there rich -- please email this to help@smugmug.com thanks very much. sorry for your troubles... very strange as your site is showing up normally for me (safari, firefox)
Portfolio • Workshops • Facebook • Twitter
As a sometimes-software testing engineer, I hate non-reproducible problems that resolve without a known fix.
As a user, I'm happy again.... :)
anyone who really understands these things wanna back me up here... or alternatively expose me for the pseudo-techie i really am and shoot my theory down in nice, toasty, orange flames?
:hide
j
I just emailed help about this (I was having the same problem).
The trick they gave me was to hold shift and then press reload. This fixes a corrupt stylesheet.
It worked for me!!!
PS... Just noticed this is my 15th post, and Geico wants to remind you that a 15 minute call can save you a bunch of money on your car insurance.