Well, I will get real worried and P***ed when they go down and leave a message that it will be fixed real soon or is down for maintenance, and then a day, week and month later they have not shown back up and won't answer any email(s). It happened to me already with another really cool provider: their servers went down and a message popped up, "server maintenance in progress, we'll be back online shortly." That was 5 yrs ago. The email still seems to be active, as it never bounces back to me, but the owners never answer either.
Do we have email from you that went unanswered? If so, forward it to me, ATTN: Andy at our help desk.
I'm glad to see my site is working smoothly again. But this incident, when added to how rocky the last few months have been, has soured my enthusiasm for SmugMug. Hopefully, not permanently, but we'll have to wait and see what the rest of the year brings.
Andy, et al. Since August 1, 2.5% downtime (not counting scheduled maintenance). That is extremely unacceptable. What is SmugMug doing to make sure this trend doesn't continue?
I just did some quick calculations and, in the 1640 hours since August 1, SmugMug has had approximately 97.5% uptime. And that does NOT include scheduled maintenance. On the web, downtime of 0.5% is HUGE. 2.5% downtime is totally unacceptable and WELL outside the norms.
Frankly, it worries me because it's starting to feel like a trend instead of a bunch of random events.
Mark
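For anyone who wants to check the arithmetic behind the 97.5% figure, here is a rough Python sketch. The 1640 hours and the 2.5% come straight from the post above; the function name `downtime_hours` is made up for illustration.

```python
def downtime_hours(total_hours: float, downtime_pct: float) -> float:
    """Convert a downtime percentage over a period into hours of downtime."""
    return total_hours * downtime_pct / 100.0


# Figures quoted in the post above: 1640 hours since August 1, 2.5% downtime.
hours_down = downtime_hours(1640, 2.5)
uptime_pct = 100.0 - 2.5

print(f"{hours_down:.1f} hours down, {uptime_pct:.1f}% uptime")
```

In other words, 2.5% of 1640 hours works out to roughly 41 hours of unscheduled downtime.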
Yeah, I feel your pain. Anecdotally, I've noticed the internet as a whole has been a bit off lately.
Unfortunately this is a flaw in the internet business model. It's really a chain of responsibility, and when it breaks, people downstream have to deal with the pain. Your customers use your site, and you use SmugMug, which in turn uses a hosting company, which pays an internet company, power company, etc. The chances of failure multiply with each link, and we see the result.
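To put rough numbers on the "chain" idea: if every link has to be up at the same time, and failures are assumed to be independent, the availabilities multiply. The sketch below uses invented availability figures (they are not measurements of SmugMug or of any real provider), and `chain_availability` is a hypothetical helper name.

```python
from math import prod


def chain_availability(link_availabilities):
    """Availability of a serial chain: every link must be up at once,
    so (assuming independent failures) the availabilities multiply."""
    return prod(link_availabilities)


# Hypothetical numbers, not measurements from any real provider.
links = {
    "photo host": 0.999,
    "hosting company": 0.999,
    "network provider": 0.999,
    "power": 0.9999,
}

overall = chain_availability(links.values())
print(f"end-to-end availability: {overall:.4%}")
```

The end-to-end figure always comes out lower than the weakest single link, which is the point being made above.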
I work for a small internet application company, and even with our small customer base, we have all kinds of issues. I love waking up at 2 a.m. Sunday morning to my phone going off because one of our load balancers just went kaput! Some of the issues our customers have to deal with are our own, like using a really poorly performing SQL query. Other times the issue is out of our hands. Perhaps our hosting company is doing testing, and they screwed something up. Or they accidentally gave us workstation-class hardware instead of server-class hardware during the last upgrade. Of course these are all excuses, none of which matter to the end user, but I do understand that not everything that happens is under SmugMug's control.
Part of it is the growing pains and lessons that all companies have to go through. Many a pro photographer has at one point or another learned the very painful lesson of why you should have a backup camera (or three) for an event. Likewise, the people at SmugMug might have had to learn a hard lesson as well.
When rumors of Michael Jackson's death started to circulate, Google started to have glitches. If even the super smart guys at the almighty G can screw up, it's not hard to see that a much smaller company, with a much smaller pool of resources, might have issues as well.
I will say this: I've pushed hard to make our incident response team as good as SmugMug's. Having dealt with many service providers, I can say SmugMug is without a doubt one of the best in terms of response as well as communication. We all could stand to do a better job, and always strive to improve, but from what I've seen, the people at SmugMug are doing one hell of a job.
I like to make pretty pictures. Maybe one day I'll be good at it.
Canon 5D Mark II, Canon 40D
16-35L II, 50F1.4, 50 Macro, 24-105L, 100 Macro
Canon 580EXII, Sigma 500DG ST
Blackrapid RS4 photos.aballs.com
Baldy (Registered Users, Super Moderators; Posts: 2,853; moderator)
Hi Mark,
Yes it is unacceptable and I have to admit that if I were in your shoes I might not be able to be as civil as you have been about it.
There are several issues, the biggest of which is that I have continuously looked upon ourselves as a relatively small photo sharing site and have continually underestimated where our traffic is going and how fast. Here's a snapshot of the last three months, and the inflection point is when we added Nice Names so people's sites are found more frequently in Google:
[Graph not shown.]
In my wildest dreams I didn't expect us to rocket ahead of sites like Zappos in traffic.
In any case we're building quite the infrastructure now and changing the underpinnings in very dramatic ways to scale, so it's causing some pain. We have a very big language/software change that all engineers at SmugMug have been working on for a while, and it is very soon to go out. It's a tough change to make but we think it will make a big difference.
In any case we're building quite the infrastructure now and changing the underpinnings in very dramatic ways to scale...
I admire you for your honesty and humility here, Baldy, I really do.
But...
Show me, don't tell me. And I mean this in the most down-to-earth way, that I think you can appreciate. Post in the status update blog before, during and after scheduled maintenance. Why not some photos of new hardware, specs, reviews? Convince me it's awesome new stuff, worth the downtime. Be a little more open about what's to come in terms of software. You hinted a bit in your post but quite frankly I need more of that, if I'm to "stay tuned". Hook me...
Maybe even more importantly, write post mortems on unexpected failures, all of them. What went wrong, why it did, and why it won't in the future.
Something has to happen to restore confidence in your company, at least from my point of view, and it seems a few others'.
Would you care to comment on Post #65 of this thread (copied below), which I made in the midst of the fray yesterday while trying to allay concerns by some posters that Smugmug was a large company with ample resources and was simply being stingy with seemingly necessary infrastructure expenditures?
(snip)
We don't comment much on how large or small we are (we're privately held) ... but I can tell you for a fact that not in a billion years could you call us stingy. We're spending a lot of money on infrastructure, network, storage, speed and performance for our customers. We're really sorry for this outage, but as in other episodes of this nature, we've already begun to make SmugMug stronger with yet more investment, so that we can minimize the chances of this sort of thing in the future.
These are the down or read-only periods since August:
August 9 - Approximately 4.5 hours
August 14 - Approximately 15 minutes
September 1 - Approximately 40 minutes and another emergency maintenance of about 30 minutes
September 4 - Approximately 40 minutes
September 7-8 - Uploading issues through the night
September 8-9 - Uploading issues continued for almost 18 hours
September 19 - Approximately 30 minutes
September 30 - Approximately 20 minutes
October 4 - Approximately 45 minutes
Today - 2+ hours and counting
None of these were scheduled maintenance.
Mark
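As a rough cross-check, the listed outages can be added up in a few lines of Python. This is only a sketch under stated assumptions: "almost 18 hours" is counted as a full 18 hours, "2+ hours and counting" is counted as exactly 2 hours, and the September 7-8 overnight entry is left out because no duration is given. The 1640-hour period is the figure quoted earlier in the thread.

```python
# Durations in minutes, taken from the list above. Two entries are judgment
# calls: "almost 18 hours" is counted as a full 18 hours, and "2+ hours and
# counting" is counted as exactly 2 hours (a lower bound).
outages_min = {
    "Aug 9": 270,
    "Aug 14": 15,
    "Sep 1": 40 + 30,
    "Sep 4": 40,
    "Sep 8-9": 18 * 60,
    "Sep 19": 30,
    "Sep 30": 20,
    "Oct 4": 45,
    "Today": 120,
}
# Sep 7-8 ("uploading issues through the night") has no stated duration,
# so it is left out rather than guessed.

PERIOD_HOURS = 1640  # hours since August 1, as quoted earlier in the thread

total_hours = sum(outages_min.values()) / 60
downtime_pct = 100 * total_hours / PERIOD_HOURS

print(f"{total_hours:.1f} h of listed downtime = {downtime_pct:.2f}% of {PERIOD_HOURS} h")
```

With those assumptions the total comes in somewhat under the 2.5% quoted earlier; the unquantified overnight on September 7-8 and the outage still in progress would presumably account for the gap.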
Any outage under 1 hour can't really be counted. This is normal in the Internet world, and the response time for these outages is first class. However, that being said, the number of outages longer than one hour has definitely increased since I signed on with SM many years ago. I'm chalking it up to growing pains right now, but am keeping my eye out for a competitive service.
I think Smugmug needs to put all of the employees' names into a hat. Every time the site crashes, someone's name is pulled from the hat. That person is fired.
hahaha! Sounds like a reality show in the making...
I hope these things get resolved in a permanent way. The thread on nikoncafe alarms me. It's one thing to see dgrinners complain, as most of us are on SM and check this site often for SM-related stuff, but to see long discussions on SM outages on other sites isn't good. :cry
Funny stuff...
I have to relay a very ironic scenario that happened yesterday, during all of this.
My last post in this thread was about 10 minutes before my DSL went down. I called the phone co internet technical support line and was met by a busy signal for the next half hour. When the busy signal stopped I was then met by a ring that never got picked up and eventually went back to... you guessed it... a busy signal. After a few hours of house chores, Lightroom catching up, etc. I tried my connection again. Nothing. I finally called the telephone repair number and was greeted by a rep who asked me if I had gone through tech support yet because that was the procedure. Tech support, then repair. I explained the situation to the rep and asked her if she could look into it. She nicely said that was a different department (DSL) and they (DSL tech support) would forward my call when they had determined it was not a problem on my end. I asked if there was any other way she could help. She put me on hold and came back a few minutes later and said that the entire area was down (3 area codes!) with no projected time for resolution. I asked how she found that out and she said she asked one of the DSL support guys who was in an office next door to her. What? She told me she IM'd a guy she knows there and he told her he had notification of the outage only minutes before.
Now if you have made it this far into my story do you see where I'm going with this? My DSL came back up at 11pm. That would be an outage of almost 10 hours, give or take. I had to get someone on the phone who was interested in helping me just get some kind of an answer. That has been almost impossible with my DSL provider. Has anyone here, who was complaining about the SM outage, ever lost their internet gateway? Ever tried to call them and get some help? Where do you go to get a status report? Who, at your provider, would reply to an email (wait, if your connection is out, you have no email!) or forum post like Andy and Baldy did?
I'm sorry to have gone on. I just thought I'd share what I thought was a very funny and ironic story after yesterday morning's SM glitch. I had quite a few posts here after SM went down. Then my DSL went down. And I was just starting to have fun!
The client I mentioned in my other posts that emailed and was understanding wound up calling me and asking why I wasn't replying to her emails. She laughed when I told her the situation and called this morning to say thanks, she likes her pictures and had no problem waiting.
So many of us feel we are entitled to instant gratification.
Thanks SmugMug for what you do. I haven't been here as long as some but it has been over a year and you've been pretty good to me. If something like yesterday happens again, please don't take my DSL with you again, OK?
I think your great story just illustrates how spoiled we've gotten. And at the same time, because we've been spoiled, we've built our business on SM.
A funny, similar story happened to me. The night right before this SM outage, I was about to upload a batch to SM, and I watched all three cable modem lights go out one at a time. I called Knology and asked if they had an outage. They went through all the normal diag stuff, and then told me that there was an outage, with no status or ETA. I left my house and went to my parents' house to upload from there. I left it on all night, only to wake up to the SM outage. Sometimes when it rains, it pours.
I held my tongue and waited but wanted to give some props to SmugMug for letting this thread continue to go on, not many companies would let people complain on the forum that they sponsor/pay for.
Getting a chance to know virtually some of the people involved, I have an immense amount of respect for what they are doing, but I also think that there is room for improvement. There are a few Twitter feeds/streams, but those were not updated with the status. I think that would be an improvement opportunity.
I understand that the unexpected happens on occasion. At work we lost our core network switch a few months back and did not know that the backup had also failed. It is hard to know/test backups at times, but after that issue we changed our processes at work to include that backup in test procedures on a monthly basis.
I also was affected by the outage needing to get some of the photos transferred for a press release at work; however I was able to find the images by still having my local copies which I sync using Star*Explorer, so I had to go home and then code a page real quick and place it up on my server. Now I remember why I like using SmugMug for the hosting so much.
It's good to know that you guys (Andy, Baldy, and the staff) get it. You were humble, admitted that there has been too much downtime, that downtime affects the bottom line, and that no matter how much we personally like each other this is a business and people are here to make money, not friends. When folks plunk down $150 (+ commissions) a year they're counting on a certain amount of uptime (not the 100% some accusers claimed).
I'm not a pro, so when Smugmug is down it's just a bummer for me but it doesn't affect my livelihood. For the true pros I certainly understand their frustration.
Kudos to Baldy, Andy, and the staff for your response to these customers. Now please, make the photo service you provide as reliable and great as the customer service.
I'm not a pro, so when Smugmug is down it's just a bummer for me but it doesn't affect my livelihood. For the true pros I certainly understand their frustration.
I could have selected other words from your post, but thanks for this. Your assessment is spot on. We hate it. It costs us all money, time and energy.
I have been "waffling" for a number of months, my own fault for being lazy, on picking a site to use for sales and marketing. I am also one of the folks who posted in the thread at Nikon Cafe, and as I noted in that thread tonight, kudos to Andy and staff for their honesty and openness; that is a rare thing these days, unfortunately.
I have sent a message off to Help at SmugMug, as well as to other hosting sites on my "short list", to ascertain what is being put in place to mitigate these types of issues. One thing to remember: even a 15 minute outage results in a minimum of 30 minutes lost time if you are trying to add images, and if the site is being used to generate sales, especially for things like events, time really is money.
In SmugMug's defense, not that they need me to help them, given the nature of the Internet there is simply no way to guarantee 100%. If you want that, you build your own, and I'll guarantee you that doing your own Admin brings no joy either.
That being said, I think it is incumbent on any supplier of a service, be it a SmugMug or my car mechanic, to be up front with me on what level of service to expect. I fully expect that I will get a timely and forthright reply that will enable me to make a reasoned decision based on weighing my level of comfort with what the SM folks are putting in place.
I get the sense, and this is a very good thing, that the "management" at SM sees this as a learning experience, and one based on the best of situations, and that is positive growth. And while I am not yet a member, this attitude adds points on the SM side.
Thanks for letting me, a non-member, add a bit here. As well, if it appeared to anyone that my post at Nikon Cafe was meant to in any way bash or denigrate the folks here, let me assure you that was not the intent at all.
I've been involved in some high reliability enterprise data center services and I can sympathize with both sides of this equation. I've run a service that achieved >99.9% uptime for long periods of time (that's less than 1.5 mins/day of downtime). And, I've run a service that had 6 hours of downtime in the middle of the workday and we had a lot of angry Fortune 1000 enterprise customers wondering what the hell we were doing. Things happen. No matter how well prepared you are, things are going to happen. So given that things are going to happen from time to time, here's how I think about issues like this:
The first thing to do is to see how the company responds to an issue. Regardless of how long the outage is, do they seem to be on top of it? Is it getting an appropriate priority? Do they communicate with their customers? Do they seem committed to fully understanding the issue and implementing a permanent corrective action? If there were glitches in their handling of the problem, are they implementing procedural changes to make that less likely? Do they "get" how serious a protracted outage is?
If you get any specifics about the issue, does it seem like an issue that they should have been much better prepared for than they were? It's often hard to really know what happened (and we didn't get much detail in this case for the Smugmug issue), but in the rough description of the recent pBase outage, it clearly seemed like an issue that they just weren't prepared for (they had a multi-day outage) and weren't necessarily apologizing for not being prepared for it. That issue gave me a sense that they were not committed to serious uptime and aren't changing a whole lot going forward. Their whole site depended upon one single database server (that got nailed by a power glitch) and they didn't have a hot backup. They had to reconstruct a database (a multi-day operation) before the service could come back up again.
Then, going forward, does the company seem to prioritize things in favor of improving the uptime? Are changes planned to solidify the things that went wrong? Are other things that could add risk to the site delayed either for more testing or until other stability-improving projects are completed?
When the next issue occurs, does it sound similar? Is it something that probably should have already been fixed based on the previous learnings? Is the response in the next issue at least as good as the first issue, perhaps even improved?
Then, over the long haul is the company starting to design-in less downtime? This includes things like breaking the site into pods so that most outages will only affect a single part of the site, not the whole site. This includes the ability to introduce new software on part of the site rather than the entire site at once so any kinks can be worked out before the whole site gets exposed to the new code. This includes designing a way to upgrade the site without taking the whole site down.
This type of development takes more time to write and implement, but it's ultimately how sites that want the highest uptime do it. It's also how you start trimming the maintenance windows to smaller and smaller time slots and perhaps eventually to no site-wide impact time.
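One common way to expose new software to only part of a site is stable hash-based bucketing of accounts. The sketch below illustrates that general technique only; nothing in this thread says how SmugMug actually does it, and `in_rollout` and the account names are hypothetical.

```python
import hashlib


def in_rollout(account_id: str, percent: int) -> bool:
    """Return True if this account falls inside the first `percent` of 100
    stable buckets. The same account always lands in the same bucket."""
    digest = hashlib.sha256(account_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent


# Hypothetical account names, purely for illustration.
accounts = [f"user{n}" for n in range(1000)]
exposed = [a for a in accounts if in_rollout(a, 10)]
print(f"{len(exposed)} of {len(accounts)} accounts would see the new code")
```

Because the bucket depends only on the account ID, raising the percentage later keeps everyone who already had the new code and simply adds more accounts, so kinks can be worked out on a small slice first.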
And, finally, how does the stability look when graphed over time (6 mos to 2 yrs)? Is it getting better or worse? Right now we're in a rough patch. If you take what they've written about recently, the new nicenames feature caused some instability initially. First, there were some glitches in how the search engines were crawling their site and perhaps a few bugs. Second, they've been surprised by the recent traffic increase and may not have been expecting that. So, for this particular issue, the real question will be whether they get on top of it in the next month and then things get better again. If so, you can chalk it up as an inevitable rough patch that they handled fairly decently. If not, then folks who have uptime as a high priority may need to consider looking elsewhere to find what they need. Right now, I'd say it's a little too soon to know which direction it's headed based on the recent glitches. Performance over the next 6 months will tell that story.
A short follow-up to my earlier post. Once again SmugMug has done it. I have gotten no fewer than 4 responses from the Help folks. I am very impressed by both the actions that SmugMug is taking to resolve these issues and the treatment that I, not even yet a member, am getting in response to what could be seen as some rather obnoxious and contentious questions.
Yes it is unacceptable and I have to admit that if I were in your shoes I might not be able to be as civil as you have been about it.
There are several issues, the biggest of which is that I have continuously looked upon ourselves as a relatively small photo sharing site and have continually underestimated where our traffic is going and how fast. Here's a snapshot of the last three months, and the inflection point is when we added Nice Names so people's sites are found more frequently in Google:
In my wildest dreams I didn't expect us to rocket ahead of sites like Zappos in traffic.
In any case we're building quite the infrastructure now and changing the underpinnings in very dramatic ways to scale, so it's causing some pain. We have a very big language/software change that all engineers at SmugMug have been working on for a while, and it is very soon to go out. It's a tough change to make but we think it will make a big difference.
I'm sorry for the pain in the meantime.
Thanks,
Chris
Chris,
Sorry it took me so long to get back to this thread. I had a busy weekend (including managing to find some time to take pictures!).
Thank you for your openness and honesty regarding the SmugMug growing pains! It gives me some reassurance that my faith in SmugMug is not ill-placed!
I've been recommending SmugMug to friends and family for about 4-5 years now. I know quite a few of them joined (many of them without using my coupon code :bash: ) so I guess, to some small degree, I'm responsible for the bandwidth overload.
BTW, thank you for also recognizing that my posts were not "whining", but instead, legitimate concerns. I do try to be civil but my messages will sometimes be filled with the same passion I feel toward my hobbies and toward life in general.
Here's hoping the rest of 2009 and all of 2010 are without any further major incidents.
Comments
Personally, I wish Sedona would have become a National Park or Monument or some sort of protected place:
http://joves.smugmug.com/popular/1/296357862_kLSG3#615486_N77CA
Joves - you should consider a custom hostname for your site and all those great shots you've got posted there!
http://www.smugmug.com/help/power-custom-domain
Steve
www.LateSky.com
<><><><><><><><><><><><><><><><><><>
Yeah I think you should too.
You just did.
That's right, you didn't, you generalized.
Malte
Portfolio • Workshops • Facebook • Twitter
I think he was referring to an email that was sent to the "other" company that went down and never reappeared.
I admire you for your honesty and humility here Baldy, I really do.
But...
Show me, don't tell me. And I mean this in the most down-to-earth way, which I think you can appreciate. Post in the status update blog before, during and after scheduled maintenance. Why not some photos of new hardware, specs, reviews? Convince me it's awesome new stuff, worth the downtime. Be a little more open about what's to come in terms of software. You hinted a bit in your post but quite frankly I need more of that, if I'm to "stay tuned". Hook me...
Maybe even more importantly, write post mortems on unexpected failures, all of them. What went wrong, why it did, and why it won't in the future.
Something has to happen to restore confidence in your company, at least from my point of view, and it seems a few others'.
Malte
I 100% agree with all of this!
Keith Tharp.com - Champion Photo
~ * ~ Mothers of teens now know why some animals eat their young ~ * ~
Baldy,
Would you care to comment on Post #65 of this thread (copied below)? I made it in the midst of the fray yesterday, trying to allay concerns by some posters that Smugmug was a large company with ample resources and was simply being stingy with seemingly necessary infrastructure expenditures.
Thanks.
-- sc
Post #65 (http://dgrin.com/showthread.php?p=1230359&highlight=small#post1230359)
...as far as I know, Smugmug is NOT a big company. Smugmug is a SMALL company trying to provide a BIG service, and therein lies the problem. Downtime is a function of resources. Too few resources (hardware + support) for a given load results in system failure and downtime. Plain and simple. Smugmug obviously needs to commit more capital to hardware and support but might not be financially capable of doing so at this time.
The feeling that I'm getting lately is that demand for Smugmug is starting to exceed their ability to provide it satisfactorily to customers (esp. Pros). This is what some might call "growing pains." Take the pain or jump ship, the choice is yours...
Steve
www.LateSky.com
<><><><><><><><><><><><><><><><><><>
Portfolio • Workshops • Facebook • Twitter
Want faster uploading? Vote for FTP!
http://www.nikoncafe.com/vforums/showthread.php?t=246679
I hope these things get resolved in a permanent way. The thread on nikoncafe alarms me. It's one thing to see dgrinners complain, as most of us are on SM and check this site often for SM-related stuff, but to see long discussions on SM outages on other sites isn't good. :cry
From what I can gather, the SM team is more like a family anyways. I don't see you guys having many HR issues.
I have to relay a very ironic scenario that happened yesterday, during all of this.
My last post in this thread was about 10 minutes before my DSL went down. I called the phone co internet technical support line and was met by a busy signal for the next half hour. When the busy signal stopped I was then met by a ring that never got picked up and eventually went back to... you guessed it... a busy signal. After a few hours of house chores, Lightroom catching up, etc. I tried my connection again. Nothing. I finally called the telephone repair number and was greeted by a rep who asked me if I had gone through tech support yet because that was the procedure. Tech support, then repair. I explained the situation to the rep and asked her if she could look into it. She nicely said that was a different department (DSL) and they (DSL tech support) would forward my call when they had determined it was not a problem on my end. I asked if there was any other way she could help. She put me on hold and came back a few minutes later and said that the entire area was down (3 area codes!) with no projected time for resolution. I asked how she found that out and she said she asked one of the DSL support guys who was in an office next door to her. What? She told me she IM'd a guy she knows there and he told her he had notification of the outage only minutes before.
Now if you have made it this far into my story do you see where I'm going with this? My DSL came back up at 11pm. That would be an outage of almost 10 hours, give or take. I had to get someone on the phone who was interested in helping me just to get some kind of an answer. That has been almost impossible with my DSL provider. Has anyone here, who was complaining about the SM outage, ever lost their internet gateway? Ever tried to call them and get some help? Where do you go to get a status report? Who, at your provider, would reply to an email (wait, if your connection is out, you have no email!) or forum post like Andy and Baldy did?
I'm sorry to have gone on. I just thought I'd share what I thought was a very funny and ironic story after yesterday morning's SM glitch. I had quite a few posts here after SM went down. Then my DSL went down. And I was just starting to have fun!
The client I mentioned in my other posts that emailed and was understanding wound up calling me and asking why I wasn't replying to her emails. She laughed when I told her the situation and called this morning to say thanks, she likes her pictures and had no problem waiting.
So many of us feel we are entitled to instant gratification.
Thanks SmugMug for what you do. I haven't been here as long as some but it has been over a year and you've been pretty good to me. If something like yesterday happens again, please don't take my DSL with you again, OK?
A funny similar story happened to me. The night right before this SM outage, I was about to upload a batch to SM, and I watched all three cable modem lights go out one at a time. I called Knology and asked if they had an outage. They went through all the normal diag stuff, and then told me that there was an outage, with no status or ETA. I left my house and went to my parents' house to upload from there. I left it on all night, only to wake up to the SM outage. Sometimes when it rains, it pours.
Having gotten to know some of the people involved virtually, I have an immense amount of respect for what they are doing, but I also think that there is room for improvement. There are a few Twitter feeds/streams, but those were not updated with the status. I think that would be an improvement opportunity.
I understand that the unexpected happens on occasion. At work we lost our core network switch a few months back and did not know that the backup had also failed. It is hard to know/test backups at times. After that issue, we changed our processes at work to include that backup in test procedures on a monthly basis.
I was also affected by the outage, as I needed to get some photos transferred for a press release at work. I was able to find the images because I still had my local copies, which I sync using Star*Explorer, but I had to go home, code a page real quick and put it up on my server. Now I remember why I like using SmugMug for the hosting so much.
Pictures | Website | Blog | Twitter | Contact
It's good to know that you guys (Andy, Baldy, and the staff) get it. You were humble, admitted that there has been too much downtime, that downtime affects the bottom line, and that no matter how much we personally like each other this is a business and people are here to make money, not friends. When folks plunk down $150 (+ commissions) a year they're counting on a certain amount of uptime (not the 100% some accusers claimed).
I'm not a pro, so when Smugmug is down it's just a bummer for me but it doesn't affect my livelihood. For the true pros I certainly understand their frustration.
Kudos to Baldy, Andy, and the staff for your response to these customers. Now please, make the photo service you provide as reliable and great as the customer service.
So we'll improve it, thanks!
I have sent a message off to Help at SmugMug, as well as to other hosting sites on my "short list", to ascertain what is being put in place to mitigate these types of issues. One thing to remember: even a 15 minute outage results in a minimum of 30 minutes lost time if you are trying to add images, and if the site is being used to generate sales, especially for things like events, time really is money.
In SmugMug's defense, not that they need me to help them, given the nature of the Internet there is simply no way to guarantee 100%. If you want that, you build your own, and I'll guarantee you that doing your own Admin brings no joy either.
That being said, I think it is incumbent on any supplier of a service, be it a SmugMug or my car mechanic, to be up front with me on what level of service to expect. I fully expect that I will get a timely and forthright reply that will enable me to make a reasoned decision based on weighing my level of comfort with what the SM folks are putting in place.
I get the sense, and this is a very good thing, that the "management" at SM sees this as a learning experience, and one based on the best of situations, and that is positive growth. And while I am not yet a member, this attitude adds points on the SM side.
Thanks for letting me, a non-member, add a bit here. As well, if it appeared to anyone that my post at Nikon Cafe was meant to in any way bash or denigrate the folks here, let me assure you that was not the intent at all.
The first thing to do is to see how the company responds to an issue. Regardless of how long the outage is, do they seem to be on top of it? Is it getting an appropriate priority? Do they communicate with their customers? Do they seem committed to fully understanding the issue and implementing a permanent corrective action? If there were glitches in their handling of the problem, are they implementing procedural changes to make that less likely? Do they "get" how serious a protracted outage is?
If you get any specifics about the issue, does it seem like an issue that they should have been much better prepared for than they were? It's often hard to really know what happened (and we didn't get much detail in this case for the Smugmug issue), but in the rough description of the recent pBase outage, it clearly seemed like an issue that they just weren't prepared for (they had a multi-day outage) and weren't necessarily apologizing for not being prepared for it. That issue gave me a sense that they were not committed to serious uptime and aren't changing a whole lot going forward. Their whole site depended upon one single database server (that got nailed by a power glitch) and they didn't have a hot backup. They had to reconstruct a database (a multi-day operation) before the service could come back up again.
Then, going forward, does the company seem to prioritize things in favor of improving the uptime? Are changes planned to solidify the things that went wrong? Are other things that could add risk to the site delayed either for more testing or until other stability-improving projects are completed?
When the next issue occurs, does it sound similar? Is it something that probably should have already been fixed based on the previous learnings? Is the response in the next issue at least as good as the first issue, perhaps even improved?
Then, over the long haul is the company starting to design-in less downtime? This includes things like breaking the site into pods so that most outages will only affect a single part of the site, not the whole site. This includes the ability to introduce new software on part of the site rather than the entire site at once so any kinks can be worked out before the whole site gets exposed to the new code. This includes designing a way to upgrade the site without taking the whole site down.
This type of development takes more time to write and implement, but it's ultimately how sites that want the highest uptime do it. It's also how you start trimming the maintenance windows to smaller and smaller time slots and perhaps eventually to no site-wide impact time.
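To make the pod idea above a little more concrete, here is a rough sketch in Python. It is not how SmugMug (or anyone in particular) actually does it; the pod count, the hashing scheme and every function name are assumptions made up for illustration. It shows two things: each user is pinned to one pod, so a failed pod only touches the users on it, and new software can be switched on for a few "canary" pods before the rest.

```python
import hashlib

NUM_PODS = 8  # hypothetical pod count, chosen only for this example


def pod_for(user_id: str, num_pods: int = NUM_PODS) -> int:
    """Pin a user to one pod with a stable hash, so the same user always lands on the same pod."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_pods


def runs_new_code(user_id: str, canary_pods: frozenset, num_pods: int = NUM_PODS) -> bool:
    """True if this user's pod is one of the pods that already received the new release."""
    return pod_for(user_id, num_pods) in canary_pods


def affected_fraction(failed_pods, num_pods: int = NUM_PODS) -> float:
    """Share of the site hit by an outage limited to the given pods (assumes equally sized pods)."""
    return len(set(failed_pods)) / num_pods


if __name__ == "__main__":
    canary = frozenset({0})  # roll the new release out to pod 0 first
    for user in ("alice", "bob", "carol"):  # made-up user names
        print(user, "pod", pod_for(user), "new code:", runs_new_code(user, canary))
    # With one of eight pods down, only that share of the site is affected.
    print("fraction affected by one failed pod:", affected_fraction({3}))
```

The tradeoff is the one described above: routing, per-pod deployment and per-pod upgrades all have to be written and maintained, which is the extra development time.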
And, finally, how does the stability look when graphed over time (6 mos to 2 yrs)? Is it getting better or worse? Right now we're in a rough patch. If you take what they've written about recently, the new nicenames feature caused some instability initially. First, there were some glitches in how the search engines were crawling their site and perhaps a few bugs. Second, they've been surprised by the recent traffic increase and may not have been expecting that. So, for this particular issue, the real question will be whether they get on top of it in the next month and then things get better again. If so, you can chalk it up as an inevitable rough patch that they handled fairly decently. If not, then folks who have uptime as a high priority may need to consider looking elsewhere to find what they need. Right now, I'd say it's a little too soon to know which direction it's headed based on the recent glitches. Performance over the next 6 months will tell that story.
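For anyone who wants to track this themselves, a minimal sketch of the arithmetic might look like the following. The only real numbers are Mark's from earlier in the thread (1640 hours since August 1 and roughly 97.5% uptime, which works out to about 41 hours of downtime); the function names and the simple first-half versus second-half comparison are my own assumptions, not anything SmugMug publishes.

```python
from typing import Sequence


def uptime_percent(total_hours: float, downtime_hours: float) -> float:
    """Uptime as a percentage of the total period."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return 100.0 * (total_hours - downtime_hours) / total_hours


def trend(monthly_uptime: Sequence[float]) -> str:
    """Crude trend check: compare the average of the earlier months with the later months."""
    if len(monthly_uptime) < 2:
        return "unknown"
    half = len(monthly_uptime) // 2
    earlier = sum(monthly_uptime[:half]) / half
    later = sum(monthly_uptime[half:]) / (len(monthly_uptime) - half)
    if later > earlier:
        return "improving"
    if later < earlier:
        return "worsening"
    return "flat"


if __name__ == "__main__":
    # 41 hours of downtime over 1640 hours matches the 97.5% figure quoted in the thread.
    print(uptime_percent(1640, 41))
```

Feeding `trend` a list of monthly percentages covering 6 months to 2 years gives a rough answer to the "better or worse" question, though a real graph will show more than a single word can.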
Homepage • Popular
JFriend's javascript customizations • Secrets for getting fast answers on Dgrin
Always include a link to your site when posting a question
I was referring to a company I was with prior to SM.......believe me, if I do not get an answer from the SM help desk YOU'LL KNOW................
My thanks and kudos to the crew.
Chris,
Sorry it took me so long to get back to this thread. I had a busy weekend (including managing to find some time to take pictures!).
Thank you for your openness and honesty regarding the SmugMug growing pains! It gives me some reassurance that my faith in SmugMug is not ill-placed!
I've been recommending SmugMug to friends and family for about 4-5 years now. I know quite a few of them joined (many of them without using my coupon code :bash: ) so I guess, to some small degree, I'm responsible for the bandwidth overload.
BTW, thank you for also recognizing that my posts were not "whining", but instead, legitimate concerns. I do try to be civil but my messages will sometimes be filled with the same passion I feel toward my hobbies and toward life in general.
Here's hoping the rest of 2009 and all of 2010 are without any further major incidents.
Mark