Google is complaining SmugMug is not reachable

Ferguson Registered Users Posts: 1,345 Major grins
edited June 1, 2013 in SmugMug Support
I got a warning email from Google that my site's robots.txt file had dropped below 67% reachable. Looking further, it appears to have escalated to 100% unreachable.

I only vaguely know what this means, but it sure sounds like something systemic at SmugMug. Is it?

I don't have a clue how to approach fixing it. And when I look at Webmaster Tools on Google, it also shows server errors starting back at the end of March, with the robots.txt problem starting in May.

Is there some action I should be taking?

Is this something configurable, or is this all your infrastructure?

Comments

  • Ferguson Registered Users Posts: 1,345 Major grins
    edited May 23, 2013
    Sorry, in case not obvious from the signature, the site is captivephotons.com which translates to LinwoodFerguson.Smugmug.com
  • mrneutron Registered Users Posts: 214 Major grins
    edited May 23, 2013
    Recently Google's web-crawling robot has been sending SmugMug customers over 3x the normal level of traffic. SmugMug restricts the level of bot traffic to prioritize actual usage, and when bots go over a certain level of traffic we respond with an HTTP "503" code, which advises Google's robot to come back later. This follows Google's recommendations for responding to overaggressive bot traffic. The downside is that Google's Webmaster Tools needlessly alerts users that their site is responding with "come back later". That would be a concern if it were happening with human traffic, but with bot traffic it's not a problem.
    Andy K
    SmugMug Support Hero
    help.smugmug.com
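The back-off described above can be sketched as a simple rate limiter: count recent bot requests and answer 503 with a Retry-After header once a limit is hit. The limit, window, and retry delay below are made-up numbers for illustration, not SmugMug's actual configuration.

```python
# Minimal sketch of throttling bot traffic with HTTP 503 + Retry-After.
# All numeric limits here are invented for illustration.
from collections import deque
import time

BOT_LIMIT = 100        # max bot requests per window (hypothetical)
WINDOW_SECONDS = 60

_bot_hits = deque()    # timestamps of recent bot requests

def respond_to_bot(now=None):
    """Return (status, headers) for an incoming bot request."""
    now = time.time() if now is None else now
    # Forget hits that have aged out of the window.
    while _bot_hits and now - _bot_hits[0] > WINDOW_SECONDS:
        _bot_hits.popleft()
    if len(_bot_hits) >= BOT_LIMIT:
        # Over the limit: ask the crawler to come back later.
        return 503, {"Retry-After": "120"}
    _bot_hits.append(now)
    return 200, {}
```

A well-behaved crawler that receives the 503 backs off and retries after the indicated delay, which is why Webmaster Tools reports the fetch as a failure even though nothing is actually broken.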
  • Ferguson Registered Users Posts: 1,345 Major grins
    edited May 23, 2013
    mrneutron wrote: »
    Recently Google's web-crawling robot has been sending SmugMug customers over 3x the normal level of traffic. SmugMug restricts the level of bot traffic to prioritize actual usage, and when bots go over a certain level of traffic we respond with an HTTP "503" code, which advises Google's robot to come back later. This follows Google's recommendations for responding to overaggressive bot traffic. The downside is that Google's Webmaster Tools needlessly alerts users that their site is responding with "come back later". That would be a concern if it were happening with human traffic, but with bot traffic it's not a problem.

    Well, from Google's perspective (i.e. the graphs I showed), it is not just being told to come back later; it has ramped up to a 100% failure rate.

    Should your subscribers be concerned about your capacity, given that it seems to be a choice between Google crawls and subscriber access? That's an ugly choice, considering how long and loud the complaints have been about Google search results on SmugMug sites. Yes, I know you can show me lots of highly ranked sites; I'm just saying that subscribers complaining about search results while Google is simultaneously complaining it can't access SmugMug is a bad look.
  • Ferguson Registered Users Posts: 1,345 Major grins
    edited May 25, 2013
    It's continuing, now going on 6 days, with the last 3 over a 75% failure rate.
  • mbonocore Registered Users Posts: 2,299 Major grins
    edited May 27, 2013
    Ferguson,

    Could you report your last 3 days of Google Webmaster Tools robots.txt errors? The more screenshots, the better.

    Thank you!

    Michael
  • Ferguson Registered Users Posts: 1,345 Major grins
    edited May 27, 2013
    mbonocore wrote: »
    Ferguson,

    Could you report your last 3 days of Google Webmaster Tools robots.txt errors? The more screenshots, the better.

    Thank you!

    Michael

    Thanks for staying after it.

    Not much better, and the server errors are getting worse (I'm not actually sure what they mean).
  • Ferguson Registered Users Posts: 1,345 Major grins
    edited May 27, 2013
    Does Fetch as Google help?
    I don't use this normally, so I'm not sure how to interpret it, but not one attempt at accessing my site from Google worked.
  • Ferguson Registered Users Posts: 1,345 Major grins
    edited May 27, 2013
    Fetch as Google
    Here's another site at approximately the same time on the same system, so there's something unusual about SmugMug. I also checked the robots crawl section and it shows no errors there at all.
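For what it's worth, the percentages in Google's warning are just the fraction of robots.txt fetch attempts that succeed over the reporting window. A hypothetical helper (mine, not anything Google publishes) shows how the 67% and 100% figures arise:

```python
def reachable_percent(statuses):
    """Percentage of robots.txt fetch attempts that succeeded.

    `statuses` is a list of HTTP status codes from successive fetch
    attempts; any 2xx counts as reachable, a 503 (or anything else)
    does not.
    """
    if not statuses:
        return 0.0
    ok = sum(1 for s in statuses if 200 <= s < 300)
    return 100.0 * ok / len(statuses)
```

So "dropped below 67% reachable" means roughly one fetch in three was getting the 503, and "100% unreachable" means every attempt in the window failed.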
  • mbonocore Registered Users Posts: 2,299 Major grins
    edited May 28, 2013
    Ferguson,

    My ops team looked into this and informed me that everything looks good from our end. We've logged about 2,800 bot page hits on www.captivephotons.com in the past week, about 1,600 of them from Googlebot. Another thing to try is to see if Webmaster Tools reports better results if you register as "www.captivephotons.com" instead of "captivephotons.com". Google might not be happy with an extra redirect.

    Can you try to register the captivephotons.com instead and let me know if this helps?

    Thanks!

    Michael
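The extra redirect Michael mentions is easy to observe by requesting the bare domain without following redirects and inspecting the Location header. A small sketch (the helper names are mine; `first_redirect` needs live network access, so only the offline string check is exercised here):

```python
import http.client
from urllib.parse import urlparse

def first_redirect(url):
    """Fetch `url` WITHOUT following redirects; return (status, Location).
    Requires network access -- shown only as a usage sketch."""
    p = urlparse(url)
    conn = http.client.HTTPConnection(p.hostname, p.port or 80, timeout=10)
    conn.request("GET", p.path or "/")
    resp = conn.getresponse()
    return resp.status, resp.getheader("Location")

def redirect_adds_www(requested_url, location_header):
    """True if the redirect target is the www. variant of the requested host."""
    req = urlparse(requested_url)
    loc = urlparse(location_header or "")
    return loc.hostname == "www." + (req.hostname or "")
```

If the bare domain answers with a 301/302 whose Location adds the "www.", Webmaster Tools registered against the bare domain sees an extra hop on every fetch.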
  • shandrew Administrators, Vanilla Admin Posts: 33 SmugMug Employee
    edited May 29, 2013
    Bonocore meant to write "Can you try to register www.captivephotons.com instead" on Google's webmaster tools.
    I work at SmugMug but these opinions are usually my own.
  • Ferguson Registered Users Posts: 1,345 Major grins
    edited May 29, 2013
    shandrew wrote: »
    Bonocore meant to write "Can you try to register www.captivephotons.com instead" on Google's webmaster tools.

    I got that part; I need to figure out how -- not sure if this is the analytics piece itself, or a registration I may have done and forgotten for search. Is this just the analytics piece and the code I insert for that?
  • shandrew Administrators, Vanilla Admin Posts: 33 SmugMug Employee
    edited May 30, 2013
    Go to Google Webmaster Tools (https://www.google.com/webmasters/tools), select "ADD A SITE", enter www.captivephotons.com, and verify it using the tag method or the analytics method (whichever you used before should work fine; for tag verification, you would need to replace the tag in Account Settings -> Advanced customization -> Head Tag).
    I work at SmugMug but these opinions are usually my own.
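After replacing the tag in the Head Tag box, you can confirm it is actually being served by fetching a page and scanning the HTML for the verification meta tag. A quick offline sketch using only the standard library (helper names are mine):

```python
from html.parser import HTMLParser

class _MetaFinder(HTMLParser):
    """Collects the content of google-site-verification meta tags."""
    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if d.get("name") == "google-site-verification":
                self.tokens.append(d.get("content"))

def find_verification_tokens(html):
    """Return all google-site-verification token strings found in `html`."""
    finder = _MetaFinder()
    finder.feed(html)
    return finder.tokens
```

If the token Webmaster Tools gave you shows up in the list, the tag-method verification should succeed.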
  • Ferguson Registered Users Posts: 1,345 Major grins
    edited May 31, 2013
    shandrew wrote: »
    Go to Google Webmaster Tools (https://www.google.com/webmasters/tools), select "ADD A SITE", enter www.captivephotons.com, and verify it using the tag method or the analytics method (whichever you used before should work fine; for tag verification, you would need to replace the tag in Account Settings -> Advanced customization -> Head Tag).

    Thanks, I did figure it out, but I remain a bit confused about what I see.

    First, so far Fetch as Google does get robots.txt, and it showed no crawl errors, though it only has one day's data point so far. It does show two soft 404 errors, which is strange, but I think it's too early to tell.

    What I don't understand is that the account with "www" attached shows "Total Indexed" pages at zero (well, actually there are no data points over the last year, though there are data points for "blocked by robots" in that period). The account without "www" shows about 12,000 to 18,000 pages indexed (strangely decreasing, though). So all the indexing has been showing up under the non-"www" account, even though all the internal links on SmugMug use "www" (i.e. if you load a page and follow links from one to the next).

    I don't quite know what to make of that -- it seems as though Google in some form or fashion drops the "www" when it does its indexing, even though SmugMug requires it?
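On the "blocked by robots" data points: once the crawler does retrieve robots.txt, those counts come from applying its rules URL by URL, which the standard library can illustrate. The rules below are invented for illustration, not captivephotons.com's actual robots.txt:

```python
from urllib.robotparser import RobotFileParser

# Parse a robots.txt body directly (no network) and ask, per URL,
# whether a given crawler is allowed to fetch it -- the same decision
# that produces "blocked by robots" counts in Webmaster Tools.
rules = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

allowed = rp.can_fetch("Googlebot", "http://www.captivephotons.com/gallery/")
blocked = rp.can_fetch("Googlebot", "http://www.captivephotons.com/private/x")
```

Note the host in the URL matters only to the crawler's bookkeeping, not to the rule match, which may be part of why the www and non-www accounts report such different pictures.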
  • Ferguson Registered Users Posts: 1,345 Major grins
    edited June 1, 2013
    I'm still trying to figure this out.

    I added a separate webmaster account for www.captivephotons.com as well as without the www.

    The former is appropriately loading robots.txt, but it is not showing any indexing statistics.

    Crawl errors are 3 (soft 404s), and robots.txt is fine on the www site. It continues not to be loaded consistently on the site without www. This is after 3 days of data.

    But indexing status on the www site is zero across the board. The indexing status of the site without www continues to show reasonable data (circa 18,000), interestingly dropping by about 4,000 over the period the robots.txt fetches have been failing.

    Since Google allows you to specify that the domain (without www) be displayed with the www, I'm not at all sure I understand the difference between having them registered both ways.

    But it concerns me that one is trending down while continuing to show crawl errors, and the other is showing no indexing at all.