robots.txt 503 Service unavailable error
rashbrook
Registered Users Posts: 92 Big grins
So today, Google finally began to populate some data in Webmaster Tools for my site, but it was mostly errors.
When I looked at the site configuration section, I saw that it says:
robots.txt file Downloaded Status http://www.ashbrook-photography.com/robots.txt 57 minutes ago 503 (Service unavailable)
and although the file is accessible at the link above, there is nothing in the text box which normally would display the robots.txt data.
The only change I made to the site in the last 24 hours was to change my A record to the proper IP address. For some reason I had set it to a different SmugMug address way back when I did the initial setup. (I do have the CNAME set up, but I added the A record so the site would work without the www.)
Other than that, nothing has changed. Since robots.txt can inhibit Google's ability to crawl my site, this concerns me. What can I do?
Thanks,
-Robert
Comments
Portfolio • Workshops • Facebook • Twitter
Ashbrook Photography | Facebook | Twitter
Thanks,
- Christos
That's nothing to worry about. It's likely just a temporary glitch that the robots file wasn't available at the time the Google crawler came by. Google will simply retry at a later time and then the message will disappear. You can safely ignore this one.
Sebastian
SmugMug Support Hero
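For anyone who wants to check the status code themselves rather than wait for Webmaster Tools to refresh, here is a minimal Python sketch. The function names `robots_status` and `describe` are made up for illustration, and the wording of the descriptions is only a rough reading of the advice above (a 5xx response is usually temporary and the crawler retries later), not a statement of how Google classifies responses.

```python
import urllib.error
import urllib.request


def robots_status(url, opener=urllib.request.urlopen):
    """Return the HTTP status code the server sends for a robots.txt URL.

    `opener` is injectable so the function can be exercised without
    touching the network; by default it makes a real request.
    """
    try:
        with opener(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is on the exception.
        return err.code


def describe(status):
    """Give a rough, informal description of a status code (assumed wording)."""
    if 200 <= status < 300:
        return "fetched"
    if 500 <= status < 600:
        return "server error, likely temporary; retry later"
    return "other"
```

Calling `robots_status("http://www.ashbrook-photography.com/robots.txt")` makes a live request, so the result depends on what the server returns at that moment. A single 503 followed by a 200 on the next try would match the "temporary glitch" explanation.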
I sent an email to support, but I just got back a message saying they could reach the file. I'll give it more time, but I hope someone will be willing to take it a little further if this continues and I need to ask for support in a few days.
I'm getting the same issue as what you're mentioning.
Is there a way I can somehow edit the robots.txt file that seems to be automatically generated by SmugMug?
When I try submitting my RSS feed as a sitemap, Google tells me that it is being restricted by robots.txt.
You don't want or need to submit an RSS feed as a sitemap.
The red X's are expected. The names of the sitemap files were changed, so the old sitemap files (the ones showing red X's) are going away and have been replaced by files with new names, which should not show them.
As long as you have added sitemap-index.xml.gz to Webmaster Tools you will be fine; this will clear itself up.
- Greg
The RSS format is not the best thing to submit to Webmaster Tools. Submit sitemap-index.xml.gz instead and Google will automatically pick up all of the sitemap files that we provide.
- Greg
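To illustrate what "pick up all of the sitemap files" means, here is a rough Python sketch of reading a gzip-compressed sitemap index and listing the child sitemaps it points to. The sample index and its example.com file names are hypothetical, since the actual contents of sitemap-index.xml.gz are not shown in this thread; only the XML namespace comes from the sitemaps.org protocol.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def child_sitemaps(gz_bytes):
    """Return the <loc> URLs listed in a gzip-compressed sitemap index."""
    root = ET.fromstring(gzip.decompress(gz_bytes))
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc")]


# Hypothetical index contents; the real file names are not given here.
SAMPLE_INDEX = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://www.example.com/sitemap-pages.xml.gz</loc></sitemap>
  <sitemap><loc>http://www.example.com/sitemap-galleries.xml.gz</loc></sitemap>
</sitemapindex>
"""

sample_gz = gzip.compress(SAMPLE_INDEX)
urls = child_sitemaps(sample_gz)
print(urls)
```

A crawler that is given the index only needs that one URL; it follows each `<loc>` entry to find the individual sitemap files.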
http://www.amazon.com/robots.txt
Disallow: /rss/people/*/reviews
Disallow: /gp/pdp/rss/*/reviews
... And they were the first one I picked...
- Greg
http://facebook.com/robots.txt
Disallow: /feeds/
http://wall-art.smugmug.com/
Click on http://www.facebook.com/feeds/ - 404!
Please:
1. Show sites related to these feeds.
2. At the same time look at these lines in robots.txt:
Sitemap: http://www.amazon.com/sitemap-manual-index.xml
Sitemap: http://www.amazon.com/sitemap_dp_index.xml
Sitemap: http://www.amazon.com/sitemap_vendor_videos_us.xml
Sitemap: http://www.amazon.com/sitemap_vod_index.xml
These are just normal sitemaps.
That just tells me they don't have an index handler on that URL... what does that have to do with them disallowing robots from crawling their feeds?
http://wall-art.smugmug.com/
Facebook feeds sit underneath the /feeds URL. They block them all. Find a Facebook feed URL (I do not know one off the top of my head) and it will be underneath that URL. Maybe something like http://www.facebook.com/feeds/somefeed.rss (not a valid one, I'm sure), and any bot that crawls it will be violating the rules of robots.txt.
- Greg
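The point about `Disallow: /feeds/` can be checked with Python's standard `urllib.robotparser`. This is only a sketch: the `User-agent: *` line is an assumption added so the rule applies to every crawler (only the `Disallow` line is quoted above), and the feed URL is the made-up one from the post, not a real feed.

```python
from urllib.robotparser import RobotFileParser

# Assumed minimal robots.txt: the Disallow line is quoted in the thread,
# the User-agent line is added here so the rule applies to all bots.
RULES = """\
User-agent: *
Disallow: /feeds/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical feed URL from the post above; it sits under /feeds/.
feed_allowed = parser.can_fetch(
    "Googlebot", "http://www.facebook.com/feeds/somefeed.rss"
)
# A path outside /feeds/ is not covered by the rule.
page_allowed = parser.can_fetch("Googlebot", "http://www.facebook.com/somepage")

print(feed_allowed, page_allowed)
```

The rule is a path-prefix match, so it applies to anything under /feeds/ whether or not the bare /feeds/ URL itself returns a page. Note that `urllib.robotparser` does plain prefix matching, so wildcard rules like the Amazon ones quoted earlier would need a different parser.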
You are making my point. These are sitemap indexes. We do not compress these either (though we could).
Let's look at http://www.amazon.com/sitemap_dp_index.xml for a minute:
Satisfied?
- Greg