Robots.txt Unreachable - Sitemaps
mshetzer
Registered Users Posts: 22 Big grins
In trying to get Google Webmaster Tools up and running and my sitemap complete, I receive this error:
Network unreachable: robots.txt unreachable
We were unable to crawl your Sitemap because we found a robots.txt file at the root of your site but were unable to download it. Please ensure that it is accessible or remove it completely.
What else do I need to do?
http://shetzers.com
Thanks,
Matt
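For anyone who wants to sanity-check this outside of Webmaster Tools, here is a minimal sketch in Python (standard library only) that simply requests robots.txt from the site root and prints the HTTP status, the file contents, or the error. The shetzers.com URL is taken from the post above; the script itself is just an illustration of the check, not an official Google or SmugMug tool.

    # Minimal check: can robots.txt at the site root be downloaded at all?
    # Python 3 standard library only; the URL comes from the post above.
    import urllib.request
    import urllib.error

    url = "http://shetzers.com/robots.txt"

    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            print("HTTP status:", resp.status)
            print(resp.read().decode("utf-8", errors="replace"))
    except urllib.error.HTTPError as err:
        # A 4xx/5xx answer (e.g. 503 Service Unavailable) ends up here.
        print("HTTP error:", err.code, err.reason)
    except urllib.error.URLError as err:
        # DNS failures, timeouts, refused connections, etc.
        print("Network error:", err.reason)

If this prints a 200 status and the file contents but Webmaster Tools still reports it as unreachable, that would suggest the problem is specific to how the server answers Googlebot rather than a missing file.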
Comments
Now I have two "x"s in my sitemap? I'm losing ground.
Any help would be appreciated !
Matt
Ashbrook Photography | Facebook | Twitter
Thanks, Robert, for the first response to this. I've been reading old postings from October about this same issue, and it didn't seem to get resolved. I'm unable to get any SEO going, so I'm not really getting anything out of my SmugMug account.
Does anyone have any idea how long this takes to get working properly?
Matt
Does anyone know how long it takes for the robots.txt to be found correctly by Google?
If I go to my sitemap, the data is all way out of date - talking the beginning of October-ish, and it's December.
I'm trying to have faith in what support told me - to just wait - but I hope this gets better one day soon.
Ashbrook Photography | Facebook | Twitter
Here's some of the info I have available in my gwt:
Crawl errors: In Sitemaps: 2 | Not followed: 3 | Timed out: 7 | Unreachable: 1,727
Sitemaps: /sitemap-index.xml | URLs in web index: 0
robots.txt file: http://www.ashbrook-photography.com/robots.txt | Downloaded: 7 minutes ago | Status: 503 (Service unavailable)
URL | Googlebot type | Status | Date submitted
http://www.ashbrook-photography.com/robots.txt | Web | Missing robots.txt | 12/10/10, 06:53 AM
http://www.ashbrook-photography.com/ | Web | Missing robots.txt | 12/7/10, 07:37 PM
http://www.ashbrook-photography.com/ | Web | Missing robots.txt | 12/2/10, 06:52 AM
http://www.ashbrook-photography.com/ | Web | Missing robots.txt | 12/1/10, 07:47 PM
http://www.ashbrook-photography.com/ | Web | Missing robots.txt | 11/30/10, 07:13 PM
http://www.ashbrook-photography.com/ | Web | Missing robots.txt | 11/30/10, 07:05 PM
http://www.ashbrook-photography.com/ | Web | Missing robots.txt | 11/30/10, 07:04 PM
http://www.ashbrook-photography.com/ | Web | Success | 11/28/10, 05:44 PM
Ashbrook Photography | Facebook | Twitter
Same issues here: sitemap status with a red cross and robots.txt unreachable. Sitemap-base.xml has not been updated since I first opened my account on November 18th. So it's been about 3 weeks now that Google has been trying to index my site with no success (I've obviously submitted my site to Google and gone through all the relevant posts on Dgrin from October 2010).
I feel that these issues should have been resolved much faster. At a premium of $150 for a Pro account, I would expect these glitches to at least be acknowledged, and I would like to hear about a course of action from SmugMug's side.
I'm attaching some screenshots of the respective Webmaster tool pages.
No SEO, no fun!
- Christos
... the second screenshot (sorry, I didn't know how to attach two screenshots in the same post)
facebook - my facebook page please LIKE me!
This is how the SmugMug Status page should look:
Panoramio: Spectacular photos on Google Earth
Google: Allan Hansen - fotograf og IT-nørd
Blogs: Photo and SEO news by LichtenHansen, Allan Hansen foto nyheder, Antarctica Travel
Homepage: Fotograf Allan Hansen, Unusual wildlife photos
Smug support wizards - what's going on here???
Or is this signalling that Google can find my galleries but not my images?
--- Denise
Musings & ramblings at https://denisegoldberg.blogspot.com
I submitted my sitemap per the directions in a thread on October 27 and have just been monitoring the situation, since we've been told it takes some time. But like others have mentioned, the crawl stats for my site show ZERO for the month of December so far, so I figured I'd better speak up.
I see that Google shows it downloaded a galleries sitemap on its own prior to the date I submitted one, and that is apparently the only one that is working correctly.
My Photos
My Blog
On Google+
On DrivingLine
Allan,
At the http://status.smugmug.com/ URL I'm getting exactly the same table you have, where everything is green; however, I don't get the Sitemap row at all. Any idea why that is?
Thank you,
- Christos
christosandronis.smugmug.com
facebook - my facebook page please LIKE me!
Please submit support tickets to make sure there is awareness. If they keep hearing only from me, they'll start to think "it's just that nutty guy out east again - everyone else is fine."
I looked just now and my unreachable errors are in the 2,000s. My sitemap still hasn't been updated since 10/04, the sitemap still shows the red X in Google, and robots.txt still returns the 503 error. I had only 9 hits on my site today - an all-time low so far.
Ashbrook Photography | Facebook | Twitter
As others have noticed, Google Webmaster Tools reports an issue with robots.txt, and it seems that Google's default position may then be not to index, since it can't tell what the intention of the file was (I can't be sure of this, as I couldn't find an actual Google doc on it - only other reports). This doesn't appear to be random, as using the "Fetch as Googlebot" tool always seems to fail due to an error in robots.txt.
However, robots.txt does appear to really exist. Other sites on the web will show it to me, and Bing appears to read it correctly, as I can find a cached copy of our site from yesterday (10th Dec) on Bing.
Google, however, doesn't have any cached copy of our site since the end of November.
This doesn't appear to be a Webmaster Tools issue, as the other parts of our site which aren't SmugMug don't have this problem.
I have no idea why this is; I'm just hoping to add to the picture a little. The error reported is a 503, which seems to suggest that SmugMug has a problem when sending the robots.txt - but only to Google.
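To make the kind of test described above concrete, here is a rough sketch (not an official SmugMug or Google tool) that requests robots.txt twice with different User-Agent headers and compares the status codes. The Googlebot string below is the commonly published one, and the domain is the one discussed in this thread; a 503 for the Googlebot-style request but a 200 for the browser-style one would match what Webmaster Tools is reporting.

    # Rough sketch: does robots.txt answer differently depending on User-Agent?
    # Python 3 standard library; the domain is the one discussed in this thread.
    import urllib.request
    import urllib.error

    URL = "http://www.ashbrook-photography.com/robots.txt"

    AGENTS = {
        "browser-style": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36",
        "googlebot-style": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    }

    for label, agent in AGENTS.items():
        req = urllib.request.Request(URL, headers={"User-Agent": agent})
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                print(label, "->", resp.status)
        except urllib.error.HTTPError as err:
            # A 503 here, but not for the other agent, would point at
            # user-agent-dependent behaviour on the server side.
            print(label, "->", err.code, err.reason)
        except urllib.error.URLError as err:
            print(label, "-> network error:", err.reason)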
facebook - my facebook page please LIKE me!
I have been looking at this problem for several days and have sorted out what is going on with the robots.txt and sitemap files. Google should refetch your robots.txt sometime in the next 24 hours, and everything should free up.
If you want to test this for yourselves, you can go into your Google Webmaster Tools and navigate to "Labs > Fetch as Googlebot". For the URL, type "robots.txt" and leave the selector on "Web". When you submit it, the status will be "pending", and within a few seconds (you may need to refresh the page) the status should be "Success!" - you can then click it and see the contents of that file. I will monitor this thread for the next couple of days, so if you want to try it, please post your results here.
I am still working on the issue with individual URLs within the sitemap not being escaped properly, my expectation is that this fix will go live around Thursday this week (12/16).
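For readers wondering what "not escaped properly" means in practice: the sitemap protocol expects URLs placed in <loc> elements to be percent-encoded and XML entity-escaped. The snippet below is only an illustrative sketch of that rule, not SmugMug's actual generator; the gallery URL in it is made up, and keeping "%" in the safe set assumes the input has not already been encoded.

    # Illustrative sketch of sitemap URL escaping (not SmugMug's actual code).
    # Sitemap <loc> values should be percent-encoded and XML entity-escaped.
    from urllib.parse import quote
    from xml.sax.saxutils import escape

    def sitemap_loc(raw_url):
        # Percent-encode unsafe characters (spaces etc.) while leaving the
        # URL structure alone; assumes the input is not already encoded.
        encoded = quote(raw_url, safe=":/?&=#%")
        # Then escape XML special characters such as '&' -> '&amp;'.
        return "<loc>%s</loc>" % escape(encoded)

    # Hypothetical gallery URL with a space and an ampersand in it:
    print(sitemap_loc("http://example.smugmug.com/Travel/Rock & Roll/photo 1.jpg"))
    # -> <loc>http://example.smugmug.com/Travel/Rock%20&amp;%20Roll/photo%201.jpg</loc>

A URL that skips either step can make Google reject that sitemap entry, which is consistent with the "!" warnings described later in this thread.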
Thank you to everyone for letting me know about this, and for your patience. A special thank you to the SmugMug user who graciously gave me access to their Google account - that kindness saved me a weekend's worth of work (on my birthday, even) tracking this down.
- Greg
Thanks for looking at it Greg,
- Christos
christosandronis.smugmug.com
facebook - my facebook page please LIKE me!
This is more in the domain of Google's algorithms. My experience is that it usually takes a few days for the sitemaps to get fetched and processed, and the bots start coming to crawl the new or updated URLs. There's a bit of "black box" magic that happens at Google behind the scenes on this and a lot of different factors go into how frequently they will crawl and/or index new content. I'm a little worried about saying "X is going to happen by Y date" because what Google does with the sitemaps is obviously out of our control.
Bottom line is: Pending status is good. That means Google was able to pull the sitemaps and is sorting out what to do next. When they come out of Pending status if you see some yellow triangles with "!" marks on them for some URLs, do not be too alarmed - that is the URL escaping issue I mentioned.
- Greg
They are in "!" status now. Thanks again for your help so far.
facebook - my facebook page please LIKE me!
Just FYI, the "!" status (if you click it) probably just means that there are some URLs in the sitemap that were not correctly formatted, so Google could not fetch them. If you can verify that is what you are seeing, I'd really appreciate it.
- Greg
The index is still pending, the base has a tick, and galleries/images has a "!". Going into galleries/images, the error message is:
"URLs unreachable
When we tested a sample of the URLs from your Sitemap, we found that some of the URLs were unreachable. Please check your webserver for possible misconfiguration, as these errors may be caused by a server error (such as a 5xx error) or a network error between Googlebot and your server. All reachable URLs will still be submitted."
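To see what that sampling looks like in practice, here is a small sketch (standard-library Python, not an official tool) that pulls a sitemap, takes the first few <loc> entries, and reports the HTTP status for each. The sitemap URL below is only a placeholder built from file names mentioned in this thread - the actual child sitemap name will vary per site - and the script assumes the standard sitemaps.org namespace.

    # Sketch: sample the first few URLs from a sitemap and report their status.
    # Python 3 standard library; the sitemap URL is a placeholder example.
    import urllib.request
    import urllib.error
    import xml.etree.ElementTree as ET

    SITEMAP = "http://www.ashbrook-photography.com/sitemap-base.xml"  # example path
    NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"  # standard sitemap namespace

    with urllib.request.urlopen(SITEMAP, timeout=10) as resp:
        tree = ET.parse(resp)

    locs = [el.text for el in tree.iter(NS + "loc")][:10]  # sample the first ten

    for loc in locs:
        try:
            with urllib.request.urlopen(loc, timeout=10) as page:
                print(page.status, loc)
        except urllib.error.HTTPError as err:
            print(err.code, loc)   # a 5xx here would match the warning quoted above
        except urllib.error.URLError as err:
            print("ERR", loc, err.reason)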
facebook - my facebook page please LIKE me!
Now - when will my sitemap be updated to reflect my site as it is today? It was last updated (with now-incorrect information) on 10/04/10, and it's December 13th.
Will my sitemap update at some point so the correct pages get indexed? If so, how long will it take?
Ashbrook Photography | Facebook | Twitter
Hi Robert,
Once Google pulls your sitemap file, it goes into kind of a "black box" from our perspective. And for some reason there seems to be a habitual problem with the data in Webmaster Tools not reflecting reality. For example, one of my sites has roughly 8,000 pages indexed in Google, but Webmaster Tools says there are only 136 and that it last fetched the sitemap files months ago (even though I can clearly see in my logs that it fetches them several times a day).
That being said, I'd give it a few days - if not a week. What you want to be seeing here is that everything is working (no red X's, no "!" warnings); that will tell you that Google was able to pull the data into their systems, where it is being processed.
I'm not sure how many pages webmaster tools is saying have been indexed - but according to Google's web search interface you have about 1,200: http://www.google.com/search?client=...UTF-8&oe=UTF-8
Hope this helps a little - at least in helping to explain where the boundary line is between Smugmug's systems and Google's.
- Greg
Thanks for your message and all of the info. Maybe the forum is best - since it helps others.
What I meant to say is that when you go to http://www.ashbrook-photography.com/sitemap-index.xml and look at the file itself, you can see that the last update to that file (by SmugMug) was on 10/04, and that none of my current pages are included in my sitemap itself.
So I guess that's more the question: is it reasonable that, on the SmugMug side, my sitemap.xml file hasn't been updated in more than two months? If so, how long does it take to update that file with current info?
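For anyone who wants to check the same thing on their own site, here is a small sketch (again standard-library Python, nothing official) that downloads sitemap-index.xml and prints each child sitemap with its <lastmod> date - which is where a stale 10/04 timestamp like the one described above would show up. The index URL is the one quoted in this thread, and the standard sitemaps.org namespace is assumed.

    # Sketch: print the <lastmod> date of each sitemap listed in the index.
    # Python 3 standard library; the index URL is the one quoted in this thread.
    import urllib.request
    import xml.etree.ElementTree as ET

    INDEX = "http://www.ashbrook-photography.com/sitemap-index.xml"
    NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"  # standard sitemap namespace

    with urllib.request.urlopen(INDEX, timeout=10) as resp:
        tree = ET.parse(resp)

    for sm in tree.iter(NS + "sitemap"):
        loc = sm.findtext(NS + "loc")
        lastmod = sm.findtext(NS + "lastmod", default="(no lastmod)")
        print(lastmod, loc)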
Thanks again - Robert
Ashbrook Photography | Facebook | Twitter
You are not wrong in your assumptions, but Google does take some time to crawl and process a site. I do see there are 1,870 pages in Google's main index (http://www.google.com/search?client=safari&rls=en&q=site:ShawnKrausPhoto.com&ie=UTF-8&oe=UTF-8) - how many images do you have, or how many would you expect to show up in that query? This exemplifies the issue I was alluding to earlier, where the Webmaster Tools statistics are chronically out of date.
Once the "!" error goes away, we should give Google some time to sort itself out and see how the issue evolves. This type of thing is less like fixing a car and more like baking a cake.
- Greg