Robots.txt Unreachable - Sitemaps

Comments

  • Options
    psenior1psenior1 Registered Users Posts: 125 Major grins
    edited February 1, 2011
    Sitemaps Base Warning
    After the recent issues with sitemaps, just spotted that the BASE sitemaps file is showing the following warning ('!' in a triangle) -

    "URLs roboted out"

    "When we tested a sample of the URLs from your Sitemap, we found that the site's robots.txt file was blocking access to some of the URLs. If you don't intend to block some of the URLs contained in the Sitemap, please use our robots.txt analysis tool to verify that the URLs you submitted in your Sitemap are accessible by Googlebot. All accessible URLs will still be submitted."

    Also crawl errors are showing -

    In Sitemaps 1
    Not followed 5
    Restricted by robots.txt 570
    Soft 404s 1
    Timed out 2
    Unreachable 77


    Is anyone else seeing this? I've only just climbed back onto page 1 of the Google search for my local area, and I'm hoping this isn't the start of more issues.
    website - http://www.snrmac.com
    facebook - my facebook page please LIKE me!
  • Options
    TwoofyTwoofy Registered Users Posts: 171 Major grins
    edited February 1, 2011
    psenior1 wrote: »
    After the recent issues with sitemaps, just spotted that the BASE sitemaps file is showing the following warning ('!' in a triangle) -

    "URLs roboted out"

    "When we tested a sample of the URLs from your Sitemap, we found that the site's robots.txt file was blocking access to some of the URLs. If you don't intend to block some of the URLs contained in the Sitemap, please use our robots.txt analysis tool to verify that the URLs you submitted in your Sitemap are accessible by Googlebot. All accessible URLs will still be submitted."

    Also crawl errors are showing -

    In Sitemaps 1
    Not followed 5
    Restricted by robots.txt 570
    Soft 404s 1
    Timed out 2
    Unreachable 77


    Is anyone else seeing this, I've only just climbed back onto page 1 of the google search for my local area, I'm hoping this isn't the start of more issues.


    Hello Psenior1,

    Bear with us for a bit here, we're making some adjustments to the sitemaps to deliver more relevant traffic. We are constantly monitoring traffic coming to SmugMug and improving our search engine optimizations, and this is expected.

    - Greg
  • Options
    psenior1psenior1 Registered Users Posts: 125 Major grins
    edited February 1, 2011
    Twoofy wrote: »
    Hello Psenior1,

    Bear with us for a bit here, we're making some adjustments to the sitemaps to deliver more relevant traffic. We are constantly monitoring traffic coming to SmugMug and improving our search engine optimizations, and this is expected.

    - Greg


    OK thanks Greg.
  • Options
    CFPhotographyCFPhotography Registered Users Posts: 83 Big grins
    edited February 1, 2011
    Greg,

    Any word on how long it will take to get the sitemaps updated? I am still having problems with mine; Google has yet to update any of my information that is in the sitemap.

    Also, if I go to Google, type site:chrisfowlerphotography.net and switch to Images, the only images that show up are the "popular photos". Why is Google not seeing the rest of the images on my SmugMug site? It sees the images on my blog, but not the ones on my site unless they are in the popular section.

    Sorry Psenior, didn't mean to hijack your thread, just having some of the same issues as you and more.. lol...
  • Options
    AndyAndy Registered Users Posts: 50,016 Major grins
    edited February 1, 2011
    Almost any image I post to the POTN forum or my blog shows up in a matter of days.
    COME ON SMUGMUG !!!
    My images on Pbase use to show up like crazy in google searches and I didn't even have keywords, so to say image results are based on that and what they suggest in the SEO thread here on dgrin is total BULL FECAL MATTER !!!!
    Please, there is no need for such talk. We talk to Google nearly weekly. GIS pulls images that are LINKED first and foremost - that's why you find photos from POTN, Dgrin, and other places in GIS. We continue to work with them on getting them to index our images in GIS better and more prominently.
  • Options
    SilvexSilvex Registered Users Posts: 9 Beginner grinner
    edited February 2, 2011
    Sitemaps
    The sitemaps are a nice feature, but they don't work. MY sitemaps have been broken for two months, and since Aug they were updated only ONCE! Why offer a feature that is broken? I have emailed several times over the past months and nothing. They do reply, create a ticket, and just tell me that it takes time! Please fix them!
  • Options
    AndyAndy Registered Users Posts: 50,016 Major grins
    edited February 2, 2011
    Silvex wrote: »
    The sitemaps are a nice feature, but they don't work. MY sitemaps have been broken for two months, and since Aug they were updated only ONCE! Why offer a feature that is broken? I have emailed several times over the past months and nothing. They do reply, create a ticket, and just tell me that it takes time! Please fix them!

    Hi, can we have a link to your site?

    Sitemaps are being updated (no, I don't get any special treatment by Google)

    [image: 20110202-cerc6p3q5s3ym4xehd8epriu82.jpg]
  • Options
    SilvexSilvex Registered Users Posts: 9 Beginner grinner
    edited February 2, 2011
    Andy wrote: »
    Hi, can we have a link to your site?

    Sitemaps are being updated (no, I don't get any special treatment by Google)

    [image: 20110202-cerc6p3q5s3ym4xehd8epriu82.jpg]

    http://silvex.smugmug.com
  • Options
    CFPhotographyCFPhotography Registered Users Posts: 83 Big grins
    edited February 2, 2011
    Andy,

    I don't think that is the issue; my sitemap says it's current and up to date as well. The problem is, it's like Google is not using the most recent sitemap. If you click on "My site on the web" in Webmaster Tools, the "data" for my keywords and links to my site is old. I am talking 3 - 5 months old.

    I updated my blog last night, and today one of the images from last night's blog post appeared in Google Images. But I can't get any images from my SmugMug site into Google, even after months of having them on my site, keyworded and captioned. Keep in mind the image I link on my blog is an image from my SmugMug site! So Google sees the keywords, alt tags and title on my blog for my image on SmugMug, yet Google doesn't see that same image on my actual SmugMug site?

    Also, my galleries and images sitemaps have never worked, they have a red "X". Is there something I need to do on my end to make this work?

    Thanks!
  • Options
    Luc De JaegerLuc De Jaeger Registered Users Posts: 139 Major grins
    edited February 2, 2011
    That does not explain why pbase images are all over in google searches even without keywords!

    What do we do in the meantime???

    I see images from flickr, zen, and many other sites!

    I AM TOTALLY FRUSTRATED!!!

    AND I AM NOT KIDDING ABOUT THIS !!!

    THIS SHOULD BE A PRIORITY

    look below, I even made my signature smaller to appease the DGRIN gods

    I don't really care about sitemaps. As such, sitemaps have nothing to do with SEO. Googlebot spiders websites continuously. I don't need sitemaps to be found by Google, as Google finds me almost instantly when I add new photos or galleries. I'm talking about Google Search here, NOT Google Image Search. Many photo websites like SmugMug, Zenfolio etc. do not do well in Google Image Search. But Flickr does, and so do Opera blogs (from the very speedy Opera browser, which is not supported by SmugMug though).

    When I recently uploaded my latest gallery, it did not take an hour before I was found in Russia, Estonia and Moldavia through Google Search for the keywords I used in my gallery and photos. I'm used to it. It has always worked this way for me, but I'm aware that SEO changes, and I constantly monitor my StatCounter and Google Analytics graphs to know if something is changing too.

    Anyway, thank you SmugMug for your continuous efforts to make sure our photos get noticed. People who complain about SEO often do not know how difficult SEO is and what it is all about. Also, they should not blame you, because search engines will never reveal their algorithms (of course not!). SmugMug rocks!

    Luc
  • Options
    CFPhotographyCFPhotography Registered Users Posts: 83 Big grins
    edited February 2, 2011
    Also, in Webmaster Tools it says I have 6,500 links where the robots.txt was unreachable. Is this normal? Maybe this is why my stuff is not getting updated?


    URL restricted by robots.txt
  • Options
    SilvexSilvex Registered Users Posts: 9 Beginner grinner
    edited February 3, 2011
    Andy wrote: »
    Hi, can we have a link to your site?

    Sitemaps are being updated (no, I don't get any special treatment by Google)

    [image: moz-screenshot.png (local file, not displayed)]
  • Options
    SilvexSilvex Registered Users Posts: 9 Beginner grinner
    edited February 3, 2011
    So when are we going to have sitemaps that work?
  • Options
    CFPhotographyCFPhotography Registered Users Posts: 83 Big grins
    edited February 3, 2011
    silvex,

    How did you create your sitemap-galleries and sitemap-images? I can't get mine to work, and when I go directly to that link, it doesn't even exist, so I am assuming it has to be created? I thought SmugMug created these automatically, but I am not seeing those two sitemaps....
  • Options
    SilvexSilvex Registered Users Posts: 9 Beginner grinner
    edited February 3, 2011
    CFPhotography wrote: »
    silvex,

    How did you create your sitemap-galleries and sitemap-images? I can't get mine to work, and when I go directly to that link, it doesn't even exist, so I am assuming it has to be created? I thought SmugMug created these automatically, but I am not seeing those two sitemaps....

    I don't think anybody can create sitemaps for your SmugMug site. They are generated by some SmugMug tool (script). The main issue is that those two sitemaps (sitemap-galleries and sitemap-images) do not exist. My sitemaps were created only once, when they were launched back in Aug. Then, when SmugMug updated their sitemap scripts/tools, it updated only two of my sitemaps and deleted (or corrupted) the other two. I emailed (and opened a couple of tickets) and they said that whenever I update ANY data (remove/add photo(s), add/remove keyword(s), create/rename/delete galleries), the tool will update/create the sitemaps.

    So far it has been an absolute disaster for me. Sitemaps are not new, and SmugMug needs to hire staff who know the ins and outs of managing sitemaps.

    Sitemaps are critical for Google crawling, since they take the guesswork out of crawling your site. SmugMug galleries are not a "real" website, since most of the data is masked to prevent theft of your photos. We also do not have control over how the photos are named or how the ALT tag is managed. That is critical for Google to put your photos in the images database.

    I am thinking of creating another website as a front store and use smugmug as my backend for sales and printing. This way I can have complete control of the tagging, keywords and images name.

    Unless SmugMug finds a solution, which is not an easy task, they might need to redesign their site to accommodate all of this. I still do not understand why they don't allow real names in images; perhaps they did not foresee their growth.

    Then again, Facebook has a much more complex problem, but they have managed to redesign their website infrastructure many times. The database and storage technology is there. I have worked for 25 years building pretty complex and large computer systems for Fortune 500 companies, and it can be done. They could have done it this way:

    description-uid.jpg
    tiger-woods-at-us-open-2011-<your unique id>.jpg

    The description can be limited to say 64 characters and the uid to 32.
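
For anyone who wants to try that naming scheme on their own front-end site, here is a minimal Python sketch of it. This is not anything SmugMug provides: the function name `seo_filename`, the slug rules (lowercase, hyphens between words) and the sample id `a1B2c3` are assumptions made for illustration. Only the pattern and the 64/32 character limits come from the suggestion above.

```python
import re


def seo_filename(description, uid, ext="jpg", max_desc=64, max_uid=32):
    """Build '<description-slug>-<uid>.<ext>' with the suggested length caps."""
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    # Cap the description and drop any hyphen left dangling by the cut.
    slug = slug[:max_desc].rstrip("-")
    # Keep only letters and digits in the unique id, then cap it too.
    uid = re.sub(r"[^A-Za-z0-9]", "", uid)[:max_uid]
    return f"{slug}-{uid}.{ext}"


# Hypothetical description and id, purely for illustration.
print(seo_filename("Tiger Woods at US Open 2011", "a1B2c3"))
# prints: tiger-woods-at-us-open-2011-a1B2c3.jpg
```

Keeping the unique id at the end means two photos with the same description still get distinct file names.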

    -My two cents.
  • Options
    TwoofyTwoofy Registered Users Posts: 171 Major grins
    edited February 3, 2011
    CFPhotography wrote: »
    Greg,

    Any word on how long it will take to get the sitemaps updated? I am still having problems with mine; Google has yet to update any of my information that is in the sitemap.

    Also, if I go to Google, type site:chrisfowlerphotography.net and switch to Images, the only images that show up are the "popular photos". Why is Google not seeing the rest of the images on my SmugMug site? It sees the images on my blog, but not the ones on my site unless they are in the popular section.

    Sorry Psenior, didn't mean to hijack your thread, just having some of the same issues as you and more.. lol...

    The sitemaps are working - we are just making some changes to help drive traffic. Of course Webmaster Tools is going to be out of date for some time before it catches up with the new changes. That being said, I really think we need to separate issues with sitemaps from actual search engine results. All we can do with sitemaps is help crawlers crawl through your site more efficiently - but they are not a critical component. A crawler is perfectly capable of crawling through a SmugMug site without the use of any sitemaps.

    Obviously we want to make it as easy for bots to get around as possible, but sitemaps are not a cure-all for anything. You still need to pay careful attention to keywords and content, and promote your site the same as if sitemaps were not being generated.

    I have looked at the sitemaps for your site and they are being generated the way we expect them to be.

    - Greg
  • Options
    TwoofyTwoofy Registered Users Posts: 171 Major grins
    edited February 3, 2011
    Silvex wrote: »
    I don't think anybody can create sitemaps for your SmugMug site. They are generated by some SmugMug tool (script). The main issue is that those two sitemaps (sitemap-galleries and sitemap-images) do not exist. My sitemaps were created only once, when they were launched back in Aug. Then, when SmugMug updated their sitemap scripts/tools, it updated only two of my sitemaps and deleted (or corrupted) the other two. I emailed (and opened a couple of tickets) and they said that whenever I update ANY data (remove/add photo(s), add/remove keyword(s), create/rename/delete galleries), the tool will update/create the sitemaps.

    Silvex,

    I would not pay too much attention to the -images/-galleries sitemaps. Those are generated far less frequently (as you realize) but, if you are saying that when you change something on your site that the base sitemap (which is the really important one) is not getting re-generated immediately then we need to know that. You may have found a bug, but with all the testing and work I've done to try and improve this I have not seen it yet.

    Webmaster tools takes an unfortunately long time to update and we are making some changes, but any problems with the main (or "base") sitemap would be unexpected.

    - Greg
  • Options
    CFPhotographyCFPhotography Registered Users Posts: 83 Big grins
    edited February 3, 2011
    Greg,

    How does one go about getting the sitemaps for galleries and images created? If this is a script run automatically by SmugMug, then it is not working for me, as these sitemaps do not exist!
  • Options
    TwoofyTwoofy Registered Users Posts: 171 Major grins
    edited February 3, 2011
    CFPhotography wrote: »
    Also, in Webmaster Tools it says I have 6,500 links where the robots.txt was unreachable. Is this normal? Maybe this is why my stuff is not getting updated?


    URL restricted by robots.txt

    There were a lot of useless links in the sitemaps, causing traffic to go to pages (like feeds) that made no sense. Those have been removed and are probably what you are seeing.

    - Greg
  • Options
    TwoofyTwoofy Registered Users Posts: 171 Major grins
    edited February 3, 2011
    CFPhotography wrote: »
    Greg,

    How does one go about getting the sitemaps for galleries and images created? If this is a script run automatically by SmugMug, then it is not working for me, as these sitemaps do not exist!

    Strange, I am able to get them to work:

    http://www.chrisfowlerphotography.net/sitemap-index.xml.gz


    What you should do in Webmaster Tools (after confirming your site, I can't tell if you've done that or not yet) is add the sitemap-index.xml.gz URL. Webmaster Tools will then retrieve that file, which is an index of the sitemaps on your site (how many there are depends on how many URLs you have, etc).

    As an example, one of those is: http://www.chrisfowlerphotography.net/sitemap-base.xml.gz

    If you remove the ".gz" from those URLs, you can see them in an uncompressed form - but leave it on when adding them to Webmaster Tools.

    - Greg
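
To see which child sitemaps an index actually lists (for example, whether sitemap-galleries or sitemap-images appear in it at all), a short Python sketch along these lines may help. It assumes the index follows the standard sitemaps.org sitemap-index format. The helper name `child_sitemaps` and the `www.example.com` sample are made up for illustration and are not taken from any real SmugMug index.

```python
import gzip
import xml.etree.ElementTree as ET
from typing import List

# XML namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(index_gz: bytes) -> List[str]:
    """Return the <loc> URL of every sitemap listed in a gzipped sitemap index."""
    root = ET.fromstring(gzip.decompress(index_gz))
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)]


# Hypothetical index contents, compressed here so the example runs offline.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>http://www.example.com/sitemap-base.xml.gz</loc></sitemap>
  <sitemap><loc>http://www.example.com/sitemap-galleries.xml.gz</loc></sitemap>
</sitemapindex>"""

for url in child_sitemaps(gzip.compress(sample)):
    print(url)
```

Against a live site, the bytes could come from `urllib.request.urlopen(index_url).read()` in place of the sample, provided the server returns the `.gz` file as-is.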
  • Options
    CFPhotographyCFPhotography Registered Users Posts: 83 Big grins
    edited February 3, 2011
    Greg,

    Sorry about the confusion; the sitemap-base and index work. I am talking about the sitemap-galleries.xml and sitemap-images.xml.

    I have the other two in there and resubmit after I add new galleries...

    Thanks!
  • Options
    TwoofyTwoofy Registered Users Posts: 171 Major grins
    edited February 3, 2011
    CFPhotography wrote: »
    Greg,

    Sorry about the confusion; the sitemap-base and index work. I am talking about the sitemap-galleries.xml and sitemap-images.xml.

    I have the other two in there and resubmit after I add new galleries...

    Thanks!

    Ahh, I see, sorry I misunderstood.

    Those are not nearly as important as the base sitemap and MUCH harder to generate from a systems perspective, so they go into a queue (which is quite large right now). It's not ideal and that's why we are in the process of changing things - but as we go through this I would expect that the sitemap-base file would continue to be re-generated immediately upon making any changes. This is the one that represents almost 100% of the pages that search engines will actually index and (in my opinion) that you'd want to send traffic to.

    Also, the only sitemap that should be manually added to webmaster tools is the sitemap-index. Because, as you see, we may add, change, or even remove sitemaps that we generate.

    - Greg
  • Options
    OffTopicOffTopic Registered Users Posts: 521 Major grins
    edited February 3, 2011
    Today I actually received a warning e-mail from Google WebMaster Tools about the problem:

    Dear owner or webmaster of http://www.loricareyphoto.com/

    While attempting to crawl your site, we noticed an increase in the number of URLs that we are unable to crawl due to a robots.txt restriction. Here are some sample URLs that are blocked by robots.txt: etc, etc,

    You can see more details about these errors in Webmaster Tools.

    If you've specifically blocked Google from crawling these URLs, there's no need to fix anything. Google will continue to respect robots.txt, and will not crawl these pages. If, however, these errors are unexpected, you should review your robots.txt file to make sure that no files are being inappropriately blocked from search engines.

    For more information, see our Help Center:
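
For anyone who wants to review which of their URLs a robots.txt blocks, as the e-mail suggests, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The rules and URLs below are hypothetical stand-ins (a blocked `/keyword/` path on `www.example.com`), not SmugMug's actual robots.txt, and `blocked_urls` is just an illustrative helper name.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_lines, urls, agent="Googlebot"):
    """Return the URLs that the given robots.txt lines disallow for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory, no network access needed
    return [u for u in urls if not parser.can_fetch(agent, u)]


# Hypothetical rules and URLs, for illustration only.
rules = [
    "User-agent: *",
    "Disallow: /keyword/",
]
urls = [
    "http://www.example.com/gallery/123",
    "http://www.example.com/keyword/sunset",
]

for url in blocked_urls(rules, urls):
    print("blocked:", url)
```

To test a live site instead, `RobotFileParser` also has `set_url()` and `read()` methods that fetch the real robots.txt before calling `can_fetch()`.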
  • Options
    TwoofyTwoofy Registered Users Posts: 171 Major grins
    edited February 3, 2011
    OffTopic wrote: »
    Today I actually received a warning e-mail from Google WebMaster Tools about the problem:

    Dear owner or webmaster of http://www.loricareyphoto.com/

    While attempting to crawl your site, we noticed an increase in the number of URLs that we are unable to crawl due to a robots.txt restriction. Here are some sample URLs that are blocked by robots.txt: etc, etc,

    You can see more details about these errors in Webmaster Tools.

    If you've specifically blocked Google from crawling these URLs, there's no need to fix anything. Google will continue to respect robots.txt, and will not crawl these pages. If, however, these errors are unexpected, you should review your robots.txt file to make sure that no files are being inappropriately blocked from search engines.

    For more information, see our Help Center:

    This is expected, we are removing some URLs from the sitemaps that were not getting indexed or really should never have been in them in the first place. The last thing we want to do is drive traffic to irrelevant pages that do not showcase everyone's beautiful photography. These errors should go away in the coming weeks.

    - Greg
  • Options
    CFPhotographyCFPhotography Registered Users Posts: 83 Big grins
    edited February 5, 2011
    Twoofy wrote: »
    There were a lot of useless links in the sitemaps, causing traffic to go to pages (like feeds) that made no sense. Those have been removed and are probably what you are seeing.

    - Greg


    Greg,

    This number is now growing: it was 6,500 last week, and today it is 11,758 links restricted by robots.txt.

    What exactly is being restricted and why?

    Thanks!
  • Options
    mrcoonsmrcoons Registered Users Posts: 653 Major grins
    edited February 6, 2011
    OffTopic wrote: »
    Today I actually received a warning e-mail from Google WebMaster Tools about the problem:

    Dear owner or webmaster of http://www.loricareyphoto.com/

    While attempting to crawl your site, we noticed an increase in the number of URLs that we are unable to crawl due to a robots.txt restriction. Here are some sample URLs that are blocked by robots.txt: etc, etc,

    You can see more details about these errors in Webmaster Tools.

    If you've specifically blocked Google from crawling these URLs, there's no need to fix anything. Google will continue to respect robots.txt, and will not crawl these pages. If, however, these errors are unexpected, you should review your robots.txt file to make sure that no files are being inappropriately blocked from search engines.

    For more information, see our Help Center:

    I received the same message. I've spot-checked a number of these links and I do not see anything that I would not want found by Google. I have found a number of these links pointing to images that had a number in the keywords (ones I apparently forgot to remove). Is this one of the problems you are attempting to resolve?
  • Options
    OffTopicOffTopic Registered Users Posts: 521 Major grins
    edited February 7, 2011
    Mine appear to be ALL of my keyword URLs from the letters A-S...I think that's where they gave up.
  • Options
    psenior1psenior1 Registered Users Posts: 125 Major grins
    edited February 8, 2011
    Just for info Greg, my 'restricted by robots' count has increased to 20k+ and I have a warning for both the galleries and images sitemaps, both with -

    'When we tested a sample of URLs from your Sitemap, we found that some URLs redirect to other locations. We recommend that your Sitemap contain URLs that point to the final destination (the redirect target) instead of redirecting to another URL.'
  • Options
    TwoofyTwoofy Registered Users Posts: 171 Major grins
    edited February 8, 2011
    Hello,

    I am pretty sure that the redirects are actually related to old keywords that were at some point deleted or changed, but not removed from the sitemaps. But, just to be sure, if Google is showing you one or two specific ones, can you post them here so I can check them? If not, it is okay; I am going to take a very close look at yours and OffTopic's sitemaps tomorrow and make absolutely sure.

    As for the large number of URLs that are getting blocked by robots.txt - I have checked into this and as scary as it may look, it is intentional. I do not know exactly where this is going to end up, but it is one of those things that in order to fix some of the bigger SEO issues that are going on we have to clean those up first.

    I know this ride may feel a little bumpy right now - please hang in there though.

    - Greg

    P.S. Sorry for my delay in getting back to you - it's been a very busy day.
  • Options
    psenior1psenior1 Registered Users Posts: 125 Major grins
    edited February 8, 2011
    Twoofy wrote: »
    Hello,

    I am pretty sure that the redirects are actually related to old keywords that were at some point deleted or changed, but not removed from the sitemaps. But, just to be sure, if Google is showing you one or two specific ones, can you post them here so I can check them? If not, it is okay; I am going to take a very close look at yours and OffTopic's sitemaps tomorrow and make absolutely sure.

    As for the large number of URLs that are getting blocked by robots.txt - I have checked into this and as scary as it may look, it is intentional. I do not know exactly where this is going to end up, but it is one of those things that in order to fix some of the bigger SEO issues that are going on we have to clean those up first.

    I know this ride may feel a little bumpy right now - please hang in there though.

    - Greg

    P.S. Sorry for my delay in getting back to you - it's been a very busy day.

    thanks, I've just PM'd a couple of links.
This discussion has been closed.