New SmugMug - SEO Concern
chipj
Registered Users Posts: 149 Major grins
I have to admit that I love the ease of use of "New SmugMug", but after a few months of analysis it appears that the new version is very much lacking in terms of SEO. In essence, the older version of SmugMug was better when it came to SEO.
Case in point: if you use the default drop-down menu widget, slideshow, or tag cloud provided through "New SmugMug" on the home page, the site is basically invisible to the bots, because all of these elements are constructed with JavaScript. JavaScript can be really iffy from an SEO perspective, and I'm seeing the negative side of this in my rankings and index reporting in Webmaster Tools. Google can sometimes crawl JavaScript, but apparently not in this case.
I'm hoping that the SmugMug IT team will look into this a bit further, because although Google (in my case) is seeing a total of 844 URLs (likely due to the XML sitemap), it has only indexed a total of 73 URLs, and only one image out of several hundred. This is a far cry from legacy SmugMug.
Comments
SmugMug SAYS that they serve a Google-specific response, without JavaScript, when Google is polling. Did you use the Google tools to see whether that's true?
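You can also test that claim yourself outside of Google's tools. Here's a rough sketch in Python (standard library only; the URL is a placeholder, and counting "<a href=" is only a crude signal) that fetches the same page with a Googlebot User-Agent and a browser User-Agent and compares the plain HTML links in each:

import urllib.request

URL = "http://www.example.com/"  # placeholder: substitute your own homepage

GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36"

def fetch(url, user_agent):
    """Fetch a page with a given User-Agent and return the HTML."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")

as_google = fetch(URL, GOOGLEBOT_UA)
as_browser = fetch(URL, BROWSER_UA)

# Crude signal: count plain <a href= links in each response. If the
# Googlebot version has many more, the site really is serving a
# crawler-friendly variant; if the counts match, it probably isn't.
print("links served to Googlebot:", as_google.count("<a href="))
print("links served to browser:  ", as_browser.count("<a href="))

(A site sniffing for Googlebot may also check the requesting IP, so this isn't conclusive, just a first look.)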
Personally, I am sure something is strange, but I'm not sure what. My indexed count has been going down since "new"; it is about half what it was (yet the page, photo, and gallery counts are going up).
But the strangest part is that IMAGES are never findable. I've been getting more and more gallery comments and even photo captions to come up in searches, but almost nothing for the actual images. Some time ago I posted proof that the special output to Google was truncated, but got no response.
But as a simple example,
site:www.captivephotons.com olivia
as a web search correctly finds the caption on the one photo that has that name in it. But do that as an image search and you get nothing. Very few of my images appear to be indexed by Google (though the number is growing).
It is strange.
Ferguson (and chipj too), I only left in a bit of your post for reference, but people can read it just before this reply. I'm too tired to look around right now, but I'm wondering whether either of you uses Right Click Protection (RCP). If not, this reply probably isn't going to do you much good. If so, there's a lot to learn about RCP and its present effects on Smug SEO, and SmugMug officials have maddeningly refused to answer questions about current policy. I've asked in several threads and am met with total silence from them.
Here's one of my posts that gives you the pertinent parts of the last 2 years of SmugMug history on RCP and Google Image Search (GIS): http://dgrin.com/showpost.php?p=1911986&postcount=300
Very unfortunately (imho), SmugMug has evidently changed its tune recently (no one says when, but we assume with the New Smug rollout) about what to present to search engines when we have RCP turned on.
They had made what I would call an excellent decision concerning RCP and GIS in 2011 (after many pros were wondering why their images weren't showing up), which made our RCPed images show up much better in GIS. I've waited patiently for weeks now for official responses about whether they've actually reversed that decision, and if so, why; or whether there's a new default that ignores the 2011 decision, but not purposefully. But again, they simply refuse to answer.
If they've reversed it, I want to know why, when virtually all the advice they'd gotten from their pros was that the decision to begin including at least small versions of our images in GIS was wanted and needed by us. If they keep things the way they appear to be now (where RCPed images aren't appearing in GIS), I may decide to leave... and those are words I never thought I'd say. The fact that they refuse answers makes me feel even more led to move on.
Jörg's important SEO question about the /keyword page in that same thread never got an official response either (Twoofy doesn't seem to be around anymore): http://dgrin.com/showpost.php?p=1901473&postcount=291
The rest of that thread, which may have other history that interests you (p. 24 is pithy), is here: http://dgrin.com/showthread.php?t=190216&page=24
And right now, if anyone feels the need to give a lecture about how RCP isn't really "protection", please just drop it. I've heard it all; yes, I know all the reasons, and I have my own reasons (which many others share) for having it. We all have different types of clients and visitors, and if I hadn't already thought through the whole RCP thing, I wouldn't have it turned on.
DayBreak, my Folk Music Group (some free mp3s!) http://daybreakfolk.com
As Ferguson says, the Google web and image search bots get presented with a slightly different version of the page, and they can also pick up your sitemaps (which are automatically generated). I know that Google image search does work: I created a new site a few months ago, and after about 4 weeks the images started to show up in image search; now, about 12 weeks later, most of them are showing up (although there aren't a lot there). As far as I can see, they are also showing up for general searches such as 'homage to velasquez' and 'cyclists lovers', although it can be difficult to verify because Google is quite good at guessing what sort of things you like. I commented on this in an earlier thread here: http://www.dgrin.com/showthread.php?t=240372
But it's complicated, as Anna Lisa is saying. RCP is an issue, and we don't really understand how it affects search now. There are also the types of galleries you have your images in and how they expose keywords. Finally, there is the rate at which Google finds images: I find that the report on images indexed in Google Webmaster Tools doesn't match what Google itself shows (Webmaster Tools says that no images are indexed, which is wrong).
For the record, what I did when I set up the new site (petermclarenfineart.com) was:
1. Deliberately used the 'journal' style, as that exposes the captions and text nicely on the page. Obviously this isn't ideal for every gallery, but it might be good for some galleries of the very best images.
2. Turned RCP off (but yes, I understand that not everyone can do that).
3. Worked hard on inbound links with good anchor text and text surrounding the links.
4. Keyworded all photos and captioned them as well where possible.
I'm sure that other combinations would or should work as well, but it isn't easy. For reference, my main site, which has been around a long time, seemed to show a big drop in images initially when I did the switch, but it's catching up again, albeit slowly.
Edit: I forgot to add that the photos that appeared first in Google Image Search were the ones from the galleries that I managed to get external links to and that got a lot of visits. The other images took longer to appear. So, for example, we (it's not my site) put up a link to a gallery of cyclist images first, and those were the first to appear in GIS. The others took a lot longer.
Ferguson, yes, I do use Webmaster Tools; that's basically where I monitor this. But I'm seeing the actual issue when I do a view-source. Yes, SmugMug provides an SEO solution, but from what I'm seeing it's only through the sitemap they automatically create.
I'll check out RCP, but the biggest issue I see is that the widgets (menus, cloud object, slideshow) are created with JavaScript. I updated my initial post (above) to highlight this better. But again, I'll play around a bit with RCP later on today to see if the code being generated becomes more search-bot friendly...
Edit: I'm not sure that there is any difference in what Googlebot gets shown. If I change the UA string, I see 'class="sm-ua-unknown sm-browser-googlebot' at the top, which is the same as if you use 'fetch as googlebot'. However, the webpage looks identical as far as I can see, both visually and in the code. In both cases I can see real links for images along with alt text, which is what we want.
BTW, I disabled the right click protection on one of my galleries and it made no difference to any of the JavaScript-embedded links. About the only solution that seems viable right now is to use the HTML widget and create true HTML links. I'm not counting on SmugMug to rectify this, because it would be a complete code rewrite.
SmugMug developers should have used more CSS and HTML, and less JavaScript, in their hyperlink coding. This would have created a much more SEO-friendly environment.
I spent a bit of time tonight and decided I am in over my head, but there's some strange stuff in here.
I looked at the fetch-as-google results more carefully. Previously I was interested that the page (a collage landscape gallery) did not show all the images, but this time I noticed that my main navigation (the standard nav bar) is represented as JavaScript. I THINK that means that Google cannot follow it.
OK, so I started checking sitemaps, as I think that's where Google looks next. Google says there are none (but I'm not sure if that means there are none, or just none submitted manually).
I looked in Robots.txt and found this:
Sitemap: http://www.captivephotons.com/sitemap-index.xml
So I downloaded and looked at that. I found entries like:
And I downloaded those and found something like the attachment, which seems to list almost every file size for an image. Mine went from tiny, tiny thumbnail up through X3 (but no original). These are wrapped inside an image tag, labeled with an image collection tag, and each was an instance.
Now that may be exactly and completely correct.
But what's strange about this is that the whole structure of the web SITE seems missing. At first glance, other than each image having a folder link in its URL, there's nothing here related to the site structure: nothing for the nav bar or the page-to-page links (and if Google can't read the JavaScript, where else can it get them?).
These files are LARGE. I downloaded them all, unpacked, and did some searches and was amazed to find every photo represented in them.
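For anyone who wants to reproduce that check, here's roughly the pipeline as a Python sketch (standard library only; the domain is a placeholder, and the XML namespaces assumed are the standard sitemap and Google image-sitemap ones, which may not match SmugMug's exactly):

import gzip
import urllib.request
import xml.etree.ElementTree as ET

SITE = "http://www.example.com"  # placeholder: substitute your own domain

SM_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
IMG_NS = "{http://www.google.com/schemas/sitemap-image/1.1}"

def fetch(url):
    """Download a URL; unpack gzipped child sitemaps (.xml.gz)."""
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    if url.endswith(".gz"):
        data = gzip.decompress(data)
    return data

# 1. Find the sitemap index via the Sitemap: line in robots.txt
robots = fetch(SITE + "/robots.txt").decode("utf-8", errors="replace")
index_url = next(line.split(":", 1)[1].strip()
                 for line in robots.splitlines()
                 if line.lower().startswith("sitemap:"))

# 2. List the child sitemaps in the index
index = ET.fromstring(fetch(index_url))
children = [loc.text for loc in index.iter(SM_NS + "loc")]

# 3. For each child, report how many image variants each page lists
for child_url in children[:3]:  # just the first few; they can be large
    root = ET.fromstring(fetch(child_url))
    for url_el in root.iter(SM_NS + "url"):
        page = url_el.find(SM_NS + "loc").text
        variants = len(url_el.findall(IMG_NS + "image"))
        if variants:
            print(f"{variants:3d} image entries for {page}")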
But it's just a list of photos. It doesn't deal with content, with structure (again, except as implied by the URLs), or with links.
Is this what's supposed to be present?
And it's just a bit of a surprise that Google is being fed nine copies of every photo, in different sizes. Is that normal?
Seriously, these are not rhetorical questions; I have no clue what should be here, so consider this data if it's helpful.
PS. I don't use right click protection, so to all that discussion I have no insight.
It's possible you missed a bit of your original sitemap-index when you looked at it. When I look at mine, I see several sitemap-galleryimages entries, but the very first entry is slightly different, as is sitemap-base.xml.gz. When I look at that one, what I see is a list of the different pages of my site, which is the structure thing you are looking for, I think?
The sitemap-galleryimages file contains a lot of great stuff for Google. The various copies of each photo that you mention are all related to each other in the file, so I am guessing that Google will pick the one it finds most relevant. Each photo also has its keywords and captions with it, so Google will know they are related. Finally, each group of images is also linked to the webpage it appears on, so Google can pick up any other info directly from the page if need be.
I get the impression that sitemap-index will make sure all individual pages can be found, and sitemap-galleryimages will make sure all images can be found (although Webmaster Tools reports the number of images, it never seems to actually mark them as indexed, even when I can find them in Google). For navigation links between pages, Google will have to read the actual pages, but I know that works because if I go to Webmaster Tools and look under Internal Links I see loads.
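To make that concrete, here's an illustrative entry in the standard Google image-sitemap format (the URLs and caption below are made up, not taken from SmugMug's actual output), parsed with Python to show how the page URL, the image variants, and the caption hang together:

import xml.etree.ElementTree as ET

# A made-up <url> entry in the Google image-sitemap style: one page,
# two size variants of the same photo, each carrying the caption.
ENTRY = """\
<url xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
     xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <loc>http://www.example.com/Galleries/Demo</loc>
  <image:image>
    <image:loc>http://photos.example.com/demo-Th.jpg</image:loc>
    <image:caption>Olivia at the beach</image:caption>
  </image:image>
  <image:image>
    <image:loc>http://photos.example.com/demo-X3.jpg</image:loc>
    <image:caption>Olivia at the beach</image:caption>
  </image:image>
</url>
"""

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
IMG = "{http://www.google.com/schemas/sitemap-image/1.1}"

url_el = ET.fromstring(ENTRY)
print("page:", url_el.find(SM + "loc").text)
for img in url_el.findall(IMG + "image"):
    print("  variant:", img.find(IMG + "loc").text,
          "| caption:", img.find(IMG + "caption").text)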
Hope that all makes sense
Rich
The issue is that even if Googlebot hits one of the URLs listed in the base URL sitemap, it can crawl the content on that page, but it has a tough time crawling through to related pages, because the hyperlinks are embedded inside JavaScript code that it can't crawl. The bots read straight text, and they don't natively have the ability to crawl non-HTML content such as JavaScript or a Flash object.
Google has made much improvement in crawling non-HTML code over the past decade, but there are unlimited variations of JavaScript coding, and sometimes it doesn't understand them all. Googlebot apparently is choking on the SmugMug variation... at least on my site, which is pretty much the SmugMug default template with only some CSS customization.
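A quick way to see exactly which links a text-only crawler could follow is to parse the raw HTML and list the anchors that are actually there; links injected by JavaScript at render time simply won't appear in this list. A minimal sketch in Python (standard library; the URL is a placeholder):

from html.parser import HTMLParser
import urllib.request

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags present in the raw HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

url = "http://www.example.com/"  # placeholder: replace with your homepage
with urllib.request.urlopen(url) as resp:
    html = resp.read().decode("utf-8", errors="replace")

collector = LinkCollector()
collector.feed(html)
print(f"{len(collector.links)} crawlable links found:")
for href in collector.links:
    print(" ", href)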
Absolutely did miss it, thank you very much.
I'm curious what percentage of the URLs on your site are listed as "indexed" in Webmaster Tools; that is, the ratio of "Indexed" to "Crawled" URLs. Right now I'm at about 10%.
I'm not sure what 'crawled' means here, as that isn't something I can see in Google Webmaster Tools, sorry. However, I will tell you what I can see, if that helps. I'll look at the petermclarenfineart.com site, as that is the new one and it has the simplest structure, which makes it easier to get a grip on the numbers:
1. The number of pages crawled per day is 15 on average
2. On the 'crawl/sitemaps' page the number of web pages submitted is 159
3. On the same page the number of web pages indexed is 14 - so about the same 10% you see (if we are looking at the same thing)
4. The number of URLs in sitemap-base.xml is 37
Those are the reports. Now, reality seems to be that there are only about 14 'real' pages on the site: the home page, all of the pages listed on the menu on the left, and then 3 sub-pages linked from those pages. That's all.
If I do a web search using site:petermclarenfineart.com I get 17 results. I think that Google has everything there is to see, which raises a couple of questions.
Firstly, what are the 37 pages in sitemap-base.xml? Well, they seem to be the 17 main pages, but also some URLs for galleries that are public but not exposed through the normal structure. As there aren't any external links to them, Google probably hasn't considered them important enough to index.
Secondly, what are the 159 web pages submitted? I have no idea at the moment, as the images are listed separately in Webmaster Tools. I'm going to have to track that down.
So, as far as I can see, Google has everything there is to be had, even though it looks like only 10% of the pages are indexed. It's been a couple of months since the site went up, so I can't tell you exactly how long it took. I have no idea why your link-checking tool doesn't pick up the links, but I confess I gave up on those things and in the end just check what Google reports instead.
Hope that helps. I'm off to see if I can find out what the 159 web pages are.
Rich
Edit: and the answer is that the other URLs are actually links to the lightbox view of each image. So, there are 122 images on the site (which is also exactly the number of images reported in Webmaster Tools). Add to that the 37 URLs from sitemap-base and you get to the 159 (it's exact; a bit scary, really). So, Google web search doesn't appear to be in a great hurry to index pages without text on them. To be expected, really.
Google image search, on the other hand, already appears to have 71 of them, and the number goes up each day. I'm even seeing some images that only went up in the last two weeks, but... those are also from galleries that we externally linked to.
Among the URLs I'm seeing fed to Google are:
/404
/password
/date
In fact, the /date directory is set to "disallow" in the robots.txt file (for my site), so I'm sure that Googlebot is confused. In addition, in my sitemap.xml I'm seeing several pages that I thought I had deleted a while ago. I'll have to review those later today to see if they still exist on SmugMug.
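For anyone who wants to check the same thing on their own site, the disallow rules are easy to test with Python's standard-library robots.txt parser (the domain below is a placeholder):

from urllib.robotparser import RobotFileParser

# Load and parse the live robots.txt for the site.
rp = RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")  # placeholder domain
rp.read()

# If /date is disallowed, Googlebot is being told not to crawl it even
# though the sitemap may still list it: mixed signals for the bot.
for path in ["/", "/date", "/password"]:
    verdict = "allowed" if rp.can_fetch("Googlebot", path) else "disallowed"
    print(path, "->", verdict)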
Overall, I would just like to see SmugMug do a better job with their SEO implementation. A few simple changes could make a big difference, but I'm sure they feel that it's okay the way it is (which it isn't).
Apart from stopping the inclusion of the /date directory and then blocking it, I'm not sure what else they can do. The sitemaps now look good (they weren't initially), and although your SEO tool has difficulty crawling the site, Google seems to have no problem, at least as far as my site is concerned; but I may be lucky with the layout that I'm using.
On SEO: is there a reason that the photos on your blog are hosted on the blog itself rather than on your SmugMug site? I'm guessing you likely have a reason for doing that (as you appear to be very SEO-aware), but I always thought that from an SEO standpoint it was better to link to the source image rather than making a second copy on your blog, e.g. this photo appears on your blog as well as on your SmugMug site here? This isn't a criticism; it's just a question, as it may have been deliberate, and I want to get things better on the sites I manage as well.
That said, most of my SEO traffic is generated from my Blog and most of the traffic I receive on my SmugMug site is referred traffic from my blog. I get very little organic search traffic to my SmugMug site (never really have) and so I've found that using my blog to capture organic search traffic is more effective for general search terms. If it's a more specific search term, like a search for a location or a landmark, then my SmugMug site generally gets that traffic.
BTW, my blog implementation (WordPress) uses actual HTML for all menu links and is more easily crawled by the bots. In addition, 100% of my submitted URLs are indexed by Google and ranking fairly well... most of my targeted keywords are at least on page 1 or 2.
Well done with the blog, btw; it looks great, and blogs are easily the best way to get traffic.
I'm not sure why there's a reporting difference between GWT and the personal searches you're doing, but there are lots of variables in play. One, GWT uses sampled data for many of its reports. Two, a personal search is not a good way to determine search visibility. I can't explain the difference between the "site:" numbers and GWT, but to me it's not really important; I look more at organic search visibility.
There's always room for improvement, and if I can't get better search traffic results from SmugMug, I'll have to switch to a different platform. Search traffic is as important to my business (if not more so) as a cold call or networking face-to-face. If I can improve my search visibility, I'm certainly going to actively pursue that.
As a result, I plan on looking more at the traffic channel reports (i.e. search vs. referred) and, more specifically, at the referring-URL reporting at a URL (landing page) level. As I'm sure you're aware, this can be done with Google Analytics. I may need to do some link building now ;-(
I just decided not to watermark my images and to rely on RCP, because I've always thought watermarks are visually annoying to visitors, and bam!, it turns out RCP can sink search results.
wxwax
Leaving aside the technicalities, which I am not able to fully understand and therefore discuss, I'd like to get a clear understanding of the point of this thread as it emerges from the posts so far:
are you guys really saying NewSmugMug SEO sucks?
Please bear with me if I sound rude, but SEO is more than important for me (as well as for many others, I figure), as I have to rely a lot on web-based searches for my business.
I received, as usual, many (interesting and convenient) offers from other platforms before deciding to stay with SmugMug (I had a SM legacy site), and, to be honest, the main reason I was confident to go ahead with them again was that I had been rather happy with how SEO worked on my legacy site.
Now, reading your worries about the actual SEO performance of New SmugMug has obviously made me quite nervous.
Venice PhotoBlog
(NOTE: I should say that I am really passionate about SEO, since it's my day job, and I can be highly critical in this area when needed.)
Basically, improvements need to be made to the basics: customizable Title and Meta Description tag areas, AND the home page. The home page, with all of its non-HTML widgets, is basically un-crawlable by the search bots. The slideshow widget (if added) is also very slow. If these few items were improved, it would go a long way toward improving search visibility.
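As a quick spot check of those basics on any page, something like this Python sketch will pull out the title and meta description (regex-based, so only a rough check; attribute order can vary, and the URL is a placeholder):

import re
import urllib.request

url = "http://www.example.com/"  # placeholder: replace with the page to audit
with urllib.request.urlopen(url) as resp:
    html = resp.read().decode("utf-8", errors="replace")

# Grab the <title> text and the description meta tag, if present.
title = re.search(r"<title[^>]*>(.*?)</title>", html, re.S | re.I)
desc = re.search(
    r'<meta[^>]+name=["\']description["\'][^>]+content=["\'](.*?)["\']',
    html, re.I)

print("title:", title.group(1).strip() if title else "MISSING")
print("meta description:", desc.group(1) if desc else "MISSING")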
In the meantime, I'll continue to rely on my WordPress blog and other non-SM areas to improve my search visibility. For most SM users, though, I would imagine SEO is a non-issue, and so for them it's a nice tool.
Just so I get this: you mean it's "un-crawlable" if you use drop-down menus, keyword clouds, or slideshows? Right?
I'm surely not as passionate about SEO as you are, but I'm definitely as interested as you and many others of us here at SmugMug in having effective SEO.
With "customizable Title and Meta Description Tag areas" do you mean these settings? Or is it something else?
I have the default drop-down navbar and a customized full-width slideshow on my homepage; do you mean they affect search bots in a negative manner?
And, again, RCP is bad for SEO?
Pfewww... I really didn't think the SM people could have done things in such a bad way.
Venice PhotoBlog
You can see what a typical bot sees by using the bot simulator at http://www.webconfs.com/search-engine-spider-simulator.php
Do you see any of your gallery navigation links? If not, then it's likely that the search spiders (Google, Bing, Ask, etc.) won't either. Again, they might, but why not just create a more SEO-friendly solution from the outset to make sure?
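If you'd rather not rely on a third-party simulator, here's a rough home-grown equivalent in Python (standard library only; the URL is a placeholder) that strips out scripts and styles and prints the text a plain, non-JavaScript bot would actually read:

from html.parser import HTMLParser
import urllib.request

class TextOnly(HTMLParser):
    """Accumulate visible text, skipping <script> and <style> bodies."""
    def __init__(self):
        super().__init__()
        self.skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.chunks.append(data.strip())

url = "http://www.example.com/"  # placeholder: replace with your gallery page
with urllib.request.urlopen(url) as resp:
    html = resp.read().decode("utf-8", errors="replace")

parser = TextOnly()
parser.feed(html)
print("\n".join(parser.chunks))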