This is not correct. Googlebot etc. get a version of the page with the javascript content already rendered into the HTML. I don't know what user agent that webconfs tool is using, but if you change the user agent in Chrome's developer tools to the Googlebot user agent, you can see what Google sees.
Well, I beg to differ. Have you looked at the actual source code being served up (ctrl+u in Chrome)? It's javascript, not HTML, and that's what the bots are seeing. Chrome is translating it into HTML, but bots are rather dumb and don't do that type of translation.
Basically yes. Google is getting better at crawling non-html links (links that are embedded in javascript, a flash object & other non-html code), but there's no guarantee. Crawlable links allow Google (and other bots) to create a relationship from one page to the next. These links also expose new URLs that might have been missed in the sitemap or on previous bot visits.
So if you use a standard, non-dropdown, link oriented menu you're good to go? The same if you leave off the slideshow & keyword cloud?
Well, it doesn't sound like you changed the user-agent to googlebot like I said.
Make sure you change the user-agent to googlebot, then view source. It's HTML. If you just view a page in Chrome, you're going to get the pure javascript version.
here is your site after changing the user-agent to googlebot:
you'll notice we identify googlebot, and all your meta info is there:
here are your menus:
and your keyword cloud:
"View Source" shows you exactly what Googlebot will see when it views the code. Like I said earlier, Googlebot has gotten better at crawling javascript and other non-html objects (Flash, etc..), but I haven't seen any proof that Googlebot can crawl and then process javascript without any issues. Maybe you have other proof that it can? If so, I'm all ears...
"View Source" shows you exactly what Googlebot will see when it views the code. Like I said earlier, Googlebot has gotten better at crawling javascript and other non-html objects (Flash, etc..), but I haven't seen any proof that Googlebot can crawl and then process javascript without any issues. Maybe you have other proof that it can? If so, I'm all ears...
I'm not sure how else to explain this to you. I'm clearly not doing a very good job of it.
When googlebot visits your site, it gets the HTML rendered version (see the screenshots I posted), not the javascript version that you get when visiting from a browser.
No, I totally understand what you're saying. What I'm talking about is the fact that a bot does not render content, it merely captures source code. The browser is the element that has a rendering engine, which allows it to render javascript code. Bots do not have that ability; they only capture code. Apparently Google now has the ability to take the javascript code that the bot has captured and use its own rendering engine to try to find links, etc. The jury is still out as to how well they do this.
A sample of the link code that the bot sees is this (from my site):
"Url":"http:\/\/www.chipjonesphotography.com\/FineArt\/Abstract"
This is what the bot sees. It's not HTML, and the bot doesn't translate the code into HTML; it merely captures it. If Google can determine that this is a link (a backend process), then the bot will crawl that link to capture the source code there. If Google can't determine that this is a link they need to follow, then it's ignored. This is what I'm talking about.
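As an illustration of the distinction being drawn here, the sketch below (plain Python, reusing the URL from the post above purely as an example) shows the difference between a URL buried in a JSON/javascript blob, which only something that parses or executes that data can recover, and a URL sitting in an ordinary HTML anchor, which any crawler can read straight out of the markup:

    import json
    from html.parser import HTMLParser

    # URL embedded in a JSON blob, as it appears in the javascript-driven page source
    js_blob = '{"Url":"http:\\/\\/www.chipjonesphotography.com\\/FineArt\\/Abstract"}'
    print(json.loads(js_blob)["Url"])  # only recoverable by parsing the script data

    # The same URL in a plain HTML anchor, readable without executing anything
    class AnchorFinder(HTMLParser):
        def handle_starttag(self, tag, attrs):
            if tag == "a":
                print(dict(attrs).get("href"))

    AnchorFinder().feed('<a href="http://www.chipjonesphotography.com/FineArt/Abstract">Abstract</a>')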
I'm afraid you don't understand, and that might be because the screenshots aren't showing. Here are direct links to what I was referring to before.
Where we identify googlebot and render the proper meta tags:
https://www.evernote.com/shard/s212/sh/9dd96350-d71e-43bd-a217-7cce7b73d5fa/64d13488a0d1d039ebd4cc7f1f03d98c
How googlebot sees your dropdown menu:
https://www.evernote.com/shard/s212/sh/6ee9512d-b0cb-491d-8e19-532ddcb9a25e/e0dcd076d5d5436de1bb03b21707af3e
And how googlebot sees your keywords:
https://www.evernote.com/shard/s212/sh/5ade2fa6-0e6e-4243-888a-11f66414ab9c/4c8cda0adcbb5eb3d3e6ba16cc041673
All html, as seen by view-source.
Does this make sense now?
Again, I know this technique very well, but Googlebot is not a browser, nor does it render code. I guess I'll just leave it there. I do not share your feeling that New SmugMug content is being indexed as well as it could be, and I am confident that the New SmugMug implementation, with its javascript-based widget solution, is the primary culprit. If you're happy with SmugMug, then great. I am not and will likely move on (for a number of reasons)...
Again, I know this technique very well, but Googlebot is not a browser, nor does it render code.
This I think is where we are not seeing eye to eye. We aren't asking googlebot to render anything. SmugMug servers pre-render the javascript into html and send that html to googlebot. Googlebot sees the html, it doesn't need to process the javascript into html like the browsers do.
The links I shared earlier were of the source of the page sent from the server when I told Chrome to identify itself as Googlebot, not the source after javascript executed. If you compare that to the view-source of the page when just letting Chrome identify itself as Chrome, you will see, as you have pointed out, that the source *is* mostly javascript that needs to be executed (rendered), and you are correct that this is not good food for Googlebot. But, again, this is *not* the same source that gets sent to Googlebot. Googlebot gets html.
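For readers trying to picture what's being described here, this is a minimal sketch of the general "serve pre-rendered HTML to known crawlers" idea, in plain Python. It is not SmugMug's actual code; the bot list and the two helper functions are made-up stand-ins for illustration only.

    KNOWN_BOTS = ("googlebot", "bingbot", "yandexbot")

    def render_html_snapshot(path):
        # Stand-in for server-side rendering: the real thing would run the page's
        # javascript (or use cached output) and return finished HTML with menus,
        # keyword clouds, and meta tags already in the markup.
        return "<html><body><nav><a href='/FineArt/Abstract'>Abstract</a></nav></body></html>"

    def serve_js_app(path):
        # Stand-in for the normal response: a shell page plus the javascript that builds it client-side.
        return "<html><body><script src='/app.js'></script></body></html>"

    def handle_request(path, user_agent):
        if any(bot in user_agent.lower() for bot in KNOWN_BOTS):
            return render_html_snapshot(path)   # crawlers get finished HTML
        return serve_js_app(path)               # browsers get the javascript version

    print(handle_request("/", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))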
I think you guys (ghipj, bwg) are going a bit too far with technicalities that don't help us average SM users understand if and how we could improve SEO performance on our SmugMug websites.
Well, there was speculation going on so in the interest of making sure the community had the proper information, some technical explanation was needed. My apologies for not being as clear as I could have been initially.
Fill in the appropriate SEO boxes in your account settings (settings > discovery > search) and we will do the rest.
Any content blocks you use will be available to search bots (menus, keyword clouds, etc.)
Well, saying SM will "do the rest" is a bit oversimplifying things. There are very basic and large SEO problems with the new SM. This thread is a great example:
http://www.dgrin.com/showthread.php?t=243019
Duplicate meta descriptions, and particularly duplicate title tags, are big negatives for ranking.
Please bear with me if I sounded rude, I didn't mean to; I'm sure everything you were discussing is well worth it. But from the point of view of less SEO-skilled SmugMug users like myself, who I believe are by far the largest part, the discussion had come to a point where it could only add more confusion to our already confused minds about SEO issues.
Any content blocks you use will be available to search bots (menus, keyword clouds, etc.)
What exactly do you mean by this? The thread started with a rebuttal of this very point, while you are saying those content blocks don't hurt SEO; it would be good if SM staff could say a clarifying word on this.
I am SM staff, and clarifying the speculation on content blocks was the reason I joined the thread
bwg, can you tell me what user agent string you use for google image search? For standard google search I use 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' which seems to work fine but I have less success finding something that works reliably with google image search. Often I get time outs retrieving the page.
I should say, however, that it's been a few weeks since I looked at this in detail, as for me Google image and web search seem to work. My only slight confusion is that Google Webmaster Tools says 154 images have been submitted and none indexed, but Google image search is showing 124 of the images.
Thanks for your input on this bwg
Rich
Would also like to know about this. Never understood the correlation.
Google brings up a few images under search while webmaster doesn't say any are indexed.
However, this appears relevant to your questions regarding submitted vs. indexed images:
https://support.google.com/webmasters/answer/178636?hl=en
bwg, I think you and the rest of the SM dev staff should review this and then tell me if what I've said is misleading. Not looking to battle, just trying to help SM get a better product out there that the search engines can more easily index. Here's the documentation: https://developers.google.com/webmasters/ajax-crawling/docs/learn-more
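For anyone following along, the documentation linked above describes Google's (since-retired) AJAX crawling scheme, in which a URL containing "#!" is fetched by the crawler with the fragment moved into an "_escaped_fragment_" query parameter, and the server is expected to answer with an HTML snapshot. A simplified Python sketch of that URL mapping (it skips the character-escaping details in the spec, and the example URL is made up):

    def escaped_fragment_url(url):
        # Map "http://example.com/page#!state" to the form the crawler would request:
        # "http://example.com/page?_escaped_fragment_=state" (simplified; no extra escaping).
        base, sep, fragment = url.partition("#!")
        if not sep:
            return url  # no hash-bang, nothing to rewrite
        joiner = "&" if "?" in base else "?"
        return base + joiner + "_escaped_fragment_=" + fragment

    print(escaped_fragment_url("http://www.example.com/gallery#!/Abstract"))
    # -> http://www.example.com/gallery?_escaped_fragment_=/Abstract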
I've only been addressing the statement you made about content blocks not being visible to googlebot because they are a "javascript based widget solution". That statement is misleading and incorrect.
Borrowing terminology from the documentation you referenced, we *do* create an "HTML Snapshot" for googlebot. In fact, we use a process similar in function to #3 here: https://developers.google.com/webmasters/ajax-crawling/docs/html-snapshot
If that doesn't click for you, I'm not sure what else to say.
Well, I guess we're at an impasse then, because there's absolutely no way I can tell if what you say you're doing (programmatically) is actually what you are doing, and I don't see anything associated with an HTML snapshot.
All I have to go on are the multiple issues being reported by tools like Google Webmaster Tools, Xenu Sleuth, Bright Local, and the non-html content being served up in the page source. I guess I'm imagining things...
I've provided incontrovertible evidence of what we serve googlebot. Ignoring the evidence is certainly your prerogative.
And like I've already mentioned, the reason tools like Xenu Sleuth and that webconfs.com one you posted earlier report errors is that they don't identify themselves with a googlebot (or other recognized bot) user-agent. The Xenu Sleuth FAQ (#20) even mentions this.
Try the following to see what googlebot sees…
1. Enter Chrome's dev tools (View > Developer > Developer Tools).
2. Click the settings icon (bottom right on a Mac).
3. Click the "Overrides" tab.
4. Click the "Enable" and "User Agent" checkboxes.
5. Select "Other" for the "User Agent" drop down and enter "googlebot" in the text field.
6. Visit your SmugMug page and witness all of the HTML they are rendering
This worked well for me but it's possible they aren't rendering HTML for sources that aren't reputable (bwg, correct me if I'm wrong). I hope this works for you!
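If you'd rather check this from a script than from Chrome, here is a rough equivalent in Python: fetch the same page twice, once with a generic browser user agent and once with the standard Googlebot user-agent string quoted earlier in the thread, and compare what comes back. The URL is a placeholder; use your own page, and keep in mind that counting anchor tags is only a crude proxy for "pre-rendered HTML vs. javascript shell".

    import urllib.request

    def fetch(url, user_agent):
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.read().decode("utf-8", errors="replace")

    url = "http://www.example.com/"  # substitute your own gallery or home page
    browser_ua = "Mozilla/5.0"
    googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    browser_html = fetch(url, browser_ua)
    bot_html = fetch(url, googlebot_ua)

    # If the bot copy contains far more anchor tags, the server is pre-rendering
    # HTML for crawlers rather than shipping only the javascript version.
    print("anchors as browser:  ", browser_html.count("<a "))
    print("anchors as googlebot:", bot_html.count("<a "))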
While I was browsing the forum in search of tips for improving my website SEO, I stumbled on this thread and... well, what a shock!
I just decided not to watermark my images and rely on RCP, because I've always thought watermarks visually annoy visitors, and bam!, it turns out RCP can sink search results.
Leaving aside the technicalities, which I am not able to fully understand and therefore discuss, I'd like to get a clear understanding of the point of this thread as it emerges from the posts so far: are you guys really saying New SmugMug SEO sucks?
Please bear with me if I sound rude, but SEO is more than important to me (as it is for many others, I figure), since I have to rely a lot on web searches for my business.
I received, as usual, many (interesting and convenient) offers from other platforms before I decided to stay with SmugMug (I had a SM legacy site), and, to be honest, the main reason I felt confident going ahead with them again was that I had been rather happy with how SEO worked on my legacy site.
Now, reading your worries about the actual SEO performance of New SmugMug has obviously made me quite nervous.
I feel your pain about the RCP thing. What's really been ticking me off about that whole thing is that they had decided a couple years ago (wisely, imho) to stop keeping our RCPed images from Google, after talking to we the users!... uh, the people who are paying for this service. BUT then they evidently reversed the decision, told no one, and are now completely refusing to own up to it, to discuss it with us (yes, us, the same users who are now being charged much higher rates for the service but getting many fewer images served up to Google).
Months ago I begged and begged for an official SmugMug response about the decision's reversal. All we got was complete silence. No word as to: 1. whether they consciously reversed the decision; 2. if consciously, why they did that after consulting with people and finding out the current procedure is not wanted; 3. if not consciously, but because some designers of all the new stuff didn't know the history of that decision, where is the explanation/discussion of all this with those of us who are paying for this crappy situation where RCP ruins our SEO? My hunch for a long time has been that SmugMug & Google have some problem/issue between them that isn't or hasn't been resolved in ways that do us any good. What I hear/read between the lines is that some part of this is an embarrassment. There are several odd Google issues that simply are not being explained & have gone unresolved for years. (Maps is only one of them.)
And now this SEO/RCP thing that affects so many of us exists (& lots of SmugMuggers probably have no clue!) & we get no satisfaction. It feels very rude, in ways that would not have happened at SmugMug even a couple years ago. I feel completely disregarded, actually. Here I have this great option of using RCP in some places where I want to and choose to. And yet the official SmugMug policy currently (evidently) is to not talk about any of its effect on my SEO, something that's important to me. It seems they think these questions will just go away. But they won't. I think about it every day & wonder, and it's not going away. It makes me so tempted to leave and find a place where honesty and transparency and real concern for the things we care about are still the norm.
5. Select "Other" for the "User Agent" drop down and enter "googlebot" in the text field
This does indeed work perfectly for web search. For image search I've been using 'Googlebot-Image/1.0' but I often find I get timeouts waiting for the page to load. Do you happen to know if that is the correct UA? (EDIT: no timeouts today, I wonder if there was a problem before)
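On the timeout question: one way to narrow down whether it's the UA string or just a flaky connection is to request the page with that same 'Googlebot-Image/1.0' header, an explicit timeout, and a couple of retries, and see whether the failures are consistent. A small Python sketch (the URL is a placeholder for your own page):

    import socket
    import urllib.error
    import urllib.request

    IMAGE_BOT_UA = "Googlebot-Image/1.0"  # the user-agent string discussed above

    def fetch_as_image_bot(url, timeout=20, retries=3):
        req = urllib.request.Request(url, headers={"User-Agent": IMAGE_BOT_UA})
        for attempt in range(1, retries + 1):
            try:
                with urllib.request.urlopen(req, timeout=timeout) as resp:
                    return resp.getcode(), len(resp.read())  # status code and body size
            except (urllib.error.URLError, socket.timeout) as exc:
                print("attempt", attempt, "failed:", exc)
        return None, 0

    print(fetch_as_image_bot("http://www.example.com/"))  # substitute your own page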
The proof is in the search metrics. I guess it's just time to move on...
Just an FYI about that. I have RCP set to ON in all but one gallery. The other day I saw someone did a Google Image search for "Chinese algae eater and discus." The search turned up a bunch of photos from my site which are RCPed.
So images do indeed appear in image search when using RCP. They appear to be smaller thumbs as found via a keyword search page. "Visit page" opens the lightbox. "View image" opens just the image, with .JPG in the URL. It can then be snagged by a viewer using Right Click. It also doesn't count towards traffic in google stats. It does have my watermark on it.
Also, the interesting thing is if they click "Visit Page" on any thumbnail from the gallery other than the one for the actual Chinese Algae Eater, say one of a discus fish, the Chinese Algae Eater photo is the one that opens in the lightbox.
I just tried a Google images search for "Chinese algae eater" and none of your images turned up, although I only checked the first 50-60.
So images do indeed appear in image search when using RCP.
They do indeed. All my photographs have RCP and always have, and I can find them very quickly when doing an image search in any browser. A lot of the things I photograph are hard to find, so I can usually find them on the first results page, but they are always there somewhere.
The search I mentioned was "Chinese algae eater and discus." The "and discus" must have made my images unique enough for them to appear towards the top. I am not at all surprised that just "Chinese algae eater" doesn't show hits, as that gallery is very new. I only reprocessed the photos and uploaded them in the past month or two.