How to edit .htaccess to block Semalt
DCrane
Registered Users Posts: 3 Beginner grinner
I'm looking to block the website semalt .com (Don't go here!) from crawling my page. If you Google them, you'll find they are not a legitimate business and that they skirt many of the standards web crawlers are supposed to follow. They also mess up analytics reporting, are apparently amassing a huge database of personal information, and even send out spyware to people who visit their site.
So, I'm looking to block them from accessing my site. I've read the best way to do this is by editing the .htaccess file to block referrals from certain domains.
My question then is: how do I edit/add a .htaccess with SmugMug, or is there an alternative way to block referrals from certain domains? Any process of blocking them also needs to block subdomains like semalt.semalt .com or 24.semalt .com, as they use an infinite number of variations to avoid being blocked...
Thanks in advance!
David Crane
DCranePhoto.com
Comments
I've sent off an email to SmugMug's Heroes asking for help and voicing my concern with Semalt. I suggest you all tell them you are concerned about this issue as well, and maybe we can get them to block access site-wide. I can't imagine what kind of server traffic 10-20 site visits per day to every one of their hosted sites would cause.
User-agent: *
Disallow: /
at the end of the robots.txt file disallows any crawler or bot that wasn't specifically allowed earlier in the file.
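To see how that catch-all rule plays out, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `Googlebot` allow entry, the `SomeOtherBot` name, and the example.com URL are all assumptions made up for illustration, not SmugMug's actual file. It only models crawlers that choose to honor robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one crawler is allowed explicitly,
# then the catch-all block at the end disallows everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a robots.txt-respecting crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    url = "https://example.com/gallery"  # placeholder URL
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, allowed(agent, url))
```

An empty `Disallow:` line means "nothing is disallowed" for that user agent, which is why the explicitly listed crawler still gets through while any unlisted one falls into the `*` block.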
SmugMug Support Hero
These guys have been ignoring robots.txt files, as they are not a well-meaning crawler. They are actually accumulating a list of sites, and their bot runs through all of them over and over as a referral from their site. It's been suggested that people who go to their website and sign up for a free trial then become a host for their bots, which run from that person's computer and IP, protecting Semalt from detection and making them harder to block. Not sure what they're trying to do here, but it's sketchy.
Further, SmugMug has come back and basically said they are "looking into it," but I cannot have access to the .htaccess file, so there is nothing they will do about it. "You can block access by making a gallery password protected" ... Not really what I was going for there.
I suggest anyone else with a concern about this issue email SmugMug support and ask them to consider blocking access from these domains site-wide.
But for now, it looks like there is nothing else SmugMug is willing to do.
SmugMug Support Hero
1. I have a public web site, can I block some third party from accessing it?
No. Your site is public, so it's open to the world to see. It may be possible to detect abusive patterns (such as too much activity in a short period of time), and it may be possible to deny access to certain IP blocks, but overall, if your site is public, then anyone with internet access can download the public information from your web site. If you don't want everyone to be able to access your site, you need to make it either unlisted or password-protected.
2. Some third-party is messing up my public web site's analytics. What can I do?
There's not too much you can do. Again, with a public web site, your analytics data is a product of public access to your web site. High end web analytics can sometimes remove activity that looks like automated crawling (when it isn't clearly identified as a crawler), but basic analytics tools like GA/SmugMug stats/statcounter/etc don't. Distinguishing human from shady bot activity is a very difficult problem.
3. What is referrer spam? What can I do about it?
There's a good description here: http://www.incapsula.com/blog/semalt-botnet-spam.html . Some bad actors attempt to raise the SEO profile of their own site by creating lots of redirection links from their site to other sites, hoping that appearing in the analytics logs will benefit themselves if the analytics logs are public. SmugMug's stats are not public so this does them no good. They are merely acting as an annoyance.
4. What should I do about Semalt?
My recommendation is to do nothing about them and to go out and shoot more photos. I don't see Semalt appearing in SmugMug's stats referrers, although if you do see them, please PM me and I will take a look. If they are appearing in other stats you use, such as GA or Statcounter, you should talk to those organizations about filtering them out. Referer is a voluntary header, so any client can send anything they want in the Referer header--thus it's one that is easily polluted with bad data. They do have a removal tool at http://semalt.com/project_crawler.php , which may or may not be effective.
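For anyone filtering referrer data themselves (for example, cleaning an exported stats report), a rough sketch of matching a domain and all of its subdomains might look like the following. The `BLOCKED_DOMAINS` set and the `is_blocked_referrer` helper are hypothetical names for illustration; this is not a SmugMug, GA, or Statcounter feature, and since the Referer header can be set to anything, it only filters what the client chose to report.

```python
from urllib.parse import urlparse

# Hypothetical block list; add other spam domains as needed.
BLOCKED_DOMAINS = {"semalt.com"}

def is_blocked_referrer(referrer, blocked=BLOCKED_DOMAINS):
    """Return True if the referrer's host is a blocked domain or any subdomain of one."""
    host = urlparse(referrer).hostname
    if host is None:
        # Referrers are sometimes logged without a scheme; retry as a network path.
        host = urlparse("//" + referrer).hostname
    if not host:
        return False
    host = host.lower().rstrip(".")
    # Match the bare domain, or anything ending in ".<domain>", so that
    # semalt.semalt.com and 24.semalt.com match but notsemalt.com does not.
    return any(host == d or host.endswith("." + d) for d in blocked)

if __name__ == "__main__":
    rows = [
        "http://semalt.semalt.com/crawler.php?u=http://example.com",
        "http://24.semalt.com/",
        "https://www.google.com/search?q=semalt.com",
    ]
    clean = [r for r in rows if not is_blocked_referrer(r)]
    print(clean)
```

Comparing against the parsed hostname, rather than searching the whole URL for "semalt", avoids throwing out legitimate visits whose query string happens to mention the domain.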
Images in the Backcountry
My SmugMug Customizations | Adding CSS to Your Site | SEO for the Photographer | Locate Your Page/Widget Number | SmugMug Help Desk
Buttons-for-website.com
Instagram
Twitter
Try point 2 here; it might help you stop semalt and buttons-for-website:
http://www.ohow.co/block-referrer-spam-list/
In the last week, this site, social-buttons.com, has turned up in Analytics.
Instagram
Twitter