direct imageID flooding?
rainforest1155
Registered Users Posts: 4,566 Major grins
I know there's not much you can do about these things, but I'll report it anyway, because it's a lot weirder than anything I've heard about here before.
I've got my backup galleries where I put all my images. These are private and password protected, but by design the individual pictures can be accessed directly (I'm aware that I can turn that off).
Today I noticed my stats have gone up a little bit, and I saw that someone hit 5 imageIDs in my January backup gallery, all uploaded since January 11th. I haven't given the imageIDs to anyone, and this is the count as of now (it must have happened within the last 48h at most, as I check my stats regularly):
51992238 - 24 medium hits
51992275 - 21 medium hits
51992318 - 1223 medium hits
51992365 - 1038 medium hits
51992455 - 772 medium hits
I also noticed that a lot of consecutive images have only one medium hit. I would say that this could've been me, but firstly I haven't browsed the gallery yet, and secondly there's only one thumb hit for almost all pictures (I guess that's the thumb shown on the stats page).
To me this looks like someone is harvesting images, for whatever reason, by sequentially going through the SM numbers starting at a specific ID. I don't know where the immense number of hits comes from, but I guess that after the script's harvest the scripter goes over the results and sends a selected list of links to his buddies or out via spam, which then multiplies as they forward it too.
I can't explain this massive hit count otherwise. Even if it didn't happen within 48h, the maximum would be 7 days, as the images didn't exist before. That's scary.
I don't know what you're capable of looking up, but I think it would be worth at least checking whether the imageIDs in between mine were affected too. If you need any more information, I'll try to help with what I can. I'm interested in measures against this kind of abuse. Maybe you could add some kind of anti-harvest rule for users who are not logged in, limiting the number of direct link accesses per minute and IP.
Thanks for having a look at this,
Sebastian
PS: The content of the affected images isn't that exciting: no bikini girls or anything like that. If anything it's a bit creepy, being a staircase and hallway in an old building. :dunno
Sebastian
SmugMug Support Hero
Comments
We have no capability to see who/what is going after your images - but you might check via statcounter or google analytics to see who's been hitting your site?
Thanks for the heads up, Sebastian.
I don't think my page was ever hit by the people doing this (Statcounter or Analytics are of no use when direct links are involved). My theory is that they simply get the images by sequentially crawling through image IDs in a specific range. Here's an example:
They could simply feed a script with the desired image range, let's say from 51,990,000 up to 52,000,000. The script then downloads all images that have direct links enabled using this URL:
http://smugmug.com/photos/xxxxxxxx-M.jpg with x being the ID (like 51990000)
Now out of the 10,000 IDs they perhaps end up with 2,000 images which had direct links allowed and were still valid links.
That's simply a consequence of SmugMug's design: image IDs are numbered sequentially without regard to username, so you can request an image ID from any subdomain and always get the same image.
I just wanted to bring this possible abuse to your attention. It's hard, if not impossible, to do anything against this behavior other than turning off external links at the user level for galleries. The only other possibility I'm aware of is to limit the number of direct hits per hour from a given IP. Maybe this is something you guys could consider integrating, even though it can be bypassed by using proxies.
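To make the arithmetic in this example concrete, here is a minimal sketch. It only builds the candidate URLs and makes no requests. The URL pattern is the one quoted above; the function name `candidate_urls` is made up for illustration, and the 2,000 figure is just the guess from the example, not a measurement.

```python
def candidate_urls(start_id, end_id, size="M"):
    """Yield one direct-link URL per image ID in [start_id, end_id)."""
    for image_id in range(start_id, end_id):
        yield f"http://smugmug.com/photos/{image_id}-{size}.jpg"


urls = list(candidate_urls(51_990_000, 52_000_000))
total = len(urls)        # number of IDs in the example range
assumed_valid = 2_000    # guess from the example above, not measured
hit_rate = assumed_valid / total

print(urls[0])
print(f"{total} candidate IDs, {hit_rate:.0%} assumed reachable")
```

The point is only that nothing in the URL ties an ID to a user or gallery, so a plain counter is enough to enumerate every candidate.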
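A per-IP limit like the one suggested here could look roughly like the sketch below. This is only an illustration under assumptions: the class name `DirectLinkLimiter`, the limits, and the in-memory storage are all hypothetical and say nothing about how SmugMug actually serves images.

```python
import time
from collections import defaultdict, deque


class DirectLinkLimiter:
    """Sliding-window cap on direct-link hits per IP (hypothetical helper)."""

    def __init__(self, max_hits=60, window_seconds=60.0):
        self.max_hits = max_hits
        self.window = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent allowed hits

    def allow(self, ip, now=None):
        """Return True if this hit is within the limit, False otherwise."""
        now = time.monotonic() if now is None else now
        recent = self._hits[ip]
        # Forget hits that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False
        recent.append(now)
        return True


# Tiny demo with explicit timestamps: 3 hits allowed per 60 seconds.
limiter = DirectLinkLimiter(max_hits=3, window_seconds=60.0)
results = [limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3, 61)]
print(results)
```

As noted, a harvester spreading requests over many proxies would stay under any per-IP threshold, so this only raises the cost of the crawl.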
Sebastian
These hits are for many random galleries too... stuff that nobody I know of has any reason to be in. It got so bad that through the first 8 days of January, I had already gone through 1.6GB of bandwidth. I should have nowhere near that much traffic.
I eventually had to disable remote linking for the majority of my galleries, and that has dramatically slowed the bleeding. I'd be very curious to see if SmugMug traffic increased starting at the end of December.
Andy, Google Analytics or Statcounter won't give us any info on images accessed directly. I investigated as best I could with Google Analytics, but I still had a big missing piece. I even tried doing a Google search for sites that linked to mine, but I still couldn't find anything... which makes Sebastian's theory even more interesting.
I had the same happening to one of my images. Just that image alone, at some point, accounted for over a quarter of my total bandwidth.
I 'fixed' this by making a copy of the 'offending' image and then deleting the offending image. Now someone is seeing a broken image-link
When I hear the earth will melt into the sun,
in two billion years,
all I can think is:
"Will that be on a Monday?"
==========================
http://www.streetsofboston.com
http://blog.antonspaans.com
I have something like this going on too, but with an album I deleted...back in November. I just looked at my last few months of stats and nothing shows up for the album in December but somehow it's gotten 115 hits on originals within it so far this month (this was a private album that only 2-3 people had links to). Since the album has been deleted I can't see which images are racking up the hits. :uhoh ...not to mention the weirdness of a deleted album garnering traffic in the first place.
Makes me wonder if it may be google doing some sort of image crawling...
Today I noticed another ~500 medium hits on the 3 very popular images of mine and turned the external linking off.
I don't think this is the reason. From my understanding, the usual web bot only crawls what is linked somewhere and doesn't care about images. Then there's, for example, Google Images, which provides the image search, but I think it also has to work by following links to find pictures. That's why you almost always find only SmugMug thumbnails on Google Images: they don't seem to go down to the single-image level very often.
If it had been a Google Images bot, you should also be able to find the image by searching for the imageID in Google Images.
I don't think Google Analytics does any crawling at all. From my understanding, they get enough data already from the script that is called from every page. You only get results for pages that call the script, so they depend on linking, too.
Hope this makes sense,
Sebastian
I'm just trying to make heads or tails of this:
i have 30 galleries that look like that.
month  bandwidth
Jul    517MB
Aug    458MB
Sep    413MB
Oct    629MB
Nov    2.7GB
Dec    3.5GB
Jan    2.3GB
I installed my google analytics in mid november too. This is how my stats look:
month  bandwidth
Jul    304MB
Aug    167MB
Sep    226MB
Oct    339MB
Nov    567MB
Dec    616MB
Jan    679MB
Feb    328MB (so far, until the 21st)
There's an increase in bandwidth since I installed Google Analytics, but it's not as much as for you. I also credit this more to the overall popularity of my site. I'm getting an increased number of Google searches (site rank should be pretty much independent of Analytics, or this would sooner or later drastically diminish Google's credibility, and they won't risk that). I also had some private galleries in December that did pretty well, as my father spread the link to some relatives over in the US and Canada.
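To put rough numbers on the difference between the two sets of stats, here is a small sketch that compares October to November for both. The figures are copied from the two tables above; the helper names are made up, and treating 1GB as 1024MB is an assumption, since the stats page doesn't say which definition it uses.

```python
def to_mb(value):
    """Convert strings like '517MB' or '2.7GB' to megabytes (assumes 1GB = 1024MB)."""
    number, unit = float(value[:-2]), value[-2:]
    return number * 1024 if unit == "GB" else number


def growth(stats, before, after):
    """Ratio of bandwidth in month `after` to bandwidth in month `before`."""
    return to_mb(stats[after]) / to_mb(stats[before])


# Figures taken from the two tables in this thread.
other_poster = {"Oct": "629MB", "Nov": "2.7GB", "Dec": "3.5GB"}
sebastian = {"Oct": "339MB", "Nov": "567MB", "Dec": "616MB"}

print(f"other poster Oct->Nov: x{growth(other_poster, 'Oct', 'Nov'):.1f}")
print(f"Sebastian    Oct->Nov: x{growth(sebastian, 'Oct', 'Nov'):.1f}")
```

Under that assumption the first set of stats grows by a bit more than four times from October to November, while mine grows by less than two times.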
Let's see if someone else chimes in.
Sebastian
This is one of my older standard dailies galleries with 31 photos. Note that the graph is wrong, as there are 223 small hits and only 35 tiny / 2 thumb hits! The small hits are distributed more or less equally across all the photos. The absence of medium/large hits could indicate that the gallery was never browsed by a regular user, as the standard SmugMug style uses medium pictures. It's possible that the users had traditional view and small-preferred set, but how likely is that with only 35 tiny/thumb hits (the all-thumbs view would have displayed all 31 pictures for a single visiting user)?
Now let's have a look at another gallery only consisting of one picture:
Looks pretty similar, don't you think? This time the gallery has only one picture. The hover again reveals that the graph is wrong: we've got our 220 small hits again, on one single picture with only 9 thumb views!
Same thing with the rest of my galleries: all have at least 220 small hits (galleries getting normal hits usually have more, but never less than 220) and the faulty graphs whenever there aren't more thumb views.
This behaviour started this month, and as a result I already have 380MB of traffic even though not even two weeks have passed. In March 2006 I had 450MB of traffic in total, and my maximum was 680MB in January 2006 with a lot of relatives chiming in to visit a family gallery.
My theory is that it isn't Google Analytics (I've had this since October of last year or so, and the peculiarities are only starting now), but some sort of image search crawler like Google Images that comes by a couple of times and gets the small pictures. Somehow it stops at 220 small hits per gallery (independent of the number of images in the gallery) and maybe comes back later.
Discuss!
And smugmug - please have a look at those faulty stats.
Cheers,
Sebastian
I'll take a look, Sebastian. I can't promise that I'll discover anything, but I'll do my best.
Sebastian
PS: The gallery IDs of the example galleries are: 150986 and 462541.
Edit: I jumped to a stupid conclusion. Jump to my next post for an explanation.
Sebastian
Edit: Hmm... it seems I jumped to an incorrect conclusion. You caught me!
It appears that the y-axis of each graph is scaled according to the number of thumbs, mediums, larges, or originals, but not correctly according to smalls. Does that make sense? So if "smalls" is the highest count, at least one other bar will be just as high since the scale is taken from the second highest in that case.
This is actually a very old bug. When we initially launched the service, "small" didn't exist. When it was added, it was apparently coded incorrectly into the creation of the graphs on that page. Since this isn't a huge bug, I'm not sure when it will get fixed, but we're now aware of it. Thanks Sebastian!
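For anyone curious how a bug like this produces a cut-off graph, here is a rough sketch. It is not the actual stats code: the function `bar_heights`, the 100-pixel cap, and the assumption that every size except "small" feeds the scale are all guesses for illustration. The hit counts are the ones Sebastian reported for his 31-photo gallery.

```python
def bar_heights(hits, scale_keys, max_px=100):
    """Scale each bar against the largest count among scale_keys, capped at max_px."""
    top = max(hits[k] for k in scale_keys) or 1  # avoid dividing by zero
    return {k: min(max_px, round(v / top * max_px)) for k, v in hits.items()}


# Counts reported earlier in the thread for the 31-photo gallery.
hits = {"tiny": 35, "thumb": 2, "small": 223, "medium": 0, "large": 0, "original": 0}

# Assumed bug: "small" is left out when the y-axis maximum is chosen.
buggy = bar_heights(hits, scale_keys=[k for k in hits if k != "small"])
# Assumed fix: every size takes part in choosing the maximum.
fixed = bar_heights(hits, scale_keys=list(hits))

print("buggy:", buggy)
print("fixed:", fixed)
```

In the buggy version the "small" bar is clipped to the same height as the next-highest bar, which matches the graphs where two bars look equally tall even though the counts differ widely.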
Sebastian
EDIT: What I said above applies 100% to at least all the galleries from here, if you need more examples. Not enough? Just have a look at those (September is an exception, because there the medium hits exceed the small ones, resulting in a correct graph). I might have changed the captions and keywords in these, though.
All of the major search engines crawl SmugMug constantly. 24 hours per day. No rest for us, and it's actually a decently large drain on our resources.
I'm positive this is just the various search engines crawling your stuff. No biggie, since it's not destroying your bandwidth or anything.
Don
Hope you'll be successful in hunting down the issue with the cut-off graphs, as more people will probably stumble over it.
Sebastian
Thanks,
Sebastian