
direct imageID flooding?

rainforest1155 Registered Users Posts: 4,566 Major grins
edited April 30, 2006 in SmugMug Support
I know there's not much you can do about these things, but I'll let you know anyway, because it's a lot weirder than anything I've heard about here before.
I've got backup galleries where I put all my images. These are private and password protected, but by design the individual pictures can be accessed directly (I'm aware that I can turn that off).
Today I noticed my stats had gone up a little bit, and I saw that someone hit 5 image IDs in my January backup gallery, which have been uploaded since January 11th. I haven't given the image IDs to anyone. Here is the count as of now (it must have happened within the last 48 hours at most, as I check my stats regularly):
51992238 - 24 medium hits
51992275 - 21 medium hits
51992318 - 1223 medium hits
51992365 - 1038 medium hits
51992455 - 772 medium hits

I also noticed that a lot of images in a row have only one medium hit. I would say this could have been me, but firstly I haven't browsed the gallery yet, and secondly there's only one thumb hit for almost all pictures (I guess that's the thumb shown on the stats page).

To me this looks like someone is harvesting images, for whatever reason, by going sequentially through the SM numbers starting at a specific ID. I don't know where the immense hit counts come from, but I guess that after the script's harvest the scripter goes over the results and sends a selected list of links to his buddies or out via spam, which then multiplies as they forward it too.
I can't explain this massive number of hits otherwise. Even if it didn't happen within 48 hours, the maximum would be 7 days, as the images didn't exist before that. That's scary.

I don't know what you're capable of looking up, but I think it would be worth at least checking whether the image IDs in between mine were affected too. If you need any more information, I'll try to help with what I can. I'm interested in measures against this kind of abuse. Maybe you could add some kind of anti-harvesting rule for users who are not logged in, limiting the number of direct link accesses per minute and IP.

Thanks for having a look at this,
Sebastian

PS: The content of the images that got hit isn't that exciting, no bikini girls or anything like that. If anything it's a bit creepy: a staircase and hallway in an old building.
Sebastian
SmugMug Support Hero

Comments

  • Andy Registered Users Posts: 50,016 Major grins
    edited January 18, 2006

    I don't know what you're capable of looking up,

    We have no capability to see who/what is going after your images - but you might check via statcounter or google analytics to see who's been hitting your site?

    Thanks for the heads up, Sebastian.
  • rainforest1155 Registered Users Posts: 4,566 Major grins
    edited January 18, 2006
    Andy wrote:
    We have no capability to see who/what is going after your images - but you might check via statcounter or google analytics to see who's been hitting your site?
    It's not so much the who, but more the how. I don't care that much about the traffic at the moment, but this could also affect others who care about it more.
    I don't even think my page was ever hit by the people doing this (statcounter or Analytics are of no use when direct links are involved). My theory is that they simply get the images by sequentially crawling through image IDs in a specific range. Here's an example:
    They could simply feed a script the desired image range, let's say from 51,990,000 up to 52,000,000. The script then downloads all images that have direct links enabled using this URL:
    http://smugmug.com/photos/xxxxxxxx-M.jpg with x being the ID (like 51990000)
    Out of the 10,000 IDs they perhaps end up with 2,000 images that had direct links allowed and were still valid links.
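
To make the scale of that example concrete, here is a minimal sketch of the enumeration it describes. It only builds URL strings and performs no network requests. The URL pattern and the ID range come from the post; the function names and the 20% hit fraction (2,000 out of 10,000, the post's own "perhaps") are illustrative assumptions.

```python
# Illustration only: builds URL strings, performs no network requests.
# The pattern below is the one quoted in the post.
URL_TEMPLATE = "http://smugmug.com/photos/{image_id}-M.jpg"


def candidate_urls(start_id, stop_id):
    """Yield one medium-size direct link per image ID in [start_id, stop_id)."""
    for image_id in range(start_id, stop_id):
        yield URL_TEMPLATE.format(image_id=image_id)


def expected_hits(id_count, hit_fraction):
    """Estimate how many guessed IDs resolve to a viewable image."""
    return round(id_count * hit_fraction)


urls = list(candidate_urls(51_990_000, 52_000_000))
print(len(urls))                       # number of IDs in the range
print(urls[0])                         # first candidate link
print(expected_hits(len(urls), 0.20))  # 0.20 is the post's guess, not a measurement
```

Because the IDs are sequential and not tied to a username, the whole range is reachable with a single counter, which is what makes this kind of guessing cheap.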

    That's simply a consequence of SmugMug's design: images are numbered sequentially without regard to username, meaning you can request an image ID from any subdomain and always get the same image.
    I just wanted to bring this possible abuse to your attention. It's hard, if not impossible, to do anything against this other than turning off external links at the user level for galleries. The only other possibility I'm aware of is to limit the number of direct hits per hour from a given IP. Maybe this is something you guys could think about integrating, even though it can be bypassed by using proxies.
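
The per-IP limit suggested here could look roughly like the sliding-window sketch below. This is not how SmugMug works or worked; the class name `DirectLinkLimiter`, its parameters, and the example IP addresses are all hypothetical, and a real service would need shared storage across servers.

```python
import time
from collections import defaultdict, deque


class DirectLinkLimiter:
    """Sliding-window limit on direct image requests per client IP (hypothetical helper)."""

    def __init__(self, max_hits, window_seconds):
        self.max_hits = max_hits
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recently allowed requests

    def allow(self, ip, now=None):
        """Return True if this request stays within the limit, else False."""
        now = time.monotonic() if now is None else now
        recent = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False
        recent.append(now)
        return True


# Example: at most 3 direct hits per 60 seconds from one address.
limiter = DirectLinkLimiter(max_hits=3, window_seconds=60)
results = [limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3, 61)]
print(results)
```

As the post already notes, a limit keyed on the IP address only slows down a single source; requests spread over proxies would each get their own window.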

    Sebastian
    Sebastian
    SmugMug Support Hero
  • bwg Registered Users, Retired Mod Posts: 2,119 SmugMug Employee
    edited January 18, 2006
    ...
    Today I noticed my stats had gone up a little bit, and I saw that someone hit 5 image IDs in my January backup gallery, which have been uploaded since January 11th. I haven't given the image IDs to anyone. Here is the count as of now (it must have happened within the last 48 hours at most, as I check my stats regularly):
    ...
    To me this looks like someone is harvesting images, for whatever reason, by going sequentially through the SM numbers starting at a specific ID. I don't know where the immense hit counts come from, but I guess that after the script's harvest the scripter goes over the results and sends a selected list of links to his buddies or out via spam, which then multiplies as they forward it too.
    I can't explain this massive number of hits otherwise. Even if it didn't happen within 48 hours, the maximum would be 7 days, as the images didn't exist before that. That's scary.
    holy smokes! i had the same type of thing start with my stuff at the end of dec beginning of jan. i have galleries that have conversion %'s near 1000%. My total conversion% for december was 122%. That's ridiculous.

    These hits are for many random galleries too...stuff that nobody i know of has any reason to be in. It got so bad that through the first 8 days of january, i had already gone through 1.6GB of bandwidth. I should have nowhere near that much traffic.

    I eventually had to disable remote linking for the majority of my galleries and that has dramatically slowed the bleeding. I'd be very curious to see if smugmug traffic increased starting at the end of Dec.


    Andy, google analytics or statcounter won't give us any info on images accessed directly. I investigated as best I could with google analytics but I still had a big missing piece. I even tried doing a google search for sites that linked to mine, but I still couldn't find anything....which makes Sebastian's theory even more interesting.
    Pedal faster
  • flyingdutchie Registered Users Posts: 1,286 Major grins
    edited January 18, 2006
    I had the same happening to one of my images. Just that image alone, at some point, accounted for over a quarter of my total bandwidth.

    I 'fixed' this by making a copy of the 'offending' image and then deleting the offending image. Now someone is seeing a broken image-link :)
    I can't grasp the notion of time.

    When I hear the earth will melt into the sun,
    in two billion years,
    all I can think is:
        "Will that be on a Monday?"
    ==========================
    http://www.streetsofboston.com
    http://blog.antonspaans.com
  • asd Registered Users Posts: 115 Major grins
    edited January 18, 2006
    I had the same happening to one of my images. Just that image alone, at some point, accounted for over a quarter of my total bandwidth.

    I 'fixed' this by making a copy of the 'offending' image and then deleting the offending image. Now someone is seeing a broken image-link :)

    I have something like this going on too, but with an album I deleted...back in November. I just looked at my last few months of stats and nothing shows up for the album in December but somehow it's gotten 115 hits on originals within it so far this month (this was a private album that only 2-3 people had links to). Since the album has been deleted I can't see which images are racking up the hits...not to mention the weirdness of a deleted album garnering traffic in the first place.
  • bwg Registered Users, Retired Mod Posts: 2,119 SmugMug Employee
    edited January 19, 2006
    Sebastian...i noticed that you have google analytics running on your site. I do as well.

    Makes me wonder if it may be google doing some sort of image crawling...
    Pedal faster
  • rainforest1155 Registered Users Posts: 4,566 Major grins
    edited January 19, 2006
    Thanks, guys, for chiming in on this. I think it's very important that people are aware of the problem and take measures to protect their precious private images. I can imagine that it's fun to pick a random ID out of the growing SM stock and see what's behind it. It's like those games where you buy a scratch card and scratch off the cover to see what you got. Most of the time it's just a boring photo or, worse, a blank (external linking not allowed), but there are times when you hit the jackpot and uncover a very delicate image of someone else who had external linking turned on. Of course this is then shared with the rest of the community. That's very understandable to me; I also sometimes share a link to a video that I found, or that someone sent me, with a couple of friends.

    Today I noticed another ~500 medium hits on the 3 very popular images of mine and turned the external linking off.
    bigwebguy wrote:
    Sebastian...i noticed that you have google analytics running on your site. I do as well.

    Makes me wonder if it may be google doing some sort of image crawling...
    I don't think this is the reason. From my understanding, the usual web bot only crawls what is linked somewhere and doesn't care about images. Then there's, for example, Google Images, which provides the image search, but I think it also has to work by following links to find pictures. That's why you almost always find only SmugMug thumbnails on Google Images: it doesn't seem to go down to the single-image level very often.
    If it had been a Google Images bot, you should also be able to find the image by searching for the image ID in Google Images.

    I don't think Google Analytics does any crawling at all. From my understanding, they already get enough data from the script that is called from every page. You only get results for pages that call the script, so they depend on linking too.

    Hope this makes sense,
    Sebastian
    Sebastian
    SmugMug Support Hero
  • bwg Registered Users, Retired Mod Posts: 2,119 SmugMug Employee
    edited January 19, 2006
    Don't think google analytics does any crawling at all. From my understanding they've got enough data already with the script that is called from every page. You only get results for pages that are calling the scripts - so they depend on linking, too.

    Hope this makes sense,
    Sebastian
    I'm not saying that google analytics tool itself is doing any crawling...just pure unsubstantiated speculation on my part that google may be doing further exploration of sites w/their analytics tools installed. The only basis for me to say this is the fact that both you and I have analytics installed...so its just my own conspiracy theory.

    I'm just trying to make heads or tails of this:
    53053680-M.jpg

    i have 30 galleries that look like that.
    Pedal faster
  • bwg Registered Users, Retired Mod Posts: 2,119 SmugMug Employee
    edited January 19, 2006
    and to add to my conspiracy theory....this trend didnt start happening until november...which is when i added google analytics to my site.

    month  bandwidth
    Jul    517MB
    Aug    458MB
    Sep    413MB
    Oct    629MB
    Nov    2.7GB
    Dec    3.5GB
    Jan    2.3GB
    Pedal faster
  • rainforest1155 Registered Users Posts: 4,566 Major grins
    edited February 21, 2006
    bigwebguy wrote:
    just pure unsubstantiated speculation on my part that google may be doing further exploration of sites w/their analytics tools installed. The only basis for me to say this is the fact that both you and I have analytics installed...so its just my own conspiracy theory.

    I'm just trying to make heads or tails of this:
    53053680-Th.jpg

    i have 30 galleries that look like that.

    {added from the next post}
    month  bandwidth
    Jul    517MB
    Aug    458MB
    Sep    413MB
    Oct    629MB
    Nov    2.7GB
    Dec    3.5GB
    Jan    2.3GB
    I have maybe a handful of galleries showing that behaviour, but at least one of them looked this way, on a smaller scale, before I installed Google Analytics.
    I installed Google Analytics in mid-November too. This is how my stats look:
    month  bandwidth
    Jul    304MB
    Aug    167MB
    Sep    226MB
    Oct    339MB
    Nov    567MB
    Dec    616MB
    Jan    679MB
    Feb    328MB (so far, until the 21st)

    There's an increase in bandwidth since I installed Google Analytics, but it's not as much as for you. I also credit this more to the overall popularity of my site. I'm getting an increased number of Google searches (site rank should be pretty much independent of Analytics, or this would sooner or later drastically diminish Google's credibility, and they won't risk that). I also had some private galleries in December that did pretty well, as my father spread the link to some relatives over in the US and Canada.

    Let's see if someone else chimes in.

    Sebastian
    Sebastian
    SmugMug Support Hero
  • rainforest1155 Registered Users Posts: 4,566 Major grins
    edited April 11, 2006
    bigwebguy wrote:
    I'm not saying that google analytics tool itself is doing any crawling...just pure unsubstantiated speculation on my part that google may be doing further exploration of sites w/their analytics tools installed. The only basis for me to say this is the fact that both you and I have analytics installed...so its just my own conspiracy theory.

    I'm just trying to make heads or tails of this:
    53053680-Th.jpg

    i have 30 galleries that look like that.
    As an update, I have to say that this is the first month in which around 30 of my galleries look exactly like that. Practically all my public galleries are affected, but you only see it in the ones that are a bit older. Let's have a closer look at two examples:
    64023450-M.jpg
    This is one of my older standard dailies galleries, with 31 photos. Note that the graph is wrong, as there are 223 small hits and only 35 tiny / 2 thumb hits! The small hits are distributed more or less equally across all the photos. The absence of medium/large hits could indicate that the gallery was never browsed by a regular user, as the standard SmugMug style uses medium pictures. It's possible that the users had traditional view and small-preferred set, but how likely is that with only 35 tiny/thumb hits (the all-thumbs view would have displayed all 31 pictures for just one visiting user)?

    Now let's have a look at another gallery consisting of only one picture:
    64023458-M.jpg
    Looks pretty similar, don't you think? This time the gallery has only one picture. The hover text again reveals that the graph is wrong: we've got our 220 small hits again, on one single picture with only 9 thumb views!

    Same thing with the rest of my galleries: all have at least 220 small hits (galleries getting normal hits usually have more, but never fewer than 220) and the faulty graphs whenever there aren't more thumb views.
    This behaviour started this month and means that I already have 380MB of traffic even though not even two weeks have passed. In March 2006 I had 450MB of traffic in total, and my maximum was 680MB in January 2006, with a lot of relatives chiming in to visit a family gallery.

    My theory is that it isn't Google Analytics (I've had this since October of last year or so, and the peculiarities are only really starting now), but some sort of image search crawler like Google Images that comes by a couple of times and gets the small pictures. Somehow it stops at 220 small hits per gallery (independent of the number of images in the gallery) and maybe comes back later.
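
The reasoning in this post (lots of small hits, no medium or large hits, few thumb hits) can be written down as a rough heuristic. The sketch below is only an illustration of that argument: the `GalleryStats` type and `looks_crawled` function are hypothetical, the two real galleries use the hit counts quoted in the post, zero medium/large hits for the one-photo gallery is an assumption, and the third gallery is a made-up contrast case.

```python
from dataclasses import dataclass


@dataclass
class GalleryStats:
    photos: int
    thumb: int = 0   # thumb and tiny hits combined
    small: int = 0
    medium: int = 0
    large: int = 0


def looks_crawled(stats):
    """Heuristic: no medium/large hits, and more small hits than thumb hits."""
    no_browsing = stats.medium == 0 and stats.large == 0
    return no_browsing and stats.small > stats.thumb


dailies = GalleryStats(photos=31, thumb=37, small=223)           # 35 tiny + 2 thumb
landscapes = GalleryStats(photos=1, thumb=9, small=220)          # medium/large assumed 0
browsed = GalleryStats(photos=31, thumb=62, small=5, medium=40)  # made-up contrast case

for name, g in [("dailies", dailies), ("landscapes", landscapes), ("browsed", browsed)]:
    print(name, looks_crawled(g), round(g.small / g.photos, 1))
```

The small-hits-per-photo figure printed last shows why the 220 floor stands out: it is about the same total for a 31-photo gallery and a one-photo gallery, so it does not scale with gallery size the way normal browsing would.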

    Discuss! :D
    And smugmug - please have a look at those faulty stats.

    Cheers,
    Sebastian
    Sebastian
    SmugMug Support Hero
  • Sheaf Registered Users, SmugMug Product Team Posts: 775 SmugMug Employee
    edited April 11, 2006
    And smugmug - please have a look at those faulty stats.

    Cheers,
    Sebastian

    I'll take a look, Sebastian. I can't promise that I'll discover anything, but I'll do my best.
    SmugMug Product Manager
  • rainforest1155 Registered Users Posts: 4,566 Major grins
    edited April 11, 2006
    Sheaf wrote:
    I'll take a look, Sebastian. I can't promise that I'll discover anything, but I'll do my best.
    Thanks - I know you guys always give your best! Very much appreciated.

    Sebastian

    PS: The gallery IDs of the example galleries are: 150986 and 462541.
    Sebastian
    SmugMug Support Hero
  • Sheaf Registered Users, SmugMug Product Team Posts: 775 SmugMug Employee
    edited April 11, 2006
    Thanks - I know you guys always give your best! Very much appreciated.

    Sebastian

    PS: The gallery IDs of the example galleries are: 150986 and 462541.

    Edit: I jumped to a stupid conclusion. Jump to my next post for an explanation.
    SmugMug Product Manager
  • rainforest1155 Registered Users Posts: 4,566 Major grins
    edited April 11, 2006
    Sheaf wrote:
    Right away I noticed one thing: our alt text is often wrong on that page (the text that appears when you hover over a graph). The actual bars of the graphs appear to be correct, as determined by clicking on a graph and counting the actual number of hits for the photos in various sizes.
    I think the alt text is fine for all the galleries I checked, but the graphs are way off. Just have a look at my Landscapes gallery (the one-photo gallery whose stat graph I posted as an example): the small size got hit 220 times, while the graph indicates 8 hits or so!

    Sebastian
    Sebastian
    SmugMug Support Hero
  • Sheaf Registered Users, SmugMug Product Team Posts: 775 SmugMug Employee
    edited April 11, 2006
    I think the alt text is fine for all the galleries I checked, but the graphs are way off. Just have a look at my Landscapes gallery (the one-photo gallery whose stat graph I posted as an example): the small size got hit 220 times, while the graph indicates 8 hits or so!

    Sebastian

    Edit: Hmm... it seems I jumped to an incorrect conclusion. You caught me! =)

    It appears that the y-axis of each graph is scaled according to the number of thumbs, mediums, larges, or originals, but not correctly according to smalls. Does that make sense? So if "smalls" is the highest count, at least one other bar will be just as high since the scale is taken from the second highest in that case.

    This is actually a very old bug. When we initially launched the service, "small" didn't exist. When it was added, it was apparently coded incorrectly into the creation of the graphs on that page. Since this isn't a huge bug, I'm not sure when it will get fixed, but we're now aware of it. Thanks Sebastian!
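
The scaling bug described above can be reproduced in a few lines. This is a sketch of the general mistake, not SmugMug's actual graph code, which isn't shown in the thread: the function name, the 100-pixel bar height and the clipping of overlong bars are assumptions, and the hit counts are the one-photo gallery's 220 small and 9 thumb hits with the other sizes assumed to be zero.

```python
SIZES = ["thumb", "small", "medium", "large", "original"]


def bar_heights(hits, scale_sizes, max_height=100):
    """Return bar heights in pixels, clipped to max_height.

    The y-axis maximum is taken only from the sizes listed in scale_sizes.
    """
    top = max(hits[s] for s in scale_sizes) or 1  # avoid division by zero
    return {s: min(max_height, round(hits[s] / top * max_height)) for s in SIZES}


hits = {"thumb": 9, "small": 220, "medium": 0, "large": 0, "original": 0}

# Buggy variant: "small" is left out when the axis maximum is computed.
buggy = bar_heights(hits, [s for s in SIZES if s != "small"])
# Fixed variant: every size takes part in the scale.
fixed = bar_heights(hits, SIZES)

print(buggy["small"], buggy["thumb"])
print(fixed["small"], fixed["thumb"])
```

In the buggy variant the small bar is cut off at the top of an axis that only reaches the thumb count, so it ends up the same height as the thumb bar, which matches the "at least one other bar will be just as high" description.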
    SmugMug Product Manager
  • rainforest1155 Registered Users Posts: 4,566 Major grins
    edited April 11, 2006
    Sheaf wrote:
    Did you make any changes to that gallery? Deleting a photo, replacing a photo, etc.?
    I'm not 100% sure about the Landscapes gallery, but my April 2005 daily gallery definitely hasn't been changed at all since May 2005. Again there are only 37 thumb+tiny hits and 223 small ones, while the graph cuts off everything above 37. Note the total hit count of 260, the exorbitant conversion ratio and the whopping 4MB of traffic. That shouldn't be possible with the 37 small hits the graph indicates, and the individual image stats seem to add up to the total hit count too.

    Sebastian

    EDIT: The above applies 100% to at least all galleries from here if you need more examples. Not enough? Just have a look at those (September is an exception, because there the medium hits exceed the small ones, resulting in a correct graph). I might have changed the captions and keywords in these, though.
    Sebastian
    SmugMug Support Hero
  • onethumb Administrators Posts: 1,269 Major grins
    edited April 11, 2006
    All of the major search engines crawl SmugMug constantly. 24 hours per day. No rest for us, and it's actually a decently large drain on our resources.

    I'm positive this is just the various search engines crawling your stuff. No biggie, since it's not destroying your bandwidth or anything.

    Don
  • rainforest1155 Registered Users Posts: 4,566 Major grins
    edited April 12, 2006
    onethumb wrote:
    I'm positive this is just the various search engines crawling your stuff. No biggie, since it's not destroying your bandwidth or anything.
    I'm not complaining; I just added my recent ideas on this to the thread. The peculiarity with the minimum of 220 small hits per gallery is especially interesting. Maybe someone has more insight into search engines and can comment on this.

    Hope you'll be successful in hunting down the issue with the cut-off graphs as probably more people will stumble over it.

    Sebastian
    Sebastian
    SmugMug Support Hero
  • rainforest1155 Registered Users Posts: 4,566 Major grins
    edited April 30, 2006
    Any update on this? Do you need more help figuring out why the graphs are faulty? I still have a lot of galleries with more actual small-hits than are shown in the small-bar from the graph, because the y-axis is cut.

    Thanks,
    Sebastian
    Sebastian
    SmugMug Support Hero