Screen-scrape clubphoto.com

tyler Registered Users Posts: 5 Beginner grinner
First post messed up due to un-escaped HTML in my post. Here's a better version:
http://dgrin.com/showthread.php?t=21987

Comments

  • tyler Registered Users Posts: 5 Beginner grinner
    edited November 7, 2005
    Forgot to escape HTML :(
    #!/bin/bash

    # URLs of the clubphoto albums you want to copy
    url[1]="http://members3.clubphoto.com/tyler256499/3736536/owner-63ee.phtml"
    url[2]="http://members3.clubphoto.com/tyler256499/3627315/owner-63ee.phtml"
    url[3]="http://members3.clubphoto.com/tyler256499/3303128/owner-63ee.phtml"

    # loop over albums in the url array
    for index in 1 2 3
    do
        echo "${url[index]}"

        wget -O temp.html "${url[index]}"
        # get the title
        TITLE=$(grep -E '<TITLE>.*</TITLE>' temp.html | sed -e 's:<TITLE>::g' -e 's:</TITLE>::g')
        echo "$TITLE"
        # make a new directory using the title
        mkdir "$TITLE"
        cd "$TITLE"

        # get the image ID and image title and write them to the file one pair per line, separated by a :
        grep -E "new pObj(.+)" ../temp.html | awk -F , '{print $2 ":" $5}' | sed 's/"//g' > img_ids.txt

        # read one "imgID:imgName" entry per line
        while read -r line
        do
            # get the id and name of the image
            imgID=$(echo "$line" | awk -F : '{print $1}')
            imgName=$(echo "$line" | awk -F : '{print $2}')
            pwd
            echo "$imgID"
            echo "$imgName"
            # get the image and save it using the image name
            wget -O "$imgName.jpg" "http://members3.clubphoto.com/_cgi-bin/getImage.pl?imgID=$imgID"
        done < img_ids.txt
        cd ..
    done
  • Andy Registered Users Posts: 50,016 Major grins
    edited November 7, 2005
    Hi Tyler,

    Wonderful - thanks so much for posting this.
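
For anyone adapting the script above, here is a minimal sketch of its two parsing steps (title extraction and the imgID:imgName split), run against made-up sample markup so it can be tried without touching the network. The contents of sample_html and the function names extract_title and extract_images are assumptions for illustration only; the real clubphoto pages may be laid out differently, and the sed expressions here strip everything around the title rather than just the tags.

```shell
#!/bin/bash

# Hypothetical sample page; the real clubphoto markup may differ.
sample_html='<HTML><HEAD>
<TITLE>Summer Trip</TITLE>
</HEAD><BODY><SCRIPT>
p[0] = new pObj(0,"12345","a","b","Beach",640);
p[1] = new pObj(1,"67890","c","d","Sunset",480);
</SCRIPT></BODY></HTML>'

# Read HTML on stdin and print the text between <TITLE> and </TITLE>.
extract_title() {
    grep -E '<TITLE>.*</TITLE>' | sed -e 's:.*<TITLE>::' -e 's:</TITLE>.*::'
}

# Read HTML on stdin and print one "imgID:imgName" line per pObj entry
# (comma-separated fields 2 and 5, with the double quotes stripped).
extract_images() {
    grep -E 'new pObj(.+)' | awk -F , '{print $2 ":" $5}' | sed 's/"//g'
}

title=$(printf '%s\n' "$sample_html" | extract_title)
echo "title: $title"

# Split each entry on the colon directly in read, instead of calling awk twice.
printf '%s\n' "$sample_html" | extract_images |
while IFS=: read -r imgID imgName
do
    # A dry run: report what would be downloaded instead of calling wget.
    echo "would fetch imgID=$imgID as $imgName.jpg"
done
```

Replacing the final echo with the wget line from the script turns the dry run back into a real download. Letting read split on the colon also means an image name that contains spaces stays in one piece.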