Screen-scrape clubphoto.com - Part 2
tyler
Registered Users Posts: 5 Beginner grinner
OK, I'm having a hard time posting this as I didn't know I needed to escape the HTML, so here's my 2nd try:
After over 5 years at clubphoto.com, I'm a smugmug convert. clubphoto was good back in the day as they were one of the first to allow full-size downloads, but they are VERY slow and haven't added a new feature in years. I doubt they could even spell "RSS". Anyway, enough bashing and on to the code.
I wanted to get all of my albums off clubphoto that I didn't have locally, so I wrote this bash shell script last night. You still have to grab the URL of each album (it took me about 10 minutes to collect the URLs for 50 albums), but once you have them in the array at the beginning, the script will create a local folder for each album name and download the photos into that folder.
Hope this helps someone...
#!/bin/bash
# URLs of the clubphoto albums you want to copy
url[1]=http://members3.clubphoto.com/tyler256499/3736536/owner-63ee.phtml
url[2]=http://members3.clubphoto.com/tyler256499/3627315/owner-63ee.phtml
url[3]=http://members3.clubphoto.com/tyler256499/3303128/owner-63ee.phtml

# loop over albums in the url array
for index in 1 2 3
do
    echo "${url[index]}"
    wget -O temp.html "${url[index]}"

    # get the album title from the <TITLE> tag of the album page
    TITLE=`egrep "((\<TITLE\>)(.*)(\<.+\>))" temp.html | sed s:\<TITLE\>::g | sed s:\<\/TITLE\>::g`
    echo $TITLE

    # make a new directory using the title and work inside it
    mkdir "$TITLE"
    cd "$TITLE"

    # get the image ID and image title and write them to a line in a file separated by a :
    egrep "new pObj(.+)" ../temp.html | awk -F , '{print $2 ":"$5}' | sed s/\"//g > img_ids.txt

    cat img_ids.txt | while read line
    do
        # get the id and name of the image
        imgID=`echo $line | awk -F : '{print $1}'`
        imgName=`echo $line | awk -F : '{print $2}'`
        pwd
        echo $imgID
        echo $imgName

        # get the image and save it using the image name
        wget -O "$imgName.jpg" "http://members3.clubphoto.com/_cgi-bin/getImage.pl?imgID="$imgID
    done
    cd ..
done
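One note on the parsing step, in case it needs tweaking: the egrep/awk/sed pipeline assumes the album page embeds each photo in a JavaScript call where the image ID is the 2nd comma-separated field and the quoted image name is the 5th. The line below is a hypothetical example of that markup (your own temp.html may differ), and shows how to sanity-check the pipeline before letting the downloads run:

# hypothetical pObj line, for illustration only -- compare against your own temp.html
echo 'new pObj(3736536,12345678,160,120,"IMG_0042",1);' | awk -F , '{print $2 ":"$5}' | sed s/\"//g
# prints 12345678:IMG_0042, i.e. the imgID:imgName pair written to img_ids.txt

If your albums use a different field order, just adjust the $2/$5 column numbers in the awk.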
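And if hand-editing 50 url[n]= lines is too tedious, here is a minimal sketch of an alternative, assuming you paste one album URL per line into a plain-text file (album_urls.txt is a made-up name); the body of the loop stays the same as in the script above, just using $album instead of ${url[index]}:

#!/bin/bash
# read album URLs, one per line, from album_urls.txt (hypothetical file name)
while read album
do
    echo "$album"
    wget -O temp.html "$album"
    # ...same title/img_ids.txt/download steps as in the script above...
done < album_urls.txt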
Thanks,
Tyler