Screen-scrape clubphoto.com - Part 2
tyler
Registered Users Posts: 5 Beginner grinner
OK, I'm having a hard time posting this as I didn't know I needed to escape the HTML, so here's my 2nd try:
After over 5 years at clubphoto.com, I'm a smugmug convert. clubphoto was good back in the day as they were one of the first to allow full-size downloads, but they are VERY slow and haven't added a new feature in years. I doubt they could even spell "RSS". Anyway, enough bashing and on to the code.
I wanted to get all of my albums off clubphoto that I didn't have locally, so I wrote this bash shell script last night. You still have to grab the URL of each album (it took me about 10 minutes to collect the URLs for 50 albums), but once you have them in the array at the beginning, the script will create a local folder for each album name and download the photos into that folder.
Hope this helps someone...
#!/bin/bash
# URLs of the clubphoto albums you want to copy
url[1]=http://members3.clubphoto.com/tyler256499/3736536/owner-63ee.phtml
url[2]=http://members3.clubphoto.com/tyler256499/3627315/owner-63ee.phtml
url[3]=http://members3.clubphoto.com/tyler256499/3303128/owner-63ee.phtml

# loop over albums in the url array
for index in 1 2 3
do
    echo "${url[index]}"
    wget -O temp.html "${url[index]}"

    # get the album title from the <TITLE> tag of the album page
    TITLE=`egrep "((\<TITLE\>)(.*)(\<.+\>))" temp.html | sed s:\<TITLE\>::g | sed s:\<\/TITLE\>::g`
    echo $TITLE

    # make a new directory using the title and work inside it
    mkdir "$TITLE"
    cd "$TITLE"

    # get the image ID and image title and write them to a line in a file separated by a :
    egrep "new pObj(.+)" ../temp.html | awk -F , '{print $2 ":"$5}' | sed s/\"//g > img_ids.txt

    cat img_ids.txt | while read line
    do
        # get the id and name of the image
        imgID=`echo $line | awk -F : '{print $1}'`
        imgName=`echo $line | awk -F : '{print $2}'`
        pwd
        echo $imgID
        echo $imgName

        # get the image and save it using the image name
        wget -O "$imgName.jpg" "http://members3.clubphoto.com/_cgi-bin/getImage.pl?imgID="$imgID
    done
    cd ..
done
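One note on the parsing step, in case it needs tweaking: the egrep/awk/sed pipeline assumes the album page embeds each photo in a JavaScript call where the image ID is the 2nd comma-separated field and the quoted image name is the 5th. The line below is a hypothetical example of that markup (your own temp.html may differ), and shows how to sanity-check the pipeline before letting the downloads run:

# hypothetical pObj line, for illustration only -- compare against your own temp.html
echo 'new pObj(3736536,12345678,160,120,"IMG_0042",1);' | awk -F , '{print $2 ":"$5}' | sed s/\"//g
# prints 12345678:IMG_0042, i.e. the imgID:imgName pair written to img_ids.txt

If your albums use a different field order, just adjust the $2/$5 column numbers in the awk.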
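And if hand-editing 50 url[n]= lines is too tedious, here is a minimal sketch of an alternative, assuming you paste one album URL per line into a plain-text file (album_urls.txt is a made-up name); the body of the loop stays the same as in the script above, just using $album instead of ${url[index]}:

#!/bin/bash
# read album URLs, one per line, from album_urls.txt (hypothetical file name)
while read album
do
    echo "$album"
    wget -O temp.html "$album"
    # ...same title/img_ids.txt/download steps as in the script above...
done < album_urls.txt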
Thanks,
Tyler