Automatically downloading a series of images from a website
So hopefully the title is clear enough to get you in this thread... What I'm looking for is this. I'd like to create a collection of webcomics, purely for my own amusement and not for distribution or profit, but I'm having difficulty finding an efficient way of gathering the images. Here's exactly what I want to do:
1. Have this process or macro recognize the image I want to save (for easiness' sake, let's just say the PA strip.).
2. Save this image to a pre-designated folder.
3. Move to the next comic in the series, basically by clicking the "next comic" button or whatever the website has (all of the comics have an easy image link for clicking, no java screwiness.)
4. Repeat steps 1-3 until there is no "next comic" button, or the button does nothing.
For my own ease, I'd prefer a Mac-friendly way of doing this, but I can work with PC. I just may need some additional instructions for the PC side. I'm not too good with some of the technical aspects of PCs.
To let you know what I've tried, I've attempted to use the Firefox extension DownThemAll, but it's not really doing what I'd like it to do. If it's the only way, that's fine, but I may need a bit of assistance getting the extension to work.
The goal of all this is to eventually put these onto my iPhone or iPod Touch, and fill boring parts of the day. As I said, no profit or wide distribution. Purely my own amusement.
A combination of DownThemAll and AutoPager will probably do the job just fine. AutoPager is user-extensible so you can create rules for each individual comic, and once you load each page (it's basically appended to the current page) use DTA to download the images.
Wouldn't that eat up a lot of the web page's bandwidth?
The same amount as doing things manually, just over a shorter period of time. You can narrow down which sections to load and which to admit via XPath (just like how you select the link). The penny-arcade.com comic page is 9KB, so if you just say that they started in 1998 and they've been going at it for about 9 years while maintaining an output of three comics a week, that's roughly 1,404 pages, which should come out to 12.3 megabytes of plain HTML (which should be compressed, so in reality that number will be lower for bandwidth purposes).
Then add up all the images. You should really only allow DTA to download one image at a time to be polite. If it has a bandwidth limiter then I'd use that too and just be patient.
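For anyone who wants to redo the back-of-the-envelope estimate above with different numbers, here's a minimal Python sketch. The defaults are just the figures from the post (9KB per page, 9 years, three comics a week); the function name and the 52-weeks-per-year simplification are my own assumptions, and image sizes are not included.

```python
def estimate_html_megabytes(page_kb=9, years=9, comics_per_week=3, weeks_per_year=52):
    """Rough size of the plain HTML for every comic page, in megabytes."""
    # Total number of comic pages published over the whole run.
    pages = years * weeks_per_year * comics_per_week
    # Each page is assumed to be the same size; 1024 KB per MB.
    return pages * page_kb / 1024


print(round(estimate_html_megabytes(), 1))  # prints 12.3
```

Swap in your own page size or schedule to get a feel for how much you'd be pulling before the images are counted.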
A simple *insert favorite scripting language here* script combined with wget would handle it much better than doing anything by hand. You just need to figure out how the page is written or how the images are named. If you can work out the image naming scheme, it's best to skip loading the pages and fetch the images directly.
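If the images aren't named predictably, the "follow the next button" loop from the original post can be sketched in Python using only the standard library. This is just a rough sketch under some loud assumptions: that the strip is an `<img>` with `id="comic"` and that the next link is an `<a>` with `rel="next"`. Both markers are hypothetical, so check the real page source and adjust. The `fetch` argument is any function that turns a URL into HTML text, which keeps the loop easy to try out without hitting a live site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import time


class ComicPageParser(HTMLParser):
    """Collects the comic image URL and the "next" link from one page."""

    def __init__(self):
        super().__init__()
        self.image_src = None
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Assumption: the strip is marked with id="comic".
        if tag == "img" and attrs.get("id") == "comic":
            self.image_src = attrs.get("src")
        # Assumption: the "next comic" link carries rel="next".
        if tag == "a" and attrs.get("rel") == "next":
            self.next_href = attrs.get("href")


def crawl(start_url, fetch, delay_seconds=1.0, max_pages=10000):
    """Follow "next" links from start_url and return the image URLs found.

    fetch is any callable that maps a URL to the page's HTML text.
    """
    images = []
    seen = set()
    url = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = ComicPageParser()
        parser.feed(fetch(url))
        if parser.image_src:
            images.append(urljoin(url, parser.image_src))
        # Stop when there is no next link; a link back to a visited page
        # (a "next" button that does nothing) ends the loop via `seen`.
        url = urljoin(url, parser.next_href) if parser.next_href else None
        if url:
            time.sleep(delay_seconds)  # be polite: one request at a time
    return images
```

For real use, `fetch` could be a small wrapper around `urllib.request.urlopen`, and the returned image URLs could be written to a text file and handed to `wget -i` so that wget does the actual downloading.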
You're going to need some sort of macro software. I would recommend AutoHotkey, which I know for sure could do what you want with a little scripting, but you're on a Mac. Try QuicKeys, Keyboard Maestro, or HotApp, although I haven't used any of them myself.
Also, if it's a popular webcomic you're after, search for a torrent of it. I found ones for Penny-Arcade and just downloaded those a while ago.
This is pretty easy using Python or Perl, since the majority of webcomics name their images after the date they were posted. So you just keep changing the image request based on the day you want, which lets you grab all the images faster.
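As a rough illustration of the date-based approach, here's a small Python sketch that generates one image URL per publication day. The URL pattern in the usage note is made up for the example, and the Monday/Wednesday/Friday schedule is only an assumption based on the "three comics a week" figure mentioned earlier in the thread; look at a few real image links to find the actual scheme.

```python
from datetime import date, timedelta


def comic_urls(start, end, pattern, weekdays=(0, 2, 4)):
    """Yield one image URL per publication day from start to end, inclusive.

    pattern is a strftime-style template for the image URL.
    weekdays uses Python's numbering: Monday is 0, Sunday is 6.
    """
    day = start
    while day <= end:
        if day.weekday() in weekdays:
            yield day.strftime(pattern)
        day += timedelta(days=1)
```

Something like `list(comic_urls(date(2007, 1, 1), date(2007, 1, 7), "http://example.com/comics/%Y%m%d.jpg"))` gives the three URLs for that week. Save the output to a file and feed it to wget one image at a time, with a pause between requests, to stay polite.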
Thanks a ton! This suggestion worked wonders. It took a little bit of effort to get AutoPager to work, but once I did this process worked like a dream.
Actually, here's a program to do it for you, with 945 supported comics and the ability to define custom ones: http://collector.skumleren.net/supported_comics.php?version=devel
Thanks a lot, guys!