
Automatically downloading a series of images from a website

hoodie13 punch bro Registered User regular
edited August 2009 in Help / Advice Forum
So hopefully the title is clear enough to get you into this thread. Here's what I'm looking for: I'd like to create a collection of webcomics, purely for my own amusement and not for distribution or profit, but I'm having difficulty finding an efficient way of gathering the images. Here's exactly what I want to do:

1. Have this process or macro recognize the image I want to save (for simplicity's sake, let's just say the PA strip).
2. Save this image to a pre-designated folder.
3. Move to the next comic in the series, basically by clicking the "next comic" button or whatever the website has (all of the comics have a simple image link to click, no Java screwiness).
4. Repeat steps 1-3 until there is no "next comic" button, or the button does nothing.
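The four steps above map fairly directly onto a short script. Below is a minimal Python sketch using only the standard library. It assumes, hypothetically, that the strip is an `<img id="comic">` and that the "next comic" link is an `<a rel="next">`; real sites differ, so both rules would need adjusting per comic, much like writing a per-comic rule in AutoPager.

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def http_get(url):
    """Download a URL and return the raw bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


class ComicPageParser(HTMLParser):
    """Pulls the strip's image URL and the "next comic" URL out of one page.

    Hypothetical markup: <img id="comic" src=...> for the strip and
    <a rel="next" href=...> for the next-comic link. Adjust per site.
    """

    def __init__(self):
        super().__init__()
        self.image_src = None
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("id") == "comic" and self.image_src is None:
            self.image_src = attrs.get("src")
        elif tag == "a" and "next" in (attrs.get("rel") or "").split():
            if self.next_href is None:
                self.next_href = attrs.get("href")


def crawl(start_url, dest_dir, get=http_get, max_pages=10_000):
    """Save each strip in order, following "next" links until they run out.

    Returns the list of saved file paths.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved, seen = [], set()
    url = start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = ComicPageParser()
        parser.feed(get(url).decode("utf-8", errors="replace"))
        if parser.image_src:  # step 1: recognize the strip
            img_url = urljoin(url, parser.image_src)
            name = os.path.basename(urlparse(img_url).path) or "strip"
            # Zero-padded prefix keeps the files sorted in reading order.
            path = os.path.join(dest_dir, f"{len(saved):05d}_{name}")
            with open(path, "wb") as f:  # step 2: save to the folder
                f.write(get(img_url))
            saved.append(path)
        # Steps 3-4: follow "next"; stop if there is none, or if it leads
        # back to a page already visited (the button "does nothing").
        url = urljoin(url, parser.next_href) if parser.next_href else None
    return saved
```

Usage would be something like `crawl("http://example.com/comic/1", "comics")`, where both the URL and the folder name are placeholders. The `get` parameter exists so the fetching can be swapped out, for example for a rate-limited downloader.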

For my own ease, I'd prefer a Mac-friendly way of doing this, but I can work with a PC. I just may need some additional instructions for the PC side, since I'm not too good with some of the technical aspects of PCs.

To let you know what I've tried: I've attempted to use the Firefox extension DownThemAll, but it's not really doing what I'd like it to do. If it's the only way, that's fine, but I may need a bit of assistance getting the extension to work.

The goal of all this is to eventually put these onto my iPhone or iPod Touch, and fill boring parts of the day. As I said, no profit or wide distribution. Purely my own amusement.

Help?

PSN: HoodieThirteen
XBL: Torn Hoodie
@hoodiethirteen

Posts

  • Barrakketh Registered User regular
    edited August 2009
    A combination of DownThemAll and AutoPager will probably do the job just fine. AutoPager is user-extensible, so you can create rules for each individual comic; once you've loaded the pages (AutoPager basically appends each one to the current page), use DTA to download the images.

    Rollers are red, chargers are blue....omae wa mou shindeiru
  • MagicToaster Japan Registered User regular
    edited August 2009
    Wouldn't that eat up a lot of the web page's bandwidth?

  • Barrakketh Registered User regular
    edited August 2009
    MagicToaster wrote: »
    Wouldn't that eat up a lot of the web page's bandwidth?
    The same amount as doing things manually, just over a shorter period of time. You can narrowly specify which sections to load and which to omit via XPath (the same way you select the link). The penny-arcade.com comic page is 9KB, so if you just say that they started in 1998 and have been going at it for about 9 years while maintaining an output of three comics a week, that comes out to about 12.3 megabytes of plain HTML (which should be compressed in transit, so the real bandwidth figure will be lower).
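The HTML estimate above can be reproduced in a few lines. The inputs (9 KB per page, three strips a week, nine years) are the post's own assumptions; 52 weeks per year and 1 MB = 1024 KB are added assumptions that happen to reproduce the 12.3 figure.

```python
def archive_html_size_mb(page_kb=9, strips_per_week=3, years=9, weeks_per_year=52):
    """Rough total size of every comic page's HTML, in MB (1 MB = 1024 KB)."""
    pages = strips_per_week * weeks_per_year * years  # 1404 pages with the defaults
    return pages * page_kb / 1024


print(f"{archive_html_size_mb():.1f} MB")  # prints 12.3 MB
```

Since 1998 to 2009 is closer to 11 years, `archive_html_size_mb(years=11)` gives about 15.1 MB, which is still tiny next to the images.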

    Then add up all the images. To be polite, you should really only allow DTA to download one image at a time. If it has a bandwidth limiter, then I'd use that too and just be patient.
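The same one-at-a-time politeness can be built into a script. This is a minimal sketch that uses a fixed pause between requests as a stand-in for a real bandwidth limiter; the 2-second default is an arbitrary assumption, and `get` is any function that returns a URL's bytes.

```python
import time
import urllib.request


def http_get(url):
    """Fetch a URL and return its body as bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def polite_download(urls, get=http_get, delay=2.0, sleep=time.sleep):
    """Download URLs strictly one at a time, pausing `delay` seconds between them.

    Returns a dict mapping each URL to its bytes.
    """
    results = {}
    for i, url in enumerate(urls):
        if i:  # no need to wait before the very first request
            sleep(delay)
        results[url] = get(url)
    return results
```

Because requests are sequential, the server never sees more than one connection from you at a time; raising `delay` trades speed for even lighter load.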

  • Jasconius sword criminal mad online Registered User regular
    edited August 2009
    You scraping a site for images is not going to kill the server unless they are hosted on Tripod or something.

  • ascannerlightly Registered User regular
    edited August 2009
    Jasconius wrote: »
    You scraping a site for images is not going to kill the server unless they are hosted on Tripod or something.
    i <3 geocities

  • PracticalProblemSolver Registered User regular
    edited August 2009
    A simple *insert favorite scripting language here* script combined with wget would handle it much better than doing anything by hand. You just need to figure out how the pages are written or how the images are named. If you can figure out the image naming scheme, it's best to skip loading the pages and fetch the images directly.
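Here is a minimal sketch of the "skip the page loading" approach, assuming a hypothetical site whose strips are numbered sequentially (`0001.png`, `0002.png`, ...). It requests each number in turn and treats the first 404 as the end of the archive. Python's `urllib` stands in for wget here; the URL pattern is a placeholder.

```python
import urllib.error
import urllib.request

# Hypothetical naming scheme: strips numbered 0001.png, 0002.png, ...
PATTERN = "http://example.com/comics/{n:04d}.png"


def http_get(url):
    """Fetch a URL and return its body as bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def fetch_numbered(pattern=PATTERN, start=1, get=http_get):
    """Request pattern.format(n=start), start+1, ... until the server returns 404.

    Returns a list of (url, bytes) pairs in order.
    """
    found = []
    n = start
    while True:
        url = pattern.format(n=n)
        try:
            found.append((url, get(url)))
        except urllib.error.HTTPError as err:
            if err.code == 404:  # first missing number: assume the archive ends here
                break
            raise  # other errors (403, 500, ...) are not treated as "the end"
        n += 1
    return found
```

One caveat: if the site skipped a number somewhere, this stops early at the gap, so it only suits archives whose numbering has no holes.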

    Actually, here's a program that does it for you, with 945 supported comics and the ability to define custom ones: http://collector.skumleren.net/supported_comics.php?version=devel

  • kathos Registered User regular
    edited August 2009
    Yeah, downloading all those delicious cake pictures at once into one folder really helps out a lot ;).

    Kekekekekeke.

  • Awk Registered User regular
    edited August 2009
    After ~10 years they're closing my GeoCities account! ;(

    the internets are changing!

  • Æthelred Registered User regular
    edited August 2009
    You're going to need some sort of macro software. I would recommend AutoHotKey, which I know for sure could do what you want with a little scripting, but you're on a Mac. Try QuicKeys, Keyboard Maestro, or HotApp, although I haven't used any of them myself.

    Also, if it's a popular webcomic you're after, search for a torrent of it. I found ones for Penny-Arcade and just downloaded those a while ago.

    pokes: 1505 8032 8399
  • JNighthawk Registered User regular
    edited August 2009
    http://www.httrack.com/ - lets you download a full copy of a website.

    Game programmer
  • Ethea Registered User regular
    edited August 2009
    This is pretty easy using Python or Perl, since many webcomics name their image files after the date each strip was posted. You just keep changing the image request to the next date you want, which lets you grab all the images faster than loading every page.
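A minimal sketch of the date-based approach, assuming a hypothetical site that stores strips as `YYYYMMDD.jpg` and posts on Mondays, Wednesdays, and Fridays. Both the URL pattern and the posting days are placeholders to adjust for the comic in question.

```python
from datetime import date, timedelta

# Hypothetical: strips are stored as YYYYMMDD.jpg and go up Mon/Wed/Fri.
PATTERN = "http://example.com/comics/{d:%Y%m%d}.jpg"
POSTING_DAYS = {0, 2, 4}  # date.weekday(): Monday=0 ... Sunday=6


def dated_urls(first, last, pattern=PATTERN, weekdays=POSTING_DAYS):
    """Yield one image URL per posting day from first to last, inclusive."""
    day = first
    while day <= last:
        if day.weekday() in weekdays:
            yield pattern.format(d=day)  # date.__format__ applies strftime codes
        day += timedelta(days=1)
```

Each generated URL can then be fetched one at a time; days with no strip (holidays, skipped updates) would simply come back as 404s and should be skipped rather than treated as the end.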

  • hoodie13 punch bro Registered User regular
    edited August 2009
    Barrakketh wrote: »
    A combination of DownThemAll and AutoPager will probably do the job just fine. AutoPager is user-extensible so you can create rules for each individual comic, and once you load each page (it's basically appended to the current page) use DTA to download the images.

    Thanks a ton! This suggestion worked wonders. It took a little bit of effort to get AutoPager to work, but once I did this process worked like a dream.

    Thanks a lot, guys!
