
Automatically downloading a series of images from a website

hoodie13 punch bro Registered User regular
edited August 2009 in Help / Advice Forum
So hopefully the title is clear enough to get you in this thread... What I'm looking for is this. I'd like to create a collection of webcomics, purely for my own amusement and not for distribution or profit, but I'm having difficulty finding an efficient way of gathering the images. Here's exactly what I want to do:

1. Have this process or macro recognize the image I want to save (for simplicity's sake, let's just say the PA strip).
2. Save this image to a pre-designated folder.
3. Move to the next comic in the series, basically by clicking the "next comic" button or whatever the website has (all of the comics have an easy image link for clicking, no Java screwiness).
4. Repeat steps 1-3 until there is no "next comic" button, or the button does nothing.

For my own ease, I'd prefer a Mac-friendly way of doing this, but I can work with PC. I just may need some additional instructions for the PC side, since I'm not too good with some of the technical aspects of PCs.

To let you know what I've tried: I've attempted to use the Firefox extension DownThemAll, but it's not really doing what I'd like it to do. If it's the only way, that's fine, but I may need a bit of assistance getting the extension to work.

The goal of all this is to eventually put these onto my iPhone or iPod Touch, and fill boring parts of the day. As I said, no profit or wide distribution. Purely my own amusement.

Help?

PSN: HoodieThirteen
XBL: Torn Hoodie
@hoodiethirteen
hoodie13 on

Posts

  • Barrakketh Registered User regular
    edited August 2009
    A combination of DownThemAll and AutoPager will probably do the job just fine. AutoPager is user-extensible so you can create rules for each individual comic, and once you load each page (it's basically appended to the current page) use DTA to download the images.
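
    For example (totally hypothetical, since I haven't looked at the actual markup), if the "next" button on a comic is an <a> tag wrapped around an image whose alt text is "Next", the link XPath for the rule would be something like //a[img[@alt='Next']].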

    Barrakketh on
    Rollers are red, chargers are blue....omae wa mou shindeiru
  • MagicToaster Japan Registered User regular
    edited August 2009
    Wouldn't that eat up a lot of the web page's bandwidth?

    MagicToaster on
  • Barrakketh Registered User regular
    edited August 2009
    MagicToaster wrote: »
    Wouldn't that eat up a lot of the web page's bandwidth?
    The same amount as doing things manually, just over a shorter period of time. You can narrow down what sections to load and what to omit via XPath (just like how you select the link). The penny-arcade.com comic page is 9KB, so if you just say that they started in 1998 and they've been going at it for about 9 years while maintaining an output of three comics a week, that should come out to 12.3 megabytes of plain HTML (which should be compressed, so in reality that number will be lower for bandwidth purposes).
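
    (Rough math with those same assumptions: 9 years x 52 weeks x 3 comics is about 1,400 pages, and 1,400 pages x 9 KB comes to roughly 12,600 KB, or about 12.3 MB, before compression.)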

    Then add up all the images. You should really only allow DTA to download one image at a time to be polite. If it has a bandwidth limiter then I'd use that too and just be patient.

    Barrakketh on
    Rollers are red, chargers are blue....omae wa mou shindeiru
  • Jasconius sword criminal mad online Registered User regular
    edited August 2009
    You scraping a site for images is not going to kill the server unless they are hosted on Tripod or something.

    Jasconius on
    this is a discord of mostly PA people interested in fighting games: https://discord.gg/DZWa97d5rz

    we also talk about other random shit and clown upon each other
  • ascannerlightly Registered User regular
    edited August 2009
    Jasconius wrote: »
    You scraping a site for images is not going to kill the server unless they are hosted on Tripod or something.
    i <3 geocities

    ascannerlightly on
  • PracticalProblemSolver Registered User regular
    edited August 2009
    A simple *insert favorite scripting language here* script combined with wget would handle it much better than doing anything by hand. You just need to figure out how the page is written or how the images are named; if you can figure out the image naming scheme, it's best to skip the page loading and grab the images directly.
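
    If you do have to walk the pages, something like this is the general idea in Python (untested and off the top of my head; the starting URL and the regexes are placeholders you'd adjust to the actual site's markup):

    import re
    import subprocess
    from urllib.parse import urljoin

    # Placeholder starting point; use the real URL of the first comic you want.
    url = 'http://www.example-comic.com/comic/first/'

    while url:
        # Let wget fetch the page HTML and hand it back on stdout.
        html = subprocess.check_output(['wget', '-q', '-O', '-', url]).decode('utf-8', 'replace')

        # Made-up pattern for the strip image; adjust it to the real markup.
        img = re.search(r'<img[^>]+src="([^"]+/comics/[^"]+)"', html)
        if img:
            # Save the image into ./comics/ with wget.
            subprocess.call(['wget', '-q', '-P', 'comics', urljoin(url, img.group(1))])

        # Made-up pattern for the "next comic" link; stop when there isn't one.
        nxt = re.search(r'<a href="([^"]+)"[^>]*>\s*Next', html)
        url = urljoin(url, nxt.group(1)) if nxt else None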

    Actually, here's a program to do it for you, with 945 supported comics and the ability to define custom ones: http://collector.skumleren.net/supported_comics.php?version=devel

    PracticalProblemSolver on
  • kathos Registered User regular
    edited August 2009
    Yeah downloading all those delicious cake pictures all at once into one folder really helps out a lot ;).

    Kekekekekeke.

    kathos on
  • Awk Registered User regular
    edited August 2009
    after ~10 years they're closing my geocities account! ;(

    the internets are changing!

    Awk on
  • Æthelred Registered User regular
    edited August 2009
    You're going to need some sort of macro software. I would recommend AutoHotKey, which I know for sure could do what you want with a little scripting, but you're on a Mac. Try QuicKeys, Keyboard Maestro, or HotApp, although I haven't used any of them myself.

    Also, if it's a popular webcomic you're after, search for a torrent of it. I found ones for Penny-Arcade and just downloaded those a while ago.

    Æthelred on
    pokes: 1505 8032 8399
  • JNighthawk Registered User regular
    edited August 2009
    http://www.httrack.com/ - lets you download a full copy of a website.

    JNighthawk on
    Game programmer
  • Ethea Registered User regular
    edited August 2009
    This is pretty easy using Python or Perl, since the majority of webcomics index the images based on the day they were posted. So you just keep changing the image request based on the day you want. This lets you grab all the images faster.
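
    Roughly like this (untested; the URL pattern and start date are made up, so check how the real images are named and when the comic actually started):

    import urllib.error
    import urllib.request
    from datetime import date, timedelta

    # Hypothetical naming scheme; look at a real strip's image URL and adjust.
    pattern = 'http://www.example-comic.com/comics/%Y%m%d.jpg'

    day = date(1998, 1, 1)  # or whenever the comic started
    while day <= date.today():
        url = day.strftime(pattern)
        try:
            urllib.request.urlretrieve(url, url.rsplit('/', 1)[-1])
        except urllib.error.HTTPError:
            pass  # no comic posted that day
        day += timedelta(days=1)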

    Ethea on
  • hoodie13 punch bro Registered User regular
    edited August 2009
    Barrakketh wrote: »
    A combination of DownThemAll and AutoPager will probably do the job just fine. AutoPager is user-extensible so you can create rules for each individual comic, and once you load each page (it's basically appended to the current page) use DTA to download the images.

    Thanks a ton! This suggestion worked wonders. It took a little bit of effort to get AutoPager to work, but once I did, the process worked like a dream.

    Thanks a lot, guys!

    hoodie13 on
    PSN: HoodieThirteen
    XBL: Torn Hoodie
    @hoodiethirteen