
Where to begin

Datarape Registered User regular
edited February 2007 in Help / Advice Forum
Greetz!

I'm looking to begin a new project: a program that can read and save information from webpages (mainly Wikipedia).

The programs I've written in the past have never done anything of this sort. My goal is to start small: a program that can simply connect to various pages.

But eventually I want the program to pull information out of those pages. Obviously it will have to work through whatever controls the website provides, such as buttons and search boxes.

I have disposable machines for the task, just no guidance. Can any programmers out there offer some experience? Thanks for the aid.

Datarape on

Posts

  • Nerissa Registered User regular
    edited February 2007
    So what you want is a kind of automated browser?

    What platform(s) and language(s) do you have available? Are you willing to learn a new language, and if not, what languages do you already know? What kinds of projects have you done before?

    I'd start with building your own browser -- you should be able to find tutorials in a variety of places, depending on the language.
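
    For instance, in Python (just as an illustration -- the URL and the crawler name here are placeholders I made up), fetching a page takes only a few lines of the standard library:

        import urllib.request

        def fetch(url):
            # Some sites reject requests with no user agent, so send a simple one.
            # "MyLittleCrawler/0.1" is just a placeholder name.
            req = urllib.request.Request(url, headers={"User-Agent": "MyLittleCrawler/0.1"})
            with urllib.request.urlopen(req) as resp:
                return resp.read().decode("utf-8", errors="replace")

        page = fetch("https://en.wikipedia.org/wiki/Web_crawler")
        print(page[:300])  # first 300 characters of the raw HTML

    Once you can do that reliably, everything else is built on top of it.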

    In order to interact with the elements of the page, though, you would need to parse the HTML, find the buttons and so on, and send the same requests they do. If you're only looking at one specific set of pages whose format never changes (only the content), you might be able to build those assumptions into your code. But unless you have control of the page (in which case, I'm not sure what your purpose is), you can't count on the format staying the same.
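
    As a sketch of what "parse the HTML and find the buttons" might look like -- again assuming Python, using only its standard-library HTMLParser -- this collects each form's action URL and the names of its input fields, which is what you'd need to imitate a search box:

        from html.parser import HTMLParser

        class FormFinder(HTMLParser):
            # Collects each <form>'s action URL and the names of its <input> fields.
            def __init__(self):
                super().__init__()
                self.forms = []

            def handle_starttag(self, tag, attrs):
                attrs = dict(attrs)
                if tag == "form":
                    self.forms.append({"action": attrs.get("action"), "inputs": []})
                elif tag == "input" and self.forms:
                    self.forms[-1]["inputs"].append(attrs.get("name"))

        finder = FormFinder()
        finder.feed(page)   # 'page' as fetched in the earlier snippet
        for form in finder.forms:
            print(form["action"], form["inputs"])

    With the action URL and the input names in hand, "pressing the button" is just sending a request to that URL with those parameters filled in.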

    Nerissa on
  • Obs __BANNED USERS regular
    edited February 2007
    I think what you are looking to build first is a web crawler.

    Typically these are programs that follow hyperlinks from page to page, fetching and indexing whatever they find. What they can't find is the so-called deep web: pages that aren't linked from anywhere, which is why no search engine ever indexes them. The deep web is a very large place.

    But you can also use web crawlers to read specific known sites and index information from them. Just do a search for web crawlers.
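
    For a rough idea, here's a bare-bones crawler sketch in Python (the starting URL and the 10-page limit are arbitrary choices, and LinkFinder is just a name I picked): it keeps a queue of URLs, fetches each one, pulls out the links, and follows them, staying on the starting site:

        import urllib.request
        from html.parser import HTMLParser
        from urllib.parse import urljoin, urlparse

        class LinkFinder(HTMLParser):
            # Collects the href of every <a> tag on a page.
            def __init__(self):
                super().__init__()
                self.links = []

            def handle_starttag(self, tag, attrs):
                href = dict(attrs).get("href")
                if tag == "a" and href:
                    self.links.append(href)

        def crawl(start_url, limit=10):
            # Breadth-first crawl that never leaves the starting host.
            host = urlparse(start_url).netloc
            seen, queue = set(), [start_url]
            while queue and len(seen) < limit:
                url = queue.pop(0)
                if url in seen or urlparse(url).netloc != host:
                    continue
                seen.add(url)
                try:
                    req = urllib.request.Request(url, headers={"User-Agent": "MyLittleCrawler/0.1"})
                    page = urllib.request.urlopen(req).read().decode("utf-8", errors="replace")
                except Exception:
                    continue  # skip pages that fail to load or decode
                print(url)
                finder = LinkFinder()
                finder.feed(page)
                queue.extend(urljoin(url, link) for link in finder.links)

        crawl("https://en.wikipedia.org/wiki/Web_crawler")

    In practice you'd also want to respect the site's robots.txt and pause between requests so you don't hammer their servers.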

    Obs on