My professor's class notes are just a bunch of handouts he has scanned, so they aren't text searchable. Is there software that can OCR this document and spit out the text? In case it matters, I have his permission to do this; he only has the notes in this format because a grad assistant put it all together years ago. Thanks!
(Oh, I tried googling, but was overwhelmed with hits that didn't give me what I needed. They were mostly OCR tools that output to PDF rather than plain text. Thanks!)
Plaintext? Hmm... Do you have MS Office? I know the last few versions include an OCR engine (I think since Office XP came out).
There aren't many free ones out there, sadly. That's because three companies own most of the patents for OCR and have basically brutalized the industry with intellectual property rights domination. All three of them charge unreasonable amounts of money to license their OCR technology for use in products.
If you can find freeware, it'll likely be a project by a group of people who are trying to come up with an alternative OCR technology that won't cost as much. All power to them... but as it stands, most reliable OCR comes from those three companies. It's sad too, because many aspects of OCR are relatively obvious from a computer science standpoint, and they probably should never have been able to get patents on them. But I digress. It's a sucky situation.
Okay, here's the deal: SimpleOCR (http://www.simpleocr.com/) is the only OCR engine I've seen that's royalty free (therefore, free to use). I don't know how reliable it is, and I don't know if it can OCR PDF files directly, but you should be able to find something that'll convert your PDF to another format for free fairly easily. Once you have it converted to something like a TIFF image, you should be able to run it through this, I imagine.
Give it a shot. I can't guarantee it'll work well (or at all), but they say it's free. According to the reviews on download.com, it's good at extracting text but terrible at extracting formatting information. Anyways, it'll probably be the best you can get legally these days.
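For anyone who'd rather script the "convert the PDF to TIFF, then OCR it" route, here's a rough sketch. Big assumptions: it uses Ghostscript (`gs`) for the conversion and the open-source Tesseract command-line tool for the OCR step, neither of which is the tool linked above, and both have to be installed and on your PATH. The function names and the `page-%03d.tif` naming are made up for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(pdf_path, work_dir, dpi=300):
    """Build the command lines for a PDF -> TIFF -> plain text pipeline.

    Assumption: Ghostscript and Tesseract stand in for "a converter" and
    "an OCR engine"; swap in whatever tools you actually have.
    """
    work = Path(work_dir)
    # Ghostscript writes one black-and-white TIFF per page:
    # page-001.tif, page-002.tif, ...
    convert_cmd = [
        "gs", "-dNOPAUSE", "-dBATCH", "-sDEVICE=tiffg4", f"-r{dpi}",
        f"-sOutputFile={work / 'page-%03d.tif'}", str(pdf_path),
    ]

    def ocr_cmd(tiff_path):
        # `tesseract IMAGE OUTBASE` writes the recognized text to OUTBASE.txt.
        tiff = Path(tiff_path)
        return ["tesseract", str(tiff), str(tiff.with_suffix(""))]

    return convert_cmd, ocr_cmd


def pdf_to_text(pdf_path, work_dir):
    """Run the pipeline and return the text of all pages joined together."""
    for tool in ("gs", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    convert_cmd, ocr_cmd = build_commands(pdf_path, work)
    subprocess.run(convert_cmd, check=True)
    pages = []
    for tiff in sorted(work.glob("page-*.tif")):
        subprocess.run(ocr_cmd(tiff), check=True)
        pages.append(tiff.with_suffix(".txt").read_text(encoding="utf-8"))
    return "\n".join(pages)
```

Something like `pdf_to_text("notes.pdf", "out")` would then hand back one big string of plain text. On Windows the Ghostscript executable usually has a different name, so adjust the command accordingly.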
OCR's better than it used to be, but it's still pretty terrible. If you can type at any speed at all, you're probably better off just transcribing the stuff. It gives you and the prof a chance to review and improve the notes as you go, too.
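If you want to put a number on "pretty terrible" before deciding whether to retype, one option is to hand-type a single page, OCR the same page, and compare the two. Below is a minimal sketch of a character error rate check using plain edit distance. The function names are my own, and this is just one common way to measure it, not something any of the tools in this thread report.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character inserts, deletes,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep if equal)
            ))
        prev = curr
    return prev[-1]


def char_error_rate(reference, ocr_output):
    """Edit distance divided by the length of the hand-typed reference."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, ocr_output) / len(reference)
```

A rate of a percent or two on a sample page probably means proofreading the OCR output beats retyping; a much higher rate on poor scans may tip it the other way.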
Ok this is kind of a late response, but you might find it useful for doing OCR: http://evernote.com/
It's an application that runs on multiple platforms that helps you capture images and then collect them on a site you can access from anywhere, plus it will do OCR on the text information in the image.
Acrobat 8 has an OCR scan built right in, and it's pretty fucking good. I don't think the Reader has it built in though, so you'd have to go to a full version of Acrobat.
And I disagree about retyping it. Even using a scanner and OCR software, you can still do a 100 or 200 page document fairly reliably in under an hour. It's time consuming, but nowhere near as time consuming as retyping 200 pages of text. With Acrobat, it's basically unattended. Yeah, it will miss a letter here or there (depending on how poor the quality of the scan is), and the resulting PDF will be huge, but it will definitely be searchable.
Alternatively you could go for something like Abbyy FineReader, which is 400 dollars and doesn't do that great a job. Or SEC Publisher, which is like fuckgodly expensive. It does OK.
Like others said, there isn't a lot of decent OCR software out there. Most of what is out there is mediocre. I prefer the engine in Acrobat, and since you're in school, you can probably get a student licence fairly cheap.
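Whichever engine you end up with, once the notes exist as plain text the "searchable" part is easy. Here's a small sketch, assuming you've ended up with one string of text per page (the `search_pages` name and the list-of-strings layout are just for illustration):

```python
def search_pages(pages, term):
    """Return (page_number, line) pairs for every line containing term.

    pages is a list of strings, one per page; matching ignores case.
    """
    needle = term.casefold()
    hits = []
    for page_no, text in enumerate(pages, start=1):
        for line in text.splitlines():
            if needle in line.casefold():
                hits.append((page_no, line.strip()))
    return hits
```

Keep in mind that OCR misses will also cause search misses, so a word the engine misread won't turn up unless you search for the misreading too.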
http://www.simpleocr.com/