So, my job is to head the scanning team for a state court. We take all the legal documents and scan them into the database, for attorneys and the public to access.
The problem comes when we save a 2000+ page document: the process slows to a crawl, and the more pages there are, the slower it saves. Obviously more pages will take longer, but not like this. At 200-500 pages it saves about 40 pages a second, at 2000 it's about 1 page a second, and at 6000 it drops to around one page every two seconds, which means a document that large takes hours to save.
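To put rough numbers on those rates, here is a quick back-of-the-envelope sketch. It assumes each quoted rate holds for the entire save, which probably overstates the time a little since the first pages of a big job presumably go faster. The function name is made up for illustration.

```python
# Rough save-time estimates from the rates quoted in the post.
# Assumption: the quoted rate applies to every page of the job.
observed = [
    (500, 40.0),   # (pages, pages saved per second)
    (2000, 1.0),
    (6000, 0.5),
]

def save_seconds(pages, pages_per_second):
    """Total save time in seconds at a constant rate."""
    return pages / pages_per_second

for pages, rate in observed:
    secs = save_seconds(pages, rate)
    print(f"{pages:>5} pages at {rate:>4} pages/s: {secs:>8.1f} s ({secs / 3600:.2f} h)")
```

That works out to about 12.5 seconds for 500 pages, about 33 minutes for 2000, and about 3.3 hours for 6000. Tripling the page count from 2000 to 6000 multiplies the time by six, so the cost per page seems to grow with the size of the job rather than staying flat.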
The courts don't quite have the budget to get us new equipment, and the tech guys have tried everything possible with hardware short of actually changing the processors themselves (which I think may be part of the problem). It's not the connection choking, as we have a 1 Gbit connection from each computer to the servers, and from what I'm told it's definitely not a server-side problem. The files are saved one per image, not all merged into one, and we are running on Dell Optiplex 755s with Kodak i610 scanners.
My question, I suppose, is: would setting memory usage to prioritize the system cache, or turning off Windows timestamping, have any effect? (I'm assuming the files are stored locally on the scanning computers until saved.) Any other ideas?
If you want to find the cause of the problem, you need to do basic troubleshooting: split the problem in half over and over until you're done. First, try removing the network and the server from the equation, as that is the easiest. Do the scan, have it save to your local PC, and see if there is a speed difference. If available, try it on an XP PC and a Vista PC and see if you can tell a difference. Vista has a lot of issues copying large files over the network.
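One way to run that local-versus-network comparison without involving the scanning software at all is a small timing script like the sketch below. It writes dummy files, one per "page", and reports the write rate for each batch, so you can see whether the rate drops as the folder fills up. The function name, the file naming, and the 50 KB file size are all assumptions for illustration; real scanned images may be much larger. It only tests the disk or network path, not the vendor's program.

```python
import os
import time

def time_batches(target_dir, total_files=2000, batch=500, size_bytes=50_000):
    """Write dummy files into target_dir and return files/second per batch."""
    payload = os.urandom(size_bytes)  # stand-in for one scanned image
    rates = []
    start = time.perf_counter()
    for i in range(1, total_files + 1):
        path = os.path.join(target_dir, f"page_{i:06d}.bin")
        with open(path, "wb") as f:
            f.write(payload)
        if i % batch == 0:
            elapsed = time.perf_counter() - start
            # Guard against a zero reading on very fast disks.
            rates.append(batch / elapsed if elapsed > 0 else float("inf"))
            start = time.perf_counter()
    return rates
```

Run it once against an empty folder on the local disk and once against an empty folder on the server share, for example `print(time_batches(r"C:\scantest"))`, where the path is just a placeholder. If the rates stay flat in both places, the slowdown is more likely inside the application than in the disk or the network.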
Is there no procedural way to fix this? Break up the 2000-page document into four 500-page jobs and then combine them server-side?
Also, does the system OCR the pages when they are scanned, when they are saved, or after they have been archived? Generally I set our system (we use a similar style of system for city documents) to OCR when the documents are scanned, but when a large job comes up, I turn off OCR until the pages have been archived, because with 1000+ pages it could take all night.
It seems to me it's a problem of asking too much at once.
In short: Never work for the NYS Court system.
Edit: No character recognition systems are used; legal documents have to be exact copies of the original.
Basically it's your only option. If you can't throw hardware at it, you gotta find some way to slice the job into segments the available hardware can handle.
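For what it's worth, working out the page ranges for that kind of split is simple. The sketch below is only an illustration of the 500-page idea suggested earlier in the thread; the function name is made up, and whether the scanning software lets you save and recombine ranges like this is an open question.

```python
def split_into_batches(total_pages, batch_size=500):
    """Return (first_page, last_page) pairs, 1-based and inclusive."""
    if total_pages < 0 or batch_size <= 0:
        raise ValueError("total_pages must be >= 0 and batch_size must be > 0")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]

for first, last in split_into_batches(2000):
    print(f"job: pages {first}-{last}")
```

For a 2000-page document this gives pages 1-500, 501-1000, 1001-1500 and 1501-2000. If each 500-page job really does save at the 40 pages a second quoted in the original post, that is about 12.5 seconds per job, though that assumes the slowdown depends on the size of the job and not on how many files are already on the server.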
The problem with altering the program itself is that it was written and supported by MDY Advanced Technologies, who were absorbed by another company who are refusing to support our Clerks minutes. From what the IT guys have told me, they haven't been able to get the source code either so that the state programmers can tamper with the program.
And then you get 6 months of meetings talking about the problem and how to prevent it from happening again.
Isn't it awesome?! Like when our payroll/finance server decided to take a dive on a Friday afternoon.. Woo!
We have a $200,000 bag printer in our warehouse that's been sitting in its box for 2 years.
This bag printer is the kind where you actually print directly onto industrial bags that store chemicals. We're talking 50 LB bags that hold industrial quality calcium carbonate, soda ash, etc. The system in use right now is that we just have generic bags, and print our own labels and stick them on. Well, 3 years ago someone in management just decided that they wanted this bag printer, and went and bought it without telling IT.
Now, this bag printer runs on some proprietary software, and can't get on the network and hook up with our existing databases where we keep the label information. So we were faced with either burning another $100k to get the company to convert our DB so the printer can use it, and going to them every time we needed to update it, or not using the machine.
They mulled over this decision for 6 months, and the company we bought the bag printer from went bankrupt. From what I understand, we were one of only 3 companies that actually bought the thing. And now it's useless. All because one manager decided that the system we have wasn't good enough, and saw this on a road trip.
I've seen a 2 million dollar tape robot sit unused because it was purchased during a management upheaval, and the new management didn't like the vendor it came from and prevented anyone from using it.
(it was eventually unloaded as scrap metal and a new 1.2 million dollar tape robot from an approved vendor replaced it)
Basically you're probably screwed; just deal with it, I guess. You get paid by the hour, right?
The other possible solution, though, would just be to get more machines. That way, if one is tied up saving a file, you can use another machine to keep the jobs moving. For the NYS Courts, I bet they have a pretty sweet licensing deal with Dell, where adding a few new machines (especially from the Optiplex 700 line) would probably only cost maybe 1000-2000 dollars for your department of 5 people, giving each person 2 machines.