
Slow saving speed (explanation inside)

JaedlynJaedlyn Registered User regular
So, my job is to head the scanning team for a state court. We take all the legal documents and scan them into the database, for attorneys and the public to access.

The problem comes when we save a 2000+ page document: the process slows to a crawl, and the more pages there are, the slower it saves. Obviously more pages are going to take longer, but not like this. At 200-500 pages it saves about 40 pages a second, at 2000 it's about 1 page a second, and at 6000 it drops to around a page every two seconds, which means a document that large takes hours to save.
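
For a sense of scale, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that a whole document saves at the single rate quoted for its size; the real rate presumably changes during the save, so treat the results as ballpark figures. The function names are made up for this example.

```python
# Naive estimate: total save time if the whole document saved at the quoted rate.
# The rates are the rough figures from the post above, not measurements.
observed = [
    (500, 40.0),   # (pages, pages per second)
    (2000, 1.0),
    (6000, 0.5),
]

def save_seconds(pages, pages_per_second):
    """Seconds needed to save `pages` at a constant rate."""
    return pages / pages_per_second

def format_duration(seconds):
    """Render a number of seconds as hours, minutes, and seconds."""
    hours, rem = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"

for pages, rate in observed:
    total = save_seconds(pages, rate)
    print(f"{pages:>5} pages at {rate:>4} pages/s -> {format_duration(total)}")
```

Under that assumption the 6000-page case comes out at well over three hours, which lines up with "taking hours to save".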

The courts don't quite have the budget to get us new equipment, and the tech guys have tried everything possible with the hardware short of actually changing the processors themselves (which I think may be part of the problem). It's not the connection choking, as we have a 1 Gb (gigabit) connection from each computer to the servers, and from what I'm told it's definitely not a server-side problem. The files are saved one per image, not all merged into one, and we are running Dell OptiPlex 755s with Kodak i610 scanners.

My question, I suppose, is: would setting memory usage to give priority to the system cache, or turning off Windows timestamping, have an effect? (I'm assuming the files are stored locally on the scanning computers until saved.) Any other ideas?

Some people are like slinkies, useless, but they always manage to bring a smile to your face when you...push them down a flight of stairs.
Jaedlyn on

Posts

  • wunderbarwunderbar What Have I Done? Registered User regular
    edited April 2009
    If you're saving large files over the network, especially ones that large, it's going to be slow. Have you considered saving the document to the local machine first, and then copying it to the network share afterwards?

    wunderbar on
    XBL: thewunderbar PSN: thewunderbar NNID: thewunderbar Steam: wunderbar87 Twitter: wunderbar
  • LuqLuq Registered User regular
    edited April 2009
    First guess would be a memory issue. Your IT guys probably already tried bumping you up to 2GB or even 3GB of RAM though. I have to mention it though as it is the most obvious route. How big is a 2000 page doc? In the past I've seen some scanners with default settings having each page be around a MB. That would be taking you to 2GB on a 2k page scan which is ridiculous.

    If you want to find the cause of the problem you need to do basic troubleshooting. Just split the problem in half over and over until you're done. First you can try removing the network and the server from the equation, as that is the easiest. Just do the scan, have it save to your local PC, and see if there is a speed difference. If available, try it on an XP PC and a Vista PC and see if you can tell a difference. Vista has a lot of issues copying large files over the network.

    Is there no procedural way to fix this? Break up the 2000-page document into four 500-page jobs and then combine them server side?
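
One way to try the "take the network out of the equation" step without touching the scanning program is to time plain file writes yourself. The sketch below is only an illustration: it writes dummy files, one per "page", and reports files per second. The function name, file sizes, and the share path in the comment are assumptions, not anything taken from the Clerks Minutes setup.

```python
import os
import tempfile
import time

def time_small_file_writes(target_dir, file_count=200, file_size=50_000):
    """Write file_count dummy files into target_dir and return files per second."""
    payload = b"\0" * file_size
    os.makedirs(target_dir, exist_ok=True)
    start = time.perf_counter()
    for i in range(file_count):
        path = os.path.join(target_dir, f"page_{i:05d}.bin")
        with open(path, "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # push the data out so caching does not hide the cost
    elapsed = time.perf_counter() - start
    return file_count / elapsed if elapsed > 0 else float("inf")

if __name__ == "__main__":
    # Local baseline in a throwaway temp directory.
    with tempfile.TemporaryDirectory() as local_dir:
        print("local:", round(time_small_file_writes(local_dir), 1), "files/s")
    # Hypothetical share path; swap in a location you can actually write to.
    # print("network:", round(time_small_file_writes(r"\\server\share\speedtest"), 1))
```

If the local and network numbers are close, the network is probably not the bottleneck; if the rate falls as file_count grows, that points at per-file overhead rather than raw bandwidth.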

    Luq on
    FFRK:jWwH RW:Onion Knight's Sage USB
  • JaedlynJaedlyn Registered User regular
    edited April 2009
    We bumped it from 2 to 4 (3 and change) gigs of RAM with no change at all for documents of any size. All documents, even pictures, are scanned in black and white, which drastically reduces the size of the individual images. The scanning is done through a special interface, the Clerks Minutes, which was designed and programmed specifically for editing and accessing these documents. There is no other way for the clients to interface with the servers in question, at the moment at least. The only network drive I have any type of access to is a 1TB drive with various install programs, forms, and inventories. The documents are legal documents of various types: some are public, others are private, and others sealed. Even if we could get access for that type of thing, the budget for the extra people needed to manually copy the documents to the directories is out of the question. I'm limited to our side of things, or at most the local servers here connecting to the main servers.

    Jaedlyn on
  • ArcSynArcSyn Registered User regular
    edited April 2009
    Unfortunately, it sounds like it's a software problem with the custom program that is being used. It's probably designed to handle multi-hundred page documents, but not so much multiple thousands.

    Also, does the system OCR the pages when they are scanned, when they are saved, or after they have been archived? Generally I set our system (we use a similar style of system for city documents) to OCR when the documents are scanned, but when a large job comes up, I turn off OCR until the pages have been archived, because with 1000+ pages it could take all night.

    ArcSyn on
  • xzzyxzzy Registered User regular
    edited April 2009
    Can you split the job up? Make it four 1000-page documents, instead of one 4000-page one?

    It seems to me it's a problem of asking too much at once.
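
As a small sketch of what splitting would look like in practice, this Python helper turns a page count into batch ranges. The function name and the 1000-page batch size are just illustrative choices; the right size would have to be found by trying a few.

```python
def split_into_batches(total_pages, batch_size):
    """Return (first_page, last_page) ranges, 1-indexed and inclusive."""
    if total_pages < 0 or batch_size <= 0:
        raise ValueError("total_pages must be >= 0 and batch_size must be > 0")
    return [
        (start, min(start + batch_size - 1, total_pages))
        for start in range(1, total_pages + 1, batch_size)
    ]

if __name__ == "__main__":
    for first, last in split_into_batches(4000, 1000):
        print(f"scan and save pages {first}-{last} as one entry")
```

A page count that is not a multiple of the batch size simply ends with a shorter final batch.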

    xzzy on
  • JaedlynJaedlyn Registered User regular
    edited April 2009
    I'm actually in a bit of a war with the people who make the entries for the stuff to be scanned to get them to do exactly that. Splitting the pages up would be an immense help, but since the courts here do everything bass-ackwards, we don't have access to create, edit, or delete the actual entries themselves, only to edit the contents of an entry. (Each document is one "entry" in the minutes.) The people working here are the laziest bunch of slackers imaginable, and asking them to take the time to copy-paste entry data into two or more sections will get me in trouble for making a reasonable suggestion. I have four people to take apart, scan, and put back together tens of thousands of pages of documents a day, and the department that does the entering has maybe...10 or 15 people, lazy bastards.

    In short: Never work for the NYS Court system.


    Edit: No character recognition systems used, legal documents have to be exact copies of the original.

    Jaedlyn on
  • ArcSynArcSyn Registered User regular
    edited April 2009
    Ah, bureaucracy. :D I know how you feel. Unfortunately, there's probably nothing to solve the problem unless you can talk directly with the software people about it. See if they know the limits of the software or perhaps some tweaks to speed things along.

    ArcSyn on
  • xzzyxzzy Registered User regular
    edited April 2009
    Yeah, if you can't modify the stuff you're inputting, maybe you can get whoever wrote the program to save checkpoints, to reduce the work it has to do in one go.

    Basically it's your only option.. if you can't throw hardware at it, you gotta find some way to slice the job into segments the available hardware can handle.

    xzzy on
  • KrikeeKrikee Registered User regular
    edited April 2009
    So you are scanning these on the OptiPlexes (what OS?) and piping them to the server via what service? NFS? SMB (Windows file sharing)? Something else?

    Krikee on
  • JaedlynJaedlyn Registered User regular
    edited April 2009
    It's a customized XP Pro 32-bit build. To my knowledge, the Minutes program is the only way to access the data on the server. The program itself handles the editing, saving, and retrieving; if it's using another service to send/retrieve, it masks it well.

    The problem with altering the program itself is that it was written and supported by MDY Advanced Technologies, which was absorbed by another company that is refusing to support our Clerks Minutes. From what the IT guys have told me, they haven't been able to get the source code either, so the state programmers can't tamper with the program.

    Jaedlyn on
  • ArcSynArcSyn Registered User regular
    edited April 2009
    Wow. If I were in charge of IT I would be scrambling to get that thing converted to a new system ASAP. Especially if it's a proprietary database.

    ArcSyn on
  • xzzyxzzy Registered User regular
    edited April 2009
    That's the fun of bureaucracy.. there's no scramble until there's a crisis.

    And then you get 6 months of meetings talking about the problem and how to prevent it from happening again.

    xzzy on
  • ArcSynArcSyn Registered User regular
    edited April 2009
    xzzy wrote: »
    That's the fun of bureaucracy.. there's no scramble until there's a crisis.

    And then you get 6 months of meetings talking about the problem and how to prevent it from happening again.

    :D Isn't it awesome?! Like when our payroll/finance server decided to take a dive on a Friday afternoon.. Woo!

    ArcSyn on
  • wunderbarwunderbar What Have I Done? Registered User regular
    edited April 2009
    Bureaucracy (fuck, that word is a bitch to spell) is not limited to government.

    We have a $200,000 bag printer in our warehouse that's been sitting in its box for 2 years.

    This bag printer is the kind where you actually print directly onto industrial bags that store chemicals. We're talking 50 lb bags that hold industrial quality calcium carbonate, soda ash, etc. The system in use right now is that we just have generic bags, and print our own labels and stick them on. Well, 3 years ago someone in management just decided that they wanted this bag printer, and went and bought it without telling IT.

    Now, this bag printer runs on some proprietary software, and can't get on the network and hook up with our existing databases where we keep the label information. So we were faced with either burning another $100k to get the company to convert our DB so the printer could use it, and going to them every time we needed to update it, or not using the machine.

    They mulled over this decision for 6 months, and the company we bought the bag printer from went bankrupt. From what I understand we were one of only 3 companies that actually bought the thing. And now it's useless. All because one manager decided that the system we have wasn't good enough, and saw this on a road trip.

    wunderbar on
  • xzzyxzzy Registered User regular
    edited April 2009
    Pfft, $200,000 is peanuts.

    I've seen a 2 million dollar tape robot sit unused because it was purchased during a management upheaval, and the new management didn't like the vendor it came from and prevented anyone from using it.

    (it was eventually unloaded as scrap metal and a new 1.2 million dollar tape robot from an approved vendor replaced it)

    xzzy on
  • KrikeeKrikee Registered User regular
    edited April 2009
    Jaedlyn wrote: »
    Its a customized Xp pro 32 bit build, to my knowledge, the minutes program is the only way to access the data across the server, the program itself handles the editing, saving and retrieving, if its using another service to send/retrieve it masks it well.

    The problem with altering the program itself is that it was written and supported by MDY Advanced Technologies, who were absorbed by another company who are refusing to support our Clerks minutes. From what the IT guys have told me, they haven't been able to get the source code either so that the state programmers can tamper with the program.
    Run 'perfmon.msc', then do a scan while logging counters for RAM, disk, network, and processor usage in perfmon, and take a look to see what's maxing out. If none of your hardware is maxing out, then it is a software issue.
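
If the counters are logged to a CSV file, a few lines of Python can show which one is pegged. This is only a sketch under stated assumptions: it expects the first column to be a timestamp and every other column to be one counter, and the column names and sample values in the demo are invented to show the output shape, not real measurements.

```python
import csv
import io

def summarize_counter_log(csv_text):
    """Return {counter_name: (mean, peak)} for each numeric column in the log."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    samples = {name: [] for name in header[1:]}  # column 0 is the timestamp
    for row in reader:
        for name, cell in zip(header[1:], row[1:]):
            try:
                samples[name].append(float(cell))
            except ValueError:
                pass  # blank or non-numeric sample, skip it
    return {
        name: (sum(values) / len(values), max(values))
        for name, values in samples.items()
        if values
    }

if __name__ == "__main__":
    # Invented sample data, only to demonstrate the function.
    sample = (
        "Time,cpu_percent,disk_percent\n"
        '"10:00:00"," ","12.0"\n'
        '"10:00:01","98.5","14.0"\n'
        '"10:00:02","99.5","16.0"\n'
    )
    for name, (mean, peak) in summarize_counter_log(sample).items():
        print(f"{name}: mean {mean:.1f}, peak {peak:.1f}")
```

A counter that sits near its ceiling for the whole save is the likely bottleneck; if nothing does, that supports the software explanation.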

    Krikee on
  • ZeonZeon Registered User regular
    edited April 2009
    What format are you converting the documents into? Is it PDFs, or are they TIFFs, or something else entirely? If it's PDFs, the Acrobat Distiller (the program you're using is probably just a front-end for it) has problems distilling multi-thousand-page "scan" style PDFs. We have pretty beefy machines at work, and yeah, sure, for true PDFs it works awesome, but if we get scanned PDFs and have to try to redistill them (or the time I was tasked with turning the training binders into PDFs, using the image scanner...) it chokes like hell.

    Basically you're probably screwed, just deal with it I guess. You get paid by the hour, right?

    The other possible solution, though, would just be to get more machines; that way, if one is tied up saving the file, you can use another machine to keep the jobs moving. For the NYS Courts, I bet they have a pretty sweet licencing deal with Dell where adding a few new machines (especially the OptiPlex 700 line..) would probably only cost maybe 1000-2000 dollars for your department of 5 people, giving each person 2 machines.

    Zeon on
  • ArcSynArcSyn Registered User regular
    edited April 2009
    I doubt it's PDFs. Most document imaging software for archival purposes uses TIFF.

    ArcSyn on