
Python gurus unite! (to help me)

clsCorwin Registered User regular
edited May 2009 in Help / Advice Forum
Just a quick question. I need to process some huge data files and tag each paragraph with a numeric label. The label is just sequential order, starting from 44487.

What I tried was using readline() in a while loop and searching each line for a regex defined as "\n\n". However, it's not finding anything, which in retrospect makes sense, because readline() only ever returns up to one \n. So, uhh, how should I do this?

I mean, I could easily detect it just doing read() and using re.sub on the whole thing, but then I lose my loop for my counter being incremented.
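
For what it's worth, the counter does not have to be lost with the read() approach: re.sub accepts a function as the replacement, and that function can pull the next number each time it is called. A minimal sketch, assuming the whole file fits in memory; the name label_paragraphs and the "number, then a space" output format are made up for illustration:

```python
import re
from itertools import count

def label_paragraphs(text, start=44487):
    """Prefix every blank-line-separated paragraph with a sequential number.

    Hypothetical helper: the name and the "<number> <text>" format are assumptions.
    """
    counter = count(start)

    def add_label(match):
        # Keep whatever preceded the paragraph ("" or "\n\n"), then add the label.
        return match.group(1) + str(next(counter)) + " "

    # A paragraph starts at the beginning of the text or right after a blank
    # line, and must be followed by a character that is not a newline.
    return re.sub(r"(\A|\n\n)(?=[^\n])", add_label, text)
```

Calling label_paragraphs(open('input.txt').read()) would return the labeled text as one string, which can then be written out in one go.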

So yea, help me see what I should have seen yesterday.


Posts

  • Halgwet Registered User regular
    edited May 2009
    Maybe I'm misinterpreting your problem, but couldn't you just do readline and if the line you get is just '\n' then you've got an empty line and thus a break between paragraphs?
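
    Roughly, that idea could look like the sketch below. It keeps the readline() loop and treats a line that is exactly "\n" as the paragraph break. The function name label_with_readline and the flag name are invented for the example, and it assumes blank lines contain nothing but the newline (no stray spaces).

```python
import io

def label_with_readline(in_file, out_file, start=44487):
    """Copy in_file to out_file, numbering the first line of each paragraph.

    Hypothetical helper; returns the next unused label.
    """
    label = start
    at_paragraph_start = True  # the first paragraph has no blank line before it
    while True:
        line = in_file.readline()
        if line == "":  # readline() returns an empty string only at end of file
            break
        if line == "\n":  # blank line: the next non-blank line opens a paragraph
            at_paragraph_start = True
        elif at_paragraph_start:
            out_file.write(str(label) + " ")
            label += 1
            at_paragraph_start = False
        out_file.write(line)
    return label

# Quick check with in-memory files instead of real ones.
src = io.StringIO("first line\nsecond line\n\nnext paragraph\n")
dst = io.StringIO()
next_label = label_with_readline(src, dst)
```

    The same function should work with real file objects from open(), since it only relies on readline() and write().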

  • Doc Registered User, ClubPA regular
    edited May 2009
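    in_file = open('input.txt', 'r')
    out_file = open('output.txt', 'w')  # 'rw' is not a valid mode; open for writing

    i = 44487
    for line in in_file:
        out_file.write(str(i) + " " + line)
        i += 1  # move on to the next label for the next line

    in_file.close()
    out_file.close()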
    

    How is that not right? If the paragraphs are split up by "\n", then each paragraph is actually its own line, right? Just number the lines.

    Or is it "\n" on a line by itself?

  • clsCorwin Registered User regular
    edited May 2009
    Let me clarify, since I wasn't so clear.

    Right now, my data is formatted 80 characters per line, followed by a \n. So paragraphs have 2 \ns separating them, since there is a blank line in between.

  • Doc Registered User, ClubPA regular
    edited May 2009
    clsCorwin wrote: »
    Let me clarify, since I wasn't so clear.

    Right now, my data is formatted 80 characters per line, followed by a \n. So paragraphs have 2 \ns separating them, since there is a blank line in between.
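    in_file = open('input.txt', 'r')
    out_file = open('output.txt', 'w')  # 'rw' is not a valid mode; open for writing

    i = 44487
    new_paragraph = True  # the first paragraph has no blank line before it
    for line in in_file:
        if line == "\n":  # == compares; a single = is a syntax error here
            new_paragraph = True
        elif new_paragraph:
            # first line of a paragraph: write the label in front of it
            out_file.write(str(i) + " ")
            i += 1
            new_paragraph = False
        out_file.write(line)

    in_file.close()
    out_file.close()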
    
