Just a quick question. I need to process some huge data files and tag each paragraph with a numeric label. The label is just sequential order, starting from 44487.
What I tried doing, was to use readline() in conjunction with a while look, and search for a regex defined as "\n\n" however, its not finding anything, which in retrospect I know because readline should only take in 1 \n. So, uhh, how should I do this?
I mean, I could easily detect it just doing read() and using re.sub on the whole thing, but then I lose my loop for my counter being incremented.
So yea, help me see what I should have seen yesterday.
Posts
How is that not right? If the paragraphs are split up by "\n", then each paragraph is actually its own line, right? Just number the lines.
Or is it "\n" on a line by itself?
Right now, my data is formatted 80 characters per line, followed by a \n. So paragraphs have 2 \ns separating them, since there is a blank line in between.
See how many books I've read so far in 2010