Just a quick question. I need to process some huge data files and tag each paragraph with a numeric label. The labels are simply sequential, starting from 44487.
What I tried was using readline() in a while loop and searching for a regex defined as "\n\n". However, it's not finding anything, which in retrospect makes sense, because readline() only ever returns one line ending in a single \n. So, uhh, how should I do this?
I mean, I could easily detect the breaks by doing read() and using re.sub on the whole thing, but then I lose the loop where my counter gets incremented.
So yeah, help me see what I should have seen yesterday.
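For what it's worth, the read() plus re.sub route does not have to give up the counter: re.sub accepts a function as the replacement, and that function can pull the next number from a running counter on every match. The following is only a sketch under a few assumptions: the whole file fits in memory, the text starts with a paragraph rather than a blank line, and the label is written as a number plus a space in front of the paragraph. The name label_paragraphs is made up for illustration.

```python
import itertools
import re


def label_paragraphs(text, start=44487):
    """Prefix each blank-line-separated paragraph with a sequential label."""
    counter = itertools.count(start)

    def add_label(match):
        # Keep the blank line(s) that were matched, then add the next label.
        return match.group(0) + str(next(counter)) + " "

    # The first paragraph has no blank line in front of it, so label it by hand.
    first_label = str(next(counter)) + " "
    # Two or more newlines followed by a non-newline character mark the start
    # of a new paragraph; trailing blank lines at the end are left alone.
    return first_label + re.sub(r"\n{2,}(?=[^\n])", add_label, text)
```

Calling label_paragraphs(open('input.txt').read()) would return the labeled text as one string. For files that really are too big to hold in memory, a line-by-line loop is probably the better fit.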
Maybe I'm misinterpreting your problem, but couldn't you just do readline and if the line you get is just '\n' then you've got an empty line and thus a break between paragraphs?
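in_file = open('input.txt', 'r')
out_file = open('output.txt', 'w')  # 'rw' is not a valid mode; open for writing
i = 44487
for line in in_file:
    # prefix every line with its sequential label
    out_file.write(str(i) + " " + line)
    i += 1
in_file.close()
out_file.close()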
How is that not right? If the paragraphs are split up by "\n", then each paragraph is actually its own line, right? Just number the lines.
Right now, my data is formatted 80 characters per line, followed by a \n. So paragraphs have 2 \ns separating them, since there is a blank line in between.
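in_file = open('input.txt', 'r')
out_file = open('output.txt', 'w')  # 'rw' is not a valid mode; open for writing
i = 44487
out_file.write(str(i) + " ")  # the first paragraph has no blank line before it
for line in in_file:
    if line == "\n":  # a blank line marks a paragraph break (== compares, = assigns)
        i += 1
        out_file.write(line + str(i) + " ")
    else:
        out_file.write(line)
in_file.close()
out_file.close()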
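A slightly more defensive take on the same blank-line idea, in case the files have runs of several blank lines or blank lines at the very end. This is a sketch, not a drop-in answer: the function names label_paragraph_lines and label_file are invented for illustration, and the "number, space, text" label format is an assumption. It reads one line at a time, so it should cope with huge files.

```python
def label_paragraph_lines(lines, start=44487):
    """Yield lines, with a label prepended to the first line of each paragraph."""
    label = start
    at_paragraph_start = True
    for line in lines:
        if line.strip() == "":
            # Blank line: pass it through; the next non-blank line opens a paragraph.
            at_paragraph_start = True
            yield line
        elif at_paragraph_start:
            yield str(label) + " " + line
            label += 1
            at_paragraph_start = False
        else:
            yield line


def label_file(in_path, out_path, start=44487):
    """Stream in_path to out_path, labeling paragraphs along the way."""
    with open(in_path, "r") as in_file, open(out_path, "w") as out_file:
        out_file.writelines(label_paragraph_lines(in_file, start))
```

Something like label_file('input.txt', 'output.txt') would then do the whole job. Because the counter only advances when a non-blank line follows a break, extra blank lines do not burn label numbers.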
How is that not right? If the paragraphs are split up by "\n", then each paragraph is actually its own line, right? Just number the lines.
Or is it "\n" on a line by itself?