I'm trying to write a regular expression that will identify consecutive lines in a data file that have the same date. The columns are tab-delimited. Here's an example of the data, followed by my regular expression:
1955 05/08/2004 129 0 75.1 68.6 71.8 90.5 227. 7.81 ENE 23.8 73.8 74.0 74.4 0.3
1957 05/09/2004 130 0 79.9 68.4 74.5 87.6 484. 9.55 E 23.8 75.8 75.1 74.6 0
([0-9]{2}/[0-9]{2}/[0-9]{4}).*$\n[^\t]+\t\1
It matches the date and saves it via the capturing parentheses, matches everything else on the line, matches the newline, matches whatever the first column of the next line is, and then looks for the backreferenced date: \1. Without the \1, this regexp works perfectly fine, although of course it then just matches every line except the last. As soon as I add the \1, though, TextPad, which uses the POSIX engine, complains that the regexp isn't valid.
Anyone have some idea why in the heck this would be?
I'm also open to any comments on the quality/brevity of the regexp here--I'm still learning.
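For what it's worth, POSIX defines back-references like \1 only for basic regular expressions (BRE); in the extended syntax (ERE), which the unescaped ( ) and {2} suggest, they are not specified, so that may be why TextPad rejects the pattern. To check the pattern's logic on its own, here is a sketch using Python's `re` module (a backtracking engine that does support back-references, not POSIX). The three rows are hypothetical, tab-delimited, with the date in the second column as in the sample. It also shows a slightly tighter variant: the `$` before `\n` is redundant, and anchoring the date to the second column stops a date-like string elsewhere on the line from being captured.

```python
import re

# Hypothetical sample rows (tab-delimited, date in the second column).
# The first two rows share a date; the third does not.
DATA = (
    "1955\t05/08/2004\t129\t0\t75.1\n"
    "1956\t05/08/2004\t130\t0\t79.9\n"
    "1957\t05/09/2004\t131\t0\t71.8\n"
)

# The pattern from the question, unchanged. MULTILINE makes $ match
# just before each newline, like a line-oriented editor does.
ORIGINAL = re.compile(r"([0-9]{2}/[0-9]{2}/[0-9]{4}).*$\n[^\t]+\t\1", re.MULTILINE)

# Tighter variant (an assumption about the column layout): anchor at the
# start of a line, require the date to be the second column on both lines,
# keep the first-column match from crossing a newline, and drop the $.
ANCHORED = re.compile(r"^[^\t\n]*\t(\d{2}/\d{2}/\d{4})\t.*\n[^\t\n]*\t\1\t", re.MULTILINE)


def repeated_dates(pattern, text):
    """Return the captured date for every non-overlapping two-line match."""
    return [m.group(1) for m in pattern.finditer(text)]


print(repeated_dates(ORIGINAL, DATA))  # ['05/08/2004']
print(repeated_dates(ANCHORED, DATA))  # ['05/08/2004']
```

Either way, a match consumes the second line's date, so three identical dates in a row still produce only one match.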
If a simple expression like "(\d)\1" works fine, then I'm not sure what the problem is; your regex looks OK to me.
Anyway, if you're still having problems, I would just re-work it to not use a backreference. Assuming that you're looping through a bunch of lines, something like this:
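A minimal sketch of that loop-based idea, assuming Python, a tab-delimited file, and the date in the second column as in the sample rows. It remembers the previous line's fields and reports each pair of consecutive lines whose dates are equal, so no back-reference is needed. `same_date_pairs` is a hypothetical helper name.

```python
def same_date_pairs(lines, date_col=1):
    """Yield (previous_fields, current_fields) for consecutive lines whose
    date column matches. Assumes tab-delimited lines; date_col=1 is the
    second column, as in the sample data."""
    previous = None
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= date_col:
            continue  # skip blank or short lines
        if previous is not None and fields[date_col] == previous[date_col]:
            yield previous, fields
        previous = fields


# Hypothetical sample lines; with a real file, pass the open file object instead.
sample = [
    "1955\t05/08/2004\t129\n",
    "1956\t05/08/2004\t130\n",
    "1957\t05/09/2004\t131\n",
]
for prev, cur in same_date_pairs(sample):
    print(prev[0], cur[0], cur[1])  # 1955 1956 05/08/2004
```

With a file on disk, something like `with open(path) as f: pairs = list(same_date_pairs(f))` works, where `path` is your data file.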
The one problem I can see with this implementation is that it only matches two successive lines. That may be enough for what you need, but it's a slow Sunday at work, so I'm going to see how feasible it is to match an arbitrary number of lines.
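One way to extend the back-reference idea to runs of any length is a repeated group: after the first line, keep matching lines that repeat the captured date. This is a sketch in Python's `re`, again assuming the date is the second tab-delimited column; `same_date_runs` is a hypothetical helper name. The non-capturing group `(?:...)` is a Perl/Python extension and is not part of POSIX, so TextPad may not accept it.

```python
import re

# One line whose second column is a date, followed by one or more lines
# that repeat the same date (via the back-reference \1).
RUN = re.compile(
    r"^[^\t\n]*\t(\d{2}/\d{2}/\d{4})\t.*(?:\n[^\t\n]*\t\1\t.*)+",
    re.MULTILINE,
)


def same_date_runs(text):
    """Return (date, number_of_lines) for each run of 2+ consecutive same-date lines."""
    return [(m.group(1), m.group(0).count("\n") + 1) for m in RUN.finditer(text)]


sample = (
    "1955\t05/08/2004\t129\n"
    "1956\t05/08/2004\t130\n"
    "1957\t05/08/2004\t131\n"
    "1958\t05/09/2004\t132\n"
)
print(same_date_runs(sample))  # [('05/08/2004', 3)]
```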
[edit:
behold!
oh. my. god.
I'm Jacob Wilson.
So, assuming that is the problem (and I will try some of these other expressions as well), can someone recommend a free PC text editor with better, hopefully more standard, regexp support?