Regular expression help

Dance Commander Registered User regular
edited January 2010 in Help / Advice Forum
I'm trying to write a regular expression that will identify consecutive lines in a data file that have the same date. The columns are tab-delimited. Here's an example of the data, followed by my regular expression:
1955	05/08/2004	129	0	75.1	68.6	71.8	90.5	227.	7.81	ENE	23.8	73.8	74.0	74.4	0.3	
1957	05/09/2004	130	0	79.9	68.4	74.5	87.6	484.	9.55	E	23.8	75.8	75.1	74.6	0
([0-9]{2}/[0-9]{2}/[0-9]{4}).*$\n[^\t]+\t\1
It matches the date, saves it via the capturing parentheses, matches the rest of the line, matches the newline, matches whatever is in the first column of the next line, and then looks for the backreferenced date: \1. The regexp works perfectly fine without the \1 in it, although then it obviously just matches every line except the last. As soon as I put the \1 in there, though, TextPad, which uses the POSIX engine, complains that the regexp isn't valid.
Anyone have some idea why in the heck this would be?
I'm also open to any comments on the quality/brevity of the regexp here--I'm still learning.
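
For comparison, here is a minimal sketch of the same pattern run through Python's `re` module instead of TextPad. The rows are shortened, made-up stand-ins for the data above (two of them share a date on purpose), and `re.MULTILINE` is an addition needed in Python so that `$` can match before a newline in the middle of the string. It only shows what the expression does in an engine that supports backreferences across lines; it says nothing about TextPad's engine.

```python
import re

# Shortened, hypothetical rows in the same tab-delimited layout as the data above.
data = (
    "1955\t05/08/2004\t129\t0\t75.1\n"
    "1956\t05/08/2004\t129\t0\t76.3\n"
    "1957\t05/09/2004\t130\t0\t79.9\n"
)

# Same pattern as in the post; MULTILINE lets `$` match at the end of each line.
pattern = re.compile(r"([0-9]{2}/[0-9]{2}/[0-9]{4}).*$\n[^\t]+\t\1", re.MULTILINE)

# Collect the date of every pair of adjacent lines that share one.
repeated_dates = [m.group(1) for m in pattern.finditer(data)]
print(repeated_dates)
```

Because the matches cannot overlap, a run of three lines with the same date would be reported once here, not twice.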

Posts

  • Vrtra Theory Registered User regular
    edited January 2010
    Have you tried testing a very simple regex with a back reference (the "\1")? I ask because POSIX only specifies back references for basic regular expressions, not extended ones, so an extended RE implementation can be POSIX-compliant without them, and many leave them out, AFAIK.

    If a simple expression like "(\d)\1" works fine, then I'm not sure what the problem is; your regex looks OK to me.
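
A quick way to run that sanity check outside the editor, sketched with Python's `re` module as a stand-in. The sample string is just a fragment of the data from the first post; this shows what the expression should do, not whether TextPad supports it.

```python
import re

# (\d)\1 matches any digit immediately followed by the same digit.
doubled = re.search(r"(\d)\1", "1955\t05/08/2004")
print(doubled.group(0) if doubled else "no match")
```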

  • localh77 Registered User regular
    edited January 2010
    Hmm, interesting. We should be able to figure this out. I'm not exactly a regexp expert, but what is the dollar sign in there for? When I take it out, it matches fine for me. And when I leave it in, it doesn't match, although I don't get an error.
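
One plausible explanation for that difference, sketched with Python's `re` module (an assumption: the post doesn't say which engine was used, and other engines may treat `$` differently). In Python's default mode `$` only matches at the very end of the input, so a `$` followed by a mid-string `\n` can never succeed unless multiline mode is on. The two sample rows are made up.

```python
import re

text = "1955\t05/08/2004\t129\n1956\t05/08/2004\t129\n"

with_dollar = r"([0-9]{2}/[0-9]{2}/[0-9]{4}).*$\n[^\t]+\t\1"
without_dollar = r"([0-9]{2}/[0-9]{2}/[0-9]{4}).*\n[^\t]+\t\1"

# By default `$` only matches at the very end of the string (or just before a
# trailing newline), so it cannot be followed by a newline in mid-string.
default_mode = re.search(with_dollar, text) is not None
# With MULTILINE, `$` matches at the end of every line.
multiline_mode = re.search(with_dollar, text, re.MULTILINE) is not None
# Dropping `$` works in either mode, because `.` stops at the newline anyway.
no_dollar = re.search(without_dollar, text) is not None

print(default_mode, multiline_mode, no_dollar)
```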

    Anyway, if you're still having problems, I would just re-work it to not use a backreference. Assuming that you're looping through a bunch of lines, something like this:
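    # Assumes $lines already holds the whole file as one string.
    my $previous_date = '';
    foreach my $line (split(/\n/, $lines))
    {
    	# Capture the date in the second tab-delimited column.
    	if ($line =~ /^[^\t]+\t([0-9]{2}\/[0-9]{2}\/[0-9]{4})/)
    	{
    		# Same date as the previous matching line?
    		if ($previous_date eq $1)
    		{
    			print "match: $1\n";
    		}
    		$previous_date = $1;
    	}
    }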
    

  • ronya Arrrrrf. the ivory tower's basement Registered User regular
    edited January 2010
    A wise sensei linked me this once.

  • Baron Dirigible Registered User regular
    edited January 2010
    Unless I'm missing something, your regex never actually matches the first column of the subsequent line. I don't have a copy of TextPad, but I tested the following regex using TextWrangler, and it worked fine:
    ^[\d]+\t([\d\/]+)\t.+\n.+?\t\1.+
    

    The one problem I can see with this implementation is that it will only match two successive lines. That could be enough for what you need, but it's a slow Sunday at work, so I'm going to see how feasible it is to match an arbitrary number of lines.

    [edit: behold!]
    ^([\d]+\t([\d\/]+)\t.+\n)(.+?\t\2.+\r)+
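
If the file can be processed with a script instead of an editor, runs of any length can also be found without a regex. This is a minimal sketch using Python's `itertools.groupby`, not the approach used in the thread; the rows and the `date_of` helper are made up for illustration.

```python
from itertools import groupby

# Hypothetical, shortened rows; the first three share a date.
rows = [
    "1955\t05/08/2004\t129",
    "1956\t05/08/2004\t129",
    "1957\t05/08/2004\t129",
    "1958\t05/09/2004\t130",
]

def date_of(row):
    # The date is the second tab-delimited column.
    return row.split("\t")[1]

# groupby only merges adjacent rows with equal keys, which is what we want here.
runs = []
for date, group in groupby(rows, key=date_of):
    ids = [row.split("\t")[0] for row in group]
    if len(ids) > 1:
        runs.append((date, ids))

print(runs)
```

Each entry in `runs` pairs a date with the first-column values of every consecutive row that carries it.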
    

  • TheGreat2nd Registered User regular
    edited January 2010
    ronya wrote: »
    A wise sensei linked me this once.

    oh. my. god.
    :^: :^:

  • Dance Commander Registered User regular
    edited January 2010
    I will have to take a look through this again when I'm at work on Tuesday. I think that V-Theory probably has it--I remember now trying to use other regular expressions with backreferences in the search string and having them fail similarly. It does support backreferences in the replacement string, oddly.
    So assuming that is the problem--and I will try some of these other expressions as well--can someone recommend a free PC text editor with better, hopefully more standard regexp support?

  • Dance Commander Registered User regular
    edited January 2010
    So, simple backreferences work ok, but as soon as you go past a newline the whole thing shits the bed. Can someone recommend a text editor with a better RE engine? I do an awful lot of find-and-replace on data files that is greatly sped up by regular expressions, so something fully featured would be a tremendous help.
