Curse you, Regular Expressions!

SeñorAmor !!! Registered User regular
edited April 2008 in Help / Advice Forum
Here's a snippet of information I'm pulling from a device, within which I want to grab certain parts:
* Release
Model <model> 
Factory IP Address <factory ip>
Hardware T1r1.1.UU, 520 MHz, 128 MByte RAM 
Image Sensor and Lens b/w (F/2.0), color (F/2.0) 
Software <software> (2007-12-20) 

* Networking
BOOTP/DHCP off 
Zeroconf on 
Camera Name <name>
IP Address 10.10.10.3 
Network Mask 255.0.0.0 
Broadcast 10.255.255.255 
Link Local Address 169.254.200.49 
DNS Server 10.0.0.9 
Statistics Dropped: 0.0% Collisions: 0% 
  LEC: 0 SEC: 0 

* Routing
Default Route Gateway: 10.0.0.1 Connection: Ethernet interface  

* ISDN Dial-In
Camera MSN answer calls to every MSN 
Security PAP 
Login Name guest 
Camera IP Address <ip>

* System
Date and Time <date>
Current Uptime <uptime>

* Audio

This is exactly how the data is sent to me, linebreaks and all. I only want the parts within angle brackets. This is the regexp I was using (which isn't working perfectly):
//                   1                        2     3            4    5               6     7                 8                    9
$pattern = "/Model (.*)\nFactory IP Address (.*)\n(.*)Software (.*) (.*)Camera Name (.*)\n(.*)Date and Time (.*)\nCurrent Uptime (.*)\n/s";
//	1	=	Model #
//	2	=	Factory IP
//	3	= 	Filler
//	4	=	Firmware
//	5	=	Filler
//	6	=	Name
//	7	=	Filler
//	8	=	Date
//	9	=	Uptime

The problem is that the groups that sit immediately before a delimiter and are followed by another group (2, 4, 6, and 9) are too greedy and run past the boundary I want: 2, 6, and 9 should stop at the newline (\n), and 4 should stop at the space. Matches 1 and 8 work fine.

I'm sure there's something wrong with my pattern, but I'm not versed enough in regexps to spot the issue. Can anyone help?

Thanks in advance.


Posts

  • Vrtra Theory Registered User regular
    edited April 2008
    Try changing all of your (.*) to (.*?).

    The question mark means "non-greedy", which I think is exactly what you need.
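
To see the difference on a small scale, here is a minimal sketch in Python rather than the language of the original pattern (lazy quantifiers behave the same way in Perl-style regex engines). The sample text and its values are made up, and `re.DOTALL` stands in for the `/s` modifier.

```python
import re

# Shortened stand-in for the device output; all values are invented.
text = (
    "Model M12\n"
    "Factory IP Address 10.1.2.3\n"
    "Hardware T1r1.1.UU, 520 MHz, 128 MByte RAM\n"
    "Software MX-V1.0 (2007-12-20)\n"
    "Camera Name cam1\n"
    "IP Address 10.10.10.3\n"
)

# With DOTALL, "." also matches newlines, so a greedy (.*) can run
# across several lines before it backtracks to a later delimiter.
greedy = re.compile(r"Factory IP Address (.*)\n(.*)Software (.*) ", re.DOTALL)

# (.*?) expands one character at a time and stops at the first
# delimiter that lets the rest of the pattern match.
lazy = re.compile(r"Factory IP Address (.*?)\n(.*?)Software (.*?) ", re.DOTALL)

greedy_match = greedy.search(text)
lazy_match = lazy.search(text)

print(repr(greedy_match.group(1)))  # spans more than one line
print(repr(lazy_match.group(1)))    # '10.1.2.3'
print(repr(lazy_match.group(3)))    # 'MX-V1.0'
```

Another option under the same assumptions is a negated class such as `[^\n]*`, which cannot cross a line break regardless of greediness.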

  • Legionnaired Registered User regular
    edited April 2008
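    Since it looks like you're using Perl, you might try this alternate approach:
    # split the incoming text into individual lines
    @lines = split(/\n/, $incoming_lines);
    # line 1 (zero-based) is the "Model" line; capture what is inside the angle brackets
    $lines[1] =~ /<(.*)>/o;
    $model = $1;
    (...)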

    Since the running time for a Perl regex is roughly O(<regex_length> * <text_length>), cutting down on both of those is very helpful.

    Using the /o flag tells Perl to build the finite state automaton behind the match only once. That mainly matters when the pattern interpolates variables; a constant pattern like this one is compiled once either way.
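
The line-by-line idea can be sketched as follows, again in Python as an illustration rather than Perl. The `parse_report` function and the field names are hypothetical; the labels come from the sample in the first post. Compiling the pattern once with `re.compile` plays a role similar to building the matcher a single time.

```python
import re

# Labels as they appear in the sample; the field names are hypothetical.
LABELS = {
    "Model": "model",
    "Factory IP Address": "factory_ip",
    "Software": "firmware",
    "Camera Name": "name",
    "Date and Time": "date",
    "Current Uptime": "uptime",
}

# One short pattern, compiled once and reused for every line.
LINE_RE = re.compile(
    r"^(%s) (.*)$" % "|".join(re.escape(label) for label in LABELS)
)


def parse_report(text):
    """Return a dict of the wanted fields from the device report."""
    fields = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # section headers and lines we do not care about
        label, value = match.groups()
        if label == "Software":
            # Keep only the version, not the trailing "(date)" part.
            value = value.split(" ", 1)[0]
        # Keep the first occurrence of each label.
        fields.setdefault(LABELS[label], value)
    return fields
```

Because each line is matched on its own, a `.*` here can never run past a line break, so greediness stops being an issue.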
