Curse you, Regular Expressions!

SeñorAmor · April 2008

Here's a snippet of information I'm pulling from a device, within which I want to grab certain parts:

* Release
Model <model> 
Factory IP Address <factory ip>
Hardware T1r1.1.UU, 520 MHz, 128 MByte RAM 
Image Sensor and Lens b/w (F/2.0), color (F/2.0) 
Software <software> (2007-12-20) 

* Networking
BOOTP/DHCP off 
Zeroconf on 
Camera Name <name>
IP Address 10.10.10.3 
Network Mask 255.0.0.0 
Broadcast 10.255.255.255 
Link Local Address 169.254.200.49 
DNS Server 10.0.0.9 
Statistics Dropped: 0.0% Collisions: 0% 
  LEC: 0 SEC: 0 

* Routing
Default Route Gateway: 10.0.0.1 Connection: Ethernet interface  

* ISDN Dial-In
Camera MSN answer calls to every MSN 
Security PAP 
Login Name guest 
Camera IP Address <ip>

* System
Date and Time <date>
Current Uptime <uptime>

* Audio

This is exactly how the data is sent to me, linebreaks and all. I only want the parts within angle brackets. This is the regexp I was using (which isn't working perfectly):

//                   1                        2     3            4    5               6     7                 8                    9
$pattern = "/Model (.*)\nFactory IP Address (.*)\n(.*)Software (.*) (.*)Camera Name (.*)\n(.*)Date and Time (.*)\nCurrent Uptime (.*)\n/s";
//	1	=	Model #
//	2	=	Factory IP
//	3	= 	Filler
//	4	=	Firmware
//	5	=	Filler
//	6	=	Name
//	7	=	Filler
//	8	=	Date
//	9	=	Uptime

The problem I'm having is that the captures that should end at a delimiter (2, 4, 6, and 9) are being excessively greedy and running past the boundary I want: 2, 6, and 9 should stop at the newline (\n), and 4 should stop at the space. Matches 1 and 8 work fine.

I'm sure there's something wrong with my pattern, but I'm not versed enough in regexps to spot the issue. Can anyone help?

Thanks in advance.

Vrtra Theory · April 2008

Try changing all of your (.*) to (.*?).

The question mark means "non-greedy", which I think is exactly what you need.
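
To see the difference, here is a minimal sketch in Python (used only for illustration; its re module accepts the same (.*?) syntax, and re.DOTALL stands in for the /s flag). The sample text and its values (M12, cam1, and so on) are made up, not real device output.

```python
import re

# Hypothetical sample: same layout as the device output, made-up values.
SAMPLE = (
    "* Release\n"
    "Model M12\n"
    "Factory IP Address 10.1.2.3\n"
    "Hardware T1r1.1.UU, 520 MHz, 128 MByte RAM\n"
    "Software V1.0 (2007-12-20)\n"
    "\n"
    "* Networking\n"
    "Camera Name cam1\n"
    "IP Address 10.10.10.3\n"
    "\n"
    "* System\n"
    "Date and Time 2008-04-01 12:00:00\n"
    "Current Uptime 3 days\n"
    "\n"
    "* Audio\n"
)

# The pattern from the first post, unchanged (greedy captures).
GREEDY = (
    r"Model (.*)\nFactory IP Address (.*)\n(.*)Software (.*) (.*)"
    r"Camera Name (.*)\n(.*)Date and Time (.*)\nCurrent Uptime (.*)\n"
)
# Same pattern with every capture made non-greedy.
LAZY = GREEDY.replace("(.*)", "(.*?)")

greedy = re.search(GREEDY, SAMPLE, re.DOTALL)
lazy = re.search(LAZY, SAMPLE, re.DOTALL)

# Greedy group 2 swallows the Hardware line as well; lazy stops at the first \n.
print(repr(greedy.group(2)))
print(repr(lazy.group(2)))

# Groups 1, 2, 4, 6, 8, 9 are the wanted fields; 3, 5, 7 are filler.
wanted = [lazy.group(i).strip() for i in (1, 2, 4, 6, 8, 9)]
print(wanted)
```

The .strip() call is there because some lines in the real output end with a trailing space, which a capture that runs up to \n would otherwise keep.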

Legionnaired · April 2008

Since it looks like you're using Perl, you might try this alternate approach:

@lines = split(/\n/, $incoming_lines);
$lines[1] =~ /<(.*)>/o;
$model = $1;
(...)

Since the running time for a Perl regex is roughly O(<regex_length> * <text_length>), cutting down on both of those is very helpful.

Using the /o flag tells Perl to build the finite state automaton behind the match only once. That only makes a difference when the pattern interpolates a variable; a constant pattern like this one is compiled once either way, so the flag is optional here.
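
The line-at-a-time idea can also be sketched without one big pattern. This is a rough Python illustration, not a drop-in replacement: the labels come from the device output in the first post, while the function name parse_status and the dictionary keys are hypothetical. It assumes each wanted value follows its label on the same line.

```python
# Labels as they appear in the device output; keys are hypothetical names.
FIELDS = {
    "model": "Model",
    "factory_ip": "Factory IP Address",
    "software": "Software",
    "name": "Camera Name",
    "date": "Date and Time",
    "uptime": "Current Uptime",
}


def parse_status(text):
    """Pick the wanted fields out of the status text, one line at a time."""
    result = {}
    for line in text.splitlines():
        for key, label in FIELDS.items():
            # Keep only the first line that starts with each label.
            if key not in result and line.startswith(label + " "):
                result[key] = line[len(label) + 1:].strip()
    # The firmware version is the first token on the Software line;
    # the date in parentheses after it is dropped.
    if "software" in result:
        result["software"] = result["software"].split(" ", 1)[0]
    return result
```

Matching on the full label plus a space keeps "Camera Name" apart from "Camera MSN" and "Camera IP Address", and a missing line simply leaves its key out of the result instead of making the whole match fail.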

Penny Arcade
