Regular expression question

Drez · June 2011

Is there a way to find the first "word" in a string? My brain's tired and I can't seem to wrap my head around this today. Reading left to right, I want to find the first set of characters (which will vary in length) that is not interrupted by a space, dash, or ome other special character. So "GYG009- gy hub nngn urgh" and "GYG009 - $&bfg bbfghh" will both return "GYG009".

Drez · June 2011

By the way, I'm using this in an SQL query.

Seol · June 2011

<Input anchor symbol> <word character class symbol>*

I'm not sure what those various escapes are in your implementation, but that's the form of the regex you need. In Java's pattern matcher, that'd be \A(\w)*

schuss · June 2011

If it's always the same length - yes. If there's any variance in the character that comes after it (such as space OR dash), it gets a lot more complicated.
same length: SUBSTR(field, 1, 6) as First_Word (exact syntax depends on database)

Daenris · June 2011

What server are you using (MySQL, MS SQL Server, Oracle, etc)?

Drez · June 2011

I'm not connecting to a server. This is just a small Access 2007 app I'm writing.

The string length will vary between 3 and 10 characters. I just want to match the first set of alphanumerics before hitting a non-alphanumeric.

I can do this with VBA or if statements in SQL but that seems stupidly cumbersome to me.

Daenris · June 2011

I don't think you can really do this easily in SQL. Here's an example of a stored procedure that parses an input string based on a single separator character.

Here's another question with some responses that detail how to parse strings in SQL.

If you already have a working solution in VBA or something, I would say you may just want to stick with it.

CptHamilton · June 2011

Perhaps I'm misunderstanding your question, but I believe the pattern would be:
[^\s-]\{3,10\}[\s-]

Which, unless I'm misremembering my regular expression syntax, finds 3-10 of any non-whitespace, non-hyphen characters followed by whitespace or a hyphen. I don't recall if the range indicator thing expects the character to repeat when you're using a wildcard match...

[^\s-]+[\s-]

should work if that one doesn't.

Edit: Note that I'm not a SQL programmer so I have no idea how you'd implement that in a sql query. But if you know how to use regular expressions in the language at all then subbing in a pattern shouldn't be a major issue, I'd think.

L Ron Howard · June 2011

Wouldn't it be better to just use \w for word? It separates on any non digit or letter character, so I'd think that would be ample, no?

^\w.*$

Edit
Or hell, try this:
^\s*\w.*$

I think it would be easiest to grab all the data (using where clauses as necessary), and then sift through the data. I don't know enough about regexes and SQL to know if there's an easy way to do it. With my limited knowledge, I'd think it would be easier to poll the db for the info, and then create a sub/function/proc/etc to sift through the relevant data.

Penny Arcade

Quick Links

Regular expression question

Posts