
Text parsing

EclecticGroove Registered User regular
edited March 2009 in Help / Advice Forum
Hey all.

It's been, well, damn near forever since I've had to do anything of this sort, and I can't think of or find an easy way to do it at the moment.

What I need is a script, macro, or piece of software that is free or has a fully functional trial and will do the following (this is on a Windows XP system):

1) Search through a directory of files (text files in this case).
2) Find a particular variable I define (in this case, all data that occurs between "[" and "]").
3) Output the results to a file (text, spreadsheet, doesn't matter).

Optionally if it would only return unique results that would be ideal, but not a requirement.

EDIT:

If this can be done with some sort of macro/VBScript within MS Office, that's fine too, as I have access to those (2003 and 2007).


Posts

  • PeregrineFalcon Registered User regular
    edited March 2009
    When you say "search through a directory" do you mean just the filenames? Or the entire contents of each file?

    Either way, you're basically going to open it and read it character-by-character - when you see "[" you start dumping to a file, when you see "]" you stop. Assuming there's no nesting, it's that simple. If you start getting to the point of multiple [[]]][]][][]][[[][][[][][]]] sets like that, you're into more complex shit and I have to ask which part is most important.
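The character-by-character approach described above can be sketched in Python as follows. This is a minimal illustration, not anyone's posted solution: the function name extract_bracketed is hypothetical, and it assumes brackets are never nested.

```python
def extract_bracketed(text):
    """Return the pieces of text found between "[" and "]", in order.

    Assumes brackets are never nested.
    """
    found = []
    current = None  # None while outside brackets, a list of chars while inside
    for ch in text:
        if ch == "[":
            current = []  # start collecting
        elif ch == "]":
            if current is not None:
                found.append("".join(current))
                current = None  # stop collecting
        elif current is not None:
            current.append(ch)
    return found


print(extract_bracketed("blah[test]blah[success!]yadda"))
# ['test', 'success!']
```

A stray "]" with no opening bracket is ignored, and an unclosed "[" at the end of the input produces nothing.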

    Looking for a DX:HR OnLive code for my kid brother.
    Can trade TF2 items or whatever else you're interested in. PM me.
  • EclecticGroove Registered User regular
    edited March 2009
    All contents within the file, and there will be potentially hundreds of points of data per file (multiple instances of [ and ] per file).

  • PeregrineFalcon Registered User regular
    edited March 2009
    All contents within the file, and there will be potentially hundreds of points of data per file (multiple instances of [ and ] per file).

    Okay, but are they nested? If they're not nested ( [ data1 [ data2 ] data3 ] ) or staggered ( [ [ [ ] [ ] ] [ ] ] ) then you can just parse character-by-character like I outlined in the first post.

  • Scrublet Registered User regular
    edited March 2009
    And are they nested, or can you assume that one ''?

    subedii wrote: »
    I hear PC gaming is huge off the coast of Somalia right now.

    PSN: TheScrublet
  • EclecticGroove Registered User regular
    edited March 2009
    Ah, no, not nested. There will never be a set of [] within another set.

  • PeregrineFalcon Registered User regular
    edited March 2009
    Ah, no, not nested. There will never be a set of [] within another set.

    Stupidly easy then, you can write this in VBScript. Free, included with XP, and to open an IDE you just type notepad :P

  • RiemannLives Registered User regular
    edited March 2009
    Is this for a Dwarf Fortress utility by any chance?

    Attacked by tweeeeeeees!
  • Tofystedeth Registered User regular
    edited March 2009
    Ah, no, not nested. There will never be a set of [] within another set.

    Stupidly easy then, you can write this in VBScript. Free, included with XP, and to open an IDE you just type notepad :P
    I was going to suggest Python, but that's just because I know Python, and don't know much about/hate VBScript.

    Python is also free, but not included with Windows.

  • EclecticGroove Registered User regular
    edited March 2009
    Nope, no Dwarf Fortress, just need to grab a bunch of names out of some old log files. Anyone have a copy of a VBScript that could do this that's either built this way or easy enough to edit? Haven't ever looked into it, so not sure how much is involved.

  • Tofystedeth Registered User regular
    edited March 2009
    Nope, no Dwarf Fortress, just need to grab a bunch of names out of some old log files. Anyone have a copy of a VBScript that could do this that's either built this way or easy enough to edit? Haven't ever looked into it, so not sure how much is involved.
    I could probably whip up a Python script in fairly short order. I've got a little free time and I've had to do similar projects at work here. VBS is mostly foreign to me though.

  • vonPoonBurGer Registered User regular
    edited March 2009
    This is one line with grep:
    grep -o -P "\[.*?\]" filename.txt
    

    For example, I created a file named test.txt that contains the following:
    blahblah[test]blah[test]yadda
    yadda[success!]blah
    

    The above grep statement gives me the following output when given test.txt as the input file:
    [test]
    [test]
    [success!]
    

    If you have many files in one folder, you can use wildcards to return results from all files at once (e.g. *.log or *.* instead of filename.txt). The version of grep I used is 2.5.1 for Windows. It is free, open source, and available for download here.
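To see why the pattern uses the lazy quantifier `.*?` and not a plain `.*`, here is a small Python sketch. It is only an illustration: it uses Python's re module, not grep, and feeds it the first line of the test file above. The variable names are hypothetical.

```python
import re

line = "blahblah[test]blah[test]yadda"

lazy = re.findall(r"\[.*?\]", line)   # stops at the nearest "]"
greedy = re.findall(r"\[.*\]", line)  # runs on to the last "]" on the line

print(lazy)    # ['[test]', '[test]']
print(greedy)  # ['[test]blah[test]']
```

With the greedy form, two bracketed names on one line would come back fused into a single match, which is not what the original request wants.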

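    Edit: Oh, missed the uniqueness requirement. You can do that by also downloading GNU CoreUtils for Windows and piping the grep output to uniq. Note that uniq only drops adjacent repeats, so if duplicates are scattered through the output, pipe it through sort first (| sort | uniq):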
    grep -o -P "\[.*?\]" filename.txt | uniq
    

    Here's what that does with my test file:
    [test]
    [success!]
    

    If you need the output in a file, just redirect to a file using the standard file redirection operator:
    grep -o -P "\[.*?\]" filename.txt | uniq > output.txt
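One thing worth knowing about this pipeline: uniq only collapses runs of identical adjacent lines, so duplicates that are not next to each other survive unless the input is sorted first. The Python sketch below mimics that behavior. It is an illustration only: the uniq function here is a hypothetical stand-in for the command, and the matches list is made-up sample data.

```python
from itertools import groupby


def uniq(lines):
    """Collapse runs of adjacent equal lines, like uniq with no options."""
    return [key for key, _group in groupby(lines)]


matches = ["[test]", "[success!]", "[test]"]  # made-up grep output

print(uniq(matches))          # ['[test]', '[success!]', '[test]']
print(uniq(sorted(matches)))  # ['[success!]', '[test]']
```

The second call corresponds to inserting sort between grep and uniq in the pipeline.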
    

    Xbox Live:vonPoon | PSN: vonPoon | Steam: vonPoonBurGer
  • EclecticGroove Registered User regular
    edited March 2009
    Python or VBScript makes no difference to me. I can hook up Python into a system easily.

  • Tofystedeth Registered User regular
    edited March 2009
    Here's a quick and dirty and probably very unPythonic script for this.
    Replace the directory path in the os.walk() call with the directory your logs live in.
    You'll need to install Python. This was done in 2.5, but should work in any version.
    Save it as a .py file, and make sure the indentation matches what appears here.
    import os
    import re

    # Non-greedy match of everything from "[" to the nearest "]"
    find_var = re.compile(r"\[.*?\]")
    results = []
    for root, dirs, files in os.walk("./logtest"):
        for name in files:
            # open() works in Python 2 and 3; the file() builtin is Python 2 only
            infile = open(os.path.join(root, name))
            tmp = find_var.findall(infile.read())
            infile.close()
            # Keep only the first occurrence of each match
            for var in tmp:
                if var not in results:
                    results.append(var)

    # Write one unique match per line
    outfile = open("./Log Parse Results.txt", 'w')
    for r in results:
        outfile.write(r + "\n")
    outfile.close()
    

    If someone knows a more elegant Pythonic way that'd be neat. I only started learning Python a few months ago, so I'm still mostly writing C++ in Python.
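One possible answer to that request, as a minimal sketch and not a definitive version: it assumes Python 3.7 or later (not the 2.5 mentioned above), uses pathlib for the directory walk, and relies on dict key order to keep first-seen ordering while dropping duplicates. The names BRACKETED, unique_matches, and parse_logs are hypothetical.

```python
import re
from pathlib import Path

BRACKETED = re.compile(r"\[.*?\]")


def unique_matches(texts):
    """Return each bracketed string once, in first-seen order."""
    seen = {}  # dict keys keep insertion order (Python 3.7+)
    for text in texts:
        seen.update(dict.fromkeys(BRACKETED.findall(text)))
    return list(seen)


def parse_logs(log_dir, out_path):
    """Scan every file under log_dir and write the unique matches to out_path."""
    files = (p for p in sorted(Path(log_dir).rglob("*")) if p.is_file())
    # errors="replace" keeps odd bytes in old logs from aborting the run
    texts = (p.read_text(errors="replace") for p in files)
    Path(out_path).write_text("".join(m + "\n" for m in unique_matches(texts)))
```

Calling parse_logs("./logtest", "Log Parse Results.txt") would mirror the script above. Using a dict for the seen set avoids the repeated list scans of `if var not in results`, which matters once the logs contain many distinct names.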

  • PeregrineFalcon Registered User regular
    edited March 2009
    Yeah, I think vonPoonBurGer wins this one with grep.

  • Tofystedeth Registered User regular
    edited March 2009
    Yeah, I think vonPoonBurGer wins this one with grep.

    To rip off Tycho:
    Sometimes, the old magic is best.

  • EclecticGroove Registered User regular
    edited March 2009
    Ah, I'd forgotten about grep for Windows! That more or less did the trick. I couldn't get uniq to do anything except give me the same output as without it or error out, but the resulting data wasn't too terrible to work with. Consider this closed; it can be locked. Thanks, all!
