
[UNIX] grep (or maybe gawk) problem in a .csh script

TheCanMan (Registered User)
edited January 2012 in Help / Advice Forum
I have only a very rudimentary understanding of UNIX and scripting, mostly bits and pieces I've picked up from staring at code (all self-taught). So I'm not really sure why this is a problem, but I do know what's causing it. Hopefully someone who actually knows what they're doing can help me with this.

Just a little background first. This is a script used at my job that is run monthly. It takes an instrument log file ("Inmracct.brief") and outputs usage data (specifically how much time the instrument was used and how many times each individual experiment was run). Everything was fine until we had to upgrade the software on one of the instruments to a newer version. This new version creates a slightly different log file. Here's an entry from the log file that gives a pretty good picture.
#Login
name: name
group: nmrsu 500
date: 12/02/2011 09:06:10 1322834770
nameOfExperiment: N PROTON1Page Proton (16 scans, 1 page)
fileSizeAcq: 32768
fileSizeProc: 32768
timeOfStart: 12/02/2011 09:06:10 1322834770
timeOfTermination: 12/02/2011 09:11:46 1322835106
nameOfExperiment: GCOSY.HW
fileSizeAcq: 1024
fileSizeProc: 1024
timeOfStart: 12/02/2011 09:11:46 1322835106
timeOfTermination: 12/02/2011 09:15:24 1322835324
nameOfExperiment: N FLUORINE Fluorine with Proton Decoupling
fileSizeAcq: 262144
fileSizeProc: 131072
timeOfStart: 12/02/2011 09:15:25 1322835325
timeOfTermination: 12/02/2011 09:16:33 1322835393
nameOfExperiment: 1D_NOESY.HW
fileSizeAcq: 262144
fileSizeProc: 131072
timeOfStart: 12/02/2011 09:26:20 1322835980
timeOfTermination: 12/02/2011 09:27:30 1322836050
nameOfExperiment: NOESY.HW
fileSizeAcq: 32768
fileSizeProc: 32768
timeOfStart: 12/02/2011 09:27:31 1322836051
timeOfTermination: 12/02/2011 09:33:32 1322836412
#Logout

The problem is in the "nameOfExperiment" line. Previously that line was just "nameOfExperiment: PROTON1Page". But in the new log the leading 'N' and the trailing experiment description seem to be breaking the script. To compound the problem, you'll notice that some experiments don't have the offending bits.

Here are (at least what I believe are) the relevant portions of the script.
cat Inmracct.brief | gawk '/Experiment/, /Start/ {print}' | grep -v fileSizeAcq | grep -v fileSizeProc > sss1
gawk '{print} NR%2==0 {print ""}' sss1 | gawk 'BEGIN {FS=" "; RS=""} {print $2, $4, $5, $6}' | sed s/"\ /"/" "/ | sed s/"\ /"/" y"/ | grep y$yr | gawk '{print $0, strftime("%a %b", $6)}' | grep $mon | sed s/":"/" "/ | sed s/":"/" "/ > sss_exp
cat sss_exp | gawk 'END {print "\n", " Total exps number in", $10, "is: ", NR}' >> TimeExp_${mon}${yr}
gawk '{if ($5>=07 && $5<19) print $0}' sss_exp > sss_day
gawk 'END {print " day-time exps number in", $10, "is: ", NR}' sss_day >> TimeExp_${mon}${yr}
gawk '{if ($5<07 || $5>=19) print $0}' sss_exp > sss_night
gawk 'END {print " Night-time exps number in", $10, "is: ", NR}' sss_night >> TimeExp_${mon}${yr}
echo " " >> TimeExp_${mon}${yr}

grep PROTON sss_exp | wc -l | gawk '{print $1}' > sss_1H_t
grep PROTON sss_day | wc -l | gawk '{print $1}' > sss_1H_d
grep PROTON sss_night | wc -l | gawk '{print $1}' > sss_1H_e
paste sss_1H_t sss_1H_d sss_1H_e | gawk '{print "1H_exp: ", "\t", $1, "\t", "(Day:", $2, "\t", "Night:", $3, ")"}' >> TimeExp_${mon}${yr}

It seems to create "sss1" correctly, but when it pulls the data out of sss1 to create "sss_exp" it doesn't include any of the entries with the leading "N" and trailing description.

I'm not even sure if it's possible to have the script do what I need it to do (at least without becoming monstrously complex). :?

Thanks for any help!


Posts

  • Sevorak (Registered User)
    Without fully understanding what this script is supposed to do or what the intermediate files should look like, I would guess you have a regular expression in there that was written to expect the nameOfExperiment to have a single word after it. You'll need to figure out which regular expression it is and change it to accept multiple words.

  • BlazeFire (Registered User), edited January 2012
    At least one problem is that in the second line that begins:
    gawk '{print} NR%2==...

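    it uses whitespace as the field delimiter and then references fixed field numbers. The space after the "N" (and the spaces in the trailing description) shifts every later field along, so those field numbers no longer point at the date and time.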
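
To make the field shift concrete, here is a minimal sketch that runs the script's own field selection on one old-style and one new-style record. The two records are sample data shaped like the log excerpt in the first post, and plain `awk` stands in for `gawk` on the assumption that nothing gawk-specific is needed for this step.

```shell
# Two sss1-style records (sample data modelled on the log excerpt):
# an old-style name, and a new-style name with the leading "N" and a description.
old='nameOfExperiment: PROTON1Page
timeOfStart: 12/02/2011 09:06:10 1322834770'
new='nameOfExperiment: N PROTON1Page Proton (16 scans, 1 page)
timeOfStart: 12/02/2011 09:06:10 1322834770'

# Same field selection as the original script: paragraph mode (RS=""),
# whitespace-separated fields, fixed positions 2, 4, 5 and 6.
pick() {
    printf '%s\n' "$1" | awk 'BEGIN {FS=" "; RS=""} {print $2, $4, $5, $6}'
}

old_out=$(pick "$old")
new_out=$(pick "$new")
echo "old: $old_out"   # name, date, time, epoch
echo "new: $new_out"   # "N" plus pieces of the description; no date at all
```

The new-style output contains no "/" for the later sed commands to rewrite, so it never gains the "y" plus year marker and the `grep y$yr` step drops it (assuming `$yr` holds the year, as the rest of the script suggests). That would explain why those entries vanish from sss_exp.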

    I'm pretty sure it can still be a short script but I don't really know what the intended output should look like.

    Could you post one of the good .brief files and then a good sss1, sss_exp, sss_day, etc? Also, did you copy and paste that excerpt of the script or re-type it? It seems like it shouldn't work at all...

  • TheCanMan (Registered User)
    Sevorak wrote:
    Without fully understanding what this script is supposed to do or what the intermediate files should look like, I would guess you have a regular expression in there that was written to expect the nameOfExperiment to have a single word after it. You'll need to figure out which regular expression it is and change it to accept multiple words.

    I copied and pasted. The only thing I changed was adding a space between the \ and / in the second line (sed s/"\ /"/" "/ | sed s/"\ /"/" y"/) because otherwise the forum font makes it look like a capital letter 'V'. But what I posted is only a small portion of the entire script.

    The sss_exp should look like (for the first entry in the .brief file excerpt from my OP):

    PROTON1Page 12 02 y2011 09 06 10 1322834770 Fri Dec
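
For reference, the slash and colon rewriting that produces this layout can be sketched on its own. This is only an illustration under a few assumptions: the input line is sample data in the shape the script builds before the sed steps, the colon replacement uses a single global substitution rather than the script's two separate ones, and the day and month names appended by gawk's `strftime` are left out.

```shell
# One record as it looks after the name and start time have been picked out.
line='PROTON1Page 12/02/2011 09:06:10 1322834770'

# 1st sed: first "/" becomes a space        -> "12 02/2011"
# 2nd sed: the remaining "/" becomes " y"   -> "12 02 y2011"
# 3rd sed: every ":" becomes a space        -> "09 06 10"
out=$(printf '%s\n' "$line" | sed 's/\// /' | sed 's/\// y/' | sed 's/:/ /g')
echo "$out"
```

After this step the hour sits in field 5, which is what the day-time and night-time filters later in the script compare against.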

  • BlazeFire (Registered User)
    Okay, this seems to work on my end. There are probably better ways to do it but this will get you by. Let me know if something else looks funny. Is the fluorine test supposed to not show up in any of the proton stuff?
    gawk 'BEGIN {FS=": "} {if ($1 ~ /nameOfExperiment/ || $1 ~ /timeOfStart/) print $2}' Inmracct.brief > sss1
    gawk 'NR%2==1 {d=$0} NR%2==0 {d=d" "$0; print d}' sss1 | gawk 'BEGIN {FS=" "} {if (length($1) == 1)  {print $2, $(NF-2),$(NF-1),$NF} else print}' | sed 's/\// /' | sed 's/\// y/' | grep y$yr | gawk '{print $0, strftime("%a %b", $6)}' | grep $mon | sed 's/:/ /g' > sss_exp
    
    cat sss_exp | gawk 'END {print "\n", " Total exps number in", $10, "is: ", NR}' >> TimeExp_${mon}${yr}
    gawk '{if ($5>=07 && $5<19) print $0}' sss_exp > sss_day
    gawk 'END {print " day-time exps number in", $10, "is: ", NR}' sss_day >> TimeExp_${mon}${yr}
    gawk '{if ($5<07 || $5>=19) print $0}' sss_exp > sss_night
    gawk 'END {print " Night-time exps number in", $10, "is: ", NR}' sss_night >> TimeExp_${mon}${yr}
    echo " " >> TimeExp_${mon}${yr}
    
    grep PROTON sss_exp | wc -l | gawk '{print $1}' > sss_1H_t
    grep PROTON sss_day | wc -l | gawk '{print $1}' > sss_1H_d
    grep PROTON sss_night | wc -l | gawk '{print $1}' > sss_1H_e
    paste sss_1H_t sss_1H_d sss_1H_e | gawk '{print "1H_exp: ", "\t", $1, "\t", "(Day:", $2, "\t", "Night:", $3, ")"}' >> TimeExp_${mon}${yr}
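
The name-normalizing step in the second command can be tried in isolation. The following is a small sketch, not the full pipeline: the two input lines are sample data taken from the log excerpt, already joined as "name plus start time", and plain `awk` is assumed to behave like `gawk` here.

```shell
# Joined "experiment name + start time" lines, one new-style and one old-style.
lines='N PROTON1Page Proton (16 scans, 1 page) 12/02/2011 09:06:10 1322834770
GCOSY.HW 12/02/2011 09:11:46 1322835106'

# If the first field is a single character (the "N" flag), keep the real name ($2)
# and the last three fields (date, time, epoch); otherwise pass the line through.
norm=$(printf '%s\n' "$lines" |
    awk '{ if (length($1) == 1) print $2, $(NF-2), $(NF-1), $NF; else print }')
echo "$norm"
```

Counting the date fields from the end of the line is what makes the description harmless, however many words it has. One caveat with the `length($1) == 1` test: an experiment whose real name is a single character would be mistaken for the flag.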
    

  • TheCanMan (Registered User)
    Hot damn, that worked like a charm!

    Yeah, the last four lines are repeated for each individual experiment.

    Thanks!

  • BlazeFire (Registered User)
    No problem. Like I said, if anything strange happens let me know.
