I have a log file with the following output:

Time = 1

smoothSolver:  Solving for Ux, Initial residual = 0.230812, Final residual = 0.0134171, No Iterations 2
smoothSolver:  Solving for Uy, Initial residual = 0.283614, Final residual = 0.0158797, No Iterations 3
smoothSolver:  Solving for Uz, Initial residual = 0.190444, Final residual = 0.016567, No Iterations 2
GAMG:  Solving for p, Initial residual = 0.0850116, Final residual = 0.00375608, No Iterations 3
time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00142109
smoothSolver:  Solving for omega, Initial residual = 0.00267604, Final residual = 0.000166675, No Iterations 3
bounding omega, min: -26.6597 max: 18468.7 average: 219.43
smoothSolver:  Solving for k, Initial residual = 1, Final residual = 0.0862096, No Iterations 2
ExecutionTime = 4.84 s  ClockTime = 5 s

I need to extract cumulative = 0.00142109 (which is in line 5 of the above output) using Python's regular expressions. More precisely, I need to extract only the value 0.00142109 that corresponds to cumulative and write it to another file.

Currently, this is what I have:

contCumulative_0_out = open('contCumulative_0', 'w+')

with open(logFile, 'r') as logfile_read:
    for line in logfile_read:
        line = line.rstrip()
        if re.findall('cumulative = ([+-]?\d+)(?:\.\d+)?(?:[eE][+-]?\d+)?', line):
            print line
            contCumulative_0_out.write(line)

However, the output with the above code is:

time step continuity errors : sum local = 0.00999678, global = 0.00142109, cumulative = 0.00142109

I am basically getting the entire line that matches cumulative, not just the value.

Please let me know how to extract only the value corresponding to cumulative.

2 Answers

If the number is in the format you specified, I'd use a simpler regex pattern:

for line in logfile_read:
    res = re.search(r'cumulative = ((\d|.)+)', line)
    if res:
        contCumulative_0_out.write(res.group(1))

Otherwise, keep your pattern, but use re.search (re.match would only try to match at the start of the line) and write the value of res.group(n), where res is the result of re.search and n is the number of the subexpression in your regex, enclosed in '(' and ')', that you want.
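As a rough sketch of that second option: in the pattern from the question, the parentheses enclose only `[+-]?\d+`, so group 1 would hold just the integer part ("0"). The version below widens the capturing group to cover the fraction and exponent as well, which is an assumption about what was intended. The names `CUMULATIVE_RE` and `extract_cumulative` are made up for illustration, and the code is written for Python 3.

```python
import re

# The pattern from the question, with the capturing group widened (an
# assumption about the intent) to include the fraction and the exponent.
CUMULATIVE_RE = re.compile(
    r'cumulative = ([+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)'
)

def extract_cumulative(line):
    """Return the cumulative value in `line` as a string, or None."""
    res = CUMULATIVE_RE.search(line)   # search scans the whole line
    return res.group(1) if res else None

sample = ('time step continuity errors : sum local = 0.00999678, '
          'global = 0.00142109, cumulative = 0.00142109')
print(extract_cumulative(sample))      # prints 0.00142109
```

Because the sign and exponent are part of the group, a value such as -1.5e-05 would also come back whole.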

2 Comments

Minor point: ((\d|.)+) is better written as ([\d\.]+), which is also more efficient to parse.
Thanks @smci. However, when I use ([\d\.]+), all negative values are omitted. My original post showed only one set of output, but subsequent outputs do contain negative values.
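The negative values drop out because `-` is not in the character class, so the pattern cannot match right after "cumulative = ". A small sketch of the effect and one possible fix follows; the sample line is invented for illustration and is not taken from the real log.

```python
import re

# Hypothetical log fragment with a negative cumulative value.
line = 'global = -0.0012, cumulative = -0.00034'

# '-' is not in the character class, so this pattern finds no match here.
unsigned = re.search(r'cumulative = ([\d.]+)', line)

# Allowing an optional leading sign lets negative values through.
signed = re.search(r'cumulative = ([-+]?[\d.]+)', line)

print(unsigned)          # prints None
print(signed.group(1))   # prints -0.00034
```

Neither pattern handles scientific notation; if the solver prints values like 1e-05, the exponent part would need to be added to the pattern too.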

That's because re.findall() returns a list of strings, whereas re.search() returns a match object (or None if nothing matches). In either case you are throwing away the return value of the call, and your code then just uses line.

# Wrong
if re.findall(<regex>, line):
    print line
    contCumulative_0_out.write(line)

# Right
mat = re.search(<regex>, line) # but your regex needs changing, see below
if mat:
    cumvalue = mat.groups()
    print cumvalue
    contCumulative_0_out.write(cumvalue)
    #break # if you know you only have at most one match per file

However, as @Andrew_Lvov points out, your regex is more complicated than it needs to be; note also that its capturing group covers only the integer part of the number. So now you need to fix that. Andrew's regex is faster and good enough (we know the number won't be malformed, and this log contains nothing with multiple periods, such as an IP address, that it could pick up by mistake).

(Btw, for efficiency, if you're guaranteed at most one instance of 'cumulative' line per file, you have no reason not to break from the for-loop after you process your match. Also, for efficiency, the line = line.rstrip() is unnecessary.)
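Putting these points together, a minimal sketch might look like the following. One caveat: mat.groups() returns a tuple of all captured groups, and a file's write() needs a string, so the sketch uses mat.group(1) instead. The function name `copy_cumulative`, the `first_only` flag and the pattern are assumptions made for illustration, and the demo uses in-memory files rather than the real log (Python 3).

```python
import io
import re

# Assumed pattern: optional sign followed by digits and periods.
# It does not handle exponents such as 1e-05.
CUMULATIVE_RE = re.compile(r'cumulative = ([-+]?[\d.]+)')

def copy_cumulative(log, out, first_only=False):
    """Write each cumulative value found in `log` to `out`, one per line.

    `log` and `out` are open file objects. Returns the values written.
    """
    values = []
    for line in log:                      # no rstrip() needed before searching
        mat = CUMULATIVE_RE.search(line)
        if mat:
            value = mat.group(1)          # a str; mat.groups() is a tuple
            out.write(value + '\n')
            values.append(value)
            if first_only:                # at most one match expected
                break
    return values

# Demo on in-memory files; real code would pass handles from open(...).
log = io.StringIO(
    'Time = 1\n'
    'time step continuity errors : sum local = 0.00999678, '
    'global = 0.00142109, cumulative = 0.00142109\n'
)
out = io.StringIO()
found = copy_cumulative(log, out)
print(out.getvalue(), end='')             # prints 0.00142109
```

With real files, the two arguments would be the handles from `open(logFile)` and `open('contCumulative_0', 'w')`, ideally inside a `with` block so both are closed.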

Anyway, skim the docs on the difference between match, search and findall/finditer; it is essential to know which is which. Pattern-matching functions and their variants are a pain in the ass in pretty much every language. Alternatively, type help(re) inside a Python shell.
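For a quick feel of the difference, here is a small sketch that runs the four functions on part of the log line from the question. The pattern and variable names are chosen only for this illustration (Python 3).

```python
import re

line = 'sum local = 0.00999678, global = 0.00142109, cumulative = 0.00142109'
pattern = r'= ([\d.]+)'

# re.match only tries at the very start of the string.
at_start = re.match(pattern, line)       # None: the line starts with 'sum'

# re.search scans forward and returns the first match object (or None).
first = re.search(pattern, line)

# re.findall returns a list; with one group, a list of that group's strings.
every = re.findall(pattern, line)

# re.finditer yields one match object per non-overlapping match.
from_iter = [m.group(1) for m in re.finditer(pattern, line)]

print(at_start)          # prints None
print(first.group(1))    # prints 0.00999678
print(every)             # prints ['0.00999678', '0.00142109', '0.00142109']
```

A non-empty list from findall is truthy, which is why the original `if re.findall(...)` test passed even though the matched text was never used.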

6 Comments

Thanks @Andrew_Lvov for your corrections. I have now used your regex syntax.
With @Andrew_Lvov's regex syntax, my output is ('0.00142109', '9'). How do I eliminate the '9'?
Then you want mat.group(1) rather than mat.groups(). But anyway, the extra '9' goes away if you write the regex using ([\d\.]+) rather than ((\d|.)+).
Is there a way to extract multiple strings? For example, my final aim is to extract Time = <> (present in the first line of the log file) and the cumulative value corresponding to it. The output should be 1 <space> 0.00142109.
Hi @smci have a look at stackoverflow.com/questions/28017121/…. Much appreciated.
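One possible way to pair each Time value with its cumulative value is to remember the most recent Time line while scanning. This is only a sketch: the names `TIME_RE`, `CUMULATIVE_RE` and `time_cumulative_pairs` are hypothetical, and the `\S+` groups assume the values contain no whitespace (Python 3).

```python
import re

# Anchored at the line start so 'ExecutionTime = ...' is not picked up.
TIME_RE = re.compile(r'^Time = (\S+)')
# Assumption: the value is a run of non-whitespace characters.
CUMULATIVE_RE = re.compile(r'cumulative = (\S+)')

def time_cumulative_pairs(lines):
    """Yield (time, cumulative) string pairs from an iterable of log lines."""
    current_time = None
    for line in lines:
        t = TIME_RE.match(line)
        if t:
            current_time = t.group(1)    # remember the latest time step
            continue
        c = CUMULATIVE_RE.search(line)
        if c and current_time is not None:
            yield current_time, c.group(1)

sample = [
    'Time = 1',
    'time step continuity errors : sum local = 0.00999678, '
    'global = 0.00142109, cumulative = 0.00142109',
    'ExecutionTime = 4.84 s  ClockTime = 5 s',
]
for time_value, cumulative in time_cumulative_pairs(sample):
    print(time_value, cumulative)        # prints 1 0.00142109
```

If a time step contains several continuity lines, this yields one pair per line; keeping only the last pair per time step would need a small change.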