I have a requirement to parse a large file (> 1 GB). The lines have the following format:

2014-03-11- 00.02.46.149069 TBegin(EventId="XXXX",RequestId="Request",SrvName="service",TxTime="TransactionTime") ... ... End_TxTime New state for EntityId = 'XXXX' new state set to 'DONE' EventId = 'XXXX' RequestId = Request

I have to perform two operations:

1) Parse the file for a specific service and record the request and its beginning TransactionTime.
2) Parse the file again based on RequestId and record the ending TransactionTime.

My code is provided below.

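    import re
    import sys

    requestId = {}      # RequestId -> order in which it was first seen
    request_arry = []   # RequestIds in first-seen order
    start_time = {}     # RequestId -> beginning TxTime
    end_time = {}       # RequestId -> timestamp of the matching state-change line
    i = 0
    # `service` (the service name to filter on) is assumed to be set earlier in the script
    with open(sys.argv[2], "r") as f:
        for line in f:
            # Begin record for the requested service
            searchObj1 = re.search(r'.*RequestId="(.*)",SrvName="%s.*TxTime="(.*)"\)' % service, line, re.M)
            if searchObj1:
                if searchObj1.group(1) not in requestId:
                    requestId[searchObj1.group(1)] = i
                    request_arry.append(searchObj1.group(1))
                    start_time[searchObj1.group(1)] = searchObj1.group(2)
                    i = i + 1
            # State-change record; keep it only for requests already seen
            searchObj2 = re.search(r'.*new state set to(.*).*RequestId = \'(.{16}).*', line, re.M)
            if searchObj2:
                if searchObj2.group(2) in requestId:
                    end_time[searchObj2.group(2)] = line[:26]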

The above code works fine, but it takes 20 minutes to parse 1 GB of data. Is there any way to make this faster? Getting the result in half the time would be really helpful. Kindly advise.

  • Can you add a complete code sample? Commented Mar 19, 2014 at 6:20
  • Did you try writing a generator to read the file? Commented Mar 19, 2014 at 6:28

1 Answer

    re.search(r'.*RequestId="(.*)",SrvName="%s.*TxTime="(.*)"\)' % service,line,re.M)

Here, if service keeps changing, it might be better to capture the service name with a group such as (.*) and then, after matching, check whether that group is equal to service. The pattern then stays constant, so Python doesn't have to compile a new regex each time service changes.

Use i += 1 rather than i = i + 1 (this might be a micro-optimization, but it's cleaner code anyway).
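As a rough illustration of the first suggestion, here is a minimal sketch that compiles one constant pattern and compares the captured service name afterwards. The names BEGIN_RE and find_begin are hypothetical, and the pattern assumes that the RequestId, SrvName and TxTime values never contain a double quote. Note that it tests for an exact service name, whereas the original pattern also accepts names that merely start with service.

```python
import re

# Compiled once at import time. The service name is captured as group 2
# instead of being formatted into the pattern.
# Assumption: field values never contain a double quote.
BEGIN_RE = re.compile(r'RequestId="([^"]*)",SrvName="([^"]*)".*?TxTime="([^"]*)"\)')


def find_begin(line, service):
    """Return (request_id, tx_time) if line is a begin record for service, else None."""
    match = BEGIN_RE.search(line)
    if match and match.group(2) == service:
        return match.group(1), match.group(3)
    return None
```

Inside the loop, a call such as find_begin(line, service) would stand in for the first re.search. Whether this shortens the 20-minute run noticeably would need to be measured on the real file.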
