
I have a file formatted something like this:

./07/00-post.log:Referer: http://domain1.com/example/launch.jsp?BANKID=123&SOMEPARAM=123&...
./07/00-post.log:Referer: http://domain2.com/example/launch.jsp?PARAM=313&BANKID=13&...
...
...
./07/00-post.log:Referer: http://domainN.com/example/launch.jsp?BANKID=3213

I need to find and extract the following substrings from each line into a separate file, using a shell script:

  1. Domain names between "http://" and "/" (domain1.com, domain2.com, ...)
  2. The BANKIDs for those domains (BANKID can be at different positions)

so that I get pairs of domains and IDs as output.

I think cut won't work here. What utilities can I use?

  • Look at awk and sed; they will meet your needs fine. There are lots of examples on the net, and a Google search will find them quickly. They also make an excellent addition to your toolset as a developer. sed's regular expressions will handle this easily. Commented Mar 11, 2014 at 12:14

1 Answer

You can use grep for this (-o prints only the matched text; -P enables the Perl-compatible lookbehinds and needs GNU grep):

$ grep -Po '(?<=http://)[^/]*|(?<=BANKID=)\d*' file
domain1.com
123
domain2.com
13
domainN.com
3213

This in fact combines two separate grep expressions with |:

Get the numbers after BANKID=:

$ grep -Po '(?<=BANKID=)\d*' file
123
13
3213

And get the domain after http://, up to the next /:

$ grep -Po '(?<=http://)[^/]*' file
domain1.com
domain2.com
domainN.com
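
Since the question asks for pairs, the alternating domain/ID lines produced by the combined expression can be folded into one line per record. A minimal sketch, assuming GNU grep and that every input line contains exactly one http:// URL and one BANKID= value (the function name print_pairs is made up for illustration):

```shell
# print_pairs: hypothetical helper, not part of the original answer.
# grep -o emits the domain and then the BANKID of each line on separate lines;
# paste - - joins every two consecutive lines with a tab.
# Assumes GNU grep (for -P) and exactly one domain and one BANKID per input line.
print_pairs() {
    grep -Po '(?<=http://)[^/]*|(?<=BANKID=)\d*' "$1" | paste - -
}

# Usage: print_pairs file > pairs.txt
```

If some line lacks a BANKID, the pairing shifts by one from that point on, so this only suits input that is known to be complete.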

Note that cut is meant for text whose fields always sit at the same position. It can work for the domain part:

$ cut -d/ -f5 file
domain1.com
domain2.com
domainN.com

But in general, grep or sed is the better tool here, because BANKID can appear at different positions in the query string.
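
For the sed route, a rough sketch could look like the following. It assumes a sed that supports -E (GNU and BSD sed both do) and that BANKID is always introduced by ? or &, as in the sample lines; pairs_with_sed is a hypothetical name used only for this example:

```shell
# pairs_with_sed: hypothetical helper, not part of the original answer.
# Captures the host after http:// and the digits after BANKID= in one pass,
# then prints "domain id". -n plus the p flag prints only lines that matched,
# so lines missing either piece are skipped.
pairs_with_sed() {
    sed -nE 's#.*http://([^/]+)/.*[?&]BANKID=([0-9]+).*#\1 \2#p' "$1"
}

# Usage: pairs_with_sed file > pairs.txt
```

Because both values are captured from the same line, a line without a BANKID is dropped rather than throwing the pairs out of alignment.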


1 Comment

Nice, I like it, though he could use egrep (true, but just joking; the answer is correct).
