0

I have a text document I want to parse through. I want to be able to get the strings between "@5c00\n" and "@ffd2\n" and also between "@ffd2\n" and "@"

@5c00
81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43 
B1 13 38 01 32 D0 10 00 FD 3F 03 43 00 00 00 02 
@ffd2
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 00 5C CF 0C 
@
q

I have tried to use regular expressions but this seems to give me ['',''].

import re

with open("app_blink.txt", "r") as file:  # app_blink.txt being the string above
    contents = file.read()
data = re.findall('\n(.*)@', contents, re.M)

I expected to get:

data
['81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43 \nB1 13 38 01 32 D0 10 00..
 FD 3F 03 43 00 00 00 02','14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C..
 \n14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C \n14 5C 14 5C 14 5C 14..
 5C 14 5C 14 5C 00 5C CF 0C \n']

but actually got:

data
['','']

5 Answers

1

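You were close. re.M only changes where ^ and $ match; it does not let . match a newline, so (.*) could never span more than one line. You need the re.DOTALL flag instead, plus a non-greedy match so that each capture stops at the next @ rather than the last one: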

contents = '''\
@5c00
81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43 
B1 13 38 01 32 D0 10 00 FD 3F 03 43 00 00 00 02 
@ffd2
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 00 5C CF 0C 
@
q
'''

import re
for x in re.findall(r'\n(.*?)@',contents,re.DOTALL):
    print(x)

Output:

81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43 
B1 13 38 01 32 D0 10 00 FD 3F 03 43 00 00 00 02 

14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 00 5C CF 0C 
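To see what each change contributes, here is a small side-by-side comparison. It is only a sketch: the contents string is a shortened stand-in for the file (same layout, fewer bytes per line), and the names multiline, greedy and lazy are just labels for this example.

```python
import re

# Shortened stand-in for the file contents (an assumption, not the real data).
contents = "@5c00\n81 00 \nB1 13 \n@ffd2\n14 5C \n00 5C \n@\nq\n"

# re.M only affects ^ and $; '.' still stops at a newline, so every capture is empty.
multiline = re.findall(r'\n(.*)@', contents, re.M)

# re.DOTALL lets '.' cross newlines, but a greedy '.*' runs on to the LAST '@'.
greedy = re.findall(r'\n(.*)@', contents, re.DOTALL)

# The non-greedy '.*?' stops at the FIRST '@' after each newline.
lazy = re.findall(r'\n(.*?)@', contents, re.DOTALL)

print(multiline)  # ['', '']
print(greedy)     # one long string that swallows the '@ffd2' marker
print(lazy)       # ['81 00 \nB1 13 \n', '14 5C \n00 5C \n']
```

Note that each captured block keeps its trailing newline, which is why print(x) leaves a blank line after every block.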

0

This sounds like a job for regular expressions!

\@[^\n]*\n([^\@]*)\n(?=\@)

This regular expression will match:

  • First, a literal @ sign
  • Then, the rest of that line, up to and including its newline
  • Then, everything it can find that doesn't include an @: this part is saved into group #1
  • Then, a newline ending it all
  • Finally, accept only if the next character is an @ (but don't consume that character)
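The steps above can also be written out with re.VERBOSE, which lets each piece of the pattern carry its own comment. This is a sketch of the same expression, not a different technique; the contents string is a shortened stand-in for the real file, and block_pattern is just a name chosen for the example. The backslashes before @ are dropped because @ has no special meaning in Python regular expressions.

```python
import re

# Shortened stand-in for the file contents (an assumption, not the real data).
contents = "@5c00\n81 00 \nB1 13 \n@ffd2\n14 5C \n00 5C \n@\nq\n"

block_pattern = re.compile(r"""
    @          # a literal @ sign
    [^\n]*\n   # the rest of the marker line, including its newline
    ([^@]*)    # group 1: everything up to the next @
    \n         # the newline that ends the block
    (?=@)      # succeed only if an @ follows, without consuming it
""", re.VERBOSE)

blocks = [m.group(1) for m in block_pattern.finditer(contents)]
print(blocks)  # ['81 00 \nB1 13 ', '14 5C \n00 5C ']
```

Because the final newline is matched outside the group, each block comes back without a trailing newline. The text after the bare @ line is not returned, since no further @ follows it.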

As an example:

>>> re.search(r'\@[^\n]*\n([^\@]*)\n(?=\@)', your_string).group(1)
'81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43 \nB1 13 38 01 32 D0 10 00 FD 3F 03 43 00 00 00 02 '

So to get a list of the important stuff:

>>> [m.group(1) for m in re.finditer(r'\@[^\n]*\n([^\@]*)\n(?=\@)', your_string)]
['81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43 \nB1 13 38 01 32 D0 10 00 FD 3F 03 43 00 00 00 02 ', '14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C \n14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C \n14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 00 5C CF 0C ']

Or, for a simpler answer:

re.split(r'\@[^\n]*\n', your_string)

Split the string whenever you find a line starting with @.
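One thing to watch with the split version: the result also contains whatever comes before the first marker and after the last one. A quick sketch on a shortened stand-in for the file (the contents string and the names parts and blocks are assumptions for this example):

```python
import re

# Shortened stand-in for the file contents (an assumption, not the real data).
contents = "@5c00\n81 00 \nB1 13 \n@ffd2\n14 5C \n00 5C \n@\nq\n"

parts = re.split(r'@[^\n]*\n', contents)
print(parts)  # ['', '81 00 \nB1 13 \n', '14 5C \n00 5C \n', 'q\n']

# The first piece is empty (nothing precedes the first marker) and the last
# piece is whatever follows the bare '@' line, so drop both ends.
blocks = parts[1:-1]
print(blocks)  # ['81 00 \nB1 13 \n', '14 5C \n00 5C \n']
```

Slicing off both ends assumes the file always starts with a marker line and ends with a bare @ followed by a trailer, as in the question. If that is not guaranteed, filter the pieces explicitly instead.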

1 Comment

Wow! Thank you for the help! I greatly appreciate you taking the time to explain your regular expression as well.
0

Check this regex:

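data = re.findall(r'^[\d \w]{2,}$', contents, re.M)

It matches only the lines made up entirely of word characters and spaces (at least two of them), which in this file are the lines of hexadecimal bytes. Note that it returns each line as a separate item rather than grouping the lines by block.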
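A short sketch of what that returns, again on a shortened stand-in for the file. The second pattern, strict, is not from this answer; it is an assumed tighter variant that accepts only two-digit hex values separated by single spaces, in case the file might contain other text lines.

```python
import re

# Shortened stand-in for the file contents (an assumption, not the real data).
contents = "@5c00\n81 00 \nB1 13 \n@ffd2\n14 5C \n00 5C \n@\nq\n"

# The pattern from this answer: any line of word characters and spaces.
loose = re.findall(r'^[\d \w]{2,}$', contents, re.M)

# Assumed stricter variant: only pairs of hex digits, each optionally followed by a space.
strict = re.findall(r'^(?:[0-9A-Fa-f]{2} ?)+$', contents, re.M)

print(loose)   # ['81 00 ', 'B1 13 ', '14 5C ', '00 5C ']
print(strict)  # same four lines for this input

# The two differ on ordinary text: the loose pattern accepts it, the strict one does not.
print(re.findall(r'^[\d \w]{2,}$', "not hex", re.M))            # ['not hex']
print(re.findall(r'^(?:[0-9A-Fa-f]{2} ?)+$', "not hex", re.M))  # []
```

Either way the result is a flat list of lines, so the boundary between the 5c00 block and the ffd2 block is lost.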

0

This regex ought to work. Try it:

import re

regex = r"^[^\@].*"

test_str = ("@5c00\n81 00 00\n76 20 11\n@ffd2\n")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    # The pattern has no capture groups, so this inner loop prints nothing here.
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))

Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

0

Regular expressions may be more than this needs, and they can be slightly more expensive than plain string methods. A string split is enough here: split on @, then keep the pieces that start with the addresses you want.

Example

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

test_str = '''
@bb00
81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43 
B1 13 38 01 32 D0 10 00 FD 3F 03 43 00 00 00 02
@5c00
81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43 
B1 13 38 01 32 D0 10 00 FD 3F 03 43 00 00 00 02 
@ffd2
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 
14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 00 5C CF 0C 
@
81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43 
B1 13 38 01 32 D0 10 00 FD 3F 03 43 00 00 00 02 

'''

split_str = test_str.split('@')
data = []
for block in split_str:
    # Each piece starts with the 4-character address, then a newline.
    if block[:4] in ('5c00', 'ffd2'):
        data.append(block[5:])  # skip the address and its newline


print(data)

Output

['81 00 00 5C B1 13 3E 01 0C 43 B1 13 A6 00 1C 43 \nB1 13 38 01 32 D0 10 00 FD 3F 03 43 00 00 00 02 \n', '14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C \n14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 14 5C \n14 5C 14 5C 14 5C 14 5C 14 5C 14 5C 00 5C CF 0C \n']
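If you would rather not hard-code the addresses, the same idea works line by line. This is a minimal sketch, not part of the original answer: parse_blocks is a hypothetical helper, the contents string is a shortened stand-in for the file, and treating a bare @ as an end marker is an assumption about the format.

```python
def parse_blocks(text):
    """Map each '@address' marker to the list of byte strings that follow it."""
    blocks = {}
    current = None  # address of the block being filled, or None outside any block
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("@"):
            # A bare "@" is treated as an end marker (an assumption about the format).
            current = line[1:] or None
            if current is not None:
                blocks[current] = []
        elif current is not None and line:
            blocks[current].extend(line.split())
    return blocks


# Shortened stand-in for the file contents (an assumption, not the real data).
contents = "@5c00\n81 00 \nB1 13 \n@ffd2\n14 5C \n00 5C \n@\nq\n"
result = parse_blocks(contents)
print(result)  # {'5c00': ['81', '00', 'B1', '13'], 'ffd2': ['14', '5C', '00', '5C']}
```

Keying the result by address makes it easy to pick out a block later, for example result['ffd2'], and the trailing q line is ignored because it comes after the bare @.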
