I wrote a basic Python program to parse Android's resources.arsc. It prints out all strings found in the file. The strings have a zero-valued byte between each character, which suggests to me that they are stored in UTF-16. I don't know if that is correct, but Android strings are localizable, so I think it is. I am using string.decode('hex') to print each string in human-readable form. Here's a sample, with the list of bytes that make up the string:

>>> print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')
res/drawable/about.png

The issue is that when I pipe this program's output to grep, I cannot match any of the strings it prints. How can I print them to the shell so that grep can match them? Thanks!

(EDIT) I did indeed print the string, but in my example I thought it would be better to show both the 'print'ed version and the returned version. Sorry for the confusion. In this example, it is 'res/drawable/about.png' that cannot be grepped.

(EDIT2) a simple demonstration:

11:33 AM ~/learning_python $ python -c "print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')"
res/drawable/about.png
11:33 AM ~/learning_python $ python -c "print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')" | grep about
11:33 AM ~/learning_python $ 

(EDIT3) Another demonstration; I think this proves the data is in UTF-16-BE:

11:33 AM ~/learning_python $ python -c "print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')" > testfile
11:35 AM ~/learning_python $ iconv -f utf16be -t utf8 testfile
res/drawable/about.png
11:35 AM ~/learning_python $ iconv -f utf16be -t utf8 testfile | grep about
Binary file (standard input) matches
11:35 AM ~/learning_python $ iconv -f utf16be -t utf8 testfile | grep -a about
res/drawable/about.png
  • Did you "print" the decoded string? Commented Oct 18, 2012 at 17:12
  • Yes, that's how the final string was produced. I edited my question for clarity. Commented Oct 18, 2012 at 17:14
  • Put it all in a list and you will notice the problem: python -c "print [''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72', '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00']).decode('hex')]" Commented Oct 18, 2012 at 18:03
  • Possible duplicate of grepping binary files and UTF16 Commented Jan 17, 2019 at 13:11

2 Answers

Decode the characters:

'\x00r\x00e\x00s'.decode('utf-16-be') # produces u'res'

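Then you can print out the decoded string. Note the extra '00' appended to the list so that the byte count is even, and the rstrip('\0') that removes the trailing NUL characters: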

$ python -c "print ''.join(['00', '72', '00', '65', '00', '73', '00', '2f', '00', '64', '00', '72',    '00', '61', '00', '77', '00', '61', '00', '62', '00', '6c', '00', '65', '00', '2f', '00', '61', '00', '62', '00', '6f', '00', '75', '00', '74', '00', '2e', '00', '70', '00', '6e', '00', '67', '00', '00', '00', '00']).decode('hex').decode('utf-16-be').rstrip('\0')" | grep about
res/drawable/about.png
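
For readers on Python 3, where str.decode('hex') no longer exists, the same idea can be sketched as below. This is only an illustration, not part of the original answer: the helper name `decode_utf16be` is made up, the sample list is shortened to the first four characters, and padding an odd-length buffer with a zero byte is an assumption that mirrors the extra '00' appended above.

```python
# Python 3 sketch of the approach above (the answer itself uses Python 2).
# decode_utf16be is a hypothetical helper name, not a library function.

def decode_utf16be(pairs):
    """Join hex byte pairs, decode them as UTF-16-BE, strip trailing NULs."""
    raw = bytes.fromhex(''.join(pairs))
    if len(raw) % 2:
        # Assumption: pad to a whole number of 16-bit code units,
        # like the extra '00' added to the list in the answer.
        raw += b'\x00'
    return raw.decode('utf-16-be').rstrip('\0')

# Shortened sample: only the bytes for "res/" plus the three trailing zeros.
hex_pairs = ['00', '72', '00', '65', '00', '73', '00', '2f', '00', '00', '00']
print(decode_utf16be(hex_pairs))
```

Because the result is an ordinary text string with no embedded zero bytes, piping the printed output to grep should behave like any other text.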

1 Comment

Thanks for that... I can now grep the output of my program, but I have to use the -a switch. I can live with that :)
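
A likely reason -a is still needed is that some NUL characters still reach grep: GNU grep treats input containing NUL bytes as binary. If that is the cause here, removing the NULs before printing may help. A minimal Python 3 sketch, assuming the raw data is a run of NUL-terminated UTF-16-BE strings (the helper name `printable_strings` and the sample bytes are made up for illustration):

```python
# Python 3 sketch; printable_strings is a hypothetical helper name.

def printable_strings(raw):
    """Decode UTF-16-BE bytes and return the NUL-separated pieces."""
    text = raw.decode('utf-16-be')
    # Splitting on NUL drops the terminators, so none are printed.
    return [piece for piece in text.split('\0') if piece]

# Made-up sample: "res" and "ab", each followed by a 16-bit NUL terminator.
sample = b'\x00r\x00e\x00s\x00\x00\x00a\x00b\x00\x00'
for s in printable_strings(sample):
    print(s)
```

Each string is printed on its own line without any zero bytes, so plain grep (without -a) has nothing to classify as binary.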

Use the ripgrep utility instead of grep; it can search UTF-16 files.

ripgrep supports searching files in text encodings other than UTF-8, such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for automatically detecting UTF-16 is provided. Other text encodings must be specifically specified with the -E/--encoding flag.)

Example syntax:

rg sometext file
