I'm using ctypes bit fields to dissect tightly packed binary data. I stuff a record's worth of data into a union as a string, then pull out key fields as integers.

This works great when there are no nulls in the buffer, but any embedded null causes ctypes to truncate the string.

Example:

from ctypes import *

class H(BigEndianStructure):
    _fields_ = [ ('f1', c_int, 8),
                 ('f2', c_int, 8),
                 ('f3', c_int, 8),
                 ('f4', c_int, 2)
                 # ...
                 ]

class U(Union):
    _fields_ = [ ('fld', H),
                 ('buf', c_char * 6)
                 ]

# With no nulls, works as expected...
u1 = U()
u1.buf='abcabc'
print '{} {} {} (expect: 97 98 99)'.format(u1.fld.f1, u1.fld.f2, u1.fld.f3)

# Embedded null breaks it...  This prints '97 0 0', NOT '97 0 99'
u2 = U()
u2.buf='a\x00cabc'
print '{} {} {} (expect: 97 0 99)'.format(u2.fld.f1, u2.fld.f2, u2.fld.f3)

Browsing the ctypes source, I see two methods to set a char array, CharArray_set_value() and CharArray_set_raw(). It appears that CharArray_set_raw() will handle nulls properly whereas CharArray_set_value() will not.

But I can't figure out how to invoke the raw version... It looks like a property, so I'd expect something like:

u1.buf.raw = 'abcabc'

but that yields:

AttributeError: 'str' object has no attribute raw

Any guidance appreciated. (Including a completely different approach!)

(Note: I need to process thousands of records per second, so efficiency is critical. Using an array comprehension to stuff a byte array in the structure works, but it's 100x slower.)

2 Answers

1

You can also create the raw-string array outside of your struct/union:

mystring = (c_char * 6).from_buffer(u2)
print mystring.raw

This way you don't have any overhead for conversion. I wonder why a (c_char * 6) behaves differently when used alone vs. used in a Structure/Union...
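To make the idea concrete, here is a minimal sketch of writing through such a view. It assumes Python 3, where c_char data is bytes rather than str, and reuses the H and U definitions from the question; the name view is only illustrative.

```python
from ctypes import BigEndianStructure, Union, c_char, c_int


class H(BigEndianStructure):
    _fields_ = [('f1', c_int, 8),
                ('f2', c_int, 8),
                ('f3', c_int, 8),
                ('f4', c_int, 2)]


class U(Union):
    _fields_ = [('fld', H),
                ('buf', c_char * 6)]


u2 = U()

# A standalone c_char array that shares u2's memory (no copy is made).
view = (c_char * 6).from_buffer(u2)

# Unlike the union field, the standalone array exposes .raw,
# which copies every byte, embedded nulls included.
view.raw = b'a\x00cabc'

fields = (u2.fld.f1, u2.fld.f2, u2.fld.f3)
print(fields)    # the embedded null shows up in f2, and f3 is intact
print(view.raw)  # all six bytes
print(u2.buf)    # field access still stops at the first null
```

The view keeps a reference to u2, so it stays valid for as long as the view itself is alive.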

3 Comments

For convenience (usually), the CField descriptors for c_char and c_wchar arrays are special-cased in PyCField_FromDesc (in Modules/_ctypes/cfield.c) to convert to and from native Python strings using s_get / s_set and U_get / U_set.
Don't use from_address for this since the resulting array doesn't own a reference on the source buffer, u2. This is a recipe for segfault disaster. Use (c_char * 6).from_buffer(u2).
In cases such as this I prefer to make the field name private (e.g. _buf) and use a public property.
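
The last two comments can be combined into a rough sketch along these lines. It assumes Python 3 (bytes instead of str); the field name _buf and the property name buf follow the comment's suggestion, and the rest is illustrative rather than a definitive implementation.

```python
from ctypes import BigEndianStructure, Union, c_char, c_int


class H(BigEndianStructure):
    _fields_ = [('f1', c_int, 8),
                ('f2', c_int, 8),
                ('f3', c_int, 8),
                ('f4', c_int, 2)]


class U(Union):
    _fields_ = [('fld', H),
                ('_buf', c_char * 6)]  # private: only reserves the storage

    @property
    def buf(self):
        # View the union's memory as a standalone array to reach .raw.
        return (c_char * 6).from_buffer(self).raw

    @buf.setter
    def buf(self, data):
        # .raw copies every byte of data, embedded nulls included.
        (c_char * 6).from_buffer(self).raw = data


u = U()
u.buf = b'a\x00cabc'
fields = (u.fld.f1, u.fld.f2, u.fld.f3)
print(fields)
print(u.buf)
```

Each access builds a small temporary view, so for very hot loops it may be worth creating the view once and keeping it around.
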
0

Unfortunately, a c_char*6 field is handled as a nul-terminated string. Switching to c_byte*6 avoids the truncation, but you lose the convenience of initializing the buffer with a string:

from ctypes import *

class H(BigEndianStructure):
    _fields_ = [ ('f1', c_int, 8),
                 ('f2', c_int, 8),
                 ('f3', c_int, 8),
                 ('f4', c_int, 2)
                 # ...
                 ]

class U(Union):
    _fields_ = [ ('fld', H),
                 ('buf', c_byte * 6)
                 ]

u1 = U()
u1.buf=(c_byte*6)(97,98,99,97,98,99)
print '{} {} {} (expect: 97 98 99)'.format(u1.fld.f1, u1.fld.f2, u1.fld.f3)

u2 = U()
u2.buf=(c_byte*6)(97,0,99,97,98,99)
print '{} {} {} (expect: 97 0 99)'.format(u2.fld.f1, u2.fld.f2, u2.fld.f3)

Output:

97 98 99 (expect: 97 98 99)
97 0 99 (expect: 97 0 99)

1 Comment

Thanks, Mark. This works, but the CPU overhead of marshaling the bytes from the string into the byte array makes the approach too slow for my application. (In my trials it was about 100x slower than ctypes' memcpy().)
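
For comparison, a single block copy into the union might look like the sketch below. It assumes Python 3 and assumes that the copy mentioned in the comment is something like ctypes.memmove; load_record is a hypothetical helper name, not part of ctypes.

```python
from ctypes import (BigEndianStructure, Union, addressof, c_byte, c_int,
                    memmove, sizeof)


class H(BigEndianStructure):
    _fields_ = [('f1', c_int, 8),
                ('f2', c_int, 8),
                ('f3', c_int, 8),
                ('f4', c_int, 2)]


class U(Union):
    _fields_ = [('fld', H),
                ('buf', c_byte * 6)]


def load_record(u, record):
    """Copy the raw bytes of record into u with one memmove call."""
    if len(record) > sizeof(u):
        raise ValueError('record is larger than the union')
    # One C-level copy; no per-byte conversion happens in Python.
    memmove(addressof(u), record, len(record))


u = U()  # can be reused for every record
load_record(u, b'a\x00cabc')
fields = (u.fld.f1, u.fld.f2, u.fld.f3)
print(fields)
```

Reusing one U instance and overwriting it for each record avoids allocating a new object per record.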
