Reading binary data on bit level

Question

I have a binary file in which the data is organised in 16 bit integer blocks like so:

bit 15: digital bit 1
bit 14: digital bit 2
bits 13 to 0: 14 bit signed integer

The only way I have found to extract the data from the file into 3 arrays is:

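import numpy as np

data = np.fromfile("test1.bin", dtype=np.uint16)
# Bit 15: first digital bit, then clear it
digbit1 = data >= 2**15
data = np.array([x - 2**15 if x >= 2**15 else x for x in data], dtype=np.uint16)
# Bit 14: second digital bit, then clear it
digbit2 = data >= 2**14
data = np.array([x - 2**14 if x >= 2**14 else x for x in data], dtype=np.uint16)
# Bits 13 to 0: reinterpret as a 14 bit signed integer.
# int(x) avoids unsigned wrap-around when subtracting.
data = np.array([int(x) - 2**14 if x >= 2**13 else int(x) for x in data], dtype=np.int16)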

I know that I could do the same with a single for loop over the original data, filling 3 separate arrays, but this would still be ugly. What I would like to know is how to do this more efficiently, in the style of dtype=[('db', [('1', bit), ('2', bit)]), ('temp', 14bit-signed-int)]), so that the result is easy to access: for example, data['db']['1'] would be an array of ones and zeros.

Ever heard of pack? I do not really see what numpy has to do with binary files.
– willeM_ Van Onsem, Commented Oct 20, 2017 at 12:18
Could you phrase your comment in a more helpful way? Maybe numpy doesn't have a lot to do with binary files themselves, but for working with their content it is really useful. At least in my case.
– TheoryX, Commented Oct 20, 2017 at 22:18

PM 2Ring · Accepted Answer · 2017-10-21 04:34:07Z

Here's a way that is more efficient than your code because NumPy does the looping at compiled speed, which is much faster than using Python loops. We can also use bitwise arithmetic instead of those if tests.

You didn't supply any sample data, so I wrote some plain Python 3 code to create some fake data. I save that data to a file in big-endian format, but that's easy enough to change if your data is actually stored in little-endian. I don't use numpy.fromfile to read that data because it's faster to read the file in plain Python and then convert the bytes using numpy.frombuffer.

The only tricky part is handling those 14 bit signed integers. I assume you're using two's complement representation.

import numpy as np

# Make some fake data
bdata = []
bitlen = 14
mask = (1 << bitlen) - 1
for i in range(12):
    # Two initial bits
    a = i % 4
    # A signed number
    b = i - 6
    # Combine initial bits with the signed number,
    # using 14 bit two's complement.
    n = (a << bitlen) | (b & mask)
    # Convert to bytes, using 16 bit big-endian
    nbytes = n.to_bytes(2, 'big')
    bdata.append(nbytes)
    print('{} {:2} {:016b} {} {:>5}'.format(a, b, n, nbytes.hex(), n))
print()

# Save the data to a file
fname = 'test1.bin'
with open(fname, 'wb') as f:
    f.write(b''.join(bdata))

# And read it back in
with open(fname, 'rb') as f:
    data = np.frombuffer(f.read(), dtype='>u2')

print(data)

# Get the leading bits
digbit1 = data >> 15
print(digbit1)

# Get the second bits
digbit2 = (data >> 14) & 1
print(digbit2)

# Get the 14 bit signed integers
data = ((data & mask) << 2).astype(np.int16) >> 2
print(data)

output

0 -6 0011111111111010 3ffa 16378
1 -5 0111111111111011 7ffb 32763
2 -4 1011111111111100 bffc 49148
3 -3 1111111111111101 fffd 65533
0 -2 0011111111111110 3ffe 16382
1 -1 0111111111111111 7fff 32767
2  0 1000000000000000 8000 32768
3  1 1100000000000001 c001 49153
0  2 0000000000000010 0002     2
1  3 0100000000000011 4003 16387
2  4 1000000000000100 8004 32772
3  5 1100000000000101 c005 49157

[16378 32763 49148 65533 16382 32767 32768 49153     2 16387 32772 49157]
[0 0 1 1 0 0 1 1 0 0 1 1]
[0 1 0 1 0 1 0 1 0 1 0 1]
[-6 -5 -4 -3 -2 -1  0  1  2  3  4  5]
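
The last step of the script is compact, so here is a minimal sketch of what it does to individual words. The three sample words are taken from the output above; the names low, shifted, signed and alt are only illustrative, and two's complement is assumed, as before.

```python
import numpy as np

# Three words from the output above: flag bits in 15 and 14, value in bits 13..0
words = np.array([0x3FFA, 0xC001, 0x8000], dtype=np.uint16)

mask = (1 << 14) - 1
low = words & mask                      # clear the two flag bits
shifted = low << 2                      # bit 13 now sits in bit 15, the int16 sign position
signed = shifted.astype(np.int16) >> 2  # signed right shift copies the sign bit back down

# The same result written as a subtraction: take off 2**14 when bit 13 is set
alt = low.astype(np.int16) - ((low & 0x2000).astype(np.int16) << 1)

print(signed)
print(alt)
```

Shifting left by 2 moves the sign bit of the 14 bit value to where int16 keeps its own sign, so the right shift on the signed array fills the top bits with it. The alt line is the same computation without shifts, in case that reads more clearly.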

If you do need to use little-endian byte ordering, just change the dtype to '<u2' in the np.frombuffer call. And to test it, change 'big' to 'little' in the n.to_bytes call in the fake data making section.
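
NumPy dtypes occupy at least one whole byte, so a 1 bit or 14 bit field cannot be declared directly in the way the question sketches. A structured array can still give the data['db']['1'] style of access once the bits have been unpacked. The following is only a sketch under that assumption: unpack_words and the record layout are made-up names, with the field names borrowed from the question.

```python
import numpy as np

def unpack_words(raw):
    """Split unsigned 16-bit words into two flag bits and a signed 14-bit value."""
    # Hypothetical record layout mirroring the dtype wished for in the question
    record = np.dtype([('db', [('1', np.bool_), ('2', np.bool_)]),
                       ('temp', np.int16)])
    out = np.empty(raw.shape, dtype=record)
    out['db']['1'] = (raw >> 15).astype(bool)
    out['db']['2'] = ((raw >> 14) & 1).astype(bool)
    # Same sign-extension trick as in the script above
    out['temp'] = ((raw & 0x3FFF) << 2).astype(np.int16) >> 2
    return out

# Big-endian bytes for the words 0x3ffa, 0x7ffb and 0xc001 from the output above
raw = np.frombuffer(bytes.fromhex('3ffa7ffbc001'), dtype='>u2')
rec = unpack_words(raw)

print(rec['db']['1'])
print(rec['db']['2'])
print(rec['temp'])
```

Each flag takes a full byte in this layout, so the result uses more memory than the packed file; that is the price of the convenient field access.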

edited Oct 21, 2017 at 4:34

answered Oct 20, 2017 at 13:26

PM 2Ring


2 Comments

TheoryX Over a year ago

Thank you for the really nice answer. It is exactly what I wanted. According to the numpy documentation, the fromfile function does support dtype, and both approaches work on your example and on my data. It may be worth adding that, time-wise, open with frombuffer is roughly 25% faster than fromfile.

PM 2Ring Over a year ago

@TheoryX No worries. I misread the Notes section of the fromfile docs. I'll adjust my answer shortly.
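
To check both reading routes from these comments on your own data, something like the following sketch can be used. The temporary file and the function names read_fromfile and read_frombuffer are made up for illustration. Timings depend on the machine, file size and NumPy version, so none are quoted here.

```python
import os
import tempfile
import timeit

import numpy as np

# Write a throwaway file of big-endian 16-bit words (made-up test data)
path = os.path.join(tempfile.mkdtemp(), 'words.bin')
np.arange(60000).astype('>u2').tofile(path)

def read_fromfile():
    return np.fromfile(path, dtype='>u2')

def read_frombuffer():
    with open(path, 'rb') as f:
        return np.frombuffer(f.read(), dtype='>u2')

same = np.array_equal(read_fromfile(), read_frombuffer())
print('identical results:', same)

# Time each reader; compare the two numbers on your own system
for fn in (read_fromfile, read_frombuffer):
    print(fn.__name__, timeit.timeit(fn, number=200))
```

One practical difference besides speed: the array returned by frombuffer on a bytes object is read-only, while the one from fromfile is writable. That does not matter for the bit extraction above, since it only reads from the array.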
