Storing 'struct' data to binary file

I need to store a binary file with a 12 byte header composed of 4 fields. They are namely: sSamples (4-bytes integer), sSampPeriod (4-bytes integer), sSampSize (2-bytes integer), and finally sParmKind (2-bytes integer). I'm using 'struct' to my variables to the desired fields. Now that I have them defined separately, how could I merge them all to store the '12 bytes header'?

sSamples        = struct.pack('i', nSamples) # 4-bytes integer
sSampPeriod     = struct.pack('i', nSampPeriod) # 4-bytes integer
sSampSize       = struct.pack('H', nSampSize) # 2-bytes integer / unsigned short
sParmKind       = struct.pack('H', 9) # 2-bytes integer / unsigned short

In addition, I've a npVect float array of dimensionality D (numpy.ndarray - float32). How could I store this vector in the same binary file, but after the header?

Answers


As Cody Brocious wrote, you can pack your entire header at once:

header = struct.pack('<iiHH', nSamples, nSampPeriod, nSampSize, nParmKind)

He also mentioned endianness, which is important if you want to pack your data so as to reliably unpack it on machines with different architectures. The < at the beginning of my format string specifies "pack this data using a little-endian convention".

As for the array, you'll have to pack its length in order to determine how many values to unpack when you read it again. Doing it all in one call:

flattened = npVect.ravel()  # get a 1-D array of numbers
arrSize = len(flattened)
# pack header, count of numbers, and numbers, all in one call
packed = struct.pack('<iiHHi%df' % arrSize,
    nSamples, nSampPeriod, nSampSize, nParmKind, arrSize, *flattened)

Depending on how big your array is likely to be, you could end up with a huge string representing the entire contents of your binary file, and you might want to look into alternatives to struct which don't require you to have the entire file in memory.

Unpacking:

fmt = '<iiHHi'
nSamples, nSampPeriod, nSampSize, nParmKind, arrSize = struct.unpack(fmt, packed)
# Use unpack_from to start reading after the packed header and count
flattened = struct.unpack_from('<%df' % arrSize, packed, struct.calcsize(fmt))
npVect = np.ndarray(flattened, dtype='float32').reshape(# your dimensions go here
    )

EDIT: Oops, the array format isn't quite as simple as that :) The general idea holds, though: flatten your array into a list of numbers using any method you like, pack the number of values, then pack each value. On the other side, read the array as a flat list, then impose whatever structure you need on it.

EDIT: Changed format strings to use repeat specifiers, rather than string multiplication. Thanks to John Machin for pointing it out.

EDIT: Added numpy code to flatten the array before packing and reconstruct it after unpacking.


struct.pack returns a string, so you can combine the fields simply by string concatenation:

header = sSamples + sSampPeriod + sSampSize + sParmKind
assert len( header ) == 12

Need Your Help

Unable to get Java to trust my self-signed certificate

java security ssl https

Even after adding my self signed certificate to cacerts, I still can't get Java to trust it

Debugging memory corruption on MinGW

c++ debugging mingw application-verifier dr-memory

I'm having some trouble with memory corruption in a rather large project to control some scientific hardware (ca. 6000 lines), and I'm not sure which is the best way/tool to solve the problem. The