Stupid Python Tricks: C-Structures using the ctypes Module

Summary

A discussion of reading data out of a stream of bytes (encoded in a C-like structure) using the Python ctypes module.  The data in the stream is a UDP packet that represents an mDNS query or response.  The purpose of this article is to explain a process for decoding byte streams in Python.

Story

While I was working on A-Class Linux implementations I fell down the rabbit hole of mDNS.  mDNS is a part of the set of protocols that make up “Zero Configuration Networking”.  In order to understand the protocol I decided to implement (partially) an mDNS server.  You can read about that protocol and my implementation – when I get done :-).  None of that is really important to this article, but it did lead me to dig into techniques for examining bytes in Python.

I doubt that this article is canonical, but I hope that it is at least useful.  I did find quite a few partial discussions of this topic, but I had to dig in to really understand it.

Python      Comment
bytes       A built-in object to represent an immutable sequence of single bytes.
bytearray   A built-in object to represent a mutable sequence of single bytes.
struct      A module to encode and decode bytes from C-like structures (unfortunately the byte is the smallest unit the struct module handles, so it cannot unpack individual bits)
ctypes      A module to interface to C functions and data.  It contains a bunch of classes which can be used to interface with C-Structures (like the struct module)
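
To make the struct limitation concrete, here is a minimal sketch.  The packet bytes are made up for illustration (an mDNS-style response header with one answer) and the variable names are illustrative.  The struct module gets you as far as whole 16-bit words, and the bits inside the flags word still have to be picked out by hand.

```python
import struct

# Hypothetical 12-byte header: ID=0, flags=0x8400, one answer record.
packet = bytes.fromhex("0000 8400 0000 0001 0000 0000")

# "!" selects network (big-endian) byte order; "6H" reads six unsigned 16-bit integers.
ident, flags, qdcount, ancount, nscount, arcount = struct.unpack("!6H", packet)

# struct has no format code for individual bits, so the flag bits
# inside the second word still need a shift and a mask.
qr = (flags >> 15) & 0x1

print(hex(flags), ancount, qr)
```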

There are bunches of web hits on this topic.  However, here are a few which I found useful.

Link Comment
link A basic discussion of the ctypes module and the basic classes
link A discussion of the ctypes.sizeof function
link A discussion of the bytearray
link A Better Way to Work with Raw Data Types in Python

The UDP Header for an mDNS Packet

IETF RFC 6895 documents the header format for mDNS (and DNS) packets.  The header contains data in Big Endian format encoded into 12 bytes that are broken up into single-bit flags, multi-bit fields, and a few 16-bit integers.  Here is a snapshot from the RFC.

 

A Red Herring

OK, I admit it.  I am a C-Programmer from way back.  My first inclination to decode the bytes looked like this:

  • Using shifts and or’s to assemble the bytes into big endian uint16s e.g. line 2
  • Using bit masks with a logical “and”, plus or’s and shifts, to pick out bit fields e.g. line 12
  • Using a tower of if/elif/elif/else to decode the individual values e.g. lines

Here is my first crack at this.
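
The original listing is not reproduced here, so what follows is only a minimal sketch of that style of decoding, assuming the header layout from the RFC.  The function name, the dictionary keys and the sample bytes are illustrative choices, not the original code.

```python
def decode_header(data: bytes) -> dict:
    """Hand-decode a 12-byte DNS header (illustrative helper)."""
    if len(data) < 12:
        raise ValueError("a DNS header needs at least 12 bytes")

    # Shifts and ors assemble each big-endian uint16 from two bytes.
    ident = (data[0] << 8) | data[1]
    flags = (data[2] << 8) | data[3]
    qdcount = (data[4] << 8) | data[5]
    ancount = (data[6] << 8) | data[7]

    # Masks and shifts pick the bit fields out of the 16-bit flags word.
    qr = (flags >> 15) & 0x1
    opcode = (flags >> 11) & 0xF
    aa = (flags >> 10) & 0x1
    rcode = flags & 0xF

    # A tower of if/elif/else turns the opcode number into a name.
    if opcode == 0:
        opcode_name = "QUERY"
    elif opcode == 1:
        opcode_name = "IQUERY"
    elif opcode == 2:
        opcode_name = "STATUS"
    else:
        opcode_name = "OTHER"

    return {
        "id": ident, "qr": qr, "opcode": opcode_name, "aa": aa,
        "rcode": rcode, "qdcount": qdcount, "ancount": ancount,
    }


# Made-up sample: a response (QR=1, AA=1) carrying one answer.
header = decode_header(bytes.fromhex("0000 8400 0000 0001 0000 0000"))
print(header)
```

It works, but every field costs a hand-written shift and mask, and each one is a chance to get a bit position wrong.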

Encoding a c-structure with Bits

I didn’t really like the above implementation.  So I kept digging.  After a while I found the ctypes module.  This lets you:

  • Derive a new class from the BigEndianStructure class (line 1)
  • Pack all of the bits and bytes next to each other (line 2)
  • Specify the field names, type and optionally the length in bits (line
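
A minimal sketch of such a class, assuming the field layout from the RFC header.  The class name dnsHeader comes from this article, but the field names are illustrative choices.  This sketch leaves out the packing setting because every field is declared as a 16-bit type, so there is nothing to pad; the sizeof check at the end confirms the structure is exactly 12 bytes.

```python
import ctypes


class dnsHeader(ctypes.BigEndianStructure):
    # Each entry is (name, type) or (name, type, width in bits).
    # In a BigEndianStructure the first bit field declared lands in
    # the most significant bits, matching the order on the wire.
    _fields_ = [
        ("id", ctypes.c_uint16),
        ("qr", ctypes.c_uint16, 1),
        ("opcode", ctypes.c_uint16, 4),
        ("aa", ctypes.c_uint16, 1),
        ("tc", ctypes.c_uint16, 1),
        ("rd", ctypes.c_uint16, 1),
        ("ra", ctypes.c_uint16, 1),
        ("z", ctypes.c_uint16, 1),
        ("ad", ctypes.c_uint16, 1),
        ("cd", ctypes.c_uint16, 1),
        ("rcode", ctypes.c_uint16, 4),
        ("qdcount", ctypes.c_uint16),
        ("ancount", ctypes.c_uint16),
        ("nscount", ctypes.c_uint16),
        ("arcount", ctypes.c_uint16),
    ]


# The ten bit fields add up to 16 bits, so the whole header is 12 bytes.
print(ctypes.sizeof(dnsHeader))
```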

When you receive data from a socket with recvfrom you will get a tuple that contains:

  1. a “bytes” type object containing the raw bytes of the message
  2. a “tuple” containing the sender’s IP address and port (not relevant to this discussion)
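
Here is a small sketch of what that tuple looks like.  To stay self-contained it sends a made-up header to itself over the loopback interface; that is an assumption of this example only.  A real mDNS listener would instead bind to UDP port 5353 and join the multicast group 224.0.0.251.

```python
import socket

# Loopback stand-in for a real mDNS socket.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
receiver.settimeout(2.0)         # do not block forever if nothing arrives

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(bytes.fromhex("0000 8400 0000 0001 0000 0000"),
              receiver.getsockname())

# recvfrom returns (bytes, address); for IPv4 the address is (host, port).
data, address = receiver.recvfrom(1500)
print(type(data).__name__, len(data))
print(address)

sender.close()
receiver.close()
```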

Now that you have the bytes you can create an object of dnsHeader type to interpret them.  The ctypes class method “from_buffer_copy” takes a bytes-like object that is at least as long as the structure and returns a new “dnsHeader” object built from a copy of those bytes.

Then you can look at the individual fields like this:
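
A minimal sketch of both steps, assuming a dnsHeader class laid out as in the RFC header.  The field names, the helper lists and the sample packet are illustrative choices, not taken from the original listing.

```python
import ctypes

U16 = ctypes.c_uint16
# (name, width in bits) for the 16-bit flags word, most significant bit first.
FLAG_BITS = [("qr", 1), ("opcode", 4), ("aa", 1), ("tc", 1), ("rd", 1),
             ("ra", 1), ("z", 1), ("ad", 1), ("cd", 1), ("rcode", 4)]
COUNTS = ["qdcount", "ancount", "nscount", "arcount"]


class dnsHeader(ctypes.BigEndianStructure):
    _fields_ = ([("id", U16)]
                + [(name, U16, bits) for name, bits in FLAG_BITS]
                + [(name, U16) for name in COUNTS])


# Made-up packet: a 12-byte header plus four bytes standing in for the rest.
packet = bytes.fromhex("0000 8400 0000 0001 0000 0000") + b"\x00" * 4

# The buffer may be longer than the structure; only the first 12 bytes are copied.
header = dnsHeader.from_buffer_copy(packet)

# Each field reads like an ordinary attribute.
print(header.qr, header.opcode, header.aa, header.ancount)

# Or walk every field by name.
for name, *_ in dnsHeader._fields_:
    print(f"{name:8} = {getattr(header, name)}")

# A buffer shorter than the structure is rejected with a ValueError.
try:
    dnsHeader.from_buffer_copy(packet[:8])
except ValueError as error:
    print("too short:", error)
```

Because from_buffer_copy copies the bytes, the header object stays valid even after the receive buffer is reused.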