od tricks

I've been working on a transparent SQL router for PostgreSQL, something a bit like pgpool, but without all the "extra" functionality (no connection pooling, limited authentication intervention, etc.)

One of the key features of this new proxy is its transparency. Clients connecting to the router should be completely unaware that they are in fact talking to some piece of middleware that sits in front of PostgreSQL. They talk native FE/BE protocol, and the router responds in kind.

The documentation for PostgreSQL is normally top-shelf, but Chapter 46, which deals with the protocol, is surprisingly lacking in quality and confidence. It seems more like a design document than an API spec. So, armed with the information in there, I set out to probe a live PostgreSQL instance and see what it was going to spit back at me.

The PostgreSQL FE/BE protocol is a binary protocol that features lots of bits of text. SQL, after all, is a very ASCII-friendly language. As usual, when dealing with binary files, I turned to my old friend, od.

od is a utility for dumping the contents of files into whatever format you want for analysis. Here's some basic use cases that have made my life easier over the years.

First, treat the file like it's got embedded ASCII:

$ od -a /some/file
0000000 nul nul nul   % nul etx nul nul   u   s   e   r nul   j   a   m
0000020   e   s nul   d   a   t   a   b   a   s   e nul   e   x   a   m
0000040   p   l   e nul
0000044

Or, if you prefer, hexadecimal:

$ od -x /some/file
0000000 0000 2500 0300 0000 7375 7265 6a00 6d61
0000020 7365 6400 7461 6261 7361 0065 7865 6d61
0000040 6c70 0065
0000044

You can even bypass the canned formats and specify your own with -t:

$ od -t x1 /some/file
0000000 00 00 00 25 00 03 00 00 75 73 65 72 00 6a 61 6d
0000020 65 73 00 64 61 74 61 62 61 73 65 00 65 78 61 6d
0000040 70 6c 65 00
0000044

Now let's get back to my PostgreSQL story, and let's get really fancy.

Every PostgreSQL client connection starts out by sending a StartupMessage to the backend server. This startup identifies the version of the FE/BE protocol that the client understands, and contains some basic startup parameters. Each Startupmessage starts with a 4-octet length field, followed by a 4-octet version field. The most significant 16 bits of the version field encode the major version, and the least significant 16 bits represent the minor version. All values are interpreted in network byte-order.

Next comes zero or more pairs of strings, represented as a sequence of ASCII characters terminated by a NULL byte. To make testing easier and more reproducible, I wrote a small Perl script to generate the wire-protocol representation via the excellent pack utility. Here's the gist:

my $s = '';
$s .= pack('nn', 3, 0);                   # v3.0
$s .= pack('A*xA', "user", "james");
$s .= pack('A*xA', "database", "example");
print pack('N', 4 + length($s)) . "$s";

I can use od to verify the pack, and inspect the binary structure of the generated StartupMessage:

$ ./pg | od -a
0000000 nul nul nul   $ nul etx nul nul   u   s   e   r nul   j   a   m
0000020   e   s nul   d   a   t   a   b   a   s   e nul   e   x   a   m
0000040   p   l   e nul
0000044

It's kind of annoying that the 4-byte length field (bytes 0-4) displays as nul nul nul $. Sure, I could go look up the ASCII value of '$' (it's 36). Or, I could just chain some -t format specifiers together. I eventually ended up with:

$ ./pg | od -ta -tx1 -to4 --endian=big
0000000 nul nul nul   $ nul etx nul nul   u   s   e   r nul   j   a   m
         00  00  00  24  00  03  00  00  75  73  65  72  00  6a  61  6d
            00000000044     00000600000     16534662562     00032460555
0000020   e   s nul   d   a   t   a   b   a   s   e nul   e   x   a   m
         65  73  00  64  61  74  61  62  61  73  65  00  65  78  61  6d
            14534600144     14135060542     14134662400     14536060555
0000040   p   l   e nul
         70  6c  65  00
            16033062400
0000044

Excellent. Now I can easily verify that the length field is 044, which matches up with the length of the input text there at the bottom (also 044).

Here's the play-by-play:

StartupMessage looks good, so let's run it through a live PostgreSQL node!

$ ./pg | nc 10.244.232.2 6432 | od -ta -tx1 -to4 --endian=big
0000000   E nul nul nul   ~   S   F   A   T   A   L nul   C   0   8   P
         45  00  00  00  7e  53  46  41  54  41  4c  00  43  30  38  50
            10500000000     17624643101     12420246000     10314034120
0000020   0   1 nul   M   i   n   v   a   l   i   d  sp   s   t   a   r
         30  31  00  4d  69  6e  76  61  6c  69  64  20  73  74  61  72
            06014200115     15133473141     15432262040     16335060562
0000040   t   u   p  sp   p   a   c   k   e   t  sp   l   a   y   o   u
         74  75  70  20  70  61  63  6b  65  74  20  6c  61  79  6f  75
            16435270040     16030261553     14535020154     14136267565
0000060   t   :  sp   e   x   p   e   c   t   e   d  sp   t   e   r   m
         74  3a  20  65  78  70  65  63  74  65  64  20  74  65  72  6d
            16416420145     17034062543     16431262040     16431271155
0000100   i   n   a   t   o   r  sp   a   s  sp   l   a   s   t  sp   b
         69  6e  61  74  6f  72  20  61  73  20  6c  61  73  74  20  62
            15133460564     15734420141     16310066141     16335020142
0000120   y   t   e nul   F   p   o   s   t   m   a   s   t   e   r   .
         79  74  65  00  46  70  6f  73  74  6d  61  73  74  65  72  2e
            17135062400     10634067563     16433260563     16431271056
0000140   c nul   L   2   0   7   3 nul   R   P   r   o   c   e   s   s
         63  00  4c  32  30  37  33  00  52  50  72  6f  63  65  73  73
            14300046062     06015631400     12224071157     14331271563
0000160   S   t   a   r   t   u   p   P   a   c   k   e   t nul nul
         53  74  61  72  74  75  70  50  61  63  6b  65  74  00  00
            12335060562     16435270120     14130665545     16400000000
0000177

Oh noes! PostgreSQL is definitely not happy with our StartupMessage/ packet. It returns an ErrorResponse message letting us know why: invalid startup packet layout: expected terminator as last byte. Unlike the message we sent, the ErrorResponse reply message is typed; the first octet is an ASCII character that identifies what type of message it is. E tells us we are dealing with an ErrorResponse.

Unfortunately, that single leading octet throws off our -to4, leading to the laughably high message length of 10500000000 (about 1.2GB). This is what --skip-bytes (-j to her friends) was made for! We can skip the type byte, since we now know that it is an 'E', and re-align on our 4-byte boundary:

$ ./pg | nc 10.244.232.2 6432 | od -ta -tx1 -to4 --endian=big -j1
0000001 nul nul nul   ~   S   F   A   T   A   L nul   C   0   8   P   0
         00  00  00  7e  53  46  41  54  41  4c  00  43  30  38  50  30
            00000000176     12321440524     10123000103     06016050060
0000021   1 nul   M   i   n   v   a   l   i   d  sp   s   t   a   r   t
         31  00  4d  69  6e  76  61  6c  69  64  20  73  74  61  72  74
            06100046551     15635460554     15131020163     16430271164
0000041   u   p  sp   p   a   c   k   e   t  sp   l   a   y   o   u   t
         75  70  20  70  61  63  6b  65  74  20  6c  61  79  6f  75  74
            16534020160     14130665545     16410066141     17133672564
0000061   :  sp   e   x   p   e   c   t   e   d  sp   t   e   r   m   i
         3a  20  65  78  70  65  63  74  65  64  20  74  65  72  6d  69
            07210062570     16031261564     14531020164     14534466551
0000101   n   a   t   o   r  sp   a   s  sp   l   a   s   t  sp   b   y
         6e  61  74  6f  72  20  61  73  20  6c  61  73  74  20  62  79
            15630272157     16210060563     04033060563     16410061171
0000121   t   e nul   F   p   o   s   t   m   a   s   t   e   r   .   c
         74  65  00  46  70  6f  73  74  6d  61  73  74  65  72  2e  63
            16431200106     16033671564     15530271564     14534427143
0000141 nul   L   2   0   7   3 nul   R   P   r   o   c   e   s   s   S
         00  4c  32  30  37  33  00  52  50  72  6f  63  65  73  73  53
            00023031060     06714600122     12034467543     14534671523
0000161   t   a   r   t   u   p   P   a   c   k   e   t nul nul
         74  61  72  74  75  70  50  61  63  6b  65  74  00  00
            16430271164     16534050141     14332662564     00000000000
0000177

Everything seems to be as expected. A length of 0176 (which matches our 0177 final address if you remember the skipped octet). We can also see the structure that the PostgreSQL manual says we ought to. An 'S' frame, a 'C' frame, a few of the optionals and NULL-terminated string payloads for each.

If you look closesly, you'll notice that the final message itself is terminated by a NULL \\0 octet. Is this what the ErrorResponse is complaining about?

Yes!

By adding a final NULL terminator to the Perl script:

my $s = '';
$s .= pack('nn', 3, 0);                   # v3.0
$s .= pack('A*xA', "user", "james");
$s .= pack('A*xA', "database", "example");
print pack('N', 4 + length($s)) . "$s\0";    # NULL!

We can now elicit a Ready response from the PostgreSQL backend:

$ ./pg | od -ta -tx1 -to4 --endian=big
0000000 nul nul nul   % nul etx nul nul   u   s   e   r nul   j   a   m
         00  00  00  25  00  03  00  00  75  73  65  72  00  6a  61  6d
            00000000045     00000600000     16534662562     00032460555
0000020   e   s nul   d   a   t   a   b   a   s   e nul   e   x   a   m
         65  73  00  64  61  74  61  62  61  73  65  00  65  78  61  6d
            14534600144     14135060542     14134662400     14536060555
0000040   p   l   e nul nul
         70  6c  65  00  00
            16033062400     00000000000
0000045

./pg | nc 10.244.232.2 6432 | od -ta -tx1 -to4 --endian=big
0000000   R nul nul nul  ff nul nul nul enq   L   d   '   `
         52  00  00  00  0c  00  00  00  05  4c  64  a7  60
            12200000000     01400000000     00523062247     14000000000
0000015

Success! A ReadyResponse (note the R)!

od is a small but flexible tool for analyzing weird files, binary network protocols, and even text-based network data (HTTP anyone?) to make sure it doesn't have any weird or unexpected octets in it. It's one of the many tools I use on a daily basis, both in writing code and in administering systems.

Happy Hacking!

James (@iamjameshunt) works on the Internet, spends his weekends developing new and interesting bits of software and his nights trying to make sense of research papers.

Currently working on Rook.