I've been working on a transparent SQL router for PostgreSQL, something a bit like pgpool, but without all the "extra" functionality (no connection pooling, limited authentication intervention, etc.)
One of the key features of this new proxy is its transparency. Clients connecting to the router should be completely unaware that they are in fact talking to some piece of middleware that sits in front of PostgreSQL. They talk native FE/BE protocol, and the router responds in kind.
The documentation for PostgreSQL is normally top-shelf, but Chapter 46, which deals with the protocol, is surprisingly lacking in quality and confidence. It seems more like a design document than an API spec. So, armed with the information in there, I set out to probe a live PostgreSQL instance and see what it was going to spit back at me.
The PostgreSQL FE/BE protocol is a binary protocol that features lots of bits of text. SQL, after all, is a very ASCII-friendly language. As usual, when dealing with binary files, I turned to my old friend, od
.
od
is a utility for dumping the contents of files into whatever format you want for analysis. Here's some basic use cases that have made my life easier over the years.
First, treat the file like it's got embedded ASCII:
$ od -a /some/file
0000000 nul nul nul % nul etx nul nul u s e r nul j a m
0000020 e s nul d a t a b a s e nul e x a m
0000040 p l e nul
0000044
Or, if you prefer, hexadecimal:
$ od -x /some/file
0000000 0000 2500 0300 0000 7375 7265 6a00 6d61
0000020 7365 6400 7461 6261 7361 0065 7865 6d61
0000040 6c70 0065
0000044
You can even bypass the canned formats and specify your own with -t
:
$ od -t x1 /some/file
0000000 00 00 00 25 00 03 00 00 75 73 65 72 00 6a 61 6d
0000020 65 73 00 64 61 74 61 62 61 73 65 00 65 78 61 6d
0000040 70 6c 65 00
0000044
Now let's get back to my PostgreSQL story, and let's get really fancy.
Every PostgreSQL client connection starts out by sending a StartupMessage to the backend server. This startup identifies the version of the FE/BE protocol that the client understands, and contains some basic startup parameters. Each Startupmessage starts with a 4-octet length field, followed by a 4-octet version field. The most significant 16 bits of the version field encode the major version, and the least significant 16 bits represent the minor version. All values are interpreted in network byte-order.
Next comes zero or more pairs of strings, represented as a sequence of ASCII characters terminated by a NULL byte. To make testing easier and more reproducible, I wrote a small Perl script to generate the wire-protocol representation via the excellent pack utility. Here's the gist:
my $s = '';
$s .= pack('nn', 3, 0); # v3.0
$s .= pack('A*xA', "user", "james");
$s .= pack('A*xA', "database", "example");
print pack('N', 4 + length($s)) . "$s";
I can use od
to verify the pack, and inspect the binary structure of the generated StartupMessage:
$ ./pg | od -a
0000000 nul nul nul $ nul etx nul nul u s e r nul j a m
0000020 e s nul d a t a b a s e nul e x a m
0000040 p l e nul
0000044
It's kind of annoying that the 4-byte length field (bytes 0-4) displays as nul nul nul $
. Sure, I could go look up the ASCII value of '$' (it's 36). Or, I could just chain some -t
format specifiers together. I eventually ended up with:
$ ./pg | od -ta -tx1 -to4 --endian=big
0000000 nul nul nul $ nul etx nul nul u s e r nul j a m
00 00 00 24 00 03 00 00 75 73 65 72 00 6a 61 6d
00000000044 00000600000 16534662562 00032460555
0000020 e s nul d a t a b a s e nul e x a m
65 73 00 64 61 74 61 62 61 73 65 00 65 78 61 6d
14534600144 14135060542 14134662400 14536060555
0000040 p l e nul
70 6c 65 00
16033062400
0000044
Excellent. Now I can easily verify that the length field is 044, which matches up with the length of the input text there at the bottom (also 044).
Here's the play-by-play:
- -ta Turns on mostly-helpful-to-humans ASCII dumping.
- -tx1 Gives us the octets in hexadecimal, which is useful when there is no good ASCII/printable representation.
- -t04 Interprets each sequence of 4 octets as a single value, and prints it in octal, the same format used for addresses, allowing that quick scan and verify.
- --endian=big Tells
od
to use big-endian or network byte order when interpreting multi-octet sequences like our 4-byte length field.
StartupMessage looks good, so let's run it through a live PostgreSQL node!
$ ./pg | nc 10.244.232.2 6432 | od -ta -tx1 -to4 --endian=big
0000000 E nul nul nul ~ S F A T A L nul C 0 8 P
45 00 00 00 7e 53 46 41 54 41 4c 00 43 30 38 50
10500000000 17624643101 12420246000 10314034120
0000020 0 1 nul M i n v a l i d sp s t a r
30 31 00 4d 69 6e 76 61 6c 69 64 20 73 74 61 72
06014200115 15133473141 15432262040 16335060562
0000040 t u p sp p a c k e t sp l a y o u
74 75 70 20 70 61 63 6b 65 74 20 6c 61 79 6f 75
16435270040 16030261553 14535020154 14136267565
0000060 t : sp e x p e c t e d sp t e r m
74 3a 20 65 78 70 65 63 74 65 64 20 74 65 72 6d
16416420145 17034062543 16431262040 16431271155
0000100 i n a t o r sp a s sp l a s t sp b
69 6e 61 74 6f 72 20 61 73 20 6c 61 73 74 20 62
15133460564 15734420141 16310066141 16335020142
0000120 y t e nul F p o s t m a s t e r .
79 74 65 00 46 70 6f 73 74 6d 61 73 74 65 72 2e
17135062400 10634067563 16433260563 16431271056
0000140 c nul L 2 0 7 3 nul R P r o c e s s
63 00 4c 32 30 37 33 00 52 50 72 6f 63 65 73 73
14300046062 06015631400 12224071157 14331271563
0000160 S t a r t u p P a c k e t nul nul
53 74 61 72 74 75 70 50 61 63 6b 65 74 00 00
12335060562 16435270120 14130665545 16400000000
0000177
Oh noes! PostgreSQL is definitely not happy with our StartupMessage/ packet. It returns an ErrorResponse message letting us know why: invalid startup packet layout: expected terminator as last byte. Unlike the message we sent, the ErrorResponse reply message is typed; the first octet is an ASCII character that identifies what type of message it is. E tells us we are dealing with an ErrorResponse.
Unfortunately, that single leading octet throws off our -to4
, leading to the laughably high message length of 10500000000 (about 1.2GB). This is what --skip-bytes
(-j
to her friends) was made for! We can skip the type byte, since we now know that it is an 'E', and re-align on our 4-byte boundary:
$ ./pg | nc 10.244.232.2 6432 | od -ta -tx1 -to4 --endian=big -j1
0000001 nul nul nul ~ S F A T A L nul C 0 8 P 0
00 00 00 7e 53 46 41 54 41 4c 00 43 30 38 50 30
00000000176 12321440524 10123000103 06016050060
0000021 1 nul M i n v a l i d sp s t a r t
31 00 4d 69 6e 76 61 6c 69 64 20 73 74 61 72 74
06100046551 15635460554 15131020163 16430271164
0000041 u p sp p a c k e t sp l a y o u t
75 70 20 70 61 63 6b 65 74 20 6c 61 79 6f 75 74
16534020160 14130665545 16410066141 17133672564
0000061 : sp e x p e c t e d sp t e r m i
3a 20 65 78 70 65 63 74 65 64 20 74 65 72 6d 69
07210062570 16031261564 14531020164 14534466551
0000101 n a t o r sp a s sp l a s t sp b y
6e 61 74 6f 72 20 61 73 20 6c 61 73 74 20 62 79
15630272157 16210060563 04033060563 16410061171
0000121 t e nul F p o s t m a s t e r . c
74 65 00 46 70 6f 73 74 6d 61 73 74 65 72 2e 63
16431200106 16033671564 15530271564 14534427143
0000141 nul L 2 0 7 3 nul R P r o c e s s S
00 4c 32 30 37 33 00 52 50 72 6f 63 65 73 73 53
00023031060 06714600122 12034467543 14534671523
0000161 t a r t u p P a c k e t nul nul
74 61 72 74 75 70 50 61 63 6b 65 74 00 00
16430271164 16534050141 14332662564 00000000000
0000177
Everything seems to be as expected. A length of 0176
(which matches our 0177
final address if you remember the skipped octet). We can also see the structure that the PostgreSQL manual says we ought to. An 'S' frame, a 'C' frame, a few of the optionals and NULL-terminated string payloads for each.
If you look closesly, you'll notice that the final message itself is terminated by a NULL \0
octet. Is this what the ErrorResponse is complaining about?
Yes!
By adding a final NULL terminator to the Perl script:
my $s = '';
$s .= pack('nn', 3, 0); # v3.0
$s .= pack('A*xA', "user", "james");
$s .= pack('A*xA', "database", "example");
print pack('N', 4 + length($s)) . "$s\0"; # NULL!
We can now elicit a Ready response from the PostgreSQL backend:
$ ./pg | od -ta -tx1 -to4 --endian=big
0000000 nul nul nul % nul etx nul nul u s e r nul j a m
00 00 00 25 00 03 00 00 75 73 65 72 00 6a 61 6d
00000000045 00000600000 16534662562 00032460555
0000020 e s nul d a t a b a s e nul e x a m
65 73 00 64 61 74 61 62 61 73 65 00 65 78 61 6d
14534600144 14135060542 14134662400 14536060555
0000040 p l e nul nul
70 6c 65 00 00
16033062400 00000000000
0000045
./pg | nc 10.244.232.2 6432 | od -ta -tx1 -to4 --endian=big
0000000 R nul nul nul ff nul nul nul enq L d ' `
52 00 00 00 0c 00 00 00 05 4c 64 a7 60
12200000000 01400000000 00523062247 14000000000
0000015
Success! A ReadyResponse (note the R)!
od
is a small but flexible tool for analyzing weird files, binary network protocols, and even text-based network data (HTTP anyone?) to make sure it doesn't have any weird or unexpected octets in it. It's one of the many tools I use on a daily basis, both in writing code and in administering systems.
Happy Hacking!