To commemorate the third annual GopherCon US in Denver this week, we’re releasing cgo bindings to two compression libraries that we’ve been using in production at Datadog for a while now: czlib and zstd.
czlib started as a fork of the vitess project’s cgzip package. Our primary data pipeline uses zlib-compressed messages, but the standard library’s pure Go implementation can be significantly slower than the C zlib library. To close this gap, we modified a few flags in cgzip to make it encode and decode with zlib wrapping rather than with gzip headers.
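To make the difference between zlib wrapping and gzip headers concrete, here is a minimal sketch that uses only the Go standard library (not czlib or cgzip). It compresses the same message with both framings, prints the leading bytes of each, and shows that a zlib reader rejects gzip input. The helper names zlibCompress, gzipCompress and zlibDecompress are made up for this example.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"compress/zlib"
	"fmt"
	"io"
)

// zlibCompress deflates msg and wraps it in zlib framing (RFC 1950):
// a 2-byte header and a 4-byte Adler-32 trailer.
func zlibCompress(msg []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(msg); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gzipCompress deflates msg and wraps it in gzip framing (RFC 1952):
// a header of at least 10 bytes and an 8-byte CRC-32/length trailer.
func gzipCompress(msg []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	if _, err := w.Write(msg); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// zlibDecompress only accepts zlib framing; gzip input is rejected
// when the header is parsed.
func zlibDecompress(data []byte) ([]byte, error) {
	r, err := zlib.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	msg := []byte("hello, gophers: hello, gophers: hello, gophers")

	zl, err := zlibCompress(msg)
	if err != nil {
		panic(err)
	}
	gz, err := gzipCompress(msg)
	if err != nil {
		panic(err)
	}

	fmt.Printf("zlib: %d bytes, starts with % x\n", len(zl), zl[:2])
	fmt.Printf("gzip: %d bytes, starts with % x\n", len(gz), gz[:2])

	// Feeding gzip-framed data to a zlib reader fails on the header check.
	_, err = zlibDecompress(gz)
	fmt.Println("zlib reader on gzip data:", err)
}
```

Both framings carry the same kind of deflate stream, which is why switching between them in a C binding can come down to a few flags, while the two formats remain incompatible on the wire.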
We detailed some of the more novel design decisions in czlib, including its batch interfaces, in our general blog post on performance in Go a couple of years ago. Performance varies quite a bit among the various interfaces, so it pays to benchmark with a message that is typical for your system. To do that, run the czlib benchmark suite with:
PAYLOAD=path_to_message go test -run=NONE -bench .
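For readers who want to try the same idea without czlib installed, the following is a rough sketch of a payload-driven benchmark built on the standard library only. It is not czlib's benchmark suite, and how that suite actually reads PAYLOAD is an assumption here. The functions loadPayload, compressStd and benchCompressStd, as well as the fallback sample text, are hypothetical names and data invented for the example.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"os"
	"testing"
)

// loadPayload returns the file at path, or a small built-in sample when
// path is empty. The fallback text is made up for this sketch.
func loadPayload(path string) ([]byte, error) {
	if path == "" {
		return bytes.Repeat([]byte("metric.name:42|g|#host:web-01\n"), 64), nil
	}
	return os.ReadFile(path)
}

// compressStd compresses payload with the standard library's compress/zlib.
func compressStd(payload []byte) ([]byte, error) {
	var buf bytes.Buffer
	w := zlib.NewWriter(&buf)
	if _, err := w.Write(payload); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// benchCompressStd builds a benchmark body for the given payload.
func benchCompressStd(payload []byte) func(b *testing.B) {
	return func(b *testing.B) {
		b.SetBytes(int64(len(payload))) // lets the result report MB/s
		for i := 0; i < b.N; i++ {
			if _, err := compressStd(payload); err != nil {
				b.Fatal(err)
			}
		}
	}
}

func main() {
	// Assumption: PAYLOAD holds a path to a representative message.
	payload, err := loadPayload(os.Getenv("PAYLOAD"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "reading payload:", err)
		os.Exit(1)
	}

	testing.Init() // registers testing flags when not run under "go test"
	res := testing.Benchmark(benchCompressStd(payload))
	fmt.Println("BenchmarkCompressStdZlib", res)
}
```

Running it as PAYLOAD=path_to_message go run . gives a baseline for the standard library on your own message, which you can then compare against the numbers the czlib suite reports for the same file.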
Here are recent benchmark results, run with go1.7beta2, for compression and decompression using the non-streaming interface in czlib, its streaming interface, and the standard library’s compress/zlib. They show how much performance varies:
# using a 2 KB plaintext message
BenchmarkCompress-4            30000      47415 ns/op   44.42 MB/s
BenchmarkCompressStream-4      20000      61732 ns/op   34.11 MB/s
BenchmarkCompressStdZlib-4      5000     227182 ns/op    9.27 MB/s
BenchmarkDecompress-4         200000       8238 ns/op  255.62 MB/s
BenchmarkDecompressStream-4   100000      18352 ns/op  114.75 MB/s
BenchmarkDecompressStdZlib-4   50000      31565 ns/op   66.72 MB/s
# using a 1.7 MB plaintext message
BenchmarkCompress-4               20   69808144 ns/op   24.70 MB/s
BenchmarkCompressStream-4         20   73170819 ns/op   23.56 MB/s
BenchmarkCompressStdZlib-4        20   70498763 ns/op   24.46 MB/s
BenchmarkDecompress-4            200    6709252 ns/op  256.98 MB/s
BenchmarkDecompressStream-4      200    6891833 ns/op  250.18 MB/s
BenchmarkDecompressStdZlib-4     100   14256445 ns/op  120.94 MB/s
zstd, short for Zstandard, is a relatively new, fast compression library from Yann Collet, the author of lz4. Its format has recently been finalized, and a 1.0 release is pending. Compared with zlib at level 6, it compresses slightly faster at a slightly better ratio and decompresses much faster, making it a great general-purpose zlib replacement.
The zstd library supports some interfaces that are common in more advanced compression libraries, such as stream compression, compression levels, and pre-computed dictionaries. These are all exposed by our zstd Go binding, with the dictionary builder available in the upstream repos. The binding intentionally mimics the zlib interface, and aside from a few functions that do not return an error in zstd, it is functionally a drop-in replacement. It also exposes a fixed-length batch compression interface present in the underlying library, very similar to the lz4 interface.
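Compression levels and pre-computed dictionaries are easier to picture with a small example. The sketch below does not use the zstd binding; it swaps in the standard library's compress/zlib, which offers the same two concepts through zlib.NewWriterLevelDict and zlib.NewReaderDict. The helpers compressWith and decompressWith and the sample JSON messages are invented for illustration, and zstd's own dictionaries are built and applied differently.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// compressWith deflates msg at the given level. A nil dict means no
// preset dictionary is used.
func compressWith(msg []byte, level int, dict []byte) ([]byte, error) {
	var buf bytes.Buffer
	w, err := zlib.NewWriterLevelDict(&buf, level, dict)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(msg); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decompressWith inflates data; dict must match the one used to compress.
func decompressWith(data, dict []byte) (string, error) {
	r, err := zlib.NewReaderDict(bytes.NewReader(data), dict)
	if err != nil {
		return "", err
	}
	defer r.Close()
	out, err := io.ReadAll(r)
	return string(out), err
}

func main() {
	// A dictionary is just bytes that resemble the messages to come.
	dict := []byte(`{"host":"web-00","service":"api","status":"ok"}`)
	msg := []byte(`{"host":"web-17","service":"api","status":"ok"}`)

	plain, err := compressWith(msg, zlib.BestCompression, nil)
	if err != nil {
		panic(err)
	}
	withDict, err := compressWith(msg, zlib.BestCompression, dict)
	if err != nil {
		panic(err)
	}
	fmt.Printf("original: %d bytes\n", len(msg))
	fmt.Printf("no dictionary: %d bytes\n", len(plain))
	fmt.Printf("with dictionary: %d bytes\n", len(withDict))

	back, err := decompressWith(withDict, dict)
	if err != nil {
		panic(err)
	}
	fmt.Println("round trip ok:", back == string(msg))
}
```

Dictionaries pay off most on small messages like this one, where there is too little data for the compressor to discover repetition on its own. Both sides must agree on the dictionary ahead of time.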