You may not have noticed it, but broadpeak.io performs one small action that can have significant, positive consequences: whenever possible, we compress HLS and DASH manifest files using gzip or Brotli.
This may look like a minor detail, but let us get into the specifics to explain why it is essential.
First, what are we talking about?
As you may have noticed, broadpeak.io is a service delivering HLS and DASH streams. If you are familiar with ABR streaming protocols, you know that this leverages the concept of “manifest” files.
Manifests (also called “playlists” in HLS) are text files. They list the different variants offered by the feed, as well as the video/audio fragments that the video player can download for playout.
broadpeak.io specializes in manipulating these manifests to enable use cases such as Ad Insertion/Replacement or Automatic Content Replacement. This blog post is about why and how we compress these manifests with gzip or Brotli.
gzip (short for GNU zip) was created by Jean-Loup Gailly and Mark Adler in 1992 as an open-source alternative to the popular Unix compression utility, “compress”.
The gzip format is based on the DEFLATE algorithm. DEFLATE is a lossless data compression algorithm built on two fundamental compression techniques: LZ77 and Huffman coding. The logic of DEFLATE is first to apply LZ77 to the input data, replacing repeated sequences with back-references to their earlier occurrence, and then to apply Huffman coding so that the most frequent symbols get the shortest codes.
In addition to being widely used for compressing and decompressing files on Unix and Linux systems, gzip is also commonly used for web compression. That is how broadpeak.io uses it here.
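To make the LZ77 stage more concrete, here is a minimal sketch in Python. It is a naive, illustrative encoder and not the code used by gzip: the function names and the sample playlist lines are assumptions made for this example, and the Huffman stage is left out entirely. It only shows how repeated sequences become (distance, length) back-references.

```python
def lz77_tokens(data: bytes, window: int = 32 * 1024,
                min_len: int = 3, max_len: int = 258) -> list:
    """Naive LZ77: return a list of literals (int) and (distance, length) tuples."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the sliding window for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_len and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append((best_dist, best_len))  # back-reference
            i += best_len
        else:
            tokens.append(data[i])  # literal byte
            i += 1
    return tokens


def lz77_decode(tokens: list) -> bytes:
    """Rebuild the original bytes from literals and back-references."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # byte-by-byte copy handles overlaps
        else:
            out.append(tok)
    return bytes(out)


# Hypothetical playlist fragment: eight nearly identical segment entries.
sample = b"".join(
    b"#EXTINF:6.000,\nsegment_%05d.ts\n" % n for n in range(1, 9)
)
tokens = lz77_tokens(sample)
assert lz77_decode(tokens) == sample  # lossless round trip

literals = sum(1 for t in tokens if not isinstance(t, tuple))
print(len(sample), "bytes ->", literals, "literals and",
      len(tokens) - literals, "back-references")
```

In DEFLATE, the resulting literals, lengths and distances are then Huffman-coded; a real encoder also uses hash chains instead of the brute-force search shown here.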
Brotli is an open-source compression algorithm released by Google in 2015 to compete with gzip. Its main goal was to compress more efficiently than existing algorithms.
Brotli leverages LZ77 and Huffman coding too, but adds a technique called second-order context modeling. In context modeling, you build a statistical model of the input data based on the context, that is, the data surrounding a given symbol. In Brotli, the context of a byte is derived from the two bytes that precede it (hence “second order”), so different Huffman codes can be used depending on what was just seen.
Context modeling is especially effective in compressing data with a lot of redundancy or repetitive patterns, such as natural language text, where certain words and phrases appear more frequently than others. By identifying these patterns and using them to compress the data, context modeling can achieve higher compression ratios than other compression techniques. As you can imagine, this is an excellent fit for HLS and DASH manifest files.
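The benefit of context can be estimated with a few lines of Python. The sketch below is not Brotli's actual model (which is more elaborate); it simply compares the empirical entropy of each character taken alone with its entropy once the two preceding characters are known. The playlist content and the function names are assumptions made for this illustration, and because the statistics are measured on the text itself the figures are optimistic.

```python
import math
from collections import Counter


def order0_entropy(text: str) -> float:
    """Bits per character when each character is modeled on its own.

    The first two characters are skipped so that both functions
    measure exactly the same positions.
    """
    symbols = text[2:]
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def order2_entropy(text: str) -> float:
    """Bits per character when the two previous characters are known."""
    pair_counts = Counter()  # (context, symbol) -> occurrences
    ctx_counts = Counter()   # context -> occurrences
    for i in range(2, len(text)):
        ctx = text[i - 2:i]
        pair_counts[(ctx, text[i])] += 1
        ctx_counts[ctx] += 1
    total = sum(ctx_counts.values())
    return -sum(
        c / total * math.log2(c / ctx_counts[ctx])
        for (ctx, _), c in pair_counts.items()
    )


# Hypothetical HLS media playlist with 50 segments.
playlist = "#EXTM3U\n#EXT-X-TARGETDURATION:6\n" + "".join(
    f"#EXTINF:6.000,\nsegment_{n:05d}.ts\n" for n in range(50)
)

print(f"no context:        {order0_entropy(playlist):.2f} bits/char")
print(f"2-char context:    {order2_entropy(playlist):.2f} bits/char")
```

The lower the second figure is compared with the first, the more a compressor can gain from conditioning its codes on the preceding bytes, which is why highly repetitive text such as manifests benefits.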
There are multiple reasons for us to compress these manifests.
- First, and you may argue this is a sufficient reason by itself, the standards promote the use of compressed manifests. For example, HLS Authoring Specs 10.1 specifies, “The server MUST deliver playlists using gzip content-encoding”.
- Second, it considerably reduces the manifest size. HLS and DASH manifest files can be pretty heavy, especially if they describe a long DVR window (Digital Video Recording, in other words the ability to pause and rewind your live channel over a period of time). Compression can significantly reduce the size of these files, making them quicker to download and improving the streaming session’s startup time.
- Third, as gzip and Brotli are broadly supported by modern web browsers and platforms, these algorithms can easily be used in all kinds of contexts, making their impact even more prominent.
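To get a feel for the size argument, the following Python sketch builds a synthetic HLS media playlist for a given DVR window and compares its raw size with its gzip-compressed size. The playlist layout, segment naming and 6-second segment duration are assumptions made for the example; real manifests are less regular, so actual ratios will differ.

```python
import gzip


def build_media_playlist(segment_count: int, segment_seconds: int = 6) -> bytes:
    """Build a synthetic HLS media playlist (hypothetical naming scheme)."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",
        f"#EXT-X-TARGETDURATION:{segment_seconds}",
        "#EXT-X-MEDIA-SEQUENCE:1000",
    ]
    for n in range(segment_count):
        lines.append(f"#EXTINF:{segment_seconds}.000,")
        lines.append(f"video_1080p/segment_{1000 + n:08d}.ts")
    return ("\n".join(lines) + "\n").encode("utf-8")


for hours in (1, 4):
    segment_count = hours * 3600 // 6  # number of 6-second segments in the window
    raw = build_media_playlist(segment_count)
    packed = gzip.compress(raw, compresslevel=6)
    assert gzip.decompress(packed) == raw  # compression is lossless
    print(f"{hours} h DVR: {len(raw)} bytes raw, {len(packed)} bytes gzipped")
```

Because a live player reloads the media playlist regularly, any saving measured here is repeated on every refresh of every session.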
How does it work?
We compress the manifest whenever a client includes an Accept-Encoding header with either br, gzip, or both. When both are accepted, Brotli is selected because, as I will explain in a minute, it is the most optimized algorithm with the best results for us.
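The selection rule described above can be sketched as follows in Python. This is an illustration of the behaviour, not the Ingress-Nginx implementation: the function names are hypothetical, the server preference order (Brotli, then gzip) is taken from the text, and the `*` wildcard is deliberately ignored to keep the example short.

```python
from typing import Dict, Optional

# Server-side preference: Brotli first, then gzip.
PREFERENCE = ("br", "gzip")


def parse_accept_encoding(header: str) -> Dict[str, float]:
    """Parse an Accept-Encoding value into {coding: q-value}."""
    accepted = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        coding, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        accepted[coding.lower()] = q
    return accepted


def choose_encoding(header: Optional[str]) -> Optional[str]:
    """Return "br", "gzip", or None (send the manifest uncompressed)."""
    if not header:
        return None
    accepted = parse_accept_encoding(header)
    for coding in PREFERENCE:
        if accepted.get(coding, 0.0) > 0:  # q=0 means explicitly refused
            return coding
    return None


for value in ("gzip, deflate, br", "gzip", "br;q=0, gzip", "identity"):
    print(f"{value!r} -> {choose_encoding(value)}")
```

A client that sends no Accept-Encoding header, or one that lists neither coding, simply receives the uncompressed manifest.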
To implement this feature, we decided to perform the compression in Ingress-Nginx, the open-source Kubernetes ingress controller that we use to manage and route traffic in our Kubernetes cluster. The good thing about Ingress-Nginx is that it includes built-in gzip and Brotli compression support.
We decided to go with Ingress-Nginx instead of CloudFront for a few reasons:
- With CloudFront, you need to enable caching to compress the files. In our case, we have many use cases where manifests are highly personalized and consequently cannot be cached.
- We wanted to compress the files as early as possible in order to optimize the overall delivery chain, not only the CloudFront part.
Here is the configuration (inspired by this doc) we used in our yaml to enable it:
use-gzip: "true"
gzip-level: "6"
gzip-types: "application/json text/plain application/xml application/dash+xml application/vnd.apple.mpegurl application/x-mpegURL audio/mpegURL"
enable-brotli: "true"
brotli-level: "6"
brotli-types: "application/json text/plain application/xml application/dash+xml application/vnd.apple.mpegurl application/x-mpegURL audio/mpegURL"
gzip vs Brotli
When it comes to compressing text files such as manifests, there are a few differences between gzip and Brotli:
- Compression ratio: As explained before, Brotli is generally more effective at compressing text files than gzip, which means it can produce smaller output files.
- Speed: gzip is generally faster at compressing text files than Brotli, because it uses a simpler algorithm. Nevertheless, we did not observe any impact on our workflow or latency when using Brotli instead of gzip.
- Browser support: Brotli is supported by most modern web browsers, while gzip is supported by all browsers. This makes sense as gzip is older than Brotli.
In summary, if you’re compressing text files such as m3u8 or mpd and looking for the best compression ratio, Brotli is the best choice, but not all devices/browsers support it. This is why we offer both, with a preference for Brotli when available.
FinOps (financial optimization) for us, and you
To conclude, this tiny feature is essential to make sure our customers do not overpay for our service. As you can see on our pricing page, we charge by a metric called “egress”, which is the amount of data going out of broadpeak.io to our customers. There are multiple ways to reduce that metric, particularly with a CDN, but compression is one of the easiest and most transparent ways to reduce the number of bits we deliver.
By compressing, we mechanically reduce our bill and yours. We believe this is vital for a healthy vendor-customer relationship!
The broadpeak.io team.
Photo by Karolina Grabowska from Pexels