This blog post, written jointly with our partners from THEO, takes a step back to look at Manifest Manipulation in HLS/DASH. It goes over the concepts themselves and then shares the main challenges faced by our teams. You will also find best practices to deliver a better experience when transitions happen at the boundaries of each manipulation.

We are excited to write this with our colleagues from THEO. THEO is well-known in our industry and provides the THEOplayer for several devices. On our web app, we have decided to include THEOplayer to preview your content replacements. We are very happy to use THEO there, as it guarantees the best experience when you try to evaluate the benefits of our platform.

The reason we decided to write this together with THEO is to give a unique perspective from both the server (broadpeak.io) and the player (THEOplayer) sides. If you want to be successful with manipulations, it is important that both sides work well together.

Video concatenation: how to do it in ABR

Historically, the concatenation of videos was done at the source: you took a single video and mixed it with other content to create an entirely new piece. Even though this was functional, it was expensive to put in place (it needs more transcoding/splicing capacity) and not flexible at all. More personalization meant more investment and time.

When you use ABR formats such as HLS or DASH to deliver your video content, you rely on the concept of a Manifest, which opens up a lot more flexibility than the old way just described. Manifests are text files parsed by clients/players to request the video segments that are going to be played out.

If you are not familiar at all with ABR, I recommend you check out this poster for a good high-level view.

The “generic” flow is typically the following one in HTTP:

The manifest is a great tool because it can reference several levels of quality/bitrate, which the client can choose from based on its available bandwidth. That is the reason these formats are called ABR, for Adaptive BitRate. Each level lists the segments (video, audio, subtitles) that the player can download.

The different levels are typically called either:

  • Variants for HLS: Each Variant is a version of the stream at a particular bit rate and is contained in a separate playlist.
  • Representations in Adaptation Sets in MPEG-DASH: Each Adaptation Set catalogs the different representations of the media. Video Adaptation Sets typically contain multiple Representations, one for each resolution/bitrate, allowing the media player to select the best available quality without buffering.

We will come back to that in more detail later in this article, but from a high-level perspective, you can represent the theory using the following diagram:

Figure 1 from Cloudinary blog

Additionally, because a manifest can be generated dynamically, it can be “manipulated”, which opens up a ton of use cases to simplify operations and elevate the experience of the viewers. This is heavily used for the insertion of adverts or to comply with requirements such as Blackout.

Manifest Manipulation

HLS EXT-X-DISCONTINUITY TAG

In HLS, the manifest is a .m3u8 file. Basically, it is a file combining metadata and links to video segments.

You will find two kinds of m3u8:

  • Master playlist, referencing variant playlists (the different quality/bitrate profiles the player can access):
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=150000,RESOLUTION=416x234,CODECS="avc1.42e00a,mp4a.40.2"
http://example.com/low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=240000,RESOLUTION=416x234,CODECS="avc1.42e00a,mp4a.40.2"
http://example.com/lo_mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=440000,RESOLUTION=416x234,CODECS="avc1.42e00a,mp4a.40.2"
http://example.com/hi_mid/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=640000,RESOLUTION=640x360,CODECS="avc1.42e00a,mp4a.40.2"
http://example.com/high/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=64000,CODECS="mp4a.40.5"
http://example.com/audio/index.m3u8
  • Variant or media playlists, referencing the actual video fragments:
#EXTM3U
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-TARGETDURATION:10
#EXT-X-VERSION:4
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
fileSequenceA.ts
#EXTINF:10.0,
fileSequenceB.ts
#EXTINF:10.0,
fileSequenceC.ts
#EXTINF:9.0,
fileSequenceD.ts
#EXT-X-ENDLIST

Based on that, if you start imagining two simple variant playlists from two different contents, let’s say Video A and Video B, you will have something like this:

  • Example Video A
#EXTM3U
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:10
#EXTINF:10,
Exemplary_A_segment-01.ts
#EXTINF:10,
Exemplary_A_segment-02.ts
  • Example Video B
#EXTM3U
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:10
#EXTINF:10,
Exemplary_B_segment-01.ts
#EXTINF:10,
Exemplary_B_segment-02.ts

Then, if you would like to combine A and B in one single stream with B playing before A, you will do it like this:

  • Combination Video AB
#EXTM3U
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:10
#EXTINF:10,
Exemplary_B_segment-01.ts
#EXTINF:10,
Exemplary_B_segment-02.ts
#EXT-X-DISCONTINUITY
#EXTINF:10,
Exemplary_A_segment-01.ts
#EXTINF:10,
Exemplary_A_segment-02.ts

The tag used to manage the transition is EXT-X-DISCONTINUITY, called the discontinuity flag.

Placing this tag in the m3u8 tells the player to expect the next video segment to come from a different source than the previous one, and to prepare for it. It is used when the encoding parameters (such as the bitrate), the timestamp sequence or the encryption type change at that point, for example.

If you want to insert/replace ads, you typically need to combine the discontinuity tags with signaling tags such as EXT-X-DATERANGE or EXT-X-CUE-OUT to share the duration of the ad breaks and other ad metadata with the player. The timing information in these tags is often based on the original in-band SCTE-35 messages present in the video itself.
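To make the concatenation above more concrete, here is a minimal Python sketch that joins two simple VOD media playlists and places EXT-X-DISCONTINUITY at the boundary. The function name `concat_media_playlists` and its helpers are our own illustration, not part of any product, and the sketch assumes playlists as simple as the Video A and Video B examples (no encryption, no byte ranges, no other tags to carry over).

```python
VIDEO_A = """#EXTM3U
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-TARGETDURATION:10
#EXTINF:10,
Exemplary_A_segment-01.ts
#EXTINF:10,
Exemplary_A_segment-02.ts
"""

VIDEO_B = VIDEO_A.replace("Exemplary_A", "Exemplary_B")


def segment_lines(playlist: str) -> list[str]:
    """Keep only the #EXTINF tags and the segment URIs that follow them."""
    kept = []
    for line in playlist.strip().splitlines():
        line = line.strip()
        if line.startswith("#EXTINF") or (line and not line.startswith("#")):
            kept.append(line)
    return kept


def target_duration(playlist: str) -> int:
    """Read the EXT-X-TARGETDURATION value of a media playlist."""
    for line in playlist.splitlines():
        if line.startswith("#EXT-X-TARGETDURATION:"):
            return int(line.split(":", 1)[1])
    raise ValueError("missing EXT-X-TARGETDURATION")


def concat_media_playlists(first: str, second: str) -> str:
    """Play `first`, then `second`, with a discontinuity in between."""
    header = [
        "#EXTM3U",
        "#EXT-X-MEDIA-SEQUENCE:0",
        # The combined playlist must advertise the larger target duration.
        f"#EXT-X-TARGETDURATION:{max(target_duration(first), target_duration(second))}",
    ]
    body = segment_lines(first) + ["#EXT-X-DISCONTINUITY"] + segment_lines(second)
    return "\n".join(header + body) + "\n"


# B plays before A, as in the combined example.
combined = concat_media_playlists(VIDEO_B, VIDEO_A)
print(combined)
```

A real manipulator would also have to carry over tags such as EXT-X-KEY or EXT-X-MAP, which this sketch deliberately ignores.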

DASH MULTI-PERIOD

In DASH, the manifest manipulation can be achieved by using Multiple Periods in the MPD.

To start with, let’s recall what the MPD, the manifest used in DASH, actually is:

  • Media Presentation Description (MPD) is a text file that contains the information required by a DASH player to build the appropriate URLs to access video fragments and play them out to the viewer.
  • It includes:
    • A sequence of one or more Periods.
    • A Period contains one or more Adaptation Sets.
    • A Period represents the availability of a Media under its different forms (Adaptation Sets) over a specific period.
    • An Adaptation Set contains one or more Representations.
    • A Representation contains one or more Segments.
    • Segments carry the actual media data and associated metadata.

By default, playback continues seamlessly from one Period to the next, and this feature provides an easy way to make the player switch from one source of content to another. For example, if you practice ad insertion, you can introduce an ad by splitting the single period of a piece of content into multiple periods. Let’s take an example applicable to VOD:

Example of midroll Ad insertion in VOD

In the manifest, you will see this type of signaling:

<Period id="0" start="PT0.00S">
   …
</Period>

<Period id="1" start="PT8.00S">
   …
</Period>

<Period id="2" start="PT12.00S">
   …
</Period>

Period 1 includes indicators such as EventStream, Event or SCTE-35 tags to signal the presence and the details of the ads.
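As a rough illustration of the period split, the following Python sketch generates the three Period elements for a midroll placed 8 seconds into the content with a 4-second ad, which gives the same start values as in the signaling above. The function name `build_midroll_periods` is hypothetical, and the output is only a skeleton: a real MPD also needs its namespace, Adaptation Sets, Representations and ad signaling.

```python
import xml.etree.ElementTree as ET


def build_midroll_periods(break_at: float, ad_duration: float) -> ET.Element:
    """Return an <MPD> skeleton with three Periods: content, ad, content."""
    mpd = ET.Element("MPD", {"type": "static"})
    # Period 0 starts at zero, the ad starts at the break position,
    # and the content resumes once the ad has played.
    starts = [0.0, break_at, break_at + ad_duration]
    for index, start in enumerate(starts):
        ET.SubElement(mpd, "Period", {"id": str(index), "start": f"PT{start:.2f}S"})
    return mpd


mpd = build_midroll_periods(break_at=8.0, ad_duration=4.0)
print(ET.tostring(mpd, encoding="unicode"))
```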

Challenges on the server side and how to overcome them

We wanted to share our experience around some challenges involving manifest manipulation, which can be easily overlooked.

HLS

Management of different variants between source and inserted content

Ideally, you will want the content to be inserted and the original source to be prepared in the same way, so you don’t run into compatibility issues.

HLS players are very sensitive to stream changes in general, and more specifically to media track or codec changes within the same stream, so a mismatch can quickly become problematic and cause playout failures.

However, we know reality is always more complex than theory, and a usual thing that you must deal with, when you look at inserting external content into an existing HLS stream, is that the original content and the inserted ones sometimes come from different video preparation pipelines, with different encoding and packaging characteristics.

One way to avoid finding yourself in that situation is obviously to re-process the contents, but this is not always possible, it adds delay, and it can come with significant costs.

Below is a list of guidelines and recommendations for compatibility that can help you maintain a certain level of consistency at the manifest level, which is satisfactory for players.

You need to check the compatibility between video, audio and I-frame variants by checking the following characteristics:

  • Number of media tracks: If the number of Video tracks of one content does not match the number of tracks of the other content, and you don’t have any logic to handle this, you are heading for disaster. The same applies to Audio, I-frame, and subtitle tracks. Some solutions like broadpeak.io offer mapping options through a defined set of business rules to overcome this issue.
  • Codecs (attribute CODECS= on #EXT-X-STREAM-INF): 100% compatibility is needed on media tracks. The reason is that, even with discontinuity tags, some client devices are not capable of handling a codec change quickly enough.
  • Language (audio rendition identified by its language attribute): an audio language which exists in the original content but not in the inserted content will generate a playback error, as the media playlist for this language will suddenly not exist (404). A good way to overcome this is to map the default audio track of the inserted content (if codec compatible) to this specific language audio track of the original content. The outcome might not be optimal as the language would be different, but that remains the best possible option when the content does not exist.
  • Bandwidth (attribute BANDWIDTH= on #EXT-X-STREAM-INF): in general, players are not very sensitive to the bandwidth value. However, to provide the best experience, it makes sense to ensure a smooth transition. Bandwidth should be used as the reference to match variants between the original content and the contents to be inserted. Compatible variants with the closest bandwidth values should be matched together. In the case where the audio and video variants are independent, the compatibility check must be run independently too (with a first focus on video).
  • Key Frames/Subtitles: the subtlety with subtitles is that, typically, ad creatives do not include them. We have seen good ways to overcome this, one of the best being to generate “empty” subtitles. It has the benefit of faking the presence of subtitles and ensuring a smooth experience.
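The codec and bandwidth rules above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `Variant` class and `match_variant` function are hypothetical names, codec compatibility is reduced to an exact match of the CODECS string, and the ad variants are made-up values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    uri: str
    bandwidth: int  # BANDWIDTH= attribute of #EXT-X-STREAM-INF
    codecs: str     # CODECS= attribute of #EXT-X-STREAM-INF


def match_variant(source: Variant, candidates: list[Variant]) -> Optional[Variant]:
    """Pick the codec-compatible candidate with the closest bandwidth."""
    # Assumption: "compatible" means an identical CODECS string.
    compatible = [c for c in candidates if c.codecs == source.codecs]
    if not compatible:
        return None  # no safe match: a business rule has to decide what to do
    return min(compatible, key=lambda c: abs(c.bandwidth - source.bandwidth))


AVC = "avc1.42e00a,mp4a.40.2"
ad_variants = [
    Variant("ad/low.m3u8", 200000, AVC),
    Variant("ad/high.m3u8", 600000, AVC),
    Variant("ad/hevc.m3u8", 640000, "hvc1.1.6.L93.B0,mp4a.40.2"),
]

high = Variant("http://example.com/high/index.m3u8", 640000, AVC)
audio_only = Variant("http://example.com/audio/index.m3u8", 64000, "mp4a.40.5")

print(match_variant(high, ad_variants))
print(match_variant(audio_only, ad_variants))
```

Returning `None` rather than a "best effort" variant is a deliberate choice in this sketch: a codec mismatch is the case the list above warns about, so it is left to an explicit rule.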

Consistency in MEDIA-SEQUENCE value of media playlists

As you may have seen in the above HLS playlist, there is a tag called EXT-X-MEDIA-SEQUENCE. This tag indicates the sequence number of the first media segment URI that appears in a media playlist. Each media segment URI in a playlist has a unique integer sequence number, which is higher by 1 than the sequence number of the URI that precedes it.

Players rely heavily on the value of this tag to identify where they are in the playlist and to transition from one variant to another.

As you can imagine, this logic must be respected when you do Manifest Manipulation, in order to maintain consistency between the media playlists of each variant. Therefore, each time a segment is added to or removed from a media playlist, the MEDIA-SEQUENCE value must be updated. The difficulty when doing content replacement, and especially ad replacement, is that the media segment duration might differ between the existing content and the “replacement content”. This means that the number of segments differs from the number of segments which would have existed in the original content.
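One way to keep the value consistent, sketched below in Python, is to derive it from the number of segments actually written to the manipulated playlist rather than from the numbering of the original content. The function name `render_window` and the segment names are hypothetical, and the sketch assumes a simple live sliding window with at least one segment in it.

```python
import math


def render_window(all_segments: list[tuple[float, str]], window_size: int) -> str:
    """Render the newest `window_size` segments as a live media playlist.

    `all_segments` is the full history of (duration, uri) pairs emitted so
    far, replacement content included, so the index of the first segment in
    the window is also its media sequence number.
    """
    first = max(0, len(all_segments) - window_size)
    window = all_segments[first:]
    target = math.ceil(max(duration for duration, _ in window))
    lines = [
        "#EXTM3U",
        f"#EXT-X-MEDIA-SEQUENCE:{first}",
        f"#EXT-X-TARGETDURATION:{target}",
    ]
    for duration, uri in window:
        lines += [f"#EXTINF:{duration},", uri]
    return "\n".join(lines) + "\n"


history = [(6.0, f"content_{i}.ts") for i in range(4)]
playlist_before = render_window(history, window_size=3)

# A shorter ad segment enters the window: one segment in, one segment out,
# so the media sequence moves up by exactly one.
history.append((4.0, "ad_0.ts"))
playlist_after = render_window(history, window_size=3)
print(playlist_after)
```

Applying the same history-based counting to every variant is what keeps their media sequence numbers aligned in this sketch.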

Different ways to signal your ads

In HLS, there are two major types of SCTE-35 markers used to signal ad opportunities in a manifest. They are listed below:

  • EXT-X-DATERANGE tag,
  • EXT-X-CUE-OUT and EXT-X-CUE-IN tags.

The reference standard for ad markers in HLS manifest files is EXT-X-DATERANGE, which is defined in the HLS RFC. EXT-X-CUE-OUT/IN are proprietary markers, but they are very popular in the ad industry.

These markers, with the right level of information, can allow complex scenarios to extend or shorten ad breaks. In live streams, it is strongly recommended to know the marker length in advance so that the system can query ad servers or ad exchanges with a duration to fill. The presence of a new EXT-X-CUE-OUT marker before an EXT-X-CUE-IN marker, or the absence of an EXT-X-CUE-IN, can indicate the need to extend or shorten a break.
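As an illustration of reading the break length from either marker, here is a small Python sketch. The function name `planned_break_duration` is ours. Because EXT-X-CUE-OUT is proprietary, its syntax varies between vendors: the sketch assumes the two forms `#EXT-X-CUE-OUT:30` and `#EXT-X-CUE-OUT:DURATION=30`, and reads the PLANNED-DURATION attribute of EXT-X-DATERANGE. The sample tag values are made up.

```python
import re
from typing import Optional

NUMBER = r"(\d+(?:\.\d+)?)"


def planned_break_duration(tag_line: str) -> Optional[float]:
    """Return the announced ad break duration in seconds, if any."""
    name, _, value = tag_line.strip().partition(":")
    if name == "#EXT-X-CUE-OUT":
        # Assumed vendor forms: a bare number, or DURATION=<number>.
        match = re.fullmatch(r"(?:DURATION=)?" + NUMBER, value)
    elif name == "#EXT-X-DATERANGE":
        match = re.search(r"(?:^|,)PLANNED-DURATION=" + NUMBER, value)
    else:
        return None
    return float(match.group(1)) if match else None


print(planned_break_duration("#EXT-X-CUE-OUT:30"))
print(planned_break_duration("#EXT-X-CUE-OUT:DURATION=15.5"))
print(planned_break_duration(
    '#EXT-X-DATERANGE:ID="break-1",START-DATE="2024-01-01T00:00:00Z",PLANNED-DURATION=30.0'
))
print(planned_break_duration("#EXT-X-CUE-IN"))
```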

Not all systems can work with both markers or have the maturity to handle complex use-cases. It is important to understand that the level of interoperability between the different components of the chain (i.e. origin/packager) can be a barrier.

DASH

Diversity of DASH manifests

Even though DASH is a standard, there are different flavors which can be generated by the DASH packagers on the market.

For example, at Broadpeak, our packager, the BkS350, can generate the following types:

  • urn:mpeg:dash:profile:isoff-live:2011 corresponding to the ISO/IEC 23009-1 section 8.4, and it is the most popular one,
  • urn:com:dashif:dash264 corresponding to DASH-IF DASH-AVC/264 section 6.3,
  • urn:hbbtv:dash:profile:isoff-live:2012 corresponding to the HbbTV Profile,
  • urn:dvb:dash:profile:dvb-dash:2014 corresponding to the DVB Profile (DVB Document A168 July 2014),
  • urn:mpeg:dash:profile:mp2t-main:2011, to generate dash streaming formats but with .ts chunks,
  • urn:mpeg:dash:profile:isoff-on-demand:2011, a DASH profile only used for streaming VOD assets, based on progressive download.

Additionally, in DASH, you can choose to have dynamic or static playlists for VOD (for Live, it will always be dynamic).

Finally, on top of this diversity, there is also the possibility to use segment templates with a timeline, segment templates based on numbering, or a mix of both.

Managing and maintaining all DASH flavors is complex and time-consuming in comparison to the value that they each bring. Based on our experience, unless you have specific requirements that justify it, we recommend using the ISO profile urn:mpeg:dash:profile:isoff-live:2011, whose content formatting works perfectly fine for both HLS and DASH scenarios. We also recommend using DASH with templates and a segment timeline, as it brings considerable benefits in the ability to signal discontinuities in a stream. If you sometimes lose your input, it helps with recovery: with a timeline, handling a discontinuity is as straightforward as not signaling any new segments until the input is available again. Looking for more reasons to go with timelines? Check this article: https://www.unified-streaming.com/blog/stop-numbering-underappreciated-power-dashs-segmenttimeline
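To show why a timeline makes discontinuities explicit, the Python sketch below expands a SegmentTimeline into segment start times and reports any hole between consecutive segments. The function names are hypothetical and the sample timeline is made up. For brevity the sketch ignores XML namespaces and does not handle r="-1" (repeat until the end of the period).

```python
import xml.etree.ElementTree as ET


def expand_timeline(timeline_xml: str) -> list[tuple[int, int]]:
    """Expand <S t d r> entries into (start, duration) pairs in timescale units."""
    segments = []
    next_start = 0
    for s in ET.fromstring(timeline_xml).findall("S"):
        # Without @t, a segment starts where the previous one ended.
        start = int(s.get("t", next_start))
        duration = int(s.get("d"))
        repeat = int(s.get("r", 0))  # number of additional repetitions
        for _ in range(repeat + 1):
            segments.append((start, duration))
            start += duration
        next_start = start
    return segments


def find_gaps(segments: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Return (gap_start, gap_end) for every hole between consecutive segments."""
    gaps = []
    for (start, duration), (following, _) in zip(segments, segments[1:]):
        if following > start + duration:
            gaps.append((start + duration, following))
    return gaps


TIMELINE = """
<SegmentTimeline>
  <S t="0" d="4" r="2"/>
  <S t="20" d="4"/>
</SegmentTimeline>
"""

segments = expand_timeline(TIMELINE)
print(segments)
print(find_gaps(segments))
```

In the sample, the input is assumed to have been lost between ticks 12 and 20: no segments are signaled for that range, and the jump in @t is the only signal needed.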

Right use of timestampoffset

In the context of content insertion in Video-on-demand DASH assets, multi-period manifests are used to represent the original content and the content that is being inserted. In the context of ad insertion, and as we mentioned earlier in this article, the manifest would be broken down into three periods:

  • One first period that represents the original content until the ad break,
  • One second period that represents the ad break,
  • One third period that represents the original content after the ad break. This one should resume referencing media segments right where the first period stopped.

Note: we are simplifying for the sake of clarity.

In such cases, it gets a little more complex for players to manage the timeline accordingly as the earliest presentation time in the third period is not necessarily aligned to play the right content at the right time.

To handle these use cases properly and to avoid inconsistencies in the media buffer, players need to rely on additional information to help position themselves properly within the virtual timeline, through a timestampoffset value.

That timestampoffset value is directly deduced from the @presentationTimeOffset attribute in the MPD, which is set for each period at the time of manifest manipulation. It is extremely important to use a system which correctly handles the @presentationTimeOffset value to avoid playout issues.
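A minimal Python sketch of that deduction follows, assuming the commonly used relation where the offset applied to the media buffer is the period start minus @presentationTimeOffset converted to seconds. The function name and the numbers are illustrative: they reuse the three-period example (ad break at 8 seconds, 4-second ad) with an assumed timescale of 90000.

```python
def timestamp_offset(period_start: float, presentation_time_offset: int, timescale: int) -> float:
    """Offset (in seconds) to add to media timestamps of a period."""
    return period_start - presentation_time_offset / timescale


TIMESCALE = 90000

# (Period@start in seconds, @presentationTimeOffset in timescale units)
periods = [
    (0.0, 0),               # original content, from its beginning
    (8.0, 0),               # ad creative, its own media starts at zero
    (12.0, 8 * TIMESCALE),  # original content resumes at media time 8 s
]

offsets = [timestamp_offset(start, pto, TIMESCALE) for start, pto in periods]
print(offsets)
```

With these values, the third period gets an offset of 4 seconds: the segment carrying media time 8 s is presented at 12 s, right after the ad.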

Encryption

When performing ad insertion or replacement, managing the switch from content to ad creative, or from ad creative to content, is not very complex in terms of encryption. Indeed, in most use cases, ads are not encrypted, and the player only needs to switch from encrypted content to clear content, and back. Fairly easy.

Nevertheless, when it comes to other types of content substitution use cases (Blackout, Simsub, etc.), it is much more likely that the original feed and the alternate feed are encrypted either with different methods or with different keys. If you substitute one piece of encrypted content with another, it is important to follow some rules in HLS/DASH:

  • As per MPEG-DASH specification, each AdaptationSet within the period must contain encryption information in the case of encrypted content.
  • For HLS, it is key to add #EXT-X-KEY:METHOD after the #EXT-X-DISCONTINUITY tag.

This helps the player to retrieve the keys for the inserted content.
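For the HLS rule, a manipulator could emit the transition tags along these lines. This is a hedged Python sketch: the function name `transition_tags` and the key URI are made up, and only METHOD=NONE and METHOD=AES-128 are shown, whereas a DRM-protected stream would carry additional attributes.

```python
from typing import Optional


def transition_tags(key_uri: Optional[str]) -> list[str]:
    """Tags to write before the first segment of the substituted content."""
    if key_uri is None:
        # Clear content (typical for ads): switch decryption off explicitly.
        key_tag = "#EXT-X-KEY:METHOD=NONE"
    else:
        key_tag = f'#EXT-X-KEY:METHOD=AES-128,URI="{key_uri}"'
    # The key tag comes right after the discontinuity, before any segment.
    return ["#EXT-X-DISCONTINUITY", key_tag]


print(transition_tags(None))
print(transition_tags("https://example.com/keys/alternate-feed.key"))
```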

Fillers and Slates

One of the challenges you will face when doing ad or content replacement is that the replacement content may be shorter than the original content.

In the ad replacement use case, the ad decision server may send back a list of ads whose total duration is shorter than the duration of the ad break.

In that case, you need to have a solution which manages what is called a “filler”.

Fillers (or Slates) are short video clips that are played before the end of an ad break or a content replacement slot.

They must be transcoded to match source content.
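The amount of filler to schedule is simple arithmetic, sketched here in Python. The function name `plan_filler` and the durations are illustrative, and the sketch assumes the slate is a fixed-length clip that is repeated; if the remaining time is not a multiple of the slate length, the last repetition would have to be cut short.

```python
import math


def plan_filler(break_duration: float, ad_durations: list[float], slate_duration: float) -> tuple[float, int]:
    """Return (seconds left to fill, number of slate repetitions needed)."""
    remaining = break_duration - sum(ad_durations)
    if remaining <= 0:
        return 0.0, 0  # the ads already cover the break
    return remaining, math.ceil(remaining / slate_duration)


# A 60-second break for which the ad server returned 45 seconds of ads,
# filled with a 5-second slate.
remaining, repetitions = plan_filler(60.0, [15.0, 30.0], slate_duration=5.0)
print(remaining, repetitions)
```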

Best practices to offer a seamless experience from the player POV

Please find some of our best practices to keep your players happy below.

Avoid changing the active period’s ID, as the player will drop its buffer when the period becomes invalid

In MPEG-DASH, server-side ad or content replacement is done through different periods, each with a specific identifier, start and duration.

For the player – during live playback – the period identifier is used to identify the content and the already buffered data for that content. When during live playback the identifier of the period (buffered content) changes, the player will lose the reference to this content and consider the already buffered data invalid as it could have been replaced by other content meanwhile.

Therefore, the player will drop its buffer and require re-buffering to continue playback. For the end-user, this will result in a short black screen while the re-buffering will take place.

For this reason, it’s important to ensure the history of the playback remains untouched throughout the manifest updates ensuring the player can reconstruct it and map to its already buffered content.
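A simple server-side sanity check for this can be sketched in Python: compare two consecutive manifest refreshes and flag any period whose identifier changed. The function name is hypothetical, and the sketch assumes that a period can be recognized across refreshes by its start value, which may not hold in every workflow.

```python
def changed_period_ids(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return the Period@start values whose Period@id changed between refreshes.

    Both arguments map Period@start to Period@id for one manifest version.
    """
    return [
        start
        for start, period_id in previous.items()
        if start in current and current[start] != period_id
    ]


refresh_1 = {"PT0S": "0", "PT8S": "1"}
refresh_2 = {"PT0S": "0", "PT8S": "ad-1", "PT12S": "2"}

# The period starting at 8 s was renamed: a player that already buffered it
# would consider that data invalid.
print(changed_period_ids(refresh_1, refresh_2))
```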

Make sure the availabilityStartTime is set correctly

The availabilityStartTime attribute indicates the start of the DASH timeline in a dynamic (live) manifest. All timing-related references are offset against this attribute, for example the segment times in the SegmentTimeline of your adaptation sets/representations. A common value for this attribute is the Unix epoch, in other words 1 January 1970 00:00:00 UTC. When calculating time offsets against this value, you quickly run into very high number ranges.

On modern devices and browsers, this is not such a big deal except for potential precision issues, but on older devices (like Smart TVs) this quickly becomes problematic.

The reason is that these devices only support 32-bit integer values, causing an overflow when doing offset calculations. Either rely on a player that has tricks to circumvent this, or ensure that the offset doesn’t run into very high number ranges.
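The orders of magnitude involved can be checked with a few lines of Python. This is an illustration only: the function names and the timescale of 1000 ticks per second are assumptions, and which calculations actually overflow depends on the device and the player.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1


def presentation_ticks(now: datetime, availability_start: datetime, timescale: int) -> int:
    """Time elapsed since availabilityStartTime, in timescale units."""
    return int((now - availability_start).total_seconds() * timescale)


def fits_int32(value: int) -> bool:
    return -INT32_MAX - 1 <= value <= INT32_MAX


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
recent_start = datetime(2023, 12, 31, tzinfo=timezone.utc)

epoch_ticks = presentation_ticks(now, unix_epoch, timescale=1000)
recent_ticks = presentation_ticks(now, recent_start, timescale=1000)

print(epoch_ticks, fits_int32(epoch_ticks))    # far beyond the 32-bit range
print(recent_ticks, fits_int32(recent_ticks))  # one day of ticks fits easily
```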

Gaps are problematic depending on the device

When doing multi-period DASH or discontinuities in HLS, it is not always trivial to ensure there are no gaps in the timeline when switching between periods/discontinuities.

When you have a gap in your timeline a player will try to “jump” this gap by seeking forward in the content. As a viewer this is hardly noticeable on some platforms; on others this can result in very strange side-effects.

For example, some Smart TVs only support seeking to a keyframe, typically the nearest keyframe to the time point you are seeking towards. Strangely enough by seeking forward the playback could resume in the past which results in an endless playback loop if the player is not able to handle this behavior.

Ensure your timeline overlaps with the actual live point

We regularly see problems on MPEG-DASH live streams where the availability time window based on the segment timeline of the manifest is not overlapping the actual live point where video playback will take place.

As a result, the player will not be able to determine the relevant segment(s) to download, causing playback to stall. Care should be taken that this availability window does not lie too far in the past.

Choosing PlayReady or Widevine greatly impacts performance on Smart TVs

Depending on the Smart TV model, the choice between PlayReady and Widevine can greatly impact your playback performance.

In particular, this is true when you try to play back UHD or high frame rates in combination with DRM. For certain devices, mostly older Smart TVs, it is advised to leverage their hardware capabilities and align your choice of DRM with them.

For example, webOS will make the choice even harder as older models only support PlayReady with persistence allowed.
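A rough way to monitor this on the server side is to measure how far the newest segment of the timeline lags behind the wall clock, as in the Python sketch below. The function name and the values are illustrative, and the sketch simplifies by ignoring Period@start and @presentationTimeOffset, which a real check would have to take into account.

```python
from datetime import datetime, timedelta, timezone


def timeline_lag(availability_start: datetime, last_segment_end_ticks: int,
                 timescale: int, now: datetime) -> float:
    """Seconds between the end of the newest signaled segment and `now`."""
    newest = availability_start + timedelta(seconds=last_segment_end_ticks / timescale)
    return (now - newest).total_seconds()


start = datetime(2024, 1, 1, tzinfo=timezone.utc)

# The timeline ends one hour after availabilityStartTime (timescale of 1).
healthy = timeline_lag(start, 3600, 1, now=start + timedelta(seconds=3610))
stale = timeline_lag(start, 3600, 1, now=start + timedelta(hours=3))

print(healthy)  # a few seconds behind the wall clock
print(stale)    # hours behind: the window no longer covers the live point
```

What counts as an acceptable lag depends on segment duration and the latency you target, so the threshold is left to the reader.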

Performance aspects on Smart TVs

When building an app on Smart TVs that incorporates video playback, special care needs to be taken not to perform too much work on the main thread during video playback.

Examples here are analytics integrations on top of the video player.

Make sure to offload work as much as possible to separate worker threads so as not to impact video playback which happens on the main thread.

Conclusion

As you were able to see, there are many traps to avoid when it comes to Manifest Manipulation. It is important that you can rely on partners who have the experience and the expertise to help you with your projects, especially when it comes to scaling up.

A combined ecosystem, where THEO provides the player and broadpeak.io the SSAI, takes away the complexity of solving these challenges and ensures stable playback on a wide range of devices.

Please do not hesitate to contact us if you have any questions or any project ideas related to content insertion/replacement in adaptive bitrate.

The THEO and broadpeak.io teams

contact@THEOplayer.com –  contact@broadpeak.io

Photo by Christina Morillo

Mathias Guille
https://linkedin.com/in/mathiasguille
Mathias Guille is the Vice President Cloud Platform at Broadpeak. He leads the strategic development of Broadpeak’s cloud platform, including the building of the company’s infrastructure in the cloud and in public datacenters, the design of Broadpeak’s platform on top of the infrastructure and the shaping of the company’s applications to accommodate SaaS offerings.