Google Powers YouTube With Its Own Video-Transcoding Chips


YouTube is the world’s most popular video-sharing platform. Before Google purchased it in 2006, keeping the site running was thought to be difficult. Since then, Google has fought hard to keep the site’s costs down, often reinventing Internet technology and copyright practices to do so. The primary infrastructure problem YouTube must solve for end users today is delivering content optimized for each viewer’s platform and bandwidth as efficiently as possible. That means choosing a resolution that suits your display and a codec that your device supports, without saturating your Internet connection with a massive file.

This entails converting a single video into a large number of different ones. Click the gear icon on an 8K video and you will see nine different resolutions built from a single upload: 144p, 240p, 360p, 480p, 720p, 1080p, 1440p, 2160p, and 4320p. Every one of these video files must be made from the original 8K upload, and bear in mind that this is only the set offered to your particular computer. YouTube aims to deliver videos in the most modern, efficient codec possible to cut bandwidth, which accounts for a notable portion of the company’s expenses. However, encoding video in a more advanced codec consumes a lot of computing power, and without dedicated hardware acceleration for each new codec, decoding would be slow and inefficient on cheaper mobile devices. That means Google can only use the best codecs on modern devices, and it must keep copies of each video in older codecs on hand for legacy devices.
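To make the fan-out concrete, here is a minimal Python sketch that lists the renditions one upload could turn into. The resolution ladder is the one quoted above. The codec names and the rule of never producing a rendition larger than the source are assumptions made for illustration, not a description of YouTube's actual pipeline.

```python
# Resolution ladder named in the article (vertical pixel counts).
LADDER = [144, 240, 360, 480, 720, 1080, 1440, 2160, 4320]


def renditions(source_height, codecs):
    """List (codec, height) pairs to produce for one upload.

    Assumption: a rendition is only made at or below the source height.
    """
    heights = [h for h in LADDER if h <= source_height]
    return [(codec, h) for codec in codecs for h in heights]


# Hypothetical codec set: one modern codec plus one legacy fallback.
jobs = renditions(4320, ["vp9", "h264"])
print(len(jobs))  # 18
```

Under these assumptions, a single 8K upload with just two codecs already means 18 separate encodes, and every additional codec adds another nine.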

Codecs are constantly moving forward, and because bandwidth is such a large part of the cost of hosting the platform, it pays for Google to move to each new codec as quickly as possible. Upgrading to a new codec means transcoding every video (or at least a large portion of them) to that codec, and this must be done every several years. The number of videos involved is enormous: “500 hours of video are posted to YouTube every minute.” That’s not even considering YouTube Live (imagine all of this transcoding occurring in real time, with a 100-millisecond delay). Together, this makes YouTube the world’s largest transcoding project, and the sheer size of the workload is why Google has created its own server chips.
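The quoted upload rate is easier to appreciate as arithmetic. The sketch below uses only the "500 hours every minute" figure from the text. The function names are illustrative, and the result is a rough back-of-the-envelope estimate, not a published Google number.

```python
HOURS_UPLOADED_PER_MINUTE = 500  # figure quoted in the article


def realtime_factor(hours_per_minute):
    """Minutes of new video arriving per minute of wall-clock time.

    This is how many times faster than real time the system must
    process footage just to keep up with uploads, per output version.
    """
    return hours_per_minute * 60


def hours_per_day(hours_per_minute):
    """Hours of new video arriving in one day."""
    return hours_per_minute * 60 * 24


print(realtime_factor(HOURS_UPLOADED_PER_MINUTE))  # 30000
print(hours_per_day(HOURS_UPLOADED_PER_MINUTE))    # 720000
```

In other words, merely keeping pace with uploads means processing footage about 30,000 times faster than real time for each output version, before any re-encoding of the existing library.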

Google takes the lead in creating codecs because they are so critical to YouTube’s popularity. Google purchased codec maker On2 Technologies in 2009 (the company that provided the VP6 codec used in Flash content, which fuelled YouTube). The search giant has been a prominent codec developer ever since. After deploying and updating VP8 and VP9, Google is moving on to the next codec, dubbed “AV1,” which it expects to see widespread adoption one day. AV1 was developed through industry collaboration. This modern coding format compresses data more effectively than VP9 but requires a higher computational load to encode. Google is planning multiple generations of its transcoding chip, with machine tuning in the interim, and one of the core features of the next-generation chip is support for AV1. AV1 is currently available in beta on YouTube and many other video platforms, and the second-generation chips are also being phased into Google’s server farms.

The “VCU,” or “Video (trans)Coding Unit,” lets YouTube transcode a single video into over a dozen versions, which helps YouTube run a smooth, bandwidth-efficient, and profitable video platform, much as GPUs help with graphics workloads and Google’s TPU (tensor processing unit) helps with artificial intelligence workloads. The VCU is a full-length PCIe card that resembles a graphics card in appearance. Two Argos ASIC chips sit on the board beneath a massive, passively cooled aluminum heat sink. Since the PCIe slot alone cannot supply enough power, there is also what appears to be an 8-pin power connector on the end. Each chip has 10 encoder cores, and each encoder core can encode 2160p video in real time at up to 60 FPS (frames per second).
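The per-card figures above imply a peak encoding capacity that can be worked out directly. This is a rough sketch: the chip, core, and frame-rate counts come from the text, while reading "2160p" as 3840x2160 and assuming every core runs at its quoted peak at the same time are assumptions, so the result is an upper bound, not a measured throughput.

```python
CHIPS_PER_CARD = 2       # two Argos ASICs per card, per the article
CORES_PER_CHIP = 10      # encoder cores per chip, per the article
FPS = 60                 # peak frame rate per core, per the article
WIDTH, HEIGHT = 3840, 2160  # assumption: "2160p" means 3840x2160


def peak_streams_per_card():
    """Simultaneous 2160p60 encodes if every core is fully busy."""
    return CHIPS_PER_CARD * CORES_PER_CHIP


def peak_pixels_per_second():
    """Upper-bound pixel throughput of one card at the quoted peak."""
    return peak_streams_per_card() * WIDTH * HEIGHT * FPS


print(peak_streams_per_card())   # 20
print(peak_pixels_per_second())  # 9953280000
```

On these assumptions, one card tops out at 20 simultaneous 2160p60 encodes, or roughly 10 billion pixels per second.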

Deploying all this is no easy task, and the cards have been designed to fit into Google’s large-scale computing framework. Each computing cluster would include a set of dedicated “VCU computers” filled with the new cards, sparing Google from having to open up every server and fit it with a new card. The cards are said to resemble GPUs because that is the form factor that fits Google’s current accelerator trays. Measured by TCO (total cost of ownership) against running the same algorithms on Intel Skylake chips and Nvidia T4 Tensor Core GPUs, the VCU approach could also save the business a lot of money.