Netflix: what happens when you click play?







This article is a chapter from my new book, Explain the Cloud Like I'm 10. The first version was written specifically for people who need an introduction to the cloud. Since then I've made a few updates and added a couple of chapters, “Netflix: What Happens When You Click Play?” and “What Is Cloud Computing?”, that go a little beyond the beginner level. I think they will interest even fairly experienced readers.



So if you need a good introduction to the cloud, or know someone who does, please take a look. I think you'll like it. I'm proud of how it turned out.



I wrote this chapter from a dozen sources that often contradict each other. The facts change over time and depend on who is telling the story and to what audience. I've tried as far as possible to create a coherent narrative. Keep in mind that this is not a technical manual; it's a big-picture article. For example, I never even mention microservices.



Netflix seems so simple. Click play and the video magically appears. Simple, right? Not exactly.







After our discussion in the “What Is Cloud Computing?” chapter, you might expect Netflix to serve video using AWS: you click play in the Netflix app, and video stored in S3 streams over the Internet directly to your device.



A perfectly reasonable approach, for a much smaller service.



But Netflix works completely differently. Everything is much more complicated and interesting than you could imagine.



To understand why, let's look at Netflix statistics for 2017:





What have we learned?



Netflix is huge. It is global, it has many subscribers, it plays a huge amount of video, and it has a lot of money.



Another relevant fact: Netflix is a subscription service. Subscribers pay Netflix monthly and can cancel at any time. When you click play to relax with Netflix, the service had better work. Unhappy subscribers cancel.



Looking deeper



Netflix is a great example of the ideas we've discussed, so this chapter goes into much more detail than the chapters on other cloud services. One reason to study Netflix in depth is that it shares far more information than other companies do. Communication is one of Netflix's core cultural values, and the company more than lives up to it.



I'd like to thank Netflix for being so open about its architecture. Over the years, the company has given hundreds of talks and written hundreds of articles about how it works internally. That openness helps the whole industry.



Another reason for so much detail is that Netflix is simply an amazing service. Most of us have used it, and who wouldn't want to look behind the scenes and see how it works?



Netflix works in two clouds - AWS and Open Connect



How does Netflix keep its customers happy? With clouds. Two different clouds, in fact: AWS and Open Connect. Both have to work together seamlessly to deliver countless hours of video that keeps users satisfied.



Three parts of Netflix: client, backend, content delivery network



You can think of Netflix as having three parts: the client, the backend, and the content delivery network (CDN).



The client is the user interface, on whatever device you use, for browsing and playing videos. It can be a mobile app on a smartphone, a website on a desktop computer, or even an app on a smart TV. Netflix controls every client on every device.



Everything that happens before you click play happens in the backend, which runs on AWS. This includes preparing all newly arriving videos and handling requests from all the apps, websites, TVs and other devices.



Everything that happens after you click play is handled by Open Connect, Netflix's own content delivery network. Open Connect stores video in many locations around the world. When you click play, the video streams from Open Connect to your device, and the client displays it. Don't worry, we'll talk about CDNs later. Interestingly, inside Netflix, starting a video isn't called “clicking play” but “clicking start on a title.” Every industry has its own jargon.



By controlling all three areas (client, backend and CDN), Netflix has achieved complete vertical integration. The company controls your viewing experience from start to finish. That's why it just works when you click play, wherever you are in the world. You get the content you want to watch, when you want to watch it.



Let's see how it works.



Netflix began moving to AWS in 2008



Netflix has been around since 1998. It started out renting DVDs by mail, but the company saw that the future was video on demand. In 2007, Netflix launched its video-on-demand service, which let subscribers stream TV shows and movies on personal computers through the Netflix website, or through Netflix software on a range of supported platforms, including smartphones and tablets, digital media players, game consoles and smart TVs.



That streaming video on demand had a future may seem obvious, and in principle it was. I personally worked at a couple of startups that tried to do video on demand. They failed. Netflix succeeded. The company certainly executed well, but it also entered the market late, and that helped. By 2007, the Internet was fast and cheap enough to support streaming video services. Before then, it wasn't. The arrival of fast, affordable mobile data and powerful mobile devices such as smartphones and tablets made it easy and cheap to stream video anytime, anywhere. Timing is everything.



Netflix started with its own data centers



In 2007, EC2 was just getting started, at about the same time Netflix launched streaming, so Netflix couldn't use EC2 from the start. Instead, the company built two data centers located next to each other, and it ran into all the problems we discussed in earlier chapters.



Building data centers is very expensive. Hardware has to be ordered, installed and brought online, and that takes time. As soon as a new installation was up and running, it ran out of capacity, and the whole process had to start over. The long lead time for hardware forced the company to adopt a vertical scaling strategy.



Netflix wrote big programs that ran on big computers. This approach is called building a monolith: one program does everything. The problem is that when you grow as fast as Netflix, it is very hard to make a monolith reliable. And Netflix's wasn't.



An outage pushed Netflix to move to AWS



In August 2008, Netflix couldn't ship DVDs for three days because of a database failure. That was unacceptable; something had to change. Building data centers had taught the company an important lesson: it wasn't good at building data centers. What it was good at was delivering video to its customers. It needed to focus on getting better at delivering video, not at building data centers. Building data centers wasn't the company's competitive advantage; delivering video was.



That's when Netflix decided to move to AWS. AWS was just getting started, so it was a bold decision. The company moved to AWS because it needed more reliable infrastructure and wanted to eliminate every weak point in its system. AWS offered highly reliable databases, storage and redundant data centers. Netflix wanted cloud services so it would no longer have to build unreliable monoliths, and it wanted to become a global service without building its own data centers. None of that was possible in its old data centers.



Netflix chose AWS because it didn't want to do undifferentiated heavy lifting: work that has to be done but gives no advantage to the core business of delivering a great video-watching experience. AWS does all that heavy lifting for Netflix, which lets Netflix's engineers concentrate on delivering business value.



Moving from its own data centers to AWS took Netflix more than eight years. During that time, its number of subscribers grew eightfold. Netflix now runs on hundreds of thousands of EC2 instances.



On AWS, Netflix is more reliable



It's not that Netflix has never had problems on AWS, but overall its service has become far more reliable than before. You no longer see complaints like these:











Netflix became this reliable by taking extraordinary measures. It runs in three AWS regions: Northern Virginia; Portland, Oregon; and Dublin, Ireland. Within each region, Netflix runs in three different availability zones.



The company has no plans to expand into more regions. Adding regions is hard and expensive. Most companies run in only one region, let alone two or three.



The advantage of running in three regions is that if one region fails, the other two can take over and serve customers from the failed region. Netflix calls moving traffic away from a failed region an evacuation.



Here's an example. Suppose you're watching a new episode of House of Cards in London, England. Your device is most likely connected to the Dublin region, because it's the closest. What happens if the Dublin region fails? Does Netflix stop working for you? No. Once Netflix detects the problem, it redirects you to Virginia, and your device talks to the Virginia region instead of Dublin. You might not even notice the switch.



How often does an AWS region fail? Once a month. Well, not really: regions don't actually fail every month. Netflix runs monthly tests in which it deliberately takes down an entire region, just to make sure its system can handle regional failures. Netflix can evacuate a region in six minutes.



Netflix calls this its global service model: any customer can be served from any region. That's amazing, and it doesn't happen automatically. AWS has no secret sauce for handling regional failures or serving customers from multiple regions; Netflix built that itself. The company is a pioneer in building reliable multi-region systems, and I don't know of any other company that goes to such lengths to keep its service reliable.
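The chapter doesn't show how Netflix implements evacuation, so the following is only a minimal sketch of the idea. It assumes each client has a latency estimate for every region and that a monitoring system marks each region healthy or unhealthy. The region identifiers are the real AWS names for Northern Virginia, Oregon and Dublin; the latency numbers and the `pick_region` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    healthy: bool = True  # set by a (hypothetical) monitoring system


def pick_region(latencies_ms, regions):
    """Return the healthy region with the lowest latency for this client.

    latencies_ms: region name -> measured round-trip time in milliseconds.
    regions: region name -> Region, carrying the current health flag.
    """
    candidates = [
        (latency, name)
        for name, latency in latencies_ms.items()
        if regions[name].healthy
    ]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates)[1]


# The three regions named in the text; the latency numbers are made up.
regions = {
    "us-east-1": Region("us-east-1"),  # Northern Virginia
    "us-west-2": Region("us-west-2"),  # Oregon
    "eu-west-1": Region("eu-west-1"),  # Dublin
}
london = {"eu-west-1": 12.0, "us-east-1": 75.0, "us-west-2": 140.0}

print(pick_region(london, regions))  # eu-west-1
regions["eu-west-1"].healthy = False  # simulate evacuating Dublin
print(pick_region(london, regions))  # us-east-1
```

A real failover also has to move traffic itself (for example, through DNS or proxy changes) and make sure the surviving regions have enough spare capacity. The sketch only shows the routing decision.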



Another benefit of running in these three regions is that Netflix can cover the whole world. Netflix's tests showed that wherever in the world you launch the app, one of these three regions can serve you quickly.



Netflix saves money with AWS



This may surprise many people, but AWS is cheaper for Netflix. Its cloud cost per video view is a fraction of what it was in the old data centers. Why? Because of the elasticity of the cloud.



Netflix can add servers when it needs them and give them back when it doesn't. Instead of keeping a pile of idle computers waiting for peak load, the company uses exactly as many computers as it needs, exactly when it needs them.
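To see why elasticity saves money, here is a toy comparison under made-up assumptions: a daily load curve with an evening peak, a fixed per-server capacity, and a 20% safety margin. A traditional data center has to be sized for the peak around the clock, while an elastic fleet is resized every hour. The `servers_needed` helper and all numbers are hypothetical.

```python
import math


def servers_needed(requests_per_sec, capacity_per_server, headroom=0.2):
    """Servers required for the current load plus a safety margin."""
    return max(1, math.ceil(requests_per_sec * (1 + headroom) / capacity_per_server))


# Hypothetical hourly load over one day (requests/second), peaking in the evening.
hourly_load = [2000, 1500, 1000, 800, 800, 1000, 2000, 4000, 5000, 5000, 5000, 6000,
               6000, 5500, 5000, 5500, 7000, 9000, 12000, 15000, 16000, 14000, 9000, 4000]
capacity = 500  # assumed requests/second one server can handle

elastic = [servers_needed(load, capacity) for load in hourly_load]
fixed = max(elastic)  # a fixed fleet must be sized for the peak all day long

print("fixed fleet, server-hours:", fixed * len(hourly_load))
print("elastic fleet, server-hours:", sum(elastic))
```

The gap between the two totals is the "bunch of additional computers doing nothing" that the elastic fleet avoids paying for.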



What happens on AWS before you click "play"?



Everything that isn't related to delivering the video stream is handled on AWS. This includes scalable computing, scalable storage, business logic, scalable distributed databases, big data processing and analytics, recommendations, transcoding, and hundreds of other functions. You don't need to understand them all, but since they may interest you, I'll explain them briefly.



Scalable computing power and storage



Scalable computing is EC2, and scalable storage is S3; nothing new here. Your device (iPhone, TV, Xbox, Android phone, tablet and so on) talks to services running on EC2. The list of movies you might want to watch comes from a computer running on EC2, and so does the detailed information about each video. It all works the same way as in other cloud services.



Scalable distributed databases



Netflix uses DynamoDB and Cassandra as its distributed databases. The names don't need to mean anything to you; they are simply high-quality databases. Your profile information, your account details, every movie you've watched: all of it is stored in a database. A distributed database runs not on one big computer but on many computers. Your data is copied to many computers, so even if one or two of the computers holding your data fail, your data stays safe. In fact, all data is copied across all three regions, so if a region fails, your data is available in the region you're switched to. Scalable means the database can hold as much data as you put into it, which is one of the main advantages of distributed databases: when more data arrives, you just add more computers.
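DynamoDB and Cassandra are far more sophisticated than this, but the core idea of copying each record to several machines fits in a few lines. The toy `ReplicatedStore` below is a hypothetical class, not either database's API. It assumes a replication factor of 3 and picks replicas with a CRC32 hash of the key, as a stand-in for real partitioning schemes.

```python
import zlib


class ReplicatedStore:
    """Toy key-value store that copies every write to several nodes."""

    def __init__(self, node_names, replication_factor=3):
        self.nodes = {name: {} for name in node_names}
        self.alive = {name: True for name in node_names}
        self.rf = replication_factor

    def replicas(self, key):
        # Deterministically choose rf consecutive nodes for this key.
        names = sorted(self.nodes)
        start = zlib.crc32(key.encode()) % len(names)
        return [names[(start + i) % len(names)] for i in range(self.rf)]

    def put(self, key, value):
        # Write the value to every live replica of the key.
        for name in self.replicas(key):
            if self.alive[name]:
                self.nodes[name][key] = value

    def get(self, key):
        # Read from the first live replica that has the key.
        for name in self.replicas(key):
            if self.alive[name] and key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)


store = ReplicatedStore(["node-a", "node-b", "node-c", "node-d", "node-e"])
store.put("profile:42", {"watched": ["Stranger Things"]})
first, second, _ = store.replicas("profile:42")
store.alive[first] = False   # two of the three copies become unavailable
store.alive[second] = False
print(store.get("profile:42"))  # still served by the third replica
```

Real distributed databases also have to catch up replicas that missed writes while they were down, and let you choose how many replicas must respond before a read or write counts. Cassandra, for example, exposes this as a per-request consistency level.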



Processing and analyzing big data



Big data just means a lot of data, and Netflix collects a lot of it. The company knows who watched what, when, and where. It knows which videos customers looked at but decided not to watch. It knows how many times each video has been watched, and much more.



Collecting all that data and converting it into a standard format is called processing. Extracting meaning from it is called analysis. Data is analyzed to answer specific questions.



Netflix personalizes images for you



Here's a great example of how Netflix uses data analytics to lure you into watching more. As you browse the lists looking for something to watch, have you noticed that every title has a picture? That's the title image.



The title image should intrigue you, get your attention and make you choose this video. The idea is that the more intriguing the image is, the more likely you are to watch the video. And the more videos you watch, the less likely you are to unsubscribe from Netflix.



Here is an example of the various title images for the Stranger Things series:







You may be surprised to learn that the image for each video is chosen specifically for you. Not everyone sees the same images.



It used to be that everyone saw the same title image. Here's how that worked. Customers were shown one image chosen at random from a set like the one in the illustration above. Netflix counted every viewing and recorded which image had been shown when the user chose the video. Suppose that, in our Stranger Things example, the show was watched 1,000 times when the central image from the collage was shown, and only once when each of the other images was shown.



Since the group image attracted viewers better than the others, Netflix would then make it the show's title image forever. This is called data-driven selection. Netflix runs on data analysis: data is collected (here, the number of views associated with each image) and used to make the best possible decision (here, which title image to show).
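The older, non-personalized selection described above boils down to counting. Here is a minimal sketch. It assumes a hypothetical log that records which image was on screen each time someone started the show, and it assumes the images were shown equally often, so raw counts are comparable. The image names are made up.

```python
from collections import Counter


def best_image(view_log):
    """Return the image that was on screen for the most viewing starts.

    view_log: one entry per viewing start, naming the image that was shown.
    """
    counts = Counter(view_log)
    image, _ = counts.most_common(1)[0]
    return image


# Hypothetical log matching the example: the central group image wins.
log = ["group"] * 1000 + ["eleven", "mike", "hopper"]
print(best_image(log))  # group
```

If the images were not shown equally often, you would compare plays per impression (a rate) instead of raw counts.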



Clever, but can it be done even better? Yes, with even more data. That's the theme of the future: solving problems by learning from data. People differ. Do you think we are all drawn in by the same images? Probably not. We have different tastes and different preferences. Netflix knows this, so it now personalizes every image it shows you. It tries to pick the image that highlights the aspect of the video that matters most to you. How does it do that?



Remember that Netflix records and counts everything you do on its site. It knows which movies you like, which actors you like, and so on. Suppose the movie Good Will Hunting shows up among your recommendations. Netflix needs to pick a suitable title image, one that tells you this film might interest you. Which image should it show?



If you like comedies, Netflix shows you an image of Robin Williams. If you prefer romance, Netflix shows you an image of Matt Damon and Minnie Driver about to kiss.







By featuring Robin Williams, the service tells you the film probably has humor in it, and since Netflix knows you like comedies, it signals that this video is right for you. The image of Matt Damon and Minnie Driver sends a completely different message: if you like comedies and see that picture, you'll probably skip the film. That's why choosing the right image matters so much. It sends a strong, personalized signal about what the film is about.
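The chapter doesn't describe Netflix's actual personalization models, so the following is only a toy that captures the intuition. Describe each candidate image with a few tags, describe the user by how much they watch each kind of content, and show the image with the best overlap. The tags, weights and `pick_artwork` function are all hypothetical.

```python
def pick_artwork(user_affinity, artwork_tags):
    """Pick the artwork whose tags best match what this user tends to watch.

    user_affinity: tag -> weight (for example, share of viewing time).
    artwork_tags: artwork id -> set of tags describing that image.
    """
    def score(art_id):
        # Sum the user's affinity for every tag this image carries.
        return sum(user_affinity.get(tag, 0.0) for tag in artwork_tags[art_id])

    return max(artwork_tags, key=score)


good_will_hunting = {
    "robin_williams": {"comedy", "robin williams"},
    "damon_driver_kiss": {"romance", "matt damon"},
}
comedy_fan = {"comedy": 0.7, "drama": 0.2}
romance_fan = {"romance": 0.6, "drama": 0.3}

print(pick_artwork(comedy_fan, good_will_hunting))   # robin_williams
print(pick_artwork(romance_fan, good_will_hunting))  # damon_driver_kiss
```

The same function covers the Pulp Fiction case: a user with a high affinity for an actor's tag gets the image featuring that actor.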



Here is another example - “Pulp Fiction”:







If you've watched a lot of Uma Thurman movies, you'll probably see the title image with Uma. If you've watched a lot of John Travolta movies, you'll probably see the one with John. See how choosing the best personalized image can make you more likely to watch a particular video?



Netflix appeals to your interests when choosing images, but it doesn't want to mislead you. It doesn't want to show you an attractive image just to get you to watch something you won't like. It has no incentive to: the service doesn't charge per view. Netflix wants to minimize regret and wants you to enjoy what you watch, so it picks the best title image for you from the available options. And this is just one small example of data analysis; Netflix uses similar strategies everywhere.



Recommendations



Netflix usually shows you 40 to 50 videos to choose from, but it has thousands of titles on offer. How does Netflix decide what to show you? With machine learning.



This is part of the processing of big data and analytics that we just talked about. The service studies the data and predicts what you might like. In general, everything you see on the Netflix screen was tailored specifically for you with the help of machine learning.
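One common textbook way to narrow thousands of titles down to a few dozen is to represent each user and each title as a vector of learned taste factors, then score titles by their dot product. This chapter doesn't say which models Netflix uses, so treat the following as an illustration only. It assumes the vectors have already been learned (here they are written by hand) and uses a hypothetical `recommend` function.

```python
import heapq


def recommend(user_vec, title_vecs, k=40, already_watched=frozenset()):
    """Return the k titles with the highest predicted score for this user.

    user_vec: the user's taste factors.
    title_vecs: title -> that title's factors (same length as user_vec).
    """
    def predicted_score(title):
        # Dot product: high when the title matches the user's tastes.
        return sum(u * t for u, t in zip(user_vec, title_vecs[title]))

    candidates = (t for t in title_vecs if t not in already_watched)
    return heapq.nlargest(k, candidates, key=predicted_score)


# Two hypothetical factors: [likes comedy, likes drama]. Real systems learn them from data.
user = [1.0, 0.0]
titles = {"A": [0.9, 0.1], "B": [0.1, 0.9], "C": [0.5, 0.5]}
print(recommend(user, titles, k=2))  # ['A', 'C']
```

`heapq.nlargest` avoids fully sorting a large catalog when you only need the top few dozen titles.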



Transcoding from source file to the format you need



This brings us to how Netflix processes video. Before you can watch on your favorite device, Netflix has to convert the video into the format that works best on that device. This process is called transcoding: converting a video file from one format to another so it can be played on different platforms and devices. Netflix encodes its videos on AWS using 300,000 CPUs at the same time. That's more than almost any supercomputer!



Source video



Who sends videos to Netflix? Studios and production houses. Netflix calls this raw data. A new video is processed by the content team. It arrives in a high-resolution format and can be many terabytes in size. To picture a terabyte of information, imagine 60 stacks of paper, each as tall as the Eiffel Tower.



Before you can watch the video, Netflix puts it through a rigorous, multistep process.







Quality checking



First, Netflix spends a lot of time checking video quality. It looks for digital artifacts, color changes and missing frames that may have crept in during earlier transcoding attempts or data transfers. If any problems are found, the video is rejected.



The processing pipeline



Once its quality has been verified, the video goes into the processing pipeline. A pipeline is a sequence of steps that data passes through before it can be used, something like a conveyor belt in a factory. More than 70 different pieces of software are involved in producing each video.



It isn't practical to process a single file several terabytes in size, so the first step of the pipeline is to split the video into many smaller chunks. The chunks move through the pipeline so they can be encoded in parallel, that is, at the same time.



Let's illustrate parallelism with an example.







Suppose you have a hundred dirty dogs to wash. Which is faster: one person washing one dog after another, or hiring a hundred dog washers and washing all the dogs at the same time?



Obviously, a hundred washers working at the same time will finish faster. That's parallelism, and it's why Netflix uses so many EC2 servers: it needs a lot of servers to process these giant video files in parallel. And it works. The company says a source file can be encoded and pushed to the CDN in as little as 30 minutes.
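The split, encode-in-parallel, check and reassemble flow can be sketched with Python's standard `concurrent.futures` module. This is a minimal illustration, not Netflix's pipeline. `encode_chunk` is a hypothetical stand-in that just tags frames instead of calling a real encoder, and `check_chunk` is a simplified stand-in for the quality checks. A thread pool is used for simplicity; a CPU-bound encoder would use processes or, as in Netflix's case, many separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(frames, chunk_size):
    """Cut the source into fixed-size pieces (the last one may be shorter)."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]


def encode_chunk(chunk):
    """Hypothetical stand-in for a real encoder: it just tags each frame."""
    return [f"enc({frame})" for frame in chunk]


def check_chunk(original, encoded):
    """Simplified quality check: every input frame must have an output frame."""
    if len(original) != len(encoded):
        raise ValueError("encoded chunk is missing frames")


def encode_parallel(frames, chunk_size=4, workers=8):
    chunks = split_into_chunks(frames, chunk_size)
    # Encode all chunks at the same time; pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        encoded_chunks = list(pool.map(encode_chunk, chunks))
    for original, encoded in zip(chunks, encoded_chunks):
        check_chunk(original, encoded)
    # Reassemble the pieces into one output and check the whole result again.
    result = [frame for chunk in encoded_chunks for frame in chunk]
    check_chunk(frames, result)
    return result


frames = [f"f{i}" for i in range(10)]
print(encode_parallel(frames))
```

Because `pool.map` preserves input order, reassembly is a simple concatenation even though the chunks may finish encoding in any order.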



Once the chunks are encoded, they're checked to make sure no new problems were introduced. Then they're reassembled into a single file and checked again.



The result: a pile of files



The encoding process produces many files. Why? Netflix's ultimate goal is to support every device that connects to the Internet. The company started out in 2007 on Microsoft Windows. Over time, many other devices were added: Roku, LG, Samsung Blu-ray, Apple Mac, Xbox 360, LG DTV, Sony PS3, Nintendo Wii, Apple iPad, Apple iPhone, Apple TV, Android, Kindle Fire and Comcast X1.



In all, Netflix supports 2,200 different devices. Each one has a video format that looks best on it. If you watch Netflix on an iPhone, you see the video that looks best on an iPhone. Netflix calls each of these video formats an encoding profile. Netflix also creates files optimized for different network speeds: on a fast network you get better-quality video than on a slow one.



There are also files for different audio formats. Audio is encoded at different quality levels and in different languages. On top of that, there are subtitle files, and a video may have subtitles in many languages. So every video can be watched in many different ways. Which one you get depends on your device, your connection quality, your subscription plan and your language choice.
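Choosing which of the many video files to stream can be sketched as two steps: filter by encoding profile, then take the highest bitrate that fits the measured bandwidth. The profiles, bitrates and 80% safety margin below are hypothetical. Real adaptive streaming clients also re-evaluate this choice continuously during playback. Audio and subtitle selection would follow the same filtering pattern, keyed on quality level and language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    profile: str       # encoding profile, e.g. one tuned for phones or TVs
    bitrate_kbps: int  # bandwidth needed to stream this file
    resolution: str


# Hypothetical bitrate ladder for one title.
LADDER = [
    Variant("phone", 300, "426x240"),
    Variant("phone", 1000, "854x480"),
    Variant("phone", 2500, "1280x720"),
    Variant("tv", 1500, "1280x720"),
    Variant("tv", 5000, "1920x1080"),
    Variant("tv", 15000, "3840x2160"),
]


def choose_variant(variants, profile, bandwidth_kbps, safety=0.8):
    """Highest-bitrate file for this profile that fits within the bandwidth budget."""
    matching = [v for v in variants if v.profile == profile]
    if not matching:
        raise ValueError(f"no files for profile {profile!r}")
    budget = bandwidth_kbps * safety  # leave headroom for throughput dips
    fitting = [v for v in matching if v.bitrate_kbps <= budget]
    if not fitting:
        # Nothing fits: fall back to the lowest bitrate rather than refusing to play.
        return min(matching, key=lambda v: v.bitrate_kbps)
    return max(fitting, key=lambda v: v.bitrate_kbps)


print(choose_variant(LADDER, "tv", 8000).resolution)     # 1920x1080
print(choose_variant(LADDER, "phone", 4000).resolution)  # 1280x720
```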



So how many files are there?



Netflix stores about 1,200 files for the series The Crown. The second season of Stranger Things has even more. It was shot in 8K and has nine episodes, and its source files take up many terabytes. Encoding that one season took 190,000 CPU hours and produced 9,570 different video, audio and text files.



Now let's see how Netflix streams all these files.



Three different strategies for streaming video



Netflix has used three different strategies for streaming video: its own small CDN, third-party CDNs, and Open Connect. CDN stands for content delivery network (also called a content distribution network). For Netflix, the source of all these video files is S3.



What is a CDN?



: , . , . CDN .



, , , . , , . , .



, , (PoP, point of presence). PoP — . , . PoP .



Netflix's own small CDN



2007 , Netflix , 36 50 , , . Netflix CDN . , .



Third-party CDNs



2009 Netflix CDN. CDN . Netflix . , CDN?



Netflix CDN , Akamai, Limelight Level 3. , CDN. . , NFL Akamai, . CDN, Netflix .



. Netflix , . , , , . — — , CDN — .



Netflix AWS-, . Netflix AWS . — , , . — .



Netflix , , CDN.



Open Connect



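In 2011, Netflix realized that at its scale it needed a dedicated CDN of its own. So Netflix began developing Open Connect, its own custom CDN, which launched in 2012. Open Connect has many advantages for Netflix: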





CDN , . Netflix .



Netflix , . , . , , , CDN. Netflix . , .



Netflix CDN. , Open Connect.



Open Connect



, , CDN ? Netflix . Netflix Open Connect (Open Connect Appliances, OCA). , OCA:







— OCA . OCA . OCA — , , -, . , OCA:







. , . , . .



. PC- . , . , Netflix . , .



OCAs run the FreeBSD operating system and use nginx as their web server.



, Netflix , , , . «», , — , , .



Netflix . . — -Netflix . ?



Netflix ?



Netflix , 1000 - . :







, YouTube Amazon, . . It is very difficult and expensive. Netflix CDN.



Netflix , -. - (ISP) . Netflix , , (internet exchange location, IXP).



Netflix -, — - . Brilliant! .



ISP CDN



ISP — . , . Verizon, Comcast, . , ISP . - ISP, Netflix .



IXP CDN



— -, ISP CDN . , , . , .



IXP :







IXP :







, ( AMS-IX ):







. . IXP — , :







Netflix . IXP . IXP, -.





Netflix , S3. . : ! Netflix , , .







[. cache — ]? , , , . , ? , — . . , , , . , , . , ; - , .



— , , , . Netflix , , . Netflix , . , , Netflix — , ? , , , , . , ISP IXP. . . , - . . , , , .







Netflix .. . ISP IXP. , . , Netflix. - , Netflix. S3.



AWS, . , , . , . , . - , .



Netflix , , , . , , , ISP.



Open Connect . — , , . CDN, . Netflix , , , . , .



Netflix ? . 2013 3 ; , , , . , , , , .



. « » — . ? , , . , , « »? Netflix , , .



. , . Why? , .



, . Netflix . , , .



, Daredevil 2016 , Netflix .



: ?



Why do ISPs agree to host OCA clusters? At first glance it seems too generous, but you may be interested to learn that there's something in it for them. To understand why, we need to talk about how networks work. Throughout this book, we've talked about accessing cloud services over the Internet. Netflix is different: the Netflix app talks to AWS over the Internet, but the video itself mostly doesn't travel over the Internet at all.



The Internet is a network of networks. You have a provider that gives you Internet access; I get my Internet service from Comcast. That means my home connects to the Comcast network over fiber. The Comcast network belongs to Comcast. It isn't the Internet; the Internet is something else.



Suppose I want to search on Google: I type a query into my browser and hit Enter. My request first travels over the Comcast network. Google isn't on the Comcast network, so at some point my request has to get onto Google's network. That's what the Internet is for: it connects the Comcast network to the Google network. Routing protocols act like traffic cops, directing network traffic. When my request goes out onto the Internet, it is outside both the Comcast network and the Google network, on the Internet backbone. The Internet is made up of many private networks that have agreed to connect with each other, and IXPs are one of the ways those networks connect.



Here is a map of long-haul fiber-optic networks in the US:







What Netflix did with Open Connect was place OCA clusters inside ISP networks. So when I watch a Netflix video, I'm talking to an OCA located on the Comcast network. All my video traffic stays on Comcast's network and never goes out onto the Internet.



The key to scaling video delivery is getting as close to users as possible. When you do that, you don't use the Internet backbone; requests are served from the local part of the network. Why does that matter? Remember that Netflix already accounts for more than 37% of Internet traffic in the United States. Without ISP cooperation, that traffic would load the Internet even more heavily. The Internet couldn't handle it all, and providers would have to add capacity, which is very expensive.



Today nearly 100% of Netflix content is delivered from within provider networks. This lowers the cost of running the network, because the traffic doesn't clog the Internet. At the same time, Netflix subscribers get high-quality video, and network performance improves for everyone. Everybody wins.



Open Connect is robust and fault tolerant



We've already discussed how Netflix makes its system more reliable by running in three AWS regions. Open Connect's architecture achieves the same goal. It may not be obvious, but OCAs are independent of one another. They operate as self-sufficient islands of video delivery. Subscribers streaming from one OCA aren't affected when other OCAs fail.



What happens when an OCA fails? The client you're using immediately switches to another OCA and resumes playback. What happens if too many people are using one OCA? The client finds a less loaded OCA. What happens if the network carrying the video is congested? Same thing: the client finds another OCA on a better network. Open Connect is highly reliable and fault tolerant.
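The client-side behavior described above can be sketched as choosing, from a list of candidate OCAs, one that is healthy, reachable over an uncongested network, and not overloaded. The `OCA` fields, the 90% load cutoff, and the assumption that the client already knows each candidate's health and load are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OCA:
    name: str
    healthy: bool = True
    load: float = 0.0        # fraction of capacity in use (0.0 to 1.0)
    network_ok: bool = True  # False if the path to this OCA is congested


def choose_oca(ocas, max_load=0.9):
    """Pick the least loaded OCA that is up, reachable and not overloaded."""
    usable = [o for o in ocas if o.healthy and o.network_ok and o.load < max_load]
    if not usable:
        raise RuntimeError("no usable OCA available")
    return min(usable, key=lambda o: o.load)


ocas = [
    OCA("isp-oca-1", load=0.30),
    OCA("isp-oca-2", load=0.55),
    OCA("ixp-oca-1", load=0.20, network_ok=False),  # lightly loaded but congested path
]
current = choose_oca(ocas)
print("streaming from", current.name)  # isp-oca-1
current.healthy = False  # the OCA fails mid-stream
print("switched to", choose_oca(ocas).name)  # isp-oca-2
```

Each of the three questions in the paragraph above (a failed OCA, an overloaded OCA, a congested network) corresponds to one condition in the `usable` filter.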



Netflix controls the client



Netflix handles failures gracefully because it controls the client on every device. Netflix develops its own Android and iOS apps, so you'd expect it to control those. But even on platforms such as smart TVs, where Netflix doesn't build the client, it still has control, because it controls the software development kit (SDK).



An SDK is a set of software development tools used to build applications. Every Netflix app makes requests to AWS and plays video using the SDK. By controlling the SDK, Netflix can adapt continuously, in real time, to slow networks, failed OCAs and other problems.



And finally: here's what happens when you click "play"



It's taken a while to get here, and we've learned a lot along the way. Here's what we know so far:





Here is the diagram Netflix uses to describe the playback process:







Let's finish the story:





This is what happens when you click “play” on Netflix. Who would have thought that such a simple thing as watching a video could be so complicated?


