Binging on content: Data driven
25 minutes
Have you ever noticed how Netflix recommends movies and shows that seem just right for you? Or wondered how streaming remains so seamless? The secret behind these features is Data Science!
Data Science powers Netflix's recommendation algorithms, ensuring you get content tailored to your interests. It also optimizes streaming quality, making sure your experience is smooth and uninterrupted. Curious about the magic behind Netflix's smart recommendations and flawless streaming? It’s all thanks to advanced data science techniques!
Netflix’s strategic adaptability in the digital age has stunned industry leaders, positioning it as the top streaming platform. By leveraging Big Data analytics, Netflix optimizes video stream quality and analyzes user viewing patterns. The platform invests significantly in advanced algorithms to ensure a flawless movie experience for its users.
A brief history of Netflix and its digital transformation journey
Founded in 1997 in Scotts Valley, California, Netflix began as a rent-by-mail DVD service, relying on third-party postal services for delivery. This model proved unprofitable, leading to a strategic shift in 2007 towards online streaming. This transition allowed Netflix to challenge established rental giants like Blockbuster. Today, Netflix’s adaptability and focus on data science have contributed to its success, with over 151 million paid subscribers across more than 190 countries.
The digital transformation journey of Netflix can be broadly classified into two categories
Netflix invests heavily in Artificial Intelligence to optimize content spending, saving over $1 billion annually. By continually enhancing their algorithms, they minimize the need for human intervention in programming decisions. Additionally, Netflix uses its proprietary CDN, Open Connect, to deliver content efficiently to multiple consumer groups within the same region. This approach reduces bandwidth costs, accelerates data recovery, and simplifies data management in case of mishandling.
Furthermore, by leveraging sophisticated data analytics models, Netflix uncovers customer behavior and buying patterns, allowing them to recommend the most likely preferred content to users.
Data Science at Netflix
So, what does the role of a Data Scientist at Netflix look like?
The below image illustrates the four major responsibilities of a typical Data Scientist at Netflix:
Questions a data scientist at Netflix asks:
A data scientist at Netflix is looking to answer the following questions:
Content Interests: What specific content does a customer enjoy?
Browsing Behavior: How do users navigate or scroll through the platform?
Watch Duration: For how long do users watch content?
Geographical Location: Where are the customers located?
Preferred Viewing Time: When do users typically watch content?
Playback Behavior: How many times does a customer pause before finishing a movie?
Skipping Behavior: Do users fast forward to skip less interesting parts?
Rewatch Patterns: Are there any movies or videos that users watch repeatedly?
Device Usage: What type of device does the customer use (e.g., desktop, laptop, tablet, mobile, TV)?
Search Queries: What content do users actively search for?
Ratings: What are the IMDB, Rotten Tomatoes, and user ratings for the content?
Social Media Engagement: What factors drive social media brand engagement related to Netflix content?
In short, Netflix captures every click across its applications. This data is crucial for determining user preferences and interests, allowing Netflix to continually refine and update its content to retain customers effectively.
Recommendation systems and their types
A recommendation system uses Machine Learning algorithms to predict how much a user might rate or prefer a product. It typically relies on input such as past usage data or previous ratings to make these predictions.
In a cold start scenario, where there is little to no past data on user preferences, recommendation systems can still be effective. They achieve this by finding similarities between different products. Netflix, for example, uses this approach to recommend films that share similarities with those the user has already shown interest in.
When John buys two of the items that Tim has bought, the two users are treated as similar, so John is recommended the other two items Tim has bought. This is the logic behind the phrase “because you bought this, you should try...”
Because Tim and Amy both buy the same two items, when John buys one of those items he is recommended the other as well. This is the logic behind the phrase “people who bought this also bought...”
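The two scenarios with John, Tim and Amy can be sketched in a few lines of Python. This is only an illustration of the general idea, not Netflix's implementation: the purchase data, the item names and the function names recommend_user_based and recommend_item_based are all made up for this example.

```python
from collections import Counter


def recommend_user_based(target, purchases):
    """Suggest items bought by the user whose purchases overlap most with the target's."""
    mine = purchases[target]
    best_user, best_overlap = None, 0
    for user, items in purchases.items():
        if user == target:
            continue
        overlap = len(mine & items)
        if overlap > best_overlap:
            best_user, best_overlap = user, overlap
    if best_user is None:
        return set()
    # "Because you bought this, you should try..."
    return purchases[best_user] - mine


def recommend_item_based(item, purchases, exclude_user=None):
    """Rank items by how often other users bought them together with `item`."""
    counts = Counter()
    for user, items in purchases.items():
        if user == exclude_user or item not in items:
            continue
        counts.update(items - {item})
    # "People who bought this also bought..."
    return [other for other, _ in counts.most_common()]


# Hypothetical data for the first scenario: John shares two items with Tim.
user_purchases = {
    "John": {"laptop", "mouse"},
    "Tim": {"laptop", "mouse", "keyboard", "monitor"},
}
print(recommend_user_based("John", user_purchases))

# Hypothetical data for the second scenario: Tim and Amy buy the same two items.
item_purchases = {
    "Tim": {"tent", "stove"},
    "Amy": {"tent", "stove"},
    "John": {"tent"},
}
print(recommend_item_based("tent", item_purchases, exclude_user="John"))
```

The first function compares users with each other, the second compares items with each other. Real systems use far richer similarity measures, but the split between the two views is the same.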
Use of Interleaving technique to identify best algorithms
To generate a ranked list of the movies and TV shows most likely to interest each user, Netflix relies on ranking algorithms. To decide which ranking algorithm performs best, Netflix moved beyond relying on traditional A/B testing alone and developed its own evaluation process based on interleaving.
Interleaving saves a lot of time, gives much better results and makes it possible to identify the best algorithms. Instead of exposing two separate groups of viewers to ranking algorithms A and B, as traditional A/B testing does, it shows each viewer a single list that blends the rankings of both algorithms.
Interleaving is highly sensitive to differences in the quality of ranking algorithms, which helps Netflix provide personalized recommendations to its users.
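To make the blending idea concrete, here is a minimal sketch of team-draft interleaving, a commonly described variant of the technique. It is an assumption that this variant resembles what Netflix runs in production. The function names, the title ids and the scoring rule (counting watched titles per algorithm) are hypothetical.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng=None):
    """Blend two rankings and remember which algorithm contributed each title."""
    rng = rng or random.Random()
    sources = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    blended, credit = [], {}
    while len(blended) < length:
        # The algorithm with fewer picks goes next; ties are broken by a coin flip.
        if picks["A"] < picks["B"]:
            team = "A"
        elif picks["B"] < picks["A"]:
            team = "B"
        else:
            team = rng.choice(["A", "B"])
        candidate = next((t for t in sources[team] if t not in credit), None)
        if candidate is None:
            # This ranking is exhausted, so fall back to the other one.
            team = "B" if team == "A" else "A"
            candidate = next((t for t in sources[team] if t not in credit), None)
            if candidate is None:
                break  # both rankings are exhausted
        blended.append(candidate)
        credit[candidate] = team
        picks[team] += 1
    return blended, credit


def score(credit, watched):
    """Count how many watched titles were contributed by each algorithm."""
    wins = {"A": 0, "B": 0}
    for title in watched:
        if title in credit:
            wins[credit[title]] += 1
    return wins


# Hypothetical rankings produced by two algorithms for the same viewer.
ranking_a = ["t1", "t2", "t3"]
ranking_b = ["t3", "t4", "t1"]
blended, credit = team_draft_interleave(ranking_a, ranking_b, 4, random.Random(0))
print(blended, score(credit, ["t4"]))
```

Because every viewer sees titles from both algorithms in one list, each viewing session yields a direct comparison between A and B, which is why far fewer viewers are needed than in a split-audience test.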
Technology at Netflix:
Netflix leverages a combination of AWS infrastructure and its own proprietary technologies to enhance its streaming service. It uses Hadoop, Spark and Hive to store and process content in various formats.
On AWS, apart from using S3 to store data, Netflix uses the Amazon Kinesis service, which efficiently analyzes terabytes of log data streams. Amazon Kinesis manages data logs and events by itself, in real time. It is a quick way to process streaming data, which Netflix then feeds into the open source Apache Druid to derive insights about network performance. This real-time processing and insight leads to high availability of content and a better customer experience.
With Open Connect, Netflix’s own content delivery system, the company takes video streaming to the next level. Half the effort is saved by duplicating the most-streamed movies and series on up to 80% of the CDNs. The key, however, is client intelligence: the Netflix client, rather than the server, works out which CDN to retrieve content from. Continuous tests of all available CDNs help determine which CDN to connect to and stream from. Data science at Netflix also tries to predict which content each customer will watch, so that videos are cached on different CDNs based on what the users closest to those CDNs might watch. These computations are called content allocation within (server) clusters.
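The client-side selection described above can be illustrated with a small sketch. It assumes the client has already probed each candidate server several times and simply prefers the highest average throughput. The server names, the probe values and the pick_server function are hypothetical, and the real selection logic is certainly more involved.

```python
from statistics import mean


def pick_server(probe_results):
    """Return the server whose probes show the highest average throughput (Mbps)."""
    return max(probe_results, key=lambda server: mean(probe_results[server]))


# Hypothetical throughput measurements (Mbps) collected by the client.
probes = {
    "cdn-east": [42.0, 40.5, 44.1],
    "cdn-west": [55.2, 51.0, 53.7],
    "cdn-central": [48.3, 47.9, 20.0],
}
best = pick_server(probes)
print(best)
```

Averaging over several probes rather than trusting a single measurement means that one unusually fast or slow response does not decide the choice on its own.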
Based on user data, Netflix knows what content to invest in. It knows which production house to work with for which genre, which actors and directors to invest in, and what content to bring in at what time of the year. The popularity of Dark, Money Heist or Stranger Things comes from a series of data-driven decisions, from investing and releasing to recommending. When a new show or movie is released, a user’s binge-watching habits decide which trailer is shown in order to spark interest. For example, a user who likes woman-centric content will see a trailer featuring the female lead even if the movie or series is not necessarily woman-centric.
80% of content watched on Netflix is a result of recommendation. Netflix has highly specific micro-genres when it comes to content, created by manual tagging of each show or series. This is done for over 15000 titles. Recommendation is said to create over 1 billion USD in terms of customer retention for Netflix.
With over 1300 recommendation clusters based on viewing preferences and 2000 groups in which users are classified, Netflix tries to personalize recommendation as much as possible. The recommendations begin on the home page, where each row consists of similar titles. Different rows consist of different genres. The recommendations follow a large number of data mining algorithms such as:
Reinforcement learning
Neural networks
Causal modelling
Probabilistic graphical models
Matrix factorization
Collaborative filtering techniques use similarity matrices, which are a structured way of presenting trends in viewing behavior. For each user, we can mark whether or not they have seen a given series or movie. Then, we can find similar rows (users) in the matrix.
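As a rough illustration of finding similar rows, the sketch below builds a tiny seen/not-seen matrix and compares users with cosine similarity. Cosine similarity is one common choice and is an assumption here, as are the viewing data, the user ids and the function names.

```python
from math import sqrt

# Columns of the matrix: one per title. The viewing data below is made up.
titles = ["Dark", "Money Heist", "Stranger Things", "The Crown"]
watched = {
    "user_1": [1, 1, 0, 0],  # 1 = has seen the title, 0 = has not
    "user_2": [1, 1, 1, 0],
    "user_3": [0, 0, 1, 1],
}


def cosine(u, v):
    """Cosine similarity between two rows of the matrix."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(user, matrix):
    """Return the other user whose row is closest to the given user's row."""
    others = (other for other in matrix if other != user)
    return max(others, key=lambda other: cosine(matrix[user], matrix[other]))


def suggest(user, matrix, titles):
    """Suggest titles the most similar user has seen and the given user has not."""
    neighbour = most_similar(user, matrix)
    return [
        title
        for title, mine, theirs in zip(titles, matrix[user], matrix[neighbour])
        if theirs and not mine
    ]


print(most_similar("user_1", watched))
print(suggest("user_1", watched, titles))
```

Here user_1 and user_2 share two titles, so user_2 is the nearest neighbour and the one title only user_2 has seen becomes the suggestion.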
Conclusion
Netflix shows have a success rate of 80%, that is, 80% of viewers will watch recommended shows and movies, and 75% of views are a result of the highly optimized recommendation algorithm. In addition, by using data and algorithms to predict success rates, Netflix has reduced its spending on marketing and promotional events.
Netflix is also trying to implement a statistical approach in which each user is shown a different thumbnail for a series or movie. The thumbnail that is most likely to get people to choose the title is then used for a larger audience. The attractiveness of a thumbnail and the intrigue it generates are key in this type of recommendation. Hence, it is important to note that, be it conventional AWS servers or conventional recommendation algorithms, Netflix will try to surpass current technologies to provide a better streaming experience to its 180 million users.
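One simple way to picture the thumbnail approach is an epsilon-greedy strategy: most of the time show the thumbnail with the best click rate so far, and occasionally try another one. This is a sketch under that assumption, not a description of Netflix's actual method. The thumbnail names, the counts and the functions choose_thumbnail and record are hypothetical.

```python
import random


def choose_thumbnail(stats, epsilon=0.1, rng=random):
    """With probability epsilon explore a random thumbnail, otherwise show the best one so far."""
    if rng.random() < epsilon:
        return rng.choice(list(stats))
    return max(stats, key=lambda t: stats[t]["clicks"] / max(stats[t]["shows"], 1))


def record(stats, thumbnail, clicked):
    """Update the counters after a thumbnail has been shown to a user."""
    stats[thumbnail]["shows"] += 1
    stats[thumbnail]["clicks"] += int(clicked)


# Hypothetical counters for two thumbnails of the same title.
stats = {
    "lead_actor": {"shows": 100, "clicks": 5},
    "action_scene": {"shows": 100, "clicks": 12},
}
record(stats, "lead_actor", clicked=True)
# With epsilon set to 0 the choice is purely the best click rate observed so far.
print(choose_thumbnail(stats, epsilon=0.0))
```

The small exploration rate keeps testing the other thumbnails, so a thumbnail that starts slowly still has a chance to prove itself before the winner is rolled out more widely.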
A company that caught the data wave at the right time and moved from a traditional DVD lending service to streaming has taken the world by storm with its implementation of data-related technologies. By using the most modern technologies, focusing on cloud storage, and collecting and analyzing every possible bit of data, Netflix has proved to the world that data is the future of business.