After the apparently successful ending of Breaking Bad, I decided to give the series a chance. So far I’m at the third episode of the second season and still don’t feel as attached to it as everyone told me I should. So, inspired by this post, I decided to look at the IMDB ratings to see whether the quality of the episodes actually increases over time (as I was told).
To begin with, I implemented a small R script that converts the page of IMDB ratings of a series into an R data frame. At that point I decided that, rather than analyzing only Breaking Bad, I would also have a look at some of my favorite series: Lost, How I Met Your Mother, Homeland, Big Bang Theory and Dexter.
First of all I had a look at the distribution of the ratings:
As can be seen from the graph, the user ratings roughly follow a normal distribution with a mean of 8.3.
To compare the ratings of all episodes across all the series, I decided to use the z-score. It gives each episode a value that indicates whether it is below or above the mean, and by how many standard deviations of the whole dataset.
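As a rough illustration of the z-score idea, here is a minimal Python sketch (not the R script mentioned above). The ratings are made-up values, the function name `z_scores` is hypothetical, and the use of the population standard deviation is an assumption, since the post does not say which variant was used.

```python
from statistics import mean, pstdev


def z_scores(ratings):
    """Return (rating - mean) / standard deviation for every rating."""
    mu = mean(ratings)
    # Assumption: population standard deviation of the whole dataset.
    sigma = pstdev(ratings)
    if sigma == 0:
        raise ValueError("all ratings are identical; z-score is undefined")
    return [(r - mu) / sigma for r in ratings]


# Hypothetical ratings for three episodes, for illustration only.
ratings = [7.0, 8.0, 9.0]
scores = z_scores(ratings)

for rating, score in zip(ratings, scores):
    print(f"rating {rating:.1f} -> z-score {score:+.2f}")
```

A negative value marks an episode rated below the overall mean and a positive value one rated above it, which is what the heatmap colors encode.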
Considering that the dataset contains six different series, and a total of 641 episodes, I decided to use a heatmap as the visual metaphor to explore it. Using Tableau I came to this:
From this visualization, a few conclusions can be extracted:
One of the things that appeals to me about data visualization is that, once you can actually see the data, many questions pop up in your mind. In this case, the downfall of Dexter’s finale made me wonder how many people rated that episode and, furthermore, whether there is any relationship between the number of user votes and the actual rating of each episode. One way to answer this question visually is to make a scatter plot. Here, I plotted user ratings on the X axis against the logarithm of the number of votes, to reduce the large gap between the most voted episode (Breaking Bad 5×14) and the rest.
As the plot suggests, there seems to be a moderate correlation between the average user rating and the number of votes (in fact, the correlation between the two values is 0.44). This could lead to two different interpretations:
Nevertheless, Dexter’s finale (which is rated very poorly by a large number of people) is an exception to both interpretations, which suggests the need to study this in more detail. My hypothesis is that both the best and the worst episodes get the largest number of votes, although maybe people who don’t like an episode don’t even bother to rate it.
As a final conclusion to this analysis, the data clearly show that I should give Breaking Bad a chance, as apparently it is going to get extremely interesting!
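For readers who want to reproduce the kind of correlation mentioned above, here is a minimal Python sketch of a Pearson correlation between ratings and log-scaled vote counts. The episode data is invented, the helper name `pearson` is hypothetical, and the post does not state whether the 0.44 figure was computed on raw or log-transformed votes, so the log transform here is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical (rating, number of votes) pairs, for illustration only.
episodes = [(7.9, 1200), (8.4, 1500), (8.8, 2100), (9.6, 9000), (4.8, 8000)]

ratings = [rating for rating, _ in episodes]
# Assumption: base-10 logarithm, as a way to compress very large vote counts.
log_votes = [math.log10(votes) for _, votes in episodes]

r = pearson(ratings, log_votes)
print(f"correlation between rating and log10(votes): {r:.2f}")
```

The last made-up episode, with a low rating and many votes, plays the role of an outlier like Dexter’s finale and shows how a single such point can pull the coefficient down.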
So good! You’re such a geek!! :D:D
I love the article and its simplicity.
Now it would be fantastic to take this one step further: define “serious”, both as a label for the series and for the users. It is likely that only users with a certain “serious” profile vote on IMDB, and that could have a negative effect on the averages of the less “serious” series.