Road to beer expert

Introduction

From our own experience, we can tell that our beer tastes have evolved drastically over the past 10 years. At first, we only drank Pils and found other types of beer difficult to enjoy. Today, we enjoy IPAs but are still not won over when served a Guinness.
An obvious question arises: are these shifts in beer taste common to other beer drinkers as well? By analyzing millions of detailed beer ratings submitted by the users of BeerAdvocate, one of the largest beer rating websites worldwide, we want to determine whether and how people's beer tastes evolve over time. To tackle this question, we explore and combine different data representation techniques. Moreover, using several unsupervised learning techniques, we group users with similar beer tastes at the beginning of their rating careers and analyze how these tastes vary as they become more experienced. Lastly, we back our findings with statistical tests.

Why does it matter

Understanding how people's beer preferences change over time is of interest to two groups in the beer market: businesses and consumers.
On the one hand, it can help beer businesses target their advertising and respond better to changing consumer preferences. By understanding the factors that shape those preferences, breweries can tailor their products to the evolving desires of their customers and, in doing so, help them make better choices as they continue to try new types of beer.
On the other hand, it can help beer consumers try new types of beer. By identifying people who once had preferences similar to their current ones, they can follow those people's tasting path. This will help them make better choices and confidently explore new horizons.

Dataset

The dataset consists of beer reviews from BeerAdvocate, one of the largest online beer-related websites, where thousands of users rate thousands of beers. Each rating is a weighted average of five aspects rated by the reviewer: look, smell/aroma, taste, feel/palate and overall. Users are also free to leave a textual comment. The dataset contains every review submitted between 2001 and 2017. A preview of the front page can be seen on the left.
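To make the weighted average concrete, here is a minimal Python sketch. The aspect weights are an assumption chosen for illustration; they are not given in the dataset description above, and only the principle of a weighted average over the five aspects comes from the text.

```python
# Hypothetical aspect weights (assumption). They only need to be non-negative:
# the function normalizes them, so the result stays on the aspect scale.
ASPECT_WEIGHTS = {"look": 0.06, "smell": 0.24, "taste": 0.40, "feel": 0.10, "overall": 0.20}


def weighted_rating(aspects: dict[str, float], weights: dict[str, float] = ASPECT_WEIGHTS) -> float:
    """Combine the five aspect scores of one review into a single rating."""
    total_weight = sum(weights.values())
    return sum(weights[name] * aspects[name] for name in weights) / total_weight


review = {"look": 4.0, "smell": 4.5, "taste": 4.0, "feel": 3.5, "overall": 4.0}
print(round(weighted_rating(review), 2))  # 4.07
```

Normalizing by the sum of the weights means any set of relative weights can be plugged in without changing the rating scale.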

The unprocessed dataset can be summarized with the following statistics:

[Statistics panel: Users, Reviews, Beers, Years]

However, as the following histogram shows, the number of ratings per user follows a heavy-tailed distribution: many users have very few ratings, while a few users submit a very large number of them. These super-users rate more beers than a single person plausibly could, so they may be organizations or even bots and need to be removed from the dataset. Users with only a few ratings are also of no use for our analysis, as we cannot track their evolution.

Users filtering

For our results to be as meaningful as possible, we need to determine the proper subset of users on which to conduct our analyses. Not everyone posting an online review can be considered a beer expert, so we need to impose self-defined yet solid criteria for expertise. The filtering performed on the raw dataset can be divided into four steps:

Market selection:

We decided to narrow our research down to users based in the United States. They represent the vast majority (73.6%) of the website's active community. This selection makes users more comparable, as they share a common drinking culture and should have access to the same products.

Expertise:

We decided that users can be called experts as soon as they have submitted 500 ratings on the website. Consequently, our analyses focus on the evolution of users' beer tastes between their 1st and their 500th rating.

Evolution:

The number of reviews written by a user is not a solid enough criterion to ensure a temporal evolution of their beer taste. For example, several users were highly active during a very short period of time. We therefore required users to be active on the website on at least 100 different days. Moreover, to guarantee a time span that is neither too short nor too long for 500 ratings, we filtered out users who took less than 1 year or more than 7 years to submit their 500 ratings.

Normal behavior:

We noticed several organizations on the website that usually release many reviews at the same time, so we set an upper threshold of 70 reviews per month. This threshold also helps filter out potential spam accounts.
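The four filtering steps can be sketched as a single predicate over one user's reviews. This is our own illustration, not the original pipeline: the `Review` record, the location format ("United States, ..."), measuring active days and time span on the first 500 ratings, and applying the monthly cap to all reviews are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# Thresholds from the filtering steps described above.
EXPERT_RATINGS = 500
MIN_ACTIVE_DAYS = 100
MIN_SPAN_DAYS, MAX_SPAN_DAYS = 365, 7 * 365
MAX_MONTHLY_REVIEWS = 70


@dataclass
class Review:  # hypothetical minimal review record
    day: date
    category: str


def keep_user(location: str, reviews: list[Review]) -> bool:
    """Return True if a user passes the four filtering steps."""
    # Market selection (assumes locations look like "United States, Oregon").
    if not location.startswith("United States"):
        return False
    # Expertise: at least 500 ratings.
    if len(reviews) < EXPERT_RATINGS:
        return False
    # Evolution: measured on the first 500 ratings (an assumption).
    first = sorted(reviews, key=lambda r: r.day)[:EXPERT_RATINGS]
    active_days = len({r.day for r in first})
    span_days = (first[-1].day - first[0].day).days
    if active_days < MIN_ACTIVE_DAYS or not MIN_SPAN_DAYS <= span_days <= MAX_SPAN_DAYS:
        return False
    # Normal behavior: no calendar month with more than 70 reviews.
    per_month = Counter((r.day.year, r.day.month) for r in reviews)
    return max(per_month.values()) <= MAX_MONTHLY_REVIEWS
```

A user who posts 500 reviews on a single day fails both the active-days and the monthly checks, which is exactly the organization-like behavior the filter targets.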

Moreover, the website divides beers into 115 different styles, which would make it hard to extract meaningful findings. We therefore grouped those styles into 15 more general categories.
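The grouping is essentially a lookup table from style to category. The excerpt below is hypothetical: the style names are examples of BeerAdvocate styles, the category names come from the regression table further down, but the actual assignment of each of the 115 styles is not given here and may differ.

```python
from collections import Counter

# Illustrative excerpt of a hypothetical style -> category table; the real
# grouping covers all 115 styles and may assign some of them differently.
STYLE_TO_CATEGORY = {
    "American IPA": "India Pale Ales",
    "American Double / Imperial IPA": "India Pale Ales",
    "American Pale Ale (APA)": "Pale Ales",
    "Russian Imperial Stout": "Stouts",
    "American Wild Ale": "Wild/Sour Beers",
    "Gueuze": "Wild/Sour Beers",
    "American Adjunct Lager": "Pale Lagers",
    "Hefeweizen": "Wheat Beers",
}


def to_category(style: str) -> str:
    """Map a website style to its general category, failing loudly on gaps."""
    if style not in STYLE_TO_CATEGORY:
        raise KeyError(f"unmapped style: {style!r}")
    return STYLE_TO_CATEGORY[style]


styles = ["American IPA", "Gueuze", "American Double / Imperial IPA"]
print(Counter(to_category(s) for s in styles))
# Counter({'India Pale Ales': 2, 'Wild/Sour Beers': 1})
```

Raising on an unmapped style, rather than silently returning a default, makes it obvious when the table misses one of the 115 styles.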

The filtered dataset can now be summarized with the following statistics:

[Statistics panel: Relevant users, Reviews, Beer categories, Years]

Temporality and graph analysis

Category popularity through the years

The popularity of the different beer categories changes over time. The following plot illustrates this by showing the percentage of ratings in each category per year:

Unsurprisingly, IPAs gained a lot of popularity over the analyzed period, becoming by far the most rated category. On the other hand, some categories, such as Pale Ales and Pale Lagers, lost popularity. These shifts must be taken into account when comparing a user from the beginning of the period with a user from the end.
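The per-year percentages shown in the plot come from a simple count-and-normalize step. The sketch below is our own illustration (the function name `category_shares` is hypothetical). It takes (year, category) pairs; passing (rating number, category) pairs instead gives the career timeline used in the next section.

```python
from collections import Counter, defaultdict


def category_shares(ratings):
    """Percentage of ratings in each category, per time key.

    `ratings` is an iterable of (key, category) pairs, where key is the
    review year here; using the rating number instead gives a career timeline.
    """
    counts = defaultdict(Counter)
    for key, category in ratings:
        counts[key][category] += 1
    shares = {}
    for key in sorted(counts):
        total = sum(counts[key].values())
        shares[key] = {cat: 100 * n / total for cat, n in counts[key].items()}
    return shares


print(category_shares([(2005, "Pale Lagers"), (2005, "India Pale Ales"), (2006, "India Pale Ales")]))
# {2005: {'Pale Lagers': 50.0, 'India Pale Ales': 50.0}, 2006: {'India Pale Ales': 100.0}}
```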

Category popularity vs. user expertise

Since we want to follow the evolution of tastes along the users' journeys rather than through calendar years, we use the rating number instead of the date as the timeline. The next plot shows the percentage of ratings in each category over the users' "careers":

Most categories stay fairly constant along the rating number; however, the share of Wild/Sour beers clearly increases as users gain experience. Since almost all other categories show no meaningful evolution along the rating number, we will need to cluster the different user profiles to extract the main patterns.
To assess whether these subtle evolutions in beer preferences are statistically meaningful, we perform a linear regression analysis for each category.

Category	Linear regression coefficient [ % / 10 ratings ]	P-values
Wild/Sour Beers	0.107	5.893691e-24
Pale Ales	0.029	2.097780e-05
Stouts	0.026	6.179905e-06
India Pale Ales	0.016	3.576478e-02
Porters	0.01	2.492937e-03
Hybrid Beers	0.001	4.302372e-01
Weird Cocktail	-0.003	2.603927e-05
Specialty Beer	-0.004	2.149614e-01
Bocks	-0.015	7.395393e-08
Strong Ales	-0.016	5.366440e-03
Dark Ales	-0.018	3.196788e-11
Dark Lagers	-0.024	1.295243e-10
Brown Ales	-0.026	7.941614e-09
Wheat Beers	-0.027	1.578396e-08
Pale Lagers	-0.059	7.112391e-10

Linear regression coefficient in percent per 10 ratings:

The linear coefficients confirm these observations: Wild/Sour beers gain popularity as users become more experienced, while the effect is much weaker for the other categories. However, opposite trends among different types of users could cancel each other out, artificially flattening the curves.

Users' favorite beer evolution

To show how users' tastes change between their beginnings and their experienced phase, the following graph draws an edge for each transition from a user's favorite category in their first 50 ratings to their favorite category in their last 50 ratings. The width of each edge shows the number of users taking that path, and the size of each node represents the ratio of users ending in that category to users starting in it.
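The edge and node statistics of such a transition graph can be computed as below. This is a sketch under the assumption that a user's "favorite" is their most rated category within the window (ties go to the category seen first); the function names are hypothetical.

```python
import math
from collections import Counter


def favorite(categories):
    """Most rated category (assumed meaning of 'favorite'); ties go to the first seen."""
    return Counter(categories).most_common(1)[0][0]


def transition_graph(users, window=50):
    """Edges and node ratios of the favorite-category transition graph.

    `users` maps a user id to that user's chronologically ordered categories.
    """
    edges = Counter()
    for categories in users.values():
        edges[(favorite(categories[:window]), favorite(categories[-window:]))] += 1
    starts, ends = Counter(), Counter()
    for (start, end), n_users in edges.items():
        starts[start] += n_users
        ends[end] += n_users
    # Node size: users ending in a category / users starting in it.
    ratios = {
        cat: ends[cat] / starts[cat] if starts[cat] else math.inf
        for cat in set(starts) | set(ends)
    }
    return edges, ratios
```

Edge counts map directly to edge widths; a ratio above 1 marks a category that gains users along the way, and `math.inf` flags a category that only appears as a destination.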

There is no single dominant path, which could explain the lack of change in the category shares in the previous plot. Still, Wild/Sour beers are clearly more appreciated by experienced users, hence their larger node. It is however necessary to cluster the different profiles to extract the main patterns.

Clusters

Since we want to follow the evolution of similar users, we cluster them at the beginning of their "career". To do so, we apply K-means clustering to users represented as 15-dimensional vectors containing their number of ratings in each category among their first 50 ratings. The resulting clusters therefore group users with similar starting tastes.
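The user representation and the clustering step can be sketched in plain Python. The category list is taken from the regression table in this article; the K-means shown is plain Lloyd's algorithm with random initial centroids, a simplified stand-in for a library implementation such as scikit-learn's `KMeans` (which adds k-means++ initialization and several restarts).

```python
import random
from collections import Counter

# The 15 general categories listed in the regression table above.
CATEGORIES = [
    "Wild/Sour Beers", "Pale Ales", "Stouts", "India Pale Ales", "Porters",
    "Hybrid Beers", "Weird Cocktail", "Specialty Beer", "Bocks", "Strong Ales",
    "Dark Ales", "Dark Lagers", "Brown Ales", "Wheat Beers", "Pale Lagers",
]


def user_vector(categories, n_first=50):
    """15-long vector: number of ratings per category among the first 50."""
    counts = Counter(categories[:n_first])
    return [counts[c] for c in CATEGORIES]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's K-means with random initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

With `vectors = [user_vector(cats) for cats in users]`, calling `kmeans(vectors, 4)` returns one cluster label per user and the four average vectors discussed below.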

Choice of the number of clusters

As K-means requires the number of clusters to be set in advance, we used the silhouette score to choose a value yielding compact, well-separated groups. Plotting the score over a reasonable range of cluster numbers lets us visually determine the best k.

Since we hesitated between k=4 and k=5, we plotted the centroids for a better comparison. With k=5, three clusters were highly correlated, so we kept k=4, whose clusters were well spread out and distinct.
We also tried clustering with a Gaussian Mixture Model (GMM), which gave similar results, so we chose the simpler K-means algorithm.
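For reference, the silhouette score compares, for each point, its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), via (b - a) / max(a, b), and averages over all points. The quadratic-time sketch below is our own; in practice a library function such as `sklearn.metrics.silhouette_score` does the same job.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distances (O(n^2))."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, grouped by cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own:  # singleton cluster: silhouette is 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)
```

Running it on the labels produced for each candidate k (for example k = 2 to 8) gives the curve from which the best k is read off; values close to 1 indicate compact, well-separated clusters.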

Visualization of the clusters

We can now apply the K-means algorithm and plot the average vector representations for the 4 clusters in the following figure:

Cluster 0:

The members of this cluster have a clear preference for IPAs; the other categories are less represented, with Stouts and Strong Ales being the most rated among them.

Cluster 1:

The second cluster is the most balanced, although some categories stand out: Pale Ales, India Pale Ales and Strong Ales.

Cluster 2:

The third cluster mainly drinks and rates Pale Lagers.

Cluster 3:

The last cluster has two favorites: mainly Stouts, followed by India Pale Ales. All other categories are underrepresented.

The distribution of users in the clusters cannot be visualized directly, as the users live in a 15-dimensional space. However, with Principal Component Analysis (PCA) and UMAP, we can project them onto two dimensions.

Both projections show that the clusters are well separated: there are three "specialized" teams, each with a favorite beer (IPAs, Stouts and Pale Lagers), and a "generalist" team in the middle, with a more spread-out distribution of ratings but still a slight preference for Pale Ales.
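The PCA projection can be sketched without external libraries using power iteration on the covariance matrix, re-orthogonalizing against the components already found. This is our own illustration of PCA only; UMAP relies on a dedicated library (such as umap-learn) and is not sketched here. Component signs are arbitrary, as with any PCA implementation.

```python
import math
import random


def _orthonormalize(v, basis):
    """Remove the components of v along an orthonormal basis, then normalize."""
    for b in basis:
        dot = sum(x * y for x, y in zip(v, b))
        v = [x - dot * y for x, y in zip(v, b)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 1e-12 else None


def pca_project(points, n_components=2, n_iter=500, seed=0):
    """Project points on their leading principal components (power iteration)."""
    n, d = len(points), len(points[0])
    means = [sum(col) / n for col in zip(*points)]
    centered = [[x - m for x, m in zip(p, means)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = _orthonormalize([rng.gauss(0, 1) for _ in range(d)], components)
        for _ in range(n_iter):
            # Multiply by the covariance, staying orthogonal to earlier components.
            w = _orthonormalize([sum(c * x for c, x in zip(row, v)) for row in cov], components)
            if w is None:  # no variance left in the remaining directions
                break
            v = w
        components.append(v)
    # Coordinates of each centered point along the components.
    return [[sum(x * c for x, c in zip(r, comp)) for comp in components] for r in centered]
```

Applied to the 15-long user vectors, the two returned coordinates can be scattered and colored by cluster label.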

We decided to personify these 4 average users, each representing their respective team:

Cluster 0:
Jean-Michel IPA

I'm a hipster and the only thing I like is IPAs

Cluster 1:
Ada the explorer

I like to explore the different styles of beers

Cluster 2:
James Bland

I just started drinking beer and enjoy mainly Pale Lagers

Cluster 3:
Marcel Stout

I'm tough and I drink stouts

Users joining date

As noted previously, it is important to account for the changing popularity of the categories over the years when comparing different users. This is particularly true for clustering, as we do not want to group users by time period. The following plot shows the distributions of the joining dates of the users in the four clusters. The distributions are similar, with a peak around 2014, which is also when the website experienced a massive gain in popularity. The clusters are therefore not driven by the changes in category popularity.

Taste evolution

First and last 50 beers

In this plot, we visualize the taste progression of the users, each assigned to one of the four clusters presented above. For the first 50 ratings of their "career", we plot the average number of beers rated in each category (green line); the clusters already differ considerably. We then plot, for each cluster, the average number of beers rated during the last 50 ratings (ratings 450 to 499). Note that the cluster assignment stays the same; we only look at the users' later ratings.
This plot shows that the taste of the average user has evolved (e.g. team "James Bland" has moved away from Pale Lagers and now prefers IPAs). Furthermore, users' tastes evolved differently depending on their initial "team". For example, team "James Bland" started drinking some Dark Lagers, while team "Marcel Stout" did not change its habits at all regarding this category.
We also added a plot showing all users together, regardless of their team. It shows only small changes between first and last ratings in most categories. Interestingly, many of the changes that happened within each team are not visible in the overall graph, which supports the idea of clustering the users. We also notice a large increase in interest in the Wild/Sour category, suggesting that this category is mostly appreciated by "experts". Each plot also shows 95% confidence intervals, computed by bootstrap resampling within each cluster.
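A percentile bootstrap for such intervals can be sketched as follows: resample the users of a cluster with replacement, recompute the mean each time, and read the 2.5th and 97.5th percentiles of those means. The function name, the number of resamples and the fixed seed are our own choices.

```python
import random
from statistics import fmean


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    # Resample with replacement and record the mean of each resample.
    means = sorted(fmean(rng.choices(values, k=len(values))) for _ in range(n_boot))
    lower = means[int(n_boot * alpha / 2)]
    upper = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper
```

Here `values` would be, for instance, the number of Stouts each user of one cluster rated among their last 50 ratings, which yields the interval around that cluster's average.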

Jean-Michel IPA team:

For this team, interest in IPAs clearly decreases (from 45.6% to 32.2%). Members of this team still seem to like this category, but less than before, and their tastes shift toward Stouts and Wild/Sour beers.

Marcel Stout team:

This team behaves in the opposite way to the Jean-Michel IPA team: instead of an increasing interest in Stouts and a decreasing interest in IPAs, we observe the reverse. Tastes shift toward IPAs (an increase of about 5%), while interest in Stouts clearly decreases (about 8%).

James Bland team:

This is the most interesting team. There is a clear shift from Pale Lagers to IPAs: at first, the members of this team had a strong interest in Pale Lagers; this interest then fades (around 9% less) while interest in IPAs increases sharply (around 7% more).

Ada the explorer team:

Overall, there is no large difference between the first and last 50 ratings. The share of ratings increases somewhat for IPAs (around 4%), Stouts (around 2%) and Wild/Sour beers (around 3.2%), while Wheat Beers decrease slightly (1%).

Visualisation of the p-values

The following heatmap complements the plot above by showing how confident we are in the change of each category in each cluster. For each of them, we computed the p-value of the null hypothesis that there is no change between the first and last ratings. Since some p-values are extremely small and span many orders of magnitude, we display -log10(p) and clip it: a value of 0 means the change is not statistically significant (p > 0.05), and a value of 20 means p < 10^-20.
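The display transform of the heatmap can be written in a few lines. The text does not state which statistical test produced the p-values, so the sketch below covers only the mapping from a p-value to the color scale; the function name and parameter names are hypothetical.

```python
import math


def heatmap_value(p, alpha=0.05, cap=20.0):
    """Map a p-value to the heatmap scale.

    Returns 0 when the change is not significant (p > alpha),
    otherwise -log10(p) clipped at `cap` (so p < 1e-20 maps to 20).
    """
    if p > alpha:
        return 0.0
    if p == 0:
        return cap
    return min(-math.log10(p), cap)
```

For example, p = 0.2 maps to 0, p = 10^-3 maps to 3, and p = 10^-30 is clipped to 20.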

To show these evolutions over the users' whole journey, the next plot shows the popularity of each category over the 500 ratings, separately for each of the four clusters.

Clustering reveals the changes that canceled each other out in the overall analysis. However, as the following graphs show, members of the same cluster do not always take the same path to expertise: since the clusters were defined on the first 50 ratings only, nothing guarantees that their members remain similar. Some changes in taste may therefore still be hidden.

Favorite beer transitions in the clusters

The following graphs show the evolution in taste of the users in each cluster. As in the general graph, each edge represents a user's transition in favorite category from their first 50 to their last 50 ratings. The edge width shows the number of users taking this path, and the node size represents the ratio of users ending in this category to users starting in it.

Conclusion

Looking back at our main question: do users' beer tastes evolve as they gain experience?
In this work, we were able to identify trends and patterns in the evolution of beer tastes of BeerAdvocate reviewers from the United States of America. Overall, experts showed a general trend of increasing preference for one type of beer: Wild/Sour beers. However, there was substantial variation among individuals, which we detected with an unsupervised clustering method.
Despite the subtle changes we noticed within the teams, this analysis also shows the dominance of IPAs, Pale Ales and Stouts. These categories are very popular and turned out to be the favorite beers of all teams once their members became experts.
The favorite beers of the clusters as beginners and experts are summarized here:

Jean-Michel IPA:

As beginners:
45.6% IPAs
14.9% Pale Ales
9.1% Stouts

As experts:
32.3% IPAs
16.3% Pale Ales
15.2% Stouts

Ada the explorer:

As beginners:
18.8% Pale Ales
16.4% IPAs
12.3% Strong Ales

As experts:
21.1% IPAs
17.7% Pale Ales
12.4% Stouts

James Bland:

As beginners:
27.2% Pale Lagers
15.5% Pale Ales
9.5% Wheat Beers

As experts:
20.3% IPAs
19.0% Pale Ales
10.6% Stouts

Marcel Stout:

As beginners:
30.3% Stouts
19.7% IPAs
12.3% Strong Ales

As experts:
23.0% IPAs
20.3% Stouts
14.1% Pale Ales

Finally, our findings provide valuable insights into the behavior of beer reviewers and their preferences in the beer market. Future research could expand upon our findings by incorporating additional factors that may drive changes in preferences, or by examining other beer rating websites.