The Academy Awards are a week away, and I’m sharing my
machine-learning-based predictions for Best Picture as well as some
insights I took away from the process (particularly XGBoost’s
sparsity-aware split finding). Oppenheimer is a heavy favorite
at 97% likely to win—but major surprises are not uncommon, as we’ll
see.
I pulled data from three sources. First, industry awards. Most unions
and guilds for filmmakers—producers, directors, actors,
cinematographers, editors, production designers—have their own awards.
Second, critical awards. I collected these as widely as possible, from the
Golden Globes to the Georgia Film Critics Association. More or less: If
an organization had a Wikipedia page showing a historical list of
nominees and/or winners, I scraped it. Third, miscellaneous information
like the Metacritic score and keywords pulled from synopses to learn whether
a film was adapted from a book, what genre it is, what topics it covers, and so
on. Combining all of these was a pain, especially for films that have
bonkers names like BİRDMAN or (The Unexpected Virtue of
Ignorance).
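A lot of that pain is just title normalization before joining tables. A minimal sketch of the kind of cleaning I mean, in base R (this is illustrative, not my actual pipeline):

```r
# Illustrative sketch: normalize film titles so joins across sources line up.
clean_title <- function(x) {
  x <- tolower(x)
  x <- iconv(x, to = "ASCII//TRANSLIT")   # transliterate accents and decorative characters
  x <- gsub("[[:punct:]]", " ", x)        # drop colons, parentheses, etc.
  x <- gsub("\\s+", " ", x)               # collapse repeated whitespace
  trimws(x)
}

clean_title("Birdman or (The Unexpected Virtue of Ignorance)")
#> [1] "birdman or the unexpected virtue of ignorance"
```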
The source data generally aligns with what
FiveThirtyEight used to do, except that I cast a far wider net in
collecting awards. Another difference: FiveThirtyEight chose a
closed-form solution for weighting the importance of awards and then
rated films by the “points” they accrued (out of the potential
pool of points) throughout the season, whereas I chose to build a machine
learning model, which was tricky.
To make the merging of data feasible (e.g., different tables had
different spellings of the film or different years associated with the
film), I only looked at movies that received a nomination for Best
Picture, making for a tiny dataset of 591 rows for the first 95
ceremonies. The wildly small N presents a challenge for building a
machine learning model, as does sparsity and missing data.
Sparsity and Missing Data
There are a ton of zeroes in the data, creating sparsity. Every
variable (save for the Metacritic score) is binary. Nomination variables
(i.e., was the film nominated for the award?) may have multiple films
for a given year with a 1, but winning variables (i.e., did the film win
the award?) only have a single 1 each year.
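As a concrete sketch, here is roughly how such nomination/winner indicators can be built from a long table of scraped award results (the columns and values below are hypothetical stand-ins, not my real tables):

```r
# Illustrative sketch: pivot long award results into the wide, mostly-zero design matrix.
library(tidyr)

awards_long <- data.frame(
  film      = c("film_a", "film_b", "film_c"),
  year      = 2023,
  award     = "dga_feature",
  nominated = c(1, 1, 0),
  won       = c(1, 0, 0)
)

awards_wide <- pivot_wider(
  awards_long,
  id_cols     = c(film, year),
  names_from  = award,
  values_from = c(nominated, won),
  values_fill = 0   # anything a film was not nominated for (or did not win) is a 0
)
```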
There is also the challenge of missing data. Not every award in the
model goes back to the late 1920s, meaning that each film has an NA
if it was released in a year before a given award existed. For example,
the Screen Actors Guild only started handing out awards in 1995, and
Metacritic launched in 2001 (I only included Metacritic scores for
contemporaneous releases).
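In code, that just means blanking out an award column for years before the award existed, rather than leaving a zero that would read as "nominated for or won nothing." A hedged sketch with made-up rows:

```r
# Illustrative sketch: an award column becomes NA for films that predate the award.
films <- data.frame(
  film          = c("film_a", "film_b", "film_c"),
  year          = c(1939, 1994, 2023),
  sag_award_won = c(0, 0, 1)
)

# The SAG Awards started in 1995, so earlier films get NA (unknown), not 0 (lost)
films$sag_award_won[films$year < 1995] <- NA
```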
My first thought was an ensemble model. Segment each group of awards,
based on their start date, into different models. Get predicted
probabilities from these, and combine them, weighted by the inverse of their
out-of-sample error. After experimenting a bit, I came to the conclusion
so many of us do when building models: Use XGBoost. With so little data
to use for tuning, I simply stuck with model defaults for
hyper-parameters.
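The fit itself is about as plain as XGBoost gets. A sketch of what that looks like with the R {xgboost} package, assuming a wide `films` table like the one sketched above with a binary `won_best_picture` outcome (the column names here are mine, not necessarily the real ones):

```r
# Illustrative sketch: default-hyperparameter XGBoost fit on the nominee data.
library(xgboost)

feature_cols <- setdiff(names(films), c("film", "year", "won_best_picture"))

fit <- xgboost(
  data      = as.matrix(films[, feature_cols]),   # NAs pass straight through
  label     = films$won_best_picture,
  objective = "binary:logistic",
  nrounds   = 100,   # the one knob you cannot avoid setting
  verbose   = 0
)
```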
Beyond its reputation for being accurate out of the box, XGBoost
handles missing data natively. The docs simply
state: “XGBoost supports missing values by default. In tree algorithms,
branch directions for missing values are learned during training.” This
is discussed in deeper detail in the “sparsity-aware split finding”
section of the paper
introducing XGBoost. The full algorithm is shown in that paper, but the
general idea is that an optimal default direction at each split in a
tree is learned from the data, and missing values follow that
default.
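A tiny, self-contained toy (nothing to do with the Oscars data) shows the behavior: train on a matrix with holes punched in it and inspect the dumped trees, where each split records which branch missing values are routed down.

```r
# Toy illustration: XGBoost learns a default branch for missing values.
library(xgboost)

set.seed(1)
x1 <- c(rnorm(50, 0), rnorm(50, 3))
x2 <- runif(100)
X  <- cbind(x1, x2)
y  <- rep(c(0, 1), each = 50)
X[sample(length(X), 30)] <- NA   # punch holes in the feature matrix

bst <- xgboost(data = X, label = y, objective = "binary:logistic",
               nrounds = 5, verbose = 0)

# Each split line looks something like "0:[x1<1.5] yes=1,no=2,missing=1";
# the missing= entry is the default direction learned for that split.
head(xgb.dump(bst), 8)
```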
Backtesting
To assess performance, I backtested on the last thirty years of
Academy Awards, using what scikit-learn would call leave-one-group-out
cross-validation, with ceremony year as the group. I removed a given year from the dataset,
fit the model, and then made predictions on the held-out year. The last
hiccup is that the model does not know that if Movie A from Year X wins
Best Picture, it means Movies B – E from Year X cannot. It also does not
know that one of the films from Year X must win. My cheat
around this is to re-scale each year's predicted probabilities so that
they sum to one.
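In code, the backtest is a plain loop over ceremony years: hold a year out, fit on the rest, predict the held-out year, and rescale that year's probabilities to sum to one. A sketch under the same assumptions about the `films` table as above:

```r
# Illustrative sketch: leave-one-year-out backtest with per-year rescaling.
library(xgboost)

feature_cols <- setdiff(names(films), c("film", "year", "won_best_picture"))
backtest <- list()

for (yr in 1993:2022) {
  train <- films[films$year != yr, ]
  test  <- films[films$year == yr, ]

  fit <- xgboost(
    data = as.matrix(train[, feature_cols]), label = train$won_best_picture,
    objective = "binary:logistic", nrounds = 100, verbose = 0
  )

  p <- predict(fit, as.matrix(test[, feature_cols]))
  test$pred <- p / sum(p)   # force each year's probabilities to sum to 1
  backtest[[as.character(yr)]] <- test[, c("film", "year", "won_best_picture", "pred")]
}

backtest <- do.call(rbind, backtest)
```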
The predictions for the last thirty years:
Year | Predicted Winner | Modeled Win Probability | Won Best Picture? | Actual Winner |
---|---|---|---|---|
1993 | schindler’s list | 0.996 | 1 | schindler’s list |
1994 | forrest gump | 0.990 | 1 | forrest gump |
1995 | apollo 13 | 0.987 | 0 | braveheart |
1996 | the english patient | 0.923 | 1 | the english patient |
1997 | titanic | 0.980 | 1 | titanic |
1998 | saving private ryan | 0.938 | 0 | shakespeare in love |
1999 | american beauty | 0.995 | 1 | american beauty |
2000 | gladiator | 0.586 | 1 | gladiator |
2001 | a beautiful mind | 0.554 | 1 | a beautiful mind |
2002 | chicago | 0.963 | 1 | chicago |
2003 | the lord of the rings: the return of the king | 0.986 | 1 | the lord of the rings: the return of the king |
2004 | the aviator | 0.713 | 0 | million dollar baby |
2005 | brokeback mountain | 0.681 | 0 | crash |
2006 | the departed | 0.680 | 1 | the departed |
2007 | no country for old men | 0.997 | 1 | no country for old men |
2008 | slumdog millionaire | 0.886 | 1 | slumdog millionaire |
2009 | the hurt locker | 0.988 | 1 | the hurt locker |
2010 | the king’s speech | 0.730 | 1 | the king’s speech |
2011 | the artist | 0.909 | 1 | the artist |
2012 | argo | 0.984 | 1 | argo |
2013 | 12 years a slave | 0.551 | 1 | 12 years a slave |
2014 | birdman | 0.929 | 1 | birdman |
2015 | spotlight | 0.502 | 1 | spotlight |
2016 | la la land | 0.984 | 0 | moonlight |
2017 | the shape of water | 0.783 | 1 | the shape of water |
2018 | roma | 0.928 | 0 | green book |
2019 | parasite | 0.576 | 1 | parasite |
2020 | nomadland | 0.878 | 1 | nomadland |
2021 | the power of the dog | 0.981 | 0 | coda |
2022 | everything everywhere all at once | 0.959 | 1 | everything everywhere all at once |
Of the last 30 years, 23 predicted winners actually won, while 7
lost—making for an accuracy of about 77%. Not terrible. (And,
paradoxically, many of the misses are predictable ones to those familiar
with Best Picture history.) However, the mean predicted probability of
winning from these 30 cases is about 85%, which means the model is maybe
8 points over-confident. We do see recent years being more prone to
upsets—is that due to a larger pool of nominees? Or something else, like
a change in the Academy’s makeup or voting procedures? At any rate, some
ideas I am going to play with before next year are weighting more
proximate years higher (as rules, voting body, voting trends, etc.,
change over time), finding additional awards, and pulling in other
metadata on films. It might just be, though, that the Academy likes to
swerve away from everyone else sometimes in a way that is not readily
predictable from outside data sources. (Hence the fun of watching and
speculating and modeling in the first place.)
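Those accuracy and calibration numbers fall straight out of the backtest results, something along these lines (using the `backtest` data frame from the sketch above):

```r
# Illustrative sketch: hit rate of the model's yearly pick vs. its mean predicted probability.
library(dplyr)

backtest |>
  group_by(year) |>
  slice_max(pred, n = 1) |>   # the model's pick for each year
  ungroup() |>
  summarise(
    accuracy       = mean(won_best_picture == 1),  # ~0.77 per the table above
    mean_predicted = mean(pred)                    # ~0.85, hence the over-confidence
  )
```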
This Year
I wanted to include a chart showing probabilities over time, but the
story has largely remained the same. The major inflection point was the
Directors Guild of America (DGA) Awards.
Using the data we had on the day the nominees were announced (January
23rd), the predictions were:
Film | Predicted Probability |
---|---|
Killers of the Flower Moon | 0.549 |
The Zone of Interest | 0.160 |
Oppenheimer | 0.147 |
American Fiction | 0.061 |
Barbie | 0.039 |
Poor Things | 0.023 |
The Holdovers | 0.012 |
Past Lives | 0.005 |
Anatomy of a Fall | 0.005 |
Maestro | 0.001 |
I was shocked to see Oppenheimer lagging in third and to see
The Zone of Interest so high. The reason here is that, while
backtesting, I saw that the variable importance for winning the DGA
award for Outstanding Directing – Feature Film was the highest by about
a factor of ten. Since XGBoost handles missing values nicely, we can
rely on the sparsity-aware split finding to squeeze a little more
information from these data. If we know the nominees of an award but not
yet the winner, we can still encode something useful: any film that was
nominated is left as NA, while any film that was not nominated is set to
zero, since it cannot win an award it was not nominated for (there is a
short code sketch of this encoding at the end of this section). That
allows us to partially use this DGA variable (and the other awards where
we knew the nominees on January 23rd, but not the winners). When we do
that, the predicted probabilities as of the announcement of the
Best Picture nominees were:
Film | Predicted Probability |
---|---|
Killers of the Flower Moon | 0.380 |
Poor Things | 0.313 |
Oppenheimer | 0.160 |
The Zone of Interest | 0.116 |
American Fiction | 0.012 |
Barbie | 0.007 |
Past Lives | 0.007 |
Maestro | 0.003 |
Anatomy of a Fall | 0.002 |
The Holdovers | 0.001 |
The Zone of Interest falls in favor of Poor Things,
since the former was not nominated for the DGA award while the latter
was. I was still puzzled, but I knew that the model wouldn’t start being
certain until we knew the DGA award. Those top three films were
nominated for many of the same awards. Then Christopher Nolan won the
DGA award for Oppenheimer, and the film hasn’t been below a 95%
chance of winning Best Picture since.
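As promised above, the NA-versus-zero encoding is just a conditional recode once an award's nominees are known but its winner isn't. A sketch with hypothetical column names:

```r
# Illustrative sketch: nominees announced, winner still unknown.
# Nominated films stay NA (sparsity-aware splits can work with that);
# non-nominated films are set to 0 because they cannot win.
this_year <- data.frame(
  film          = c("oppenheimer", "poor things", "the zone of interest"),
  dga_nominated = c(1, 1, 0)
)
this_year$dga_won <- ifelse(this_year$dga_nominated == 1, NA_real_, 0)
```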
Final Predictions
The probabilities as they stand today, a week before the ceremony,
have Oppenheimer as the presumptive winner at a 97% chance of
winning.
Film | Predicted Probability |
---|---|
Oppenheimer | 0.973 |
Poor Things | 0.010 |
Killers of the Flower Moon | 0.005 |
The Zone of Interest | 0.004 |
Anatomy of a Fall | 0.003 |
American Fiction | 0.002 |
Past Lives | 0.001 |
Barbie | 0.001 |
The Holdovers | 0.001 |
Maestro | 0.000 |
There are a few awards being announced tonight (the Satellite Awards and
the awards from the cinematographers and editors guilds), but they
should not impact the model much. So, we are in for a year of a
predictable winner—or another shocking year where a CODA or a
Moonlight takes home film’s biggest award. (If you’ve read this
far and enjoyed Cillian Murphy in Oppenheimer… go check out his
leading performance in Sunshine,
directed by Danny Boyle and written by Alex Garland.)
Predicting the Oscars with Machine Learning
The original article described the use of machine learning to predict the winner of the Academy Award for Best Picture. The approach combined numerous datasets covering industry awards, critical accolades, and additional factors such as genre or adaptation, and fed them into XGBoost, a model that can handle missing values and data sparsity, common issues when compiling a dataset this broad.
Implications
The ability to predict Oscar outcomes with an accuracy of about 77% is both intriguing and revealing. The model was also observed to be slightly over-confident, indicating a potential area for future work. On the other hand, the consistent accuracy might signify that the model captured some genuine patterns or rules that determine Oscar wins, possibly shedding light on tendencies or biases inherent in the Academy’s decision-making process.
Possible Future Developments
Technology and artificial intelligence are gradually working their way into the film industry, and this predictive model is a clear example of their potential usage. The model could theoretically be extended to predict other outcomes, perhaps even aiding film production companies in designing films to maximize their Oscar potential, though this would require substantially better accuracy and the existence of consistent, predictable patterns in Oscar decision-making.
Actionable Advice
Model Improvement
The first area where action can be taken is model improvement. As mentioned in the original article, there are changes in the rules, voting body, and voting trends over time – weighting more proximate years higher might be a feasible improvement to the model. It may also be worthwhile to consider if any other variables might impact the Academy’s voting behavior and try incorporating them into the current model.
Field Usage
The model could be of interest to film production companies, news agencies, or even betting companies – all of which would profit from accurate predictions about the Oscars. This could create market demand, leading to commercialization opportunities for such a model.
Studying Voting Decisions
If the model continues to predict Oscar outcomes correctly, it might indicate that there are consistent rules behind the voting decisions. Further exploration might reveal tendencies or biases in the Academy’s voting, which would pose interesting questions about the fairness and independence of the voting process.
In Conclusion
Although this is a promising and exciting predictive model, its accuracy and subsequent analysis must be taken with a grain of salt, as the model isn’t perfect. Regardless, this use of machine learning is a fascinating peek into possible applications of AI and data science within the film industry. Keep an eye on future developments in this area – it’s definitely a space worth watching.