[This article was first published on R on Publishable Stuff, and kindly contributed to R-bloggers].



In statistics, there are a number of classic datasets that pop up in examples, tutorials, etc. There’s the infamous iris dataset (just type iris in your nearest R prompt), the Palmer penguins (the modern iris replacement), the titanic dataset(s) (I hope you’re not a guy in 3rd class!), etc. While looking for a dataset to illustrate a simple hierarchical model I stumbled upon another one: the cake dataset in the lme4 package, which is described as containing “data on the breakage angle of chocolate cakes made with three different recipes and baked at six different temperatures [as] presented in Cook (1938)” [1]. For me, this raised a lot of questions: Why measure the breakage angle of chocolate cakes? Why was this data collected? And what were the recipes?

I assumed the answers to my questions would be found in Cook (1938) [1] but, after a fair bit of flustered searching, I realized that this scholarly work, despite its obvious relevance to society, was nowhere to be found online. However, I did find out that a hard copy existed at Iowa State University, accessible only to faculty staff.

The tl;dr: After receiving help from several kind people at Iowa State University, I got a scanned version of Frances E. Cook’s Master’s thesis, the source of the cake dataset. Here it is:

Cook, Frances E. (1938). Chocolate cake: I. Optimum baking temperature. (Master’s thesis, Iowa State College).

It contains it all: the background, the details, and the cake recipes! Here are some more details on the cake dataset, how I got help finding its source, and, finally, the cake recipes.

The cake dataset

The cake dataset can be found in the lme4 package with the following description:

Data on the breakage angle of chocolate cakes made with three different recipes and baked at six different temperatures. This is a split-plot design with the recipes being whole-units and the different temperatures being applied to sub-units (within replicates). The experimental notes suggest that the replicate numbering represents temporal ordering.
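For readers who want the split-plot structure spelled out, here is one standard way to write a model for such a design. This is a textbook split-plot sketch, not necessarily the model the lme4 authors fit. Here $y_{ijk}$ is the breakage angle for recipe $i$ (1 to 3), temperature $j$ (1 to 6) and replicate $k$ (1 to 15); $\alpha_i$ and $\beta_j$ are recipe and temperature effects; and $u_{ik}$ is a random effect shared by the six cakes of the same whole unit (recipe $i$ within replicate $k$):

$$
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + u_{ik} + \varepsilon_{ijk},
\qquad u_{ik} \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)
$$

Because $u_{ik}$ is shared within a whole unit, recipe comparisons are judged against the between-whole-unit variation, while temperature comparisons are made within whole units and benefit from the smaller residual variation $\sigma^2$.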

So for each of the $3 \times 6 = 18$ recipe and temperature combinations, Cook made 15 (!) replicates, resulting in a total of $3 \times 6 \times 15 = 270$ cakes/datapoints. Here are the first few rows:

replicate  recipe  angle  temperature
1          A       42     175
1          A       46     185
1          A       47     195
1          A       39     205
1          A       53     215
1          A       42     225
1          B       39     175
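The rows above follow a fixed order: temperature cycles fastest, then recipe, then replicate. A minimal sketch in standard-library Python enumerates the full design in that order and checks the counts given above, plus the number of split-plot whole units (one per recipe within each replicate). The names `RECIPES`, `TEMPERATURES`, `REPLICATES` and `design_grid` are illustrative only, not part of lme4.

```python
from itertools import product

# Factor levels of the cake experiment, as described in the text.
RECIPES = ["A", "B", "C"]
TEMPERATURES = [175, 185, 195, 205, 215, 225]
REPLICATES = range(1, 16)  # 15 replicates


def design_grid():
    """Return (replicate, recipe, temperature) tuples in the table's row order.

    product() varies its last argument fastest, so temperature cycles within
    recipe, and recipe within replicate, matching the rows shown above.
    """
    return list(product(REPLICATES, RECIPES, TEMPERATURES))


grid = design_grid()
combos = {(recipe, temp) for _, recipe, temp in grid}
# Split-plot whole units: one per recipe within each replicate.
whole_units = {(rep, recipe) for rep, recipe, _ in grid}

print(len(combos))       # 18 recipe x temperature combinations
print(len(grid))         # 270 cakes
print(len(whole_units))  # 45 whole units, each split over 6 temperatures
print(grid[:7])          # same order as the first seven rows above
```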

If you want the full dataset without installing lme4, here’s the cake dataset as a CSV file. Plotting this dataset, we can quickly conclude that the cake breakage angle increases as a function of baking temperature:
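The trend can also be checked numerically from the CSV by averaging the breakage angle at each temperature. The sketch below uses only the Python standard library and assumes the file has columns named `angle` and `temperature`, as in the table above; that is an assumption about the file, so adjust the names if they differ. The embedded sample holds just the seven rows shown above, which is far too little to show the trend; it only demonstrates the computation.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Only the seven rows shown above; substitute the full CSV in practice.
# Assumption: the file uses these column names.
SAMPLE = """replicate,recipe,angle,temperature
1,A,42,175
1,A,46,185
1,A,47,195
1,A,39,205
1,A,53,215
1,A,42,225
1,B,39,175
"""


def mean_angle_by_temperature(text):
    """Return {temperature: mean breakage angle}, ordered by temperature."""
    angles = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        angles[int(row["temperature"])].append(float(row["angle"]))
    return {temp: mean(vals) for temp, vals in sorted(angles.items())}


for temp, avg in mean_angle_by_temperature(SAMPLE).items():
    print(temp, avg)
```

On the real file you would pass its contents instead, e.g. `mean_angle_by_temperature(open("cake.csv").read())`, where `cake.csv` is a hypothetical local filename.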

While the cake dataset is found in lme4, the original source is Cochran and Cox’s book Experimental designs [2]. But what’s the original original source? And why measure the cake breakage angle?

The hunt for the source of the cake dataset

From the lme4 documentation I knew that the cake dataset came from the study by Cook (1938) [1], but no amount of Googling, Binging, or Google Scholaring turned up any trace of a digital copy.
I did find that physical copies existed at Iowa State University and at Cornell, which presented a problem for me, being physically in Sweden.
There was an option to request that the copy be digitized, but it was available to Iowa State faculty only.

Twitter to the rescue, I thought, and fired away a tweet that got a tumbleweed response.
But, in what was final proof for me that Twitter is dying, the same request on Mastodon (come join me!) was an astounding success!

I got many helpful responses, with several pointing me directly at Iowa State staff who might help me out. Like this one from Karl Broman:

A quick e-mail later and I got this very encouraging e-mail from Dan Nettleton at the Department of Statistics, Iowa State:

He recruited the help of Philip M. Dixon, Department of Statistics, and Megan O’Donnell, Research Data Services Lead, and after a couple of days more I got this from Megan:

She (the busy Research Data Services Lead with a looming deadline) is apologizing to me (the random Swede with an eccentric cake thesis digitization request) that it took a few days to get me everything I asked for!? Still, the feeling of shame for having wasted Megan’s time was overshadowed by joy. Attached to the e-mail was, of course, also the full Master’s thesis of Frances E. Cook from 1938: Chocolate cake: I. Optimum baking temperature.

Highlights from Chocolate cake: I. Optimum baking temperature

Reading the thesis, it’s immediately clear that the breakage angle of cakes wasn’t the main focus. Instead, Cook was after some “accurate scientific information” on the optimum baking temperature for chocolate cake.

To figure out which chocolate cake was best, she needed a battery of measures of cake goodness, such as cake tenderness, measured objectively by the cake’s breaking angle. There were also several subjective measures, as found in the “Score Card for Cake” on page 50 of the thesis.

But how was the breaking angle of the cakes measured? In the thesis, we learn that “The tenderness of the cake was tested with the breaking angle apparatus as described by Myers (1936)” [3], but there are no images that show us how it functioned. While I can’t find an online trace of Myers (1936) [3], I do believe I’ve found a description of this very apparatus in Lowe and Nelson (1939) [4]!

From an outsider’s perspective, not being active in the field of culinary research myself, the thesis of Cook comes off as being fantastically serious about cake. I especially adore that it includes photographs of all the cakes:

But, to be fair, in the photos above, you can clearly see how the baking temperature influences the volume of the cake.

The cake recipes

Like in a food blog that has been SEOed to death, here, finally, at the very end, are the cake recipes. I might not be the most experienced cake maker, but this is by far the most complicated chocolate cake recipe I’ve ever seen.

Now, the baking times and temperatures above give you a matrix of options.
Which option to pick is answered a bit further down in Table XV, which displays the total scores for each option.

The winner, when considering the dimensions texture, tenderness, velvetiness and eating quality, was Recipe C with a baking temperature of 225 °C (437 °F) for 24 minutes. I’m no cake scientist, but if a linear model is to be believed when extrapolating outside of the range of the dataset (always a good idea), this cake would be delicious when baked in a pizza oven!
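To make the extrapolation joke concrete, here is a minimal ordinary-least-squares sketch in plain Python. `fit_line` and `predict` are illustrative names, not from the thesis or any package, and nothing here is fitted to Cook’s data; the usage below uses toy numbers. `predict` flags when the requested x lies outside the observed range, which is exactly where a fitted line stops being evidence.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def predict(model, x, observed_xs):
    """Return (prediction, extrapolating), where extrapolating is True
    when x lies outside the range of x values the model was fitted on."""
    intercept, slope = model
    extrapolating = not (min(observed_xs) <= x <= max(observed_xs))
    return intercept + slope * x, extrapolating


xs, ys = [0, 1, 2], [1, 3, 5]  # toy data, not cake data
model = fit_line(xs, ys)
print(model)
print(predict(model, 10, xs))
```

Here `fit_line` recovers an intercept of 1 and a slope of 2, and the prediction at `x = 10` comes back as `(21.0, True)`: a confident-looking number that the data never vouched for.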


  1. Cook, Frances E. (1938). Chocolate cake: I. Optimum baking temperature. (Master’s thesis, Iowa State College).

  2. Cochran, W. G., and Cox, G. M. (1957). Experimental designs, 2nd Ed. New York, John Wiley & Sons.

  3. Myers, Elizabeth. (1936). Plain Cake X. Effect of two temperatures of ingredients at time of combining on fat distribution as determined by microscopical examination. (Unpublished thesis, Iowa State College).

  4. Lowe, Belle, and Nelson, P. Mabel. (1939). The physical and chemical characteristics of lards and other fats in relation to their culinary value. II. Use in plain cake. Iowa Agricultural Research Bulletin 255.


Continue reading: The source of the cake dataset

Uncovering The Significance of The Chocolate Cake Dataset

The Chocolate Cake Dataset, available in the lme4 R package and handy for illustrating hierarchical models, records how the breakage angle of chocolate cakes varies with baking temperature and recipe. Despite its convenient presence in this package, there is more to this dataset than initially meets the eye: it comes from a study conducted by Frances E. Cook in 1938.

The Origin of the Cake Dataset

The foundation of this dataset is a master’s thesis titled “Chocolate cake: I. Optimum baking temperature”, written by Frances E. Cook at Iowa State College (now Iowa State University). No digital copy of the thesis could be found online; once a scan was obtained with the help of Iowa State staff, it showed that the dataset records the baking experiments Cook ran to find the optimum baking temperature for chocolate cake.

Dataset Composition and Methodology

In the study, Cook made cakes using three different recipes and baked them at six different temperatures. She then evaluated various properties of the resulting cakes, such as their breakage angle (an objective measure of tenderness) and subjective scores for qualities like velvetiness. The dataset contains 15 replicates for each of the 18 recipe-temperature combinations, giving 270 data points in total!

Predictions and Conclusions

Plotting this data reveals an interesting finding: the cake breakage angle increases as the baking temperature rises. In addition, in Cook’s total scores across texture, tenderness, velvetiness and eating quality, the winner was Recipe C baked at 225 °C for 24 minutes, making it an ideal choice for chocolate cake lovers!

Long-term Implications

The cake dataset has value beyond its original context, particularly for teaching statistical modeling and machine learning. Its intuitive structure and easy-to-understand variables make it a good ‘real-world’ example for learning hierarchical modeling, regression analysis, and other statistical methods.

The Future

Looking forward, classic datasets like the cake dataset will likely remain staples of data analysis, statistics, and machine learning. These age-old datasets are important tools for teaching data skills and can serve as test cases when developing new analytics software.

Actionable Advice

For academics, students, or organizations keen on using such datasets or studying them further, here is some advice:

  1. Reach out to academic institutions: As this exploration of the cake dataset’s origins shows, many valuable resources are tucked away in the libraries of academic institutions. Don’t hesitate to ask their staff for help with your search.
  2. Open-source software: Check open-source platforms for classic datasets. R and its packages, for example, ship with a wide range of datasets for different purposes.
  3. Learn from these datasets: Use these practical, easily understood datasets to learn and teach statistical concepts.
