SARIMA decomposition of body weight log

The dataset I literally created myself

Andrey Shornikov
4 min read · Oct 20, 2021

Like most people in my generation, I try to stay active and keep in shape. Since 2015 I have kept records of my exercise and body weight in a Google spreadsheet. While updating the data I noticed an unsurprising pattern: I gain a bit of weight in winter and lose some over the summer. Of course my measurements are imperfect, noisy and the opposite of what one would call clean data, unlike the Titanic dataset. But unlike the Titanic dataset they mean something to me, more than the sepal length of Iris versicolor, if you know what I mean. So I decided to spend a few minutes checking whether my feeling is correct and my body weight really does behave periodically.

Image from Unsplash by i yunmai

I copied the date and body weight columns from the Google spreadsheet into a CSV file and started my analysis. First I need to filter the data to keep only the dates on which I measured my body weight (on some occasions I logged my activity without weight data). The steps are: import, clean, and convert the text dates to date objects.
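The import, clean and convert steps can be sketched roughly as follows. This is a minimal illustration rather than the original notebook code: the column names `date` and `weight` and the sample values are assumptions, and an in-memory string stands in for the real CSV file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the exported CSV; column names and values are made up.
raw_csv = io.StringIO(
    "date,weight\n"
    "2015-01-03,80.5\n"
    "2015-01-05,\n"  # activity logged that day, but no weight reading
    "2015-01-09,80.1\n"
    "2015-01-16,79.8\n"
)

df = pd.read_csv(raw_csv)

# Keep only the rows that actually contain a weight measurement.
df = df.dropna(subset=["weight"])

# Convert the text dates into proper datetime objects and sort chronologically.
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values("date").reset_index(drop=True)

print(df)
```

With a real file, `raw_csv` would simply be replaced by the path to the exported CSV.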

After dropping incomplete rows I was left with 529 rows of data. Over 6 years that is about 88 points per year, or one point roughly every 4 days. Since humans cannot significantly change their weight in a matter of 4 days, these look like very decent statistics: the measurements are frequent compared to the characteristic time scale of the process. Unfortunately, like most amateur fitness practitioners, I am not always consistent, so my measurements are unevenly spaced (side note: during the measurement period I lived in 5 different countries, switched careers and had other minor things affecting my fitness consistency). The vast majority of statistical models you can find assume evenly spaced data. The analysis of unevenly sampled data is an interesting topic with a science of its own, but this time we will not dive into it. One of the basic ways of working with unevenly spaced data is to interpolate it onto an evenly spaced grid. Since my data is relatively simple, I will use basic pandas functionality to do the resampling. For that I need a pandas Series with a datetime index.
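One plausible way to do this resampling in pandas is sketched below. The daily grid and the time-based linear interpolation are assumptions made for illustration, as are the sample values; the original analysis may have used different settings.

```python
import pandas as pd

# Illustrative, unevenly spaced readings (made-up values).
dates = pd.to_datetime(["2015-01-03", "2015-01-09", "2015-01-16", "2015-01-18"])
weights = pd.Series([80.5, 80.1, 79.8, 80.0], index=dates, name="weight")

# Put the series on a regular daily grid; days without a reading become NaN.
daily = weights.resample("D").mean()

# Fill the gaps by linear interpolation in time between neighbouring readings.
daily = daily.interpolate(method="time")

print(daily)
```

Linear interpolation is the simplest choice here. It adds no information between readings, but it gives downstream tools the evenly spaced series they expect.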

Raw and resampled data, image by the author

Now we have data in a format suitable for processing. Our first question is whether the data has a periodic pattern in it.

We check this by looking at the autocorrelation function of the data. I use the autocorrelation plot from pandas.

The autocorrelation function shows how the data at one point in time is correlated with the data at a point shifted by a certain lag. If the data is random, there is no correlation. For example, hash functions are specifically designed so that the results for inputs that are close to each other are vastly different. For data with a periodic pattern the situation is different. Let's take the outside temperature as an example. At first the correlation is high: if today is a cold day, tomorrow will likely be cold as well. As the lag between days increases, the correlation decreases: if last week was cold, today will not necessarily be cold. But when the lag gets close to the weather period, we are likely to see the correlation increase again. If it was cold a year ago, chances are it was winter, and winters are cold. The correlation will still be lower than at the smallest lags, because winters are always winters but some winters are colder than others. So we expect an initial drop in correlation and then several lower peaks at lags close to 1, 2 and 3 periods. Here is what we see. The behavior is definitely periodic, at least for the earlier years.

Autocorrelation function for body weight data, image by the author
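The temperature intuition can be checked on synthetic data. The sketch below is not the body weight analysis itself: it builds a made-up "temperature" series with a yearly cycle and a purely random series, then compares their autocorrelation at a few lags using `Series.autocorr`. The amplitudes, noise level and seeds are arbitrary assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
days = np.arange(4 * 365)

# Synthetic "temperature": yearly cycle plus noise (made-up, not the weight data).
seasonal = pd.Series(
    10 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 2, days.size)
)

# Purely random series for comparison.
noise = pd.Series(rng.normal(0, 2, days.size))

# Correlation of each series with a copy of itself shifted by `lag` days.
for lag in (1, 182, 365):
    print(lag, round(seasonal.autocorr(lag), 2), round(noise.autocorr(lag), 2))

# pandas.plotting.autocorrelation_plot(seasonal) draws an analogous curve
# over all lags at once (it requires matplotlib).
```

The periodic series should show strong positive correlation at a lag of one year and strong negative correlation at half a year, while the random series stays close to zero at every lag.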

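Now we can try to decompose the resampled data into a long-term trend, a periodic seasonal variation and a residual deviation. This is the classical additive decomposition of a time series, related in spirit to the seasonal structure that SARIMA models describe, although as far as I know the statsmodels decomposition itself is based on moving averages rather than on a fitted SARIMA model. I use the decomposition from statsmodels, assuming an additive model and a frequency of 365 (days, i.e. a one-year period for the seasonal component).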
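To make the three components concrete, here is a hand-rolled sketch of a classical additive decomposition in plain pandas, run on a made-up series. It is not the statsmodels implementation; in recent statsmodels versions the equivalent library call should be `seasonal_decompose(series, model="additive", period=365)`, where older versions named the last argument `freq`. All values and names below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

period = 365
idx = pd.date_range("2015-01-01", periods=4 * period, freq="D")
t = np.arange(idx.size)
rng = np.random.default_rng(1)

# Synthetic series (made-up): slow downward trend + yearly cycle + noise.
values = (
    80 - 0.001 * t
    + np.cos(2 * np.pi * t / period)
    + rng.normal(0, 0.5, t.size)
)
series = pd.Series(values, index=idx)

# Trend: centered moving average over one full period (NaN at both edges).
trend = series.rolling(window=period, center=True).mean()

# Seasonal: average detrended value at each position within the period,
# shifted so that the seasonal profile has zero mean.
detrended = series - trend
phase = t % period
seasonal_profile = detrended.groupby(phase).mean()
seasonal_profile -= seasonal_profile.mean()
seasonal = pd.Series(seasonal_profile.to_numpy()[phase], index=idx)

# Residual: whatever trend and seasonal components do not explain.
resid = series - trend - seasonal

print(trend.dropna().iloc[[0, -1]])
print(seasonal.min(), seasonal.max())
```

In an additive model the three components sum back to the original series wherever the trend is defined, which makes the amplitude of each component directly comparable in the units of the data.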

Here is what my decomposition looks like.

Decomposition of body weight data into trend, seasonal and residual components using SARIMA model, image by the author.

There appears to be a slight downward trend, a seasonal oscillation of ±1 kg, and a messy residual with an amplitude of ±2.5 kg. Does all of this match my expectations?

An overall slight downward trend is expected: somewhere along the way I shifted from mostly weightlifting to mostly running, so a slight drop in weight due to aerobic activity is very likely. I am a bit surprised by the ±1 kg seasonal component, as I expected it to be somewhat larger.

I am not surprised by the residual, as it depends largely on my hydration and on which specific scale I use. In a given week I typically use two of them, and I suspect they don't fully agree.
