VO2max estimation is a feature we have provided in HRV4Training for 7-8 years now, and it is also available on a few wearables and smartwatches (e.g. Garmin, Coros, Apple Watch).
In this blog post, I cover the basics of how these models normally work, so that we can better understand their limitations and how we can use them to track changes in fitness over time.
I've been working on this topic since my PhD, using wearable sensor data to estimate VO2max from submaximal HR data, for example, HR while walking at a certain speed (see here for two recent publications). My work mainly targeted the general population, hence submaximal HR was captured during low-intensity activities such as walking in free-living. Machine learning and pattern recognition models were then used to recognize these activities and map physiological data to fitness level, or VO2max. As VO2max is a good marker of cardiovascular health and a strong predictor of mortality risk, I believe there is much to gain in bringing these metrics to the general population, unobtrusively and without requiring specific laboratory tests.
By learning more about how cardiorespiratory fitness changes in response to exercise, we can move towards quantifying an important health marker, instead of simply quantifying behavior (e.g. steps or other commonly used activity metrics). Basically, we would close the loop. Monitoring how changes in activity behavior influence cardiorespiratory fitness could help individuals take up a more active lifestyle, or simply maintain one.
This being said, as more products on the market start to provide VO2max estimation, it's important to understand three things: how these models work, what the advantages of models using certain predictors are, and what their general limitations are. I often receive questions about the accuracy of our models. People also ask how our models compare to other models that, for example, use resting heart rate or HRV data to estimate VO2max.
Thus, the aim of this post is to explain how these models work and what the impact of different predictors is. Predictors are the parameters we use to estimate VO2max, for example, BMI or resting heart rate. I will also cover whether and how you can use these models to track changes in cardiorespiratory fitness over time.
What is VO2max?
Cardiorespiratory fitness is defined as the ability of the circulatory and respiratory systems to supply oxygen during sustained physical activity. Cardiorespiratory fitness is not only an objective measure of habitual physical activity but also a useful diagnostic and prognostic health indicator for patients in clinical settings, as well as for healthy individuals [1]. While cardiorespiratory fitness is considered among the most important determinants of health and well-being, in this post, our interest is purely related to performance in sports. So everything that follows should be considered in this context.
Current practice for cardiorespiratory fitness assessment is the direct measurement of oxygen volume during maximal exercise, or VO2max. VO2max is the gold standard and is regarded as the most precise method for determining cardiorespiratory fitness [2].
What is a VO2max estimate good for?
I will leave it to others to discuss the limitations of VO2max as a measure of human performance across individuals (and maybe even within individuals). See Magness, Noakes, and others, who do a great job explaining the complexities of oxygen consumption and running efficiency. They also explain how the scientific community has given a bit too much credit to this variable in past decades.
Note that the limitations of VO2max measurements are not necessarily the limitations of VO2max estimates. When we estimate VO2max, we are not looking at oxygen uptake, but at changes in heart rate at a given submaximal workload. What I would like to do here is highlight how the estimate can be very informative at both the population and the individual level. This is because the estimate relies on contextualized physiological data under submaximal effort, which is the real parameter of interest for us.
The idea is that tracking VO2max over time, as estimated from submaximal heart rate, can provide a proxy for performance/fitness. It can therefore help you understand whether you are getting in better shape and can potentially race faster. All of this uses available training data only, without putting additional stress on the body with specific tests.
For runners, cyclists or triathletes, for example, training improves aerobic capacity and lowers heart rate at a given intensity. As this happens, VO2max estimates track well with improvements in fitness and performance as determined in racing events. This is true for athletes of any level. You can easily find logs of Ironman champions going through a base phase that gradually lowers their heart rate at easy intensities, as well as recreational athletes improving their fitness in a similar way.
So if submaximal heart rate (e.g. your heart rate while running at a certain pace) is the real variable of interest, why do we use it to estimate VO2max instead of just providing it?
The reason is to make it easier to interpret. Outside of lab settings, we can't get everyone to run at the same pace as you would in the lab. Using submaximal heart rate therefore means, for example, creating a feature computed as pace / heart rate. This number represents fitness but is not 'meaningful to a human'. We introduced the pace to heart rate ratio in a recent publication to contextualize heart rate by effort (or workload). That work showed the ratio is a better predictor of VO2max than anthropometrics data and resting physiological data [9].
Once we understand why we estimate VO2max, what the estimate is based on, and what it can be used for (e.g. tracking progress over time), it can be a nice feature to look at from time to time to track changes in fitness.
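As a minimal sketch of the idea, assume speed in km/h, which is the inverse of pace, so that higher values mean fitter. Also assume per-sample HR from a single workout. Under those assumptions, a workload-contextualized HR feature could look like the code below. The helper name `speed_hr_ratio` is hypothetical, and this is not the exact feature used in HRV4Training.

```python
from statistics import mean


def speed_hr_ratio(speeds_kmh, heart_rates_bpm):
    """Mean speed (km/h) divided by mean heart rate (bpm) for one workout.

    Hypothetical helper for illustration: it contextualizes heart rate by
    workload, so that workouts run at different paces become comparable.
    Higher values mean more speed per heart beat.
    """
    if not speeds_kmh or len(speeds_kmh) != len(heart_rates_bpm):
        raise ValueError("need non-empty speed and HR series of equal length")
    return mean(speeds_kmh) / mean(heart_rates_bpm)


# Two made-up runners at the same pace: the one with lower HR gets a higher ratio.
runner_a = speed_hr_ratio([12.0, 12.0, 12.0], [150, 152, 148])
runner_b = speed_hr_ratio([12.0, 12.0, 12.0], [170, 172, 168])
```

Averaging over the whole workout is the simplest possible choice. In practice, one might restrict the computation to steady segments.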
How do you build a VO2max estimation model?
To build a model able to predict VO2max from certain parameters, you need to collect a dataset, including the following:
Reference VO2max data
Parameters that you'd like to eventually use to estimate VO2max on other people. For example, in HRV4Training we use heart rate at rest, weight, height, age and submaximal heart rate (more about this later).
These parameters, also called predictors, must be obtainable in unsupervised free-living settings with minimal burden on the user. We do not want anyone to have to do specific lab tests or follow protocols, even in free-living. This is why we came up with the heart rate to pace ratio. With it, each unsupervised free-living GPS workout (collected, for example, via Strava) can be used to estimate VO2max, regardless of an athlete's ability or preferred running pace (or power, for cyclists).
The science behind VO2max estimation
As mentioned above, current practice for cardiorespiratory fitness assessment is the direct measurement of oxygen volume during maximal exercise or VO2max.
However, there are a series of practical limitations to VO2max testing, for example, the need for specialized personnel, expensive medical equipment, high motivational demands of the subject, health risks for subjects in non-optimal health conditions (which limits applicability), and so on [3]. Even when testing conditions are not a problem, performing a maximal test until exhaustion just to monitor your fitness level might interfere with your current training program.
For these reasons, scientists have been working on submaximal tests, i.e. tests that do not require maximal effort. Submaximal tests were developed more than 60 years ago to estimate VO2max during specific protocols while monitoring HR at predefined workloads [4]. Basically, these tests rely on the inverse relation between fitness and heart rate (HR). At a given workload, higher HR is typically associated with lower fitness, and vice versa. Contextualized HR, e.g. HR during specific activities, was a good step forward in practical applicability compared to maximal tests. However, some limitations still apply: the test must be repeated every time fitness needs to be assessed, and a predefined protocol is still required.
Ideally, we would like to keep track of VO2max or cardiorespiratory fitness without performing a specific test. As technology has improved, we now have plenty of sensors able to acquire accelerometer, GPS, and HR data in free-living. During my PhD I developed several machine learning models that do just that for the general population, without even including exercise data. These models basically used HR while walking at different intensities/locations as a predictor of fitness (see [5, 6, 7] for details).
My results, as well as attempts by others to estimate VO2max from rest data such as HR or HRV, clearly show that rest data alone is insufficient to estimate VO2max with good accuracy [8]. This is the reason why we haven't introduced these models before. It is also why the feature is enabled only for users who use Strava and an HR monitor during training.
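The principle behind many submaximal protocols can be illustrated with a generic linear extrapolation. You fit the HR-VO2 relation over a few submaximal stages and extend it to an estimated maximal HR. This is a sketch, not the Astrand-Ryhming nomogram from [4]. The stage values are made up, and the age-predicted maximum (220 - age) is a rough population formula assumed here for illustration.

```python
def extrapolate_vo2max(stage_hr, stage_vo2, age):
    """Estimate VO2max by extrapolating a least-squares HR-VO2 line to HRmax.

    stage_hr:  heart rates (bpm) measured at two or more submaximal workloads
    stage_vo2: oxygen uptake (ml/kg/min) for those workloads, e.g. estimated
               from the known workload of each stage (assumption)
    age:       years, used for the rough HRmax = 220 - age approximation
    """
    n = len(stage_hr)
    mean_hr = sum(stage_hr) / n
    mean_vo2 = sum(stage_vo2) / n
    sxy = sum((h - mean_hr) * (v - mean_vo2) for h, v in zip(stage_hr, stage_vo2))
    sxx = sum((h - mean_hr) ** 2 for h in stage_hr)
    slope = sxy / sxx                     # ml/kg/min gained per extra beat
    intercept = mean_vo2 - slope * mean_hr
    hr_max = 220 - age                    # rough population approximation
    return intercept + slope * hr_max


# Same workloads, same age: the person with lower HR at each stage is fitter.
less_fit = extrapolate_vo2max([120, 150], [20.0, 32.0], age=30)
fitter = extrapolate_vo2max([110, 140], [20.0, 32.0], age=30)
```

With these made-up numbers, the less fit person extrapolates to about 48 ml/kg/min and the fitter one to about 52. This shows the inverse relation between HR at a given workload and fitness.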
Let’s now look at the impact of different subsets of predictors to estimate VO2max:
Anthropometrics data only (or non-exercise models), including BMI, age and sex (some websites or "online calculators" use this approach).
Resting physiological data, including anthropometrics and resting HR and HRV (possibly used by Polar, or at least it was some time ago).
Sub-maximal HR data, such as HR while running at a certain pace, as used in HRV4Training and probably by Garmin and Coros.
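To make these three groupings concrete, here is a sketch of how a single dataset row and the corresponding predictor subsets could be represented. The field names, the sex encoding and the `PREDICTOR_SETS` mapping are hypothetical, chosen to mirror the list above. They are not HRV4Training's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """One row of a hypothetical VO2max dataset: predictors plus lab reference."""
    age: float
    sex: int                # assumed encoding: 1 = male, 0 = female
    bmi: float
    resting_hr: float       # bpm
    resting_rmssd: float    # ms, resting HRV
    running_hr_8kmh: float  # bpm, submaximal HR at a fixed running speed
    vo2max_ref: float       # ml/kg/min, measured in the lab


# Illustrative predictor groupings mirroring the three approaches above.
PREDICTOR_SETS = {
    "anthropometrics": ("bmi", "age", "sex"),
    "resting": ("bmi", "age", "sex", "resting_hr", "resting_rmssd"),
    "submaximal": ("bmi", "age", "sex", "running_hr_8kmh"),
}


def feature_vector(p: Participant, predictor_set: str) -> list[float]:
    """Return one participant's predictor values for the chosen set, in order."""
    return [float(getattr(p, name)) for name in PREDICTOR_SETS[predictor_set]]
```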
Anthropometrics data only / non-exercise models
Models estimating VO2max using anthropometrics data only have been proposed in research for many years (see Jackson et al., published in 1990, or the more recent work by Baynard et al.). The goal is to get a decent estimate without having to perform any measurements or tests. Some of the most recent models do include resting heart rate, as anyone can easily measure it. Some models also include a person's activity level, quantified in different ways (e.g. a number indicating how active you are). For this comparison, we will look at anthropometrics data only, as resting physiological data is covered in the next section.
The dataset used here was collected during my PhD and includes about 50 individuals. Baynard et al. report R2 = 0.22 when including only BMI as a predictor, and R2 = 0.57 when including BMI, age, and sex. On our dataset, replicating the authors' work, we get R2 = 0.18 for BMI only and R2 = 0.54 for BMI, age, and sex. R2, like any other metric, is highly dependent on the dataset, for example on how much variability there is in both the predictors and the predicted variable. Considering this, these numbers are extremely close and give a good starting point for our modeling.
Why these variables? VO2max is known to decrease with age, and it is lower in women and in individuals with higher body fat. As the aim of these models is to be as simple as possible, BMI is typically used to capture body type / fat. It has obvious limitations, though, as BMI cannot distinguish muscle mass from body fat.
Below you can see the reference and predicted VO2max when building subject-independent models using anthropometrics data only as predictors. This is how we cross-validate models to make sure they work outside of our sample. Basically, we use part of the data to train a model and part of the data for validation. The model has never seen the validation data, so we get realistic estimates of how it would perform when deployed to new users for whom we have never collected any reference data.
On the left side, we have the linear relation between reference and predicted VO2max. On the right side, the Bland-Altman plot shows the residual errors for this model.
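To make the R2 comparison concrete, here is a minimal sketch that fits ordinary least squares with NumPy on synthetic data. The generating coefficients are made up and are not taken from Baynard et al. or from our dataset. The sketch compares a BMI-only model with a BMI + age + sex model. Note that in-sample, adding predictors to a least squares model can never lower R2. This is one reason to rely on subject-independent validation, as described below.

```python
import numpy as np


def fit_r2(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, in-sample R2)."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot


# Synthetic cohort of 50 people (made-up generating model, for illustration only).
rng = np.random.default_rng(0)
n = 50
age = rng.uniform(20, 60, n)
sex = rng.integers(0, 2, n)          # 1 = male, 0 = female
bmi = rng.uniform(19, 32, n)
vo2max = 60 - 0.3 * age + 6 * sex - 0.8 * bmi + rng.normal(0, 4, n)

_, r2_bmi = fit_r2(bmi, vo2max)
_, r2_full = fit_r2(np.column_stack([bmi, age, sex]), vo2max)
```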
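Both ideas behind these plots can be sketched in a few lines, assuming a plain linear model. The first is subject-independent (leave-one-subject-out) prediction. The second is Bland-Altman statistics: the bias, plus 95% limits of agreement at bias ± 1.96 standard deviations of the differences. The function names and the toy data are hypothetical. They are not the code used to produce the figures.

```python
import numpy as np


def loso_predictions(X, y, subject_ids):
    """Leave-one-subject-out predictions with ordinary least squares.

    For each subject, the model is fit on all other subjects and used to
    predict only the held-out subject's rows, which the model never saw.
    """
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    subject_ids = np.asarray(subject_ids)
    predictions = np.empty_like(y)
    for sid in np.unique(subject_ids):
        held_out = subject_ids == sid
        coef, *_ = np.linalg.lstsq(design[~held_out], y[~held_out], rcond=None)
        predictions[held_out] = design[held_out] @ coef
    return predictions


def bland_altman(reference, predicted):
    """Per-pair means and differences, bias, and 95% limits of agreement."""
    reference = np.asarray(reference, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    differences = predicted - reference
    pair_means = (predicted + reference) / 2
    bias = float(differences.mean())
    sd = float(differences.std(ddof=1))
    return pair_means, differences, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Made-up data: one row per subject, one predictor, and one unusual reference value.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
reference = 2 + 3 * x
reference[0] += 10
predicted = loso_predictions(x, reference, subject_ids=[0, 1, 2, 3, 4])
pair_means, differences, bias, (lower, upper) = bland_altman(reference, predicted)
```

Subject 0 is predicted only from the other four subjects, so its unusual reference value cannot leak into its own prediction.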
Resting physiological data (heart rate and HRV)
Things get more interesting when we start including physiological data. What is the rationale behind including resting heart rate? Physiologically speaking, with a more active lifestyle or more specifically with aerobic training, we have changes in the heart (muscle), resulting in increased stroke volume and reduced heart rate. As heart rate reduces with increased aerobic training, and VO2max / fitness also increases, it makes a lot of sense to use resting heart rate to predict VO2max.
Now the more interesting question is: how much better can we estimate VO2max when including heart rate? If we go back to our previous dataset and include resting heart rate together with BMI, age, and sex, we obtain R2 = 0.59. This is a small but significant improvement compared to the previous R2 = 0.54. The standard error of the estimate goes from 4.8 to 4.6 ml/kg/min. Models combining non-exercise parameters with physiological data have been validated in the past and sometimes showed poor results (see for example Esco et al. [8]). However, they do perform better than models including only anthropometrics data.
What about HRV? Adding HRV brings no improvement (we get the same R2 and standard error we had before including resting HR). In fact, adding HRV and removing resting heart rate also brings no improvement with respect to the original model using only anthropometrics data. I've been arguing this for some time: HRV reflects training load and the impact of different stressors very well, but not necessarily fitness or aerobic capacity. It is true that some studies showed improvements in baseline HRV for individuals starting an aerobic training plan. However, these findings often failed to replicate. Also, almost everything changes when inactive people become active, while things get more challenging if we take a group of already active people. Additionally, day-to-day HRV scores vary a lot, easily by 50% of your baseline or more. For this reason, I am personally a bit skeptical of any HRV data reported as a single snapshot before/after a study. In my opinion, a baseline of at least a week should be collected pre/post study. This gives more confidence about an individual's HRV level without being too sensitive to acute variations. Otherwise, we might just be interpreting noise.
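The standard error of the estimate (SEE) summarizes the typical size of the residuals in ml/kg/min. The sketch below assumes a common definition: the residual sum of squares divided by n - k - 1 degrees of freedom, for k predictors plus an intercept. The function name and the numbers are made up for illustration.

```python
import numpy as np


def standard_error_of_estimate(y_true, y_pred, n_predictors):
    """SEE = sqrt(SSE / (n - k - 1)), with k predictors plus an intercept."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    dof = len(y_true) - n_predictors - 1
    if dof <= 0:
        raise ValueError("not enough observations for this many predictors")
    return float(np.sqrt((residuals ** 2).sum() / dof))


# Made-up reference and predicted VO2max values (ml/kg/min), one predictor.
see = standard_error_of_estimate([40, 45, 50, 55, 60], [41, 44, 51, 54, 61], n_predictors=1)
```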
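The snapshot-versus-baseline point can be illustrated with made-up morning rMSSD values (in ms) for the week before and the week after a hypothetical study. The `baseline` helper simply averages the last seven days. This is a sketch of the idea, not a prescribed analysis.

```python
from statistics import mean


def baseline(daily_values, days=7):
    """Average of the last `days` daily measurements (e.g. morning rMSSD in ms)."""
    if len(daily_values) < days:
        raise ValueError(f"need at least {days} days of data")
    return mean(daily_values[-days:])


# Made-up rMSSD values (ms) for the week before and the week after a study.
pre_week = [60, 75, 71, 68, 80, 58, 52]
post_week = [73, 50, 70, 66, 62, 64, 69]

# Single snapshot: last morning before vs first morning after the study.
snapshot_change = post_week[0] - pre_week[-1]
# Weekly baselines: average of each full week.
baseline_change = baseline(post_week) - baseline(pre_week)
```

Here the single snapshot suggests an increase of 21 ms. The weekly baselines differ by less than 2 ms, so the apparent change is mostly day-to-day variability.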
Below you can see results for subject-independent models using as predictors anthropometrics data and resting HR:
Sub-maximal heart rate data (e.g. heart rate while running)
The rationale behind including sub-maximal HR data is the same as for resting HR data. As we train aerobically and get fitter, sub-maximal HR decreases, meaning that we can, for example, run at the same speed with a lower heart rate. We prefer sub-maximal HR over resting HR because individual differences due to fitness are amplified during exercise. Two individuals of quite different fitness levels might have very similar resting HR, say 50 and 55 bpm. However, during the same intense exercise, say running at 12 km/h, the HR of the less fit individual will be much higher (all other things being equal, i.e. similar body size, age, etc.). This is the principle we exploit with our VO2max estimation in HRV4Training. We capture workout data from Strava and can analyze HR at different speeds for a broad set of individuals. Intuitively, those who can run faster while keeping their HR lower are most likely the fittest.
Let's include sub-maximal HR in our models. For running HR, even at a speed as low as 8 km/h (barely running, biomechanically speaking), we get R2 = 0.67 and a standard error of 4.1 ml/kg/min. Much better than before.
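Free-living workouts rarely contain a segment at exactly 8 km/h. One simple way to obtain HR at a fixed reference speed is therefore to fit a straight line of HR against speed over a workout's samples and evaluate it at that speed. The sketch below makes that assumption and is not necessarily how HRV4Training derives it. It works best when the reference speed lies within the observed range, because extrapolating far outside it is unreliable.

```python
import numpy as np


def hr_at_reference_speed(speeds_kmh, heart_rates_bpm, ref_speed_kmh=8.0):
    """Fit HR = intercept + slope * speed over one workout, evaluate at ref speed."""
    speeds = np.asarray(speeds_kmh, dtype=float)
    hr = np.asarray(heart_rates_bpm, dtype=float)
    if speeds.max() - speeds.min() == 0:
        raise ValueError("need samples at more than one speed")
    slope, intercept = np.polyfit(speeds, hr, 1)  # highest degree first
    return float(intercept + slope * ref_speed_kmh)


# Made-up samples from one run covering 7-11 km/h.
hr_8 = hr_at_reference_speed([7.0, 9.0, 11.0], [128, 140, 152])
```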
Here are the results for the subject-independent analysis, similar to what we've seen for the other two models:
Highlighting the importance of sub-maximal HR data
After reading the above and looking at the plots, you might be asking yourself whether it is really worth including all the additional physiological data and context for relatively small improvements. The correlation between reference and estimated VO2max for subject-independent models goes from 0.72 to 0.79. This change is good enough to publish a paper, but is it really useful in your individual case? Still, much of the variance is not explained by these models (more on this later in the limitations section).
Here I'd like to highlight that including sub-maximal HR is extremely important. It is actually the only way to discriminate between similar individuals. This is probably your case if you are an HRV4Training user or simply into training, which places you in the more homogeneous and fit part of the population.
It's always easy to show a high correlation or R2 on a dataset with much variability. Say we take thousands of individuals covering a very broad range of BMI and VO2max, from sedentary, obese, unfit individuals to Ironman participants. Obviously, BMI will be a great predictor of the differences in fitness between these individuals.
But what if we look at similar individuals? People can have similar body sizes (and age), and yet be extremely different from a cardiorespiratory fitness point of view. Without physiological data, we cannot tell the difference. To highlight this point, I isolated a subset of participants with similar characteristics: males aged 21-25 years with BMI between 22 and 24 kg/m^2. This is a rather homogeneous sample in terms of our predictors. What happens when we try to predict their VO2max using anthropometrics data only?
As highlighted in the figure above, without physiological data we cannot discriminate between individuals with different fitness levels but similar anthropometrics data. All individuals are predicted at more or less the same VO2max, as they are similar according to the model. We need physiological data to discriminate between them. Because of the relationships explained above, sub-maximal HR reflects their cardiorespiratory fitness level much better. The correlation between reference and predicted VO2max for this subset of similar individuals is only 0.28, much lower than for the entire sample.
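The subset selection described above can be sketched as a simple filter. The `Person` record and all values below are made up. They illustrate how an anthropometrics-only model can give a narrow range of predictions while the reference VO2max values spread widely.

```python
from dataclasses import dataclass


@dataclass
class Person:
    age: int
    bmi: float
    male: bool
    vo2max_ref: float    # lab-measured, ml/kg/min
    vo2max_pred: float   # anthropometrics-only model output, ml/kg/min


def homogeneous_subset(people, ages=(21, 25), bmis=(22.0, 24.0)):
    """Males within the given age and BMI ranges (both inclusive)."""
    return [
        p for p in people
        if p.male and ages[0] <= p.age <= ages[1] and bmis[0] <= p.bmi <= bmis[1]
    ]


def spread(values):
    """Range (max - min) of a sequence of values."""
    return max(values) - min(values)


people = [
    Person(23, 23.1, True, 48.0, 50.5),
    Person(22, 22.5, True, 58.0, 50.9),
    Person(25, 23.8, True, 41.0, 50.2),
    Person(30, 23.0, True, 52.0, 48.0),   # excluded: too old
    Person(23, 23.0, False, 45.0, 44.0),  # excluded: not male
]
subset = homogeneous_subset(people)
ref_spread = spread([p.vo2max_ref for p in subset])    # 17 ml/kg/min
pred_spread = spread([p.vo2max_pred for p in subset])  # under 1 ml/kg/min
```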
Let's now look at the same subset of individuals but for our latest model, the one used in HRV4Training, which combines anthropometrics data and HR while running:
We can now see how the same group is predicted much more accurately, and we can clearly discriminate between the different fitness levels. One individual is clearly less fit despite the low age and BMI.
This is the most accurate model we can develop using anthropometrics data and physiological data during exercise, and a very similar model is currently implemented in HRV4Training.
Getting practical
In this section, I cover in more detail what you need to get a VO2max estimate in HRV4Training and the relationship between the estimate and running performance.
What do you need to be able to use the VO2max estimation in HRV4Training?
For reasons that should be clear after reading the modeling part above, a Strava account and training heart rate (HR) data are required for this feature to work. Additionally, VO2max estimation works only for runners, and for cyclists using a power meter. In cycling, external load is only meaningful in terms of power, so the heart rate to pace ratio is replaced by the heart rate to power ratio. To recap:
Link HRV4Training to Strava from Settings
Go running using a HR monitor, at least twice per week, or go cycling using a HR monitor and a power meter, also at least twice per week.
Once we have 12 Strava workouts including HR data (and power for cyclists) from the past 6 weeks, HRV4Training will be able to estimate your VO2max. You can check your VO2max under Insights in the app.
Here is my current estimate, next to a lab test I did a few weeks ago:
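The requirements above can be expressed as a small eligibility check. This is a hypothetical sketch, not the app's code. It assumes that runs and rides count toward the same total of 12, and that a ride counts only when it includes power.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Workout:
    day: date
    sport: str              # "run" or "ride"
    has_hr: bool
    has_power: bool = False


def vo2max_ready(workouts, today, min_workouts=12, weeks=6):
    """True if enough usable workouts fall within the last `weeks` weeks."""
    window_start = today - timedelta(weeks=weeks)
    usable = [
        w for w in workouts
        if window_start <= w.day <= today
        and w.has_hr
        and (w.sport == "run" or (w.sport == "ride" and w.has_power))
    ]
    return len(usable) >= min_workouts


today = date(2024, 6, 30)
runs = [Workout(today - timedelta(days=3 * i), "run", True) for i in range(12)]
rides_without_power = [Workout(today - timedelta(days=i), "ride", True) for i in range(12)]
```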
Relationship between estimated VO2max and running performance
In one of our publications, titled "Relation Between Estimated Cardiorespiratory Fitness and Running Performance in Free-Living: an Analysis of HRV4Training Data", we analyzed the relation between our estimated VO2max and running performance. The paper was accepted for publication at the International Conference on Biomedical and Health Informatics.
In particular, the analysis in the paper shows that estimated VO2max in the app is highly correlated with real-life running performance for distances between 10 km and the full marathon. At the population level, it can therefore be used as an effective proxy for running performance without the need for laboratory tests.
In this work, we first built laboratory-based VO2max estimation models, including reference VO2max data collected using indirect calorimetry, and then deployed our models in the HRV4Training app. More than 500 users linked the app to Strava and used the VO2max estimation models while running distances between 10 km and the full marathon over a period of 1 to 8 months. This created a unique dataset on which to investigate the relation between estimated VO2max and running performance. A big thank you to everyone who contributed to this research and helped move the field forward.
For those interested in reading the paper, you can find the full text at this link on ResearchGate.
In terms of tracking individual changes over time, I have previously shown my own data, with large changes in the speed to heart rate ratio as I was getting fitter (full story here):
However, some limitations remain, which is why we have developed the aerobic endurance analysis feature, which allows you to have more control when analyzing these changes (e.g. filtering the data based on workout and environmental parameters, as I cover here).
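As a rough illustration of that kind of filtering, the sketch below keeps only runs done under comparable conditions before computing speed to HR ratios. The `RunSummary` fields and all thresholds (temperature, elevation gain per km, minimum distance) are hypothetical choices, not the parameters used in the aerobic endurance analysis feature.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    distance_km: float
    speed_kmh: float
    mean_hr: float           # bpm
    temperature_c: float
    elevation_gain_m: float


def comparable_ratios(runs, max_temp_c=20.0, max_gain_m_per_km=10.0, min_distance_km=5.0):
    """Speed-to-HR ratios for runs under comparable conditions only."""
    selected = [
        r for r in runs
        # The distance check comes first, so the division below never sees 0 km.
        if r.distance_km >= min_distance_km
        and r.temperature_c <= max_temp_c
        and r.elevation_gain_m / r.distance_km <= max_gain_m_per_km
    ]
    return [r.speed_kmh / r.mean_hr for r in selected]


runs = [
    RunSummary(10.0, 12.0, 150, 15.0, 40.0),   # kept
    RunSummary(10.0, 12.0, 165, 28.0, 40.0),   # excluded: too hot
    RunSummary(8.0, 10.0, 160, 12.0, 200.0),   # excluded: too hilly (25 m/km)
]
ratios = comparable_ratios(runs)
```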
Limitations
There are several limitations to VO2max testing, and even in using VO2max as a marker of fitness and performance. The most obvious limitations are the dependency of the VO2max test on the type of test performed and body-weight normalizations.
Additionally, there are limitations associated with our specific modeling and input data. For example, we use pace and HR data collected from Strava. If there are issues with your training data, or your HR monitor doesn't work properly, your estimate might be inaccurate. Data from wrist-based HR monitors is often of poor quality or affected by artifacts. Changes in pace, speed, or power can also be more tightly coupled to other factors than to your fitness, e.g. the terrain you are training on at a given time of year. Finally, there can be changes in physiology not associated with changes in fitness, e.g. due to seasonality. These are difficult to model effectively and might impact the estimate.
This being said, if we understand how the model works and what its limitations are, I think it can be useful to track changes in aerobic endurance over time, without the need for laboratory tests.
I hope this was informative, and thank you for reading!
References
[1] D. Lee, E. G. Artero, X. Sui, and S. N. Blair, “Review: Mortality trends in the general population: the importance of cardiorespiratory fitness,” Journal of Psychopharmacology, vol. 24, no. 4 suppl, pp. 27–35, 2010.
[2] L. Vanhees, J. Lefevre, R. Philippaerts, M. Martens, W. Huygens, T. Troosters, and G. Beunen, “How to assess physical activity? how to assess physical fitness?” European Journal of Cardiovascular Prevention & Rehabilitation, vol. 12, no. 2, pp. 102–114, 2005.
[3] V. Noonan and E. Dean, “Submaximal exercise testing: clinical application and interpretation,” Physical Therapy, vol. 80, no. 8, pp. 782–807, 2000.
[4] P. O. Astrand and I. Ryhming, “A nomogram for calculation of aerobic capacity (physical fitness) from pulse rate during submaximal work,” Journal of Applied Physiology, vol. 7, no. 2, pp. 218–221, 1954.
[5] M. Altini, P. Casale, J. Penders, and O. Amft, “Cardiorespiratory fitness estimation in free-living using wearable sensors,” accepted for publication in Artificial Intelligence in Medicine, 2016.
[6] M. Altini, P. Casale, J. Penders, and O. Amft, “Cardiorespiratory fitness estimation using wearable sensors: laboratory and free-living analysis of context-specific submaximal heart rates,” accepted for publication in Journal of Applied Physiology, 2016.
[7] M. Altini, P. Casale, J. Penders, and O. Amft, “Personalized cardiorespiratory fitness and energy expenditure estimation using hierarchical Bayesian models,” accepted for publication in Journal of Biomedical Informatics, 2015.
[8] M. R. Esco et al., “Cross-validation of the Polar Fitness Test via the Polar F11 heart rate monitor in predicting VO2max,” Journal of Exercise Physiology, vol. 14, pp. 31–37, 2011.
[9] M. Altini and O. Amft, “Relation between estimated cardiorespiratory fitness and running performance in free-living: an analysis of HRV4Training data,” accepted for publication at BHI 2017.
Marco holds a PhD cum laude in applied machine learning, a M.Sc. cum laude in computer science engineering, and a M.Sc. cum laude in human movement sciences and high-performance coaching.
He has published more than 50 papers and patents at the intersection between physiology, health, technology, and human performance.
He is co-founder of HRV4Training, advisor at Oura, guest lecturer at VU Amsterdam, and editor for IEEE Pervasive Computing Magazine. He loves running.
Twitter: @altini_marco