Data quality for heart rate variability (HRV) measurement
Artifact removal is probably the most important and (unfortunately) most overlooked step of the signal processing pipeline required to compute HRV features.
While all beat-to-beat data should go through artifact removal (even when collected with an ECG or chest strap, as ectopic beats would still be present under these circumstances, see an example here), the issue becomes particularly important for PPG measurements, as they are more prone to noise (meaning it is easier to corrupt the signal just by moving). Earlier, I showed how even just typing on your laptop makes a wearable's data completely unreliable for HRV analysis.
Artifacts are a big deal
The impact of artifacts on HRV analysis cannot be overstated.
A single misdetected beat causes virtually no change in heart rate (60 beats over a minute are still 60 beats even if a couple of them are out of place), but can make HRV data completely unusable. Again, a single artifact does not make HRV data a bit noisy or slightly inaccurate (say, 5% off); it makes it completely unrelated to the actual artifact-free HRV that you are trying to measure.
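As a minimal sketch of this effect (using synthetic RR intervals, not real data), consider what a single missed beat does to average heart rate versus rMSSD, the HRV feature discussed throughout this article:

```python
import numpy as np

def rmssd(rr_ms):
    """Root mean square of successive differences, in ms."""
    diffs = np.diff(rr_ms)
    return float(np.sqrt(np.mean(diffs ** 2)))

# One minute of clean RR intervals: ~1000 ms with mild physiological variability
rng = np.random.default_rng(42)
clean = 1000 + rng.normal(0, 30, 60)

# Corrupt a single beat: the detector misses it, merging two intervals into one
corrupted = clean.copy()
corrupted[30] *= 2

hr_clean = 60000 / np.mean(clean)      # average HR barely moves
hr_corr = 60000 / np.mean(corrupted)
print(f"HR:    {hr_clean:.1f} vs {hr_corr:.1f} bpm")
print(f"rMSSD: {rmssd(clean):.1f} vs {rmssd(corrupted):.1f} ms")
```

Running this, heart rate shifts by about one beat per minute, while rMSSD inflates severalfold: exactly the asymmetry described above.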
Let’s look at one example to better understand the extent of the problem:
Above we have one minute of PPG data, including detected peaks. In general, the data shown here is of good quality; however, there are some clear artifacts (e.g. in the second row, causing a spike and an abnormal gap between beats).
During this test, ECG data was collected simultaneously and used to extract reference RR intervals and compute rMSSD, which was 163ms. If we use the PPG data and detected peaks shown here to compute rMSSD, we get 229ms, a difference of 66ms, or about 40% (repeated measures typically differ by only 5–15ms).
The few artifacts present have a large effect on our output metric, and therefore we need to address the issue or the data collected will be rather useless. As mentioned above, keep in mind that this problem does not affect resting heart rate: even if your heart rate is measured correctly, that does not mean HRV is measured correctly too.
In HRV4Training, we use different methods to remove artifacts (some are discussed in our publications) plus a few extra steps that can be feature-dependent, or person-dependent, as well as optimized thresholds based on the person’s historical data and group-level parameters.
Let’s look again at our example, we can see here in yellow the valid peaks after artifact removal:
Let’s now look at the PP (and RR) intervals. PP intervals are the beat-to-beat differences computed after detecting individual beats in our PPG data (when using ECG, these are called RR intervals). When we visualize PP intervals over time, we can normally spot artifacts (spikes) and other issues easily, since the time series should look very similar across sensing modalities (phone camera, chest strap, or ECG).
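To make the spike-spotting idea concrete, here is a simple illustrative filter (not the method used in HRV4Training, whose filters and thresholds are described in the publications mentioned above) that flags intervals deviating too far from a running median:

```python
import numpy as np

def flag_artifacts(pp_ms, max_rel_change=0.25):
    """Flag PP intervals deviating more than max_rel_change
    from the median of a local 11-beat window.

    A toy example for illustration only; real pipelines use
    per-person and feature-dependent thresholds.
    """
    pp = np.asarray(pp_ms, dtype=float)
    flags = np.zeros(len(pp), dtype=bool)
    for i in range(len(pp)):
        window = pp[max(0, i - 5): i + 6]
        ref = np.median(window)
        flags[i] = abs(pp[i] - ref) / ref > max_rel_change
    return flags

# Synthetic PP series: a missed beat (1650 ms) and a split beat (405 ms)
pp = [820, 830, 815, 1650, 825, 810, 835, 405, 820]
flags = flag_artifacts(pp)
print(flags)
```

Both the spike and the abnormally short interval get flagged, while the normal beats pass through; flagged intervals would then be discarded or corrected before computing rMSSD.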
In the figure below, the top plot shows our camera-based PP intervals (dark blue before artifact correction, light blue after), along with the RR intervals reported by a Polar chest strap (second row) and computed from reference ECG data (third row). We can also see the participant’s breathing pattern (about 10 oscillations per minute).
As previously discussed, rMSSD for the uncorrected data in this example was 229ms. After artifact removal, rMSSD for the camera-based algorithm is 166ms, hence very close to the 163ms of our ECG reference.
Again, differences in consecutive measurements, even using ECG, are in the 5–15ms range, hence our difference here is negligible and we were able to effectively remove all artifacts and estimate HRV correctly (you can find more information on repeated measures for PPG, chest strap and ECG data, here). These are the algorithms used in HRV4Training.
Can you trust your wearable’s night data?
Night data is very prone to artifacts. We can easily take a morning measurement while avoiding any movement, but that is not the case when we sleep. Any movement, whether tossing in bed or getting up to go to the bathroom, will cause issues in data quality. Ironically, the more trouble you have sleeping, the more inaccurate your data will be.
For the few wearables that show you the full night of HRV, you can easily spot peaks (“high HRV”) when moving more, which has nothing to do with your parasympathetic activity at that time, but simply highlights artifacts in the data (poor quality data leading to erroneous measurements). Remember that poor quality HRV can only result in higher values, hence be skeptical of those.
Finally, an important issue rarely discussed is actual cardiac abnormalities (arrhythmias). In the context of measuring HRV, arrhythmias are artifacts. As I have described elsewhere, a single beat out of place will cause a disruption and artificially increase HRV. Normally, when we have such isolated events, we can deal with them and provide accurate estimates of HRV. However, if the issues are more frequent, it can be difficult or impossible to measure HRV. Unfortunately, if your arrhythmia is frequent during the night, there is no point in using a device that measures while you sleep.
In this case, the only way to measure HRV is to take a morning measurement during a period in which you have no (or fewer) ectopic beats. This is not to say that devices using night measurements are inferior in terms of artifact detection or removal. However, in the morning you have control: you can wait a bit, you can assess whether the measurement was affected by artifacts, and so on. At night, your data will be impacted by ectopic beats and there is really nothing we can do.
Note that harmless arrhythmias such as premature ventricular contractions have a prevalence of 40–75% in the population (i.e. pretty much everyone has them). These issues should be carefully evaluated in sports settings, as athletes tend to have a higher prevalence of ectopic beats, especially in the context of endurance sports.
Personally, I had some episodes of arrhythmia that caused clear discrepancies between morning and night data (where night data was noisy, see here), to the point that I tend to be skeptical of HRV captured in the night, unless I can see the same in morning data. Is it really looking normal or trending positively, or is it just picking up some ectopic beats? Can you ever be sure?
When I measure in the morning I can feel any potential issue and see the PPG waveform, hence I can trust the data 100%.
Signal quality estimation
Wearables always pretend to provide you with perfect data. None of them provides signal quality metrics or confidence about their estimates, even though the data is often inaccurate and noisy, either because of actual cardiac arrhythmias or because of other sources of noise. This is one of the biggest issues in the industry, in my opinion.
This is also why we use a different approach and are transparent about not only the quality of the data but also how it is determined.
A simple method I’ve developed to determine noise level is to rely on the ratio between the number of removed beats (according to the various filters) and the number of beats originally detected.
Intuitively, if we remove zero or only a few artifacts, we will have high-quality data, while if we remove many artifacts, we will tend to have poor-quality data. While it is possible that all artifacts are removed correctly even when there are many, in general this is rare. The reason is that many artifacts in PPG data are typically caused by movement, which produces large disruptions in signal quality (more so than actual ectopic beats) that cannot easily be recovered.
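The ratio-based idea above can be sketched in a few lines. The thresholds here are illustrative placeholders, not HRV4Training's calibrated values:

```python
def signal_quality(n_detected, n_removed):
    """Map the fraction of removed beats to a coarse quality label.

    Thresholds (5% and 20%) are made up for illustration; a real
    system would calibrate them against reference recordings.
    """
    if n_detected == 0:
        return "no signal"
    ratio = n_removed / n_detected
    if ratio < 0.05:
        return "good"
    if ratio < 0.20:
        return "acceptable"
    return "poor"

# A clean minute of ~60 beats with one corrected artifact, versus a
# minute where movement forced nearly a third of the beats to be removed
print(signal_quality(62, 1))
print(signal_quality(60, 18))
```

The appeal of this approach is that it needs no extra sensing: the artifact-removal step already produces both counts as a byproduct.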
Below are two screenshots taken during a measurement in which I intentionally moved the finger a lot while recording. You can see how the app can detect the issue and report back the problem to the user.
While hardly any system out there reports quality, I think this is a key feature that can help to gain confidence in the tools we use.
In my view, it is quite pointless to pretend that a sensor will always provide high-quality data (no matter how much you pay for it), especially when it comes to optical sensors (watches, wristbands, etc.). Motion will always be an issue, and sometimes data might need to be discarded.
Implementing effective artifact removal methods, as well as being transparent about any potential issues, should make it easier to make effective use of these technologies, which can be extremely helpful in tracking individual responses to physical and psychological forms of stress (check out a few examples here).
That’s a wrap for this article, I hope you’ve found it useful.
Marco holds a PhD cum laude in applied machine learning, a M.Sc. cum laude in computer science engineering, and a M.Sc. cum laude in human movement sciences and high-performance coaching.
He has published more than 50 papers and patents at the intersection between physiology, health, technology, and human performance.
He is co-founder of HRV4Training, advisor at Oura, guest lecturer at VU Amsterdam, and editor for IEEE Pervasive Computing Magazine. He loves running.
Twitter: @altini_marco