Sunday, July 31, 2016

More Noise and Errors

Perhaps I should apologize to 95% of my windsurfing friends for this post. It's another geeky post about analyzing GPS data. But I won't. Nobody forces you to read this post. If you do not care about important things like whether or not high-frequency GPS data are really better, stop now. Go windsurfing or watch TV, or whatever.

I recently posted here about noise in GPS data, specifically about the spikes we see in 5 Hz data from Locosys GW-52 units. This led to a discussion on the "GPS and Speed talk" section of the Australian windsurf forum, where I posted this picture:
This is a screen shot from the GPSResults program where I analyzed GW-52 data. On the left, the data were collected at 5 Hz (every 0.2 seconds); on the right, the data were collected at 1 Hz (every second). The most interesting thing about this picture is near the bottom: the error estimates (+/- numbers) for 10-second runs were significantly lower in the 1 Hz data. This was unexpected. Conventional wisdom says 5 Hz data are more accurate.

On the forum, sailquick quickly pointed out that the error estimates should be very similar, since (as far as we know) the 1 Hz data are simply averages of the 5 Hz data. So I looked at this again. Much to my surprise, I now got different error estimates for the 1 Hz data:
Now, the +/- numbers were about 60% higher! Why would I get different results this time?

It took me a few hours to figure this out. The error estimate numbers in the lower picture are simply the average of the data point errors for a given run; in the upper picture, the numbers are significantly lower. Looking at the GPSResults web site, I discovered that the Windows version of the program has two options for calculating the error estimates for runs ("Error Propagation"): "Average" and "Gaussian". I also noticed that the program would automatically adjust some "filter" parameters every time I loaded a new file. For example, the "Max SDoP" number would be set to 1.5 for 1 Hz data, but to 3.0 for 5 Hz data. It seemed that the software also changed the setting for "Error Propagation"! Unfortunately, the Mac version of the program does not have any way of checking or changing this - only the Windows version does. For Mac users, this is an entirely hidden change in the way the analysis is done. I must have accidentally changed the setting by switching back and forth between different GPS files.

So I installed the Windows version of GPSResults, and looked at the files again. After loading a file, I went to the "Extras" menu, selected "Filter Settings..", and checked the "Error Propagation" settings in the dialog. Sure enough, whenever I loaded a file with 1 Hz data, "Average" was selected; but when I loaded a 5 Hz data file, "Gaussian" was selected.

So what happens if we analyze both data sets with the same settings? For "Gaussian" error propagation, we get basically the same results as shown in the first figure. For "Average", this is what we get:


The estimated errors for 10-second runs using "Average" range from 0.75 to 1.5 for 5 Hz data, and from 0.19 to 0.35 for 1 Hz data. The conclusion is clear:
When analyzing 1 Hz and 5 Hz data from the GW-52 using the same math for both data sets, the 1 Hz data are more accurate.

This raises the question why GPSResults uses different default settings for 1 Hz data and higher Hz data. The "Average" error propagation is really a "worst case" model: it assumes that errors are all in the same direction - that all speed measurements in a given period are either too low or too high. The "Gaussian" error propagation assumes that errors are random, and cancel each other out to some extent. Under the Gaussian model, the error gets smaller the more measurements we take; this is a primary reason behind the drive to higher data acquisition rates.
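To make the difference between the two models concrete, here is a minimal sketch in Python. It uses the textbook formulas (plain mean of the point errors for "Average"; square root of the sum of squares, divided by the number of points, for "Gaussian"). I am assuming these are close to what GPSResults does; the function names and the per-point error values are made up for illustration and do not come from any real GW-52 file.

```python
import math

def run_error_average(point_errors):
    """Worst case: all errors point the same way, so the error of the
    run's mean speed is simply the mean of the per-point errors."""
    return sum(point_errors) / len(point_errors)

def run_error_gaussian(point_errors):
    """Independent random errors: the error of the mean speed is
    sqrt(sum of squared point errors) / number of points."""
    return math.sqrt(sum(e * e for e in point_errors)) / len(point_errors)

# Hypothetical per-point errors for one 10-second run (illustration only)
errors_5hz = [1.0] * 50   # 50 points, each +/- 1.0
errors_1hz = [0.3] * 10   # 10 points, each +/- 0.3

for label, errs in (("5 Hz", errors_5hz), ("1 Hz", errors_1hz)):
    print(label,
          "average:", round(run_error_average(errs), 3),
          "gaussian:", round(run_error_gaussian(errs), 3))
```

With identical point errors, the "Average" result stays at the per-point error no matter how many points there are, while the "Gaussian" result shrinks with the square root of the number of points. That is why the 5 Hz run, with five times as many points, benefits much more from the Gaussian model than the 1 Hz run does.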

When the error estimates were initially developed, the quality of GPS chips and units was a lot worse than it is now; back then, a Gaussian model may have been too optimistic for routine use, which would explain why "Average" still is the default for 1 Hz data. But when newer, better GPS units with higher Hz rates became available, this was no longer true, and the Gaussian model was needed to get more accurate estimates of the speed errors. Hence it became the standard for high frequency data.

So is there a practical relevance of this for most speedsurfers? Not really. How the errors are calculated has no influence on the speed numbers, unless you are going for "official" records. Most of us never look at the error numbers; you cannot even get error estimates in two of the most popular ways of analyzing GPS speedsurfing data (GPS Action Replay Pro and ka72.com).

Perhaps the most practical consequence is for GW-52 users: there is no harm done if you record your speed sessions at 1 Hz. If we put any faith in the error estimates at all, then the 1 Hz data are more accurate than the 5 Hz data - at least for the GW-52. For other GPS units that have better antennas, use better GPS chips, and/or have better interference shielding, the opposite may be true - but no such units are currently commercially available.
--
Added a few hours later:
I went back to some windsurfing tracks to see what the calculated errors are in 1 Hz and 5 Hz data when using the Gaussian error propagation in GPSResults. I looked at three sessions each, and the 8 fastest 10-second runs in each session. Here are the data for the 1 Hz sessions:
The average error for the 24 10-second runs is 0.15 knots, with a standard deviation of 0.037.

The 5 Hz data have slightly higher error estimates:
The average is 0.193 knots, standard deviation 0.025.

Based on just these 6 sessions, the 1 Hz data are slightly more accurate, although the difference is not statistically significant.

Interestingly, the Gaussian error propagation numbers do not seem to follow the formulas given in Tom Chalko's paper from 2009 ("Estimating Accuracy of GPS Doppler Speed Measurement using Speed Dilution of Precision (SDOP) Parameter"); instead, the error estimates are about 2x higher than expected. But perhaps I am missing something here.