Monday, December 19, 2016

Fun With Noise

Warning: this is a geeky post. No windsurf pictures, no videos. Just talk about GPS accuracy. So if that does not interest you, go and watch TV, play with your kids, or whatever. You have been warned.

If you are looking for someone to blame for this post, don't pick on me! This all started on the Australian windsurf forum. One of the guys who is currently evaluating the new GPS watch from Locosys was kind enough to post some data from both the GW-60 watch and the current "gold standard", the Locosys GW-52. One picture that caught my attention showed data for the top 2-second speed. Interestingly, the GPS watch appeared to have more accurate data, as indicated by the lower error estimates (SDOP and "+/-" estimates). A detailed comparison of more than 20 different 2-second runs showed that the GW-52 usually had higher speed estimates than the GW-60 watch. The differences were not huge, and the ranges (best guess +/- error estimates) of the two devices always overlapped, which is good; but nevertheless, it appeared that either the GW-52 measurements were a tad too high, or the GW-60 measurements a bit too low.

The GW-52 data seemed to have a bit more noise than the watch data, as indicated by the slightly higher "+/-" numbers in the data shown. What do I mean by noise? Check out this graph:
Speed data from 2 GW-52 devices
The data shown are from a driving test with two GW-52 devices right next to each other. Without any noise, the units should show exactly the same speed at each point in time. That's obviously not the case - the speed differs by up to 0.8 knots between the 2 units. That's noise - random errors in the data. There are a number of different potential causes for the noise, but for the remainder of this discussion, what exactly caused the observed noise is irrelevant. What matters is that data from GW-52 units sometimes contain a significant amount of noise. That raises the question:

How does random noise influence our measured speed?

The first answer that comes to mind is: "if it's truly random, some points will be too high, some too low, so the net effect will be quite small". That can actually be true - but only if we have enough data points. For 2-second speed and 5 Hz data, we only have 10 points, which may not be enough!

Time to plug some data into a spreadsheet, and see what happens. The first thing to do is to generate random data with an average speed of 0 knots, and random noise between -1 and +1 knots. That takes a couple of minutes to set up, and 2-second "speeds" can easily be calculated by averaging 10 points. If we then look for a maximum speed, what do we get? Well, I got 0.48 knots when I looked for the maximum in 600 data points (that would be 2 minutes of data). Cool - noise can make me half a knot faster!
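For those who prefer code over spreadsheets, here is a rough Python sketch of the same experiment. It is not the spreadsheet I used; the function names and the seed are made up for illustration, and I am assuming the noise is uniformly distributed between -1 and +1 knots.

```python
import random

def rolling_means(values, window):
    """Return the mean of every run of `window` consecutive values."""
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        # Slide the window by one point: add the new value, drop the oldest.
        total += values[i] - values[i - window]
        means.append(total / window)
    return means

def max_noise_speed(n_points=600, window=10, amplitude=1.0, rng=None):
    """Best 2-second 'speed' found in pure noise around 0 knots.

    Assumes 5 Hz data (10 points = 2 seconds) and uniform noise.
    """
    rng = rng or random.Random()
    noise = [rng.uniform(-amplitude, amplitude) for _ in range(n_points)]
    return max(rolling_means(noise, window))

# Repeat the 2-minute experiment many times and average the "top speed".
rng = random.Random(2016)  # arbitrary seed, only for repeatability
results = [max_noise_speed(rng=rng) for _ in range(200)]
print(f"average best 2-second 'speed': {sum(results) / len(results):.2f} knots")
```

Every single run of this should come out above zero, simply because we pick the best of almost 600 overlapping averages.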

Well, not so fast, cowboy! We don't usually sail around for 2 minutes at our top speed. So let's look at more realistic data. I started by downloading a few GPS tracks from ka72.com. Here is what the fastest 10 second run from the Aussie speedsurfer "Cookie" on November 11, 2016 looked like:
Within this 10 second run, Cookie stayed darn close to 40 knots for about 20 data points - that's 4 seconds. So what happens if we simulate a speed run where we keep the top speed for 4 seconds, and add some noise to it? Here's a graph:
The blue line is the noise-free data, the red line is the data with random noise (between -1 and +1 knots) added. This is quite a bit noisier than Cookie's data, but somewhat similar to the driving data in the first picture.

The next steps are to calculate the 10-point average speeds, and to find the maximum speed ... and then to repeat this a bunch of times with different random noise. I set it up so that 5 simulations were run at a time, and looked at the averages and maximum "measured" speed. After writing down the numbers, I'd repeat this with fresh random noise. Since the noise was random, the results differed a bit for each run. Usually, the measured top speed would be higher than 40 knots, but I'd often see top speeds below 40 knots in one or two of the 5 simulations. Here's a summary of the results:
On average, the speed was overestimated by almost 0.2 knots. But for one out of five runs, the over-estimate would be even higher, ranging from 0.29 to 0.55 knots. That's a lot! But with less noisy data (like the ones in Cookie's top 10 second run), the over-estimate would be proportionally lower.
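The speed run simulation can be sketched in Python along the following lines. Big caveat: the shape of the run outside the 4-second plateau is my own guess here (a linear ramp that starts 3 knots below top speed), as are all function names and parameters, so do not expect the numbers to match the summary above exactly.

```python
import random

def speed_profile(top=40.0, plateau=20, ramp=15, ramp_drop=3.0):
    """Hypothetical 10-second run at 5 Hz: ramp up, 4-second plateau, ramp down.

    The ramp shape is an assumption made for this sketch only.
    """
    up = [top - ramp_drop * (ramp - i) / ramp for i in range(ramp)]
    return up + [top] * plateau + up[::-1]

def best_2s(speeds, window=10):
    """Highest average over any 10 consecutive points (2 seconds at 5 Hz)."""
    return max(sum(speeds[i:i + window]) / window
               for i in range(len(speeds) - window + 1))

def simulate(n_runs=1000, amplitude=1.0, seed=None):
    """Return the error in the best 2-second speed for each noisy run."""
    rng = random.Random(seed)
    clean = speed_profile()
    true_best = best_2s(clean)
    errors = []
    for _ in range(n_runs):
        noisy = [v + rng.uniform(-amplitude, amplitude) for v in clean]
        errors.append(best_2s(noisy) - true_best)
    return errors

errors = simulate(seed=1)
too_low = sum(1 for e in errors if e < 0)
print(f"average error: {sum(errors) / len(errors):+.2f} knots")
print(f"largest error: {max(errors):+.2f} knots")
print(f"runs that came out too low: {too_low} of {len(errors)}")
```

Setting `amplitude` to a smaller value is a quick way to check how the over-estimate shrinks with cleaner data.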

If you have another look at the first and third graph, you may notice that the simulated noise looks a bit different from the observed noise. In the simulation, the noise sometimes jumps very quickly between the extremes; in the actual GW-52 data, the jumps are a bit slower, usually spread out over two or more data points. This could, for example, be caused by filters like Kalman filters, which are often used in GPS signal processing. To see how such "slightly coupled" errors would affect the measurements, I ran another simulation where the error was random at every 4th data point, and intermediate points were a weighted average of the neighbor "anchor" points. Here's an example of what the simulated data look like:
The "coupled" noise increased the error:
No surprise here. In reality, the noise is probably more random (not as strongly coupled) than in the second simulation; I did it mainly to illustrate what the effect of "coupled" noise would be. In reality, filters that could effectively lead to such coupling would typically also reduce the maxima and minima, which would reduce the induced error.
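Here is one way the "coupled" noise could be generated in Python. I am assuming simple linear weights between the two neighboring anchor points; the function names and the seed are invented for this sketch. It also compares how much the 2-second averages wander for independent and coupled noise.

```python
import random
import statistics

def independent_noise(n_points, amplitude=1.0, rng=None):
    """Fresh uniform random noise at every data point."""
    rng = rng or random.Random()
    return [rng.uniform(-amplitude, amplitude) for _ in range(n_points)]

def coupled_noise(n_points, spacing=4, amplitude=1.0, rng=None):
    """Random noise at every `spacing`-th point, interpolated in between.

    Linear interpolation between anchors is an assumption of this sketch.
    """
    rng = rng or random.Random()
    n_anchors = (n_points - 1) // spacing + 2
    anchors = [rng.uniform(-amplitude, amplitude) for _ in range(n_anchors)]
    noise = []
    for i in range(n_points):
        left, offset = divmod(i, spacing)
        weight = offset / spacing  # 0 at the left anchor, approaching 1 at the right
        noise.append((1 - weight) * anchors[left] + weight * anchors[left + 1])
    return noise

def window_means(values, window=10):
    """Average of every run of `window` consecutive points."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

rng = random.Random(52)  # arbitrary seed, only for repeatability
for name, make in (("independent", independent_noise), ("coupled", coupled_noise)):
    means = window_means(make(5000, rng=rng))
    spread = statistics.pstdev(means)
    print(f"{name:12s} spread of 2-second averages: {spread:.3f} knots")
```

With coupled noise, neighboring points tend to err in the same direction, so less of the error cancels out within a 10-point window.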

To some extent, the results shown above may be counter-intuitive. Why do we get a directional measurement error even with completely random noise? The reason is simple - it's because we are specifically looking for the highest values. If there is a region where most values are accurate, but some are artificially high due to random noise, we will find this region.

But there's also lots of good news. First of all, the problem described above mostly affects 2-second data. Going to 10-second data reduces the error substantially - at least more than 2-fold (using basic probabilities), but typically even more, since a 10-second region is more likely to be surrounded by lower speed on both sides than a 2-second region is. The other good news is that preliminary data indicate that the GW-60 GPS watch seems to be at least as accurate as the best previous GPS devices - but more user friendly. Let's hope that this can be confirmed in further testing!
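For the curious, the "more than 2-fold" figure for 10-second data can be checked with a few lines of Python. This sketch assumes independent, uniformly distributed noise at 5 Hz (so 10 points for 2 seconds and 50 points for 10 seconds); the function name is made up for illustration.

```python
import math

def std_of_mean(n_points, amplitude=1.0):
    """Standard deviation of the average of n independent noise values.

    Uniform noise on [-a, +a] has a standard deviation of a / sqrt(3);
    averaging n independent values divides that by sqrt(n).
    """
    sigma = amplitude / math.sqrt(3)
    return sigma / math.sqrt(n_points)

two_sec = std_of_mean(10)   # 2 seconds at 5 Hz
ten_sec = std_of_mean(50)   # 10 seconds at 5 Hz
print(f"2-second average:  +/- {two_sec:.3f} knots")
print(f"10-second average: +/- {ten_sec:.3f} knots")
print(f"reduction factor:  {two_sec / ten_sec:.2f}")
```

The reduction factor is the square root of 5, since 10-second averages use five times as many points.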