Wednesday, May 31, 2017

Reader Comments about Filters

In response to my post about the "60 Knot New Speedrecord", a reader left a number of comments. I answered the first one in the comment section of the original post, but he came back with a whole bunch more. Mathew wrote:
"Thanks for writing up a follow up. It should be pointed out that the acceleration filter shouldn't "per point" as highlighted in the follow-up... it should apply from last-good-point until the next point which doesn't breach the threshold. Also - and this example highlights it - once a filter has been breached in the middle of a run, the remainder of the run must be discarded [ most of the time ]. The problem here isn't that we dont understand what type of filtering is required - we already do know -> the problem is that the auto-filtering of the software analysis, doesn't match how humans do it. ie: if your data isn't clear, then then the data from that run shouldn't be used. [ Which is why we use two, three or four GPS's.... ] ... In the RealSpeed example, with 3ms-2 (5.5 kns-2) it takes about 3 seconds for the threshold to not be exceeded [ (20.8-4.5) / 5.5]. Since a new threshold is breached within 3 second window, the new window needs to be calculated... and so on. Thus about 7 seconds or so should be dropped.
A standard practice in all sciences, is to drop data that is completely bogus -> thus all "simple filtering" should be applied first, then apply SDoP filters last. [ Caveat: big-data analysis doesn't work this way generally speaking, as the algorithms are sometimes more accurate using bogus data. ] It would be useful to implement the filters accurately, then apply it to these examples vs. applying a simple SDoP filter.
... the point is -> SDoP/SDoS isn't a panacea - it doesn't replace simple physical constraints of the sport. Indeed humans naturally apply a few more physical constraints, ie: even though 3ms-2 is the threshold, it is completely unlikely to that you will sustain 1/3G acceleration for more than a few seconds simply because the drag component goes up very quickly [ vs terminal velocity of about 30m/s ]. "

This is pretty long, so let me just restate what I think he is saying in simple terms:
  1. We know that acceleration in speedsurfing is limited to less than 5.5 knots/second (roughly 3 m/s2)
  2. When looking at GPS tracks, that allows us to identify artifacts without any doubt. If the speed shoots up from 5 knots to 20 knots between 2 data points, we know something is wrong.
  3. If software could implement this "physical constraint" in filters, that would allow the identification of artifacts without SDoP filters.
  4. However, just throwing out single points where the acceleration is above the threshold is not good enough (as I had illustrated in response to his previous comment).
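To make point 4 concrete, here is a minimal sketch (my own illustration in Python, not code from any of the programs discussed) of the difference between throwing out single points and discarding everything from the last good point until the speed is again consistent with that point, as Mathew describes:

```python
# Minimal sketch (not from any of the analysis programs): two ways to apply an
# acceleration filter to a list of Doppler speeds in knots, logged at a fixed
# interval dt (e.g. 0.2 s for 5 Hz data). The threshold is in knots/second.

def flag_per_point(speeds, dt, max_accel):
    """Flag only the single points where the point-to-point acceleration
    exceeds the threshold."""
    bad = [False] * len(speeds)
    for i in range(1, len(speeds)):
        if abs(speeds[i] - speeds[i - 1]) / dt > max_accel:
            bad[i] = True
    return bad

def flag_from_last_good(speeds, dt, max_accel):
    """Flag every point from the last good point onward, until the speed is
    again consistent with the last good point given the threshold."""
    bad = [False] * len(speeds)
    last_good = 0
    for i in range(1, len(speeds)):
        elapsed = (i - last_good) * dt
        if abs(speeds[i] - speeds[last_good]) / elapsed > max_accel:
            bad[i] = True        # still too far from the last trusted point
        else:
            last_good = i        # consistent again; trust this point
    return bad
```

With the 4.5-to-20.8 knot jump from the RealSpeed example in Mathew's comment, the second version keeps flagging points for about three seconds, just as he calculated.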
Before I go into a detailed analysis of these points, let me say that I mostly agree with Mathew. For example, the typical acceleration in speed runs is less than 2 knots per second. Check out one of Boro's recent traces where he got close to 40 knots:
Let's zoom in on the speed run in the middle:
I selected the region where the acceleration was highest, or close to it. Within about 4 seconds, he went from 31 knots to 36 knots - that's an acceleration of about 1.25 knots/second. So if we accept that this is a typical track, then a threshold of 5.5 knots/second seems rather safe. But is it?

Let's look at the current default in GPSResults, the most accurate GPS analysis software. For 5 Hz GPS data, the acceleration filter defaults to 8 m/s2 - that's about 15.55 knots/second. That's roughly three times as high as the limit we just stated! Why does GPSResults not use the 3 m/s2 that Mathew suggests?
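For reference, the conversion between the two units is straightforward (a quick check with my own numbers, using 1 knot = 1852/3600 m/s):

```python
# Convert an acceleration given in m/s2 to knots per second.
MS_TO_KNOTS = 3600.0 / 1852.0          # 1 m/s is about 1.944 knots

print(round(8.0 * MS_TO_KNOTS, 2))     # 15.55 -> GPSResults' 5 Hz default
print(round(3.0 * MS_TO_KNOTS, 2))     # 5.83  -> roughly the 5.5 knots/second limit above
```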

Well, theories are all nice and good, but it's always better to look at real data. I went through a test set of about 200 GPS files from GW-60 watches and GW-52 units to see how many of these files had points with an acceleration above 3 m/s2. The answer: 125 of 197. That's about 2 out of every 3 files! If we implemented a filter with this threshold, it would remove good runs from the majority of files.
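In case you're wondering what "above 3 m/s2" means in terms of the raw data, here is a minimal sketch of the per-point check (my own illustration in Python; reading the actual GW-60/GW-52 files is left out):

```python
# Sketch of the per-point acceleration check: does a track contain any point
# whose point-to-point acceleration exceeds a threshold given in m/s2?
# Speeds are Doppler speeds in knots at a fixed logging interval dt (in seconds).
MS_TO_KNOTS = 3600.0 / 1852.0

def exceeds_threshold(speeds, dt, max_accel_ms2):
    limit = max_accel_ms2 * MS_TO_KNOTS      # threshold in knots/second
    return any(abs(b - a) / dt > limit
               for a, b in zip(speeds, speeds[1:]))

# Made-up 5 Hz samples (dt = 0.2 s): a 0.7 knot jump is 3.5 knots/s and passes
# a 3 m/s2 (~5.8 knots/s) threshold; a 1.5 knot jump (7.5 knots/s) does not.
print(exceeds_threshold([25.0, 25.7, 26.0], 0.2, 3.0))   # False
print(exceeds_threshold([25.0, 26.5, 26.0], 0.2, 3.0))   # True
```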

So what threshold would work? In the test set, the highest acceleration within the top five 10-second runs was 6.35 m/s2. If you're wondering what the data look like, here's a screenshot from my analysis software:
At the second point in the table, the Doppler speed jumped from 25.562 knots to 28.03 knots - that's about 2.5 knots faster in 0.2 seconds! Looking at the track, the spike is quite obvious. It looks like random noise added on top of a much smaller real acceleration; in the next data points, the speed goes down again. Overall, though, this run looks good - there's no reason to discard it.
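The 6.35 m/s2 figure mentioned above comes directly from this jump; spelled out:

```python
# The spike from the screenshot: 25.562 -> 28.03 knots within one 0.2 s interval.
KNOTS_TO_MS = 1852.0 / 3600.0                      # 1 knot is about 0.514 m/s

jump = 28.03 - 25.562                              # about 2.47 knots
accel_knots_per_s = jump / 0.2                     # about 12.3 knots/second
print(round(accel_knots_per_s * KNOTS_TO_MS, 2))   # about 6.35 m/s2
```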

So even for this small set, we'd need a threshold close to 7 m/s2. If we looked at more test data, we'd probably find a few tracks with even higher numbers, so the threshold of 8.0 m/s2 that GPSResults uses for 5 Hz data seems quite reasonable. Unfortunately, having to use a higher threshold throws off Mathew's argument a bit.

The higher threshold is needed because the higher-frequency data from the GW-60 and GW-52 devices has a lot more noise. That's not a problem - since we have a lot more data points, we can average out the noise. But trying to implement this in a filter will be quite a bit more complicated than it seems at first glance.
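To give a flavor of what "averaging out the noise" could look like (just a sketch of one possible approach, not what any of the programs actually do), one could smooth the 5 Hz Doppler speeds with a short centered moving average before checking accelerations:

```python
# One possible way to average out noise in 5 Hz data before checking
# accelerations: a short centered moving average (a sketch only; the analysis
# programs discussed may handle this quite differently).
def smooth(speeds, window=5):
    half = window // 2
    out = []
    for i in range(len(speeds)):
        lo, hi = max(0, i - half), min(len(speeds), i + half + 1)
        out.append(sum(speeds[lo:hi]) / (hi - lo))
    return out

raw = [25.0, 25.2, 25.562, 28.03, 25.9, 26.1, 26.3]   # made-up data with a spike
print([round(s, 2) for s in smooth(raw)])             # the spike gets dampened
```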

If we examine the current versions of four different programs for analyzing GPS data (GPSResults, GPS Action Replay Pro, ka72.com, and RealSpeed), it appears that only one of the four has a reasonable implementation of all the filters that are known to work, and uses them by default when calculating speeds (I am talking about HDoP, SDoP, minimum number of satellites, and acceleration filters). Three of the four programs are missing one or more of these filters, and/or require a separate manual step to apply them. This is for very straightforward filters: "if the value is above (or below) the threshold, mark the point as bad, and do not use it to calculate speeds". It would be a great improvement to see these simple, known filters implemented in all four GPS analysis programs.
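In code terms, these straightforward filters boil down to a few comparisons per point (field names and thresholds below are illustrative only, not the defaults of any particular program; the acceleration check was sketched earlier):

```python
# Straightforward "mark the point as bad" filters: HDoP, SDoP, and minimum
# number of satellites. Thresholds and field names are illustrative only.
def mark_bad_points(points, max_sdop=1.5, max_hdop=2.0, min_sats=5):
    """points: list of dicts with 'sdop', 'hdop', and 'sats' per data point.
    Returns a parallel list of booleans; True means 'do not use for speeds'."""
    return [p["sdop"] > max_sdop or p["hdop"] > max_hdop or p["sats"] < min_sats
            for p in points]
```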

But even the most accurate program, GPSResults, currently fails to identify some artifacts that can occur when the data quality is marginal (but not obviously bad). For this particular problem, better filters would be needed - for example, filters that use average SDoP errors for short-term speeds (or standard error estimates, which also consider all error estimates in a region, not just single points). For errors that result from "dead reckoning"-type overstatements, acceleration filters would not help at all - the very cause of the error is that the software inside the GPS tries to keep the speed relatively constant when the GPS signal is bad. So far, though, the level of interest in implementing better filters for such artifacts seems darn close to zero.
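As a rough sketch of what such a region-based filter could look at (my own illustration of the idea, not an existing filter in any of the programs):

```python
import math

# Region-based error estimates: judge a short-term speed by the error estimates
# of all points in the region, not by single SDoP values. The standard-error
# style estimate below assumes independent per-point errors - an assumption,
# not an established filter.
def region_error(sdops):
    n = len(sdops)
    avg = sum(sdops) / n                                # average SDoP in the region
    std_err = math.sqrt(sum(s * s for s in sdops)) / n  # rough standard error
    return avg, std_err
```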

Let me finish with a comment on Mathew's pointers towards scientific and "big data" analysis. There are some interesting similarities between large-scale DNA sequence analysis and GPS analysis. I worked in large-scale DNA sequencing at the beginning of the Human Genome Project (HGP), when the data were analyzed very much like GPS data are still analyzed for records: with some computer help, but also a lot of human checks of any problem regions. During the first years of the project, progress was very slow, and many scientists doubted that the project would finish on time. The first big breakthrough came when someone developed accurate "quality scores" - a measure of how likely the sequence data points were to be correct, which is very similar to SDoP values. As soon as these quality scores were widely used, many analysis bottlenecks were removed; the accuracy of the "final" results increased dramatically; and throughput jumped. The quality scores were one essential part that enabled further dramatic improvements, which allowed the Human Genome Project to finish ahead of its deadline. Early on, I did a little bit of research to evaluate both the accuracy and the usefulness of the "SDoP-equivalent"; the results were a pleasant surprise on both counts.

In the early phases of the HGP, we did something that is very similar to the wearing of "2, 3, or 4" GPS units that Mathew mentioned - we'd also generate data several times over. We had lots of complex rules and highly educated scientists to check on any discrepancies; but in the end, we ended up with "finished" data that had orders of magnitude more errors than what could later be achieved completely automatically, using accuracy estimates (and proper filters and analysis algorithms).

So yes, I may be somewhat biased towards seeing the usefulness of accurate error estimates.