Redundant GPS data

carmelo

Hi everybody,
I'm developing a system for monitoring car positions, which acquires
GPS data every second. To optimize disk space, it would be best not to
store all the data, because it is often redundant. For example, if a
car stays in the same position for a long time, or moves in a straight
line, it would be redundant to store a point every second...

Have you got any suggestions?

Thank you very much in advance
Carmelo
 
Roedy Green

I'm developing a system for monitoring car positions, which acquires
GPS data every second. To optimize disk space, it would be best not to
store all the data, because it is often redundant. For example, if a
car stays in the same position for a long time, or moves in a straight
line, it would be redundant to store a point every second...

Have you got any suggestions?

You might store all the data, then just compress it with a standard
compressor.

You might store deltas. These will compress better.


For each t,x,y, data point, calculate the linear interpolation if it
were not present. If it is within a given radius of the actual point,
delete it.
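Roedy's delta idea can be sketched roughly as below. This is only an illustration (the class and method names are mine, not from the thread): in practice you would scale lat/lon to integers first (say, units of 1e-5 degrees), delta-encode those, and then run the result through a standard compressor, since runs of small, repetitive deltas compress far better than absolute fixes.

```java
// Illustrative sketch: delta-encode a series of integer position samples.
import java.util.ArrayList;
import java.util.List;

public class DeltaCodec {
    /** values -> [first, v1-v0, v2-v1, ...] */
    static List<Integer> encode(List<Integer> values) {
        List<Integer> out = new ArrayList<>();
        if (values.isEmpty()) return out;
        out.add(values.get(0));
        for (int i = 1; i < values.size(); i++)
            out.add(values.get(i) - values.get(i - 1));   // small when moving slowly
        return out;
    }

    /** Exact inverse of encode: running sum restores the original samples. */
    static List<Integer> decode(List<Integer> deltas) {
        List<Integer> out = new ArrayList<>();
        if (deltas.isEmpty()) return out;
        int acc = deltas.get(0);
        out.add(acc);
        for (int i = 1; i < deltas.size(); i++) {
            acc += deltas.get(i);
            out.add(acc);
        }
        return out;
    }
}
```

A stationary car produces long runs of zero deltas, which is exactly the kind of input gzip and friends compress best.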
 
Lew

carmelo said:
Hi everybody,
I'm developing a system for monitoring car positions, which acquires
GPS data every second. To optimize disk space, it would be best not to
store all the data, because it is often redundant. For example, if a
car stays in the same position for a long time, or moves in a straight
line, it would be redundant to store a point every second...

Have you got any suggestions?

Do you need to store all the points if they differ, or only the most current one?
 
carmelo

Do you need to store all the points if they differ, or only the most current one?

I need to store all the points... because I need to trace routes...

Some people told me to consider the difference between the position at
time t and the position at time t-1, storing data only if the
difference is greater than a threshold value.

Do you think a GIS system could help me with storing and optimizing
GPS position data?
 
Lew

carmelo said:
I need to store all the points... because I need to trace routes...

Some people told me to consider the difference between the position at
time t and the position at time t-1, storing data only if the
difference is greater than a threshold value.

Perhaps you could capture all the data to a staging area and post-process it
into a more compact form, e.g., storing the location and duration at that
location rather than the location at each time quantum.
Do you think a GIS system could help me with storing and optimizing
GPS position data?

I don't, but perhaps someone in a database forum (rather than a Java forum)
might have specific experience that's relevant.
 
carmelo

Perhaps you could capture all the data to a staging area and post-process it
into a more compact form, e.g., storing the location and duration at that
location rather than the location at each time quantum.

Good idea.
I think I know what to do in the case of a stopped car, because its
GPS coordinates would be unchanged (or almost unchanged, since the
position at some instants could be calculated wrongly). I can consider
it the same position if the difference between the position at time t
and the position at time t-1 is not greater than a given threshold
value...
But how can I identify rectilinear motion? If the car is moving in a
straight line, it would be redundant to store every position along
it...
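The "same position within a threshold" test described above might look roughly like this; the haversine distance formula and all the names here are illustrative, not something from the thread:

```java
// Illustrative sketch: is the new fix "the same position" as the last one?
public class StopDetector {
    static final double EARTH_RADIUS_M = 6_371_000.0;

    /** Great-circle distance in metres between two lat/lon points (degrees). */
    static double haversineM(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    /** Treat the fix as unchanged if it moved less than thresholdM metres,
     *  which absorbs the GPS jitter of a parked car. */
    static boolean samePosition(double lat1, double lon1,
                                double lat2, double lon2, double thresholdM) {
        return haversineM(lat1, lon1, lat2, lon2) < thresholdM;
    }
}
```

A threshold of a few metres (roughly the receiver's position error) would keep a parked car from generating new rows.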
 
RedGrittyBrick

carmelo said:
But how can I identify rectilinear motion? If the car is moving in a
straight line, it would be redundant to store every position along
it...

A simple approach might be to look at the first three points and
calculate how far the second point is from a line between the first and
third. If it is less than some threshold, delete the second point and
repeat the process with what are now the new first, second and third
points. If you didn't delete a point, retry starting from the second,
third and fourth points. Repeat.
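That scan could be sketched like this (the names are mine; it also assumes coordinates have already been projected onto a flat metre grid, so it is only an approximation for raw lat/lon):

```java
// Illustrative sketch of the three-point thinning scan described above.
import java.util.ArrayList;
import java.util.List;

public class ThreePointFilter {
    /** Perpendicular distance of p from the line through a and b
     *  (points are {x, y} pairs in metres). */
    static double distFromLine(double[] a, double[] b, double[] p) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        double len = Math.hypot(dx, dy);
        if (len == 0) return Math.hypot(p[0] - a[0], p[1] - a[1]);
        // |2D cross product| / base length = height of triangle a-b-p
        return Math.abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / len;
    }

    /** Delete each middle point that lies within 'threshold' of the
     *  line joining its neighbours; keep it and advance otherwise. */
    static List<double[]> filter(List<double[]> pts, double threshold) {
        List<double[]> out = new ArrayList<>(pts);
        int i = 0;
        while (i + 2 < out.size()) {
            if (distFromLine(out.get(i), out.get(i + 2), out.get(i + 1)) < threshold)
                out.remove(i + 1);   // redundant: its neighbours explain it
            else
                i++;                 // carries information: keep, slide window
        }
        return out;
    }
}
```

On a perfectly straight run this collapses any number of samples down to the two endpoints, while corners survive.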
 
Wojtek

carmelo wrote :
I need to store all the points... because I need to trace routes...

If you need to store all the points, then that is what you need to do.

A GPS data point uses a little under 100 characters. Given a 500 GB
drive, that will give you somewhere around 160 years of data points
at one point per second.

Or one year's worth of data for about 160 sources.

If you are first saving it to a USB device, then periodically move the
data over to a larger drive.

If this is for a covert operation, then your installer team will just
have to go out every so often to move the data over, or else get a
wireless device.
 
carmelo

Wojtek wrote:
If you need to store all the points, then that is what you need to do.

A GPS data point uses a little under 100 characters. Given a 500 GB
drive, that will give you somewhere around 160 years of data points
at one point per second.

Or one year's worth of data for about 160 sources.

If you are first saving it to a USB device, then periodically move the
data over to a larger drive.

If this is for a covert operation, then your installer team will just
have to go out every so often to move the data over, or else get a
wireless device.

Avg_row_length is 277, therefore each record contains about 277
characters.
If 1 character needs 1 byte, each record needs 277 bytes. Therefore,
if I store 1 record every second, it needs 277*60 = 16,620 bytes per
minute, 16,620*60 = 997,200 bytes per hour, and 23,932,800 bytes
≈ 22.8 MB per 24 hours...
We have to monitor about 100 devices, therefore we need 100*22.8 = 2280
MB every day... it's not so little... :)
 
Martin Gregorie

A simple approach might be to look at the first three points and
calculate how far the second point is from a line between the first and
third. If it is less than some threshold, delete the second point and
repeat the process with what are now the new first, second and third
points. If you didn't delete a point, retry starting from the second,
third and fourth points. Repeat.

Something that would simplify this discussion is knowing what 'monitoring
a car's position' means. It would appear that there are several meanings
covering a range of possibilities and of data storage requirements, e.g.:

1) knowing where the car is now, without keeping any history.
2) knowing where the car has stopped, when and for how long, but not
   what route it followed between stopping places.
3) as for (2) but with routes followed between stopping places.
4) storing a complete history of the car's movements.
5) storing every GPS $GPRMC sentence as received.

(1) is simple - just keep the last valid GPS sentence.
(2) is also simple - only store the first point in a string of
    sentences reporting the same position, plus the elapsed time there.
(3) is what almost every hand-held GPS does when it runs out of memory.
    First it combines stationary points, replacing them with one point.
    Then it replaces lists of points that fit some straightness
    definition with the first and last points. Here I think a good
    algorithm would be to deduce the direction between points and
    combine those with the same direction +/- a small deviation.
(4) is most simply handled by keeping a small fraction of GPS sentences.
    I'm a glider pilot and use a GPS flight recorder for cross-country
    flights. My glider's speed range is 45-130 mph (74-210 kph), and
    recording its position every 4 seconds gives points roughly 25-50 m
    (75-150 ft) apart.
(5) Method (4) is good enough for everything except crash analysis. If
    you need that level of detail, you'll either need to record every
    sentence (simple) or keep, say, the last 30 seconds' worth of
    sentences in a FIFO queue and apply any of (1) to (4) to data that
    is older than that.

Method (5) is simple and cheap these days. $GPRMC sentences are around 70
bytes, so 24 hours at 1 second is a bit under 6 MB, or about 2 GB/year of
continuous recording.

I'd seriously suggest to the OP that, if he doesn't know typical movement
patterns for the vehicles he needs to monitor, he gets an EW Microrecorder
and uses it to collect some data for analysis. The Microrecorder can
record 30 hours of data at 1 sample/sec: it does not use data compression,
has a built-in GPS and rechargeable battery, and looks like a USB memory
stick to a PC. It's a rugged, pocket-sized unit.
 
Wojtek

carmelo wrote :
Avg_row_length is 277, therefore each record contains about 277
characters.
If 1 character needs 1 byte, each record needs 277 bytes. Therefore,
if I store 1 record every second, it needs 277*60 = 16,620 bytes per
minute, 16,620*60 = 997,200 bytes per hour, and 23,932,800 bytes
≈ 22.8 MB per 24 hours...
We have to monitor about 100 devices, therefore we need 100*22.8 = 2280
MB every day... it's not so little... :)

So you are handling about 27Kbytes of data per second, and presumably
writing out to 100 files.

Can you periodically (every 24hrs) start a new file? You could then
move the old file to another server (automatically via a timer) and
compress it. Text compression averages out at about 10:1, probably more
as you would have a LOT of repeating data.

Your incoming server would not fill up, and you can swap out all the
drives you need (or to DVD) on the compression server.
 
Martin Gregorie

carmelo wrote:
Avg_row_length is 277, therefore each record contains about 277
characters.

You only need the $GPRMC sentence to track a vehicle, which is about
65-70 characters depending on where you are (and hence the size of the
lat/lon strings). What else are you doing that prevents you from ignoring
the other sentences?

We have to monitor about 100 devices, therefore we need 100*22.8 = 2280 MB
every day... it's not so little... :)

By ignoring everything except $GPRMC you get just under 6 MB/day per GPS
receiver, or 0.6 GB per day for the hundred devices. A bit better than you
suggest, though putting it in a database probably doubles that figure.
 
carmelo

Something that would simplify this discussion is knowing what 'monitoring
a car's position' means. It would appear that there are several meanings
covering a range of possibilities and of data storage requirements, e.g.:

1) knowing where the car is now, without keeping any history.
2) knowing where the car has stopped, when and for how long, but not
   what route it followed between stopping places.
3) as for (2) but with routes followed between stopping places.
4) storing a complete history of the car's movements.
5) storing every GPS $GPRMC sentence as received.

(1) is simple - just keep the last valid GPS sentence.
(2) is also simple - only store the first point in a string of
    sentences reporting the same position, plus the elapsed time there.
(3) is what almost every hand-held GPS does when it runs out of memory.
    First it combines stationary points, replacing them with one point.
    Then it replaces lists of points that fit some straightness
    definition with the first and last points. Here I think a good
    algorithm would be to deduce the direction between points and
    combine those with the same direction +/- a small deviation.
(4) is most simply handled by keeping a small fraction of GPS sentences.
    I'm a glider pilot and use a GPS flight recorder for cross-country
    flights. My glider's speed range is 45-130 mph (74-210 kph), and
    recording its position every 4 seconds gives points roughly 25-50 m
    (75-150 ft) apart.
(5) Method (4) is good enough for everything except crash analysis. If
    you need that level of detail, you'll either need to record every
    sentence (simple) or keep, say, the last 30 seconds' worth of
    sentences in a FIFO queue and apply any of (1) to (4) to data that
    is older than that.

Method (5) is simple and cheap these days. $GPRMC sentences are around 70
bytes, so 24 hours at 1 second is a bit under 6 MB, or about 2 GB/year of
continuous recording.

I'd seriously suggest to the OP that, if he doesn't know typical movement
patterns for the vehicles he needs to monitor, he gets an EW Microrecorder
and uses it to collect some data for analysis. The Microrecorder can
record 30 hours of data at 1 sample/sec: it does not use data compression,
has a built-in GPS and rechargeable battery, and looks like a USB memory
stick to a PC. It's a rugged, pocket-sized unit.

Thank you very much for your interesting suggestions.
Regarding the NMEA sentence size, it's about 277 Bytes. I'm working
with this type of sentence:

$GPGGA,171741,3019.3909,N,09741.8629,W,2,08,00.9,+00180,M,x.x,M,003,0800*78

As I said on my previous post:
if I store 1 record every second, it needs 277*60 = 16,620 bytes per
minute, 16,620*60 = 997,200 bytes per hour, and 23,932,800 bytes
≈ 22.8 MB per 24 hours...
We have to monitor about 100 devices, therefore we need 100*22.8 = 2280
MB every day... it's not so little... :)
 
carmelo

Wojtek wrote:
So you are handling about 27Kbytes of data per second, and presumably
writing out to 100 files.

Can you periodically (every 24hrs) start a new file? You could then
move the old file to another server (automatically via a timer) and
compress it. Text compression averages out at about 10:1, probably more
as you would have a LOT of repeating data.

Your incoming server would not fill up, and you can swap out all the
drives you need (or to DVD) on the compression server.


I'm storing data in a MySQL (or PostgreSQL) DBMS.
I cannot use flat files, because the position history must be available
for queries, reports, and statistical analysis...
 
carmelo

You only need the $GPRMC sentence to track a vehicle, which is about
65-70 characters depending on where you are (and hence the size of the
lat/lon strings). What else are you doing that prevents you from ignoring
the other sentences?

By ignoring everything except $GPRMC you get just under 6 MB/day per GPS
receiver, or 0.6 GB per day for the hundred devices. A bit better than you
suggest, though putting it in a database probably doubles that figure.


My GPS device gives me an NMEA sentence of this type:

$GPGGA,171741,3019.3909,N,09741.8629,W,2,08,00.9,+00180,M,x.x,M,003,0800*78

It's about 77 characters.

I made some tests storing these data in a MySQL DB. Initially the
Avg_row_length was 277. When the size of the DB grows and I have 1
million records, Avg_row_length becomes about 135. Therefore, as you
said, "putting it in a database probably doubles that figure".
According to my calculations, the space required for storing the
position of 100 cars every 10 seconds is 41.118 GB/year (without
considering the space required for indexes).

Working with a 41 GB table is too heavy, so I think I should use:
- a strategy for archiving historical data older than a given date
(maybe copying older data into other tables/databases)
- a strategy for storing current position data (for example, storing
a position only if it differs from the previous one by more than a
given value)

What do you think?
Thank you for your help
 
Lew

carmelo said:
According to my calculations, the space required for storing the
position of 100 cars every 10 seconds is 41.118 GB/year (without
considering the space required for indexes).

Working with a 41 GB table is too heavy, so I think I should use:

Wha...?

41 GB is not a very large database, especially for one year of data.

Why does it scare you?
 
Martin Gregorie

$GPGGA,171741,3019.3909,N,09741.8629,W,2,08,00.9,+00180,M,x.x,M,003,0800*78

Hmm, not Garmin kit, then?

As I said on my previous post:
if I store 1 record every second, it needs 277*60 = 16,620 bytes per
minute, 16,620*60 = 997,200 bytes per hour, and 23,932,800 bytes
≈ 22.8 MB per 24 hours...
We have to monitor about 100 devices, therefore we need 100*22.8 = 2280 MB
every day... it's not so little... :)
Some questions:
- do you need to keep every data point?
- what position and time resolution do your analyses need?
- how often is the data actually read and what key(s) do you need
to access it?

IOW, if it's not analysed very often you could minimise storage
requirements by keeping the sentence in a single field. Since it's
a CSV format, parsing it is fast and processing overheads are small.
You might be able to get away with a small key too,
e.g. car_id, timestamp.

BTW, 'the same position' needs to allow for some jitter but I don't know
how much. If your GPS sends you the EPE figure, maybe 'the same position'
means 'within the EPE radius of the last position', though that's
probably overkill.
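As a sketch of how cheap that on-demand parsing is, here is a minimal (illustrative, deliberately incomplete) $GPRMC field extractor; a production version would also verify the *XX checksum before trusting the fields:

```java
// Illustrative sketch: pull time, position and validity out of a stored
// $GPRMC sentence. Field positions follow the NMEA 0183 RMC layout.
public class NmeaParser {
    record Fix(String utcTime, double lat, double lon, boolean valid) {}

    /** Convert NMEA ddmm.mmmm / dddmm.mmmm plus hemisphere to signed degrees. */
    static double toDegrees(String dddmm, String hemi) {
        int cut = dddmm.indexOf('.') - 2;          // minutes start 2 chars before '.'
        double deg = Double.parseDouble(dddmm.substring(0, cut))
                   + Double.parseDouble(dddmm.substring(cut)) / 60.0;
        return (hemi.equals("S") || hemi.equals("W")) ? -deg : deg;
    }

    static Fix parseRmc(String sentence) {
        String[] f = sentence.split(",");
        // f[1]=UTC hhmmss, f[2]=A(valid)/V(void), f[3..6]=lat/lon with hemispheres
        return new Fix(f[1], toDegrees(f[3], f[4]),
                       toDegrees(f[5], f[6]), f[2].equals("A"));
    }
}
```

Since the heavy lifting is one `split(",")` per row, storing the raw sentence in a single field and parsing at report time costs very little.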
 
Tom Anderson

Good idea.
I think I know what to do in the case of a stopped car, because its
GPS coordinates would be unchanged (or almost unchanged, since the
position at some instants could be calculated wrongly). I can consider
it the same position if the difference between the position at time t
and the position at time t-1 is not greater than a given threshold
value...
But how can I identify rectilinear motion? If the car is moving in a
straight line, it would be redundant to store every position along
it...

You do what Roedy said, and do linear extrapolation. That is, you take the
difference between positions at t-1 and t-2, add that to the position at
t-1, and use that as a prediction for the position at t. If the prediction
is close enough, you don't store the position for t. If it's not, you do
store it.

You have to make sure that when you look at t-1 and t-2 to do this, it's
the t-1 and t-2 that the decoder will get: that is, in the cases where
you've thrown away a measurement, the predicted ones, rather than the
measured ones. Otherwise, you're making predictions based on data the
decoder won't have.

If you want to get even cleverer, you could try using more than the two
last data points: you could use three and fit a parametric quadratic
curve, to exploit information about the current acceleration rate to
hopefully improve your prediction. Or you could have three or more and fit
a straight or quadratic line of best fit, to smooth out measurement
errors.

tom
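Tom's scheme, with the encoder predicting from the reconstructed history so that it stays in lockstep with the decoder, might be sketched like this (all names are illustrative; only the keep/drop decision is shown):

```java
// Illustrative sketch: keep a sample only when linear extrapolation from
// the two previous *reconstructed* points misses it by more than 'tol'.
import java.util.ArrayList;
import java.util.List;

public class ExtrapolatingEncoder {
    record P(double x, double y) {}

    static List<Boolean> encode(List<P> in, double tol) {
        List<Boolean> stored = new ArrayList<>();
        List<P> recon = new ArrayList<>();       // what the decoder will see
        for (P p : in) {
            if (recon.size() < 2) {              // no history yet: must store
                recon.add(p);
                stored.add(true);
                continue;
            }
            P a = recon.get(recon.size() - 2), b = recon.get(recon.size() - 1);
            P pred = new P(2 * b.x - a.x, 2 * b.y - a.y);   // b + (b - a)
            if (Math.hypot(p.x - pred.x, p.y - pred.y) <= tol) {
                recon.add(pred);                 // decoder reconstructs this point
                stored.add(false);
            } else {
                recon.add(p);                    // prediction failed: store verbatim
                stored.add(true);
            }
        }
        return stored;
    }
}
```

Predicting from `recon` rather than the raw measurements is the key point: it guarantees the encoder never drops a sample based on data the decoder won't have.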
 
Mark Thornton

carmelo said:
Hi everybody,
I'm developing a system for monitoring car positions, which acquires
GPS data every second. To optimize disk space, it would be best not to
store all the data, because it is often redundant. For example, if a
car stays in the same position for a long time, or moves in a straight
line, it would be redundant to store a point every second...

Make sure you also check that the time retrieved from the GPS unit has
advanced by 1 second. Some of these units will give the last known
position, and the time at that position, if they have lost lock on the
satellites. If you discard the GPS time without checking it, the results
can be very confusing. Losing GPS lock is quite common.

Mark Thornton
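The check Mark describes could be as simple as this sketch (illustrative names; note the hhmmss string comparison ignores midnight rollover, which a real implementation would have to handle):

```java
// Illustrative sketch: reject repeated fixes from a receiver that has
// lost satellite lock and keeps reporting its last known position/time.
public class FixFreshness {
    private String lastUtc = null;

    /** Returns true if this sentence's UTC field (hhmmss) advanced past
     *  the last accepted one; false means a stale, repeated fix. */
    boolean accept(String utcField) {
        boolean fresh = lastUtc == null || utcField.compareTo(lastUtc) > 0;
        if (fresh) lastUtc = utcField;
        return fresh;
    }
}
```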
 
Wojtek

carmelo wrote:
I'm storing data in a MySQL (or PostgreSQL) DBMS.
I cannot use flat files, because the position history must be available
for queries, reports, and statistical analysis...

Ok, then you have the storage machine periodically read out and delete
"new" records and store them to its database. You can then perform
stats etc on that machine. Setting up a storage network can give you
terabytes of space.

I do advocate moving the data. That way the time-sensitive incoming
data has a small database, so inserts should be quicker. You can
de-normalize the storage database for the reports.
 
