A faster way of finding historical highs/lows


Eamonn Sullivan

This is a basic algorithm question (from a history major who never
took an algorithms course), but I'm also looking for the fastest way to
do this in Python -- for example, by making the most of the library
routines that are implemented in C. I have two searches in an
application that I'm trying to speed up:

1. Find the most recent date when there was an equal or higher (or
lower) value than X. The data (stock, currency and commodity prices)
looks like:

Col A     Col B     (a few other columns of prices)
Date      Price

Right now, the working code just goes through chronologically and
returns the value when the condition is met:

for item in price_list:
    if val <= item['price']:
        # return the first row that meets the condition
        return item['date'], item['price']

I'm wondering if anyone has any ideas about a faster way that a) won't
unnecessarily slow down the simple cases (highest since yesterday, or
last week), and b) will speed up the edge cases (highest/lowest since
1947...).

2. Find the last time the price fell/rose X or more times in a row.

Same data format.

Right now, this is implemented inefficiently, something like this:

# get_streak is the app's own helper; assumed here to return the length
# of the streak at the start of the list it is given.
streak_now = get_streak(price_list)
i = 1
while i < len(price_list):
    streak_then = get_streak(price_list[i:])
    if streak_then >= streak_now:
        return price_list[i]['date'], streak_then
    i += 1

That works OK most of the time because long streaks are rare, so a
match usually turns up within a few months of data. But edge cases can
be extreme -- many years -- and take a while (a minute, say) to find.

In either of these, the data can be quite long (time-wise), but not
large, so it would be possible to copy, reorder and find the answer
that way. Any ideas?
 

Peter Hansen

Eamonn said:
1. Find the most recent date when there was an equal or higher (or
lower) value than X.

The fastest algorithm might depend on how you use the data, as well.
For example, do you update the data often, and search it rarely,
or update it rarely and do the search very often? If searching
many times between updates, some preprocessing will likely make things
go much faster.

Both of your examples sound to me like they would benefit by
using sort(), then a binary search. Sort is very fast relative
to things like the Python loops you are doing, so using it to
prepare the data before the search can be a good step.
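
A minimal sketch of that idea, using the bisect module (this is not the
original code; it assumes the rows are dicts with 'price' and 'date'
keys, and that the dates compare correctly, e.g. datetime objects):

    import bisect

    def build_index(price_list):
        # Preprocess once: sort (price, date) pairs by price.
        pairs = sorted((row['price'], row['date']) for row in price_list)
        prices = [p for p, d in pairs]
        # Walk backwards so latest[i] ends up holding the most recent
        # date among pairs[i:]; each query is then O(1) after the bisect.
        latest = []
        best = None
        for p, d in reversed(pairs):
            best = d if best is None or d > best else best
            latest.append(best)
        latest.reverse()
        return prices, latest

    def most_recent_at_or_above(prices, latest, val):
        i = bisect.bisect_left(prices, val)   # first index with price >= val
        if i == len(prices):
            return None                       # price has never been that high
        return latest[i]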

-Peter
 

Humpdydum

Peter Hansen said:
The fastest algorithm might depend on how you use the data, as well.
For example, do you update the data often, and search it rarely,
or update it rarely and do the search very often? If searching
many times between updates, some preprocessing will likely make things
go much faster.

Both of your examples sound to me like they would benefit by
using sort(), then a binary search. Sort is very fast relative
to things like the Python loops you are doing, so using it to
prepare the data before the search can be a good step.

One thing to keep in mind there is that it depends on the "average"
number of iterations you expect to have to make before finding a value. If
you only have to go back about 50 data points on average to find a peak,
sorting 100000 items might be overkill. Yet if you sort once and do a large
number of searches, it may well be worth it. And if new values can be
inserted into an already-sorted container in the right spot as they arrive,
you only need to sort once, too. Etc.
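
As a small illustration of that last point (a sketch assuming a bare
list of prices), the bisect module can handle the "right spot"
insertion:

    import bisect

    prices = []                    # kept sorted at all times
    for p in (10.5, 9.8, 11.2):
        bisect.insort(prices, p)   # insert each new value in sorted order
    # prices == [9.8, 10.5, 11.2] -- no full re-sort needed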

Oliver
 

Eamonn Sullivan

Peter Hansen said:
The fastest algorithm might depend on how you use the data, as well.
For example, do you update the data often, and search it rarely,
or update it rarely and do the search very often? If searching
many times between updates, some preprocessing will likely make things
go much faster.

Both of your examples sound to me like they would benefit by
using sort(), then a binary search. Sort is very fast relative
to things like the Python loops you are doing, so using it to
prepare the data before the search can be a good step.

-Peter

Thanks for this. At the moment, the software answers a few questions
(highest/lowest, longest streak, etc.) once per retrieval of data. The
database retrieval, though, is *by far* the biggest time sapper, so a
little preprocessing would be almost unnoticeable in comparison,
probably (depends on how much, of course).

So, I'm guessing I could sort on the particular price field I'm using
(decorate-sort-undecorate), and then just find the most recent date
among the subset of data that meets the criteria (higher or lower). Is
that what you mean? By binary search, do you mean further reorganizing
the data into a binary tree using date?
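
For what it's worth, a rough sketch of that decorate-sort-undecorate
step, using the dict rows from the first post:

    # Decorate each row with its sort key (plus the index, so equal prices
    # never fall through to comparing the dicts), sort, then undecorate.
    decorated = [(row['price'], i, row) for i, row in enumerate(price_list)]
    decorated.sort()
    by_price = [row for price, i, row in decorated]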
 

David Fraser

Eamonn said:
Thanks for this. At the moment, the software answers a few questions
(highest/lowest, longest streak, etc.) once per retrieval of data. The
database retrieval, though, is *by far* the biggest time sapper, so a
little preprocessing would be almost unnoticeable in comparison,
probably (depends on how much, of course).

So, I'm guessing I could sort on the particular price field I'm using
(decorate-sort-undecorate), and then just find the most recent date
among the subset of data that meets the criteria (higher or lower). Is
that what you mean? By binary search, do you mean further reorganizing
the data into a binary tree using date?

If you have this in a relational database, you might find that the
database can answer the question quicker for you, using indexes if
necessary ...

select max(xxx) from yyy where zzz > 40

with an index on xxx and zzz will usually be done quickly internally by
the database, and then you just get the result returned rather than
having to process it yourself.

David
 

Eamonn Sullivan

David Fraser said:
If you have this in a relational database, you might find that the
database can answer the question quicker for you, using indexes if
necessary ...
select max(xxx) from yyy where zzz > 40
with an index on xxx and zzz will usually be done quickly internally by
the database, and then you just get the result returned rather than
having to process it

David


Unfortunately, I don't have that level of control. It's market data
through an ActiveX control (on Windows) or C API (elsewhere) with only
a handful of calls -- e.g. GetHistoricalData or Subscribe (for
realtime). So, what I do is pull down the history (up to the present
moment) on a security for a number of price fields and then answer a
bunch of questions on a local copy: high/low value since, biggest
gain/loss since, longest streak since, percentage of average volume,
etc. The user may then click a Refresh button (to fill today's field with
fresher data) and then typically moves on to another security. In most
cases, the hourglass stays on for 3-4 seconds or less to answer *all*
of the questions, or as long as 15 seconds if, for example, it's the
longest streak in eons.

The reason I'm doing this, though, is that I'd like to eventually make
the Refresh button unnecessary -- automatically pull in near realtime
data as it happens. The several seconds then become a bit too much
overhead. I know I can do other tricks (if the value hasn't exceeded a
certain window, then the high/low since doesn't need recalculating),
but I'm taking the problem in bits and seeing if I can improve the
searching algorithms first.
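
One possible reading of that shortcut, as a hypothetical sketch (the
names are made up, not the app's actual code):

    def needs_recalc(new_price, low_seen, high_seen):
        # While new ticks stay inside the band already seen, the
        # "highest/lowest since ..." answers cannot have changed.
        return new_price < low_seen or new_price > high_seen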
 

Peter Hansen

Eamonn said:
Thanks for this. At the moment, the software answers a few questions
(highest/lowest, longest streak, etc.) once per retrieval of data. The
database retrieval, though, is *by far* the biggest time sapper, so a
little preprocessing would be almost unnoticeable in comparison,
probably (depends on how much, of course).

So, I'm guessing I could sort on the particular price field I'm using
(decorate-sort-undecorate), and then just find the most recent date
among the subset of data that meets the criteria (higher or lower). Is
that what you mean?

That's exactly the type of thing I meant, and even the specific
improvement I was considering.

By binary search, do you mean further reorganizing
the data into a binary tree using date?

The binary search is just a much faster way of homing in on a region
of the data than most other approaches. If the data is sorted by
price, a binary search gets you to the chunk with prices equal
to what you want. The last one (assuming ascending dates) is the
"most recent with an equal or lower price", or if there is none
equal, pick the one to the left of the final binary search point...
that kind of thing.

For a binary search, you just examine the data element halfway between
two end points each time, comparing it with the search target.
Decide whether you need to continue searching to the left or the right of
the current position, then pick a point halfway into _that_ region
and do it again. Eventually you are down to a region with a single
point, and then you're done. It takes at most about log2(n) comparisons
to search n points, and often fewer...
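
In code, that halving procedure comes out roughly like this (a
bare-bones sketch over a list already sorted ascending; it does the
same job as bisect.bisect_left in the standard library):

    def first_at_or_above(prices, target):
        lo, hi = 0, len(prices)       # search the half-open range [lo, hi)
        while lo < hi:
            mid = (lo + hi) // 2      # element halfway between the end points
            if prices[mid] < target:
                lo = mid + 1          # continue searching to the right
            else:
                hi = mid              # continue searching to the left
        return lo                     # index of the first price >= target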

-Peter
 
