- Re: Data/File Structure and Algorithm for Retrieving Sorted Data Chunk Efficiently

Discussion in 'Python' started by Jane Austine, Oct 5, 2004.

  1. Jane Austine

    Jane Austine Guest

    Peter L. Hansen wrote:
    >
    > Jane Austine wrote:
    > > I have student_names and their scores, which are between 0..100.
    > >
    > > I want to retrieve in the sorted order in terms of score, Nth
    > > student's name to N+10th student's name along with their scores.
    > > (suppose a web page showing top 10 students and there are "next page"
    > > link and if you follow the link there shows next top 10 students with
    > > "prev" and "next" page links.)
    > >
    > > The list is dynamically resized -- elements are added or deleted. The
    > > list could go fairly long enough that pulling all the data on memory
    > > at once is not a good choice.

    >
    > "Fairly long enough"? I imagine you'd have to have thousands of
    > students before just doing the simplest thing (ignore performance
    > and read the whole list) would become a noticable problem.


    First of all, thank you for replying me.

    Yes, I do have more than a thousand.

    >
    > It's rare in an environment like the web, with all the user and
    > network delays that are present, that optimizing at the level
    > you are trying to do is worth the effort.
    >


    Yes. Especially when there are CPU load limits(process going on like a
    few seconds using more than 10% will get warning) on the server, and
    it is shared among a few users(using a web hosting service).

    > > What algorithm and data/file structure are most efficient? (the
    > > environment is "cgi" based web environment)

    >
    > And unless mod_python is involved, CGI adds yet another significant
    > delay, masking any slight difference in performance from the optimal
    > approach and the simplistic one.
    >


    True. So the situation is not very good, and that's why I need a
    clever scheme. :)

    > > Simplest approach would be using btree from bsddb.

    >
    > Nope, simplest would be just use a list and ignore the theoretical
    > but unnoticable performance issues. Remember, the actual memory
    > allocation and manipulations of the list elements happens in C code,
    > not in Python, so it is very fast.
    >
    > > What can I do instead? What data structure allows you to access
    > > nth..n+10th data from the sorted list in a constant time?

    >
    > If you've got more than 1000 students, it _might_ be worth
    > analyzing this further. :)
    >


    Actually, it's not exactly students. They are some kind of elements
    and they have a few attributes(columns) like timestamp and etc, and
    can be sorted by a few columns(single key sorting though).

    Most of the time I suppose the lists would grow up to 10000 ~ 90000
    elements.

    -Jane
    Jane Austine, Oct 5, 2004
    #1
    1. Advertising

  2. Re: Data/File Structure and Algorithm for Retrieving Sorted Data Chunk Efficiently

    "Jane Austine" <> wrote in message
    news:...
    > Peter L. Hansen wrote:
    > >
    > > Jane Austine wrote:
    > > > I have student_names and their scores, which are between 0..100.
    > > >
    > > > I want to retrieve in the sorted order in terms of score, Nth
    > > > student's name to N+10th student's name along with their scores.
    > > > (suppose a web page showing top 10 students and there are "next page"
    > > > link and if you follow the link there shows next top 10 students with
    > > > "prev" and "next" page links.)
    > > >
    > > > The list is dynamically resized -- elements are added or deleted. The
    > > > list could go fairly long enough that pulling all the data on memory
    > > > at once is not a good choice.

    > >
    > > "Fairly long enough"? I imagine you'd have to have thousands of
    > > students before just doing the simplest thing (ignore performance
    > > and read the whole list) would become a noticable problem.

    >
    > First of all, thank you for replying me.
    >
    > Yes, I do have more than a thousand.
    >
    > >
    > > It's rare in an environment like the web, with all the user and
    > > network delays that are present, that optimizing at the level
    > > you are trying to do is worth the effort.
    > >

    >
    > Yes. Especially when there are CPU load limits(process going on like a
    > few seconds using more than 10% will get warning) on the server, and
    > it is shared among a few users(using a web hosting service).
    >
    > > > What algorithm and data/file structure are most efficient? (the
    > > > environment is "cgi" based web environment)

    > >
    > > And unless mod_python is involved, CGI adds yet another significant
    > > delay, masking any slight difference in performance from the optimal
    > > approach and the simplistic one.
    > >

    >
    > True. So the situation is not very good, and that's why I need a
    > clever scheme. :)
    >
    > > > Simplest approach would be using btree from bsddb.

    > >
    > > Nope, simplest would be just use a list and ignore the theoretical
    > > but unnoticable performance issues. Remember, the actual memory
    > > allocation and manipulations of the list elements happens in C code,
    > > not in Python, so it is very fast.
    > >
    > > > What can I do instead? What data structure allows you to access
    > > > nth..n+10th data from the sorted list in a constant time?

    > >
    > > If you've got more than 1000 students, it _might_ be worth
    > > analyzing this further. :)
    > >

    >
    > Actually, it's not exactly students. They are some kind of elements
    > and they have a few attributes(columns) like timestamp and etc, and
    > can be sorted by a few columns(single key sorting though).
    >
    > Most of the time I suppose the lists would grow up to 10000 ~ 90000
    > elements.
    >
    > -Jane


    If using standalone RDBMS engine(like MySQL) is not your choice, and you
    want to stick to bsddb, take a look at test_basics.py in the lib/bsddb/test
    directory. There you will see BTreeRecnoTestCase.
    Changjune Kim, Oct 5, 2004
    #2
    1. Advertising

  3. Re: Data/File Structure and Algorithm for Retrieving Sorted Data Chunk Efficiently

    "Jane Austine" <> wrote in message
    news:...
    > Peter L. Hansen wrote:
    > >
    > > Jane Austine wrote:
    > > > I have student_names and their scores, which are between 0..100.
    > > >
    > > > I want to retrieve in the sorted order in terms of score, Nth
    > > > student's name to N+10th student's name along with their scores.
    > > > (suppose a web page showing top 10 students and there are "next page"
    > > > link and if you follow the link there shows next top 10 students with
    > > > "prev" and "next" page links.)
    > > >
    > > > The list is dynamically resized -- elements are added or deleted. The
    > > > list could go fairly long enough that pulling all the data on memory
    > > > at once is not a good choice.

    > >
    > > "Fairly long enough"? I imagine you'd have to have thousands of
    > > students before just doing the simplest thing (ignore performance
    > > and read the whole list) would become a noticable problem.

    >
    > First of all, thank you for replying me.
    >
    > Yes, I do have more than a thousand.
    >
    > >
    > > It's rare in an environment like the web, with all the user and
    > > network delays that are present, that optimizing at the level
    > > you are trying to do is worth the effort.
    > >

    >
    > Yes. Especially when there are CPU load limits(process going on like a
    > few seconds using more than 10% will get warning) on the server, and
    > it is shared among a few users(using a web hosting service).
    >
    > > > What algorithm and data/file structure are most efficient? (the
    > > > environment is "cgi" based web environment)

    > >
    > > And unless mod_python is involved, CGI adds yet another significant
    > > delay, masking any slight difference in performance from the optimal
    > > approach and the simplistic one.
    > >

    >
    > True. So the situation is not very good, and that's why I need a
    > clever scheme. :)
    >
    > > > Simplest approach would be using btree from bsddb.

    > >
    > > Nope, simplest would be just use a list and ignore the theoretical
    > > but unnoticable performance issues. Remember, the actual memory
    > > allocation and manipulations of the list elements happens in C code,
    > > not in Python, so it is very fast.
    > >
    > > > What can I do instead? What data structure allows you to access
    > > > nth..n+10th data from the sorted list in a constant time?

    > >
    > > If you've got more than 1000 students, it _might_ be worth
    > > analyzing this further. :)
    > >

    >
    > Actually, it's not exactly students. They are some kind of elements
    > and they have a few attributes(columns) like timestamp and etc, and
    > can be sorted by a few columns(single key sorting though).
    >
    > Most of the time I suppose the lists would grow up to 10000 ~ 90000
    > elements.
    >
    > -Jane


    If using standalone RDBMS engine(like MySQL) is not your choice, and you
    want to stick to bsddb, take a look at test_basics.py in the lib/bsddb/test
    directory. There you will see BTreeRecnoTestCase.
    Changjune Kim, Oct 5, 2004
    #3
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Jane Austine
    Replies:
    14
    Views:
    769
    Dennis Lee Bieber
    Oct 9, 2004
  2. Hunk
    Replies:
    4
    Views:
    361
  3. Replies:
    6
    Views:
    1,981
  4. Sanjeeb
    Replies:
    3
    Views:
    408
    Ryan Kelly
    Aug 3, 2010
  5. bwv549
    Replies:
    3
    Views:
    409
    Eleanor McHugh
    Jun 17, 2009
Loading...

Share This Page