Re: Is there any library for indexing binary data?

Discussion in 'Python' started by 甜瓜, Mar 25, 2010.

  1. 甜瓜

    甜瓜 Guest

    Well, a database is not suitable because 1. the table is very big
    (~10^9 rows) and 2. we must support very fast *simple* queries, i.e.
    fetching the value corresponding to a single key (~10^7 queries /
    second).

    Currently, I have implemented a specific algorithm for my problem.
    However, I would like to use a library to simplify the coding;
    otherwise I have to write my own code for each big table. It is
    possible that, after switching to an indexing library, the program
    will not run as fast as the homemade code. But if it greatly
    simplifies my job and provides satisfactory speed (e.g. 10^5~10^6
    queries / second), an indexing library is still a good choice for me.
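    Since each record is a fixed-size (id, value) pair, one homemade
    approach along these lines is to pack the records with the `struct`
    module and keep a sorted (id, offset) index for binary search. This
    is only a minimal in-memory sketch; the record layout `"<qd"` and
    the sample data are illustrative assumptions, not the original code:

```python
import bisect
import struct

# Illustrative records matching struct Item { long id; double value; },
# packed little-endian as "<qd" (8-byte int + 8-byte double).
records = [(3, 0.5), (1, 1.25), (7, -2.0)]
blob = b"".join(struct.pack("<qd", i, v) for i, v in records)

# Build a sorted (id, offset) index once; an incremental scheme could
# merge a small sorted delta into this index as new records arrive.
item_size = struct.calcsize("<qd")
index = sorted(
    (struct.unpack_from("<q", blob, off)[0], off)
    for off in range(0, len(blob), item_size)
)
keys = [k for k, _ in index]

def lookup(key):
    """Binary-search the sorted keys, then unpack the record in place."""
    pos = bisect.bisect_left(keys, key)
    if pos < len(keys) and keys[pos] == key:
        _, value = struct.unpack_from("<qd", blob, index[pos][1])
        return value
    return None
```

    The lookup itself is O(log n) with no per-query allocation, which is
    the kind of simple keyed access described above.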

    --
    ShenLei

    2010/3/25 Gabriel Genellina <>:
    > On Thu, 25 Mar 2010 00:28:58 -0300, 甜瓜 <>
    > wrote:
    >
    >> Recently, I have been looking for a good library to build an index on
    >> binary data. The Python bindings for Xapian & Lucene focus on text
    >> digestion rather than binary data. Could anyone give me a
    >> recommendation? Is there any library for indexing binary data, no
    >> matter what language it is written in?
    >>
    >> In my case, there is a very big data table which stores structured
    >> binary data, e.g.:
    >> struct Item
    >> {
    >> long id; // used as key
    >> double value;
    >> };
    >>
    >> I want to build an index on the "id" field to speed up searching.
    >> Since this data table is not constant, the library should support
    >> incremental indexing. If there is no suitable library, I will have
    >> to build the index myself...

    >
    > What about a database?
    >
    > --
    > Gabriel Genellina
    >
    > --
    > http://mail.python.org/mailman/listinfo/python-list
    >
    甜瓜, Mar 25, 2010
    #1

  2. Paul Rubin

    Paul Rubin Guest

    甜瓜 <> writes:
    > Well, a database is not suitable because 1. the table is very big
    > (~10^9 rows) and 2. we must support very fast *simple* queries, i.e.
    > fetching the value corresponding to a single key (~10^7 queries /
    > second).


    Just one numeric key/value pair in each row? What's wrong with
    universal hashing?

    PyJudy might also be of interest:
    http://www.dalkescientific.com/Python/PyJudy.html
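    For reference, the universal-hashing suggestion can be sketched with
    the classic family h(x) = ((a*x + b) mod p) mod m for integer keys.
    The prime, table size, and chained-bucket scheme below are
    illustrative assumptions, not PyJudy's internals:

```python
import random

P = (1 << 61) - 1  # a Mersenne prime larger than any key we expect
M = 1 << 16        # number of buckets (illustrative)

# Drawing a, b at random from the family gives the universality
# guarantee: for any two distinct keys, Pr[collision] <= 1/M.
random.seed(0)  # seeded only so the sketch is reproducible
a = random.randrange(1, P)
b = random.randrange(0, P)

def h(key):
    return ((a * key + b) % P) % M

buckets = [[] for _ in range(M)]

def put(key, value):
    """Insert or overwrite a key/value pair in its chained bucket."""
    bucket = buckets[h(key)]
    for i, (k, _) in enumerate(bucket):
        if k == key:
            bucket[i] = (key, value)
            return
    bucket.append((key, value))

def get(key, default=None):
    """Expected O(1) lookup: hash to a bucket, scan its short chain."""
    for k, v in buckets[h(key)]:
        if k == key:
            return v
    return default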
    Paul Rubin, Mar 25, 2010
    #2

  3. 甜瓜

    甜瓜 Guest

    Thank you, Rubin! Let me have a look at Judy; it seems good at first glance.

    --
    ShenLei

    2010/3/25 Paul Rubin <>:
    > 甜瓜 <> writes:
    >> Well, a database is not suitable because 1. the table is very big
    >> (~10^9 rows) and 2. we must support very fast *simple* queries, i.e.
    >> fetching the value corresponding to a single key (~10^7 queries /
    >> second).

    >
    > Just one numeric key/value pair in each row? What's wrong with
    > universal hashing?
    >
    > PyJudy might also be of interest:
    > http://www.dalkescientific.com/Python/PyJudy.html
    > --
    > http://mail.python.org/mailman/listinfo/python-list
    >
    甜瓜, Mar 25, 2010
    #3
  4. John Nagle

    John Nagle Guest

    甜瓜 wrote:
    > Well, a database is not suitable because 1. the table is very big
    > (~10^9 rows) and 2. we must support very fast *simple* queries, i.e.
    > fetching the value corresponding to a single key (~10^7 queries /
    > second).


    Ah, crypto rainbow tables.

    John Nagle
    John Nagle, Mar 26, 2010
    #4
  5. In article <>,
    甜瓜 <> wrote:
    >Well, a database is not suitable because 1. the table is very big
    >(~10^9 rows) and 2. we must support very fast *simple* queries, i.e.
    >fetching the value corresponding to a single key (~10^7 queries /
    >second).
    >
    >Currently, I have implemented a specific algorithm for my problem.
    >However, I would like to use a library to simplify the coding;
    >otherwise I have to write my own code for each big table. It is
    >possible that, after switching to an indexing library, the program
    >will not run as fast as the homemade code. But if it greatly
    >simplifies my job and provides satisfactory speed (e.g. 10^5~10^6
    >queries / second), an indexing library is still a good choice for me.


    At first sight this looks to me like a case where B-trees would be an
    ideal solution. The first levels of the tree stay in memory, so with
    some luck you need only one or two disk accesses per search.
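    A minimal sketch of the same idea, assuming the fixed-size records are
    kept sorted by id on disk: a plain binary search over the file needs
    only O(log n) seeks, and the OS page cache keeps the most-visited
    "upper levels" in memory much as a B-tree would. The file location and
    the `"<qd"` record layout are illustrative assumptions:

```python
import os
import struct
import tempfile

REC = struct.Struct("<qd")  # long id + double value, 16 bytes per record

def write_sorted(path, pairs):
    """Write (id, value) pairs to disk sorted by id, fixed-size records."""
    with open(path, "wb") as f:
        for key, value in sorted(pairs):
            f.write(REC.pack(key, value))

def search(path, key):
    """Binary search directly on the file: O(log n) seeks per lookup."""
    n = os.path.getsize(path) // REC.size
    with open(path, "rb") as f:
        lo, hi = 0, n
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * REC.size)
            k, v = REC.unpack(f.read(REC.size))
            if k == key:
                return v
            if k < key:
                lo = mid + 1
            else:
                hi = mid
    return None

# Illustrative demo data.
path = os.path.join(tempfile.gettempdir(), "demo_index.bin")
write_sorted(path, [(7, -2.0), (1, 1.25), (3, 0.5)])
```

    A real B-tree improves on this by widening each node so the tree is
    only a few levels deep, but the access pattern is the same: a handful
    of page reads per query.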

    >
    >--
    >ShenLei
    >


    Groetjes Albert

    --
    Albert van der Horst, UTRECHT,THE NETHERLANDS
    Economic growth -- being exponential -- ultimately falters.
    albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
    Albert van der Horst, Apr 4, 2010
    #5
