lookup by EnumSet

Discussion in 'Java' started by Roedy Green, Feb 28, 2012.

  1. Roedy Green

    Roedy Green Guest

    I had a long and annoying dream that there was a Java Collection that
    let you look up by EnumSet. It was not a simple Map.

    It worked something like this: You could assign a set of binary
    attributes to a Person, e.g. male/female, fat, thin, average, atheist,
    Christian, Moslem, Jew, Buddhist. Asian, European, African, North
    American, South American..

    Then you could ask for all the fat or average females, Buddhist but
    not Asian.

    You might specify an EnumSet for what you want and one for what you
    don't want. Anything not specified in either does not matter.

    In the dream I was trying to write example code and an entry in the
    Java glossary. When I woke, I could not think of such a class, and
    further it was not obvious how one could be implemented.

    I wondered how you would do it.

    I thought you might extract the attributes into an array of longs and
    check each one for compliance with your masks.

    If the sets were stable, you might extract a BitSet for each
    attribute, and do logical operations on giant bit strings of the
    relevant bits.

    I vaguely recall SQL databases optimising queries of this type by
    transparently building inverse look up indexed.
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    One of the most useful comments you can put in a program is
    "If you change this, remember to change ?XXX? too".
     
    Roedy Green, Feb 28, 2012
    #1
    1. Advertising

  2. Roedy Green

    Daniel Pitts Guest

    On 2/28/12 5:55 AM, Roedy Green wrote:
    > I had a long and annoying dream that there was a Java Collection that
    > let you look up by EnumSet. It was not a simple Map.
    >
    > It worked something like this: You could assign a set of binary
    > attributes to a Person, e.g. male/female, fat, thin, average, atheist,
    > Christian, Moslem, Jew, Buddhist. Asian, European, African, North
    > American, South American..
    >
    > Then you could ask for all the fat or average females, Buddhist but
    > not Asian.
    >
    > You might specify an EnumSet for what you want and one for what you
    > don't want. Anything not specified in either does not matter.
    >
    > In the dream I was trying to write example code and an entry in the
    > Java glossary. When I woke, I could not think of such a class, and
    > further it was not obvious how one could be implemented.
    >
    > I wondered how you would do it.
    >
    > I thought you might extract the attributes into an array of longs and
    > check each one for compliance with your masks.
    >
    > If the sets were stable, you might extract a BitSet for each
    > attribute, and do logical operations on giant bit strings of the
    > relevant bits.
    >
    > I vaguely recall SQL databases optimising queries of this type by
    > transparently building inverse look up indexed.


    As many have said indirectly, it sounds like an RDBMS problem. For a
    small enough data set, you could fit the indexes in memory, either as a
    Collection<Person>, or if you're more space conscious a BitSet where the
    bit number corresponds to a List<Person> index. would be one approach.
    For example, Solr uses such a BitSet of "Document Index" to cache its
    Lucene query results.

    If you need to find intersections or unions, BitSets are fairly cheap if
    your data set is small enough. One technique RDBMS query optimization
    algorithms use is to estimate which index-based query would result in
    the smallest set, and then iterate through each of those, filtering out
    ones that don't match other parts of the query.
     
    Daniel Pitts, Feb 28, 2012
    #2
    1. Advertising

  3. On 28.02.2012 18:31, Lew wrote:
    > Syncleus dANN is a good open-source AI library. I know the guy behind
    > it; he's a genius.
    > <http://www.syncleus.com/>


    Looks great! If only I had the time to look into all these interesting
    developments... Well, better there's too much than too little - avoids
    boredom reliably. :) Thank you Lew for the pointer!

    Cheers

    robert


    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Feb 28, 2012
    #3
  4. On 28.02.2012 14:55, Roedy Green wrote:
    > I had a long and annoying dream that there was a Java Collection that
    > let you look up by EnumSet. It was not a simple Map.
    >
    > It worked something like this: You could assign a set of binary
    > attributes to a Person, e.g. male/female, fat, thin, average, atheist,
    > Christian, Moslem, Jew, Buddhist. Asian, European, African, North
    > American, South American..
    >
    > Then you could ask for all the fat or average females, Buddhist but
    > not Asian.
    >
    > You might specify an EnumSet for what you want and one for what you
    > don't want. Anything not specified in either does not matter.


    You need a multi dimensional index which can efficiently select from any
    subset of dimensions given. Whether properties are boolean or need more
    than one bit to represent is just a minor challenge here.

    > In the dream I was trying to write example code and an entry in the
    > Java glossary. When I woke, I could not think of such a class, and
    > further it was not obvious how one could be implemented.
    >
    > I wondered how you would do it.


    The most straightforward solution in memory would probably be something
    like Map<${PropertyType},Set<Person>> per property. Depending on
    application logic you would build these just once when reading the data
    or update indexes whenever you update fields. Querying would use set
    operations to reduce the set of results.

    > I thought you might extract the attributes into an array of longs and
    > check each one for compliance with your masks.
    >
    > If the sets were stable, you might extract a BitSet for each
    > attribute, and do logical operations on giant bit strings of the
    > relevant bits.


    There would be another use for BitSets: you create a BitSet per instance
    which represents key state. And you store these keys along with the
    data in a Map<BitSet,Person>. Downside: this only works good for static
    data and exact matches.

    > I vaguely recall SQL databases optimising queries of this type by
    > transparently building inverse look up indexed.


    There are two ways with RDBMS:

    1. Create an index for every subset of properties that you want to use
    as filter in a query. Note: indexes whose columns are a subset of
    another index can be left out if you manage to make those columns
    leading columns.

    2. Create a bitmap index (in Oracle) for example. Bitmap indexes in
    Oracle need coarse grained locks and thusly are not suited for OLTP
    applications - they are usually used in DWH applications.

    http://docs.oracle.com/cd/E11882_01/server.112/e25789/indexiot.htm#autoId12

    In memory you could do the same although the bitmap type index would
    probably contain references to objects instead of bits identifying
    rowids. If memory is tight using a BitSet to point into a List<Person>
    or Person[] might be worthwhile.

    Kind regards

    robert


    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Feb 28, 2012
    #4
  5. Roedy Green

    Roedy Green Guest

    On Tue, 28 Feb 2012 07:03:57 -0800, Peter Duniho
    <> wrote, quoted or indirectly quoted
    someone who said :

    >Maybe you're just using the wrong language for this task.


    I don't actually have an application, just a nagging dream, and a
    desire to document common problems.

    However, I sketched a student project a long time ago about computer
    matchmaking that could have used such an algorithm for initial
    filtering before detailed compatibility calculation. I created a
    detailed questionnaire for gay couples. I heard on the CBC radio the
    other day that this scientists are studying the problem of what to
    measure and how to match and publishing learned papers. In Canada it
    is now more important that the traditional ways of getting paired up.

    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    One of the most useful comments you can put in a program is
    "If you change this, remember to change ?XXX? too".
     
    Roedy Green, Feb 28, 2012
    #5
  6. On Tuesday, February 28, 2012 11:08:47 PM UTC+1, Robert Klemme wrote:
    > You need a multi dimensional index which can efficiently select from any
    > subset of dimensions given. Whether properties are boolean or need more
    > than one bit to represent is just a minor challenge here.


    Just stumbled on this: "HyperDex employs a unique multi-dimensional hash function to enable efficient search operations — that is, objects may be retrieved without using the key (PDF) under which they are stored"

    http://hardware.slashdot.org/story/12/02/22/1732221/is-it-time-for-nosql-20

    Kind regards

    robert
     
    Robert Klemme, Mar 1, 2012
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Roedy Green

    EnumSet, what the ?

    Roedy Green, Jul 8, 2005, in forum: Java
    Replies:
    6
    Views:
    1,875
    George Cherry
    Jul 9, 2005
  2. George Cherry

    EnumSet--what the...?

    George Cherry, Jul 9, 2005, in forum: Java
    Replies:
    0
    Views:
    2,370
    George Cherry
    Jul 9, 2005
  3. Roedy Green

    EnumSet Generics puzzle

    Roedy Green, Aug 18, 2005, in forum: Java
    Replies:
    11
    Views:
    1,562
    Thomas Hawtin
    Aug 22, 2005
  4. Ulrich Scholz

    EnumSet + contains: strange behavior

    Ulrich Scholz, May 31, 2006, in forum: Java
    Replies:
    10
    Views:
    1,639
    Christian Kaufhold
    Jun 4, 2006
  5. Eric Smith
    Replies:
    19
    Views:
    2,407
    Eric Smith
    May 17, 2007
Loading...

Share This Page