lookup by EnumSet

R

Roedy Green

I had a long and annoying dream that there was a Java Collection that
let you look up by EnumSet. It was not a simple Map.

It worked something like this: You could assign a set of binary
attributes to a Person, e.g. male/female, fat, thin, average, atheist,
Christian, Moslem, Jew, Buddhist. Asian, European, African, North
American, South American..

Then you could ask for all the fat or average females, Buddhist but
not Asian.

You might specify an EnumSet for what you want and one for what you
don't want. Anything not specified in either does not matter.

In the dream I was trying to write example code and an entry in the
Java glossary. When I woke, I could not think of such a class, and
further it was not obvious how one could be implemented.

I wondered how you would do it.

I thought you might extract the attributes into an array of longs and
check each one for compliance with your masks.

If the sets were stable, you might extract a BitSet for each
attribute, and do logical operations on giant bit strings of the
relevant bits.

I vaguely recall SQL databases optimising queries of this type by
transparently building inverse look up indexed.
 
D

Daniel Pitts

I had a long and annoying dream that there was a Java Collection that
let you look up by EnumSet. It was not a simple Map.

It worked something like this: You could assign a set of binary
attributes to a Person, e.g. male/female, fat, thin, average, atheist,
Christian, Moslem, Jew, Buddhist. Asian, European, African, North
American, South American..

Then you could ask for all the fat or average females, Buddhist but
not Asian.

You might specify an EnumSet for what you want and one for what you
don't want. Anything not specified in either does not matter.

In the dream I was trying to write example code and an entry in the
Java glossary. When I woke, I could not think of such a class, and
further it was not obvious how one could be implemented.

I wondered how you would do it.

I thought you might extract the attributes into an array of longs and
check each one for compliance with your masks.

If the sets were stable, you might extract a BitSet for each
attribute, and do logical operations on giant bit strings of the
relevant bits.

I vaguely recall SQL databases optimising queries of this type by
transparently building inverse look up indexed.

As many have said indirectly, it sounds like an RDBMS problem. For a
small enough data set, you could fit the indexes in memory, either as a
Collection<Person>, or if you're more space conscious a BitSet where the
bit number corresponds to a List<Person> index. would be one approach.
For example, Solr uses such a BitSet of "Document Index" to cache its
Lucene query results.

If you need to find intersections or unions, BitSets are fairly cheap if
your data set is small enough. One technique RDBMS query optimization
algorithms use is to estimate which index-based query would result in
the smallest set, and then iterate through each of those, filtering out
ones that don't match other parts of the query.
 
R

Robert Klemme

Syncleus dANN is a good open-source AI library. I know the guy behind
it; he's a genius.
<http://www.syncleus.com/>

Looks great! If only I had the time to look into all these interesting
developments... Well, better there's too much than too little - avoids
boredom reliably. :) Thank you Lew for the pointer!

Cheers

robert
 
R

Robert Klemme

I had a long and annoying dream that there was a Java Collection that
let you look up by EnumSet. It was not a simple Map.

It worked something like this: You could assign a set of binary
attributes to a Person, e.g. male/female, fat, thin, average, atheist,
Christian, Moslem, Jew, Buddhist. Asian, European, African, North
American, South American..

Then you could ask for all the fat or average females, Buddhist but
not Asian.

You might specify an EnumSet for what you want and one for what you
don't want. Anything not specified in either does not matter.

You need a multi dimensional index which can efficiently select from any
subset of dimensions given. Whether properties are boolean or need more
than one bit to represent is just a minor challenge here.
In the dream I was trying to write example code and an entry in the
Java glossary. When I woke, I could not think of such a class, and
further it was not obvious how one could be implemented.

I wondered how you would do it.

The most straightforward solution in memory would probably be something
like Map<${PropertyType},Set<Person>> per property. Depending on
application logic you would build these just once when reading the data
or update indexes whenever you update fields. Querying would use set
operations to reduce the set of results.
I thought you might extract the attributes into an array of longs and
check each one for compliance with your masks.

If the sets were stable, you might extract a BitSet for each
attribute, and do logical operations on giant bit strings of the
relevant bits.

There would be another use for BitSets: you create a BitSet per instance
which represents key state. And you store these keys along with the
I vaguely recall SQL databases optimising queries of this type by
transparently building inverse look up indexed.

There are two ways with RDBMS:

1. Create an index for every subset of properties that you want to use
as filter in a query. Note: indexes whose columns are a subset of
another index can be left out if you manage to make those columns
leading columns.

2. Create a bitmap index (in Oracle) for example. Bitmap indexes in
Oracle need coarse grained locks and thusly are not suited for OLTP
applications - they are usually used in DWH applications.

http://docs.oracle.com/cd/E11882_01/server.112/e25789/indexiot.htm#autoId12

In memory you could do the same although the bitmap type index would
probably contain references to objects instead of bits identifying
rowids. If memory is tight using a BitSet to point into a List<Person>
or Person[] might be worthwhile.

Kind regards

robert
 
R

Roedy Green

Maybe you're just using the wrong language for this task.

I don't actually have an application, just a nagging dream, and a
desire to document common problems.

However, I sketched a student project a long time ago about computer
matchmaking that could have used such an algorithm for initial
filtering before detailed compatibility calculation. I created a
detailed questionnaire for gay couples. I heard on the CBC radio the
other day that this scientists are studying the problem of what to
measure and how to match and publishing learned papers. In Canada it
is now more important that the traditional ways of getting paired up.
 
R

Robert Klemme

You need a multi dimensional index which can efficiently select from any
subset of dimensions given. Whether properties are boolean or need more
than one bit to represent is just a minor challenge here.

Just stumbled on this: "HyperDex employs a unique multi-dimensional hash function to enable efficient search operations — that is, objects may be retrieved without using the key (PDF) under which they are stored"

http://hardware.slashdot.org/story/12/02/22/1732221/is-it-time-for-nosql-20

Kind regards

robert
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,764
Messages
2,569,564
Members
45,041
Latest member
RomeoFarnh

Latest Threads

Top