Re: Get reference to object in Set

Discussion in 'Java' started by Knute Johnson, Mar 5, 2009.

  1. Patricia Shanahan wrote:
    > I think this question has already been asked, but I have failed to work
    > out a good search term to find the previous discussion.
    >
    > Suppose I have a Set<MyType> mySet and a MyType myObject.
    > mySet.contains(myObject) is true. I need a reference to the equal object
    > that is already in mySet.
    >
    > Currently, I do this by maintaining a Map<MyType,MyType>. When the get
    > call returns null I do a myMap.put(myObject, myObject), so that future
    > get calls with equal objects will return the actual reference.
    >
    > Is there a better way to handle this? Any way to make a Set return a
    > reference to the object it contains?
    >
    > Patricia


    How does .contains() find the Object? I would guess that in an
    unordered Set it must iterate.

    If you are using a Navigable or SortedSet you could use the exclusive
    subSet() method.

    --

    Knute Johnson
    email s/nospam/knute2009/

    --
    Posted via NewsDemon.com - Premium Uncensored Newsgroup Service
    ------->>>>>>http://www.NewsDemon.com<<<<<<------
    Unlimited Access, Anonymous Accounts, Uncensored Broadband Access
     
    Knute Johnson, Mar 5, 2009
    #1
    1. Advertising

  2. Knute Johnson

    Tom Anderson Guest

    On Thu, 5 Mar 2009, Knute Johnson wrote:

    > Patricia Shanahan wrote:
    >> I think this question has already been asked, but I have failed to work
    >> out a good search term to find the previous discussion.
    >>
    >> Suppose I have a Set<MyType> mySet and a MyType myObject.
    >> mySet.contains(myObject) is true. I need a reference to the equal object
    >> that is already in mySet.
    >>
    >> Currently, I do this by maintaining a Map<MyType,MyType>. When the get
    >> call returns null I do a myMap.put(myObject, myObject), so that future
    >> get calls with equal objects will return the actual reference.
    >>
    >> Is there a better way to handle this? Any way to make a Set return a
    >> reference to the object it contains?

    >
    > How does .contains() find the Object? I would guess that in an
    > unordered Set it must iterate.


    Well, in a HashSet, it uses a hashtable.

    > If you are using a Navigable or SortedSet you could use the exclusive
    > subSet() method.


    Brilliant! Even better, with a NavigableSet, you can just call ceiling or
    floor with your object - for an object which you know is in the set, those
    will return that object.

    tom

    --
    20 Minutes into the Future
     
    Tom Anderson, Mar 5, 2009
    #2
    1. Advertising

  3. Knute Johnson

    Guest

    On Mar 5, 5:24 pm, Knute Johnson <>
    wrote:
    > Patricia Shanahan wrote:
    > > I think this question has already been asked, but I have failed to work
    > > out a good search term to find the previous discussion.

    >
    > > Suppose I have a Set<MyType> mySet and a MyType myObject.
    > > mySet.contains(myObject) is true. I need a reference to the equal object
    > > that is already in mySet.

    >
    > > Currently, I do this by maintaining a Map<MyType,MyType>. When the get
    > > call returns null I do a myMap.put(myObject, myObject), so that future
    > > get calls with equal objects will return the actual reference.

    >
    > > Is there a better way to handle this? Any way to make a Set return a
    > > reference to the object it contains?

    >
    > > Patricia

    >
    > How does .contains() find the Object? I would guess that in an
    > unordered Set it must iterate.


    Isn't kinda the whole point of a set/map to use hashes
    to have constant time for such operations?

    For example, from the Javadoc for HashSet (which states
    that it makes no guarantees regarding as to the iteration
    order over the set -- which I then deduce match your
    definition of an unordered set):

    * This class offers constant time performance for the
    * basic operations add, remove, contains and size,
    * assuming the hash function disperses the elements
    * properly among the buckets.
     
    , Mar 6, 2009
    #3
  4. Knute Johnson

    Daniel Pitts Guest

    Patricia Shanahan wrote:
    > wrote:
    >> On Mar 5, 5:24 pm, Knute Johnson <>
    >> wrote:
    >>> Patricia Shanahan wrote:
    >>>> I think this question has already been asked, but I have failed to work
    >>>> out a good search term to find the previous discussion.
    >>>> Suppose I have a Set<MyType> mySet and a MyType myObject.
    >>>> mySet.contains(myObject) is true. I need a reference to the equal
    >>>> object
    >>>> that is already in mySet.
    >>>> Currently, I do this by maintaining a Map<MyType,MyType>. When the get
    >>>> call returns null I do a myMap.put(myObject, myObject), so that future
    >>>> get calls with equal objects will return the actual reference.
    >>>> Is there a better way to handle this? Any way to make a Set return a
    >>>> reference to the object it contains?
    >>>> Patricia
    >>> How does .contains() find the Object? I would guess that in an
    >>> unordered Set it must iterate.

    >>
    >> Isn't kinda the whole point of a set/map to use hashes
    >> to have constant time for such operations?

    >
    > True for HashSet and HashMap, but not necessarily true in general.
    > TreeSet, for example, has O(log n) time for contains.
    >
    > Patricia

    "X is true for Apples"
    "No, X is definitely not true for Oranges!"

    neuneudr was talking specifically about unordered sets and maps, which
    excludes tree sets and tree maps..


    --
    Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
     
    Daniel Pitts, Mar 6, 2009
    #4
  5. Knute Johnson

    Tom Anderson Guest

    On Fri, 6 Mar 2009, Daniel Pitts wrote:

    > Patricia Shanahan wrote:
    >> wrote:
    >>> On Mar 5, 5:24 pm, Knute Johnson <>
    >>> wrote:
    >>>> Patricia Shanahan wrote:
    >>>>> I think this question has already been asked, but I have failed to work
    >>>>> out a good search term to find the previous discussion.
    >>>>> Suppose I have a Set<MyType> mySet and a MyType myObject.
    >>>>> mySet.contains(myObject) is true. I need a reference to the equal object
    >>>>> that is already in mySet.
    >>>>> Currently, I do this by maintaining a Map<MyType,MyType>. When the get
    >>>>> call returns null I do a myMap.put(myObject, myObject), so that future
    >>>>> get calls with equal objects will return the actual reference.
    >>>>> Is there a better way to handle this? Any way to make a Set return a
    >>>>> reference to the object it contains?
    >>>>> Patricia
    >>>> How does .contains() find the Object? I would guess that in an
    >>>> unordered Set it must iterate.
    >>>
    >>> Isn't kinda the whole point of a set/map to use hashes
    >>> to have constant time for such operations?

    >>
    >> True for HashSet and HashMap, but not necessarily true in general.
    >> TreeSet, for example, has O(log n) time for contains.
    >>
    >> Patricia

    > "X is true for Apples"
    > "No, X is definitely not true for Oranges!"
    >
    > neuneudr was talking specifically about unordered sets and maps, which
    > excludes tree sets and tree maps..


    Even then, it isn't true. I could write an unordered set on top of a
    LinkedList, and it would be O(n). It'd be madness, but i could do it.

    More usefully, i actually did write an unordered map using a patricia tree
    - i used the keys' hashcodes as the key strings. I think that'd be about
    O(log n).

    tom

    --
    if you can't beat them, build them
     
    Tom Anderson, Mar 6, 2009
    #5
  6. Knute Johnson

    blue indigo Guest

    On Fri, 06 Mar 2009 16:20:40 +0000, Tom Anderson wrote:

    > More usefully, i actually did write an unordered map using a patricia tree
    > - i used the keys' hashcodes as the key strings. I think that'd be about
    > O(log n).


    I don't see how this would be more useful than a HashMap, unless an
    array-based implementation was out of the question, maybe due to the sheer
    size of the collection. (Billions or more of entries?)

    And why call this a "patricia tree" -- I'm not familiar with that usage,
    though I've had plenty of exposure to all kinds of trees over the course
    of my career, B-trees, binary, red-black, several oddball ternary tree
    species, and yes, whole disjoint set forests of heap-like trees.

    Does the name have anything to do with Patricia Shanahan, or is that
    a coincidence?

    --
    blue indigo
    UA Telecom since 1987
     
    blue indigo, Mar 6, 2009
    #6
  7. Knute Johnson

    Tom Anderson Guest

    On Fri, 6 Mar 2009, blue indigo wrote:

    > On Fri, 06 Mar 2009 16:20:40 +0000, Tom Anderson wrote:
    >
    >> More usefully, i actually did write an unordered map using a patricia tree
    >> - i used the keys' hashcodes as the key strings. I think that'd be about
    >> O(log n).

    >
    > I don't see how this would be more useful than a HashMap, unless an
    > array-based implementation was out of the question, maybe due to the
    > sheer size of the collection. (Billions or more of entries?)


    It's not really useful - it was more of an interesting exercise. The one
    potential advantage is that a tree never has to do a resize operation as
    it grows, unlike a hashtable, or any balancing operations, like a
    comparison-based tree, so although the amortized cost of each insertion
    may be higher, you know that they're all going to be about the same, as
    opposed to being really cheap most of the time and incredibly expensive
    occasionally. That could be useful in realtime systems.

    Patricia trees have always been a favourite of mine because their
    structure is dependent only on their contents, and not their history,
    unlike most binary trees. That makes them easier to reason about and
    verify, and means that you don't have to worry about keeping them
    balanced. That's not to say that a Patricia tree is naturally balanced -
    try making one containing all the powers of two as keys - but it's one
    less bit of complexity to worry about.

    > And why call this a "patricia tree" -- I'm not familiar with that usage,
    > though I've had plenty of exposure to all kinds of trees over the course
    > of my career, B-trees, binary, red-black, several oddball ternary tree
    > species, and yes, whole disjoint set forests of heap-like trees.


    You evidently haven't had quite enough exposure:

    http://en.wikipedia.org/w/index.php?title=Patricia_tree
    http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/PATRICIA/

    Wikipedia calls it a radix tree, but Sedgewick, where i learned my
    data structures, calls it a patricia tree.

    > Does the name have anything to do with Patricia Shanahan, or is that a
    > coincidence?


    As far as i know, there's no connection.

    tom

    --
    if you can't beat them, build them
     
    Tom Anderson, Mar 6, 2009
    #7
  8. Knute Johnson

    blue indigo Guest

    On Fri, 06 Mar 2009 18:52:33 +0000, Tom Anderson wrote:

    >> And why call this a "patricia tree" -- I'm not familiar with that usage,
    >> though I've had plenty of exposure to all kinds of trees over the course
    >> of my career, B-trees, binary, red-black, several oddball ternary tree
    >> species, and yes, whole disjoint set forests of heap-like trees.

    >
    > You evidently haven't had quite enough exposure:
    >
    > http://en.wikipedia.org/w/index.php?title=Patricia_tree


    Ah. It's just a radix tree. I already knew about those, of course, but
    under the name "radix tree". :)

    The idea of merging identical subtrees (making a directed acyclic graph)
    to optimize a no-longer-changing one is certainly interesting. This would
    fit right in with an "unmodifiableTree()" sort of method in a Java API,
    given that this method made a copy instead of modifying the existing tree.

    --
    blue indigo
    UA Telecom since 1987
     
    blue indigo, Mar 9, 2009
    #8
  9. Knute Johnson

    Tom Anderson Guest

    On Mon, 9 Mar 2009, blue indigo wrote:

    > On Fri, 06 Mar 2009 18:52:33 +0000, Tom Anderson wrote:
    >
    >>> And why call this a "patricia tree" -- I'm not familiar with that usage,
    >>> though I've had plenty of exposure to all kinds of trees over the course
    >>> of my career, B-trees, binary, red-black, several oddball ternary tree
    >>> species, and yes, whole disjoint set forests of heap-like trees.

    >>
    >> You evidently haven't had quite enough exposure:
    >>
    >> http://en.wikipedia.org/w/index.php?title=Patricia_tree

    >
    > Ah. It's just a radix tree. I already knew about those, of course, but
    > under the name "radix tree". :)


    'just'!

    > The idea of merging identical subtrees (making a directed acyclic graph)
    > to optimize a no-longer-changing one is certainly interesting. This
    > would fit right in with an "unmodifiableTree()" sort of method in a Java
    > API, given that this method made a copy instead of modifying the
    > existing tree.


    That's an interesting idea, although that's not what a Patricia tree does.
    Also, i don't think you're likely to find any identical subtrees to do
    that to - bear in mind that the values in the leaves would have to be
    identical.

    tom

    --
    I don't wanna know your name, i just want BANG BANG BANG!
     
    Tom Anderson, Mar 9, 2009
    #9
  10. Knute Johnson

    blue indigo Guest

    On Mon, 09 Mar 2009 20:00:57 +0000, Tom Anderson wrote:

    > On Mon, 9 Mar 2009, blue indigo wrote:
    >
    >> On Fri, 06 Mar 2009 18:52:33 +0000, Tom Anderson wrote:
    >>
    >>>> And why call this a "patricia tree" -- I'm not familiar with that usage,
    >>>> though I've had plenty of exposure to all kinds of trees over the course
    >>>> of my career, B-trees, binary, red-black, several oddball ternary tree
    >>>> species, and yes, whole disjoint set forests of heap-like trees.
    >>>
    >>> You evidently haven't had quite enough exposure:
    >>>
    >>> http://en.wikipedia.org/w/index.php?title=Patricia_tree

    >>
    >> Ah. It's just a radix tree. I already knew about those, of course, but
    >> under the name "radix tree". :)

    >
    > 'just'!


    "Just" as in it isn't actually something really new to me after all. :)

    >> The idea of merging identical subtrees (making a directed acyclic
    >> graph) to optimize a no-longer-changing one is certainly interesting.
    >> This would fit right in with an "unmodifiableTree()" sort of method in
    >> a Java API, given that this method made a copy instead of modifying the
    >> existing tree.

    >
    > That's an interesting idea, although that's not what a Patricia tree does.
    > Also, i don't think you're likely to find any identical subtrees to do
    > that to - bear in mind that the values in the leaves would have to be
    > identical.


    The idea came from the Wikipedia article; I claim no originality. It would
    be for the Set<String> rather than Map<String,Foo> form; the latter would
    have the problem that the values in the leaves wouldn't usually be
    identical. Implementing a Set<String> with this wouldn't require any
    values in the leaves at all though.

    Actually, the big problem with implementing anything like this in Java is
    that a Java char is 16 bits wide. Storing 65536 child pointers (even when
    they'll mostly be nulls) in every node is not going to be very efficient.
    Nor is converting every string to UTF-8 on the way into its API.

    --
    blue indigo
    UA Telecom since 1987
     
    blue indigo, Mar 11, 2009
    #10
  11. blue indigo wrote:
    > Actually, the big problem with implementing anything like this in
    > Java is that a Java char is 16 bits wide. Storing 65536 child
    > pointers (even when they'll mostly be nulls) in every node is not
    > going to be very efficient. Nor is converting every string to UTF-8
    > on the way into its API.


    Patricia trees don't require that, any more than they require 256
    pointers per node on ASCII-based systems. The decision tree is
    per-bit, so each node has at most two children, one for 0 and one for
    1.
     
    Mike Schilling, Mar 11, 2009
    #11
  12. Knute Johnson

    blue indigo Guest

    On Wed, 5671 Sep 1993 08:33:14 -0700, Mike Schilling wrote:

    > blue indigo wrote:
    >> Actually, the big problem with implementing anything like this in
    >> Java is that a Java char is 16 bits wide. Storing 65536 child
    >> pointers (even when they'll mostly be nulls) in every node is not
    >> going to be very efficient. Nor is converting every string to UTF-8
    >> on the way into its API.

    >
    > Patricia trees don't require that, any more than they require 256
    > pointers per node on ASCII-based systems. The decision tree is
    > per-bit, so each node has at most two children, one for 0 and one for
    > 1.


    The Wikipedia article (and my own past experience with radix trees)
    strongly implies that it is typical to use a small array (but bigger than
    two).

    I thought of splitting the char into nybbles and having arrays of 16
    children; I don't know if you'd get any better speed performance than a
    HashMap<String,Foo> though when you account for the overhead of doing a
    bunch of bit-twiddling (masking and shifting, mainly) on the chars and
    traversing four pointers per char rather than one, and accounting for Java
    Strings memoizing their hashes.

    Figuring out an optimization based on the likelihood that, in a typical
    application, the high bytes will tend to mainly have one or a very few
    values seemed nontrivial when I tried it. Stripping information (such as
    the high byte) creates the possibility of collisions and then you need
    hash buckets.

    Certainly with strings or anything else representable as a long bit stream
    whose length is divisible by n, any radix 2^k with k dividing n can be
    used as the tree radix though. In this case, 2, 4, 8, and 16 are possible
    (since Java's char is 16 bits), but I'd decided that 8 and 16 looked
    unwieldy due to array size and 2 due to the depth of the resulting trees.
    That's why I was leaning toward 4.

    As for the Set<String> version, one application I thought of involves
    anything that resembles web spidering or similar forms of search
    technology where resources' handles are long strings that often share
    common prefixes. URLs in particular have this trait. Storing the set
    of seen-but-not-visited-yet URLs could be vastly more efficient using a
    patricia tree than using a HashMap or similar structure. You eliminate
    duplicates and when you hit the same URL from multiple places, you can
    efficiently find and update an associated value (say, to boost its page
    rank or whatever). Moreover though, you don't take up dozens of bytes
    for every occurrence of
    "http://some.really.com/long/url/prefix/with/bells/on/" in a site with
    deeply nested directory structures, nor the overhead of a hashtable
    (typically 33% space overhead for the slack space and then some more for
    the hashes themselves -- each Java String will memoize its hash when its
    hash is first used -- and some time overhead for sometimes growing the
    array, for hashing, and for equals() tests; the tree effectively only does
    the latter).

    Another case where it clearly wins over HashFoo seems to be if you want to
    find by prefix. Incremental search, tab-completion, and similar UI
    features can use this, the former in conjunction with an index of some
    sort (e.g. a patricia tree of document words associating them with some
    information about their locations) and the latter with a history of some
    sort (stored similarly).

    A possible application then is an html browser optimized for local
    documentation viewing, which might want all of:
    * A tab-completing history in the address box
    * Incremental search in page
    * The ability to spider a local documentation tree and create an index;
    local filesystem URLs are especially likely to have very long common
    prefixes (and javadocs are particularly guilty in my experience);
    I'd love to have the ability to "google" my local javadocs! Especially
    since googling Sun's web site for javadocs not only requires more
    typing (adding "site:sun.com" to the query) but tends to find random
    versions of everything (JCheckBox: 1.4.2; HashMap: 1.4.2; ImageIO:
    1.4.2; SuppressWarnings: 5.0; etc.; using "I'm Feeling Lucky"; 6.0 is
    current whereas 1.4.x is being sunsetted) and can't be done offline.
    Plus I can only search one documentation tree at a time; I need a
    different "site:" bit to search, say, the Apache Commons docs instead of
    Sun's. (On the plus side, a URL found by Google can be posted or mailed
    and it will work for other people.)

    --
    blue indigo
    UA Telecom since 1987
     
    blue indigo, Mar 12, 2009
    #12
  13. blue indigo wrote:
    > On Wed, 5671 Sep 1993 08:33:14 -0700, Mike Schilling wrote:
    >
    >> blue indigo wrote:
    >>> Actually, the big problem with implementing anything like this in
    >>> Java is that a Java char is 16 bits wide. Storing 65536 child
    >>> pointers (even when they'll mostly be nulls) in every node is not
    >>> going to be very efficient. Nor is converting every string to UTF-8
    >>> on the way into its API.

    >>
    >> Patricia trees don't require that, any more than they require 256
    >> pointers per node on ASCII-based systems. The decision tree is
    >> per-bit, so each node has at most two children, one for 0 and one for
    >> 1.

    >
    > The Wikipedia article (and my own past experience with radix trees)
    > strongly implies that it is typical to use a small array (but bigger
    > than two).


    I think the Wikipedia article conflates radix and Patricia trees at the
    expense of explaining the latter. Here's a more specific desciption:
    http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/PATRICIA/
     
    Mike Schilling, Mar 13, 2009
    #13
  14. Knute Johnson

    Lew Guest

    Steven Simpson wrote:
    > blue indigo wrote:
    >> Especially
    >> since googling Sun's web site for javadocs not only requires more
    >> typing (adding "site:sun.com" to the query) but tends to find random
    >> versions of everything (JCheckBox: 1.4.2; HashMap: 1.4.2; ImageIO:
    >> 1.4.2; SuppressWarnings: 5.0; etc.; using "I'm Feeling Lucky"; 6.0 is
    >> current whereas 1.4.x is being sunsetted)

    >
    > The 'site:' term can take a full URI prefix, btw:
    >
    > site:http://java.sun.com/javase/6/docs/api/
    >
    > Of course, that's even more typing... ;-)


    All those farmers, steel workers, roofers, trash collectors, building
    contractors, carpenters, trash collectors, auto workers, ditch diggers and
    jobless folk sure feel sorry for us when we have to do extra typing.

    --
    Lew
     
    Lew, Mar 13, 2009
    #14
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Parthiv Joshi
    Replies:
    2
    Views:
    706
    Kalyan
    Jul 2, 2004
  2. Suresh Kojhani
    Replies:
    1
    Views:
    2,407
    Anushi
    Jul 29, 2004
  3. Chris Fink
    Replies:
    2
    Views:
    4,128
    David Waz...
    Jul 3, 2003
  4. yysiow
    Replies:
    1
    Views:
    453
    Kevin Spencer
    Jul 12, 2003
  5. Replies:
    1
    Views:
    502
    Mark Rae [MVP]
    Sep 20, 2007
Loading...

Share This Page