dbf.py API question concerning Index.index_search()

Discussion in 'Python' started by Ethan Furman, Aug 16, 2012.

  1. Ethan Furman

    Ethan Furman Guest

    Indexes have a new method (rebirth of an old one, really):

    .index_search(
    match,
    start=None,
    stop=None,
    nearest=False,
    partial=False )

    The defaults are to search the entire index for exact matches and raise
    NotFoundError if it can't find anything.

    match is the search criteria
    start and stop are the range to search in
    nearest returns where the match should be instead of raising an error
    partial will find partial matches

    The question is what should the return value be?

    I don't like the usual pattern of -1 meaning not found (as in
    'nothere'.find('a')), so I thought a fun and interesting way would be to
    subclass long and override the __nonzero__ method to return True/False
    based on whether the (partial) match was found. The main problem I see
    here is that the special return value reverts to a normal int/long if
    anything is done to it (adding, subtracting, etc), and the found status
    is lost.
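
A minimal sketch of the int-subclass idea described above, written for Python 3 (where `__nonzero__` is spelled `__bool__` and `long` is merged into `int`). `FoundIndex` and its `found` attribute are hypothetical names for illustration, not part of dbf.py.

```python
class FoundIndex(int):
    """Hypothetical int subclass that also carries a 'found' flag."""

    def __new__(cls, value, found):
        self = super().__new__(cls, value)
        self.found = found
        return self

    def __bool__(self):  # Python 3 spelling of __nonzero__
        return self.found


hit = FoundIndex(0, True)    # match at offset 0: truthy even though it is 0
miss = FoundIndex(3, False)  # no match, 3 is where it would go: falsy

# Any arithmetic returns a plain int, so the found status is lost.
shifted = miss + 1
```

Here `shifted` is an ordinary `int` equal to 4, so it tests as true even though the search it came from failed, which is the drawback mentioned in the post.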

    The other option is returning a (number, bool) tuple -- safer, yet more
    boring... ;)

    Thoughts?

    ~Ethan~
     
    Ethan Furman, Aug 16, 2012
    #1

  2. Steven D'Aprano

    On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:

    > Indexes have a new method (rebirth of an old one, really):
    >
    > .index_search(
    > match,
    > start=None,
    > stop=None,
    > nearest=False,
    > partial=False )

    [...]

    Why "index_search" rather than just "search"?


    > The question is what should the return value be?
    >
    > I don't like the usual pattern of -1 meaning not found (as in
    > 'nothere'.find('a'))


    And you are right not to. The problem with returning -1 as a "not found"
    sentinel is that if it is mistakenly used where you would use a "found"
    result, your code silently does the wrong thing instead of giving an
    exception.

    So pick a sentinel value which *cannot* be used as a successful found
    result.

    Since successful searches return integer offsets (yes?), one possible
    sentinel might be None. (That's what re.search and re.match return
    instead of a MatchObject.) But first ensure that None is *not* valid
    input to any of your methods that take an integer.

    For example, if str.find was changed to return None instead of -1 that
    would not solve the problem, because None is a valid argument for slices:

    p = mystring.find(":")
    print(mystring[p:-1]) # Oops, no better with None

    You don't have to predict every imaginable failure mode or defend against
    utterly incompetent programmers, just against the obvious failure modes.

    If None is not suitable as a sentinel, create a constant value that can't
    be mistaken for anything else:

    class NotFoundType(object):
        def __repr__(self):
            return "Not Found"
        __str__ = __repr__

    NOTFOUND = NotFoundType()
    del NotFoundType


    and then return that.
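
A small sketch of why such a sentinel fails loudly where -1 fails silently. The helpers `find_offset` and `tail_after` are hypothetical, written only for this illustration; they wrap `str.find` and are not part of dbf.py.

```python
class NotFoundType(object):
    def __repr__(self):
        return "Not Found"
    __str__ = __repr__

NOTFOUND = NotFoundType()


def find_offset(text, sub):
    """Hypothetical wrapper: like str.find, but returns NOTFOUND, not -1."""
    pos = text.find(sub)
    return NOTFOUND if pos == -1 else pos


def tail_after(text, sub):
    """Slice from the match onward; forgets to check for 'not found'."""
    p = find_offset(text, sub)
    return text[p:]  # raises TypeError when p is NOTFOUND


# With plain str.find, the same mistake silently yields a wrong answer.
silent_bug = "keyvalue"["keyvalue".find(":"):]
```

With the sentinel, `tail_after("keyvalue", ":")` raises `TypeError` because `NOTFOUND` is not a valid slice index, whereas the `str.find` version quietly returns the last character.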


    (By the way, I'm assuming that negative offsets are valid for your
    application. If they aren't, then using -1 as sentinel is perfectly safe,
    since passing a "not found" -1 as offset to another method will result in
    an immediate exception.)


    > The other option is returning a (number, bool) tuple -- safer, yet more
    > boring... ;)


    Boring is good, but it is also a PITA to use, and that's not good. I
    never remember whether the signature is (offset, flag) or (flag, offset),
    and if you get it wrong, your code will probably fail silently:

    py> flag, offset = (23, False)  # Oops, I got it wrong.
    py> if flag:
    ...     print("hello world"[offset+1:])
    ...
    ello world




    --
    Steven
     
    Steven D'Aprano, Aug 16, 2012
    #2

  3. Ethan Furman

    Ethan Furman Guest

    Steven D'Aprano wrote:
    > On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:
    >
    >> Indexes have a new method (rebirth of an old one, really):
    >>
    >> .index_search(
    >> match,
    >> start=None,
    >> stop=None,
    >> nearest=False,
    >> partial=False )

    > [...]
    >
    > Why "index_search" rather than just "search"?


    Because "search" already exists and returns a dbf.List of all matching
    records.

    ~Ethan~
     
    Ethan Furman, Aug 16, 2012
    #3
  4. MRAB

    MRAB Guest

    On 16/08/2012 02:22, Ethan Furman wrote:
    > Steven D'Aprano wrote:
    >> On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:
    >>
    >>> Indexes have a new method (rebirth of an old one, really):
    >>>
    >>> .index_search(
    >>> match,
    >>> start=None,
    >>> stop=None,
    >>> nearest=False,
    >>> partial=False )

    >> [...]
    >>
    >> Why "index_search" rather than just "search"?

    >
    > Because "search" already exists and returns a dbf.List of all matching
    > records.
    >

    Perhaps that should've been called "find_all"!
     
    MRAB, Aug 16, 2012
    #4
  5. Hans Mulder

    Hans Mulder Guest

    On 16/08/12 01:26:09, Ethan Furman wrote:
    > Indexes have a new method (rebirth of an old one, really):
    >
    > .index_search(
    > match,
    > start=None,
    > stop=None,
    > nearest=False,
    > partial=False )
    >
    > The defaults are to search the entire index for exact matches and raise
    > NotFoundError if it can't find anything.
    >
    > match is the search criteria
    > start and stop is the range to search in
    > nearest returns where the match should be instead of raising an error
    > partial will find partial matches
    >
    > The question is what should the return value be?
    >
    > I don't like the usual pattern of -1 meaning not found (as in
    > 'nothere'.find('a')), so I thought a fun and interesting way would be to
    > subclass long and override the __nonzero__ method to return True/False
    > based on whether the (partial) match was found. The main problems I see
    > here is that the special return value reverts to a normal int/long if
    > anything is done to it (adding, subtracting, etc), and the found status
    > is lost.
    >
    > The other option is returning a (number, bool) tuple -- safer, yet more
    > boring... ;)


    I think you should go for the safe boring option, because in many use
    cases the caller will need to know whether the number you're returning
    is the index of a match or just the nearest non-match. The caller could
    redo the match to find out. But you have already done the match, so you
    might as well tell them the result.
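
One way to keep the tuple option while avoiding confusion over field order is a named tuple. The sketch below is an assumption for illustration only: `SearchResult` is a hypothetical name, and `index_search` here is a toy function over a sorted list that always reports the nearest position, not the dbf.py method.

```python
from bisect import bisect_left
from collections import namedtuple

# Hypothetical result type: fields are accessed by name, not position.
SearchResult = namedtuple("SearchResult", ["index", "found"])


def index_search(keys, match):
    """Toy stand-in: 'keys' is a sorted list, not a dbf Index."""
    i = bisect_left(keys, match)
    found = i < len(keys) and keys[i] == match
    return SearchResult(index=i, found=found)


keys = ["adams", "baker", "clark"]
exact = index_search(keys, "baker")   # real match
nearby = index_search(keys, "bell")   # nearest slot only
```

The caller can then write `if result.found:` and use `result.index` without having to remember which element comes first.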


    Hope this helps,

    -- HansM
     
    Hans Mulder, Aug 16, 2012
    #5
  6. Ethan Furman

    Ethan Furman Guest

    MRAB wrote:
    > On 16/08/2012 02:22, Ethan Furman wrote:
    >> Steven D'Aprano wrote:
    >>> On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:
    >>>
    >>>> Indexes have a new method (rebirth of an old one, really):
    >>>>
    >>>> .index_search(
    >>>> match,
    >>>> start=None,
    >>>> stop=None,
    >>>> nearest=False,
    >>>> partial=False )
    >>> [...]
    >>>
    >>> Why "index_search" rather than just "search"?

    >>
    >> Because "search" already exists and returns a dbf.List of all matching
    >> records.
    >>

    > Perhaps that should've been called "find_all"!


    An interesting thought.

    Currently there are:

    .index(data) --> returns index of data in Index, or raises error
    .query(string) --> brute force search, returns all matching records
    .search(match) --> binary search through table, returns all matching
    records

    'index' and 'query' are supported by Tables, Lists, and Indexes; search
    (and now index_search) are only supported on Indexes.

    ~Ethan~
     
    Ethan Furman, Aug 16, 2012
    #6
  7. MRAB

    MRAB Guest

    On 16/08/2012 17:13, Ethan Furman wrote:
    > MRAB wrote:
    >> On 16/08/2012 02:22, Ethan Furman wrote:
    >>> Steven D'Aprano wrote:
    >>>> On Wed, 15 Aug 2012 16:26:09 -0700, Ethan Furman wrote:
    >>>>
    >>>>> Indexes have a new method (rebirth of an old one, really):
    >>>>>
    >>>>> .index_search(
    >>>>> match,
    >>>>> start=None,
    >>>>> stop=None,
    >>>>> nearest=False,
    >>>>> partial=False )
    >>>> [...]
    >>>>
    >>>> Why "index_search" rather than just "search"?
    >>>
    >>> Because "search" already exists and returns a dbf.List of all matching
    >>> records.
    >>>

    >> Perhaps that should've been called "find_all"!

    >
    > An interesting thought.
    >
    > Currently there are:
    >
    > .index(data) --> returns index of data in Index, or raises error
    > .query(string) --> brute force search, returns all matching records
    > .search(match) --> binary search through table, returns all matching
    > records
    >
    > 'index' and 'query' are supported by Tables, Lists, and Indexes; search
    > (and now index_search) are only supported on Indexes.
    >

    What exactly is the difference between .index and .index_search with
    the default arguments?
     
    MRAB, Aug 16, 2012
    #7
  8. Ethan Furman

    Ethan Furman Guest

    MRAB wrote:
    > On 16/08/2012 17:13, Ethan Furman wrote:
    >> Currently there are:
    >>
    >> .index(data) --> returns index of data in Index, or raises error
    >> .query(string) --> brute force search, returns all matching records
    >> .search(match) --> binary search through table, returns all matching
    >> records
    >>
    >> 'index' and 'query' are supported by Tables, Lists, and Indexes; search
    >> (and now index_search) are only supported on Indexes.
    >>

    > What exactly is the difference between .index and .index_search with
    > the default arguments?


    .index requires a data structure that can be compared to a record
    (another record, a dictionary with the same field/key names, or a
    list/tuple with values in the same order as the fields). It returns the
    index or raises NotFoundError. It is brute force.

    .index_search requires match criteria (a tuple with the desired values
    in the same order as the key). It returns the index or raises
    NotFoundError (unless nearest is True -- then the value returned is
    where the match should be). It is binary search.

    So the only similarity is that they both return a number or raise
    NotFoundError. What they use for the search and how they perform the
    search are both completely different.
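
The contrast between the two lookups can be sketched over a plain sorted list of key tuples. Everything here is an assumption for illustration: `brute_force_index` and `binary_index_search` are hypothetical functions, and `NotFoundError` is redefined locally so the sketch runs on its own; none of it is dbf.py's actual code.

```python
from bisect import bisect_left


class NotFoundError(Exception):
    """Local stand-in for the library's exception."""


def brute_force_index(records, target):
    """Linear scan, comparing every record until one matches."""
    for i, record in enumerate(records):
        if record == target:
            return i
    raise NotFoundError(target)


def binary_index_search(keys, match, nearest=False):
    """Binary search over sorted keys; 'nearest' returns the insertion point."""
    i = bisect_left(keys, match)
    if i < len(keys) and keys[i] == match:
        return i
    if nearest:
        return i  # where the match would be if it existed
    raise NotFoundError(match)


keys = [("adams",), ("baker",), ("clark",)]
```

Both return a position or raise `NotFoundError`, but the scan does a number of comparisons proportional to the length of the list while the binary search needs only a logarithmic number, and only the latter can report where a missing key belongs.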

    ~Ethan~
     
    Ethan Furman, Aug 16, 2012
    #8
