Understanding search queries, semantics, and "Meaning" ...aren't weall looking for meaning?

Discussion in 'Python' started by 5lvqbwl02@sneakemail.com, Dec 30, 2008.

  1. Guest

    I have Section 4.4.1 of SICP rattling around in my head (database
    queries), and I'm trying to come up with a simple dictionary-based
    database in Python to represent circuit diagrams. My main confusion
    isn't one of implementation, but a matter of "big thinking",
    fundamentally, about the problem. Please don't suggest using a SQL
    library, as I'm looking to learn to fish, so to speak, and to learn a
    bit about the biology of fish.

    I've subclassed dict to hdict ("hashable dict") and rewritten the
    __hash__ function so I can include a dict into a set. Thanks to a
    previous poster here for providing that suggestion.

    A circuit component looks like this for example:

    comp1 = hdict(value=1e-6, footprint='0402', vendor='digikey')
    comp2 = hdict(value=10e3, footprint='0603', vendor='mouser')
    etc, etc.

    The database holds the values like this:
    db = dict() # normal dict
    db['value'] = ([1e-6, 10e3], [comp1, comp2]) #ordered set for fast
    lookup/insertion
    db['footprint'] = {'0402':set[comp1], '0603':comp2} # unordered uses
    normal dict for fast lookup
    db['vendor'] = {'digikey':[comp1], 'mouser':[comp2]}

    So basically the keys are the component parameters, and the values is
    the list of components with that value. Stuff that is comparable is
    ordered; stuff that is discrete is not ordered, using either 2-tuples
    or dicts, respectively.

    This allows extremely fast lookup of components based on their
    properties, with O(1) performance for non-ordered stuff, and O(log n)
    performance for ordered stuff (using bisect methods). The set
    operations are extremely fast, so I can do AND and OR operations on
    compound queries of this nature without worry.

    I need this speed not so much for selecting components when the user
    types in a query, but for when the mouse is hovering over the
    schematic and I need things to light up underneath, if the GUI is
    generating hundreds of mouse messages per second, and for inspector
    windows to appear quickly when you click, etc. If you have ever used
    Altium, you can see how effective this is in terms of creating a good
    interactive user experience.

    My question is what happens when I choose to search for NOT
    footprint='0402'.

    Should this return a blank list? This happens by default., and is in
    fact true: a blank list satisfies "not anything" actually.
    Should this return everything that is NOT footprint 0402 (ie returns
    0603 components)? This *strongly implies* a pre-selection of *all*
    objects before performing the negating function, or of looking at the
    ordering of other queries in the overall search term, and then
    applying NOT to the results of another query.

    But I'm suspicious of a brute force preselection of all objects
    whenever I see a NOT, and anyway it messes up the clean query/combiner
    method of search I'm doing, and it requires an implied sequence of
    search, where I'm pretty sure it should not rely on sequencing. Even
    though this is single threaded, etc., the semantics of the query
    should not rely on ordering of the search term:

    footprint='0402' and NOT vendor='mouser' should return the same as
    NOT vendor='mouser' and footprint='0402'.

    So this is my philosophical quandary. I'm not sure what the correct
    thing is. In SICP they are using nondeterministic stuff which I don't
    quite get, so it's hard to follow. Also they are not using
    dictionaries and hashes, so I'm not sure if their generate-and-test
    method would work here anyway. Generate-and-test seems extremely
    inefficient.

    Can a wise guru please enlighten me?

    thanks
    Michael
    , Dec 30, 2008
    #1
    1. Advertising

  2. Re: Understanding search queries, semantics, and "Meaning" ...aren'twe all looking for meaning?

    On Dec 30, 12:35 pm, wrote:
    > I have Section 4.4.1 of SICP rattling around in my head (database
    > queries), and I'm trying to come up with a simple dictionary-based
    > database in Python to represent circuit diagrams.  My main confusion
    > isn't one of implementation, but a matter of "big thinking",
    > fundamentally, about the problem. Please don't suggest using a SQL
    > library, as I'm looking to learn to fish, so to speak, and to learn a
    > bit about the biology of fish.
    >


    I'm going to break rule #1 of your requirements but in an unexpected
    way. Rather than studying PostgreSQL, MySQL, or Oracle, why don't you
    crack open the topic of relational database theory and relational
    algebra? That should help with the "big thinking" bit in the same way
    understanding 1+1 helps you understand how to add any two numbers
    together. No, SQL is not the be-all-end-all of relational theory. Yes,
    it does a pretty dang good job at it, though, which is why it is still
    around.


    > I've subclassed dict to hdict ("hashable dict") and rewritten the
    > __hash__ function so I can include a dict into a set. Thanks to a
    > previous poster here for providing that suggestion.
    >
    > A circuit component looks like this for example:
    >
    > comp1 = hdict(value=1e-6, footprint='0402', vendor='digikey')
    > comp2 = hdict(value=10e3, footprint='0603', vendor='mouser')
    > etc, etc.
    >
    > The database holds the values like this:
    > db = dict() # normal dict
    > db['value'] = ([1e-6, 10e3], [comp1, comp2]) #ordered set for fast
    > lookup/insertion
    > db['footprint'] = {'0402':set[comp1], '0603':comp2} # unordered uses
    > normal dict for fast lookup
    > db['vendor'] = {'digikey':[comp1], 'mouser':[comp2]}
    >
    > So basically the keys are the component parameters, and the values is
    > the list of components with that value.  Stuff that is comparable is
    > ordered; stuff that is discrete is not ordered, using either 2-tuples
    > or dicts, respectively.
    >
    > This allows extremely fast lookup of components based on their
    > properties, with O(1) performance for non-ordered stuff, and O(log n)
    > performance for ordered stuff (using bisect methods).  The set
    > operations are extremely fast, so I can do AND and OR operations on
    > compound queries of this nature without worry.
    >
    > I need this speed not so much for selecting components when the user
    > types in a query, but for when the mouse is hovering over the
    > schematic and I need things to light up underneath, if the GUI is
    > generating hundreds of mouse messages per second, and for inspector
    > windows to appear quickly when you click, etc.  If you have ever used
    > Altium, you can see how effective this is in terms of creating a good
    > interactive user experience.
    >


    OK, turn it around in your head. Consider the indexes you built above
    as yet another table. Consider what a table or a database really is.
    When you see how they are all really the same thing, either you've
    mastered relational algebra of you've seen the big picture. Kind of
    like the cons cell being enough to describe any data structure in the
    universe, a table is enough to describe everything in the relational
    world.

    > My question is what happens when I choose to search for NOT
    > footprint='0402'.
    >
    > Should this return a blank list? This happens by default., and is in
    > fact true: a blank list satisfies "not anything" actually.
    > Should this return everything that is NOT footprint 0402 (ie returns
    > 0603 components)?  This *strongly implies* a pre-selection of *all*
    > objects before performing the negating function, or of looking at the
    > ordering of other queries in the overall search term, and then
    > applying NOT to the results of another query.
    >
    > But I'm suspicious of a brute force preselection of all objects
    > whenever I see a NOT, and anyway it messes up the clean query/combiner
    > method of search I'm doing, and it requires an implied sequence of
    > search, where I'm pretty sure it should not rely on sequencing.  Even
    > though this is single threaded, etc., the semantics of the query
    > should not rely on ordering of the search term:
    >
    > footprint='0402' and NOT vendor='mouser' should return the same as
    > NOT vendor='mouser' and footprint='0402'.
    >
    > So this is my philosophical quandary.  I'm not sure what the correct
    > thing is.  In SICP they are using nondeterministic stuff which I don't
    > quite get, so it's hard to follow.  Also they are not using
    > dictionaries and hashes, so I'm not sure if their generate-and-test
    > method would work here anyway.  Generate-and-test seems extremely
    > inefficient.
    >
    > Can a wise guru please enlighten me?
    >


    (I don't think I qualify as a guru, but I think I see how I can help.)

    In a typical SQL database, when you type in "SELECT foo FROM bar WHERE
    baz='bo'", you are not writing a program, at least not in the sense of
    Python or C or Java or Perl where you give instructions on HOW to run
    the program. You are writing a program in the sense of Lisp or Scheme
    or Haskell in that you are giving instructions on WHAT the program is.

    The database doesn't, at least shouldn't, read in all the rows in a
    table and then start going through the process of elimination removing
    rows it doesn't want to think about. That's highly inefficient, of
    course. Instead, it transforms the query (the WHAT) into a set of
    procedures that describe HOW to get the result.

    That step---taking the definition of the program and turning it into
    actual instructions on HOW to run the program---is the compilation and
    optimization phase. It's a very hard step to do, but no harder than
    the process Python goes through taking files of bytes and converting
    them into Python bytecode.

    Oh, by the way, this step is nondeterministic. Why? Well, no one can
    really say what the BEST way to run any sufficiently complicated
    program is. We can point out good ways and bad ways, but not the best
    way. It's like the travelling salesman problem in a way.

    Now, that "hard stuff" you are glossing over in SICP is the IMPORTANT
    stuff. It's the WHOLE POINT of that section. Stop glossing over it.
    Keep working at it bit by bit until you understand it. You, as a
    programmer, will be infinitely better for it. When you understand it,
    you will really unlock the full potential of Python and Scheme and
    whatever other language is out there because you will see how to go
    from HOW languages to WHAT languages.

    Programmers who can write compilers are GOOD programmers. Programmers
    who can understand someone else's compilers are even better.

    On Python-specific stuff: For a moment, forget about Python and hashes
    and dicts and arrays and lists and such. Focus only on the cons cell
    and Scheme. When you have mastered those concepts, you will see how to
    efficiently use the Python data structures and how to simplify the
    Scheme code into Python. Right now you are doing a bit of premature
    optimization.

    That's my semi-guru advice.
    Jonathan Gardner, Dec 30, 2008
    #2
    1. Advertising

  3. Guest

    Re: Understanding search queries, semantics, and "Meaning" ...aren'twe all looking for meaning?

    > > library, as I'm looking to learn to fish, so to speak, and to learn a
    > > bit about the biology of fish.

    >
    > I'm going to break rule #1 of your requirements but in an unexpected
    > way. Rather than studying PostgreSQL, MySQL, or Oracle, why don't you
    > crack open the topic of relational database theory and relational
    > algebra? That should help with the "big thinking" bit in the same way
    > understanding 1+1 helps you understand how to add any two numbers


    Ah, exactly, thanks for understanding my original post :) I guess
    'relational database theory' is what I should start looking around
    for... otherwise googling on sql, queries, etc., just returns how to
    execute a query, but says nothing about how a fish works.


    >
    > In a typical SQL database, when you type in "SELECT foo FROM bar WHERE
    > baz='bo'", you are not writing a program, at least not in the sense of
    > Python or C or Java or Perl where you give instructions on HOW to run
    > the program. You are writing a program in the sense of Lisp or Scheme
    > or Haskell in that you are giving instructions on WHAT the program is.


    I've gotten a strong inkling that parsing a query yields new code,
    (lambdas) that are created on the fly to do the search.


    > table and then start going through the process of elimination removing
    > rows it doesn't want to think about. That's highly inefficient, of


    Exactly what I'm trying to avoid.

    > course. Instead, it transforms the query (the WHAT) into a set of
    > procedures that describe HOW to get the result.


    For now I'm not parsing actual text queries... my real "search query"
    is coded directly in python like this:
    p1 = lambda: db.search_leaf('x location', 'lte', 5)
    p2 = lambda: db.search_leaf('footprint', 'eq', '0603')
    p3 = lambda: db.search(db.AND, p1, p2)

    p4 = lambda: db.search_leaf('x location', 'gte', 19)
    p5 = lambda: db.search_leaf('footprint', 'eq', '0402')
    p6 = lambda: db.search(db.AND, p1, p2)

    fc = db.search(db.OR, p3, p4)

    this particular example doesn't necessarily make any sense, but in
    effect I'm trying to string together lambda functions which are
    created explicitly for the individual query, then strung together with
    combiner functions.



    > Oh, by the way, this step is nondeterministic. Why? Well, no one can
    > really say what the BEST way to run any sufficiently complicated
    > program is. We can point out good ways and bad ways, but not the best
    > way. It's like the travelling salesman problem in a way.


    The nondeterministic stuff... wow, I've come across (call/cc...),
    (require...), and different variants of this, both in sicp, teach
    yourself scheme, the plt docs, other places, etc., but it still eludes
    me. I'm afraid that unless I understand it, I'll wind up creating
    some type of cargo-cult mimcry of a database without doing it right
    (http://en.wikipedia.org/wiki/Cargo_cult)


    > programmer, will be infinitely better for it. When you understand it,
    > you will really unlock the full potential of Python and Scheme and
    > whatever other language is out there because you will see how to go
    > from HOW languages to WHAT languages.


    I'm under the impression python doesn't have the internal guts for
    nondeterministic programming, specifcially that its lambdas are
    limited to single-line expressions, but I may be wrong here.

    Still, at the begining of the nondeterministic section of SICP
    (section 4.3), it says "nondeterministic computing... is useful for
    'generate and test' applications", which almost categorically sounds
    like an element-by-element search for what you're looking for. This
    was my initial intention behind (prematurely?) optimizing the database
    by using parameter keys instead of something like [x for x in stuff if
    pred(x, val)] filtering.


    > Programmers who can write compilers are GOOD programmers. Programmers
    > who can understand someone else's compilers are even better.


    Does incorporating a search capability in an application necessarily
    mean I'm writing a search compiler? That seems overkill for the
    specific case, but may be true generally. For instance, if a word
    processing app allows you to search for characters with a certain
    font, is that incorporating a search compiler and query language? Or
    is it just brute-force filtering?


    > That's my semi-guru advice.


    Thanks :)
    Michael
    , Dec 30, 2008
    #3
  4. Re: Understanding search queries, semantics, and "Meaning" ...aren'twe all looking for meaning?

    On Dec 30 2008, 3:25 pm, wrote:
    > > In a typical SQL database, when you type in "SELECT foo FROM bar WHERE
    > > baz='bo'", you are not writing a program, at least not in the sense of
    > > Python or C or Java or Perl where you give instructions on HOW to run
    > > the program. You are writing a program in the sense of Lisp or Scheme
    > > or Haskell in that you are giving instructions on WHAT the program is.

    >
    > I've gotten a strong inkling that parsing a query yields new code,
    > (lambdas) that are created on the fly to do the search.
    >


    More on this at the very end. Just smile to know that you're very
    close.

    > > course. Instead, it transforms the query (the WHAT) into a set of
    > > procedures that describe HOW to get the result.

    >
    > For now I'm not parsing actual text queries... my real "search query"
    > is coded directly in python like this:
    > p1 = lambda: db.search_leaf('x location', 'lte', 5)
    > p2 = lambda: db.search_leaf('footprint', 'eq', '0603')
    > p3 = lambda: db.search(db.AND, p1, p2)
    >
    > p4 = lambda: db.search_leaf('x location', 'gte', 19)
    > p5 = lambda: db.search_leaf('footprint', 'eq', '0402')
    > p6 = lambda: db.search(db.AND, p1, p2)
    >
    > fc = db.search(db.OR, p3, p4)
    >
    > this particular example doesn't necessarily make any sense, but in
    > effect I'm trying to string together lambda functions which are
    > created explicitly for the individual query, then strung together with
    > combiner functions.
    >


    If only compilers were so simple... Again, you're writing the "HOW"
    when you're doing what you're doing above, when you really want to
    write the "WHAT".

    Oh, and BTW, lambdas are just an expression to generate a function
    quickly. You're confusing the python lambda expression with lambdas
    the theoretical concepts. Lambdas the theoretical concept are really
    Python's callable objects. Python just expresses them in a very weird
    way that is seemingly unrelated to lambda the theoretical concept (but
    really is).

    > > Oh, by the way, this step is nondeterministic. Why? Well, no one can
    > > really say what the BEST way to run any sufficiently complicated
    > > program is. We can point out good ways and bad ways, but not the best
    > > way. It's like the travelling salesman problem in a way.

    >
    > The nondeterministic stuff... wow, I've come across (call/cc...),
    > (require...), and different variants of this, both in sicp, teach
    > yourself scheme, the plt docs, other places, etc., but it still eludes
    > me.  I'm afraid that unless I understand it, I'll wind up creating
    > some type of cargo-cult mimcry of a database without doing it right
    > (http://en.wikipedia.org/wiki/Cargo_cult)
    >


    Sorry, I got confused on the meaning of non-deterministic programming.
    I forgot that this was a specific thing in the Scheme world.

    Yes, you've got to understand this. It makes writing a compiler much,
    much easier, almost trivial.

    > > programmer, will be infinitely better for it. When you understand it,
    > > you will really unlock the full potential of Python and Scheme and
    > > whatever other language is out there because you will see how to go
    > > from HOW languages to WHAT languages.

    >
    > I'm under the impression python doesn't have the internal guts for
    > nondeterministic programming, specifcially that its lambdas are
    > limited to single-line expressions, but I may be wrong here.
    >


    (Recall what I mentioned about lambdas above. It applies here as
    well.)

    > Still, at the begining of the nondeterministic section of SICP
    > (section 4.3), it says "nondeterministic computing... is useful for
    > 'generate and test' applications", which almost categorically sounds
    > like an element-by-element search for what you're looking for.  This
    > was my initial intention behind (prematurely?) optimizing the database
    > by using parameter keys instead of something like [x for x in stuff if
    > pred(x, val)] filtering.
    >


    Yes, you certainly can't use list comprehensions to solve your
    problems. They're too primitive and part of the interface is that they
    actually iterate through the sequence.

    > > Programmers who can write compilers are GOOD programmers. Programmers
    > > who can understand someone else's compilers are even better.

    >
    > Does incorporating a search capability in an application necessarily
    > mean I'm writing a search compiler?  That seems overkill for the
    > specific case, but may be true generally.  For instance, if a word
    > processing app allows you to search for characters with a certain
    > font, is that incorporating a search compiler and query language?  Or
    > is it just brute-force filtering?
    >


    Ok, here's some big picture hand-waving.

    I'm going to make an assertion, and it turns out to be fundamental. I
    won't explain it here but I hope you'll learn what it means as time
    goes on. The assertion is this:

    All programs simply transform one program's code into another
    program's code.

    By program, I mean, a set of instructions to a computer of some sort
    that are meant to be actually evaluated somehow. By program's code, I
    mean a description in any form of such a program.

    For instance, all human language is really some kind of program code.
    So is all Scheme, Lisp, Python, whatever code. So is HTML. So are
    bytes and words in memory representing a compiled program. So are IP
    packets on the internet.

    When you type in search terms to the search feature of Word, you are
    writing program code. Sure, the program code is going to be
    interpreted in some way, but it represents a program that says, "Find,
    by whatever method available, the next instance of text matching the
    text I type in here". That's a fairly easy thing to do, and you don't
    need to know about relational algebra and non-deterministic code to
    write a correct and fast solution.

    If you're going to implement a general function that takes arbitrary
    search criteria against arbitrary structured data and uses several
    methods together to find the answers, that's going to require a fairly
    advanced compiler and optimizer. You can use Scheme's non-
    deterministic methods to build that program, which is really neat and
    a big time-saver in terms of development time. That level of program
    is pretty advanced and only those who really understand how to write a
    compiler can put it together.

    For the mouse-hovering bit, since you're always running the same
    query, you can "pre-compile" it and just write a specific index and
    some specific code to look it up. This isn't too difficult.
    Jonathan Gardner, Jan 2, 2009
    #4
  5. Aahz Guest

    In article <>,
    <> wrote:
    >
    >I have Section 4.4.1 of SICP rattling around in my head (database
    >queries), and I'm trying to come up with a simple dictionary-based
    >database in Python to represent circuit diagrams. My main confusion
    >isn't one of implementation, but a matter of "big thinking",
    >fundamentally, about the problem. Please don't suggest using a SQL
    >library, as I'm looking to learn to fish, so to speak, and to learn a
    >bit about the biology of fish.


    Having read the rest of your conversation with Jonathan, I'm going to
    suggest a related book that I think may help you approach this from a
    different angle: _Mastering Regular Expressions_ by Jeffrey Friedl. (I
    haven't gone through SICP myself, so I can't be sure that it's relevant,
    but it covers the topic of transforming search criteria into executable
    statements -- and it will almost certainly be useful to you even if it's
    not relevant to your current problem.)
    --
    Aahz () <*> http://www.pythoncraft.com/

    "Debugging is twice as hard as writing the code in the first place.
    Therefore, if you write the code as cleverly as possible, you are, by
    definition, not smart enough to debug it." --Brian W. Kernighan
    Aahz, Jan 14, 2009
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Richard K Bethell

    IXXSO queries and asp.net search pages

    Richard K Bethell, Nov 21, 2003, in forum: ASP .Net
    Replies:
    2
    Views:
    391
    Richard K Bethell
    Nov 24, 2003
  2. Ian Roddis

    xslt queries in xml to SQL queries

    Ian Roddis, Feb 26, 2006, in forum: Python
    Replies:
    3
    Views:
    1,468
    Crutcher
    Feb 26, 2006
  3. Abby Lee
    Replies:
    5
    Views:
    375
    Abby Lee
    Aug 2, 2004
  4. Abby Lee

    so many queries within queries I'm confused

    Abby Lee, Aug 4, 2004, in forum: ASP General
    Replies:
    11
    Views:
    336
    Aaron [SQL Server MVP]
    Aug 6, 2004
  5. Adam
    Replies:
    63
    Views:
    404
    alex23
    Jun 28, 2013
Loading...

Share This Page