Understanding search queries, semantics, and "Meaning" ...aren't weall looking for meaning?

5

5lvqbwl02

I have Section 4.4.1 of SICP rattling around in my head (database
queries), and I'm trying to come up with a simple dictionary-based
database in Python to represent circuit diagrams. My main confusion
isn't one of implementation, but a matter of "big thinking",
fundamentally, about the problem. Please don't suggest using a SQL
library, as I'm looking to learn to fish, so to speak, and to learn a
bit about the biology of fish.

I've subclassed dict to hdict ("hashable dict") and rewritten the
__hash__ function so I can include a dict into a set. Thanks to a
previous poster here for providing that suggestion.

A circuit component looks like this for example:

comp1 = hdict(value=1e-6, footprint='0402', vendor='digikey')
comp2 = hdict(value=10e3, footprint='0603', vendor='mouser')
etc, etc.

The database holds the values like this:
db = dict() # normal dict
db['value'] = ([1e-6, 10e3], [comp1, comp2]) #ordered set for fast
lookup/insertion
db['footprint'] = {'0402':set[comp1], '0603':comp2} # unordered uses
normal dict for fast lookup
db['vendor'] = {'digikey':[comp1], 'mouser':[comp2]}

So basically the keys are the component parameters, and the values is
the list of components with that value. Stuff that is comparable is
ordered; stuff that is discrete is not ordered, using either 2-tuples
or dicts, respectively.

This allows extremely fast lookup of components based on their
properties, with O(1) performance for non-ordered stuff, and O(log n)
performance for ordered stuff (using bisect methods). The set
operations are extremely fast, so I can do AND and OR operations on
compound queries of this nature without worry.

I need this speed not so much for selecting components when the user
types in a query, but for when the mouse is hovering over the
schematic and I need things to light up underneath, if the GUI is
generating hundreds of mouse messages per second, and for inspector
windows to appear quickly when you click, etc. If you have ever used
Altium, you can see how effective this is in terms of creating a good
interactive user experience.

My question is what happens when I choose to search for NOT
footprint='0402'.

Should this return a blank list? This happens by default., and is in
fact true: a blank list satisfies "not anything" actually.
Should this return everything that is NOT footprint 0402 (ie returns
0603 components)? This *strongly implies* a pre-selection of *all*
objects before performing the negating function, or of looking at the
ordering of other queries in the overall search term, and then
applying NOT to the results of another query.

But I'm suspicious of a brute force preselection of all objects
whenever I see a NOT, and anyway it messes up the clean query/combiner
method of search I'm doing, and it requires an implied sequence of
search, where I'm pretty sure it should not rely on sequencing. Even
though this is single threaded, etc., the semantics of the query
should not rely on ordering of the search term:

footprint='0402' and NOT vendor='mouser' should return the same as
NOT vendor='mouser' and footprint='0402'.

So this is my philosophical quandary. I'm not sure what the correct
thing is. In SICP they are using nondeterministic stuff which I don't
quite get, so it's hard to follow. Also they are not using
dictionaries and hashes, so I'm not sure if their generate-and-test
method would work here anyway. Generate-and-test seems extremely
inefficient.

Can a wise guru please enlighten me?

thanks
Michael
 
J

Jonathan Gardner

I have Section 4.4.1 of SICP rattling around in my head (database
queries), and I'm trying to come up with a simple dictionary-based
database in Python to represent circuit diagrams.  My main confusion
isn't one of implementation, but a matter of "big thinking",
fundamentally, about the problem. Please don't suggest using a SQL
library, as I'm looking to learn to fish, so to speak, and to learn a
bit about the biology of fish.

I'm going to break rule #1 of your requirements but in an unexpected
way. Rather than studying PostgreSQL, MySQL, or Oracle, why don't you
crack open the topic of relational database theory and relational
algebra? That should help with the "big thinking" bit in the same way
understanding 1+1 helps you understand how to add any two numbers
together. No, SQL is not the be-all-end-all of relational theory. Yes,
it does a pretty dang good job at it, though, which is why it is still
around.

I've subclassed dict to hdict ("hashable dict") and rewritten the
__hash__ function so I can include a dict into a set. Thanks to a
previous poster here for providing that suggestion.

A circuit component looks like this for example:

comp1 = hdict(value=1e-6, footprint='0402', vendor='digikey')
comp2 = hdict(value=10e3, footprint='0603', vendor='mouser')
etc, etc.

The database holds the values like this:
db = dict() # normal dict
db['value'] = ([1e-6, 10e3], [comp1, comp2]) #ordered set for fast
lookup/insertion
db['footprint'] = {'0402':set[comp1], '0603':comp2} # unordered uses
normal dict for fast lookup
db['vendor'] = {'digikey':[comp1], 'mouser':[comp2]}

So basically the keys are the component parameters, and the values is
the list of components with that value.  Stuff that is comparable is
ordered; stuff that is discrete is not ordered, using either 2-tuples
or dicts, respectively.

This allows extremely fast lookup of components based on their
properties, with O(1) performance for non-ordered stuff, and O(log n)
performance for ordered stuff (using bisect methods).  The set
operations are extremely fast, so I can do AND and OR operations on
compound queries of this nature without worry.

I need this speed not so much for selecting components when the user
types in a query, but for when the mouse is hovering over the
schematic and I need things to light up underneath, if the GUI is
generating hundreds of mouse messages per second, and for inspector
windows to appear quickly when you click, etc.  If you have ever used
Altium, you can see how effective this is in terms of creating a good
interactive user experience.

OK, turn it around in your head. Consider the indexes you built above
as yet another table. Consider what a table or a database really is.
When you see how they are all really the same thing, either you've
mastered relational algebra of you've seen the big picture. Kind of
like the cons cell being enough to describe any data structure in the
universe, a table is enough to describe everything in the relational
world.
My question is what happens when I choose to search for NOT
footprint='0402'.

Should this return a blank list? This happens by default., and is in
fact true: a blank list satisfies "not anything" actually.
Should this return everything that is NOT footprint 0402 (ie returns
0603 components)?  This *strongly implies* a pre-selection of *all*
objects before performing the negating function, or of looking at the
ordering of other queries in the overall search term, and then
applying NOT to the results of another query.

But I'm suspicious of a brute force preselection of all objects
whenever I see a NOT, and anyway it messes up the clean query/combiner
method of search I'm doing, and it requires an implied sequence of
search, where I'm pretty sure it should not rely on sequencing.  Even
though this is single threaded, etc., the semantics of the query
should not rely on ordering of the search term:

footprint='0402' and NOT vendor='mouser' should return the same as
NOT vendor='mouser' and footprint='0402'.

So this is my philosophical quandary.  I'm not sure what the correct
thing is.  In SICP they are using nondeterministic stuff which I don't
quite get, so it's hard to follow.  Also they are not using
dictionaries and hashes, so I'm not sure if their generate-and-test
method would work here anyway.  Generate-and-test seems extremely
inefficient.

Can a wise guru please enlighten me?

(I don't think I qualify as a guru, but I think I see how I can help.)

In a typical SQL database, when you type in "SELECT foo FROM bar WHERE
baz='bo'", you are not writing a program, at least not in the sense of
Python or C or Java or Perl where you give instructions on HOW to run
the program. You are writing a program in the sense of Lisp or Scheme
or Haskell in that you are giving instructions on WHAT the program is.

The database doesn't, at least shouldn't, read in all the rows in a
table and then start going through the process of elimination removing
rows it doesn't want to think about. That's highly inefficient, of
course. Instead, it transforms the query (the WHAT) into a set of
procedures that describe HOW to get the result.

That step---taking the definition of the program and turning it into
actual instructions on HOW to run the program---is the compilation and
optimization phase. It's a very hard step to do, but no harder than
the process Python goes through taking files of bytes and converting
them into Python bytecode.

Oh, by the way, this step is nondeterministic. Why? Well, no one can
really say what the BEST way to run any sufficiently complicated
program is. We can point out good ways and bad ways, but not the best
way. It's like the travelling salesman problem in a way.

Now, that "hard stuff" you are glossing over in SICP is the IMPORTANT
stuff. It's the WHOLE POINT of that section. Stop glossing over it.
Keep working at it bit by bit until you understand it. You, as a
programmer, will be infinitely better for it. When you understand it,
you will really unlock the full potential of Python and Scheme and
whatever other language is out there because you will see how to go
from HOW languages to WHAT languages.

Programmers who can write compilers are GOOD programmers. Programmers
who can understand someone else's compilers are even better.

On Python-specific stuff: For a moment, forget about Python and hashes
and dicts and arrays and lists and such. Focus only on the cons cell
and Scheme. When you have mastered those concepts, you will see how to
efficiently use the Python data structures and how to simplify the
Scheme code into Python. Right now you are doing a bit of premature
optimization.

That's my semi-guru advice.
 
5

5lvqbwl02

library, as I'm looking to learn to fish, so to speak, and to learn a
I'm going to break rule #1 of your requirements but in an unexpected
way. Rather than studying PostgreSQL, MySQL, or Oracle, why don't you
crack open the topic of relational database theory and relational
algebra? That should help with the "big thinking" bit in the same way
understanding 1+1 helps you understand how to add any two numbers

Ah, exactly, thanks for understanding my original post :) I guess
'relational database theory' is what I should start looking around
for... otherwise googling on sql, queries, etc., just returns how to
execute a query, but says nothing about how a fish works.

In a typical SQL database, when you type in "SELECT foo FROM bar WHERE
baz='bo'", you are not writing a program, at least not in the sense of
Python or C or Java or Perl where you give instructions on HOW to run
the program. You are writing a program in the sense of Lisp or Scheme
or Haskell in that you are giving instructions on WHAT the program is.

I've gotten a strong inkling that parsing a query yields new code,
(lambdas) that are created on the fly to do the search.

table and then start going through the process of elimination removing
rows it doesn't want to think about. That's highly inefficient, of

Exactly what I'm trying to avoid.
course. Instead, it transforms the query (the WHAT) into a set of
procedures that describe HOW to get the result.

For now I'm not parsing actual text queries... my real "search query"
is coded directly in python like this:
p1 = lambda: db.search_leaf('x location', 'lte', 5)
p2 = lambda: db.search_leaf('footprint', 'eq', '0603')
p3 = lambda: db.search(db.AND, p1, p2)

p4 = lambda: db.search_leaf('x location', 'gte', 19)
p5 = lambda: db.search_leaf('footprint', 'eq', '0402')
p6 = lambda: db.search(db.AND, p1, p2)

fc = db.search(db.OR, p3, p4)

this particular example doesn't necessarily make any sense, but in
effect I'm trying to string together lambda functions which are
created explicitly for the individual query, then strung together with
combiner functions.


Oh, by the way, this step is nondeterministic. Why? Well, no one can
really say what the BEST way to run any sufficiently complicated
program is. We can point out good ways and bad ways, but not the best
way. It's like the travelling salesman problem in a way.

The nondeterministic stuff... wow, I've come across (call/cc...),
(require...), and different variants of this, both in sicp, teach
yourself scheme, the plt docs, other places, etc., but it still eludes
me. I'm afraid that unless I understand it, I'll wind up creating
some type of cargo-cult mimcry of a database without doing it right
(http://en.wikipedia.org/wiki/Cargo_cult)

programmer, will be infinitely better for it. When you understand it,
you will really unlock the full potential of Python and Scheme and
whatever other language is out there because you will see how to go
from HOW languages to WHAT languages.

I'm under the impression python doesn't have the internal guts for
nondeterministic programming, specifcially that its lambdas are
limited to single-line expressions, but I may be wrong here.

Still, at the begining of the nondeterministic section of SICP
(section 4.3), it says "nondeterministic computing... is useful for
'generate and test' applications", which almost categorically sounds
like an element-by-element search for what you're looking for. This
was my initial intention behind (prematurely?) optimizing the database
by using parameter keys instead of something like [x for x in stuff if
pred(x, val)] filtering.

Programmers who can write compilers are GOOD programmers. Programmers
who can understand someone else's compilers are even better.

Does incorporating a search capability in an application necessarily
mean I'm writing a search compiler? That seems overkill for the
specific case, but may be true generally. For instance, if a word
processing app allows you to search for characters with a certain
font, is that incorporating a search compiler and query language? Or
is it just brute-force filtering?

That's my semi-guru advice.

Thanks :)
Michael
 
J

Jonathan Gardner

I've gotten a strong inkling that parsing a query yields new code,
(lambdas) that are created on the fly to do the search.

More on this at the very end. Just smile to know that you're very
close.
For now I'm not parsing actual text queries... my real "search query"
is coded directly in python like this:
p1 = lambda: db.search_leaf('x location', 'lte', 5)
p2 = lambda: db.search_leaf('footprint', 'eq', '0603')
p3 = lambda: db.search(db.AND, p1, p2)

p4 = lambda: db.search_leaf('x location', 'gte', 19)
p5 = lambda: db.search_leaf('footprint', 'eq', '0402')
p6 = lambda: db.search(db.AND, p1, p2)

fc = db.search(db.OR, p3, p4)

this particular example doesn't necessarily make any sense, but in
effect I'm trying to string together lambda functions which are
created explicitly for the individual query, then strung together with
combiner functions.

If only compilers were so simple... Again, you're writing the "HOW"
when you're doing what you're doing above, when you really want to
write the "WHAT".

Oh, and BTW, lambdas are just an expression to generate a function
quickly. You're confusing the python lambda expression with lambdas
the theoretical concepts. Lambdas the theoretical concept are really
Python's callable objects. Python just expresses them in a very weird
way that is seemingly unrelated to lambda the theoretical concept (but
really is).
The nondeterministic stuff... wow, I've come across (call/cc...),
(require...), and different variants of this, both in sicp, teach
yourself scheme, the plt docs, other places, etc., but it still eludes
me.  I'm afraid that unless I understand it, I'll wind up creating
some type of cargo-cult mimcry of a database without doing it right
(http://en.wikipedia.org/wiki/Cargo_cult)

Sorry, I got confused on the meaning of non-deterministic programming.
I forgot that this was a specific thing in the Scheme world.

Yes, you've got to understand this. It makes writing a compiler much,
much easier, almost trivial.
I'm under the impression python doesn't have the internal guts for
nondeterministic programming, specifcially that its lambdas are
limited to single-line expressions, but I may be wrong here.

(Recall what I mentioned about lambdas above. It applies here as
well.)
Still, at the begining of the nondeterministic section of SICP
(section 4.3), it says "nondeterministic computing... is useful for
'generate and test' applications", which almost categorically sounds
like an element-by-element search for what you're looking for.  This
was my initial intention behind (prematurely?) optimizing the database
by using parameter keys instead of something like [x for x in stuff if
pred(x, val)] filtering.

Yes, you certainly can't use list comprehensions to solve your
problems. They're too primitive and part of the interface is that they
actually iterate through the sequence.
Does incorporating a search capability in an application necessarily
mean I'm writing a search compiler?  That seems overkill for the
specific case, but may be true generally.  For instance, if a word
processing app allows you to search for characters with a certain
font, is that incorporating a search compiler and query language?  Or
is it just brute-force filtering?

Ok, here's some big picture hand-waving.

I'm going to make an assertion, and it turns out to be fundamental. I
won't explain it here but I hope you'll learn what it means as time
goes on. The assertion is this:

All programs simply transform one program's code into another
program's code.

By program, I mean, a set of instructions to a computer of some sort
that are meant to be actually evaluated somehow. By program's code, I
mean a description in any form of such a program.

For instance, all human language is really some kind of program code.
So is all Scheme, Lisp, Python, whatever code. So is HTML. So are
bytes and words in memory representing a compiled program. So are IP
packets on the internet.

When you type in search terms to the search feature of Word, you are
writing program code. Sure, the program code is going to be
interpreted in some way, but it represents a program that says, "Find,
by whatever method available, the next instance of text matching the
text I type in here". That's a fairly easy thing to do, and you don't
need to know about relational algebra and non-deterministic code to
write a correct and fast solution.

If you're going to implement a general function that takes arbitrary
search criteria against arbitrary structured data and uses several
methods together to find the answers, that's going to require a fairly
advanced compiler and optimizer. You can use Scheme's non-
deterministic methods to build that program, which is really neat and
a big time-saver in terms of development time. That level of program
is pretty advanced and only those who really understand how to write a
compiler can put it together.

For the mouse-hovering bit, since you're always running the same
query, you can "pre-compile" it and just write a specific index and
some specific code to look it up. This isn't too difficult.
 
A

Aahz

I have Section 4.4.1 of SICP rattling around in my head (database
queries), and I'm trying to come up with a simple dictionary-based
database in Python to represent circuit diagrams. My main confusion
isn't one of implementation, but a matter of "big thinking",
fundamentally, about the problem. Please don't suggest using a SQL
library, as I'm looking to learn to fish, so to speak, and to learn a
bit about the biology of fish.

Having read the rest of your conversation with Jonathan, I'm going to
suggest a related book that I think may help you approach this from a
different angle: _Mastering Regular Expressions_ by Jeffrey Friedl. (I
haven't gone through SICP myself, so I can't be sure that it's relevant,
but it covers the topic of transforming search criteria into executable
statements -- and it will almost certainly be useful to you even if it's
not relevant to your current problem.)
--
Aahz ([email protected]) <*> http://www.pythoncraft.com/

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it." --Brian W. Kernighan
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,755
Messages
2,569,537
Members
45,022
Latest member
MaybelleMa

Latest Threads

Top