Roedy Green said:
I wonder if there is anything like this yet -- a "sense of it" search
engine.
I often have a heck of a time finding something on my own website or
in the bible when I know only the meaning of what I am looking for,
not the original specific vocabulary.
For example, someone on TV claims the bible says that you could take
care of the earth because there is no one coming after you to clean it
up. What is the actual reference if any?
Have you any ideas on how you might implement such a search engine,
even for a small data set?
Well, if this isn't the Holy Grail of search, it sure comes close.
It's odd that you should use that example. I'm reasonably familiar with the
Bible, but it's not like I read it every day. In any case, some fellow I was
debating with (let's say he had a very literal interpretation of the Bible,
and let's also say that we were arguing about something in the Old
Testament) made some outrageous claim, and I thought he should at least
quote the Book correctly. I had a ^H^H^H^H^H hard time using keywords to
locate the passages. This was also a case where I could express the search
more effectively as an idea rather than as some keywords.
The case you mention is actually reasonably amenable to keyword search. I
tried "take care earth Bible", and the first Google link was
http://www.earthcareonline.org/bibleverses.html, which is very relevant
indeed. For example:
"The land shall not be sold in perpetuity, for the land is mine; with me you
are but aliens and tenants. Throughout the land that you hold, you shall
provide for the redemption of the land." (Leviticus 25:23-24)
The reason this works (and works frequently) is that someone has already
done all the research, and created a document that links reasonable keywords
with the actual text passage(s). Note that the above example highlights your
point - what if nobody had done this specific Bible analysis, and no
document like this existed on the Web, and hence there are merely text
versions of the Bible available? Although the Leviticus passage is clearly
relevant, the keywords that locate it are not immediately obvious - not
unless you already know the passage you are looking for.
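To make that concrete, here is a minimal sketch of plain keyword matching, assuming a toy index of one or two documents. The Leviticus text is the passage quoted above; the "summary" page wording, the stopword list, and the function names are all invented for illustration, and none of this is how Google actually ranks anything.

```python
import re

# Small, hypothetical stopword list; a real engine would use a fuller one.
STOPWORDS = {"the", "a", "of", "for", "is", "are", "be", "in", "you",
             "that", "with", "and", "not", "but", "me", "shall", "to"}

def tokens(text):
    """Return the set of lowercase words in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS}

def search(query, docs):
    """Rank docs by how many query terms they contain; drop non-matches."""
    q = tokens(query)
    scored = [(len(q & tokens(text)), name) for name, text in docs.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

# The raw passage, as quoted in this post.
LEVITICUS = ("The land shall not be sold in perpetuity, for the land is mine; "
             "with me you are but aliens and tenants. Throughout the land "
             "that you hold, you shall provide for the redemption of the land.")

# Hypothetical human-written summary page (wording made up for this sketch).
SUMMARY = ("Bible verses about how to take care of the earth: "
           "see Leviticus 25:23-24.")

raw_only = {"leviticus": LEVITICUS}
with_summary = {"leviticus": LEVITICUS, "summary": SUMMARY}

print(search("take care earth", raw_only))      # []
print(search("take care earth", with_summary))  # ['summary']
```

With only the raw text indexed, the query shares no words with the passage and finds nothing. The hit appears only once a person has written a page that puts the query words next to the reference.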
Latent semantic analysis, which Lew mentions, is a useful technique in
general. It's an interesting question as to whether LSA would be
particularly effective for the above passage, though, especially in the
absence of summary documents like the link above. As another example, try
the Google search "land caretaker Bible", not an unreasonable one to use.
And Bingo, link #5 (for me) was:
http://www.progress.org/bible03.htm
And on the first screen I readily pick out:
"Instead, Leviticus 25.23 proves helpful: "The land must not be sold
permanently, because the land is mine [God's] and you are but aliens and my
tenants." This verse clearly assumes that mankind is custodian or caretaker
of land.)"
This association ("tenants" <=> "custodians, caretakers") was *not* done by
the search engine, it was done by the page author, Eugene Loh. In fact, if
you just use keywords "land caretaker", the third page in Google again turns
up useful summary documents. But to reiterate, it wasn't Google that
associated the *Bible* with your keywords...all these people did that. I
suggest that a great many such searches work so well, given reasonable
keywords, not because Google or other search engines are using LSA and other
techniques, but because millions of people are composing documents that
associate the terms. In effect, all of these people are doing what you are
asking for - they *are* the search engine. Google et al are effectively just
indexing/searching the searches.
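A rough sketch of that idea, under heavy assumptions: treat a human-written commentary as the source of term associations, and expand the query with every term that shares a commentary with it. This is crude whole-document co-occurrence, not LSA and not anything a real engine is known to do; the function names and stopword list are hypothetical. The verse and the commentary sentence are the ones quoted above, with the bracketed "[God's]" dropped.

```python
import re
from collections import defaultdict

# Small, hypothetical stopword list for this sketch.
STOPWORDS = {"the", "a", "of", "or", "is", "are", "be", "and", "not", "but",
             "you", "my", "this", "that", "must", "because"}

def tokens(text):
    """Return the set of lowercase words in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS}

def cooccurrence(commentaries):
    """Map each term to the terms that appear in the same commentary."""
    related = defaultdict(set)
    for text in commentaries:
        terms = tokens(text)
        for t in terms:
            related[t] |= terms - {t}
    return related

def expand(query, related):
    """Add to the query every term some commentary associates with it."""
    q = tokens(query)
    expanded = set(q)
    for t in q:
        expanded |= related.get(t, set())
    return expanded

def matches(query_terms, passage):
    """Return the query terms that occur in the passage."""
    return query_terms & tokens(passage)

# Raw verse and the page author's gloss, both as quoted in this post.
VERSE = ("The land must not be sold permanently, because the land is mine "
         "and you are but aliens and my tenants.")
COMMENTARY = VERSE + (" This verse clearly assumes that mankind is custodian "
                      "or caretaker of land.")

related = cooccurrence([COMMENTARY])

print(matches(tokens("caretaker"), VERSE))                       # set()
print("tenants" in matches(expand("caretaker", related), VERSE))  # True
```

The bare keyword misses the verse. It is found only after the commentary author's "caretaker"/"tenants" pairing is folded into the query, which is the sense in which the people writing those pages are the search engine. LSA would try to derive such associations statistically from a large corpus, but it still needs text in which the terms occur in related contexts.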
What you're asking for, in essence, is searching in the absence of
derivative documents in the same semantic space (say, English theological).
We've all had to do that. The search engines fall flat on their collective
faces in this case. Most often you end up with 556,212 documents, where
document number 78,400 is the first relevant one. It may not even be
possible to provide better keywords, so it's not just the problem you pose.
I don't see a machine solution. Let the human beings continue to do the
work.
AHS