Web App like Google

godwin

Hello there,
I need some thoughts about a web application in Python that I have been
dreaming about. I want a search page that looks exactly like Google's, but
when I press the search button it should search a database in an RDBMS
such as Oracle and return the results.
For example, if my keywords are "all customers with names
starting with 'God'", it should somehow search the table CUSTOMER with
the following query:
SELECT CUSTNAME FROM CUSTOMER WHERE CUSTNAME LIKE 'God%'
So what we basically need is a good Python parser module that parses the
keywords into one or more SQL statements and lists the results. I can look
in the keywords for table and column synonyms, map them to table and
column names, and build the SQL, but that is not a foolproof idea, as
we all know that English is a very vague language. The idea will
fail if I can't identify the table names, column names, operators and
values in their logical order so as to create syntactically correct SQL.
If more tables are involved, I would also have to think about joining
them (inner, outer, equi joins).
All I want is some enlightening thoughts from the Python hackers (I mean
programmers) out there. Please polish your grey cells and let me know your
thoughts.

# This is my basic and foolish keyword parser.
# The object db provides the table as well as column names.
# As you can see, it may or may not work even for a single table.
class KeyWordParser(object):
    def __init__(self, keywords, db):
        self.keywords = keywords.upper().split()
        self.db = db
        self.tables = []
        self.columns = []

    def parse2sql(self):
        # Collect every keyword that is also a table name.
        for word in self.keywords:
            if word in self.db.tables():
                self.tables.append(word)

        # Collect every keyword that is a column of one of those tables.
        for word in self.keywords:
            for table in self.tables:
                for column in self.db.columns(table):
                    if column == word:
                        self.columns.append(column)
        sql = 'SELECT %s FROM %s' % (','.join(self.columns) or '*',
                                     ','.join(self.tables))
        return sql
 
Robert Kern

godwin said:
Hello there,
I need some thoughts about a web application in Python that I have been
dreaming about. I want a search page that looks exactly like Google's, but
when I press the search button it should search a database in an RDBMS
such as Oracle and return the results.
For example, if my keywords are "all customers with names
starting with 'God'", it should somehow search the table CUSTOMER with
the following query:
SELECT CUSTNAME FROM CUSTOMER WHERE CUSTNAME LIKE 'God%'

This is a Natural Language Processing (NLP) task. In general, it's
pretty hard. For Python, there is the Natural Language Toolkit (NLTK):

http://nltk.sourceforge.net/

You could get pretty far, though, by accepting a specific subset, the
so-called "controlled natural language" approach. For example:

http://www.jfsowa.com/clce/specs.htm
http://www.ics.mq.edu.au/~rolfs/controlled-natural-languages/
http://www.ifi.unizh.ch/attempto/
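
To make the controlled-language idea concrete, here is a minimal sketch that accepts exactly one sentence shape, the one from the original post. The VOCABULARY table, the PATTERN regex and the translate function are assumptions made up for illustration; they are not taken from NLTK or from any of the projects linked above.

```python
import re

# Hypothetical vocabulary: maps a word the user may type to (table, column).
# The schema follows the CUSTOMER/CUSTNAME example from the original post.
VOCABULARY = {
    "customers": ("CUSTOMER", "CUSTNAME"),
}

# The only accepted sentence shape:
#   all <things> with names starting with '<prefix>'
PATTERN = re.compile(
    r"^all\s+(?P<noun>\w+)\s+with\s+names\s+starting\s+with\s+'(?P<prefix>[^']*)'$",
    re.IGNORECASE,
)

def translate(sentence):
    """Return (sql, params) for a recognised sentence, or None otherwise."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    entry = VOCABULARY.get(match.group("noun").lower())
    if entry is None:
        return None
    table, column = entry
    # Table and column come only from VOCABULARY, never from user text;
    # the user-supplied prefix is handed back separately as a bind parameter.
    sql = "SELECT %s FROM %s WHERE %s LIKE :prefix" % (column, table, column)
    return sql, {"prefix": match.group("prefix") + "%"}

print(translate("all customers with names starting with 'God'"))
```

Anything that does not fit the pattern is rejected rather than guessed at, which is the trade-off of a controlled language: far less coverage than free English, but no ambiguity about what a sentence means.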

--
Robert Kern

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
 
Graham Fawcett

In translating natural language to SQL, be sure you're not introducing
opportunities for SQL injection attacks. Code like

sql = 'SELECT %s FROM %s' % (this, that)

is considered dangerous, because a well-crafted value for "that" can be
used to, e.g., delete rows from your tables, run system commands, etc.
You can save a lot of worry by using a database account with read-only
privileges, but you still have to be careful. My advice is to read up
on "sql injection" before going too public with your code.
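
As a rough illustration of the safer pattern, the sketch below binds the user's value as a query parameter and checks table and column names against a whitelist. It uses the standard-library sqlite3 module as a stand-in for Oracle; the ALLOWED mapping and the find_by_prefix function are hypothetical names chosen for this example, and the placeholder style ("?" here) differs between database drivers.

```python
import sqlite3

# Hypothetical whitelist of identifiers the application is willing to query.
ALLOWED = {"CUSTOMER": {"CUSTNAME"}}

def find_by_prefix(conn, table, column, prefix):
    """Return the values of `column` in `table` that start with `prefix`."""
    # Identifiers cannot be bound as parameters, so check them against
    # the whitelist before formatting them into the statement.
    if column not in ALLOWED.get(table, set()):
        raise ValueError("unknown table or column")
    sql = "SELECT %s FROM %s WHERE %s LIKE ?" % (column, table, column)
    # The value travels separately from the SQL text, so the database
    # never parses it as SQL.
    rows = conn.execute(sql, (prefix + "%",)).fetchall()
    return [row[0] for row in rows]

# Small in-memory database to try it out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER (CUSTNAME TEXT)")
conn.executemany("INSERT INTO CUSTOMER VALUES (?)", [("Godwin",), ("Graham",)])

# A hostile value is treated as an ordinary string to match against.
hostile = "x'; DROP TABLE CUSTOMER; --"
print(find_by_prefix(conn, "CUSTOMER", "CUSTNAME", "God"))
print(find_by_prefix(conn, "CUSTOMER", "CUSTNAME", hostile))
```

This does not replace the read-only account; the two measures complement each other.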

Graham
 
Godwin

Thanks for informing me about NLTK. I'll certainly look into it and
other options. Hope my dream doesn't go into the graves.

Godwin Burby
 
Godwin

Thanks for providing me with all those informative links about NLTK and
CNL. I'll certainly look into them.
 
Godwin

Thanks for making me aware of the security loophole in the web app I am
planning.
Godwin Burby
 
gene tani

Yes, there are a lot of issues: cross-site scripting, session hijacking,
proper authentication, etc. The Open Web Application Security Project
(OWASP) is a useful resource:

www.owasp.org

Also, before you start with NLP and full-on parsers, think about whether
you can apply a text indexer, with stemming and stop-word removal applied
to both your users' queries and the database content. That is much easier
conceptually and easier on the DB server too, and there are lots of good
Python packages and Python bindings:

http://www.xapian.org/
http://www.pypackage.org/packages/python-pyndex
http://www.divmod.org/Home/Projects/Lupy/
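
To give a feel for the indexing approach, here is a toy inverted index in plain Python. It is only a sketch and does not use the API of any of the packages above; the stop-word list, the crude suffix-stripping stem function and the sample rows are all assumptions made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; real indexers ship much longer ones.
STOP_WORDS = {"a", "all", "an", "and", "in", "of", "the", "with"}

def stem(word):
    """Very crude suffix stripping; a real stemmer (e.g. Porter) does far more."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def terms(text):
    """Lower-case, tokenise, drop stop words, and stem what remains."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]

def build_index(rows):
    """Map each term to the set of row ids whose text contains it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for term in terms(text):
            index[term].add(row_id)
    return index

def search(index, query):
    """Return the ids of rows that contain every term of the query."""
    sets = [index.get(t, set()) for t in terms(query)]
    return set.intersection(*sets) if sets else set()

# Made-up rows standing in for text pulled from the database.
rows = {1: "customer orders pending", 2: "shipping customers", 3: "pending invoices"}
index = build_index(rows)
print(search(index, "all customers"))
print(search(index, "pending customer"))
```

Because queries and content pass through the same terms function, "customers" in a query matches "customer" in a row without any grammar being involved.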
 
Godwin

Actually, I went through the tutorials on the NLTK site, and it has a steep
learning curve. If I can learn it, it will be the best-fitting module for
my app. But if it gets too hard, I'll definitely opt for the text indexers
you kindly pointed me to. Thanks.
 
