Writing a parser the right way?

Discussion in 'Python' started by beza1e1, Sep 21, 2005.

  1. beza1e1

    beza1e1 Guest

    I'm writing a parser for the English language. This is a simple function
    to identify what kind of sentence we have. Do you think this class
    wrapping is the right way to represent the result of the function?
    Further parsing then checks isinstance(text, Declarative).

    -------------------
    class Sentence(str): pass
    class Declarative(Sentence): pass
    class Question(Sentence): pass
    class Command(Sentence): pass

    def identify_sentence(text):
        text = text.strip()
        if text[-1] == '.':
            return Declarative(text)
        elif text[-1] == '!':
            return Command(text)
        elif text[-1] == '?':
            return Question(text)
        return text
    -------------------

    At first I just returned the class; then I decided to derive Sentence
    from str so I can include the text as well.
    beza1e1, Sep 21, 2005
    #1

  2. beza1e1

    Ben Sizer Guest

    beza1e1 wrote:
    > I'm writing a parser for the English language. This is a simple function
    > to identify what kind of sentence we have. Do you think this class
    > wrapping is the right way to represent the result of the function?
    > Further parsing then checks isinstance(text, Declarative).
    >
    > -------------------
    > class Sentence(str): pass
    > class Declarative(Sentence): pass
    > class Question(Sentence): pass
    > class Command(Sentence): pass


    As far as the parser is concerned, making these separate classes is
    unnecessary when you could just store the sentence type as a normal
    data member of Sentence. So the answer to your question is no, in my
    opinion.

    However, when you come to actually use the resulting Sentence objects,
    perhaps the behaviour is different? If you're looking to use a standard
    interface to Sentences but are going to be doing substantially
    different processing depending on which sentence type you have, then
    yes, this class hierarchy may be useful to you.
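
A minimal sketch of the data-member approach described above, in case it helps to see it side by side with the class hierarchy. The `kind` attribute and the `KINDS` table are hypothetical names, not part of the original post, and this is only one possible shape:

```python
# Hypothetical mapping from final punctuation to a sentence-type label.
KINDS = {'.': 'declarative', '!': 'command', '?': 'question'}


class Sentence(str):
    """A str that also remembers what kind of sentence it is."""

    def __new__(cls, text, kind=None):
        obj = str.__new__(cls, text)
        obj.kind = kind  # None means "could not be identified"
        return obj


def identify_sentence(text):
    text = text.strip()
    # text[-1:] is '' for empty input, so this never raises IndexError.
    return Sentence(text, KINDS.get(text[-1:]))
```

Further parsing would then test `sentence.kind == 'question'` instead of `isinstance(sentence, Question)`. If the sentence types later need different behaviour, subclasses remain an option.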

    --
    Ben Sizer
    Ben Sizer, Sep 21, 2005
    #2

  3. beza1e1

    beza1e1 Guest

    Well, a declarative sentence is essentially subject-predicate-object,
    while a question is predicate-subject-object. This is important in
    further processing. So perhaps I should code this order into the
    classes? I need to think a little bit more about this.

    Thanks for your food for thought! :)
    beza1e1, Sep 21, 2005
    #3
  4. beza1e1 wrote:
    > Well, a declarative sentence is essentially subject-predicate-object,
    > while a question is predicate-subject-object. This is important in
    > further processing. So perhaps I should code this order into the
    > classes? I need to think a little bit more about this.


    A question is subject-predicate-object?

    That was unknown by me.

    Honestly, if you're trying a general English parser, good luck.
    Christopher Subich, Sep 21, 2005
    #4
  5. beza1e1

    Paul McGuire Guest

    "beza1e1" <> wrote in message
    news:...
    > I'm writing a parser for the English language. This is a simple function
    > to identify what kind of sentence we have. Do you think this class
    > wrapping is the right way to represent the result of the function?
    > Further parsing then checks isinstance(text, Declarative).
    >
    > -------------------
    > class Sentence(str): pass
    > class Declarative(Sentence): pass
    > class Question(Sentence): pass
    > class Command(Sentence): pass
    >
    > def identify_sentence(text):
    >     text = text.strip()
    >     if text[-1] == '.':
    >         return Declarative(text)
    >     elif text[-1] == '!':
    >         return Command(text)
    >     elif text[-1] == '?':
    >         return Question(text)
    >     return text
    > -------------------
    >
    > At first I just returned the class; then I decided to derive Sentence
    > from str so I can include the text as well.
    >

    Andreas -

    Are you trying to parse any English sentence, or just a limited form of
    them? Parsing *any* English sentence (or question or interjection or
    command) is a ***huge*** undertaking - Google for "natural language" and you
    will find many efforts (with substantial time and money and manpower
    resources) working on this problem. Applications range from automated
    language translation to helpdesk automated analysis. I really suggest you
    do a bit of research on this topic, just to get an idea of how big this job
    is. Here's a Wikipedia link:
    http://en.wikipedia.org/wiki/Natural_language_processing

    Here are some simple examples, that quickly go beyond
    subject-predicate-object:

    I drive a truck.
    I drive a red truck.
    I drive a red truck to work.
    I drive a red truck to the shop to work on it.
    I drive a red truck to the shop to have some work done on it.
    I drive a red truck very fast.
    I drive a red truck through a red light.

    Then factor in other sentences (past and future tenses, past and future
    perfect tenses, figurative metaphors) and parsing general English is a major
    job. The favorite test case of the natural language folks is "Time flies
    like an arrow," which early auto-translation software converted to "Temporal
    insects enjoy a pointed projectile."

    On the other hand, if you plan to limit the type and/or content of the
    sentences being parsed (such as computer system commands or adventure game
    inputs, or descriptions of physical objects), then you can scope out a
    reasonable capability by choosing a vocabulary of known verbs and objects,
    and avoiding ambiguities (such as "set", as in "I set the set of glasses
    next to the TV set," or "lead" as in "Lead me to the store that sells lead
    pencils.").
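
As a rough illustration of the restricted-vocabulary idea above, here is a small sketch that only accepts "verb [filler] object" commands. The word lists and the function name `parse_command` are hypothetical, chosen for a tiny adventure game; a real game would need a far larger vocabulary and more sentence shapes:

```python
# Hypothetical vocabulary for a tiny adventure game.
VERBS = {'take', 'drop', 'open', 'go'}
OBJECTS = {'lamp', 'door', 'key', 'north', 'south'}
FILLER = {'the', 'a', 'an', 'to'}


def parse_command(text):
    """Return (verb, obj) for a 'verb [filler] object' command, else None."""
    # Lowercase, trim surrounding spaces and end punctuation, drop filler words.
    words = [w for w in text.lower().strip(' .!?').split() if w not in FILLER]
    if len(words) == 2 and words[0] in VERBS and words[1] in OBJECTS:
        return words[0], words[1]
    return None  # outside the known vocabulary or sentence shape
```

Anything outside the known verbs and objects is rejected rather than guessed at, which sidesteps ambiguous words by simply never listing them twice.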

    Hope this sheds some light on your task,
    -- Paul
    Paul McGuire, Sep 21, 2005
    #5
  6. Christopher Subich wrote:
    > beza1e1 wrote:
    >
    >> Well, a declarative sentence is essentially subject-predicate-object,
    >> while a question is predicate-subject-object. This is important in
    >> further processing. So perhaps I should code this order into the
    >> classes? I need to think a little bit more about this.

    >
    > A question is subject-predicate-object?
    >
    > That was unknown by me.
    >
    > Honestly, if you're trying a general English parser, good luck.


    I second that. Have you read any of the natural language processing
    research in this area? There are a variety of English parsers already
    available. Googling for "charniak parser" or "collins parser" should
    get you something. I believe Dan Bikel has one too. Those are trained
    on Wall Street Journal text. You might also look into Minipar, which is
    rule-based and not as WSJ-specific.

    STeVe
    Steven Bethard, Sep 21, 2005
    #6
  7. beza1e1

    beza1e1 Guest

    Thanks for the hints. I just found NLTK and MontyLingua.

    And yes, it is just adventure game language. This means every tense
    except present tense is discarded as "not changing world". Furthermore
    the parser will make a lot of assumptions, which are perhaps 90% right,
    not perfect:

    if word[-2:] == "ly":
        return Adverb(word)

    Note that capitalized words are identified first, so "Willy" is parsed
    correctly as a noun. On the other hand, "silly boy" will not return a
    correct result.
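
The ordering described above can be sketched roughly as follows. The function name `classify_word` and the string labels are hypothetical stand-ins (the post itself returns an `Adverb` object), and the heuristic is deliberately crude:

```python
def classify_word(word):
    """Guess a part of speech from surface features alone."""
    if word[:1].isupper():
        return 'noun'     # capitalized words are checked first, so "Willy" is a noun
    if word.endswith('ly'):
        return 'adverb'   # misfires on adjectives such as "silly"
    return 'unknown'
```

Because the capitalization test runs first, the first word of a sentence would also be labelled a noun, so a fuller version would presumably need to treat sentence-initial words separately.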

    Currently it is just a proof of concept. Maybe I can integrate a better
    parser engine later. The idea is a kind of MUD where you speak in
    correct sentences instead of "go north". I envision a difference like
    that between Diablo and Pen&Paper. I'd call it more a collaborative
    storytelling game than an actual RPG.

    I fed it your sentences, Paul. Result:
    <['I', 'drive', 'a']> <['red']> <['truck']>
    should be:
    <['I']> <['drive']> <['a', 'red', 'truck']>

    Verbs are the tricky part, I think. There is no way to recognize them.
    So I will have to get a database ... work to do. ;)
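
One way a verb list might be used to get the intended grouping is to split the sentence at the first known verb. This is only a sketch: `KNOWN_VERBS` and `split_on_verb` are hypothetical names, and the tiny set stands in for whatever verb database ends up being used:

```python
# Hypothetical verb list; a real game would load a much larger one.
KNOWN_VERBS = {'drive', 'take', 'open', 'go'}


def split_on_verb(sentence):
    """Split into (subject, verb, rest) at the first known verb, else None."""
    words = sentence.strip(' .!?').split()
    for i, word in enumerate(words):
        if word.lower() in KNOWN_VERBS:
            return words[:i], [word], words[i + 1:]
    return None  # no known verb found
```

For "I drive a red truck." this yields the three groups `['I']`, `['drive']`, `['a', 'red', 'truck']`. It does nothing about words that can be either a noun or a verb, which would still need extra rules.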
    beza1e1, Sep 22, 2005
    #7
  8. Steven Bethard, Sep 22, 2005
    #8