Generic logic/conditional class or library for classification of data

Discussion in 'Python' started by Basilisk96, Apr 1, 2007.

  1. Basilisk96

    Basilisk96 Guest

    This topic is difficult to describe in one subject sentence...

    Has anyone come across the application of the simple statement "if
    (object1's attributes meet some conditions) then (set object2's
    attributes to certain outcomes)", where "object1" and "object2" are
    generic objects, and the "conditions" and "outcomes" are dynamic run-
    time inputs? Typically, logic code for any application out there is
    hard-coded. I have been working with Python for a year, and its
    flexibility is nothing short of amazing. Wouldn't it be possible to
    have a class or library that can do this sort of dynamic logic?

    The main application of such code would be for classification
    algorithms which, based on the attributes of a given object, can
    classify the object into a scheme. In general, conditions for
    classification can be complex, sometimes involving a collection of
    "and", "or", "not" clauses. The simplest outcome would involve simply
    setting a few attributes of the output object to given values if the
    input condition is met. So each such "if-then" clause can be viewed as
    a rule that is custom-defined at runtime.

    As a very basic example, consider a set of uncategorized objects that
    have text descriptions associated with them. The objects are some type
    of tangible product, e.g., books. So the input object has a
    Description attribute, and the output object (a categorized book)
    would have some attributes like Discipline, Target audience, etc.
    Let's say that one such rule is "if ( 'description' contains
    'algebra') then ('discipline' = 'math', 'target' = 'student') ". Keep
    in mind that all these attribute names and their values are not known
    at design time.

    Is there one obvious way to do this in Python?
    Perhaps this is more along the lines of data mining methods?
    Is there a library with this sort of functionality out there already?

    Any help will be appreciated.
    Basilisk96, Apr 1, 2007
    #1

  2. On Sat, 31 Mar 2007 21:54:46 -0700, Basilisk96 wrote:

    > As a very basic example, consider a set of uncategorized objects that
    > have text descriptions associated with them. The objects are some type
    > of tangible product, e.g., books. So the input object has a
    > Description attribute, and the output object (a categorized book)
    > would have some attributes like Discipline, Target audience, etc.
    > Let's say that one such rule is "if ( 'description' contains
    > 'algebra') then ('discipline' = 'math', 'target' = 'student')". Keep
    > in mind that all these attribute names and their values are not known at
    > design time.


    Easy-peasy.

    rules = {'algebra': {'discipline': 'math', 'target': 'student'},
             'python': {'section': 'programming', 'os': 'linux, windows'}}

    class Input_Book(object):
        def __init__(self, description):
            self.description = description

    class Output_Book(object):
        def __repr__(self):
            return "Book - %s" % self.__dict__

    def process_book(book):
        out = Output_Book()
        for desc in rules:
            if desc in book.description:
                attributes = rules[desc]
                for attr in attributes:
                    setattr(out, attr, attributes[attr])
        return out

    book1 = Input_Book('python for cheese-makers')
    book2 = Input_Book('teaching algebra in haikus')
    book3 = Input_Book('how to teach algebra to python programmers')


    >>> process_book(book1)
    Book - {'section': 'programming', 'os': 'linux, windows'}
    >>> process_book(book2)
    Book - {'discipline': 'math', 'target': 'student'}
    >>> process_book(book3)
    Book - {'discipline': 'math', 'section': 'programming',
    'os': 'linux, windows', 'target': 'student'}


    I've made some simplifying assumptions: the input object always has a
    description attribute. Also the behaviour when two or more rules set the
    same attribute is left undefined. If you want more complex rules you can
    follow the same technique, except you'll need a set of meta-rules to
    decide what rules to follow.
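
    One way to read "more complex rules" is to make each condition a small
    function of the object and combine conditions with and/or/not helpers.
    The following is a minimal sketch under assumptions: contains, all_of,
    any_of, negate, classify and the Book class are hypothetical names that
    appear nowhere in the code above. The rules sit in a list rather than a
    dict, so "a later rule overrides an earlier one" can serve as a very
    simple meta-rule when two rules set the same key.

```python
# Sketch only: contains, all_of, any_of, negate and classify are
# hypothetical helper names, not part of any library.

def contains(attr, text):
    """Condition: the named attribute of obj contains text."""
    return lambda obj: text in getattr(obj, attr, '')

def all_of(*conds):
    """Logical 'and' of several conditions."""
    return lambda obj: all(c(obj) for c in conds)

def any_of(*conds):
    """Logical 'or' of several conditions."""
    return lambda obj: any(c(obj) for c in conds)

def negate(cond):
    """Logical 'not' of one condition."""
    return lambda obj: not cond(obj)

# A list (not a dict) makes the rule order explicit: when two rules set
# the same key, the later rule wins.
compound_rules = [
    (contains('description', 'algebra'),
     {'discipline': 'math', 'target': 'student'}),
    (all_of(contains('description', 'algebra'),
            contains('description', 'teach')),
     {'target': 'teacher'}),
    (all_of(contains('description', 'python'),
            negate(contains('description', 'snake'))),
     {'section': 'programming'}),
]

def classify(obj, rules):
    """Merge the outcomes of every rule whose condition holds for obj."""
    data = {}
    for condition, outcome in rules:
        if condition(obj):
            data.update(outcome)
    return data

class Book(object):
    def __init__(self, description):
        self.description = description

result = classify(Book('how to teach algebra to python programmers'),
                  compound_rules)
```

    Here the second rule overrides the 'target' set by the first one only
    because it comes later in the list; a real application would probably
    want a more deliberate policy, such as raising an error on conflicts.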

    But having said that, I STRONGLY recommend that you don't follow that
    approach of creating variable instance attributes at runtime. The reason
    is, it's quite hard for you to know what to do with an Output_Book once
    you've got it. You'll probably end up filling your code with horrible
    stuff like this:

    if hasattr(book, 'target'):
        do_something_with(book.target)
    elif hasattr(book, 'discipline'):
        do_something_with(book.discipline)
    elif ... # etc.


    Replacing the hasattr() checks with try...except blocks isn't any
    less icky.

    Creating instance attributes at runtime has its place; I just don't think
    this is it.

    Instead, I suggest you encapsulate the variable parts of the book
    attributes into a single attribute:

    class Output_Book(object):
        def __init__(self, name, data):
            self.name = name # common attribute(s)
            self.data = data # variable attributes


    Then, instead of setting each variable attribute individually with
    setattr(), simply collect all of them in a dict and save them in data:

    def process_book(book):
        data = {}
        for desc in rules:
            if desc in book.description:
                data.update(rules[desc])
        # assumes the input book also carries a name attribute
        return Output_Book(book.name, data)


    Now you can do this:

    outbook = process_book(book)
    # handle the common attributes that are always there
    print(outbook.name)
    # handle the variable attributes
    print("Stock = %s" % outbook.data.setdefault('status', 0))
    print("discipline = %s" % outbook.data.get('discipline', 'none'))
    # handle all the variable attributes
    for key, value in outbook.data.items():
        do_something_with(key, value)


    Any time you have to deal with variable attributes that may or may not be
    there, you have to use more complex code, but you can minimize the
    complexity by keeping the variable attributes separate from the common
    attributes.


    --
    Steven.
    Steven D'Aprano, Apr 1, 2007
    #2

  3. Re: Generic logic/conditional class or library for classification of data

    You may be interested in http://divmod.org/trac/wiki/DivmodReverend
    -- it is a general purpose Bayesian classifier written in python.
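
    To give a feel for what a Bayesian classifier does with the book
    descriptions, here is a pure Python sketch of naive Bayes with add-one
    smoothing. It is not Reverend's interface: TinyBayes and its train and
    guess methods are hypothetical names made up for this illustration, and
    the training texts are toy data.

```python
import math
from collections import Counter, defaultdict

class TinyBayes(object):
    """Hypothetical minimal naive Bayes text classifier (not Reverend's API)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of texts
        self.vocabulary = set()                  # every word seen in training

    def train(self, label, text):
        """Record the words of one training text under the given label."""
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def guess(self, text):
        """Return the label with the highest log-probability score."""
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, None
        for label in self.doc_counts:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # log prior plus log likelihood of each word (add-one smoothing)
            score = math.log(self.doc_counts[label] / float(total_docs))
            for word in words:
                score += math.log((counts[word] + 1.0) /
                                  (total_words + len(self.vocabulary)))
            if best_score is None or score > best_score:
                best_label, best_score = label, score
        return best_label

classifier = TinyBayes()
classifier.train('math', 'teaching algebra in haikus')
classifier.train('math', 'linear algebra for students')
classifier.train('programming', 'python for cheese-makers')
classifier.train('programming', 'learning python programming')

guess_math = classifier.guess('algebra made easy')
guess_prog = classifier.guess('python cookbook')
```

    Unlike the hand-written rules, the mapping from words to categories is
    learned from labelled examples, so it suits cases where the rules are
    not known up front but classified samples are available.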

    hope this helps,
    Michael
    Michael Bentley, Apr 1, 2007
    #3
  4. Basilisk96

    Basilisk96 Guest

    Thanks for the help, guys.
    Dictionaries to the rescue!

    Steven, it's certainly true that runtime creation of attributes does
    not fit well here. At some point, an application needs to come out of
    generics and deal with logic that is specific to the problem. The
    example I gave was classification of books, which is relatively easy
    to understand. The particular app I'm working with deals with
    specialty piping valves, where the list of rules grows complicated
    fairly quickly.

    So, having said that "attributes are not known at design time", it
    seems that dictionaries are best for the generic core functionality:
    it's easy to iterate over arbitrary "key, value" pairs without
    hiccups. I can even reference a custom function by a key, and call it
    during the iteration to do what's necessary. The input/output
    dictionaries would dictate that behavior, so that would be the
    implementation-specific stuff. Easy enough, and the core functionality
    remains generic enough for re-use.
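
    A minimal sketch of the "custom function by a key" idea, under stated
    assumptions: apply_outcomes, size_class and the valve fields
    (diameter_in, category) are hypothetical names chosen for illustration
    and do not come from the actual application.

```python
# Sketch only: apply_outcomes, size_class and the valve fields below are
# hypothetical names chosen for illustration.

def size_class(item):
    """Custom outcome: derive a value from the input instead of a constant."""
    return 'large' if item['diameter_in'] >= 8 else 'small'

# Outcome values are either constants or callables taking the input dict.
outcomes = {
    'category': 'valve',
    'size_class': size_class,
}

def apply_outcomes(item, outcomes):
    """Build the output dict, calling any callable outcome with the input."""
    result = {}
    for key, value in outcomes.items():
        result[key] = value(item) if callable(value) else value
    return result

classified = apply_outcomes({'description': 'gate valve', 'diameter_in': 10},
                            outcomes)
```

    The generic part (apply_outcomes) never needs to know the key names;
    only the outcomes dict and the functions it references are specific to
    the problem.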

    Michael, I looked at the sample snippets at that link, and I'll have
    to try it out. Thanks!
    Basilisk96, Apr 3, 2007
    #4
  5. Basilisk96

    Guest

    Hello,

    If your rules become more complicated and increase significantly in
    number, it might be worth switching to a rule-based system. Take a
    look at CLIPS and the associated Python bindings:

    http://www.ghg.net/clips/CLIPS.html
    http://pyclips.sourceforge.net/
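
    What a rule-based system adds over a flat dict of rules is that rules
    can fire on facts produced by other rules. The following is not CLIPS
    or PyCLIPS code, only a pure Python sketch of forward chaining; the
    function name forward_chain and the fact strings are made up for
    illustration.

```python
# Sketch only: not the CLIPS/PyCLIPS API. Facts are plain strings and each
# rule is a (required facts, derived facts) pair of sets.

def forward_chain(initial_facts, rules):
    """Fire rules repeatedly until no rule adds a new fact."""
    facts = set(initial_facts)
    changed = True
    while changed:
        changed = False
        for required, derived in rules:
            # Fire only if all required facts hold and something new results.
            if required <= facts and not derived <= facts:
                facts |= derived
                changed = True
    return facts

valve_rules = [
    ({'material:bronze'}, {'corrosion-resistant'}),
    ({'corrosion-resistant', 'rating:high-pressure'}, {'service:marine'}),
]

facts = forward_chain({'material:bronze', 'rating:high-pressure'}, valve_rules)
```

    The second rule can only fire after the first has added
    'corrosion-resistant', which is the kind of chaining that gets awkward
    to maintain by hand once the rule list grows.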

    Kind regards,

    Marco
    , Apr 3, 2007
    #5
