ranking texts against a white list

Discussion in 'Perl Misc' started by Mario Protto, Mar 4, 2005.

  1. Mario Protto

    Mario Protto Guest

    hi all,

    I have many small texts (200-1000 chars) and a white list (100 words), and
    I have to rate each text for its relevancy against the word list.
    Right now I'm using a very simple algorithm:
    _______________________
    does the text contain at least 1 word from the list?
    yes --> rank = 1
    no --> rank = 0
    _______________________

    but I'd like the rank to be a real number between 0 and 1. I have thought of
    something like counting how many different words there are in the text and
    normalizing to 1, but perhaps there is some other, more intelligent ;) way
    to do that... any suggestions?
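
    As a rough sketch of the two rules above (not code from the project): the
    language is Python only for illustration, and the tokenizer, the function
    names, the sample data and the choice to normalize by the size of the
    white list are all assumptions.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into a list of word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def binary_rank(text, whitelist):
    """Current rule: 1 if any white-list word occurs in the text, else 0."""
    words = set(tokenize(text))
    return 1 if words & whitelist else 0

def coverage_rank(text, whitelist):
    """Fraction of white-list words that occur at least once in the text."""
    if not whitelist:
        return 0.0
    words = set(tokenize(text))
    return len(words & whitelist) / len(whitelist)

# Hypothetical data, for illustration only (white list must be lowercase).
whitelist = {"perl", "regex", "module", "cpan"}
text = "A Perl module from CPAN, and another Perl module."
print(binary_rank(text, whitelist))    # 1
print(coverage_rank(text, whitelist))  # 0.75
```

    With a 100-word list and texts this short, dividing by the list size will
    keep most scores small, so only the ordering of the scores is meaningful.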

    thx

    Mario
    www.mario-online.com
    Mario Protto, Mar 4, 2005
    #1

  2. "Mario Protto" <mario AT mario-online DOT d> writes:
    >
    > I have many small texts (200-1000 chars) and a white list (100 words), and
    > I have to rate each text for its relevancy against the word list.
    > Right now I'm using a very simple algorithm:
    > _______________________
    > does the text contain at least 1 word from the list?
    > yes --> rank = 1
    > no --> rank = 0
    > _______________________
    >
    > but I'd like the rank to be a real number between 0 and 1. I have thought of
    > something like counting how many different words there are in the text and
    > normalizing to 1, but perhaps there is some other, more intelligent ;) way
    > to do that... any suggestions?


    This question doesn't have anything to do with Perl, until there is
    a particular implementation problem you want help with, so this is
    not the proper news group for it.

    If you don't know what the meaning of the relevancy number is, how
    can anyone else? It's easy to start speculating, but before even doing
    that I would want to know how the number is to be used.

    If you search with Google using some of the words "rank text white list",
    you may find more information. Another source of ideas is documentation
    (and source) of existing text search and ranking tools. 'Glimpse' comes
    to mind, but there are probably many.

    There's probably a proper news group dealing with such questions, but
    I don't know what it might be called.
    Arndt Jonasson, Mar 4, 2005
    #2

  3. Mario Protto

    Mario Protto Guest

    >> I have many small texts (200-1000 chars) and a white list (100 words),
    >> and I have to rate each text for its relevancy against the word list.
    >> Right now I'm using a very simple algorithm:
    >> _______________________
    >> does the text contain at least 1 word from the list?
    >> yes --> rank = 1
    >> no --> rank = 0
    >> _______________________
    >>
    >> but I'd like the rank to be a real number between 0 and 1. I have thought
    >> of something like counting how many different words there are in the text
    >> and normalizing to 1, but perhaps there is some other, more intelligent ;)
    >> way to do that... any suggestions?

    >
    > This question doesn't have anything to do with Perl, until there is
    > a particular implementation problem you want help with, so this is
    > not the proper news group for it.


    Ehm... sorry, I forgot to mention that this function is embedded in a Perl
    project that fetches text in various ways, puts it in a PostgreSQL db and,
    via a PHP front-end, lets human operators filter and view the contents.

    > If you don't know what the meaning of the relevancy number is, how
    > can anyone else? It's easy to start speculating, but before even doing
    > that I would want to know how the number is to be used.


    Well, the relevancy number could be something like "how much does this
    document talk about my terms". I know it is almost a theoretical question,
    but it seems to me a common need for Perl programmers managing text...
    isn't it?
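
    One possible reading of "how much does this document talk about my terms"
    is the share of the text's words that are on the white list. A minimal
    sketch under that assumption (Python for illustration only; the function
    name, tokenizer and sample data are made up):

```python
import re

def term_density(text, whitelist):
    """Share of the text's word tokens that are white-list words (0.0 to 1.0)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    # Count every occurrence, so repeated white-list words raise the score.
    hits = sum(1 for token in tokens if token in whitelist)
    return hits / len(tokens)

# Hypothetical data, for illustration only (white list must be lowercase).
whitelist = {"perl", "regex", "module", "cpan"}
print(term_density("Perl module for Perl people", whitelist))  # 0.6
```

    Unlike counting distinct words, this rewards repetition and penalizes long
    texts that mention the terms only in passing.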

    > If you search with Google using some of the words "rank text white list",
    > you may find more information. Another source of ideas is documentation
    > (and source) of existing text search and ranking tools. 'Glimpse' comes
    > to mind, but there are probably many.


    Of course I did some CPAN and Google searching before posting. Also (for
    anyone interested), in the Italian newsgroup about Perl, Stefano Rodighiero
    suggested a very interesting article:
    * "Building a Vector Space Search Engine in Perl"
    http://www.perl.com/pub/a/2003/02/19/engine.html

    > There's probably a proper news group dealing with such questions, but
    > I don't know what it might be called.


    Me neither... :)

    Mario
    Mario Protto, Mar 4, 2005
    #3
  4. Mario Protto wrote:
    > hi all,
    >
    > I have many small texts (200-1000 chars) and a white list (100 words), and
    > I have to rate each text for its relevancy against the word list.
    > Right now I'm using a very simple algorithm:
    > _______________________
    > does the text contain at least 1 word from the list?
    > yes --> rank = 1
    > no --> rank = 0
    > _______________________
    >
    > but I'd like the rank to be a real number between 0 and 1. I have thought of
    > something like counting how many different words there are in the text and
    > normalizing to 1, but perhaps there is some other, more intelligent ;) way
    > to do that... any suggestions?
    >

    Hi

    check out

    http://www.perl.com/pub/a/2003/02/19/engine.html

    It is an article on building vector-space searches. It may be what you are
    after.
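
    To give a flavour of the vector-space idea applied to a white list, here is
    a minimal sketch. It is not the code from the linked article: the language
    (Python), the function name, the tokenizer and the choice to give every
    white-list word a weight of 1 are assumptions made for illustration.

```python
import math
import re
from collections import Counter

def cosine_rank(text, whitelist):
    """Cosine between the text's word-count vector and a white-list vector
    in which every white-list word has weight 1. Result lies in [0, 1]."""
    counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    if not counts or not whitelist:
        return 0.0
    # Dot product: only white-list words contribute (missing words count 0).
    dot = sum(counts[word] for word in whitelist)
    text_norm = math.sqrt(sum(c * c for c in counts.values()))
    list_norm = math.sqrt(len(whitelist))
    # Clamp to guard against floating-point results slightly above 1.
    return min(1.0, dot / (text_norm * list_norm))

# Hypothetical data, for illustration only (white list must be lowercase).
whitelist = {"perl", "regex", "module", "cpan"}
print(cosine_rank("Perl module for Perl people", whitelist))
print(cosine_rank("nothing relevant here", whitelist))
```

    The score is 0 when no white-list word appears and approaches 1 as the text
    consists more and more of white-list words, each used about equally often.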

    Mark
    Mark Clements, Mar 4, 2005
    #4
  5. "Mario Protto" <mario AT mario-online DOT
    d> wrote in
    news:d09m58$e3c$:

    >>> I have many small texts (200-1000 chars) and a white list (100
    >>> words), and I have to rate each text for its relevancy

    ....

    >>> but I'd like the rank to be a real number between 0 and 1. I have thought
    >>> of something like counting how many different words there are in the text and

    ....

    >> This question doesn't have anything to do with Perl, until there is
    >> a particular implementation problem you want help with, so this is
    >> not the proper news group for it.

    >
    > Ehm... sorry, I forgot to mention that this function is embedded in a
    > Perl project that fetches text in various ways,


    Still irrelevant.

    To get a better idea of what types of topics are relevant here, you should
    read the posting guidelines for this group. They are posted here regularly
    or you can Google for them on the web.

    Sinan
    A. Sinan Unur, Mar 4, 2005
    #5
  6. A. Sinan Unur <> wrote:

    > "Mario Protto" <mario AT mario-online DOT
    > d> wrote in
    > news:d09m58$e3c$:
    >
    >>>> I have many small texts (200-1000 chars) and a white list
    >>>> (100 words), and I have to rate each text for its relevancy

    > ...
    >
    >>>> but I'd like the rank to be a real number between 0 and 1. I have
    >>>> thought of something like counting how many different words there
    >>>> are in the text and

    > ...
    >
    >>> This question doesn't have anything to do with Perl, until there
    >>> is a particular implementation problem you want help with, so
    >>> this is not the proper news group for it.

    >>
    >> Ehm... sorry, I forgot to mention that this function is embedded
    >> in a Perl project that fetches text in various ways,

    >
    > Still irrelevant.


    Maybe comp.programming? It seems like it might be a better place to
    discuss an algorithm without caring about what language it's
    implemented in.


    > To get a better idea of what types of topics are relevant here,
    > you should read the posting guidelines for this group. They are
    > posted here regularly or you can Google for them on the web.


    I bet Google hates the use of their trademarked name as a generic
    verb.... :)

    --
    David Wall
    David K. Wall, Mar 4, 2005
    #6
  7. David K. Wall <> wrote:
    > A. Sinan Unur <> wrote:


    >> you can Google for them on the web.

    >
    > I bet Google hates the use of their trademarked name as a generic
    > verb.... :)



    I hope the smiley means you mean just the opposite...?

    I would think they _love_ it.


    --
    Tad McClellan SGML consulting
    Perl programming
    Fort Worth, Texas
    Tad McClellan, Mar 4, 2005
    #7
  8. Tad McClellan wrote:

    > David K. Wall <> wrote:
    >> A. Sinan Unur <> wrote:

    >
    >>> you can Google for them on the web.

    >>
    >> I bet Google hates the use of their trademarked name as a generic
    >> verb.... :)

    >
    >
    > I hope the smiley means you mean just the opposite...?
    >
    > I would think they _love_ it.
    >

    Er, no. Because that's how you lose trademarks. Ask Bayer,
    for whom aspirin used to be a trademark. Also escalator,
    linoleum, zipper and yo-yo, all of which used to be brand
    names, and were lost to their owners because they became
    generic terms.

    --
    Christopher Mattern

    "Which one you figure tracked us?"
    "The ugly one, sir."
    "...Could you be more specific?"
    Chris Mattern, Mar 4, 2005
    #8