similar articles algorithm based on numeric indexing of all rows via columns in a table

julie_smith · Jan 18, 2005

Hi,
I have an articles table containing columns like
id, name, author, section, creationdate, description, longmatter, etc.
I am using MySQL.

Some of them are fixed-value fields (enumerations);
for example, section will have news, sports, politics, etc.,
while description will be a text field with any amount of arbitrary
text.

I now have 50,000 articles under different sections.

I want to implement a "similar articles" feature.
By this I mean that when an article is shown,
I want to display all the articles similar to it (10 per page).

Now, how do I calculate the similarity of one article with all the
50,000 articles?

I don't want articles from the same section only.
Since the search result has to be very fast,
can I create some algorithm that will look through all the fields in
each row of the articles table and assign a weight/checksum to it?

And then, in the similar articles part, can I display all the articles
whose checksum is within ±5 of the currently displayed article's
checksum?
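Before looking for a shortcut, it helps to see the brute-force baseline the question describes: score the current article against every other article and keep the best 10. The Python sketch below assumes a plain dict of id to description text (a hypothetical stand-in for rows read from the articles table) and uses Jaccard word overlap as the similarity measure; that measure is an illustrative choice, not something the thread settles on.

```python
import heapq
import re


def words(text):
    """Lower-cased set of word tokens in a piece of text."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Size of the intersection over size of the union of two sets (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(target_id, descriptions, k=10):
    """Return the ids of the k articles whose description best matches target_id's.

    descriptions is a hypothetical dict mapping article id -> description text.
    Every other article is scored, so the cost grows linearly with the table size.
    """
    target = words(descriptions[target_id])
    scored = (
        (jaccard(target, words(text)), art_id)
        for art_id, text in descriptions.items()
        if art_id != target_id
    )
    # nlargest keeps only the k best (score, id) pairs while scanning.
    return [art_id for _score, art_id in heapq.nlargest(k, scored)]
```

With 50,000 articles this is 50,000 comparisons per page view. One option is to run it offline (for example, on a schedule) and store each article's top 10 ids in a separate table, so that showing the list becomes a simple lookup; caching each article's word set rather than re-tokenizing on every call would also help.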

Thanks in advance,

Julie

Gunnar Hjalmarsson · Jan 18, 2005

> I want to implement a "similar articles" feature.
> By this I mean that when an article is shown,
> I want to display all the articles similar to it (10 per page).
>
> Now, how do I calculate the similarity of one article with all the
> 50,000 articles?
>
> I don't want articles from the same section only.
> Since the search result has to be very fast,
> can I create some algorithm that will look through all the fields in
> each row of the articles table and assign a weight/checksum to it?

Check out the CPAN module Algorithm::Diff.
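Algorithm::Diff is built around the longest common subsequence (LCS) of two sequences. As a rough illustration of that idea (a Python re-implementation of LCS, not the module's API), the sketch below computes an LCS over the words of two descriptions and turns it into a score from 0 to 1. The 2 * LCS / (len(a) + len(b)) normalization and the whitespace tokenization are assumptions made for this example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences.

    Classic dynamic programming, keeping only the previous row of the table.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the LCS that ended just before both elements.
                curr.append(prev[j - 1] + 1)
            else:
                # Best of dropping the current element from a or from b.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(text_a, text_b):
    """2 * LCS / (len(a) + len(b)) over lower-cased whitespace tokens (0.0 to 1.0)."""
    a = text_a.lower().split()
    b = text_b.lower().split()
    if not a and not b:
        return 0.0
    return 2 * lcs_length(a, b) / (len(a) + len(b))
```

Filling the table takes time proportional to the product of the two lengths, so running this against all 50,000 long descriptions on every page view would be slow. It fits better as a way to re-rank a shortlist produced by a cheaper measure.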

Anno Siegel · Jan 18, 2005

> Hi,
> I have an articles table containing columns like
> id, name, author, section, creationdate, description, longmatter, etc.
> I am using MySQL.
>
> Some of them are fixed-value fields (enumerations);
> for example, section will have news, sports, politics, etc.,
> while description will be a text field with any amount of arbitrary
> text.
>
> I now have 50,000 articles under different sections.
>
> I want to implement a "similar articles" feature.

Okay. Given two articles, how do you decide if they are similar?

> By this I mean that when an article is shown,
> I want to display all the articles similar to it (10 per page).

What you are going to do with the list of similar articles has no
bearing on how you select them.

> Now, how do I calculate the similarity of one article with all the
> 50,000 articles?

First you have to tell us how to compare two individual articles, *then*
we can talk about ways to apply this to many pairs efficiently.
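To make this concrete, here is one possible answer to "how do you compare two articles", sketched in Python: a score per field, combined by a weighted average. The Article class, the choice of fields, and the WEIGHTS values are all hypothetical; picking them is exactly the decision that has to come before any talk of efficiency.

```python
import re
from dataclasses import dataclass


@dataclass
class Article:
    # A few of the columns from the question; the others are omitted here.
    id: int
    author: str
    section: str
    description: str


# Hypothetical weights: how much each field counts toward similarity.
WEIGHTS = {"section": 1.0, "author": 1.0, "description": 3.0}


def tokens(text):
    """Lower-cased set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def field_scores(a, b):
    """One similarity score per field, each between 0.0 and 1.0."""
    ta, tb = tokens(a.description), tokens(b.description)
    union = ta | tb
    return {
        # Fixed-value (enumerated) fields: either equal or not.
        "section": 1.0 if a.section == b.section else 0.0,
        "author": 1.0 if a.author == b.author else 0.0,
        # Free text: Jaccard overlap of the two word sets.
        "description": len(ta & tb) / len(union) if union else 0.0,
    }


def similarity(a, b):
    """Weighted average of the per-field scores (0.0 to 1.0)."""
    scores = field_scores(a, b)
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[field] * score for field, score in scores.items()) / total
```

With these weights, an article from a different section whose description matches outranks one from the same section with an unrelated description, which is in line with the requirement that results should not come from the same section only.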

> I don't want articles from the same section only.
> Since the search result has to be very fast,
> can I create some algorithm that will look through all the fields in
> each row of the articles table and assign a weight/checksum to it?
>
> And then, in the similar articles part, can I display all the articles
> whose checksum is within ±5 of the currently displayed article's
> checksum?

Since you mention all the different fields, I suppose they all play
a part in deciding whether two articles are similar or not. You can't
map that many dimensions onto a single number and have it work the way
you want. The best you can hope for is a numeric representation
of *each field*, which can be compared to decide whether articles are
similar with respect to that particular field. Since some of the fields
are free-text strings, even that won't be possible for every field.
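A minimal Python sketch of why a single checksum misleads: it assumes hypothetical numeric codes for two enumerated columns and sums them, roughly as the question proposes. Two articles with no field in common end up with the same checksum, while keeping one number per field shows that they share nothing.

```python
# Hypothetical numeric codes for two enumerated columns.
SECTION_CODE = {"news": 1, "sports": 2, "politics": 3}
AUTHOR_CODE = {"smith": 1, "jones": 2, "brown": 3}


def checksum(section, author):
    """Collapse two fields into one number by adding their codes."""
    return SECTION_CODE[section] + AUTHOR_CODE[author]


def field_vector(section, author):
    """Keep one number per field instead of summing them."""
    return (SECTION_CODE[section], AUTHOR_CODE[author])


def shared_fields(u, v):
    """Count the fields on which two vectors agree."""
    return sum(1 for x, y in zip(u, v) if x == y)


a = ("news", "brown")      # codes 1 + 3
b = ("politics", "smith")  # codes 3 + 1
print(checksum(*a), checksum(*b))                         # prints "4 4"
print(shared_fields(field_vector(*a), field_vector(*b)))  # prints "0"
```

The codes themselves are arbitrary, too: giving sports the value 2 does not make it "between" news and politics, so closeness within ±5 carries no meaning even for a single enumerated field.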

Anno

Filter table rows based on multiple checkboxes value	2	Jan 13, 2023
selecting all the columns in a table based on the column headersvalue	1	May 25, 2009
can ASP table display 200 columns, 500,000 rows?	30	Dec 4, 2003
dynamically generate listcheckbox in a table based on another check box list	0	Apr 10, 2006
comp.lang.c FAQ list Table of Contents	0	Jan 12, 2008
Passing data between objects and calling all objects of a class in turn	1	Aug 24, 2010
How i can populate all fileds dynamically in jsp page based on contents found in xml file?	1	Oct 3, 2006
ASP or HTML - how to freeze columns in pivot table	1	Dec 16, 2004
