Interesting challenge - can it be done?

Adam Akhtar

Hi, I'm learning Ruby and making a few scripts here and there. As part of
an ongoing script regarding eBay I came up with a challenge, though I
don't know if it's possible.

Given, say, 200 items from eBay from the same category, e.g. cameras, I
would like to put them in groups based on similarity.
E.g. find all listings that are for the Canon IXY 500 and put them in one
group.

Also, I'd like it to understand the difference between "canon ixy 500" and,
say, 7 listings for "canon ixy 500 camera case".

Another problem is the unrelated words in a listing's title that the
script would have to ignore, e.g. "Bargain", "RARE", "Very RARE", "Did I
say it was RARE" etc.
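
One rough way to handle the filler words is to strip them before comparing titles. This is only a sketch: the `NOISE_WORDS` list and the `clean_title` helper are made up for illustration, and a real list would have to be built from the listings themselves.

```ruby
# Hypothetical list of filler words; extend it from your own listings.
NOISE_WORDS = %w[bargain rare very new wow].freeze

# Lowercase a title, split it into alphanumeric tokens,
# and drop every token that appears in NOISE_WORDS.
def clean_title(title)
  title.downcase.scan(/[a-z0-9]+/) - NOISE_WORDS
end

puts clean_title("RARE Canon IXY 500 - Bargain!").join(" ")
```

Whatever is left after cleaning can then be fed to the grouping step.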

Of course I don't expect it could ever be perfect, but if it could be 60%
accurate I would be happy!

Elegance is not a priority here, so if there is a "hack" which achieves
similar results to, say, some crazy A.I. routine which takes a year to
write, then that's great. Whatever gets the job done!

One hack I came up with was to look for model numbers using regexps. It
works really well, but only if there is a model number. I suppose I
could first group listings with model numbers and then come up with some
routine for the remainder. So for items without model numbers, what can I
do?
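
For reference, the model-number hack might look roughly like this. The pattern is an assumption (letters, an optional hyphen, digits, optional trailing letters, as in "SD500" or "D70s"), and `model_number` and `group_by_model` are illustrative names, not anything from an existing library.

```ruby
# Assumed shape of a model number: letters, optional hyphen, digits,
# optional trailing letters (e.g. "SD500", "sd-500", "D70s").
MODEL_PATTERN = /\b[a-z]+-?\d+[a-z]*\b/i

# Return the first model-like token, upper-cased with hyphens removed,
# or nil when the title contains nothing that looks like a model number.
def model_number(title)
  match = title[MODEL_PATTERN]
  match && match.upcase.delete("-")
end

# Group titles by normalised model number; titles without one land under nil.
def group_by_model(titles)
  titles.group_by { |title| model_number(title) }
end

groups = group_by_model([
  "Canon PowerShot SD500 digital camera",
  "RARE canon sd-500 boxed",
  "Nikon D70s body only",
  "Leather camera case, brown"
])
groups.each { |model, members| puts "#{model.inspect}: #{members.size} listing(s)" }
```

The listings under the `nil` key are the remainder that still needs some other routine.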

Also, if it takes 2 hours for the script to do the job, then fine. At most
1000 listings will be used. Speed isn't an issue.

What type of problem is this? Is it A.I.? Where should I start
researching? (I checked the wiki but there wasn't enough written for me to
know which path to take.)

Any help or pointers greatly appreciated.
 
Martin DeMello

What type of problem is this? Is it A.I.? Where should I start
researching? (I checked the wiki but there wasn't enough written for me to
know which path to take.)

"machine learning" and "classification" are useful starting points

martin
 
Trans

Off the top of my head. Try indexing by keywords:

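index = Hash.new(0)  # unseen keywords start at a count of zero
keywords.each do |keyword|
  # escape the keyword so regexp metacharacters are matched literally
  index[keyword] += 1 if item.description =~ /#{Regexp.escape(keyword)}/
end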

Then write "rules" based on keyword combinations.
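
A minimal sketch of what such rules could look like, assuming titles are matched word by word. The `RULES` table, its `:all`/`:any` keys, and the `classify` helper are hypothetical and only illustrate the idea; the first matching rule wins, so the more specific accessory rule is listed first.

```ruby
# Hypothetical rules: :all lists words that must all be present,
# :any lists words of which at least one must be present (empty = no extra check).
RULES = [
  { group: "canon ixy 500 accessories", all: %w[canon ixy 500], any: %w[case battery charger] },
  { group: "canon ixy 500",             all: %w[canon ixy 500], any: [] }
].freeze

# Return the group name of the first rule the title satisfies.
def classify(title)
  words = title.downcase.scan(/[a-z0-9]+/)
  rule = RULES.find do |r|
    (r[:all] - words).empty? && (r[:any].empty? || (r[:any] & words).any?)
  end
  rule ? rule[:group] : "unclassified"
end

puts classify("Canon IXY 500 camera case")
puts classify("RARE Canon IXY 500")
```

This separates "canon ixy 500" from "canon ixy 500 camera case", but only for products someone has written a rule for.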

But this is a very general question. From the sound of it I suspect you
need to sit down and do a good bit of reading on programming.

T.
 
Tim Pease

Use Lucene and/or Solr. They were built to do these kinds of queries.
In fact, you can even use the Ruby/Java bridge if you want to create a
Lucene search directly in Ruby.

Blessings,
TwP
 
Adam Akhtar

Thank you both for your recommendations. It does seem like a
classification problem. Regarding Lucene/Ferret, I had a look and it does
seem great for searching. I'm wondering, though, if or how I can use it, as
the script won't have an index of products to search for (I don't want
to hardcode it) but rather will, to the best of its abilities, group
listings together by whatever patterns it may see.
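
Grouping without a predefined product list is usually called clustering rather than classification. As a rough illustration only, here is a greedy single-pass clustering on word overlap (Jaccard similarity). The `tokens`, `jaccard`, and `cluster` helpers and the 0.5 threshold are assumptions for the sketch, not a recommendation from anyone in this thread.

```ruby
require "set"

# Split a title into a set of lowercase alphanumeric words.
def tokens(title)
  title.downcase.scan(/[a-z0-9]+/).to_set
end

# Jaccard similarity: shared words divided by total distinct words.
def jaccard(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# Greedy single pass: a title joins the first cluster whose first member
# is similar enough; otherwise it starts a new cluster.
def cluster(titles, threshold = 0.5)
  clusters = []
  titles.each do |title|
    home = clusters.find { |c| jaccard(tokens(c.first), tokens(title)) >= threshold }
    if home
      home << title
    else
      clusters << [title]
    end
  end
  clusters
end

cluster(["Canon IXY 500", "canon ixy 500 boxed",
         "Nikon D70 body", "nikon d70 body only"]).each do |group|
  puts group.inspect
end
```

Accessory listings that share most of their words with the camera itself can still end up in the camera's cluster, so the threshold would need tuning against real titles.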
 
