surf said:
> Some web sites like Yahoo have tons of people in their personals,
> but the ability to search the profiles of people is so horribly
> limited. Has anyone written some kind of Perl interface or some way
> to do more advanced searches of personals?
I haven't for Yahoo, but I once wrote my own system for some other
personals sites that had much worse search and display capabilities.
The mining program broke down like this:
1. Get a list of new profiles since the last run. (This part varies a
   lot from site to site.)
2. For each profile, check it against my requirements (age, weight,
   whatever).
3. If it matches, save the profile text to a file (generally with the
   extraneous ads and stuff stripped). Parse out the person's vital
   statistics and save them to a database. Fetch any pictures
   belonging to the profile, save them under a naming scheme that ties
   them to the profile, and add links to them in the saved profile.
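Here is a rough Python sketch of steps 1 to 3. It assumes the pages have already been fetched and parsed, because fetching is the site-specific part. Downloading the pictures is also left out. The requirement bounds, the `profiles` table layout, and the `<id>_<n>.<ext>` picture names are all made up for illustration.

```python
import os
import sqlite3
from urllib.parse import urlparse

# Hypothetical requirements: field -> (min, max); None means "no bound".
REQUIREMENTS = {"age": (25, 35), "weight": (None, 180)}


def new_profile_ids(listed_ids, seen_ids):
    """Step 1: keep only IDs not seen on an earlier run, in listing order."""
    return [pid for pid in listed_ids if pid not in seen_ids]


def matches(stats, requirements=REQUIREMENTS):
    """Step 2: True only if every required field is present and within bounds."""
    for field, (low, high) in requirements.items():
        value = stats.get(field)
        if value is None:
            return False
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True


def picture_names(profile_id, picture_urls):
    """Name pictures after their profile: <id>_1.jpg, <id>_2.png, ..."""
    names = []
    for n, url in enumerate(picture_urls, start=1):
        ext = os.path.splitext(urlparse(url).path)[1] or ".jpg"
        names.append(f"{profile_id}_{n}{ext}")
    return names


def render_page(profile_html, names):
    """Append an <img> link for each saved picture to the stripped profile text."""
    links = "".join(f'<img src="{name}">\n' for name in names)
    return profile_html + "\n" + links


def open_db(path):
    """Open (or create) the local database of vital statistics."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles "
        "(id TEXT PRIMARY KEY, age INTEGER, weight INTEGER)"
    )
    return conn


def record_stats(conn, profile_id, stats):
    """Store (or update) one profile's parsed statistics."""
    conn.execute(
        "INSERT OR REPLACE INTO profiles (id, age, weight) VALUES (?, ?, ?)",
        (profile_id, stats.get("age"), stats.get("weight")),
    )
    conn.commit()


def save_profile(conn, outdir, profile_id, profile_html, stats, picture_urls):
    """Step 3: write <id>.html with picture links and record the stats.

    The pictures themselves are not downloaded here; they would be saved
    under the returned names in the same directory.
    """
    names = picture_names(profile_id, picture_urls)
    path = os.path.join(outdir, f"{profile_id}.html")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(render_page(profile_html, names))
    record_stats(conn, profile_id, stats)
    return path, names
```

The "new since the last run" part only works if `seen_ids` is kept between runs, for example in another table in the same database.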
This resulted in a nice database on my local system holding only the
important information, an HTML file for each profile, and a set of
images. Then I had a second program I'd run to look through them and
delete the ones I wasn't interested in.
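A second "review" program could look roughly like this. The sketch assumes a few hypothetical conventions: each profile's page is saved as `<id>.html`, its pictures are saved as `<id>_<n>.<ext>` in one directory, and there is a `profiles` table keyed by `id`. None of these come from the original setup.

```python
import os


def belongs_to(profile_id, filename):
    """True for <id>.html and for pictures named <id>_<n>.<ext>."""
    stem, _ext = os.path.splitext(filename)
    if stem == profile_id:
        return True
    prefix = profile_id + "_"
    return stem.startswith(prefix) and stem[len(prefix):].isdigit()


def delete_profile(conn, outdir, profile_id):
    """Remove a rejected profile's row, page, and pictures; return the files removed."""
    removed = sorted(f for f in os.listdir(outdir) if belongs_to(profile_id, f))
    for name in removed:
        os.remove(os.path.join(outdir, name))
    conn.execute("DELETE FROM profiles WHERE id = ?", (profile_id,))
    conn.commit()
    return removed


def review(conn, outdir, ask=input):
    """Show each saved profile's stats and delete the ones answered 'n'."""
    rows = conn.execute("SELECT id, age, weight FROM profiles ORDER BY id").fetchall()
    for profile_id, age, weight in rows:
        answer = ask(f"{profile_id}: age {age}, weight {weight}. Keep? [y/n] ")
        if answer.strip().lower().startswith("n"):
            delete_profile(conn, outdir, profile_id)
```

`belongs_to` requires a numeric suffix after the underscore. Without that check, deleting profile `12` would not touch `123_1.jpg`, but deleting `a` would also wipe `a_b.html`. The `ask` parameter lets you swap in something other than `input` when scripting the review.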
None of it was that complicated. The hardest part was parsing through
the HTML mess each site used on its profile pages -- figuring out what
tags and text I could count on staying the same in every profile so I
could parse out the right info.
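Once you have found the "tags and text I could count on," one way to use them is to anchor regular expressions on those fixed labels and markers. This is a rough Python sketch. The `<b>Age:</b>` style labels and the `<!-- begin profile -->` / `<!-- end profile -->` markers are invented stand-ins for whatever a real site uses.

```python
import re

# Hypothetical markup: assumes every profile page labels its stats the same
# way, e.g. "<b>Age:</b> 29<br>". Finding labels like these that never change
# from profile to profile is the real per-site work.
STAT_PATTERNS = {
    "age": re.compile(r"<b>\s*Age:\s*</b>\s*(\d+)", re.IGNORECASE),
    "weight": re.compile(r"<b>\s*Weight:\s*</b>\s*(\d+)", re.IGNORECASE),
    "location": re.compile(r"<b>\s*Location:\s*</b>\s*([^<]+)", re.IGNORECASE),
}


def parse_stats(page):
    """Pull each labelled stat out of the page; missing labels are skipped."""
    stats = {}
    for field, pattern in STAT_PATTERNS.items():
        match = pattern.search(page)
        if match:
            value = match.group(1).strip()
            stats[field] = int(value) if value.isdigit() else value
    return stats


def strip_extras(page, start_marker, end_marker):
    """Keep only the profile body between two fixed markers, dropping the ads.

    Returns None if either marker is missing, which usually means the
    site's template changed.
    """
    start = page.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = page.find(end_marker, start)
    if end == -1:
        return None
    return page[start:end]
```

Regexes like these break as soon as the markup shifts. A real parser such as Python's `html.parser` copes better with messy HTML. Still, when every page is generated from one template, fixed anchors are often enough. A missing field or a `None` from `strip_extras` is also a cheap early warning that the site changed its layout.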