need help with perl project

paintermonkey

Hi all,

I was given a very large and monotonous task at work yesterday: I have
to enter 3,000+ contacts into Outlook. The information I need is found
on a password protected site (I have the login and password). On the
main page are links to profiles that bear the information I need.

So far, my understanding of this task, next to entering each contact
manually, is to devise a script where I can spider each link, extract
the profiles, convert them all into an Excel or txt file, then export
them somehow to Outlook, while making sure information gets transferred
into proper fields. (Preferably Excel, because I'd also need to include
a notes field with specific comments.)

I can open a profile (.asp) and save it as .xls and this gives me good
row numbers to work with (e.g., the info I need is in rows 1-5 and then
7-12). But if I'd have to save each profile as its own .xls file,
then I'm pretty much back where I started in monotonous/redundancy
land, not to mention memory overload land.

The info on the profiles looks like this:

Company Name
Address1
Address2
Address3

Phone
Fax
Website

Employees (heading)
Name1, JobTitle
Name2, JobTitle
Name3, JobTitle

Info (which can go on for pages following the previous info, which is
all I need)

I have to ultimately create a vCard for each of the individuals with
their job titles, all having the same address, phone, etc info for
those fields.

Does anyone know of any codes or snippets of scripts that can help me?
I'm good with theory, bad with actual codewriting. Or perhaps any
advice as to get me to the end result (3000+ vCards that my boss wants)
that is more straightforward or useful?

Thanks in advance!

PM
 
Paul Lalli

> Does anyone know of any codes or snippets of scripts that can help me?

I'm afraid you're at the wrong newsgroup. This group is about helping
people write their own code, not to find existing executable scripts.

That being said, have you searched CPAN yet for the various modules
that could help in your task? http://search.cpan.org

> I'm good with theory, bad with actual codewriting.

If you at least make an attempt to write your own code, and post what
you've got if/when it doesn't work correctly, it's likely someone will
be able to help you fix it.

Paul Lalli
 
usenet

> I was given a very large and monotonous task at work yesterday

In programming terms, you were given several discrete tasks. Your job
is to tie those individual tasks together to form an application.

Fortunately, most of the code to do each task has already been written
for you and is available to you free of charge on CPAN. Your script
needs to interact with a website and extract information - you may be
interested in WWW::Mechanize. You may also benefit from something like
HTML::Parser. If you need to create an Excel spreadsheet you will
probably find something like Spreadsheet::WriteExcel::Simple useful.
For the part of your script that creates vCards, you may be interested
in Text::vCard or Net::vCard. Search CPAN for terms of interest and
you will usually find useful modules that do what you need.

The "glue" that holds all these modules together and forms a useful
application is basic Perl. If you need help with Perl (or maybe even
help with using a module) then this is a great place - the caliber of
help you receive here is second to none. Many participants here are
professional Perl programmers, and you may recognize some of the names
here from the covers of your Perl reference manuals. And the help is
absolutely free! But most participants here ask just one thing -
that you respect and abide by the posting guidelines that this
community has established. If you post a question per those guidelines
you are almost assured of getting prompt, courteous, and expert help.
You may review the guidelines here: http://tinyurl.com/oxldo
 
PM

Thank you, David.

You probably guessed I'm a Perl novice. But it's good to know I've
narrowed it down to something that can actually help me. The script
info you gave is a great place for me to start my research. I've spent
most of today on CPAN. It's nice to have some reassurance that I'm
looking at the right place.

Thanks again,

PM
 
Tad McClellan

> I was given a very large and monotonous task at work yesterday: I have
> to enter 3,000+ contacts into Outlook. The information I need is found
> on a password protected site (I have the login and password). On the
> main page are links to profiles that bear the information I need.
>
> So far, my understanding of this task, next to entering each contact
> manually, is to devise a script where I can spider each link,


Fire up the:

Web Scraping Proxy

http://www.research.att.com/~hpk/wsp/

set up your browser to use that proxy, then pointy-clicky to get
to the info you need, and wsp will log the HTTP traffic for you,
in the form of Perl code!

Modify the log code, figure out how to scrape the info you need
from the web page, and push it out in whatever format you need.


search.cpan.org

can reveal many modules that help with those several tasks.

> But if I'd have to save each profile as its own .xls file,
> then I'm pretty much back where I started in monotonous/redundancy
> land, not to mention memory overload land.


Files do not consume memory (unless you meant grey-matter memory).
 
PM

Thanks Tad,

That is really useful information. However, the pointy-clicky part is
also the strenuous part as there are 1,800 links to click on.
 
PM

> For each contact? That seems unlikely

That's actually exactly what it is. 1,800 organizations, with 3,000
contacts total. Each organization has a profile that is hyperlinked
from the main page. I'll look into the backend database. And for
heaven's sake, can't I just express some gratitude without someone
harping on following posting guidelines!
 
Keith Keller

On 2006-05-02 said:
> That's actually exactly what it is. 1,800 organizations, with 3,000
> contacts total. Each organization has a profile that is hyperlinked
> from the main page. I'll look into the backend database.

Oy! The backend definitely seems like the way to go here. If they
won't give it to you, try to get out of the project. ;-)

> And for
> heaven's sake, can't I just express some gratitude without someone
> harping on following posting guidelines!

No. The posting guidelines are meant to make it easy for other people
to help you. If you don't follow them, you might not get the best help
(or, indeed, any help) for your task. Picture it: someone comes late to
this thread, and only reads what you wrote, above. Do you think they're
more likely to pull down the entire thread to see what you wrote so they
can try to help, or to ignore you altogether?

--keith
 
Paul Lalli

PM said:
> And for
> heaven's sake, can't I just express some gratitude without someone
> harping on following posting guidelines!

This question shows that you still Don't Get It. When you don't quote
context in a message, no one has any way of knowing for *what* you are
expressing gratitude, nor even to whom! If you're too lazy to
bother telling the person you're grateful towards that you're grateful,
then you're not exactly all that grateful, now are you!?

Paul Lalli
 
Tad McClellan

PM said:
> Thanks Tad,


Thanks for what?

Please quote some context in followups like everybody else does.

> That is really useful information. However, the pointy-clicky part is
> also the strenuous part as there are 1,800 links to click on.


Write code that finds and follows one link, then put that code
in a loop that iterates 1800 times.
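
One way that loop might look with WWW::Mechanize from CPAN is sketched
below. Everything specific to the site is an assumption: the form field
names (login, password), the /profile\.asp/ pattern used to recognise
profile links, and the idea that logging in lands you on the main page.
The subs profile_urls and crawl are hypothetical names, not part of any
module.

```perl
use strict;
use warnings;

# Keep only links that look like profile pages, dropping duplicates.
# The pattern is a guess at the site's URL scheme; adjust to match.
sub profile_urls {
    my @urls = @_;
    my %seen;
    return grep { /profile\.asp/i && !$seen{$_}++ } @urls;
}

# Log in, collect the profile links from the main page, visit each one.
sub crawl {
    my ($login_url, $user, $password) = @_;

    require WWW::Mechanize;    # CPAN module, not part of core Perl
    my $mech = WWW::Mechanize->new(autocheck => 1);

    $mech->get($login_url);
    # Field names are assumptions; look at the login form's HTML source.
    $mech->submit_form(
        with_fields => { login => $user, password => $password },
    );

    # Assumed: we are now on the main page that lists every profile.
    my @urls = profile_urls(map { $_->url_abs->as_string } $mech->links);

    for my $url (@urls) {
        $mech->get($url);
        my $html = $mech->content;
        # ... pull the company, contact, and employee fields from $html ...
        sleep 1;               # be gentle with the server
    }
    return scalar @urls;
}
```

A call would look something like
crawl('https://example.com/login.asp', 'user', 'secret'), with a made-up
URL standing in for the real site. The loop runs once per link found, so
nothing needs to know in advance that there are 1,800 of them.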
 
Tad McClellan

PM said:
> That's actually exactly what it is. 1,800 organizations, with 3,000
> contacts total. Each organization has a profile that is hyperlinked
> from the main page.


Parse the main page to get the 1800 URIs, then make a loop that
iterates 1800 times.

Write code to parse one contact, then put it in a loop that
iterates 3000 times, and put that loop inside the loop that
iterates 1800 times.
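
The nested loops can be sketched in plain Perl as follows, assuming each
profile has already been fetched and reduced to plain text in the layout
shown in the original post: blank-line-separated blocks for name and
address, then phone/fax/website, then an "Employees" heading with one
"Name, JobTitle" line per person. The sub names and hash keys are
hypothetical, and a profile with a missing line (no fax, say) would
shift the fields, so real data needs more checking than this does. Note
that the inner loop runs once per employee on the current profile, so
the 3,000 is a total across all profiles, not a count per profile.

```perl
use strict;
use warnings;

# Split one profile's plain text into organization fields and employees.
sub parse_profile {
    my ($text) = @_;

    # Break the text into blocks at blank lines, trimming each line.
    my @blocks;
    for my $chunk (split /\n\s*\n/, $text) {
        my @lines = grep { /\S/ } split /\n/, $chunk;
        s/^\s+|\s+$//g for @lines;
        push @blocks, \@lines if @lines;
    }

    my ($company, @address)     = @{ $blocks[0] || [] };
    my ($phone, $fax, $website) = @{ $blocks[1] || [] };
    my @staff                   = @{ $blocks[2] || [] };
    shift @staff;    # drop the "Employees" heading line

    my @employees;
    for my $line (@staff) {
        # Split on the first comma only, so titles may contain commas.
        my ($name, $title) = split /\s*,\s*/, $line, 2;
        push @employees,
            { name => $name, title => defined $title ? $title : '' };
    }

    return {
        company => $company, address => \@address, phone => $phone,
        fax     => $fax,     website => $website,  employees => \@employees,
    };
}

# Outer loop over organization profiles, inner loop over their people.
sub contacts_from_profiles {
    my @profile_texts = @_;
    my @rows;
    for my $text (@profile_texts) {
        my $org = parse_profile($text);
        for my $emp (@{ $org->{employees} }) {
            push @rows, {
                name  => $emp->{name},
                title => $emp->{title},
                map { ($_ => $org->{$_}) } qw(company address phone fax website),
            };
        }
    }
    return @rows;
}
```

Each returned row carries one person plus the organization fields they
share, which is the shape needed for one Outlook contact or one vCard.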

> can't I just express some gratitude


Expressing gratitude without showing gratitude will lead many
to believe that there really isn't any gratitude there.

If you were truly grateful to the newsgroup, then you would be
posting the way the newsgroup likes postings to be made.

But you're not...

> without someone
> harping on following posting guidelines!


... and you are even whining about what the newsgroup likes.

So long!

*plonk*
 
