Starting with Python, need some pointers on manipulating strings


Benji99

Hi guys, I'm starting to learn Python and so far am very
impressed with its possibilities. I do, however, need some help
with certain things I'm trying to do that I haven't yet managed
to figure out by myself. Hopefully someone will be able to give
me some pointers :)

First, my background: I haven't programmed seriously in over 5
years, but I recently started programming again in
Delphi/Pascal scripting, and that's what I'm most familiar with
right now. I'm also much more comfortable with structured
programming than with OO (which isn't helping much with
Python :))

Anyway, I have a very specific project in mind which I've mostly
implemented in Pascal, and I'd like to implement it in Python,
since the possibilities after that are much more interesting.

Basically, I'm getting the HTML source of a page from a URL and need to:
a.) find specific URLs
b.) find specific data
c.) load new HTML pages from those specific URLs and repeat.
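Steps a.) to c.) above amount to a small crawl loop. As a rough sketch only, here is one way that loop could be written with the standard library (Python 3). Everything in it is an assumption for illustration, not code from this thread: crawl, LinkCollector and fetch_url are hypothetical names, the caller supplies a wanted(url) filter that decides which URLs count as "specific", fetch_url assumes pages are UTF-8, and max_pages is an arbitrary safety limit.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it is fed (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_url(url):
    """Download one page (hypothetical helper; assumes UTF-8 content)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch, wanted, max_pages=50):
    """Breadth-first crawl following steps a.) to c.).

    fetch(url) returns the page's HTML as a str; wanted(url) returns True
    for the "specific URLs" worth following. Returns {url: html}.
    """
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        pages[url] = html  # b.) extract the specific data from html here
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        for href in collector.links:
            absolute = urljoin(url, href)  # resolve relative links like "/a"
            if absolute not in seen and wanted(absolute):  # a.) keep specific URLs
                seen.add(absolute)
                queue.append(absolute)  # c.) load these pages in later rounds
    return pages
```

For real pages this would be called as crawl(start_url, fetch_url, wanted). Passing the fetch function in as a parameter is a deliberate choice: it lets the same loop be tried offline against a dict of saved pages before pointing it at a live site.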

I've managed to load the HTML source I want into an object
called htmlSource using:

I'm assuming that htmlSource is a string with \n at the end of
each line.
NOTE: I've become very accustomed to the TStringList class in
Delphi, so forgive me if I'm trying to work that way in
Python...

Basically, I want to search through the whole string (htmlSource)
for a specific keyword; when it's found, I want to know which
line it's on so that I can retrieve that line. Then I should be
able to parse/extract what I need using regular expressions
(which I'm getting quite comfortable with). How can this be
accomplished?

The second main thing I'd like to know about has to do with
urllister. I'm very intrigued by the way it automatically grabs
URL links from the source, but I've only managed to get it to
retrieve everything, which is a lot. What are my options for
getting it to be more specific? Can I tell it to retrieve a URL
only IF a keyword is found on the same line?

Hopefully someone will be able/willing to give me a hand; I
think with these roadblocks out of the way, I should be able to
figure out the rest of what I need. Thanks in advance!

Benji99
 

Paul McGuire

Benji99 said:
> Basically, I'm getting the HTML source of a page from a URL and need to:
> a.) find specific URLs
> b.) find specific data
> c.) load new HTML pages from those specific URLs and repeat.
>
> Basically, I want to search through the whole string (htmlSource)
> for a specific keyword; when it's found, I want to know which
> line it's on so that I can retrieve that line. Then I should be
> able to parse/extract what I need using regular expressions
> (which I'm getting quite comfortable with). How can this be
> accomplished?

If you download pyparsing (at http://pyparsing.sourceforge.net), you'll find
something very close to this in the examples, called urlextractor.py (it lists
all the hrefs and their associated links on the page at www.yahoo.com).

-- Paul
 

Kent Johnson

Benji99 said:
> I've managed to load the HTML source I want into an object
> called htmlSource using:
>
> I'm assuming that htmlSource is a string with \n at the end of
> each line.
> NOTE: I've become very accustomed to the TStringList class in
> Delphi, so forgive me if I'm trying to work that way in
> Python...
>
> Basically, I want to search through the whole string (htmlSource)
> for a specific keyword; when it's found, I want to know which
> line it's on so that I can retrieve that line. Then I should be
> able to parse/extract what I need using regular expressions
> (which I'm getting quite comfortable with). How can this be
> accomplished?

The Pythonic way to do this is to iterate through the lines of htmlSource and process them one at a time.

lines = htmlSource.split('\n')  # Split on newline, making a list of lines
for line in lines:
    # Do something with line - check to see if it has the text of interest
    pass
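Filling in that loop, a minimal sketch of the keyword-then-regex approach from the question might look like this. find_lines and extract are hypothetical helper names, the pattern argument is assumed to contain exactly one capture group, and the sample page is made up. enumerate(..., start=1) supplies the line number the question asks for, and splitlines() is used here instead of split('\n') so that Windows-style \r\n endings are handled as well.

```python
import re


def find_lines(html_source, keyword):
    """Return (line_number, line) pairs for every line containing keyword.

    Line numbers start at 1, like a text editor's.
    """
    lines = html_source.splitlines()  # a list of lines, much like a TStringList
    return [(n, line) for n, line in enumerate(lines, start=1) if keyword in line]


def extract(html_source, keyword, pattern):
    """On each line containing keyword, apply pattern and keep group 1."""
    regex = re.compile(pattern)
    results = []
    for n, line in find_lines(html_source, keyword):
        match = regex.search(line)
        if match:
            results.append((n, match.group(1)))
    return results


# Example with a made-up page:
page = "<html>\n<p>Price: 42 EUR</p>\n<p>Other</p>\n<p>Price: 7 EUR</p>\n"
print(extract(page, "Price", r"Price: (\d+)"))  # → [(2, '42'), (4, '7')]
```

Because splitlines() returns an ordinary list, indexing it with the 1-based number minus one gives back the whole line, which is about as close to TStringList-style access as Python gets.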

You might want to look at Beautiful Soup. If you can find the links of interest by the tags around
them, it might do what you want:
http://www.crummy.com/software/BeautifulSoup/
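For the urllister question (retrieve a URL only if a keyword is on the same line), one standard-library option, swapped in here for illustration and not urllister or Beautiful Soup itself, is html.parser.HTMLParser: its getpos() method reports the line number of the tag currently being handled. A sketch, assuming a case-sensitive keyword test against the raw source line; KeywordLinkLister and list_urls are hypothetical names:

```python
from html.parser import HTMLParser


class KeywordLinkLister(HTMLParser):
    """Collect href values of <a> tags whose source line contains keyword."""

    def __init__(self, html_source, keyword):
        super().__init__()
        # Split on '\n' so line numbers match the way HTMLParser counts lines.
        self.lines = html_source.split("\n")
        self.keyword = keyword
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        lineno, _offset = self.getpos()  # 1-based line where this tag starts
        if self.keyword in self.lines[lineno - 1]:
            for name, value in attrs:
                if name == "href" and value:
                    self.urls.append(value)


def list_urls(html_source, keyword):
    """Return the hrefs of links that sit on a line containing keyword."""
    parser = KeywordLinkLister(html_source, keyword)
    parser.feed(html_source)
    parser.close()
    return parser.urls


# Example with a made-up page:
page = (
    "<ul>\n"
    '<li><a href="/news/1">News</a></li>\n'
    '<li><a href="/about">About us</a></li>\n'
    '<li>News: <a href="/news/2">more</a></li>\n'
    "</ul>\n"
)
print(list_urls(page, "News"))  # → ['/news/1', '/news/2']
```

Matching on the tag itself (for example, only links inside a particular element) is usually more robust than matching on lines, since HTML line breaks are arbitrary; that is the case where a tag-aware library like the one above may fit better.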

Kent
 
