Getting a value that follows string.find()

englishkevin110 · Aug 13, 2013

I know the title doesn't make much sense, but I didn't know how to explain my problem.

Anywho, I've opened a page's source with urllib:
starturlsource = starturlopen.read()
string.find(starturlsource, '<a href="/profile.php?id=')
I used string.find to find a specific area in the page's source.
I want to store what comes after ?id= in a variable.
Can someone help me with this?

Joel Goldstick · Aug 13, 2013

Look up urlparse for your answer.

englishkevin110 · Aug 14, 2013

I don't want to do any kind of HTML parsing.

Joel Goldstick · Aug 14, 2013

Aside from the fact that I really want a pony, and you seem to want
your work done for you, look here:

http://stackoverflow.com/questions/11600681/parse-query-part-from-url
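
For reference, here is a minimal sketch of what the urlparse suggestion amounts to once the href text is already in hand. It assumes Python 3, where the module is urllib.parse (in Python 2 it was named urlparse). The href value and the id 12345 are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical href value, as it might appear in the page source.
href = "/profile.php?id=12345"

# urlparse splits the URL into components; .query is the part after '?'.
query = urlparse(href).query

# parse_qs maps each parameter name to a list of its values.
params = parse_qs(query)
profile_id = params["id"][0]

print(profile_id)
```

Note that this only handles the URL itself; finding the link inside the page is a separate step.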

Joel Goldstick · Aug 14, 2013

I may have been too quick in my reading of your question. You wanted
to get the value of the parameter, but also to find the URL in the
page. You want to do this without parsing, if I understand you. The
good news is there is a module called Beautiful Soup that will do the
parsing for you. The tutorial is way better than excellent, and you
will be up and running in less than half an hour from downloading the
module:

http://www.crummy.com/software/BeautifulSoup/bs4/doc/
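
Beautiful Soup is a third-party package, so as a rough illustration of the same idea (let a parser find the links, then read the query string), here is a sketch that uses only the standard library's html.parser instead. This is not Beautiful Soup's API. The class name ProfileLinkParser and the sample page are hypothetical, and Python 3 is assumed.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class ProfileLinkParser(HTMLParser):
    """Collect the id parameter of every /profile.php link (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are of interest here.
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value can be None.
        href = dict(attrs).get("href") or ""
        parts = urlparse(href)
        if parts.path == "/profile.php":
            self.ids.extend(parse_qs(parts.query).get("id", []))


# Made-up page source for illustration.
page = '<p><a href="/profile.php?id=12345">Kevin</a> <a href="/about.php">About</a></p>'

parser = ProfileLinkParser()
parser.feed(page)
print(parser.ids)
```

Unlike a raw substring search, this does not depend on the exact spacing or attribute order inside the tag.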

Dave Angel · Aug 14, 2013

Python 3.3.0 (default, Mar 7 2013, 00:24:38)
[GCC 4.6.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'find'

There is no find function in the string module [1]. But assuming
starturlsource is a str, you could do:

pattern = '<a href="/profile.php?id='
index = starturlsource.find(pattern)

index will then be -1 if there's no match, or have a non-negative value
if a match is found.

In the latter case, you can extract the next 17 characters with

newstr = starturlsource[index+len(pattern):index+len(pattern)+17]

You are of course making several assumptions about the web page, which
are perfectly reasonable since it's a page under your control. Or is
it?

[1] Assuming Python 3.3 since you omitted stating the version you're
using. But even in Python 2.7, using the string.find function is
deprecated in favor of the str method.
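
The two steps above (check for -1, then slice a fixed width) can be sketched as one small function. The name value_after, the sample page, and the 5-character width are assumptions for illustration; the 17 in the post would be passed the same way. In Python 3, read() on a urlopen response returns bytes, so the page would need decoding first (for example with .decode('utf-8'), assuming that is the page's encoding).

```python
def value_after(source, pattern, width):
    """Return the `width` characters that follow `pattern`, or None if absent."""
    index = source.find(pattern)
    if index == -1:
        # Without this check, the slice below would start at len(pattern) - 1
        # and quietly return unrelated text.
        return None
    start = index + len(pattern)
    return source[start:start + width]


# Made-up page source; the 5-character id is an assumption for this example.
page = 'x <a href="/profile.php?id=12345">Kevin</a>'
pattern = '<a href="/profile.php?id='

print(value_after(page, pattern, 5))
print(value_after("no links here", pattern, 5))
```

The fixed width is the weak point: if the id is ever shorter or longer, the slice returns the wrong text without raising an error.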

Steven D'Aprano · Aug 14, 2013

[fixing Joel's top-posting]

englishkevin110 wrote: "I don't want to do any kind of HTML parsing."

What you are doing *is* HTML parsing, or at least a half-baked, fragile,
likely to go wrong form of parsing.

But if you insist, the algorithm is simple: after calling find(), you
have the offset to the search string. You know the length of the search
string. Therefore you can calculate the index of the first character that
follows the search string:

text = "blah blah blah blah spam spam... blah blah blah blah..."
needle = "spam spam"  # what we search for

i = text.find(needle)
if i == -1:
    print("not found")
else:
    print(text[i+len(needle):])

Of course, the problem is, you need to know not just the *start* offset
of the bit that follows, but the *ending* offset as well. Which brings
you into the realm of half-arsed parsing.
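
To make the ending-offset problem concrete, here is one possible sketch that treats the next double quote as the end of the value. The function name extract_between and the sample page are hypothetical, and the choice of terminator is an assumption about how the page is written.

```python
def extract_between(text, prefix, terminator):
    """Return the text between `prefix` and the next `terminator`, or None."""
    start = text.find(prefix)
    if start == -1:
        return None
    start += len(prefix)
    # Begin the second search after the prefix, so only what follows it counts.
    end = text.find(terminator, start)
    if end == -1:
        return None
    return text[start:end]


# Made-up page source for illustration.
page = '<li><a href="/profile.php?id=98765">Someone</a></li>'
print(extract_between(page, '<a href="/profile.php?id=', '"'))
```

This is still the fragile kind of parsing described above: it returns extra text if the link carries more parameters after the id, and it finds nothing if the page uses single quotes or a different attribute order.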

John Gordon · Aug 14, 2013

starturlsource = starturlopen.read()

match_string = '<a href="/profile.php?id='

# str.find replaces the deprecated string.find function
match_index = starturlsource.find(match_string)

if match_index != -1:
    # everything from the end of the match to the end of the page
    url = starturlsource[match_index + len(match_string):]
else:
    print('not found')
