How do you print a string after it's been searched for an RE?

John Salerno · Jun 23, 2011

After I've run the re.search function on a string and no match was
found, how can I access that string? When I try to print it directly,
it's an empty string, I assume because it has been "consumed." How do
I prevent this?

It seems to work fine for this 2.x code:

import urllib.request
import re

next_nothing = '12345'
pc_url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='
pattern = re.compile(r'[0-9]+')

while True:
    page = urllib.request.urlopen(pc_url + next_nothing)
    match_obj = pattern.search(page.read().decode())
    if match_obj:
        next_nothing = match_obj.group()
        print(next_nothing)
    else:
        print(page.read().decode())
        break

But when I try it with my own code (3.2), it won't print the text of
the page:

import urllib.request
import re

next_nothing = '12345'
pc_url = 'http://www.pythonchallenge.com/pc/def/linkedlist.php?nothing='
pattern = re.compile(r'[0-9]+')

while True:
    page = urllib.request.urlopen(pc_url + next_nothing)
    match_obj = pattern.search(page.read().decode())
    if match_obj:
        next_nothing = match_obj.group()
        print(next_nothing)
    else:
        print(page.read().decode())
        break

P.S. I plan to clean up my code; I know it's not great right now. But
my immediate goal is just to figure out why the 2.x code can print
"text" but my own code can't print "page", which are basically the
same thing. Has something significant changed in the urllib.request
module or in the way the response is decoded, or is it just an RE
issue?

Thanks.

Ian Kelly · Jun 23, 2011

> After I've run the re.search function on a string and no match was
> found, how can I access that string? When I try to print it directly,
> it's an empty string, I assume because it has been "consumed." How do
> I prevent this?

This has nothing to do with regular expressions. It would appear that
page.read() is letting you read the response body multiple times in
2.x but not in 3.x, probably due to a change in buffering. Just store
the string in a variable and avoid calling page.read() multiple times.
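
A minimal sketch of that suggestion. To keep it runnable without network access, io.BytesIO stands in for the object returned by urllib.request.urlopen(); the helper name next_step and the sample page bodies are made up for illustration. Both kinds of object are streams, so a second read() has nothing left to return:

```python
import io
import re

pattern = re.compile(r'[0-9]+')


def next_step(page):
    """Read the body once, search it, and return (number_or_None, text)."""
    text = page.read().decode()        # read exactly once and keep the result
    match_obj = pattern.search(text)   # searching does not alter text
    number = match_obj.group() if match_obj else None
    return number, text


# A stream can only be drained once: the second read() comes back empty.
fake_page = io.BytesIO(b'and the next nothing is 44827')
first = fake_page.read()
second = fake_page.read()
print(first)    # b'and the next nothing is 44827'
print(second)   # b''

# With the body saved in a variable it is still available when nothing matches.
number, text = next_step(io.BytesIO(b'peak.html'))
if number is None:
    print(text)  # peak.html
```

In the loop from the original post, the same idea amounts to assigning page.read().decode() to a variable at the top of each iteration and using that variable in both the search and the else branch.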

John Salerno · Jun 23, 2011

> This has nothing to do with regular expressions. It would appear that
> page.read() is letting you read the response body multiple times in
> 2.x but not in 3.x, probably due to a change in buffering. Just store
> the string in a variable and avoid calling page.read() multiple times.

Thank you. That worked, and as a result I think my code will look
cleaner.

Thomas L. Shinnick · Jun 23, 2011

There is also
print(match_obj.string)
which gives you a copy of the string searched. See end of section
6.2.5. Match Objects

John Salerno · Jun 23, 2011

> There is also
> print(match_obj.string)
> which gives you a copy of the string searched. See end of section
> 6.2.5. Match Objects

I tried that, but the only time I wanted the string printed was when
there *wasn't* a match, so the match object was a NoneType.
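
A short sketch of why match_obj.string cannot help in the no-match case: search() returns None when nothing matches, so there is no match object to ask. The helper name searched_text and the sample strings are made up for illustration.

```python
import re

pattern = re.compile(r'[0-9]+')


def searched_text(text):
    """Return the text that was searched, whether or not the pattern matched."""
    match_obj = pattern.search(text)
    if match_obj is None:
        # No match object exists, so match_obj.string would raise
        # AttributeError; fall back to the variable we searched.
        return text
    # On a match, .string is the string that was passed to search().
    return match_obj.string


print(searched_text('and the next nothing is 44827'))  # and the next nothing is 44827
print(searched_text('peak.html'))                      # peak.html
```

Either way the text has to be held in a variable before searching, which is the same fix as storing the result of page.read().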
