Python equivalent of Perl's $/

John K Masters

I am currently working my way through Jeffrey Friedl's book Mastering
Regular Expressions. Great book apart from the fact it uses Perl for the
examples.

One particular expression that interests me is '$/ = ".\n"' which,
rather than splitting a file into lines, splits on a period-newline
boundary. Combined with Perl's 'while (<>)' construct this seems a great
way to process the files I am interested in.

Without wishing to start a flame war, is there a way to do this in Python?

Regards, John
 
kyosohma

Python has a regular expressions module, re. Check it out here:
http://docs.python.org/lib/module-re.html

Dive Into Python also has a chapter on the topic:
http://www.diveintopython.org/regular_expressions/index.html

Finally, the docs for Python's "while" statement can be found here:
http://docs.python.org/ref/while.html

Hope that helps!

Mike
 
Mark T

['test\ntest2', 'test3\ntest4', 'test5']
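The snippet that produced the list above did not survive in this copy of the thread. As a minimal sketch only, assuming a plain str.split on the ".\n" separator was intended, the following gives the same list. The sample string is hypothetical and is not taken from the original post.

```python
# Hypothetical sample input; the original text is not preserved in the thread.
data = "test\ntest2.\ntest3\ntest4.\ntest5"

# Split on the period-newline boundary instead of on newlines alone.
records = data.split(".\n")
print(records)
```

Note that str.split removes the separator from each piece, whereas Perl's $/ leaves it attached to the record.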

-Mark T.
 
Nick Craig-Wood

Something like this maybe?

import re

input_data = """I am currently working my way through Jeffrey Friedl's book Mastering
Regular Expressions. Great book apart from the fact it uses Perl for the
examples.

One particular expression that interests me is '$/ = ".\\n"' which,
rather than splitting a file into lines, splits on a period-newline
boundary. Combined with Perl's 'while (<>)' construct this seems a great
way to process the files I am interested in.

Without wishing to start a flame war, is there a way to do this in Python?
"""

# Split on a literal period followed by a newline
for para in re.split(r"\.\n", input_data):
    print("para = %r" % para)
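
One difference from Perl is worth noting: re.split discards the separator, while records read with $/ set keep it. If the terminator matters, a possible sketch is to match whole records with re.finditer instead of splitting. The sample text and the pattern here are illustrative assumptions, not part of the original post.

```python
import re

# Hypothetical sample text for illustration.
text = "First sentence.\nSecond one spans\ntwo lines.\nNo final period"

# re.split drops the ".\n" separator from each piece.
print(re.split(r"\.\n", text))

# Lazily match up to and including each ".\n"; the second alternative
# picks up trailing text that has no final separator.
records = [m.group() for m in re.finditer(r".*?\.\n|.+\Z", text, re.S)]
print(records)
```

Joining the matched records back together reproduces the input exactly, which is a quick way to check that nothing was lost.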
 
John K Masters

Thanks, that looks promising. The Perl examples are really confusing
sometimes and throw me off the track of the obvious Python way. That
said, the Python documentation does not always make it clear, at least
not to me, how to get the result one wants.

Regards, John
 
attn.steven.kuo

import io

text = """\
To mimic Perl's input record separator in
Python, you can use a generator.
And a substring test.
Perhaps something like the following
is what you wanted.
"""

# Stand-in for a real file object
mockfile = io.StringIO(text)

def genrecords(mockfile, sep=".\n"):
    buffer = ""
    while True:
        # Emit every complete record already in the buffer
        while sep in buffer:
            idx = buffer.find(sep) + len(sep)
            yield buffer[:idx]
            buffer = buffer[idx:]
        rl = mockfile.readline()
        if rl == "":
            break  # end of file
        else:
            buffer = '%s%s' % (buffer, rl)
    # Emit any trailing text that lacks a final separator
    if buffer:
        yield buffer

for record in genrecords(mockfile):
    print("READ:", record)
 
John K Masters

Thanks, this also looks like a good way to go but ATM beyond my level of
Python knowledge. I've not reached the generator chapter yet but I'll
flag the message and return later.

Regards, John
 
attn.steven.kuo

Some features in Perl can be found in Python, so if you know
the former, then learning the latter ought to go smoothly. In
any case, here's an updated version of the generator that
avoids repeating an unnecessary string search:

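def genrecords(mockfile, sep=".\n"):
    """Yield records from mockfile, each ending with sep (like Perl's $/)."""
    buffer = ""
    while True:
        # find() returns -1 when sep is absent, so idx < len(sep) means "not found"
        idx = buffer.find(sep) + len(sep)
        while idx >= len(sep):
            yield buffer[:idx]
            buffer = buffer[idx:]
            idx = buffer.find(sep) + len(sep)
        rl = mockfile.readline()
        if rl == "":
            break  # end of file
        else:
            buffer = '%s%s' % (buffer, rl)
    # Emit any trailing text that lacks a final separator
    if buffer:
        yield buffer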
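
For files whose lines are very long, or that contain few newlines, a variant that reads fixed-size chunks rather than calling readline may be worth considering. The following is only a sketch under that assumption: the name read_records and the chunk_size parameter are hypothetical and do not come from the thread.

```python
import io

def read_records(fileobj, sep=".\n", chunk_size=4096):
    """Yield records terminated by sep, reading fileobj in fixed-size chunks."""
    buffer = ""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break  # end of file
        buffer += chunk
        # Hand out every complete record currently in the buffer.
        while True:
            head, found, tail = buffer.partition(sep)
            if not found:
                break
            yield head + found
            buffer = tail
    if buffer:
        yield buffer  # trailing text with no final separator

# A tiny chunk size is used here to exercise separators split across chunks.
sample = io.StringIO("First record.\nSecond\nrecord.\nleftover")
records = list(read_records(sample, chunk_size=5))
print(records)
```

With a real file, the same generator could be driven by something like "with open(path) as f: for record in read_records(f): ...", where path is whatever file is being processed.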
 
