regex into str

Peter Kleiweg

I want to use regular expressions with less typing. Like this:

A / 'b.(..)' # test for regex 'b...' in A
A[0] # get the last whole match
A[1] # get the first group in the last match

A /= 'b.','X',1 # replace first occurrence of regex 'b.' in A with 'X'
A /= 'b.','X' # replace all occurrences of regex 'b.' in A with 'X'

This works fine if I create a class derived from 'str' and put
in the right functions. I have a demonstration below.

But what I really want is to insert these functions into class
'str' itself, so I can use them on ordinary strings:

def __div__(self, regex):
    p = re.compile(regex)
    self.__sre__ = p.search(self)
    return str(self.__sre__.group())

setattr(str, '__div__', __div__)

But when I try this I get:

TypeError: can't set attributes of built-in/extension type 'str'
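
The failure above can be reproduced in a few lines. This is a minimal sketch that assumes Python 3, where the `/` hook is named `__truediv__` rather than `__div__`; the helper name `search_with_slash` is made up for the illustration, and the exact wording of the error message differs between Python versions.

```python
import re

def search_with_slash(self, regex):
    # Hypothetical helper: what `some_string / regex` would do if it could be installed.
    return re.search(regex, self).group()

try:
    # Built-in types reject attribute assignment on the type object.
    setattr(str, '__truediv__', search_with_slash)
except TypeError as exc:
    patch_error = exc
else:
    patch_error = None

print(type(patch_error).__name__)
```
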

Is there a way to get this done?




Working example:


#!/usr/bin/env python

import re

class Mystr(str):
    def __div__(self, regex):
        p = re.compile(regex)
        self.sre = p.search(self)
        return Mystr(self.sre.group())

    def __idiv__(self, tpl):
        try:
            regex, repl, count = tpl
        except ValueError:
            regex, repl = tpl
            count = 0
        p = re.compile(regex)
        return Mystr(p.sub(repl, self, count))

    def __call__(self, g):
        return self.sre.group(g)

if __name__ == '__main__':
    a = Mystr('abcdebfghbij')
    print "a :", a

    print "Match a / 'b(..)(..)' :",
    print a / 'b(..)(..)' # find match

    print "a[0], a[1], a[2] :",
    print a[0], a[1], a[2] # print letters from string

    print "a(0), a(1), a(2) :",
    print a(0), a(1), a(2) # print matches

    print "a :", a

    a /= 'b.', 'X', 1 # find and replace once
    print "a :", a

    a /= 'b.', 'X' # find and replace all
    print "a :", a
 
Peter Kleiweg

I said:
I want to use regular expressions with less typing. Like this:

A / 'b.(..)' # test for regex 'b...' in A
A[0] # get the last whole match
A[1] # get the first group in the last match

I meant:

A(0)
A(1)

While A[0] and A[1] should work like normal string indexing.
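
For readers on Python 3, the corrected behaviour (call syntax for match groups, ordinary indexing left alone) might look roughly like this. It is a sketch, not the poster's code: it assumes Python 3, where `/` and `/=` dispatch to `__truediv__` and `__itruediv__`, and returning None when nothing matches is a choice made here for the illustration.

```python
import re

class Mystr(str):
    """str subclass where `/` searches and `/=` substitutes (Python 3 sketch)."""

    def __truediv__(self, regex):
        # Subclass instances have a __dict__, so the match can be stored on self.
        self.sre = re.search(regex, self)
        # Assumption of this sketch: return None when the regex does not match.
        return Mystr(self.sre.group()) if self.sre else None

    def __itruediv__(self, args):
        # `a /= regex, repl[, count]` passes the right-hand side as one tuple.
        regex, repl, *rest = args
        count = rest[0] if rest else 0  # 0 means "replace all" for re.sub
        # Strings are immutable: a new Mystr is built and `a` is rebound to it.
        return Mystr(re.sub(regex, repl, self, count=count))

    def __call__(self, group):
        # A(0), A(1), ... return groups of the last match.
        return self.sre.group(group)

a = Mystr('abcdebfghbij')
whole = a / 'b(..)(..)'
groups = (a(0), a(1), a(2))
letters = (a[0], a[1], a[2])   # normal string indexing is untouched

a /= 'b.', 'X', 1
once = a
a /= 'b.', 'X'
every = a
```

Note that the object produced by `/=` is a fresh Mystr, so it carries no stored match until `/` is used on it again.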
 
Jeff Epler

This is intended to be impossible.

Even if you could assign to str.__div__ (and this is very deliberately and
specifically disallowed) you would end up disappointed, because strings
are immutable. That means there's nowhere to store "the last match",
no way to mutate the string with the "/=" operator, and also that the
interpreter is free to use the same storage for two equal strings.
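
A short sketch of two of these points, assuming Python 3 and CPython: a plain str has no per-instance storage for something like "the last match", and equal strings may be one shared object. The attribute name `last_match` and the class `Tagged` are invented for the illustration; `sys.intern` is used only because it makes the sharing deterministic.

```python
import sys

plain = 'abcdebfghbij'
try:
    # Plain str instances have no __dict__, so there is nowhere to put this.
    plain.last_match = None
except AttributeError as exc:
    attr_error = exc
else:
    attr_error = None

class Tagged(str):
    """A subclass instance does get a __dict__ and can carry extra state."""

tagged = Tagged('abcdebfghbij')
tagged.last_match = None        # works on the subclass

# Equal strings can share storage; interning forces that explicitly.
first = sys.intern(''.join(['ab', 'c']))
second = sys.intern('abc')
shared = first is second
```

If state could be attached to a shared string object, every holder of an equal string would see it, which is part of why the subclass route is the workable one.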

Jeff

 
Peter Kleiweg

Jeff Epler wrote:
This is intended to be impossible.

That is very annoying.

Even if you could assign to str.__div__ (and this is very deliberately and
specifically disallowed) you would end up disappointed, because strings
are immutable. That means there's nowhere to store "the last match",
no way to mutate the string with the "/=" operator, and also that the
interpreter is free to use the same storage for two equal strings.

This works:

a += 'x'

So this should too:

a /= 'x'




Is there a way to tell Python that '' should be something else
than str?
 
Jeremy Bowers

This works:

a += 'x'

In the sense you mean, no it doesn't.

Python 2.3.4 (#1, Jun 8 2004, 17:41:43)
[GCC 3.3.3 20040217 (Gentoo Linux 3.3.3, propolice-3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
1074272448

Note the two different id numbers. 'a' and 'ab' are not the same string.
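
The point about the two id numbers can be checked without relying on specific id values. This is a minimal sketch assuming Python 3; keeping a second reference (`original`, a name chosen here) guarantees the old string stays alive so the two objects can be compared.

```python
a = 'a'
original = a      # second reference to the same string object

a += 'b'          # builds a new string and rebinds the name `a`

same_object = a is original
print(original, a, same_object)
```

The name `a` now points at a different object, while the string it used to refer to is unchanged.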
Is there a way to tell Python that '' should be something else
than str?

No.
 
Peter Kleiweg

Jeremy Bowers wrote:
In the sense you mean, no it doesn't.

I mean in the sense it does, and in that sense it does.
Python 2.3.4 (#1, Jun 8 2004, 17:41:43)
[GCC 3.3.3 20040217 (Gentoo Linux 3.3.3, propolice-3.3-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
1074272448

Note the two different id numbers. 'a' and 'ab' are not the same string.

That is not relevant. What matters is that a += 'b' does what it
is supposed to do.


Bummer.
 
Diez B. Roggisch

Jeff said:
This is intended to be impossible.

Even if you could assign to str.__div__ (and this is very deliberately and
specifically disallowed) you would end up disappointed, because strings
are immutable. That means there's nowhere to store "the last match",
no way to mutate the string with the "/=" operator, and also that the
interpreter is free to use the same storage for two equal strings.

Another reason for not allowing this is that modifying builtins can lead to
severe bugs, as other libs might rely on certain functionality. If you
change that, things start getting very weird....
 
Peter Kleiweg

Diez B. Roggisch wrote:
Another reason for not allowing this is that modifying builtins can lead to
severe bugs, as other libs might rely on certain functionality. If you
change that, things start getting very weird....

Programming causes bugs. That's not a reason to disallow programming.
 
Diez B. Roggisch

Hi,
Programming causes bugs. That's not a reason to disallow programming.

Well, with that attitude I suggest you start coding in assembler - all
freedom you can imagine, no rules. Every bit is subject to your personal
interpretation. Or C, which is basically assembler with more names and
curly braces.

But for some reason people started developing and using higher-level
languages that forbid certain techniques - and every time, somebody yelled
"I want to be free to do what I want". Python has its very own special case
of that with its whitespace-dependent block structure, which frequently
causes people confronted with it to reject python as a language.

People started using higher-level languages because they actually _did_
decrease the number of problems programming caused - so the projects could
get more elaborate.

Don't get me wrong - there are a lot of decisions to be made in language
design, and lots of them are debatable - python is no exception to that
rule. But as I said before - allowing builtins to be manipulated asks for
more trouble than it's worth. Imagine a len() that always returns 1 - no
matter what you feed it. Or _if_ you're allowed to change the constructors
of builtin types - then who is to decide which of the 5 different string
implementations in the various modules imported is the one to use?

The only thing you really need is a simple constructor for your undoubtedly
interesting and useful string-derived class. Overloading "" as the string
constructor isn't possible - for the simple reason that only a statically
typed language could distinguish the usage of the "classic" constructor from
your enhanced version.

So what you could do is to modify the builtins __dict__ - that is possible -
to contain a new constructor s - then creating your strings is just

s('foo')

Which is only three chars more than usual string creation.
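
One way this suggestion might look, as a sketch assuming Python 3, where the module is called `builtins` (it was `__builtin__` in the Python 2 of this thread). The class `Mystr` here is a cut-down stand-in for the poster's class, and registering the short name `s` globally is exactly the kind of change that affects every module in the process, so it is shown only to illustrate the mechanism.

```python
import builtins
import re

class Mystr(str):
    """Cut-down stand-in for the string-derived class discussed above."""

    def __truediv__(self, regex):
        match = re.search(regex, self)
        return Mystr(match.group()) if match else None

# Make `s` resolvable from any module without an import.
builtins.s = Mystr

# `s` is found through the builtins namespace when the global lookup misses.
found = s('abcdebfghbij') / 'b..'
print(found)
```

A plain module-level `from mymodule import Mystr as s` (module name hypothetical) gives the same brevity without touching the shared builtins namespace.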

Another approach would be some macro mechanism - but python doesn't have
such a facility built in - and I'm not aware of a widely adopted
3rd-party module/extension out in the wild.
 
Jeremy Bowers

That is not relevant. What matters is that a += 'b' does what it
is supposed to do.

If you want to be that way, then fine. No, it doesn't do what it is
"supposed" to do, in the context you are discussing. It does not add a "b"
to the original string, it creates a new string containing the original
contents plus the new contents.

Let me refresh your memory: You are arguing that you should be able to
apply a division operator to a string to apply a regex to it. When people
told you it was impossible, because strings were immutable, you said that
a += "b" did what you wanted. In context, this was clearly a claim that
strings are mutable, although that is a translation of your claim from
what you said here:

This works:

a += 'x'

So this should too:

a /= 'x'

which was in reply to
Even if you could assign to str.__div__ (and this is very deliberately and
specifically disallowed) you would end up disappointed, because strings
are immutable. That means there's nowhere to store "the last match",
no way to mutate the string with the "/=" operator, and also that the
interpreter is free to use the same storage for two equal strings.

Therefore, I say again: Your example does not do what you are implicitly
claiming it does. Therefore it is not a counter example to Jeff Epler's
(correct) claim.

So when you say above "This works:", I'm saying, no, it doesn't, not in
the sense you were replying to Jeff. It is not mutating the string
originally referenced by a, you still can't do that, and your
attempted counterpoint has no force, no meaning. (Any other putative
meaning you would claim it had after the fact would simply be a
non sequitur, in context.)

Any successful attempt to mutate a string (in pure Python) would
constitute a serious bug in Python. (Any successful mutation by a C
extension would constitute a major, Python-breaking bug in that extension.)
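
As a small illustration of the pure-Python side of that statement, assuming Python 3: item assignment on a string is rejected, and the supported route is to build a new string. The variable names are chosen for this sketch.

```python
text = 'abc'
try:
    # str does not support item assignment.
    text[0] = 'X'
except TypeError as exc:
    mutate_error = exc
else:
    mutate_error = None

# Building a new string leaves the original untouched.
changed = 'X' + text[1:]
print(text, changed)
```
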
 
Peter Kleiweg

Jeremy Bowers wrote:
If you want to be that way, then fine. No, it doesn't do what it is
"supposed" to do, in the context you are discussing. It does not add a "b"
to the original string, it creates a new string containing the original
contents plus the new contents.

Yes, exactly as it is supposed to do.
 
Alex Martelli

Peter Kleiweg said:

I think you might be happier with Ruby -- beyond a number of trivia, the
big difference between the two, from my POV, is that in Ruby you can
alter built-ins, in Python you can't. Which is why I personally stick
with Python, but to anyone who mostly likes Python but believes he would
get better programs by modifying built-ins, I suggest Ruby.


Alex
 
