How to escape # hash character in regex match strings

504crank

I've encountered a problem with my RegEx learning curve -- how to
escape hash characters # in strings being matched, e.g.:

123

The correct result should be:

123456

I've tried to escape the hash symbol in the match string without
result.

Any ideas? Is the answer something I overlooked in my lurching Python
schooling?
 
Peter Otten

I've encountered a problem with my RegEx learning curve -- how to
escape hash characters # in strings being matched, e.g.:


123

The correct result should be:

123456 '123456'

I've tried to escape the hash symbol in the match string without
result.

Any ideas? Is the answer something I overlooked in my lurching Python
schooling?

re.escape() is used to build the regex from a string that may contain
characters that have a special meaning in regular expressions but that you
want to treat as literals. You can for example search for r"C:\dir" with
['C:\\dir']

Without escaping you'd get
['C:7ir']
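
The calls that produced the two results above did not survive in this copy of the thread. A minimal sketch that reproduces them, assuming a sample text that contains both "C:\dir" and "C:7ir" (the sample text is an assumption, not taken from the original post):

```python
import re

# Sample text is an assumption, chosen so both results can be reproduced.
text = r"C:\dir C:7ir"

# Escaped: the backslash and the "d" are matched literally.
escaped = re.findall(re.escape(r"C:\dir"), text)
print(escaped)    # ['C:\\dir']

# Unescaped: "\d" means "any digit", so the pattern matches "C:7ir" instead.
unescaped = re.findall(r"C:\dir", text)
print(unescaped)  # ['C:7ir']
```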

Peter
 
David Shapiro

Maybe using a Unicode equivalent of # would do the trick.

 
Lie Ryan

I've encountered a problem with my RegEx learning curve -- how to
escape hash characters # in strings being matched, e.g.:


123

The correct result should be:

123456

I've tried to escape the hash symbol in the match string without
result.

Any ideas? Is the answer something I overlooked in my lurching Python
schooling?

As you're not being clear on what you wanted, I'm just guessing this is
what you wanted:
'123456'
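
The snippet behind this result is missing from this copy of the thread. One plausible sketch, assuming the input looked something like '123#456' (an assumption; the original sample string is not preserved here):

```python
import re

raw = "123#456"  # assumed input; the original sample was not preserved

# Plain string method: no regex is needed to drop a literal character.
cleaned_str = raw.replace("#", "")

# Same result with re.sub; "#" is written as-is in the pattern.
cleaned_re = re.sub("#", "", raw)

print(cleaned_str, cleaned_re)
```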
 
Brian D

As you're not being clear on what you wanted, I'm just guessing this is
what you wanted:


'123456'

Sorry I wasn't more clear. I positively appreciate your reply. It
provides half of what I'm hoping to learn. The hash character is
actually a desirable hook to identify a data entity in a scraping
routine I'm developing, but not a character I want in the scrubbed
data.

In my application, the hash makes a string of alphanumeric characters
unique from other alphanumeric strings. The strings I'm looking for
are actually manually-entered identifiers, but a real machine-created
identifier shouldn't contain that hash character. The correct pattern
should be 'A1234509', but is instead often merely entered as '#12345'
when the first character, representing an alphabet sequence for the
month, and the last two characters, representing a two-digit year, can
be assumed. Identifying the hash character in a RegEx match is a way
of trapping the string and transforming it into its correct machine-
generated form.

I'm surprised it's been so difficult to find an example of the hash
character in a RegEx string -- for exactly this type of situation,
since it's so common in the real world that people want to put a pound
symbol in front of a number.

Thanks!
 
Brian D

Sorry I wasn't more clear. I positively appreciate your reply. It
provides half of what I'm hoping to learn. The hash character is
actually a desirable hook to identify a data entity in a scraping
routine I'm developing, but not a character I want in the scrubbed
data.

In my application, the hash makes a string of alphanumeric characters
unique from other alphanumeric strings. The strings I'm looking for
are actually manually-entered identifiers, but a real machine-created
identifier shouldn't contain that hash character. The correct pattern
should be 'A1234509', but is instead often merely entered as '#12345'
when the first character, representing an alphabet sequence for the
month, and the last two characters, representing a two-digit year, can
be assumed. Identifying the hash character in a RegEx match is a way
of trapping the string and transforming it into its correct machine-
generated form.

I'm surprised it's been so difficult to find an example of the hash
character in a RegEx string -- for exactly this type of situation,
since it's so common in the real world that people want to put a pound
symbol in front of a number.

Thanks!

By the way, other forms the strings can take in their manually created
forms:

A#12345
#1234509

Garbage in, garbage out -- I know. I wish I could tell the people
entering the data how challenging it is to work with what they
provide, but it is, after all, a screen-scraping routine.
 
504crank

As you're not being clear on what you wanted, I'm just guessing this is
what you wanted:


'123456'

Sorry I wasn't more clear. I positively appreciate your reply. It
provides half of what I'm hoping to learn. The hash character is
actually a desirable hook to identify a data entity in a scraping
routine I'm developing, but not a character I want in the scrubbed
data.

In my application, the hash makes a string of alphanumeric characters
unique from other alphanumeric strings. The strings I'm looking for
are actually manually-entered identifiers, but a real machine-created
identifier shouldn't contain that hash character. The correct pattern
should be 'A1234509', but is instead often merely entered as '#12345'
when the first character, representing an alphabet sequence for the
month, and the last two characters, representing a two-digit year, can
be assumed. Identifying the hash character in a RegEx match is a way
of trapping the string and transforming it into its correct machine-
generated form.

Other patterns the strings can take in their manually-created
form:

A#12345
#1234509

Garbage in, garbage out -- I know. I wish I could tell the people
entering the data how challenging it is to work with what they
provide, but it is, after all, a screen-scraping routine.

I'm surprised it's been so difficult to find an example of the hash
character in a RegEx string -- for exactly this type of situation,
since it's so common in the real world that people want to put a pound
symbol in front of a number.

Thanks!
 
Rhodri James

I'm surprised it's been so difficult to find an example of the hash
character in a RegEx string -- for exactly this type of situation,
since it's so common in the real world that people want to put a pound
symbol in front of a number.

It's a character with no special meaning to the regex engine, so I'm not
in the least surprised that there aren't many examples containing it.
You could just as validly claim that there aren't many examples involving
the letter 'q'.
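
A small sketch of that point, with made-up sample text: in an ordinary pattern '#' matches itself whether or not it is escaped. The one case worth knowing about is re.VERBOSE, where an unescaped '#' outside a character class starts a comment, so there it needs to be written as '\#' or '[#]'.

```python
import re

sample = "order #12345 and #678"  # made-up text for illustration

# In a normal pattern "#" matches itself, escaped or not.
plain = re.findall(r"#\d+", sample)
escaped = re.findall(r"\#\d+", sample)

# Under re.VERBOSE a bare "#" would begin a comment, so the literal hash
# is written inside a character class here.
verbose = re.findall(r"""
    [#]     # a literal hash
    \d+     # followed by digits
""", sample, re.VERBOSE)

print(plain, escaped, verbose)
```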

By the way, I don't know what you're doing but I'm seeing all of your
posts twice, from two different addresses. This is a little confusing,
to put it mildly, and doesn't half break the threading.
 
Lie Ryan

Brian said:
By the way, other forms the strings can take in their manually created
forms:

A#12345
#1234509

Garbage in, garbage out -- I know. I wish I could tell the people
entering the data how challenging it is to work with what they
provide, but it is, after all, a screen-scraping routine.

Perhaps it's like this?
>>> import re
>>> # you can use re.search if that suits better
>>> a = re.match(r'([A-Z]?)#(\d{5})(\d\d)?', 'A#12345')
>>> b = re.match(r'([A-Z]?)#(\d{5})(\d\d)?', '#1234509')
>>> a.group(0)
'A#12345'
>>> a.group(1)
'A'
>>> a.group(2)
'12345'
>>> a.group(3)
>>> b.group(0)
'#1234509'
>>> b.group(1)
''
>>> b.group(2)
'12345'
>>> b.group(3)
'09'
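
Building on the pattern above, here is a rough sketch of the transformation Brian describes, turning '#12345', 'A#12345' or '#1234509' into the 'A1234509' form. The function name normalize_id and the default month letter and year are hypothetical; the thread does not say how the "assumed" values are actually determined.

```python
import re

# Optional month letter, a literal "#", five digits, optional two-digit year.
ID_PATTERN = re.compile(r"([A-Z]?)#(\d{5})(\d\d)?")

def normalize_id(raw, default_month="A", default_year="09"):
    """Return the machine-style identifier, or None if raw does not fit.

    The defaults are placeholders for whatever month and year can be
    assumed in the real data.
    """
    m = ID_PATTERN.fullmatch(raw.strip())
    if m is None:
        return None
    month, number, year = m.groups()
    # Empty or missing groups fall back to the assumed values.
    return (month or default_month) + number + (year or default_year)

for raw in ("#12345", "A#12345", "#1234509"):
    print(raw, "->", normalize_id(raw))
```

fullmatch is used rather than match so that strings with trailing junk are rejected instead of being silently truncated; whether that is the right call depends on how messy the scraped data is.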
 
