Non-printable char in regex

souporpower

Hi

I have a regex problem on Windows XP with ActivePerl v5.8.8. The string to
match starts with 2 chars that are non-printable. I tried using Hex,
Octal and Control patterns without success. If I remove the first 2
chars in the string the subsequent pattern matching works fine. Has
anyone encountered this before? Does perl have a pattern matcher for
non-printable chars?

Thanks for your help
 
Gunnar Hjalmarsson

This gives you the string's characters as octal strings that can be used
in a Perl regex:

my @octchars = map sprintf('\\%03o', ord), split //, $string;
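As a rough sketch of how those octal escapes might then be used, the snippet below builds a pattern from the first two characters and anchors it at the start of the string. The sample value of `$string` and the names `$prefix` and `$rest` are made up for illustration; the real data will differ.

```perl
use strict;
use warnings;

# Hypothetical sample: two non-printable bytes followed by ordinary text.
my $string = "\x01\x02Hello";

# Turn every character into a three-digit octal escape, as in the one-liner above.
my @octchars = map sprintf('\\%03o', ord), split //, $string;

# Join the first two escapes into a pattern fragment: \001\002
my $prefix = join '', @octchars[0, 1];

# Interpolate the fragment into a regex and capture whatever follows it.
my ($rest) = $string =~ /\A$prefix(.*)/s;

print "prefix pattern: $prefix\n";
print "rest: $rest\n";
```

Printing `@octchars` for the real string is also a quick way to see what the two leading characters actually are.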
 
Martijn Lievaart

Use a dot if you don't care for the value. But be aware that those two
_bytes_ may form one unicode _character_.

If you do care about matching the exact value, look at perldoc perlre,
especially \x and \x{}
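A minimal sketch of both suggestions, assuming a made-up input whose second leading byte happens to be a newline. It also shows one thing worth ruling out when a dot seems not to work: without the /s modifier, a dot does not match a newline.

```perl
use strict;
use warnings;

# Hypothetical input: the bytes 0x01 and 0x0A (a newline), then text.
my $string = "\x01\x0aHello";

# Exact match on known values with \xHH escapes.
my $exact = $string =~ /\A\x01\x0a/ ? 1 : 0;

# A plain dot does not match a newline, so two dots fail on this input ...
my $dots = $string =~ /\A..Hello/ ? 1 : 0;

# ... unless the /s modifier lets the dot match any character.
my $dots_s = $string =~ /\A..Hello/s ? 1 : 0;

# \x{...} takes a code point of any width, here U+FEFF as one character.
my $wide = "\x{FEFF}Hello" =~ /\A\x{FEFF}Hello/ ? 1 : 0;

print "exact=$exact dots=$dots dots_s=$dots_s wide=$wide\n";
```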

HTH,
M4
 
szr

Sherman said:
> Octal patterns work as expected for me:

Is there any technical or beneficial reason to use octal patterns over
hexadecimal (or vice-versa), or is that just your preference?
 
Uri Guttman

SP> None that I'm aware of. I'd fail the Damian test, I'm afraid... it's a
SP> habit, not a conscious decision to write it one way or another.

perl being timtowtdi supports several ways to get binary literals into
strings. hex, octal, control codes, special escapes (\n, etc) and even
binary. this is because the data may be related to other data that is
usually encoded or printed in that format. the binary data can be the
same but the literals used can be any supported syntax.
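To illustrate the point, here is a small sketch that writes the same character (a tab, chosen arbitrarily) in each notation. Note that Perl's binary notation is a numeric literal rather than a string escape, so this sketch goes through chr() for that case.

```perl
use strict;
use warnings;

# The same single character, a tab (code 9), written five ways.
my $hex     = "\x09";        # hexadecimal escape
my $octal   = "\011";        # octal escape
my $control = "\cI";         # control code (Ctrl-I)
my $special = "\t";          # special escape
my $binary  = chr(0b1001);   # binary numeric literal converted with chr()

# Count how many of the other spellings equal the special escape.
my $same = grep { $_ eq $special } ($hex, $octal, $control, $binary);

# Any of the escapes can be used inside a regex as well.
my $in_regex = "a\tb" =~ /a\x09b/ ? 1 : 0;

print "identical spellings: $same of 4, regex match: $in_regex\n";
```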

uri
 
Jürgen Exner

> Has anyone encountered this before? Does perl have a pattern matcher for
> non-printable chars?

Yes. Perl REs support POSIX character classes as well as their negation.
A POSIX class is only valid inside a bracketed character class, so
[[:^print:]] (or equivalently [^[:print:]]) matches a non-printable character.
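A short sketch of the negated POSIX class in use, assuming a made-up sample string. It counts the leading non-printable characters and strips them without needing to know their values.

```perl
use strict;
use warnings;

# Hypothetical sample: two non-printable bytes followed by ordinary text.
my $string = "\x01\x02Hello";

# Capture the run of non-printable characters at the start of the string.
my ($lead) = $string =~ /\A([[:^print:]]+)/;
my $count = defined $lead ? length $lead : 0;

# Strip that run from a copy, leaving the original untouched.
(my $clean = $string) =~ s/\A[[:^print:]]+//;

print "leading non-printables: $count\n";
print "cleaned: $clean\n";
```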

jue
 
souporpower

> Use a dot if you don't care for the value. But be aware that those two
> _bytes_ may form one unicode _character_.

I have tried dot, which doesn't work.

> If you do care about matching the exact value, look at perldoc perlre,
> especially \x and \x{}
>
> HTH,
> M4

Thanks
 
souporpower

> Hi,
>
> Could you post your code? I doubt we can really help you without it. Have you
> tried it like \x00 (numbers should be the hex value of your characters)?
> That should do the trick.
>
> Leon Timmermans

I don't know the hex values. I am using MS-DOS :( Besides, the hex
values are not constant.
When I use $mystring = substr($mystring, 2) everything is dandy. BTW, I
scrape the web with Mechanize, so it could be an image. But I am not sure.
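The hex values can be read from within Perl itself, with no external tools. The sketch below assumes the string holds plain bytes and uses a made-up sample value; it dumps the first two bytes as hex and then drops them with the substr call mentioned above.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the scraped content.
my $mystring = "\x01\x02Hello";

# Unpack the first two bytes as two-digit hex strings.
my @hex = unpack 'H2 H2', $mystring;

print "first two bytes: @hex\n";

# Drop the two leading bytes once they have been inspected.
my $rest = substr($mystring, 2);
print "rest: $rest\n";
```

Running this on a few real responses should show whether the two bytes are always different or fall into a small set of recurring values.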

Thanks
 
