Convert String Containing Hex Values

cp

If I have a string that looks like this:

Job_x0020_Number

How do I turn that into:

Job Number

?
 
Arndt Jonasson

cp said:
If I have a string that looks like this:

Job_x0020_Number

How do I turn that into:

Job Number

If you try to solve the problem in a language you know, you will see
that you need to define the problem in more detail before you can
start coding. You ought to do so before asking here.

Making some plausible assumptions about your problem, this might work:

$string = "Job_x0020_Number";
$string =~ s/_x[0-9a-fA-F]+_/ /;

Since you wrote "hex values" in the subject line, I assume all
hexadecimal digits can occur after the 'x'.
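
If the string can contain more than one escape (the original post does not say), the same substitution needs the /g flag. Here is a minimal sketch under the assumption that every _x...._ escape stands for a space; the sub name escapes_to_spaces is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: assumes every _xHHHH_ escape should become a space.
sub escapes_to_spaces {
    my ($string) = @_;
    $string =~ s/_x[0-9a-fA-F]+_/ /g;    # /g replaces every occurrence, not just the first
    return $string;
}

print escapes_to_spaces("Job_x0020_Number"), "\n";              # Job Number
print escapes_to_spaces("Client_x0020_ID_x0020_Number"), "\n";  # Client ID Number
```

This maps every escape to a space no matter which code it holds, so it only fits if spaces are the only escaped character.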
 
cp

cp said:
If I have a string that looks like this:

Job_x0020_Number

How do I turn that into:

Job Number

?
s/(.*)\_[[:xdigit:]]+\_(.*)/$1 $2/;
Just tested mine, doesn't work if you use leading x for hex number. Change
to:
s/(.*)\_x[[:xdigit:]]+\_(.*)/$1 $2/;

still uglier than other solutions given but it should work.
 
Uri Guttman

c> s/(.*)\_[[:xdigit:]]+\_(.*)/$1 $2/;

overkill. you don't need to grab and put back the leading and trailing
strings. like gunnar did, just delete the stuff you want to delete.

and _ doesn't need escaping there (or anywhere as it is just a word
char).

so your regex should be:

s/_x[[:xdigit:]]+_/ /;

a lot cleaner and easier to read.

uri
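
A quick way to compare the two styles, assuming the escape carries the leading x and should become a single space (as in the original question). The variable names are only for illustration:

```perl
use strict;
use warnings;

my $sample = "Job_x0020_Number";

# Capturing form: grabs both sides and puts them back around a space.
(my $with_captures = $sample) =~ s/(.*)_x[[:xdigit:]]+_(.*)/$1 $2/;

# Direct form: replaces only the escape itself.
(my $direct = $sample) =~ s/_x[[:xdigit:]]+_/ /;

print "$with_captures\n";   # Job Number
print "$direct\n";          # Job Number
```

On this sample both give the same result. They differ once a string holds several escapes: the capturing form swallows the whole string in one match, so adding /g still replaces only one escape, while the direct form with /g replaces them all.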
 
cp

Uri said:
c> s/(.*)\_[[:xdigit:]]+\_(.*)/$1 $2/;

overkill. you don't need to grab and put back the leading and trailing
strings. like gunnar did, just delete the stuff you want to delete.

and _ doesn't need escaping there (or anywhere as it is just a word
char).

so your regex should be:

s/_x[[:xdigit:]]+_/ /;

a lot cleaner and easier to read.

uri

Thanks for the tip. I mistakenly remembered from my quick reading of
'Programming Perl' that all non alpha characters were metacharacters, but I
was wrong. I am looking at the list right now (p. 141 3rd ed.) which is :

\ | ( ) [ { ^ $ * + ? .

and then you have / which only needs a backslash in front to match literally
if it is also used as a delimiter.
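
A small check of the above, as a sketch rather than anything authoritative: the underscore matches the same text with or without a backslash, and the built-in quotemeta leaves it alone while it does backslash a real metacharacter like the dot. The variable names are made up for the example:

```perl
use strict;
use warnings;

my $name = "Job_Number";

# An escaped and an unescaped underscore match the same text.
my $plain   = $name =~ /Job_Number/  ? 1 : 0;
my $escaped = $name =~ /Job\_Number/ ? 1 : 0;

# quotemeta leaves [A-Za-z_0-9] untouched and backslashes other ASCII characters.
my $quoted_underscore = quotemeta("a_b");   # a_b
my $quoted_dot        = quotemeta("a.b");   # a\.b

print "$plain $escaped $quoted_underscore $quoted_dot\n";   # 1 1 a_b a\.b
```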

I probably shouldn't be trying to help out around here yet, but I couldn't
resist trying to help a fellow cp!
 
cp

Uri said:
c> s/(.*)\_[[:xdigit:]]+\_(.*)/$1 $2/;

overkill. you don't need to grab and put back the leading and trailing
strings. like gunnar did, just delete the stuff you want to delete.

and _ doesn't need escaping there (or anywhere as it is just a word
char).

so your regex should be:

s/_x[[:xdigit:]]+_/ /;

a lot cleaner and easier to read.

uri

This could be a stretch in trying to justify backslashing characters
unnecessarily, but what about the possibility of reserved metacharacters?
What if Larry Wall decides to make use of _ and the other
non-metacharacter, non-alpha characters and old scripts that did not
backslash them will be broken? I know it's a stretch and I did read that
even if Perl6 breaks old scripts, there will be a tool to upgrade scripts
from Perl5 to Perl6 so maybe it's not even an issue.
 
Uri Guttman

cp said:
so your regex should be:

s/_x[[:xdigit:]]+_/ /;

a lot cleaner and easier to read.

c> This could be a stretch in trying to justify backslashing
c> characters unnecessarily, but what about the possibility of
c> reserved metacharacters? What if Larry Wall decides to make use of
c> _ and the other non-metacharacter, non-alpha characters and old
c> scripts that did not backslash them will be broken? I know it's a
c> stretch and I did read that even if Perl6 breaks old scripts, there
c> will be a tool to upgrade scripts from Perl5 to Perl6 so maybe it's
c> not even an issue.

_ is in \w and will always be a word char and needs no more escaping
than does k or 3. perl5 regexes ain't gonna change metachar meanings or
it will break too much code. perl6 not only will have a perl5 regex
compiler it will have a much easier regex (actually called rules and
grammars) extension mechanism that it won't need to change its metachars
in the future.

uri
 
cp

Jürgen Exner said:
What about
s/_.*_/ /;

My fault for not explaining the whole problem.

MS Word allows custom datafields in their Word XML files. My users are
typically creative with them, and insert fields like:

Client ID Number
Job Number(s)

while some would know not to include non-alphanumerics,
and would write the fields as:

client_id_or_job_number
etc.

Word, when it saves the file as XML, translates illegal characters so
in the above example, I would get:
<o:Client_x0020_ID_x0020_Number dt:dt="string">
12345
</o:Client_x0020_ID_x0020_Number>

<o:Job_x0020_Number_x0028_s_x0029_ dt:dt="string">
5 and 6
</o:Job_x0020_Number_x0028_s_x0029_>

The regex Abigail provided fits the bill nicely, as I would like to
spit out a text file with the custom data fields as:

Client ID Number: 12345
Job Number(s): 5 and 6
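
For anyone finding this later, here is a rough sketch of the whole conversion. It is not Abigail's regex (that one is not quoted in this post), just one way to do it, assuming each escape is _x followed by exactly four hex digits and then _. The sub name decode_field_name is made up, and the tag matching is a naive pattern for illustration; a real script would probably want an XML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: turns each _xHHHH_ escape into the character with that code.
# Assumes exactly four hex digits per escape.
sub decode_field_name {
    my ($name) = @_;
    $name =~ s/_x([0-9A-Fa-f]{4})_/chr(hex($1))/ge;
    return $name;
}

my $xml = <<'XML';
<o:Client_x0020_ID_x0020_Number dt:dt="string">
12345
</o:Client_x0020_ID_x0020_Number>

<o:Job_x0020_Number_x0028_s_x0029_ dt:dt="string">
5 and 6
</o:Job_x0020_Number_x0028_s_x0029_>
XML

# Naive matching of <o:NAME ...>VALUE</o:NAME> pairs, for illustration only.
my @lines;
while ($xml =~ m{<o:(\S+)[^>]*>\s*(.*?)\s*</o:\1>}gs) {
    push @lines, decode_field_name($1) . ": $2";
}

print "$_\n" for @lines;
# Client ID Number: 12345
# Job Number(s): 5 and 6
```

Field names that already use plain underscores, like client_id_or_job_number, pass through unchanged because they contain no _xHHHH_ sequence.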


Thanks to all who helped
 
