Regexp and Pattern.class

roger_varley

Hi

I've got an application (over which I have no control) that presents
its data as a single string. The data contains ' (single quote)
characters that denote end of line. However, the data can also
legitimately contain the ' character, so the generating program escapes
any embedded ' characters with ? (question mark). (It's a Tradacoms
formatted EDI file if anyone is interested).

How/Can I phrase the regexp parameter to the Pattern.split() method to
split the string back into the original lines. Once I've cracked this,
the + and : characters used to split each line into groups and
individual fields should be easy :)

Or am I going to have to hand-roll this by reading the string a
character at a time?

Regards
Roger
 
Tilman Bohn

In message <[email protected]>,
> Hi
>
> I've got an application (over which I have no control) that presents
> its data as a single string. The data contains ' (single quote)
> characters that denote end of line. However, the data can also
> legitimately contain the ' character, so the generating program escapes
> any embedded ' characters with ? (question mark). (It's a Tradacoms
> formatted EDI file if anyone is interested).

First question: Can a question mark followed by an apostrophe be
legal application data? If so, how is the question mark or the
complete sequence escaped?

For now I'll assume the sequence ?' can never occur legally in
the application data.

> How/Can I phrase the regexp parameter to the Pattern.split() method to
> split the string back into the original lines.

Under the above assumption you would split either on "(?<!\\?)'"
or on "(?<=[^?])'", according to taste (both written as Java string
literals). The look-behind assertions are needed so that the last
character of each line isn't cut off, as it would be with a plain
"[^?]'".
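
For anyone following along, here is a minimal sketch of the look-behind split in Java. It assumes, as above, that ?' never occurs as real data. The sample string, the segment tags in it, and the class and method names are made up for illustration. The second call in main() shows what the look-behind avoids: a plain "[^?]'" consumes the character before each apostrophe.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LookbehindSplitDemo {
    // Terminator: an apostrophe that is NOT directly preceded by '?'.
    // Assumption: the sequence ?' never appears as literal data.
    static final Pattern TERMINATOR = Pattern.compile("(?<!\\?)'");

    static String[] splitSegments(String data) {
        // Trailing empty strings are dropped by split(), so the final
        // terminator does not produce an extra empty line.
        return TERMINATOR.split(data);
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from a real file.
        String data = "MHD=1+ORDHDR:9'CDT=5012345678900:O?'BRIEN LTD'";

        // Look-behind: the character before each terminator is kept.
        System.out.println(Arrays.toString(splitSegments(data)));

        // Naive pattern: the character before each terminator is lost.
        System.out.println(Arrays.toString(data.split("[^?]'")));
    }
}
```

Note that the escaped ?' is left in the line as-is, so the ? still has to be removed in a later step.
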
> Once I've cracked this,
> the + and : characters used to split each line into groups and
> individual fields should be easy :)

So no help needed there then. Ok. ;-)

> Or am I going to have to hand-roll this by reading the string a
> character at a time?

Nope. The above should work.
 
roger_varley

> First question: Can a question mark followed by an apostrophe be
> legal application data? If so, how is the question mark or the
> complete sequence escaped?

I've never seen that combination in <mumble> years of handling
Tradacoms EDI files, so I've had to actually go and test it. The
generating program throws out ???' where the sequence ?' occurs.

> For now I'll assume the sequence ?' can never occur legally in
> the application data.

Thanks for your help.

Regards
Roger
 
klynn47

Sometimes I find it easier to use the Unicode representation of certain
characters.
 
Tilman Bohn

In message <[email protected]>,
(e-mail address removed) wrote on 17 Dec 2004 09:24:20 -0800:

[...]
> I've never seen that combination in <mumble> years of handling
> Tradacoms EDI files, so I've had to actually go and test it. The
> generating program throws out ???' where the sequence ?' occurs.

Interesting. Ok, in this case the pattern I gave you won't work
correctly. Before you can find the correct one you'll need to try
what happens for a) ??' and b) ???'.
 
Tilman Bohn

In message <[email protected]>,
(e-mail address removed) wrote on 17 Dec 2004 08:19:21 -0800:

[...]
> How/Can I phrase the regexp parameter to the Pattern.split() method to
> split the string back into the original lines.

BTW, that's backwards. The regexp gets passed to Pattern.compile()
first, then your input is the parameter to the split() method executed
on the resulting Pattern object.
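
A short sketch of that call order, using the + field separator as the example. The class name, method name and sample line are invented for illustration, and the pattern deliberately ignores any escaping of + inside the data. String.split(regex) does the compile and the split in one call, which is fine for one-off use; holding on to the Pattern avoids recompiling it for every line.

```java
import java.util.regex.Pattern;

class CompileThenSplitDemo {
    // Step 1: the regexp goes to Pattern.compile(). Done once, reused after.
    // Note: this simple pattern does not handle an escaped '+' in the data.
    private static final Pattern FIELD_SEPARATOR = Pattern.compile("\\+");

    static String[] splitFields(String line) {
        // Step 2: the input is the argument to split() on the Pattern object.
        return FIELD_SEPARATOR.split(line);
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not taken from a real file.
        for (String field : splitFields("CLO=5012345678900+DEPOT 7+LONDON")) {
            System.out.println(field);
        }
    }
}
```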
 
roger_varley

Hi Tilman

??' in the input results in ?????' in the output file and ???' in the
input file results in ???????' in the output.
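
Those two results fit a simple rule: every ? in the data is written as ?? and every ' in the data is written as ?'. That rule is inferred from the outputs reported in this thread, not taken from the TRADACOMS specification, so treat it as an assumption. If it holds, an apostrophe ends a line only when an even number of ? characters (possibly zero) comes directly before it, which a single-character look-behind cannot check. One way around that, sketched here with made-up class, method and sample names, is to match whole lines with a Matcher rather than split on the terminator: the pattern consumes each ? together with the character after it, so an escaped apostrophe is never mistaken for a terminator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeAwareSplitDemo {
    // One line = any run of "escape pairs" (? plus the next character) or
    // ordinary characters, followed by an unescaped apostrophe.
    // \G forces each match to start exactly where the previous one ended,
    // and *+ (possessive) stops the engine from splitting an escape pair.
    private static final Pattern LINE =
            Pattern.compile("\\G((?:\\?.|[^?'])*+)'", Pattern.DOTALL);

    static List<String> splitSegments(String data) {
        List<String> lines = new ArrayList<>();
        Matcher m = LINE.matcher(data);
        while (m.find()) {
            lines.add(m.group(1)); // still escaped at this point
        }
        return lines;
    }

    // Assumed escape rule: "?x" stands for the literal character x.
    static String unescape(String text) {
        return text.replaceAll("(?s)\\?(.)", "$1");
    }

    public static void main(String[] args) {
        // Hypothetical sample: an escaped apostrophe in the first line,
        // a literal question mark right before the terminator in the second.
        String data = "CDT=O?'BRIEN'FTX=REALLY??'";
        for (String line : splitSegments(data)) {
            System.out.println(line + "  ->  " + unescape(line));
        }
    }
}
```

Anything after the last unescaped apostrophe is silently dropped here; real code would probably want to report that. unescape() should run last, after any further splitting on + and :, otherwise an escaped separator would become indistinguishable from a real one (assuming those are escaped with ? as well, which the thread does not confirm).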

Regards
Roger
 
