Regexp and Pattern.class

Discussion in 'Java' started by roger_varley@yahoo.com, Dec 17, 2004.

  1. Guest

    Hi

    I've got an application (over which I have no control) that presents
    its data as a single string. The data contains ' (single quote)
    characters that denote end of line. However, the data can also
    legitimately contain the ' character, so the generating program escapes
    any embedded ' characters with ? (question mark). (It's a
    Tradacomms-formatted EDI file, if anyone is interested.)

    How/Can I phrase the regexp parameter to the Pattern.split() method to
    split the string back into the original lines? Once I've cracked this,
    the + and : characters used to split each line into groups and
    individual fields should be easy :)

    Or am I going to have to hand-roll this by reading the string a
    character at a time?

    Regards
    Roger
     
    , Dec 17, 2004
    #1

  2. Tilman Bohn Guest

    In message <>,
    wrote on 17 Dec 2004 08:19:21 -0800:

    > Hi
    >
    > I've got an application (over which I have no control) that presents
    > its data as a single string. The data contains ' (single quote)
    > characters that denote end of line. However, the data can also
    > legitimately contain the ' character, so the generating program escapes
    > any embedded ' characters with ? (Question mark). (Its a Tradacomms
    > formatted EDI file if anyone is interested).


    First question: Can a question mark followed by an apostrophe be
    legal application data? If so, how is the question mark or the
    complete sequence escaped?

    For now I'll assume the sequence ?' can never occur legally in
    the application data.

    > How/Can I phrase the regexp parameter to the Pattern.split() method to
    > split the string back into the original lines.


    Under the above assumption you would split either on "(?<!\\?)'"
    or on "(?<=[^?])'", according to taste. The look-behind assertions
    are needed so the last character of each line isn't cut off.
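
A minimal sketch of the first pattern, assuming (as stated above) that the sequence ?' never occurs as real data. The class name, method name and sample string are made up for illustration and are not from the original application.

```java
import java.util.regex.Pattern;

class LookbehindSplit {
    // An apostrophe that is NOT immediately preceded by a question mark.
    // The look-behind is zero-width, so only the apostrophe itself is consumed.
    private static final Pattern TERMINATOR = Pattern.compile("(?<!\\?)'");

    static String[] splitLines(String data) {
        return TERMINATOR.split(data);
    }

    public static void main(String[] args) {
        // Hypothetical sample: two lines, the second containing an escaped apostrophe.
        String data = "ABC=1+2'DEF=O?'BRIEN'";
        for (String line : splitLines(data)) {
            System.out.println(line);
        }
    }
}
```

Note that split() drops trailing empty strings, so the terminator after the last line does not produce an extra empty element. The alternative "(?<=[^?])'" behaves the same except that it cannot match an apostrophe at the very start of the input, because it requires a preceding character.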

    > Once I've cracked this,
    > the + and : characters used to split each line into groups and
    > individual fields should be easy :)


    So no help needed there then. Ok. ;-)

    > Or am I going to have to hand-roll this by reading the string a
    > character at a time?


    Nope. The above should work.

    --
    Cheers, Tilman

    --
    `Boy, life takes a long time to live...' -- Steven Wright
     
    Tilman Bohn, Dec 17, 2004
    #2

  3. Guest


    >
    > First question: Can a question mark followed by an apostrophe be
    > legal application data? If so, how is the question mark or the
    > complete sequence escaped?
    >


    I've never seen that combination in <mumble> years of handling
    Tradacomms EDI files, so I've had to actually go and test it. The
    generating program writes ???' wherever the sequence ?' occurs in the
    data.


    > For now I'll assume the sequence ?' can never occur legally in
    > the application data.
    >


    Thanks for your help.

    Regards
    Roger
     
    , Dec 17, 2004
    #3
  4. Guest

    Sometimes I find it easier to use the Unicode representation of certain
    characters.
     
    , Dec 17, 2004
    #4
  5. Tilman Bohn Guest

    In message <>,
    wrote on 17 Dec 2004 09:24:20 -0800:

    [...]
    > I've never seen that combination in <mumble> years of handling
    > Tradacomms EDI files so I've had to actually go and test it. The
    > generating program throws out ???' where the sequence ?' occurs.


    Interesting. Ok, in this case the pattern I gave you won't work
    correctly. Before you can find the correct one you'll need to try
    what happens for a) ??' and b) ???'.

    --
    Cheers, Tilman

     
    Tilman Bohn, Dec 17, 2004
    #5
  8. Tilman Bohn Guest

    In message <>,
    wrote on 17 Dec 2004 08:19:21 -0800:

    [...]
    > How/Can I phrase the regexp parameter to the Pattern.split() method to
    > split the string back into the original lines.


    BTW, that's backwards. The regexp gets passed to Pattern.compile()
    first, then your input is the parameter to the split() method executed
    on the resulting Pattern object.
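
A short sketch of that call order. The class and method names are illustrative only; String.split(regex) is shown alongside as the standard-library shortcut that compiles the regexp internally.

```java
import java.util.regex.Pattern;

class CompileThenSplit {
    // The regexp is the argument to compile(); the input is the argument to split().
    static String[] viaPattern(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        return p.split(input);
    }

    // Convenience form: String.split takes the regexp and is called on the input.
    static String[] viaString(String regex, String input) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        for (String part : viaPattern("'", "a'b'c")) {
            System.out.println(part);
        }
    }
}
```

Compiling once and reusing the Pattern object is the usual choice when the same regexp is applied to many inputs.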

    --
    Cheers, Tilman

     
    Tilman Bohn, Dec 17, 2004
    #8
  9. Guest

    Hi Tilman

    ??' in the input results in ?????' in the output file and ???' in the
    input file results in ???????' in the output.
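
These results are consistent with ? acting as a release character: every ? in the data is written as ?? and every ' as ?'. If that holds (it is an inference from the three cases tested, not something confirmed in the thread), an apostrophe ends a line only when an even number of question marks, possibly zero, precedes it. The following is a hedged sketch under that assumption; it uses a Matcher loop instead of split(), and the class name, method name and sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedSegmentSplitter {
    // \G anchors each match at the end of the previous one, so the scan never
    // restarts in the middle of an escape pair. A line is a run of ordinary
    // characters and "?x" pairs (x = any character), followed by an unescaped '.
    // Possessive quantifiers (*+) stop the engine from backtracking into a pair.
    private static final Pattern SEGMENT =
        Pattern.compile("\\G([^?']*+(?:\\?.[^?']*+)*+)'", Pattern.DOTALL);

    static List<String> splitSegments(String input) {
        List<String> segments = new ArrayList<>();
        Matcher m = SEGMENT.matcher(input);
        int end = 0;
        while (m.find()) {
            segments.add(m.group(1)); // escapes are left in place
            end = m.end();
        }
        // Anything after the last terminator is returned as a final, unterminated line.
        if (end < input.length()) {
            segments.add(input.substring(end));
        }
        return segments;
    }

    public static void main(String[] args) {
        // Hypothetical input: "??'" ends the first line, "???'" is data in the second.
        System.out.println(splitSegments("LINE1??'LINE2???'STILL2'"));
    }
}
```

The escape sequences are deliberately not decoded here, so a later pass that splits each line on + and : can still tell escaped characters from separators, assuming the same release-character rule applies to them.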

    Regards
    Roger
     
    , Dec 20, 2004
    #9
