Regex syntax

Discussion in 'Java' started by -, Aug 8, 2005.

  1. -

    - Guest

    I have managed to form the regex for the following two:

    CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>

    String CTL_REGEX = "([[\\x00-\\x1F]\\x7F])";

    LWS = [CRLF] 1*( SP | HT )

    String LWS_REGEX = "((\r\n)??( |\\x09)+?)";


    However, the following stumped me for hours.

    TEXT = <any OCTET except CTLs, but including LWS>


    String TEXT_REGEX = ...... // help me out please.
     
    -, Aug 8, 2005
    #1

  2. -

    - Guest

    - wrote:
    > I have managed to form the regex for the following two:
    >
    > CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>
    >
    > String CTL_REGEX = "([[\\x00-\\x1F]\\x7F])";
    >
    > LWS = [CRLF] 1*( SP | HT )
    >
    > String LWS_REGEX = "((\r\n)??( |\\x09)+?)";
    >
    >
    > However, the following stumped me for hours.
    >
    > TEXT = <any OCTET except CTLs, but including LWS>
    >
    >
    > String TEXT_REGEX = ...... // help me out please.


    Kindly disregard.
     
    -, Aug 8, 2005
    #2

  3. - <> writes:

    > String CTL_REGEX = "([[\\x00-\\x1F]\\x7F])";


    Too many square brackets. Just use "[\\x00-\\x1f\\x7f]"
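A minimal sketch of the simplified CTL class suggested above, so it can be checked against a few sample characters. The class name `CtlDemo` and the helper `isCtl` are made up for illustration; only the pattern string comes from the thread.

```java
import java.util.regex.Pattern;

class CtlDemo {
    // One flat character class: octets 0-31 plus DEL (127).
    static final Pattern CTL = Pattern.compile("[\\x00-\\x1f\\x7f]");

    // Hypothetical helper: true if the whole string is exactly one CTL.
    static boolean isCtl(String s) {
        return CTL.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isCtl("\t"));      // HT is octet 9
        System.out.println(isCtl("\u007f"));  // DEL is octet 127
        System.out.println(isCtl(" "));       // SP is octet 32, not a CTL
        System.out.println(isCtl("A"));
    }
}
```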

    > LWS = [CRLF] 1*( SP | HT )


    I'm not absolutely sure how to read this notation, so I'm guessing
    it means one carriage return/line feed pair followed by one or more
    spaces or horizontal tabs.

    > String LWS_REGEX = "((\r\n)??( |\\x09)+?)";


    Why two question marks? And the backslashes might want to be escaped
    too. It should look more like
    "\\r\\n[\\x20\\x09]+"

    (is it mail header format or something like that? :)
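The rule names here match the augmented BNF of HTTP/1.1 (RFC 2616). Assuming that is the grammar in question, square brackets mark an optional element and `1*` means "one or more", so the CRLF would be optional rather than required. The sketch below follows that reading; `LwsDemo` and `isLws` are illustrative names, and the optional group is an adjustment to the pattern suggested above, not something the thread confirms.

```java
import java.util.regex.Pattern;

class LwsDemo {
    // Optional CRLF, then one or more SP (0x20) or HT (0x09).
    // A single greedy "?" is enough; "??" would be the reluctant form.
    static final Pattern LWS = Pattern.compile("(?:\\r\\n)?[\\x20\\x09]+");

    // Hypothetical helper: true if the whole string is one LWS.
    static boolean isLws(String s) {
        return LWS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLws("\r\n \t")); // CRLF followed by SP and HT
        System.out.println(isLws("  "));      // no CRLF, just spaces
        System.out.println(isLws("\r\n"));    // CRLF alone, no SP or HT
    }
}
```

Writing `\\r\\n` lets the regex engine interpret the escapes, while `\r\n` puts literal CR and LF characters into the pattern. Both match the same text, so escaping is a matter of readability here.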

    > However, the following stumped me for hours.
    >
    > TEXT = <any OCTET except CTLs, but including LWS>


    LWS is not an octet, so how much do you want to match?

    How about:
    "[^\\x00-\\x1f\\x7f]|\\r\\n[\\x20\\x09]+"

    /L
    --
    Lasse Reichstein Nielsen -
    DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
    'Faith without judgement merely degrades the spirit divine.'
     
    Lasse Reichstein Nielsen, Aug 8, 2005
    #3
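A sketch of how the TEXT alternation from post #3 could be used to validate a whole string. Assumptions beyond the thread: the CRLF is treated as optional (the RFC 2616 reading of `[CRLF]`), the alternation is repeated with a possessive `*+` so that a space, which matches both alternatives, cannot cause heavy backtracking, and the names `TextDemo` and `isText` are invented for the example.

```java
import java.util.regex.Pattern;

class TextDemo {
    // Zero or more units, each either one non-CTL character or one LWS.
    // Java strings hold UTF-16 chars, so characters above 0xFF also pass
    // the negated class; decode bytes as ISO-8859-1 first if strict
    // octets are required.
    static final Pattern TEXT = Pattern.compile(
            "(?:[^\\x00-\\x1f\\x7f]|(?:\\r\\n)?[\\x20\\x09]+)*+");

    // Hypothetical helper: true if the whole string is valid TEXT.
    static boolean isText(String s) {
        return TEXT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isText("Hello, world")); // plain printable text
        System.out.println(isText("a\r\n b"));      // folded line: CRLF then SP
        System.out.println(isText("a\r\nb"));       // bare CRLF, not LWS
        System.out.println(isText("a\u0001b"));     // contains a CTL
    }
}
```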
  4. -

    - Guest

    Lasse Reichstein Nielsen wrote:
    > - <> writes:
    >
    >> String CTL_REGEX = "([[\\x00-\\x1F]\\x7F])";
    >
    > Too many square brackets. Just use "[\\x00-\\x1f\\x7f]"
    >
    >> LWS = [CRLF] 1*( SP | HT )
    >
    > I'm not absolutely sure how to read this notation, so I'm guessing
    > it means one carriage return/line feed pair followed by one or more
    > spaces or horizontal tabs.
    >
    >> String LWS_REGEX = "((\r\n)??( |\\x09)+?)";
    >
    > Why two question marks? And the backslashes might want to be escaped
    > too. It should look more like
    > "\\r\\n[\\x20\\x09]+"
    >
    > (is it mail header format or something like that? :)
    >
    >> However, the following stumped me for hours.
    >>
    >> TEXT = <any OCTET except CTLs, but including LWS>
    >
    > LWS is not an octet, so how much do you want to match?
    >
    > How about:
    > "[^\\x00-\\x1f\\x7f]|\\r\\n[\\x20\\x09]+"
    >
    > /L


    Thanks... One more question:

    token = 1*<any CHAR except CTLs>


    As corrected, CTL is ([\\x00-\\x1f\\x7f])

    CHAR = <any US-ASCII character (octets 0 - 127)>

    So it's CHAR = "([\\x00-\\x7F])";

    I tried

    String regex = "[([\\x00-\\x7F])&&[^([\\x00-\\x1f\\x7f])]]";

    and then test for "\u007f".matches(regex) and it returns true which is
    obviously wrong.
     
    -, Aug 8, 2005
    #4
  5. - <> writes:

    > I tried
    >
    > String regex = "[([\\x00-\\x7F])&&[^([\\x00-\\x1f\\x7f])]]";


    You are guessing blindly now. Good thing it didn't appear to work.
    Do read up on the format of regular expressions before trying that
    again :)
    <URL:http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html>

    CHAR except CTL would be the characters 0x20-0x7e, which is most easily
    written directly:
    "[\\x20-\\x7e]+"

    > and then test for "\u007f".matches(regex) and it returns true which is
    > obviously wrong.


    It's what you asked for, although I'm surprised that it gave "true".
    The string is not a valid Regular Expression (the first ")" is
    unmatched, since the first "(" is inside a character group).
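Since the original poster got a result from `matches` rather than a `PatternSyntaxException`, the string does appear to compile. As far as I can tell, that is because the whole expression sits inside one outer `[...]`, where `(` and `)` are literal characters and not grouping operators. The sketch below checks this; `NestedClassDemo` and `compiles` are illustrative names. How the negated nested class is evaluated may differ between Java releases, so the DEL result is printed rather than assumed.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class NestedClassDemo {
    // The attempted regex from post #4, copied verbatim.
    static final String REGEX = "[([\\x00-\\x7F])&&[^([\\x00-\\x1f\\x7f])]]";

    // Hypothetical helper: true if the pattern compiles without error.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("compiles: " + compiles(REGEX));
        // For contrast, a lone "(" outside a character class is rejected.
        System.out.println("lone paren compiles: " + compiles("("));
        // Printed as observed; the value may depend on the Java version.
        System.out.println("DEL matches: " + "\u007f".matches(REGEX));
    }
}
```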

    /L
    --
    Lasse Reichstein Nielsen -
    DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
    'Faith without judgement merely degrades the spirit divine.'
     
    Lasse Reichstein Nielsen, Aug 9, 2005
    #5
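A short sketch comparing the direct range from post #5 with a well-formed version of the intersection the original poster was reaching for. In `java.util.regex`, `&&` takes a plain class body on the left and a bracketed class on the right, with no parentheses involved. `TokenDemo` and its helper names are made up for the example.

```java
import java.util.regex.Pattern;

class TokenDemo {
    // Direct form: CHAR minus CTL is simply octets 0x20 through 0x7e.
    static final Pattern DIRECT = Pattern.compile("[\\x20-\\x7e]+");

    // Intersection form: all of US-ASCII, and not a CTL.
    static final Pattern INTERSECT =
            Pattern.compile("[\\x00-\\x7f&&[^\\x00-\\x1f\\x7f]]+");

    static boolean direct(String s) {
        return DIRECT.matcher(s).matches();
    }

    static boolean intersect(String s) {
        return INTERSECT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "text/html", "\u007f", "a\tb", "" };
        for (String s : samples) {
            // Both forms should agree on every sample.
            System.out.println(direct(s) + " " + intersect(s));
        }
    }
}
```

The direct range is easier to read. The intersection form is mainly useful when the excluded set is not a contiguous block.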
  6. -

    Roedy Green Guest

    On Mon, 08 Aug 2005 14:07:02 +0800, - <> wrote or
    quoted :

    >I have managed to form the regex for the following two:


    My regex cheat sheet might help you. See
    http://mindprod.com/jgloss/regex.html
     
    Roedy Green, Aug 14, 2005
    #6
