Regex syntax

-

I have managed to form the regex for the following two:

CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>

String CTL_REGEX = "([[\\x00-\\x1F]\\x7F])";

LWS = [CRLF] 1*( SP | HT )

String LWS_REGEX = "((\r\n)??( |\\x09)+?)";


However, the following stumped me for hours.

TEXT = <any OCTET except CTLs, but including LWS>


String TEXT_REGEX = ...... // help me out please.
 
-

- said:
I have managed to form the regex for the following two:

CTL = <any US-ASCII control character (octets 0 - 31) and DEL (127)>

String CTL_REGEX = "([[\\x00-\\x1F]\\x7F])";

LWS = [CRLF] 1*( SP | HT )

String LWS_REGEX = "((\r\n)??( |\\x09)+?)";


However, the following stumped me for hours.

TEXT = <any OCTET except CTLs, but including LWS>


String TEXT_REGEX = ...... // help me out please.

Kindly disregard.
 
Lasse Reichstein Nielsen

- said:
String CTL_REGEX = "([[\\x00-\\x1F]\\x7F])";

Too many square brackets. Just use "[\\x00-\\x1f\\x7f]"
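
A quick way to check the simplified class is to run it over a few boundary characters. This is only a minimal sketch; the class name `CtlCheck` and the helper `isCtl` are made up for illustration and do not come from the thread.

```java
import java.util.regex.Pattern;

final class CtlCheck {
    // CTL = octets 0-31 plus DEL (127), written as one flat character class.
    static final Pattern CTL = Pattern.compile("[\\x00-\\x1f\\x7f]");

    // Hypothetical helper: true if the single character c is a CTL.
    static boolean isCtl(char c) {
        return CTL.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        System.out.println(isCtl('\t'));      // HT is octet 9, inside 0-31
        System.out.println(isCtl('\u007f'));  // DEL
        System.out.println(isCtl('A'));       // ordinary printable character
    }
}
```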
LWS = [CRLF] 1*( SP | HT )

I'm not absolutely sure how to read this notation, so I'm guessing
it means one carriage return/line feed pair followed by one or more
spaces or horizontal tabs.
String LWS_REGEX = "((\r\n)??( |\\x09)+?)";

Why two question marks? And the backslashes might want to be escaped
too. It should look more like
"\\r\\n[\\x20\\x09]+"

(is it mail header format or something like that? :)
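
On the notation: these rules look like the basic rules from RFC 2616 (HTTP/1.1), and in that document's augmented BNF square brackets enclose an optional element. If that is the grammar in question, the CRLF may be absent and only the run of SP/HT is required. As for the doubled question mark, `??` and `+?` are reluctant quantifiers in java.util.regex; with `String.matches`, which must consume the whole input, they accept the same strings as the greedy `?` and `+`. A minimal sketch under the optional-CRLF assumption, with the hypothetical names `LwsCheck` and `isLws`:

```java
import java.util.regex.Pattern;

final class LwsCheck {
    // LWS = [CRLF] 1*( SP | HT ), reading [CRLF] as optional. That reading
    // is an assumption based on RFC 2616 notation, not confirmed in the thread.
    static final Pattern LWS = Pattern.compile("(?:\\r\\n)?[\\x20\\x09]+");

    // Hypothetical helper: true if the whole string is one LWS.
    static boolean isLws(String s) {
        return LWS.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLws(" \t"));   // no CRLF, only SP and HT
        System.out.println(isLws("\r\n ")); // CRLF followed by one SP
        System.out.println(isLws("\r\n"));  // CRLF alone, no SP or HT after it
    }
}
```

If the CRLF really is mandatory in your grammar, drop the `?` after the first group and the pattern becomes the one suggested above.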
However, the following stumped me for hours.

TEXT = <any OCTET except CTLs, but including LWS>

LWS is not an octet, so how much do you want to match?

How about:
"[^\\x00-\\x1f\\x7f]|\\r\\n[\\x20\\x09]+"

/L
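
To try the TEXT suggestion on whole strings rather than on one unit, the alternation can be wrapped in a repetition. The following is a sketch, not a definitive implementation: `TextCheck` and `isTextRun` are hypothetical names, the CRLF is treated as optional (an assumption about the grammar), and the possessive quantifiers are my own choice, added because a space matches both alternatives and would otherwise allow a lot of pointless backtracking. Note also that `[^...]` on a Java `String` accepts characters above 0xFF, so it is wider than "any OCTET" unless the input has already been limited to octets.

```java
import java.util.regex.Pattern;

final class TextCheck {
    // One TEXT unit: any non-CTL character, or an LWS
    // (optional CRLF, then one or more SP/HT).
    // The outer *+ repeats that unit zero or more times without backtracking.
    static final Pattern TEXT_RUN = Pattern.compile(
        "(?:[^\\x00-\\x1f\\x7f]|(?:\\r\\n)?[\\x20\\x09]++)*+");

    // Hypothetical helper: true if s consists only of TEXT units.
    static boolean isTextRun(String s) {
        return TEXT_RUN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isTextRun("hello world"));   // printable characters only
        System.out.println(isTextRun("folded\r\n line")); // CRLF followed by SP
        System.out.println(isTextRun("bad\r\nline"));   // CRLF not followed by SP/HT
    }
}
```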
 
-

Lasse said:
String CTL_REGEX = "([[\\x00-\\x1F]\\x7F])";


Too many square brackets. Just use "[\\x00-\\x1f\\x7f]"

LWS = [CRLF] 1*( SP | HT )


I'm not absolutely sure how to read this notation, so I'm guessing
it means one carriage return/line feed pair followed by one or more
spaces or horizontal tabs.

String LWS_REGEX = "((\r\n)??( |\\x09)+?)";


Why two question marks? And the backslashes might want to be escaped
too. It should look more like
"\\r\\n[\\x20\\x09]+"

(is it mail header format or something like that? :)

However, the following stumped me for hours.

TEXT = <any OCTET except CTLs, but including LWS>


LWS is not an octet, so how much do you want to match?

How about:
"[^\\x00-\\x1f\\x7f]|\\r\\n[\\x20\\x09]+"

/L

Thanks. One more question:

token = 1*<any CHAR except CTLs>


As corrected, CTL is ([\\x00-\\x1f\\x7f])

CHAR = <any US-ASCII character (octets 0 - 127)>

So it's CHAR = "([\\x00-\\x7F])";

I tried

String regex = "[([\\x00-\\x7F])&&[^([\\x00-\\x1f\\x7f])]]";

and then test for "\u007f".matches(regex) and it returns true which is
obviously wrong.
 
Lasse Reichstein Nielsen

- said:
I tried

String regex = "[([\\x00-\\x7F])&&[^([\\x00-\\x1f\\x7f])]]";

You are guessing blindly now. Good thing it didn't appear to work.
Do read up on the format of regular expressions before trying that
again :)
<URL:http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html>

CHAR except CTL would be the characters 0x20-0x7e, which is most easily
written directly:
"[\\x20-\\x7e]+"
and then test for "\u007f".matches(regex) and it returns true which is
obviously wrong.

It's what you asked for, although I'm surprised that it gave "true".
The string is not a valid regular expression (the first ")" is
unmatched, since the first "(" is inside a character group).

/L
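
For what it's worth, java.util.regex does accept nested brackets inside a character class (as a union) and treats parentheses there as literal characters, which may explain why the earlier attempt compiled instead of throwing. If the intersection style is still wanted, the documented form is `[set&&[^excluded]]` with no parentheses. The sketch below compares it with the direct range; `TokenCheck`, `isToken` and `isTokenViaIntersection` are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

final class TokenCheck {
    // Direct form: CHAR (0-127) minus CTL (0-31 and 127) leaves 0x20-0x7e.
    static final Pattern DIRECT = Pattern.compile("[\\x20-\\x7e]+");

    // Same set via class intersection: CHAR && not-CTL.
    static final Pattern INTERSECTED =
        Pattern.compile("[\\x00-\\x7f&&[^\\x00-\\x1f\\x7f]]+");

    static boolean isToken(String s) {
        return DIRECT.matcher(s).matches();
    }

    static boolean isTokenViaIntersection(String s) {
        return INTERSECTED.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Compare both patterns on every single character from 0x00 to 0xff.
        for (char c = 0; c < 0x100; c++) {
            String s = String.valueOf(c);
            if (isToken(s) != isTokenViaIntersection(s)) {
                throw new AssertionError("patterns disagree at code " + (int) c);
            }
        }
        System.out.println(isToken("\u007f")); // DEL is a CTL, so not a token
        System.out.println(isToken("abc"));
    }
}
```

The direct range is easier to read; the intersection form is mainly useful when the excluded set is not a contiguous block at the ends of the range.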
 
