Help with Regular Expression in split function


YattaMaX

Hi all
(first of all: sorry for my bad English)

I need a regular expression that extracts only words (> 3
chars) without numbers or special characters.

Examples:

With this string :

String str1 = "jump: qwe donaldduck:.,#@?bye2xyz zkj ooo iuy ..uix#f4 [email protected] hkj";

str1.toLowerCase().trim().split( REGEX )

should return:

jump
donaldduck
maxx



Please help me, I can't find this regular expression :(



Bye
MaX
 

Hendrik Maryns

YattaMaX said:
Hi all
(first of all: sorry for my bad English)

I need a regular expression that extracts only words (> 3
chars) without numbers or special characters.

Examples:

With this string :

String str1 = "jump: qwe donaldduck:.,#@?bye2xyz zkj ooo iuy ..uix#f4 [email protected] hkj";

str1.toLowerCase().trim().split( REGEX )

should return:

jump
donaldduck
maxx

How about REGEX = "\\W*"? (Bad naming, make that regex = "\\W*")
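A minimal sketch of what splitting on non-word characters does in Java (the class and method names are hypothetical). `\\W` is the complement of `\\w`, which matches `[a-zA-Z_0-9]`, so digits and underscores count as word characters and stay inside the tokens. The sketch uses `\\W+` (one or more) so that every separator match consumes at least one character.

```java
import java.util.Arrays;

class NonWordSplit {
    // Split on runs of non-word characters. \W is [^a-zA-Z_0-9],
    // so digits and '_' remain part of the resulting tokens.
    static String[] splitOnNonWord(String text) {
        return text.split("\\W+");
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the thread.
        String[] parts = splitOnNonWord("jump3: qwe_x bye2xyz");
        System.out.println(Arrays.toString(parts)); // prints [jump3, qwe_x, bye2xyz]
    }
}
```

So with `\\W` the digit in "jump3" is kept, which is why a character class such as `[^a-z]` is needed to drop numbers.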

HTH, H.

--
Hendrik Maryns

==================
www.lieverleven.be
http://aouw.org
 

YattaMaX

Hendrik Maryns wrote:
How about REGEX = "\\W*"? (Bad naming, make that regex = "\\W*")

Thanks.

Just a few questions:


- With \\W, are numbers included?
(I don't want numbers: jump3 -> jump)

- Are words of three characters or fewer excluded?


(Sorry for my bad English)


bye
MaX
 

YattaMaX

YattaMaX wrote:
Hendrik Maryns wrote:

No, this doesn't work correctly :(

With the string:
String str1 = "jump: qwe donaldduck:.,#@?bye2xyz zkj ooo iuy ..uix#f4 [email protected] hkj";

And the call:
str1.toLowerCase().trim().split( "[\\W*]" );

This returns:


jump




qwe
donaldduck





bye2xyz
zkj
ooo
iuy

uix
f4
lk
maxx0i

oi

hkj






Thanks
MaX
 

YattaMaX

(e-mail address removed) opalinski from opalpaweb wrote:
"\\W\\W\\W\\W+"

oughta drop one, two, three character stuff and return stuff that is at
least four characters long.

Opalinski
(e-mail address removed)
http://www.geocities.com/opalpaweb/


\\W is "A non-word character"

I want only the words, without numbers, with at least 3 characters.


Thanks for your contribution.


Bye
MaX
 

YattaMaX

Hendrik Maryns wrote:
That could be written nicer as "\\W{4,}", if I recall the syntax correctly.


Sorry, but isn't "\\W" a "non-word character"?

Why do you use \\W when what I need is to get only words with at least
three characters (not the opposite)?
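A short sketch (hypothetical class and method names) of what the suggested pattern actually does when passed to split(). `\\W\\W\\W\\W+` and `\\W{4,}` both describe a run of four or more non-word characters, so as split() arguments they mark only long separator runs; they say nothing about word length. The sample is the first part of the thread's test string.

```java
import java.util.Arrays;

class SeparatorRuns {
    // Both patterns match the same thing: four or more consecutive
    // non-word characters. With split(), they describe separators,
    // not the words to keep.
    static String[] splitLong(String text) {
        return text.split("\\W\\W\\W\\W+");
    }

    static String[] splitQuantified(String text) {
        return text.split("\\W{4,}");
    }

    public static void main(String[] args) {
        String sample = "jump: qwe donaldduck:.,#@?bye2xyz";
        // The only run of 4+ non-word characters is ":.,#@?", so both print
        // [jump: qwe donaldduck, bye2xyz]
        System.out.println(Arrays.toString(splitLong(sample)));
        System.out.println(Arrays.toString(splitQuantified(sample)));
    }
}
```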




I know this page; I have been reading it for 7 days, but without good results.






Thanks for your contribution.


Bye
MaX
 

opalpa

Sorry, didn't pay enough attention to the split being used instead of
patterns and matchers.

package experiment;

public class Split {
    public static void main(String[] args) {
        String str1 = "jump: qwe donaldduck:.,#@?bye2xyz zkj ooo iuy ..uix#f4 [email protected] hkj";
        System.out.println(str1);
        // Split on every character that is not a lowercase letter a-z.
        // Consecutive separators produce empty strings; the length test drops them.
        String[] w = str1.toLowerCase().trim().split("[^a-z]");
        for (String s : w) {
            // Keep only tokens longer than three letters.
            if (s.length() > 3)
                System.out.println(s);
        }
    }
}

outputs:

jump: qwe donaldduck:.,#@?bye2xyz zkj ooo iuy ..uix#f4 [email protected] hkj
jump
donaldduck
maxx

Opalinski
(e-mail address removed)
http://www.geocities.com/opalpaweb/
 

Oliver Wong

YattaMaX said:
Hi all
(first of all: sorry for my bad English)

I need a regular expression that extracts only words (> 3
chars) without numbers or special characters.

Examples:

With this string :

String str1 = "jump: qwe donaldduck:.,#@?bye2xyz zkj ooo iuy ..uix#f4 [email protected] hkj";

str1.toLowerCase().trim().split( REGEX )

should return:

jump
donaldduck
maxx



Please help me, I can't find this regular expression :(

split() is not what you want, as the regular expression you provide to
split() describes the separators, not the acceptable strings.

Why don't you try building a Matcher, and using it to find subsequences
which match your requirement of at least 3 alphabetic characters in a row?

http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Matcher.html
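A minimal sketch of the Matcher approach (the class and method names are hypothetical). The pattern describes the tokens to keep, and `Matcher.find()` walks through the input returning each match. Oliver's wording says "at least 3" letters, but the expected output in the question drops three-letter words such as "qwe", so the sketch uses `[a-z]{4,}`; use `{3,}` if three-letter words should count. The sample string replaces the obfuscated e-mail address from the thread with `maxx0i`, a token that appears in the output listed earlier in the thread (an assumption about part of the original address).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherWords {
    // Describe the tokens to KEEP, not the separators:
    // four or more consecutive lowercase letters.
    private static final Pattern WORD = Pattern.compile("[a-z]{4,}");

    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text.toLowerCase());
        // find() scans forward for the next match; group() returns its text.
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String str1 = "jump: qwe donaldduck:.,#@?bye2xyz zkj ooo iuy ..uix#f4 maxx0i";
        System.out.println(extractWords(str1)); // prints [jump, donaldduck, maxx]
    }
}
```

Because the quantifier is greedy and letter runs are bounded by non-letters, each match covers a whole run, so "bye2xyz" yields nothing and "maxx0i" yields "maxx".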

- Oliver
 

YattaMaX

Oliver Wong wrote:
split() is not what you want, as the regular expression you provide to
split() describes the separators, not the acceptable strings.



I use split because I want an array containing only words with > 3 characters.
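If the goal is simply an array, a sketch like the following (hypothetical names; it uses Java 8 streams, which post-date this thread) still uses split() but treats it as describing separators (runs of non-letters) and filters the short tokens afterwards. The sample string substitutes `maxx0i` for the obfuscated e-mail address, as an assumption.

```java
import java.util.Arrays;

class SplitToArray {
    // Split on runs of non-letters, then keep only tokens longer than
    // three characters. Empty tokens (e.g. from a leading separator)
    // are removed by the same length test. toArray returns a String[].
    static String[] longWords(String text) {
        return Arrays.stream(text.toLowerCase().trim().split("[^a-z]+"))
                .filter(s -> s.length() > 3)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String str1 = "jump: qwe donaldduck:.,#@?bye2xyz zkj ooo iuy ..uix#f4 maxx0i";
        System.out.println(Arrays.toString(longWords(str1))); // prints [jump, donaldduck, maxx]
    }
}
```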


Thanks

Bye
MaX
 

Oliver Wong

YattaMaX said:
Oliver Wong wrote:

I use split because I want an array containing only words with > 3 characters.

Split will not do what you want. The argument you pass to split describes
the separators. You have no information about the separators. You have
information about the tokens you want. It's not that you want the separators
to be 3 alphabetic characters long; you want the tokens to be 3 alphabetic
characters long. Split will not let you specify that.

Therefore, I recommend you try a different approach. I mentioned Matcher
in my previous post, but personally I would avoid regular expressions
altogether for this problem and just use a DFA (deterministic finite
automaton) that keeps track of how many alphabetic characters it has seen
so far and, if that number exceeds 3, accepts the given substring.
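A sketch of the regex-free idea, under stated assumptions: the class and method names are hypothetical, only ASCII `a`-`z` count as letters (as in the `[^a-z]` split above), and instead of a pure counter the loop remembers where the current letter run began, so the run length is `i - start`. It behaves like a two-state automaton (inside or outside a run of letters) and accepts a run when its length exceeds 3. The sample string substitutes `maxx0i` for the obfuscated e-mail address.

```java
import java.util.ArrayList;
import java.util.List;

class LetterScanner {
    // Two states: outside a run of letters (start == -1) or inside one (start >= 0).
    static List<String> longLetterRuns(String text, int minLength) {
        List<String> words = new ArrayList<>();
        int start = -1;
        // i == text.length() acts as a final non-letter so a run at the end is flushed.
        for (int i = 0; i <= text.length(); i++) {
            boolean isLetter = i < text.length()
                    && text.charAt(i) >= 'a' && text.charAt(i) <= 'z';
            if (isLetter) {
                if (start < 0) {
                    start = i; // entering a run of letters
                }
            } else if (start >= 0) {
                // Leaving a run: accept it only if it is long enough.
                if (i - start >= minLength) {
                    words.add(text.substring(start, i));
                }
                start = -1;
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String str1 = "jump: qwe donaldduck:.,#@?bye2xyz zkj ooo iuy ..uix#f4 maxx0i";
        // "Exceeds 3" letters means a minimum run length of 4.
        System.out.println(longLetterRuns(str1.toLowerCase(), 4)); // prints [jump, donaldduck, maxx]
    }
}
```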

- Oliver
 
