RegExp pattern to escape ALL special characters (but exclude Unicode chars)

Gabriela

Hi,
I'd like to write a regexp that converts all special chars to "-".
I've used this pattern
[^a-z0-9]
with ignore case, and it works beautifully.
BUT I also want to support Unicode chars (and not escape them).
I could not find a way to do it, except for listing all special
characters "manually" and escaping them. I'd rather prepare a
"whitelist" of the chars allowed than a "blacklist" of all special
chars.
Any ideas?
Thanx,
Gabi
 
Martin Honnen

Gabriela said:
I'd like to write a regexp that converts all special chars to "-".
I've used this pattern
[^a-z0-9]
with ignore case, and it works beautifully.
BUT I also want to support Unicode chars (and not escape them).
I could not find a way to do it, except for listing all special
characters "manually" and escaping them. I'd rather prepare a
"whitelist" of the chars allowed than a "blacklist" of all special
chars.

Well, in a language where a string is a sequence of Unicode characters,
any character is a Unicode character, so I have no idea which kinds of
characters you want to convert and which you don't.
 
Gabriela

Martin Honnen said:
Well, in a language where a string is a sequence of Unicode characters,
any character is a Unicode character, so I have no idea which kinds of
characters you want to convert and which you don't.

Isn't there a distinction between a special character (!@#$%^&*()_-
+..._) and all alphanumeric/literal characters?
 
Martin Honnen

Gabriela said:
Isn't there a distinction between a special character (!@#$%^&*()_-
+..._) and all alphanumeric/literal characters?

Maybe you are looking for letters and digits. Unicode defines classes
for that, but the regular expression language in JavaScript/ECMAScript
does not have much support for such constructs.
\d is defined as 0..9, \D as anything which is not in \d.
\w is defined as a..zA..Z0..9_, \W as anything which is not in \w.
Then there is \s for white space characters, and \S for anything that
is not a white space character.

Other than that you need to define your own ranges of characters.
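
To illustrate the difference, here is a minimal sketch. The function names and the particular range (ASCII alphanumerics plus the accented Latin letters from U+00C0 to U+017F) are only assumptions chosen for the example, not a complete list of letters; other scripts would need their own ranges added.

```javascript
// \W is the complement of a..zA..Z0..9_, so accented letters are
// treated as "special" and replaced along with the punctuation.
function replaceNonWord(text) {
  return text.replace(/\W/g, "-");
}

// Hypothetical hand-written range: ASCII letters and digits plus the
// Latin-1 Supplement and Latin Extended-A letters (U+00C0 to U+017F),
// skipping the multiplication sign (U+00D7) and division sign (U+00F7).
const NOT_ALLOWED = /[^a-z0-9\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u017F]/gi;

function replaceSpecial(text) {
  return text.replace(NOT_ALLOWED, "-");
}

const sample = "caf\u00E9 & cr\u00E8me"; // "café & crème"
console.log(replaceNonWord(sample)); // accented letters are lost
console.log(replaceSpecial(sample)); // accented letters are kept
```

The second pattern keeps the accented letters and still converts the spaces and the ampersand.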
 
Tim Greer

Gabriela said:
Hi,
I'd like to write a regexp that converts all special chars to "-".
I've used this pattern
[^a-z0-9]
with ignore case, and it works beautifully.
BUT I also want to support Unicode chars (and not escape them).
I could not find a way to do it, except for listing all special
characters "manually" and escaping them. I'd rather prepare a
"whitelist" of the chars allowed than a "blacklist" of all special
chars.
Any ideas?
Thanx,
Gabi

Just remember, it's better to deny all by default and then specifically
have a list (whitelist) of allowed characters, than to try and
specifically list all invalid characters.
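
A whitelist along those lines could be sketched as follows. This assumes an engine that implements ECMAScript 2018 or later, where the u flag enables Unicode property escapes such as \p{L} (letters) and \p{N} (numbers); older engines will reject the pattern. The names NOT_WHITELISTED and toDashes are made up for the example.

```javascript
// Whitelist: any Unicode letter (\p{L}) or number (\p{N}).
// Everything outside the whitelist is replaced with "-".
// The u flag is required for \p{...} to be recognised.
const NOT_WHITELISTED = /[^\p{L}\p{N}]/gu;

function toDashes(text) {
  return text.replace(NOT_WHITELISTED, "-");
}

// Letters from any script survive; underscore, space and "!" do not.
console.log(toDashes("na\u00EFve_\u65E5\u672C 42!"));
```

Note that accents stored as separate combining marks are not in \p{L}, so text in decomposed form would need normalizing first (or \p{M} added to the whitelist).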
 
