regexp to match CJK characters

Cafe Babe · Oct 28, 2006

How can I write a regexp to match CJK characters?
Thanks in advance

David Vallner · Oct 28, 2006

Paul said:
Cafe Babe wrote:

print "Yes!" if varname =~ /^CJK$/

If this is not what you wanted, you will simply have to write a longer post.

CJK = (I think) Chinese, Japanese, Korean. "CJK characters" usually
refers to the encodings you use for those - Big5, JIS, Unicode, etc.

David Vallner

Cafe Babe · Oct 28, 2006

David said:
CJK = (I think) Chinese, Japanese, Korean. "CJK characters" usually
refers to the encodings you use for those - Big5, JIS, Unicode, etc.

David Vallner

Yes, so how can I write the regexp? Thanks a lot.

Josef 'Jupp' Schugt · Oct 28, 2006

Cafe Babe wrote:
| David Vallner wrote:
|> CJK = (I think) Chinese, Japanese, Korean. "CJK characters" usually
|> refers to the encodings you use for those - Big5, JIS, Unicode, etc.
| Yes, so how can write the regexp? thanks a lot

Which encoding?

Jupp
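
A minimal sketch of why the encoding question matters: the same characters are stored as different byte sequences in different encodings, so a byte-oriented pattern written for one encoding will not match text in another. This assumes Ruby 1.9 or later, where String#encode exists (the 1.8 series discussed in this thread does not have it), and the sample string is only an illustration.

```ruby
# Sample text; Ruby 2.0+ reads source files as UTF-8 by default.
text = "日本語"

# Bytes of the string as stored in UTF-8.
utf8_bytes = text.bytes

# Bytes of the same three characters after transcoding to Shift_JIS.
sjis_bytes = text.encode("Shift_JIS").bytes

puts "UTF-8:     #{utf8_bytes.length} bytes"
puts "Shift_JIS: #{sjis_bytes.length} bytes"
```

The character count is the same in both cases; only the underlying bytes differ, which is why the regexp engine has to be told which encoding it is reading.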

Cafe Babe · Oct 29, 2006

Josef said:
Cafe Babe wrote:
| David Vallner wrote:
|> CJK = (I think) Chinese, Japanese, Korean. "CJK characters" usually
|> refers to the encodings you use for those - Big5, JIS, Unicode, etc.
| Yes, so how can write the regexp? thanks a lot

Which encoding?

Jupp

UTF-8, with

$KCODE='u'
require_dependency 'jcode'

Thanks.

Dido Sevilla · Oct 29, 2006

Cafe Babe said:
UTF-8, with

$KCODE='u'
require_dependency 'jcode'

You may need to use the Oniguruma patch. I believe this is necessary
to give regular expressions support for character sets other than
plain ASCII.

http://www.geocities.jp/kosako3/oniguruma/

If you're using Gentoo, all you need to do is re-emerge Ruby with the
cjk USE flag turned on. For other systems, you may need to download
and apply the patch manually. See the Oniguruma site for more details.
If you're using a 1.9 Ruby, Oniguruma is already built in.

Yukihiro Matsumoto · Oct 30, 2006

Hi,

In message "Re: regexp to match CJK characters"

|You may need to use the Oniguruma patch. I believe this is necessary
|to give regular expressions support for character sets other than
|plain ASCII.

The regular expression engine that comes with 1.8 does support UTF-8.

matz.

Kevin Jackson · Oct 30, 2006

Yukihiro Matsumoto said:
Regular expression comes with 1.8 does support UTF-8.

Does this mean, though, that you must match on an escaped character
(\u1234) or on a 'real' character?

Kev

Yukihiro Matsumoto · Oct 30, 2006

Hi,

In message "Re: regexp to match CJK characters"

|> Regular expression comes with 1.8 does support UTF-8.
|
|does this mean though that you must do a match on an escaped character
|(\u1234 or on a 'real' character?)

You don't have to escape if you specify -Ku or $KCODE='u'.

matz.
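
For readers on a current Ruby, the question in this thread can be sketched with Unicode script properties. This is an assumption-laden illustration, not what the posters above used: it needs Ruby 1.9 or later, where $KCODE is no longer required, and the constant and method names (CJK_CHAR, CJK_RUN, HAN_BLOCK, contains_cjk?, cjk_runs) are hypothetical. It also shows both styles Kevin asked about: a class built from \u escapes and strings containing literal characters.

```ruby
# One character from any of the four scripts commonly grouped as "CJK".
CJK_CHAR = /[\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]/

# One or more consecutive CJK characters.
CJK_RUN = /[\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]+/

# Narrower alternative using \u escapes: only the main
# CJK Unified Ideographs block, no kana and no hangul.
HAN_BLOCK = /[\u4E00-\u9FFF]/

# Hypothetical helper: true if the string has at least one CJK character.
def contains_cjk?(str)
  !(str =~ CJK_CHAR).nil?
end

# Hypothetical helper: every run of consecutive CJK characters, in order.
def cjk_runs(str)
  str.scan(CJK_RUN)
end

puts contains_cjk?("hello 世界")
puts cjk_runs("abc 漢字 def かな").inspect
```

Script properties are used here instead of hand-written code point ranges because the CJK blocks are spread across several parts of Unicode; the \u range is shown only for comparison.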
