regexp to match CJK characters

David Vallner

Paul said:
Cafe Babe wrote:
print "Yes!" if varname =~ /^CJK$/

If this is not what you wanted, you will simply have to write a longer post.

CJK = (I think) Chinese, Japanese, Korean. "CJK characters" usually
refers to the encodings you use for those - Big5, JIS, Unicode, etc.

David Vallner



Cafe Babe

David said:
CJK = (I think) Chinese, Japanese, Korean. "CJK characters" usually
refers to the encodings you use for those - Big5, JIS, Unicode, etc.

David Vallner

Yes, so how can I write the regexp? Thanks a lot.
 
Josef 'Jupp' Schugt

Cafe Babe wrote:
| David Vallner wrote:
|> CJK = (I think) Chinese, Japanese, Korean. "CJK characters" usually
|> refers to the encodings you use for those - Big5, JIS, Unicode, etc.
| Yes, so how can I write the regexp? Thanks a lot.

Which encoding?

Jupp
 
Cafe Babe

Josef said:
Cafe Babe wrote:
| David Vallner wrote:
|> CJK = (I think) Chinese, Japanese, Korean. "CJK characters" usually
|> refers to the encodings you use for those - Big5, JIS, Unicode, etc.
| Yes, so how can I write the regexp? Thanks a lot.

Which encoding?

Jupp

UTF-8, with

$KCODE='u'
require_dependency 'jcode'

Thanks.
 
Dido Sevilla

Cafe Babe wrote:
UTF-8, with

$KCODE='u'
require_dependency 'jcode'

You may need to use the Oniguruma patch. I believe this is necessary
to give regular expressions support for character sets other than
plain ASCII.

http://www.geocities.jp/kosako3/oniguruma/

If you're using Gentoo, all you need to do is re-emerge Ruby with the
cjk USE flag turned on. On other systems, you may need to download
and apply the patch manually. See the Oniguruma site for more details.
If you're using a 1.9 Ruby, Oniguruma is already built-in.
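
For the 1.9 case, a minimal sketch of how a CJK match might look with the built-in engine. It assumes Ruby 1.9 or later and UTF-8 source and strings, and it uses Unicode script properties to decide what counts as "CJK". The helper name contains_cjk? is made up for this example.

```ruby
# -*- coding: utf-8 -*-
# Hypothetical helper for illustration. Assumes Ruby 1.9 or later, where the
# regexp engine understands Unicode script properties such as \p{Han}.
def contains_cjk?(text)
  # Han covers Chinese ideographs (also used as kanji and hanja),
  # Hiragana and Katakana cover Japanese kana, Hangul covers Korean.
  !!(text =~ /[\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]/)
end

puts contains_cjk?("Ruby \u65E5\u672C\u8A9E")  # string contains three Han characters
puts contains_cjk?("plain ASCII")
```

This treats any character in those four scripts as CJK. CJK punctuation is not covered and would have to be added to the character class separately.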
 
Yukihiro Matsumoto

Hi,

In message "Re: regexp to match CJK characters"

|You may need to use the Oniguruma patch. I believe this is necessary
|to give regular expressions support for character sets other than
|plain ASCII.

The regular expression engine that comes with 1.8 does support UTF-8.

matz.
 
Kevin Jackson

Yukihiro Matsumoto wrote:
The regular expression engine that comes with 1.8 does support UTF-8.

Does this mean, though, that you must match on an escaped character
(\u1234), or on a 'real' character?

Kev
 
Yukihiro Matsumoto

Hi,

In message "Re: regexp to match CJK characters"

|> The regular expression engine that comes with 1.8 does support UTF-8.
|
|Does this mean, though, that you must match on an escaped character
|(\u1234), or on a 'real' character?

You don't have to escape, if you specify -Ku or $KCODE='u'.

matz.
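
To make that concrete, here is a small sketch with the real characters written directly in the regexp. The range 一-龥 (U+4E00 to U+9FA5) is an assumption of this example and only covers common Han ideographs, not kana or hangul. The helper name han? is made up. On 1.8 this should need -Ku or $KCODE='u' as described above; on 1.9 and later the UTF-8 source encoding does that job instead.

```ruby
# -*- coding: utf-8 -*-
# Hypothetical helper: true if the string contains a character in the
# literal range 一 (U+4E00) to 龥 (U+9FA5). No \u escapes are needed
# in the regexp itself.
def han?(text)
  !!(text =~ /[一-龥]/)
end

["漢字", "kanji"].each do |word|
  puts "Yes!" if han?(word)
end
```

Only the first word contains characters inside the range, so "Yes!" is printed once.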
 
