Premature end of regular expression with non-ascii character


Nick Snels

Hi,

I'm trying to get regular expressions to work with a string that
contains letters with accents. I have the following sentence:

De kiné weet één van hun patiënten te overtuigen om gekke dingen te
doen.

The regexp /patiënten/ matches the word patiënten. However, when I use the
regexp /kiné/, I get the error 'premature end of regular expression:
/kiné/ (SyntaxError)'. Can anybody tell me what is going on? Another
issue with the same sentence: when I use the regexp /\s/ to highlight
all the spaces, the space between 'kiné weet' is not highlighted as a
space. It seems like regular expressions can't handle non-ASCII
characters at the end of a string.

Kind regards,

Nick
 

Matthew Smillie

I believe this is a character encoding problem which is fixed in 1.9
by the inclusion of a new regular expression engine (which you can
also download and use in 1.8):

http://www.geocities.jp/kosako3/oniguruma/
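For comparison, here is a minimal sketch of what a character-aware engine makes possible. It assumes Ruby 1.9 or later (whose regexp engine descends from Oniguruma; current releases use the Onigmo fork) and a source file saved as UTF-8, so it is not 1.8 code. The Unicode letter class \p{L} is used here as one way to pick out whole words with their accents intact.

```ruby
# encoding: utf-8
# Sketch for Ruby 1.9+ (Oniguruma/Onigmo engine); it will not run on 1.8.
sentence = "De kiné weet één van hun patiënten te overtuigen om gekke dingen te doen."

# \p{L} matches any Unicode letter, so é and ë count as part of a word.
words = sentence.scan(/\p{L}+/)

p words.first(4)  # => ["De", "kiné", "weet", "één"]
```

The trailing period of "doen." is not a letter, so it is left out of the last word.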

Best of luck.
matt.
 

Dave Burt

Nick Snels asked:
I'm trying to get regular expressions to work with a string that
contains letters with accents. ...

The regexp /patiënten/ matches the word patiënten. However when I do the
regexp /kiné/, I get the error 'premature end of regular expression:
/kiné/ (SyntaxError)'. Can anybody tell me what is going on?

You might avoid the syntax error by setting $KCODE = "u" at the start of
your program.
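$KCODE sets the encoding globally. Ruby 1.8 also accepts a per-regexp u option (/kiné/u) that marks a single literal as UTF-8. In Ruby 1.9 and later $KCODE is ignored and the encoding comes from the source encoding instead (the `# encoding:` magic comment, UTF-8 by default since Ruby 2.0). The sketch below assumes Ruby 1.9+ and a file saved as UTF-8, and shows the behaviour Nick expected: /kiné/u matches, and /\s/ finds every space, including the one after kiné.

```ruby
# encoding: utf-8
# Sketch for Ruby 1.9+; the source file itself must be saved as UTF-8.
sentence = "De kiné weet één van hun patiënten te overtuigen om gekke dingen te doen."

# The u option marks the literal as UTF-8 (the same option letter exists in 1.8).
position = sentence =~ /kiné/u
puts position                      # 3 (a character index in 1.9+)

# Wrap every whitespace character in brackets to make it visible.
highlighted = sentence.gsub(/\s/) { |space| "[#{space}]" }
puts highlighted

puts sentence.scan(/\s/).length    # 13: one space between each of the 14 words
```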
Another issue with the same sentence is, when I use the regexp /\s/ to highlight
all the spaces, the space between 'kiné weet' is not highlighted as a
space. It seems like regular expressions can't handle non-ASCII
characters at the end of a string.

Ruby strings are made up of bytes, not characters. That's the cause of the
issues you're having. There are a couple of recent plugins for Ruby to help
improve the situation (see
http://redhanded.hobix.com/inspect/unicodeLibForRuby18.html) but they're far
from perfect.
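To make "made up of bytes" concrete, this sketch counts the bytes in "kiné" under two encodings. It needs Ruby 1.9 or later for String#encode, so treat it as an illustration of the byte layout rather than 1.8 code. ISO-8859-1 (Latin-1) is an assumption about how a Western European editor might save the file; the `hex` helper is just for display.

```ruby
# encoding: utf-8
# Illustration only: needs Ruby 1.9+ for String#encode.

# Format byte values as hex pairs, e.g. [0xC3, 0xA9] -> "C3 A9".
def hex(bytes)
  bytes.map { |b| format("%02X", b) }.join(" ")
end

word   = "kiné"                            # this file is saved as UTF-8
utf8   = word.bytes                        # é takes two bytes in UTF-8
latin1 = word.encode("ISO-8859-1").bytes   # é takes one byte in ISO-8859-1

puts "characters: #{word.length}"                         # 4
puts "UTF-8 bytes: #{utf8.length} (#{hex(utf8)})"         # 5 (6B 69 6E C3 A9)
puts "Latin-1 bytes: #{latin1.length} (#{hex(latin1)})"   # 4 (6B 69 6E E9)
```

A byte-oriented regexp sees five or four "characters" here depending on how the file was saved, never the four letters a reader sees.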

I hope $KCODE can clear up most of your problems, though.

Cheers,
Dave
 

Logan Capaldo

Are you using $KCODE="u" at the top of your script?
 

Nick Snels

Thank you both very much for the suggestions. First off I have
$KCODE="u" in config/environment.rb (Rails). I have also tried to add it
into the class. But the error remained.
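One possible explanation, offered as an assumption and not a diagnosis confirmed in this thread: if the source file is saved as ISO-8859-1 (Latin-1) while $KCODE is "u", the byte for é (0xE9) looks to a UTF-8 scanner like the first byte of a three-byte character. The scanner would then swallow the next two bytes with it: in /kiné/ that includes the closing slash, and in the sentence it includes the space after kiné. The sketch below imitates such a lead-byte-driven scanner in plain Ruby (1.9+ for String#encode); the helper names are hypothetical and are not part of Ruby's regexp engine.

```ruby
# encoding: utf-8
# Illustration only (Ruby 1.9+). Imitates a scanner that trusts UTF-8 lead
# bytes, as a parser running with $KCODE = "u" might when the file is really
# Latin-1. All helper names are hypothetical.

# How many bytes a UTF-8 lead byte claims for its character.
def utf8_claimed_length(byte)
  if byte < 0x80 then 1                # 0xxxxxxx: plain ASCII
  elsif (byte >> 5) == 0b110 then 2    # 110xxxxx
  elsif (byte >> 4) == 0b1110 then 3   # 1110xxxx: 0xE9 (Latin-1 é) lands here
  elsif (byte >> 3) == 0b11110 then 4  # 11110xxx
  else 1                               # continuation/invalid byte: take it alone
  end
end

# Group raw bytes into "characters" by trusting each lead byte.
def naive_utf8_chunks(bytes)
  chunks = []
  i = 0
  while i < bytes.length
    len = utf8_claimed_length(bytes[i])
    chunks << bytes[i, len]  # near the end of input the slice is just shorter
    i += len
  end
  chunks
end

# Render each chunk as one hex string for display.
def hex_chunks(chunks)
  chunks.map { |c| c.map { |b| format("%02X", b) }.join }
end

# The sentence fragment and the regexp literal as a Latin-1 editor saves them.
text    = naive_utf8_chunks("kiné weet".encode("ISO-8859-1").bytes)
literal = naive_utf8_chunks("/kiné/".encode("ISO-8859-1").bytes)

p hex_chunks(text)     # => ["6B", "69", "6E", "E92077", "65", "65", "74"]
p hex_chunks(literal)  # => ["2F", "6B", "69", "6E", "E92F"]
```

In the first result the space (0x20) disappears into the é "character", which would fit the missing \s highlight; in the second the closing slash (0x2F) does, which would fit the "premature end" error. It would also explain why /patiënten/ still matched: ë (0xEB) swallows "nt" the same way in both pattern and text, so the bytes still line up.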

Secondly, I looked at oniguruma and I must say it looks promising.
Unfortunately for me and my Windows (Cygwin) machine, I have to compile
it into Ruby 1.8.2-1.8.4, and I can't get it to work. I can't get 1.8.2
to compile: I solve one error, then hit another, and so on.
Hopeless. I managed to compile 1.8.4, but when I open Ruby I get an
error saying that a file is missing. I'm using the Windows one-click Ruby
installer, if anybody is wondering how on earth I managed to get Ruby
working :). I could use 1.9.0 because this includes oniguruma. The only
problem here is that I don't know if Rails works with it. I have
contacted the author of oniguruma; maybe he can say conclusively
whether or not oniguruma solves my problem. When I get a response I'll
post it here. In the meantime, if anybody has any other suggestions,
please let me know. Thanks.

Kind regards,

Nick
 

Dave Burt

Nick said:
Thank you both very much for the suggestions. First off I have
$KCODE="u" in config/environment.rb (Rails). I have also tried to add it
into the class. But the error remained.

I haven't had the issues you're talking about, because I'm only doing apps
in English, but here are a couple of places you might start to look for
solutions:

http://wiki.rubyonrails.com/rails/pages/HowToUseUnicodeStrings

http://redhanded.hobix.com/inspect/unicodeLibForRuby18.html

I could use 1.9.0 because this includes oniguruma. The only
problem here is that I don't know if Rails works with it.

Don't. 1.9.0 isn't for production, really; it's an experimental version
which is growing some features that may become part of Ruby 2.0.

Cheers,
Dave
 
