newb: Rails character encoding and validation

Mark

I'm putting together a basic Rails application and writing my first
unit tests for it. It occurred to me that the user 'name' field might
need to contain foreign characters (like é, â, ì, ø, etc.), but two
problems have popped up. Firstly, I can't dig up a good reference for a
suitable regular expression for validating the field.
At the moment, I'm using:
validates_format_of :name, :with => /^[-' a-zA-Z]+$/
but this isn't going to allow the foreign characters, so the test fails.

The second problem is the error message I get when I run the unit test.
My test framework sets the name to José, but the failure message I get
when I run the script shows JosÜ.
It looks like the character encoding of my editor isn't the same as the
character encoding that Rails is using.
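
A mismatch of this kind can be reproduced in a few lines. The sketch below assumes a modern Ruby (1.9 or later, where strings carry an encoding) and assumes, purely for illustration, that the file was saved as ISO-8859-1 while the console decodes bytes as CP850. The actual pair of encodings in this setup is not known.

```ruby
# Hypothetical illustration: the same bytes read under two different encodings.
original = "Jos\u00E9"                        # "José" as a UTF-8 string

# Bytes as an editor saving in ISO-8859-1 would write them (assumed encoding).
latin1_bytes = original.encode("ISO-8859-1")

# A console that assumes CP850 (assumed encoding) reinterprets those same bytes.
misread = latin1_bytes.dup.force_encoding("CP850").encode("UTF-8")

puts original.bytes.inspect       # UTF-8: é takes two bytes
puts latin1_bytes.bytes.inspect   # ISO-8859-1: é is the single byte 233
puts misread                      # last character is no longer é
```

The bytes never change between the last two steps; only the label attached to them does, which is why the text looks fine in the editor and wrong in the test output.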

So, a) any clues as to what is going on? And b) is there a consistent
way of dealing with foreign characters for validation purposes?

Many thanks in advance!

Mark.
 
David Vallner

Luciano said:
It is not really viable to validate a name field with a regex if you
are willing to accept Unicode characters. The only reasonable
validation is to check whether the field is empty.

It is so viable. Just not using [a-zA-Z].

Character classes are your friend.


 
David Vallner

David said:
Luciano said:
It is not really viable to validate a name field with a regex if you
are willing to accept Unicode characters. The only reasonable
validation is to check whether the field is empty.

It is so viable. Just not using [a-zA-Z].

Character classes are your friend.

For clarification: I am unsure just how well Ruby's regexp engine
handles Unicode "extended Latin" characters. A trivial test using
$KCODE = 'u', require 'jcode', and iconv failed for me, but that could
be me getting the codepages wrong. The point above is only that nothing
prevents a regexp engine that properly supports Unicode and
character classes from being used to validate non-ASCII text.
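
For what it's worth, on Ruby 1.9 and later (where the regexp engine is Unicode-aware) the character-class approach can be sketched roughly as below. The constant NAME_FORMAT and the helper valid_name? are made up for illustration, and the set of allowed punctuation is an assumption carried over from the original ASCII-only pattern.

```ruby
# Hypothetical pattern: Unicode letters (\p{L}) and combining marks (\p{M}),
# plus the apostrophe, space and hyphen from the original ASCII-only regex.
# \A and \z anchor the whole string; ^ and $ would only anchor each line.
NAME_FORMAT = /\A[\p{L}\p{M}' -]+\z/

# Illustrative helper, not part of Rails.
def valid_name?(name)
  !!(name =~ NAME_FORMAT)
end

puts valid_name?("Jos\u00E9")            # José
puts valid_name?("Anne-Marie O'Brien")
puts valid_name?("R2-D2")                # contains digits
```

In a model, the same pattern could presumably be passed as validates_format_of :name, :with => NAME_FORMAT.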

David Vallner


 
Luciano Ramalho

David said:
It is so viable. Just not using [a-zA-Z].

Character classes are your friend.

You mean, using the Unicode database?

Yes, I know that is possible. My point was that it is not worthwhile
(that's why I wrote "not viable" instead of "impossible"; sorry if I
was not clear: English is not my first language).

Besides all sort of letter-like characters, ideograms and so on, a
person's name may contain hyphens, apostrophes and who-knows-what
other characters.

Remember Prince's name when he used to be called "the artist formerly
known as Prince"? [1]

I just do not think the effort of trying to validate a name is
"economically viable", except to verify that it contains something
other than blanks.

BTW, which would be a safe way to know whether a Unicode string
contains something other than blanks? Because AFAIK unicode has many
other blank characters besides the old ASCII ones. Can a Ruby regex
cope with that?

Cheers,

Luciano

[1] http://en.wikipedia.org/wiki/Prince_(musician)
 
David Vallner

Luciano said:
I just do not think the effort of trying to validate a name is
"economically viable", except to verify that it contains something
other than blanks.

That is true. It doesn't have anything to do with Unicode however, as I
think your post implied.

Speaking of which, I wonder if there's a name record in any database
out there containing someone with a retroflex click in their name. And
whether it's recorded as the exclamation mark or as U+01C3 ;P

Luciano said:
BTW, which would be a safe way to know whether a Unicode string
contains something other than blanks? Because AFAIK unicode has many
other blank characters besides the old ASCII ones. Can a Ruby regex
cope with that?

It Should Be Able To.

I think at least Oniguruma can do this sort of "industrial-strength"
processing; I have no idea about the current engine.
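
As a rough sketch of such a check, assuming Ruby 1.9 or later (where an Oniguruma-derived engine is built in) and UTF-8 strings, something like the following should work. The helper name contains_non_blank? is hypothetical.

```ruby
# Hypothetical helper: true when the string holds at least one character that
# is not whitespace. In Ruby 1.9+ the POSIX bracket [[:space:]] also covers
# non-ASCII whitespace in a UTF-8 string, whereas \s is ASCII-only.
def contains_non_blank?(str)
  !!(str =~ /[^[:space:]]/)
end

nbsp_and_em_space = "\u00A0\u2003"   # NO-BREAK SPACE, EM SPACE

puts contains_non_blank?(nbsp_and_em_space)
puts contains_non_blank?(" Jos\u00E9 ")
puts !!(nbsp_and_em_space =~ /\S/)   # the ASCII-only \S sees these as non-blank
```

The last line is there to show why plain \S is not enough: it does not treat the non-ASCII blanks as whitespace.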

Speaking of which, is there an Oniguruma backport (?) for 1.8 that you
could use as an add-on regexp engine? (I think currently you can use it
as a drop-in replacement if you build Ruby from source; I was thinking
of a more orthogonal way of using the Shiny Features, where orthogonal
really means from a binary gem.)

David Vallner


 
