character mapping functions and UNICODE : remove accents, case, etc

Discussion in 'Perl Misc' started by An. Valula, Oct 19, 2003.

  1. An. Valula

    An. Valula Guest

    Hi,

    does anyone out there know about perl capabilities to convert rich text,
    such as "étrangères" to "etrangere" (remove accents)?
    Of course, tr/éè/ee/ would do, but I look for sth better: you do not
    tr/a-z/A-Z/ for uc(), do you?

    regards
     
    An. Valula, Oct 19, 2003
    #1

  2. Re: character mapping functions and UNICODE : remove accents, case, etc

    On Sun, 19 Oct 2003 09:45:57 GMT
    "An. Valula" <> wrote:

    >
    > does anyone out there know about perl capabilities to convert rich
    > text, such as "étrangères" to "etrangere" (remove accents)?
    > Of course, tr/éè/ee/ would do, but I look for sth better: you do not
    > tr/a-z/A-Z/ for uc(), do you?


    I realize this doesn't answer the question directly, but have you
    checked out RTF::Parser
    (http://search.cpan.org/~pverd/RTF-Parser-1.07/)? That _may_ aid you
    in what you want to accomplish.

    HTH

    --
    Jim

    Copyright notice: all code written by the author in this post is
    released under the GPL. http://www.gnu.org/licenses/gpl.txt
    for more information.

    a fortune quote ...
    Celebrate Hannibal Day this year. Take an elephant to lunch.
     
    James Willmore, Oct 19, 2003
    #2

  3. An. Valula

    An. Valula Guest

    Hi,

    thank you for your answer, but, no, I do not want to remove bold or
    paragraph marks.

    I want to convert "rich" text to "poor" text.
    What I call "rich" text is, for example, text with accents, mixed case, etc.
    For example: "Hêtre chétif".
    Whereas "poor" text is without accents and without case distinctions
    (case is easy to solve with uc/lc). For example: "hetre chetif".

    There must be someone else who wants to compare strings without diacritical
    signs ?!

    regards



     
    An. Valula, Oct 23, 2003
    #3
  4. Re: character mapping functions and UNICODE : remove accents, case, etc

    On Thu, 23 Oct 2003, An. Valula floated out upon a sea of TOFU:

    > thank you for your answer, but, no, I do not want to remove bold or
    > paragraph marks.


    But that *is* what the term "rich text" format normally refers to -
    whether used in the generic sense or in particular reference to
    Microsoft's "RTF" interchange specification.

    > I want to convert "rich" text to "poor" text.


    Not really, and that's why you confused the previous respondent. You
    need some better term. (Try a glossary of text processing if you
    don't believe me).

    > There must be someone else who wants to compare strings without diacritical
    > signs ?!


    Is there a problem? You already know one solution.

    > > does anyone out there know about perl capabilities to convert rich
    > > text, such as "étrangères" to "etrangere" (remove accents)?
    > > Of course, tr/éè/ee/ would do, but I look for sth better: you do not
    > > tr/a-z/A-Z/ for uc(), do you?


    You probably should note that your tr/// and your uc() perform
    *different* operations, in general - also depending on the locale
    setting.
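To illustrate the point about tr/// and uc() doing different things, here is a minimal sketch. It assumes the script itself is saved as UTF-8 (hence `use utf8`), and the sample word and variable names are only illustrative. It does not cover the locale-dependent behaviour mentioned above.

```perl
use strict;
use warnings;
use utf8;    # assumption: this source file is saved as UTF-8

my $word = "hêtre";

# tr/a-z/A-Z/ only maps the 26 ASCII letters; the accented ê is left alone.
( my $via_tr = $word ) =~ tr/a-z/A-Z/;

# uc() on a character string uses the Unicode case mapping, so ê becomes Ê.
my $via_uc = uc $word;

binmode STDOUT, ':encoding(UTF-8)';
print "tr///: $via_tr\n";
print "uc():  $via_uc\n";
```

So a tr/a-z/A-Z/ is not a drop-in replacement for uc() as soon as the text contains anything beyond plain ASCII letters.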

    Anyhow, I don't have an answer to your requirement, other than the
    obvious one. Well, perhaps I do: you could "do the Unicode
    decomposition" thing, but it would seem distinctly inefficient
    compared to a tr///.

    Have a look at e.g. http://www.perldoc.com/perl5.8.0/pod/perlretut.html
    and see whether you really want to fight this via Unicode-style regex
    features. If you want to be sure of covering accents that you've
    never even heard of, then I guess that's the way to go, but if you're
    just looking for the usual Western-European accents then me, I'd go
    with the tr/// I reckon. But this is all supposition - it's not a
    requirement which I've needed myself.
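The "Unicode decomposition" idea mentioned above could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested recommendation from the thread: it uses the core Unicode::Normalize module (shipped with perl 5.8), assumes the script is saved as UTF-8, and the helper names strip_accents and fold_for_compare are made up for this example.

```perl
use strict;
use warnings;
use utf8;                             # assumption: source file saved as UTF-8
use Unicode::Normalize qw(NFD);

# Hypothetical helper: decompose each accented letter into base letter plus
# combining mark(s), then delete the combining marks.
sub strip_accents {
    my ($text) = @_;
    my $decomposed = NFD($text);
    $decomposed =~ s/\p{Mn}//g;       # \p{Mn} = nonspacing (combining) marks
    return $decomposed;
}

# Hypothetical helper: accent-free, lower-cased key for comparing strings.
sub fold_for_compare {
    my ($text) = @_;
    return lc strip_accents($text);
}

binmode STDOUT, ':encoding(UTF-8)';
print fold_for_compare("Hêtre chétif"), "\n";
print strip_accents("étrangères"),      "\n";
```

Letters that have no canonical decomposition (for example ø or ß) pass through unchanged, so an explicit tr/// table may still be needed for those, or preferred outright when only a known handful of Western European accents matters.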
     
    Alan J. Flavell, Oct 23, 2003
    #4
