Ruby to convert US to UK punctuation/spelling?

Discussion in 'Ruby' started by Michael Lommel, Jun 16, 2008.

  1. I have about a thousand multipage documents which I need to convert from
    US English and punctuation to UK English and punctuation. Before I start
    on a Ruby script (I'm just learning Ruby), I wanted to see if anyone knows
    of existing tools to do this. I've also looked for a Perl US->UK
    conversion tool, but one doesn't seem to exist.

    I'm starting with UTF-8 RTF documents which have printer's quotes (i.e.,
    distinct left and right curly quotes) which were retained from an
    original conversion from MS Word docs. For my documents, converting from
    US to UK punctuation means double quotes become single quotes and some
    single quotes become double (apostrophes are retained and single quotes
    not inside double quotes would need to be retained); but in the
    conversion I would like to retain distinct left and right quotation
    marks.

    I'm thinking that in the end documents all print typography
    (em-dashes, en-dashes, printer's quotes) should be converted to character
    entities.

    If there is no existing script to do this (seems like a problem others
    must have faced before) any thoughts on the right approach/tools/code
    snippets?

    Many thanks, Rubyists...
     
    Michael Lommel, Jun 16, 2008

  2. Michael Lommel

    Axel Etzold Guest

    -------- Original Message --------
    > Date: Mon, 16 Jun 2008 09:45:45 +0900
    > From: Michael Lommel <>
    > To:
    > Subject: Ruby to convert US to UK punctuation/spelling?


    Dear Michael,

    > I have about a thousand multipage documents which I need to convert from
    > US English and punctuation to UK English and punctuation. Before I start
    > on a Ruby script (I'm just learning Ruby), I wanted to see if anyone knows
    > of existing tools to do this. I've also looked for a Perl US->UK
    > conversion tool, but one doesn't seem to exist.


    For general spell-checking there is aspell, which you can use with different language
    options, and there are Ruby bindings for it:

    http://blog.evanweaver.com/files/doc/fauna/raspell/files/README.html

    So you could create the speller with Aspell.new("en_GB") rather than Aspell.new("en_US") and run it over your American English text; words spelled the American way will then be flagged as misspelled.
    With that much text, it will also find some errors that both varieties of English consider wrong. :)
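
    A rough sketch of that idea: collect every distinct word the checker rejects, so the list can be reviewed by hand. Only Aspell.new("en_GB") comes from the post above; the check method is an assumption about what the binding offers, and WordListChecker is a hypothetical stand-in so the sketch runs without aspell installed.

```ruby
# encoding: utf-8

# Hypothetical stand-in for a spell checker. It only needs to respond to
# check(word) with true or false.
class WordListChecker
  def initialize(words)
    @words = words.map(&:downcase)
  end

  def check(word)
    @words.include?(word.downcase)
  end
end

# Return the distinct words in text that the checker does not accept.
def flag_for_review(text, checker)
  words = text.scan(/[[:alpha:]]+(?:[’'][[:alpha:]]+)*/).uniq
  words.reject { |word| checker.check(word) }
end
```

    With the real binding one would presumably pass Aspell.new("en_GB") as the checker, provided it answers to check in the same way.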

    > I've also looked for a Perl US->UK conversion tool, but one doesn't seem to exist.


    There certainly are Perl bindings to aspell; I'd bet a hundred quid/two hundred bucks :)


    > For my documents, converting from
    > US to UK punctuation means double quotes become single quotes and some
    > single quotes become double (apostrophes are retained and single quotes
    > not inside double quotes would need to be retained); but in the
    > conversion I would like to retain distinct left and right quotation
    > marks.


    That suggests some combination of String#scan, String#gsub and regular expressions ...
    Since apostrophes and closing single quotation marks are the same sign, I'd suggest first making a
    list of the words with apostrophes and writing it to a file, where you can correct it manually.
    Then use String#gsub to replace the apostrophes with something like <apostrophe>,
    and after that the quotes with <lquote> and <rquote> (or do it the other way round).
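
    A minimal sketch of the placeholder approach, assuming the text is UTF-8 with curly quotes and that double-quoted passages are not nested inside each other. The method names are made up for illustration, and the apostrophe rule (a right single quote between two letters) is only a heuristic: trailing apostrophes as in "dogs’" are not caught and would still need the manual list.

```ruby
# encoding: utf-8

APOSTROPHE_MARK = "<apostrophe>" # placeholder, as suggested above

# List the distinct words containing an apostrophe, for manual review.
def apostrophe_words(text)
  text.scan(/\p{L}+’\p{L}+/).uniq.sort
end

# Swap US-style quotes for UK-style ones, keeping left and right forms distinct.
def us_to_uk_quotes(text)
  # 1. Protect apostrophes: a right single quote between two letters.
  guarded = text.gsub(/(?<=\p{L})’(?=\p{L})/, APOSTROPHE_MARK)

  # 2. For each double-quoted span, turn the inner single quotes into
  #    double quotes and the outer double quotes into single quotes.
  swapped = guarded.gsub(/“([^“”]*)”/) do
    inner = Regexp.last_match(1).tr("‘’", "“”")
    "‘#{inner}’"
  end

  # 3. Put the apostrophes back.
  swapped.gsub(APOSTROPHE_MARK, "’")
end
```

    Single quotes that are not inside double quotes never match step 2, so they stay as they are.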

    There's a regular expressions tutorial here:

    http://www.regular-expressions.info/tutorial.html


    > I'm thinking that in the end documents all print typography
    > (em-dashes, en-dashes, printer's quotes) should be converted to character
    > entities.
    >


    You can do that with String#gsub, e.g. text.gsub("--", '<em-dash>'), after pasting the actual
    em-dash character in place of the "--" inside the double quotes, and so on for the other characters.
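
    As a sketch, all of those replacements can go through one gsub call with a lookup table. This assumes "character entities" means HTML named entities; the constant and method names are only illustrative, and a different target format would need a different table.

```ruby
# encoding: utf-8

# Typographic characters and the HTML named entities that stand for them.
ENTITIES = {
  "—" => "&mdash;",
  "–" => "&ndash;",
  "“" => "&ldquo;",
  "”" => "&rdquo;",
  "‘" => "&lsquo;",
  "’" => "&rsquo;"
}.freeze

# Replace every character listed in ENTITIES with its entity.
def to_entities(text)
  text.gsub(Regexp.union(ENTITIES.keys), ENTITIES)
end
```

    Running this after the quote conversion keeps the quote logic working on real characters instead of on entity strings.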


    Best regards,

    Axel
     
    Axel Etzold, Jun 16, 2008
