Split text into equal characters but keeping whole words

Discussion in 'Ruby' started by John Butler, Jun 3, 2010.

  1. John Butler

    John Butler Guest

    Hi,

    I have text that is pretty large. What I want to do is split it into an
    array of no more than 250 characters per line but finishing on a full
    word at the end of each line. So if the last word was "ruby" but 'b'
    was the 250th character in the line then I want the line to cut off
    before ruby and then start at ruby for the next line. What I have at
    the minute is below, which splits the text into an array of lines 250
    characters long, but as I say it slices through words at the end of the
    250 character limit.

    mytext.scan(/.{1,250}/m)

    Any ideas?

    JB
    --
    Posted via http://www.ruby-forum.com/.
    John Butler, Jun 3, 2010
    #1

  2. Hi --

    On Thu, 3 Jun 2010, John Butler wrote:

    > Hi,
    >
    > I have text that is pretty large. What I want to do is split it into an
    > array of no more than 250 characters per line but finishing on a full
    > word at the end of each line. So if the last word was "ruby" but 'b'
    > was the 250th character in the line then I want the line to cut off
    > before ruby and then start at ruby for the next line. What I have at
    > the minute is below, which splits the text into an array of lines 250
    > characters long, but as I say it slices through words at the end of the
    > 250 character limit.
    >
    > mytext.scan(/.{1,250}/m)
    >
    > Any ideas?


    The \b anchor (word boundary) might help you:

    mytext.scan(/.{1,250}\b/m)

    at least as a first approximation. You'll still have some edge cases
    and probably have to massage the output though.
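    One such edge case: \b also matches next to apostrophes and hyphens, so
    a word like "don't" can still be cut in two. A possible variation, not
    from the original post, is to anchor on whitespace instead. The sketch
    below assumes a hypothetical helper name (wrap_with_regex) and a width
    of at least 2; a single word longer than the width is returned whole
    and so exceeds the limit.

```ruby
# Hypothetical helper: wrap on whitespace rather than on \b.
# Each match starts and ends with a non-space character and must be
# followed by whitespace or the end of the string, so no word is cut.
def wrap_with_regex(text, width)
  # First alternative: up to `width` characters ending on a whole word.
  # Second alternative: a lone word (possibly longer than `width`).
  text.scan(/\S.{0,#{width - 2}}\S(?=\s|\z)|\S+/m)
end

p wrap_with_regex("the quick brown fox jumps", 10)
# => ["the quick", "brown fox", "jumps"]
```

    Because of the /m flag, a newline inside the text counts as ordinary
    whitespace and may end up in the middle of a returned line.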


    David

    --
    David A. Black, Senior Developer, Cyrus Innovation Inc.

    THE Ruby training with Black/Brown/McAnally
    COMPLEAT Coming to Chicago area, June 18-19, 2010!
    RUBYIST http://www.compleatrubyist.com
    David A. Black, Jun 3, 2010
    #2

  3. [Note: parts of this message were removed to make it a legal post.]

    On Thu, Jun 3, 2010 at 10:19 AM, John Butler wrote:

    > I have text that is pretty large. What I want to do is split it into an
    > array of no more than 250 characters per line but finishing on a full
    > word at the end of each line. So if the last word was "ruby" but 'b'
    > was the 250th character in the line then I want the line to cut off
    > before ruby and then start at ruby for the next line. ... Any ideas?
    >


    Using as an example the text below (setting to one side my slight surprise
    that Warren Buffett knows the quote by Jacobi), and ignoring any edge cases
    and subtle issues mentioned by Ian Hobson and David A Black, an idea might
    be:
    * go through the text, adding 250 (or whatever) to the latest end position
    to get a candidate split position;
    * use rindex with David Black's suggestion of the \b anchor to search
    *backwards* from that candidate for the nearest previous word boundary;
    * as a refinement, treat that boundary as a "first guess" for the next
    split position, and then check whether any of the edge cases or issues
    raised by Ian Hobson should move the split position earlier. (But once
    you're doing that, it might be better to use Ian Hobson's suggestion
    of going forwards one word at a time and checking whether it will fit,
    or whether there is a paragraph break, etc.)

    def split_on_words_with_max_line_length( text, max_line_length )
      text_last_index_plus_1 = text.length
      ii = nil ; jj = 0; aa = []
      while true
        ii = jj; jj = ii + max_line_length
        if jj < text_last_index_plus_1 then
          # search backwards from jj for the nearest word boundary
          ww = text.rindex( %r{\b}m, jj )
          # only move the split back if the boundary lies after ii;
          # otherwise keep the hard cut at jj so the loop always advances
          jj = ww if ww && ww > ii
        else
          jj = text_last_index_plus_1
        end
        aa << text[ ii ... jj ].strip
        break unless jj < text_last_index_plus_1
      end
      aa
    end

    text =
    "http://en.wikipedia.org/wiki/Carl_Gustav_Jacob_Jacobi" \
    " Carl Gustav Jacob Jacobi (10 December 1804 - 18 February 1851)" \
    " was a Prussian mathematician, widely considered to be" \
    " the most inspiring teacher of his time and one of" \
    " the greatest mathematicians of all time." \
    " ... It was in algebraic development that Jacobi's peculiar power" \
    " mainly lay, and he made important contributions of this kind" \
    " to many areas of mathematics ... One of his maxims was:" \
    " 'Invert, always invert' ('man muss immer umkehren')," \
    " expressing his belief that the solution of many hard problems" \
    " can be clarified by re-expressing them in inverse form." \
    "\n\n" \
    "http://www.ibtimes.com/articles/20100227/" \
    "invert-always-invert-buffet-advises-shareholders.htm" \
    " In his annual letter to shareholders, legendary investor" \
    " Warren Buffett outlined a few approaches to investing" \
    " and business management that should be avoided." \
    " Buffett cited Jacobi, a mathematician, who advised problem solvers" \
    " to \"invert, always invert\". In other words, instead of trying" \
    " to find ways of doing something successfully, first find methods" \
    " that are likely fail and avoid them."

    max_line_length = 102 # or whatever
    puts text
    puts; puts split_on_words_with_max_line_length( text, max_line_length )
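
    For comparison, the alternative mentioned above (going forwards one
    word at a time and checking whether it will fit) might look roughly
    like the sketch below. The method name wrap_words is hypothetical, and
    the sketch makes simplifying assumptions: all runs of whitespace,
    including paragraph breaks, are collapsed to single spaces, and a word
    longer than the limit gets a line of its own that exceeds the limit.

```ruby
# Hypothetical helper, not from the thread: greedy word-by-word wrapping.
def wrap_words(text, max_line_length)
  lines = []
  current = ""
  text.split.each do |word|            # split on any run of whitespace
    if current.empty?
      current = word                   # first word of a line always goes in
    elsif current.length + 1 + word.length <= max_line_length
      current = "#{current} #{word}"   # word fits, with one separating space
    else
      lines << current                 # line is full, start a new one
      current = word
    end
  end
  lines << current unless current.empty?
  lines
end

p wrap_words("the quick brown fox jumps", 10)
# => ["the quick", "brown fox", "jumps"]
```

    Unlike the rindex version, this never has to decide what counts as a
    word boundary, because words are simply whatever String#split returns.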
    Colin Bartlett, Jun 3, 2010
    #3
