Regex question (how easy/hard to do it in Ruby)

Discussion in 'Ruby' started by Sarah Tanembaum, May 4, 2004.

  1. Pointers, please...

    I have text in a comma-delimited file with the following
    characteristics:

    ccc-123456, <multiline data>,

    Field number:

    1a - it always begins with 1 to 3 characters followed by
    a dash, e.g. JKL-, A-, NM-, PQ-

    1b - the dash is followed by a number from
    1 to 99999

    2 - multiline data containing newline chars (\n),
    carriage-return chars (\r), or both (\r\n). This field
    might include special characters such as a
    single (') or double (") quote, a space, or characters
    with ASCII number > 127 (accented characters,
    umlauts, etc.)

    3 - this field contains at least 2 and at most 5 lines of
    data, where each line might
    begin with 2-3 chars, followed by an "@", then 1-7 chars,
    and finally 1-4 numbers, e.g. GH@OPRJGPF1234

    My questions are:

    1a. How do I parse the first field (field 1a) so I can manipulate/rename it to
    a new label depending on what label it currently has?

    1b. In field 1b, instead of just the bare number, I'd like to pad
    it with leading zeros, so 1 -> 000001,
    1494 -> 001494, 560987 -> 560987 (no change).

    2. Capture the 2nd field and escape the special characters with ASCII number

    3. Capture the 3rd field and parse it as well, just like field 1.

    Thanks
     
    Sarah Tanembaum, May 4, 2004

  2. Sarah Tanembaum

    Ara.T.Howard Guest

    On Mon, 3 May 2004, Sarah Tanembaum wrote:

    > Pointers, please...
    >
    > I have text in a comma-delimited file with the following
    > characteristics:
    >
    > ccc-123456, <multiline data>,
    >
    > Field number:
    >
    > 1a - it always begins with 1 to 3 characters followed by
    > a dash, e.g. JKL-, A-, NM-, PQ-
    >
    > 1b - the dash is followed by a number from
    > 1 to 99999
    >
    > 2 - multiline data containing newline chars (\n),
    > carriage-return chars (\r), or both (\r\n). This field
    > might include special characters such as a
    > single (') or double (") quote, a space, or characters
    > with ASCII number > 127 (accented characters,
    > umlauts, etc.)
    >
    > 3 - this field contains at least 2 and at most 5 lines of
    > data, where each line might
    > begin with 2-3 chars, followed by an "@", then 1-7 chars,
    > and finally 1-4 numbers, e.g. GH@OPRJGPF1234
    >
    > My questions are:
    >
    > 1a. How do I parse the first field (field 1a) so I can manipulate/rename it to
    > a new label depending on what label it currently has?


    what exactly do you mean by this? if you want to parse out the fields
    themselves, use the 'csv' library included with ruby...
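
    For reference, a minimal sketch of the csv route. It assumes the multiline
    field is wrapped in double quotes (with embedded quotes doubled), which is
    what the csv library needs in order to keep embedded newlines inside one
    field. The sample records and the names SAMPLE and parse_records are
    invented for illustration; they are not taken from the real file.

```ruby
require "csv"

# Hypothetical sample: two records; field 2 is quoted and may span lines.
SAMPLE = <<~DATA
  JKL-42,"first line
  second ""quoted"" line",GH@OPRJGPF1234
  A-560987,"only one line",NM@ABC12
DATA

# Parse the whole text into an array of rows, each row an array of fields.
def parse_records(text)
  CSV.parse(text)
end

parse_records(SAMPLE).each do |label, text, codes|
  puts "#{label}: #{text.inspect} / #{codes}"
end
```

    If the real file does not quote field 2, the csv library cannot tell an
    embedded newline from the end of a record, and a hand-written parser
    would be needed instead.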

    > 1b. In field 1b, instead of just the bare number, I'd like to pad
    > it with leading zeros, so 1 -> 000001,
    > 1494 -> 001494, 560987 -> 560987 (no change).


    ~ > ruby -e 'p(sprintf("%06.6d", 42))'
    "000042"

    ~ > man 3 printf
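
    Applied to field 1 as a whole, the same format string can be combined
    with a regex that splits the label from the number. This is only a
    sketch: pad_id and the RELABEL table are made-up names, "characters" is
    assumed to mean letters, and the rename step is a guess at what "rename
    it to a new label" means.

```ruby
# Hypothetical old-label => new-label table; fill in the real mapping.
RELABEL = { "JKL" => "XYZ" }.freeze

# Split "JKL-42" into label and number, rename the label if it is in the
# table, and left-pad the number with zeros to six digits.
# Input that does not look like LABEL-NUMBER is returned unchanged.
def pad_id(field)
  match = /\A([A-Za-z]{1,3})-(\d+)\z/.match(field)
  return field unless match

  label = RELABEL.fetch(match[1], match[1])
  format("%s-%06d", label, match[2].to_i)
end

puts pad_id("A-1")        # A-000001
puts pad_id("NM-1494")    # NM-001494
puts pad_id("PQ-560987")  # PQ-560987
puts pad_id("JKL-42")     # XYZ-000042
```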

    > 2. Capture the 2nd field and escape the special characters with ASCII number


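    esc = '\\'[0]          # the backslash (a byte value on ruby 1.8, a one-char string on 1.9+)
    munged = String.new    # empty buffer; binary on 1.9+, so << c appends one raw byte
    # copy byte by byte, putting a backslash in front of every byte above 127
    field_2.each_byte{|c| munged << esc if c > 127; munged << c}
    field_2 = munged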

    you could also use a regex to do this...

    special = %r/([#{ 128.chr }-#{ 255.chr }])/o
    field_2.gsub!(special){|match| "\\#{ match }"}
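
    On Ruby 1.9 and later, strings carry an encoding, so both approaches are
    easiest to reason about on a binary copy of the field. A sketch under
    that assumption follows. The method names escape_high_bytes and
    escape_high_bytes_re are illustrative, and "escape" is read here as
    "prefix with a backslash", which may not be what the original poster
    meant.

```ruby
# Byte loop: copy each byte, inserting a backslash before any byte > 127.
def escape_high_bytes(str)
  out = String.new              # empty binary (ASCII-8BIT) buffer
  str.each_byte do |byte|
    out << "\\" if byte > 127
    out << byte                 # an Integer appended to a binary string adds one byte
  end
  out
end

# Regex version: match bytes 0x80..0xFF on a binary copy of the string.
def escape_high_bytes_re(str)
  str.b.gsub(/[\x80-\xFF]/n) { |byte| "\\".b + byte }
end

escaped = escape_high_bytes("caf\u00E9")  # e-acute is 2 bytes in UTF-8
p escaped.bytesize                        # 7: five original bytes plus two backslashes
```

    Both methods return a binary string; call force_encoding on the result
    if the rest of the program expects a particular encoding.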

    >
    > 3. Capture the 3rd field and parse it as well, just like field 1.
    >
    > Thanks



    can you post some sample data? we could probably say more then...
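
    Until there is sample data, here is a guess at field 3 based only on the
    description above: each line is 2-3 letters, an "@", 1-7 letters, then
    1-4 digits. Whether "chars" means letters only is an assumption, and
    FIELD3_LINE and parse_field3 are made-up names.

```ruby
# One line of field 3: 2-3 letters, "@", 1-7 letters, 1-4 digits
# (letters-only is an assumption about what "chars" means).
FIELD3_LINE = /\A([A-Za-z]{2,3})@([A-Za-z]{1,7})(\d{1,4})\z/

# Split field 3 on \r\n, \r or \n and return [prefix, name, number] per line.
# Raises ArgumentError when the field does not have 2 to 5 lines, or when a
# line does not fit the pattern.
def parse_field3(field)
  lines = field.split(/\r\n|\r|\n/).reject(&:empty?)
  unless (2..5).cover?(lines.size)
    raise ArgumentError, "expected 2-5 lines, got #{lines.size}"
  end

  lines.map do |line|
    match = FIELD3_LINE.match(line)
    raise ArgumentError, "unexpected line: #{line.inspect}" unless match

    [match[1], match[2], match[3].to_i]
  end
end

p parse_field3("GH@OPRJGPF1234\r\nNM@ABC12")
# prints [["GH", "OPRJGPF", 1234], ["NM", "ABC", 12]]
```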


    -a
     
    Ara.T.Howard, May 4, 2004
