A method to search and cut in Ruby.

Discussion in 'Ruby' started by scomboni@gmail.com, Jul 25, 2007.

  1. Guest

    Hello all,
    I'm trying to parse some EXIF data and return certain fields of text.
    If I want to parse a document looking for some text and, once I've
    located it, take everything after a colon (:) delimiter, what is the
    best method in Ruby, as opposed to a shell command?

    For example if I have a file called filename.txt and I want to search
    for this ---> "Compression : JPEG (old-style)"
    but only return everything to the right of the delimiter?

    Using command-line tools such as grep to locate the line and then cut
    to grab everything after " : " works, but I'm trying to learn how to do
    things like this in Ruby and am having no luck grabbing everything to
    the right of the colon. Some of the text I'm grabbing can contain
    additional colons, and I want everything after the first one.


    Thanks.

    Sc--
    , Jul 25, 2007
    #1

  2. Phrogz Guest

    On Jul 25, 4:59 pm, wrote:
    > For example if I have a file called filename.txt and I want to search
    > for this ---> "Compression : JPEG (old-style)"
    > but only return everything to the right of the delimiter?


    C:\>irb
    irb(main):001:0> s = "Foo : Bar"
    => "Foo : Bar"
    irb(main):002:0> s.split ":"
    => ["Foo ", " Bar"]
    irb(main):003:0> s.split(":").last
    => " Bar"

    ...or did you want the leading whitespace chomped?

    irb(main):004:0> s.split( /\s*:\s*/ )
    => ["Foo", "Bar"]
    irb(main):005:0> s.split( /\s*:\s*/ ).last
    => "Bar"
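    One caveat with .last, sketched here under the assumption that a value
    can itself contain a colon (the sample string is made up for
    illustration): .last returns only the text after the final colon, so
    the pieces after the first one have to be put back together.

```ruby
# Hypothetical sample line; the value itself contains a second colon.
s = "Comment : taken at 10:30"

# .last returns only the text after the final colon.
last_piece = s.split(/\s*:\s*/).last

# Dropping the first element and re-joining keeps the whole value.
# The join uses a bare ":", so whitespace around later colons is not restored.
rest = s.split(/\s*:\s*/)[1..-1].join(":")

puts last_piece
puts rest
```

    Re-joining is lossy when the later colons had spaces around them, so
    this is only a rough workaround.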
    Phrogz, Jul 26, 2007
    #2

  3. Guest

    On Jul 25, 7:05 pm, Phrogz <> wrote:
    > On Jul 25, 4:59 pm, wrote:
    >
    > > For example if I have a file called filename.txt and I want to search
    > > for this ---> "Compression : JPEG (old-style)"
    > > but only return everything to the right of the delimiter?

    >
    > C:\>irb
    > irb(main):001:0> s = "Foo : Bar"
    > => "Foo : Bar"
    > irb(main):002:0> s.split ":"
    > => ["Foo ", " Bar"]
    > irb(main):003:0> s.split(":").last
    > => " Bar"
    >
    > ...or did you want the leading whitespace chomped?
    >
    > irb(main):004:0> s.split( /\s*:\s*/ )
    > => ["Foo", "Bar"]
    > irb(main):005:0> s.split( /\s*:\s*/ ).last
    > => "Bar"


    I was looking at this method as well. I was having trouble, though: I
    know the left-hand column (in this example, the Compression portion),
    then comes the colon, and the remainder could be anything, for instance
    a description with lots of text that could also contain an additional
    colon. At the shell prompt I could grep "Compression" | cut -d: -f2-
    and that would grab everything...
    So it sounds like I'm looking in the correct area but I'm just not
    pulling it all together. I will keep at it. If you have additional info
    to point to, that would be great and appreciated.
    Thanks again.


    Sc-
    , Jul 26, 2007
    #3
  4. bbiker Guest

    On Jul 25, 7:05 pm, Phrogz <> wrote:
    > On Jul 25, 4:59 pm, wrote:
    >
    > > For example if I have a file called filename.txt and I want to search
    > > for this ---> "Compression : JPEG (old-style)"
    > > but only return everything to the right of the delimiter?

    >
    > C:\>irb
    > irb(main):001:0> s = "Foo : Bar"
    > => "Foo : Bar"
    > irb(main):002:0> s.split ":"
    > => ["Foo ", " Bar"]
    > irb(main):003:0> s.split(":").last
    > => " Bar"
    >
    > ...or did you want the leading whitespace chomped?
    >
    > irb(main):004:0> s.split( /\s*:\s*/ )
    > => ["Foo", "Bar"]
    > irb(main):005:0> s.split( /\s*:\s*/ ).last
    > => "Bar"



    > Some of the text I'm grabbing can contain additional
    > colons and I want everything after the first one.

    SC,
    I am assuming you want everything after the first colon but not after
    the second:

    irb(main):001:0> line = "Compression : JPEG (old_style) : whatever"
    => "Compression : JPEG (old_style) : whatever"
    irb(main):002:0> if line =~ /^Compression\s*:\s*(.*)$/
    irb(main):003:1> end_str = $1
    irb(main):004:1> first_el = end_str.split(/\s*:\s*/).first
    irb(main):005:1> puts first_el
    irb(main):006:1> end
    JPEG (old_style)
    => nil

    If you want everything after each colon:

    el = end_str.split(/\s*:\s*/)

    p el  # => ["JPEG (old_style)", "whatever"]
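    The regex lookup above could be wrapped in a small method so the field
    name is not hard-coded. This is only a sketch: field_value and the
    sample text are made up for illustration, and it keeps everything after
    the first colon rather than stopping at the second.

```ruby
# Hypothetical helper: return the text after the first colon on the line
# that starts with the given field name, or nil when no line matches.
def field_value(text, name)
  text.each_line do |line|
    # Regexp.escape guards against field names containing regex characters.
    if line =~ /^#{Regexp.escape(name)}\s*:\s*(.*)$/
      return $1.strip
    end
  end
  nil
end

# Made-up sample input in the "Name : value" layout from the question.
sample = "Make : Canon\nCompression : JPEG (old-style)\nComment : a : b\n"

puts field_value(sample, "Compression")
puts field_value(sample, "Comment")
```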
    bbiker, Jul 26, 2007
    #4
  5. Phil Meier Guest

    wrote:
    > On Jul 25, 7:05 pm, Phrogz <> wrote:
    >> On Jul 25, 4:59 pm, wrote:

    ...
    >> ...or did you want the leading whitespace chomped?
    >>
    >> irb(main):004:0> s.split( /\s*:\s*/ )
    >> => ["Foo", "Bar"]
    >> irb(main):005:0> s.split( /\s*:\s*/ ).last
    >> => "Bar"

    >
    > I was looking at this method as well. I was having trouble though as I
    > know the left hand side column so for this example the Compression
    > portion then the colon : the remainder could be anything for instance
    > a description with lots of text and could also contain an additional
    > colon. At the shell prompt I could grep "Compression" | cut -d: -f2-
    > and that would grab everything...


    To get everything after the first colon you can use:
    s =~ /^.*?:/
    textAfterColon = $'.dup

    To also get rid of the spaces directly after the first colon, use this regex:
    s =~ /^.*?:\s*/
    textAfterColon = $'.dup

    The trick is to anchor the regex with ^ and keep .*? non-greedy, so the
    match ends at the first colon and $' holds everything after it.
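    For anyone who prefers to avoid the $' global, the same match can be
    read from a MatchData object or from a capture group. A minimal sketch,
    with a made-up sample string:

```ruby
# Hypothetical sample line with a second colon inside the value.
s = "Compression : JPEG (old-style) : extra"

# String#match returns a MatchData object, or nil when there is no colon.
m = s.match(/^.*?:\s*/)
text_after_colon = m ? m.post_match : nil

# Alternative: capture the remainder and pull out group 1 with String#[].
captured = s[/^.*?:\s*(.*)/, 1]

puts text_after_colon
puts captured
```

    Both variants return nil for a line without a colon, which can be
    easier to check than a leftover value in a global.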

    BR Phil
    Phil Meier, Jul 26, 2007
    #5
  6. Guest

    On Jul 26, 3:28 am, Phil Meier <> wrote:
    > wrote:
    >
    >
    >
    > > On Jul 25, 7:05 pm, Phrogz <> wrote:
    > >> On Jul 25, 4:59 pm, wrote:

    > ...
    > >> ...or did you want the leading whitespace chomped?

    >
    > >> irb(main):004:0> s.split( /\s*:\s*/ )
    > >> => ["Foo", "Bar"]
    > >> irb(main):005:0> s.split( /\s*:\s*/ ).last
    > >> => "Bar"

    >
    > > I was looking at this method as well. I was having trouble though as I
    > > know the left hand side column so for this example the Compression
    > > portion then the colon : the remainder could be anything for instance
    > > a description with lots of text and could also contain an additional
    > > colon. At the shell prompt I could grep "Compression" | cut -d: -f2-
    > > and that would grab everything...

    >
    > To get everything after the first colon you can use:
    > s =~ /^.*?:/
    > textAfterColon = $'.dup
    >
    > To also get rid of the spaces directly after the first colon use this Regex:
    > s =~ /^.*?:\s*/
    > textAfterColon = $'.dup
    >
    > The trick is to anchor the regex (i.e. using ^)
    >
    > BR Phil


    Thanks so much for everyone's answers, much appreciated. I thought I
    was going to have to live with the space after the colon. Thanks
    again for the help.
    Scott
    , Jul 26, 2007
    #6
  7. Thomas Gantner

    On Thu 26. July 2007 02.37, wrote:

    > On Jul 25, 7:05 pm, Phrogz <> wrote:
    >> On Jul 25, 4:59 pm, wrote:
    >>
    >> > For example if I have a file called filename.txt and I want to search
    >> > for this ---> "Compression : JPEG (old-style)"
    >> > but only return everything to the right of the delimiter?

    >>
    >> C:\>irb
    >> irb(main):001:0> s = "Foo : Bar"
    >> => "Foo : Bar"
    >> irb(main):002:0> s.split ":"
    >> => ["Foo ", " Bar"]
    >> irb(main):003:0> s.split(":").last
    >> => " Bar"
    >>
    >> ...or did you want the leading whitespace chomped?
    >>
    >> irb(main):004:0> s.split( /\s*:\s*/ )
    >> => ["Foo", "Bar"]
    >> irb(main):005:0> s.split( /\s*:\s*/ ).last
    >> => "Bar"

    >
    > I was looking at this method as well. I was having trouble though as I
    > know the left hand side column so for this example the Compression
    > portion then the colon : the remainder could be anything for instance
    > a description with lots of text and could also contain an additional
    > colon. At the shell prompt I could grep "Compression" | cut -d: -f2-
    > and that would grab everything...
    > So it sounds like I'm looking in the correct area but I'm just not
    > pulling it all together. I will keep at it if you have additional info
    > to point to that would be great..and appreciated.
    > Thanks again.


    split takes an optional second parameter <limit>, defining how many elements
    the resulting array shall have at most:

    'foo : bar : baz'.split(/\s*:\s*/, 2)
    => [ "foo", "bar : baz" ]
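    Putting the limit together with a line filter gives a rough Ruby
    counterpart to grep "Compression" | cut -d: -f2-. This is a sketch, not
    a drop-in tool: values_for and the sample lines are made up for
    illustration, and unlike cut it also trims the whitespace around the
    first colon.

```ruby
# Hypothetical helper mirroring `grep NAME | cut -d: -f2-`:
# for every line containing the field name, keep what follows the first colon.
def values_for(lines, name)
  lines.grep(/#{Regexp.escape(name)}/).map do |line|
    # A limit of 2 yields at most two pieces, so later colons stay in the value.
    line.chomp.split(/\s*:\s*/, 2).last
  end
end

# Made-up sample data; with a real file, File.readlines("filename.txt")
# would supply the lines instead.
lines = [
  "Make : Canon\n",
  "Compression : JPEG (old-style)\n",
  "Compression Note : ratio 4:1\n"
]

puts values_for(lines, "Compression")
```

    A matching line with no colon at all comes back unchanged, because
    split then returns a single-element array.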

    -Thomas

    --
    <sig. under construction>
    Thomas Gantner, Jul 26, 2007
    #7