splitting with a regex & keeping a ref?

Discussion in 'Ruby' started by Kyle Schmitt, May 1, 2008.

  1. Kyle Schmitt

    Kyle Schmitt Guest

    I'm writing some scripts to help handle some ornery samba servers we
    have: part of that is unfortunately reading the config scripts that
    have built up over the years.

    I was hoping to use the standard String#split method as a quick &
    not-so-dirty way of parsing the files, given that samba uses a very
    simple format.

    #the sample_data variable is defined below
    irb(main):sample_data.split(/\[[a-z0-9]+\]/i)
    => ["", "\ncomment = shared directory for the shop\npath =
    /dept/shop\nvalid u ....(truncated)
    Gives good results, but omits what's between the brackets. I expected
    that part.

    irb(main):sample_data.split(/(\[[a-z0-9]+\])/i)
    => ["", "[shop]", "\ncomment = shared directory for the shop\npath =
    /dept/sho ....(truncated)
    Neat, gives me the data between the brackets in an element before the
    data itself.
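
A minimal sketch of the difference between the two calls, assuming a shortened stand-in for sample_data (the name short_data and its contents are made up for illustration):

```ruby
# Shortened, hypothetical stand-in for sample_data.
short_data = "[shop]\npath = /dept/shop\n[bob]\npath = /users/bob"

# Without a capture group, the matched headers are discarded.
without_group = short_data.split(/\[[a-z0-9]+\]/i)

# With a capture group, each matched header is kept as its own element.
with_group = short_data.split(/(\[[a-z0-9]+\])/i)

p without_group
p with_group
```

In both cases the first element is an empty string, because the data starts with a header.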


    I know quite well I can zip through that array again, but I was
    wondering, hoping, that there would be a way of accessing that back
    reference in a block as part of the split.

    Is there any way to do that that I'm just missing?

    Thanks,
    Kyle

    sample_data=%{[shop]
    comment = shared directory for the shop
    path = /dept/shop
    valid users = @shop @admin
    public = no
    writable = yes
    force group = shop
    create mask = 0770
    [bob]
    comment = User files for bob
    path = /users/bob
    valid users = bob @admin
    public = no
    writable = yes
    create mask = 0770}
    Kyle Schmitt, May 1, 2008
    #1

  2. Hi --

    On Thu, 1 May 2008, Kyle Schmitt wrote:

    > I know quite well I can zip through that array again, but I was
    > wondering, hoping, that there would be a way of accessing that back
    > reference in a block as part of the split.


    I'm afraid I can't quite follow that sentence. What do you mean by a
    back reference? Can you show some sample desired output?


    David

    --
    Rails training from David A. Black and Ruby Power and Light:
    INTRO TO RAILS June 9-12 Berlin
    ADVANCING WITH RAILS June 16-19 Berlin
    INTRO TO RAILS June 24-27 London (Skills Matter)
    See http://www.rubypal.com for details and updates!
    David A. Black, May 1, 2008
    #2

  3. Kyle Schmitt

    Kyle Schmitt Guest

    David, back reference as in a regex back reference.
    In a nutshell, it stores what was matched, and allows you to do
    something with it. You just place parentheses around the part of the
    match you want to save.

    They work like this in ruby's gsub (but a little differently in sed,
    if that's the regex you grew up with).

    example=%{Brian had a dog
    James had a cat
    Allen has a hamster}
    puts example
    # To change the type of pet with gsub, you could do it like this...
    puts example.gsub(/[^ ]+$/,"grue")
    # ...but to describe the pet without changing its type, you'd
    # need a backreference
    puts example.gsub(/([^ ]+$)/){|i| "big ugly #{i}"}
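
For reference, a small sketch of two ways to reach a captured group in gsub. The variable names with_string and with_block are made up for illustration; this is one way to write it, not the only one:

```ruby
example = "Brian had a dog\nJames had a cat\nAllen has a hamster"

# In a replacement string, \1 refers to the first capture group.
# Single quotes keep the backslash literal.
with_string = example.gsub(/([^ ]+)$/, 'big ugly \1')

# In a block, the same group is available as $1.
with_block = example.gsub(/([^ ]+)$/) { "big ugly #{$1}" }

puts with_string
```

Note that the block parameter in gsub receives the whole match rather than a specific group, so $1 is the explicit way to get at the capture inside a block.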

    Kyle Schmitt, May 1, 2008
    #3
  4. Kyle Schmitt

    Kyle Schmitt Guest

    Oh right, desired sample output.

    What I'd really like is to split the string and, at the same time,
    either stuff the pieces straight into a hash or, more realistically
    since it's a split, get an array of tuples.
    So...

    sample_data.split(){magic happens here}
    =>{"[shop]"=>"\ncomment = shared directory for the shop\npath..>"}

    or
    sample_data.split(){magic happens here}
    =>[["[shop]","\ncomment = shared directory for the shop\npath..>"]]
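
String#split does not take a block, but both shapes can be sketched by post-processing the split result. This assumes a reasonably modern Ruby (Array#to_h) and uses a shortened, hypothetical stand-in for sample_data:

```ruby
# Shortened, hypothetical stand-in for sample_data.
short_data = "[shop]\npath = /dept/shop\n[bob]\npath = /users/bob"

# Drop the empty string before the first header, then group the
# remaining elements two at a time: [header, body].
tuples = short_data.split(/(\[[a-z0-9]+\])/i).drop(1).each_slice(2).to_a

# An array of two-element arrays converts directly to a hash.
as_hash = tuples.to_h

p tuples
p as_hash
```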
    Kyle Schmitt, May 1, 2008
    #4
  5. Kyle Schmitt

    Kyle Schmitt Guest

    David,
    re-reading your sig, and that page, I've got to apologize,
    you already knew that stuff in spades I'm sure! :)

    What part doesn't quite make sense?

    Kyle Schmitt, May 1, 2008
    #5
  6. yermej

    yermej Guest


    I think you might want scan instead of split.

    sample_data.scan(/(\[[a-z0-9]+\])([^\[]*)/i) do |share, opts|
      # create your hash or whatever here
    end
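
One possible way to fill in the block, as a sketch only. The data is a shortened, hypothetical stand-in for sample_data, and the names config, key and value are made up for illustration:

```ruby
# Shortened, hypothetical stand-in for sample_data.
short_data = "[shop]\npath = /dept/shop\npublic = no\n" \
             "[bob]\npath = /users/bob\npublic = no"

config = {}
short_data.scan(/(\[[a-z0-9]+\])([^\[]*)/i) do |share, opts|
  config[share] = {}
  opts.each_line do |line|
    # Split on the first "=" only, so values may contain "=" themselves.
    key, value = line.split("=", 2)
    next if value.nil? # skip blank lines and lines without "="
    config[share][key.strip] = value.strip
  end
end

p config
```

Because scan yields one set of captures per match, each section header arrives together with its body and no second pass over an array is needed.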
    yermej, May 1, 2008
    #6
  7. Kyle Schmitt

    Kyle Schmitt Guest

    yermej,
    scan you say. Heh, I never even thought of that one.
    Makes the whole thing rather simple!

    Thanks.

    Kyle Schmitt, May 1, 2008
    #7
  8. Kyle Schmitt

    Kyle Schmitt Guest

    Robbert, yermej, David,

    Thanks a bunch!
    Here's what I finally came up with, in case anyone's bored enough to wonder.

    file="/path/to/smb/file/sample.conf"
    regex=/(\[[a-z0-9]+\])([^\[]*)/i
    samba_config={}
    File.open(file){|f| f.read()}.scan(regex) do |title,options|
      samba_config.store(title,{})
      # each_line rather than String#each, which newer Rubies no longer have
      options.strip.each_line() do |l|
        # key: everything before the "="; value: everything after it
        samba_config[title].store(l[/^[^=]*/].strip,l[/[^=]*[^\n]$/].strip)
      end
    end
    Kyle Schmitt, May 1, 2008
    #8
  9. Hi --


    I know you're not asking for refactoring advice, but here's some
    anyway :)

    If you're just going to read a file's contents into a string, you can
    use File.read, rather than the whole open/read thing. Also, I'd
    encourage you to drop the empty parentheses after method names. The
    message-sending dot tells you that it's a method; the () doesn't add
    signal, just noise.

    Anyway, here's a tweaked version, in case it's of interest. Nothing
    too radical, just a couple of possibly fun alternative techniques :)

    File.read("filename").scan(regex) do |title,options|
      samba_config[title] = {}
      options.strip.each_line do |option|
        samba_config[title].update(Hash[*option.strip.split(/\s*=\s*/)])
      end
    end


    David

    David A. Black, May 1, 2008
    #9
  10. Kyle Schmitt

    Kyle Schmitt Guest

    David,
    I don't mind it at all!

    Out of curiosity, agreeing that File.read().scan() is much cleaner, is
    it just syntactic sugar for the same thing, or is it computationally
    different?

    Thanks for the Hash[*Array] syntax btw, I've used it way way back, but
    for the life of me couldn't remember it, thought maybe I was mistaken.
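
As a reminder of how the Hash[*Array] form behaves, here is a small sketch. The array contents and variable names are made up for illustration, and the to_h variant assumes a reasonably modern Ruby:

```ruby
flat = ["path", "/dept/shop", "public", "no"]

# Hash[] with a splatted flat array treats the elements as
# key, value, key, value, ...
from_splat = Hash[*flat]

# An alternative: pair the elements up first, then call to_h.
from_pairs = flat.each_slice(2).to_h

p from_splat
```

An odd number of elements makes Hash[] raise an ArgumentError, which is worth keeping in mind when the array comes from splitting a line on "=" and the value itself may contain "=".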

    Kyle Schmitt, May 1, 2008
    #10
  11. Hi --

    On Fri, 2 May 2008, Kyle Schmitt wrote:

    > David,
    > I don't mind it at all!
    >
    > Out of curiosity, agreeing that File.read().scan() is much cleaner, is
    > it just syntactic sugar for the same thing, or is it computationally
    > different?


    I don't know whether File.read is actually written in terms of
    File.open (I'm afraid I'm too lazy to check right now), but I think
    the underlying system calls etc. would be at least very similar. I
    expect File.read is slightly cheaper, since it doesn't involve the
    whole block structure. (I'm talking, of course, specifically about
    comparison with File.open {|f| f.read }, where you get the whole file
    at once.)
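
A quick way to confirm that the two forms return the same string, sketched with a temporary file. The file name prefix and contents are made up for illustration, and this says nothing about which form is cheaper:

```ruby
require "tempfile"

# Write a small, hypothetical config to a temporary file.
tmp = Tempfile.new("smb_sample")
tmp.write("[shop]\npath = /dept/shop\n")
tmp.close

# Block form: open the file, read it all, close it when the block ends.
via_open = File.open(tmp.path) { |f| f.read }

# One-call form: read the whole file into a string.
via_read = File.read(tmp.path)

puts via_open == via_read
tmp.unlink
```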


    David

    David A. Black, May 1, 2008
    #11
  12. William James

    regex = /(\[.*?\])/
    h = Hash[ *IO.read("data").strip.split( regex )[1..-1] ]
    h.each{|k,v| h[k] = Hash[ *v.strip.split( / *= *|\n/ ) ] }
    William James, May 2, 2008
    #12
