Easily parsing a string to retrieve values and assign them to a variable/symbol.

Discussion in 'Ruby' started by Eric DUMINIL, Jul 18, 2007.

  1. Eric DUMINIL

    Eric DUMINIL Guest

    Hi!

    I've been looking through APIs for a while, in desperate need of an
    easy way to parse a string and retrieve data (forget about Regexp or
    scanf), so that any non-Rubyist I work with could describe, with a
    single string, an FTP directory in which some files are saved.
    Moreover, I need some metadata so that I can effectively sort and work
    with the data I retrieve from this FTP server.

    For example, I would not know which file I should retrieve on:
    'ftp://ftp.org/DATA/mike'
    but
    'ftp://ftp.org/DATA/{user_name}/{year}/{month}-{day}.txt' would do
    just fine, so that I could, for example, get this hash:
    {:year=>"2005", :user_name=>"mike", :day=>"15", :month=>"10"}
    for this filename:
    'ftp://ftp.org/DATA/mike/2005/10-15.txt'


    I don't know if such a method is already available for Ruby, so I
    decided to implement it on my own. Here it is:

    ###### Source ###########################################

    class String
      # Matches self against a description such as 'foo{name}' or
      # '{year:4}{month:2}' and returns a hash of the captured values,
      # or false if the string does not fit the description.
      def parse_for_variables(description, begin_var_name="{", end_var_name="}")
        split_reg_exp = Regexp.new(Regexp.quote(begin_var_name) + "(.+?)" + Regexp.quote(end_var_name))
        variables = []
        is_a_variable_name = true
        # split with a capture group alternates literal text and variable names
        pattern = description.split(split_reg_exp).collect { |str|
          is_a_variable_name = !is_a_variable_name
          if is_a_variable_name
            variables << str.sub(/:(\d+)$/, '').intern
            str =~ /:(\d+)$/ ? "(.{#{$1}})" : "(.+)"
          else
            Regexp.quote(str)
          end
        }.join
        searching_reg_exp = Regexp.new("^" + pattern + "$")
        values = searching_reg_exp.match(self).to_a[1..-1]

        !values.nil? &&
          variables.length == values.length &&
          Hash.check_for_consistency_and_create_from_arrays(variables, values)
      end
    end


    class Hash
      def self.create_from_arrays(keys,values)
        self[*keys.zip(values).flatten]
      end

      # Builds a hash from two parallel arrays; returns false if a repeated
      # key is paired with two different values.
      def self.check_for_consistency_and_create_from_arrays(keys,values)
        result={}
        keys.each_with_index{|k,i|
          raise ArgumentError if result.has_key?(k) and result[k]!=values[i]
          result[k]=values[i]
        }
        result
      rescue ArgumentError
        false
      end
    end

    ############################################################


    #### Examples ###############################################

    irb(main):026:0> 'foobar'.parse_for_variables('foo{name}')
    => {:name=>"bar"}

    # You can specify the length of a string by adding :i to the end of a variable name

    irb(main):027:0> 'foobar'.parse_for_variables('foo{name:3}')
    => {:name=>"bar"}

    irb(main):028:0> 'foobar'.parse_for_variables('foo{name:2}')
    => false

    irb(main):029:0> 'foobar'.parse_for_variables('foo{name}')
    => {:name=>"bar"}

    # By default, variable names are written between {}, but it could be overridden with optional arguments

    irb(main):030:0> 'foo(bar){|x| x+2}'.parse_for_variables('foo(<<arg>>){|<<var>>| <<expression>>}','<<','>>')
    => {:arg=>"bar", :var=>"x", :expression=>"x+2"}

    irb(main):031:0> 'C:\Windows\system32\vbrun700.dll'.parse_for_variables('{disk}:\{path}\{filename}.{extension}')
    => {:disk=>"C", :extension=>"dll", :filename=>"vbrun700", :path=>"Windows\\system32"}

    irb(main):032:0> '2006-12-09.csv'.parse_for_variables('{year}-{month}-{day}.csv')
    => {:year=>"2006", :day=>"09", :month=>"12"}

    irb(main):033:0> '2005 12 15'.parse_for_variables('{year} {month} {day}')
    => {:year=>"2005", :day=>"15", :month=>"12"}

    irb(main):034:0> '20061209.txt'.parse_for_variables('{year:4}{month:2}{day:2}.txt')
    => {:year=>"2006", :day=>"09", :month=>"12"}

    irb(main):035:0> '20061209.txt'.parse_for_variables('{year:2}{month:2}{day:2}.txt')
    => false

    # You can use a variable name twice:
    irb(main):036:0> 'DATA/2007/2007-12-09.csv'.parse_for_variables('DATA/{year}/{year}-{month}-{day}.csv')
    => {:year=>"2007", :day=>"09", :month=>"12"}

    # as long as values are consistent:
    irb(main):037:0> 'DATA/2007/2006-12-09.csv'.parse_for_variables('DATA/{year}/{year}-{month}-{day}.csv')
    => false

    irb(main):038:0> 'whateverTooLong'.parse_for_variables('whatever{name:4}')
    => false

    irb(main):039:0> 'whateverAsLongAsIWant'.parse_for_variables('whateverKsome_variableK','K','K')
    => {:some_variable=>"AsLongAsIWant"}

    irb(main):040:0> 'whatevertoolong.csv'.parse_for_variables('whatever$name:4$.csv','$','$')
    => false
    ############################################################


    Have you ever used such a method?
    Is it possible to implement it in a more elegant way?
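One possible answer to the "more elegant" question, offered only as a sketch: on Ruby 1.9 or later, named capture groups and backreferences can do the bookkeeping that the variables array and the consistency check do above. The method names `template_to_regexp` and `parse_with_template` are hypothetical, and unlike the original this version assumes variable names are plain identifiers (letters, digits, underscore) and anchors with \A and \z instead of ^ and $.

```ruby
# Hypothetical helpers (names are illustrative, not from the thread).

# Turns 'DATA/{year}/{year}-{month}.csv' into a Regexp with named groups.
# A repeated name becomes a backreference, so both occurrences must agree.
def template_to_regexp(description, open = "{", close = "}")
  token = /#{Regexp.quote(open)}([A-Za-z_]\w*)(?::(\d+))?#{Regexp.quote(close)}/
  parts = []
  seen = {}
  rest = description
  while (m = token.match(rest))
    parts << Regexp.quote(m.pre_match)      # literal text before the variable
    name, width = m[1], m[2]
    if seen[name]
      parts << "\\k<#{name}>"               # same value as the first occurrence
    else
      seen[name] = true
      body = width ? ".{#{width}}" : ".+"   # fixed width if ':N' was given
      parts << "(?<#{name}>#{body})"
    end
    rest = m.post_match
  end
  parts << Regexp.quote(rest)
  Regexp.new("\\A" + parts.join + "\\z")
end

# Returns a hash with symbol keys, or false when the string does not match.
def parse_with_template(string, description, open = "{", close = "}")
  m = template_to_regexp(description, open, close).match(string)
  return false unless m
  m.names.zip(m.captures).map { |k, v| [k.to_sym, v] }.to_h
end
```

Called as parse_with_template('20061209.txt', '{year:4}{month:2}{day:2}.txt'), this should give the same year, month and day values as the example above, and a description that repeats {year} only matches when both occurrences hold the same text.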


    Thanks for reading, and please feel free to use my code if you ever need it,

    Eric Duminil
    Eric DUMINIL, Jul 18, 2007
    #1

  2. Re: Easily parsing a string to retrieve values and assign th

    Eric DUMINIL wrote:
    > '20061209.txt'.parse_for_variables('{year:4}{month:2}{day:2}.txt')
    > => {:year=>"2006", :day=>"09", :month=>"12"}


    I like this. It's sort of like a cut down regex for non-programmers. You
    should write this up with a definition and put it in a library. I bet
    people would use it.

    Don't forget to come up with a cool name.

    best,
    Dan

    --
    Posted via http://www.ruby-forum.com/.
    Daniel Lucraft, Jul 18, 2007
    #2

  3. Eric DUMINIL

    Peña, Botp Guest

    From: Eric DUMINIL [mailto:]
    # For example, I would not know which file I should retrieve on:
    # 'ftp://ftp.org/DATA/mike'
    # but
    # 'ftp://ftp.org/DATA/{user_name}/{year}/{month}-{day}.txt' would do
    # just fine, so that I could, for example, get this hash:
    # {:year=>"2005", :user_name=>"mike", :day=>"15", :month=>"10"}
    # for this filename:
    # 'ftp://ftp.org/DATA/mike/2005/10-15.txt'

    very nice.
    but would it be more practical if we delineate a variable just like we
    used to in ruby inline string; ie, use #{var} instead of just {var}

    this would be handy like, if i want to rename or move all folders under
    /mike/2005/ to /mike/2007/ eg.. the retrieval and assignment string just
    stay the same...

    kind regards -botp
    Peña, Botp, Jul 18, 2007
    #3
  4. Eric DUMINIL

    Eric DUMINIL Guest

    Hi
    Thanks for the appreciation!
    Your suggestion is interesting, even though I'm not sure it would work, because:

    'foobar'.parse_for_variables('foo#{name}','#{')
    => {:name=>"bar"}

    works, but when you use it with double quotes string:

    'foobar'.parse_for_variables("foo#{name}",'#{')
    NameError: undefined local variable or method `name' for main:Object

    it already tries to evaluate "name" inside the string...
    so either you get retrieval or assignment right, but not both :(
    Anyway, assignment is not that big a deal:

    (irb) h={:year=>"2005", :user_name=>"mike", :day=>"15", :month=>"10"}
    => {:year=>"2005", :user_name=>"mike", :day=>"15", :month=>"10"}

    (irb) 'ftp://ftp.org/DATA/{user_name}/{year}/{month}-{day}.txt'.gsub(/\{(.+?)\}/){h[$1.intern]}
    => "ftp://ftp.org/DATA/mike/2005/10-15.txt"
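The gsub one-liner above can be wrapped in a small helper, which also covers the "move 2005 to 2007" case raised earlier in the thread. This is only a sketch: `fill_template` is a hypothetical name, and using Hash#fetch (so that a missing key raises KeyError instead of silently inserting an empty string) is an assumption about the desired behaviour.

```ruby
# Hypothetical helper: replaces every {name} in the template with
# values[:name]. Raises KeyError when a name has no value.
def fill_template(template, values)
  template.gsub(/\{(.+?)\}/) { values.fetch(Regexp.last_match(1).to_sym) }
end

# The template is single-quoted, so Ruby does not try to interpolate it.
TEMPLATE = 'ftp://ftp.org/DATA/{user_name}/{year}/{month}-{day}.txt'
VALUES   = { year: "2005", user_name: "mike", day: "15", month: "10" }

OLD_PATH = fill_template(TEMPLATE, VALUES)
# Same template, one value changed: the target of a rename or move.
NEW_PATH = fill_template(TEMPLATE, VALUES.merge(year: "2007"))
```

Because retrieval and assignment both use the same single-quoted description, the {var} delimiters never collide with Ruby's own #{var} interpolation.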

    Best regards,

    Eric
    Eric DUMINIL, Jul 18, 2007
    #4
  5. Eric DUMINIL

    Peña, Botp Guest

    From: Eric DUMINIL [mailto:]
    # 'foobar'.parse_for_variables("foo#{name}",'#{')
    # NameError: undefined local variable or method `name' for main:Object

    oops, totally ignored that, was thinking about lazy evals..
    i think your current interface is good, it would be easy to infix the
    "#" later...

    kind regards -botp
    Peña, Botp, Jul 18, 2007
    #5
  6. Eric DUMINIL

    SonOfLilit Guest

    There is an option for regexen to lazily evaluate. So you could
    represent a regex-free string with a regex like that, then whenever
    you need it - evaluate the regex, convert it to a string and use it
    :).

    OR you could store the string 'stuff \#{name}' and later #eval() it or
    something similar and less dangerous when you need its evaluation.
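Two candidates for "something similar and less dangerous", sketched under assumptions rather than taken from this post: a lambda that delays interpolation until it is called, and Kernel#format with %{name} references (available since Ruby 1.9), which substitutes values from a hash without evaluating any code. The constant and method names below are illustrative only.

```ruby
# Option 1: a lambda. The "#{name}" inside is only interpolated
# when the lambda is called, not when it is defined.
LAZY_STUFF = ->(name) { "stuff #{name}" }

# Option 2: a plain single-quoted template with %{name} references.
# Kernel#format looks each name up in the hash; nothing is eval'ed.
# A literal percent sign in such a template has to be written as %%.
FORMAT_STUFF = 'stuff %{name}'
FTP_TEMPLATE = 'ftp://ftp.org/DATA/%{user_name}/%{year}/%{month}-%{day}.txt'

# Hypothetical wrapper, so the template can be stored and rendered later.
def render(template, values)
  format(template, values)
end
```

With the second option a template such as 'foo%{system("rm -rf ~/")}' is treated as a lookup of an oddly named key and fails with KeyError; the shell command is never run.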

    Aur
    SonOfLilit, Jul 18, 2007
    #6
  7. Eric DUMINIL

    Eric DUMINIL Guest

    I think that what you describe is exactly what I implemented as
    searching_reg_exp.

    For example searching_reg_exp corresponding to
    'ftp://ftp.org/DATA/{user_name}/{year}/{month}-{day}.txt' is:
    /^ftp:\/\/ftp\.org\/DATA\/(.+)\/(.+)\/(.+)\-(.+)\.txt$/

    if you want it to be non-greedy, it would be:
    /^ftp:\/\/ftp\.org\/DATA\/(.+?)\/(.+?)\/(.+?)\-(.+?)\.txt$/

    Or did I get you wrong?
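For the FTP description both regexes capture the same values, because each separator occurs exactly as often as the description needs. The two forms only differ when a separator also appears inside a value. A minimal sketch with a made-up two-variable description '{x}-{y}' (the constant names are illustrative):

```ruby
# Greedy and non-greedy versions of the regex for '{x}-{y}'.
GREEDY     = /^(.+)-(.+)$/
NON_GREEDY = /^(.+?)-(.+?)$/

# '2006-12-09' has two hyphens, so there are two ways to split it.
GREEDY_RESULT     = GREEDY.match('2006-12-09').captures      # first group takes as much as it can
NON_GREEDY_RESULT = NON_GREEDY.match('2006-12-09').captures  # first group takes as little as it can

# The FTP regexes from this post, applied to the example filename.
FTP_GREEDY     = /^ftp:\/\/ftp\.org\/DATA\/(.+)\/(.+)\/(.+)\-(.+)\.txt$/
FTP_NON_GREEDY = /^ftp:\/\/ftp\.org\/DATA\/(.+?)\/(.+?)\/(.+?)\-(.+?)\.txt$/
FTP_FILE       = 'ftp://ftp.org/DATA/mike/2005/10-15.txt'
```

The greedy form splits '2006-12-09' at the last hyphen ("2006-12" and "09"), the non-greedy form at the first ("2006" and "12-09"), so the choice matters for descriptions whose values may contain the separator.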

    I wouldn't choose the eval() path for security reasons, as you mentioned it...
    'foo{system("rm -rf ~/")}' would be pretty bad!

    Which method were you thinking about when you wrote "something similar
    and less dangerous"?

    Bye,

    Eric
    Eric DUMINIL, Jul 18, 2007
    #7