File open, read and store in Hash, efficient?

Discussion in 'Ruby' started by Kev, Mar 9, 2007.

  1. Kev

    Kev Guest

    Hello,

    I am writing a class and I require it to open a file, and store the
    contents in key, value pairs.
    This is my first

    def initialize()
      @@store = Hash.new
    end

    def read_file
      if File.exists?("LocationCopy.csv")
        f = File.open("LocationCopy.csv","r")
        f.each do |line|
          temp = line.split(",")
          @@store[temp[0]] = temp[1]
        end
        f.close
      end
      #puts @@store
    end
     
    Kev, Mar 9, 2007
    #1

  2. Kev

    Kev Guest



    Unfortunately that's what I call finger trouble. As I was saying, this
    is my first attempt at a Ruby application, and I was wondering if there
    is a more efficient method for what I am trying to achieve. Would
    using f.each_line and using a block be better?

    Thanks,
    Kev
     
    Kev, Mar 9, 2007
    #2

  3. Robert Klemme

    Robert Klemme Guest

    2007/3/9, Kev <>:
    >
    > Unfortunately that's what I call finger trouble. As I was saying, this
    > is my first attempt at a Ruby application, and I was wondering if there
    > is a more efficient method for what I am trying to achieve. Would
    > using f.each_line and using a block be better?


    Efficiency is OK. The block form of File.open is safer: the file is
    always closed, even if an error occurs. But you should not use a class
    variable (@@store is shared by every instance of the class); use
    @store instead.
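
A minimal sketch of that advice, not the original poster's code: the class name LocationStore and the sample data are made up for illustration, and the path is passed in rather than hard-coded. It uses the block form of File.open and an instance variable.

```ruby
require 'tempfile'

# Hypothetical class name; the original post does not show one.
class LocationStore
  attr_reader :store

  def initialize
    @store = {} # instance variable: each object gets its own hash
  end

  def read_file(path)
    # Block form: the file is closed when the block exits,
    # even if an exception is raised inside it.
    File.open(path, "r") do |f|
      f.each_line do |line|
        key, value = line.chomp.split(",", 2)
        @store[key] = value
      end
    end
    self
  end
end

# Usage with a throwaway file standing in for LocationCopy.csv
Tempfile.create("locations") do |tmp|
  tmp.write("a,b\nd,e\n")
  tmp.flush
  locations = LocationStore.new.read_file(tmp.path)
  p locations.store
end
```

Returning self from read_file is only a convenience so the call can be chained; it is not required by anything in the thread.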

    And you can make your life easier by using the CSV library. Then it
    becomes a one-liner:

    10:41:07 [~]: cat x
    a,b
    d,b;c

    10:41:08 [~]: ruby -r csv -r enumerator -e 'p CSV.to_enum(:open, "x",
    "r", ";").inject({}) {|h,(k,v)| h[k]=v; h}'
    {"a,b"=>nil, "d,b"=>"c"}

    10:41:32 [~]: ruby -r csv -r enumerator -e 'p CSV.to_enum(:open, "x",
    "r", ",").inject({}) {|h,(k,v)| h[k]=v; h}'
    {"a"=>"b", "d"=>"b;c"}

    CSV.foreach uses "," as default separator:

    10:41:49 [~]: ruby -r csv -r enumerator -e 'p CSV.to_enum(:foreach,
    "x").inject({}) {|h,(k,v)| h[k]=v; h}'
    {"a"=>"b", "d"=>"b;c"}

    Explanation: CSV.foreach yields every record to the block. By using
    to_enum (which is part of "enumerator") you can treat the CSV reader
    like any Enumerable. With #inject, a value is passed as the first
    parameter to the block, and the block's result is passed to the next
    invocation of the block. In this case the hash that is passed into
    #inject is simply handed on from call to call and is ultimately the
    result of #inject. "p" then prints it.
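
For readers on a current Ruby, here is a hedged sketch of the same idea. It assumes the csv library is installed, and the helper name csv_to_hash is made up for illustration. In the current CSV library the separator is given as the col_sep keyword rather than positionally, and CSV.foreach returns an Enumerator when called without a block, so to_enum is not needed.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical helper: build a hash from the first two columns of each record.
def csv_to_hash(path, col_sep: ",")
  # Without a block, CSV.foreach returns an Enumerator over the rows.
  # each_with_object passes the same hash to every call, so the block
  # does not need to return it (unlike the #inject version).
  CSV.foreach(path, col_sep: col_sep).each_with_object({}) do |(k, v), h|
    h[k] = v
  end
end

# Same two-line sample file as in the session above
Tempfile.create("x") do |tmp|
  tmp.write("a,b\nd,b;c\n")
  tmp.flush
  p csv_to_hash(tmp.path)
  p csv_to_hash(tmp.path, col_sep: ";")
end
```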

    Kind regards

    robert

    --
    Have a look: http://www.flickr.com/photos/fussel-foto/
     
    Robert Klemme, Mar 9, 2007
    #3
  4. Kev

    Kev Guest

    Excellent.

    Thank you Robert.
     
    Kev, Mar 9, 2007
    #4
  5. gga

    gga Guest

    Well, your code is more or less okay. It may be buggy in that you are
    also storing the \n (end of line) character. You probably need
    something like:
    @@store[temp[0]] = temp[1].chomp
    to remove it.
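
A small illustration of the difference, using a made-up sample line. Calling chomp on the whole line before splitting is a variation on the suggestion above, not the same code: it avoids calling chomp on nil when a line has no comma, which would raise NoMethodError.

```ruby
line = "London,UK\n" # hypothetical sample line

# Without chomp the value keeps the trailing newline.
raw = line.split(",")

# chomp the line first, then split.
clean = line.chomp.split(",")

p raw
p clean
```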

    You can skip the check for whether the file exists: if it does not, an
    Errno exception (Errno::ENOENT) is raised and propagates to the
    caller. Let the application, rather than your class, deal with what is
    probably a user error (naming a missing file).
    You can also avoid the explicit close by using the block form (Ruby
    closes the file for you when the block exits), and you can use
    IO.foreach (File.foreach) to iterate over the lines more easily.
    If you know your files will always fit in memory, you can read all
    the text into a string or array in a single go (this is usually
    called slurping), which can also speed things up a little in some
    cases.
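
A rough sketch of the "let the application deal with it" point. The method names load_pairs and load_or_report are hypothetical, and printing a warning is just one possible way for the caller to react.

```ruby
# Hypothetical helper: no existence check, so a missing file
# raises Errno::ENOENT from File.foreach.
def load_pairs(path)
  pairs = {}
  File.foreach(path) do |line|
    key, value = line.chomp.split(",", 2)
    pairs[key] = value
  end
  pairs
end

# The caller (the application) decides what a missing file means.
def load_or_report(path)
  load_pairs(path)
rescue Errno::ENOENT => e
  warn "cannot read #{path}: #{e.message}"
  nil
end

load_or_report("no-such-file.csv")
```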

    Here are some examples of doing the same thing written in different
    ways:


    require 'yaml'

    class ReaderYAML
      def initialize(file)
        # slurp the whole file into a string
        lines = File.read(file)
        # change each comma to ": " (YAML needs the space after the colon
        # to treat a line as a key/value mapping entry)
        lines.gsub!(/,/, ': ')
        # create the hash through YAML
        @h = YAML.load(lines)
      end
    end

    require 'csv'

    class ReaderCSV
      def initialize(file)
        # read the file as a CSV file, flatten the resulting array and
        # make it a hash
        @h = Hash[*(CSV.read(file).flatten)]
      end
    end

    class ReaderCommas
      def initialize(file)
        @h = {}
        # slurp the file into an array
        lines = File.readlines(file)
        # process each line
        lines.each { |line|
          key, value = line.chomp.split(',')
          @h[key] = value
        }
      end
    end

    class ReaderCommasBigFile
      def initialize(file)
        @h = {}
        File.foreach(file) do |line|
          key, val = line.chomp.split(',')
          @h[key] = val
        end
      end
    end

    h = ReaderYAML.new('csv.txt')
    p h

    h2 = ReaderCSV.new('csv.txt')
    p h2

    h3 = ReaderCommas.new('csv.txt')
    p h3

    h4 = ReaderCommasBigFile.new('csv.txt')
    p h4


    require 'benchmark'

    n = 5000
    Benchmark.bm(5) do |b|
      b.report('big')  { n.times { ReaderCommasBigFile.new('csv.txt') } }
      b.report('file') { n.times { ReaderCommas.new('csv.txt') } }
      b.report('csv')  { n.times { ReaderCSV.new('csv.txt') } }
      b.report('yaml') { n.times { ReaderYAML.new('csv.txt') } }
    end


    The YAML version does not do exactly the same thing as the others, but
    depending on your data, it might still be what you want. It works for
    files with one simple key/value pair per line. Although YAML involves
    a little more work, it is still fairly well optimized and
    automatically converts numeric data into the appropriate Ruby numeric
    class. CSV handles comma-separated files for you, although it is
    somewhat slow.
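
To make the difference concrete, here is a short sketch with made-up sample data. It assumes a current Ruby with the yaml and csv libraries available; note that YAML only sees a mapping when the colon is followed by a space.

```ruby
require 'yaml'
require 'csv'

text = "width,10\nname,box\n" # hypothetical sample data

# YAML: "width: 10" is parsed as a mapping entry with an Integer value.
from_yaml = YAML.load(text.gsub(",", ": "))

# CSV: every field stays a String unless converters are requested.
from_csv = CSV.parse(text).to_h

p from_yaml
p from_csv
```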

    Anyway, hope that gives you some ideas. Overall, unless you are
    dealing with huge files, you should not worry too much about speed
    while writing your class.
     
    gga, Mar 9, 2007
    #5
  6. Kev

    Kev Guest

    gga,

    Thank you for the code,
    I will go away and digest.

    Cheers,
    Kev
     
    Kev, Mar 9, 2007
    #6
