How to improve this code?

Jair Rillo Junior · Jan 23, 2008

Hi guys,
I am new to the Ruby world, coming from Java, and I would like to
"think" in Ruby instead of Java.

I wrote some code to read a CSV file (comma-separated), organize the
values and print the output.

Basically the CSV looks like this:
(e-mail address removed),value1
(e-mail address removed),value2
(e-mail address removed),value3
(e-mail address removed),value4
(e-mail address removed),value1
(e-mail address removed),value2

The output should be two lines:
(e-mail address removed),value1,value2,value3,value4
(e-mail address removed),value1,value2

My initial thought was to store the values in a Hash object, where the
key is the email (column A) and the value is an Array containing the
values (column B).
Going through all the lines, I test whether the email address already
exists in the Hash; if so I update the Array, otherwise I create a new
entry in the Hash.

The code follows:

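h = Hash.new
File.open("Sector_brand.csv").each_line do |lines|
  values = lines.split(",")
  email = values[0]
  content = values[1]
  if h.key?(email)
    # Email already seen: append to its existing array
    l = h[email]
    l.push content
    h[email] = l
  else
    # First occurrence: start a new array for this email
    l = [content]
    h[email] = l
  end
end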

I haven't included the code to print the Hash, and I didn't put the
code above in a class because it is just a test.

Could anyone look at my code and comment on it? How could it be
improved?

Thanks in advance.

Junior
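
The post leaves out the printing step. A minimal sketch of it, assuming the Hash maps each email to an Array of values as described above. The `format_groups` name and the example.com addresses are made up for illustration, since the real addresses were removed from the archive:

```ruby
# Hypothetical helper: turn {email => [values]} into "email,v1,v2" lines.
def format_groups(groups)
  groups.map { |email, values| ([email] + values).join(",") }
end

# Made-up sample data standing in for the parsed CSV.
groups = {
  "a@example.com" => ["value1", "value2", "value3", "value4"],
  "b@example.com" => ["value1", "value2"]
}

puts format_groups(groups)
```

Since Ruby 1.9 a Hash keeps insertion order, so the lines come out in the order the emails first appeared in the file.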

7stud -- · Jan 23, 2008


Radosław Bułat · Jan 23, 2008

Hm, I think that we, Ruby programmers, like "<<" (it's verbose and
less typing) above "push", so:

h[email] << content

--
Radosław Bułat

http://radarek.jogger.pl - my blog

John Carter · Jan 23, 2008

Hi guys,
I am new to the Ruby world, coming from Java, and I would like to
"think" in Ruby instead of Java.

I wrote some code to read a CSV file (comma-separated), organize the
values and print the output.

Probably the most useful first thought in Ruby is... "Nah, I bet
it's in the standard library somewhere, better check ruby-doc.org"

From the standard library module 'csv':

# Open a CSV formatted file for reading or writing.
#
# For reading.
#
# EXAMPLE 1
#   CSV.open('csvfile.csv', 'r') do |row|
#     p row
#   end
#
# EXAMPLE 2
#   reader = CSV.open('csvfile.csv', 'r')
#   row1 = reader.shift
#   row2 = reader.shift
#   if row2.empty?
#     p 'row2 not find.'
#   end
#   reader.close
#
# ARGS
#   filename: filename to parse.
#   col_sep: Column separator. ?, by default. If you want to separate
#     fields with semicolon, give ?; here.
#   row_sep: Row separator. nil by default. nil means "\r\n or \n". If you
#     want to separate records with \r, give ?\r here.
#
# RETURNS
#   reader instance. To get parse result, see CSV::Reader#each.
#
#
# For writing.
#
# EXAMPLE 1
#   CSV.open('csvfile.csv', 'w') do |writer|
#     writer << ['r1c1', 'r1c2']
#     writer << ['r2c1', 'r2c2']
#     writer << [nil, nil]
#   end
#
# EXAMPLE 2
#   writer = CSV.open('csvfile.csv', 'w')
#   writer << ['r1c1', 'r1c2'] << ['r2c1', 'r2c2'] << [nil, nil]
#   writer.close
#
# ARGS
#   filename: filename to generate.
#   col_sep: Column separator. ?, by default. If you want to separate
#     fields with semicolon, give ?; here.
#   row_sep: Row separator. nil by default. nil means "\r\n or \n". If you
#     want to separate records with \r, give ?\r here.
#
# RETURNS
#   writer instance. See CSV::Writer#<< and CSV::Writer#add_row to know how
#   to generate CSV string.
#

My initial thought was to store the values in a Hash object, where the
key is the email (column A) and the value is an Array containing the
values (column B).
Going through all the lines, I test whether the email address already
exists in the Hash; if so I update the Array, otherwise I create a new
entry in the Hash.

My favourite idiom is...
require 'set'
h = Hash.new{|hash,key| hash[key] = Set.new}

then in the loop..
values = lines.split(",")
email = values.shift
h(e-mail address removed)
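
The last line of the loop above was lost when the archive stripped e-mail addresses, so the following is only a sketch of how the Set idiom might be completed. It assumes the missing line added the remaining values to the set for that email; `group_unique` and the sample lines are made-up names and data:

```ruby
require 'set'

# Hypothetical helper: group the second column by the first, dropping repeats.
def group_unique(lines)
  # Each missing key gets its own empty Set on first access.
  h = Hash.new { |hash, key| hash[key] = Set.new }
  lines.each do |line|
    values = line.chomp.split(",")
    email = values.shift
    # Assumption: the lost line merged the remaining values into the set.
    h[email].merge(values)
  end
  h
end

# Made-up sample lines standing in for the file contents.
sample = ["a@example.com,value1\n", "a@example.com,value2\n", "a@example.com,value1\n"]
p group_unique(sample)["a@example.com"].to_a
```

Compared with an Array, a Set silently collapses repeated values for the same email, which may or may not be what the original poster wants.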

Clifford Heath · Jan 23, 2008

The others have shown you how to create a Hash with a block to provide
a default value. Another way to program this is to say:

h = {}
File.open("Sector_brand.csv").each_line do |lines|
  values = lines.split(",")
  # Create the array on first sight of this email, then append
  (h[values[0]] ||= []) << values[1]
end

...or the equivalent using one of the CSV libraries.

Clifford Heath.
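
One practical difference between the two styles is what happens when you merely look up a key that is not there. A small sketch, with `lookup_side_effects` and the address being made-up names for illustration:

```ruby
# Hypothetical demo: compare what a lookup of a missing key does in each style.
def lookup_side_effects
  with_block = Hash.new { |hash, key| hash[key] = [] }
  plain = {}

  with_block["nobody@example.com"]   # default block runs and stores an empty array
  plain["nobody@example.com"]        # returns nil and stores nothing

  [with_block.key?("nobody@example.com"), plain.key?("nobody@example.com")]
end

p lookup_side_effects
```

With the default block, reading a missing key adds it to the Hash; with a plain Hash and `||=`, entries are only created at the point where you append.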

botp · Jan 23, 2008

My initial thought was store the values into a Hash object, where the

my initial thought was just to output them plainly

my stupid example follows,

botp@pc4all:~$ cat test.rb
v0 = nil
File.open("test.txt").each_line do |lines|
  values = lines.chomp.split(",")
  if v0 != values[0]
    puts unless v0.nil?
    v0 = values[0]
    print v0
  end
  print ",", values[1]
end

botp@pc4all:~$ ruby test.rb
(e-mail address removed),value1,value2,value3,value4
(e-mail address removed),value1,value2

7stud -- · Jan 23, 2008

John said:
Probably the most useful first thought in Ruby is... "Nah, I bet
it's in the standard library somewhere, better check ruby-doc.org"

I disagree with that. In my experience if you can avoid the Ruby
Standard library, you will save yourself a lot of headaches because it
is so poorly documented, and your code will probably run faster as well.

From standard library module 'csv'

In particular, the csv module is so inefficient, James Gray wrote his
own module and called it fastercsv.

William James · Jan 23, 2008

awk -F, "{a[$1]=a[$1] FS $2} END{for(k in a)print k a[k]}" file

Stephane Wirtel · Jan 23, 2008

You can try this code:

#!/usr/bin/env ruby

require "csv"

# Each new key starts out with its own empty array
hash = Hash.new { |h, key| h[key] = [] }

# Ruby 1.8 csv API: path, mode, field separator
CSV.open( "file.csv", "r", "," ) do |row|
  hash[row[0]] << row[1]
end

Good luck

Stephane
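
The `CSV.open` call above uses the csv API from Ruby 1.8. On Ruby 1.9 and later the csv library has a different interface, and `CSV.foreach` is the usual way to read a file row by row. A rough equivalent under that assumption, where `group_csv` and the sample data are made up for illustration:

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: group column 2 by column 1 using the newer csv API.
def group_csv(path)
  groups = Hash.new { |hash, key| hash[key] = [] }
  # CSV.foreach yields each row as an array of fields.
  CSV.foreach(path) do |row|
    groups[row[0]] << row[1]
  end
  groups
end

# Write made-up sample data to a temporary file and group it.
Tempfile.create(["sample", ".csv"]) do |file|
  file.write("a@example.com,value1\na@example.com,value2\nb@example.com,value1\n")
  file.flush
  p group_csv(file.path)
end
```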

Jair Rillo Junior · Jan 23, 2008

Hey guys, so many ways to do it!

I didn't know about the csv library, nor about the << operator.

Thank you very much, guys!

Lee Jarvis · Jan 23, 2008

Jair said:
Hey guys, so many ways to do it!

I didn't know about the csv library, nor about the << operator.

Yeah, it's actually a method: Array#<<

It seems to be preferred over Array#push, although they both return the
array itself, so you can chain a series of appends together:

foo = [1,2] => [1, 2]
foo << 3 << 4 => [1, 2, 3, 4]
foo.push(5,6) => [1, 2, 3, 4, 5, 6]
foo.push(7).push(8) => [1, 2, 3, 4, 5, 6, 7, 8]

Regards,
Lee

David Morton · Jan 23, 2008

I disagree with that. In my experience if you can avoid the Ruby
Standard library, you will save yourself a lot of headaches because it
is so poorly documented, and your code will probably run faster as
well.

In particular, the csv module is so inefficient, James Gray wrote his
own module and called it fastercsv.

Most of the time, it's better to look for standard libraries or at
least good third party libraries rather than re-inventing the wheel
though. Most of the time, when one re-invents the wheel, one gets it
wrong.

David Morton
Maia Mailguard http://www.maiamailguard.com
(e-mail address removed)


7stud -- · Jan 23, 2008

David said:
Most of the time, it's better to look for standard libraries or at
least good third party libraries rather than re-inventing the wheel
though. Most of the time, when one re-invents the wheel, one gets it
wrong.

Looking for a standard library module so that you can split a string on
a comma is a ridiculous waste of time. At some point, you actually
have to be able to program something.

David Morton · Jan 23, 2008

Looking for a standard library module so that you can split a string on
a comma is a ridiculous waste of time. At some point, you actually
have to be able to program something.

If you think parsing a CSV file is as simple as splitting on a comma,
you need to think again.

Look up RFC 4180. It's not a hard format, but it *is* more than just
"foo,bar".split(',').

It's enough code that I'd rather use an existing library than to waste
a ridiculous amount of time doing it (correctly) myself.

David Morton
Maia Mailguard http://www.maiamailguard.com
(e-mail address removed)
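
To make the RFC 4180 point concrete, here is a small sketch comparing a plain split with the csv library on a field that contains a quoted comma. The sample line and the helper names `naive_fields` and `csv_fields` are made up for illustration:

```ruby
require "csv"

# Hypothetical helpers: two ways of breaking one line into fields.
def naive_fields(line)
  line.split(",")
end

def csv_fields(line)
  CSV.parse_line(line)
end

# A quoted field may legally contain the separator.
line = %q{"Doe, Jane",value1}

p naive_fields(line)  # three pieces, with stray quote characters
p csv_fields(line)    # two fields: "Doe, Jane" and "value1"
```

For data as simple as the sample in this thread the two give the same result; the difference only shows up once quoting, embedded separators or embedded newlines appear.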


Thomas Wieczorek · Jan 23, 2008

I have used both the standard CSV class and James' FasterCSV with Ruby
1.8.x; James' solution is much faster. If I'm not mistaken, FasterCSV
replaced the 1.8 class in Ruby 1.9.
Just `gem install fastercsv`, then google for it. You'll find a lot of
good explanations, and the docs were mostly good enough for me.

James Gray · Jan 23, 2008

If I'm not mistaken, FasterCSV replaced the 1.8 class in
Ruby 1.9.

Correct. In Ruby 1.9, when you `require "csv"` you are getting the
FasterCSV code under its new name.

James Edward Gray II
