How to improve this code?

Jair Rillo Junior

Hi guys,
I am new to the Ruby world, coming from Java, and I would like to
"think" in Ruby instead of Java.

I wrote some code to read a CSV file (comma-separated), organize the
values and print the output.

Basically the CSV looks like this:
(e-mail address removed),value1
(e-mail address removed),value2
(e-mail address removed),value3
(e-mail address removed),value4
(e-mail address removed),value1
(e-mail address removed),value2

The output should be two lines:
(e-mail address removed),value1,value2,value3,value4
(e-mail address removed),value1,value2

My initial thought was to store the values in a Hash object, where the
key is the email (column A) and the value is an Array containing the
values (column B).
Going through all the lines, I test whether the email address already
exists in the Hash; if so I update the Array, otherwise I create a new
entry in the Hash.

The code follows below:

h = Hash.new
File.open("Sector_brand.csv").each_line do |lines|
  values = lines.split(",")
  email = values[0]
  content = values[1]
  if h.key?(email)
    l = h[email]
    l.push content
    h[email] = l
  else
    l = [content]
    h[email] = l
  end
end

I didn't include the code to print the Hash. I also didn't put the code
above in a class because it is just a test.

Well guys, could anyone look at my code and give comments? How could the
code above be improved?

Thanks in advance.

Junior
 
Radosław Bułat

Hm, I think that we Ruby programmers prefer "<<" (it's verbose and
less typing) to "push", so:

h[email] << content

--
Radosław Bułat

http://radarek.jogger.pl - my blog
 
John Carter

Hi guys,
I am new to the Ruby world, coming from Java, and I would like to
"think" in Ruby instead of Java.

I wrote some code to read a CSV file (comma-separated), organize the
values and print the output.

Probably the most useful first thought in Ruby is... "Nah, I bet
it's in the standard library somewhere, better check ruby-doc.org"

From standard library module 'csv'

# Open a CSV formatted file for reading or writing.
#
# For reading.
#
# EXAMPLE 1
#   CSV.open('csvfile.csv', 'r') do |row|
#     p row
#   end
#
# EXAMPLE 2
#   reader = CSV.open('csvfile.csv', 'r')
#   row1 = reader.shift
#   row2 = reader.shift
#   if row2.empty?
#     p 'row2 not find.'
#   end
#   reader.close
#
# ARGS
#   filename: filename to parse.
#   col_sep: Column separator. ?, by default. If you want to separate
#     fields with semicolon, give ?; here.
#   row_sep: Row separator. nil by default. nil means "\r\n or \n". If you
#     want to separate records with \r, give ?\r here.
#
# RETURNS
#   reader instance. To get parse result, see CSV::Reader#each.
#
#
# For writing.
#
# EXAMPLE 1
#   CSV.open('csvfile.csv', 'w') do |writer|
#     writer << ['r1c1', 'r1c2']
#     writer << ['r2c1', 'r2c2']
#     writer << [nil, nil]
#   end
#
# EXAMPLE 2
#   writer = CSV.open('csvfile.csv', 'w')
#   writer << ['r1c1', 'r1c2'] << ['r2c1', 'r2c2'] << [nil, nil]
#   writer.close
#
# ARGS
#   filename: filename to generate.
#   col_sep: Column separator. ?, by default. If you want to separate
#     fields with semicolon, give ?; here.
#   row_sep: Row separator. nil by default. nil means "\r\n or \n". If you
#     want to separate records with \r, give ?\r here.
#
# RETURNS
#   writer instance. See CSV::Writer#<< and CSV::Writer#add_row to know how
#   to generate CSV string.
#

My initial thought was to store the values in a Hash object, where the
key is the email (column A) and the value is an Array containing the
values (column B).
Going through all the lines, I test whether the email address already
exists in the Hash; if so I update the Array, otherwise I create a new
entry in the Hash.

My favourite idiom is...
require 'set'
h = Hash.new{|hash,key| hash[key] = Set.new}

then in the loop..
values = lines.split(",")
email = values.shift
h(e-mail address removed)
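
The last line of the loop above is cut off in this copy of the thread. What follows is a minimal sketch of the idiom, assuming the intent was to add the remaining fields to the Set stored for that email. The sample lines and the `example.com` addresses are made up for illustration, and `merge` is only one plausible way to finish the line.

```ruby
require 'set'

# Hypothetical sample data standing in for the lines of the CSV file.
lines = [
  "a@example.com,value1\n",
  "a@example.com,value2\n",
  "b@example.com,value1\n",
  "a@example.com,value1\n" # repeated pair: a Set stores it only once
]

# The block runs the first time a missing key is read and stores a new
# Set under that key, so the loop never needs to test h.key?(email).
h = Hash.new { |hash, key| hash[key] = Set.new }

lines.each do |line|
  values = line.chomp.split(",")
  email = values.shift     # first column; values now holds the rest
  h[email].merge(values)   # assumed completion: add the remaining fields
end

# Print one line per email, followed by its values.
h.each { |email, set| puts [email, *set].join(",") }
```

A Set silently drops repeated values. If duplicates should be kept, `[]` in place of `Set.new` (and `concat` in place of `merge`) gives the Array version of the same idiom.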
 
Clifford Heath

The others have shown you how to create a Hash with a block to provide
a default value. Another way to program this is to say:

h = {}  # start with an empty Hash
File.open("Sector_brand.csv").each_line do |lines|
  values = lines.split(",")
  (h[values[0]] ||= []) << values[1]
end

...or the equivalent using one of the CSV libraries.

Clifford Heath.
 
botp

My initial thought was to store the values in a Hash object, where the...

my initial thought was just to output them plainly :)
my stupid example follows,

botp@pc4all:~$ cat test.rb
v0 = nil
File.open("test.txt").each_line do |lines|
  values = lines.chomp.split(",")
  if v0 != values[0]
    puts unless v0.nil?
    v0 = values[0]
    print v0
  end
  print ",", values[1]
end

botp@pc4all:~$ ruby test.rb
(e-mail address removed),value1,value2,value3,value4
(e-mail address removed),value1,value2
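
The streaming approach above keeps nothing in memory, but it only gives the desired output when all rows for one address are adjacent in the file, as they are in the sample. Under that same assumption, here is a sketch using `Enumerable#chunk` (available in Ruby 1.9.2 and later). The sample lines and addresses are hypothetical.

```ruby
# Hypothetical sample data; rows for the same address are adjacent.
lines = [
  "a@example.com,value1\n",
  "a@example.com,value2\n",
  "b@example.com,value1\n"
]

rows = lines.map { |line| line.chomp.split(",") }

# chunk groups consecutive elements for which the block returns the
# same value, yielding [key, array_of_elements] pairs.
output = rows.chunk { |row| row[0] }.map do |email, group|
  [email, *group.map { |row| row[1] }].join(",")
end

puts output
```

If the rows for one address can be scattered through the file, the Hash-based versions elsewhere in the thread are the safer choice.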
 
7stud --

John said:
Probably the most useful first thought in Ruby is... "Nah, I bet
it's in the standard library somewhere, better check ruby-doc.org"

I disagree with that. In my experience if you can avoid the Ruby
Standard library, you will save yourself a lot of headaches because it
is so poorly documented, and your code will probably run faster as well.

From standard library module 'csv'

In particular, the csv module is so inefficient, James Gray wrote his
own module and called it fastercsv.
 
William James

awk -F, "{a[$1]=a[$1] FS $2} END{for(k in a)print k a[k]}" file
 
Stephane Wirtel

You can try this code:

#!/usr/bin/env ruby

require "csv"

hash = Hash.new { |h, key| h[key] = [] }

CSV.open("file.csv", "r", ",") do |row|
  hash[row[0]] << row[1]
end
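
The three-argument `CSV.open` call above is the Ruby 1.8 interface. In Ruby 1.9 and later, where `csv` is the FasterCSV-derived library, the block form of `CSV.open` yields the CSV object rather than each row, so the loop would need to change. A rough equivalent for newer Rubies, as a sketch only: it parses an in-memory string of made-up sample data so that it is self-contained, and `CSV.foreach("file.csv")` would be the file-reading counterpart.

```ruby
require "csv"

# Made-up sample data; the addresses are placeholders.
data = <<~CSV_TEXT
  a@example.com,value1
  a@example.com,value2
  b@example.com,value1
CSV_TEXT

grouped = Hash.new { |h, key| h[key] = [] }

# CSV.parse yields each row as an Array of fields.
CSV.parse(data) do |row|
  grouped[row[0]] << row[1]
end

# Print one line per email, followed by its values.
grouped.each do |email, values|
  puts [email, *values].join(",")
end
```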

Good luck

Stephane
 
Jair Rillo Junior

Hey guys, so many ways to do it!

I didn't know about the csv library, nor about the <<
operator.

Thank you very much guys!!
 
Lee Jarvis

Jair said:
Hey guys, so many ways to do it!

I didn't know about the csv library, nor about the <<
operator.

Yeah, it's actually a method: Array#<<

It seems to be preferred over Array#push, although they both return the
array itself, so you can chain a series of appends together:
foo = [1,2] => [1, 2]
foo << 3 << 4 => [1, 2, 3, 4]
foo.push(5,6) => [1, 2, 3, 4, 5, 6]
foo.push(7).push(8) => [1, 2, 3, 4, 5, 6, 7, 8]


Regards,
Lee
 
David Morton

I disagree with that. In my experience if you can avoid the Ruby
Standard library, you will save yourself a lot of headaches because it
is so poorly documented, and your code will probably run faster as
well.


In particular, the csv module is so inefficient, James Gray wrote his
own module and called it fastercsv.

Most of the time, it's better to look for standard libraries or at
least good third party libraries rather than re-inventing the wheel
though. Most of the time, when one re-invents the wheel, one gets it
wrong.



David Morton
Maia Mailguard http://www.maiamailguard.com
 
7stud --

David said:
Most of the time, it's better to look for standard libraries or at
least good third party libraries rather than re-inventing the wheel
though. Most of the time, when one re-invents the wheel, one gets it
wrong.

Looking for a standard library module so that you can split a string on
a comma is a ridiculous waste of time. At some point, you actually
have to be able to program something.
 
David Morton

Looking for a standard library module so that you can split a string on
a comma is a ridiculous waste of time. At some point, you actually
have to be able to program something.



If you think parsing a CSV file is as simple as splitting on a comma,
you need to think again.

Look up RFC 4180. It's not a hard format, but it *is* more than just
"foo,bar".split(',').

It's enough code that I'd rather use an existing library than waste
a ridiculous amount of time doing it (correctly) myself.
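
To make the point concrete, here is a small sketch comparing a plain `split(",")` with the standard library parser on a line that uses quoting. The sample line is made up, and `CSV.parse_line` is the interface of the `csv` library in Ruby 1.9 and later.

```ruby
require "csv"

# A field containing a comma must be quoted, and a literal double quote
# inside a quoted field is written as two double quotes (RFC 4180).
line = %q{a@example.com,"Smith, John","said ""hi"""}

naive  = line.split(",")       # splits inside the quoted field as well
parsed = CSV.parse_line(line)  # honours the quoting rules

p naive.size # 4
p parsed     # ["a@example.com", "Smith, John", "said \"hi\""]
```

The naive split yields four pieces for a three-field record and leaves the quote characters in the data.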


David Morton
Maia Mailguard http://www.maiamailguard.com
 
Thomas Wieczorek

I used the standard CSV class and James' FasterCSV with Ruby 1.8.x.
James' solution is much faster. If I'm not mistaken, FasterCSV
replaced the 1.8 class in Ruby 1.9.
Just gem install fastercsv, then google for it. You'll find a lot of
good explanations, and the docs were mostly good enough for me.
 
James Gray

If I'm not mistaken, FasterCSV replaced the 1.8 class in
Ruby 1.9.

Correct. In Ruby 1.9, when you `require "csv"` you are getting the
FasterCSV code under its new name.

James Edward Gray II
 
