changing the format of a text file


Bary Buz

Hello everyone,

I am new to Ruby and I'm having some problems trying to reformat a text
file.

Basically, I have a large log file, around 200 MB, in the
following format:
----------------------------------------------------------
1000000 name
Status :A
Basetype :2
Version :1.0
|
|
(more
fields)
|
Name :/file/name/etc
1000001 name
Status :B
Basetype :2
Version :a20
|
|
Name :/file/name/etc
1000002 name
Status :C
|

... and so on


So each 200 MB file contains a lot of entries.

What I want to do is open the file, read the data into an array,
reformat the text and save it into another file with the following
output:

id, Status, Basetype, .... , Name
1000000, A, 2, ..... , /file/name/etc
1000001, B, 2, ..... , /file/name/etc

I tried to write a script in Ruby to do that, but I don't get any
output so far.

def getfile(file_name)
entry = []
IO.foreach(file_name) do |fl|
if fl.include? 'name'
entry.push fl.scan(/\d+/)[0]
elsif fl.strip =~ /\A\d/
end
end
entry
end

def writefile(file, *linedata)
linedata.each do |line|
file << line.join(", ") +\n"
end
end

def readfile(file, outputfile)
out = File.new(outputfile, "w+")
info = []

wline = ['id', 'Status', 'Basetype', .... 'Name']

IO.foreach(file) { |line|

if line =~ //
wline[0]= line.scan(/\d+/)
elsif line =~ /Status/
wline[1]= line.split(":")[1].scan(/[a-zA-Z]+/).join("")
elsif line =~ /Basetype/
wline[2]= line.split(":")[1].scan(/\d+/).join("")
|
|
|
wline all fields
|
writefile(out, wline)
end
out.close
end

readfile('filename', 'outputfile')


This is what I've done so far. Can someone tell me what's wrong? I don't
get any output at all.

Thanks in advance
 

James Coglan

2009/2/25 Bary Buz said:
What i want to do is to open the file, read the data into an array,
reformat the text and save it into another file with the following
output:

id, Status, Basetype, .... , Name
1000000, A, 2, ..... , /file/name/etc
1000001, B, 2, ..... , /file/name/etc



I would strongly recommend looking at Treetop (http://treetop.rubyforge.org/).
It's a parser generator that produces tree structures from text files using
a grammar that you specify. If you know regular expressions, it shouldn't be
too big a leap to use Treetop's grammar language.

For this particular task it may be overkill, but certainly worth looking at.
 

James Gray

Hello everyone,

Hello and welcome.
i am new to ruby and im having some problems trying to reformat a text
file.

Basically, i have a large log file which is around 200mb in the
following format:
----------------------------------------------------------
1000000 name
Status :A
Basetype :2
Version :1.0
id, Status, Basetype, .... , Name
1000000, A, 2, ..... , /file/name/etc
1000001, B, 2, ..... , /file/name/etc

Do you just read the log file, replacing variables holding Status,
Basetype, Version, and Name, then spit out a new entry each time you
run across a number?
i tried to write a script in ruby to do that task but i dont get any
output so far.

I'll try to give some feedback…

def getfile(file_name)
entry = []
IO.foreach(file_name) do |fl|
if fl.include? 'name'
entry.push fl.scan(/\d+/)[0]
elsif fl.strip =~ /\A\d/
end
end
entry
end

I don't see this method used anywhere in the code.
def writefile(file, *linedata)
linedata.each do |line|
file << line.join(", ") +\n"

You are missing a quote there. It should be:

… + "\n"
end
end
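
For reference, here is a minimal sketch of the method with the quote restored. The StringIO object is only a stand-in for the real output file so the snippet can be run on its own, and the sample row is made up for illustration.

```ruby
require "stringio"

# Write each row (an array of fields) as one comma-separated line.
# `+` binds tighter than `<<`, so the newline is appended to the
# joined string before the whole thing is written to the file.
def writefile(file, *linedata)
  linedata.each do |line|
    file << line.join(", ") + "\n"
  end
end

out = StringIO.new                      # stand-in for File.new(outputfile, "w+")
writefile(out, ["1000000", "A", "2"])   # hypothetical sample row
print out.string
```

Because of the splat in the signature, calling writefile(out, wline) with a single array writes exactly one line.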

def readfile(file, outputfile)
out = File.new(outputfile, "w+")
info = []

wline = ['id', 'Status', 'Basetype', .... 'Name']

IO.foreach(file) { |line|

if line =~ //

Don't do that. It doesn't do what you think it does. :)

What are you looking for here? A line that starts with a digit? If
so, use this:

if line =~ /\A\s*(\d+)/
# the digit is in the $1 variable here...
wline[0]= line.scan(/\d+/)
elsif line =~ /Status/
wline[1]= line.split(":")[1].scan(/[a-zA-Z]+/).join("")

The above two lines can be simplified to:

elsif line =~ /\A\s*Status\s*:\s*([a-zA-Z]+)/
wline[1] = $1

The other assignments could be handled in a similar way.
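
As a rough sketch of how the capture-group approach might look once a couple of branches are filled in: the sample lines below are taken from the format in the question, and the Basetype pattern is my own assumption modelled on the Status one.

```ruby
# Sample lines in the format shown in the question.
lines = ["1000000 name", "Status :A", "Basetype :2"]
wline = []

lines.each do |line|
  if line =~ /\A\s*(\d+)/
    wline[0] = $1                 # leading number is the id
  elsif line =~ /\A\s*Status\s*:\s*([a-zA-Z]+)/
    wline[1] = $1                 # letters after "Status :"
  elsif line =~ /\A\s*Basetype\s*:\s*(\d+)/
    wline[2] = $1                 # digits after "Basetype :" (assumed pattern)
  end
end

puts wline.join(", ")
```

Each branch both tests the line and extracts the value, so the separate split/scan/join step is no longer needed.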
elsif line =3D~ /Basetype/
wline[2]= line.split(":")[1].scan(/\d+/).join("")
|
|
|
wline all fields
|
writefile(out, wline)
end
out.close
end

readfile('filename', 'outputfile')


this is what ive done so far, can someone tell me whats wrong and i dont
get any output at all..

It's not real easy for me to tell why you don't see output. It looks
like output might only happen in that last elsif. If that's the
case, you won't see output unless the code makes it there. I'm
guessing it's not. Maybe because of the line =~ // condition, which
is problematic.
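
To illustrate why that condition is a likely culprit, here is a small sketch. In Ruby an empty regex matches at position 0 of any string, and 0 is truthy, so the first branch is taken for every line and the elsif branches are never reached. The branch_for helper is hypothetical and exists only for this demonstration.

```ruby
# An empty regex matches at offset 0 of any string, even an empty one.
samples = ["1000000 name", "Status :A", ""]
results = samples.map { |s| s =~ // }
p results

# Hypothetical helper mirroring the shape of the if/elsif chain.
def branch_for(line)
  if line =~ //          # always truthy, because 0 is truthy in Ruby
    :first
  elsif line =~ /Status/
    :status
  else
    :none
  end
end

p branch_for("Status :A")   # the Status branch is never reached
```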

I believe the code below does something like what you want. I hope it
can be adapted to your needs.

James Edward Gray II

#!/usr/bin/env ruby -wKU

fields = ["id"]
fields_written = false
entry = { }

DATA.each do |line|
  case line
  when /\A\s*(\d+)/
    unless entry.empty?
      unless fields_written
        puts fields.join(", ")
        fields_written = true
      end
      puts fields.map { |f| entry[f] }.join(", ")
      entry.clear
    end
    entry["id"] = $1
  when /\A\s*([a-zA-Z]+)\s*:\s*(\S+)/
    fields << $1 unless fields.include? $1
    entry[$1] = $2
  end
end

__END__
1000000 name
Status :A
Basetype :2
Version :1.0
Name :/file/name/etc
1000001 name
Status :B
Basetype :2
Version :a20
Name :/file/name/etc
1000002 name
Status :C
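
One possible way to adapt this to real files, offered only as a sketch. The convert method and its parameters are names I made up for illustration. It assumes the first record contains every field name, since the header is written once, from the fields seen so far. A loop that prints a record only when the next id line arrives also needs one more flush after the loop, or the last record is never written, so the sketch adds that.

```ruby
require "stringio"

# Read records line by line from `input` and write one comma-separated
# line per record to `output`. Both can be any IO-like object.
def convert(input, output)
  fields = ["id"]
  entry = {}
  header_written = false

  # Write the pending record, preceded by the header the first time.
  flush = lambda do
    unless entry.empty?
      unless header_written
        output.puts fields.join(", ")
        header_written = true
      end
      output.puts fields.map { |f| entry[f] }.join(", ")
      entry.clear
    end
  end

  input.each_line do |line|
    case line
    when /\A\s*(\d+)/
      id = $1
      flush.call            # a new id line closes the previous record
      entry["id"] = id
    when /\A\s*([a-zA-Z]+)\s*:\s*(\S+)/
      fields << $1 unless fields.include?($1)
      entry[$1] = $2
    end
  end
  flush.call                # write the final record
end

sample = <<~LOG
  1000000 name
  Status :A
  Basetype :2
  Name :/file/name/etc
  1000001 name
  Status :B
  Basetype :2
  Name :/file/name/etc
LOG

out = StringIO.new
convert(StringIO.new(sample), out)
print out.string
```

With real files, the same method could be called as File.open('filename') { |inp| File.open('outputfile', 'w') { |o| convert(inp, o) } }. Reading with each_line keeps only one record in memory at a time, which matters for a 200 MB log.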
 
