ruby noob

mosfet

Hi,

I would like to parse a very simple HTML file (index_msg.htm), shown
below:

<tr>
<td>WM_ACTIVATE</td>
<td>0x0006</td>
<td></td>
<td>0x0000</td>
<td>WM_NULL</td>
</tr>
<tr>
<td>WM_ACTIVATEAPP</td>
<td>0x001C</td>
<td></td>
<td>0x0001</td>
<td>WM_CREATE</td>
</tr>
....
I would like to parse this file and extract the information like this:

enum foo
{
eWM_ACTIVATE = 0x0006,
eWM_ACTIVATEAPP = 0x0001,
...
};

I started with this:


fileIn = File.open("C:/WIKI_CE/index_msg.htm", "r")
fileOut = File.new("C:/WIKI_CE/enumWmMsg.h", "w")

begin
  while (line = fileIn.readline)
    line.chomp
    $stdout.print line
  end
rescue EOFError
  fileIn.close
  fileOut.close
end

But now I am stuck. Should I use a regex, or how can I compare two strings?
 
Peter Szinek

This should get you started:

=====================================================================
require 'rubygems'
require 'scrubyt'

data = Scrubyt::Extractor.define do
  fetch('input.html')

  record do
    var_name 'WM_ACTIVATE'
    code '0x0006'
  end
end

result = data.to_xml.to_s
names = result.scan(/var_name>(.+?)<\/var_name/).flatten
values = result.scan(/code>(.+?)<\/code/).flatten
pairs = names.zip(values)

pairs.each do |name, value|
  puts "e#{name} = #{value}"
end
=====================================================================

The XML-to-array code kind of sucks. In the next version of scRUBYt! you
will be able to output the result directly to a hash (or CSV or YAML or
some other, more friendly format for such a task).
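
Until then, a minimal sketch of going from the XML text to a hash with plain Ruby could look like the following. The sample XML and the method name records_to_hash are assumptions made for illustration; the real scRUBYt! output may be laid out differently, and this only relies on the var_name and code elements that the scan calls above already expect.

```ruby
# Hypothetical sample of the XML text; the actual scRUBYt! output may differ.
xml = <<~XML
  <root>
    <record>
      <var_name>WM_ACTIVATE</var_name>
      <code>0x0006</code>
    </record>
    <record>
      <var_name>WM_ACTIVATEAPP</var_name>
      <code>0x001C</code>
    </record>
  </root>
XML

# Scan name and code together so each name stays paired with its own code,
# then turn the resulting [name, code] pairs into a Hash.
def records_to_hash(xml)
  xml.scan(%r{<var_name>(.+?)</var_name>\s*<code>(.+?)</code>}m).to_h
end

message_codes = records_to_hash(xml)
message_codes.each { |name, value| puts "e#{name} = #{value}," }
```

Matching both elements in one pattern avoids the two separate scans and the zip, so a record with a missing code cannot shift the pairing.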

Cheers,
Peter
__
http://www.rubyrailways.com :: Ruby and Web2.0 blog
http://scrubyt.org :: Ruby web scraping framework
http://rubykitchensink.ca/ :: The indexed archive of all things Ruby
 
Jan Svitok

1. Have a look at hpricot.
2. If that's too big for you, use a regex with the /m flag and String#scan:

REGEX = /<tr>\s*
  <td>(.*?)<\/td>\s*
  <td>(.*?)<\/td>\s*
  <td>(.*?)<\/td>\s*
  <td>(.*?)<\/td>\s*
  <td>(.*?)<\/td>\s*
<\/tr>/xm

file_in = File.read("C:/WIKI_CE/index_msg.htm")
File.open("C:/WIKI_CE/enumWmMsg.h", "w") do |file_out|
  file_in.scan(REGEX) do
    file_out.puts $1, $2, $3, $4, $5
  end
end

Notes:
1. we_use_snake_case_for_variable_names
2. Use File.open with a block to close the file automatically
3. You'll have the values in $1..$5
4. It seems you are inconsistent: for WM_ACTIVATE you took the value from
the second cell (0x0006), but for WM_ACTIVATEAPP from the fourth (0x0001).

In any case, Peter's approach will be easier, and more stable.
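
For reference, here is a rough standard-library sketch that goes all the way to the enum text. It assumes the value wanted is always the second cell (the question is ambiguous on this, see note 4), and the names ROW and enum_from_html are made up for illustration.

```ruby
# One <tr> with five <td> cells; only the first two cells are captured.
# The \s* between tags tolerates the line breaks in the input, and the
# x flag lets the pattern itself be spread over several lines.
ROW = %r{
  <tr>\s*
  <td>(.*?)</td>\s*
  <td>(.*?)</td>\s*
  <td>.*?</td>\s*
  <td>.*?</td>\s*
  <td>.*?</td>\s*
  </tr>
}xm

# Build the C enum text from the HTML, one "eNAME = CODE," line per row.
# Assumption: the code is taken from the second cell of every row.
def enum_from_html(html, enum_name = "foo")
  entries = html.scan(ROW).map { |name, code| "  e#{name} = #{code}," }
  (["enum #{enum_name}", "{"] + entries + ["};"]).join("\n")
end

sample = <<~HTML
  <tr>
  <td>WM_ACTIVATE</td>
  <td>0x0006</td>
  <td></td>
  <td>0x0000</td>
  <td>WM_NULL</td>
  </tr>
  <tr>
  <td>WM_ACTIVATEAPP</td>
  <td>0x001C</td>
  <td></td>
  <td>0x0001</td>
  <td>WM_CREATE</td>
  </tr>
HTML

puts enum_from_html(sample)
```

To write the header instead of printing it, something like File.write("C:/WIKI_CE/enumWmMsg.h", enum_from_html(File.read("C:/WIKI_CE/index_msg.htm"))) should do. A regex like this only holds up while every row has exactly five plain td cells, which is why a real HTML parser is the more stable choice.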
 
