Table-driven dispatch

  • Thread starter James B Crigler

James B Crigler

I need to read a file into memory and parse it for information on
demand as a kind of ad hoc query mechanism. (The file is
a non-tabular report from a database.)

Though there are other field types, for the time being, I only
want to consider fields that appear with a tag on a single line,
e.g.,

Title: A Tale Of Two Cities

Author: Charles Dickens

There are various classes of files representing different object
types as reported from the database. The tags, even for fields
like title and author, are not consistent among report types. (I
don't have control of the database, so I can't fix it.)

My original implementation approximated the code you'll find below.

There are some annoyances in the code:

1. The deferred lookups have to be written individually (def
title, def author, etc.), though they only have to be written
once if they belong in the base class.

2. I pass the variable into find_tagged_one_line to see whether
it already has a value, but I can't assign it there.

Is there a way, e.g., with method_missing, to create
a table-driven caching dispatch on the field names that performs
assignment in the find_... method?


class Literature
  Tags = {
    'title' => %r[^Title:\s*],
    'author' => %r[^Author:\s*],
    # etc
  }

  def initialize(path)
    @text = Array.new
    @tags = Tags
    @title = nil
    @author = nil
    if File.exists? path
      File.open(path).each do |line|
        @text.push line.chomp
      end
    # else
    #   raise NoSuchLiterature
    end
  end

  def find_tagged_one_line(re, var)
    return var unless var.nil?   # Got a value? Return it
    m = nil
    line = @text.find { |l| m = re.match(l) }   # Match RE in array of lines
    return "" if line.nil?       # No match?
    return line[m.end(0) .. -1]  # Return part of line after tag
  end

  def title
    @title = find_tagged_one_line(@tags['title'], @title)
  end

  def author
    @author = find_tagged_one_line(@tags['author'], @author)
  end

  # etc
end

class Novel < Literature

  Tags = {
    'title' => %r[^Novel Title:\s*],
    'author' => %r[^Novel Author:\s*],
    'year' => %r[^Year of publication:\s*],
    # etc
  }

  def initialize(path)
    super(path)
    @year = nil
    @tags = Tags   # Override the tags hash
  end

  def year
    @year = find_tagged_one_line(@tags['year'], @year)
  end
end
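
For readers wondering what the table-driven caching dispatch asked about above might look like, here is a minimal sketch. It is not code from the thread: the symbol-keyed TAGS tables, the @cache hash, and passing an array of lines instead of a path are all assumptions made to keep the example self-contained.

```ruby
# Hypothetical sketch: one tag table per class; method_missing does the
# lookup and caches each field the first time it is requested.
class Literature
  TAGS = {
    title:  /^Title:\s*/,
    author: /^Author:\s*/
  }.freeze

  # `lines` is an array of already-chomped lines (file reading is left out).
  def initialize(lines)
    @text  = lines
    @cache = {}
  end

  # Resolves to the subclass's TAGS constant when the subclass defines one.
  def tags
    self.class::TAGS
  end

  def method_missing(name, *args)
    return super unless args.empty? && tags.key?(name)
    # Search once and remember the result; a miss is cached as "".
    @cache.fetch(name) { @cache[name] = find_tagged_one_line(tags[name]) }
  end

  def respond_to_missing?(name, include_private = false)
    tags.key?(name) || super
  end

  private

  # Returns the text after the first matching tag, or "" if none matches.
  def find_tagged_one_line(re)
    @text.each do |line|
      m = re.match(line)
      return m.post_match if m
    end
    ""
  end
end

class Novel < Literature
  TAGS = {
    title:  /^Novel Title:\s*/,
    author: /^Novel Author:\s*/,
    year:   /^Year of publication:\s*/
  }.freeze
end

novel = Novel.new([
  "Novel Title: A Tale Of Two Cities",
  "Novel Author: Charles Dickens",
  "Year of publication: 1859"
])
puts novel.title
puts novel.year
```

With this shape, adding a field means adding one table entry, and the assignment happens in a single place instead of in every accessor.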
--
James B Crigler Voice: (770)494-2077
C-27J Software Requirements Manager Fax: (770)494-3886
Lockheed Martin Aeronautics Company
D/6B3M Z/0100
86 South Cobb Drive
Marietta GA 30063
"I say, Jerry! You'd be in a Blazing bad way if recalling to life
was to become the fashion, Jerry!" -- A Tale of Two Cities
 

Robert Klemme

James B Crigler said:
if File.exists? path
File.open(path).each do |line|
@text.push line.chomp
end

Btw, you don't close the file handle properly here.

How about:

class TaggedItem
  def initialize(tags)
    @tags = tags      # maps tag text in the file => field name
    @values = {}
  end

  # Store value under the field name for tag; unknown tags are ignored.
  def push(tag, value)
    (key = get_key tag) and @values[key] = value
  end

  def [](key) @values[key] end
  def value?(key) @values.has_key? key end

  # Treat any argument-less unknown method as a field lookup.
  def method_missing(s, *a)
    super unless a.empty?
    @values[s]
  end

  private
  def get_key(tag) @tags[tag] end
end


lit = TaggedItem.new(
  "Title" => :title,
  "Author" => :author
)

nov = TaggedItem.new(
  "Novel Title" => :title,
  "Novel Author" => :author,
  "Year of publication" => :year
)

all = [lit, nov]

while ( line = gets )
  line.chomp!
  if %r{^([^:]+):\s*(.*)$} =~ line
    all.each {|x| x.push( $1, $2 )}
  end
end

p all
p lit.author

Of course you can do all sorts of changes here, for example, you could
inherit OpenStruct and thus get rid of @values.
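
The OpenStruct variant mentioned above could look roughly like the following. This is a sketch under assumptions, not Robert's code: it keeps the tag table in an instance variable next to OpenStruct's own storage, and it feeds lines from an array rather than from gets so the example runs on its own.

```ruby
require 'ostruct'

# Hypothetical variant: OpenStruct stores the values and supplies the
# accessors, so @values and method_missing are no longer needed.
class TaggedItem < OpenStruct
  def initialize(tags)
    super()           # start with no fields
    @tags = tags      # maps tag text in the file => field name
  end

  # Store value under the field name for tag; unknown tags are ignored.
  def push(tag, value)
    key = @tags[tag]
    self[key] = value if key
  end
end

LINE = %r{^([^:]+):\s*(.*)$}

lit = TaggedItem.new({ "Title" => :title, "Author" => :author })

[
  "Title: A Tale Of Two Cities",
  "Author: Charles Dickens",
  "Analysis:"
].each do |line|
  m = LINE.match(line)
  lit.push(m[1], m[2]) if m
end

p lit.author
```

A field that was never pushed simply reads as nil, which is OpenStruct's normal behaviour for unknown getters.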

Kind regards

robert
 

James B Crigler

Robert said:
Btw, you don't close the file handle properly here.

Unless something has changed in 1.8, Pickaxe1 says:

With no associated block, open is a synonym for File.new. If
the optional code block is given, it will be passed file as
an argument, and the file will automatically be closed when
the block terminates. In this instance, File.open returns nil.

I'm still looking at the rest of your solution. I'll get back to
the newsgroup soon.

James B Crigler

I apologize for my previous post. It occurred to me that Robert might
be right, and of course he is. The file is closed automatically only
when File.open itself is given a block (per Pickaxe1); my code passed
its block to each, not to open. Sorry.

FWIW, the correct form would be

File.open(path) do |file|
  file.each do |line|
    @text.push line.chomp
  end
end

I.e., I need another level of nesting.
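
As an aside for readers on a current Ruby: to the best of my knowledge, File.readlines with chomp: true (Ruby 2.4 and later) reads and chomps every line and closes the handle itself, so the extra nesting is not required. The read_lines helper and the temporary file below are hypothetical, there only to make the sketch runnable.

```ruby
require 'tempfile'

# Hypothetical helper: return all lines, chomped, leaving no open handle.
def read_lines(path)
  File.readlines(path, chomp: true)
end

# Write a small sample report to a temporary file and read it back.
lines = Tempfile.create('report') do |f|
  f.write("Title: A Tale Of Two Cities\nAuthor: Charles Dickens\n")
  f.flush
  read_lines(f.path)
end

p lines
```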


James B Crigler

I like this solution, except it solves the wrong problem. That's
my fault because I didn't quite give enough of the problem
specification. I have over a thousand individual files to scan, some
with a couple of thousand lines. Some fields are not tagged, but
ASCII paragraphs, and these can be interspersed with the tagged lines.
Also, Robert's solution applies a capturing Regexp to every line of
the file. (I didn't specify it, but performance is at a premium here.)

I will probably use the part about putting the fields into a local
hash as the fields I want are parsed out. Also the tag translation
hashes are a nice touch.

Consider what happens in Robert's solution when confronted with this
(not exactly according to my document specification, but you'll get
the idea):

-----------
Title: A Tale Of Two Cities
Author: Charles Dickens
Analysis:
There are many different ways of interpreting Mr. Dickens's
Title: It is reminiscent of, for instance, the title of
St. Augustine's book, The City of God.
-----------

(This is easily programmed around (just take the first title, author,
whatever).)
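
A rough sketch of that "take the first" workaround, done in a single pass over the lines. The names TAGS, LINE_RE and scan are made up for illustration, and the cheap include? pre-check is only a guess at a saving; whether it helps would need benchmarking.

```ruby
# Hypothetical tag table: tag text in the file => field name.
TAGS = { "Title" => :title, "Author" => :author }.freeze
LINE_RE = /\A([^:]+):\s*(.*)\z/

# Collect the first occurrence of each known tag and stop as soon as
# every field in the table has a value.
def scan(lines, tags)
  fields = {}
  lines.each do |line|
    next unless line.include?(":")        # skip untagged lines cheaply
    m = LINE_RE.match(line) or next
    key = tags[m[1]] or next              # not a tag we care about
    fields[key] ||= m[2]                  # first occurrence wins
    break if fields.size == tags.size     # nothing left to find
  end
  fields
end

report = [
  "Title: A Tale Of Two Cities",
  "Author: Charles Dickens",
  "Analysis:",
  "There are many different ways of interpreting Mr. Dickens's",
  "Title: It is reminiscent of, for instance, the title of",
  "St. Augustine's book, The City of God."
]

fields = scan(report, TAGS)
p fields
```

Because the loop stops once all fields are filled, the second "Title:" line in the sample is never examined.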

In the problem I'm trying to solve, I am making ad hoc queries at the
command line and turning them into a filter procedure in an eval.
I don't know which fields I'll be querying or printing until after

filter = eval "proc { |#{mode}| #{filter_expr} }"

is evaluated. ("mode" contains the subclass of document, e.g.,
"novel"; "filter_expr" contains the filter expression (surprise! ;-)
passed in from the command line, and it is supposed to contain stuff
like "novel.title =~ /city/i" and so on.) A separate bit of command
line syntax selects fields to print.
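
To make the mechanism concrete, here is a small self-contained sketch of such a filter. The Doc struct and the sample records are stand-ins, not part of the real program, and since eval runs whatever string it is given, this is only reasonable for trusted command-line input.

```ruby
# Hypothetical stand-in for the parsed document classes.
Doc = Struct.new(:title, :author)

mode        = "novel"                    # would come from the command line
filter_expr = "novel.title =~ /city/i"   # likewise

# Build the filter: the block parameter is named after the mode so the
# expression can refer to it as `novel`.
filter = eval "proc { |#{mode}| #{filter_expr} }"

docs = [
  Doc.new("A Tale Of Two Cities", "Charles Dickens"),
  Doc.new("The City of God", "St. Augustine")
]

selected = docs.select { |doc| filter.call(doc) }
puts selected.map(&:title)
```

Note that /city/i selects only the second record here: "Cities" does not contain the letters "city".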


Robert Klemme

James B Crigler said:
I like this solution, except it solves the wrong problem. That's
my fault because I didn't quite give enough of the problem
specification. I have over a thousand individual files to scan, some
with a couple of thousand lines. Some fields are not tagged, but
ASCII paragraphs, and these can be interspersed with the tagged lines.

Well, maybe you start over again by describing the use case. Some
questions that might help in doing this:

- What problem are you trying to solve?
- What's the input? (format, size)
- What's the expected output?

James B Crigler said:
Also, Robert's solution applies a capturing Regexp to every line of
the file. (I didn't specify it, but performance is at a premium here.)

Well, your original solution did apply multiple regexps per line if I
remember correctly (i.e. in the case where you are searching for more than
one type). A regexp match against the line is cheap, and you have to
check *somehow* what kind of line you have. After all, how else do you
want to find out which lines are tagged and which lines aren't?

James B Crigler said:
I will probably use the part about putting the fields into a local
hash as the fields I want are parsed out. Also the tag translation
hashes are a nice touch.

Consider what happens in Robert's solution when confronted with this
(not exactly according to my document specification, but you'll get
the idea):

-----------
Title: A Tale Of Two Cities
Author: Charles Dickens
Analysis:
There are many different ways of interpreting Mr. Dickens's
Title: It is reminiscent of, for instance, the title of
St. Augustine's book, The City of God.
-----------

(This is easily programmed around (just take the first title, author,
whatever).)

In the problem I'm trying to solve, I am making ad hoc queries at the
command line and turning them into a filter procedure in an eval.
I don't know which fields I'll be querying or printing until after

filter = eval "proc { |#{mode}| #{filter_expr} }"

is evaluated.

But you know them at the start of the script (after evaluating the
command line arguments):

James B Crigler said:
("mode" contains the subclass of document, e.g.,
"novel"; "filter_expr" contains the filter expression (surprise! ;-)
passed in from the command line, and it is supposed to contain stuff
like "novel.title =~ /city/i" and so on.) A separate bit of command
line syntax selects fields to print.

So you have two selection mechanisms:

1. the records to choose
2. the fields of the result records to print

Code generation for 1. is certainly a good idea, because that usually
yields superior performance. I won't go into more detail at the moment
partly because I feel there's not yet enough information and partly I
don't have the time right now.

Regards

robert
 
