design pattern for a file converter...


Tom Whittaker

I am creating an application that will allow the user to specify
different types of input files, and then my application reads in the
file, parses it, extracts rows and columns/fields, and stores the
contents in a MySQL table (my common format for all input data). I
never change the file I read in. I read it once and discard it. The
algorithm will read a specific format and then return results in a
common format that can be acted upon or stored in a MySQL table.

For example, a file could be an Excel file or a database file.

I'm thinking about using the strategy pattern, or potentially the
template method pattern.

Does anyone have any insight into the best design pattern(s) to use
for this feature?
 

Roedy Green

I'm thinking about using the strategy pattern, or potentially the
template method pattern.

All you really need is an interface for your converters. Then you can
pluggably invoke them by name with:

final Class<? extends Macro> macroClass = Class.forName(
binaryClassName ).asSubclass( Macro.class );

presuming your interface is called Macro.
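To make the plug-in idea concrete, here is a minimal sketch assuming a hypothetical `Converter` interface in place of `Macro`, with an equally hypothetical `CsvConverter` plug-in. The reflection calls (`Class.forName`, `asSubclass`, `getDeclaredConstructor().newInstance()`) are standard Java; the interface, its `convert` method and the naive comma splitting are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plug-in interface: every converter turns raw input into rows.
interface Converter {
    List<String[]> convert(String input);
}

// Hypothetical example plug-in: one row per line, fields split on commas (no quoting).
class CsvConverter implements Converter {
    public CsvConverter() { }

    @Override
    public List<String[]> convert(String input) {
        List<String[]> rows = new ArrayList<>();
        for (String line : input.split("\\R")) {   // \R matches any line break
            rows.add(line.split(",", -1));         // -1 keeps trailing empty fields
        }
        return rows;
    }
}

class ConverterLoader {
    // Loads a converter by binary class name, e.g. read from a config file.
    // asSubclass() throws ClassCastException if the class is not a Converter.
    static Converter load(String binaryClassName) throws ReflectiveOperationException {
        Class<? extends Converter> converterClass =
                Class.forName(binaryClassName).asSubclass(Converter.class);
        return converterClass.getDeclaredConstructor().newInstance();
    }
}
```

The plug-in class needs an accessible no-argument constructor; anything the converter needs beyond that would have to come through the interface.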
 

markspace

I am creating an application that will allow the user to specify
different types of input files, and then my application reads in the
file, parses it,


First, I think design patterns are somewhat low-level implementations
that depend a lot on how the design is structured and what sorts of
requirements you have overall. Just "read a file" doesn't really imply
Strategy Pattern or Template or anything at all really.

That said, I think I'd look at the Factory Pattern. The idea is that at
the highest level, you pass the file you want to read to the factory and
the factory figures out all the details.

File f = ...
MyParser parser = MyParser.factory( f );

Next, you might have different criteria for each file type: what the
extension is, what magic bytes it has, or, in a pinch, what you find when
you read the file and look for a few distinctive constructs. Then you
want to instantiate a parser based on what you figured out.
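Checking magic bytes could look like the following sketch. The two signatures used are well known: ZIP archives, which include .xlsx workbooks, typically begin with `PK\3\4` (0x50 0x4B 0x03 0x04), and legacy .xls files use the OLE2 compound-document header D0 CF 11 E0 A1 B1 1A E1. A ZIP signature alone does not prove the file is a spreadsheet (.docx and .jar files are ZIPs too), so this would be one criterion among several. The class name `MagicBytesCriteria` is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical criterion that inspects the first bytes of a stream ("magic bytes").
class MagicBytesCriteria {
    // ZIP local file header; .xlsx files are ZIP containers (as are .docx, .jar, ...).
    static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};
    // OLE2 compound document header used by legacy .xls files.
    static final byte[] OLE2 = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                                (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};

    private final byte[] magic;

    MagicBytesCriteria(byte[] magic) {
        this.magic = magic.clone();
    }

    // Peeks at the header without consuming it; the stream must support mark/reset.
    boolean matches(InputStream in) throws IOException {
        in.mark(magic.length);
        try {
            byte[] header = in.readNBytes(magic.length); // fewer bytes if the stream is short
            return Arrays.equals(header, magic);
        } finally {
            in.reset(); // rewind so the chosen parser sees the whole stream
        }
    }
}
```

A plain `FileInputStream` does not support `mark`/`reset`, so a caller would wrap it first, e.g. `new BufferedInputStream(new FileInputStream(f))`.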

So I see two types there. The first, which matches the file types, might
be called a FileTypeCriteria, and the second is a FileParser. I'd
use both Strategy and Template for those, as you want a good API like
Template but you also might want to replace each individually and also
modify each without affecting other code.
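As a rough illustration of combining the two patterns, here is a sketch in which the parsing skeleton is a template method and the destination for rows is a strategy passed in by the caller. `LineRecordParser` and `TabSeparatedParser` are hypothetical names, and the format handling is deliberately naive (no quoting or escaping).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Consumer;

// Template Method: the parsing sequence is fixed here; subclasses supply the steps.
abstract class LineRecordParser {
    // The template method is final, so every format follows the same sequence.
    // rowSink is the Strategy: the caller decides where rows go (a list, an INSERT, ...).
    // The caller owns the stream and is responsible for closing it.
    public final void parse(InputStream in, Consumer<List<String>> rowSink) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        String line;
        boolean first = true;
        while ((line = reader.readLine()) != null) {
            boolean isHeader = first && hasHeaderRow();
            first = false;
            if (isHeader || line.isBlank()) {
                continue;
            }
            rowSink.accept(splitRecord(line));
        }
    }

    // Hook with a default: formats without a header row override this.
    protected boolean hasHeaderRow() {
        return true;
    }

    // Primitive operation each format must supply.
    protected abstract List<String> splitRecord(String line);
}

class TabSeparatedParser extends LineRecordParser {
    @Override
    protected List<String> splitRecord(String line) {
        return List.of(line.split("\t", -1));
    }
}
```

Adding a new format means writing one small subclass; changing where the rows end up means passing a different `Consumer`, without touching any parser.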

So the factory itself might be involved in just matching
FileTypeCriteria to FileParsers, with the real work delegated to a
FileParser, which maybe ought to parse a Stream rather than a file, now
that I think about it.

You'd have to init the factory with both FileTypeCriteria and FileParsers:

FileTypeCriteria criteria = ...
FileParser parser = ...
MyParser.addParser( criteria, parser );

Then you're off to the races: just make the criteria and parsers that
you need, and let the factory worry about binding them and serving them
to users.
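One way to fill in those pieces, as a sketch under assumptions: the method signatures, the first-match-wins rule and the `parse()` method on `MyParser` are guesses at what is described above, not markspace's code. Following the suggestion, `FileParser` works on an `InputStream` and `MyParser` owns opening and closing the file.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Decides whether a file is of a given type (by name, magic bytes, ...).
interface FileTypeCriteria {
    boolean matches(File f);
}

// Does the real work, on a stream rather than a File.
interface FileParser {
    List<List<String>> parse(InputStream in) throws IOException;
}

final class MyParser {
    // Registration order is kept, so earlier (more specific) criteria win.
    private static final Map<FileTypeCriteria, FileParser> REGISTRY = new LinkedHashMap<>();

    private final File file;
    private final FileParser delegate;

    private MyParser(File file, FileParser delegate) {
        this.file = file;
        this.delegate = delegate;
    }

    static synchronized void addParser(FileTypeCriteria criteria, FileParser parser) {
        REGISTRY.put(criteria, parser);
    }

    // Binds the file to the first parser whose criteria match it.
    static synchronized MyParser factory(File f) {
        for (Map.Entry<FileTypeCriteria, FileParser> e : REGISTRY.entrySet()) {
            if (e.getKey().matches(f)) {
                return new MyParser(f, e.getValue());
            }
        }
        throw new IllegalArgumentException("No parser registered for " + f.getName());
    }

    // Opens the file, hands the stream to the chosen parser, and always closes it.
    List<List<String>> parse() throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            return delegate.parse(in);
        }
    }
}
```

Throwing for an unrecognised file is one choice; returning `Optional<MyParser>` would be another if "no parser" is a normal outcome.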

I think a simple criterion could be implemented just from file names,
something like this:

import java.io.File;
import java.util.regex.Pattern;

class FileNameCriteria implements FileTypeCriteria {
    private final Pattern pattern;

    // regex is matched against the whole file name (not the path)
    public FileNameCriteria( String regex ) {
        this.pattern = Pattern.compile( regex );
    }

    public boolean matches( File f ) {
        return pattern.matcher( f.getName() ).matches();
    }
}

Then you could just make a new criterion for, say, HTML files, something
like this:

new FileNameCriteria( ".*\\.html" );

and any other extensions similarly. Code not compiled or tested! Good
luck.
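`FileNameCriteria` takes a `java.util.regex` pattern, in which a glob such as `*.html` is not valid (a leading `*` has nothing to repeat). If shell-style globs are more natural for users, `java.nio.file.PathMatcher` accepts them directly. A minimal sketch with the same `matches(File)` shape, so it could implement `FileTypeCriteria` too; the class name `GlobFileNameCriteria` is hypothetical.

```java
import java.io.File;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

// Alternative to the regex-based criterion: accepts globs such as "*.html".
class GlobFileNameCriteria {
    private final PathMatcher matcher;

    GlobFileNameCriteria(String glob) {
        // "glob:" selects glob syntax; "regex:" would select java.util.regex syntax.
        this.matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
    }

    boolean matches(File f) {
        // Match the bare file name: '*' in a glob does not cross directory boundaries.
        Path name = f.toPath().getFileName();
        return name != null && matcher.matches(name);
    }
}
```

Globs also allow alternatives, e.g. `new GlobFileNameCriteria("*.{htm,html}")`.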
 

Martin Gregorie

I think you're skating over the most important part: thinking about how
you're going to tell the program:
- how an input file is to be parsed
- how the output table(s) are defined
- how to map input fields to columns in tables
- how to specify and manage format conversion from input fields
to column content. Dates will almost certainly need format
conversion but you may also need a variety of other conversions
- how to validate the input and what to do if a field can't be
converted into something acceptable to SQL
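The mapping and conversion items in that list can be written down as data plus small conversion functions. A minimal sketch, assuming each input record arrives as a field-name-to-text map; `FieldMapping`, `RowMapper` and the `dd/MM/yyyy` date pattern are illustrative, and the policy for bad fields (collect the errors, let the caller decide) is only one option.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// One entry per output column: where the value comes from and how to convert it.
final class FieldMapping {
    final String inputField;
    final String column;
    final Function<String, Object> converter;

    FieldMapping(String inputField, String column, Function<String, Object> converter) {
        this.inputField = inputField;
        this.column = column;
        this.converter = converter;
    }
}

final class RowMapper {
    private final List<FieldMapping> mappings;

    RowMapper(List<FieldMapping> mappings) {
        this.mappings = List.copyOf(mappings);
    }

    // Converts one input record into column values, collecting every failure
    // so the caller can decide whether to reject, log, or repair the record.
    Map<String, Object> map(Map<String, String> record, List<String> errors) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (FieldMapping m : mappings) {
            String raw = record.get(m.inputField);
            if (raw == null) {
                errors.add("missing field " + m.inputField);
                continue;
            }
            try {
                row.put(m.column, m.converter.apply(raw.trim()));
            } catch (RuntimeException e) { // e.g. DateTimeParseException, NumberFormatException
                errors.add(m.inputField + ": cannot convert '" + raw + "'");
            }
        }
        return row;
    }

    // Example converter: text in the given pattern to a LocalDate.
    static Function<String, Object> date(String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
        return s -> LocalDate.parse(s, fmt);
    }
}
```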

Once you have that figured out you're about 95% of the way there.

The rest is just a record-by-record copy with your conversion code called
between the file read operation and the SQL INSERT operation.
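That copy loop really is short. Here is a sketch, assuming the conversion step yields one list of column values per record; `RecordCopier`, `RowSink` and `JdbcRowSink` are hypothetical names, while `prepareStatement`, `setObject`, `addBatch` and `executeBatch` are standard JDBC. The batch size of 500 is arbitrary.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Where converted rows go: a JDBC implementation below, a plain List for testing.
interface RowSink {
    void accept(List<Object> row) throws SQLException;
}

final class RecordCopier {
    // The whole job: read a record, convert it, write it. Returns the number copied.
    static <R> int copy(Iterator<R> records, Function<R, List<Object>> convert, RowSink sink)
            throws SQLException {
        int count = 0;
        while (records.hasNext()) {
            sink.accept(convert.apply(records.next()));
            count++;
        }
        return count;
    }
}

// Batches INSERTs through one PreparedStatement. Table and column names are assumed
// to come from trusted configuration, since they cannot be bound as parameters.
final class JdbcRowSink implements RowSink, AutoCloseable {
    private final PreparedStatement insert;
    private int pending;

    JdbcRowSink(Connection conn, String table, List<String> columns) throws SQLException {
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        this.insert = conn.prepareStatement("INSERT INTO " + table
                + " (" + String.join(", ", columns) + ") VALUES (" + placeholders + ")");
    }

    @Override
    public void accept(List<Object> row) throws SQLException {
        for (int i = 0; i < row.size(); i++) {
            insert.setObject(i + 1, row.get(i)); // JDBC parameters are 1-based
        }
        insert.addBatch();
        if (++pending == 500) {
            flush();
        }
    }

    void flush() throws SQLException {
        if (pending > 0) {
            insert.executeBatch();
            pending = 0;
        }
    }

    @Override
    public void close() throws SQLException {
        try {
            flush(); // send whatever is left in the last partial batch
        } finally {
            insert.close();
        }
    }
}
```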
 

Lew

markspace said:
...
and any other extensions similarly. Code not compiled or tested! Good luck.

Beautiful.
 

Stefan Ram

Tom Whittaker said:
Does anyone have any insight into the best design pattern(s) to use
for this feature?

I tend to write such code in a more procedural manner first.

Then, I see the patterns in the code.

Then, I refactor to make the patterns explicit if this seems
to be of any use at all.
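For this problem, a procedural first draft might be no more than the following sketch. The names are hypothetical, only two toy delimited formats are handled, and the splitting is naive (no quoted fields).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Procedural first draft: one method, one switch on the file extension.
final class FirstDraft {
    static List<String[]> readRows(String fileName, List<String> lines) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);

        String delimiter;
        switch (ext) {
            case "csv":
                delimiter = ",";
                break;
            case "tsv":
                delimiter = "\t";
                break;
            default:
                throw new IllegalArgumentException("Unsupported file type: " + fileName);
        }

        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(line.split(delimiter, -1)); // -1 keeps trailing empty fields
        }
        return rows;
    }
}
```

If the cases multiply and each grows its own parsing logic, that switch is where a Strategy (for instance a map from extension to parser) would start to suggest itself; with two one-line cases it is arguably not worth it yet.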

While it is true that there can be a »rush to code« problem,
there also might be a »premature design« problem.

The goal is to solve the problem and to get source code
that can be read and maintained easily - it is not to apply
design patterns. When design patterns serve that purpose,
they impose themselves on the experienced programmer during
the development.

Look here:

http://www.mindhacks.com/blog/2008/03/rock_climbing_hacks.html
http://dilbert.com/blog/entry/visualizing_the_entire_path/
 

Tom Anderson

I tend to write such code in a more procedural manner first.

Then, I see the patterns in the code.

Then, I refactor to make the patterns explicit if this seems
to be of any use at all.

This is what i do too. When i was younger, i used to start with a lovely
objecty design. I think what broke that habit was spending a few years in
Python instead of Java, mostly writing very simple, procedural programs,
because of the nature of the work i was doing. When i came back to Java, i
found myself in the same mindset as Stefan - start simple, and refactor
out the objects and patterns as they become obvious.

While it is true that there can be a »rush to code« problem,
there also might be a »premature design« problem.

Absolutely.

The goal is to solve the problem and to get source code
that can be read and maintained easily - it is not to apply
design patterns.

Perfectly put.

When design patterns serve that purpose,
they impose themselves on the experienced programmer during
the development.

They can do, but do not inevitably do so. If you apply the right pattern
at the right time, it reduces complexity from then on; that moment can
come at any point in development. But there's no sense trying to shoehorn
in a pattern before it does.

tom

--
We are going to have to be speculative, but there is good and
bad speculation, and this is not an unparalleled activity in
science. [...] Those scientists who have no taste for this sort of
speculative enterprise will just have to stay in the trenches and do
without it, while the rest of us risk embarrassing mistakes and have a
lot of fun. -- Daniel Dennett
 

Arne Vajhøj

I think you're skating over the most important part: thinking about how
you're going to tell the program:
- how an input file is to be parsed
- how the output table(s) are defined
- how to map input fields to columns in tables
- how to specify and manage format conversion from input fields
to column content. Dates will almost certainly need format
conversion but you may also need a variety of other conversions
- how to validate the input and what to do if a field can't be
converted into something acceptable to SQL

Once you have that figured out you're about 95% of the way there.

The phrasing of the question makes me assume that a given piece
of code (strategy/plugin/whatever) is hardcoded for a given input
format.

The rest is just a record-by-record copy with your conversion code called
between the file read operation and the SQL INSERT operation.

True.

Arne
 

Arne Vajhøj

This is what i do too. When i was younger, i used to start with a lovely
objecty design. I think what broke that habit was spending a few years
in Python instead of Java, mostly writing very simple, procedural
programs, because of the nature of the work i was doing. When i came
back to Java, i found myself in the same mindset as Stefan - start
simple, and refactor out the objects and patterns as they become obvious.

I think it depends on the problem.

I don't think starting a 500 KLOC project procedural and then
refactoring to OO later will work.

For small apps it may work fine.

Perfectly put.


They can do, but do not inevitably do so. If you apply the right pattern
at the right time, it reduces complexity from then on; that moment can
come at any point in development. But there's no sense trying to
shoehorn in a pattern before it does.

Yep - over-applying patterns is a known anti-pattern.

Arne
 

Tom Anderson

I think it depends on the problem.

I don't think starting a 500 KLOC project procedural and then
refactoring to OO later will work.

Not after you've written the half million lines, no. But you can do it as
you go along - write something straightforward, then "refactor out the
objects and patterns as they become obvious".

tom
 

Arne Vajhøj

Not after you've written the half million lines, no. But you can do it
as you go along - write something straightforward, then "refactor out
the objects and patterns as they become obvious".

As soon as the first part becomes OO'ish, then building
procedural on top of that can become tricky.

Arne
 

Lew

As soon as the first part becomes OO'ish, then building
procedural on top of that can become tricky.

The "write first, refactor later" strategy is not incompatible with "get it
right first". If you are well-versed in object-oriented (or better,
type-based) programming, you will code that way from the get-go. You're not
going to write crappy spaghetti and then magically decide to apply good sense
to it. You're either going to apply good sense from the beginning or you are
never going to.

That said, initial knowledge is imperfect and refactoring will be needed.
Those that wrote their code intelligently up front will find this less of a
problem than those that didn't. So yes, you "refactor out the objects and
patterns as they become obvious", but a lot of them will be "obvious" at the
start.

If you aren't finding at least some patterns and programming for durability at
the start, good effing luck. You'll need it.
 

Tom Anderson

As soon as the first part becomes OO'ish, then building
procedural on top of that can become tricky.

Conceivably. One of the things i learned from my time in python is that it
isn't necessarily, or even frequently, the case: you can write procedural
code that picks up some objects, does stuff with them, and then puts them
down and carries on being procedural. It depends on your objects of
course: for example, you can happily write very procedural-ish code using
Java's collections classes - they're just another datatype the code can
work with. They don't force you to object-orient your own code. There are
classes which do force you to adopt object-orientation: SAX parsers and
Swing spring to mind, both because they need you to write event handlers.
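SAX shows the difference: instead of calling the parser and getting data back, you hand it an object and it calls you. A minimal sketch using the standard JAXP API (`SAXParserFactory`, `DefaultHandler`); the `<row>`/`<cell>` element names and the `RowHandler` class are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// The parser drives; this handler keeps just enough state between callbacks
// to turn <row><cell>...</cell></row> elements into rows.
final class RowHandler extends DefaultHandler {
    final List<List<String>> rows = new ArrayList<>();
    private List<String> currentRow;
    private StringBuilder text;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName.equals("row")) {
            currentRow = new ArrayList<>();
        } else if (qName.equals("cell")) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node, so accumulate.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("cell")) {
            if (currentRow != null && text != null) {
                currentRow.add(text.toString());
            }
            text = null;
        } else if (qName.equals("row")) {
            rows.add(currentRow);
            currentRow = null;
        }
    }

    static List<List<String>> parse(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        RowHandler handler = new RowHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.rows;
    }
}
```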

tom
 

Martin Gregorie

The phrasing of the question makes me assume that a given piece of code
(strategy/plugin/whatever) is hardcoded for a given input format.

I wasn't certain either way but may have been influenced by a similar job
that required a data dictionary approach, i.e. the ability to define
matching fields in input and output and the type of conversion required.
An explicit requirement was that the format converter must be able to
deal with arbitrary changes to input formats without requiring any code
to be rewritten.
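A data-dictionary approach of the kind described might look like this sketch, in which the field-to-column mapping and the conversion type live in text and only the small vocabulary of conversions is code. The entry syntax `inputField,column,type[,pattern]`, the type names and the `DataDictionary` class are assumptions, not the format used on that job.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// The mapping lives in text, not code. Each entry is "inputField,column,type[,pattern]";
// lines starting with '#' are comments.
final class DataDictionary {
    private final Map<String, String> columnFor = new LinkedHashMap<>();
    private final Map<String, Function<String, Object>> converterFor = new LinkedHashMap<>();

    DataDictionary(List<String> entries) {
        for (String entry : entries) {
            if (entry.isBlank() || entry.startsWith("#")) {
                continue;
            }
            String[] p = entry.split(",", -1);
            if (p.length < 3) {
                throw new IllegalArgumentException("Bad entry: " + entry);
            }
            String field = p[0].trim();
            columnFor.put(field, p[1].trim());
            converterFor.put(field, converter(p[2].trim(), p.length > 3 ? p[3].trim() : ""));
        }
    }

    // The fixed vocabulary of conversions: only this part needs code changes.
    private static Function<String, Object> converter(String type, String pattern) {
        switch (type) {
            case "text":
                return s -> s;
            case "integer":
                return s -> Integer.valueOf(s.trim());
            case "date":
                if (pattern.isEmpty()) {
                    throw new IllegalArgumentException("date needs a pattern");
                }
                DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern);
                return s -> LocalDate.parse(s.trim(), fmt);
            default:
                throw new IllegalArgumentException("Unknown type: " + type);
        }
    }

    // Maps one input record to column values; fields not in the dictionary are ignored.
    Map<String, Object> apply(Map<String, String> record) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : record.entrySet()) {
            Function<String, Object> conv = converterFor.get(e.getKey());
            if (conv != null) {
                row.put(columnFor.get(e.getKey()), conv.apply(e.getValue()));
            }
        }
        return row;
    }
}
```

A change to the input layout then means editing the dictionary entries; only a genuinely new kind of conversion needs a new case in the code.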

But, no matter which approach is taken to the format conversion,
designing and writing a set of hard-coded plugins for a common interface
is almost certainly harder than the OP thinks: ignoring that while asking
about patterns for the outer loop sounded like misdirected effort. Hence
what I wrote.
 
