design pattern for a file converter...

Discussion in 'Java' started by Tom Whittaker, Dec 9, 2010.

  1. I am creating an application that will allow the user to specify
    different types of input files, and then my application reads in the
    file, parses it, extracts rows and columns/fields, and stores the
    contents in a mySQL table (my common format for all input data). I
    never change the file I read in. I read it once and discard it. The
    algorithm will read a specific format and then return results in a
    common format that can be acted upon or stored in a mySQL table.

    For example, a file could be an Excel file or a database file.

    I'm thinking about using the strategy pattern, or potentially the
    template method pattern.

    Does anyone have any insight into the best design pattern(s) to use
    for this feature?
     
    Tom Whittaker, Dec 9, 2010
    #1

  2. Tom Whittaker

    Roedy Green Guest

    On Thu, 9 Dec 2010 12:08:58 -0800 (PST), Tom Whittaker
    <> wrote, quoted or indirectly quoted someone who
    said :

    >
    >I'm thinking about using the strategy pattern, or potentially the
    >template method pattern.


    All you really need is an interface for your converters. Then you can
    pluggably invoke them by name with:

    final Class<? extends Macro> macroClass = Class.forName(
    binaryClassName ).asSubclass( Macro.class );

    presuming your interface is called Macro.
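
    A minimal sketch of the by-name loading described above, under some
    assumptions: the interface is called Converter here rather than Macro,
    each converter has a no-argument constructor, and CsvConverter and
    ConverterLoader are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical converter interface (the post calls its own interface Macro).
interface Converter {
    List<String[]> convert(String input);
}

// Hypothetical implementation: one row per line, fields split on commas.
class CsvConverter implements Converter {
    @Override
    public List<String[]> convert(String input) {
        List<String[]> rows = new ArrayList<>();
        for (String line : input.split("\n")) {
            rows.add(line.split(","));
        }
        return rows;
    }
}

class ConverterLoader {
    // Looks the class up by its binary name, checks that it implements
    // Converter, and instantiates it through its no-argument constructor.
    static Converter load(String binaryClassName) {
        try {
            Class<? extends Converter> cls =
                    Class.forName(binaryClassName).asSubclass(Converter.class);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                    "Not a usable converter: " + binaryClassName, e);
        }
    }
}
```

    The class name could then come from a configuration file or a user
    choice, so adding a format would not require touching the calling code.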
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com

    In programming, and documenting programs, keep vocabulary consistent and precisely defined! Variation in vocabulary to relieve the tedium is for novels.
     
    Roedy Green, Dec 9, 2010
    #2

  3. Tom Whittaker

    markspace Guest

    On 12/9/2010 12:08 PM, Tom Whittaker wrote:
    > I am creating an application that will allow the user to specify
    > different types of input files, and then my application reads in the
    > file, parses it,



    First, I think design patterns are somewhat low level implementations
    that depend a lot on how the design is structured and what sorts of
    requirements you have overall. Just "read a file" doesn't really imply
    Strategy Pattern or Template or anything at all really.

    That said, I think I'd look at the Factory Pattern. The idea is that at
    the highest level, you pass the file you want to read to the factory and
    the factory figures out all the details.

    File f = ...
    MyParser parser = MyParser.factory( f );

    Next, you might have different criteria for each file. What the
    extension is, what magic bytes it has, in a pinch you might have to read
    the file and look for a few different constructs to figure things out.
    Then you want to instantiate a parser based on what you figured out.

    So I see two types there. The first, which matches the file types, is
    maybe called a FileTypeCriteria, and the second is a FileParser. I'd
    use both Strategy and Template for those, as you want a good API like
    Template but you also might want to replace each individually and also
    modify each without affecting other code.

    So the factory itself might be involved in just matching
    FileTypeCriteria to FileParsers, with the real work delegated to a
    FileParser, which maybe ought to parse a Stream rather than a file, now
    that I think about it.

    You'd have to init the factory with both FileTypeCriteria and FileParsers:

    FileTypeCriteria criteria = ...
    FileParser parser = ...
    MyParser.addParser( criteria, parser );

    Then you're off to the races, just make the criteria and parsers that
    you need, and let the factory worry about binding them and serving them
    to users.
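
    The registration idea above might be sketched roughly as follows. This
    is an assumption-laden illustration, not markspace's code: the factory
    is an ordinary object called ParserFactory rather than static methods on
    MyParser, the criteria method is named matches, parsers read an
    InputStream (as the post suggests on reflection), and the first
    registered criteria that matches wins.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Decides whether a file is of a particular type (hypothetical signature).
interface FileTypeCriteria {
    boolean matches(File f);
}

// Turns the content of a stream into rows of fields (hypothetical signature).
interface FileParser {
    List<String[]> parse(InputStream in) throws IOException;
}

class ParserFactory {
    // LinkedHashMap keeps registration order, so earlier entries take priority.
    private final Map<FileTypeCriteria, FileParser> registry = new LinkedHashMap<>();

    void addParser(FileTypeCriteria criteria, FileParser parser) {
        registry.put(criteria, parser);
    }

    // Returns the parser bound to the first criteria that accepts the file.
    FileParser parserFor(File f) {
        for (Map.Entry<FileTypeCriteria, FileParser> e : registry.entrySet()) {
            if (e.getKey().matches(f)) {
                return e.getValue();
            }
        }
        throw new IllegalArgumentException("No parser registered for " + f.getName());
    }
}
```

    Keeping the criteria and the parser as separate small interfaces means
    either one can be swapped without the factory or the callers changing.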

    I think a simple criteria could be implemented just from file names
    something like this:

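    import java.io.File;
    import java.util.regex.Pattern;

    // Matches files whose name fits a regular expression.
    class FileNameCriteria implements FileTypeCriteria {
        private final Pattern pattern;
        public FileNameCriteria( String regex ) {
            // The parameter has its own name, so the field is what gets assigned.
            pattern = Pattern.compile( regex );
        }
        public boolean matches( File f ) {
            return pattern.matcher( f.getName() ).matches();
        }
    }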

    Then you could just make a new criteria for, say, HTML files something
    like this:

    new FileNameCriteria( ".*\\.html" );

    and any other extensions similarly. Code not compiled or tested! Good
    luck.
     
    markspace, Dec 9, 2010
    #3
  4. On Thu, 09 Dec 2010 12:08:58 -0800, Tom Whittaker wrote:

    > I am creating an application that will allow the user to specify
    > different types of input files, and then my application reads in the
    > file, parses it, extracts rows and columns/fields, and stores the
    > contents in a mySQL table (my common format for all input data). I
    > never change the file I read in. I read it once and discard it. The
    > algorithm will read a specific format and then return results in a
    > common format that can be acted upon or stored in a mySQL table.
    >
    > For example, a file could be an Excel file or a database file.
    >

    I think you're skating over the most important part: thinking about how
    you're going to tell the program:
    - which input file is to be parsed
    - how the output table(s) are defined
    - how to map input fields to columns in tables
    - how to specify and manage format conversion from input fields
      to column content. Dates will almost certainly need format
      conversion but you may also need a variety of other conversions
    - how to validate the input and what to do if a field can't be
      converted into something acceptable to SQL

    Once you have that figured out you're about 95% of the way there.

    The rest is just a record-by-record copy with your conversion code called
    between the file read operation and the SQL INSERT operation.
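
    As a rough illustration of the mapping, conversion and validation
    points listed above, here is a small sketch. FieldMapping and
    RecordConverter are hypothetical names, input fields are assumed to be
    addressed by position, and a failed conversion is assumed to reject
    the whole record. Other policies (skip, default value, log and
    continue) are just as plausible.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// One input field (by position) mapped to one output column, with a conversion.
class FieldMapping {
    final int inputIndex;
    final String column;
    final Function<String, Object> conversion;

    FieldMapping(int inputIndex, String column, Function<String, Object> conversion) {
        this.inputIndex = inputIndex;
        this.column = column;
        this.conversion = conversion;
    }
}

class RecordConverter {
    private final List<FieldMapping> mappings = new ArrayList<>();

    RecordConverter map(int inputIndex, String column, Function<String, Object> conversion) {
        mappings.add(new FieldMapping(inputIndex, column, conversion));
        return this;
    }

    // Converts one input record into column name -> value, in mapping order.
    // Throws IllegalArgumentException naming the column that failed.
    Map<String, Object> convert(String[] fields) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (FieldMapping m : mappings) {
            if (m.inputIndex >= fields.length) {
                throw new IllegalArgumentException("Missing field for column " + m.column);
            }
            String raw = fields[m.inputIndex].trim();
            try {
                row.put(m.column, m.conversion.apply(raw));
            } catch (RuntimeException e) {
                throw new IllegalArgumentException(
                        "Cannot convert '" + raw + "' for column " + m.column, e);
            }
        }
        return row;
    }
}
```

    A date column could then be mapped with something like
    map(1, "joined", s -> LocalDate.parse(s, formatter)), and the resulting
    map handed to whatever code builds the SQL INSERT.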


    --
    martin@ | Martin Gregorie
    gregorie. | Essex, UK
    org |
     
    Martin Gregorie, Dec 10, 2010
    #4
  5. Tom Whittaker

    Lew Guest

    markspace wrote:
    > ...
    > That said, I think I'd look at the Factory Pattern. The idea is that at
    > the highest level, you pass the file you want to read to the factory and
    > the factory figures out all the details.
    >
    > File f = ...
    > MyParser parser = MyParser.factory( f );
    >
    > Next, you might have different criteria for each file. What the
    > extension is, what magic bytes it has, in a pinch you might have to read
    > the file and look for a few different constructs to figure things out.
    > Then you want to instantiate a parser based on what you figured out.
    >
    > So I see two types there. The first where do mach the file types is
    > maybe called a FileTypeCriteria, and the second is a FileParser. I'd use
    > both Strategy and Template for those, as you want a good API like
    > Template but you also might want to replace each individually and also
    > modify each with out affecting other code.
    >
    > So the factory itself might be involved in just matching
    > FileTypeCriteria to FileParsers, with the real work delegated to a
    > FileParser, which maybe ought to parse a Stream rather than a file, now
    > that I think about it.
    >
    > You'd have to init the factory with both FileTypeCriteria and FileParsers:
    >
    > FileTypeCriteria criteria = ...
    > FileParser parser = ...
    > MyParser.addParser( criteria, parser );
    >
    > Then you're off to the races, just make the criteria and parsers that
    > you need, and let the factory worry about binding them and serving them
    > to users.
    >
    > I think a simple criteria could be implemented just from file names
    > something like this:
    >
    > class FileNameCriteria implements FileTypeCriteria {
    > private final Pattern pattern;
    > public FileNameCriteria( String pattern ) {
    > pattern = Pattern.compile( pattern );
    > }
    > public boolean matchers( File f ) {
    > return pattern.matcher( f.getName() ).matches();
    > }
    > }
    >
    > Then you could just make a new criteria for, say, HTML files something
    > like this:
    >
    > new FileNameCriteria( "*.html^" );
    >
    > and any other extensions similarly. Code not compiled or tested! Good luck.


    Beautiful.

    --
    Lew
     
    Lew, Dec 10, 2010
    #5
  6. Tom Whittaker

    Stefan Ram Guest

    Tom Whittaker <> writes:
    >Does anyone have any insight into the best design pattern(s) to use
    >for this feature?


    I tend to write such code in a more procedural manner first.

    Then, I see the patterns in the code.

    Then, I refactor to make the patterns explicit if this seems
    to be of any use at all.

    While it is true that there can be a »rush to code« problem,
    there also might be a »premature design« problem.

    The goal is to solve the problem and to get source code
    that can be read and maintained easily - it is not to apply
    design patterns. When design patterns serve that purpose,
    they impose themselves on the experienced programmer during
    the development.

    Look here:

    http://www.mindhacks.com/blog/2008/03/rock_climbing_hacks.html
    http://dilbert.com/blog/entry/visualizing_the_entire_path/
     
    Stefan Ram, Dec 10, 2010
    #6
  7. Tom Whittaker

    Roedy Green Guest

    On Thu, 09 Dec 2010 14:34:24 -0800, markspace <>
    wrote, quoted or indirectly quoted someone who said :

    > class FileNameCriteria implements FileTypeCriteria {
    > private final Pattern pattern;
    > public FileNameCriteria( String pattern ) {
    > pattern = Pattern.compile( pattern );
    > }
    > public boolean matchers( File f ) {
    > return pattern.matcher( f.getName() ).matches();
    > }
    > }


    for tested code to implement the filters, see
    http://mindprod.com/products1.html#FILTER
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com

    In programming, and documenting programs, keep vocabulary consistent and precisely defined! Variation in vocabulary to relieve the tedium is for novels.
     
    Roedy Green, Dec 10, 2010
    #7
  8. Tom Whittaker

    Tom Anderson Guest

    On Fri, 10 Dec 2010, Stefan Ram wrote:

    > Tom Whittaker <> writes:
    >> Does anyone have any insight into the best design pattern(s) to use
    >> for this feature?

    >
    > I tend to write such code in a more procedural manner first.
    >
    > Then, I see the patterns in the code.
    >
    > Then, I refactor to make the patterns explicit if this seems
    > to be of any use at all.


    This is what i do too. When i was younger, i used to start with a lovely
    objecty design. I think what broke that habit was spending a few years in
    Python instead of Java, mostly writing very simple, procedural programs,
    because of the nature of the work i was doing. When i came back to Java, i
    found myself in the same mindset as Stefan - start simple, and refactor
    out the objects and patterns as they become obvious.

    > While it is true that there can be a »rush to code« problem,
    > there also might be a »premature design« problem.


    Absolutely.

    > The goal is to solve the problem and to get source code
    > that can be read and maintained easily - it is not to apply
    > design patterns.


    Perfectly put.

    > When design patterns serve that purpose,
    > they impose themselves on the experienced programmer during
    > the development.


    They can do, but do not inevitably do so. If you apply the right pattern
    at the right time, it reduces complexity from then on; that moment can
    come at any point in development. But there's no sense trying to shoehorn
    in a pattern before it does.

    tom

    --
    We are going to have to be speculative, but there is good and
    bad speculation, and this is not an unparalleled activity in
    science. [...] Those scientists who have no taste for this sort of
    speculative enterprise will just have to stay in the trenches and do
    without it, while the rest of us risk embarrassing mistakes and have a
    lot of fun. -- Daniel Dennett
     
    Tom Anderson, Dec 11, 2010
    #8
  9. On 09-12-2010 19:39, Martin Gregorie wrote:
    > On Thu, 09 Dec 2010 12:08:58 -0800, Tom Whittaker wrote:
    >> I am creating an application that will allow the user to specify
    >> different types of input files, and then my application reads in the
    >> file, parses it, extracts rows and columns/fields, and stores the
    >> contents in a mySQL table (my common format for all input data). I
    >> never change the file I read in. I read it once and discard it. The
    >> algorithm will read a specific format and then return results in a
    >> common format that can be acted upon or stored in a mySQL table.
    >>
    >> For example, a file could be an Excel file or a database file.
    >>

    > I think you're skating over the most important part: thinking about how
    > you're going to tell the program:
    > - what an input file is to be parsed
    > - how the output table(s) are defined
    > - how to map input fields to columns in tables
    > - how the specify and manage format conversion from input fields
    > to column content. Dates will almost certainly need format
    > conversion but you may also need a variety of other conversions
    > - how to validate the input and what to do if a field can't be
    > converted into something acceptable to SQL
    >
    > Once you have that figured out you're about 95% of the way there.


    The phrasing of the question makes me assume that a given piece
    of code (strategy/plugin/whatever) is hardcoded for a given input
    format.

    > The rest is just a record-by-record copy with your conversion code called
    > between the file read operation and the SQL INSERT operation.


    True.

    Arne
     
    Arne Vajhøj, Dec 12, 2010
    #9
  10. Tom Whittaker

    Arne Vajhøj Guest

    On 11-12-2010 08:57, Tom Anderson wrote:
    > On Fri, 10 Dec 2010, Stefan Ram wrote:
    >
    >> Tom Whittaker <> writes:
    >>> Does anyone have any insight into the best design pattern(s) to use
    >>> for this feature?

    >>
    >> I tend to write such code in a more procedural manner first.
    >>
    >> Then, I see the patterns in the code.
    >>
    >> Then, I refactor to make the patterns explicit if this seems
    >> to be of any use at all.

    >
    > This is what i do too. When i was younger, i used to start with a lovely
    > objecty design. I think what broke that habit was spending a few years
    > in Python instead of Java, mostly writing very simple, procedural
    > programs, because of the nature of the work i was doing. When i came
    > back to Java, i found myself in the same mindset as Stefan - start
    > simple, and refactor out the objects and patterns as they become obvious.


    I think it depends on the problem.

    I don't think starting a 500 KLOC project procedural and then
    refactoring to OO later will work.

    For small apps it may work fine.

    >> The goal is to solve the problem and to get source code
    >> that can be read and maintained easily - it is not to apply
    >> design patterns.

    >
    > Perfectly put.
    >
    >> When design patterns serve that purpose,
    >> they impose themselves on the experienced programmer during
    >> the development.

    >
    > They can do, but do not inevitably do so. If you apply the right pattern
    > at the right time, it reduces complexity from then on; that moment can
    > come at any point in development. But there's no sense trying to
    > shoehorn in a pattern before it does.


    Yep - over applying patterns is a known anti-pattern.

    Arne
     
    Arne Vajhøj, Dec 12, 2010
    #10
  11. Tom Whittaker

    Tom Anderson Guest

    On Sat, 11 Dec 2010, Arne Vajhøj wrote:

    > On 11-12-2010 08:57, Tom Anderson wrote:
    >> On Fri, 10 Dec 2010, Stefan Ram wrote:
    >>
    >>> Tom Whittaker <> writes:
    >>>> Does anyone have any insight into the best design pattern(s) to use
    >>>> for this feature?
    >>>
    >>> I tend to write such code in a more procedural manner first.
    >>>
    >>> Then, I see the patterns in the code.
    >>>
    >>> Then, I refactor to make the patterns explicit if this seems
    >>> to be of any use at all.

    >>
    >> This is what i do too. When i was younger, i used to start with a lovely
    >> objecty design. I think what broke that habit was spending a few years
    >> in Python instead of Java, mostly writing very simple, procedural
    >> programs, because of the nature of the work i was doing. When i came
    >> back to Java, i found myself in the same mindset as Stefan - start
    >> simple, and refactor out the objects and patterns as they become obvious.

    >
    > I think it depends on the problem.
    >
    > I don't think starting a 500 KLOC project procedural and then
    > refactoring to OO later will work.


    Not after you've written the half million lines, no. But you can do it as
    you go along - write something straightforward, then "refactor out the
    objects and patterns as they become obvious".

    tom

    --
    22% Essential Components, 22% Repetitive Patterns, 56% Pauses
     
    Tom Anderson, Dec 12, 2010
    #11
  12. Tom Whittaker

    Arne Vajhøj Guest

    On 11-12-2010 21:31, Tom Anderson wrote:
    > On Sat, 11 Dec 2010, Arne Vajhøj wrote:
    >
    >> On 11-12-2010 08:57, Tom Anderson wrote:
    >>> On Fri, 10 Dec 2010, Stefan Ram wrote:
    >>>
    >>>> Tom Whittaker <> writes:
    >>>>> Does anyone have any insight into the best design pattern(s) to use
    >>>>> for this feature?
    >>>>
    >>>> I tend to write such code in a more procedural manner first.
    >>>>
    >>>> Then, I see the patterns in the code.
    >>>>
    >>>> Then, I refactor to make the patterns explicit if this seems
    >>>> to be of any use at all.
    >>>
    >>> This is what i do too. When i was younger, i used to start with a lovely
    >>> objecty design. I think what broke that habit was spending a few years
    >>> in Python instead of Java, mostly writing very simple, procedural
    >>> programs, because of the nature of the work i was doing. When i came
    >>> back to Java, i found myself in the same mindset as Stefan - start
    >>> simple, and refactor out the objects and patterns as they become
    >>> obvious.

    >>
    >> I think it depends on the problem.
    >>
    >> I don't think starting a 500 KLOC project procedural and then
    >> refactoring to OO later will work.

    >
    > Not after you've written the half million lines, no. But you can do it
    > as you go along - write something straightforward, then "refactor out
    > the objects and patterns as they become obvious".


    As soon as the first part becomes OO'ish, then building
    procedural on top of that can become tricky.

    Arne
     
    Arne Vajhøj, Dec 12, 2010
    #12
  13. Tom Whittaker

    Lew Guest

    On 12/11/2010 09:53 PM, Arne Vajhøj wrote:
    > On 11-12-2010 21:31, Tom Anderson wrote:
    >> On Sat, 11 Dec 2010, Arne Vajhøj wrote:
    >>
    >>> On 11-12-2010 08:57, Tom Anderson wrote:
    >>>> On Fri, 10 Dec 2010, Stefan Ram wrote:
    >>>>
    >>>>> Tom Whittaker <> writes:
    >>>>>> Does anyone have any insight into the best design pattern(s) to use
    >>>>>> for this feature?
    >>>>>
    >>>>> I tend to write such code in a more procedural manner first.
    >>>>>
    >>>>> Then, I see the patterns in the code.
    >>>>>
    >>>>> Then, I refactor to make the patterns explicit if this seems
    >>>>> to be of any use at all.
    >>>>
    >>>> This is what i do too. When i was younger, i used to start with a
    >>>> lovely
    >>>> objecty design. I think what broke that habit was spending a few years
    >>>> in Python instead of Java, mostly writing very simple, procedural
    >>>> programs, because of the nature of the work i was doing. When i came
    >>>> back to Java, i found myself in the same mindset as Stefan - start
    >>>> simple, and refactor out the objects and patterns as they become
    >>>> obvious.
    >>>
    >>> I think it depends on the problem.
    >>>
    >>> I don't think starting a 500 KLOC project procedural and then
    >>> refactoring to OO later will work.

    >>
    >> Not after you've written the half million lines, no. But you can do it
    >> as you go along - write something straightforward, then "refactor out
    >> the objects and patterns as they become obvious".

    >
    > As soon as the first part become OO'ish, then building
    > procedural on top of that can become tricky.


    The "write first, refactor later" strategy is not incompatible with "get it
    right first". If you are well-versed in object-oriented (or better,
    type-based) programming, you will code that way from the get-go. You're not
    going to write crappy spaghetti and then magically decide to apply good sense
    to it. You're either going to apply good sense from the beginning or you are
    never going to.

    That said, initial knowledge is imperfect and refactoring will be needed.
    Those that wrote their code intelligently up front will find this less of a
    problem than those that didn't. So yes, you "refactor out the objects and
    patterns as they become obvious", but a lot of them will be "obvious" at the
    start.

    If you aren't finding at least some patterns and programming for durability at
    the start, good effing luck. You'll need it.

    --
    Lew
     
    Lew, Dec 12, 2010
    #13
  14. Tom Whittaker

    Tom Anderson Guest

    On Sat, 11 Dec 2010, Arne Vajhøj wrote:

    > On 11-12-2010 21:31, Tom Anderson wrote:
    >> On Sat, 11 Dec 2010, Arne Vajhøj wrote:
    >>
    >>> On 11-12-2010 08:57, Tom Anderson wrote:
    >>>> On Fri, 10 Dec 2010, Stefan Ram wrote:
    >>>>
    >>>>> Tom Whittaker <> writes:
    >>>>>> Does anyone have any insight into the best design pattern(s) to use
    >>>>>> for this feature?
    >>>>>
    >>>>> I tend to write such code in a more procedural manner first.
    >>>>>
    >>>>> Then, I see the patterns in the code.
    >>>>>
    >>>>> Then, I refactor to make the patterns explicit if this seems
    >>>>> to be of any use at all.
    >>>>
    >>>> This is what i do too. When i was younger, i used to start with a lovely
    >>>> objecty design. I think what broke that habit was spending a few years
    >>>> in Python instead of Java, mostly writing very simple, procedural
    >>>> programs, because of the nature of the work i was doing. When i came
    >>>> back to Java, i found myself in the same mindset as Stefan - start
    >>>> simple, and refactor out the objects and patterns as they become
    >>>> obvious.
    >>>
    >>> I think it depends on the problem.
    >>>
    >>> I don't think starting a 500 KLOC project procedural and then
    >>> refactoring to OO later will work.

    >>
    >> Not after you've written the half million lines, no. But you can do it
    >> as you go along - write something straightforward, then "refactor out
    >> the objects and patterns as they become obvious".

    >
    > As soon as the first part become OO'ish, then building
    > procedural on top of that can become tricky.


    Conceivably. One of the things i learned from my time in Python is that it
    isn't necessarily, or even frequently, the case: you can write procedural
    code that picks up some objects, does stuff with them, and then puts them
    down and carries on being procedural. It depends on your objects of
    course: for example, you can happily write very procedural-ish code using
    Java's collections classes - they're just another datatype the code can
    work with. They don't force you to object-orient your own code. There are
    classes which do force you to adopt object-orientation: SAX parsers and
    Swing spring to mind, both because they need you to write event handlers.
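
    A small sketch of that point about collections: the code below is
    plain procedural Java (one static method, a loop, no new abstractions)
    that simply uses a Map and a List as data types. ColumnStats and
    countValues are hypothetical names chosen for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ColumnStats {
    // Counts how often each value occurs in one column of the parsed rows.
    // Rows too short to have that column are skipped.
    static Map<String, Integer> countValues(List<String[]> rows, int column) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String[] row : rows) {
            if (column < row.length) {
                counts.merge(row[column], 1, Integer::sum);
            }
        }
        return counts;
    }
}
```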

    tom

    --
    I might feel irresponsible if you couldn't go almost anywhere and see
    naked, aggressive political maneuvers in iteration, marinating in your
    ideology of choice. That's simply not the case. -- Tycho Brahae
     
    Tom Anderson, Dec 12, 2010
    #14
  15. On Sat, 11 Dec 2010 20:29:23 -0500, Arne Vajhøj wrote:

    > The phrasing of the question makes me assume that a give piece of code
    > (strategy/plugin/whatever) is hardcoded for a given input format.
    >

    I wasn't certain either way but may have been influenced by a similar job
    that required a data dictionary approach, i.e. the ability to define
    matching fields in input and output and the type of conversion required.
    An explicit requirement was that the format converter must be able to
    deal with arbitrary changes to input formats without requiring any code
    to be rewritten.
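
    A data dictionary of that kind might look something like the sketch
    below, where the field matching lives in configuration text rather
    than in code. Everything here is an assumption made for illustration:
    the DataDictionary class, the "inputField=outputColumn,type" line
    format, and the three type names string, int and date (ISO dates
    only).

```java
import java.io.IOException;
import java.io.StringReader;
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

class DataDictionary {
    // input field name -> { output column, type name }
    private final Map<String, String[]> entries = new TreeMap<>();

    // Reads lines of the assumed form "inputField=outputColumn,type".
    static DataDictionary fromProperties(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalArgumentException("Unreadable dictionary", e);
        }
        DataDictionary d = new DataDictionary();
        for (String inputField : p.stringPropertyNames()) {
            String[] spec = p.getProperty(inputField).split(",", 2);
            if (spec.length != 2) {
                throw new IllegalArgumentException("Expected column,type for " + inputField);
            }
            d.entries.put(inputField, new String[] {spec[0].trim(), spec[1].trim()});
        }
        return d;
    }

    // Converts one record (input field name -> raw text) into output columns.
    Map<String, Object> convert(Map<String, String> input) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> e : entries.entrySet()) {
            String raw = input.get(e.getKey());
            if (raw == null) {
                throw new IllegalArgumentException("Missing input field " + e.getKey());
            }
            out.put(e.getValue()[0], convertValue(raw.trim(), e.getValue()[1]));
        }
        return out;
    }

    private static Object convertValue(String raw, String type) {
        switch (type) {
            case "string": return raw;
            case "int":    return Integer.valueOf(raw);
            case "date":   return LocalDate.parse(raw); // ISO-8601, e.g. 2010-12-09
            default: throw new IllegalArgumentException("Unknown type " + type);
        }
    }
}
```

    With this shape, a changed input layout means editing the dictionary
    text; code changes are only needed when a new conversion type appears.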

    But, no matter which approach is taken to the format conversion,
    designing and writing a set of hard-coded plugins for a common interface
    is almost certainly harder than the OP thinks: ignoring that while asking
    about patterns for the outer loop sounded like misdirected effort. Hence
    what I wrote.


    --
    martin@ | Martin Gregorie
    gregorie. | Essex, UK
    org |
     
    Martin Gregorie, Dec 12, 2010
    #15
