Design of a pipelined architecture/framework for handling large data sets

Discussion in 'Java' started by nish, Nov 30, 2006.

  1. nish

    nish Guest

    I am facing a problem which I believe other Java developers must have
    faced before, but I am finding it difficult to articulate in keywords
    so that Google will give me the right answers. So here goes:

    1. I am using the Eclipse IDE with multiple Java projects, each one
    sourced from a CVS repository on an external server in the local LAN.
    2. Almost all of these projects handle big data sets (read: 100 MB to
    500 MB of XML and text files), which is basically data crawled from
    the web. Each project transforms the data in some way and then passes
    it along for other projects to act on. Some of the data is in single
    big files and some of it is in hundreds of small files inside a
    single directory.
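
    A minimal sketch of the kind of streaming parse that keeps memory
    flat for files that size, assuming (hypothetically) that the crawled
    XML is a flat sequence of <record> elements in a file named
    crawl-data.xml; StAX pulls one record at a time instead of building
    a DOM for the whole 500 MB document:

        import javax.xml.stream.XMLInputFactory;
        import javax.xml.stream.XMLStreamConstants;
        import javax.xml.stream.XMLStreamReader;
        import java.io.BufferedInputStream;
        import java.io.FileInputStream;
        import java.io.InputStream;

        public class StreamingTransform {
            public static void main(String[] args) throws Exception {
                XMLInputFactory factory = XMLInputFactory.newInstance();
                InputStream in = new BufferedInputStream(
                        new FileInputStream("crawl-data.xml"));
                try {
                    XMLStreamReader reader = factory.createXMLStreamReader(in);
                    // Walk the document one event at a time; only the
                    // current record is ever held in memory.
                    while (reader.hasNext()) {
                        if (reader.next() == XMLStreamConstants.START_ELEMENT
                                && "record".equals(reader.getLocalName())) {
                            transform(reader.getElementText());
                        }
                    }
                    reader.close();
                } finally {
                    in.close();
                }
            }

            // Placeholder for the per-record transformation step that
            // hands results to the next project in the pipeline.
            private static void transform(String record) {
                System.out.println(record.length());
            }
        }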

    Basically, what I am looking for is a better way to handle this data.
    Putting the data in CVS is not that efficient, and there also needs
    to be some central lookup for all the data. I guess this is partly a
    Java design question and partly ignorance on my part about the right
    tools for the job.
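
    For the central lookup, a minimal sketch of one possible approach,
    assuming a hypothetical datasets.properties file that maps logical
    data set names to locations (e.g. crawl.raw=/data/crawl/raw.xml):

        import java.io.FileInputStream;
        import java.io.IOException;
        import java.io.InputStream;
        import java.util.Properties;

        public class DataSetCatalog {
            private final Properties locations = new Properties();

            public DataSetCatalog(String catalogFile) throws IOException {
                InputStream in = new FileInputStream(catalogFile);
                try {
                    locations.load(in);
                } finally {
                    in.close();
                }
            }

            // Resolve a logical data set name to its on-disk location,
            // so projects never hard-code paths to each other's output.
            public String locate(String dataSetName) {
                String path = locations.getProperty(dataSetName);
                if (path == null) {
                    throw new IllegalArgumentException(
                            "Unknown data set: " + dataSetName);
                }
                return path;
            }
        }

    A downstream project would then call something like
    new DataSetCatalog("datasets.properties").locate("crawl.raw") instead
    of hard-coding paths. The catalog file itself is small and changes
    rarely, so it is a natural thing to keep in CVS.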

    Thanks for any help.
     
    nish, Nov 30, 2006
    #1

  2. nish

    nish Guest

    Another issue I can think of:

    3. I should be able to specify how a data set is archived. For
    example, for some large data sets I don't want revisioning in CVS,
    because the data is never going to change; others I might want
    checked into CVS so that they get revisioned.
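
    A minimal sketch of one way to express that policy, assuming a
    hypothetical per-data-set descriptor file (a plain properties file,
    e.g. crawl-raw.dataset); a build script could read it and append
    non-versioned sets to .cvsignore:

        import java.io.FileInputStream;
        import java.util.Properties;

        public class ArchivePolicy {
            public static void main(String[] args) throws Exception {
                // Hypothetical descriptor, e.g. crawl-raw.dataset:
                //   location=/data/crawl/raw.xml
                //   versioned=false
                Properties desc = new Properties();
                FileInputStream in = new FileInputStream(args[0]);
                try {
                    desc.load(in);
                } finally {
                    in.close();
                }
                // Immutable data defaults to staying out of CVS; only
                // sets explicitly marked versioned=true get checked in.
                boolean versioned = Boolean.parseBoolean(
                        desc.getProperty("versioned", "false"));
                System.out.println(desc.getProperty("location")
                        + (versioned ? " -> check into CVS" : " -> .cvsignore"));
            }
        }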


     
    nish, Nov 30, 2006
    #2
