Program Design for Large volume file processing

Discussion in 'C Programming' started by soren juhu, Dec 16, 2004.

  1. soren juhu

    soren juhu Guest

    Hi,

    I am developing a C program that reads over a million files of 1
    kilobyte each and sends the contents to another program through some
    middleware. I need help designing the program so that it can process
    such a large number of files in less than 8 hours.

    TIA
    Soren
    soren juhu, Dec 16, 2004
    #1

  2. Jack Klein

    Jack Klein Guest

    On 15 Dec 2004 19:50:00 -0800, soren juhu
    wrote in comp.lang.c:

    > Hi,
    >
    > I am developing a C Program for reading over a million files of size 1
    > kilobytes each and sending the contents to another program using some
    > middle ware. I need some help on designing the program to process such
    > a large number of files in less than 8 hours.
    >
    > TIA
    > Soren


    From the information you have provided in your post, the only advice
    anyone could possibly give you would be to buy a faster computer with
    faster hard disk drives to run your program on.

    Even if you posted detailed information about the "processing" that
    you had to do on the files, you don't have a C language question, you
    have one about choosing the most efficient algorithm. For that you
    need to post to an algorithm group such as news:comp.programming, and
    be very explicit about the processing you need to do.

    Once you have selected an algorithm, possibly with the help of an
    appropriate group, if you have difficulties writing standard C code
    that compiles and executes correctly, then post the problem code here,
    explain your problems with it, and ask for C language advice.

    --
    Jack Klein
    Home: http://JK-Technology.Com
    FAQs for
    comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
    comp.lang.c++ http://www.parashift.com/c++-faq-lite/
    alt.comp.lang.learn.c-c++
    http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html
    Jack Klein, Dec 16, 2004
    #2

  3. CBFalconer

    CBFalconer Guest

    soren juhu wrote:
    >
    > Hi,
    >
    > I am developing a C Program for reading over a million files of size 1
    > kilobytes each and sending the contents to another program using some
    > middle ware. I need some help on designing the program to process such
    > a large number of files in less than 8 hours.


    If we make the liberal assumption that you can locate and open each
    file in 25 millisecs, that leaves you about 3800 seconds to process
    1e9 bytes, or you will require a throughput in the order of 250k
    bytes per second. Have fun.
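
A minimal sketch of this budget arithmetic, under the figures assumed in the post (a million files, 25 ms to locate and open each one, 1e9 bytes in total, an 8 hour deadline). The function name `required_bytes_per_sec` is made up for illustration.

```c
#include <stdio.h>

/* Bytes per second needed to move total_bytes in whatever time is left
   after spending open_secs_per_file on each of n_files.
   Returns -1.0 if the per-file overhead alone uses up the deadline. */
double required_bytes_per_sec(double n_files, double open_secs_per_file,
                              double total_bytes, double deadline_secs)
{
    double remaining = deadline_secs - n_files * open_secs_per_file;

    if (remaining <= 0.0)
        return -1.0;
    return total_bytes / remaining;
}
```

Calling `required_bytes_per_sec(1e6, 0.025, 1e9, 8.0 * 3600.0)` leaves 3800 seconds and gives roughly 263,000 bytes per second, which matches the "order of 250k" estimate. At 30 ms per file the overhead alone exceeds 8 hours, so the per-file cost matters far more than the raw transfer rate.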

    --
    Chuck F
    Available for consulting/temporary embedded and systems.
    <http://cbfalconer.home.att.net> USE worldnet address!
    CBFalconer, Dec 16, 2004
    #3
  4. Lawrence Kirby

    On Wed, 15 Dec 2004 19:50:00 -0800, soren juhu wrote:

    > Hi,
    >
    > I am developing a C Program for reading over a million files of size 1
    > kilobytes each and sending the contents to another program using some
    > middle ware. I need some help on designing the program to process such
    > a large number of files in less than 8 hours.


    Your main bottleneck is likely to be file access. Reading lots of
    small files from a hard disk can be very slow; disks are much more
    efficient at reading large chunks of sequential data. You may want to
    consider how your files are organised in the first place. If, for
    example, the data were written as 1K blocks in a single file (perhaps
    in addition to the individual files), the problem reduces to
    transferring a gigabyte of data, which can be done in seconds or
    minutes at normal LAN speeds.

    This isn't a question about C but about the design of a file
    management system. You need to sit down and specify your real
    requirements, e.g. why there are over a million 1K files in the first
    place and whether a better approach is possible. There may be things
    you can do to aid this transfer while those million files are being
    generated (such as appending them to a single file, or perhaps even
    putting them in a database).
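
One way the single-file idea could look in standard C is sketched below. It assumes every input fits in 1024 bytes and stores each one as a zero-padded fixed-size record, so record i sits at offset i * 1024. `RECORD_SIZE`, `append_record` and `read_record` are hypothetical names, and the padding discards the original length, so a real layout would probably need a length field as well.

```c
#include <stdio.h>
#include <string.h>

#define RECORD_SIZE 1024   /* assumed fixed record size: 1 kilobyte */

/* Append the contents of one small file to an open archive as a
   zero-padded record. Returns 0 on success, -1 on failure (including
   an input file larger than RECORD_SIZE). */
int append_record(FILE *archive, const char *path)
{
    unsigned char buf[RECORD_SIZE + 1];   /* one spare byte detects oversize */
    FILE *in = fopen(path, "rb");
    size_t n;

    if (in == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf, in);
    if (ferror(in) || n > RECORD_SIZE) {
        fclose(in);
        return -1;
    }
    fclose(in);
    memset(buf + n, 0, RECORD_SIZE - n);  /* pad short files with zeros */
    return fwrite(buf, 1, RECORD_SIZE, archive) == RECORD_SIZE ? 0 : -1;
}

/* Read record number index (counting from 0) back from the archive.
   Returns 0 on success, -1 on failure. */
int read_record(FILE *archive, long index, unsigned char out[RECORD_SIZE])
{
    if (fseek(archive, index * (long)RECORD_SIZE, SEEK_SET) != 0)
        return -1;
    return fread(out, 1, RECORD_SIZE, archive) == RECORD_SIZE ? 0 : -1;
}
```

With a million records the largest offset is about 1.02e9, which still fits in a 32-bit `long`. The reader can then stream the archive sequentially instead of opening a million separate files.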

    There is a lot you need to consider before worrying about C-related issues.

    Lawrence
    Lawrence Kirby, Dec 16, 2004
    #4
  5. infobahn

    infobahn Guest

    soren juhu wrote:

    > Hi,
    >
    > I am developing a C Program for reading over a million files of size 1
    > kilobytes each and sending the contents to another program using some
    > middle ware. I need some help on designing the program to process such
    > a large number of files in less than 8 hours.


    Do the arithmetic.

    1000000 * 1024 * 8 (assuming an 8-bit-byte platform for the moment)
    comes to 8192000000 bits. If you have, say, 7 hours to transfer this
    amount of data, you will need to throw bits down the wire at a
    rate of at least 325 kbps. This should easily be within the reach
    of modern network cards. I don't think you'll have a problem.
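
The same arithmetic as a small C sketch, using `CHAR_BIT` rather than a literal 8 so that the byte-width assumption is explicit. The function name `min_bits_per_sec` is made up for illustration.

```c
#include <limits.h>

/* Minimum sustained bit rate needed to send n_files files of
   file_bytes bytes each within secs seconds. */
double min_bits_per_sec(unsigned long n_files, unsigned long file_bytes,
                        unsigned long secs)
{
    /* Convert to double before multiplying to avoid integer overflow. */
    double total_bits = (double)n_files * (double)file_bytes * CHAR_BIT;

    return total_bits / (double)secs;
}
```

On a platform where `CHAR_BIT` is 8, `min_bits_per_sec(1000000UL, 1024UL, 7UL * 3600UL)` comes to about 325,079 bits per second, in line with the 325 kbps figure.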

    I suggest you write an "obvious" program, and then test it to
    see if it's quick enough. If so, fabulous. If not, post it here
    and maybe we can help you speed it up.
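
A rough sketch of what such an "obvious" program might look like in standard C. The middleware is not specified in the thread, so it is represented here by a hypothetical callback type `sink_fn`. `process_files`, `count_bytes` and `MAX_FILE_SIZE` are illustrative names too, and files longer than the assumed 1024 byte limit would be silently truncated.

```c
#include <stdio.h>

#define MAX_FILE_SIZE 1024   /* assumed upper bound on each file */

/* Hypothetical hook standing in for the middleware call.
   It should return 0 on success. */
typedef int (*sink_fn)(const void *data, size_t len, void *ctx);

/* Read each named file in turn and pass its contents to sink.
   Returns the number of files read and delivered successfully. */
size_t process_files(const char *const *paths, size_t count,
                     sink_fn sink, void *ctx)
{
    unsigned char buf[MAX_FILE_SIZE];
    size_t delivered = 0, i;

    for (i = 0; i < count; i++) {
        FILE *in = fopen(paths[i], "rb");
        size_t n;
        int read_ok;

        if (in == NULL) {
            perror(paths[i]);      /* report the failure and move on */
            continue;
        }
        n = fread(buf, 1, sizeof buf, in);
        read_ok = !ferror(in);
        fclose(in);
        if (read_ok && sink(buf, n, ctx) == 0)
            delivered++;
    }
    return delivered;
}

/* Example sink: adds up the bytes it is given instead of sending them. */
int count_bytes(const void *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
    return 0;
}
```

Timing a run of this over a representative sample of the files, with a sink such as `count_bytes`, would show whether file access alone fits the time budget before any middleware is involved.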

    It's worth remembering that this newsgroup can't - or rather,
    won't - help you on the networking aspects of such a program.
    But they are likely to come up with some good ideas on the
    rest of it, given the catalyst of some source code to inspect.

    Best of luck.
    infobahn, Dec 20, 2004
    #5
  6. soren juhu

    Soren Juhu Guest

    soren juhu wrote:
    > Hi,
    >
    > I am developing a C Program for reading over a million files of size 1
    > kilobytes each and sending the contents to another program using some
    > middle ware. I need some help on designing the program to process such
    > a large number of files in less than 8 hours.
    >
    > TIA
    > Soren



    Hi all,

    Sorry for my late reply, I am posting my message using Google Groups.

    Thanks a lot for your valuable inputs to the problem. It definitely
    helped in knowing where to start for solving the problem. I will surely
    inform you about this development effort.

    Thanks,
    Soren
    Soren Juhu, Dec 22, 2004
    #6
  7. Albert van der Horst

    Jack Klein wrote:
    >On 15 Dec 2004 19:50:00 -0800, soren juhu
    >wrote in comp.lang.c:
    >
    >> Hi,
    >>
    >> I am developing a C Program for reading over a million files of size 1
    >> kilobytes each and sending the contents to another program using some
    >> middle ware. I need some help on designing the program to process such
    >> a large number of files in less than 8 hours.
    >>
    >> TIA
    >> Soren

    >
    >From the information you have provided in your post, the only advice
    >anyone could possibly give you would be to buy a faster computer with
    >faster hard disk drives to run your program on.
    >
    >Even if you posted detailed information about the "processing" that
    >you had to do on the files, you don't have a C language question, you
    >have one about choosing the most efficient algorithm. For that you
    >need to post to an algorithm group such as news:comp.programming, and
    >be very explicit about the processing you need to do.


    I interpret the question this way: which standard C functions
    are appropriate, and how should I use them?

    My answer proves you wrong.

    1. Assuming you can guarantee a maximum size for each file,
    read each one in a single call into a static buffer of that size.
    2. Go for the lowest-level calls (read/write) and
    handle the rest yourself.
    3. Your total throughput seems to be within reach of modern
    disks. C is low overhead and shouldn't get in your
    way for a reasonable amount of processing.
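
Points 1 and 2 could be sketched as follows. Note that `open`, `read` and `close` are POSIX calls rather than standard C, so this assumes a POSIX system. The name `slurp`, the buffer `file_buf` and the 1024 byte limit are illustrative assumptions, and a file longer than the buffer is truncated.

```c
#include <fcntl.h>
#include <unistd.h>

#define MAX_FILE_SIZE 1024   /* assumed guaranteed maximum file size */

/* One static buffer, reused for every file. */
static unsigned char file_buf[MAX_FILE_SIZE];

/* Read a whole file into file_buf using POSIX calls.
   Returns the number of bytes read, or -1 on error. */
ssize_t slurp(const char *path)
{
    ssize_t total = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    /* read() may return fewer bytes than requested, so loop until the
       buffer is full or end of file is reached. */
    while ((size_t)total < sizeof file_buf) {
        ssize_t n = read(fd, file_buf + total,
                         sizeof file_buf - (size_t)total);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)
            break;
        total += n;
    }
    close(fd);
    return total;
}
```

For a 1K file the loop normally finishes after a single `read`, so the cost per file is one `open`, one or two `read` calls and one `close`. The sketch does not retry a `read` that is interrupted by a signal.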

    There is no way the OP asked about "processing to be done".

    >Once you have selected an algorithm, possibly with the help of an
    >appropriate group, if you have difficulties writing standard C code


    There is no way you could suggest an algorithm. You would not have
    the slightest clue even if you wanted to. The OP was well aware that
    this would be off topic.

    >that compiles and executes correctly, then post the problem code here,
    >explain your problems with it, and ask for C language advice.


    Aren't we going overboard, to the point where only homework
    questions are appropriate?

    >--
    >Jack Klein


    Groetjes Albert.

    --
    Albert van der Horst,Oranjestr 8,3511 RA UTRECHT,THE NETHERLANDS
    One man-hour to invent,
    One man-week to implement,
    One lawyer-year to patent.
    Albert van der Horst, Dec 23, 2004
    #7
