moving data in a file without using system memory

Discussion in 'C Programming' started by ulyses, Jan 9, 2006.

  1. ulyses

    ulyses Guest

    Let's assume I have following file:

    2938929384902491233.....
    923949919199191919112....

    File contains INTs only. What is more they are huge. For example first
    row in file may contain integer which size is 50MB and the second 30MB.
    Now we come to my problem. Is there possibility to swap this rows
    without using system memory (preferably in Unix/Linux)? Is there any
    function in C to do this?

    Thanks for help,
    John
    ulyses, Jan 9, 2006
    #1
    1. Advertising

  2. ulyses

    Nelu Guest

    On 2006-01-09, ulyses <> wrote:
    > Let's assume I have following file:
    >
    > 2938929384902491233.....
    > 923949919199191919112....
    >
    > File contains INTs only. What is more they are huge. For example first
    > row in file may contain integer which size is 50MB and the second 30MB.
    > Now we come to my problem. Is there possibility to swap this rows
    > without using system memory (preferably in Unix/Linux)? Is there any
    > function in C to do this?
    >

    I assume you want to make the swap without using char *, because you
    need to use some memory.
    One way you could do this is to use two files: f1 and f2.
    You can read the first line, character by character and put it in the
    f1 file using fgetc and fputc.
    Read the second line in the same way and put it in f2.
    Create f3 or rewrite the initial file, write the content of f2 (also
    with fgetc and fputc) to f3, add the EOL and then add the content
    of f1 to f3. Your lines should be swapped. Now, you have to pay
    attention at what EOL means both when reading and writing. Also,
    given the fact that you have integers, you could use isdigit() (pay
    attention to negative numbers).

    --
    Ioan - Ciprian Tandau
    tandau _at_ freeshell _dot_ org (hope it's not too late)
    (... and that it still works...)
    Nelu, Jan 9, 2006
    #2
    1. Advertising

  3. ulyses

    Alex Colvin Guest


    >2938929384902491233.....
    >923949919199191919112....


    >File contains INTs only. What is more they are huge. For example first
    >row in file may contain integer which size is 50MB and the second 30MB.
    >Now we come to my problem. Is there possibility to swap this rows
    >without using system memory (preferably in Unix/Linux)? Is there any
    >function in C to do this?


    First you have to scan the first 50MB to locate the end-of-line, unless
    you happen to know it. That doesn't take a whole lot of memory.

    Swapping the lines using a second file is pretty trivial. Just start
    copying the second line to the destination. When you're done, rewind the
    source and copy the first.

    If you can't actually afford a second file, the problem gets more
    interesting. You want to slide the second 30MB forward in the file, and
    the first 50MB back. Although it's tricky, this too can be done
    incrementally, using very little memory. Look up peristaltic in-place
    out-of-core permutation.

    --
    mac the naïf
    Alex Colvin, Jan 9, 2006
    #3
  4. Nelu <> writes:

    > On 2006-01-09, ulyses <> wrote:
    >> Let's assume I have following file:
    >>
    >> 2938929384902491233.....
    >> 923949919199191919112....
    >>
    >> File contains INTs only. What is more they are huge. For example first
    >> row in file may contain integer which size is 50MB and the second 30MB.
    >> Now we come to my problem. Is there possibility to swap this rows
    >> without using system memory (preferably in Unix/Linux)? Is there any
    >> function in C to do this?
    >>

    > I assume you want to make the swap without using char *, because you
    > need to use some memory.
    > One way you could do this is to use two files: f1 and f2.
    > You can read the first line, character by character and put it in the
    > f1 file using fgetc and fputc.
    > Read the second line in the same way and put it in f2.
    > Create f3 or rewrite the initial file, write the content of f2 (also
    > with fgetc and fputc) to f3, add the EOL and then add the content
    > of f1 to f3. Your lines should be swapped. Now, you have to pay
    > attention at what EOL means both when reading and writing. Also,
    > given the fact that you have integers, you could use isdigit() (pay
    > attention to negative numbers).


    This wouldn't prevent the use of system memory, since any I/O will be
    cached by default.

    To avoid using system memory you'd have to use the raw device drive on
    linux. Perhaps there is something similar on other OSes. Anyways, it's
    out of topic on comp.lang.c...

    Perhaps the OP could explain what he wants exactly, since it's rather
    silly to buy GigaBytes of RAM not to use it later...


    --
    __Pascal Bourguignon__ http://www.informatimago.com/

    NOTE: The most fundamental particles in this product are held
    together by a "gluing" force about which little is currently known
    and whose adhesive power can therefore not be permanently
    guaranteed.
    Pascal Bourguignon, Jan 9, 2006
    #4
  5. In article <>,
    Pascal Bourguignon <> wrote:
    >Nelu <> writes:
    >
    >> On 2006-01-09, ulyses <> wrote:


    >>> File contains INTs only. What is more they are huge. For example first
    >>> row in file may contain integer which size is 50MB and the second 30MB.
    >>> Now we come to my problem. Is there possibility to swap this rows
    >>> without using system memory (preferably in Unix/Linux)?


    >> I assume you want to make the swap without using char *, because you
    >> need to use some memory.
    >> One way you could do this is to use two files: f1 and f2.


    >This wouldn't prevent the use of system memory, since any I/O will be
    >cached by default.


    >Perhaps the OP could explain what he wants exactly, since it's rather
    >silly to buy GigaBytes of RAM not to use it later...


    My speculation would be that the OP does not want to read the entire
    file into memory, as that might be a strain on the system resources.

    The OP gave, by the way, no indication that gigabytes of RAM are available
    nor that the file contents would be able to fit within the address space
    available.


    If the OP has numerous rows to exchange, then a "create a new file
    each time" algorithm could get rather slow. What might be practical,
    though, is to create a couple of I/O wrapper routines that kept
    track of the files and which "logically current" portions of the files
    are physically somewhere else (probably easiest in this circumstance
    if the I/O wrappers operated at the "record" level.) Then, at the
    end, create as many temporary files as there were files logically
    written to, and for each logically-written file do a linear run
    pulling the data out of the original data files in the appropriate order;
    when the temporary files are fully generated, then either rename
    them into existance where the original data files were, or else
    [e.g., for permissions or NFS reasons] then copy the data out of the
    temporary files into the original files and trunc() the original files
    and throw away the temporary files.
    --
    "No one has the right to destroy another person's belief by
    demanding empirical evidence." -- Ann Landers
    Walter Roberson, Jan 9, 2006
    #5
  6. "ulyses" <> writes:
    > Let's assume I have following file:
    >
    > 2938929384902491233.....
    > 923949919199191919112....
    >
    > File contains INTs only. What is more they are huge. For example first
    > row in file may contain integer which size is 50MB and the second 30MB.
    > Now we come to my problem. Is there possibility to swap this rows
    > without using system memory (preferably in Unix/Linux)? Is there any
    > function in C to do this?


    Do you mean that your file is a text file where each line consists of
    a sequence decimal digits representing a large integer?

    If so, for purposes of you problem, you can just treat them as
    sequences of characters (which happen to be decimal digits); the fact
    that they represent large integers is irrelevant.

    Can the file contain arbitrarily many lines?

    Here's one possible approach. Read the input file, keeping track of
    where each new line starts. Use ftell() or fgetpos() and build a list
    or array of indices.

    Then traverse your index in reverse order. For each index, use
    fseek() or fsetpos() to jump to that location in the file; read up to
    the end of the current line and write to your output file.

    (Note that some systems may provide a command to do this -- and 50MB
    isn't all that much memory these days. Since you mentioned
    Unix/Linux, <OT>"man tac" or "info tac"</OT>.)

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
    Keith Thompson, Jan 9, 2006
    #6
  7. "ulyses" <> wrote in message
    news:...

    > Let's assume I have following file:
    >
    > 2938929384902491233.....
    > 923949919199191919112....
    >
    > File contains INTs only. What is more they are huge. For example first
    > row in file may contain integer which size is 50MB and the second 30MB.
    > Now we come to my problem. Is there possibility to swap this rows
    > without using system memory (preferably in Unix/Linux)? Is there any
    > function in C to do this?


    It's not clear what you mean by "without using system memory". Why
    wouldn't you want to use memory if this results in faster operation?

    For example, you could move 16Kb at a time, and rig it to use very
    little system memory. But then the head of the disk would have to jump back
    and forth 5,000 times. It would almost certainly make more sense to use more
    memory.

    Are you concerned about consumption of system cache?

    It's a bit tricky, but there are exchange algorithms. You can tweak them
    to minimize user memory consumption and then let the system figure out how
    much cache is appropriate. You can hint to the cache to help with that.

    DS
    David Schwartz, Jan 11, 2006
    #7
  8. ulyses

    ulyses Guest

    David Schwartz napisal(a):
    > "ulyses" <> wrote in message
    > news:...
    >
    > > Let's assume I have following file:
    > >
    > > 2938929384902491233.....
    > > 923949919199191919112....
    > >
    > > File contains INTs only. What is more they are huge. For example first
    > > row in file may contain integer which size is 50MB and the second 30MB.
    > > Now we come to my problem. Is there possibility to swap this rows
    > > without using system memory (preferably in Unix/Linux)? Is there any
    > > function in C to do this?

    >
    > It's not clear what you mean by "without using system memory". Why
    > wouldn't you want to use memory if this results in faster operation?
    >


    I thouhg about something that would be some kind of low level function
    that would modify pointers not real data, the same way as we do with
    pointer to data in memory, e.g.:
    int a*, b*, temp*;
    ....
    //the swap
    temp = a;
    a = b;
    b = temp;

    And the data was swapped without moving it in memory. I asked if there
    is such functionality that would enable me to do something like that
    but with rows in file. Swap them without moving them on disk.

    Thanks again for help,
    John
    ulyses, Jan 12, 2006
    #8
  9. In article <>,
    "ulyses" <> wrote:

    > David Schwartz napisal(a):
    > > "ulyses" <> wrote in message
    > > news:...
    > >
    > > > Let's assume I have following file:
    > > >
    > > > 2938929384902491233.....
    > > > 923949919199191919112....
    > > >
    > > > File contains INTs only. What is more they are huge. For example first
    > > > row in file may contain integer which size is 50MB and the second 30MB.
    > > > Now we come to my problem. Is there possibility to swap this rows
    > > > without using system memory (preferably in Unix/Linux)? Is there any
    > > > function in C to do this?

    > >
    > > It's not clear what you mean by "without using system memory". Why
    > > wouldn't you want to use memory if this results in faster operation?
    > >

    >
    > I thouhg about something that would be some kind of low level function
    > that would modify pointers not real data, the same way as we do with
    > pointer to data in memory, e.g.:
    > int a*, b*, temp*;
    > ...
    > //the swap
    > temp = a;
    > a = b;
    > b = temp;
    >
    > And the data was swapped without moving it in memory. I asked if there
    > is such functionality that would enable me to do something like that
    > but with rows in file. Swap them without moving them on disk.


    No, because the information in files is not organized by lines, it's
    just sequences of bytes organized by disk blocks. If you wanted to
    rearrange the blocks it would theoretically be possible to update the
    block pointers in the inode. However, there's no API for this, so you
    would have to do it by writing a new ioctl or device driver, or by
    accessing the disk device directly (and you'd need to unmount the
    filesystem first, to avoid conflicts with the kernel's in-memory copies
    of inodes).

    But to rearrange lines you have to read the file into memory, search for
    the newline characters, then write the lines back out to the file.

    --
    Barry Margolin,
    Arlington, MA
    *** PLEASE post questions in newsgroups, not directly to me ***
    *** PLEASE don't copy me on replies, I'll read them in the group ***
    Barry Margolin, Jan 13, 2006
    #9
  10. Barry Margolin <> writes:
    > In article <>,
    > "ulyses" <> wrote:
    >> David Schwartz napisal(a):
    >> > "ulyses" <> wrote in message
    >> > news:...
    >> >
    >> > > Let's assume I have following file:
    >> > >
    >> > > 2938929384902491233.....
    >> > > 923949919199191919112....
    >> > >
    >> > > File contains INTs only. What is more they are huge. For example first
    >> > > row in file may contain integer which size is 50MB and the second 30MB.
    >> > > Now we come to my problem. Is there possibility to swap this rows
    >> > > without using system memory (preferably in Unix/Linux)? Is there any
    >> > > function in C to do this?
    >> >
    >> > It's not clear what you mean by "without using system memory". Why
    >> > wouldn't you want to use memory if this results in faster operation?
    >> >

    >>
    >> I thouhg about something that would be some kind of low level function
    >> that would modify pointers not real data, the same way as we do with
    >> pointer to data in memory, e.g.:
    >> int a*, b*, temp*;
    >> ...
    >> //the swap
    >> temp = a;
    >> a = b;
    >> b = temp;
    >>
    >> And the data was swapped without moving it in memory. I asked if there
    >> is such functionality that would enable me to do something like that
    >> but with rows in file. Swap them without moving them on disk.

    >
    > No, because the information in files is not organized by lines, it's
    > just sequences of bytes organized by disk blocks. If you wanted to
    > rearrange the blocks it would theoretically be possible to update the
    > block pointers in the inode. However, there's no API for this, so you
    > would have to do it by writing a new ioctl or device driver, or by
    > accessing the disk device directly (and you'd need to unmount the
    > filesystem first, to avoid conflicts with the kernel's in-memory copies
    > of inodes).
    >
    > But to rearrange lines you have to read the file into memory, search for
    > the newline characters, then write the lines back out to the file.


    If you want to rearrange the lines so you can read the rearranged file
    with ordinary stdio calls, you'll need to physically re-write the file
    (unless you can manage to do some nasty low-level file system stuff).
    But if you want to be able to access the lines in some specified order
    other than their physical order in the file, you can just create a
    separate index. Do one pass over the file, creating an index of the
    position of the start of each line (using ftell() or fgetpos()). Once
    you have the index, you can use fseek() or fsetpos() to jump directly
    to the beginning of any line you want.

    Reading the file in reverse order is likely to be less efficient than
    if you had physically reversed the file; the performance tradeoff
    depends on how often you read it.

    --
    Keith Thompson (The_Other_Keith) <http://www.ghoti.net/~kst>
    San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
    We must do something. This is something. Therefore, we must do this.
    Keith Thompson, Jan 13, 2006
    #10
  11. "ulyses" <> writes:

    > David Schwartz napisal(a):
    >> "ulyses" <> wrote in message
    >> news:...
    >>
    >> > Let's assume I have following file:
    >> >
    >> > 2938929384902491233.....
    >> > 923949919199191919112....
    >> >
    >> > File contains INTs only. What is more they are huge. For example first
    >> > row in file may contain integer which size is 50MB and the second 30MB.
    >> > Now we come to my problem. Is there possibility to swap this rows
    >> > without using system memory (preferably in Unix/Linux)? Is there any
    >> > function in C to do this?

    >>
    >> It's not clear what you mean by "without using system memory". Why
    >> wouldn't you want to use memory if this results in faster operation?
    >>

    >
    > I thouhg about something that would be some kind of low level function
    > that would modify pointers not real data, the same way as we do with
    > pointer to data in memory, e.g.:
    > int a*, b*, temp*;
    > ...
    > //the swap
    > temp = a;
    > a = b;
    > b = temp;
    >
    > And the data was swapped without moving it in memory. I asked if there
    > is such functionality that would enable me to do something like that
    > but with rows in file. Swap them without moving them on disk.


    You can build an index yourself.

    Read the data file byte by byte, and note the offset of all newline
    characters. Save the list of offsets to an index file.

    Later you can read the index file, and when you want to read the nth
    number, you seek in the data file to the nth offset.

    If you want to swap two numbers, you just swap the two offsets in the
    index, which involves reading and writing at most two blocks. You can
    also easily "delete" or "insert" numbers. To "delete" a number you
    just remove its offset in the index (or put it in a "free list" if you
    want to be able to reuse the space). To "insert" a number, you just
    note the file size as the offset to the new number which you merely
    append to the end of the file, and insert the offset in the index.


    --
    __Pascal Bourguignon__ http://www.informatimago.com/

    "Debugging? Klingons do not debug! Our software does not coddle the
    weak."
    Pascal Bourguignon, Jan 13, 2006
    #11
  12. In article <>,
    Pascal Bourguignon <> wrote:

    > You can build an index yourself.
    >
    > Read the data file byte by byte, and note the offset of all newline
    > characters. Save the list of offsets to an index file.


    Even better would be if you could get the program that creates the file
    in the first place to create an index at the same time.

    --
    Barry Margolin,
    Arlington, MA
    *** PLEASE post questions in newsgroups, not directly to me ***
    *** PLEASE don't copy me on replies, I'll read them in the group ***
    Barry Margolin, Jan 14, 2006
    #12
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. linkswanted
    Replies:
    0
    Views:
    673
    linkswanted
    Dec 21, 2007
  2. linkswanted
    Replies:
    0
    Views:
    1,537
    linkswanted
    Jan 6, 2008
  3. linkswanted
    Replies:
    0
    Views:
    437
    linkswanted
    Jan 23, 2008
  4. linkswanted
    Replies:
    0
    Views:
    620
    linkswanted
    Jan 24, 2008
  5. Andreas
    Replies:
    1
    Views:
    1,668
    Piet van Oostrum
    Mar 18, 2009
Loading...

Share This Page