Re: Manipulate Large Binary Files

Discussion in 'Python' started by Derek Martin, Apr 2, 2008.

  1. Derek Martin

    Derek Martin Guest

    On Wed, Apr 02, 2008 at 02:09:45PM -0400, Derek Tracy wrote:
    > Both are clocking in at the same time (1m 5sec for 2.6GB), are there
    > any ways I can optimize either solution?


    Buy faster disks? How long do you expect it to take? At 65 s, you're
    already reading/writing 2.6 GB at a sustained transfer rate of about
    40 MB/s (2.6 GB / 65 s). That's nothing to sneeze at. Your disks, not
    your program, are almost certainly the real bottleneck, unless you have
    reason to believe your hardware should be significantly faster.
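    One way to check whether the disks really are the limit is to time a pass that only reads the file and compare it with the full read/write run. A minimal sketch, assuming Python 3.6+; `mb_per_s` and `read_throughput` are hypothetical helper names, and MB here means 10**6 bytes. A repeated run over the same file may be served from the OS page cache and look much faster than the disks actually are.

```python
import time


def mb_per_s(nbytes, seconds):
    """Throughput in MB/s, with 1 MB = 10**6 bytes."""
    return nbytes / seconds / 1e6


def read_throughput(path, chunk_size=1 << 20):
    """Read the whole file once, discard the data, and return (bytes, MB/s)."""
    buf = bytearray(chunk_size)  # one reusable buffer, no per-chunk allocation
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(buf)
            if not n:  # 0 bytes read means end of file
                break
            total += n
    elapsed = time.perf_counter() - start
    return total, (mb_per_s(total, elapsed) if elapsed > 0 else float("inf"))


# The figures from the thread: 2.6 GB in 65 s.
print(f"{mb_per_s(2.6e9, 65):.1f} MB/s")  # prints "40.0 MB/s"
```

    If the read-only pass alone runs at close to the same rate as the full job, the program has little room left to gain.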

    That said, because normal I/O generally involves double buffering, you
    might be able to speed things up noticeably by using memory-mapped I/O
    (MMIO). It depends on whether the implementation of the Python
    facilities you're using already uses MMIO under the hood, and whether
    MMIO happens to be broken on your OS. :)
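    In Python, memory-mapped I/O is available through the standard `mmap` module. A minimal sketch, assuming Python 3.8+ for `madvise` (the hint is skipped where it is unavailable); `mmap_copy` is a hypothetical name and the 64 MiB chunk size is an arbitrary choice. Whether it beats plain `read()`/`write()` depends on the OS, so it is worth timing both.

```python
import mmap
import os


def mmap_copy(src_path, dst_path, chunk_size=64 * 1024 * 1024):
    """Copy src to dst by reading through a read-only memory map; return bytes copied."""
    size = os.path.getsize(src_path)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        if size == 0:
            return 0  # mmap refuses to map an empty file
        with mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Hint sequential access so the kernel can read ahead
            # (mmap.madvise needs Python 3.8+ and OS support).
            if hasattr(mm, "madvise") and hasattr(mmap, "MADV_SEQUENTIAL"):
                mm.madvise(mmap.MADV_SEQUENTIAL)
            # memoryview slices avoid copying each chunk into a new bytes object.
            with memoryview(mm) as view:
                for offset in range(0, size, chunk_size):
                    dst.write(view[offset:offset + chunk_size])
    return size
```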

    > Would turning off the read/write buffer increase speed?


    No...
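    To measure this rather than take it on faith, a sketch along these lines copies with Python's own buffer layer turned off (`buffering=0`) and large explicit reads into one reusable buffer. It assumes Python 3; `unbuffered_copy` and the 8 MiB chunk size are illustrative choices. With reads this large, CPython's buffered file objects mostly pass the data straight through anyway, so expect little or no difference from an ordinary buffered copy.

```python
def unbuffered_copy(src_path, dst_path, chunk_size=8 * 1024 * 1024):
    """Copy src to dst with raw (unbuffered) file objects; return bytes copied."""
    buf = bytearray(chunk_size)
    view = memoryview(buf)
    total = 0
    with open(src_path, "rb", buffering=0) as src, \
         open(dst_path, "wb", buffering=0) as dst:
        while True:
            n = src.readinto(buf)
            if not n:  # end of file
                break
            # A raw write() may write fewer bytes than asked, so loop until done.
            written = 0
            while written < n:
                written += dst.write(view[written:n])
            total += n
    return total
```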

    --
    Derek D. Martin
    http://www.pizzashack.org/
    GPG Key ID: 0x81CFE75D

     
    Derek Martin, Apr 2, 2008
    #1

  2. Paul Rubin

    Paul Rubin Guest

    Derek Martin <> writes:
    > > Both are clocking in at the same time (1m 5sec for 2.6GB), are there
    > > any ways I can optimize either solution?


    Getting 40+ MB/sec through a file system is pretty impressive.
    Sounds like a RAID?

    > That said, due to normal I/O generally involving double-buffering, you
    > might be able to speed things up noticably by using Memory-Mapped I/O
    > (MMIO). It depends on whether or not the implementation of the Python
    > things you're using already use MMIO under the hood, and whether or
    > not MMIO happens to be broken in your OS. :)


    Python has the mmap module and I use it sometimes, but it's not
    necessarily the right thing for something like this. Each page you
    try to read from incurs its own delay while the resulting page fault
    is serviced, so any overlapped I/O you get comes from the OS being
    nice enough to do predictive readahead for you on sequential
    access, if it does that at all. By coincidence, a couple of other
    threads mention AIO (asynchronous I/O), which is a somewhat more
    powerful mechanism.
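    Python's standard library has no direct binding for POSIX AIO, but a similar overlap of reads and writes can be approximated with a reader thread feeding a bounded queue while the main thread writes. This swaps in a different technique (threads, not AIO) and is sketched under the assumption of CPython 3, where blocking file reads and writes release the GIL; `overlapped_copy`, the 4 MiB chunk size, and the queue depth are illustrative. Whether it helps depends on the hardware: if reads and writes hit the same disks, overlapping them may gain little.

```python
import queue
import threading


def overlapped_copy(src_path, dst_path, chunk_size=4 * 1024 * 1024, depth=4):
    """Copy src to dst while a reader thread runs ahead of the writer.

    The bounded queue caps memory use at roughly depth * chunk_size bytes.
    """
    chunks = queue.Queue(maxsize=depth)
    errors = []

    def reader():
        try:
            with open(src_path, "rb") as src:
                while True:
                    data = src.read(chunk_size)
                    chunks.put(data)
                    if not data:  # empty bytes marks end of file
                        break
        except Exception as exc:
            errors.append(exc)
            chunks.put(b"")  # unblock the writer so it can stop

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    total = 0
    with open(dst_path, "wb") as dst:
        while True:
            data = chunks.get()
            if not data:
                break
            dst.write(data)
            total += len(data)
    t.join()
    if errors:
        raise errors[0]
    return total
```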
     
    Paul Rubin, Apr 3, 2008
    #2

  4. Derek Tracy

    Derek Tracy Guest

    On Apr 3, 2008, at 3:03 AM, Paul Rubin <"http://phr.cx"@NOSPAM.invalid> wrote:

    > Getting 40+ MB/sec through a file system is pretty impressive.
    > Sounds like a RAID?
    > [...]
    > By coincidence there are a couple other
    > threads mentioning AIO which is a somewhat more powerful mechanism.


    I am running it on a RAID (striped RAID 5 using Fibre Channel), but I
    was expecting better performance.

    I will have to check into AIO, thanks for the bone.
     
    Derek Tracy, Apr 3, 2008
    #4
  5. Derek Martin

    Derek Martin Guest

    On Thu, Apr 03, 2008 at 02:36:02PM -0400, Derek Tracy wrote:
    > I am running it on a RAID (striped RAID 5 using Fibre Channel), but I
    > was expecting better performance.


    Don't forget that you're reading from and writing to the same
    spindles. Writes are slower on RAID 5, since parity has to be updated
    along with the data, and you have to read the data before you can
    write it.
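    The RAID 5 write penalty comes from parity: each stripe stores the XOR of its data blocks, so rewriting a single block without touching the rest of the stripe means first reading the old block and the old parity, then writing the new block and the new parity (two reads and two writes). A toy sketch of that arithmetic in Python, with hypothetical helper names; real controllers may avoid the extra reads on full-stripe writes or hide them behind a write-back cache.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_parity(blocks):
    """RAID 5 parity for one stripe: the XOR of all its data blocks."""
    return reduce(xor_bytes, blocks)


def small_write_parity(old_parity, old_block, new_block):
    """New parity after rewriting one block of a stripe.

    Requires the OLD block and OLD parity to be read from disk first:
    new_parity = old_parity XOR old_block XOR new_block.
    """
    return xor_bytes(xor_bytes(old_parity, old_block), new_block)


stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = stripe_parity(stripe)
updated = small_write_parity(parity, stripe[1], b"\xaa\xbb")
# Same result as recomputing the parity over the whole modified stripe.
assert updated == stripe_parity([stripe[0], b"\xaa\xbb", stripe[2]])
```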

     
    Derek Martin, Apr 4, 2008
    #5
