Creating a file with $SIZE

k.i.n.g. · Mar 12, 2008

Hi All,

I would like to create files of different sizes, taking the size as user
input. I need to check the data transfer rates from one network to
another. In order to do this I will have to create files of different
sizes and work it out. I am new to Python.

Thanks in advance.

KK

Chris · Mar 12, 2008

Welcome to Python.

If you just want to create files with random junk from the user input
then maybe something along these lines would help:

import sys, random

def random_junk(number_of_characters):
    tmp = []
    while number_of_characters:
        tmp.append(random.randint(0, 127))
        number_of_characters -= 1
    return ''.join(map(str,tmp))

if len(sys.argv) < 2:
    sys.exit('Usage: python %s <space separated filesizes>' % sys.argv[0])

for each_argv in sys.argv[1:]:
    # Each argument is used as both the file name and the character count.
    output_file = open(each_argv,'wb').write(random_junk(int(each_argv)))

Chris · Mar 12, 2008

Sorry, meant

def random_junk(number_of_characters):
    tmp = []
    while number_of_characters:
        tmp.append(chr(random.randint(0, 127)))
        number_of_characters -= 1
    return ''.join(tmp)
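
The snippets above are Python 2, where a str can be written straight to a file opened with 'wb'. A minimal sketch of the same idea for Python 3 might look like the following. It swaps random.randint for os.urandom and writes in chunks; the names write_random_file and chunk_size are made up for illustration and are not from the thread.

```python
import os

def write_random_file(path, size, chunk_size=1024 * 1024):
    """Write `size` bytes of random data to `path`, one chunk at a time."""
    remaining = size
    with open(path, 'wb') as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(os.urandom(n))  # random bytes over the full 0-255 range
            remaining -= n
```

Writing in chunks keeps memory use bounded by chunk_size no matter how large the file is.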

k.i.n.g. · Mar 12, 2008

I think I was not clear with my question, I am sorry. Here is the
exact requirement.

We use the dd command in Linux to create a file of the required size. In
a similar way, on Windows I would like to use Python to take the size of
the file (50MB, 1GB) as input from the user and create an uncompressed
file of the size given by the user.

ex: If the user input is 50M, the script should create a 50 MB blank or
empty file.

Thank you
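
Turning input such as 50M or 1GB into a byte count could be sketched roughly as below. This is only an illustration: parse_size is a hypothetical helper, and it assumes binary multiples (1 MB = 1024 * 1024 bytes), which the post does not specify.

```python
import re

# Assumption: suffixes are binary multiples (1 MB = 1024 * 1024 bytes).
_UNITS = {'': 1, 'K': 1024, 'M': 1024 ** 2, 'G': 1024 ** 3}

def parse_size(text):
    """Convert strings such as '50M', '50MB' or '1GB' to a byte count."""
    match = re.fullmatch(r'\s*(\d+)\s*([KMG]?)B?\s*', text.upper())
    if match is None:
        raise ValueError('unrecognised size: %r' % text)
    number, unit = match.groups()
    return int(number) * _UNITS[unit]
```

The result can then be passed to whichever file-creating function is used, for example size = parse_size(input('File size: ')).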

Robert Bossy · Mar 12, 2008

def make_blank_file(path, size):
    # Seek to the last byte and write a single zero byte there.
    f = open(path, 'wb')
    f.seek(size - 1)
    f.write(b'\0')
    f.close()

I'm not sure the f.seek() trick will work on all platforms, so you can:

def make_blank_file(path, size):
    # Write every zero byte explicitly (builds the whole string in memory).
    f = open(path, 'wb')
    f.write(b'\0' * size)
    f.close()

Cheers,
RB
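
A third variant, not mentioned in the thread, is to set the file length directly with truncate(). The sketch below assumes Python 3; the name make_blank_file_truncate is made up here. The Python documentation notes that the contents of the extended area depend on the platform, although on most systems it is zero-filled.

```python
def make_blank_file_truncate(path, size):
    """Create `path` with a length of `size` bytes without writing the data."""
    with open(path, 'wb') as f:
        # Extends the empty file to `size` bytes; usually zero-filled.
        f.truncate(size)
```

Like the seek() version, this may produce a file with a hole rather than one whose blocks are all written out.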

Matt Nordhoff · Mar 12, 2008

I'd point out that a 1 GB string is probably not a good idea.

def make_blank_file(path, size):
    chunksize = 10485760 # 10 MB
    chunk = b'\0' * chunksize
    left = size
    fh = open(path, 'wb')
    # Write whole chunks, then whatever is left over.
    while left > chunksize:
        fh.write(chunk)
        left -= chunksize
    if left > 0:
        fh.write(b'\0' * left)
    fh.close()

Robert Bossy · Mar 12, 2008

Indeed! Maybe the best choice for chunksize would be the file's buffer
size... I won't search the docs for how to get the file's buffer size because
I'm too cool to use that function and prefer the seek() option since
it's lightning fast regardless of the size of the file and it takes next to
zero memory.

Cheers,
RB

cokofreedom · Mar 12, 2008

But what platforms does it work on / not work on?

Marco Mariani · Mar 12, 2008

Robert said:
Indeed! Maybe the best choice for chunksize would be the file's buffer
size... I won't search the docs for how to get the file's buffer size because
I'm too cool to use that function and prefer the seek() option since
it's lightning fast regardless of the size of the file and it takes next to
zero memory.

And makes a hole in the file, I suppose, hence the fragmentation.

The OP explicitly asked for an uncompressed file.
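
Whether a given file actually has a hole can be checked, at least on POSIX systems, by comparing its apparent size with the space allocated for it. The following is a rough sketch; allocation_report is a hypothetical name, and it assumes st_blocks is available and counts 512-byte units, which is not the case on Windows.

```python
import os

def allocation_report(path):
    """Return (apparent_size, allocated_bytes) for `path`.

    Assumption: a POSIX system, where st_blocks counts 512-byte units.
    st_blocks is not available on Windows.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

If the second number is much smaller than the first, the filesystem has stored the file sparsely.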

drobinow · Mar 13, 2008

k.i.n.g. said:
We use the dd command in Linux to create a file of the required size.

If you just want to get your work done, you might consider the cygwin
dd command.
Learning to write python is a worthwhile endeavour in any case.

k.i.n.g. · Mar 13, 2008

While I have only just started learning programming/Python, I got this
requirement at my workplace. I want to learn Python rather than just get
things done.

Thank you all for the solutions, I will try them and let you all know
about my results.

Robert Bossy · Mar 13, 2008

cokofreedom said:
But what platforms does it work on / not work on?

Posix. It's been ages since I touched Windows, so I don't know if XP and
Vista are posix or not.
Though, as Marco Mariani mentioned, this may create a fragmented file.
It may or may not be a hindrance depending on what you want to do with
it, but the circumstances in which this is a problem are quite rare.

RB

Bryan Olson · Mar 13, 2008

k.i.n.g. said:
I think I was not clear with my question, I am sorry. Here is the
exact requirement.

We use the dd command in Linux to create a file of the required size. In
a similar way, on Windows I would like to use Python to take the size of
the file (50MB, 1GB) as input from the user and create an uncompressed
file of the size given by the user.

ex: If the user input is 50M, the script should create a 50 MB blank or
empty file.

You mean all zero bytes? Python cannot guarantee that the system
will not compress such a file. For testing data transfer rates,
random data is usually a better choice.
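
The difference is easy to see with a general-purpose compressor. In the sketch below, zlib merely stands in for whatever compression a filesystem or network link might apply (an assumption made for illustration); compressed_ratio is a made-up helper name.

```python
import os
import zlib

def compressed_ratio(data):
    """Return compressed size divided by original size for `data`."""
    return len(zlib.compress(data)) / len(data)

size = 1024 * 1024
zero_ratio = compressed_ratio(b'\0' * size)        # shrinks to a tiny fraction
random_ratio = compressed_ratio(os.urandom(size))  # stays at roughly 1.0
print(zero_ratio, random_ratio)
```

A transfer of the all-zero file could therefore look much faster than the link really is if anything along the way compresses the data.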

Bryan Olson · Mar 14, 2008

That bit strikes me as silly.

Robert said:
Posix.

Posix is on the does-work side, just to be clear.

http://www.opengroup.org/onlinepubs/000095399/functions/fseek.html

Robert said:
It's been ages since I touched Windows, so I don't know if XP and
Vista are posix or not.

I tried on WinXP, with both an NTFS and FAT32 disk, and it worked
on both.

I found some Microsoft documentation noting: "On some
platforms, seeking past the end of a file and then doing a write
operation results in undefined behavior."

http://msdn2.microsoft.com/en-us/library/system.io.filestream.seek(VS.71).aspx

Robert said:
Though, as Marco Mariani mentioned, this may create a fragmented file.
It may or may not be a hindrance depending on what you want to do with
it, but the circumstances in which this is a problem are quite rare.

Writing zeros might also create a fragmented and/or compressed file.
Using random data, which is contrary to the stated requirement but
usually better for the stated application, will prevent compression but
not prevent fragmentation.

I'm not entirely clear on what the OP is doing. If he's testing
network throughput just by creating this file on a remote server,
the seek-way-past-end-then-write trick won't serve his purpose.
Even if the filesystem has to write all the zeros, the protocols
don't actually send those zeros.

Robert Bossy · Mar 14, 2008

Bryan said:
That bit strikes me as silly.

The size of the chunk must be as small as possible in order to minimize
memory consumption. However, below the buffer size, you'll end up filling
the buffer anyway before actually writing to disk.

Bryan said:
Writing zeros might also create a fragmented and/or compressed file.
Using random data, which is contrary to the stated requirement but
usually better for the stated application, will prevent compression but
not prevent fragmentation.

I'm not entirely clear on what the OP is doing. If he's testing
network throughput just by creating this file on a remote server,
the seek-way-past-end-then-write trick won't serve his purpose.
Even if the filesystem has to write all the zeros, the protocols
don't actually send those zeros.

Amen.

Cheers,
RB

Bryan Olson · Mar 14, 2008

Robert said:
The size of the chunk must be as small as possible in order to minimize
memory consumption. However, below the buffer size, you'll end up filling
the buffer anyway before actually writing to disk.

First, which buffer? The file library's buffer is of trivial size,
a few KB, and if we wanted to save even that we'd use os.open and
have no such buffer at all. The OS may set up a file-specific
buffer, but again those are small, and we could fill our file much
faster with larger writes.

Kernel buffers/pages are dynamically assigned on modern operating
systems. There is no particular buffer size for the file if you mean
the amount of kernel memory holding the written data. Some OS's
do not buffer writes to disk files; the write doesn't return until
the data goes to disk (though they may cache it for future reads).

To fill the file fast, there's a large range of reasonable sizes
for writing, but user-space buffer size - typically around 4K - is
too small. 1 GB is often disastrously large, forcing paging to and
from disk to access the memory. In this thread, Matt Nordhoff used
10MB; a fine size today, and probably for several years to come.

If the OP is writing to a remote disk file to test network
throughput, there's another size limit to consider. Network file-
system protocols do not stream very large writes; the client has to
break a large write into several smaller writes. NFS version 2 had
a limit of 8 KB; version 3 removed the limit by allowing the server
to tell the client the largest size it supports. (Version 4 is now
out, in hundreds of pages of RFC that I hope to avoid reading.)
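
The os.open route mentioned above could be sketched like this for Python 3. The function name fill_file_unbuffered and the 0o644 permission bits are assumptions made for the example; the 10 MB default simply mirrors the chunk size used earlier in the thread.

```python
import os

def fill_file_unbuffered(path, size, chunk_size=10 * 1024 * 1024):
    """Write `size` zero bytes to `path` with os.write, bypassing the
    file library's user-space buffer."""
    chunk = b'\0' * min(chunk_size, size)
    # O_BINARY exists only on Windows; elsewhere fall back to 0.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, 'O_BINARY', 0)
    fd = os.open(path, flags, 0o644)
    try:
        remaining = size
        while remaining > 0:
            # os.write may write fewer bytes than requested, so count them.
            written = os.write(fd, chunk[:remaining])
            remaining -= written
    finally:
        os.close(fd)
```

Timing this function with a few different chunk_size values is one way to see where the reasonable range lies on a particular machine.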

rbossy · Mar 15, 2008

Wow. That's a lot of knowledge in a single post. Thanks for the information, Bryan.

Cheers,
RB

Gabriel Genellina · Mar 16, 2008

k.i.n.g. said:
We use the dd command in Linux to create a file of the required size. In
a similar way, on Windows I would like to use Python to take the size of
the file (50MB, 1GB) as input from the user and create an uncompressed
file of the size given by the user.

The equivalent command on Windows would be:

fsutil file createnew filename size
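
If the command needs to be driven from a Python script, one possible sketch is shown below. Both function names are hypothetical, the size is given in bytes, and the call only makes sense on Windows, where fsutil may also require suitable privileges.

```python
import subprocess
import sys

def fsutil_createnew_command(path, size):
    """Build the argument list for `fsutil file createnew <path> <size>`."""
    return ['fsutil', 'file', 'createnew', path, str(size)]

def create_with_fsutil(path, size):
    """Run fsutil; only meaningful on Windows, where fsutil is available."""
    if sys.platform != 'win32':
        raise OSError('fsutil is a Windows tool')
    # check=True raises CalledProcessError if fsutil reports a failure.
    subprocess.run(fsutil_createnew_command(path, size), check=True)
```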
