Creating a file with $SIZE

k.i.n.g.

Hi All,

I would like to create files of different sizes, taking the size as user
input. I need to check the data transfer rates from one network to
another. In order to do this I will have to create files of different sizes
and work it out. I am new to Python.

Thanks in advance.

KK
 
Chris

Hi All,

I would like create files of different size, taking size as user
input. I need to check the data transfer rates from one network to
another . In order to do this I will have to create files of diff size
and work out. I am new to Python

Thanks in advance.

KK

Welcome to Python.

If you just want to create files with random junk from the user input
then maybe something along these lines would help:

import sys, random

def random_junk(number_of_characters):
    tmp = []
    while number_of_characters:
        tmp.append(random.randint(0, 127))
        number_of_characters -= 1
    return ''.join(map(str,tmp))

if len(sys.argv) < 2:
    sys.exit('Usage: python %s <space separated filesizes>' % sys.argv[0])

for each_argv in sys.argv[1:]:
    # each argument is used both as the file name and as the size in bytes
    output_file = open(each_argv,'wb').write(random_junk(int(each_argv)))
 
Chris

I would like create files of different size, taking size as user
input. I need to check the data transfer rates from one network to
another . In order to do this I will have to create files of diff size
and work out. I am new to Python
Thanks in advance.

Welcome to Python.

If you just want to create files with random junk from the user input
then maybe something along these lines would help:

import sys, random

def random_junk(number_of_characters):
    tmp = []
    while number_of_characters:
        tmp.append(random.randint(0, 127))
        number_of_characters -= 1
    return ''.join(map(str,tmp))

if len(sys.argv) < 2:
    sys.exit('Usage: python %s <space separated filesizes>' % sys.argv[0])

for each_argv in sys.argv[1:]:
    # each argument is used both as the file name and as the size in bytes
    output_file = open(each_argv,'wb').write(random_junk(int(each_argv)))

Sorry, meant

def random_junk(number_of_characters):
    tmp = []
    while number_of_characters:
        tmp.append(chr(random.randint(0, 127)))
        number_of_characters -= 1
    return ''.join(tmp)
 
k.i.n.g.

I think I am not clear with my question, I am sorry. Here goes the
exact requirement.

We use the dd command in Linux to create a file of the required size. In
a similar way, on Windows I would like to use Python to take the size of
the file (50MB, 1GB) as input from the user and create an uncompressed
file of the size given by the user.

ex: If the user input is 50M, the script should create a 50 MB blank or
empty file

Thank you
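
Turning input such as "50M" or "1GB" into a byte count is a small parsing step. The following is only a sketch: the function name parse_size and the unit table are hypothetical, and it assumes binary multiples (1M = 1024 * 1024 bytes), which may not be what every user expects.

```python
import re

# Assumption: binary multiples, so 1K = 1024 bytes, 1M = 1024 * 1024 bytes, etc.
_UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_size(text):
    """Convert strings such as '50M', '1GB' or '4096' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+)\s*([KMG]?)B?\s*", text.upper())
    if match is None:
        raise ValueError("unrecognised size: %r" % text)
    number, unit = match.groups()
    return int(number) * _UNITS[unit]
```

The result can be passed as the size argument to any of the file-creating functions in this thread.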
 
Robert Bossy

k.i.n.g. said:
I think I am not clear with my question, I am sorry. Here goes the
exact requirement.

We use dd command in Linux to create a file with of required size. In
similar way, on windows I would like to use python to take the size of
the file( 50MB, 1GB ) as input from user and create a uncompressed
file of the size given by the user.

ex: If user input is 50M, script should create 50Mb of blank or empty
file
def make_blank_file(path, size):
    f = open(path, 'w')
    f.seek(size - 1)
    f.write('\0')
    f.close()

I'm not sure the f.seek() trick will work on all platforms, so you can:

def make_blank_file(path, size):
    f = open(path, 'w')
    f.write('\0' * size)
    f.close()

Cheers,
RB
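
For readers on Python 3, the seek() variant might look like the sketch below. It assumes binary mode so that no newline translation can interfere, and it uses a throwaway temporary path (demo_path is illustrative only). Whether the result is stored as a sparse file depends on the filesystem.

```python
import os
import tempfile

def make_blank_file(path, size):
    """Create a file of exactly `size` bytes by seeking, then writing one byte."""
    with open(path, "wb") as f:      # binary mode: bytes are written unchanged
        if size > 0:
            f.seek(size - 1)         # jump to the position of the last byte
            f.write(b"\0")           # writing it extends the file to `size`

# Usage with a temporary location; 1 MiB keeps the demonstration cheap.
demo_path = os.path.join(tempfile.mkdtemp(), "blank.bin")
make_blank_file(demo_path, 1024 * 1024)
demo_size = os.path.getsize(demo_path)
```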
 
Matt Nordhoff

Robert said:
def make_blank_file(path, size):
    f = open(path, 'w')
    f.seek(size - 1)
    f.write('\0')
    f.close()

I'm not sure the f.seek() trick will work on all platforms, so you can:

def make_blank_file(path, size):
    f = open(path, 'w')
    f.write('\0' * size)
    f.close()

I'd point out that a 1 GB string is probably not a good idea.

def make_blank_file(path, size):
    chunksize = 10485760 # 10 MB
    chunk = '\0' * chunksize
    left = size
    fh = open(path, 'wb')
    while left > chunksize:
        fh.write(chunk)
        left -= chunksize
    if left > 0:
        fh.write('\0' * left)
    fh.close()
 
Robert Bossy

Matt said:
I point out that a 1 GB string is probably not a good idea.

def make_blank_file(path, size):
    chunksize = 10485760 # 10 MB
    chunk = '\0' * chunksize
    left = size
    fh = open(path, 'wb')
    while left > chunksize:
        fh.write(chunk)
        left -= chunksize
    if left > 0:
        fh.write('\0' * left)
    fh.close()
Indeed! Maybe the best choice for chunksize would be the file's buffer
size... I won't search the docs for how to get the file's buffer size because
I'm too cool to use that function and prefer the seek() option, since
it's lightning fast regardless of the size of the file and it takes near to
zero memory.

Cheers,
RB
 
cokofreedom

Indeed! Maybe the best choice for chunksize would be the file's buffer
size... I won't search the docs for how to get the file's buffer size because
I'm too cool to use that function and prefer the seek() option, since
it's lightning fast regardless of the size of the file and it takes near to
zero memory.

Cheers,
RB

But what platforms does it work on / not work on?
 
Marco Mariani

Robert said:
Indeed! Maybe the best choice for chunksize would be the file's buffer
size... I won't search the docs for how to get the file's buffer size because
I'm too cool to use that function and prefer the seek() option, since
it's lightning fast regardless of the size of the file and it takes near to
zero memory.

And makes a hole in the file, I suppose, hence the fragmentation.

The OP explicitly asked for an uncompressed file.
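
One way to check whether a file ended up with a hole is to compare its apparent size with the space the filesystem reports as allocated. This is a rough sketch with hypothetical helper names; it relies on os.stat() exposing st_blocks, which is not available on Windows, and a low value can also come from filesystem compression rather than a hole.

```python
import os

def allocated_bytes(path):
    """Return the bytes allocated on disk, or None if the platform does not report it."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)   # missing on Windows
    if blocks is None:
        return None
    return blocks * 512                        # st_blocks counts 512-byte units

def looks_sparse(path):
    """True if the file occupies less disk space than its apparent size, None if unknown."""
    allocated = allocated_bytes(path)
    if allocated is None:
        return None
    return allocated < os.path.getsize(path)
```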
 
drobinow

We use dd command in Linux to create a file with of required size.
If you just want to get your work done, you might consider the cygwin
dd command.
Learning to write python is a worthwhile endeavour in any case.
 
k.i.n.g.

If you just want to get your work done, you might consider the cygwin
dd command.
Learning to write python is a worthwhile endeavour in any case.

While I have only just started learning programming/Python, I got this
requirement at my workplace. I want to learn Python rather than just get
things done.

Thank you all for the solutions, I will try them and let you all know
about my results.
 
Robert Bossy

But what platforms does it work on / not work on?
Posix. It's been ages since I touched Windows, so I don't know if XP and
Vista are posix or not.
Though, as Marco Mariani mentioned, this may create a fragmented file.
It may or may not be a hindrance depending on what you want to do with
it, but the circumstances in which this is a problem are quite rare.

RB
 
Bryan Olson

k.i.n.g. said:
I think I am not clear with my question, I am sorry. Here goes the
exact requirement.

We use dd command in Linux to create a file with of required size. In
similar way, on windows I would like to use python to take the size of
the file( 50MB, 1GB ) as input from user and create a uncompressed
file of the size given by the user.

ex: If user input is 50M, script should create 50Mb of blank or empty
file

You mean all zero bytes? Python cannot guarantee that the system
will not compress such a file. For testing data transfer rates,
random data is usually a better choice.
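
A file of random bytes can be produced in chunks so that memory use stays bounded. The sketch below is one possible approach in Python 3; the function name make_random_file and the 1 MiB default chunk size are assumptions, and os.urandom is used only because it is a convenient standard source of incompressible bytes.

```python
import os

def make_random_file(path, size, chunk_size=1024 * 1024):
    """Write `size` random bytes to `path`, at most `chunk_size` bytes at a time."""
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)   # the last chunk may be shorter
            f.write(os.urandom(n))
            remaining -= n
```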
 
Bryan Olson

That bit strikes me as silly.

Posix is on the does-work side, just to be clear.

http://www.opengroup.org/onlinepubs/000095399/functions/fseek.html
It's been ages since I touched Windows, so I don't know if XP and
Vista are posix or not.

I tried on WinXP, with both an NTFS and FAT32 disk, and it worked
on both.

I found some Microsoft documentation noting: "On some
platforms, seeking past the end of a file and then doing a write
operation results in undefined behavior."

http://msdn2.microsoft.com/en-us/library/system.io.filestream.seek(VS.71).aspx

Though, as Marco Mariani mentioned, this may create a fragmented file.
It may or may not be a hindrance depending on what you want to do with
it, but the circumstances in which this is a problem are quite rare.

Writing zeros might also create a fragmented and/or compressed file.
Using random data, which is contrary to the stated requirement but
usually better for the stated application, will prevent compression but
not prevent fragmentation.

I'm not entirely clear on what the OP is doing. If he's testing
network throughput just by creating this file on a remote server,
the seek-way-past-end-then-write trick won't serve his purpose.
Even if the filesystem has to write all the zeros, the protocols
don't actually send those zeros.
 
Robert Bossy

Bryan said:
That bit strikes me as silly.
The size of the chunk must be as little as possible in order to minimize
memory consumption. However below the buffer-size, you'll end up filling
the buffer anyway before actually writing on disk.

Writing zeros might also create a fragmented and/or compressed file.
Using random data, which is contrary to the stated requirement but
usually better for the stated application, will prevent compression but
not prevent fragmentation.

I'm not entirely clear on what the OP is doing. If he's testing
network throughput just by creating this file on a remote server,
the seek-way-past-end-then-write trick won't serve his purpose.
Even if the filesystem has to write all the zeros, the protocols
don't actually send those zeros.
Amen.

Cheers,
RB
 
Bryan Olson

Robert said:
The size of the chunk must be as little as possible in order to minimize
memory consumption. However below the buffer-size, you'll end up filling
the buffer anyway before actually writing on disk.

First, which buffer? The file library's buffer is of trivial size,
a few KB, and if we wanted to save even that we'd use os.open and
have no such buffer at all. The OS may set up a file-specific
buffer, but again those are small, and we could fill our file much
faster with larger writes.

Kernel buffers/pages are dynamically assigned on modern operating
systems. There is no particular buffer size for the file if you mean
the amount of kernel memory holding the written data. Some OS's
do not buffer writes to disk files; the write doesn't return until
the data goes to disk (though they may cache it for future reads).

To fill the file fast, there's a large range of reasonable sizes
for writing, but user-space buffer size - typically around 4K - is
too small. 1 GB is often disastrously large, forcing paging to and
from disk to access the memory. In this thread, Matt Nordhoff used
10 MB; a fine size today, and probably for several years to come.

If the OP is writing to a remote disk file to test network
throughput, there's another size limit to consider. Network
file-system protocols do not stream very large writes; the client has to
break a large write into several smaller writes. NFS version 2 had
a limit of 8 KB; version 3 removed the limit by allowing the server
to tell the client the largest size it supports. (Version 4 is now
out, in hundreds of pages of RFC that I hope to avoid reading.)
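
The effect of the write size can be measured rather than guessed. The following is a minimal sketch, not a benchmark: the helper names and the candidate sizes are assumptions, and the os.fsync() call only asks the OS to flush its buffers, so results will still vary with the filesystem and hardware.

```python
import os
import time

def write_zeros(path, size, chunk_size):
    """Write `size` zero bytes in chunks of `chunk_size`; return elapsed seconds."""
    chunk = b"\0" * chunk_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        full, rest = divmod(size, chunk_size)
        for _ in range(full):
            f.write(chunk)
        if rest:
            f.write(chunk[:rest])        # final partial chunk
        f.flush()
        os.fsync(f.fileno())             # request a flush to disk before stopping the clock
    return time.perf_counter() - start

def compare_chunk_sizes(path, size,
                        chunk_sizes=(4 * 1024, 1024 * 1024, 10 * 1024 * 1024)):
    """Return a dict mapping each chunk size to the seconds it took."""
    return {c: write_zeros(path, size, c) for c in chunk_sizes}
```

Calling compare_chunk_sizes() on a path on the disk of interest gives a quick feel for how much the chunk size matters there.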
 
rbossy

Quoting Bryan Olson said:
First, which buffer? The file library's buffer is of trivial size,
a few KB, and if we wanted to save even that we'd use os.open and
have no such buffer at all. The OS may set up a file-specific
buffer, but again those are small, and we could fill our file much
faster with larger writes.

Kernel buffers/pages are dynamically assigned on modern operating
systems. There is no particular buffer size for the file if you mean
the amount of kernel memory holding the written data. Some OS's
do not buffer writes to disk files; the write doesn't return until
the data goes to disk (though they may cache it for future reads).

To fill the file fast, there's a large range of reasonable sizes
for writing, but user-space buffer size - typically around 4K - is
too small. 1 GB is often disastrously large, forcing paging to and
from disk to access the memory. In this thread, Matt Nordhoff used
10 MB; a fine size today, and probably for several years to come.

If the OP is writing to a remote disk file to test network
throughput, there's another size limit to consider. Network
file-system protocols do not stream very large writes; the client has to
break a large write into several smaller writes. NFS version 2 had
a limit of 8 KB; version 3 removed the limit by allowing the server
to tell the client the largest size it supports. (Version 4 is now
out, in hundreds of pages of RFC that I hope to avoid reading.)

Wow. That's a lot of knowledge in a single post. Thanks for the information, Bryan.

Cheers,
RB
 
Gabriel Genellina

We use dd command in Linux to create a file with of required size. In
similar way, on windows I would like to use python to take the size of
the file( 50MB, 1GB ) as input from user and create a uncompressed
file of the size given by the user.

The equivalent command on Windows would be:

fsutil file createnew filename size
 
