mimicking a file in memory

p. · Nov 20, 2007

I am using the mutagen module to extract ID3 information from MP3
files. To do this, you give mutagen a filename, which it converts
into a file object using the Python built-in "file" function.

Unfortunately, my mp3 files don't live locally. They are on a number
of remote servers which I access using urllib2.

Here is my dilemma:
I don't want to copy the files into a local directory for mutagen's
sake, only to have to remove them afterward. Instead, I'd like to load
the files into memory and still be able to hand the built-in "file"
function a filename to access the file in memory.

Any ideas on how to do this?

Larry Bates · Nov 20, 2007

Looks like you would need to "hack" the source and replace lines like:

def load(self, filename):
    self.filename = filename
    fileobj = file(filename, "rb")

with something like:

def load(self, filename):
    if hasattr(filename, 'read'):
        # Already a file-like object: use it directly
        fileobj = filename
        if hasattr(filename, 'name'):
            self.filename = filename.name
        else:
            self.filename = 'unknown'
    else:
        # A path: open it the usual way
        self.filename = filename
        fileobj = open(filename, "rb")

-Larry
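A minimal Python 3 sketch of the same idea outside mutagen: a helper that accepts either a path or an already-open file-like object. The names `open_source` and `read_header` are hypothetical (not part of mutagen's API), and io.BytesIO stands in for an in-memory file.

```python
import io


def open_source(source):
    """Return (fileobj, name, should_close) for a path or a file-like object.

    Hypothetical helper mirroring the hasattr(filename, 'read') check above.
    """
    if hasattr(source, "read"):
        # Already file-like (e.g. io.BytesIO): use it and leave closing to the caller.
        return source, getattr(source, "name", "unknown"), False
    # Otherwise treat it as a path and open it in binary mode.
    return open(source, "rb"), source, True


def read_header(source, n=3):
    """Read the first n bytes from either kind of source."""
    fileobj, _name, should_close = open_source(source)
    try:
        return fileobj.read(n)
    finally:
        if should_close:
            fileobj.close()


print(read_header(io.BytesIO(b"ID3\x03\x00")))  # b'ID3'
```

Only files the helper opened itself get closed, so a caller passing in a file object keeps control of it.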

p. · Nov 20, 2007

I thought about this approach originally, but here's the catch there:
the read method isn't the only method i need. mutagen calls the seek
method on the file object. urllib2 returns a "file-like object" that
does not have a seek method associated with it, which means i'd have
to extend urllib2 to add that method. Problem is, i don't know how you
could implement a seek method with urllib2.
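One way around the missing seek(), sketched in Python 3 (where urllib2 became urllib.request): read the whole response into an io.BytesIO, which does implement seek() and tell(). This trades memory for seekability, so it assumes the files are small enough to hold in RAM. `make_seekable` and `NonSeekableStream` are hypothetical names; the latter only imitates a network response so the sketch runs offline.

```python
import io


class NonSeekableStream:
    """Stand-in for a network response: it can read(), but not seek()."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)


def make_seekable(stream):
    """Copy everything from a read-only stream into a seekable in-memory file."""
    if hasattr(stream, "seekable") and stream.seekable():
        return stream  # already seekable, nothing to do
    return io.BytesIO(stream.read())


response = NonSeekableStream(b"ID3" + b"\x00" * 10 + b"TAG")
fileobj = make_seekable(response)
fileobj.seek(-3, io.SEEK_END)  # jump to the last three bytes
print(fileobj.read())          # b'TAG'
fileobj.seek(0)                # and back to the start
print(fileobj.read(3))         # b'ID3'
```

With a real download this would be `make_seekable(urllib.request.urlopen(url))`, where `url` is whatever the remote server exposes; HTTP responses report seekable() as False, so they get buffered.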

Jarek Zgoda · Nov 20, 2007

Try with StringIO/cStringIO, these modules are supposed to give you
in-memory objects compatible with file object interface.
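In Python 3, StringIO and cStringIO were folded into the io module; for binary data such as MP3s the in-memory equivalent is io.BytesIO. A minimal sketch showing it supports the read/seek/tell calls a tag reader needs, using an ID3v1 tag (the last 128 bytes of the file, starting with b"TAG") as the example. `has_id3v1` is a hypothetical helper, not mutagen code, and the bytes are fake data.

```python
import io

ID3V1_SIZE = 128  # an ID3v1 tag occupies the final 128 bytes of the file


def has_id3v1(fileobj):
    """Return True if the file-like object ends with an ID3v1 tag."""
    fileobj.seek(0, io.SEEK_END)
    if fileobj.tell() < ID3V1_SIZE:
        return False  # too short to hold a tag
    fileobj.seek(-ID3V1_SIZE, io.SEEK_END)
    return fileobj.read(3) == b"TAG"


# Fake "MP3": some audio bytes followed by a 128-byte ID3v1 block.
tag = b"TAG" + b"Title".ljust(30, b"\x00") + b"\x00" * 95
fake_mp3 = io.BytesIO(b"\xff\xfb" * 100 + tag)

print(len(tag))                                # 128
print(has_id3v1(fake_mp3))                     # True
print(has_id3v1(io.BytesIO(b"no tag here")))   # False
```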

Tim Chase · Nov 20, 2007

It sounds like you're almost reaching for the StringIO library
(or the similar cStringIO library).
'google'

Hope this helps,

-tkc

Grant Edwards · Nov 20, 2007

By "memory" I presume you mean virtual memory? RAM with
disk-blocks as backing store? On any real OS, tempfiles are
just RAM with disk-blocks as backing store.

Sound similar? The only difference is the API used to access
the bytes. You want a file-I/O API, so you can either use the
extensively tested and highly optimized filesystem code in
the OS to make disk-backed-RAM look like a file, or you can try
to write Python code that does the same thing.

Which do you think is going to work faster/better?

[The kernel is generally better at knowing what needs to be in
RAM than you are -- let it do its job.]

IOW: just use a temp file. Life will be simple. The bytes
probably won't ever hit the platters (if they do, then that
means they would have the other way too).

p. · Nov 20, 2007

Thanks all.

Grant, are temp files automatically put into ram for all linux
distros? at any rate, i could set up ram disk. much better solution
than using python...except that i've never done a ram disk before.
more reading to do...

Diez B. Roggisch · Nov 20, 2007

You misunderstood Grant. Let the OS decide what needs to be written to
the HD and what doesn't, instead of creating a RAM disk that eats up
precious RAM.

All modern OSes, including Linux, keep a disk cache in RAM. Your
downloaded file ends up in that cache while it is written to the HD in
the background, without affecting your performance. If you then pass it
to some other process that reads from the file, it will be fed from the
cache, which is fast.

So if tempfiles make your life easier because they give you an actual
filename, use them and don't worry about it.

Diez

Grant Edwards · Nov 20, 2007

> Grant, are temp files automatically put into ram for all linux
> distros?

All files are put into ram for all linux distros that use
virtual memory. (You'll know if you're not using virtual.)

> at any rate, i could set up ram disk. much better solution
> than using python...except that i've never done a ram disk
> before. more reading to do...

You don't have to set up a ram disk. You already have one. All
your disks are ram disks. It's just that some of them have
magnetic platters as backing store so they get preserved during
a reboot. On some Linux distros, the /tmp directory is a
filesystem without permanent magnetic backing-store. On others
it does have a permanent backing store. If you do a "mount"
command, you'll probably see a "filesystem" whose type is
"tmpfs". That's a filesystem with no permanent magnetic
backing-store[1].

See http://en.wikipedia.org/wiki/TMPFS
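On Linux, what Grant suggests eyeballing with "mount" can also be checked from Python. A minimal sketch, assuming the /proc/mounts line format (device, mount point, filesystem type, options, ...). `fs_type` is a hypothetical helper; it takes the mounts text as an argument so it can be tried on the made-up sample below, and picks the longest mount point containing the path. It does not resolve symlinks or decode escaped spaces in mount points.

```python
def fs_type(path, mounts_text):
    """Return the filesystem type of the mount that holds `path`, or None."""
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fstype = fields[1], fields[2]
        # A mount point holds `path` if it equals it or is a parent directory.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if len(mount_point) > len(best_point):
                best_point, best_type = mount_point, fstype
    return best_type


sample = """\
/dev/sda1 / ext4 rw,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
proc /proc proc rw,nosuid,nodev,noexec 0 0
"""

print(fs_type("/tmp/tmpgqSj8p", sample))  # tmpfs
print(fs_type("/home/grante", sample))    # ext4
```

On a real machine you would pass it `open("/proc/mounts").read()` and `tempfile.gettempdir()` instead of the sample.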

/tmp might or might not be in a tmpfs filesystem (depends on
the distro). In any case, you probably don't need to worry
about it.

Just call tempfile.NamedTemporaryFile() and tell it you want an
unbuffered file (that way you don't have to remember to flush
the file after writing to it). It will return a file object:

f = tempfile.NamedTemporaryFile(bufsize=0)

Write the data to that file object (it's unbuffered, so there's nothing to flush):

f.write(mydata)

Pass the file's name to whatever broken library it is that
insists on a file name instead of a file-like object:

brokenLib.brokenModule(f.name)

When you're done, delete the file object:

del f

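NB: This particular approach won't work on Windows, because Windows
won't let the library open the file a second time while your
NamedTemporaryFile object still has it open. On Windows you'll have
to use tempfile.mktemp(), which can have race conditions. It returns
a name, so you'll have to create the file, write to it, and then
pass the name to the broken module.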
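tempfile.mkstemp() is the standard library's secure replacement for mktemp(): it creates and opens the file in one step and returns an OS-level file descriptor plus the path, so no other process can grab the name in between. A minimal Python 3 sketch, assuming the library only needs the path after our own handle is closed (which is what lets Windows reopen it). `broken_module` and `call_with_temp_file` are hypothetical names.

```python
import os
import tempfile


def broken_module(filename):
    """Hypothetical library call that only accepts a file name."""
    with open(filename, "rb") as f:
        return f.read()


def call_with_temp_file(data, func):
    """Write `data` to a fresh temp file, pass its name to `func`, then clean up."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:  # takes ownership of fd and closes it
            f.write(data)
        # Our handle is closed now, so even Windows lets `func` reopen the file.
        return func(path)
    finally:
        os.remove(path)


print(call_with_temp_file(b"hello world", broken_module))  # b'hello world'
```

Unlike NamedTemporaryFile, nothing is deleted automatically, hence the explicit os.remove() in the finally clause.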

[1] Tmpfs pages use the swap partition for temporary backing
store the same as for all other memory pages. If you're
using tmpfs for big stuff, make sure your swap partition is
large enough to hold whatever you're doing in tmpfs plus
whatever normal swapping capacity you need.

------------------------------demo.py------------------------------
def brokenModule(filename):
    f = file(filename)
    d = f.read()
    print d
    f.close()

import tempfile,os

f = tempfile.NamedTemporaryFile(bufsize=0)
n = f.name
print f,":",n
os.system("ls -l %s\n" % n)

f.write("hello world")
brokenModule(n)

del f
os.system("ls -l %s\n" % n)
------------------------------demo.py------------------------------

If you run this you'll see something like this:

$ python demo.py
<open file '<fdopen>', mode 'w+b' at 0xb7c37728> : /tmp/tmpgqSj8p
-rw------- 1 grante users 0 2007-11-20 17:11 /tmp/tmpgqSj8p
hello world
ls: cannot access /tmp/tmpgqSj8p: No such file or directory
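For readers on Python 3, a sketch of demo.py ported: file() is gone, print is a function, the bufsize argument is now called buffering, and data written to a binary file must be bytes. Like the original it relies on reopening the file by name while it is still open, so it is for POSIX systems, not Windows. `broken_module` is a hypothetical stand-in for brokenModule.

```python
import os
import tempfile


def broken_module(filename):
    """Stand-in for a library that insists on a file name."""
    with open(filename, "rb") as f:
        return f.read()


# buffering=0 gives an unbuffered binary file, so no flush() is needed.
f = tempfile.NamedTemporaryFile(buffering=0)
name = f.name
print(name, os.path.exists(name))  # e.g. /tmp/tmpab12cd34 True

f.write(b"hello world")
print(broken_module(name))         # b'hello world'

f.close()                          # closing deletes it (delete=True is the default)
print(os.path.exists(name))        # False
```

Calling f.close() explicitly is used here instead of `del f`, so the deletion happens at a predictable point rather than whenever the object is garbage-collected.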

p. · Nov 20, 2007

excellent. didn't know tempfile was a module. thanks so much.

mgierdal · Dec 12, 2007

> Try with StringIO/cStringIO, these modules are supposed to give you
> in-memory objects compatible with file object interface.

I found this solution not working.
I had a similar problem: I wanted to write some string into an
in-memory file, then transfer it via ftp to some file and forget the
in-memory content.

from ftplib import FTP
ftp = FTP('ftp.server.org')
ftp.login('ID','pswd')
import StringIO
filename = 'some_file.txt'
command = 'STOR ' + filename
outfile = StringIO.StringIO()
outfile.write(some_string + '\n')
ftp.storlines(command, outfile)
ftp.quit()
outfile.close()

The file shows up on the FTP server, but with ZERO length. I think the
problem is that ftp.storlines attempts to use outfile's read()
function, which is not present in StringIO objects (they use
getvalue() instead). Quite an annoying inconsistency.
Any thoughts, please?

Peter Otten · Dec 12, 2007

> I found this solution not working.
> I had similar problem: I wanted to write some string into the
> in-memory file, then transfer it via ftp to some file and forget
> in-memory content.
>
> from ftplib import FTP
> ftp = FTP('ftp.server.org')
> ftp.login('ID','pswd')
> import StringIO
> filename = 'some_file.txt'
> command = 'STOR ' + filename
> outfile = StringIO.StringIO()
> outfile.write(some_string + '\n')

Try it again with

outfile.seek(0)

before the storlines() call.

> ftp.storlines(command, outfile)
> ftp.quit()
> outfile.close()
>
> The file shows up on the FTP server, but with ZERO length.

I believe you would see the same problem if outfile were a real file.

> I think the
> problem is that ftp.storlines attempts to use outfile's read()
> function, which is not present in StringIO objects (they use
> getvalue() instead). Quite an annoying inconsistency.

True

A quick look into the source code can put an end to the rest of that
speculation:

def storlines(self, cmd, fp):
    '''Store a file in line mode.'''
    self.voidcmd('TYPE A')
    conn = self.transfercmd(cmd)
    while 1:
        buf = fp.readline()
        if not buf: break
        if buf[-2:] != CRLF:
            if buf[-1] in CRLF: buf = buf[:-1]
            buf = buf + CRLF
        conn.sendall(buf)
    conn.close()
    return self.voidresp()

storlines() will happily accept any object fp that has a readline()
method.
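Peter's point can be checked without an FTP server. The sketch below copies the readline() loop from storlines() into a hypothetical `lines_to_send` function that collects what would be sent instead of calling conn.sendall(). It is written for Python 3, where ftplib works with bytes, so CRLF is b"\r\n" and io.BytesIO stands in for StringIO.

```python
import io

CRLF = b"\r\n"


def lines_to_send(fp):
    """Mimic the storlines() loop, returning the chunks instead of sending them."""
    sent = []
    while True:
        buf = fp.readline()
        if not buf:
            break  # EOF: storlines() stops here
        if buf[-2:] != CRLF:
            # Normalise a bare \n or \r ending (or none at all) to CRLF.
            if buf[-1:] in CRLF:
                buf = buf[:-1]
            buf = buf + CRLF
        sent.append(buf)
    return sent


outfile = io.BytesIO()
outfile.write(b"some_string\n")
print(lines_to_send(outfile))  # [] because write() left the position at the end

outfile.seek(0)                # rewind first
print(lines_to_send(outfile))  # [b'some_string\r\n']
```

The empty first result is exactly the zero-length upload: readline() hits EOF immediately, with no read() involved.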

> Any thoughts, please?

Keep the library docs under your pillow and the library source on your
screen.

Peter

Neil Cerutti · Dec 12, 2007

> I found this solution not working.
> outfile = StringIO.StringIO()
> outfile.write(some_string + '\n')

You need to rewind the file with outfile.seek(0) before
proceeding, or storlines will encounter an immediate EOF when it
attempts to read data.

> ftp.storlines(command, outfile)
>
> outfile.close()
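Putting the rewind advice together, a Python 3 sketch of the upload (ftplib.FTP.storlines() reads bytes there, so io.BytesIO replaces StringIO). Building the BytesIO from the data up front leaves its position at 0, so no seek() is needed; if you write() into it instead, rewind first. `upload_text` is a hypothetical helper; the host, login and filename in the commented usage are the placeholders from mgierdal's post.

```python
import io


def upload_text(ftp, filename, text, encoding="utf-8"):
    """Upload `text` as `filename` in ASCII mode without touching the disk."""
    data = io.BytesIO(text.encode(encoding))  # position starts at 0
    return ftp.storlines("STOR " + filename, data)


# Usage (placeholders, not a real server):
# from ftplib import FTP
# ftp = FTP("ftp.server.org")
# ftp.login("ID", "pswd")
# upload_text(ftp, "some_file.txt", some_string + "\n")
# ftp.quit()
```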
