mimicking a file in memory

p.

I am using the mutagen module to extract id3 information from mp3
files. In order to do this, you give mutagen a filename, which it
converts into a file object using the python built-in "file" function.

Unfortunately, my mp3 files don't live locally. They are on a number
of remote servers which I access using urllib2.

Here is my dilemma:
I don't want to copy the files into a local directory for mutagen's
sake, only to have to remove them afterward. Instead, I'd like to load
the files into memory and still be able to hand the built-in "file"
function a filename to access the file in memory.

Any ideas on how to do this?
 
Larry Bates

Looks like you would need to "hack" the source and replace lines like:

def load(self, filename):
    self.filename = filename
    fileobj = file(filename, "rb")

with something like:

def load(self, filename):
    if hasattr(filename, 'read'):
        # Already a file-like object: use it as-is.
        fileobj = filename
        if hasattr(filename, 'name'):
            self.filename = filename.name
        else:
            self.filename = 'unknown'
    else:
        self.filename = filename
        fileobj = file(filename, "rb")
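
The same idea, accepting either a path or an already-open file-like object, can be sketched on its own. This is a minimal Python 3 illustration, not mutagen's code: `open_source` is a hypothetical helper name, and `io.BytesIO` stands in for the in-memory file.

```python
import io


def open_source(source):
    """Return (fileobj, display_name) for a path or a file-like object.

    Hypothetical helper for illustration; it is not part of mutagen.
    """
    if hasattr(source, "read"):
        # Already file-like: use it directly and borrow its name if it has one.
        return source, getattr(source, "name", "unknown")
    # Otherwise treat it as a path on disk.
    return open(source, "rb"), source


fileobj, name = open_source(io.BytesIO(b"ID3 fake tag bytes"))
header = fileobj.read(3)
```

Here the in-memory object carries no `name` attribute, so the helper falls back to 'unknown', just as in the patch above.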

-Larry
 
p.

I thought about this approach originally, but here's the catch:
read isn't the only method I need. mutagen also calls the seek
method on the file object. urllib2 returns a "file-like object" that
has no seek method, which means I'd have to extend urllib2 to add one.
The problem is, I don't know how you could implement a seek method
on top of urllib2.
 
Jarek Zgoda

Try StringIO/cStringIO; these modules are supposed to give you
in-memory objects compatible with the file object interface.
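
A minimal sketch of that suggestion, written for Python 3, where `io.BytesIO` plays the role of StringIO for binary data. `NonSeekableStream` is a made-up stand-in for the object urllib2 returns, so the example runs without a network connection; with a real response you would pass the result of its `read()` to `BytesIO` in the same way.

```python
import io


class NonSeekableStream:
    """Hypothetical stand-in for a network response: read() only, no seek()."""

    def __init__(self, payload):
        self._buffer = io.BytesIO(payload)

    def read(self, size=-1):
        return self._buffer.read(size)


response = NonSeekableStream(b"ID3" + b"\x00" * 7 + b"audio frames")

# Pull the whole body into memory; the BytesIO copy supports seek() and tell().
in_memory = io.BytesIO(response.read())

in_memory.seek(10)   # jump past the first 10 bytes
body = in_memory.read()
in_memory.seek(0)    # rewind to the start
magic = in_memory.read(3)
```

Note that this only helps when the library accepts a file-like object. If it insists on a filename, as in the original question, an in-memory object is not enough on its own.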
 
Tim Chase

It sounds like you're almost reaching for the StringIO library
(or the similar cStringIO library).
'google'

Hope this helps,

-tkc
 
Grant Edwards

By "memory" I presume you mean virtual memory? RAM with
disk-blocks as backing store? On any real OS, tempfiles are
just RAM with disk-blocks as backing store.

Sound similar? The only difference is the API used to access
the bytes. You want a file-I/O API, so you can either use the
extensively tested and highly optimized filesystem code in
the OS to make disk-backed-RAM look like a file, or you can try
to write Python code that does the same thing.

Which do you think is going to work faster/better?

[The kernel is generally better at knowing what needs to be in
RAM than you are -- let it do its job.]

IOW: just use a temp file. Life will be simple. The bytes
probably won't ever hit the platters (if they do, then that
means they would have the other way too).
 
p.

Thanks all.

Grant, are temp files automatically put into RAM on all Linux
distros? At any rate, I could set up a RAM disk. Much better solution
than using Python... except that I've never set up a RAM disk before.
More reading to do...
 
Diez B. Roggisch

You misunderstood Grant. Let the OS decide what needs to be dumped to
the HD or not - instead of creating a RAM-disk eating up precious ram.

All modern OSes, including Linux, keep a hard-disk cache in RAM. Your
downloaded file ends up in that cache while it is being stored to the HD
in the background, without affecting your performance. If you then pass
it to some other process that reads from the file, it will be fed from
the HD cache, which is fast.

So - if it makes your life easier to use tempfiles because then you have
an actual filename, use them and don't worry about it.

Diez
 
Grant Edwards

p. said:
Grant, are temp files automatically put into ram for all linux
distros?

All files are put into ram for all linux distros that use
virtual memory. (You'll know if you're not using virtual.)

p. said:
at any rate, i could set up ram disk. much better solution
than using python...except that i've never done a ram disk
before. more reading to do...

You don't have to set up a ram disk. You already have one. All
your disks are ram disks. It's just that some of them have
magnetic platters as backing store so they get preserved during
a reboot. On some Linux distros, the /tmp directory is a
filesystem without permanent magnetic backing-store. On others
it does have a permanent backing store. If you do a "mount"
command, you'll probably see a "filesystem" whose type is
"tmpfs". That's a filesystem with no permanent magnetic
backing-store[1].

See http://en.wikipedia.org/wiki/TMPFS

/tmp might or might not be in a tmpfs filesystem (depends on
the distro). In any case, you probably don't need to worry
about it.
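
If you are curious anyway, the mount table can be checked from Python. The sketch below is Linux-specific and assumes the usual /proc/mounts layout (device, mount point, filesystem type, then options). `filesystem_type` is a hypothetical helper, and the sample table is made up so the logic can be run anywhere.

```python
def filesystem_type(path, mounts_text):
    """Return the type of the filesystem holding `path`.

    `mounts_text` is text in /proc/mounts format. The longest mount point
    that is a prefix of `path` wins. Hypothetical helper for illustration.
    """
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


# Made-up mount table for demonstration purposes.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
tmpfs /tmp tmpfs rw,nosuid,nodev 0 0
"""

tmp_type = filesystem_type("/tmp/tmpgqSj8p", SAMPLE_MOUNTS)
home_type = filesystem_type("/home/grante", SAMPLE_MOUNTS)
```

On a real Linux system you would pass the contents of /proc/mounts and the directory returned by tempfile.gettempdir() instead of the sample values.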

Just call tempfile.NamedTemporaryFile() and tell it you want an
unbuffered file (that way you don't have to remember to flush
the file after writing to it). It will return a file object:

f = tempfile.NamedTemporaryFile(bufsize=0)

Write the data to that file object (it is unbuffered, so no explicit
flush is needed):

f.write(mydata)

Pass the file's name to whatever broken library it is that
insists on a file name instead of a file-like object:

brokenLib.brokenModule(f.name)

When you're done, delete the file object:

del f

NB: This particular approach won't work on Windows. On Windows
you'll have to use tempfile.mktemp(), which can have race
conditions. It returns a name, so you'll have to create
the file, write to it, and then pass the name to the broken
module.
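
For the Windows case, one alternative to tempfile.mktemp() is NamedTemporaryFile(delete=False), which creates the file safely but leaves deletion to you; the `delete` parameter exists in Python 2.6 and later. The sketch below is written for Python 3, and `read_by_name` is a hypothetical stand-in for the library that insists on a filename.

```python
import os
import tempfile


def read_by_name(filename):
    """Hypothetical stand-in for a library that only accepts a filename."""
    with open(filename, "rb") as handle:
        return handle.read()


def call_with_tempfile(data, func):
    """Write `data` to a temp file, call func(path), then remove the file."""
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        tmp.write(data)
        tmp.close()  # closed first, so the file can be reopened by name
        return func(tmp.name)
    finally:
        tmp.close()  # harmless if already closed
        os.unlink(tmp.name)


result = call_with_tempfile(b"hello world", read_by_name)
```

Closing the file before handing over its name is what makes this usable on Windows, at the cost of having to remove the file yourself.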


[1] Tmpfs pages use the swap partition for temporary backing
store the same as for all other memory pages. If you're
using tmpfs for big stuff, make sure your swap partition is
large enough to hold whatever you're doing in tmpfs plus
whatever normal swapping capacity you need.


------------------------------demo.py------------------------------
def brokenModule(filename):
    f = file(filename)
    d = f.read()
    print d
    f.close()


import tempfile,os

f = tempfile.NamedTemporaryFile(bufsize=0)
n = f.name
print f,":",n
os.system("ls -l %s\n" % n)

f.write("hello world")
brokenModule(n)

del f
os.system("ls -l %s\n" % n)
------------------------------demo.py------------------------------

If you run this you'll see something like this:

$ python demo.py
<open file '<fdopen>', mode 'w+b' at 0xb7c37728> : /tmp/tmpgqSj8p
-rw------- 1 grante users 0 2007-11-20 17:11 /tmp/tmpgqSj8p
hello world
ls: cannot access /tmp/tmpgqSj8p: No such file or directory
 
p.

Excellent. I didn't know tempfile was a module. Thanks so much.
 
mgierdal

Jarek Zgoda said:
Try with StringIO/cStringIO, these modules are supposed to give you
in-memory objects compatible with file object interface.

This solution did not work for me.
I had a similar problem: I wanted to write a string into an in-memory
file, transfer it via FTP to some file, and then forget the in-memory
content.

from ftplib import FTP
ftp = FTP('ftp.server.org')
ftp.login('ID','pswd')
import StringIO
filename = 'some_file.txt'
command = 'STOR ' + filename
outfile = StringIO.StringIO()
outfile.write(some_string + '\n')
ftp.storlines(command, outfile)
ftp.quit()
outfile.close()

The file shows up on the FTP server, but with ZERO length. I think the
problem is that ftp.storlines attempts to use outfile's read()
function, which is not present in StringIO objects (they use
getvalue() instead). Quite an annoying inconsistency.
Any thoughts, please?
 
Peter Otten

mgierdal said:
I found this solution not working.
I had similar problem: I wanted to write some string into the
in-memory file, then transfer it via ftp to some file and forget
in-memory content.

from ftplib import FTP
ftp = FTP('ftp.server.org')
ftp.login('ID','pswd')
import StringIO
filename = 'some_file.txt'
command = 'STOR ' + filename
outfile = StringIO.StringIO()
outfile.write(some_string + '\n')

Try it again with

outfile.seek(0)

before the storlines() call.

mgierdal said:
ftp.storlines(command, outfile)
ftp.quit()
outfile.close()

The file shows up on the FTP server, but with ZERO length.

I believe you would see the same problem if outfile were a real file.

mgierdal said:
I think the
problem is that ftp.storlines attempts to use outfile's read()
function, which is not present in StringIO objects (they use
getvalue() instead). Quite an annoying inconsistency.
True

A quick look into the source code can put an end to the rest of that
speculation:

def storlines(self, cmd, fp):
    '''Store a file in line mode.'''
    self.voidcmd('TYPE A')
    conn = self.transfercmd(cmd)
    while 1:
        buf = fp.readline()
        if not buf: break
        if buf[-2:] != CRLF:
            if buf[-1] in CRLF: buf = buf[:-1]
            buf = buf + CRLF
        conn.sendall(buf)
    conn.close()
    return self.voidresp()

storlines() will happily accept every object fp featuring a readline()
method.
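
To see how little the loop above asks of fp, here is a small Python 3 sketch that applies the same line-by-line logic to any object with a readline() method and collects the bytes instead of sending them. `collect_lines` and `OneLiner` are hypothetical names used only for this illustration; this is not ftplib's code.

```python
import io

CRLF = b"\r\n"


def collect_lines(fp):
    """Read fp line by line, normalising each line ending to CRLF."""
    out = []
    while True:
        buf = fp.readline()
        if not buf:
            break  # empty result means end of file
        if buf[-2:] != CRLF:
            if buf[-1:] in (b"\r", b"\n"):
                buf = buf[:-1]
            buf = buf + CRLF
        out.append(buf)
    return b"".join(out)


class OneLiner:
    """Minimal object with only readline(): yields one line, then EOF."""

    def __init__(self):
        self._lines = [b"hello\n"]

    def readline(self):
        return self._lines.pop(0) if self._lines else b""


from_memory = collect_lines(io.BytesIO(b"a\nb"))
from_custom = collect_lines(OneLiner())
```

Neither object needs read() or getvalue(); readline() alone is enough for this kind of loop.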

mgierdal said:
Any thoughts, please?

Keep the library docs under your pillow and the library source on your
screen :)

Peter
 
Neil Cerutti

mgierdal said:
I found this solution not working.
outfile = StringIO.StringIO()
outfile.write(some_string + '\n')

You need to rewind the file with outfile.seek(0) before
proceeding, or storlines will encounter an immediate EOF when it
attempts to read data.

mgierdal said:
ftp.storlines(command, outfile)
outfile.close()
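
The effect is easy to reproduce without an FTP server. In this Python 3 sketch, `io.StringIO` stands in for the StringIO module used in the thread; the first readline() hits end-of-file because the position is still at the end of what was just written.

```python
import io

outfile = io.StringIO()
outfile.write("some text\n")

# The position is now at the end of the buffer, so a reader sees EOF at once.
before_rewind = outfile.readline()

outfile.seek(0)  # rewind to the start
after_rewind = outfile.readline()

outfile.close()
```

The same thing happens with a real file opened for both writing and reading, which is why the rewind is needed in either case.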
 
