Christoph Krammer
Hello everybody,
I have to convert a huge mbox file (~1.5G) to MySQL.
I tried with the following simple code:
    import mailbox
    import md5
    import MySQLdb

    # fileName and dbcurs are set up earlier in the script
    for m in mailbox.mbox(fileName):
        msg = m.as_string(True)
        hash = md5.new(msg).hexdigest()
        try:
            dbcurs.execute("INSERT INTO archive (hash, msg) VALUES (%s, %s)",
                           (hash, msg))
        except MySQLdb.OperationalError, err:
            print "%s Error (%d): %s" % (fileName, err[0], err[1])
        else:
            print "%s: Message successfully added to database" % hash
The problem seems to be the size of the file. Every time I run the
script, the following error occurs after about 20000 messages:
    Traceback (most recent call last):
      File "email_to_mysql_mbox.py", line 21, in <module>
        for m in mailbox.mbox(fileName):
      File "/usr/lib/python2.5/mailbox.py", line 98, in itervalues
        value = self[key]
      File "/usr/lib/python2.5/mailbox.py", line 70, in __getitem__
        return self.get_message(key)
      File "/usr/lib/python2.5/mailbox.py", line 633, in get_message
        string = self._file.read(stop - self._file.tell())
    MemoryError
My system has 512M RAM and 768M swap, which seems to run out at an
early stage of this. Is there a way to clean up memory for messages
already processed?
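One possible workaround, offered as a rough sketch rather than a tested fix for this exact file: skip mailbox.mbox and split the file by hand on the "From " separator lines, so that only one message is held in memory at a time. The helper name iter_mbox_messages and the sample data are made up for illustration, the code is written for Python 3 (hashlib instead of the old md5 module), and it assumes that body lines starting with "From " are escaped as ">From " in the mbox.

```python
import hashlib
import io


def iter_mbox_messages(fileobj):
    """Yield each message of an mbox stream as bytes, one at a time.

    Hypothetical helper: assumes every line starting with b"From "
    is a message separator (body lines are expected to be escaped).
    """
    lines = []
    for line in fileobj:
        # A new separator line closes the message collected so far.
        if line.startswith(b"From ") and lines:
            yield b"".join(lines)
            lines = []
        lines.append(line)
    # Emit the last message, which has no separator after it.
    if lines:
        yield b"".join(lines)


# Tiny in-memory stand-in for the real file, e.g. open(fileName, "rb").
sample = io.BytesIO(
    b"From a@example.com Thu Jan  1 00:00:00 1970\n"
    b"Subject: one\n\nfirst body\n"
    b"From b@example.com Thu Jan  1 00:00:01 1970\n"
    b"Subject: two\n\nsecond body\n"
)

# list() is only used here to inspect the result; on a large file,
# iterate directly and insert each message before reading the next.
messages = list(iter_mbox_messages(sample))
digests = [hashlib.md5(m).hexdigest() for m in messages]
```

On the real file, the loop body would run the INSERT for each yielded message instead of collecting them in a list, so memory use stays close to the size of the largest single message.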
Thanks and regards,
Christoph