Using try-catch to handle multiple possible file types?

Victor Hooi

Hi,

I have a script that needs to handle input files of different types (uncompressed, gzipped etc.).

My question is regarding how I should handle the different cases.

My first thought was to use a try-catch block and attempt to open it using the most common filetype, then if that failed, try the next most common type etc. before finally erroring out.

So basically, using exception handling for flow-control.

However, is that considered bad practice, or un-Pythonic?

What alternative constructs could I use, and what are their pros and cons?

(I was thinking I could also use python-magic which wraps libmagic, or I can just rely on file extensions).

Other thoughts?

Cheers,
Victor
 
Amit Saha

How about starting with a dictionary like this:

file_opener = {'.gz': gz_opener,
               '.txt': text_opener,
               '.zip': zip_opener}
# and so on.

where the *_opener values are functions that do the job of actually
opening the files.
The above dictionary is keyed on file extensions, but perhaps you
would be better off using MIME types instead.

Assuming you go ahead with MIME types, how about using python-magic
to detect the type and then looking it up in your dictionary above to
see whether there is a corresponding opener? If you get a KeyError,
you can raise an exception saying that you cannot handle this file.
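
A minimal sketch of that dispatch idea, keyed on file extension for simplicity. The names FILE_OPENERS, open_plain, open_gzip and open_by_extension are hypothetical, chosen for illustration; only gzip.open, open and os.path.splitext come from the standard library.

```python
import gzip
import os


def open_plain(path):
    # Uncompressed files: binary mode, matching the examples in this thread.
    return open(path, 'rb')


def open_gzip(path):
    return gzip.open(path, 'rb')


# Hypothetical registry keyed on file extension; add more entries as needed.
FILE_OPENERS = {
    '.gz': open_gzip,
    '.txt': open_plain,
}


def open_by_extension(path):
    ext = os.path.splitext(path)[1].lower()
    try:
        opener = FILE_OPENERS[ext]
    except KeyError:
        # Turn the lookup failure into a clearer error for the caller.
        raise ValueError('cannot handle file type: %r' % ext)
    return opener(path)
```

Every opener returns a file object, so the caller can write "with open_by_extension(path) as f:" regardless of type. Swapping the extension key for a detected MIME type would only change how the dictionary key is computed.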


How does that sound?

Best,
Amit.
 
Chris Angelico

My first thought was to use a try-catch block and attempt to open it using the most common filetype, then if that failed, try the next most common type etc. before finally erroring out.

So basically, using exception handling for flow-control.

However, is that considered bad practice, or un-Pythonic?

It's fairly common to work that way. But you may want to be careful
what order you try them in; some codecs might be technically capable
of reading other formats than you wanted, so start with the most
specific.

Alternatively, looking at a file's magic number (either with
python-magic/libmagic or by manually reading in a few bytes) might be
more efficient. Either way can work, take your choice!
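
As a rough sketch of the "manually reading in a few bytes" option: a gzip stream starts with the two bytes 0x1f 0x8b, so a check can be done before choosing how to open the file. The helper names looks_like_gzip and open_auto are made up for this example.

```python
import gzip

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of a gzip stream (RFC 1952)


def looks_like_gzip(path):
    # Read only the first two bytes; no decompression is attempted.
    with open(path, 'rb') as f:
        return f.read(2) == GZIP_MAGIC


def open_auto(path):
    # Pick the opener from the file's content rather than its name.
    opener = gzip.open if looks_like_gzip(path) else open
    return opener(path, 'rb')
```

This only distinguishes gzip from everything else; other formats would need their own signatures, which is where libmagic starts to pay off.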

ChrisA
 
Mark Lawrence

So basically, using exception handling for flow-control.

However, is that considered bad practice, or un-Pythonic?

If it works for you use it, practicality beats purity :)
 
Victor Hooi

Hi,

Is either approach (try-excepts, or using libmagic) considered more idiomatic? What would you guys prefer yourselves?

Also, is it possible to use either approach with a context manager ("with"), without duplicating lots of code?

For example:

try:
    with gzip.open('blah.txt', 'rb') as f:
        for line in f:
            print(line)
except IOError as e:
    with open('blah.txt', 'rb') as f:
        for line in f:
            print(line)

I'm not sure how to do this without duplicating the processing lines (everything inside the with)?

And using:

try:
    f = gzip.open('blah.txt', 'rb')
except IOError as e:
    f = open('blah.txt', 'rb')
finally:
    for line in f:
        print(line)

won't work, since the exception won't get thrown until you actually try to read from the file. Plus, I'm under the impression that I should be using context managers where I can.

Also, on another note, python-magic will return a string as a result, e.g.:

gzip compressed data, was "blah.txt", from Unix, last modified: Wed Nov 20 10:48:35 2013

I suppose it's enough to just do:

if "gzip compressed data" in results:

or is there a better way?

Cheers,
Victor
 
Steven D'Aprano

Hi,

Is either approach (try-excepts, or using libmagic) considered more
idiomatic? What would you guys prefer yourselves?

Specifically in the case of file types, I consider it better to use
libmagic. But as a general technique, using try...except is a reasonable
approach in many situations.

Also, is it possible to use either approach with a context manager
("with"), without duplicating lots of code?

For example:

try:
    with gzip.open('blah.txt', 'rb') as f:
        for line in f:
            print(line)
except IOError as e:
    with open('blah.txt', 'rb') as f:
        for line in f:
            print(line)

I'm not sure how to do this without duplicating the
processing lines (everything inside the with)?

Write a helper function:

def process(opener):
    with opener('blah.txt', 'rb') as f:
        for line in f:
            print(line)


try:
    process(gzip.open)
except IOError:
    process(open)


If you have many different things to try:


for opener in [gzip.open, open, ...]:
    try:
        process(opener)
    except IOError:
        continue
    else:
        break
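
One possible variation on the loop above, as a sketch: wrap it in a function that returns the lines from the first opener that succeeds and raises if none does. The names read_lines and read_with_first_working_opener are hypothetical. It catches OSError, which IOError is an alias for in Python 3, and it assumes the most specific opener is listed first, since plain open will happily read a gzipped file as raw bytes.

```python
import gzip


def read_lines(opener, path):
    # Collect the lines first so a failure midway leaves nothing half-processed.
    with opener(path, 'rb') as f:
        return list(f)


def read_with_first_working_opener(path, openers=(gzip.open, open)):
    for opener in openers:
        try:
            return read_lines(opener, path)
        except OSError:
            # This opener could not read the file; try the next one.
            continue
    raise ValueError('no opener could read %r' % path)
```

Note that a missing file also raises OSError from every opener, so in this sketch it surfaces as the final ValueError rather than as FileNotFoundError.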



[...]
Also, on another note, python-magic will return a string as a result,
e.g.:

gzip compressed data, was "blah.txt", from Unix, last modified: Wed Nov
20 10:48:35 2013

I suppose it's enough to just do:

if "gzip compressed data" in results:

or is there a better way?

*shrug*

Read the docs of python-magic. Do they offer a programmable API? If not,
that kinda sucks.
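
For what it's worth, here is a hedged sketch of a less string-fragile check. It assumes the third-party python-magic package, whose magic.from_file accepts mime=True and returns a MIME type instead of the long description; other packages that also install a module named magic have a different interface. The exact MIME string for gzip depends on the libmagic version, so both common spellings are accepted. The names GZIP_MIME_TYPES, is_gzip_mime and detect_mime are made up for this example.

```python
# MIME strings libmagic is known to report for gzip, depending on its version.
GZIP_MIME_TYPES = {'application/gzip', 'application/x-gzip'}


def is_gzip_mime(mime_type):
    return mime_type in GZIP_MIME_TYPES


def detect_mime(path):
    # Imported lazily so the rest of the sketch works without python-magic.
    import magic  # third-party: python-magic (assumed to be installed)
    return magic.from_file(path, mime=True)
```

With that in place the test becomes "if is_gzip_mime(detect_mime('blah.txt')):" rather than a substring match on free-form text.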
 
Neil Cerutti

Steven D'Aprano said:
Write a helper function:

def process(opener):
    with opener('blah.txt', 'rb') as f:
        for line in f:
            print(line)

As another option, you can enter the context manager after you decide.

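try:
    # Caveat: gzip.open() raises here only if the file cannot be opened at
    # all; a file that is not gzipped is not detected until the first read.
    f = gzip.open('blah.txt', 'rb')
except IOError:
    f = open('blah.txt', 'rb')
with f:
    # processing
    for line in f:
        print(line)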

contextlib.ExitStack was designed to handle cases where entering
context is optional, and so also works for this use case.

with contextlib.ExitStack() as stack:
    try:
        f = gzip.open('blah.txt', 'rb')
    except IOError:
        f = open('blah.txt', 'rb')
    stack.enter_context(f)
    for line in f:
        print(line)
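
Because gzip.open does not validate the file until data is read, the "decide first, then enter the context" pattern needs something to force that check. A sketch under that assumption, for Python 3: the helper name open_gzip_or_plain is hypothetical, and it uses GzipFile.peek to make the gzip header get parsed before the file object is handed back.

```python
import gzip


def open_gzip_or_plain(path):
    f = gzip.open(path, 'rb')
    try:
        # Reading ahead forces the gzip header to be parsed, so a file that
        # is not gzipped raises OSError here instead of inside the loop.
        f.peek(1)
    except OSError:
        f.close()  # do not leak the handle from the failed attempt
        return open(path, 'rb')
    return f
```

The processing code is then written once:

```python
with open_gzip_or_plain('blah.txt') as f:
    for line in f:
        print(line)
```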

--
Neil Cerutti

