Refactoring; arbitrary expression in lists

Discussion in 'Python' started by Frans Englich, Jan 12, 2005.

  1. As continuation to a previous thread, "PyChecker messages", I have a question
    regarding code refactoring which the following snippet leads to:

    > > runner.py:200: Function (detectMimeType) has too many returns (11)
    > >
    > > The function is simply a long "else-if" clause, branching out to
    > > different return statements. What's wrong? It's simply a "probably ugly
    > > code" advice?

    >
    > That is also advice. Generally you use a dict of functions, or some other
    > structure to lookup what you want to do.


    More specifically, my function looks like this:

    #--------------------------------------------------------------
    def detectMimeType( filename ):

    extension = filename[-3:]
    basename = os.path.basename(filename)

    if extension == "php":
    return "application/x-php"

    elif extension == "cpp" or extension.endswith("cc"):
    return "text/x-c++-src"
    # etcetera
    elif extension == "xsl":
    return "text/xsl"

    elif basename.find( "Makefile" ) != -1:
    return "text/x-makefile"
    else:
    raise NoMimeError
    #--------------------------------------------------------------
    (don't bother if the MIME detection looks like stone age, it's temporary until
    PyXDG gets support for the XDG mime type spec..)

    I'm now wondering if it's possible to write this in a more compact way, such
    that the if-clause isn't necessary? Of course, the current code works, but
    perhaps it could be prettier.

    I'm thinking along the lines of nested lists, but what is the obstacle for me
    is that both the test and return statement are simple expressions; not
    functions or a particular data type. Any ideas?


    Cheers,

    Frans
     
    Frans Englich, Jan 12, 2005
    #1
    1. Advertising

  2. Frans Englich

    Paul McGuire Guest

    "Frans Englich" <> wrote in message
    news:...
    >
    > As continuation to a previous thread, "PyChecker messages", I have a

    question
    > regarding code refactoring which the following snippet leads to:
    >
    > > > runner.py:200: Function (detectMimeType) has too many returns (11)
    > > >
    > > > The function is simply a long "else-if" clause, branching out to
    > > > different return statements. What's wrong? It's simply a "probably

    ugly
    > > > code" advice?

    > >
    > > That is also advice. Generally you use a dict of functions, or some

    other
    > > structure to lookup what you want to do.

    >
    > More specifically, my function looks like this:
    >
    > #--------------------------------------------------------------
    > def detectMimeType( filename ):
    >
    > extension = filename[-3:]
    > basename = os.path.basename(filename)
    >
    > if extension == "php":
    > return "application/x-php"
    >
    > elif extension == "cpp" or extension.endswith("cc"):
    > return "text/x-c++-src"
    > # etcetera

    <snip>

    Since the majority of your tests will be fairly direct 'extension "XYZ"
    means mimetype "aaa/bbb"', this really sounds like a dictionary type
    solution is called for. Still, you might have some desire to do some
    order-dependent testing. Here are two ideas - the first iterates over a
    list of expressions and resulting types, the other uses a dictionary lookup.

    -- Paul


    import os

    extToMimeMap = [
    ('"php"', "application/x-php"),
    ('"cpp" or extension.endswith("cc")', "text/x-c++-src"),
    ('"xsl"', "text/xsl"),
    ]

    def detectMimeType1( filename ):

    extension = filename[-3:]
    basename = os.path.basename(filename)

    for exp,mimetype in extToMimeMap:
    if eval("extension=="+exp): return mimetype

    # do other non-extension-related tests here
    if basename.find( "Makefile" ) != -1:
    return "text/x-makefile"
    else:
    raise NoMimeError


    extToMimeDict = {
    "php": "application/x-php",
    "cpp": "text/x-c++-src",
    "xsl": "text/xsl",
    }

    def detectMimeType2( filename ):

    extension = filename[-3:]
    basename = os.path.basename(filename)

    # check for straight extension matches
    try:
    return extToMimeDict[extension]
    except KeyError:
    pass

    # do more complex extension and other non-extension-related tests here
    if extension.endswith("cc"):
    return extToMimeDict["cpp"]

    if basename.find( "Makefile" ) != -1:
    return "text/x-makefile"

    raise NoMimeError

    for detectMimeType in (detectMimeType1, detectMimeType2):
    for s in ("a.php","z.acc","Makefile","blork.xsl"):
    print s,"->",detectMimeType(s)
     
    Paul McGuire, Jan 12, 2005
    #2
    1. Advertising

  3. Frans Englich

    Guest

    I can not break the original code in 2.4, if I try this:
    import os, sys
    class NoMimeError(Exception):
    pass

    def detectMimeType( filename ):

    extension = filename[-3:]
    basename = os.path.basename(filename)
    if extension == "php":
    return "application/x-php"
    elif extension == "cpp" or extension.endswith("cc"):
    return "text/x-c++-src"
    elif extension == "1":
    return 'BlahBlah'
    elif extension == "2":
    return 'BlahBlah'
    elif extension == "3":
    return 'BlahBlah'
    elif extension == "4":
    return 'BlahBlah'
    elif extension == "5":
    return 'BlahBlah'
    elif extension == "6":
    return 'BlahBlah'
    elif extension == "7":
    return 'BlahBlah'
    elif extension == "8":
    return 'BlahBlah'
    elif extension == "9":
    return 'BlahBlah'
    elif extension == "10":
    return 'BlahBlah'
    elif extension == "11":
    return 'BlahBlah'
    elif extension == "12":
    return 'BlahBlah'
    elif extension == "13":
    return 'BlahBlah'
    elif extension == "14":
    return 'BlahBlah'
    elif extension == "15":
    return 'BlahBlah'
    elif extension == "16":
    return 'BlahBlah'
    elif extension == "17":
    return 'BlahBlah'
    elif extension == "18":
    return 'BlahBlah'
    elif extension == "19":
    return 'BlahBlah'
    elif extension == "20":
    return 'BlahBlah'

    elif extension == "xsl":
    return "text/xsl"

    elif basename.find( "Makefile" ) != -1:
    return "text/x-makefile"
    else:
    raise NoMimeError
    try:
    print detectMimeType(r'c:\test.php')
    print detectMimeType('c:\test.xsl')
    print detectMimeType('c:\test.xxx')
    except Exception, e:
    print >> sys.stderr, '%s: %s' %(e.__class__.__name__, e)

    I get
    >>>

    application/x-php
    text/xsl
    NoMimeError:
    >>>


    So although the dictionary solution is much nicer nothing seems wrong
    with your code as it is - or am I missing something?
     
    , Jan 12, 2005
    #3
  4. On Wednesday 12 January 2005 18:56, wrote:
    > I can not break the original code in 2.4, if I try this:


    [...]

    >
    > So although the dictionary solution is much nicer nothing seems wrong
    > with your code as it is - or am I missing something?


    Nope, the current code works. I'm just looking at Python's cool ways of
    solving problems. (the matter about 11 returns was a coding-style report from
    PyChecker).


    Cheers,

    Frans
     
    Frans Englich, Jan 12, 2005
    #4
  5. Frans Englich

    Jeff Shannon Guest

    Paul McGuire wrote:

    > "Frans Englich" <> wrote in message
    > news:...
    >>#--------------------------------------------------------------
    >>def detectMimeType( filename ):
    >>
    >> extension = filename[-3:]


    You might consider using os.path.splitext() here, instead of always
    assuming that the last three characters are the extension. That way
    you'll be consistent even with extensions like .c, .cc, .h, .gz, etc.

    Note that os.path.splitext() does include the extension separator (the
    dot), so that you'll need to test against, e.g., ".php" and ".cpp".

    > Since the majority of your tests will be fairly direct 'extension "XYZ"
    > means mimetype "aaa/bbb"', this really sounds like a dictionary type
    > solution is called for.


    I strongly agree with this. The vast majority of your cases seem to
    be a direct mapping of extension-string to mimetype-string; using a
    dictionary (i.e. mapping ;) ) for this is ideal. For those cases
    where you can't key off of an extension string (such as makefiles),
    you can do special-case processing if the dictionary lookup fails.


    > if extension.endswith("cc"):
    > return extToMimeDict["cpp"]


    If the intent of this is to catch .cc files, it's easy to add an extra
    entry into the dict to map '.cc' to the same string as '.cpp'.

    Jeff Shannon
    Technician/Programmer
    Credit International
     
    Jeff Shannon, Jan 12, 2005
    #5
  6. On Wed, 12 Jan 2005 18:16:23 +0000, Frans Englich <> wrote:

    >
    >As continuation to a previous thread, "PyChecker messages", I have a question
    >regarding code refactoring which the following snippet leads to:
    >
    >> > runner.py:200: Function (detectMimeType) has too many returns (11)
    >> >
    >> > The function is simply a long "else-if" clause, branching out to
    >> > different return statements. What's wrong? It's simply a "probably ugly
    >> > code" advice?

    >>
    >> That is also advice. Generally you use a dict of functions, or some other
    >> structure to lookup what you want to do.

    >
    >More specifically, my function looks like this:
    >
    >#--------------------------------------------------------------
    >def detectMimeType( filename ):
    >
    > extension = filename[-3:]
    > basename = os.path.basename(filename)
    >
    > if extension == "php":
    > return "application/x-php"
    >
    > elif extension == "cpp" or extension.endswith("cc"):
    > return "text/x-c++-src"
    ># etcetera
    > elif extension == "xsl":
    > return "text/xsl"
    >
    > elif basename.find( "Makefile" ) != -1:
    > return "text/x-makefile"
    > else:
    > raise NoMimeError
    >#--------------------------------------------------------------
    >(don't bother if the MIME detection looks like stone age, it's temporary until
    >PyXDG gets support for the XDG mime type spec..)
    >
    >I'm now wondering if it's possible to write this in a more compact way, such
    >that the if-clause isn't necessary? Of course, the current code works, but
    >perhaps it could be prettier.
    >
    >I'm thinking along the lines of nested lists, but what is the obstacle for me
    >is that both the test and return statement are simple expressions; not
    >functions or a particular data type. Any ideas?
    >

    I think I would refactor along these lines: (untested)

    extensiondict = dict(
    php = 'application/x-php',
    cpp = 'text/x-c-src',
    # etcetera
    xsl = 'test/xsl'
    )

    def detectMimeType(filename):
    extension = os.path.splitext(filename)[1].replace('.', '')
    try: return extensiondict[extension]
    except KeyError:
    basename = os.path.basename(filename)
    if "Makefile" in basename: return 'text/x-makefile' # XXX case sensitivity?
    raise NoMimeError

    Regards,
    Bengt Richter
     
    Bengt Richter, Jan 13, 2005
    #6
  7. On Thu, 13 Jan 2005 01:24:29 GMT, Bengt Richter <> wrote:
    > extensiondict = dict(
    > php = 'application/x-php',
    > cpp = 'text/x-c-src',
    > # etcetera
    > xsl = 'test/xsl'
    > )
    >
    > def detectMimeType(filename):
    > extension = os.path.splitext(filename)[1].replace('.', '')
    > try: return extensiondict[extension]
    > except KeyError:
    > basename = os.path.basename(filename)
    > if "Makefile" in basename: return 'text/x-makefile' # XXX case sensitivity?
    > raise NoMimeError


    Why not use a regexp based approach.
    extensionlist = [
    (re.compile(r'.*\.php') , "application/x-crap-language"),
    (re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
    (re.compile(r'[Mm]akefile') , 'text/x-makefile'),
    ]
    for regexp, mimetype in extensionlist:
    if regexp.match(filename):
    return mimetype

    if you were really concerned about efficiency, you could use something like:
    class SimpleMatch:
    def __init__(self, pattern): self.pattern = pattern
    def match(self, subject): return subject[-len(self.pattern):] == self.pattern

    Regards,
    Stephen Thorne
     
    Stephen Thorne, Jan 13, 2005
    #7
  8. On Thu, 13 Jan 2005 12:19:06 +1000, Stephen Thorne <> wrote:

    >On Thu, 13 Jan 2005 01:24:29 GMT, Bengt Richter <> wrote:
    >> extensiondict = dict(
    >> php = 'application/x-php',
    >> cpp = 'text/x-c-src',
    >> # etcetera
    >> xsl = 'test/xsl'
    >> )
    >>
    >> def detectMimeType(filename):
    >> extension = os.path.splitext(filename)[1].replace('.', '')

    extension = os.path.splitext(filename)[1].replace('.', '').lower() # better

    >> try: return extensiondict[extension]
    >> except KeyError:
    >> basename = os.path.basename(filename)
    >> if "Makefile" in basename: return 'text/x-makefile' # XXX case sensitivity?
    >> raise NoMimeError

    >
    >Why not use a regexp based approach.

    ISTM the dict setup closely reflects the OP's if/elif tests and makes for an efficient substitute
    for the functionality when later used for lookup. The regex list is O(n) and the regexes themselves
    are at least that, so I don't see a benefit. If you are going to loop through extensionlist, you
    might as well write (untested)

    flowerew = filename.lower().endswith
    for ext, mimetype:
    if flowerew(ext): return mimetype
    else:
    if 'makefile' in filename.lower(): return 'text/x-makefile'
    raise NoMimeError

    using a lower case extension list including the dot. I think it would run faster
    than a regex, and not scare anyone unnecessarily ;-)

    The dict eliminates the loop, and is easy to understand, so IMO it's a better choice.

    >extensionlist = [
    >(re.compile(r'.*\.php') , "application/x-crap-language"),
    >(re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
    >(re.compile(r'[Mm]akefile') , 'text/x-makefile'),
    >]
    >for regexp, mimetype in extensionlist:
    > if regexp.match(filename):
    > return mimetype
    >
    >if you were really concerned about efficiency, you could use something like:
    >class SimpleMatch:
    > def __init__(self, pattern): self.pattern = pattern
    > def match(self, subject): return subject[-len(self.pattern):] == self.pattern


    I'm not clear on what you are doing here, but if you think you are going to compete
    with the timbot's dict efficiency with a casual few lines, I suspect you are PUI ;-)
    (Posting Under the Influence ;-)

    Regards,
    Bengt Richter
     
    Bengt Richter, Jan 13, 2005
    #8
  9. On Thu, 13 Jan 2005 05:18:57 GMT, Bengt Richter <> wrote:
    > On Thu, 13 Jan 2005 12:19:06 +1000, Stephen Thorne <> wrote:
    >
    > >On Thu, 13 Jan 2005 01:24:29 GMT, Bengt Richter <> wrote:
    > >> extensiondict = dict(
    > >> php = 'application/x-php',
    > >> cpp = 'text/x-c-src',
    > >> # etcetera
    > >> xsl = 'test/xsl'
    > >> )
    > >>
    > >> def detectMimeType(filename):
    > >> extension = os.path.splitext(filename)[1].replace('.', '')

    > extension = os.path.splitext(filename)[1].replace('.', '').lower() # better
    >
    > >> try: return extensiondict[extension]
    > >> except KeyError:
    > >> basename = os.path.basename(filename)
    > >> if "Makefile" in basename: return 'text/x-makefile' # XXX case sensitivity?
    > >> raise NoMimeError

    > >
    > >Why not use a regexp based approach.

    > ISTM the dict setup closely reflects the OP's if/elif tests and makes for an efficient substitute
    > for the functionality when later used for lookup. The regex list is O(n) and the regexes themselves
    > are at least that, so I don't see a benefit. If you are going to loop through extensionlist, you
    > might as well write (untested)

    <code snipped>

    *shrug*, O(n*m) actually, where n is the number of mime-types and m is
    the length of the extension.

    > >extensionlist = [
    > >(re.compile(r'.*\.php') , "application/x-crap-language"),
    > >(re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
    > >(re.compile(r'[Mm]akefile') , 'text/x-makefile'),
    > >]
    > >for regexp, mimetype in extensionlist:
    > > if regexp.match(filename):
    > > return mimetype
    > >
    > >if you were really concerned about efficiency, you could use something like:
    > >class SimpleMatch:
    > > def __init__(self, pattern): self.pattern = pattern
    > > def match(self, subject): return subject[-len(self.pattern):] == self.pattern

    >
    > I'm not clear on what you are doing here, but if you think you are going to compete
    > with the timbot's dict efficiency with a casual few lines, I suspect you are PUI ;-)
    > (Posting Under the Influence ;-)


    Sorry about that, what I was trying to say was something along the lines of:

    extensionlist = [
    (re.compile(r'.*\.php') , "application/x-crap-language"),
    (re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
    (re.compile(r'[Mm]akefile') , 'text/x-makefile'),
    ]
    can be made more efficient by doing something like this:
    extensionlist = [
    SimpleMatch(".php"), "application/x-crap-language"),
    (re.compile(r'.*\.(cpp|c)') , 'text/x-c-src'),
    (re.compile(r'[Mm]akefile') , 'text/x-makefile'),
    ]
    Where SimpleMatch uses a slice and a comparison instead of a regular
    expression engine. SimpleMatch and re.compile both return an object
    that when you call .match(s) returns a value that can be interpreted
    as a boolean.

    As for the overall efficiency concerns, I feel that talking about any
    of this is premature optimisation. The optimisation that is really
    required in this situation is the same as with any
    large-switch-statement idiom, be it C or Python. First one must do a
    frequency analysis of the inputs to the switch statement in order to
    discover the optimal order of tests!

    Regards,
    Stephen Thorne
     
    Stephen Thorne, Jan 13, 2005
    #9
  10. Stephen Thorne <> wrote:
    > Why not use a regexp based approach.


    Good idea... You could also use sre.Scanner which is supposed to be
    fast like this...

    import re, sre

    scanner = sre.Scanner([
    (r"\.php$", "application/x-php"),
    (r"\.(cc|cpp)$", "text/x-c++-src"),
    (r"\.xsl$", "xsl"),
    (r"Makefile", "text/x-makefile"),
    (r".", None),
    ])

    def detectMimeType( filename ):
    t = scanner.scan(filename)[0]
    if len(t) < 1:
    return None
    # raise NoMimeError
    return t[0]


    for f in ("index.php", "index.php3", "prog.cc", "prog.cpp", "flodge.xsl", "Makefile", "myMakefile", "potato.123"):
    print f, detectMimeType(f)

    ....

    prints

    index.php application/x-php
    index.php3 None
    prog.cc text/x-c++-src
    prog.cpp text/x-c++-src
    flodge.xsl xsl
    Makefile text/x-makefile
    myMakefile text/x-makefile
    potato.123 None

    --
    Nick Craig-Wood <> -- http://www.craig-wood.com/nick
     
    Nick Craig-Wood, Jan 13, 2005
    #10
  11. Frans Englich

    Paul McGuire Guest

    Despite the regexp alternatives, I refuse to post a pyparsing solution to
    this! :)

    -- Paul
     
    Paul McGuire, Jan 13, 2005
    #11
  12. Frans Englich

    Jeff Shannon Guest

    Stephen Thorne wrote:

    > As for the overall efficiency concerns, I feel that talking about any
    > of this is premature optimisation.


    I disagree -- using REs is adding unnecessary complication and
    dependency. Premature optimization is a matter of using a
    conceptually more-complicated method when a simpler one would do; REs
    are, in fairly simple cases such as this, clearly *not* simpler than
    dict lookups.

    > The optimisation that is really
    > required in this situation is the same as with any
    > large-switch-statement idiom, be it C or Python. First one must do a
    > frequency analysis of the inputs to the switch statement in order to
    > discover the optimal order of tests!


    But if you're using a dictionary lookup, then the frequency of inputs
    is irrelevant. Regardless of the value of the input, you're doing a
    single hash-compute and (barring hash collisions) a single
    bucket-lookup. Since dicts are unordered, the ordering of the literal
    (or of a set of statements adding to the dict) doesn't matter.

    Jeff Shannon
    Technician/Programmer
    Credit International
     
    Jeff Shannon, Jan 13, 2005
    #12
  13. > # do other non-extension-related tests here
    > if basename.find( "Makefile" ) != -1:
    > return "text/x-makefile"


    I believe this can be nicelier written as:

    if "Makefile" in basename:

    --
    mvh Björn
     
    =?ISO-8859-1?Q?BJ=F6rn_Lindqvist?=, Jan 13, 2005
    #13
  14. BJörn Lindqvist wrote:
    >> # do other non-extension-related tests here
    >> if basename.find( "Makefile" ) != -1:
    >> return "text/x-makefile"

    >
    >
    > I believe this can be nicelier written as:
    >
    > if "Makefile" in basename:
    >


    +1 for "nicelier" as VOTW (Vocabulation of the week) =)

    Steve
     
    Steven Bethard, Jan 13, 2005
    #14
  15. Frans Englich

    Peter Maas Guest

    Steven Bethard schrieb:
    > BJörn Lindqvist wrote:

    [...]
    >> I believe this can be nicelier written as:
    >>
    >> if "Makefile" in basename:
    >>

    >
    > +1 for "nicelier" as VOTW (Vocabulation of the week) =)


    Me too, because nicelier is nicer than more nicely. :)

    --
    -------------------------------------------------------------------
    Peter Maas, M+R Infosysteme, D-52070 Aachen, Tel +49-241-93878-0
    E-mail 'cGV0ZXIubWFhc0BtcGx1c3IuZGU=\n'.decode('base64')
    -------------------------------------------------------------------
     
    Peter Maas, Jan 13, 2005
    #15
  16. Frans Englich

    Steve Holden Guest

    Peter Maas wrote:

    > Steven Bethard schrieb:
    >
    >> BJörn Lindqvist wrote:

    >
    > [...]
    >
    >>> I believe this can be nicelier written as:
    >>>
    >>> if "Makefile" in basename:
    >>>

    >>
    >> +1 for "nicelier" as VOTW (Vocabulation of the week) =)

    >
    >
    > Me too, because nicelier is nicer than more nicely. :)
    >


    Is that really the niceliest way to express that thought?

    regards-li-er y'rs - steve
    --
    Steve Holden http://www.holdenweb.com/
    Python Web Programming http://pydish.holdenweb.com/
    Holden Web LLC +1 703 861 4237 +1 800 494 3119
     
    Steve Holden, Jan 14, 2005
    #16
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. JustSomeGuy

    Sorting lists of lists...

    JustSomeGuy, Jun 17, 2004, in forum: C++
    Replies:
    0
    Views:
    324
    JustSomeGuy
    Jun 17, 2004
  2. Honestmath
    Replies:
    5
    Views:
    559
    Honestmath
    Dec 13, 2004
  3. Jon Slaughter

    lists of lists

    Jon Slaughter, Dec 13, 2004, in forum: C++
    Replies:
    4
    Views:
    425
    Buster
    Dec 13, 2004
  4. Nick

    Combining arbitrary lists

    Nick, Nov 15, 2004, in forum: Python
    Replies:
    7
    Views:
    292
    N Chackowsky
    Nov 15, 2004
  5. =?UTF-8?B?w4FuZ2VsIEd1dGnDqXJyZXogUm9kcsOtZ3Vleg==

    List of lists of lists of lists...

    =?UTF-8?B?w4FuZ2VsIEd1dGnDqXJyZXogUm9kcsOtZ3Vleg==, May 8, 2006, in forum: Python
    Replies:
    5
    Views:
    409
    =?UTF-8?B?w4FuZ2VsIEd1dGnDqXJyZXogUm9kcsOtZ3Vleg==
    May 15, 2006
Loading...

Share This Page