incorrect(?) shlex behaviour

Discussion in 'Python' started by bill, May 14, 2005.

  1. bill

    bill Guest

    Consider:
    >>> import shlex
    >>> shlex.split('$(which sh)')
    ['$(which', 'sh)']

    Is this behavior correct? It seems that I should
    either get one token, or the list
    ['$','(','which','sh',')'],
    but certainly breaking it the way it does is
    erroneous.

    Can anyone explain why the string is being split
    that way?
     
    bill, May 14, 2005
    #1

  2. bill

    M.E.Farmer Guest

    bill wrote:
    > Consider:
    > >>> import shlex
    > >>> shlex.split('$(which sh)')
    > ['$(which', 'sh)']
    >
    > Is this behavior correct? It seems that I should
    > either get one token, or the list
    > ['$','(','which','sh',')'],
    > but certainly breaking it the way it does is
    > erroneous.
    >
    > Can anyone explain why the string is being split
    > that way?

    This may help.
    http://www.python.org/dev/doc/devel/lib/module-shlex.html
    This works on Python 2.4:
    >>> import shlex
    >>> sh = shlex.shlex('$(which sh)')
    >>> sh.get_token()
    '$'
    >>> sh.get_token()
    '('
    >>> sh.get_token()
    'which'
    >>> sh.get_token()
    'sh'
    >>> sh.get_token()
    ')'
    >>> sh.get_token()
    etc...

    Python 2.2 and maybe lower:
    >>> import shlex
    >>> import StringIO
    >>> s = StringIO.StringIO('$(which sh)')
    >>> sh = shlex.shlex(s)
    >>> sh.get_token()
    '$'
    >>> sh.get_token()
    '('
    >>> sh.get_token()
    'which'
    >>> sh.get_token()
    'sh'
    >>> sh.get_token()
    ')'
    >>> sh.get_token()
    etc...
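    For readers on Python 3 (an assumption; the thread itself is about Python 2.2 to 2.4), a minimal sketch of the same walk: shlex.shlex accepts a string directly and is iterable, and iteration stops at the end-of-file token instead of needing repeated get_token() calls.

```python
import shlex

# Non-POSIX mode is the default for shlex.shlex, so characters outside
# wordchars ('$', '(', ')') each come back as a one-character token.
lexer = shlex.shlex('$(which sh)')

# Iterating calls get_token() until it returns lexer.eof ('' in this mode).
tokens = list(lexer)
print(tokens)  # ['$', '(', 'which', 'sh', ')']
```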

    Hth,
    M.E.Farmer
     
    M.E.Farmer, May 15, 2005
    #2

  3. bill

    bill Guest

    It gets worse:
    >>> from shlex import StringIO
    >>> from shlex import shlex
    >>> t = shlex(StringIO("2>&1"))
    >>> while True:
    ...     b = t.read_token()
    ...     if not b: break
    ...     print b
    ...
    2
    &
    1 <----------- where's the '>' !?
    >>> import shlex
    >>> print shlex.split("2>&1")
    ['2>&1']

    It strikes me that split should be behaving exactly the same way as
    read_token, but that may be a misunderstanding on my part of what split
    is doing.

    However, it is totally bizarre that read_token discards the '>' symbol
    in the string! I don't know much about lexical analysis, but it
    strikes me that discarding characters is a bad thing.
     
    bill, May 15, 2005
    #3
  4. bill

    M.E.Farmer Guest

    bill wrote:
    > It gets worse:
    > >>> from shlex import StringIO
    > >>> from shlex import shlex
    > >>> t = shlex(StringIO("2>&1"))
    > >>> while True:
    > ...     b = t.read_token()
    > ...     if not b: break
    > ...     print b
    > ...
    > 2
    > &
    > 1 <----------- where's the '>' !?
    > >>> import shlex
    > >>> print shlex.split("2>&1")
    > ['2>&1']
    >
    > It strikes me that split should be behaving exactly the same way as
    > read_token, but that may be a misunderstanding on my part of what split
    > is doing.
    >
    > However, it is totally bizarre that read_token discards the '>' symbol
    > in the string! I don't know much about lexical analysis, but it
    > strikes me that discarding characters is a bad thing.

    From the docs:

    split(s[, comments])
    Split the string s using shell-like syntax. If comments is False
    (the default), the parsing of comments in the given string will be
    disabled (setting the commenters member of the shlex instance to the
    empty string). This function operates in POSIX mode. New in version
    2.3.
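    A sketch of what "operates in POSIX mode" amounts to, based on the shlex.split source in CPython 3 (the 2.x version looks similar but is not guaranteed identical; split_like_shlex is a hypothetical name): split builds a POSIX-mode lexer and sets whitespace_split, so only whitespace ends a token and '$', '(', '>' and '&' stay attached to their neighbours.

```python
import shlex

def split_like_shlex(s, comments=False):
    # Hypothetical helper mirroring the body of shlex.split in CPython 3.
    lexer = shlex.shlex(s, posix=True)
    # Only whitespace terminates a token; punctuation stays inside words.
    lexer.whitespace_split = True
    if not comments:
        lexer.commenters = ''  # '#' then becomes an ordinary character
    return list(lexer)         # iteration stops at lexer.eof (None in POSIX mode)

for text in ('$(which sh)', '2>&1'):
    print(repr(text), '->', split_like_shlex(text), shlex.split(text))
```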

    Maybe comparing it with the string method split will help:
    >>> "$(which sh)".split()
    ['$(which', 'sh)']
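    On input without quotes the two agree; the difference only shows up with quoting or escapes. A small illustration in Python 3 (the command string is made up for the example): both split on whitespace, but shlex.split keeps a quoted phrase together and removes the quotes.

```python
import shlex

cmd = 'grep "two words" file.txt'

plain = cmd.split()        # whitespace only: the quoted phrase is broken up
shell = shlex.split(cmd)   # whitespace too, but quotes group and are stripped

print(plain)  # ['grep', '"two', 'words"', 'file.txt']
print(shell)  # ['grep', 'two words', 'file.txt']
```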

    From the docs:

    read_token()
    Read a raw token. Ignore the pushback stack, and do not interpret
    source requests. (This is not ordinarily a useful entry point, and is
    documented here only for the sake of completeness.)

    # Just like in my first post
    >>> from StringIO import StringIO
    >>> from shlex import shlex
    >>> t = shlex(StringIO("2>&1"))
    >>> t.get_token()
    '2'
    >>> t.get_token()
    '>'
    >>> t.get_token()
    '&'
    >>> t.get_token()
    '1'
    >>> t.get_token()
    ''
    # Your way
    >>> t = shlex(StringIO("2>&1"))
    >>> t.read_token()
    '2'
    >>> t.read_token()
    '&'
    >>> t.read_token()
    '1'
    >>> t.read_token()
    ''
    >>>
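    Why the '>' seems to vanish, as far as the CPython source shows (the pushback attribute is an implementation detail, not documented API): while read_token() is building the word '2' it reads one character too far, and since '>' cannot belong to a word it is pushed onto lexer.pushback. get_token() drains that deque before reading more input; read_token(), as the docs quoted above say, ignores it. A Python 3 sketch (tokens_via is a hypothetical helper):

```python
import shlex

def tokens_via(method_name, text):
    """Hypothetical helper: call one shlex method until it returns ''."""
    lexer = shlex.shlex(text)  # non-POSIX mode, so end of input is ''
    read = getattr(lexer, method_name)
    tokens = []
    while True:
        tok = read()
        if not tok:
            break
        tokens.append(tok)
    return tokens

print(tokens_via('get_token', '2>&1'))   # ['2', '>', '&', '1']
print(tokens_via('read_token', '2>&1'))  # ['2', '&', '1']

# Peek at the pushback deque right after the first raw read.
lexer = shlex.shlex('2>&1')
print(lexer.read_token(), list(lexer.pushback))  # 2 ['>']
```

    So the character is not thrown away by the tokenizer; it is parked where only get_token() looks.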


    Hth,
    M.E.Farmer
     
    M.E.Farmer, May 15, 2005
    #4
  5. bill

    Donn Cave Guest

    In article <>,
    "bill" <> wrote:

    > Consider:
    > >>> import shlex
    > >>> shlex.split('$(which sh)')
    > ['$(which', 'sh)']
    >
    > Is this behavior correct? It seems that I should
    > either get one token, or the list
    > ['$','(','which','sh',')'],
    > but certainly breaking it the way it does is
    > erroneous.
    >
    > Can anyone explain why the string is being split
    > that way?


    Python 2.3.5 (#1, Mar 20 2005, 20:38:20)
    [GCC 3.3 20030304 (Apple Computer, Inc. build 1809)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import shlex
    >>> print shlex.__doc__
    A lexical analyzer class for simple shell-like syntaxes.


    This has a little potential to mislead. Bourne shell
    syntax is naturally "shell-like", but it is not "simple" -
    as grammars go, it's a notorious mess. In theory, someone
    could certainly write Python code to accurately parse Bourne
    shell statements, but that doesn't appear to have been the
    intention here. The "Parsing Rules" section of the documentation
    describes what you can expect, and right off hand I don't see
    how the result you got was erroneous.
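    If the goal is a tokenizer that knows a little more about shell operators, Python 3.6 and later (well after this thread) added a punctuation_chars argument to shlex.shlex. It is still a simple tokenizer rather than a Bourne shell parser, but runs of the characters ();<>|& come back as single tokens. A minimal sketch (shell_tokens is a hypothetical name):

```python
import shlex

def shell_tokens(text):
    # Requires Python 3.6+: punctuation_chars=True treats '();<>|&' as
    # operator characters, and a run of them becomes one token.
    return list(shlex.shlex(text, punctuation_chars=True))

print(shell_tokens('2>&1'))         # ['2', '>&', '1']
print(shell_tokens('$(which sh)'))  # ['$', '(', 'which', 'sh', ')']
```

    Note that '$(' is still not recognised as command substitution: '$' is just another one-character token, which fits the point that shlex only approximates shell syntax.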

    Donn Cave,
     
    Donn Cave, May 16, 2005
    #5
