Re: Why don't people like lisp?

Discussion in 'Python' started by Andrew Dalke, Oct 22, 2003.

  1. Andrew Dalke

    Andrew Dalke Guest

    Me:
    > Python has the ability to do exactly
    > what you're saying (domain language -> AST -> Python code or AST ->
    > compiler). It's rarely needed (I've used it twice now in my six years
    > or so of Python), so why should a language cater to make that
    > easy at the expense of making frequent things harder?


    As an example, here's a quick hack of a way to parse a simple
    stack-based language and make a native Python function out
    of it. I am the first to admit that it's ugly code, but it does work.

    I am curious to see the equivalent code in Lisp. Here's the spec

    All tokens are separated by whitespace (space, tab, newline, and
    include vertical tab and form feed if you want).

    Numbers are the standard IEEE floats (excepting NaN, Inf, etc)
    and represented "as usual".

    The operators are addition ("+"), subtraction ("-"), multiplication
    ("*"), division ("/"), and exponentiation ("**"). These are binary
    operators only, so to get -b use 0 b - .

    The variable names are of the form [A-Za-Z_][A-Za-z0-9_]*
    and must be passed to the function when making the function call.
    The order of names is the same as the order the names appear
    in the RPN expression, so in "b a +" the function created takes
    "b" as the first argument and "a" as the second.

    Here's some examples of using the Python version

    >>> import rpn
    >>> f = rpn.compile("a 2 b * +")
    >>> f(3, 4)

    11.0
    >>> f(a=4, b=3)

    10.0
    >>> f(b=4, a=3)

    11.0
    >>> g = rpn.compile("1 2 3 4 5 + + + + a **")
    >>> g(2)

    225.0
    >>> h = rpn.compile("""

    .... 0 b -
    .... b 2 **
    .... 4 a c * *
    .... -
    .... 0.5 **
    .... +
    .... 2 a *
    .... /
    .... """)
    >>> h(a=1, b=6, c=3)

    -0.55051025721682212
    >>> (-6+(6*6-4*1*3)**0.5)/(2*1)

    -0.55051025721682212
    >>>


    Note that the code also handles errors reasonably well.
    Eg, the following examples show that it correctly reports
    the line numbers as given in the original string

    >>> h(a="q", b=6, c=3)

    Traceback (most recent call last):
    File "<interactive input>", line 1, in ?
    File "<RPN string RPN2>", line 3, in RPN2
    TypeError: can't multiply sequence to non-int
    >>> h(a=1, b="", c=3)

    Traceback (most recent call last):
    File "<interactive input>", line 1, in ?
    File "<RPN string RPN2>", line 1, in RPN2
    TypeError: unsupported operand type(s) for -: 'float' and 'str'
    >>> h(a=0, b=6, c=4)

    Traceback (most recent call last):
    File "<interactive input>", line 1, in ?
    File "<RPN string RPN2>", line 7, in RPN2
    ZeroDivisionError: float division
    >>>


    Here's an error detected by the translator

    >>> f = rpn.compile("a + 2 * b")

    Traceback (most recent call last):
    File "<interactive input>", line 1, in ?
    File "E:\dalke\rpn.py", line 122, in compile
    return translate(parse(tokenize(s)))
    File "E:\dalke\rpn.py", line 98, in translate
    stack.add_oper(_oper_table[token.val], token)
    File "E:\dalke\rpn.py", line 72, in add_oper
    raise RPNError(
    RPNError: Binary operator at line 0 char 3 missing terms

    And here's a problem detected during tokenization

    >>> rpn.compile("5t")

    Traceback (most recent call last):
    File "<interactive input>", line 1, in ?
    File "E:\dalke\rpn.py", line 122, in compile
    return translate(parse(tokenize(s)))
    File "E:\dalke\rpn.py", line 56, in parse
    tokens = list(tokens) # this tree is very flat :)
    File "E:\dalke\rpn.py", line 46, in tokenize
    raise RPNError(
    RPNError: Unknown token '5t' at line 0, character 1
    >>>


    I expect any similar Lisp implementation to track the
    line numbers for error reporting.

    Andrew


    import re
    import sys
    from compiler import ast, misc, pycodegen

    class RPNError(Exception):
    pass

    class Token:
    VAR = 1
    FLOAT = 2
    OPER = 3
    def __init__(self, type, val, lineno, charpos):
    self.type = type
    self.val = val
    self.lineno = lineno
    self.charpos = charpos

    _symbol_re = re.compile(r"\S+")
    _operators = "+ - * / **".split()
    _variable_re = re.compile(r"[A-Za-z_][A-Za-z0-9_]*$")
    def tokenize(s):
    for lineno, line in enumerate(s.split("\n")):
    for match in _symbol_re.finditer(line):
    word = match.group(0)
    charpos = match.start(0) + 1

    try:
    yield Token(Token.FLOAT, float(word),
    lineno, charpos)
    continue
    except ValueError:
    # okay, it isn't a float
    pass

    if word in _operators:
    yield Token(Token.OPER, word,
    lineno, charpos)
    continue

    if _variable_re.match(word):
    yield Token(Token.VAR, word,
    lineno, charpos)
    continue

    # Hmm, wonder what it is.
    raise RPNError(
    "Unknown token %r at line %d, character %d" %
    (word, lineno, charpos))

    class ParseTree:
    def __init__(self, param_names, tokens):
    self.param_names = param_names
    self.tokens = tokens

    def parse(tokens):
    tokens = list(tokens) # this tree is very flat :)
    param_names = []
    for token in tokens:
    if token.type == Token.VAR:
    if token.val not in param_names:
    param_names.append(token.val)
    return ParseTree(param_names, tokens)

    class Stack:
    def __init__(self):
    self.stack = []
    def add_term(self, term, token):
    term.lineno = token.lineno
    self.stack.append(term)
    def add_oper(self, klass, token):
    if len(self.stack) < 2:
    raise RPNError(
    "Binary operator at line %d char %d missing terms" %
    (token.lineno, token.charpos))
    term = klass(self.stack[-2:])
    term.lineno = token.lineno
    self.stack[-2:] = [term]


    _id_gen = iter(xrange(sys.maxint))
    _oper_table = {
    "+": ast.Add,
    "-": ast.Sub,
    "*": ast.Mul,
    "/": ast.Div,
    "**": ast.Power,
    }

    def translate(parse_tree):
    stack = Stack()

    for token in parse_tree.tokens:
    if token.type == Token.FLOAT:
    stack.add_term(ast.Const(token.val), token)
    elif token.type == Token.VAR:
    stack.add_term(ast.Name(token.val),
    token)
    elif token.type == Token.OPER:
    stack.add_oper(_oper_table[token.val], token)
    else:
    raise AssertionError(repr(token.type))

    stack = stack.stack
    if len(stack) != 1:
    raise RPNError("evaluation ends with stack size %d" %
    len(stack))

    # go through an ugly bit of shenanigans
    # (I don't like the compiler API ):
    fctn_name = 'RPN' + str(_id_gen.next())
    fctn = ast.Function(fctn_name,
    parse_tree.param_names, [],
    0, None,
    ast.Stmt([ast.Return(stack[0])]))
    mod = ast.Module(None, ast.Stmt([fctn]))
    misc.set_filename("<RPN string " + fctn_name + ">", mod)
    code = pycodegen.ModuleCodeGenerator(mod).getCode()
    d = {"__builtins__": {}}
    exec code in d, d
    return d[fctn_name]

    def compile(s):
    return translate(parse(tokenize(s)))

    def main():
    assert compile("a 2 3 + -")(a=6) == 1
    assert compile("a 2 3 + -")(7) == 2
    assert compile("a b *")(2, 3) == 6
    assert compile("a b -")(b=2, a=3) == 1
    assert compile("1 2 3 4 + + +")() == 10
    print "All tests passed"


    if __name__ == "__main__":
    main()
     
    Andrew Dalke, Oct 22, 2003
    #1
    1. Advertising

  2. Andrew Dalke

    Paul Rubin Guest

    "Andrew Dalke" <> writes:
    > As an example, here's a quick hack of a way to parse a simple
    > stack-based language and make a native Python function out
    > of it. I am the first to admit that it's ugly code, but it does work.
    >
    > I am curious to see the equivalent code in Lisp. Here's the spec


    You could of course do something similar in Lisp, but normally you'd
    just use S-expressions instead of concocting your own weird syntax
    every time you want to build a small language into something. Then
    you just use the regular Lisp "read" function and the S-expressions
    get parsed automatically for you. Your whole parser becomes just one
    line of code.
     
    Paul Rubin, Oct 22, 2003
    #2
    1. Advertising

  3. Andrew Dalke

    Erann Gat Guest

    In article <8Nnlb.619$>, "Andrew
    Dalke" <> wrote:

    > Me:
    > > Python has the ability to do exactly
    > > what you're saying (domain language -> AST -> Python code or AST ->
    > > compiler). It's rarely needed (I've used it twice now in my six years
    > > or so of Python), so why should a language cater to make that
    > > easy at the expense of making frequent things harder?

    >
    > As an example, here's a quick hack of a way to parse a simple
    > stack-based language and make a native Python function out
    > of it. I am the first to admit that it's ugly code, but it does work.
    >
    > I am curious to see the equivalent code in Lisp.


    Here's a quick and dirty version:


    (defvar *operations* '(+ - * / **))

    (defun ** (n e) (expt n e))

    (defun rpn-compile (expr)
    (let ( stack free-vars )
    (dolist (term expr)
    (cond ( (numberp term) (push term stack) )
    ( (member term *operations*)
    (push (list term (pop stack) (pop stack)) stack)
    (rotatef (second (first stack)) (third (first stack))) )
    (t (push term stack)
    (pushnew term free-vars))))
    (unless (= (length stack) 1) (error "Syntax error"))
    (eval `(lambda (&key ,@free-vars) ,(first stack)))))


    ? (funcall (rpn-compile '(0 b - b 2 ** 4 a c * * - 0.5 ** + 2 a * /))
    :a 1 :b 6 :c 3)
    -0.5505102572168221
    ?


    E.
     
    Erann Gat, Oct 22, 2003
    #3
  4. Andrew Dalke

    Peter Seibel Guest

    "Andrew Dalke" <> writes:

    > Me:


    > > Python has the ability to do exactly what you're saying (domain
    > > language -> AST -> Python code or AST -> compiler). It's rarely
    > > needed (I've used it twice now in my six years or so of Python),
    > > so why should a language cater to make that easy at the expense of
    > > making frequent things harder?

    >
    > As an example, here's a quick hack of a way to parse a simple
    > stack-based language and make a native Python function out of it. I
    > am the first to admit that it's ugly code, but it does work.
    >
    > I am curious to see the equivalent code in Lisp. Here's the spec
    >
    > All tokens are separated by whitespace (space, tab, newline, and
    > include vertical tab and form feed if you want).
    >
    > Numbers are the standard IEEE floats (excepting NaN, Inf, etc)
    > and represented "as usual".
    >
    > The operators are addition ("+"), subtraction ("-"), multiplication
    > ("*"), division ("/"), and exponentiation ("**"). These are binary
    > operators only, so to get -b use 0 b - .
    >
    > The variable names are of the form [A-Za-Z_][A-Za-z0-9_]*
    > and must be passed to the function when making the function call.
    > The order of names is the same as the order the names appear
    > in the RPN expression, so in "b a +" the function created takes
    > "b" as the first argument and "a" as the second.


    Okay, here's a Common Lisp version. Because Lisp functions take either
    keyword or positional arguments, I choose to make the generated
    functions use keyword arguments. But the variables are gathered in the
    right order, so take out the '&key' below to use positional arguments:

    (defun compile-rpn (input)
    (compile nil (parse input)))

    (defun parse (input)
    (let (stack variables (idx 0))
    (loop
    (multiple-value-bind (tok next-idx) (read-from-string input nil nil :start idx)
    (labels
    ((line-number () (1+ (count #\Newline input :end idx)))
    (pop-arg ()
    (or (pop stack)
    (error "Not enough arguments for ~a at position ~d (line ~d)." tok idx (line-number)))))
    (cond
    ((not tok) (return `(lambda (&key ,@(nreverse variables)) (prog1 ,@stack))))
    ((member tok '(+ * - /)) (push (cons tok (reverse (list (pop-arg) (pop-arg)))) stack))
    ((eq tok '**) (push (cons 'expt (reverse (list (pop-arg) (pop-arg)))) stack))
    ((numberp tok) (push tok stack))
    ((variablep tok) (push tok stack) (pushnew tok variables))
    (t (error "Invalid token ~a at position ~d (line ~d)." tok idx (line-number))))
    (setq idx next-idx))))))

    (defun variablep (sym)
    (and (symbolp sym) (every #'alpha-char-p (string sym))))


    Here's how it works (FUNCALL is the Common Lisp operator to calls a
    function object such as the one returned by compile-rpn):

    * (funcall (compile-rpn "1 2 3 4 5 + + + + a **") :a 2)
    225

    Compile-time error messages:

    * (compile-rpn "a + 2 * b")
    Error: Not enough arguments for + at position 2 (line 1).

    * (compile-rpn "5t")
    Error: Invalid token 5T at position 0 (line 1).

    Runtime error handling:

    * (funcall (compile-rpn "a b +") :a "foo" :b 2)
    Error: `"foo"' is not of the expected type `NUMBER'

    And for grins here's the x86 machine code generated for the first function:

    * (disassemble (compile-rpn "1 2 3 4 5 + + + + a **"))
    ;; disassembly of #<Function :)ANONYMOUS-LAMBDA 51) @ #x7608e53a>
    ;; formals: &KEY A
    ;; constant vector:
    0: NIL
    1: 1
    2: :A
    3: EXPT

    ;; code start: #x7608e4d4:
    0: 55 pushl ebp
    1: 8b ec movl ebp,esp
    3: 56 pushl esi
    4: 83 ec 2c subl esp,$44
    7: 89 45 08 movl [ebp+8],eax
    10: 89 55 0c movl [ebp+12],edx
    13: 8d 09 leal ecx,[ecx+0]
    15: 8d 45 08 leal eax,[ebp+8]
    18: 8d 55 e0 leal edx,[ebp-32]
    21: 33 db xorl ebx,ebx
    23: b3 48 movb bl,$72
    25: ff 57 4f call *[edi+79] ; SYS::KWDSCAN
    28: 8d 5d e0 leal ebx,[ebp-32]
    31: 8b 13 movl edx,[ebx+0]
    33: 3b fa cmpl edi,edx
    35: 75 1a jnz 63
    37: 8b d7 movl edx,edi
    39: 80 7f 97 00 cmpb [edi-105],$0 ; SYS::C_INTERRUPT
    43: 74 02 jz 47
    45: cd 64 int $100 ; EXCL::TRAP-SIGNAL-HIT
    47: 8b 5e 1e movl ebx,[esi+30] ; EXPT
    50: 33 c0 xorl eax,eax
    52: b0 3c movb al,$60
    54: ff 57 27 call *[edi+39] ; SYS::TRAMP-TWO
    57: f8 clc
    58: c9 leave
    59: 8b 75 fc movl esi,[ebp-4]
    62: c3 ret
    63: 8b 53 04 movl edx,[ebx+4]
    66: eb e3 jmp 39


    -Peter


    --
    Peter Seibel

    Lisp is the red pill. -- John Fraser, comp.lang.lisp
     
    Peter Seibel, Oct 22, 2003
    #4
  5. Andrew Dalke

    Andrew Dalke Guest

    Paul Rubin:
    > You could of course do something similar in Lisp, but normally you'd
    > just use S-expressions instead of concocting your own weird syntax
    > every time you want to build a small language into something. Then
    > you just use the regular Lisp "read" function and the S-expressions
    > get parsed automatically for you. Your whole parser becomes just one
    > line of code.


    Ahh, but that's writing the domain language to fit the implementation
    language. The moral equivalent for Python would be to use Python's
    syntax to define a function, which is simple. Eg, Suppose the domain
    language required that identifiers be labeled with parens, as in a
    chemistry specific language where you might want
    1-1'-(ethylenediimino)di-2-proponal
    as a native identifier.


    Andrew
     
    Andrew Dalke, Oct 22, 2003
    #5
  6. Andrew Dalke

    Andrew Dalke Guest

    Erann Ga
    > Here's a quick and dirty version:
    >
    > (defvar *operations* '(+ - * / **))
    >
    > (defun ** (n e) (expt n e))
    >
    > (defun rpn-compile (expr)
    > (let ( stack free-vars )
    > (dolist (term expr)
    > (cond ( (numberp term) (push term stack) )
    > ( (member term *operations*)
    > (push (list term (pop stack) (pop stack)) stack)
    > (rotatef (second (first stack)) (third (first stack))) )
    > (t (push term stack)
    > (pushnew term free-vars))))
    > (unless (= (length stack) 1) (error "Syntax error"))
    > (eval `(lambda (&key ,@free-vars) ,(first stack)))))
    >
    >
    > ? (funcall (rpn-compile '(0 b - b 2 ** 4 a c * * - 0.5 ** + 2 a * /))
    > :a 1 :b 6 :c 3)
    > -0.5505102572168221
    > ?


    You know, I messed up a bit in the explanation in several
    ways. First, I wanted to show that Python code could directly
    manipulate the AST and generate usable code. I then chose
    a problem which could be done that way and implemented it
    in the canonical tokenizer/parser/translater/compiler framework
    Kaz Kylheku (to whom I was responding) wanted, when it's
    inappropriate for this case, hence complicating my original code.

    If I were actually asked to solve that problem I would have
    done it like you did

    import re
    _name_re = re.compile("[A-Za-z_][A-Za-z_0-9]*$")
    _operators = "* + - / **".split()
    def compile(s):
    stack = []
    names = []
    for token in s.split():
    try:
    stack.append(float(token))
    continue
    except ValueError:
    pass
    if _name_re.match(token):
    names.append(token)
    stack.append(token)
    elif token in _operators:
    s = "(%s)%s(%s)" % (stack[-2], token, stack[-1])
    stack[-2:] =
    else:
    raise AssertionError(token)
    assert len(stack) == 1
    s = "def RPN(" + ",".join(names) + "):\n\treturn "
    s += str(stack[0])
    exec s in {"__builtins__": {}}
    return d["RPN"]

    That is, just build up then exec a string to get the function.

    Second, I didn't mention until the end that I expected the
    Lisp code to report the location (line numbers and even
    character positions) of where an error occured in the RPN.
    The code you have, and this quick approach, don't attempt
    to handle error numbers, making it much more succinct.

    Third, I didn't stress the requirement to identify illegal
    inputs, like 'a(', which are accepted by your code up
    until the eval step. (I could have made it worse for
    all of us if I allowed the characters '()#; in an identifier ;)

    OTOH, the quick&dirty solution I just now did
    doesn't allow Python keywords as variable names,
    while my AST approach does. That could be solved
    with a bit of variable munging, but complicates things.

    Eg, I changed my re pattern (in my AST version) to
    accept anything other than a number or operator as a
    keyword. The following then works

    f = compile("(#) ');; try + -")
    # can't call the keyword arguments directly since they
    # aren't legal Python names.
    print f(**{"(#)": 5, "');;": 7, "try": 3})

    and gives -5. Our eval/exec approaches just could not
    handle that case.

    What does your code do for my example above? ;)

    Andrew
     
    Andrew Dalke, Oct 22, 2003
    #6
  7. Andrew Dalke

    Andrew Dalke Guest

    Me:
    > if _name_re.match(token):
    > names.append(token)


    Should be
    if _name_re.match(token):
    if token not in names:
    names.append(token)

    Andrew
     
    Andrew Dalke, Oct 22, 2003
    #7
  8. Andrew Dalke

    Andrew Dalke Guest

    Peter Seibel:
    > Okay, here's a Common Lisp version. Because Lisp functions take either
    > keyword or positional arguments, I choose to make the generated
    > functions use keyword arguments. But the variables are gathered in the
    > right order, so take out the '&key' below to use positional arguments:


    Cool code. I like that it identifies some of the error locations.

    > * (funcall (compile-rpn "a b +") :a "foo" :b 2)
    > Error: `"foo"' is not of the expected type `NUMBER'


    My Python code reports a line number for this. I figured it
    would be harder for simple Lisp code to do the same. But
    that's because I'm cheating; the Python stack trace normally
    reports the line number so I get most of it for free. I suspect
    it would be more complicated if the spec said to report the
    line number *and* character position of the operator which
    failed.

    (Not all that hard though; I can use a new object
    class Add:
    def __init__(self, lineno, charpos):
    self.lineno = lineno
    self.charpos = charpos
    def __call__(self, left, right):
    try:
    return left + right
    except:
    ... report the error location ...

    and make the code look like

    Mult(4, 2)(3.0, Add(1, 30)(a, 1) + 9.8)

    but if I'm making a native Python function which doesn't
    manipulate its own stack then it's probably for the speed,
    and the above will be slow.)


    In my reply to Erann Gat I noted that his eval approach
    might exclude the possibility of using names like 'a)' as identifiers.
    (I don't know if it does, because I don't know enough Lisp.)

    My AST solution could easily be changed to support that
    case, as well as variable like ";;b" and "#c" and ":d:".

    I'll ask the same of you. Does your code require that the
    RPN names also be valid Lisp tokens? If so, what happens
    if it isn't?

    (Looking at it, the key line is
    > ((not tok) (return `(lambda (&key ,@(nreverse

    variables)) (prog1 ,@stack))))

    but I don't know what @() does nor what happens if a "strange"
    value is used as the list of keywords)

    > And for grins here's the x86 machine code generated for the first

    function:

    I was never any good at Intel assembly, and I thankfully haven't
    touched it in about 15 years. I was trying to figure out if it
    optimized the additions into 15**a, but I could see neither the
    sequence 1, 2, 3, 4, 5 nor the value 15 (octal 17, hex F) anywhere
    in the code. Pointers?

    Andrew
     
    Andrew Dalke, Oct 22, 2003
    #8
  9. Andrew Dalke

    Peter Seibel Guest

    "Andrew Dalke" <> writes:

    > Peter Seibel:
    > > Okay, here's a Common Lisp version. Because Lisp functions take either
    > > keyword or positional arguments, I choose to make the generated
    > > functions use keyword arguments. But the variables are gathered in the
    > > right order, so take out the '&key' below to use positional arguments:

    >
    > Cool code. I like that it identifies some of the error locations.
    >
    > > * (funcall (compile-rpn "a b +") :a "foo" :b 2)
    > > Error: `"foo"' is not of the expected type `NUMBER'

    >
    > My Python code reports a line number for this.


    Right. Well that's one disadvantage, such as it is, of having the
    compiler built in--I'm at the mercy of its error reporting and Lisp
    isn't big on line-numbers, perhaps because many definitions don't come
    from a file. What I didn't show in the output was that I was actually
    dropped into the debugger at that point so had all the capabilities
    that it provided to figure out what was going wrong. Anyway, that's
    the price I pay for getting machine code I guess. I could, if it was
    really important wrap all the function calls in condition handling
    code that reported the line number but it didn't seem worth it.

    > In my reply to Erann Gat I noted that his eval approach might
    > exclude the possibility of using names like 'a)' as identifiers. (I
    > don't know if it does, because I don't know enough Lisp.)
    >
    > My AST solution could easily be changed to support that case, as
    > well as variable like ";;b" and "#c" and ":d:".
    >
    > I'll ask the same of you. Does your code require that the RPN names
    > also be valid Lisp tokens? If so, what happens if it isn't?


    Sure. I just used the built in Lisp reader because the syntax you
    specified was compatibile. But if I wanted to allow names including
    special characters I could either tweak the readtable so the Lisp
    reader didn't treat them as special or I could do something like this.
    Notice that the actual change to the parse function is to one function
    call, replacing a call to the built-in READ-FROM-STRING with my own
    next-token function. The rest of the patch is my own simple tokenizer.
    (And I had to redefine variablep to allow weird variable names.) The
    tokenizer could probably be written better but it's late here and I
    should be in bed.

    ==== //depot/lisp-book/code/rpn.cl#1 - /home/peter/localperforce/lisp-book/code/rpn.cl ====
    @@ -10,7 +10,7 @@
    (defun parse (input)
    (let (stack variables (idx 0))
    (loop
    - (multiple-value-bind (tok next-idx) (read-from-string input nil nil :start idx)
    + (multiple-value-bind (tok next-idx) (next-token input :start idx)
    (labels
    ((line-number () (1+ (count #\Newline input :end idx)))
    (pop-arg ()
    @@ -25,8 +25,35 @@
    (t (error "Invalid token ~a at position ~d (line ~d)." tok idx (line-number))))
    (setq idx next-idx))))))

    -(defun variablep (sym)
    - (and (symbolp sym) (every #'alpha-char-p (string sym))))
    +
    +
    +(defun next-token (input &key (start 0))
    + (multiple-value-bind (str end) (next-string input :start start)
    + (when str
    + (values
    + (if (digit-char-p (char str 0))
    + (read-from-string str nil nil)
    + (intern (string-upcase str)))
    + end))))
    +
    +(defun next-string (input &key (start 0))
    + (let ((real-start (skip-white-space input :start start)))
    + (loop for idx from real-start below (length input)
    + while (not (white-space-p (char input idx)))
    + finally (return
    + (values
    + (if (= real-start idx) nil (subseq input real-start idx))
    + (skip-white-space input :start idx))))))
    +
    +(defun skip-white-space (input &key (start 0))
    + (loop for idx from start below (length input)
    + while (white-space-p (char input idx))
    + finally (return idx)))
    +
    +(defun white-space-p (char)
    + (member char '(#\Space #\Newline #\Tab)))
    +
    +(defun variablep (sym) (symbolp sym))


    > (Looking at it, the key line is
    > > ((not tok) (return `(lambda (&key ,@(nreverse

    > variables)) (prog1 ,@stack))))
    >
    > but I don't know what @() does nor what happens if a "strange"
    > value is used as the list of keywords)


    Actually the key line is the one I patched above. This is just
    generating the form which Lisp knows how to compile. But since Lisp
    symbols can actually contain any character, as long as they're
    properly escaped I have no real problems.
    >
    > > And for grins here's the x86 machine code generated for the first

    > function:
    >
    > I was never any good at Intel assembly, and I thankfully haven't
    > touched it in about 15 years. I was trying to figure out if it
    > optimized the additions into 15**a, but I could see neither the
    > sequence 1, 2, 3, 4, 5 nor the value 15 (octal 17, hex F) anywhere
    > in the code. Pointers?


    Well, I know less about it than you. I just threw that in to point out
    that my 24-line compiler generates machine code.

    -Peter

    --
    Peter Seibel

    Lisp is the red pill. -- John Fraser, comp.lang.lisp
     
    Peter Seibel, Oct 22, 2003
    #9
  10. Andrew Dalke

    Andrew Dalke Guest

    Peter Seibel:
    > Right. Well that's one disadvantage, such as it is, of having the
    > compiler built in--I'm at the mercy of its error reporting and Lisp
    > isn't big on line-numbers, perhaps because many definitions don't come
    > from a file.


    True enough, but I sketched out a few ways that could work in
    Python and they would work equally well in Lisp.

    > What I didn't show in the output was that I was actually
    > dropped into the debugger at that point so had all the capabilities
    > that it provided to figure out what was going wrong. Anyway, that's
    > the price I pay for getting machine code I guess.


    I haven't used the Python debuggers (I'm still stuck in the
    archaic era of print statements ;) so I can't compare. In any
    case, the error reporting is going to be at the level of the
    translated code, which may or may not be helpful.

    There is a compiler for Python code called Psyco, but it only
    works for Intel hardware and my primary platform is a Mac.
    I have an old copy on my Intel box, but it needs a recompile
    to support dumping the assembly code, and that's too much
    effort for this task.

    > I could, if it was
    > really important wrap all the function calls in condition handling
    > code that reported the line number but it didn't seem worth it.


    Yup, it ain't.

    > replacing a call to the built-in READ-FROM-STRING with my own
    > next-token function. The rest of the patch is my own simple tokenizer.


    Interesting. The code reminds me of working with strings
    in Pascal, because it tracks the string and the offset. Equivalent
    C code would use the pointers directly. But Python code
    works as a larger chunk level, with split() and regexps, in
    part because working at the lowest level requires a lot of
    fiddling around with things that are slow in native Python. (Eg,
    I would do i=s.find("a") instead of
    for i in range(len(s)): if s == 'a': break )

    Well, *I* thought it was interesting.

    > (And I had to redefine variablep to allow weird variable names.) The
    > tokenizer could probably be written better but it's late here and I
    > should be in bed.


    Understood. Also, my schedule is so backwards now that I'm
    going to bed when the sun rises. :(

    > Actually the key line is the one I patched above. This is just
    > generating the form which Lisp knows how to compile. But since Lisp
    > symbols can actually contain any character, as long as they're
    > properly escaped I have no real problems.


    Got it. I was thinking {} is equivalent to a paste when it's
    more like an insert.

    Andrew
     
    Andrew Dalke, Oct 22, 2003
    #10
  11. Andrew Dalke

    Andrew Dalke Guest

    Me:
    > I then chose
    > a problem which could be done that way and implemented it
    > in the canonical tokenizer/parser/translater/compiler framework
    > Kaz Kylheku (to whom I was responding) wanted, when it's
    > inappropriate for this case, hence complicating my original code.


    Here's what the code would look like without that strict
    partitioning. It can be made a bit smaller (I don't really need the
    'Stack' class; it just helps simplify dealing with line numbers,
    and the is_float make the if/elif/else structure nice, compared
    to using a try/except.)

    The point though is that Python allows the same sorts of
    domain language -> AST -> "native" AST -> native code
    steps that Kaz Kylheku wanted.

    Andrew


    import re, sys
    from compiler import ast, misc, pycodegen

    class RPNError(Exception):
    pass

    _symbol_re = re.compile(r"\S+")
    #_variable_re = re.compile(r"[A-Za-z_][A-Za-z0-9_]*$")
    # Let's take anything!
    _variable_re = re.compile(r".*$")

    class Stack:
    def __init__(self):
    self.stack = []

    def add_term(self, term, lineno):
    term.lineno = lineno
    self.stack.append(term)

    def add_oper(self, klass, lineno, charpos):
    if len(self.stack) < 2:
    raise RPNError(
    "Binary operator at line %d char %d missing terms" %
    (lineno, charpos))
    term = klass(self.stack[-2:])
    term.lineno = lineno
    self.stack[-2:] = [term]

    def is_float(s):
    try:
    float(s)
    return 1
    except ValueError:
    return 0

    _id_gen = iter(xrange(sys.maxint))
    _oper_table = {
    "+": ast.Add,
    "-": ast.Sub,
    "*": ast.Mul,
    "/": ast.Div,
    "**": ast.Power,
    }

    def compile(s):
    param_names = []
    stack = Stack()

    for lineno, line in enumerate(s.split("\n")):
    for match in _symbol_re.finditer(line):
    word = match.group(0)
    charpos = match.start(0) + 1

    if is_float(word):
    stack.add_term(ast.Const(float(word)), lineno)

    elif word in _oper_table:
    stack.add_oper(_oper_table[word],
    lineno, charpos)

    elif _variable_re.match(word):
    stack.add_term(ast.Name(word), lineno)
    if word not in param_names:
    param_names.append(word)

    else:
    # Hmm, wonder what it is.
    raise RPNError(
    "Unknown token %r at line %d, character %d" %
    (word, lineno, charpos))

    stack = stack.stack
    if len(stack) != 1:
    raise RPNError("stack ends with size %d" %
    len(stack))

    # go through an ugly bit of shenanigans
    # (I don't like the compiler API ):
    fctn_name = 'RPN' + str(_id_gen.next())
    fctn = ast.Function(fctn_name, param_names, [],
    0, None,
    ast.Stmt([ast.Return(stack[0])]))
    mod = ast.Module(None, ast.Stmt([fctn]))
    misc.set_filename("<RPN string " + fctn_name + ">", mod)
    code = pycodegen.ModuleCodeGenerator(mod).getCode()
    d = {"__builtins__": {}}
    exec code in d
    return d[fctn_name]

    def main():
    assert compile("2 3 **")() == 8
    assert compile("a 2 3 + -")(a=6) == 1
    assert compile("a 2 3 + -")(7) == 2
    assert compile("a b *")(2, 3) == 6
    assert compile("a b -")(b=2, a=3) == 1
    assert compile("1 2 3 4 + + +")() == 10
    f = compile("(#) ') try + -")
    print f(**{"(#)": 5, "')": 7, "try": 3})
    print "All tests passed"


    if __name__ == "__main__":
    main()
     
    Andrew Dalke, Oct 22, 2003
    #11
  12. In article <zFqlb.746$>,
    Andrew Dalke <> wrote:
    > > And for grins here's the x86 machine code generated for the first
    > > function:

    >
    > I was never any good at Intel assembly, and I thankfully haven't
    > touched it in about 15 years. I was trying to figure out if it
    > optimized the additions into 15**a, but I could see neither the
    > sequence 1, 2, 3, 4, 5 nor the value 15 (octal 17, hex F) anywhere
    > in the code. Pointers?


    It did indeed optimize the constants. What you're missing is that in
    most implementations Lisp fixnums are tagged so that they can fit in a
    descriptor slot by themselves and not be boxed in more structure. In
    almost all (x86) implementations I've seen a fixnum is 30 high bits of
    signed number and 2 low bits of zero.

    If a value has a non-zero low tag than it is something else like a
    pointer, and not a fixnum.

    These fixnums have the property that you can add two of them and get a
    third with no shifting at all, and to multiply, you only need to shift
    one of them.

    So the number you're looking for is 60, which is 15 << 2.

    47: 8b 5e 1e movl ebx,[esi+30] ; EXPT
    50: 33 c0 xorl eax,eax
    52: b0 3c movb al,$60
    54: ff 57 27 call *[edi+39] ; SYS::TRAMP-TWO

    -bcd
    --
    *** Brian Downing <bdowning at lavos dot net>
     
    Brian Downing, Oct 22, 2003
    #12
  13. Andrew Dalke

    Edi Weitz Guest

    On Wed, 22 Oct 2003 13:31:25 GMT, Brian Downing <> wrote:

    > What you're missing is that in most implementations Lisp fixnums are
    > tagged so that they can fit in a descriptor slot by themselves and
    > not be boxed in more structure. In almost all (x86) implementations
    > I've seen a fixnum is 30 high bits of signed number and 2 low bits
    > of zero.


    Not in LispWorks (LW pro 4.2.7 on Linux x86), unfortunately:

    CL-USER 10 > (log most-positive-fixnum 2)
    22.99999982801734

    Does anybody have an idea why this is the case? Historical reasons?

    Thanks,
    Edi.
     
    Edi Weitz, Oct 22, 2003
    #13
  14. Andrew Dalke

    Guest

    "Andrew Dalke" <> writes:


    > I was never any good at Intel assembly, and I thankfully haven't
    > touched it in about 15 years. I was trying to figure out if it
    > optimized the additions into 15**a, but I could see neither the
    > sequence 1, 2, 3, 4, 5 nor the value 15 (octal 17, hex F) anywhere
    > in the code. Pointers?


    When you read the raw machine code you have to remember that
    entities are tagged. I believe line 52

    52: b0 3c movb al,$60

    is the relevant one. 60 = 15 * 4, so it seems that this
    implementation represents fixnums by shifting left by two.
     
    , Oct 22, 2003
    #14
  15. Andrew Dalke writes:

    > Paul Rubin:
    >> You could of course do something similar in Lisp, but normally you'd
    >> just use S-expressions instead of concocting your own weird syntax
    >> every time you want to build a small language into something. Then
    >> you just use the regular Lisp "read" function and the S-expressions
    >> get parsed automatically for you. Your whole parser becomes just one
    >> line of code.

    >
    > Ahh, but that's writing the domain language to fit the implementation
    > language. The moral equivalent for Python would be to use Python's
    > syntax to define a function, which is simple. Eg, Suppose the domain


    Bingo, that's exactly the point :)

    The idiomatic way of developing little/domain-specific languages in
    Lisp is to leverage its sexp-based syntax whenever possible and
    appropriate. And yes, the scary macros are one of the main tools used
    for this. Python's mileage might (and the current threads suggest that
    it does :) vary.

    Is Lisp's approach complicated or confusing for certain domain experts
    or non programmers? Quite possibly. But they may not be the intended
    audience of Lisp.

    Learning Lisp, and the idiomatic ways of using it, does take time and
    effort, but Lispers believe that this pays off. Those who prefer other
    approaches have lots of alternatives, including Python.

    As to why one may want to design little/domain-specific languages with
    sexp-based syntax, some insight is provided in this paper by Olin
    Shivers:

    A Universal Scripting Framework - or - Lambda: the ultimate "little language"
    http://www.ai.mit.edu/~shivers/ll.ps

    Abstract: The "little languages" approach to systems programming is
    flawed: inefficient, fragile, error-prone, inexpressive, and
    difficult to compose. A better solution is to embed task-specific
    sublanguages within a powerful, syntactically extensible, universal
    language, such as Scheme. I demonstrate two such embeddings that
    have been implemented in scsh, a Scheme programming environment for
    Unix systems programming. The first embedded language is a highlevel
    process-control notation; the second provides for Awk-like
    processing. Embedding systems in this way is a powerful technique:
    for example, although the embedded Awk system was implemented with
    7% of the code required for the standard C-based Awk, it is
    significantly more expressive than its C counterpart.

    Although Shivers presents examples in Scheme, the ideas apply to other
    Lisp dialects, and possibly also to other languages. The `scsh' system
    mentioned in the paper is available at:

    http://scsh.sourceforge.net


    Paolo
    --
    Paolo Amoroso <>
     
    Paolo Amoroso, Oct 22, 2003
    #15
  16. Andrew Dalke wrote:
    > Paul Rubin:
    >
    >>You could of course do something similar in Lisp, but normally you'd
    >>just use S-expressions instead of concocting your own weird syntax
    >>every time you want to build a small language into something. Then
    >>you just use the regular Lisp "read" function and the S-expressions
    >>get parsed automatically for you. Your whole parser becomes just one
    >>line of code.

    >
    >
    > Ahh, but that's writing the domain language to fit the implementation
    > language.


    Yes. The point is that you can do that in CL easily. Not so in Python.



    > The moral equivalent for Python would be to use Python's
    > syntax to define a function, which is simple.


    CL binds morality with parcticality. Morality in CL is always practical :)


    > Eg, Suppose the domain
    > language required that identifiers be labeled with parens, as in a
    > chemistry specific language where you might want
    > 1-1'-(ethylenediimino)di-2-proponal
    > as a native identifier.


    cl-prompt> (symbolp '|1-1'-(ethylenediimino)di-2-proponal|)
    T

    In Python there is no equivalent AFAIK, in Java you can to

    "1-1'-(ethylenediimino)di-2-proponal".intern()

    to achieve a similar (although not so confortable) effect.
    This is no surprise, given that the designers of Java knew Lisp.

    Note also that in CL it is extremely easy to make the system accept
    chemical "concentration" variables like the following example

    (equation [CycB] (- k1 (* (+ k2p (*k2pp [Cdh1])) [CycB])))

    or (with the INFIX package)

    (equation [CyCB] #I(k1 - (k2p + k2pp * [Cdh1]) * [CycB]))

    This is just a few reader macros away (and the use of the INFIX package).

    In Python, you always have to com up with you full fledged parser.



    Cheers
    --
    Marco
     
    Marco Antoniotti, Oct 22, 2003
    #16
  17. Andrew Dalke

    Jim Newton Guest

    Perhaps lisp is not so well-liked is because there are not enough
    standard libraries available. Perl has the CPAN libraries that
    you can simply download and use. I've heard the Python also
    has a huge number of such libraries. But if i want a lisp-tk
    CLOS library, i'd have to write it myself, or lisp-DOM, or
    lisp-cvs, or lisp-mySQL.

    On this note, has anyone ever heard of openAccess or openEDA?
    http://www.openeda.org/

    it's an open source database for dealing with EDA design
    data. Currently there is a C++ interface and Python.
    At Tcl interface is either here on on its way.

    In my opinion a CLOS library would be much easier to use
    than any of the other, if it existed. But what i've decided
    to do in the mean time is to start from scratch learing
    Python.

    The C++ standard API is open, and well documented. Does
    anyone volunteer to start such a openAcces --> CLOS
    project?


    -jim
     
    Jim Newton, Oct 22, 2003
    #17
  18. Andrew Dalke wrote:

    > Paul Rubin:
    >
    >>You could of course do something similar in Lisp, but normally you'd
    >>just use S-expressions instead of concocting your own weird syntax
    >>every time you want to build a small language into something. Then
    >>you just use the regular Lisp "read" function and the S-expressions
    >>get parsed automatically for you. Your whole parser becomes just one
    >>line of code.

    >
    >
    > Ahh, but that's writing the domain language to fit the implementation
    > language.


    Yes, but we Lispers don't care about that. As I said before, syntax is
    boring. ;)

    To be more specific:

    Getting domain-specific in Lisp doesn't (necessarily) mean to adopt the
    notation of that domain. Lisp is rather pigheaded wrt notation - it
    strongly prefers s-expressions, i.e. prefix notation.

    Getting domain-specific rather means to model the abstractions of that
    domain.

    This means that when you program in most languages, the program source
    code will consist of objects, methods, functions, closures, and so on.
    Whereas when you get domain-specific, you see the abstractions of that
    domain and nothing else (or at least not much else).

    For example, the + operator is an abstraction of the domain of
    mathematics. Getting domain-specific means to make the + operator
    available in a way so that you don't care how it is actually implemented
    beneath. It doesn't necessarily mean that you need to switch to infix
    notation.


    Pascal
     
    Pascal Costanza, Oct 23, 2003
    #18
  19. Andrew Dalke

    Kenny Tilton Guest

    Jim Newton wrote:

    > Perhaps lisp is not so well-liked is because there are not enough
    > standard libraries available. Perl has the CPAN libraries that
    > you can simply download and use. I've heard the Python also
    > has a huge number of such libraries. But if i want a lisp-tk
    > CLOS library, i'd have to write it myself, or lisp-DOM, or
    > lisp-cvs, or lisp-mySQL.
    >
    > On this note, has anyone ever heard of openAccess or openEDA?
    > http://www.openeda.org/


    Wow. That's the first open source project that demanded a license
    agreement or something before letting me download. They use that word
    Open so much, but I do not think it means what they think it means. :)

    Anyway, I just wanted to see what everyone is interfacing to. Oops. You
    already said C++. Might take a C wrapper so Lisp FFIs can cope.

    >
    > it's an open source database for dealing with EDA design
    > data. Currently there is a C++ interface and Python.
    > At Tcl interface is either here on on its way.
    >
    > In my opinion a CLOS library would be much easier to use
    > than any of the other, if it existed. But what i've decided
    > to do in the mean time is to start from scratch learing
    > Python.


    If you are not part of the solution, you are part of the problem.
    Tilton's Law: Solve the right problem. Your problem is not that you do
    not know Python. A giveaway is that learning Python will not help you
    use OpenEDA from Lisp. That is your problem. What is the solution? UFFI
    bindings for OpenEDA. Regrettably, possibly also some gluing of C++
    classes to C structs so Lisp FFIs can cope. The good news is that the
    problem is linear and finite, albeit tedious. But crafty Lisp macros can
    help. Might take a week or two. I and others here can help with that.

    Tilton's Law: Move towards the light. If you have to do some hard work
    over the next few weeks, would you rather (a) know and be using Python
    or (b) know Lisp FFIs (it's not /that/ bad) and be using Lisp?

    >
    > The C++ standard API is open, and well documented. Does
    > anyone volunteer to start such a openAcces --> CLOS
    > project?
    >


    <g> Yep, some guy named:

    >
    > -jim
    >


    And I can help you with the FFI. Do they /really/ use C++ classes, or is
    it just C in C++ clothing? Is it a persistent C++ DB format? Probably
    not. I am thinking, just read the DB in Lisp, eliminate the bothersome
    middleman. They didn't use XML? Or is it more than a DB, really?

    kenny

    --
    http://tilton-technology.com
    What?! You are a newbie and you haven't answered my:
    http://alu.cliki.net/The Road to Lisp Survey
     
    Kenny Tilton, Oct 23, 2003
    #19
  20. Andrew Dalke wrote:
    > Marco Antoniotti:
    >
    >>Yes. The point is that you can do that in CL easily. Not so in Python.

    >
    >
    > But, but, but, ... I do it all the time in Python.
    >
    >
    >>cl-prompt> (symbolp '|1-1'-(ethylenediimino)di-2-proponal|)
    >>T

    >
    >
    > Again, that's forcing the domain language to fit the requirements
    > of the implementation language. It's a shortcut, and requires the
    > domain experts to have more knowledge of the implementation
    > and means the domain language cannot be reimplemented outside
    > of a Lisp environment. (Yes, I know, why would you want to.
    > Answer: because that's the way life is.)
    >
    >
    >>In Python there is no equivalent AFAIK, in Java you can to

    >
    >
    > Indeed. In Python I wouldn't translate such a system directly
    > into Python. Eg, I would write an emulator for that language
    > in Python, or my translation would do some munging for me.
    >
    >
    >>This is just a few reader macros away (and the use of the INFIX package).
    >>
    >>In Python, you always have to com up with you full fledged parser.

    >
    >
    > You use reader macros for that. I use parser generators. You
    > cheat by storing the domain language directly as s-exps, I cheat
    > by storing the domain language in native Python data structures.
    > I fail to see a fundamental difference.


    You fail to see the point because your objective is to show that there
    are tasks that are equally difficult in either Python or Common Lisp.

    If you have to write a parser for a any language (say Ada 9x, just for
    the heck of it), then obviously you have to resort to the full parse
    toolchain (tokenizer, AST constructor etc etc). It does not matter if
    you use INTERCAL or Python.

    But, and here your argument does not hold, in Common Lisp you have an
    extra trick up your sleeve. You can make your "domain language" S-expr
    based (instead of inventing an XML format :) ). This is *in addition*
    to what you can achieve with Python or other languages. And this
    *addition* makes CL more powerful. Not only that. With reader macros,
    CL has even more flexibility for achieving these effects. So, yes, it
    is a trick. It is a trick that makes the language more usable.

    >
    > The point of my exercise was to show that such a task is
    > not hard in Python. Yes, it's about twice as long as the Lisp
    > code - this is a place where Lisp shines,


    Yes. The CL code also compiles down to fast native code (how close is
    Psyco to ECL?). Another place where Common Lisp shines. Especially
    when you have to do numerical computations.

    Cheers
    --
    Marco
     
    Marco Antoniotti, Oct 23, 2003
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    3
    Views:
    401
    Greg Menke
    Oct 19, 2003
  2. Replies:
    3
    Views:
    468
    Bjorn Pettersen
    Oct 26, 2003
  3. Replies:
    16
    Views:
    489
    Kaz Kylheku
    Oct 28, 2003
  4. Mr. SweatyFinger
    Replies:
    2
    Views:
    2,128
    Smokey Grindel
    Dec 2, 2006
  5. SAZ
    Replies:
    3
    Views:
    339
    Beauregard T. Shagnasty
    May 6, 2011
Loading...

Share This Page