Atoms, Identifiers, and Primaries

Bruce McGoveran · Apr 16, 2013

These are terms that appear in section 5 (Expressions) of the Python onlinedocumentation. I'm having some trouble understanding what, precisely, these terms mean. I'd appreciate the forum's thoughts on these questions:

1. Section 5.2.1 indicates that an identifier occurring as an atom is a name. However, Section 2.3 indicates that identifiers are names. My question: can an identifier be anything other than a name?

2. Section 5.3 defines primaries as the most tightly bound operations of Python. What does this mean? In particular, if an atom is a primary, what operation is the atom performing that leads to the label "most tightly bound"? To put it a different way, I think of atoms as things (i.e. identifiers). The documentation makes me think atoms actually do something, as opposed to being things (I think I have in my mind the difference between a nounand a verb as I write this). Perhaps the doing in this case (or binding, if you like) is linking (binding) the identifier to the underlying object? I think it might help if I had a better working notion of what a primary is.

3. Section 5.3.1 offers this definition of an attributeref:
attributeref ::= primary "." identifier

Now, I was at first a little concerned to see the non-terminal primary on the right hand side of the definition, since primary is defined to include attributeref in section 5.3 (so this struck me as circular). Am I correct in thinking attributeref is defined this way to allow for situations in which the primary, whether an atom, attributeref (example: an object on which a method is called that returns another object), subscription, slicing, or call, returns an object with property identifier?

These are, I know, long-winded questions. I appreciate in advance any thoughts the group can offer.

The relevant documentation link is: http://docs.python.org/2/reference/expressions.html#expressions

Thanks,
Bruce

rusi · Apr 16, 2013

These are terms that appear in section 5 (Expressions) of the Python online documentation. I'm having some trouble understanding what, precisely,these terms mean. I'd appreciate the forum's thoughts on these questions:

1. Section 5.2.1 indicates that an identifier occurring as an atom is a name. However, Section 2.3 indicates that identifiers are names. Myquestion: can an identifier be anything other than a name?

2. Section 5.3 defines primaries as the most tightly bound operations of Python. What does this mean? In particular, if an atom is a primary, what operation is the atom performing that leads to the label "most tightly bound"? To put it a different way, I think of atoms as things (i.e. identifiers). The documentation makes me think atoms actually do something, as opposed to being things (I think I have in my mind the difference between a noun and a verb as I write this). Perhaps the doing in this case(or binding, if you like) is linking (binding) the identifier to the underlying object? I think it might help if I had a better working notion of what a primary is.

3. Section 5.3.1 offers this definition of an attributeref:
attributeref ::= primary "." identifier

Now, I was at first a little concerned to see the non-terminal primary onthe right hand side of the definition, since primary is defined to includeattributeref in section 5.3 (so this struck me as circular). Am I correct in thinking attributeref is defined this way to allow for situations in which the primary, whether an atom, attributeref (example: an object on which a method is called that returns another object), subscription, slicing, or call, returns an object with property identifier?

These are, I know, long-winded questions. I appreciate in advance any thoughts the group can offer.

The relevant documentation link is: http://docs.python.org/2/reference/expressions.html#expressions

Thanks,
Bruce

The specific details of python grammar I am not deeply familiar with,
so I'll leave others to comment.
One general comment I will make is regarding your distress at what you
call 'circular'
Circular just means recursive and recursion is the bedrock for
language-design.
You cannot hope to define an infinite object such as the python
language (there are an infinite number of python programs) with a
finite specification -- a useful language definition must start and
end and preferably fit in one's pocket!
The trick is to find ways of making an inifinite object finitely
generated. So much of language design is a generalization of Peano's
method of defining (designing?) natural numbers:
a. 0 is a natural number
b. If x is a natural number then the successor of x (informally x+1)
is a natural number

Note 1. that if we take the above as a definition of natural no. then
its a recursive definition, in particular b. You can call it circular
if you like. Just drop the pejorative color.
Note 2. We need to add the 'informally' because + needs to be defined
in terms of the above definition and not the other way round, else we
get a bad case of recursion. Such a definition would run as
0 + y = y
Sx + y = S(x+y)
(read Sx as successor of x)

Coming closer to your question, consider the 3 expressions:
A: 2*x + 3*y
B: (2*x) + 3*y
C: 2*(x+3)*y

Informally we may say that A and B are the same whereas C is
different.
Formally all three are different.
Not only are they different as strings, their parse-trees are
different. Their evaluations as defined by the python interpreter may
give the same value to A and B... thats a separate question from yours
which is really a syntax question.

A is an a_expr because 2*x is an m_expr and 3*y is an m-expr
Since an m-expr is (a trivial instance of an) a_expr therefore 2*x is
an a_expr
Since one way of making an a_expr is by doing a_expr + m_expr (note
the recursion) therefore 2*x + 3*y is an a_expr

In short Ive described the derivation:
a_expr -> a_expr + m_expr -> a_expr + 3*y -> m_expr + 3*y -> 2*x + 3*y

The derivation of B would be longer involving fact that (2*x) is a
parent-form therefore atom
atom -> enclosure -> enclosure -> parenth_form -> (expr_list) ->
(expression)
-> (conditional-e) -> (or_test) -> (and_test) -> (or-exp) -> (xor->
exp) -> (and-exp) -> (shift-e) -> (a_expr)

[And now I got fedup of following the grammar but hopefully you get
the idea!]

Ian Kelly · Apr 17, 2013

These are terms that appear in section 5 (Expressions) of the Python online documentation. I'm having some trouble understanding what, precisely, these terms mean. I'd appreciate the forum's thoughts on these questions:

1. Section 5.2.1 indicates that an identifier occurring as an atom is a name. However, Section 2.3 indicates that identifiers are names. My question: can an identifier be anything other than a name?

Yes. For example:

from a import b

Here "a" is an identifier but not a name, as it does not carry
object-binding semantics.

2. Section 5.3 defines primaries as the most tightly bound operations of Python. What does this mean?

"Tightly bound" here refers to operator precedence. For example, we
say that the multiplication operator binds more tightly [to the
surrounding operands] than the arithmetic operator, because the
multiplication takes precedence. This section defines that the most
tightly bound operations in Python are attribute references,
subscriptions, slices and calls; these always take precedence over
other neighboring operations.

In particular, if an atom is a primary, what operation is the atom performing that leads to the label "most tightly bound"?

An atom doesn't perform an operation. The grammar defines that a
primary can be just an atom, so that anywhere in the grammar that
expects a primary, a simple atom with no primary operation performed
on it can equally be used.

Dave Angel · Apr 17, 2013

These are terms that appear in section 5 (Expressions) of the Python online documentation. I'm having some trouble understanding what, precisely, these terms mean. I'd appreciate the forum's thoughts on these questions:

3. Section 5.3.1 offers this definition of an attributeref:
attributeref ::= primary "." identifier

Now, I was at first a little concerned to see the non-terminal primary on the right hand side of the definition, since primary is defined to include attributeref in section 5.3 (so this struck me as circular). Am I correct in thinking attributeref is defined this way to allow for situations in which the primary, whether an atom, attributeref (example: an object on which a method is called that returns another object), subscription, slicing, or call, returns an object with property identifier?

It is circular. Nothing wrong with that. It means that not only can
you use
a.b

but also
a.b.c

and
a.b.c.d.e.f.g

without any explicit limit. if a non-circular definition were to be
attempted, you might need a few dozen rules, just to cover what someone
*might* happen to use in an expression. Of course normally, one doesn't
go much beyond a.b.c in a single expression.

Steven D'Aprano · Apr 17, 2013

These are terms that appear in section 5 (Expressions) of the Python
online documentation. I'm having some trouble understanding what,
precisely, these terms mean. I'd appreciate the forum's thoughts on
these questions:

1. Section 5.2.1 indicates that an identifier occurring as an atom is a
name. However, Section 2.3 indicates that identifiers are names. My
question: can an identifier be anything other than a name?

Yes and no.

According to the Python grammar as documented, no, identifiers are just
another name for, er, name.

But according to common practice, yes, we call many things identifiers:

x # a bare name, or "identifier" according to the grammar

x.attr # name with an attribute, or "attributeref"

x[key] # name with a subscript, or "subscription"

x[5] # name with an indexed subscript, or "slicing"

x[start:stop:step] # name with a slice subscript

2. Section 5.3 defines primaries as the most tightly bound operations
of Python. What does this mean?

It means that primaries are evaluated at the highest priority. For
example, given:

x.a+b

that is evaluated as:

(x.a) + b

rather than:

x . (a+b)

In particular, if you have a variable:

name = "Fred"

then x.name will look for an attribute "name", *not* x.Fred.

In particular, if an atom is a
primary, what operation is the atom performing that leads to the label
"most tightly bound"? To put it a different way, I think of atoms as
things (i.e. identifiers).
Correct.

The documentation makes me think atoms
actually do something, as opposed to being things (I think I have in my
mind the difference between a noun and a verb as I write this).

The only thing they do is be evaluated.

3. Section 5.3.1 offers this definition of an attributeref:
attributeref ::= primary "." identifier

Now, I was at first a little concerned to see the non-terminal primary
on the right hand side of the definition, since primary is defined to
include attributeref in section 5.3 (so this struck me as circular). Am
I correct in thinking attributeref is defined this way to allow for
situations in which the primary, whether an atom, attributeref (example:
an object on which a method is called that returns another object),
subscription, slicing, or call, returns an object with property
identifier?

Yes. It means you can write things like this:

module.Class.method()[0](arg).attr.name[2]['spam'].aardvark()

and have it evaluated from left to right.

With respect, I think you may be confusing yourself unnecessarily with an
excessive concern for the formal grammar of Python. You may find it makes
a lot more sense in practice than it makes in theory. I strongly
recommend you open up an interactive interpreter and just play around
with the syntax and see what happens.

Bruce McGoveran · Apr 17, 2013

Thank you all for your thoughtful replies. I appreciate your collective insight. I didn't mean to cast the concept of recursion in a negative light - I'm actually comfortable with the concept, at least to some extent, and Iappreciate the need for its use in this documentation. I also appreciate the need to play with expressions at the command line, to gain a feel for how expressions are evaluated. My interest in the language's formal description arises merely out of a desire to understand as precisely as possible what happens when I hit enter at the command line, or when I run a module.
Your answers to my initial questions in this thread and the ones I posed inanother thread ("Understanding Boolean Expressions") have lead me to some follow-up questions. Suppose I'm working at the command line, and I bind xto the value 1 and y to the value 0. Suppose I next type x and y and hit enter. Python returns 0 (zero). I'm glad I checked this before sending inthis post because I thought it would return a value of False based on the presence of the and operand. My question: what did the interpreter have to do to evaluate the expression x and y and return a value of zero?
I know the lexical analyzer has to parse the stream of characters into tokens. I presume this parsing generates the toxens x, y, and, and a NEWLINE. Beyond that, things get a little fuzzy, and it occurs to me that this fuzziness is the result of my looking at the expression x and y knowing full well what each token means and what I want done with them, whereas the interpreter won't know these things until it can parse the character stream and sort the tokens into some recognizable (and syntactically correct) order.
As I look at it, the expression x and y has two atoms, namely x and y. x and y are also primaries, and they represent the most tightly bound parts ofthis expression (meaning they bind more tightly to their underlying objects than to the and operator). Incidentally, how does Python figure out that the x and y in this expression refer to the x and y I previously bound tointeger values? I know there's a symbol table in each execution frame. How does Python know to go to that table and check for x and y?
The and token represents an operator, a boolean operator to be specific. As I look at the grammar for and_test in section 5.10 of the documentation, it would appear that the and_test resolves via not_test's definition to twocomparisons, which in turn resolve to or_expr, and then via a series of binary bitwise definitions to shift_expr, then to a_expr, then to m_expr, then to u_expr, to power, and then primary, and then to atom, which lands us finally at non-terminal identifiers (i.e. x and y themselves).
Questions: In working through these steps, what I have actually demonstrated? Is this how Python deconstructs an and statement with two operands? Do I take from the fact that the progression from and_test to identifier involved reference to bitwise operators that the boolean testing of x and y involves a bitwise comparison of x and y? I have to admit these questions are a little confusing; this may reflect the fact I am not exactly sure what it is I am trying to ask. In general terms, I am trying to understand how Python evalutes the expression x and y in this context.
For my sanity's sake (and, perhaps, for yours) I will stop there. I send thanks in advance for any thoughts you have on my questions.

Steven D'Aprano · Apr 17, 2013

Wow, that's some impressive wall of text! Splitting your comments up into a
few paragraphs would make it much easier to read

My comments below...

Thank you all for your thoughtful replies. I appreciate your collective
insight. I didn't mean to cast the concept of recursion in a negative
light - I'm actually comfortable with the concept, at least to some
extent, and I appreciate the need for its use in this documentation. I
also appreciate the need to play with expressions at the command line,
to gain a feel for how expressions are evaluated. My interest in the
language's formal description arises merely out of a desire to
understand as precisely as possible what happens when I hit enter at the
command line, or when I run a module.

You won't gain that from the *grammar* of the language. Grammar is only part
of the story, and in some ways, the least important part. If I tell you
that the grammar of English includes:

ADJECTIVE NOUN

that alone is not going to help you understand the differences between
a "wise man" and a "wise guy", or "peanut oil" and "baby oil".

I'm not saying that syntax and grammar is unimportant, but it is independent
of the *semantics* of the program, and really its the semantics (the
meaning) of code that is important. One can easily imagine four languages
where the identical operation was written as:

name.attribute
name->attribute
name(attribute)
attribute of name

Contrariwise, just because two languages have ostensibly the same syntax for
something, doesn't mean they are doing the same thing. E.g. there are
subtle differences between attribute lookup name.attribute in Java and
Python.

Your answers to my initial
questions in this thread and the ones I posed in another thread
("Understanding Boolean Expressions") have lead me to some follow-up
questions. Suppose I'm working at the command line, and I bind x to the
value 1 and y to the value 0. Suppose I next type x and y and hit
enter. Python returns 0 (zero). I'm glad I checked this before sending
in this post because I thought it would return a value of False based on
the presence of the and operand.

The command line is actually irrelevant here. With one or two minor
exceptions, Python will do the same thing whether you are working
interactively or not. The main differences are that at the interactive
prompt, Python will automatically print the result of any expression which
is not otherwise bound to a value, and also bind it to the variable
name "_". (A single underscore.) Other interactive command prompts may do
more, or less.

My question: what did the interpreter
have to do to evaluate the expression x and y and return a value of
zero? I know the lexical analyzer has to parse the stream of characters
into tokens. I presume this parsing generates the toxens x, y, and, and
a NEWLINE.

Well, yes, but you're being awfully reductionist here. I'm the first to be
in favour of curiosity for curiosity's sake, but I'm not sure that getting
bogged down at such a low level this early in your Python learning
experience is a good idea. *shrug* No skin off my nose though.

The answer is going to depend on the implementation. There are at least four
major implementations of Python these days, and another dozen or two
obsolete, experimental or minor implementations. (Oh, and let me apologise
in advance to anyone whose implementation I haven't listed as "major".)

CPython is the implementation you are probably using; Jython runs on top of
the Java virtual machine, IronPython runs on top of the Dot Net virtual
machine, and PyPy runs on the deepest, darkest voodoo known to Mankind. But
essentially, any implementation will have to perform all or most of these
steps:

* Parse the source code into tokens. CPython generates an AST, Abstract
Syntax Tree. What that means in practice, I have no idea. This is
relatively new: some versions back, the syntax was essentially identical,
but there was no AST involved.

* From the tokens, or the AST, generate byte code. Or machine code, if your
compiler is clever enough.

* Execute the byte code in some virtual machine, or the machine code
directly on your CPU, as the case may be.

You can view the byte code using the dis module, e.g.:

py> import dis
py> code = compile('x = 1; y = 0; print x and y', '', 'exec')
py> dis.dis(code)
1 0 LOAD_CONST 0 (1)
3 STORE_NAME 0 (x)
6 LOAD_CONST 1 (0)
9 STORE_NAME 1 (y)
12 LOAD_NAME 0 (x)
15 JUMP_IF_FALSE_OR_POP 21
18 LOAD_NAME 1 (y)22 PRINT_NEWLINE
23 LOAD_CONST 2 (None)
26 RETURN_VALUE

If you run this under another implementation of Python, such as WPython, or
even a different version of CPython, you may get completely different byte
code. But the *semantics* of what it does must be identical, otherwise it
isn't Python.

Beyond that, things get a little fuzzy, and it occurs to me
that this fuzziness is the result of my looking at the expression x and
y knowing full well what each token means and what I want done with
them, whereas the interpreter won't know these things until it can parse
the character stream and sort the tokens into some recognizable (and
syntactically correct) order.

No. Python will not sort the tokens into syntactically correct order. It
will only take them in the order you provide them. If you write code in the
wrong syntax:

x y <= # I prefer to write reverse Polish notation

you will get a syntax error, Python won't rearrange the tokens into the
correct order x <= y.

As I look at it, the expression x and y
has two atoms, namely x and y. x and y are also primaries, and they
represent the most tightly bound parts of this expression (meaning they
bind more tightly to their underlying objects than to the and operator).
Incidentally, how does Python figure out that the x and y in this
expression refer to the x and y I previously bound to integer values?

Ah, now you're finally asking the right questions! This is the important
part: Python's execution model. All that stuff about parsing source code
and generating ASTs and byte-code is just the mechanics that are needed to
make it work. In principle, you could simulate a Python interpreter in your
head, or build a CPU that executed Python code directly, and avoid
everything except the most simple parser. The behaviour of the interpreter
will be the same, regardless of how much work it does.

In this case, Python does a runtime lookup in the current namespace for
names "x" and "y", retrieve the appropriate values, or raise a NameError if
they don't exist. In pseudo-code:

get the current namespace
if 'x' is not in the namespace: raise NameError
if 'y' is not in the namespace: raise NameError
get the value of x in the current namespace
get the value of y in the current namespace
perform the "and" operator on those two values

(more or less).

Python's actual execution model is based on a stack, like Forth, but really,
unless you're trying to read byte code, you will never need to know this.
It's irrelevant to the day-to-day understanding of Python code.

I know there's a symbol table in each execution frame. How does Python
know to go to that table and check for x and y?

Because some person, many years ago, programmed the Python compiler to do
that.

The and token represents
an operator, a boolean operator to be specific. As I look at the
grammar for and_test in section 5.10 of the documentation, it would
appear that the and_test resolves via not_test's definition to two
comparisons, which in turn resolve to or_expr, and then via a series of
binary bitwise definitions to shift_expr, then to a_expr, then to
m_expr, then to u_expr, to power, and then primary, and then to atom,
which lands us finally at non-terminal identifiers (i.e. x and y
themselves). Questions: In working through these steps, what I have
actually demonstrated? Is this how Python deconstructs an and statement
with two operands?

Absolutely not. All you're doing is focusing on the formal grammar, instead
of either the Python virtual machine, or it's execution model.

Do I take from the fact that the progression from
and_test to identifier involved reference to bitwise operators that the
boolean testing of x and y involves a bitwise comparison of x and y?

Absolutely not.

What you actually need to know is much more high-level than this. When
Python evaluates "x and y", it runs pseudo-code something like this:

* if x is not a true-like value, return x
* return y

This is short-cut semantics of boolean "and", with truthy/falsey values
instead of just a pair of True/False values.

You won't learn this from the grammar, because it isn't part of the grammar.
It is part of the execution model of the language.

How does Python determine whether x is "true-like"? It asks x, "are you
true?". If x has a __bool__ method, it calls x.__bool__(). Otherwise, if x
has a __len__ method, it calls x.__len__() and compares the result to zero.
Otherwise, it (somewhat arbitrarily, but reasonably) declares that since x
is an object, it is something rather than nothing and therefore it must be
true-ish.

I hope this helps, and expect it will raise as many questions as it answers!

Mark Janssen · Apr 17, 2013

One general comment I will make is regarding your distress at what you
call 'circular'
Circular just means recursive and recursion is the bedrock for
language-design.

Rercursion the "bedrock" of language-design. I don't think so. From
what I know, a well-defined language ends at its symbols. It makes no
use of "infinities".

You cannot hope to define an infinite object such as the python
language (there are an infinite number of python programs) with a
finite specification

You've committed two grave sins in C.S.: Conflating a programming
language ("an infinite object such as python language") with a program
written in that language ("there are an infinite number of python
programs"). These two are entirely separate (at least anything
implemented on a real computer). Further, you've made a silly
description of python "an infinite object such as the python
language". A programming language that is well defined has complete,
finite, specification. The fact that there are an endless number of
programs that can be made from such is irrelevant to the language
itself.

-- a useful language definition must start and
end and preferably fit in one's pocket!

Likewise, a language specification must end in its symbols.

The trick is to find ways of making an inifinite object finitely
generated.

There is no trick involved.

So much of language design is a generalization of Peano's
method of defining (designing?) natural numbers:
a. 0 is a natural number
b. If x is a natural number then the successor of x (informally x+1)
is a natural number

Well now you're getting to the root of the confusion and what I'm
arguing within the C.S. community: there must be clear distinction
between lambda calculii and programming languages rooted in actual
hardware implementations. While, traditionally, the field has not
made much of a distinction, in practice the computational architecture
is different. One of these has a connection to reality and the other
not as much ;^).

In any case, talking about the mathematical realm *as a realm of
Platonic thought* is irrelevant to the discussion of program spaces
where *things actually get done*.

This is what this list (python) has not figured out yet, because they
look up to the theoretical C.S. field and it hasn't yet been
published.

alex23 · Apr 17, 2013

This is what this list (python) has not figured out yet, because they
look up to the theoretical C.S. field and it hasn't yet been
published.

No one here idolises "the theoretical C.S. field". They *use* Python
to *get things done*, not to engage in pointless masturbation.

Please keep your computer pseudo-science nonsense to your own threads,
don't use it to derail others.

Ian Kelly · Apr 17, 2013

Rercursion the "bedrock" of language-design. I don't think so. From
what I know, a well-defined language ends at its symbols. It makes no
use of "infinities".

From what I know, you can't have a Turing-complete language without

some form of recursion. So yeah, it's pretty damn important in
language design.

You've committed two grave sins in C.S.:

You've just committed the grave sin of being needlessly hyperbolic.

Conflating a programming
language ("an infinite object such as python language") with a program
written in that language ("there are an infinite number of python
programs"). These two are entirely separate (at least anything
implemented on a real computer).

Mathematically, a language (e.g. a programming language) is a set of
well-formed strings (i.e. programs) constructed from the symbols of an
alphabet (i.e. tokens). For most languages, this set is infinite;
saying "the Python language is infinite" is equivalent to saying
"there are an infinite number of Python programs".

Further, you've made a silly
description of python "an infinite object such as the python
language". A programming language that is well defined has complete,
finite, specification. The fact that there are an endless number of
programs that can be made from such is irrelevant to the language
itself.

A finite, non-recursive grammar can only hope to accept a finite
number of strings. To have an infinite language, the defining grammar
must then be either infinite (not practical) or recursive.

Well now you're getting to the root of the confusion and what I'm
arguing within the C.S. community: there must be clear distinction
between lambda calculii and programming languages rooted in actual
hardware implementations. While, traditionally, the field has not
made much of a distinction, in practice the computational architecture
is different. One of these has a connection to reality and the other
not as much ;^).

In any case, talking about the mathematical realm *as a realm of
Platonic thought* is irrelevant to the discussion of program spaces
where *things actually get done*.

Of course it's relevant. Without theory we would not have big-Oh
notation or efficient data structures or regular expressions or
context-free grammars; languages like Python would be harder to
invent. I'm sure one could come up with a myriad other examples, but
that's enough for me.

Mark Janssen · Apr 17, 2013

No one here idolises "the theoretical C.S. field". They *use* Python
to *get things done*, not to engage in pointless masturbation.

Woah! "no one" you say.... Interesting...

Mark

Mark Janssen · Apr 17, 2013

From what I know, you can't have a Turing-complete language without
some form of recursion. So yeah, it's pretty damn important in
language design.

A Turing-complete language generally has items that are defined in
terms of other, simpler items, but this is not called recursion in any
C.S. paper I know.
In C.S. of my world, recursion is a specific term that is related to
functional calculii. This type of recursion is sometimes often found
in imperative/iterative languages, but is rooted in the fomer.

Mathematically, a language (e.g. a programming language) is a set of
well-formed strings (i.e. programs) constructed from the symbols of an
alphabet (i.e. tokens).

Mathematically, perhaps, but from C.S. theory, a language is a
fully-specified set of expressions and tokens which are considered
valid -- it's grammar.

For most languages, this set is infinite;

This set is always finite, as you can see on the specification for
Python's language.

saying "the Python language is infinite" is equivalent to saying
"there are an infinite number of Python programs".

I don't think Guido would agree that "the Python language is
infinite", but then perhaps he doesn't care either.

A finite, non-recursive grammar can only hope to accept a finite
number of strings.

Is the language we're speaking in now one with a finite, non-recursive grammar?

Mark Lawrence · Apr 17, 2013

Woah! "no one" you say.... Interesting...

Mark

IMHO very few cos we all know that practically beats purity.

Mark Lawrence · Apr 17, 2013

A Turing-complete language generally has items that are defined in
terms of other, simpler items, but this is not called recursion in any
C.S. paper I know.
In C.S. of my world, recursion is a specific term that is related to
functional calculii. This type of recursion is sometimes often found
in imperative/iterative languages, but is rooted in the fomer.

Mathematically, perhaps, but from C.S. theory, a language is a
fully-specified set of expressions and tokens which are considered
valid -- it's grammar.

This set is always finite, as you can see on the specification for
Python's language.

I don't think Guido would agree that "the Python language is
infinite", but then perhaps he doesn't care either.

Is the language we're speaking in now one with a finite, non-recursive grammar?

Thanks for reminding me that I must add food for the trolls to the
bottom of my shopping list.

Chris Angelico · Apr 17, 2013

Rercursion the "bedrock" of language-design. I don't think so. From
what I know, a well-defined language ends at its symbols. It makes no
use of "infinities".

There's a difference between infinite and recursive, though. I was
defining a function (it converted from JSON to an internal format) and
wanted to explain that not all of JSON would reliably round-trip (ie
be able to be translated to the internal format and then back again).
To describe what _would_ round-trip correctly, I used this simple yet
technically illegal description:

typedef valid string|array(valid)|object(string:valid)

In other words, a string is valid, and a list/array of valid elements
is valid, and a dictionary/mapping/object with string keys and valid
elements is valid. It's a recursive definition, but it can't go
infinite (self-references aren't valid - though this isn't stated by
the typedef); however, it can go arbitrarily deep.

ChrisA

Steven D'Aprano · Apr 17, 2013

From what I know, you can't have a Turing-complete language without
some form of recursion. So yeah, it's pretty damn important in language
design.

Incorrect. Early Fortran, which was definitely Turing complete, was
incapable of using recursion. But that doesn't matter, since any
recursive algorithm can be re-written as iteration. So long as a language
can iterate an indefinite number of times, it may be Turing complete.

(Languages which can only iterate a fixed number of times cannot be
Turing complete.)

Hell, Turing machines themselves are not recursive. Since they don't have
a concept of functions, they don't have a concept of functions that call
themselves. A Turing machine only has a couple of trivial operations:

* read a cell
* write a cell
* advance the tape
* change direction

and so it's grammar is correspondingly trivial. Actually, talking about
the grammar of a Turing machine is probably wrong. In practice, Turing
machines are specified as a finite (and usually small) table of states
and cells. See here for examples:

http://en.wikipedia.org/wiki/Turing_machine_examples

So it isn't even correct to say that recursion is necessary for a
language's *grammar*. However, for any real-world practical language
(there's a reason that no practical language is based on Turing machines)
recursive grammars are extraordinarily useful.

A finite, non-recursive grammar can only hope to accept a finite number
of strings. To have an infinite language, the defining grammar must
then be either infinite (not practical) or recursive.

I don't believe that is true, so long as the grammar has a concept of
"zero or more" of some syntactic element. For example, suppose your
grammar has a concept of integers, defined recursively as either a digit,
or a digit followed by an integer:

INTEGER ::= DIGIT | DIGIT INTEGER

This can be defined more naturally as a digit followed by zero or more
digits:

INTEGER ::= DIGIT (DIGIT)*

Ian Kelly · Apr 17, 2013

A Turing-complete language generally has items that are defined in
terms of other, simpler items, but this is not called recursion in any
C.S. paper I know.
In C.S. of my world, recursion is a specific term that is related to
functional calculii. This type of recursion is sometimes often found
in imperative/iterative languages, but is rooted in the fomer.

You are thinking of recursive procedures. Recursion is the more
general concept of self-repetition. In a programming language, it can
be implemented by recursive procedures, or it can equivalently be
implemented by looping constructs.

Incidentally, in computability theory (also known as "recursion
theory"), "recursive" is basically a synonym for "computable", which
relates back to my point that recursion is necessary for
Turing-completeness; a Turing-complete language is one that can
compute any computable (i.e. recursive) function.

Mathematically, perhaps, but from C.S. theory, a language is a
fully-specified set of expressions and tokens which are considered
valid -- it's grammar.

Sorry, but as computer science *is* math, the computer science
definition is the same as the mathematical one. See for example this
CS paper which formally defines "language" as I described:

http://www.cs.ucr.edu/~jiang/cs215/tao-new.pdf

This set is always finite, as you can see on the specification for
Python's language.

No, the set of valid Python programs is not finite.

I don't think Guido would agree that "the Python language is
infinite", but then perhaps he doesn't care either.

Is the language we're speaking in now one with a finite, non-recursive grammar?

No, English is also recursive.

Ian Kelly · Apr 17, 2013

Incorrect. Early Fortran, which was definitely Turing complete, was
incapable of using recursion. But that doesn't matter, since any
recursive algorithm can be re-written as iteration. So long as a language
can iterate an indefinite number of times, it may be Turing complete.

You're also confusing "recursion" with "recursive programming". See
the response I just gave to Mark.

88888 Dihedral · Apr 18, 2013

Ianæ–¼ 2013å¹´4æœˆ17æ—¥æ˜ŸæœŸä¸‰UTC+8ä¸‹åˆ3æ™‚21åˆ†00ç§’å¯«é“ï¼š

These are terms that appear in section 5 (Expressions) of the Python online documentation. I'm having some trouble understanding what, precisely,these terms mean. I'd appreciate the forum's thoughts on these questions:

1. Section 5.2.1 indicates that an identifier occurring as an atom is a name. However, Section 2.3 indicates that identifiers are names. My question: can an identifier be anything other than a name?

Click to expand...

Yes. For example:

from a import b

Here "a" is an identifier but not a name, as it does not carry

object-binding semantics.

2. Section 5.3 defines primaries as the most tightly bound operations of Python. What does this mean?

Click to expand...

"Tightly bound" here refers to operator precedence. For example, we

say that the multiplication operator binds more tightly [to the

surrounding operands] than the arithmetic operator, because the

multiplication takes precedence. This section defines that the most

tightly bound operations in Python are attribute references,

subscriptions, slices and calls; these always take precedence over

other neighboring operations.

In particular, if an atom is a primary, what operation is the atom performing that leads to the label "most tightly bound"?

Click to expand...

An atom doesn't perform an operation. The grammar defines that a

primary can be just an atom, so that anywhere in the grammar that

expects a primary, a simple atom with no primary operation performed

on it can equally be used.

An atom can not be divided into further details.
An atom can be created and cloned or just referenced
in some relations.

An object is composed of atoms linked in someway.

Of course, one can box those atoms of an object
to make the object immutable at least in some situations
to be named and used.

88888 Dihedral · Apr 18, 2013

Ianæ–¼ 2013å¹´4æœˆ17æ—¥æ˜ŸæœŸä¸‰UTC+8ä¸‹åˆ3æ™‚21åˆ†00ç§’å¯«é“ï¼š

These are terms that appear in section 5 (Expressions) of the Python online documentation. I'm having some trouble understanding what, precisely,these terms mean. I'd appreciate the forum's thoughts on these questions:

1. Section 5.2.1 indicates that an identifier occurring as an atom is a name. However, Section 2.3 indicates that identifiers are names. My question: can an identifier be anything other than a name?

Click to expand...

Yes. For example:

from a import b

Here "a" is an identifier but not a name, as it does not carry

object-binding semantics.

2. Section 5.3 defines primaries as the most tightly bound operations of Python. What does this mean?

Click to expand...

"Tightly bound" here refers to operator precedence. For example, we

say that the multiplication operator binds more tightly [to the

surrounding operands] than the arithmetic operator, because the

multiplication takes precedence. This section defines that the most

tightly bound operations in Python are attribute references,

subscriptions, slices and calls; these always take precedence over

other neighboring operations.

In particular, if an atom is a primary, what operation is the atom performing that leads to the label "most tightly bound"?

Click to expand...

An atom doesn't perform an operation. The grammar defines that a

primary can be just an atom, so that anywhere in the grammar that

expects a primary, a simple atom with no primary operation performed

on it can equally be used.

An atom can not be divided into further details.
An atom can be created and cloned or just referenced
in some relations.

An object is composed of atoms linked in someway.

Of course, one can box those atoms of an object
to make the object immutable at least in some situations
to be named and used.

Generating valid identifiers	8	Jul 26, 2012
limits between identifiers and variable names	5	Oct 7, 2013
Non-identifiers in dictionary keys for **expression syntax	3	May 23, 2013
Identifiers - UnicodeEscapeSequence	6	Feb 15, 2010
Renaming identifiers & debugging	14	Feb 25, 2010
C++ grammar: universal-character-name in identifiers	4	Sep 6, 2009
JComboBox with IDentifiers	8	Aug 10, 2012
Function Parameter-List with Identifiers	9	Apr 25, 2012

Atoms, Identifiers, and Primaries

Bruce McGoveran

rusi

Ian Kelly

Dave Angel

Steven D'Aprano

Bruce McGoveran

Steven D'Aprano

Mark Janssen

alex23

Ian Kelly

Mark Janssen

Mark Janssen

Mark Lawrence

Mark Lawrence

Chris Angelico

Steven D'Aprano

Ian Kelly

Ian Kelly

88888 Dihedral

88888 Dihedral

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads