Tabs versus Spaces in Source Code

Edward Elliott

William said:
The problem is that tabs take you to the next tab stop, they don't
expand to a fixed number of spaces.

Got it. You're talking about using tabs other than for initial line
indentation on a source file. Yes, then tab expansion is not perfect.
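
To make the distinction concrete, here is a minimal Python sketch (the helper names are made up for illustration). It contrasts replacing every tab with a fixed run of spaces against advancing to the next tab stop, which is what `str.expandtabs` does. The two agree for leading tabs and differ once a tab follows other text.

```python
def expand_fixed(line, width=8):
    """Naive expansion: every tab becomes exactly `width` spaces."""
    return line.replace("\t", " " * width)


def expand_to_stops(line, tabsize=8):
    """Tab-stop expansion: each tab advances to the next multiple of `tabsize`."""
    return line.expandtabs(tabsize)


inline = "ab\tcd"       # tab in the middle of a line
leading = "\tx = 1"     # tab used only for initial indentation

print(repr(expand_fixed(inline)))     # 'ab' followed by 8 spaces, then 'cd'
print(repr(expand_to_stops(inline)))  # 'ab' followed by 6 spaces, then 'cd'
print(expand_fixed(leading) == expand_to_stops(leading))  # True
```

So for pure leading indentation a fixed-width replacement is safe, which matches the use case described above.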
 
Edward Elliott

Terry said:
Now, of course, the data I provide is nasty, mean, poorly-formatted
data, abhorable by space-zealots and tab-libertines alike (;-)), but the
point is, unless you have set up your editor to syntax color spaces
and tabs differently, you won't see the difference in the original
editor.

Sure, mixed tabs and spaces were not part of my use case.
 
Terry Hancock

Edmond said:
The real issue is, of course, that ASCII is showing its age and we should
probably supplant it with something better. But I know that will never fly,
given the torrents of code, configuration files, and everything else in
ASCII. Even Unicode couldn't put a dent in it, despite the obvious growing
global development efforts. Not sure how many compilers would be able to
handle Unicode source anyway. I suspect the large majority of them would
choke big time.

I think that was the old conventional wisdom, but it's not so obvious
anymore. UTF-8 is a pretty cool standard. gVim handles it just
fine, Python source allows UTF-8 within string literals, even if it
doesn't like it in identifiers, and IIRC, Unicode is the official standard
for Java files. It continues not to be used so much, but a lot of the
capacity is there.

Also, the 'config files in ASCII' thing is simply not a problem -- ASCII
*is* a strict subset of UTF-8, so an ASCII config file is already a UTF-8
config file.
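
A quick way to check this claim in Python, as a small sketch: the sample config text below is invented for illustration. Pure ASCII text encodes to the same bytes under both codecs, and only non-ASCII characters need more than one byte in UTF-8.

```python
# Hypothetical config content containing only ASCII characters.
config_text = "indent = 4\nuse_tabs = no\n"

ascii_bytes = config_text.encode("ascii")
utf8_bytes = config_text.encode("utf-8")

# Same bytes either way, so an ASCII file can be read as UTF-8 unchanged.
print(ascii_bytes == utf8_bytes)                    # True
print(ascii_bytes.decode("utf-8") == config_text)   # True

# A non-ASCII character is where the two encodings part ways.
print(len("é".encode("utf-8")))                     # 2
```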

Personally, I don't think ASCII is nearly as entrenched as you suggest.
I wouldn't be surprised if Unicode/UTF-8 has fully supplanted it inside
of 10 years.

Cheers,
Terry
 
Dave Hansen

Yeah - we've got to the repeating ourselves stage.

But that's the problem with this issue: it's really hard to get the
space-indenters to actually think about it and to address what is being
said. Every time it comes up, there's always a few people trying to

Look in the mirror. There are none so blind...

explain why tabs are a good idea, facing a whole raft of others

The problem is that TABs are a _bad_ idea.

spouting stuff like:
'mixing spaces and tabs is bad so use spaces only'

Mixing TABs and spaces is bad because it means using TABs. ;-)

'tabs are x spaces and I like to use y spaces'

I've not seen that argument. One of us needs to read closer.

Although I have seen the converse used to defend TABs: x spaces is x
spaces, and I like y spaces,

'tabs are bad end of story'

Works for me! ;-)

and these non-arguments are repeated over and over within the same
thread. At times it's like talking to a child - and not a bright one at
that.

These "non-arguments" are your own straw men. Either that, or you
need to work on reading comprehension.

Does it matter? Perhaps not if we can use tools which enable us to
bridge the divide, like indent auto-detection in emacs and vim. I'm
prepared to do that in cases where I have to work with an existing
group of coders using spaces.

It matters because not every programmer is willing to put in the time
and effort required to learn how to use a sophisticated editor like emacs
or vim well. Or at all.

It matters because in industry you get programmers with a wide range
of skills, and you can't fire everyone who can't tell when there are
spaces in front of a tab character. Often these people have unique
and hard-to-find domain knowledge.

But unfortunately the situation is worse than that: tab indentation
needs to be actively defended. Most of the coding 'style guides' you'll

No, it needs to be stamped out. ;-)

find (including Python's) advocate spaces only. There are plenty of

Hallelujah!

people who would like tabs removed from the language as an acceptable
indentation method - look at the responses to Guido's April Fools blog
entry last year.

I would love to see the TAB character treated as a syntax error. I
have no illusions that's going to happen, though.

FWIW, I would be equally (well, almost, anyway) happy if Python said
that the _only_ place a TAB character could appear was at the
beginning of a line, and that the number of TAB characters _always_
indicated the indentation level (e.g., spaces do _not_ change
indentation level, and all the lines in a multi-line statement had to
be at the same indentation level). This would eliminate most of my
objections to TABs. I have no illusions this will happen either.

Unlikely perhaps. I hope so. It's a cruel irony that Python's creator
didn't appreciate the benefits that tab indentation would bring to his
own language - the only major language in which indentation levels
actually have semantic significance.

The problem with TAB characters is that they look just like the
equivalent number of space characters. This is, of course, their
major feature as well. The problem, especially with Python, is that
mistakes in the placement of TAB characters within a source file can
silently change the _meaning_ of the code.

TAB proponents seem to list one overriding advantage of using TAB
characters for indentation: "I can use my preferred indent level, and
everyone else can use theirs." I find this argument _very_ weak. I've
seen misuse of TABs break code. I've never seen an enforced
indentation level break a programmer.

Regards,
-=Dave
 
Edward Elliott

We've finally hit the meta-discussion point. Instead of talking about tabs
and spaces, we're talking about talking about tabs and spaces. Which
frankly is a much more interesting conversation anyway.

Does it matter? Perhaps not if we can use tools which enable us to
bridge the divide, like indent auto-detection in emacs and vim. I'm
prepared to do that in cases where I have to work with an existing
group of coders using spaces.

If you ask me, which of course you didn't, indentation is just one small
part of the larger issue of code formatting. Unfortunately it's the only
one that allows some semblance of flexibility. Formatting like brace/paren
placement and inter-operator spacing greatly affect readability but are
hard-coded into the source. And none of this matters a whit to the
semantics of the code.

What really should happen is that every time an editor reads in source code,
the code is reformatted for display according to the user's settings. The
editor becomes a parser, breaking the code down into tokens and emitting it
in a personally preferred format. Comments are left untouched apart from
initial indentation. On output back to a file, the code can be either
written as-is (the next guy's editor will reformat it anyway) or put in
some standard form (for the poor shlubs who code with cat/notepad).

All this becomes completely transparent to the user, who sees every file he
edits in exactly the format he's accustomed to. It's similar to the
various pushes for syntactic code storage formats like abstract syntax
trees or <shudder> xml, but works with the existing infrastructure built
around processing plain text files. Meanwhile LISP has been storing code
in paren-based ASTs since the 50s.

vim and emacs can already do this today. It might not be perfect, but if
people spent half as much time perfecting this as arguing about tabs vs
spaces, we'd all be a lot better off (yes I'm guilty too).

It's a cruel irony that Python's creator
didn't appreciate the benefits that tab indentation would bring to his
own language - the only major language in which indentation levels
actually have semantic significance.

Fate is a cruel mistress. Or maybe just a heartless bitch. Either way,
watch your back.
 
Jorge Godoy

achates said:
Jorge Godoy wrote


That sounds like useful behaviour.

Maybe this is an area where modern editors might be able to save us
from ourselves. I'll admit I'm suspicious of relying on editor
functionality - I'm happier if I know I can also use the old-school
methods just in case.. Sometimes adding intelligence to an interface
can be a usability disaster if it makes wrong assumptions about what
you want. But if people are hell-bent on converting tabs to spaces,
maybe it's the best way to accommodate them.

If you don't want the functionality, simply disable it. This is why
configuration files and options exist...

--
Jorge Godoy <[email protected]>

"Quidquid latine dictum sit, altum sonatur."
- Qualquer coisa dita em latim soa profundo.
- Anything said in Latin sounds smart.
 
ashesh

If I work on your project, I follow the coding and style standards you
specify.


Likewise if you work on my project you follow the established
standards.


Fortunately for you, I am fairly liberal on such matters.


I like to see 4 spaces for indentation. If you use tabs, that's what I
will see, and you're very likely to have your code reformatted by the
automated build process, when the standard copyright header is pasted
and missing javadoc tags are generated as warnings.


I like the open brace to start on the line of the control keyword. I
can deal with the open brace being on the next line, at the same level
of indentation as the control keyword. I don't quite understand the
motivation behind the GNU style, where the brace itself is treated as a
half-indent, but I can live with it on *your* project.


Any whitespace or other style that isn't happy to be reformatted
automatically is an error anyway.


I'd be very laissez-faire about it except for the fact that code
repositories are much easier to manage if everything is formatted before
it goes in, or as a compromise, as a step at release tags.


Ashesh..
 
PoD

If tabs are easily misunderstood, then they are a MISfeature
and they need to be removed.


"Explicit is better than implicit..."
"In the face of ambiguity, refuse the temptation to guess..."
"Special cases aren't special enough to break the rules..."

Exactly.
How many levels of indentation does 12 spaces indicate?
It could be 1,2,3,4,6 or 12. If you say it's 3 then you are _implying_
that each level is represented by 4 spaces.

How many levels of indentation is 3 tabs? 3 levels in any code that you
will find in the wild.
 
Christophe

Carl J. Van Arsdall said:
The converse can also be said, "it's difficult to make sure everyone
uses spaces and not tabs".

I think we've just about beat this discussion to death... nice work
everyone!

No, it's really easy: a simple pre-commit hook which will refuse any .py
file with the \t char in it and it's done ;)
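
A rough sketch of such a check in Python, assuming the hook hands over file names together with their contents. The function names and the sample files are hypothetical, and wiring this into a real version-control hook is left out.

```python
def lines_with_tabs(source):
    """Return the 1-based numbers of the lines that contain a tab character."""
    return [n for n, line in enumerate(source.splitlines(), 1) if "\t" in line]


def rejected_files(files):
    """Map each offending .py file name to the line numbers that contain tabs.

    `files` is a mapping of file name -> source text (an assumed interface).
    Files without a .py suffix are ignored.
    """
    return {
        name: lines_with_tabs(source)
        for name, source in files.items()
        if name.endswith(".py") and "\t" in source
    }


# Hypothetical commit contents, for illustration only.
demo_commit = {
    "good.py": "if x:\n    y = 1\n",
    "bad.py": "if x:\n\ty = 1\n",
    "notes.txt": "a\tb\n",
}

print(rejected_files(demo_commit))  # {'bad.py': [2]}
```

A hook built on this would refuse the commit whenever the returned mapping is non-empty.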
 
Duncan Booth

PoD said:
How many levels of indentation does 12 spaces indicate?
It could be 1,2,3,4,6 or 12. If you say it's 3 then you are
_implying_ that each level is represented by 4 spaces.

By reading the code I can see how many levels of indentation it
represents.

How many levels of indentation is 3 tabs? 3 levels in any code that
you will find in the wild.

No. That is precisely the problem: there is code in the wild which
contains mixed space and tab indentation, and any time that happens 3
tabs could mean any number of indentations.

Now, I just know someone is going to challenge me over my assertion that
there really could be code with mixed spaces and tabs out there, so here
are a few examples found by grepping a Plone Products folder. All the
projects below use spaces almost everywhere for indentation, but it looks
like a few tabs slipped through.

http://svn.plone.org/view/archetypes/Archetypes/trunk/BaseUnit.py?rev=5111&view=auto

contains tabs at the start of two lines. Fortunately these are
continuation lines so it doesn't really matter how you display them. I
think they are intended to be displayed with tab-size=8.

http://svn.plone.org/view/archetypes/Archetypes/trunk/Storage/__init__.py?rev=4970&view=auto

One tab used for indentation. The block is only one line long so the code
doesn't break whatever tabsize you use, but visually it would appear the
intended tabsize is 0.

http://svn.plone.org/view/plone/CMF...pts/computeRelatedItems.py?rev=9836&view=auto

A tab is used for two levels of indentation. Anything other than tabsize=8
would cause a syntax error.

http://svn.plone.org/view/plone/CMF..._scripts/computeRoleMap.py?rev=9836&view=auto

Lots of tabs, most but not all on continuation lines. The two which aren't
are on single line blocks with a single tab representing two indents.

CMFPlone\tests\testInterfaces.py
CMFPlone\tests\testTranslationServiceTool.py
ExternalEditor (various files)
kupu (spellcheck.py)

and finally, at the end of my Plone Products directory I found this beauty
where I've replaced the tab characters with <tab> to make them visible:

svn://svn.zope.org/repos/main/Zelenium/trunk/scripts/tinyWebServer.py

if __name__ == '__main__':
<tab>port = PORT
<tab>if len(sys.argv) > 1:
<tab> port = int(sys.argv[1])
<tab>
server_address = ('', port)
<tab>httpd = BaseHTTPServer.HTTPServer(server_address, HTTPHandler)

<tab>print "serving at port", port
<tab>print "To run the entire JsUnit test suite, open"
<tab>print " http://localhost:8000/jsunit/testRu...host:8000/tests/JsUnitSuite.html&autoRun=true"
<tab>print "To run the acceptance test suite, open"
<tab>print " http://localhost:8000/TestRunner.html"

<tab>while not HTTPHandler.quitRequestReceived :
<tab>httpd.handle_request()<tab>
<tab>

This is a genuine example of code in the wild which will look like
syntactically valid Python at either tab-size=4 or tab-size=8, but
if you view it at tab-size=4 you will see different block indentation
than the Python interpreter uses at tab-size=8.

At tab-size=4 it reads:

if __name__ == '__main__':
    port = PORT
    if len(sys.argv) > 1:
        port = int(sys.argv[1])

        server_address = ('', port)
    httpd = BaseHTTPServer.HTTPServer(server_address, HTTPHandler)

    print "serving at port", port
    print "To run the entire JsUnit test suite, open"
    print " http://localhost:8000/jsunit/testRu...host:8000/tests/JsUnitSuite.html&autoRun=true"
    print "To run the acceptance test suite, open"
    print " http://localhost:8000/TestRunner.html"

    while not HTTPHandler.quitRequestReceived :
        httpd.handle_request()

but at tab-size=8 it reads:

if __name__ == '__main__':
        port = PORT
        if len(sys.argv) > 1:
            port = int(sys.argv[1])

        server_address = ('', port)
        httpd = BaseHTTPServer.HTTPServer(server_address, HTTPHandler)

        print "serving at port", port
        print "To run the entire JsUnit test suite, open"
        print " http://localhost:8000/jsunit/testRu...host:8000/tests/JsUnitSuite.html&autoRun=true"
        print "To run the acceptance test suite, open"
        print " http://localhost:8000/TestRunner.html"

        while not HTTPHandler.quitRequestReceived :
            httpd.handle_request()
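
The effect can be reproduced with a small Python sketch. The two sample lines are an assumption: the exact whitespace of the original file is not visible here, so they only illustrate the kind of mix described (a tab followed by spaces on one line, spaces only on the next). The helper name is made up.

```python
def indent_column(line, tabsize):
    """Column at which the first non-blank character lands for a given tab size."""
    indent = line[:len(line) - len(line.lstrip(" \t"))]
    return len(indent.expandtabs(tabsize))


# Assumed whitespace, for illustration only.
tab_then_spaces = "\t    port = int(sys.argv[1])"
spaces_only = "        server_address = ('', port)"

for tabsize in (4, 8):
    print(tabsize,
          indent_column(tab_then_spaces, tabsize),
          indent_column(spaces_only, tabsize))
# prints "4 8 8" then "8 12 8"
```

With a tab size of 4 both lines sit in the same column and look like one block; with a tab size of 8 the second line is dedented relative to the first.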

I wouldn't have a problem with tabs if Python rejected mixed indentation by
default, because then none of the code above would execute. But it doesn't.
:(

Anyone got a subversion checkin hook to reject mixed indentation? I think that big
repositories like Zope and Plone could benefit from it.
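
As a starting point for such a hook, here is a minimal Python sketch of the detection step only. It is deliberately strict: any file whose indentation uses both tabs and spaces is flagged, including continuation lines where the mix would be harmless. The function names are hypothetical, and hooking this into Subversion is not shown.

```python
def indent_kinds(source):
    """Return which whitespace kinds ('tab', 'space') indent the non-blank lines."""
    kinds = set()
    for line in source.splitlines():
        if not line.strip():
            continue  # whitespace-only lines cannot affect block structure
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            kinds.add("tab")
        if " " in indent:
            kinds.add("space")
    return kinds


def has_mixed_indentation(source):
    """True if the source is indented with both tabs and spaces."""
    return len(indent_kinds(source)) > 1


spaces_only = "if x:\n    y = 1\n"
tabs_only = "if x:\n\ty = 1\n"
mixed = "if x:\n\ty = 1\n        z = 2\n"

print(has_mixed_indentation(spaces_only),
      has_mixed_indentation(tabs_only),
      has_mixed_indentation(mixed))
# prints "False False True"
```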

I just ran the same grep on the Python source tree. Not a tab in sight. :)
 
Christophe

PoD said:
Exactly.
How many levels of indentation does 12 spaces indicate?
It could be 1,2,3,4,6 or 12. If you say it's 3 then you are _implying_
that each level is represented by 4 spaces.

Actually, who said you had to always use the same number of spaces to
indent? 12 = 6 + 6 = 4 + 4 + 4 but also 12 = 2 + 10 = 1 + 1 + 3 + 3 + 4 :D

How many levels of indentation is 3 tabs? 3 levels in any code that you
will find in the wild.

No, it could be 3 levels or 3 tabs per level or 2 tabs for the first
level and 1 tab for the second ...
 
achates

Edward said:
What really should happen is that every time an editor reads in source code,
the code is reformatted for display according to the user's settings. The
editor becomes a parser, breaking the code down into tokens and emitting it
in a personally preferred format.

I completely agree, and I guess that is what I was groping towards in
my remarks about using modern editing tools.

At the same time I would resist any move towards making source files
less human-readable. There will still be times when those tools aren't
available (e.g. for people working on embedded s/w or legacy systems),
and that's when having ASCII source with tabbed indentation would be so
useful. But it looks, sadly, like we're fighting a rearguard action on
that one.
 
Alain Picard

Bill Pursell said:
In my experience, the people who complain about the use
of tabs for indentation are the people who don't know
how to use their editor, and those people tend to use
emacs.

HA HA HA HA HA HA HA HA HA HA HA HA ....

Tee, hee heee.... snif!

Phew. Better now.

That was funny! Thanks! :)
 
achates

Duncan said:
No. That is precisely the problem: there is code in the wild which
contains mixed space and tab indentation...

I wouldn't have a problem with tabs if Python rejected mixed indentation by
default, because then none of the code above would execute.

I think it's great that at least we're all agreed that mixed
indentation is a bad idea in any code, and particularly in Python.

How would people feel about having the -t (or even -tt) behaviour
become the default in future Python releases? A legacy option would
obviously need to be provided for the old default behaviour.
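
For comparison, a Python 3 interpreter already refuses indentation whose meaning depends on the tab size, raising `TabError` at compile time. The sketch below assumes it runs on Python 3; the helper name and the sample sources are invented for illustration.

```python
def indentation_is_consistent(source):
    """Return False if compiling `source` fails with TabError, True otherwise.

    Assumes a Python 3 interpreter. Other syntax errors are not caught here.
    """
    try:
        compile(source, "<sketch>", "exec")
    except TabError:
        return False
    return True


# One block indented with a tab on one line and eight spaces on the next.
mixed = "if True:\n\tx = 1\n        y = 2\n"
tabs_only = "if True:\n\tx = 1\n\ty = 2\n"

print(indentation_is_consistent(mixed))      # False
print(indentation_is_consistent(tabs_only))  # True
```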
 
Pascal Bourguignon

Edmond Dantes said:
It all depends on your editor of choice. Emacs editing of Lisp (and a few
other languages, such as Python) makes the issue more or less moot. I
personally would recommend choosing one editor to use with all your
projects, and Emacs is wonderful in that it has been ported to just about
every platform imaginable.

The real issue is, of course, that ASCII is showing its age and we should
probably supplant it with something better. But I know that will never fly,
given the torrents of code, configuration files, and everything else in
ASCII. Even Unicode couldn't put a dent in it, despite the obvious growing
global development efforts. Not sure how many compilers would be able to
handle Unicode source anyway. I suspect the large majority of them would
choke big time.

All right, Unicode support is not 100% perfect yet, but my main
compilers support it perfectly well: only 1/5 don't support it, and
1/5 support it partially:

------(unicode-script.lisp)---------------------------------------------

(defun clisp (file)
  (ext:run-program "/usr/local/bin/clisp"
                   :arguments (list "-ansi" "-norc" "-on-error" "exit"
                                    "-E" "utf-8"
                                    "-i" file "-x" "(ext:quit)")
                   :input nil :output :terminal :wait t))

(defun gcl (file)
  (ext:run-program "/usr/local/bin/gcl"
                   :arguments (list "-batch"
                                    "-load" file "-eval" "(lisp:quit)")
                   :input nil :output :terminal :wait t))

(defun ecl (file)
  (ext:run-program "/usr/local/bin/ecl"
                   :arguments (list "-norc"
                                    "-load" file "-eval" "(si:quit)")
                   :input nil :output :terminal :wait t))

(defun sbcl (file)
  (ext:run-program "/usr/local/bin/sbcl"
                   :arguments (list "--userinit" "/dev/null"
                                    "--load" file "--eval" "(sb-ext:quit)")
                   :input nil :output :terminal :wait t))

(defun cmucl (file)
  (ext:run-program "/usr/local/bin/cmucl"
                   :arguments (list "-noinit"
                                    "-load" file "-eval" "(extensions:quit)")
                   :input nil :output :terminal :wait t))


(dolist (implementation '(clisp gcl ecl sbcl cmucl))
  (sleep 3)
  (terpri) (print implementation) (terpri)
  (funcall implementation "unicode-source.lisp"))

------(unicode-source.lisp)---------------------------------------------
;; -*- coding: utf-8 -*-

(eval-when (:compile-toplevel :load-toplevel :execute)
  (format t "~2%~A ~A~2%"
          (lisp-implementation-type)
          (lisp-implementation-version))
  (finish-output))


(defun ιοτα (&key (номер 10) (단계 1) (בכוכ 0))
  (loop :for i :from בכוכ :to номер :by 단계 :collect i))


(defun test ()
  (format t "~%Calling ~S --> ~A~%"
          '(ιοτα :номер 10 :단계 2 :בכוכ 2)
          (ιοτα :номер 10 :단계 2 :בכוכ 2)))

(test)

------------------------------------------------------------------------

(load"unicode-script.lisp")
;; Loading file unicode-script.lisp ...

CLISP
  i i i i i i i       ooooo    o        ooooooo   ooooo   ooooo
  I I I I I I I      8     8   8           8     8     o  8    8
  I  \ `+' /  I      8         8           8     8        8    8
   \  `-+-'  /       8         8           8      ooooo   8oooo
    `-__|__-'        8         8           8           8  8
        |            8     o   8           8     o     8  8
  ------+------       ooooo    8oooooo  ooo8ooo   ooooo   8

Copyright (c) Bruno Haible, Michael Stoll 1992, 1993
Copyright (c) Bruno Haible, Marcus Daniels 1994-1997
Copyright (c) Bruno Haible, Pierpaolo Bernardi, Sam Steingold 1998
Copyright (c) Bruno Haible, Sam Steingold 1999-2000
Copyright (c) Sam Steingold, Bruno Haible 2001-2006

;; Loading file unicode-source.lisp ...

CLISP 2.38 (2006-01-24) (built 3347193361) (memory 3347193794)


Calling (ΙΟΤΑ :НОМЕР 10 :단계 2 :בכוכ 2) --> (2 4 6 8 10)
;; Loaded file unicode-source.lisp
Bye.


GCL


GNU Common Lisp (GCL) GCL 2.6.7


Calling (ιοτα :номер 10 :단계 2 :בכוכ 2) --> (2 4 6 8
10)


ECL
;;; Loading "unicode-source.lisp"


ECL 0.9g


Calling (ιοτα :номер 10 :단계 2 :בכוכ 2) --> (2 4 6 8 10)


SBCL
This is SBCL 0.9.12, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.


SBCL 0.9.12


Calling (|ιοτα| :|номер| 10 :|ˋ¨ʳ„| 2 :|בכוכ| 2) --> (2 4 6 8 10)


CMUCL
; Loading #P"/local/users/pjb/src/lisp/encours/unicode-source.lisp".


CMU Common Lisp 19c (19C)


Reader error at 214 on #<Stream for file "/local/users/pjb/src/lisp/encours/unicode-source.lisp">:
Undefined read-macro character #\Î
[Condition of type READER-ERROR]

Restarts:
0: [CONTINUE] Return NIL from load of "unicode-source.lisp".
1: [ABORT ] Skip remaining initializations.

Debug (type H for help)

(LISP::%READER-ERROR
#<Stream for file "/local/users/pjb/src/lisp/encours/unicode-source.lisp">
"Undefined read-macro character ~S"
#\Î)
Source: Error finding source:
Error in function DEBUG::GET-FILE-TOP-LEVEL-FORM: Source file no longer exists:
target:code/reader.lisp.
0] abort
*
Received EOF on *standard-input*, switching to *terminal-io*.
* (extensions:quit)
;; Loaded file unicode-script.lisp
T
[4]>
 
Jonathon McKitrick

Pascal said:
(defun ιοτα (&key (номер 10) (단계 1) (בכוכ 0))
(loop :for i :from בכוכ :to номер :by 단계 :collect i))

How do you even *enter* these characters? My browser seems to trap all
the special character combinations, and I *know* you don't mean
selecting from a character palette.

à¿¿ hey, this is weird...

î

I've got something happening, but I can't tell what.

Yes, I'm an ignorant Western world ASCII user. :)
 
Edward Elliott

Christophe said:
No, it's really easy: a simple pre-commit hook which will refuse any .py
file with the \t char in it and it's done ;)

$ echo \t
t

Why would you wan_ _o remove all _ee charac_ers? Isn'_ _ha_ a li__le
awkward?
 
Pascal Bourguignon

Jonathon McKitrick said:
How do you even *enter* these characters? My browser seems to trap all
the special character combinations, and I *know* you don't mean
selecting from a character palette.

Why? Of course!
Aren't you either an emacs or a Mac user?

On a Mac, you just select the input keyboard from the Input menu (the
little flag on the right of the menubar, you may activate it from the
International System Preference panel).

On emacs, it's as simple: M-x set-input-method RET

I've bound C-F9, C-F10, C-F11, and C-F12 to various input methods:

(global-set-key [C-f9] (lambda()(interactive)(set-input-method 'chinese-py-b5)))
(global-set-key [C-f10] (lambda()(interactive)(set-input-method 'cyrillic-yawerty)))
(global-set-key [C-f11] (lambda()(interactive)(set-input-method 'greek)))
(global-set-key [C-f12] (lambda()(interactive)(set-input-method 'hebrew)))

C-\ is bound to toggle-input-method, which lets you switch back to the
usual input method.

For the alphabetic scripts, there's no difficulty, it's like with
roman scripts: each key is a character. For ideographic scripts, the
input methods are more sophisticated.

Then, you have to learn some of these strange languages. I learned
several (but I forgot everything but: לודג גד דג ינד, здраствуйте, я
люблю тибе, 我 è½é¾, 我 不 中国人). For the Korean, I copy-and-pasted
it from some web translation service. But keying them in is the
easiest part.
 
Oliver Bandel

Jonathon said:
How do you even *enter* these characters? My browser seems to trap all
the special character combinations, and I *know* you don't mean
selecting from a character palette.

Haven't you heard of those big keyboards?

12 meters by 2 meters, I think... you need a long
stick (if you play golf, that can help).

Then you have all the UTF-8 characters there, which is fine,
but typing takes some time.
But that's good, because once you've finished typing your email,
you don't need to do any sport after work. So your boss
can insist that you stay at work longer.


Ciao,
Oliver

;-)
 
PoD

By reading the code I can see how many levels of indentation it
represents.


No. That is precisely the problem: there is code in the wild which
contains mixed space and tab indentation, and any time that happens 3
tabs could mean any number of indentations.

I think it is universally accepted that mixed tabs and spaces is indeed
**EVIL**

I should have said any code using tabs exclusively.
 
