Tabs versus Spaces in Source Code

Discussion in 'Python' started by Xah Lee, May 15, 2006.

  1. Xah Lee Guest

    Tabs versus Spaces in Source Code

    Xah Lee, 2006-05-13

    In coding a computer program, there's often the choice of tabs or
    spaces for code indentation. There is a large amount of confusion about
    which is better. It has become what's known as a “religious war”:
    a heated fight over trivia. In this essay, I would like to explain
    the situation behind it, and which is proper.

    Simply put, tabs is proper, and spaces are improper. Why? This may seem
    ridiculously simple given the de facto ball of confusion: the semantics
    of tabs is what indenting is about, while, using spaces to align code
    is a hack.

    Now, tech geekers may object to this simple conclusion because they itch
    to drivel about different editors and so on. The alleged problems
    created by tabs, as seen by industry coders, are caused by two
    things: (1) tech geekers' sloppiness and lack of critical thinking,
    which lead them to not understand the semantic purposes of the tab and
    space characters. (2) Due to the first reason, they have created and
    propagated a massive non-understanding and misuse, to the degree that
    many tools (e.g. vi) do not deal with tabs well and using spaces to
    align code has become widely practiced, so that in the end spaces seem
    to be actually better by popularity and seeming simplicity.

    In short, this is a phenomenon of misunderstanding begetting a snowball
    of misunderstanding, such that it created a cultural milieu that embraces
    this malpractice and kicks out what is true or proper. Situations like
    this happen a lot in unix. One non-unix example is the file name
    suffix known as the “extension”, where the code for the file's type
    became part of the file name (e.g. “.txt”, “.html”, “.jpg”).
    Another well-known example is HTML practice in the industry, where
    badly designed tags born of corporations' competitive greed, and stupid
    coding and misunderstanding by coders and their tools, are so
    widespread that they force the correct way to the side through the
    eventual standardization caused by the sheer quantity of improper but
    set practice.

    Now, tech geekers may still object that using tabs requires
    editors to set their positions, and plain files don't carry that
    information. This is a good point, and the solution is to advance
    the sciences such that your source code in some way embeds such
    information. This would be progress. However, this is never thought of
    because the “unix philosophies” have already conditioned people to hack
    and be shallow. In this case, many will simply use the character
    intended to separate words for the purpose of indentation or alignment,
    and spread the practice with militant drivel.

    Now, given the already messed-up situation of tabs vs spaces created by
    the unixers and the unix brain-washing of coders in the industry... Which
    should we use today? I do not have a good proposition, other than to just
    use whichever works for you but put more critical thinking into
    things to prevent mishaps like this.

    Tabs vs Spaces can be thought of as parameters vs hard-coded values, or
    HTML vs ascii format, or XML/CSS vs HTML 4, or structural vs visual, or
    semantic vs format. In these, it is always easy to convert from the
    former to the latter, but near impossible from the latter to the
    former. And, that is because the former encodes information that is
    lost in the latter. If we look at the issue of tabs vs spaces, indeed,
    it is easy to convert tabs to spaces in a source code, but more
    difficult to convert from spaces to tabs. Because, tabs as indentation
    actually contains the semantic information about indentation. With
    spaces, this critical information is lost in space.
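A minimal Python sketch of the asymmetry described above. `str.expandtabs` is part of the standard library; the helper names `tabs_to_spaces` and `indent_level_from_spaces` are hypothetical and exist only for this illustration. Going from tabs to spaces is a single call, while reading an indent level back out of spaces needs an assumed width, and a different assumption gives a different answer.

```python
def tabs_to_spaces(line: str, width: int) -> str:
    """Expand each tab to the next multiple of `width` columns."""
    return line.expandtabs(width)


def indent_level_from_spaces(line: str, assumed_width: int) -> int:
    """Guess the indent level of a space-indented line.

    The result depends entirely on `assumed_width`, which the
    file itself does not record.
    """
    leading = len(line) - len(line.lstrip(" "))
    return leading // assumed_width


tabbed = "\t\treturn x"              # two levels, unambiguous
spaced = tabs_to_spaces(tabbed, 4)   # eight leading spaces

print(repr(spaced))
print(indent_level_from_spaces(spaced, 4))  # 2
print(indent_level_from_spaces(spaced, 8))  # 1
```

The reverse conversion is mechanical once a width is agreed on; what the spaces alone cannot tell you is which width was intended.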

    This issue is intimately related to another issue in source code:
    soft-wrapped lines versus physical, hard-wrapped lines by EOL (end of
    line character). This issue has far more consequences than tabs vs
    spaces, and the unixers' unthinking has done far-reaching damage in
    the computing industry. Unix's EOL way of thinking has
    created languages based on EOL (just about ALL languages except the
    Lisp family and Mathematica), tools based on EOL (cvs, diff, grep,
    and basically every tool in unix), and thoughts based on EOL (software
    value estimation by counting EOL, a hard-coded email quoting system
    using the “>” prefix, and silent line-truncations in many unix tools),
    such that any progress or development towards an “algorithmic code unit”
    concept or language syntax is suppressed. I have not written a full
    account of this issue, but I've touched on it in this essay: “The Harm
    of hard-wrapping Lines”, at
    http://xahlee.org/UnixResource_dir/writ/hard-wrap.html
    ----
    This post is archived at:
    http://xahlee.org/UnixResource_dir/writ/tabs_vs_spaces.html

    Xah

    ∑ http://xahlee.org/
     
    Xah Lee, May 15, 2006
    #1

  2. Eli Gottlieb Guest

    Actually, spaces are better for indenting code. The exact amount of
    space taken up by one space character will always be (or at least tend
    to be) the same, while every combination of keyboard driver, operating
    system, text editor, content/file format, and character encoding
    changes precisely what the tab key does.

    There's no use in typing "tab" for indentation when my text editor will
    simply convert it to three spaces, or worse, autoindent and mix tabs
    with spaces so that I have no idea how many actual whitespace characters
    of what kinds are really taking up all that whitespace. I admit it
    doesn't usually matter, but then you go back to try and make your code
    prettier and find yourself asking "WTF?"

    Undoubtedly adding the second spark to the holy war,
    Eli
     
    Eli Gottlieb, May 15, 2006
    #2

  3. Bryan Guest

    i agree, tabs is proper and i always use the tab key to indent... it puts in 4
    spaces.
     
    Bryan, May 15, 2006
    #3
  4. What you see as tabs' weakness is their strength. They encode '1 level of
    indentation', not a fixed width. Of course tabs are rendered differently
    by different editors -- that's the point. If you like indentation to be 2
    or 3 or 7 chars wide, you can view your preference without forcing it on
    the rest of the world. It's a logical rather than a fixed encoding.

    Sounds like the problem is your editor, not tabs. But I wouldn't rule out
    PEBCAK either. ;)

    Undoubtedly. Let's keep it civil, shall we? And please limit the
    cross-posting to a minimum. (directed at the group, not you personally
    Eli).
     
    Edward Elliott, May 15, 2006
    #4
  5. Personally, I don't think it matters whether you use tabs or spaces for
    code indentation. As long as you are consistent and do not mix the two.
     
    John McMonagle, May 15, 2006
    #5
  6. Spaces work better. Hitting the TAB key in my Emacs will auto-indent
    the current line. Only spaces will be used for fill. The worst thing
    you can do is mix the two regardless of how you feel about tab vs
    space.

    The next step in evil is to give tab actual significance like in
    make.

    Xah Lee is getting better at trolling. He might fill up Google's
    storage.
     
    David Steuber, May 15, 2006
    #6
  7. jmcgill Guest

    If I work on your project, I follow the coding and style standards you
    specify.

    Likewise if you work on my project you follow the established standards.

    Fortunately for you, I am fairly liberal on such matters.

    I like to see 4 spaces for indentation. If you use tabs, that's what I
    will see, and you're very likely to have your code reformatted by the
    automated build process, when the standard copyright header is pasted
    and missing javadoc tags are generated as warnings.

    I like the open brace to start on the line of the control keyword. I
    can deal with the open brace being on the next line, at the same level
    of indentation as the control keyword. I don't quite understand the
    motivation behind the GNU style, where the brace itself is treated as a
    half-indent, but I can live with it on *your* project.

    Any whitespace or other style that isn't happy to be reformatted
    automatically is an error anyway.

    I'd be very laissez-faire about it except for the fact that code
    repositories are much easier to manage if everything is formatted before
    it goes in, or as a compromise, as a step at release tags.
     
    jmcgill, May 15, 2006
    #7
  8. Harry George Guest

    [snip]

    This has been discussed repeatedly, and the answer is "If you only
    work alone, never use anyone else's code and no one ever uses your
    code, then do as you please. Otherwise use tab-is-4-spaces."

    When you do Agile Programming with people using emacs, vim, nedit,
    xedit, wordpad, eclipse, and who knows what else, the 4-spaces rule is
    necessary for survival.

    The reason is simple: People get confused, and accidentally get the
    wrong tab indents when they move among editors or among settings on
    the same editor. In most languages this is an irritation, requiring
    some cleanup. In Python it is a disaster requiring re-inventing the
    coded algorithms.
     
    Harry George, May 15, 2006
    #8
  9. mystilleef Guest

    I agree, use tabs.
     
    mystilleef, May 15, 2006
    #9
  10. Mumia W. Guest

    Thanks Xah. I value your posts. Keep posting. And since your posts
    usually cover broad areas of CS, keep crossposting. Don't go anywhere
    Xah :)

    I wouldn't say that spaces are a hack, but tabs are superior.

    Don't forget the laziness of programmers like me who don't put the
    tabbing information in the source file. Vim deals with tabs well IMO,
    but I almost never used to put the right auto-commands in the file to
    get it set up right for other users.

    Vim does this. We just have to use it.

    Nope. Conversion is relatively easy. I've written programs to do this
    myself, and everyone and his brother has also done this. Virtually every
    programmer's editor that I've ever used can do this, and a great, great
    many independent programs convert tabs to spaces. It's like saying,
    "it's near impossible to write a calculator program." :)

    I bet that someone has a Perl one-liner to do it.

    On any Debian system, try a "man expand" and see what you find. Also,
    emacs and vim do it. Perl has a Text::Tabs module. TCL's
    ::textutil::(un)?tabify routines do it. The birds do it, and the bees do
    it. Oh wait, that's something else :)

    Nope again. It's easy, you just keep track of the virtual character
    position as you decide whether to write a space or a tab. Computers do
    the "counting" thing fairly well.

    I've never thought of tabs-vs-spaces as a religious war. Anyway, the
    authority of the programming environment will determine which one is
    used. Have a good week Xah.
     
    Mumia W., May 15, 2006
    #10
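The column-tracking approach described in the post above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `leading_column` and `spaces_to_tabs` are hypothetical, the tab width is assumed to be known, and only leading whitespace is rewritten.

```python
from typing import Tuple


def leading_column(line: str, width: int) -> Tuple[int, int]:
    """Return (virtual column after the leading whitespace, index of first other char)."""
    column = 0
    for index, ch in enumerate(line):
        if ch == " ":
            column += 1
        elif ch == "\t":
            column += width - (column % width)  # jump to the next tab stop
        else:
            return column, index
    return column, len(line)


def spaces_to_tabs(line: str, width: int = 4) -> str:
    """Re-encode the leading whitespace as tabs, with spaces for any remainder."""
    column, index = leading_column(line, width)
    tabs, spaces = divmod(column, width)
    return "\t" * tabs + " " * spaces + line[index:]


print(repr(spaces_to_tabs("        x = 1")))  # '\t\tx = 1'
print(repr(spaces_to_tabs("      x = 1")))    # '\t  x = 1'
```

The arithmetic is simple, but the `width` argument still has to come from somewhere outside the file.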
  11. Peter Decker Guest

    Funny, I was going to say that the problem is when the author prefers
    a font with a differently-sized space. Some of us got rid of editing in
    fixed-width fonts when we left Fortran.
     
    Peter Decker, May 15, 2006
    #11
  12. The answer is "Do what works best for your project". Smart people can agree
    on and use whatever convention they want without trouble. The key is
    consistency.

    Tab is not 4 spaces. Tab is 1 level of indentation. The confusion that
    tabs equals some fixed width, or can/should be set to some fixed width, is
    the entire problem hampering their use. It implies that conversion between
    tabs and spaces is straightforward when it is not. They are not comparable
    entities.

    IOW reward programmers for being sloppy and using poor tools. Anyone who
    programs in wordpad/xedit has far bigger problems than worrying about tabs
    vs spaces (as do projects who let people program in wordpad/xedit).
    Editors which are designed for programming handle tabs and spaces cleanly.

    Sounds like PEBCAK to me. :) If everyone uses tabs for indent, then it
    doesn't matter if Joe's vim showed them as 3 spaces while Mary's emacs
    showed them at 6. You can't get the 'wrong tab indents' when everything is
    a tab indent. Mixing tabs and spaces is where you get problems.

    Use the -tt flag to the Python interpreter and you'll catch inconsistencies
    immediately. Voila, no disaster. Again, you're not complaining about
    using tabs, you're complaining about mixing tabs and spaces. I completely
    agree with you that the latter is way too much hassle to even attempt.

    All I'm saying is that using tabs on their own is perfectly viable and a bit
    cleaner. I'm not trying to force that preference on anyone else, just get
    everyone to recognize that one is just as rational and salubrious as the
    other.
     
    Edward Elliott, May 15, 2006
    #12
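The `-tt` flag mentioned in the post above belongs to Python 2. As far as I know, Python 3 performs a comparable check unconditionally: compiling source whose indentation mixes tabs and spaces inconsistently raises the built-in `TabError`. A small sketch, where `mixes_tabs_and_spaces` is a hypothetical helper name:

```python
# Line 2 is indented with one tab, line 3 with eight spaces.
MIXED = "if True:\n\tx = 1\n        y = 2\n"
# Same block indented with tabs throughout.
TABS_ONLY = "if True:\n\tx = 1\n\ty = 2\n"


def mixes_tabs_and_spaces(source: str) -> bool:
    """Return True if Python rejects the source for inconsistent tab/space use."""
    try:
        compile(source, "<example>", "exec")
    except TabError:
        return True
    return False


print(mixes_tabs_and_spaces(MIXED))      # True
print(mixes_tabs_and_spaces(TABS_ONLY))  # False
```

Other syntax errors are deliberately not caught here, so the helper reports only the tab/space inconsistency the post is concerned with.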
  13. The problem with tabs is that people use tabs for alignment e.g.

    def foo():
    ->query = """SELECT *
    -> -> -> FROM sometable
    -> -> -> WHERE condition"""

    Now I change my editor to use 8-space tabs and the code is all messed
    up. Of course, a very disciplined group of people could be trained to
    never use tabs except to align with the current block level but, in
    practice, that doesn't work. Therefore tabs are bad.

    Cheers,
    Brian
     
    Brian Quinlan, May 15, 2006
    #13
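The effect described in the post above can be reproduced with `str.expandtabs`. The sketch below is an illustration, not the poster's code: `column_of` is a hypothetical helper, and the example lines are modeled on the query snippet. It compares a continuation line aligned with tabs against one that uses a single tab for the block level and spaces for alignment.

```python
def column_of(line: str, marker: str, tab_width: int) -> int:
    """Column at which `marker` appears once tabs are rendered at `tab_width`."""
    return line.expandtabs(tab_width).index(marker)


first = '\tquery = """SELECT *'
tab_aligned = "\t\t\t   FROM sometable"              # tabs used for alignment
space_aligned = "\t" + " " * 11 + "FROM sometable"   # one tab, then spaces

for width in (4, 8):
    print(
        width,
        column_of(first, "SELECT", width),
        column_of(tab_aligned, "FROM", width),
        column_of(space_aligned, "FROM", width),
    )
# prints "4 15 15 15" then "8 19 27 19"
```

At a tab width of 4 both continuation styles line up with `SELECT`; at 8 only the line that confines tabs to the block level still does.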
  14. Peter Decker Guest

    And those of us who hate cutesy alignment like that think that people
    who do it are therefore bad.

    Spaces look like crap, too, when using proportional fonts.
     
    Peter Decker, May 15, 2006
    #14
  15. gregarican Guest

    Don't know what all of the hubbub here is regarding tab/space
    indentation. My punched cards break down such matters in quite fine
    fashion :-/
     
    gregarican, May 15, 2006
    #15
  16. Sure it's a problem. When programmers do bad things, what is your response?
    Slap his nose and say 'Bad, don't do that'? Or take away his toys so he
    can't do that? Therein lies your answer to tabs or spaces. Both are
    rational.

    Consistency is always hard for people and easy for machines. If you make it
    a machine task instead of a people task, it can easily work in practice,
    just like Python easily enforces no mixing of tabs and spaces with the -tt
    flag. Get editors that use different (background) colors for tabs and
    spaces so you can easily pick them out even when the above alignment
    problem isn't noticeable. Use editors that automatically pick the right
    character when indenting: tabs to the level of indentation of the current
    block, then spaces afterward for line continuations. Run the code through
    parsers that detect and flag inconsistencies on check-in.

    If such tools are lacking, use substitutes in the meantime. Don't allow any
    code to be checked in where a line departs more than one tab indentation
    level from its neighbors. It's not perfect, but it eliminates the worst
    offenses. Good enough often is.

    Not saying you should do this, just pointing out how tabs are viable.
     
    Edward Elliott, May 15, 2006
    #16
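A check-in rule of the kind suggested in the post above could look roughly like the following. This is a hedged sketch, not an existing tool: `check_indentation` and its messages are made up for illustration. It assumes the convention "tabs for block level, then optional spaces", and it only flags indentation that increases by more than one level, since closing several nested blocks at once is legitimate.

```python
import re
from typing import List

LEADING_WHITESPACE = re.compile(r"[ \t]*")


def check_indentation(source: str) -> List[str]:
    """Flag tab-after-space indentation and jumps of more than one tab level."""
    problems = []
    previous_depth = 0
    for number, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines carry no indentation information
        leading = LEADING_WHITESPACE.match(line).group()
        depth = len(leading) - len(leading.lstrip("\t"))  # count of leading tabs
        if "\t" in leading[depth:]:
            problems.append(f"line {number}: tab after space in indentation")
        if depth > previous_depth + 1:
            problems.append(
                f"line {number}: indent jumps {depth - previous_depth} levels"
            )
        previous_depth = depth
    return problems


bad = "def f():\n\tif x:\n\t\t\t\ty = 1\n \tz = 2\n"
good = "def f():\n\tquery = (1 +\n\t         2)\n\treturn query\n"

print(check_indentation(bad))
print(check_indentation(good))  # []
```

Run as a pre-commit step, a non-empty result would block the check-in; the spaces used for the continuation line in `good` pass because they follow the tabs rather than precede them.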
  17. Peter Decker Guest

    I've recently adopted the Dabo convention of using two indent levels
    to indicate continued lines, which helps distinguish such lines from
    the indented blocks below them. But other than that case, one tab it
    is!
     
    Peter Decker, May 15, 2006
    #17
  18. achates Guest

    People certainly do get confused. I'm always amazed that so many
    people, even amongst those who manage to make a living out of writing
    software, are unable to grasp the concept of a tab. But then a lot of
    code is badly written, so maybe it figures.

    A tab is not equivalent to a number of spaces. It is a character
    signifying an indent, just like the newline character signifies the end
    of a line. If your editor automatically converts tabs to spaces (i.e.
    you are unable to create source files containing tabs) then either it's
    broken or you have misconfigured it. Either way you probably shouldn't
    be using it to write code.
     
    achates, May 16, 2006
    #18
  19. Duncan Booth Guest

    That is true so far as it goes, but equally if your editor inserts a tab
    character when you press the tab key it is as broken as though it inserted
    a backspace character when you press the backspace key. In both of these
    cases you have an operation (move to next tabstop, move back one space) and
    an ascii control character which is intended to reflect that operation when
    rendering the file to an output device.

    An editor should be capable of letting you create or modify files
    containing control characters without gratuitously corrupting them, but the
    keys should perform the expected operations not insert the characters.
     
    Duncan Booth, May 16, 2006
    #19
  20. Duncan Booth enlightened us with:

    That all depends on the setting of your editor. After all, a TAB
    character could be the proper control character for the operation
    'move to the next tabstop'.

    I agree with that.

    But not with that, since it is contradictory. "Inserting the
    characters" could very well be the same as "performing the expected
    operations".

    Sybren
     
    Sybren Stuvel, May 16, 2006
    #20
