textwrap.dedent replaces tabs?

Tom Plunket

The documentation for dedent says, "Remove any whitespace than can be
uniformly removed from the left of every line in `text`", yet I'm
finding that it's also modifying the '\t' characters, which is highly
undesirable in my application. Is there any way to stop it from doing
this, or alternatively, to put those tabs back in?

I see that TextWrapper and its member functions take an 'expand_tabs'
kwarg, but dedent unfortunately does not.
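
For reference, a minimal sketch of the 'expand_tabs' keyword, using textwrap.fill, which forwards its keyword arguments to TextWrapper. The sample string is made up for illustration. Note that 'replace_whitespace' also has to be switched off, otherwise the tab is still turned into a single space.

```python
import textwrap

sample = "a\tb"  # hypothetical input with one interior tab

# Default behaviour: the tab is expanded to the next multiple-of-8 column.
expanded = textwrap.fill(sample)

# The tab survives only if tab expansion AND whitespace replacement are off.
kept = textwrap.fill(sample, expand_tabs=False, replace_whitespace=False)

print(repr(expanded))
print(repr(kept))
```

textwrap.dedent takes only the text itself, so there is no equivalent switch to pass.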

....I suppose such a function (replacement dedent) isn't terribly tough
to write, but it seems an odd default especially considering there's no
way to turn off the undesired changes, but were the changes /not/ made,
the same text could just be passed through TextWrapper and have them
removed...

thx,
-tom!

--
 
CakeProphet

Hmmm... a quick fix might be to temporarily replace all tab characters
with another, relatively unused control character.

MyString = MyString.replace("\t", chr(1))
MyString = textwrap.dedent(MyString)
MyString = MyString.replace(chr(1), "\t")

Of course... this isn't exactly safe, but it's not going to be fatal,
if it does mess something up. As long as you don't expect receiving any
ASCII 1 characters.
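
A self-contained sketch of the placeholder idea, assuming chr(1) never occurs in the real text. The helper name dedent_hiding_tabs is made up for illustration. It shows both what the trick preserves and what it cannot do: hidden tabs are no longer whitespace to dedent, so a common leading tab is never stripped.

```python
import textwrap

PLACEHOLDER = chr(1)  # assumption: this character never appears in the input


def dedent_hiding_tabs(text):
    """Hide tabs from textwrap.dedent, dedent, then put the tabs back."""
    hidden = text.replace("\t", PLACEHOLDER)
    dedented = textwrap.dedent(hidden)
    return dedented.replace(PLACEHOLDER, "\t")


# Tabs that follow a common run of spaces come through untouched.
after_spaces = dedent_hiding_tabs("    \tfoo\n    \tbar")

# Common leading tabs are hidden as well, so they are NOT removed.
leading_tabs = dedent_hiding_tabs("\tfoo\n\tbar")

print(repr(after_spaces))
print(repr(leading_tabs))
```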
 
Tom Plunket

CakeProphet said:
Hmmm... a quick fix might be to temporarily replace all tab characters
with another, relatively unused control character.

MyString = MyString.replace("\t", chr(1))
MyString = textwrap.dedent(MyString)
MyString = MyString.replace(chr(1), "\t")

Of course... this isn't exactly safe, but it's not going to be fatal,
if it does mess something up. As long as you don't expect receiving any
ASCII 1 characters.

Well, there is that small problem that there are leading tabs that I
want stripped. I guess I could manually replace all tabs with eight
spaces (as opposed to 'correct' tab stops), and then replace them when
done, but it's probably just as easy to write a non-destructive dedent.

It's not that I don't understand /why/ it does it; indeed I'm sure it
does this so you can mix tabs and spaces in Python source. Why anyone
would intentionally do that, though, I'm not sure. ;)

-tom!

--
 
Peter Otten

Tom said:
I guess I could manually replace all tabs with eight
spaces (as opposed to 'correct' tab stops), and then replace them when
done, but it's probably just as easy to write a non-destructive dedent.

You mean, as easy as

>>> "\talpha\tbeta\t".expandtabs()
'        alpha   beta    '

?

Peter
 
Frederic Rentsch

Tom said:
CakeProphet wrote:



Well, there is that small problem that there are leading tabs that I
want stripped. I guess I could manually replace all tabs with eight
spaces (as opposed to 'correct' tab stops), and then replace them when
done, but it's probably just as easy to write a non-destructive dedent.

It's not that I don't understand /why/ it does it; indeed I'm sure it
does this so you can mix tabs and spaces in Python source. Why anyone
would intentionally do that, though, I'm not sure. ;)

-tom!
This should do the trick:
No indent
Three space indent
\tOne tab indent
\t\tThree space, two tab indent
\t \tOne tab, two space, one tab indent with two tabs here >\t\t<'''
print s
# Dedent demo
No indent
Three space indent
One tab indent
Three space - two tab indent
One tab - two spaces - one tab indent with two tabs here >
<

# Dedent demo
No indent
Three space indent
One tab indent
Three space - two tab indent
One tab - two spaces - one tab indent with two tabs here > <

-----------------------------------------------------------------------
 
Tom Plunket

Peter said:
You mean, as easy as

>>> "\talpha\tbeta\t".expandtabs()
'        alpha   beta    '

?

Umm, no, that's why I wrote "eight spaces (as opposed to 'correct' tab
stops)".

In either case, though, it'd be hard to know how to replace the tabs, so
it'd be better not to remove them in the first place. Indeed, dedent()
would work perfectly for my cases if it simply didn't do the
expandtabs() in the first place, but there'd be some pretty glaring
holes in its logic then.
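
To make the two points above concrete, here is a small sketch with made-up sample strings. It contrasts a blind eight-space replacement with str.expandtabs(), and shows why the tabs cannot be put back afterwards: two different indents expand to the same run of spaces.

```python
sample = "\talpha\tbeta"

# Blind replacement: every tab becomes exactly eight spaces.
naive = sample.replace("\t", " " * 8)

# expandtabs(): every tab advances to the next multiple-of-8 column.
stops = sample.expandtabs()

# Four spaces plus a tab and a lone tab both end at column 8,
# so after expansion the original composition is gone.
mixed = "    \tx".expandtabs()
plain = "\tx".expandtabs()

print(repr(naive))
print(repr(stops))
print(mixed == plain)
```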

I've since written up a fairly reasonable (IMO) replacement though, that
started with textwrap.dedent(), removed the expandtabs() call, and
Does The Right Thing with tabs vs. spaces (e.g. it doesn't treat a tab
at the beginning of a line as the same as eight spaces).


-tom!

--
 
Tom Plunket

Frederic said:
This should do the trick:

The fact that this doesn't do what dedent() does makes it not useful.
Stripping all leading spaces from text is as easy as calling lstrip() on
each line:

text = '\n'.join([line.lstrip() for line in text.split('\n')])

alas, that isn't what I am looking for, nor is that what
textwrap.dedent() is intended to do.
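
A short sketch of the difference, using a made-up two-line sample indented with spaces only: lstrip() flattens every line, while dedent removes only the indentation the lines have in common and keeps the relative structure.

```python
import textwrap

text = "    if x:\n        y = 1"

# lstrip() on each line throws away ALL leading whitespace.
stripped = '\n'.join([line.lstrip() for line in text.split('\n')])

# dedent() removes only the four spaces common to both lines.
dedented = textwrap.dedent(text)

print(repr(stripped))
print(repr(dedented))
```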

-tom!

--
 
Frederic Rentsch

Tom said:
Frederic Rentsch wrote:



The fact that this doesn't do what dedent() does makes it not useful.
Stripping all leading spaces from text is as easy as calling lstrip() on
each line:

My goodness! How right you are.
text = '\n'.join([line.lstrip() for line in text.split('\n')])

alas, that isn't what I am looking for, nor is that what
textwrap.dedent() is intended to do.

-tom!
Following a call to dedent () it shouldn't be hard to translate leading
groups of so many spaces back to tabs. But this is probably not what you
want. If I understand your problem, you want to restore the dedented
line to its original composition if spaces and tabs are mixed and this
doesn't work because the information doesn't survive dedent (). Could
the information perhaps be passed around dedent ()? Like this: make a
copy of your lines and translate the copy's tabs to so many (8?) marker
bytes (e.g. ascii 0). Dedent the originals. Left-strip each of the
marked line copies to the length of its dedented original and translate
the marked groups back to tabs.
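
One possible reading of that recipe, as a rough sketch rather than a finished tool. It assumes tabs occur only at the very start of a line (so each one is exactly eight columns wide) and that the marker byte never appears in the input. The function name is made up, and the tab-expanding dedent of the time is emulated by calling expandtabs() first.

```python
import textwrap

MARK = "\0"    # marker byte; assumed absent from the input
TABSIZE = 8


def dedent_restoring_tabs(text):
    """Dedent on expanded text, then restore tabs from a marked copy."""
    # Assumption: tabs sit only at the start of a line, so the marked copy
    # and the expanded copy of each line have the same length.
    dedented = textwrap.dedent(text.expandtabs(TABSIZE)).split("\n")
    marked = text.replace("\t", MARK * TABSIZE).split("\n")

    result = []
    for marked_line, dedented_line in zip(marked, dedented):
        # Left-strip the marked copy to the length of its dedented original.
        cut = len(marked_line) - len(dedented_line)
        tail = marked_line[cut:]
        # Whole marker groups become tabs again; a partly stripped
        # group falls back to plain spaces.
        result.append(tail.replace(MARK * TABSIZE, "\t").replace(MARK, " "))
    return "\n".join(result)


tabs_only = dedent_restoring_tabs("\t\tfoo\n\tbar")
cut_through_tab = dedent_restoring_tabs("    x\n\ty")
print(repr(tabs_only))
print(repr(cut_through_tab))
```

The second call shows the weak spot: when the common margin cuts through the middle of a tab, there is no tab left to restore and the remainder can only come back as spaces.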

Frederic
 
Tom Plunket

Frederic said:
Following a call to dedent () it shouldn't be hard to translate leading
groups of so many spaces back to tabs.

Sure, but the point is more that I don't think it's valid to change to
tabs in the first place.

E.g.:

input = (' ' + '\t' + 'hello\n' +
         '\t' + 'world')

output = textwrap.dedent(input)

will yield all of the leading whitespace stripped, which IMHO is a
violation of its stated function. In this case, nothing should be
stripped, because the leading whitespace in these two lines does not
/actually/ match. Sure, it visually matches, but that's not the point
(although I can understand that that's a point of contention in the
interpreter anyway, I would have no problem with it not accepting "1 tab
= 8 spaces" for indentation... But that's another holy war.
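
The example can be checked with a short sketch. The documentation for Python 2.5 and later states that dedent treats tabs and spaces as whitespace but not as equal, so the tab-expanding behaviour described here is emulated below by calling expandtabs() explicitly before dedenting.

```python
import textwrap

text = ' ' + '\t' + 'hello\n' + '\t' + 'world'

# Current dedent: ' \t' and '\t' share no leading characters,
# so nothing is stripped.
current = textwrap.dedent(text)

# Emulation of the older behaviour: expand tabs first, then dedent.
# Both lines now start with eight spaces, and all of it is removed.
old_style = textwrap.dedent(text.expandtabs())

print(repr(current))
print(repr(old_style))
```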

Frederic said:
If I understand your problem, you want to restore the dedented line to
its original composition if spaces and tabs are mixed and this doesn't
work because the information doesn't survive dedent ().

Sure, although would there be a case to be made to simply not strip the
tabs in the first place?

Like this, keeping current functionality and everything... (although I
would think if someone wanted tabs expanded, they'd call expandtabs on
the input before calling the function!):

def dedent(text, expand_tabs=True):
    r"""dedent(text : string, expand_tabs : bool) -> string

    Remove any whitespace than can be uniformly removed from the left
    of every line in `text`, optionally expanding tabs before altering
    the text.

    This can be used e.g. to make triple-quoted strings line up with
    the left edge of screen/whatever, while still presenting it in the
    source code in indented form.

    For example:

        def test():
            # end first line with \ to avoid the empty line!
            s = '''\
            hello
            \t world
            '''
            print repr(s) # prints ' hello\n \t world\n '
            print repr(dedent(s)) # prints ' hello\n\t world\n'
    """
    if expand_tabs:
        text = text.expandtabs()
    lines = text.split('\n')

    # The margin is the longest run of leading whitespace, compared
    # character by character, that every non-blank line starts with.
    margin = None
    for line in lines:
        if margin is None:
            content = line.lstrip()
            if not content:
                continue
            indent = len(line) - len(content)
            margin = line[:indent]
        elif not line.startswith(margin):
            # Short whitespace-only lines do not constrain the margin.
            if len(line) < len(margin):
                content = line.lstrip()
                if not content:
                    continue
            # Shrink the margin until this line starts with it.
            while not line.startswith(margin):
                margin = margin[:-1]

    if margin is not None and len(margin) > 0:
        margin = len(margin)
        for i in range(len(lines)):
            lines[i] = lines[i][margin:]

    return '\n'.join(lines)

import unittest

class DedentTest(unittest.TestCase):
    def testBasicWithSpaces(self):
        input = "\n  Hello\n    World"
        expected = "\nHello\n  World"
        self.assertEqual(expected, dedent(input))

    def testBasicWithTabLeadersSpacesInside(self):
        input = "\n\tHello\n\t World"
        expected = "\nHello\n World"
        self.assertEqual(expected, dedent(input, False))

    def testAllTabs(self):
        input = "\t\tHello\n\tWorld"
        expected = "\tHello\nWorld"
        self.assertEqual(expected, dedent(input, False))

    def testFirstLineNotIndented(self):
        input = "Hello\n\tWorld"
        expected = input
        self.assertEqual(expected, dedent(input, False))

    def testMixedTabsAndSpaces(self):
        input = "  \t  Hello\n   \tWorld"
        expected = "\t  Hello\n \tWorld"
        self.assertEqual(expected, dedent(input, False))

if __name__ == '__main__':
    unittest.main()

-tom!

--
 
Frederic Rentsch


If this works, good for you. I can't say I understand your objective.
(You dedent common leading tabs, except if preceded by common leading
spaces (?)). Neither do I understand the existence of indentations made
up of tabs mixed with spaces, but that is another topic.
I have been wasting a lot of time with things of this nature coding
away before forming a clear conception in my mind of what my code was
supposed to accomplish. Sounds stupid. But many problems seem trivial
enough at first sight to create the illusion of perfect understanding.
The encounter with the devil in the details can be put off but not
avoided. Best to get it over with from the start and write an exhaustive
formal description of the problem. Follows an exhaustive formal
description of the rules for its solution. The rules can then be morphed
into code in a straightforward manner. In other words, coding should be
the translation of a logical system into a language a machine
understands. It should not be the construction of the logical system.
This, anyway, is the conclusion I have arrived at, to my advantage I
believe.

Frederic
 
Tom Plunket

Frederic said:
If this works, good for you. I can't say I understand your objective.
(You dedent common leading tabs, except if preceded by common leading
spaces (?)).

I dedent common leading whitespace, and tabs aren't equivalent to
spaces.

E.g. if some text is indented exclusively with tabs, then the leading
tabs are stripped appropriately. If some other text is indented with
common leading spaces, those are stripped appropriately. If the text to
be stripped has some lines starting with spaces and others starting with
tabs, there are no /common/ leading whitespace characters, and thus
nothing is stripped.
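
These three cases can be sketched in a few lines. This is not the replacement function discussed in this thread, only an illustration of the rule, with a made-up name and os.path.commonprefix doing the character-by-character comparison of the indents.

```python
import os.path


def dedent_exact(text):
    """Strip only the leading whitespace that is identical on every line."""
    lines = text.split("\n")
    # Leading run of spaces/tabs for each non-blank line.
    indents = [line[:len(line) - len(line.lstrip(" \t"))]
               for line in lines if line.strip()]
    margin = os.path.commonprefix(indents)
    # Blank lines that do not start with the margin are left alone.
    return "\n".join(line[len(margin):] if line.startswith(margin) else line
                     for line in lines)


only_tabs = dedent_exact("\t\tHello\n\tWorld")   # one common tab is stripped
only_spaces = dedent_exact("  a\n    b")         # two common spaces are stripped
mixed = dedent_exact("  a\n\tb")                 # nothing in common, unchanged

print(repr(only_tabs), repr(only_spaces), repr(mixed))
```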

Frederic said:
Neither do I understand the existence of indentations made up of tabs
mixed with spaces, but that is another topic.

At one point it was a fairly common cry in the How To Indent Python
discussions. Maybe that cry has faded.

Frederic said:
I have been wasting a lot of time with things of this nature coding
away before forming a clear conception in my mind of what my code was
supposed to accomplish. Sounds stupid.

Doesn't sound stupid, but there are in fact some fairly straightforward
methods that can be put in place to alleviate that problem.

Frederic said:
The encounter with the devil in the details can be put off but not
avoided. Best to get it over with from the start and write an exhaustive
formal description of the problem. Follows an exhaustive formal
description of the rules for its solution.

Good lord, that's an amazingly 1970s way to look at programming! Modern
software engineering practices have in some ways made these problems go
away.

Frederic said:
In other words, coding should be the translation of a logical system
into a language a machine understands. It should not be the construction
of the logical system. This, anyway, is the conclusion I have arrived
at, to my advantage I believe.

To each their own, eh? I've been doing this a long time and have found
that it is by far superior (for me) to refine the logical system as it
is being implemented, as long as the business rules are encoded in such
a way as to disallow the programmer from straying beyond them.

My unit tests are far from exhaustive, but with code this simple it
didn't seem terribly important since I was doing it more as a proof of
concept, proving that I could do this sort of thing in not-many-more-
lines-than-the-original-code-that-does-not-operate-to-its-published-
specification.


-tom!

--
 
Frederic Rentsch

Tom said:
Frederic Rentsch wrote:



I dedent common leading whitespace, and tabs aren't equivalent to
spaces.

E.g. if some text is indented exclusively with tabs, then the leading
tabs are stripped appropriately. If some other text is indented with
common leading spaces, those are stripped appropriately. If the text to
be stripped has some lines starting with spaces and others starting with
tabs, there are no /common/ leading whitespace characters, and thus
nothing is stripped.

Your rules seem incomplete. What if common tabs remain after stripping
common white space? Does this never happen? Or can we hope it doesn't
happen? To err on the side of caution I complete your rules and this is
my (tested) attempt at expressing them pythonically. (I admit it does
look awfully seventies-ish. Just a vulgar little function.)

Cheers

Frederic

-------------------------------------------------------------

import re

def dedent (lines):
    # Takes a list of lines (no trailing '\n') and strips it in place.

    leading_space_re = re.compile (' *')
    leading_tab_re = re.compile ('\t*')
    number_of_lines = len (lines)

    while 1:
        common_space_length = common_tab_length = 100000
        for line in lines:
            if line: # No '\n'
                try:
                    common_space_length = min (common_space_length,
                        len (leading_space_re.match (line).group ()))
                except AttributeError: pass
                try:
                    common_tab_length = min (common_tab_length,
                        len (leading_tab_re.match (line).group ()))
                except AttributeError: pass
        # Strip common spaces first, then common tabs, and repeat
        # until neither is left.
        if 0 < common_space_length < 100000:
            for i in range (number_of_lines):
                lines [i] = lines [i][common_space_length:]
        elif 0 < common_tab_length < 100000:
            for i in range (number_of_lines):
                lines [i] = lines [i][common_tab_length:]
        else:
            break

    return lines
 
Tom Plunket

Frederic said:
Your rules seem incomplete.

Not my rules, the stated documentation for dedent. "My" understanding
of them may not be equivalent to yours, however.

Frederic said:
What if common tabs remain after stripping common white space?

What if we just go with, "[r]emove any whitespace than can be uniformly
removed from the left of every line in `text`." ?

Frederic said:
Does this never happen? Or can we hope it doesn't happen?

"Hope" has no place in programming software that is to be used by
others.

Frederic said:
To err on the side of caution I complete your rules and this is my
(tested) attempt at expressing them pythonically.

Inasmuch as "my" rules have been expressed via tests, the provided code
fails four of the five tests provided.

Frederic said:
(I admit it does look awfully seventies-ish. Just a vulgar little
function.)

Seventies-ish is as much a statement about the lack of statement about
how you actually tested it as it is that an implementation was made
apparently without understanding of the requirements.


-tom!

--
 
Frederic Rentsch

Tom said:
Frederic Rentsch wrote:



Not my rules, the stated documentation for dedent. "My" understanding
of them may not be equivalent to yours, however.

It's not about understanding; it's about the objective. Let us consider
the difference between passing a driving test and riding a bicycle in
city traffic. The objective of passing the test is getting the license
and the means is knowing the rules. The objective of riding the bicycle
is surviving and the means is anticipating all possible breaches of
rules on the part of motorists.

What if common tabs remain after stripping common white space?
What if we just go with, "[r]emove any whitespace than can be uniformly
removed from the left of every line in `text`." ?
Does this never happen? Or can we hope it doesn't happen?

"Hope" has no place in programming software that is to be used by
others.

That's exactly what I am saying. That's exactly why it may be a good
idea to provide preventive measures for rules being breached by those
others over whom we have no control.

Inasmuch as "my" rules have been expressed via tests, the provided code
fails four of the five tests provided.

toms_test_data = (
    ( "\n  Hello\n    World",       # Do this
      "\nHello\n  World", ),        # Expect this
    ( "\n\tHello\n\t World",
      "\nHello\n World", ),
    ( "\t\tHello\n\tWorld",
      "\tHello\nWorld", ),
    ( "Hello\n\tWorld",
      "Hello\n\tWorld", ),
    ( "  \t  Hello\n   \tWorld",
      "\t  Hello\n \tWorld", ),
)

for dedent_this, expect_this in toms_test_data:
    done = '\n'.join (dedent (dedent_this.splitlines ()))
    if done == expect_this: print ('BRAVO!!!')
    else: print ('SHAME ON YOU!!!')

BRAVO!!!
BRAVO!!!
BRAVO!!!
BRAVO!!!
BRAVO!!!

You seem to have plugged my function into your tester. I wasn't
concerned about your testing interface but about the dedentation.

Seventies-ish is as much a statement about the lack of statement about
how you actually tested it as it is that an implementation was made
apparently without understanding of the requirements.


-tom!
Best regards

Frederic
 
OKB (not okblacke)

Frederic said:
(You dedent common leading tabs, except if preceded by common leading
spaces (?)).

There cannot be common leading tabs if they are preceded by
anything. If they were preceded by something, they wouldn't be
"leading".

--
--OKB (not okblacke)
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is
no path, and leave a trail."
--author unknown
 
Tom Plunket

OKB said:
There cannot be common leading tabs if they are preceded by
anything. If they were preceded by something, they wouldn't be
"leading".

Right, but 'common leading whitespace' is a broader term but similarly
unambiguous. <space><tab> != <tab>, but there are two tabs of common
leading whitespace in '\t\t ' and '\t\t\t'.
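
Both claims can be checked with os.path.commonprefix, which compares strings character by character. A minimal sketch:

```python
import os.path

# Two tabs are common to '\t\t ' and '\t\t\t'.
two_tabs = os.path.commonprefix(['\t\t ', '\t\t\t'])

# <space><tab> and <tab> have nothing in common.
nothing = os.path.commonprefix([' \t', '\t'])

print(repr(two_tabs))
print(repr(nothing))
```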

-tom!

--
 
