Share Code Tips

  • Thread starter Devyn Collier Johnson

Devyn Collier Johnson

Aloha Python Users!

I have some coding tips and interesting functions that I want to share
with all of you. I want to give other programmers ideas and inspiration.
It is all Python3; most of it should work in Python2. I am a Unix/Linux
person, so some of these will only work on Unix systems. Sorry Microsuck
users :-D ;-)

All of the below Python3 code came from Neobot v0.8dev. I host an
artificial intelligence program on Launchpad (LP Username:
devyncjohnson-d). I have not released my Python version yet. The current
version of Neobot (v0.7a) is written in BASH and Python3.

To emulate the Linux shell's date command, use this Python function:

import time
def DATE(): print(time.strftime("%a %B %d %H:%M:%S %Z %Y"))

Want an easy way to clear the terminal screen? Then try this:

import os
def clr(): os.system(['clear','cls'][os.name == 'nt'])
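
The one-liner above works because a bool is an int in Python, so os.name == 'nt' indexes the list as 0 or 1. Here is a minimal sketch of the same choice written out explicitly. The helper name clear_command is hypothetical and is not taken from Neobot.

```python
import os

def clear_command(os_name=os.name):
    """Return the shell command that clears the screen for this OS name."""
    # os.name is 'nt' on Windows and 'posix' on Linux and macOS.
    return 'cls' if os_name == 'nt' else 'clear'

def clr():
    # Same effect as the one-liner: run the platform's clear command.
    os.system(clear_command())
```

Passing the OS name as a parameter makes the choice easy to check without actually clearing the screen.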

Here are two Linux-only functions:

import linecache, subprocess

def GETRAM(): print(linecache.getline('/proc/meminfo', 1).replace('MemTotal:', '').strip()) #Get Total RAM in kilobytes#
def KDE_VERSION(): print(subprocess.getoutput('kded4 --version | awk -F: \'NR == 2 {print $2}\'').strip()) ##Get KDE version##
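
GETRAM() above assumes that MemTotal is the first line of /proc/meminfo and prints the value as text, including the "kB" unit. A hedged sketch of a variant that looks the field up by name and returns an integer might look like this. The names parse_mem_total and get_ram_kb are hypothetical.

```python
def parse_mem_total(meminfo_text):
    """Return the MemTotal value in kilobytes, or None if it is absent."""
    for line in meminfo_text.splitlines():
        if line.startswith('MemTotal:'):
            # A typical line looks like "MemTotal:  8048120 kB".
            return int(line.split()[1])
    return None

def get_ram_kb(path='/proc/meminfo'):
    # Linux-only: /proc/meminfo does not exist on other systems.
    with open(path) as f:
        return parse_mem_total(f.read())
```

Keeping the parsing separate from the file access means the parser can be tried on any sample text, even on a non-Linux machine.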

Need a case-insensitive if-statement? Check this out:

if 'YOUR_STRING'.lower() in SOMEVAR.lower():

Have a Python XML browser and want to add awesome tags? This code would
see if the code to be parsed contains chess tags. If so, then they are
replaced with chess symbols. I know, many people hate trolls, but trolls
are my best friends. Try this:

if '<chess_'.lower() in PTRNPRS.lower():
    DATA = re.sub('<chess_white_king/>', '♔', PTRNPRS, flags=re.I)
    DATA = re.sub('<chess_white_queen/>', '♕', DATA, flags=re.I)
    DATA = re.sub('<chess_white_castle/>', '♖', DATA, flags=re.I)
    DATA = re.sub('<chess_white_bishop/>', '♗', DATA, flags=re.I)
    DATA = re.sub('<chess_white_knight/>', '♘', DATA, flags=re.I)
    DATA = re.sub('<chess_white_pawn/>', '♙', DATA, flags=re.I)
    DATA = re.sub('<chess_black_king/>', '♚', DATA, flags=re.I)
    DATA = re.sub('<chess_black_queen/>', '♛', DATA, flags=re.I)
    DATA = re.sub('<chess_black_castle/>', '♜', DATA, flags=re.I)
    DATA = re.sub('<chess_black_bishop/>', '♝', DATA, flags=re.I)
    DATA = re.sub('<chess_black_knight/>', '♞', DATA, flags=re.I)
    PTRNPRS = re.sub('<chess_black_pawn/>', '♟', DATA, flags=re.I)
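
The chain of re.sub calls above scans the text once per tag. One possible alternative, sketched here under the assumption that every tag has the form <chess_NAME/>, is a lookup table plus a single case-insensitive pattern. The names CHESS_SYMBOLS, CHESS_TAG and replace_chess_tags are hypothetical and are not from Neobot.

```python
import re

# Tag name -> chess symbol; the tag names follow the post above.
CHESS_SYMBOLS = {
    'white_king': '♔', 'white_queen': '♕', 'white_castle': '♖',
    'white_bishop': '♗', 'white_knight': '♘', 'white_pawn': '♙',
    'black_king': '♚', 'black_queen': '♛', 'black_castle': '♜',
    'black_bishop': '♝', 'black_knight': '♞', 'black_pawn': '♟',
}

# One case-insensitive pattern that matches any <chess_.../> tag.
CHESS_TAG = re.compile(r'<chess_(\w+)/>', flags=re.I)

def replace_chess_tags(text):
    """Replace known chess tags with symbols; leave unknown tags alone."""
    def symbol(match):
        # Look the tag name up in lowercase so <CHESS_White_King/> works too.
        return CHESS_SYMBOLS.get(match.group(1).lower(), match.group(0))
    return CHESS_TAG.sub(symbol, text)
```

Adding a new piece then only means adding one entry to the table.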

For those of you making scripts to be run in a terminal, try this for a
fancy terminal prompt:

INPUTTEMP = input('User ≻≻≻')


I may share more code later. Tell me what you think of my coding style
and tips.


Mahalo,

Devyn Collier Johnson
(e-mail address removed)
 
Steven D'Aprano

def KDE_VERSION():
    print(subprocess.getoutput('kded4 --version | awk -F: \'NR == 2 {print $2}\'').strip()) ##Get KDE version##

I run KDE 3, and the above does not work for me.

*half a wink*

By the way, a comment that doesn't tell you anything that you don't
already know is worse than useless. The function is called "KDE_VERSION";
what else would it do other than return the KDE version?


x += 1 # add 1 to x

Worse than just being useless, redundant comments are dangerous. As a
general rule, comments that don't say anything useful eventually fall
out of date: they become *inaccurate* rather than *redundant*, and
that's worse than being useless.

Need a case-insensitive if-statement? Check this out:

if 'YOUR_STRING'.lower() in SOMEVAR.lower():

Case-insensitivity is very hard. Take German for example:

STRASSE <-> straße

Or Turkish:

İ <-> i
I <-> ı


In Python 3.3 and later, you should use casefold rather than lowercase or uppercase:

if some_string.casefold() in another_string.casefold(): ...


but even that can't always take into account localised rules, e.g. in
German, you should not convert SS to ß for placenames or person names, so
for example Herr Meißner and Herr Meissner are two different people. This
is one of the motivating reasons for introducing the uppercase ß.

http://opentype.info/blog/2011/01/24/capital-sharp-s/
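
To make the difference between lower() and casefold() concrete, here is a small sketch using the German example from this post. The helper name contains_caseless is hypothetical, and the code assumes Python 3.3 or later.

```python
def contains_caseless(needle, haystack):
    """Case-insensitive substring test using casefold() (Python 3.3+)."""
    return needle.casefold() in haystack.casefold()

# lower() leaves 'ß' untouched, so the two spellings never meet.
print('STRASSE'.lower() == 'straße'.lower())        # False
# casefold() maps 'ß' to 'ss', so they compare equal.
print('STRASSE'.casefold() == 'straße'.casefold())  # True
```

Note that the same folding also makes "Meißner" and "Meissner" compare equal, which is exactly the localised case that casefold() cannot get right.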
 
Devyn Collier Johnson

Steven, thanks for your interesting comments. Your emails are very
insightful.

As for the KDE function, I should fix that. Thank you for catching that.
Notice that the shell command in the function is "kded4". That would
only check the version for the KDE4 series. The function will only work
for KDE4 users. As for the comment, you would be amazed by the people
who ask me "what does this do?". These people are redundant (^u^).

As for the case-insensitive if-statements, most code uses Latin letters.
Making a case-insensitive-international if-statement would be
interesting. I can tackle that later. For now, I only wanted to take
care of Latin letters. I hope to figure something out for all characters.

Thank you for your reply. I found it to be very helpful.

Mahalo,
DCJ
 
Dave Angel

In reply to Steven D'Aprano's message of 07/19/2013 01:59 PM, Devyn Collier Johnson wrote:

As for the case-insensitive if-statements, most code uses Latin letters.
Making a case-insensitive-international if-statement would be
interesting. I can tackle that later. For now, I only wanted to take
care of Latin letters. I hope to figure something out for all characters.

Once Steven gave you the answer, what's to figure out? You simply use
casefold() instead of lower(). The only constraint is it's 3.3 and
later, so you can't use it for anything earlier.

http://docs.python.org/3.3/library/stdtypes.html#str.casefold

"""
str.casefold()
Return a casefolded copy of the string. Casefolded strings may be used
for caseless matching.

Casefolding is similar to lowercasing but more aggressive because it is
intended to remove all case distinctions in a string. For example, the
German lowercase letter 'ß' is equivalent to "ss". Since it is already
lowercase, lower() would do nothing to 'ß'; casefold() converts it to "ss".

The casefolding algorithm is described in section 3.13 of the Unicode
Standard.

New in version 3.3.
"""
 
Devyn Collier Johnson

Chris Angelico said that casefold is not perfect. In the future, I want
to make the perfect international-case-insensitive if-statement. For
now, my code only supports a limited range of characters. Even with
casefold, I will have some issues as Chris Angelico mentioned. Also, "ß"
is not really the same as "ss".

Mahalo,
DCJ
 
Chris Angelico

Chris Angelico said that casefold is not perfect. In the future, I want to
make the perfect international-case-insensitive if-statement. For now, my
code only supports a limited range of characters. Even with casefold, I will
have some issues as Chris Angelico mentioned. Also, "ß" is not really the
same as "ss".

Well, casefold is about as good as it's ever going to be, but that's
because "the perfect international-case-insensitive comparison" is a
fundamentally impossible goal. Your last sentence hints as to why;
there is no simple way to compare strings containing those characters,
because the correct treatment varies according to context.

Your two best options are: Be case sensitive (and then you need only
worry about composition and combining characters and all those
nightmares - the ones you have to worry about either way), or use
casefold(). Of those, I prefer the first, because it's safer; the
second is also a good option.
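
The composition problem mentioned above is separate from case: the same accented letter can be stored as one code point or as a base letter plus a combining mark. A minimal sketch that handles both, assuming Python 3.3 or later and the standard unicodedata module, could look like this. The function name caseless_key is hypothetical.

```python
import unicodedata

def caseless_key(text):
    """Casefold, then normalize to NFD so composed and decomposed forms agree."""
    # Normalize before and after folding: casefold() can produce
    # characters that are themselves not in normalized form.
    folded = unicodedata.normalize('NFD', text).casefold()
    return unicodedata.normalize('NFD', folded)

composed = 'caf\u00e9'      # 'é' as a single code point
decomposed = 'cafe\u0301'   # 'e' followed by a combining acute accent
```

The two strings differ under a plain == comparison, but caseless_key() gives the same result for both. This does nothing for the locale-dependent cases discussed in this thread.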

ChrisA
 
Dave Angel

Chris Angelico said that casefold is not perfect. In the future, I want
to make the perfect international-case-insensitive if-statement. For
now, my code only supports a limited range of characters. Even with
casefold, I will have some issues as Chris Angelico mentioned. Also, "ß"
is not really the same as "ss".

Sure, the casefold() method has its problems. But you're going to avoid
using it till you can do a "perfect" one?

Perfect in what context? For "case insensitively" comparing people's
names in a single language in a single country? Perhaps that can be
made perfect, for certain combinations of language and country.

But if you want to compare words in an unspecified language with an
unspecified country, it cannot be done.

If you've got a particular goal in mind, great. But as a library
function, you're better off using the best standard method available,
and documenting what its limitations are. One way of documenting such is
to quote the appropriate standards, with their caveats.


By the way, you mentioned earlier that you're restricting yourself to
Latin characters. The lower() method is inadequate for many of those as
well. Perhaps you meant ASCII instead.
 
Steven D'Aprano

As for the case-insensitive if-statements, most code uses Latin letters.
Making a case-insensitive-international if-statement would be
interesting. I can tackle that later. For now, I only wanted to take
care of Latin letters. I hope to figure something out for all
characters.

As I showed, even for Latin letters, the trick of "if astring.lower() ==
bstring.lower()" doesn't *quite* work, although it can be "close enough"
for some purposes. For example, some languages treat accents as mere
guides to pronunciation, so ö == o, while other languages treat them as
completely different letters. Same with ligatures: in modern English, æ
should be treated as equal to ae, but in Old English, Danish, Norwegian
and Icelandic it is a distinct letter.

Case-insensitive testing may be easier in many non-European languages,
because they don't have cases.

A full solution to the problem of localized string matching requires
expert knowledge for each language, but a 90% solution is pretty simple:

astring.casefold() == bstring.casefold()

or before version 3.3, just use lowercase. It's not a perfect solution,
but it works reasonably well if you don't care about full localization.
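
For code that has to run on versions both before and after 3.3, one way to express "casefold where available, otherwise lowercase" is sketched below. The names fold and equal_caseless are hypothetical.

```python
def fold(text):
    """Use str.casefold() where it exists, else fall back to lower()."""
    # str.casefold was added in Python 3.3; older versions lack it.
    casefold = getattr(text, 'casefold', None)
    return casefold() if casefold is not None else text.lower()

def equal_caseless(a, b):
    # The "90% solution": fold both sides, then compare.
    return fold(a) == fold(b)
```

On an older interpreter this quietly behaves like lower(), so results for characters such as 'ß' will differ between versions.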
 
Steven D'Aprano

In the future, I want to
make the perfect international-case-insensitive if-statement. For now,
my code only supports a limited range of characters. Even with casefold,
I will have some issues as Chris Angelico mentioned.

There are hundreds of written languages in the world, with thousands of
characters, and most of them have rules about case-sensitivity and
character normalization. For example, in Greek, lowercase Σ is σ except
at the end of a word, when it is ς.

≻≻≻ 'Σσς'.upper()
'ΣΣΣ'
≻≻≻ 'Σσς'.lower()
'σσς'
≻≻≻ 'Σσς'.casefold()
'σσσ'


So in this case, casefold() correctly solves the problem, provided you
are comparing modern Greek text. But if you're comparing text in some
other language which merely happens to use Greek letters, but doesn't
have the same rules about the letter sigma, then it will be inappropriate.
So you cannot write a single "perfect" case-insensitive comparison; the
best you can hope for is to write dozens or hundreds of separate
case-insensitive comparisons, one for each language or family of languages.
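
As a rough illustration of what "one comparison per language" could mean, the sketch below registers a Turkish-specific folding rule for the dotted and dotless i mentioned earlier in this thread, and falls back to casefold() for everything else. The registry FOLDERS, the language code 'tr' as a key, and both function names are assumptions made for this example. A real solution would need expert knowledge of each language.

```python
def turkish_fold(text):
    """Fold case using the Turkish dotted/dotless i pairing."""
    # Map the uppercase letters by hand first, because the default
    # casefold() pairs 'I' with 'i' rather than with 'ı'.
    return text.replace('\u0130', 'i').replace('I', '\u0131').casefold()

# Hypothetical registry: language code -> folding function.
FOLDERS = {'tr': turkish_fold}

def equal_caseless(a, b, lang=None):
    """Compare two strings with the folding rule registered for lang."""
    fold = FOLDERS.get(lang, str.casefold)
    return fold(a) == fold(b)
```

With lang='tr', 'ISPARTA' matches 'ısparta'. Without it the default folding turns 'I' into 'i' and the two strings differ.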

For an introduction to the problem:

http://www.w3.org/International/wiki/Case_folding

http://www.unicode.org/faq/casemap_charprop.html



Also, "ß" is not really the same as "ss".

Sometimes it is. Sometimes it isn't.
 
David Hutto

It seems to me, without having tried this or googled it, that a
case-sensitive library either already exists or could be implemented by
sending the text through a Google translation page with a URL opener and
reducing the returned data to a boolean value. I have never attempted it,
but the algorithm seems simpler than writing dozens of separate
solutions.
 
Devyn Collier Johnson

Of course not, Dave; I will implement casefold. I just plan to not stop
there. My program should not come across unspecified languages. Yeah, I
meant ASCII, but I was unaware that lower() had some limitation on Latin
letters.

Mahalo,
DCJ
 
Devyn Collier Johnson

Thanks for the tips. I am learning a lot from this mailing list. I hope
my code helped some people though.

Mahalo,
DCJ
 
Devyn Collier Johnson

Wow, my if-statement is so imperfect! Thankfully, only English people
will talk to an English chatbot (I hope), so for my use of the code, it
will work.
Do the main Python3 developers plan to do something about this?

Mahalo,
DCJ
 
Devyn Collier Johnson

Thanks everyone (especially Chris Angelico and Steven D'Aprano) for all
of your helpful suggestions and ideas. I plan to implement casefold() in
some of my programs.

Mahalo,
DCJ
 
