metaperl
There is no end to the number of frantic pleas for help with
characters in the realm beyond ASCII.
However, in searching through them, I do not see a workable approach to
changing them into other things.
I am dealing with a file, and in my Emacs editor I see
"MASSACHUSETTS-AMHERST" ... in other words, there is a dash between
MASSACHUSETTS and AMHERST.
However, if I do a grep for the text the shell returns this:
MASSACHUSETTSâAMHERST
and od -tc returns this:
0000540 O F M A S S A C H U S E T T
0000560 S 342 200 223 A M H E R S T ; U N I
So, the conclusion is that the "dash" is actually 3 bytes (342 200 223
in octal). My goal is to take those 3 bytes and convert them to an ASCII
dash. Any idea how I might write such a filter? The closest I have got
is:
unicodedata.normalize('NFKD', s).encode('ASCII', 'replace')
but that puts a question mark there.
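
For reference, the bytes 342 200 223 (hex E2 80 93) are the UTF-8 encoding of U+2013 EN DASH. That character has no compatibility decomposition, so NFKD leaves it unchanged and the ASCII encoder then substitutes "?". Below is a minimal sketch of one possible filter, assuming Python 3 and assuming the file really is UTF-8; the names DASH_MAP and to_ascii are made up for illustration, and the table covers only a few dash-like characters:

```python
import unicodedata

# Hypothetical mapping from dash-like code points to an ASCII hyphen.
# Extend it with whatever other characters turn up in the file.
DASH_MAP = {
    0x2010: "-",  # HYPHEN
    0x2012: "-",  # FIGURE DASH
    0x2013: "-",  # EN DASH (the bytes 342 200 223 in the od output)
    0x2014: "-",  # EM DASH
    0x2212: "-",  # MINUS SIGN
}

def to_ascii(raw: bytes) -> bytes:
    """Decode UTF-8 bytes, map dashes to '-', then fall back to NFKD."""
    text = raw.decode("utf-8")           # assumption: the input is UTF-8
    text = text.translate(DASH_MAP)      # explicit replacements come first
    text = unicodedata.normalize("NFKD", text)
    # Anything still outside ASCII becomes "?", as in the original attempt.
    return text.encode("ascii", "replace")

print(to_ascii(b"MASSACHUSETTS\xe2\x80\x93AMHERST"))
# prints b'MASSACHUSETTS-AMHERST'
```

Doing the explicit mapping before normalizing means the dash never reaches the lossy encode step, while any character not in the table still shows up as "?" and so stays easy to spot.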