Fastest way to detect a non-ASCII character in a list of strings.

Dun Peal · Oct 17, 2010

`all_ascii(L)` is a function that accepts a list of strings L, and
returns True if all of those strings contain only ASCII chars, False
otherwise.

What's the fastest way to implement `all_ascii(L)`?

My ideas so far are:

1. Match against a regexp with a character range: `[ -~]`
2. Use `s.decode('ascii')`
3. `return all(31 < ord(c) < 127 for s in L for c in s)`
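
For reference, the three ideas can be sketched roughly as follows. This is a minimal sketch, not a benchmarked implementation. It assumes Python 3 text strings, so idea 2 uses `s.encode('ascii')` in place of `s.decode('ascii')`; the function names are made up for illustration. Note that ideas 1 and 3 accept only the printable range 32-126, while the codec approach accepts every code point 0-127, so the three are not interchangeable.

```python
import re

# Idea 1: regex over the range space (0x20) through tilde (0x7E).
_PRINTABLE = re.compile(r'[ -~]*')

def all_ascii_regex(L):
    # fullmatch succeeds only if the whole string consists of printable ASCII.
    return all(_PRINTABLE.fullmatch(s) is not None for s in L)

# Idea 2: let the ASCII codec do the scan.
# Assumption: Python 3 str, which has no .decode(), so we encode instead.
def all_ascii_codec(L):
    try:
        for s in L:
            s.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True

# Idea 3: check each character's code point in pure Python.
def all_ascii_ord(L):
    return all(31 < ord(c) < 127 for s in L for c in s)
```

A tab shows the difference in definitions: `all_ascii_codec(['a\tb'])` is true, while the other two return false for the same input.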

Any other ideas? Which one do you think will be fastest?

Will reply with final benchmarks and implementations if there's any interest.

Thanks, D

Seebs · Oct 17, 2010

> What's the fastest way to implement `all_ascii(L)`?

Start by defining it.

> 1. Match against a regexp with a character range: `[ -~]`

What about tabs and newlines? For that matter, what about DEL and
BEL? Seems to me that everything in the 0-127 range is an "ASCII
character". Perhaps you mean "printable"?
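
To make the distinction concrete, here is a small sketch that lists which of the 128 ASCII code points the `[ -~]` range rejects, and compares it with the standard library's `string.printable`. The variable names are illustrative only.

```python
import re
import string

# All 128 ASCII characters, code points 0 through 127.
ascii_all = {chr(i) for i in range(128)}

# The subset accepted by the character range from the original post.
printable_range = {c for c in ascii_all if re.fullmatch(r'[ -~]', c)}

# ASCII characters the range rejects: the control codes 0-31 plus DEL (127).
rejected = ascii_all - printable_range

# string.printable additionally counts five whitespace control characters.
extra_in_string_printable = set(string.printable) - printable_range

print(len(printable_range), len(rejected))
print(sorted(extra_in_string_printable))
```

So tab, newline, BEL and DEL are all ASCII yet fail the `[ -~]` test, and `string.printable` sits somewhere in between because it also admits tab, newline, carriage return, vertical tab and form feed.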

> Any other ideas? Which one do you think will be fastest?

I'd guess that a suitable regex will be by far the fastest (and check
whether there's an existing character class that already has the right
semantics). Just anchor it on both ends, and the regex engine won't
have to do any fancy evaluation to test it.
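
A minimal sketch of the anchored-regex idea, assuming "ASCII" here means the printable range `[ -~]` from the original post. One detail worth hedging on: in Python's `re`, `$` also matches just before a trailing newline, so this sketch anchors with `\A` and `\Z` instead of `^` and `$`.

```python
import re

# \A and \Z anchor at the true start and end of the string. With '$',
# a string such as 'abc\n' would slip through, because '$' also matches
# just before a newline at the end of the string.
_ONLY_PRINTABLE = re.compile(r'\A[ -~]*\Z')

def all_ascii(L):
    # Bind the method once to avoid repeated attribute lookups in the loop.
    match = _ONLY_PRINTABLE.match
    return all(match(s) is not None for s in L)
```

Whether this actually beats the alternatives is something only a benchmark on representative data can settle.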

-s

Carl Banks · Oct 18, 2010

> `all_ascii(L)` is a function that accepts a list of strings L, and
> returns True if all of those strings contain only ASCII chars, False
> otherwise.
>
> What's the fastest way to implement `all_ascii(L)`?
>
> My ideas so far are:
>
> 1. Match against a regexp with a character range: `[ -~]`
> 2. Use `s.decode('ascii')`
> 3. `return all(31 < ord(c) < 127 for s in L for c in s)`
>
> Any other ideas? Which one do you think will be fastest?

If you use numpy, the fastest way might be something like:

import numpy as np
ns = np.frombuffer(s, dtype=np.uint8)  # view the byte string s as unsigned bytes
return np.all(np.logical_and(ns >= 32, ns <= 126))  # 126 is '~'; 127 is DEL
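
The fragment above handles a single string `s`. A rough sketch of how it might extend to the whole list follows, under two assumptions that are not in the original post: `L` holds `bytes` objects (Python 3), and the upper bound is 126 so that it agrees with the `[ -~]` range. The function name is hypothetical.

```python
import numpy as np

def all_ascii_numpy(L):
    # Assumption: L is a list of bytes objects. Joining them lets numpy
    # scan everything in one vectorized pass.
    buf = b''.join(L)
    if not buf:
        # No bytes at all, so there is nothing that could be non-ASCII.
        return True
    ns = np.frombuffer(buf, dtype=np.uint8)
    # Printable ASCII runs from 32 (space) through 126 (tilde).
    return bool(np.all((ns >= 32) & (ns <= 126)))
```

Joining copies all the data once and gives up the early exit that `all()` provides, so this may only pay off when most inputs are clean.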

Carl Banks
