Fastest way to detect a non-ASCII character in a list of strings.

Dun Peal

`all_ascii(L)` is a function that accepts a list of strings L, and
returns True if all of those strings contain only ASCII chars, False
otherwise.

What's the fastest way to implement `all_ascii(L)`?

My ideas so far are:

1. Match against a regexp with a character range: `[ -~]`
2. Use `s.decode('ascii')`
3. `return all(31 < ord(c) < 127 for s in L for c in s)`
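For concreteness, the three ideas above could be sketched roughly as follows. This is only a sketch under stated assumptions: it targets Python 3 `str` inputs, so idea 2 becomes `s.encode('ascii')` (a Python 3 `str` has no `decode`), and the function names are made up for illustration. Note that ideas 1 and 3 accept only code points 32..126, while idea 2 accepts everything in 0..127, so the three are not interchangeable.

```python
import re

# Idea 1: character range from space to tilde (code points 32..126).
_PRINTABLE = re.compile(r'[ -~]*')


def all_ascii_regex(L):
    # fullmatch() requires the whole string to match, so no explicit anchors are needed.
    return all(_PRINTABLE.fullmatch(s) is not None for s in L)


def all_ascii_encode(L):
    # Idea 2, adapted to Python 3 str: encoding fails on any code point above 127.
    try:
        for s in L:
            s.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True


def all_ascii_ord(L):
    # Idea 3: explicit range check on every character of every string.
    return all(31 < ord(c) < 127 for s in L for c in s)
```

A tab is enough to make the variants disagree: `all_ascii_encode(["a\tb"])` is True, while the regex and `ord` versions return False for the same input.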

Any other ideas? Which one do you think will be fastest?

Will reply with final benchmarks and implementations if there's any interest.

Thanks, D
 
Seebs

What's the fastest way to implement `all_ascii(L)`?

Start by defining it.

1. Match against a regexp with a character range: `[ -~]`

What about tabs and newlines? For that matter, what about DEL and
BEL? Seems to me that the entire 0-127 range are "ASCII characters".
Perhaps you mean "printable"?
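The two candidate definitions can be written down side by side. A minimal sketch, assuming Python 3 `str` inputs; both function names are hypothetical. On Python 3.7 and later, the built-in `str.isascii()` gives the same answer as the first function.

```python
def is_ascii(s):
    # "ASCII" in the strict sense: every code point in 0..127,
    # control characters such as tab, newline, BEL and DEL included.
    return all(ord(c) < 128 for c in s)


def is_printable_ascii(s):
    # The narrower range implied by `[ -~]`: space (32) through tilde (126).
    return all(32 <= ord(c) <= 126 for c in s)
```

With these, `"a\tb"` and `"\x7f"` (DEL) pass `is_ascii` but fail `is_printable_ascii`, and `"caf\u00e9"` fails both.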

Any other ideas? Which one do you think will be fastest?

I'd guess that a suitable regex (and see whether there's an
existing character class that already has the right semantics) will
be by far the fastest. Just anchor it on both ends and nothing will
have to do any fancy evaluation to test it.
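An anchored regex along those lines might look like the sketch below, assuming Python 3 `str` inputs. The pattern and helper names are illustrative only. One detail worth knowing when anchoring in Python: `$` also matches just before a trailing newline, so `\Z` (or `re.fullmatch`) is the safer end anchor.

```python
import re

# pattern.match() anchors at the start of the string; \Z anchors at the very end.
ASCII_ANY = re.compile(r'[\x00-\x7f]*\Z')   # the whole 0..127 range
PRINTABLE = re.compile(r'[ -~]*\Z')         # space through tilde only


def all_match(pattern, L):
    # True if every string in L consists solely of characters the pattern allows.
    return all(pattern.match(s) is not None for s in L)
```

For example, `all_match(PRINTABLE, ["abc\n"])` is False, whereas `re.match(r'^[ -~]*$', "abc\n")` does produce a match because `$` tolerates the final newline.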

-s
 
Carl Banks

`all_ascii(L)` is a function that accepts a list of strings L, and
returns True if all of those strings contain only ASCII chars, False
otherwise.

What's the fastest way to implement `all_ascii(L)`?

My ideas so far are:

1. Match against a regexp with a character range: `[ -~]`
2. Use s.decode('ascii')
3. `return all(31< ord(c) < 127 for s in L for c in s)`

Any other ideas?  Which one do you think will be fastest?

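If you use numpy, the fastest way might be something like:

import numpy as np
ns = np.ndarray(len(s), np.uint8, s)  # view the byte string s as an array of byte values
return np.all(np.logical_and(ns >= 32, ns <= 126))  # 32..126 is the `[ -~]` range; 127 is DEL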
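The snippet above checks a single byte string `s`. For comparison, here is a numpy-free sketch of the same byte-level range check applied to the whole list at once. It is a different technique from the numpy one (deleting allowed bytes with `bytes.translate`), it assumes `L` holds Python 3 `bytes` objects, and the names are hypothetical. Whether it beats numpy or a regex would have to be measured.

```python
# Bytes 32..126, the same range as the `[ -~]` character class.
PRINTABLE = bytes(range(32, 127))


def all_ascii_bytes(L):
    # translate(None, PRINTABLE) deletes every allowed byte from the joined data;
    # anything left over is a byte outside the 32..126 range.
    return not b''.join(L).translate(None, PRINTABLE)
```

Joining first means one call covers the whole list, at the cost of copying the data and giving up the early exit on the first bad string.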


Carl Banks
 
