Logging data from Arduino using PySerial



Thomas

I've written a script to log data from my Arduino to a csv file. The script works well enough but it's very, very slow. I'm quite new to Python and I just wanted to put this out there to see if any Python experts could help optimise my code. Here it is:

import serial
import re
import csv
import numpy as np
import matplotlib.pyplot as plt

portPath = "/dev/ttyACM0"
baud = 9600
sample_time = 0.5
sim_time = 30


# Initializing Lists
# Data Collection
data_log = []
line_data = []

def map(x, in_min, in_max, out_min, out_max):
    return (((x - in_min) * (out_max - out_min))/(in_max - in_min)) + out_min

# Establishing Serial Connection
connection = serial.Serial(portPath,baud)

# Calculating the length of data to collect based on the
# sample time and simulation time (set by user)
max_length = sim_time/sample_time

# Collecting the data from the serial port
while True:
    data_log.append(connection.readline())
    if len(data_log) > max_length - 1:
        break

# Cleaning the data_log and storing it in data.csv
with open('data.csv','wb') as csvfile:
    for line in data_log:
        line_data = re.findall('\d*\.\d*',line) # Find all digits
        line_data = filter(None,line_data) # Filter out empty strings
        line_data = [float(x) for x in line_data] # Convert Strings to float

        for i in range(1,len(line_data)):
            line_data=map(line_data,0,1023,0,5)

        csvwrite = csv.writer(csvfile)
        csvwrite.writerow(line_data)


plt.clf()
plt.close()
plt.plotfile('data.csv',(0,1,2),names=['time (s)','voltage2 (V)','voltage1 (V)'],newfig=True)
plt.show()


I'd appreciate any help/tips you can offer.
 

Chris Angelico

I've written a script to log data from my Arduino to a csv file. The script works well enough but it's very, very slow. I'm quite new to Python and I just wanted to put this out there to see if any Python experts could help optimise my code. Here it is:

The most important question is: Have you profiled your code? That is,
do you know which part(s) are slow?

And the first part of that question is: Define "slow". How long does
your program actually take? How much data are you accumulating in that
time? By my reading, you're getting 60 lines from the serial port; how
long is each line?

For a basic back-of-the-envelope timing estimate, work out how long
each line is (roughly), and multiply by 60 (number of lines), then
divide by 960 (your baud rate, divided by 10, which gives a rough
bytes-per-second rate). That'll give you a ball-park figure for how
many seconds this loop will take:
# Collecting the data from the serial port
while True:
    data_log.append(connection.readline())
    if len(data_log) > max_length - 1:
        break

If your lines are 80 characters long, give or take, then that works
out to 80*60/960 = five seconds just to fetch the data from the serial
port. So if the script is taking anywhere from 3 to 8 seconds to run,
then you can assume that it's probably spending most of its time right
here. Yes, that might feel like it's really slow, but it's all down to
your baud rate. You can easily confirm this by surrounding that loop
with:

import time
start_time = time.time()
while ...
    ... append ...
print("Time to fetch from serial port:",time.time()-start_time)

(If this is Python 2, this will produce slightly ugly output as it'll
display it as a tuple. It'll work though.)
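Fleshed out, that bracketing might look like this (a sketch; `fetch_line` is a made-up stand-in for `connection.readline()`, since a real port would block on I/O):

```python
import time

def fetch_line():
    # Hypothetical stand-in for connection.readline()
    return b"0.50,2.51,3.02\n"

data_log = []
max_length = 60

start_time = time.time()
while len(data_log) < max_length:
    data_log.append(fetch_line())
print("Time to fetch from serial port:", time.time() - start_time)
```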

Put that in, and then run the program with some kind of external
timing. On Linux, that would be:

$ time python scriptname.py

If the total script execution is barely more than the time spent in
that loop, don't bother optimizing any of the rest of the code - it's
a waste of time improving something that's not materially affecting
your total run time.

But while I'm here looking at your code, I'll take the liberty of
making a few stylistic suggestions :) Feel free to ignore these, but
this is the more Pythonic way of writing the code. Starting with the
serial port fetch loop:
# Collecting the data from the serial port
while True:
    data_log.append(connection.readline())
    if len(data_log) > max_length - 1:
        break

An infinite loop with a conditional break exactly at one end. This
would be clearer written as a straight-forward conditional loop, with
the condition inverted:

while len(data_log) < max_length:
    data_log.append(connection.readline())

This isn't strictly identical to your previous version, but since
len(data_log) will always be an integer, it is - I believe -
functionally equivalent. But make sure I haven't introduced any bugs.
:)

(There is another difference, in that my version checks the condition
on entry where yours would check it only after doing the first append
- your version would guarantee a minimum of one line in the log, mine
won't. I'm guessing that this won't be significant in normal usage; if
it is, go back to your version of the code.)
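As a quick sanity check of that equivalence (with the readline stubbed out, since I don't have your serial port here), the conditional loop still collects exactly 60 lines even though max_length is a float:

```python
sim_time = 30
sample_time = 0.5
max_length = sim_time / sample_time  # 60.0 -- a float, but the comparison still works

data_log = []
while len(data_log) < max_length:
    data_log.append("fake line")  # stand-in for connection.readline()

print(len(data_log))  # 60
```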
def map(x, in_min, in_max, out_min, out_max):
    return (((x - in_min) * (out_max - out_min))/(in_max - in_min)) + out_min

This shadows the built-in map() function. It works, but it's usually
safer to not do this, in case you subsequently want the default map().
line_data = re.findall('\d*\.\d*',line) # Find all digits
line_data = filter(None,line_data) # Filter out empty strings
line_data = [float(x) for x in line_data] # Convert Strings to float

You can combine these.

line_data = [float(x) for x in re.findall('\d*\.\d*',line) if x]

Optionally break out the re into a separate line, but definitely I'd
go with the conditional comprehension above the separate filter():

line_data = re.findall('\d*\.\d*',line)
line_data = [float(x) for x in line_data if x]
for i in range(1,len(line_data)):
    line_data=map(line_data,0,1023,0,5)


Is it deliberate that you start from 1 here? Are you consciously
skipping the first element, leaving it unchanged? If not, I would go
for another comprehension:

line_data = [map(x,0,1023,0,5) for x in line_data]

but if so, this warrants a code comment explaining why the first one is special.
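Putting the rename and the comprehension together on some made-up readings (`scale` is just a non-clashing name I picked; 0 and 1023 should land on the ends of the 0-5 V range):

```python
def scale(x, in_min, in_max, out_min, out_max):
    # Same linear rescaling as the original map(), renamed to
    # avoid shadowing the built-in map()
    return (x - in_min) * (out_max - out_min) / (in_max - in_min) + out_min

line_data = [0.0, 511.5, 1023.0]  # hypothetical ADC readings
line_data = [scale(x, 0, 1023, 0, 5) for x in line_data]
print(line_data)  # [0.0, 2.5, 5.0]
```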
plt.clf()
plt.close()
plt.plotfile('data.csv',(0,1,2),names=['time (s)','voltage2 (V)','voltage1 (V)'],newfig=True)
plt.show()

This, btw, is the other place that I'd look for a potentially large
slab of time. Bracket this with time.time() calls as above, and see
whether it's slow enough to mean nothing else is a concern. But I'd
first look at the serial port read loop; depending on how long your
lines are, that could easily be quite a few seconds of time just on
its own.

ChrisA
 

Thomas

Wow...Thanks Chris! I really appreciate your suggestions (including the stylistic ones). I'll definitely be revising my code as soon as I find the time. As far as profiling goes, I've used timeit in the past but it's quite a pain going through any program block by block. I wish there were a program in which you could just toss in a script and it would spit out the bottlenecks in your code (with suggested performance improvements perhaps)...
 

Chris Angelico

Wow...Thanks Chris! I really appreciate your suggestions (including the stylistic ones). I'll definitely be revising my code as soon as I find the time. As far as profiling goes, I've used timeit in the past but it's quite a pain going through any program block by block. I wish there were a program in which you could just toss in a script and it would spit out the bottlenecks in your code (with suggested performance improvements perhaps)...

Well, timeit is good for microbenchmarking. (How useful
microbenchmarking itself is, now, that's a separate question.) For
anything where you can "feel" the program's time by human, it's easy
enough to use the time.time() function.

Add a little helper like this (untested):

last_time = time.time()
def tt(desc):
    global last_time
    cur_time=time.time()
    print("%s: %f"%(desc,cur_time-last_time))
    last_time=cur_time


Then put this all through your code:

.....

# Calculating the length of data to collect based on the
# sample time and simulation time (set by user)
max_length = sim_time/sample_time

tt("Init")

# Collecting the data from the serial port
while True:
    data_log.append(connection.readline())
    if len(data_log) > max_length - 1:
        break

tt("Serial")

....

etc etc. Give it a short description saying what's just happened
(because it'll give the time since the previous timepoint), and then
just eyeball the results to see where the biggest numbers are. If you
do it right, you'll find a whole pile of sections with tiny numbers,
which you can easily ignore, and just a handful that even register.
Then you dig into those sections and see where the slowness is.

Be careful, though. You can easily waste hundreds of expensive dev
hours trying to track down an insignificant time delay. :)
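Incidentally, the standard library's cProfile gets fairly close to that wish: point it at a script and it prints a per-function time breakdown (no suggested fixes, though). A minimal in-process sketch, with a made-up `busy_work` function standing in for a slow section:

```python
import cProfile
import io
import pstats

def busy_work():
    # Hypothetical stand-in for a slow part of the script
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
busy_work()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())  # table of the most expensive calls, busy_work among them
```

From the command line, the whole-script equivalent is `python -m cProfile -s cumulative scriptname.py`.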

ChrisA
 

Dennis Lee Bieber

I've written a script to log data from my Arduino to a csv file. The script works well enough but it's very, very slow. I'm quite new to Python and I just wanted to put this out there to see if any Python experts could help optimise my code. Here it is:

import serial
import re
import csv
import numpy as np
import matplotlib.pyplot as plt

portPath = "/dev/ttyACM0"
baud = 9600
sample_time = 0.5
sim_time = 30


# Initializing Lists
# Data Collection
data_log = []
line_data = []

def map(x, in_min, in_max, out_min, out_max):
    return (((x - in_min) * (out_max - out_min))/(in_max - in_min)) + out_min
Doesn't the Arduino have a map() function internally? If you have
control over the Arduino couldn't you set it up to return the desired
mapping values directly?
# Establishing Serial Connection
connection = serial.Serial(portPath,baud)

# Calculating the length of data to collect based on the
# sample time and simulation time (set by user)
max_length = sim_time/sample_time

# Collecting the data from the serial port
while True:
    data_log.append(connection.readline())
    if len(data_log) > max_length - 1:
        break
Here you are building up a list of raw lines...
# Cleaning the data_log and storing it in data.csv
with open('data.csv','wb') as csvfile:
    for line in data_log:
        line_data = re.findall('\d*\.\d*',line) # Find all digits
        line_data = filter(None,line_data) # Filter out empty strings
        line_data = [float(x) for x in line_data] # Convert Strings to float

        for i in range(1,len(line_data)):
            line_data=map(line_data,0,1023,0,5)

        csvwrite = csv.writer(csvfile)


You are creating a new csv writer instance on each pass!
        csvwrite.writerow(line_data)
And then you loop over all the lines looking for particular values,
just to scale them into another range, to write to a CSV file.

Personally, I'd have opened the CSV file at the start, and done all
this filtering/transforming on each line as it was read from the Arduino.

csvfile = open("data.csv", "wb")
csvwriter = csv.writer(csvfile)
count = 0
while count < max_length:
    if count == 0:  # skip first line (as your range(1, ...) suggests)
        line = connection.readline()
    line = connection.readline()
    # do all your filtering here
    if line:  # not empty, so filtering didn't wipe it out
        csvwriter.writerow(line)
        count += 1
csvfile.close()
 

MRAB

I've written a script to log data from my Arduino to a csv file. The script works well enough but it's very, very slow. I'm quite new to Python and I just wanted to put this out there to see if any Python experts could help optimise my code. Here it is:
[snip]
# Cleaning the data_log and storing it in data.csv
with open('data.csv','wb') as csvfile:
    for line in data_log:
        line_data = re.findall('\d*\.\d*',line) # Find all digits
        line_data = filter(None,line_data) # Filter out empty strings
        line_data = [float(x) for x in line_data] # Convert Strings to float

        for i in range(1,len(line_data)):
            line_data=map(line_data,0,1023,0,5)

You're doing this for every line in the log:
        csvwrite = csv.writer(csvfile)
[snip]

Try moving it before the 'for' loop so it's done only once.
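In Python 3 terms (the original is Python 2, hence the 'wb' mode), the hoisted writer might look like this sketch, with some made-up pre-cleaned rows in place of the serial data:

```python
import csv

data_log = [[0.0, 2.5], [1.0, 3.1]]  # hypothetical already-cleaned rows

with open('data.csv', 'w', newline='') as csvfile:
    csvwrite = csv.writer(csvfile)  # created once, before the loop
    for line_data in data_log:
        csvwrite.writerow(line_data)
```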
 