how to avoid spaghetti in Python?

CM

I've been learning and using Python for a number of years now but never really got particularly disciplined about all good coding practices. I've definitely learned *some*, but I'm hoping this year to take a good step up in terms of refactoring, maintainability, and mostly just "de-spaghettizing" my approach to writing Python programs.

But I could use some pointers.

Of course, many pointers will be general programming advice that is common to all languages, and I get that, but I also think certain bits of advice may be particularly relevant to Python coders, so I'm asking here. (Also because I know there are some great people in this community whose take on this I'd enjoy hearing.)

A few specific questions in this area...

1) One of my main "spaghetti problems" is something I don't know what to ever call. Basically it is that I sometimes have a "chain" of functions or objects that get called in some order and these functions may live in different modules and the flow of information may jump around quite a bit, sometimes through 4-5 different parts of the code, and along the way variables get modified (and those variables might be child objects of the whole class, or they may just be objects that exist only within functions' namespaces, or both). This is hard to debug and maintain. What would people recommend to manage this? A system of notes? A way to simplify the flow? And what is this problem called (if something other than just spaghetti code) so I can Google more about it?

2) A related question: Often I find there are two ways to update the value of an object, and I don't know which is best in which circumstances... To begin to explain, let's say the code is within a class that represents a Frame object in a GUI, like wxPython. Now let's say ALL the code is within this wxFrame class object. So, any object that is named with "self." prepended, like self.panel, or self.current_customer, or self.current_date, will be a child object of that frame class, and therefore is sort of "global" to the whole frame's namespace and can therefore be accessed from within any function in the class. So let's say I have a function called self.GetCurrentCustomer(). To actually get the name of the current customer into RAM, it goes into the database and uses some rule to get the current customer. NOW, the question is, which of these should I do? This:

def GetCurrentCustomer(self):
    self.current_customer = ...  # do some database stuff here

Or this:

def GetCurrentCustomer(self):
    current_customer = ...  # do some database stuff here
    return current_customer

And what difference does it make? In the first case, I am just updating the "global" object of the current_customer, so that any function can then use it. In the second case, I am only returning the current_customer to whatever function(s) call this GetCurrentCustomer() function.

My hunch is the first way leads to spaghetti problems. But I want to understand what the best practices are in this regard. I have found in some cases the first method seemed handy, but I'm not sure what the best way of thinking about this is.
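
The two styles can be compared without any GUI at all. In this minimal sketch, FakeDatabase, StatefulFrame, ReturningFrame and greet_current_customer are hypothetical names, and "the most recently added customer is current" is a made-up rule standing in for the real database query.

```python
class FakeDatabase:
    """Hypothetical stand-in for a real database connection."""

    def __init__(self, customers):
        self._customers = customers

    def current_customer(self):
        # Made-up rule: the most recently added customer is "current".
        return self._customers[-1]


class StatefulFrame:
    """Style 1: the method stores its result on self."""

    def __init__(self, db):
        self.db = db
        self.current_customer = None

    def GetCurrentCustomer(self):
        # Side effect: other methods may now read self.current_customer,
        # but nothing at their call sites shows who set it, or when.
        self.current_customer = self.db.current_customer()


class ReturningFrame:
    """Style 2: the method returns its result to the caller."""

    def __init__(self, db):
        self.db = db

    def GetCurrentCustomer(self):
        # No hidden state: the caller decides what to do with the value.
        return self.db.current_customer()

    def greet_current_customer(self):
        # The data flow is visible right here: fetch, then use.
        customer = self.GetCurrentCustomer()
        return f"Hello, {customer}!"
```

With the returning style, `ReturningFrame(FakeDatabase(["Alice", "Bob"])).greet_current_customer()` gives `'Hello, Bob!'`. With the stateful style, any method that reads `self.current_customer` silently depends on `GetCurrentCustomer()` having been called first, which is exactly the kind of hidden ordering that makes code hard to follow.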

3) Generally, what other tools or approaches would you use to organize a good-sized project well, so as to avoid fighting yourself later when you don't understand your own code and the flow of information through it? By good-sized, I mean about 20,000 lines of Python code, or something like that.

This is the sort of question that would be rejected on Stack Overflow, so I hope you can excuse my fishing for insight in a somewhat open/vague way.

Thanks.
 
Chris Angelico

1) One of my main "spaghetti problems" is something I don't know what to ever call. Basically it is that I sometimes have a "chain" of functions or objects that get called in some order and these functions may live in different modules and the flow of information may jump around quite a bit, sometimes through 4-5 different parts of the code, and along the way variables get modified (and those variables might be child objects of the whole class, or they may just be objects that exist only within functions' namespaces, or both). This is hard to debug and maintain.

Rule of thumb: Every function should be able to be summarized in a
single line. This isn't Python-specific, but in the case of Python,
it's part of the recommendations for docstrings [1]. When one function
calls another function, which calls another, and so on, it's not a
problem if each one can be adequately described:

def is_positive(item):
    """Ascertain whether the item is deemed positive.

    Per business requirement XYZ123, items are
    deemed positive at 90% certainty, even though
    they are deemed negative at only 75%.
    """
    return item.certainty >= 0.9 and item.state > 0

def count_positive(lst):
    """Return the number of deemed-positive items in lst."""
    return sum(1 for item in lst if is_positive(item))

Each of these functions has a clear identity. (Okay, they're a little
trivial for the sake of the example, but you can see how this would
work.) Each one makes sense on its own, and it's obvious that one
should be deferring to the other. If business requirement XYZ123 ever
changes, count_positive's behaviour should change, ergo it calls on
is_positive to make the decision.
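
Because each function has a single job, each can be exercised on its own. A minimal sketch, assuming a hypothetical `Item` dataclass that supplies the `certainty` and `state` attributes `is_positive` reads (the docstrings are shortened here):

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical item type with the two attributes is_positive reads."""
    certainty: float
    state: int


def is_positive(item):
    """Ascertain whether the item is deemed positive."""
    return item.certainty >= 0.9 and item.state > 0


def count_positive(lst):
    """Return the number of deemed-positive items in lst."""
    return sum(1 for item in lst if is_positive(item))


items = [Item(0.95, 1), Item(0.80, 1), Item(0.99, -1)]
print(count_positive(items))  # 1
```

Only the first item counts: the second falls below 90% certainty and the third has a non-positive state. If requirement XYZ123 changes, only `is_positive` needs editing, and `count_positive` follows automatically.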

Rule of thumb: Anything that changes state should make sense. Neither
of the above functions has any right to *modify* lst or item (except
for stats, maybe - "time since last queried" could be reset). You
mention "variables getting modified", and then go on to use some
rather non-Pythonic terminology; I'm not entirely sure what you mean
there, so I'll ignore it and just say something that may or may not
have any relevance to your case: the function's one-line summary
should normally make it clear whether state is to be changed or not. A
function that queries something shouldn't usually change that state
(except when you read from a stream; there's a bit of a grey area with
retrieving the first element of a list, which shouldn't change the
list, vs retrieving the top element of a stack/queue/heap, which
possibly should, but then you'd call it "pop" to be clear).
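
The principle that a query should not change state, and that anything which does change state should say so in its name, is often called command-query separation, which may also be a useful search term for question 1. A minimal sketch with a hypothetical `Stack` class: `peek` only reads, while `push` and `pop` change state and are named accordingly. `pop` is the conventional exception mentioned above: it both returns a value and modifies the stack, and its name makes that clear.

```python
class Stack:
    """A tiny stack whose method names say whether they change state."""

    def __init__(self):
        self._items = []

    def push(self, value):
        """Add value to the top of the stack (changes state)."""
        self._items.append(value)

    def peek(self):
        """Return the top value without removing it (no state change)."""
        return self._items[-1]

    def pop(self):
        """Remove and return the top value (changes state)."""
        return self._items.pop()

    def __len__(self):
        return len(self._items)
```

After `s = Stack(); s.push(1); s.push(2)`, calling `s.peek()` returns 2 and leaves `len(s)` at 2, whereas `s.pop()` also returns 2 but leaves `len(s)` at 1.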

Tip: Adding one-line descriptions to all your functions is a great way
to figure out (or force yourself to figure out) what your code's
doing. Having someone *else* add one-line descriptions to all your
functions is an awesome way to figure out where your function names
are unclear :) I had someone go through one of my open source projects
doing exactly that, and it was quite enlightening to see which of his
docstrings were majorly incorrect. Several of them ended up triggering
renames or code revamps to make something more intuitive.
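
To take that tip further, the one-line summaries can be pulled out mechanically and read as a list, which makes vague names and missing docstrings easy to spot. A minimal sketch using only the standard `inspect` module; the `summaries` helper is hypothetical, not part of any library.

```python
import inspect


def summaries(module):
    """Map each function defined in module to the first line of its docstring."""
    result = {}
    for name, func in inspect.getmembers(module, inspect.isfunction):
        # Skip functions merely imported into this module from elsewhere.
        if func.__module__ != module.__name__:
            continue
        # getdoc() cleans indentation and returns None if there is no docstring.
        doc = inspect.getdoc(func)
        # PEP 257: the first line should be a summary that stands on its own.
        result[name] = doc.splitlines()[0] if doc else None
    return result
```

Running `summaries()` over one of your own (hypothetical) modules lists every function next to its summary, with `None` marking the ones that have none; reading that list top to bottom is a quick way to find names that don't match what the code does.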

ChrisA

[1] See PEP 257, http://www.python.org/dev/peps/pep-0257/
 
andrea crotti

2014/1/21 CM said:
I've been learning and using Python for a number of years now but never really got particularly disciplined about all good coding practices. I've definitely learned *some*, but I'm hoping this year to take a good step up in terms of refactoring, maintainability, and mostly just "de-spaghettizing" my approach to writing Python programs.

It's not really a Python problem; you just want to learn more about
OO principles and good design practices, and there are hundreds of
good books to read about that!

A few specific questions in this area...

1) One of my main "spaghetti problems" is something I don't know what to ever call. Basically it is that I sometimes have a "chain" of functions or objects that get called in some order and these functions may live in different modules and the flow of information may jump around quite a bit, sometimes through 4-5 different parts of the code, and along the way variables get modified (and those variables might be child objects of the whole class, or they may just be objects that exist only within functions' namespaces, or both). This is hard to debug and maintain. What would people recommend to manage this? A system of notes? A way to simplify the flow? And what is this problem called (if something other than just spaghetti code) so I can Google more about it?

Just clearly define objects and methods and how they interact with
each other in a logical way, and you won't have this problem anymore.

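
One way to make those interactions explicit, applied to the "chain" from question 1, is to have each step take its inputs as arguments and return a new value instead of modifying shared attributes; the whole chain then reads top to bottom in one place. A minimal sketch: `Order`, `apply_discount`, `apply_tax`, `total` and `checkout` are all hypothetical, and the rates are made up. The dataclass is frozen so that no step can quietly mutate what it was given.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Order:
    """Hypothetical record passed along the chain; frozen, so it cannot be mutated."""
    subtotal: float
    discount: float = 0.0
    tax: float = 0.0


def apply_discount(order, rate):
    """Return a copy of order with the discount filled in."""
    return replace(order, discount=order.subtotal * rate)


def apply_tax(order, rate):
    """Return a copy of order with tax computed on the discounted amount."""
    return replace(order, tax=(order.subtotal - order.discount) * rate)


def total(order):
    """Return the amount payable for order."""
    return order.subtotal - order.discount + order.tax


def checkout(subtotal):
    """Run the whole chain; every intermediate value is visible here."""
    order = Order(subtotal)
    order = apply_discount(order, rate=0.10)
    order = apply_tax(order, rate=0.20)
    return total(order)
```

`checkout(100)` evaluates to 108.0 (a 10.0 discount, then 20% tax on the remaining 90). Each step can be tested in isolation, and a debugger or a `print` in `checkout` shows the full state between steps.
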
2) A related question: Often I find there are two ways to update the value of an object, and I don't know what's best in which circumstances... To begin to explain, let's say the code is within a class that represents a Frame object in a GUI, like wxPython. Now let's say ALL the code is within this wxFrame class object. So, any object that is named with "self." prepended, like self.panel, or self.current_customer, or self.current_date, will be a child object of that frame class, and therefore is sort of "global" to the whole frame's namespace and can therefore be accessed from within any function in the class. So let's say I have a function called self.GetCurrentCustomer(). To actually get the name of the current customer into RAM, it goes into the database and uses some rule to get the current customer. NOW, the question is, which of these should I do? This:

def GetCurrentCustomer(self):
    self.current_customer = ...  # do some database stuff here

Or this:

def GetCurrentCustomer(self):
    current_customer = ...  # do some database stuff here
    return current_customer

And what difference does it make? In the first case, I am just updating the "global" object of the current_customer, so that any function can then use it. In the second case, I am only returning the current_customer to whatever function(s) call this GetCurrentCustomer() function.

GetCurrentCustomer should really be get_current_customer if you don't
want people screaming at you (PEP 8 recommends lowercase_with_underscores
for function and method names).
As for the question, it depends: is the database query going to be
expensive? Do you always need a fresh value?
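
Those two questions map onto two different methods. A minimal sketch, assuming Python 3.8 or later for `functools.cached_property` (this thread predates it; older code would cache by hand in an attribute). `CustomerFrame`, `get_current_customer`, `refresh_current_customer` and the `query` callable are hypothetical stand-ins for the frame and its database lookup.

```python
from functools import cached_property  # available since Python 3.8


class CustomerFrame:
    """Hypothetical frame-like class; no GUI involved."""

    def __init__(self, query):
        # `query` is a hypothetical callable standing in for the database lookup.
        self._query = query

    def get_current_customer(self):
        """Always hit the database: use when the value may change."""
        return self._query()

    @cached_property
    def current_customer(self):
        """Hit the database once, then reuse the stored value."""
        return self._query()

    def refresh_current_customer(self):
        """Forget the cached value so the next access queries again."""
        # cached_property stores its result in the instance __dict__.
        self.__dict__.pop("current_customer", None)
```

The plain method answers "do I always need a fresh value?"; the cached property answers "is the query expensive?". The explicit refresh method keeps the moment the stored value changes visible in the code, rather than hiding it inside a getter.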

And by the way, if you're never actually using "self" in a method,
maybe it should be a plain function, or at least a staticmethod (a
classmethod only makes sense if it needs the class itself).

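
For reference, a minimal sketch of the three kinds of method, using a hypothetical `ReportFrame` class: a regular method needs `self`, a classmethod needs the class (for example as an alternative constructor), and a staticmethod needs neither, so it could just as well be a module-level function.

```python
class ReportFrame:
    """Hypothetical class used only to show the three kinds of method."""

    currency = "EUR"

    def __init__(self, total):
        self.total = total

    def describe(self):
        # Regular method: reads per-instance state through self.
        return f"{self.total} {self.currency}"

    @classmethod
    def zero(cls):
        # Class method: needs the class (here, to build an instance), not an instance.
        return cls(0)

    @staticmethod
    def format_amount(amount):
        # Static method: uses neither self nor cls.
        return f"{amount:.2f}"


def format_amount(amount):
    """Module-level equivalent of ReportFrame.format_amount."""
    return f"{amount:.2f}"
```

Here `ReportFrame(5).describe()` gives `'5 EUR'`, `ReportFrame.zero().total` is 0, and both versions of `format_amount(3)` give `'3.00'`. Whether to keep such a helper on the class or at module level is mostly a question of where readers will look for it.
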
My hunch is the first way leads to spaghetti problems. But I want to understand what the best practices are in this regard. I have found in some cases the first method seemed handy, but I'm not sure what the best way of thinking about this is.

3) Generally, what other tools or approaches would you use to organize a good-sized project well, so as to avoid fighting yourself later when you don't understand your own code and the flow of information through it? By good-sized, I mean about 20,000 lines of Python code, or something like that.

Good architecture and a meaningful directory structure are good
enough. To navigate, I use Emacs + ack, and I'm very productive even
with projects bigger than that.
 
