Moving to an OOP model from a classically imperative one

tim.thelion

Hello,

I am currently writing a program called subuser (subuser.org), which is written as classically imperative code. Subuser is, essentially, a package manager. It installs and updates programs from repositories.

I have a set of source files https://github.com/subuser-security/subuser/tree/master/logic/subuserCommands/subuserlib which have functions in them. Each function does something to a program, which it identifies by the program's name. For example, I have an installProgram function defined as such:

def installProgram(programName, useCache):

Now I've run into a flaw in this model. There are certain situations where a "programName" is not a unique identifier. It is possible for two repositories to each have a program with the same name. Obviously, I could go through my code and replace all uses of the string "programName" with a tuple of (programName, repository). Or I could define a new class with two attributes, programName and repository, and pass such a simple object around, or pass a dictionary. However, I think this would be better solved by moving fully to an OOP model. That is, I would have a SubuserProgram class which had methods such as "install", "describe", "isInstalled"...

There is one problem though. Currently, I have these functions logically organized into source files, each between 40 and 170 LOC. I fear that if I were to put all of these functions into one class, then I would have a single, very large source file. I don't like working with large source files for practical reasons. If I am to define the class SubuserProgram in the file SubuserProgram.py, I do not want all of run.py <https://github.com/subuser-security/subuser/blob/master/logic/subuserCommands/subuserlib/run.py#L162> to be moved into that file as well.

I thought about keeping each method in a separate file, much as I do now, something like:

###################
#FileA.py
###################
def a(self):
    pass  # blah

###################
#FileB.py
###################
def b(self):
    pass  # blah

###################
#Class.py
###################
import FileA, FileB

class C:
    # Plain functions assigned in the class body become methods of C.
    a = FileA.a
    b = FileB.b

This works, but I find that it is hard to read. When I come across FileA, and I see "self" it just seems very confusing. I suffer a bout of "who-am-i"ism.

I asked on IRC and it was suggested that I use multiple classes; however, I see no logical way to separate a SubuserProgram object into multiple classes.

So I thought I would seek your advice.

Tim
 
Mark Lawrence

You're writing Python, not Java, so put your code into one file and stop
messing about.
 
Ian Kelly

I asked on IRC and it was suggested that I use multiple classes; however, I
see no logical way to separate a SubuserProgram object into multiple
classes.

You say you already have the methods logically separated into files. How
about adding one abstract class per file, and then letting SubuserProgram
inherit from each of those individual classes?
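
A rough sketch of what that could look like. The mixin names, the file names in the comments, and the method bodies are all made up for illustration; in a real layout each mixin would sit in its own file and be imported by the file that defines SubuserProgram.

```python
# Hypothetical sketch: each mixin would live in its own file
# (for example install.py and describe.py); shown together here.

class InstallMixin(object):
    """Would live in install.py (the file name is an assumption)."""

    def install(self, useCache=False):
        # Placeholder body standing in for the real installation logic.
        return "installing %s from %s (useCache=%s)" % (
            self.name, self.repository, useCache)


class DescribeMixin(object):
    """Would live in describe.py (the file name is an assumption)."""

    def describe(self):
        return "%s (repository: %s)" % (self.name, self.repository)


class SubuserProgram(InstallMixin, DescribeMixin):
    """Combines the per-file mixins and holds the shared data."""

    def __init__(self, name, repository):
        self.name = name
        self.repository = repository


program = SubuserProgram("firefox", "default")
print(program.describe())
print(program.install(useCache=True))
```

The mixins rely on attributes (name, repository) that only the combined class sets, so they are not usable on their own.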

As another alternative that you haven't mentioned yet, you could create a
decorator to be applied to each method, which will inject it into the class
at the time it is defined. The explicitness of the decorator solves your
confusion problem of "why does this function use self". It creates a couple
of new problems though: 1) reading the class won't tell you what methods it
contains; and 2) making sure the class is fully constructed before it is
used becomes tricky.
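
A minimal sketch of such a decorator, assuming a hypothetical helper called method_of (the name and the example method are not from subuser):

```python
def method_of(cls):
    """Return a decorator that attaches the decorated function to cls."""
    def decorator(func):
        # A plain function stored on a class behaves as a method of it.
        setattr(cls, func.__name__, func)
        return func
    return decorator


class SubuserProgram(object):
    def __init__(self, name, repository):
        self.name = name
        self.repository = repository


# In practice this function would live in a separate file that
# imports both SubuserProgram and method_of.
@method_of(SubuserProgram)
def describe(self):
    return "%s (repository: %s)" % (self.name, self.repository)


program = SubuserProgram("firefox", "default")
print(program.describe())
```

Here describe only exists on the class once the module containing the decorated function has been imported, which is the second problem mentioned above.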
 
Ethan Furman

There is one problem though. Currently, I have these functions logically
organized into source files, each between 40 and 170 LOC. I fear that if
I were to put all of these functions into one class, then I would have a
single, very large source file. I don't like working with large source
files for practical reasons.

I'm curious what these practical reasons are. One of my smallest source files has 870 lines in it, my largest nearly 9000.

If the problem is your editor, you should seriously consider switching.
 
MRAB

Now I've run into a flaw in this model. There are certain situations
where a "programName" is not a unique identifier. It is possible for
two repositories to each have a program with the same name.
Obviously, I could go through my code and replace all use of the
string "programName" with a tuple of (programName, repository). Or I
could define a new class with two attributes: programName and
repository, and pass such a simple object around, or pass a
dictionary. However, I think this would be better solved by moving
fully to an OOP model. That is, I would have a SubuserProgram class
which had methods such as "install", "describe", "isInstalled"...
[snip]
Could you make the program name unique just by combining it with the
repository name in a single string?
 
Chris Angelico

I'm curious what these practical reasons are. One of my smallest source files
has 870 lines in it, my largest nearly 9000.

If the problem is your editor, you should seriously consider switching.

It's probably not the case here, but one good reason for splitting a
file into pieces is to allow separate people or systems to update
different parts. Lots of Linux programs support either
/etc/foobar.conf or /etc/foobar.conf.d/ where the former is one file
and the latter is a directory of separate files, generally deemed to
be concatenated to the main config file. (Example:
/etc/apt/sources.list and /etc/apt/sources.list.d/ - the main config
for your Debian repositories, the directory for additional ones for
VirtualBox or PostgreSQL.) It's easier to allow someone to completely
overwrite a file than to try to merge changes.

But that's not often the case with source code.

ChrisA
 
Gregory Ewing

I think this would be better solved
by moving fully to an OOP model. That is, I would have a SubuserProgram
class which had methods such as "install", "describe", "isInstalled"...

This wouldn't necessarily be better. Don't be taken in by the
"everything is better as a class" kind of dogma that seems to
prevail in some circles.

If all you do is take a bunch of module-level functions and
put them into a single class, you haven't really changed anything.
It's still the same design, you've just arranged the source
differently.

There are a couple of good reasons for turning a function into
a method. One is if it embodies implementation details that you
want to keep separated from the rest of the program. But if
*all* of your code is inside the class, there isn't any "rest
of the program" to keep it separate from.

Another is if you want to be able to override it in subclasses.
If there were different kinds of SubuserProgram that needed to
be installed in different ways, for example, it would make
sense to structure this as a collection of classes with an
install() method. But even then, you don't have to put all
the installation code in the classes -- the methods could just
be stubs that call out to different module-level functions if
you wanted.

A reasonable compromise might be to keep the *data* associated
with a SubuserProgram in a class, maybe together with a few
methods that are tightly coupled to it, but have the major
pieces of functionality such as install() implemented by
separate functions that operate *on* the class, rather than
being inside it.

Currently, I have these functions logically
organized into source files, each between 40 and 170 LOC.

That's quite small as these things typically go. You can afford
to make them somewhat larger; I tend to find that files start to
get unwieldy around 1000 lines or so.
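
The compromise described above could be sketched roughly as follows. The attribute names, the identifier() method, and the placeholder body of installProgram are assumptions made for illustration, not subuser's actual code.

```python
class SubuserProgram(object):
    """Holds the data that identifies a program (names are assumptions)."""

    def __init__(self, name, repository):
        self.name = name
        self.repository = repository

    def identifier(self):
        # Tightly coupled to the data, so it stays on the class.
        return (self.repository, self.name)


# Would live in its own module, e.g. install.py (hypothetical).
def installProgram(program, useCache):
    # Placeholder body standing in for the real installation logic.
    return "installing %s from %s (useCache=%s)" % (
        program.name, program.repository, useCache)


program = SubuserProgram("firefox", "default")
print(installProgram(program, useCache=False))
```

The existing one-function-per-file layout can stay as it is; only the first argument changes from a name string to an object.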
 
Gregory Ewing

Ian said:
How about adding one abstract class per file, and then letting
SubuserProgram inherit from each of those individual classes?

I'd recommend against that kind of thing, because it makes
the code hard to follow. With module-level functions, you can
tell where any given function is coming from by looking at the
imports (as long as you haven't used 'import *', which is a
bad idea for this very reason).

But if you're looking at a method call on a class that
inherits from half a dozen base classes, it's hard to tell
which class it's implemented in.

In other words, massively multiple inheritance is the OO
equivalent of 'import *'.

The same goes for any scheme for injecting methods into a
class after defining it, only more so, because the reader
won't be expecting weird things like that.
 
Dave Angel

I don't really understand your problem or your examples, but
others apparently do. So I'll just make a few comments.

There is one problem though. Currently, I have these functions logically organized into source files, each between 40 and 170 LOC. I fear that if I were to put all of these functions into one class, then I would have a single, very large source file. I don't like working with large source files for practical reasons.

Definitely limit your source file size. 10k lines is probably a
good limit.

If I am to define the class SubuserProgram in the file SubuserProgram.py,

That's a big mistake right there. Never let the module name match
the class name. If you really only happen to have a single class
in the file, then just use lower case for the filename.
 
tim.thelion

[snip]
Could you make the program name unique just by combining it with the
repository name in a single string?

In my case I cannot. But there is a larger reason why I wouldn't do this: it would mean adding a special character that could not be included in the repository name. That is, if I were to create a "unique-program-name" string of the format repo+"-"+programName, then the repo name could not have the "-" symbol in it.

Tim
 
tim.thelion

I'm curious what these practical reasons are. One of my smallest source files has 870 lines in it, my largest nearly 9000.
If the problem is your editor, you should seriously consider switching.

I think that the main reasons for doing so are as follows:

git status provides much more useful output when the source files are separate. If all your code is in one file, then git status tells you nothing about what has changed, yet git diff will provide you with overly verbose output.

The second reason is that when refactoring code, it is much easier to manage a todo list when one knows which files have been processed already and which still need to be processed. This is especially true if the refactor will take several days or be done by several people.

Tim
 
Chris Angelico

[snip]

Could you make the program name unique just by combining it with the
repository name in a single string?

In my case I cannot. But there is a larger reason why I wouldn't do this: It would mean adding a special character that could not be included in the repository name, that is, if I were to create a "unique-program-name" string which was of the format repo+"-"+programName then the repo name could not have the "-" symbol in it.

Can you, then, simply substitute a tuple for the string? Instead of
comparing program name strings, compare tuples of (repo, program_name)
- it should work just fine.

ChrisA
 
Steven D'Aprano

[snip]

Could you make the program name unique just by combining it with the
repository name in a single string?

In my case I cannot. But there is a larger reason why I wouldn't do
this: It would mean adding a special character that could not be
included in the repository name,

Do you support \n or \r or \0 in repo names? If not, then there you go,
three special characters to choose from.

But I suspect that a tuple of (repo_name, program_name) will be better.
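
A small sketch of the difference, assuming made-up repository and program names. ProgramId is a hypothetical name; a plain tuple would work the same way.

```python
from collections import namedtuple

# Joining with a separator is ambiguous when the separator can
# also appear inside a name: both sides give "my-repo-vim".
assert "my-repo" + "-" + "vim" == "my" + "-" + "repo-vim"

# Tuples keep the two parts separate, so the same names do not collide.
assert ("my-repo", "vim") != ("my", "repo-vim")

# A namedtuple is still a tuple but gives the fields readable names.
ProgramId = namedtuple("ProgramId", ["repo_name", "program_name"])

installed = {}  # maps ProgramId -> version; the contents are made up
installed[ProgramId("repo-a", "vim")] = "1.0"
installed[ProgramId("repo-b", "vim")] = "2.0"

print(installed[ProgramId("repo-a", "vim")])
print(ProgramId("repo-b", "vim") == ("repo-b", "vim"))
```

Because tuples are hashable and compare element by element, they can replace the name string as a dictionary key or in equality checks without further changes.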
 
tim.thelion

A reasonable compromise might be to keep the *data* associated
with a SubuserProgram in a class, maybe together with a few
methods that are tightly coupled to it, but have the major
pieces of functionality such as install() implemented by
separate functions that operate *on* the class, rather than
being inside it.

I think this is sound advice. I'm still not sure what I'll come up with.

One of the other reasons why an OOP model might be right for me is caching. I currently load a lot of attributes regarding programs from disk, and I do so multiple times. I could either pass those attributes around or, using a class, store those attributes in the object after loading them just once. I have no experience with OOP except in the domain of GUIs (where it seems inescapable, since all major toolkits use OOP), so I'm not yet sure how this will turn out.
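
A rough sketch of the load-once idea. The loader argument and fake_loader are hypothetical stand-ins for whatever actually reads the attributes from disk.

```python
class SubuserProgram(object):
    """Loads attributes at most once per object (sketch only)."""

    def __init__(self, name, repository, loader):
        self.name = name
        self.repository = repository
        self._loader = loader      # callable that does the expensive read
        self._attributes = None    # filled in on first access

    @property
    def attributes(self):
        if self._attributes is None:
            self._attributes = self._loader(self.repository, self.name)
        return self._attributes


load_count = [0]

def fake_loader(repository, name):
    # Stand-in for reading from disk; counts how often it is called.
    load_count[0] += 1
    return {"description": "%s from %s" % (name, repository)}


program = SubuserProgram("firefox", "default", fake_loader)
program.attributes
program.attributes
print(load_count[0])
```

The cache lives as long as the object does, so it would go stale if the files on disk change while the object is still in use.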

Tim
 
