data design

Imbaud Pierre

The applications I write are made of, let's say, algorithms and data.
I mean constant data, dicts, tables, etc.: to keep algorithms simple,
describe what is peculiar, data dependent, as data rather than "case
statements". These could be called configuration data.

The lazy way to do this: have modules that initialize bunches of
objects, attributes holding the data: the object is somehow the row of
the "table", attribute names being the columns. This is the way I have
proceeded up to now.
Data entered this way are almost "configuration data", with 2 big
drawbacks:
- Only a Python programmer can fix the file: this can't be called a
  configuration file.
- Even for the author, these data aren't easy to maintain.

I feel pretty much ready to change this:
- make these data true text data, easier to read and fix.
- write the module that will make Python objects out of these data:
  the extra cost should yield ease of use.

2 questions arise:
- which kind of text data?
  - csv: OK for simple attributes, not easy for lists or complex
    data.
  - xml: the form won't be easier to read than Python code,
    but an xml editor could be used, and a formal description
    of what is expected (a DTD or schema) can be used.
- how can I make the data-to-object transformation both easy, and able
  to spot errors in text data?

Last, but not least: is there a python lib implementing at least part
of this dream?
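For the second question, one stdlib-only approach for simple one-value-per-column data: read the text with csv.DictReader, turn each row into a small typed record, and report the line of any bad value instead of letting an obscure exception escape. This is a sketch under those assumptions; the `Product` record, its fields, `DataError` and the sample data are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical record type: one CSV row becomes one object.
    name: str
    price: float
    stock: int


class DataError(Exception):
    """Raised with a readable message pointing at the faulty line."""


def load_products(text):
    reader = csv.DictReader(io.StringIO(text))
    products = []
    for row in reader:
        # line_num is the physical line just read (the header is line 1).
        line = reader.line_num
        try:
            products.append(Product(name=row["name"].strip(),
                                    price=float(row["price"]),
                                    stock=int(row["stock"])))
        except (KeyError, TypeError, ValueError, AttributeError) as exc:
            # Missing column, short row, or unconvertible value.
            raise DataError(f"line {line}: bad row {row!r} ({exc})") from exc
    return products


SAMPLE = """name,price,stock
widget,2.50,10
gadget,7.25,3
"""

print(load_products(SAMPLE))
```

The conversion step doubles as the error check: every column is converted explicitly, so a typo in the text file surfaces as a message naming the line rather than as a failure far away in the algorithm.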
 
Larry Bates

Use the configurations module. It was built to provide a way to parse
configuration files that provide configuration data to a program. It is
VERY fast so the overhead to parse even thousands of lines of config
data is extremely small. I use it a LOT and it is very flexible and
the format of the files is easy for users/programmers to work with.

-Larry Bates
 
Szabolcs Nagy

There is a csv parser and multiple xml parsers in Python (e.g.
xml.etree); there is also a ConfigParser module (able to parse .ini-like
config files).

I personally like a Python module as the config file the most.

E.g. if you need a bunch of key-value pairs or lists of data:
* Python's syntax is pretty nice (dicts, tuples and lists, or just
key=value)
* xml is absolutely out of the question
* csv is very limited
* an .ini-like config file for more complex stuff is not bad, but then you
can use .py as well.
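If the configuration stays in Python syntax, a file holding only literals can be loaded without executing it as code. This sketch swaps a plain `import` for `ast.literal_eval`, which accepts only literal syntax (strings, numbers, tuples, lists, dicts, sets, booleans, None), so a stray name or function call is rejected instead of run. The config text and its keys are made up for illustration.

```python
import ast

# Hypothetical config text: a single dict literal, as a user might edit it.
CONFIG_TEXT = """
{
    "copy_files": [
        {"files": ["this.file", "that.file"], "path": "/some/path"},
        {"files": ["the_other.file", "yet_another.file"],
         "path": "/some/other/path"},
    ],
    "verbose": True,
}
"""


def load_literal_config(text):
    # literal_eval parses the text but refuses names and function calls,
    # so nothing in the file gets executed.
    data = ast.literal_eval(text.strip())
    if not isinstance(data, dict):
        raise ValueError("config must be a dict literal")
    return data


config = load_literal_config(CONFIG_TEXT)
for entry in config["copy_files"]:
    print(entry["path"], entry["files"])
```

This keeps Python's syntax but not its power: the editor still has to quote strings, which is exactly the cost Pierre objects to for non-programmers.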
 
Imbaud Pierre

Szabolcs Nagy wrote:
there is a csv parser and multiple xml parsers in python (eg
xml.etree)
I've used both. Both are OK, but they only provide low-level parsing.
also there is a ConfigParser module (able to parse .ini
like config files)
I used this years ago and had forgotten it. Another fine text data format.
i personally like the python module as config file the most

eg if you need a bunch of key-value pairs or lists of data:
* python's syntax is pretty nice (dict, tuples and lists or just
key=value)
But only editable by Python programmers!
* xml is absolutely out of question
* csv is very limited
* .ini like config file for more complex stuff is not bad but then you
can use .py as well.

Thanks a lot for your advice.
 
Imbaud Pierre

Larry Bates wrote:
Use the configurations module.
<snip>
Do you mean ConfigParser? Otherwise, please be more specific (if you don't mind...)
 
Larry Bates

Imbaud said:
Larry Bates wrote:
Do you mean ConfigParser? Otherwise, please be more specific (if you don't mind...)

Sorry, yes, I meant the ConfigParser module. Had a little "brain disconnect"
there.

-Larry
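For reference, a minimal sketch of reading an .ini file with the module Larry means. In Python 3 it is spelled `configparser` (`ConfigParser` is the Python 2 name used in this thread). The section and option names below are hypothetical, and the text is read from a string only to keep the example self-contained.

```python
import configparser

# Hypothetical .ini content; in practice you would call parser.read("app.ini").
INI_TEXT = """
[database]
host = localhost
port = 5432

[paths]
data_dir = /var/lib/app
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

for section in parser.sections():
    # Each section behaves like a dict of string options.
    for key, value in parser[section].items():
        print(f"{section}.{key} = {value}")
```

All values come back as strings; turning them into numbers, lists or objects is left to the calling code.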
 
Paddy

Google for YAML and JSON formats too.
http://www.yaml.org/
http://www.json.org/

-Paddy
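Of the two formats Paddy mentions, JSON is handled by the standard library's `json` module. A sketch assuming a copy_files structure that groups a list of files with a target path (the same shape James Stroud uses later in the thread); the data itself is illustrative.

```python
import json

# Hypothetical config: each entry groups a list of files with a target path.
CONFIG_TEXT = """
{
  "copy_files": [
    {"files": ["this.file", "that.file"], "path": "/some/path"},
    {"files": ["the_other.file", "yet_another.file"], "path": "/some/other/path"}
  ]
}
"""

try:
    config = json.loads(CONFIG_TEXT)
except json.JSONDecodeError as exc:
    # The exception carries the line and column of the syntax error.
    raise SystemExit(f"config error at line {exc.lineno}, column {exc.colno}: {exc.msg}")

for entry in config["copy_files"]:
    print(entry["path"], "<-", ", ".join(entry["files"]))
```

Note that JSON has no comment syntax and rejects trailing commas, which matters for files edited by hand.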
 
Szabolcs Nagy

Hurray for yaml! A perfect fit for my need! And a swell tool!
Thanks a lot!

I warn you against yaml:
it looks nice, but the underlying format is, IMHO, too complex (just
look at their spec).

You said you don't want Python source because that's too complex for
the users.
I must say that yaml is not easier to use than Python data structures.

If you want user-friendly config files, then ConfigParser is the way to
go.

If you want something really simple and fast, then I'd recommend Lisp
s-expressions.

Also, here is an indentation-based, xml-like tree/hierarchical data
structure syntax:
http://www.scottsweeney.com/projects/slip/
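To show what the s-expression suggestion involves on the Python side, here is a deliberately tiny reader: it splits the text into parentheses and atoms and builds nested lists. It is a sketch only; it handles no quoted strings, comments or escapes, keeps every atom as a string, and the sample data is invented.

```python
def tokenize(text):
    # Pad parentheses with spaces so split() separates them from atoms.
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Consume tokens from the front of the list and return one expression."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens and tokens[0] != ")":
            expr.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # drop the closing ')'
        return expr
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return token  # atoms stay strings; convert types in a later step


def read_sexpr(text):
    tokens = tokenize(text)
    expr = parse(tokens)
    if tokens:
        raise SyntaxError(f"trailing tokens: {tokens}")
    return expr


CONFIG = """
(copy_files
  ((files this.file that.file) (path /some/path))
  ((files the_other.file yet_another.file) (path /some/other/path)))
"""

print(read_sexpr(CONFIG))
```

The result is plain nested lists, so a second step is still needed to map them onto the objects the application uses.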
 
James Stroud

I've been spending the last 2 days weighing ConfigParser and yaml, with
much thought and re-organizing of each file type. The underlying
difference is that, conceptually, ini files are an absurdly limited
subset of yaml in that ini files are basically limited to a map of a map.

For instance, I have a copy_files section of a configuration. In order
to know what goes with what, you have to resort to gymnastics with the
option names:

[copy_files]
files_dir1 = this.file that.file
path_dir1 = /some/path

files_dir2 = the_other.file yet_another.file
path_dir2 = /some/other/path

In yaml, it might look thus.

copy_files :
  - files : [this.file, that.file]
    path : /some/path
  - files : [the_other.file, yet_another.file]
    path : /some/other/path

Both are readable (though I like equals signs in appearance over
colons), but yaml doesn't require a lot of string processing to group
the files with the paths. I don't even want to think about the coding
gymnastics required to split all of the option names and then group
those with common suffixes.
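For comparison, this is roughly what the "gymnastics" look like: grouping the `files_dirN` and `path_dirN` options of one section by their shared suffix. A sketch assuming every option is named `<kind>_<suffix>` and every group has both a `files` and a `path` option.

```python
import configparser
from collections import defaultdict

INI_TEXT = """
[copy_files]
files_dir1 = this.file that.file
path_dir1 = /some/path

files_dir2 = the_other.file yet_another.file
path_dir2 = /some/other/path
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

groups = defaultdict(dict)
for option, value in parser["copy_files"].items():
    # Split "files_dir1" into kind="files", suffix="dir1".
    kind, _, suffix = option.partition("_")
    if not suffix:
        raise ValueError(f"option {option!r} does not follow <kind>_<suffix>")
    groups[suffix][kind] = value

# Sorted lexically by suffix, so "dir10" would come before "dir2".
copy_jobs = [
    {"path": g["path"], "files": g["files"].split()}
    for suffix, g in sorted(groups.items())
]
print(copy_jobs)
```

It is not much code, but the grouping rule lives in the program rather than in the file format, which is James's objection.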

Now if the config file were for copying only, ini would be okay, because
one could just have sections that group paths and dirs:

[dir1]
files = this.file, that.file
path = /some/path

[dir2]
....
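The one-section-per-group layout above needs no name splitting: each `[dirN]` section already ties its path to its files. A sketch assuming every section has exactly a `files` and a `path` option, with `files` comma-separated as in James's example.

```python
import configparser

INI_TEXT = """
[dir1]
files = this.file, that.file
path = /some/path

[dir2]
files = the_other.file, yet_another.file
path = /some/other/path
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

copy_jobs = []
for name in parser.sections():
    section = parser[name]
    # Split the comma-separated list and drop surrounding whitespace.
    files = [f.strip() for f in section["files"].split(",") if f.strip()]
    copy_jobs.append({"path": section["path"], "files": files})

print(copy_jobs)
```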

But if you need different kinds of sections, you have outgrown ini.

In essence, ini is limited to a single dictionary of dictionaries, while
yaml can express pretty much arbitrary complexity.

James
 
skam

Google for YAML and JSON formats too

YAML and JSON are good when used as data-interchange formats, not as
configuration files.
These formats are too complex for non-programmers, so they will ask
for help with every edit ;)

I suggest ini-like files, parsed using ConfigParser, but you should
have a look at ConfigObj, which has automatic type conversion and
other interesting features
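ConfigObj is a third-party package, so it is not shown here. As a stdlib point of comparison, `configparser` also offers per-call type conversion through `getint`, `getfloat` and `getboolean`, which raise `ValueError` on bad input. The section and option names are hypothetical.

```python
import configparser

INI_TEXT = """
[server]
port = 8080
timeout = 2.5
debug = yes
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

server = parser["server"]
port = server.getint("port")          # "8080" -> 8080
timeout = server.getfloat("timeout")  # "2.5"  -> 2.5
# getboolean accepts yes/no, on/off, true/false and 1/0 (case-insensitive).
debug = server.getboolean("debug")
# The fallback is used when the option is missing.
retries = server.getint("retries", fallback=3)

print(port, timeout, debug, retries)
```

Unlike ConfigObj's schema-driven conversion, here the program decides the type at each lookup.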
 
Imbaud Pierre

James Stroud wrote:
I feel both thankful and sorry for your warning. I'm not convinced
yet, but I'll be cautious.

Complex indeed, but really powerful.
Is it not true that if I used yaml, sticking to what .ini allows,
yaml files would be simple?

Easier to read and write, you must agree.
Surrounding strings with quotes is a Python requirement, to distinguish
them from identifiers. This just makes data input for Python somewhat
clumsy.
Granted, it's a new format to learn. But sharing this format with a
much wider community than Python, isn't this worth the effort?
(Well, if yaml succeeds and spreads...)

Granted, for END users. I rather target administrators, programmers,
integrators: making customization an easy process, and allowing this
customization to go much further than changing simple values, isn't
this the REAL challenge for new applications?

Lisp is more powerful than Python. Its syntax deterred many
programmers, who adopted Python; it will deter my targeted
"customizers". And as for translating to Python structures, I
have no idea how; it involves a Python or Lisp translator...

Also, here is an indentation-based, xml-like tree/hierarchical data
structure syntax:
http://www.scottsweeney.com/projects/slip/
Pretty nice, too! James, have a look at this!

I've been spending the last 2 days weighing ConfigParser and yaml, with
much thought and re-organizing of each file type. The underlying
difference is that, conceptually, ini files are an absurdly limited
subset of yaml in that ini files are basically limited to a map of a map.
You have a point here.

In essence, ini is limited to a single dictionary of dictionaries, while
yaml can express pretty much arbitrary complexity.
James, this single formula makes things really clear.
As we both work on the subject, maybe we could continue to exchange
ideas and information?
Have a look at the link Szabolcs Nagy gives:
http://www.scottsweeney.com/projects/slip/
I'll dig further into yaml, with 2 questions:
- how do I translate it to Python?
- how do I express and/or enforce rules the data should follow?
(avoiding the classic case where a configuration data error raises some obscure
exception).
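On the second question, one low-tech approach is to check the loaded structure against a small hand-written description of what is expected, and collect messages that name the offending path. This sketch works on the plain dicts and lists that yaml, json or a Python literal would give; the schema format (a type, a one-element list, or a dict of keys) is invented for illustration and is not a standard.

```python
# Hypothetical schema: a type, a [schema] list, or a {key: schema} dict.
SCHEMA = {
    "copy_files": [
        {"files": [str], "path": str},
    ],
}


def validate(data, schema, where="config"):
    """Return a list of error messages; an empty list means the data fits."""
    errors = []
    if isinstance(schema, dict):
        if not isinstance(data, dict):
            return [f"{where}: expected a mapping, got {type(data).__name__}"]
        for key, sub_schema in schema.items():
            if key not in data:
                errors.append(f"{where}: missing key {key!r}")
            else:
                errors.extend(validate(data[key], sub_schema, f"{where}.{key}"))
        for key in sorted(data.keys() - schema.keys()):
            errors.append(f"{where}: unknown key {key!r}")
    elif isinstance(schema, list):
        if not isinstance(data, list):
            return [f"{where}: expected a list, got {type(data).__name__}"]
        for i, item in enumerate(data):
            errors.extend(validate(item, schema[0], f"{where}[{i}]"))
    elif not isinstance(data, schema):
        errors.append(f"{where}: expected {schema.__name__}, got {type(data).__name__}")
    return errors


good = {"copy_files": [{"files": ["a.txt"], "path": "/tmp"}]}
bad = {"copy_files": [{"files": "a.txt", "pth": "/tmp"}]}

print(validate(good, SCHEMA))
for message in validate(bad, SCHEMA):
    print(message)
```

Collecting all messages instead of stopping at the first one lets a user fix several mistakes in one pass.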

Big thanks to Szabolcs Nagy (Hungarian, my friend? I love this
country): although I seem to disagree, your statements are pretty
clear and helpful, and... maybe you are right, and I am a fool...
Pierre
 
BJörn Lindqvist

[copy_files]
files_dir1 = this.file that.file
path_dir1 = /some/path

files_dir2 = the_other.file yet_another.file
path_dir2 = /some/other/path

In yaml, it might look thus.

copy_files :
  - files : [this.file, that.file]
    path : /some/path
  - files : [the_other.file, yet_another.file]
    path : /some/other/path

Both are readable (though I like equals signs in appearance over
colons), but yaml doesn't require a lot of string processing to group
the files with the paths. I don't even want to think the coding
gymnastics required to split all of the option names and then group
those with common suffixes.

But isn't that a perfect-world example? Consider:

[copy_files]
files_dir1=this.file that.file
path_dir1=/some/path
files_dir2=the_other.file yet_another.file
path_dir2=/some/other/path

versus:

copy_files:
-files:[this.file,that.file]
path:/some/path
-files:[the_other.file,yet_another.file]
path:/some/other/path

Mandatory indentation is good in programming languages, but does it
really belong in configuration files? With tabs verboten to boot.
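Björn's compact ini form is indeed equivalent to the spaced one: `configparser` strips whitespace around the delimiter, so both spellings give the same values. A quick sketch to check this for the ini side only.

```python
import configparser

SPACED = """
[copy_files]
files_dir1 = this.file that.file
path_dir1 = /some/path
"""

COMPACT = """
[copy_files]
files_dir1=this.file that.file
path_dir1=/some/path
"""


def as_dict(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Convert the section proxy into a plain dict for easy comparison.
    return dict(parser["copy_files"])


print(as_dict(SPACED) == as_dict(COMPACT))
```

Ini is not entirely indentation-free either: a line starting with whitespace is read as a continuation of the previous value.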
 
James Stroud

I'm not sure whether to agree with you or disagree with you. My
conclusion is that if it is at all possible, try to use an ini file,
even if you have to stretch your imagination a bit. More complex formats
are prone to one's assigning some imperative meaning to the structure
(as I am doing with my example, which might make it a bad one). However,
these more complex formats can be intensely useful for (1) knowledgeable
people with (2) complicated data.
 
Jussi Salmela

James Stroud wrote:
<snip>

For instance, I have a copy_files section of a configuration. In order
to know what goes with what you have to resort to gymnastics with the
option names

[copy_files]
files_dir1 = this.file that.file
path_dir1 = /some/path

files_dir2 = the_other.file yet_another.file
path_dir2 = /some/other/path

<snip>
James

You don't have to. With a config file:

###
[copy_files]
/some/path = this.file that.file
C:\a windows\path with spaces= one.1 two.two
	a_continuation_line_starting_with_a_tab.xyz
   and_another_starting_with_a_some_spaces.abc
/some/other/path = the_other.file yet_another.file
###

the following program:

###
#!/usr/bin/python
# Python 2 (ConfigParser module, print statement)

import ConfigParser

config = ConfigParser.ConfigParser()
config.readfp(open(r'ConfigTest.INI'))
opts = config.options('copy_files')
print opts
print 'Files to be copied:'
for opt in opts:
    path = opt
    optVal = config.get('copy_files', opt)
    #print opt, optVal
    fileNames = optVal.split()
    ### The following lines are only needed for Windows
    ### because the use of ':' in Windows file names'
    ### device part clashes with its use in ConfigParser
    pathParts = ''
    for ind in range(len(fileNames)):
        # The token ending in ':' or '=' closes the rest of the path
        if fileNames[ind][-1] in ':=':
            path += ':' + pathParts + fileNames[ind][:-1]
            del fileNames[:ind+1]
            break
        pathParts += fileNames[ind] + ' '
    ### Windows dependent section ends
    print ' Path:', '>' + path + '<'
    for fn in fileNames:
        print ' >' + fn + '<'
###

produces the following output:

###
['c', '/some/other/path', '/some/path']
Files to be copied:
 Path: >c:\a windows\path with spaces<
 >one.1<
 >two.two<
 >a_continuation_line_starting_with_a_tab.xyz<
 >and_another_starting_with_a_some_spaces.abc<
 Path: >/some/other/path<
 >the_other.file<
 >yet_another.file<
 Path: >/some/path<
 >this.file<
 >that.file<

###

Cheers,
Jussi
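In Python 3 the same idea can be written without the Windows workaround, because `configparser` can be told to split only on `=` (so the `:` of a drive letter no longer clashes) and to keep option names case-sensitive. A sketch assuming Python 3 and the ConfigTest.INI layout above, read here from a string to stay self-contained.

```python
import configparser

INI_TEXT = """
[copy_files]
/some/path = this.file that.file
C:\\a windows\\path with spaces= one.1 two.two
\ta_continuation_line_starting_with_a_tab.xyz
   and_another_starting_with_a_some_spaces.abc
/some/other/path = the_other.file yet_another.file
"""

# Only '=' separates option from value, so 'C:' stays part of the path.
# interpolation=None stops '%' in values from being treated specially.
parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
parser.optionxform = str  # keep option names (the paths) case-sensitive
parser.read_string(INI_TEXT)

print("Files to be copied:")
for path, value in parser["copy_files"].items():
    print(" Path:", ">" + path + "<")
    # Continuation lines are joined with newlines; split() handles both.
    for file_name in value.split():
        print("  >" + file_name + "<")
```

Python 3's parser also keeps sections and options in file order, so the paths come out in the order they were written.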
 
