Fried Egg
Hi Pythonist(a/o)s
I am interested if anyone can shed any light on a web application
problem, both in the specific details (see below) and in the
theory of how to do ad hoc data processing and exploration through a
web interface (a tall order, I think). It is apropos of my job, if you
hadn't guessed, but the problem (and the solution) might be of more
general interest. I am going to describe the problem as well as I can
below. Any comments are appreciated, including clarification on the
description.
Goal: a web app that (1) takes as input a piece of text such as a novel
and stores it in a class instance, (2) derives transition probabilities
for each word, each two-word string, etc, stores those in a class
instance, (3) creates simulated texts by using these transition
probabilities, stores simulations in class instances, (4) presents
instances of all of the three classes nicely, including histograms or
other graphics, tabular counts of words, etc, (5) organizes each
instance created in a table-of-contents type layout.
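To make steps (2) and (3) concrete, here is a minimal sketch of deriving word transition probabilities and generating a simulated text from them. The function names (transition_probabilities, simulate) and the "order" parameter are illustrative assumptions, not part of any existing code, and splitting on whitespace is a deliberately crude tokenizer.

```python
import random
from collections import Counter, defaultdict


def transition_probabilities(text, order=1):
    """Map each run of `order` words to the probabilities of the next word."""
    words = text.split()  # crude tokenizer: whitespace only (an assumption)
    counts = defaultdict(Counter)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        counts[state][words[i + order]] += 1
    return {
        state: {w: n / sum(followers.values()) for w, n in followers.items()}
        for state, followers in counts.items()
    }


def simulate(probs, length, seed=None):
    """Generate up to `length` words by walking the transition table."""
    if not probs:
        return ""
    rng = random.Random(seed)
    state = rng.choice(sorted(probs))  # random starting state
    out = list(state)
    while len(out) < length:
        followers = probs.get(state)
        if not followers:  # dead end: this state was never followed by anything
            break
        candidates = list(followers)
        weights = [followers[w] for w in candidates]
        out.append(rng.choices(candidates, weights=weights)[0])
        state = tuple(out[-len(state):])
    return " ".join(out[:length])
```

Calling transition_probabilities(novel, order=2) would give the two-word-string variant, and each returned dictionary or simulated string is the kind of thing that would be wrapped in a class instance and stored.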
Notes on the goal:
(1) Each of these phases in the data processing is worth storing and
viewing later--hence TOC.
(2) The sequence of operations might branch: we might input the text,
but then calculate three different transition rate objects, then
calculate innumerable simulated texts from each of these. Additionally,
we might implement different phases such that you can jump from (2) to
(7) without anything between.
(3) The design should be flexible enough that if there is a new phase
invented, it should be reasonable to add it later without major code
surgery, preferably by doing an insert into a database.
(4) It would be nice to have a way to deal with the versions of the
data objects ("phases" above); e.g. someone inputs texts, I upgrade the
software, then their input data is no longer processable--need to do
something.... An "upgrade" method in the object?..
(5) We need to have authentication, so users can only interact with
their datasets.
(6) The phases can almost be thought of as "types" or OO classes, and I
will model them as classes. If you were doing this from a command
line, the phases would be either intermediate files or processes in a
pipeline.
(7) It is worth thinking about enforcing sequences of transformations,
transformations that take several instances to create a new single
instance of something else, and transactioning.
(8) We need to keep track and display (somehow) derivations of data, so
that if you want to grab all the simulated texts derived from a given
transition rate instance you can do that, or if you want to get the
upstream processes you can do that too.
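Notes (5), (7) and (8) can be sketched together as a small store that records who owns each instance and which instances it was derived from. This is only a rough illustration under stated assumptions: it uses the standard-library sqlite3 module as a stand-in for a real database server, the table and function names (instances, derivations, save, related) are hypothetical, and pickle is used for brevity even though unpickling data from untrusted sources is unsafe.

```python
import pickle
import sqlite3

SCHEMA = """
CREATE TABLE instances (
    id      INTEGER PRIMARY KEY,
    owner   TEXT NOT NULL,
    phase   TEXT NOT NULL,
    payload BLOB NOT NULL
);
CREATE TABLE derivations (
    parent_id INTEGER NOT NULL REFERENCES instances(id),
    child_id  INTEGER NOT NULL REFERENCES instances(id)
);
"""


def open_store():
    """Create an in-memory store; a real deployment would use a server database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def save(conn, owner, phase, obj, parents=()):
    """Store one instance and its derivation links in a single transaction."""
    with conn:  # commits on success, rolls back if anything raises
        cur = conn.execute(
            "INSERT INTO instances (owner, phase, payload) VALUES (?, ?, ?)",
            (owner, phase, pickle.dumps(obj)),
        )
        new_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO derivations (parent_id, child_id) VALUES (?, ?)",
            [(p, new_id) for p in parents],  # several parents are allowed
        )
    return new_id


def related(conn, owner, start_id, direction="down"):
    """List (id, phase) of everything derived from ("down") or feeding ("up") an instance."""
    if direction == "down":
        src, dst = "parent_id", "child_id"
    elif direction == "up":
        src, dst = "child_id", "parent_id"
    else:
        raise ValueError("direction must be 'up' or 'down'")
    query = f"""
        WITH RECURSIVE walk(id) AS (
            SELECT {dst} FROM derivations WHERE {src} = ?
            UNION
            SELECT d.{dst} FROM derivations d JOIN walk w ON d.{src} = w.id
        )
        SELECT i.id, i.phase FROM instances i JOIN walk w ON i.id = w.id
        WHERE i.owner = ? ORDER BY i.id
    """
    return conn.execute(query, (start_id, owner)).fetchall()
```

With this layout, "all simulated texts derived from a given transition rate instance" is related(conn, owner, rates_id, "down"), and the upstream chain is the same call with "up". Filtering on owner in the query is one simple way to keep users inside their own datasets.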
Notes on ideas for the architecture, etc:
(1) My language of choice is Python, including PSP (the mod_python
answer to PHP) for interface work, database postgresql, development
platform gentoo linux.
(2) I am thinking of storing each of the above phases (e.g. word
transition stats object) as a Python object, but in a database in order
to link it with its owner and the preceding data that generated it
(e.g. a stochastic projection would need to be associated with the
original rates entered).
(3) I imagine an interface that kind of looks like google mail (or any
number of other mail programs): the left sidebar contains a list of the
various classes; click on a class name, and in the main portion you get
a list of instances of those classes (where your email messages would
be in gmail) and a list of operations across the top, like delete
(where save, archive, etc would be). If you click on an instance in
the main list, it would display nicely, including graphs etc; it would
also have a list of operations along the top to transform it into other
classes. Above all this would be a global command bar, with commands
like "logout" etc.
Select list of all instances of class 1:
------------------------------------------------------------
|                        global bar                        |
------------------------------------------------------------
|*class 1* | operations: on list of instances of class 1   |
|          |------------------------------------------------
| class 2  | instance 1, class 1                           |
|          |------------------------------------------------
| class 3  | instance 2, class 1                           |
|          |------------------------------------------------
|          | instance 3, class 1                           |
------------------------------------------------------------
Select one instance of class 1:
------------------------------------------------------------
|                        global bar                        |
------------------------------------------------------------
|*class 1* | operations: on *instance1* of class 1         |
|          |------------------------------------------------
| class 2  | instance 1, class 1, bunch of stuff           |
|          |                      would be text,           |
| class 3  |                      graphics                 |
|          |                      forms                    |
|          |                      whatever                 |
------------------------------------------------------------
(4) I would like something such that you can either inherit or define
a few methods on your objects and incorporate them into the whole
thing. The framework should apply to all transformation of datasets,
from astronomical stuff to population data to whatever.
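One possible shape for point (4), sketched under assumptions: a base class that every phase inherits from, a registry of transformations between phases (which is what the "operations" bar in the mock-ups would list), and an upgrade hook for instances saved by older code. All names here (Phase, transform, apply_transform, render, upgrade) are hypothetical and not taken from any existing framework.

```python
class Phase:
    """Base class for every kind of stored data object."""

    registry = {}    # phase name -> class, filled in automatically
    transforms = {}  # (source name, target name) -> function
    version = 1      # bump in a subclass when its data layout changes

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Phase.registry[cls.__name__] = cls

    def __init__(self, data, version=None):
        self.data = data
        self.version = type(self).version if version is None else version

    def render(self):
        """Default display; subclasses would override with tables, graphs, etc."""
        return f"{type(self).__name__}: {self.data!r}"

    def upgrade(self):
        """Bring an old instance up to the current version; override to convert data."""
        self.version = type(self).version
        return self

    @classmethod
    def operations(cls):
        """Names of the phases this phase can be transformed into."""
        return sorted(t for (s, t) in Phase.transforms if s == cls.__name__)


def transform(source, target):
    """Decorator registering a function that turns a `source` into a `target`."""
    def register(func):
        Phase.transforms[(source.__name__, target.__name__)] = func
        return func
    return register


def apply_transform(instance, target_name):
    """Look up and run the registered transformation for this instance."""
    func = Phase.transforms[(type(instance).__name__, target_name)]
    return func(instance)


class Text(Phase):
    pass


class WordCounts(Phase):
    pass


@transform(Text, WordCounts)
def count_words(text):
    counts = {}
    for word in text.data.split():
        counts[word] = counts.get(word, 0) + 1
    return WordCounts(counts)
```

Adding a new phase is then a new subclass plus one or more decorated functions; Text.operations() reports what can be derived from a Text, and apply_transform(Text("a b a"), "WordCounts") runs the derivation. Nothing in the sketch is specific to text, so the same pattern would apply to population or astronomical datasets.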
Main Questions:
(1) Has anybody done something general enough for me to use? I
haven't used a framework before, but most of them seemed geared to
content delivery, which is not the goal here. If there is one that
seems appropriate, please tell me its name and why it is appropriate.
If there are some that seem like they might help if not answer
completely, please tell me why too.
(2) Are there any good places to discuss this in cyberland?
(3) Would anybody else be interested in working on it if it were
general enough to meet their needs too? I could provide hosting and
coordination, and it would be freely available, etc.
(4) Do I sound like a crazy person? Sorry to post such a long thing to
a newsgroup I am not a regular on, but I am a little desperate.
Thanks all.