Python good for data mining?


Dennis Lee Bieber

The 20th century perspective found it more flexible to base
everything on set theory (or category theory or similar)
which is fundamentally relational. Historically
hierarchical/network databases preceded RDBMSs because they
are fundamentally more efficient. Unfortunately, they are
also fundamentally more inflexible (it is generally agreed).
<heh> My college database textbook covered the subject with (in
order) a chapter on Hierarchical, then Network, and then Relational (as
a theoretical model).

The next edition of the textbook started with Relational, and
treated Hierarchical and Network as historical artifacts.
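A toy sketch of the flexibility trade-off described in the quoted passage, using made-up customer/order data. In the hierarchical layout (a nested dict here, standing in for a hierarchical database), following the tree downward is direct, but the "inverse" question has to walk every branch. The relational version (an in-memory sqlite3 table) answers both questions with the same kind of query.

```python
import sqlite3

# Hierarchical model: each customer "owns" its orders, like a tree.
hierarchy = {
    "alice": {"orders": [("o1", "widget"), ("o2", "gadget")]},
    "bob": {"orders": [("o3", "widget")]},
}

# Following the tree downward (customer -> orders) is direct...
alice_products = [product for _, product in hierarchy["alice"]["orders"]]

# ...but the inverse question (product -> customers) walks every branch.
widget_buyers = sorted(
    name
    for name, node in hierarchy.items()
    if any(product == "widget" for _, product in node["orders"])
)

# Relational model: the same facts as one flat table, queryable on any column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT, customer TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("o1", "alice", "widget"), ("o2", "alice", "gadget"), ("o3", "bob", "widget")],
)
sql_alice = [
    row[0]
    for row in conn.execute(
        "SELECT product FROM orders WHERE customer = ? ORDER BY id", ("alice",)
    )
]
sql_widget = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT customer FROM orders WHERE product = ? ORDER BY customer",
        ("widget",),
    )
]
```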
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 

Jens

Jens,

You might be interested in this book: http://www.oreilly.com/catalog/9780596529321/index.html
which is new; I just ordered my copy. From the contents shown online,
it has a lot of applicability to data mining using Python. Although
its primary topic is data mining the web, it also covers analyzing the
data, etc.

Ron Stephens

I'm a big fan of O'Reilly's books. This one looks very promising, and
I have just added it to my wishlist. Thanks for the tip Ron!

As for databases, I think it's a good idea to support a good selection
of data sources. Right now, one of my main concerns is to make a
structure that is flexible enough while also being easy to use.
Flexibility and usability!
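One possible way to get both flexibility and usability across many data sources is a small common interface that every backend implements, so the analysis code never cares where the records come from. This is a hypothetical sketch, not an existing library: `DataSource`, `CSVSource`, `SQLiteSource`, and `count_by` are illustrative names, and the sample data is invented.

```python
import csv
import io
import sqlite3
from typing import Any, Dict, Iterator, Protocol, TextIO


class DataSource(Protocol):
    """Hypothetical interface: any backend that can yield records as dicts."""

    def records(self) -> Iterator[Dict[str, Any]]:
        ...


class CSVSource:
    """Yields one dict per row of a CSV text stream that has a header row."""

    def __init__(self, stream: TextIO) -> None:
        self.stream = stream

    def records(self) -> Iterator[Dict[str, Any]]:
        yield from csv.DictReader(self.stream)


class SQLiteSource:
    """Yields one dict per result row of a SQL query, keyed by column name."""

    def __init__(self, conn: sqlite3.Connection, query: str) -> None:
        self.conn = conn
        self.query = query

    def records(self) -> Iterator[Dict[str, Any]]:
        cursor = self.conn.execute(self.query)
        names = [col[0] for col in cursor.description]
        for row in cursor:
            yield dict(zip(names, row))


def count_by(source: DataSource, field: str) -> Dict[Any, int]:
    """Analysis code depends only on the interface, not on the backend."""
    counts: Dict[Any, int] = {}
    for record in source.records():
        counts[record[field]] = counts.get(record[field], 0) + 1
    return counts


# Usage: the same analysis runs unchanged over two different backends.
csv_counts = count_by(
    CSVSource(io.StringIO("product\nwidget\ngadget\nwidget\n")), "product"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?)", [("widget",), ("gadget",), ("widget",)]
)
sql_counts = count_by(SQLiteSource(conn, "SELECT product FROM sales"), "product")
```

Adding another source (a web API, an HDF5 file) would then only mean writing one more class with a `records()` method.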
 

Francesc

[I've just seen this thread. Although it might be a bit late, let me
add a couple of clarifications.]

Hi Maarten,

I respectfully disagree that HDF5 is not a DB. It's true that HDF5,
prima facie, is not relational but rather hierarchical.

Yeah. This largely depends on what we understand by a DB. Lately,
RDBMSs are used everywhere, and we tend to believe that they are the
only entities that can truly be called DBs. However, in a less
restrictive view, even a text file can be considered a DB (in fact,
many DBs have been implemented using text files as a base). So, I
wouldn't say that HDF5 is not a DB, but just that it is not an RDBMS ;)
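To illustrate the point that even a text file can act as a (very simple) database, here is a minimal standard-library sketch: records are appended to a CSV file and "queried" with a full scan. The helper names `append_record` and `select` and the sensor data are made up for the example; there is no indexing, locking, or transaction support, which is exactly what real DB engines add on top.

```python
import csv
import os
import tempfile


def append_record(path, record, fields):
    """Append one record, writing the header first if the file is new/empty."""
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        if new_file:
            writer.writeheader()
        writer.writerow(record)


def select(path, predicate):
    """A full-scan 'query': return every record for which predicate is true."""
    with open(path, newline="") as fh:
        return [row for row in csv.DictReader(fh) if predicate(row)]


path = os.path.join(tempfile.mkdtemp(), "measurements.csv")
fields = ["sensor", "value"]
for sensor, value in [("a", "1.5"), ("b", "2.0"), ("a", "3.25")]:
    append_record(path, {"sensor": sensor, "value": value}, fields)

# Roughly: SELECT * FROM measurements WHERE sensor = 'a'
rows_a = select(path, lambda row: row["sensor"] == "a")
total_a = sum(float(row["value"]) for row in rows_a)
```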
Hierarchical is truly a much more natural/elegant[1] design from my
perspective. HDF has always had metadata capabilities, and with the
new 1.8 beta version available, it is expanding its abilities with
'references/links', allowing for pure/partial relational datasets,
groups, and files, as well as storing self-implemented indexing.

The C API is obviously much lower-level, and PyTables does not yet
support these new features.

That's correct. And although it is quite possible that we, the PyTables
developers, will end up incorporating some relational features into it, we
also recognize that PyTables does not intend (and it was never in our
plans) to be a competitor to a pure RDBMS, but rather a *teammate* (as
is clearly stated on the www.pytables.org home page).

In our opinion, PyTables opens the door to a series of capabilities
that are not found in typical RDBMSs, like hierarchical classification,
multidimensional datasets, and powerful table entities that are able to
deal with multidimensional columns or nested records, but most
especially, the ability to work with extremely large amounts of data in
a very easy way, without having to give up first-class speed.

[1] Anything/everything that is physical/virtual, or can be conceived,
is hierarchical... if the system itself is not random/chaotic. That's a
lovely revelation I've had... EVERYTHING is hierarchical. If it has
context, it has hierarchy.

While I agree that this sentence contains some truth, it is also known
that a lot of things in the universe (perhaps many more than we think)
fall directly into the domain of the random/chaotic ;)

IMO, the wisest path is to recognize the strengths (and
weaknesses) of each approach and use whatever better fits your
needs. If you need the best of both, then go ahead and choose an RDBMS
in combination with a hierarchical DB, and use the powerful
capabilities of Python to get the most out of them.
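A minimal sketch of the "RDBMS plus hierarchical DB" combination, under stated assumptions: a plain dict keyed by HDF5-style paths stands in for the hierarchical store (in practice this could be an HDF5 file accessed through PyTables), while an in-memory sqlite3 catalog holds the metadata used to locate datasets. The paths, table layout, and `mean_by_site` helper are hypothetical.

```python
import sqlite3
import statistics

# Stand-in for a hierarchical store (e.g. an HDF5 file): bulk arrays by path.
arrays = {
    "/site1/2007/temperature": [12.0, 14.0, 13.0],
    "/site2/2007/temperature": [20.0, 22.0],
    "/site1/2007/humidity": [0.5, 0.6],
}

# Relational side: a small catalog describing each array, queryable with SQL.
catalog = sqlite3.connect(":memory:")
catalog.execute(
    "CREATE TABLE datasets "
    "(path TEXT PRIMARY KEY, site TEXT, year INTEGER, variable TEXT)"
)
catalog.executemany(
    "INSERT INTO datasets VALUES (?, ?, ?, ?)",
    [
        ("/site1/2007/temperature", "site1", 2007, "temperature"),
        ("/site2/2007/temperature", "site2", 2007, "temperature"),
        ("/site1/2007/humidity", "site1", 2007, "humidity"),
    ],
)


def mean_by_site(variable, year):
    """Use SQL to find matching datasets, then load them from the hierarchy."""
    rows = catalog.execute(
        "SELECT site, path FROM datasets "
        "WHERE variable = ? AND year = ? ORDER BY site",
        (variable, year),
    )
    return {site: statistics.mean(arrays[path]) for site, path in rows}
```

The design choice here is to keep the bulky numeric arrays on the hierarchical side and only the small, frequently filtered metadata in SQL, which plays to the strengths of each.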

Cheers,

Francesc Altet