Help!

kronos

I would like to describe some problems I have run into. They are not
programming errors but possible optimisations of my program that I am not
aware of.
I am writing a program that performs automated linguistic enrichment of an
ontology using a conceptualised linguistic resource (taxonomic and with
glosses), which in my particular case is WordNet.
The programs I am using are:
-Eclipse SDK (Java platform)
-WordNet 2.0
-Protege 3.1
-OneDollarDB (database)
My main problem at the moment is building a database table with two
columns: word,frequency.
Here word is the column containing every word that occurs in the WordNet
glosses, and frequency is the number of times each word appears in them.
The problem is that the database operations take so long that the table
takes exactly 5 hours to build.
That is not acceptable, but I do not know what else to do to improve the
time.
If you tell me you can help with this, I will send you the program code and
the WordNet interface I use, as well as the schema of the database table.
I will also explain in more detail what I do, but in short: I take a gloss
and tokenize it, removing the punctuation. For each word I run a select on
the table to see whether it has already been inserted. If not, I insert it
with frequency one; if so, I run an update on the frequency.
 
Neil Padgen

<posted & mailed>

Hi kronos!

People normally write in English in comp.lang.java.programmer. I'll try to
translate what you wrote.

Non-Italian speakers: here is my attempted, slightly paraphrased translation
of what kronos said:
Subject: Help!

I would like to explain problems I have encountered, not regarding
programming errors but possible optimisations of my program. I have
written a program which automatically translates from an ontology, using a
conceptualised linguistic resource (taxonomy with notes) which in my
particular case is Wordnet.

The programs I am using are:
- Eclipse SDK (Java platform)
- Wordnet 2.0
- Protege 3.1
- OneDollarDB (database)

The principal problem I have for now is the realisation [implementation?]
of a database table containing two columns: word, frequency.
Where word is the column containing all the words which comprise the
Wordnet note, and frequency is their frequency of appearance therein.

The problem consists of the fact that the operations on the database are
taking such a long time that the table is filled in exactly 5 hours.

This is not acceptable, but I don't know what else to do to improve this
time.

If you can tell me how to help me in this thing I will send you the
program code and the Wordnet interface which I am using, along with the
database schema.

I can also tell you more specifically what I'm doing, but in a few
words I'm taking a note, tokenizing it eliminating the punctuation, for
each word doing a select on the table to see if it has already
been inserted; if not I insert it with frequency one, if yes I update the
frequency.



-- Neil
 
Bjorn Abelli

kronos wrote:
"Neil Padgen" translated...

[Note: OP was in Italian]
I would like to explain problems I have encountered, not regarding
programming errors but possible optimisations of my program. I have
written a program which automatically translates from an ontology, using
a conceptualised linguistic resource (taxonomy with notes) which in my
particular case is Wordnet.

The programs I am using are:
- Eclipse SDK (Java platform)
- Wordnet 2.0
- Protege 3.1
- OneDollarDB (database)

The principal problem I have for now is the realisation
[implementation?] of a database table containing two
columns: word, frequency. Where word is the column
containing all the words which comprise the Wordnet note,
and frequency is their frequency of appearance therein.

The problem consists of the fact that the operations on
the database are taking such a long time that the table
is filled in exactly 5 hours.

There can be many reasons for the time it takes...

Are you sure it's the database?

There is also the "reading" from WordNet and the tokenizing, besides the
reads from and writes to the db.

One suggestion is to time a smaller sample for each step, to be sure the db
*really* is the bottleneck.
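
To make that concrete, here is a rough sketch of how each step could be timed separately. The class name StepTimer and the step labels are made up for illustration, and the lambdas in main are placeholders, not the OP's real WordNet or JDBC calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Accumulates wall-clock time per named step so the slowest one stands out. */
public class StepTimer {
    private final Map<String, Long> nanosByStep = new LinkedHashMap<>();

    /** Runs the task and adds its elapsed time to the running total for this step. */
    public void time(String step, Runnable task) {
        long start = System.nanoTime();
        try {
            task.run();
        } finally {
            nanosByStep.merge(step, System.nanoTime() - start, Long::sum);
        }
    }

    /** Total nanoseconds recorded for a step, or 0 if it was never timed. */
    public long totalNanos(String step) {
        return nanosByStep.getOrDefault(step, 0L);
    }

    /** Prints one line per step, in the order the steps were first seen. */
    public void report() {
        for (Map.Entry<String, Long> e : nanosByStep.entrySet()) {
            System.out.printf("%-12s %8.1f ms%n", e.getKey(), e.getValue() / 1e6);
        }
    }

    public static void main(String[] args) {
        StepTimer timer = new StepTimer();
        String gloss = "a domesticated carnivorous mammal; \"the dog barked\"";
        for (int i = 0; i < 1000; i++) {
            // Placeholders: swap in the real WordNet read, tokenizer and JDBC calls.
            timer.time("tokenize", () -> gloss.split("[^A-Za-z0-9]+"));
            timer.time("database", () -> { /* select + insert/update would go here */ });
        }
        timer.report();
    }
}
```

Running it over a few hundred glosses rather than all of WordNet should be enough to see which step dominates.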

Have you tried to create the database with a larger filesize than the
default? (5MB seems to be the default for One$DB, if it needs bigger space,
the "growing" can be an expensive factor)

The algorithm for tokenizing can be one bottleneck...

As this is done for *each* word, there's plenty of selects...

Is the column "word" indexed in some way?

If it isn't, it probably should be, as that would be a huge bottleneck in
the search for a specific record...

How do you make this update? By the primary key you got in the previous
select, or by using a "WHERE"-clause?

No, I don't want the code, interface or schema.

I'm more into an open discussion than doing consulting work...

// Bjorn A
 
Roedy Green

Couple of thoughts.

Database loads can often be drastically improved by presorting the
data.

You are using a database called "one dollar". You get what you pay
for. You might see if another low price one gives better performance.
See http://mindprod.com/jgloss/sqlvendors.html

During database load it may be possible to turn off various forms of
transaction rollback or transaction replay for extra speed.
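
Switching off rollback logging itself is vendor-specific, so I can't show it for OneDollarDB. A related option that plain JDBC does offer is to turn off auto-commit and send the rows in batches inside one transaction. The sketch below assumes the word counts are already in a Map, that the table is called word_frequency (a made-up name), and that the driver supports batch updates, which I have not checked for this database.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

public class FrequencyLoader {
    /**
     * Inserts every (word, frequency) pair in a single transaction,
     * flushing to the database every batchSize rows.
     * Returns the number of rows queued for insertion.
     */
    public static int load(Connection conn, Map<String, Integer> freq, int batchSize)
            throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // one commit at the end instead of one per row
        int rows = 0;
        // Table and column names are assumptions for illustration.
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO word_frequency (word, frequency) VALUES (?, ?)")) {
            for (Map.Entry<String, Integer> e : freq.entrySet()) {
                ps.setString(1, e.getKey());
                ps.setInt(2, e.getValue());
                ps.addBatch();
                if (++rows % batchSize == 0) {
                    ps.executeBatch();
                }
            }
            ps.executeBatch(); // flush whatever is left over
            conn.commit();
        } catch (SQLException ex) {
            conn.rollback();
            throw ex;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
        return rows;
    }
}
```

A call would look like FrequencyLoader.load(conn, counts, 1000). The batch size is a guess to be tuned by measurement.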

You may also find doubling your RAM will help.
 
Roedy Green

Database loads can often be drastically improved by presorting the
data.

It seems most likely the database is the bottleneck. You can easily
check that with a dummy version of your code that bypasses the SQL
calls, and seeing how long it takes without them, or by dumping the
massaged data to a file and then using whatever bulk load utility there
is. You may also find that bulk load is considerably faster than feeding
records one at a time.
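
A rough sketch of the "massage first, load later" idea: count every word in memory, then write the sorted result to a file for a bulk loader. This swaps the per-word select/insert/update for a single pass over the glosses. The class name, the output file name and the tab-separated format are my assumptions, and so is lower-casing the words, which the OP never specified. Splitting on every non-letter, non-digit character also breaks up hyphenated words and apostrophes, which may or may not be wanted.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class GlossWordCounter {
    /** Counts words across all glosses in memory; the TreeMap keeps them sorted. */
    public static Map<String, Integer> count(Iterable<String> glosses) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String gloss : glosses) {
            // Split on anything that is not a letter or digit, which drops punctuation.
            // Lower-casing is an assumption, not something the original post asked for.
            for (String token : gloss.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                if (!token.isEmpty()) {
                    freq.merge(token, 1, Integer::sum);
                }
            }
        }
        return freq;
    }

    /** Writes one "word<TAB>frequency" line per entry, in sorted word order. */
    public static void dump(Map<String, Integer> freq, Path file) throws IOException {
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
            for (Map.Entry<String, Integer> e : freq.entrySet()) {
                out.println(e.getKey() + "\t" + e.getValue());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample glosses for illustration; the real ones would come from WordNet.
        List<String> glosses = List.of(
            "a member of the genus Canis; \"the dog barked all night\"",
            "the sound made by a dog");
        dump(count(glosses), Path.of("word_frequency.tsv"));
    }
}
```

With the counts held in memory each word is written to the database once, with its final frequency, and the file doubles as the presorted input mentioned earlier.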
 
