IchBin
Info setup: I have a private project for myself because I like to
collect quotes: quotations from people, sayings, and proverbs. I have
currently amassed 21,631 quotes from 1,255 authors. I am selective in
what I keep. Each author has a short biography and a link to Wikipedia
for a more complete biography, in case I have the urge to research
further. Each quote has an optional reference and/or comment.
I have a GUI with trees, N tabs, and so on.
On one tab I have a program that checks for duplicates (across all
quotes or for a single author) that have gotten through the initial
duplicate check at insert time. Basically, I have a spinner that sets
the number of characters, counted from position 0 of each quote, that I
compare for an exact match, and then I display the matches. Then I can
determine whether they are true duplicates in the rest of the quote or
whether the phrasing is just a little different, and act accordingly.
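As a rough illustration of the prefix check described above, here is a
minimal Python sketch. The function name prefix_duplicates, the n_chars
parameter (standing in for the spinner value), and the sample quotes are
all made up for the example; the actual program may work differently.

```python
from collections import defaultdict


def prefix_duplicates(quotes, n_chars):
    """Group quotes whose first n_chars characters match exactly."""
    groups = defaultdict(list)
    for quote in quotes:
        # Quotes with the same leading characters land in the same group.
        groups[quote[:n_chars]].append(quote)
    # Keep only the prefixes shared by two or more quotes.
    return [group for group in groups.values() if len(group) > 1]


# Made-up sample data, not taken from the real database.
quotes = [
    "Knowledge is power.",
    "Knowledge is power, and power is knowledge.",
    "Time is money.",
]
candidates = prefix_duplicates(quotes, 12)
```

Each group in candidates is a set of quotes to inspect by hand. Note
that this approach misses any duplicate whose first few characters
differ, for example one with a leading "The" or a typo near the start.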
Currently it works well, but I feel there must be a better
mathematical model to indicate the probability of duplication, so that
I only have to look at pairs above some percentage of that probability.
Naturally, a quote can be a duplicate without having the same sequence
of tokens, or the similarity can be even more abstract. I don't think
you can ever be 100% correct in a logical comparison because of the
mechanics of language. Based on the similarity of all tokens in the
string, I think I can at least predict which quotes should be manually
checked. This way there is no need to enter any parameters to check for
duplication.
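One possible way to score the similarity of all tokens, offered only as
a sketch, is the Jaccard index: the number of tokens two quotes share
divided by the number of distinct tokens in both. It is a similarity
score between 0 and 1, not a true probability, and it is just one of
several measures that could be used here. The function names and the
sample quotes below are invented for the example.

```python
import re
from itertools import combinations


def tokens(quote):
    """Lowercase the quote and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", quote.lower()))


def jaccard(a, b):
    """Shared tokens divided by all distinct tokens, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty quotes are treated as identical
    return len(ta & tb) / len(ta | tb)


def ranked_pairs(quotes):
    """Score every pair of quotes and sort by score, highest first."""
    scored = [(jaccard(a, b), a, b) for a, b in combinations(quotes, 2)]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Made-up sample data, not taken from the real database.
sample = ["Time is money.", "Money is time.", "Knowledge is power."]
ranking = ranked_pairs(sample)
```

Because token order is ignored, "Time is money." and "Money is time."
score 1.0 and land at the top of the ranking, so only the pairs near
the top need a manual look. Scoring every pair grows quadratically with
the number of quotes, so comparing within one author at a time keeps the
work manageable.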
I do not have a math degree, but I did have to take a lot of statistics
for my degree. I have been in the computer field since college, so my
statistics is rusty. I think it can be done better this way but cannot
put my finger on it. I guess I could pull out my old stat books.
I am just asking the math people if they could give me some insights or
a better way, a la mathematical algorithm... For the computer science
side of the house, I guess it would definitely involve a tree parse at a
minimum.
I hope my question is understandable. There is no rush. I plan to at
least put the database on my site, and then all of the supporting
programs and docs. You never know, there may be someone out there with
the same interests who wants a GUI to do all of that kind of stuff. If
not, I did write it for myself anyway.
Thanks in Advance...
IchBin, Pocono Lake, Pa, USA
http://weconsultants.servebeer.com/JHackerAppManager
__________________________________________________________________________
'If there is one, Knowledge is the "Fountain of Youth"'
-William E. Taylor, Regular Guy (1952-)