Need help comparing the string words in two arrays

learner9

Hello,
I have a text paragraph and a String[] of StopWords. I need to
compare each word of the paragraph with the StopWords array, and if a
word in the paragraph doesn't match (the comparison returns false),
that word should be pushed into a Vector. To compare each word of the
paragraph, I have put the words in a String[] like this:

String[] arrAbstractText = txtAbstract.split("\\ ");

txtAbstract is the text paragraph below.
******************************Here is the txtAbstract****************************************
Abstract : A comparative transcriptome analysis for successive stages
of Arabidopsis developmental leaf senescence (NS), darkening-induced
senescence of individual leaves attached to the plant (DIS) and
senescence in dark-incubated detached leaves (DET) revealed many novel
senescence-associated genes with distinct expression profiles. The
three senescence processes share a high number of regulated genes,
although the overall number of regulated genes during DIS and DET is
about two times lower than during NS. Consequently, the number of
NS-specific genes is much higher than of DIS- or DET-specific genes.
The expression profiles of transporters, receptor like kinases,
autophagy genes and hormone pathways were analysed in detail. The
Arabidopsis transporters and other integral membrane proteins were
systematically re-classified based on the Transporter Classification
System. Coordinate activation or inactivation of several genes is
observed in some transporter families in all three or only in
individual senescence types, indicating differences in the genetic
programs for remobilization of catabolites. Characteristic senescence
type-specific differences were also apparent in the expression
profiles of (putative) signaling kinases. For eight hormones the
expression of biosynthesis, metabolism, signaling and (partially)
response genes was investigated. In most pathways novel
senescence-associated genes were identified. The expression profiles
of hormone homeostasis and signaling genes reveal additional players in
the senescence regulatory network.
*****************************************************************************************************

After using the split("\\ ") function, the above paragraph becomes an
array, and when I debug it, the value of "arrAbstractText" looks like
this:

***********************************************arrAbstractText array*********************************

[A, comparative, transcriptome, analysis, for, successive, stages, of,
Arabidopsis, developmental, leaf, senescence, (NS),, darkening-induced,
senescence, of, individual, leaves, attached, to, the, plant, (DIS),
and, senescence, in, dark-incubated, detached, leaves, (DET), revealed,
many, novel, senescence-associated, genes, with, distinct, expression,
profiles., The, three, senescence, processes, share, a, high, number,
of, regulated, genes,, although, the, overall, number, of, regulated,
genes, during, DIS, and, DET, is, about, two, times, lower, than,
during, NS., Consequently,, the, number, of, NS-specific, genes, is,
much, higher, than, of, DIS-, or, DET-specific, genes., The,
expression, profiles, of, transporters,, receptor, like, kinases,,
autophagy, genes, and, hormone, pathways, were, analysed, in, detail.,
The, Arabidopsis, transporters, and, other, integral, membrane,
proteins, were, systematically, re-classified, based, on, the,
Transporter, Classification, System., Coordinate, activation, or,
inactivation, of, several, genes, is, observed, in, some, transporter,
families, in, all, three, or, only, in, individual, senescence, types,,
indicating, differences, in, the, genetic, programs, for,
remobilization, of, catabolites., Characteristic, senescence,
type-specific, differences, were, also, apparent, in, the, expression,
profiles, of, (putative), signaling, kinases., For, eight, hormones,
the, expression, of, biosynthesis,, metabolism,, signaling, and,
(partially), response, genes, was, investigated., In, most, pathways,
novel, senescence-associated, genes, were, identified., The,
expression, profiles, of, hormone, homeostasis, and, signaling, genes,
reveal, additional, players, in, the, senescence, regulatory, network.]
********************************************************************************************************

And this is what the StopWords array looks like when I debug the
code:
**************************************************StopWords array***********************************
[a, a's, able, about, above, according, accordingly, across, actually,
after, afterwards, again, against, ain't, all, allow, allows, almost,
alone, along, already, also, although, always, am, among, amongst, an,
and, another, any, anybody, anyhow, anyone, anything, anyway, anyways,
anywhere, apart, appear, appreciate, appropriate, Approximately, are,
aren't, around, as, aside, ask, asking, associated, at, available,
away, awfully, b, be, became, because, become, becomes, becoming, been,
before, beforehand, behind, being, believe, below, beside, besides,
best, better, between, beyond, both, brief, but, by, c, c'mon, c's,
came, can, can't, cannot, cant, cause, causes, certain, certainly,
changes, clearly, co, com, come, comes, concerning, conditions,,
consequently, consider, considering, contain, containing, contains,
corresponding, could, couldn't, course, currently, d, definitely,
described, despite, did, didn't, different, do, does, doesn't, doing,
don't, done, down, downwards, during, e, each, edu, eg, eight, either,
else, elsewhere, enough, entirely, especially, et, etc, even, ever,
every, everybody, everyone, everything, everywhere, ex, exactly,
example, except, f, far, few, fifth, first, five, followed, followin,
follows, for, former, formerly, forth, four, from, further,
furthermore, g, get, gets, getting, given, gives, go, goes, going,
gone, got, gotten, greetings, h, had, hadn't, happens, hardly, has,
hasn't, have, haven't, having, he, he's, hello, help, hence, her, here,
here's, hereafter, hereby, herein, hereupon, hers, herself, hi, him,
himself, his, hither, hopefully, how, howbeit, however, i, i'd, i'll,
i'm, i've, ie, if, ignored, immediate, in, inasmuch, inc, indeed,
indicate, indicated, indicates, inner, insofar, instead, into, inward,
is, isn't, it, it'd, it'll, it's, its, itself, j, just, k, keep, keeps,
kept, know, knows, known, l, last, lately, later, latter, latterly,
least, less, lest, let, let's, like, liked, likely, little, look,
looking, looks, ltd, m, mainly, many, may, maybe, me, mean, meanwhile,
merely, might, more, moreover, most, mostly, much, must, my, myself, n,
name, namely, nd, near, nearly, necessary, need, needs, neither, never,
nevertheless, new, next, nine, no, nobody, non, none, noone, nor,
normally, not, nothing, novel, now, nowhere, o, obviously, of, off,
often, oh, ok, okay, old, on, once, one, ones, only, onto, or, other,
others, otherwise, ought, our, ours, ourselves, out, outside, over,
overall, own, p, particular, particularly, per, perhaps, placed,
please, plus, possess, possible, presumably, probably, provides, q,
que, quite, qv, r, rather, rd, re, really, reasonably, regarding,
regardless, regards, relatively, respectively, right, s, said, same,
saw, say, saying, says, second, secondly, see, seeing, seem, seemed,
seeming, seems, seen, self, selves, sensible, sent, serious, seriously,
seven, several, shall, she, should, shouldn't, since, six, so, some,
somebody, somehow, someone, something, sometime, sometimes, somewhat,
somewhere, soon, sorry, specified, specify, specifying, still, sub,
such, sup, sure, t, t's, take, taken, tell, tends, th, than, thank,
thanks, thanx, that, that's, thats, the, The, their, theirs, them,
themselves, then, thence, there, there's, thereafter, thereby,
therefore, therein, theres, thereupon, these, they, they'd, they'll,
they're, they've, think, third, this, thorough, thoroughly, those,
though, three, through, throughout, thru, thus, to, together, too,
took, toward, towards, tried, tries, truly, try, trying, twice, two, u,
un, under, unfortunately, unless, unlikely, until, unto, up, upon, us,
use, used, useful, uses, using, usually, uucp, v, value, various, very,
via, viz, vs, w, want, wants, was, wasn't, way, we, we'd, we'll, we're,
we've, welcome, well, went, were, weren't, what, what's, whatever,
when, whence, whenever, where, where's, whereafter, whereas, whereby,
wherein, whereupon, wherever, whether, which, while, whither, who,
who's, whoever, whole, whom, whose, why, will, willing, wish, with,
within, without, won't, wonder, would, would, wouldn't, x, y, yes, yet,
you, you'd, you'll, you're, you've, your, yours, yourself, yourselves,
z, zero, -, %, !, @, #, $, ^, &, *, (, ), +, =, ,, ., /, ?, <, >, ~, `]
*********************************************************************************************************

and the code I wrote to compare and filter the words is
*********************************************************************************************************
String[] arrAbstractText = txtAbstract.split("\\ ");
boolean match = false;
for (int k = 0; k < arrAbstractText.length; k++) {
    for (int l = 0; l < stopWords.length; l++) {
        s = String.valueOf(arrAbstractText[k]).trim();
        if (s.length() > 0 &&
                stopWords[l].trim().equals(s.toLowerCase())) {
            match = true;
        }
    }
    if (!match) {
        vFWords.add(arrAbstractText[k].toString().toLowerCase());
        System.out.println("Words do not match :" +
                arrAbstractText[k].toLowerCase().trim());
    }
}
*********************************************************************************************************

I am not sure if I am doing it right in the above code snippet, but
I am missing a lot of words while comparing the text. Here is a set of
words that it is supposed to return:
"Arabidopsis"
"senescence"
"proteins" and many words like this.

Could someone please help me with this? It's highly appreciated, as I am
close to the deadline for my project.

thanks
-L
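The array dump above shows a side effect of split("\\ "): it splits only on single spaces, so tokens keep their punctuation ("(NS),", "profiles.", "Consequently,"), and a stop word with trailing punctuation such as "Consequently," never equals the stop word "consequently". The sketch below is one possible tokenizer, not the poster's code. It assumes that splitting on any run of whitespace and stripping non-letter, non-digit characters from both ends of each token is acceptable for this text; internal hyphens are kept. The class name Tokenizer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class Tokenizer {

    // Hypothetical helper: splits on any run of whitespace (spaces, tabs,
    // line breaks), then strips leading and trailing characters that are
    // neither letters nor digits, so "(NS)," becomes "NS" and "profiles."
    // becomes "profiles". Internal hyphens survive ("darkening-induced").
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
            if (!token.isEmpty()) { // drop tokens that were pure punctuation
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String sample = "leaf senescence (NS),\n\ndarkening-induced senescence. The DIS-";
        System.out.println(tokenize(sample));
    }
}
```

Whether stripping is the right rule depends on the data: a token like "DIS-" becomes "DIS", which may or may not be what is wanted.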
 
Rhino

Have you ever read the "trail" (chapter) on Collections in the Java
Tutorial? If not, I think you should have a good look at it. You should find
several techniques in there that will help you do what you want. The "trail"
starts here: http://java.sun.com/docs/books/tutorial/collections/index.html.
Several of the topics should be quite helpful to you, even if they don't do
_exactly_ what you are doing, particularly "Set Interface Bulk Operations"
in "The Set Interface" and "Multimaps" in "The Map Interface".
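As a rough illustration of the "Set Interface Bulk Operations" idea, the sketch below loads the stop words into a HashSet and calls removeAll, which for two sets computes the set difference. It rests on assumptions not stated in the thread: the words are assumed to be already stripped of surrounding punctuation, the comparison is made case-insensitive by lowercasing both sides, and duplicate words are collapsed (a LinkedHashSet keeps first-seen order). The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class StopWordFilter {

    // Hypothetical helper: returns the distinct words that are not stop
    // words, compared case-insensitively, in the order they first appear.
    public static Set<String> nonStopWords(String[] words, String[] stopWords) {
        Set<String> stop = new HashSet<>();
        for (String w : stopWords) {
            stop.add(w.trim().toLowerCase(Locale.ROOT));
        }
        Set<String> result = new LinkedHashSet<>();
        for (String w : words) {
            String t = w.trim().toLowerCase(Locale.ROOT);
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        // Bulk operation: remove every element that is also a stop word
        // (set difference: result minus stop).
        result.removeAll(stop);
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"The", "Arabidopsis", "transporters", "and", "other", "proteins", "The"};
        String[] stopWords = {"the", "and", "other", "The"};
        System.out.println(nonStopWords(words, stopWords));
    }
}
```

If duplicate words must be kept, as with the original Vector, an ArrayList can replace the LinkedHashSet: removeAll is defined on every Collection, and passing a HashSet as the argument keeps each membership check fast.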
 
Trung Chinh Nguyen

learner9 said:
String[] arrAbstractText = txtAbstract.split("\\ ");
boolean match = false;
for (int k = 0; k < arrAbstractText.length; k++) {
for (int l = 0; l < stopWords.length; l++) {
s = String.valueOf(arrAbstractText[k]).trim();
if(s.length()>0 &&
stopWords[l].trim().equals(s.toLowerCase())){match=true;}
}
if (!match) {
vFWords.add(arrAbstractText[k].toString().toLowerCase());
System.out.println("Words do not match :" +
arrAbstractText[k].toLowerCase().trim());
}
}

I think this line
boolean match = false;
should be put inside the first loop instead?

Also, it might be better to use equalsIgnoreCase() instead of equals().
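Applying both suggestions to the original loop gives something like the sketch below: match is declared inside the outer loop, so it starts out false for every word, and equalsIgnoreCase replaces the lowercase-then-equals comparison. The break and the skipping of empty tokens are additions of this sketch, not part of the thread, and wrapping the loop in a hypothetical FilterWords class with a filter method is only for illustration.

```java
import java.util.Vector;

public class FilterWords {

    // Returns the words of txtAbstract that match no stop word.
    public static Vector<String> filter(String txtAbstract, String[] stopWords) {
        Vector<String> vFWords = new Vector<>();
        String[] arrAbstractText = txtAbstract.split("\\ ");
        for (int k = 0; k < arrAbstractText.length; k++) {
            String s = arrAbstractText[k].trim();
            if (s.length() == 0) {
                continue; // skip empty tokens produced by consecutive spaces
            }
            boolean match = false; // reset for every word (the original bug)
            for (int l = 0; l < stopWords.length; l++) {
                if (stopWords[l].trim().equalsIgnoreCase(s)) {
                    match = true;
                    break; // no need to check the remaining stop words
                }
            }
            if (!match) {
                vFWords.add(s.toLowerCase());
                System.out.println("Words do not match :" + s.toLowerCase());
            }
        }
        return vFWords;
    }

    public static void main(String[] args) {
        String[] stopWords = {"a", "of", "the", "for"};
        System.out.println(filter("A comparative analysis of the Arabidopsis", stopWords));
    }
}
```

Tokens that still carry punctuation, such as "genes,", are compared as-is, so this version fixes the reset bug but not the tokenizing issue.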
 
learner9

Hey, it works! Thanks for the heads-up. I was kind of lost and
wondering where I had made a mistake :)
Thanks once again. By the way, do you have any clue how I can print a
variable or text in bold using System.out.println()?
For instance

System.out.println("this is bold text");

how do I print that in bold?
-L
 
learner9

Hello Rhino,
Sure, I will definitely go through the link you provided. I am kind of
a newbie, and any useful links will be helpful to me. By the way, I
solved the problem.

Thanks for the reply,
-L
 
Chris Uppal

learner9 said:
By the way you got any clue how do I print a
variable or text in bold using System.out.println.()?

There is no easy way to do it, so you might as well give up on the idea.

(It /can/ be done if you happen to know exactly where your code will be running
and exactly which -- if any -- escape sequences cause the console or console
window to change mode, but you really don't want to be messing around with that
kind of stuff.)

-- chris
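For illustration of the escape-sequence approach mentioned above, here is a minimal sketch. It assumes a terminal that interprets ANSI (VT100-style) escape codes, where ESC[1m switches bold on and ESC[0m resets all attributes; consoles that do not interpret them (for example, older Windows consoles or some IDE output panes) print the raw characters instead. The BoldDemo class and bold helper are hypothetical names.

```java
public class BoldDemo {

    // ANSI "Select Graphic Rendition" sequences: ESC[1m = bold on,
    // ESC[0m = reset all attributes. Rendering depends entirely on the
    // terminal; nothing in Java itself makes the text bold.
    public static final String BOLD = "\u001B[1m";
    public static final String RESET = "\u001B[0m";

    // Wraps the text in bold-on / reset sequences.
    public static String bold(String text) {
        return BOLD + text + RESET;
    }

    public static void main(String[] args) {
        System.out.println(bold("this is bold text"));
    }
}
```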