Need help in comparing the string words in two arrays.

Discussion in 'Java' started by learner9, Apr 29, 2006.

  1. learner9

    learner9 Guest

    Hello,
    I have a text paragraph and a String[] of StopWords. I have to
    compare each word of the paragraph with the StopWords array; if a
    word in the paragraph doesn't match any stop word, the comparison
    returns false and that word is pushed into a Vector. To compare each
    word of the text paragraph, I have put it in a String[] like this:

    String[] arrAbstractText = txtAbstract.split("\\ ");
    txtAbstract is the text paragraph below.
    ****************************** Here is the txtAbstract ******************************
    Abstract : A comparative transcriptome analysis for successive stages
    of Arabidopsis developmental leaf senescence (NS),
    darkening-induced senescence of individual leaves attached to the plant
    (DIS) and senescence in dark-incubated detached
    leaves (DET) revealed many novel senescence-associated genes with
    distinct expression profiles. The three senescence
    processes share a high number of regulated genes, although the overall
    number of regulated genes during DIS and DET is about
    two times lower than during NS. Consequently, the number of NS-specific
    genes is much higher than of DIS- or DET-specific
    genes. The expression profiles of transporters, receptor like kinases,
    autophagy genes and hormone pathways were analysed in
    detail. The Arabidopsis transporters and other integral membrane
    proteins were systematically re-classified based on the
    Transporter Classification System. Coordinate activation or
    inactivation of several genes is observed in some transporter
    families in all three or only in individual senescence types,
    indicating differences in the genetic programs for
    remobilization of catabolites. Characteristic senescence type-specific
    differences were also apparent in the expression
    profiles of (putative) signaling kinases. For eight hormones the
    expression of biosynthesis, metabolism, signaling and
    (partially) response genes was investigated. In most pathways novel
    senescence-associated genes were identified. The
    expression profiles of hormone homeostasis and signaling genes reveal
    additional players in the senescence regulatory
    network.
    *****************************************************************************************************

    After applying split("\\ "), the paragraph above becomes an array.
    When I debug, the value of "arrAbstractText" looks like this:

    ****************************** arrAbstractText array ******************************

    [A, comparative, transcriptome, analysis, for, successive, stages, of,
    Arabidopsis, developmental, leaf, senescence, (NS),, darkening-induced,
    senescence, of, individual, leaves, attached, to, the, plant, (DIS),
    and, senescence, in, dark-incubated, detached, leaves, (DET), revealed,
    many, novel, senescence-associated, genes, with, distinct, expression,
    profiles., The, three, senescence, processes, share, a, high, number,
    of, regulated, genes,, although, the, overall, number, of, regulated,
    genes, during, DIS, and, DET, is, about, two, times, lower, than,
    during, NS., Consequently,, the, number, of, NS-specific, genes, is,
    much, higher, than, of, DIS-, or, DET-specific, genes., The,
    expression, profiles, of, transporters,, receptor, like, kinases,,
    autophagy, genes, and, hormone, pathways, were, analysed, in, detail.,
    The, Arabidopsis, transporters, and, other, integral, membrane,
    proteins, were, systematically, re-classified, based, on, the,
    Transporter, Classification, System., Coordinate, activation, or,
    inactivation, of, several, genes, is, observed, in, some, transporter,
    families, in, all, three, or, only, in, individual, senescence, types,,
    indicating, differences, in, the, genetic, programs, for,
    remobilization, of, catabolites., Characteristic, senescence,
    type-specific, differences, were, also, apparent, in, the, expression,
    profiles, of, (putative), signaling, kinases., For, eight, hormones,
    the, expression, of, biosynthesis,, metabolism,, signaling, and,
    (partially), response, genes, was, investigated., In, most, pathways,
    novel, senescence-associated, genes, were, identified., The,
    expression, profiles, of, hormone, homeostasis, and, signaling, genes,
    reveal, additional, players, in, the, senescence, regulatory, network.]
    ********************************************************************************************************

    The StopWords array looks like this when I debug the code:
    ****************************** StopWords array ******************************
    [a, a's, able, about, above, according, accordingly, across, actually,
    after, afterwards, again, against, ain't, all, allow, allows, almost,
    alone, along, already, also, although, always, am, among, amongst, an,
    and, another, any, anybody, anyhow, anyone, anything, anyway, anyways,
    anywhere, apart, appear, appreciate, appropriate, Approximately, are,
    aren't, around, as, aside, ask, asking, associated, at, available,
    away, awfully, b, be, became, because, become, becomes, becoming, been,
    before, beforehand, behind, being, believe, below, beside, besides,
    best, better, between, beyond, both, brief, but, by, c, c'mon, c's,
    came, can, can't, cannot, cant, cause, causes, certain, certainly,
    changes, clearly, co, com, come, comes, concerning, conditions,,
    consequently, consider, considering, contain, containing, contains,
    corresponding, could, couldn't, course, currently, d, definitely,
    described, despite, did, didn't, different, do, does, doesn't, doing,
    don't, done, down, downwards, during, e, each, edu, eg, eight, either,
    else, elsewhere, enough, entirely, especially, et, etc, even, ever,
    every, everybody, everyone, everything, everywhere, ex, exactly,
    example, except, f, far, few, fifth, first, five, followed, followin,
    follows, for, former, formerly, forth, four, from, further,
    furthermore, g, get, gets, getting, given, gives, go, goes, going,
    gone, got, gotten, greetings, h, had, hadn't, happens, hardly, has,
    hasn't, have, haven't, having, he, he's, hello, help, hence, her, here,
    here's, hereafter, hereby, herein, hereupon, hers, herself, hi, him,
    himself, his, hither, hopefully, how, howbeit, however, i, i'd, i'll,
    i'm, i've, ie, if, ignored, immediate, in, inasmuch, inc, indeed,
    indicate, indicated, indicates, inner, insofar, instead, into, inward,
    is, isn't, it, it'd, it'll, it's, its, itself, j, just, k, keep, keeps,
    kept, know, knows, known, l, last, lately, later, latter, latterly,
    least, less, lest, let, let's, like, liked, likely, little, look,
    looking, looks, ltd, m, mainly, many, may, maybe, me, mean, meanwhile,
    merely, might, more, moreover, most, mostly, much, must, my, myself, n,
    name, namely, nd, near, nearly, necessary, need, needs, neither, never,
    nevertheless, new, next, nine, no, nobody, non, none, noone, nor,
    normally, not, nothing, novel, now, nowhere, o, obviously, of, off,
    often, oh, ok, okay, old, on, once, one, ones, only, onto, or, other,
    others, otherwise, ought, our, ours, ourselves, out, outside, over,
    overall, own, p, particular, particularly, per, perhaps, placed,
    please, plus, possess, possible, presumably, probably, provides, q,
    que, quite, qv, r, rather, rd, re, really, reasonably, regarding,
    regardless, regards, relatively, respectively, right, s, said, same,
    saw, say, saying, says, second, secondly, see, seeing, seem, seemed,
    seeming, seems, seen, self, selves, sensible, sent, serious, seriously,
    seven, several, shall, she, should, shouldn't, since, six, so, some,
    somebody, somehow, someone, something, sometime, sometimes, somewhat,
    somewhere, soon, sorry, specified, specify, specifying, still, sub,
    such, sup, sure, t, t's, take, taken, tell, tends, th, than, thank,
    thanks, thanx, that, that's, thats, the, The, their, theirs, them,
    themselves, then, thence, there, there's, thereafter, thereby,
    therefore, therein, theres, thereupon, these, they, they'd, they'll,
    they're, they've, think, third, this, thorough, thoroughly, those,
    though, three, through, throughout, thru, thus, to, together, too,
    took, toward, towards, tried, tries, truly, try, trying, twice, two, u,
    un, under, unfortunately, unless, unlikely, until, unto, up, upon, us,
    use, used, useful, uses, using, usually, uucp, v, value, various, very,
    via, viz, vs, w, want, wants, was, wasn't, way, we, we'd, we'll, we're,
    we've, welcome, well, went, were, weren't, what, what's, whatever,
    when, whence, whenever, where, where's, whereafter, whereas, whereby,
    wherein, whereupon, wherever, whether, which, while, whither, who,
    who's, whoever, whole, whom, whose, why, will, willing, wish, with,
    within, without, won't, wonder, would, would, wouldn't, x, y, yes, yet,
    you, you'd, you'll, you're, you've, your, yours, yourself, yourselves,
    z, zero, -, %, !, @, #, $, ^, &, *, (, ), +, =, ,, ., /, ?, <, >, ~, `]
    *********************************************************************************************************

    and the code I wrote to compare and filter the words is
    *********************************************************************************************************
    String[] arrAbstractText = txtAbstract.split("\\ ");
    boolean match = false;
    for (int k = 0; k < arrAbstractText.length; k++) {
        for (int l = 0; l < stopWords.length; l++) {
            s = String.valueOf(arrAbstractText[k]).trim();
            if (s.length() > 0 && stopWords[l].trim().equals(s.toLowerCase())) {
                match = true;
            }
        }
        if (!match) {
            vFWords.add(arrAbstractText[k].toString().toLowerCase());
            System.out.println("Words do not match :" + arrAbstractText[k].toLowerCase().trim());
        }
    }
    *********************************************************************************************************

    I am not sure if I am doing it right in the above code snippet, but I
    am missing a lot of words while comparing the text.
    Here is a set of words that it is supposed to return:
    "Arabidopsis"
    "senescence"
    "proteins" and many words like this.

    Could someone please help me with this? It's highly appreciated, as I am
    close to the deadline for my project.

    thanks
    -L
     
    learner9, Apr 29, 2006
    #1

  2. learner9

    Rhino Guest

    Have you ever read the "trail" (chapter) on Collections in the Java
    Tutorial? If not, I think you should have a good look at it. You should find
    several techniques in there that will help you do what you want. The "trail"
    starts here: http://java.sun.com/docs/books/tutorial/collections/index.html.
    Several of the topics should be quite helpful to you, even if they don't do
    _exactly_ what you are doing, particularly "Set Interface Bulk Operations"
    in "The Set Interface" and "Multimaps" in "The Map Interface".
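
As a rough illustration of the Collections approach suggested above, here is a minimal sketch that loads the stop words into a HashSet and tests each word with contains(). The class name StopWordSetFilter and the method keepNonStopWords are hypothetical names chosen for this example, and the short inputs in main() are sample data, not the full arrays from the original post.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class, not taken from the thread.
class StopWordSetFilter {

    // Returns the words of text (lower-cased, in original order)
    // that do not appear in stopWords.
    static List<String> keepNonStopWords(String text, String[] stopWords) {
        // Normalize the stop words once so each lookup is a single set probe.
        Set<String> stops = new HashSet<>();
        for (String w : stopWords) {
            stops.add(w.trim().toLowerCase(Locale.ROOT));
        }

        List<String> kept = new ArrayList<>();
        // Split on runs of whitespace; punctuation stays attached to the token.
        for (String token : text.trim().split("\\s+")) {
            String word = token.toLowerCase(Locale.ROOT);
            if (!word.isEmpty() && !stops.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] stopWords = {"a", "of", "the", "for"};
        String text = "A comparative analysis of Arabidopsis leaf senescence";
        // prints [comparative, analysis, arabidopsis, leaf, senescence]
        System.out.println(keepNonStopWords(text, stopWords));
    }
}
```

Note that tokens such as "(NS)," would still carry their punctuation with this split, so they would not match a plain stop word; stripping punctuation first is a separate step this sketch does not attempt.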
     
    Rhino, Apr 29, 2006
    #2

  3. I think this line
    should be put inside the first loop instead?

    Also, it might be better to use equalsIgnoreCase() instead of equals()
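
The quoted line seems to be missing from this post. Assuming it refers to `boolean match = false;`, the two suggestions could be sketched as below: the flag is reset for every word, so one early match no longer suppresses all later words, and equalsIgnoreCase() replaces the equals()/toLowerCase() pair. The class name NestedLoopFilter and the method filter are hypothetical, a List stands in for the original Vector, and empty tokens are skipped rather than added.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; a sketch of the nested-loop fix, not the poster's final code.
class NestedLoopFilter {

    static List<String> filter(String[] words, String[] stopWords) {
        List<String> kept = new ArrayList<>();
        for (String raw : words) {
            String s = raw.trim();
            if (s.isEmpty()) {
                continue; // assumption: empty tokens are ignored
            }
            boolean match = false; // reset for each word, inside the outer loop
            for (String stop : stopWords) {
                if (stop.trim().equalsIgnoreCase(s)) {
                    match = true;
                    break; // no need to scan the remaining stop words
                }
            }
            if (!match) {
                kept.add(s.toLowerCase());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] words = "A comparative analysis of Arabidopsis".split(" ");
        String[] stopWords = {"a", "of", "The"};
        // prints [comparative, analysis, arabidopsis]
        System.out.println(filter(words, stopWords));
    }
}
```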
     
    Trung Chinh Nguyen, Apr 29, 2006
    #3
  4. learner9

    learner9 Guest

    Hey, it works, thanks for the heads up. I was kind of lost and
    wondering where I had made the mistake :)
    Thanks once again. By the way, do you have any clue how I can print a
    variable or text in bold using System.out.println()?
    For instance:

    System.out.println("this is bold text");

    how do I print that in bold?
    -L
     
    learner9, Apr 30, 2006
    #4
  5. learner9

    learner9 Guest

    Hello Rhino,
    Sure, I will definitely go through the link you provided. I am kind of
    a newbie, and any useful links will be helpful to me. By the way, I
    solved the problem.

    thanks for the reply,
    -L
     
    learner9, Apr 30, 2006
    #5
  6. learner9

    Chris Uppal Guest

    There is no easy way to do it, so you might as well give up on the idea.

    (It /can/ be done if you happen to know exactly where your code will be running
    and exactly which -- if any -- escape sequences cause the console or console
    window to change mode, but you really don't want to be messing around with that
    kind of stuff.)
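
For what it is worth, the escape-sequence route might look like the sketch below. It assumes the output goes to a terminal that honors ANSI SGR codes (ESC[1m for bold, ESC[0m for reset); on a console that does not, the raw escape characters are printed as garbage instead. The class name BoldConsole and the helper bold() are hypothetical.

```java
// Hypothetical helper; only meaningful on terminals that interpret ANSI escape codes.
class BoldConsole {
    static final String BOLD = "\u001B[1m";  // ANSI SGR code 1: bold / increased intensity
    static final String RESET = "\u001B[0m"; // ANSI SGR code 0: reset all attributes

    // Wraps text so an ANSI-aware terminal renders it in bold.
    static String bold(String text) {
        return BOLD + text + RESET;
    }

    public static void main(String[] args) {
        System.out.println(bold("this is bold text"));
    }
}
```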

    -- chris
     
    Chris Uppal, Apr 30, 2006
    #6