a question on google's algorithm

Discussion in 'C++' started by %NAME%, Feb 21, 2007.

  1. %NAME%

    %NAME% Guest

    The title is not exactly a disnomer, so pleaes let me explain...

    Say I have a large number of 0-1 vectors containing only
    bits corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
    I wish to find all the 0-1 vectors that are subsumed by that query bit
    vector.

    For example, given bit vector query as 01100, i wish to find all bit
    vectors that whose 1st, 4th, 5th bits are also zero, leaving
    the 2nd, 3rd bits as either 0 or 1.

    What I wish to do is to build an index on the bit vectors, so that I
    could avoid checking each 0-1 vector in sequence. In another word, I
    hope this index could help me quickly filter away those vectors that
    guarunteed not to be subsumed by the given 0-1 vector.

    I am not sure this is the right place to ask, but the index could be
    used to solve a search engine problem: the 0-1 vector
    corresponds to my keywords search (a 0 means that keyword NOT in my
    query, while a 1 means the keyword is in), and all the webpages
    corresponds to 0-1 vectors. I wish to find all the webpages that
    subsumes my query.

    If anyone happen to know how Google solves this problem, and it is not
    confidential, pls let me know, thanks a bunch!!!!
     
    %NAME%, Feb 21, 2007
    #1
    1. Advertising

  2. * %NAME%:
    > [interesting question, cross-posted to numerous groups]


    No C++ content means OFF-TOPIC in [comp.lang.c++].

    Thank you.

    FUT: [comp.programming]

    --
    A: Because it messes up the order in which people normally read text.
    Q: Why is it such a bad thing?
    A: Top-posting.
    Q: What is the most annoying thing on usenet and in e-mail?
     
    Alf P. Steinbach, Feb 21, 2007
    #2
    1. Advertising

  3. %NAME%

    Bill Reid Guest

    %NAME% <> wrote in message
    news:...

    > The title is not exactly a disnomer, so pleaes let me explain...
    >
    > Say I have a large number of 0-1 vectors containing only
    > bits corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
    > I wish to find all the 0-1 vectors that are subsumed by that query bit
    > vector.
    >
    > For example, given bit vector query as 01100, i wish to find all bit
    > vectors that whose 1st, 4th, 5th bits are also zero, leaving
    > the 2nd, 3rd bits as either 0 or 1.
    >
    > What I wish to do is to build an index on the bit vectors, so that I
    > could avoid checking each 0-1 vector in sequence. In another word, I
    > hope this index could help me quickly filter away those vectors that
    > guarunteed not to be subsumed by the given 0-1 vector.
    >
    > I am not sure this is the right place to ask, but the index could be
    > used to solve a search engine problem: the 0-1 vector
    > corresponds to my keywords search (a 0 means that keyword NOT in my
    > query, while a 1 means the keyword is in), and all the webpages
    > corresponds to 0-1 vectors. I wish to find all the webpages that
    > subsumes my query.
    >
    > If anyone happen to know how Google solves this problem, and it is not
    > confidential, pls let me know, thanks a bunch!!!!
    >

    Have you tried Googling(TM) your question?

    ---
    William Ernest Reid
     
    Bill Reid, Feb 21, 2007
    #3
  4. %NAME%

    Guest

    On Feb 20, 6:56 pm, "%NAME%" <> wrote:
    > The title is not exactly a disnomer, so pleaes let me explain...
    >
    > Say I have a large number of 0-1 vectors containing only
    > bits corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
    > I wish to find all the 0-1 vectors that are subsumed by that query bit
    > vector.
    >
    > For example, given bit vector query as 01100, i wish to find all bit
    > vectors that whose 1st, 4th, 5th bits are also zero, leaving
    > the 2nd, 3rd bits as either 0 or 1.
    >
    > What I wish to do is to build an index on the bit vectors, so that I
    > could avoid checking each 0-1 vector in sequence. In another word, I
    > hope this index could help me quickly filter away those vectors that
    > guarunteed not to be subsumed by the given 0-1 vector.
    >
    > I am not sure this is the right place to ask, but the index could be
    > used to solve a search engine problem: the 0-1 vector
    > corresponds to my keywords search (a 0 means that keyword NOT in my
    > query, while a 1 means the keyword is in), and all the webpages
    > corresponds to 0-1 vectors. I wish to find all the webpages that
    > subsumes my query.
    >
    > If anyone happen to know how Google solves this problem, and it is not
    > confidential, pls let me know, thanks a bunch!!!!


    Suppose I have a database of all the words from the
    Official Scrabble Players Dictionary (OSPD). I can
    create a consonant/vowel vector for all 5-letter
    words using the SQL statement:

    SELECT OSPD_words.OSPD,
    IIf(Mid$([OSPD],1,1) Like "[a,e,i,o,u]",1,0) AS bit1,
    IIf(Mid$([OSPD],2,1) Like "[a,e,i,o,u]",1,0) AS bit2,
    IIf(Mid$([OSPD],3,1) Like "[a,e,i,o,u]",1,0) AS bit3,
    IIf(Mid$([OSPD],4,1) Like "[a,e,i,o,u]",1,0) AS bit4,
    IIf(Mid$([OSPD],5,1) Like "[a,e,i,o,u]",1,0) AS bit5,
    [bit1] & [bit2] & [bit3] & [bit4] & [bit5] AS vector
    FROM OSPD_words
    WHERE (((Len([OSPD]))=5));

    which produces:

    OSPD bit1 bit2 bit3 bit4 bit5 vector
    added 1 0 0 1 0 10010
    adder 1 0 0 1 0 10010
    addle 1 0 0 0 1 10001
    adeem 1 0 1 1 0 10110
    adept 1 0 1 0 0 10100
    adieu 1 0 1 1 1 10111
    adios 1 0 1 1 0 10110
    adits 1 0 1 0 0 10100
    adman 1 0 0 1 0 10010
    admen 1 0 0 1 0 10010
    admit 1 0 0 1 0 10010
    admix 1 0 0 1 0 10010
    adobe 1 0 1 0 1 10101
    adobo 1 0 1 0 1 10101
    ..
    ..
    ..

    Once I have those vectors, I can then query them to
    find anything that has the 1st, 4th & 5th letters
    as consonants and the 2nd & 3rd letters can be either
    consonants or vowels. This is done using the
    like "0??00" pattern matching criteria. The question
    marks are single character wild cards:

    SELECT vector_test1.OSPD, vector_test1.vector
    FROM vector_test1
    WHERE (((vector_test1.vector) Like "0??00"));

    which filters the above output to show only the
    requested vectors:

    OSPD vector
    baals 01100
    backs 01000
    baddy 01000
    badly 01000
    baffs 01000
    baffy 01000
    baggy 01000
    bahts 01000
    bails 01100
    bairn 01100
    baith 01100
    baits 01100
    ..
    ..
    ..
    brach 00100
    bract 00100
    brads 00100
    brags 00100
    braky 00100
    brand 00100
    ..
    ..
    ..

    And because we didn'y define "y" as a vowel, there
    are even some 5-letter words with no vowels:

    OSPD vector
    ghyll 00000
    lymph 00000
    hymns 00000
    gypsy 00000
    crypt 00000
    dryly 00000
    lynch 00000
    synth 00000
    glyph 00000
    synch 00000
     
    , Feb 21, 2007
    #4
  5. %NAME%

    Jerry Coffin Guest

    In article <>,
    says...
    > The title is not exactly a disnomer, so pleaes let me explain...
    >
    > Say I have a large number of 0-1 vectors containing only
    > bits corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
    > I wish to find all the 0-1 vectors that are subsumed by that query bit
    > vector.
    >
    > For example, given bit vector query as 01100, i wish to find all bit
    > vectors that whose 1st, 4th, 5th bits are also zero, leaving
    > the 2nd, 3rd bits as either 0 or 1.


    Assuming your bit-vectors were stored in an integer type, you could use
    a comparison like this:

    if ((bit_vector | pattern) == pattern)
    // found
    else
    // not found

    If you need to store larger bit-vectors than will fit in any integer
    type (only 32 are guaranteed), you can look at std::bit_vector and/or
    std::vector<bool>. IIRC, bit_vector supports boolean operations
    directly, but for vector<bool> you'll probably need to write your own.

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Feb 21, 2007
    #5
  6. %NAME%

    CBFalconer Guest

    Jerry Coffin wrote:
    > says...
    >
    >> The title is not exactly a disnomer, so pleaes let me explain...
    >>
    >> Say I have a large number of 0-1 vectors containing only bits
    >> corresponding to 0 or 1. Now given an arbitrary 0-1 vector,
    >> I wish to find all the 0-1 vectors that are subsumed by that
    >> query bit vector.
    >>
    >> For example, given bit vector query as 01100, i wish to find
    >> all bit vectors that whose 1st, 4th, 5th bits are also zero,
    >> leaving the 2nd, 3rd bits as either 0 or 1.

    >
    > Assuming your bit-vectors were stored in an integer type, you
    > could use a comparison like this:
    >
    > if ((bit_vector | pattern) == pattern)
    > // found
    > else
    > // not found
    >
    > If you need to store larger bit-vectors than will fit in any
    > integer type (only 32 are guaranteed), you can look at
    > std::bit_vector and/or std::vector<bool>. IIRC, bit_vector
    > supports boolean operations directly, but for vector<bool>
    > you'll probably need to write your own.


    The fault is basically that of the OP, but please do not post
    off-topic information on c.l.c. It is easy to overlook the foolish
    cross-posting, but you should not answer in any group where the
    answer (and question) are off topic. F'ups set.

    --
    <http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
    <http://www.securityfocus.com/columnists/423>

    "A man who is right every time is not likely to do very much."
    -- Francis Crick, co-discover of DNA
    "There is nothing more amazing than stupidity in action."
    -- Thomas Matthews
     
    CBFalconer, Feb 21, 2007
    #6
  7. %NAME%

    Randy Howard Guest

    On Tue, 20 Feb 2007 18:56:19 -0600, NAME% wrote
    (in article <>):

    > The title is not exactly a disnomer, so pleaes let me explain...


    What in the world is a disnomer? Is this another one of those "we made
    up a word because the mood struck us" deals?


    --
    Randy Howard (2reply remove FOOBAR)
    "The power of accurate observation is called cynicism by those
    who have not got it." - George Bernard Shaw
     
    Randy Howard, Mar 6, 2007
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Johann Blake
    Replies:
    0
    Views:
    340
    Johann Blake
    Jan 21, 2004
  2. Ahmed Moustafa
    Replies:
    0
    Views:
    805
    Ahmed Moustafa
    Nov 15, 2003
  3. Bapaiah Katepalli
    Replies:
    1
    Views:
    1,522
    Mike Treseler
    Jun 23, 2006
  4. %NAME%

    a question on google's algorithm

    %NAME%, Feb 21, 2007, in forum: C Programming
    Replies:
    6
    Views:
    311
    Randy Howard
    Mar 6, 2007
  5. Replies:
    1
    Views:
    734
    cirfu
    Jun 19, 2008
Loading...

Share This Page