Lucene: add a field to the index, based on an HTML meta tag

Discussion in 'Java' started by Keith Beef, Oct 23, 2007.


    I have a question about building an index file.

    I've been using the Lucene demo from
    http://lucene.apache.org/java/2_1_0/demo.html

    I want to add a field named "category" to my HTML documents, and ideally I
    would like to do this by reading a meta tag in the HTML document, so that
    when searching I can use a term like "category:spare_parts" to limit the
    hits returned.

    E.g., when indexing file123456789.html, the tag <meta name="category"
    content="spare_parts"> would put the value "spare_parts" into the
    "category" field.

    So how could I do this?


    Regards,
    Keith.
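    One possible approach, sketched here under assumptions rather than taken from the Lucene demo itself: read the HTML file, pull the `content` value of the `<meta name="category">` tag out with a regular expression, and add it to the Lucene `Document` as a separate field. `MetaTagExtractor`, `metaTags` and `metaContent` are hypothetical helpers written only for this illustration. A regex is a rough stand-in for a real HTML parser: it skips unquoted attribute values and breaks on a `>` inside an attribute value.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of the Lucene demo): collects name/content
 * pairs from <meta> tags using regular expressions. A rough sketch only.
 */
final class MetaTagExtractor {

    // Matches a whole <meta ...> tag, case-insensitively; group 1 holds the attributes.
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b([^>]*)>", Pattern.CASE_INSENSITIVE);

    // Matches attr="value" or attr='value'; unquoted values are deliberately not handled.
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([a-zA-Z-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    private MetaTagExtractor() {}

    /** Returns a map from lower-cased meta name to its content attribute. */
    static Map<String, String> metaTags(String html) {
        Map<String, String> result = new HashMap<String, String>();
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            String name = null;
            String content = null;
            Matcher attr = ATTRIBUTE.matcher(tag.group(1));
            while (attr.find()) {
                String key = attr.group(1).toLowerCase(Locale.ROOT);
                // Exactly one of groups 2 and 3 is set, depending on the quote style.
                String value = attr.group(2) != null ? attr.group(2) : attr.group(3);
                if (key.equals("name")) {
                    name = value.toLowerCase(Locale.ROOT);
                } else if (key.equals("content")) {
                    content = value;
                }
            }
            // Keep the first tag for each name; skip tags missing either attribute.
            if (name != null && content != null && !result.containsKey(name)) {
                result.put(name, content);
            }
        }
        return result;
    }

    /** Returns the content of the named meta tag, or null when it is absent. */
    static String metaContent(String html, String metaName) {
        return metaTags(html).get(metaName.toLowerCase(Locale.ROOT));
    }
}
```

    In the demo, the `Document` for each HTML file is assembled in `org.apache.lucene.demo.HTMLDocument`. Assuming the file's text has also been read into a `String html`, something along the lines of `String category = MetaTagExtractor.metaContent(html, "category");` followed by `doc.add(new Field("category", category, Field.Store.YES, Field.Index.UN_TOKENIZED));` (only when `category` is non-null) would index the value as a single term. `QueryParser` runs the query text through the analyzer, so whether `category:spare_parts` matches that term depends on how the analyzer handles the underscore. Building a `TermQuery(new Term("category", "spare_parts"))` and combining it with the user's query in a `BooleanQuery` avoids the question.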
Similar Threads
  1. karthikeyavenkat
    Replies: 2, Views: 568 (last post: Bryce, Mar 17, 2005)
  2. Zouplaz

    Lucene : rebuilding the index

    Zouplaz, Oct 19, 2005, in forum: Java
    Replies: 3, Views: 4,743 (last post: Zouplaz, Oct 23, 2005)
  3. shruds
    Replies: 1, Views: 759 (last post: John C. Bollinger, Jan 27, 2006)
  4. John Pritchard-williams

    Trying to open a Lucene-built index with Ferret...

    John Pritchard-williams, Nov 2, 2008, in forum: Ruby
    Replies: 4, Views: 106 (last post: Hugh Sasse, Nov 3, 2008)
  5. Tomasz Chmielewski

    sorting index-15, index-9, index-110 "the human way"?

    Tomasz Chmielewski, Mar 4, 2008, in forum: Perl Misc
    Replies: 4, Views: 271 (last post: Tomasz Chmielewski, Mar 4, 2008)