Extract from Search Engine Queries

VisionSet

Dedicated 2 Java said:
Extract searched keywords from search engine queries.
Ex: http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=java+forum&btnG=Google+Search

Expected answer is "java+forum".

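import java.util.regex.*;

public class ExtractQuery {
    public static void main(String[] args) {
        String stringToSearch =
            "http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=java+forum&btnG=Google+Search";
        Pattern p = Pattern.compile("(^|&)q=([^&]*)"); // regex expression
        Matcher m = p.matcher(stringToSearch);
        if (m.find()) { // find() returns false when there is no q parameter
            String find = m.group(2); // second capturing group, i.e. ([^&]*)
            System.out.println(find); // prints java+forum
        }
    }
}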

The regex says 'look in any supplied string for this pattern':

It starts with the start of the input {^} or {|} the character '&', which is
then followed by 'q=' (incidentally, since q is the parameter for the Google
search box, it will always precede the search words). Then comes the bit
we're interested in: all {*} characters that are not {^} '&' (since '&'
signifies the end of the parameter).

The (...) are capturing groups, and we want the second (last) one: ([^&]*).
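A quick way to check which group holds what is to print all three for the example URL. This is a small sketch; the class GroupDemo and its groups() helper are names made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Same pattern as above: group 1 is (^|&), group 2 is ([^&]*)
    private static final Pattern Q_PARAM = Pattern.compile("(^|&)q=([^&]*)");

    // Returns {group 0, group 1, group 2} for the first match, or null if none
    static String[] groups(String url) {
        Matcher m = Q_PARAM.matcher(url);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String url = "http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=java+forum&btnG=Google+Search";
        String[] g = groups(url);
        System.out.println("group(0) = " + g[0]); // whole match: &q=java+forum
        System.out.println("group(1) = " + g[1]); // &
        System.out.println("group(2) = " + g[2]); // java+forum
    }
}
```

group(0) is always the whole match, which is why the '&' shows up there but not in group(2).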

^ means two things: as the first character inside [] it means NOT; otherwise
it means beginning of input (beginning of line only in MULTILINE mode).
| means OR, but inside [] OR is implicit (a | there just matches a literal '|').
[] is a character class; here we use it to mean any character except '&'.
* means zero or more occurrences of whatever precedes it, so [^&]* means zero
or more characters that aren't '&'. In this case it is our second capturing
group.
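One catch with (^|&): if q is the first parameter, as in ...search?q=java+forum&hl=en, it is preceded by '?', so the pattern finds nothing, and calling m.group(2) after a failed find() throws IllegalStateException. A minimal sketch of a variant that also accepts '?' (QueryParamRegex and extractQ are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryParamRegex {
    // '?' added alongside '&' so q= is also found when it is the first parameter
    private static final Pattern Q_PARAM = Pattern.compile("(^|[?&])q=([^&]*)");

    // Returns the raw (still URL-encoded) value of q, or null when there is none
    static String extractQ(String url) {
        Matcher m = Q_PARAM.matcher(url);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=java+forum&btnG=Google+Search",
            "http://www.google.com/search?q=java+forum&hl=en", // q is the first parameter
            "q=java+forum"                                       // bare query string
        };
        for (String url : urls) {
            System.out.println(extractQ(url)); // java+forum for each of these
        }
    }
}
```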

Loads of info in the API under java.util.regex.Pattern

I bet you can't do it like that, though.
I imagine you have to flex the String class's substring() and indexOf()
methods ;-)
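In case regexes are off the table, here is one way the indexOf()/substring() version might look. It is a sketch under the same assumptions as the regex (q= follows '?', '&' or the start of the string, and the value ends at the next '&'); QueryParamIndexOf and extractQ are made-up names.

```java
public class QueryParamIndexOf {
    // Returns the raw value of parameter q using only indexOf() and substring(),
    // or null when the URL has no q parameter.
    static String extractQ(String url) {
        int start = -1;
        if (url.startsWith("q=")) {
            start = 2; // bare query string starting with q=
        } else {
            // Look for "q=" right after '?' or '&'
            int i = url.indexOf("?q=");
            if (i < 0) {
                i = url.indexOf("&q=");
            }
            if (i >= 0) {
                start = i + 3; // skip the separator plus "q="
            }
        }
        if (start < 0) {
            return null;
        }
        int end = url.indexOf('&', start); // value ends at the next '&' ...
        return end < 0 ? url.substring(start) : url.substring(start, end); // ... or at the end
    }

    public static void main(String[] args) {
        String url = "http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=java+forum&btnG=Google+Search";
        System.out.println(extractQ(url)); // java+forum
    }
}
```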
 
Roedy Green

Extract searched keywords from search engine queries.
Ex:http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=java+forum&btnG=Google+Search

Expected answer is "java+forum".

Any help greatly appreciated

What you need to do is try a large variety of queries, including ones built
with the advanced search, so you learn the various patterns.

Then you can proceed by:

writing a parser using a tool like JavaCC; see
http://mindprod.com/parser.html

or writing your own parser from scratch that looks for & and =
(see URLDecoder for decoding the values you pull out),

or using regexes to extract the patterns of interest;
see http://mindprod.com/jgloss/regex.html
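For the hand-written parser route, a minimal sketch: java.net.URI isolates the query string, which is then split on & and = with each piece run through URLDecoder. It assumes UTF-8 (the example URL's ie= parameter says ISO-8859-1, which only matters for non-ASCII search terms) and a URL well-formed enough for URI.create(), which throws IllegalArgumentException otherwise. QueryParamParser, parseQuery and decode are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryParamParser {
    // Decodes one form-encoded piece; assumes UTF-8
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Splits the query part of the URL into name/value pairs, in order of appearance
    static Map<String, String> parseQuery(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(url).getRawQuery(); // text after '?', still encoded
        if (query == null) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "&&" or a trailing '&'
            }
            String[] kv = pair.split("=", 2); // limit 2 keeps an empty value like "lr="
            params.put(decode(kv[0]), kv.length > 1 ? decode(kv[1]) : "");
        }
        return params;
    }

    public static void main(String[] args) {
        String url = "http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=java+forum&btnG=Google+Search";
        System.out.println(parseQuery(url).get("q")); // java forum
    }
}
```

Decoding turns java+forum into "java forum", since + stands for a space in form-encoded data. If you want the raw "java+forum" from the original question, skip the decode step for the value.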
 
Dedicated 2 Java

Hi Friends,


Joseph Miller, Roedy Green, VisionSet: thanks a trillion.

Happy Friendship Day.
Regards,
Shaiju Jose
 
