Regular Expression Analyzer

radimpe

Hi,

Hope someone can help me with a pet project. I have a huge set of
search terms based on historic searches. I'm working on a search engine
that can do various different types of searches. What I am trying to do
is to see if the historic search terms can determine (through a regular
expression match) which one of the search methods to use.

For example, search type one is an exact-match search. I would like to
do an exact-match search if a user types in an ISBN book number (e.g.
978-0-306-40615-7). If a user types in a book title such as "Moby Dick",
I would like to do a wildcard search, and so on.
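
A minimal sketch of that routing idea, not a definitive implementation. The names SearchRouter, SearchType and route are hypothetical, and the ISBN-13 pattern is an assumption: a 978 or 979 prefix followed by ten more digits, each optionally preceded by a hyphen or a space. It checks only the shape of the term, not the ISBN check digit.

```java
import java.util.regex.Pattern;

class SearchRouter {

    // Hypothetical search types; the post only names exact match and wildcard.
    enum SearchType { EXACT, WILDCARD }

    // Assumed ISBN-13 shape: 978/979 prefix, then ten digits with optional separators.
    private static final Pattern ISBN13 = Pattern.compile("97[89](?:[- ]?\\d){10}");

    // Pick a search type for one term; anything that is not ISBN-shaped falls back to wildcard.
    static SearchType route(String term) {
        String trimmed = term.trim();
        if (ISBN13.matcher(trimmed).matches()) { // matches() requires the whole term to fit
            return SearchType.EXACT;
        }
        return SearchType.WILDCARD;
    }

    public static void main(String[] args) {
        System.out.println(route("978-0-306-40615-7")); // EXACT
        System.out.println(route("Moby Dick"));         // WILDCARD
    }
}
```

Further search types would each add one more pattern check before the wildcard fallback, ordered from most to least specific.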

Can anybody help me write a Java application that will process the
historic terms to find out whether there is a reliable regex for each
search type? Even just a pointer in the right direction would be useful.

Rad.
 
hiwa

Be more specific and I could give advice.
 
TechBookReport

Here's a quick pointer: http://javaalmanac.com/egs/java.util.regex/pkg.html
 
Robert Klemme

Hope someone can help me with a pet project. I have a huge set of
search terms based on historic searches. I'm working on a search engine
that can do various different types of searches. What I am trying to do
is to see if the historic search terms can determine (through a regular
expression match) which one of the search methods to use.

For example Search type one is an exact match search. I would like to
do an exact match search if a user types in an ISBN Book number (eg.
978-0-306-40615-7). If a user types in a book title "Moby Dick" I would
like to do a wildcard search etc.

How is this connected to the historic searches? First you say you want
to use the RE against historic searches but now it seems you want to use
it with current searches.

Can anybody help me write a Java Application that will process the
historic terms to find out if there is a reliable RegEx for each search
type. Even just a pointer in the right direction would be useful.

http://www.amazon.com/l/dp/0596528124/

robert
 
radimpe

How is this connected to the historic searches? First you say you want
to use the RE against historic searches but now it seems you want to use
it with current searches.

Apologies. They are linked inasmuch as I could use the historic terms to
form a 'pattern' of how people are using the search. I essentially have
two sets of data: first, the base data that I want people to search on,
and second, the historic search terms showing how people have tried to
search. My intention is to form a good enough pattern from both sets that
I can 'direct' each search term to interrogate the data using the right
'type' of search.


Thanks for the links so far. I've got some reading to catch up with.
 
radimpe

So based on the javaalmanac (great starting point, thanks) examples
I've written the following 'test'

package com.regexp;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Parsing {

    public static void main(String[] args) {
        // Find every token shaped like: two digits, one or two
        // non-digits, then four or five digits.
        String inputStr = "11N2222 22NB3333";
        String patternStr = "\\d{2}\\D{1,2}\\d{4,5}";
        Pattern pattern = Pattern.compile(patternStr);
        Matcher matcher = pattern.matcher(inputStr);

        // Call group() only after find() has returned true;
        // otherwise it throws IllegalStateException.
        while (matcher.find()) {
            System.out.println(matcher.group()); // "11N2222", then "22NB3333"
        }
    }
}

This does match the patterns I know about. My ultimate aim is to develop
a parser that will, given a set of input, determine what the
"patternStr" should be. Take, for example, the manufacturer part number
on an electronic distributor's website. It should be possible to work out
a number of patterns that together cover 90% of the manufacturer part
numbers. (I don't expect there to be just one.)

Hope this makes much more sense this time around.
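
One possible starting point, sketched under assumptions rather than offered as the answer: reduce each historic term to a regex describing its "shape" and count how often each shape occurs. The names ShapeCounter, shapeOf and countShapes are hypothetical, and the character classes used (ASCII digits, ASCII letters, everything else as a literal) are a simplification.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class ShapeCounter {

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    static boolean isLetter(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }

    // Build a regex for the shape of a term: digit runs become \d{n},
    // letter runs become [A-Za-z]{n}, any other character is a quoted literal.
    static String shapeOf(String term) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < term.length()) {
            int start = i;
            char c = term.charAt(i);
            if (isDigit(c)) {
                while (i < term.length() && isDigit(term.charAt(i))) i++;
                regex.append("\\d{").append(i - start).append('}');
            } else if (isLetter(c)) {
                while (i < term.length() && isLetter(term.charAt(i))) i++;
                regex.append("[A-Za-z]{").append(i - start).append('}');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
                i++;
            }
        }
        return regex.toString();
    }

    // Count how many terms share each shape, keeping first-seen order.
    static Map<String, Integer> countShapes(List<String> terms) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String term : terms) {
            counts.merge(shapeOf(term), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample terms, not real search history.
        List<String> terms = Arrays.asList("11N2222", "22NB3333", "33X4444");
        countShapes(terms).forEach((shape, n) -> System.out.println(n + "  " + shape));
    }
}
```

Sorting the counts in descending order would show which few shapes cover most of the history, which is one way to judge the 90% target.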

Rad.
 
IchBin

Eclipse has a plugin that does regular expression analysis. I installed
it because I am coding PHP now, and that language seems to use regexes
much more heavily than Java. Anyway, the plugin is called "QuickREx".

Eclipse update url:
http://www.bastian-bergerhoff.com/eclipse/features.

Eclipse Plugins Site:
http://eclipse-plugins.2y.net/eclipse/plugin_details.jsp?id=964

Can be found here:
http://www.bastian-bergerhoff.com/eclipse/features/web/QuickREx/toc.html

--
Thanks in Advance... http://ichbin.9999mb.com
IchBin, Pocono Lake, Pa, USA http://weconsultants.phpnet.us
__________________________________________________________________________

'If there is one, Knowledge is the "Fountain of Youth"'
-William E. Taylor, Regular Guy (1952-)
 
Robert Klemme

So based on the javaalmanac (great starting point, thanks) examples
I've written the following 'test'

Which does match the patterns I know about. What my ultimate aim is, is
to develop a parser that will, given a set of input, determine what the
"patternStr" should be. For example the manufacturer part number on an
electronic distributor's website. It should be possible to work a
number of patterns that can find 90% of the makeup of the manufacturer
part number. (I don't expect there to be just one)

Hope this makes much more sense this time around

Yes, I think so. Thanks for the clarification! I will try to rephrase
in my own words just to make sure I understood you correctly. So you
are basically looking for an algorithm that, given a set of inputs
(historic searches), partitions that set into groups of like items (i.e.
searches that share some common pattern) and derives the common pattern
for each group. Then you want to use those patterns to find out which
group a new search entered by the user belongs to, and use that
knowledge to optimize the current search.
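
A rough sketch of the partition-and-derive idea described above, under simplifying assumptions: terms are grouped by their sequence of character classes (digit run, letter run, or a literal character), and each group is merged into one regex with {min,max} run lengths. The names PatternGroups, derive and classify are hypothetical, and real search history would likely need a more forgiving notion of "like items".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class PatternGroups {

    // 'd' for an ASCII digit, 'a' for an ASCII letter, otherwise the character itself.
    static char classOf(char c) {
        if (c >= '0' && c <= '9') return 'd';
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) return 'a';
        return c;
    }

    // Returns one class character per run (e.g. "dad" for "11N2222")
    // and appends the run lengths (e.g. 2, 1, 4) to lengthsOut.
    static String runClasses(String term, List<Integer> lengthsOut) {
        StringBuilder classes = new StringBuilder();
        int i = 0;
        while (i < term.length()) {
            char cls = classOf(term.charAt(i));
            int start = i;
            while (i < term.length() && classOf(term.charAt(i)) == cls) i++;
            classes.append(cls);
            lengthsOut.add(i - start);
        }
        return classes.toString();
    }

    // Partition terms by run-class key and derive one regex per group.
    static Map<String, Pattern> derive(List<String> terms) {
        Map<String, int[][]> ranges = new LinkedHashMap<>(); // key -> per-run {min, max}
        for (String term : terms) {
            List<Integer> lens = new ArrayList<>();
            String key = runClasses(term, lens);
            int[][] r = ranges.get(key);
            if (r == null) {
                r = new int[lens.size()][];
                for (int k = 0; k < r.length; k++) r[k] = new int[] {lens.get(k), lens.get(k)};
                ranges.put(key, r);
            } else {
                for (int k = 0; k < r.length; k++) {
                    r[k][0] = Math.min(r[k][0], lens.get(k));
                    r[k][1] = Math.max(r[k][1], lens.get(k));
                }
            }
        }
        Map<String, Pattern> result = new LinkedHashMap<>();
        for (Map.Entry<String, int[][]> e : ranges.entrySet()) {
            StringBuilder regex = new StringBuilder();
            for (int k = 0; k < e.getKey().length(); k++) {
                char cls = e.getKey().charAt(k);
                String atom = cls == 'd' ? "\\d"
                        : cls == 'a' ? "[A-Za-z]"
                        : "(?:" + Pattern.quote(String.valueOf(cls)) + ")";
                int[] mm = e.getValue()[k];
                regex.append(atom).append('{').append(mm[0]).append(',').append(mm[1]).append('}');
            }
            result.put(e.getKey(), Pattern.compile(regex.toString()));
        }
        return result;
    }

    // Returns the key of the first group whose regex matches the whole query, or null.
    static String classify(String query, Map<String, Pattern> groups) {
        for (Map.Entry<String, Pattern> e : groups.entrySet()) {
            if (e.getValue().matcher(query).matches()) return e.getKey();
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up sample history, not real data.
        Map<String, Pattern> groups =
                derive(Arrays.asList("11N2222", "22NB3333", "33X44444", "978-0-306-40615-7"));
        groups.forEach((key, p) -> System.out.println(key + " -> " + p.pattern()));
        System.out.println(classify("44ZZ5555", groups)); // dad
        System.out.println(classify("Moby Dick", groups)); // null
    }
}
```

The group key returned by classify could then be mapped to a search type; a query that matches no group would fall back to a default such as the wildcard search.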

I think, depending on the inputs (historic searches) this could be a
quite difficult task and I am sorry to say that I do not have a simple
answer. Maybe you should look into text retrieval systems. They might
contain things like that.

Kind regards

robert
 
TechBookReport

Thanks for the pointer to QuickREx. Looks very useful indeed.

Pan
 
radimpe

Robert said:
I think, depending on the inputs (historic searches) this could be a
quite difficult task and I am sorry to say that I do not have a simple
answer. Maybe you should look into text retrieval systems. They might
contain things like that.

I thought it might be the case... Pet projects always sound so simple
when you start them...
 
