Regular Expression finder

Discussion in 'Java' started by Joe Smith, Sep 14, 2004.

  1. Joe Smith

    Joe Smith Guest

    Hi,

    does anyone know of a tool that would be able to extract the regular
    expression that corresponds to a set of Strings?

    For instance:

    This tool, given
    "abc", "aec", "akkc"
    would return a regular expression like "a.+c"

    Is this possible? Is it done?

    Thanks!
    Joe Smith, Sep 14, 2004
    #1

  2. Joe Smith

    David Hilsee Guest

    "Joe Smith" <> wrote in message
    news:ci6nvt$9la$...
    > Hi,
    >
    > does anyone know of a tool that would be able to extract the regular
    > expression that corresponds to a set of Strings?
    >
    > For instance:
    >
    > This tool, given
    > "abc", "aec", "akkc"
    > would return a regular expression like "a.+c"
    >
    > Is this possible? Is it done?


    _The_ regular expression? There are an infinite number of regular
    expressions that match those strings. Even if there were a tool that could
    guess at a regex using heuristics, you'd still need to examine its output to
    ensure that its result meets your needs.

    Personally, I'd prefer using something that can quickly test the regexes
    that your brain comes up with. The Komodo IDE had such a feature that I
    found quite helpful. I haven't seen anything like it in other IDEs, though.
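A quick way to test hand-written candidates outside an IDE is a few lines around java.util.regex. The following is only a minimal sketch: the class name RegexCheck, the method matchesAll, and the candidate list are made up for illustration, and the samples are the three strings from the original question.

```java
import java.util.regex.Pattern;

public class RegexCheck {
    // Returns true only if the candidate regex matches every sample in full.
    public static boolean matchesAll(String regex, String[] samples) {
        Pattern p = Pattern.compile(regex);
        for (String s : samples) {
            if (!p.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"abc", "aec", "akkc"};
        // Illustrative candidates; swap in whatever regexes you come up with.
        String[] candidates = {"a.+c", "a.c", "a[a-z]+c"};
        for (String c : candidates) {
            System.out.println(c + " -> " + matchesAll(c, samples));
        }
    }
}
```

This only checks that a candidate accepts the samples; whether it also rejects what it should is still something you have to judge yourself.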

    --
    David Hilsee
    David Hilsee, Sep 14, 2004
    #2

  3. Joe Smith wrote:
    > does anyone know of a tool that would be able to extract the regular
    > expression that corresponds to a set of Strings?


    There is no "the" there.

    > For instance:
    >
    > This tool, given
    > "abc", "aec", "akkc"
    > would return a regular expression like "a.+c"


    Why not "a[bek].*" or "a.*"?

    > Is this possible? Is it done?


    It's certainly possible (and very easy) to write a method to
    return a regular expression that matches any of a given set of
    Strings:

    public String getRegexp(String[] strings){
    return ".*";
    }

    Or did you mean a regexp that matches all of the given Strings
    and *only* those? The example you give fails in that regard, but
    it's also quite easy to do:

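    public String getRegexp(String[] strings){
        // Builds "(s1|s2|...)"; assumes strings is non-empty.
        StringBuffer result = new StringBuffer("(");
        for(int i=0; i<strings.length; i++){
            result.append(strings[i]+"|");
        }
        // Replace the trailing '|' with the closing parenthesis.
        result.setCharAt(result.length()-1, ')');
        return result.toString();
    }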

    (you'd have to add escape sequences for characters that have
    meaning in regexps)
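One way to handle the escaping is Pattern.quote, which is part of java.util.regex from Java 5 on and wraps a string so that it is matched literally. A minimal sketch, assuming Java 5 or later; the class name ExactAlternation is made up for illustration:

```java
import java.util.regex.Pattern;

public class ExactAlternation {
    // Builds "(s1|s2|...)" with each string quoted so that regex
    // metacharacters in the input are matched literally.
    public static String getRegexp(String[] strings) {
        StringBuilder result = new StringBuilder("(");
        for (int i = 0; i < strings.length; i++) {
            if (i > 0) {
                result.append('|');
            }
            result.append(Pattern.quote(strings[i]));
        }
        return result.append(')').toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(getRegexp(new String[] {"abc", "a.c", "akkc"}));
        System.out.println(p.matcher("a.c").matches()); // the literal string is accepted
        System.out.println(p.matcher("axc").matches()); // '.' is no longer a wildcard
    }
}
```

Note that an empty array yields "()", which matches only the empty string.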

    The real question is: which of the *infinite* number of regular expressions
    that match a given set of Strings do you want to find?
    Michael Borgwardt, Sep 14, 2004
    #3
  4. Joe Smith

    Joe Smith Guest

    > > does anyone know of a tool that would be able to extract the regular
    > > expression that corresponds to a set of Strings?

    >
    > There is no "the" there.
    >
    > > For instance:
    > >
    > > This tool, given
    > > "abc", "aec", "akkc"
    > > would return a regular expression like "a.+c"

    >
    > Why not "a[bek].*" or "a.*"?
    >
    >
    > The real question is: which of the *infinite* number of regular expressions
    > that match a given set of Strings do you want to find?


    Ok, ok... it's clear that my idea needs more explanations:

    It's true that there's an infinite number of regexps that may match a set of
    Strings... So perhaps, what I really want is to extract the common sections
    of these strings... And replace the other parts with the "minimum" regexp...
    And yes, there will be countless of them!!...
    Idea:

    "header body1 body2 footer epilogue"

    "Prolog header body1 footer"

    I would have something like: "(Prolog)? header body1 (body2)? footer
    (epilogue)?"

    For instance, "diff" is able to find the differences between two files...
    The tool I'm thinking of would perform diffs on several inputs, to be able
    to extract these common parts...

    But well, I guess it's too "abstract" for a program.

    Thanks anyway!!
    Joe Smith, Sep 14, 2004
    #4
  5. "Joe Smith" <> wrote in message
    news:ci6sji$kbk$...
    > > > does anyone know of a tool that would be able to extract the regular
    > > > expression that corresponds to a set of Strings?

    > >
    > > There is no "the" there.
    > >
    > > > For instance:
    > > >
    > > > This tool, given
    > > > "abc", "aec", "akkc"
    > > > would return a regular expression like "a.+c"

    > >
    > > Why not "a[bek].*" or "a.*"?
    > >
    > >
    > > The real question is: which of the *infinite* number of regular expressions
    > > that match a given set of Strings do you want to find?

    >
    > Ok, ok... it's clear that my idea needs more explanations:
    >
    > It's true that there's an infinite number of regexps that may match a set of
    > Strings... So perhaps, what I really want is to extract the common sections
    > of these strings... And replace the other parts with the "minimum" regexp...
    > And yes, there will be countless of them!!...
    > Idea:
    >
    > "header body1 body2 footer epilogue"
    >
    > "Prolog header body1 footer"
    >
    > I would have something like: "(Prolog)? header body1 (body2)? footer
    > (epilogue)?"
    >
    > For instance, "diff" is able to find the differences between two files...
    > The tool I'm thinking of would perform diffs on several inputs, to be able
    > to extract these common parts...
    >
    > But well, I guess it's too "abstract" for a program.


    This is a research area, particularly in user interfaces. You may find
    something useful in section 4.4 of
    http://www.ics.uci.edu/~dhilbert/papers/EDEM-UCI-ICS-98-13.pdf
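For the two-input case in the quoted example, a rough sketch is possible with a token-level longest common subsequence, the same idea that "diff" is built on. Everything here is an assumption made for illustration, not something from the paper: the class and method names, splitting on whitespace, and the choice to follow every token with \s* (which keeps the code short but also accepts tokens that run together).

```java
import java.util.regex.Pattern;

public class CommonPartsRegex {
    // Hypothetical helper: merges two whitespace-separated strings into one
    // regex. Tokens common to both (in order) are required; all others are optional.
    public static String merge(String first, String second) {
        String[] a = first.trim().split("\\s+");
        String[] b = second.trim().split("\\s+");

        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk both token lists, emitting required or optional pieces.
        StringBuilder regex = new StringBuilder();
        int i = 0;
        int j = 0;
        while (i < a.length || j < b.length) {
            if (i < a.length && j < b.length && a[i].equals(b[j])) {
                regex.append(Pattern.quote(a[i])).append("\\s*");
                i++;
                j++;
            } else if (j == b.length
                    || (i < a.length && lcs[i + 1][j] >= lcs[i][j + 1])) {
                regex.append("(?:").append(Pattern.quote(a[i])).append("\\s*)?");
                i++;
            } else {
                regex.append("(?:").append(Pattern.quote(b[j])).append("\\s*)?");
                j++;
            }
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        String first = "header body1 body2 footer epilogue";
        String second = "Prolog header body1 footer";
        Pattern p = Pattern.compile(merge(first, second));
        System.out.println(p.pattern());
        System.out.println(p.matcher(first).matches());
        System.out.println(p.matcher(second).matches());
    }
}
```

More than two inputs could be handled by folding the strings in one at a time, but the result then depends on the order of the inputs, which is part of why this is a research problem rather than a solved one.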

    Cheers,
    Matt Humphrey http://www.iviz.com/
    Matt Humphrey, Sep 14, 2004
    #5
  6. Joe Smith

    sks Guest

    "David Hilsee" <> wrote in message
    news:...
    > "Joe Smith" <> wrote in message
    > news:ci6nvt$9la$...
    > > Hi,
    > >
    > > does anyone know of a tool that would be able to extract the regular
    > > expression that corresponds to a set of Strings?
    > >
    > > For instance:
    > >
    > > This tool, given
    > > "abc", "aec", "akkc"
    > > would return a regular expression like "a.+c"
    > >
    > > Is this possible? Is it done?

    >
    > _The_ regular expression? There are an infinite number of regular
    > expressions that match those strings. Even if there were a tool that could
    > guess at a regex using heuristics, you'd still need to examine its output to
    > ensure that its result meets your needs.
    >
    > Personally, I'd prefer using something that can quickly test the regexes
    > that your brain comes up with. The Komodo IDE had such a feature that I
    > found quite helpful. I haven't seen anything like it in other IDEs, though.

    There's a plug-in for Eclipse; you'd have to search for it on Google, though.
    sks, Sep 14, 2004
    #6
  7. Joe Smith

    Carl Howells Guest

    Michael Borgwardt wrote:

    > public String getRegexp(String[] strings){
    > StringBuffer result = new StringBuffer("(");
    > for(int i=0; i<strings.length; i++){
    > result.append(strings[i]+"|");
    > }
    > result.setCharAt(result.length()-1, ')');
    > return result.toString();
    > }
    >
    > (you'd have to add escape sequences for characters that have
    > meaning in regexps)


    Last I checked, the java regex engine is pretty bad for that... It uses
    recursion to build the automaton used for matching, which recurses too
    deeply on an alternation with a few thousand options, throwing an exception.
    Carl Howells, Sep 14, 2004
    #7
  8. Carl Howells wrote:
    > Last I checked, the java regex engine is pretty bad for that... It uses
    > recursion to build the automaton used for matching, which recurses too
    > deeply on an alternation with a few thousand options, throwing an
    > exception.


    It wasn't really meant as a serious suggestion. Using *any* regexp engine
    to process that kind of pattern would be a waste of resources.
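If the goal really is "exactly these strings and nothing else", a plain set lookup does the same job without building a pattern at all. A minimal sketch, assuming Java 7 or later for the diamond operator; the class name ExactSet is made up for illustration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ExactSet {
    private final Set<String> allowed;

    public ExactSet(String[] strings) {
        // Copy the inputs into a hash set for constant-time membership tests.
        allowed = new HashSet<>(Arrays.asList(strings));
    }

    // True only if the candidate is exactly one of the given strings.
    public boolean matches(String candidate) {
        return allowed.contains(candidate);
    }

    public static void main(String[] args) {
        ExactSet set = new ExactSet(new String[] {"abc", "aec", "akkc"});
        System.out.println(set.matches("akkc"));
        System.out.println(set.matches("axc"));
    }
}
```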
    Michael Borgwardt, Sep 14, 2004
    #8
