newbie Java regexp question

Discussion in 'Java' started by mitchmcc@yahoo.com, Jul 2, 2007.

  1. Guest

    Below is a small test program I wrote to try and
    do a simple parse of an XML expression, where I
    can extract the tag(s) and the data on a single
    line. Yes, I know about the other ways to parse
    real XML, but I am trying to learn Java only. My
    test case is very simple (see below). The problem
    seems to be something tricky about the fact that
    I am reading the input from the console.

    I have tried the regexp in all of the following forms:

    Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>");
    Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>\n");
    Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>\r\n");

    In Windows cmd.exe, none of these match when I enter

    <t1>foo</t1>

    as standard input.

    Any advice would be greatly appreciated.

    Mitch

    -----------------------------------------------------------------------------------------------

    import java.io.*;
    import java.net.*;
    import java.util.regex.*;

    public class test {
        public static void main(String[] args) throws IOException {

            PrintWriter out = null;
            BufferedReader stdIn = null;
            String server = "";
            String userInput;

            stdIn = new BufferedReader(new InputStreamReader(System.in));

            // read arguments
            if (args.length == 1) {
                server = args[0];
            } else {
                System.out.println("no args");
            }

            // this one works, but is not really what I want
            // Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)<(\\S+)>");

            // this one is the correct one that won't match unless the closing tag matches
            // the opening tag, but I cannot get it to work with input from the console...
            Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>\r\n");

            Matcher m1 = p1.matcher("<t1>foo</t1>\r\n");
            System.out.println("matched test string = " + m1.matches());

            while ((userInput = stdIn.readLine()) != null) {

                System.out.println("got user input: " + userInput + " length "
                        + userInput.length());

                // Now see if the pattern matches
                Matcher m = p1.matcher(userInput);
                System.out.println("matched = " + m.matches());

                // groupCount() is the number of capturing groups in the pattern,
                // reported whether or not this line matched
                System.out.println("numGroups found: " + m.groupCount() + "\n");

                // If there were matches, print out the groups found
                if (m.matches()) {
                    for (int j = 1; j <= m.groupCount(); j++) {
                        System.out.println("group " + m.group(j) + " found\n");
                    } // end for
                } // end if

            } // end while

            stdIn.close();

        } // end main

    } // end class test
     
    , Jul 2, 2007
    #1
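    A note on the question above: BufferedReader.readLine() returns the
    text of a line without its terminator (\n, \r or \r\n), so any pattern
    ending in \n or \r\n can never match what the loop receives, even
    though the same pattern matches the hard-coded test string that still
    carries \r\n. The sketch below assumes a StringReader is a fair
    stand-in for System.in and that cmd.exe sends \r\n on Enter;
    ReadLineDemo and its helper methods are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

class ReadLineDemo {

    // Returns the first line exactly as BufferedReader.readLine() delivers it.
    // A StringReader stands in for System.in so the input is reproducible.
    static String firstLine(String rawInput) {
        try (BufferedReader in = new BufferedReader(new StringReader(rawInput))) {
            return in.readLine(); // "\n", "\r" or "\r\n" is consumed, not returned
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Whole-string match, the same check the original program performs.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Assumed to be what typing <t1>foo</t1> and pressing Enter produces.
        String line = firstLine("<t1>foo</t1>\r\n");
        System.out.println("length " + line.length());

        String[] regexes = {
            "<(\\S+)>(\\S+)</\\1>",
            "<(\\S+)>(\\S+)</\\1>\n",
            "<(\\S+)>(\\S+)</\\1>\r\n"
        };
        for (String regex : regexes) {
            // Make CR and LF visible when echoing the pattern.
            String shown = regex.replace("\r", "\\r").replace("\n", "\\n");
            System.out.println(matches(regex, line) + "  " + shown);
        }
    }
}
```

    The reported length is 12, the visible text only, and only the first
    pattern (the one with no terminator) matches the line.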

  2. david.karr Guest

    On Jul 2, 11:31 am, "" <> wrote:
    > Below is a small test program I wrote to try and
    > do a simple parse of an XML expression, where I
    > can extract the tag(s) and the data on a single
    > line. Yes, I know about the other ways to parse
    > real XML, but I am trying to learn Java only.


    You're going to be following all sorts of gnarly, twisty passages if
    you try to avoid learning XML. The functionality for parsing XML is
    readily available in the standard Java libraries.

    Feel free to explore regular expressions as an intellectual exercise,
    but it's a waste of time if you're actually trying to produce real
    code to parse XML.
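    As one illustration of the standard-library route, here is a sketch
    using the DOM parser that ships with the JDK (javax.xml.parsers). It
    assumes each input is a single, complete element; DomDemo and
    tagAndText are hypothetical names. Unlike the regexp, the parser
    itself rejects a closing tag that does not match the opening one.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DomDemo {

    // Parses a one-element XML snippet and returns {tag name, text content}.
    // Malformed input (e.g. <t1>foo</t2>) surfaces as IllegalArgumentException.
    static String[] tagAndText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Element root = builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return new String[] { root.getTagName(), root.getTextContent() };
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String[] r = tagAndText("<t1>foo</t1>");
        System.out.println("tag " + r[0] + ", text " + r[1]);
    }
}
```

    Element text containing spaces, such as <i>test two</i>, needs no
    special handling here.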
     
    david.karr, Jul 2, 2007
    #2

  3. timjowers Guest

    On Jul 2, 2:31 pm, "" <> wrote:
    > Below is a small test program I wrote to try and
    > do a simple parse of an XML expression, where I
    > can extract the tag(s) and the data on a single
    > line. Yes, I know about the other ways to parse
    > real XML, but I am trying to learn Java only. My
    > test case is very simple (see below). The problem
    > seems to be something tricky about the fact that
    > I am reading the input from the console.
    >
    > I have tried the regexp in all of the following forms:
    >
    > Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>");
    > Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>\n");
    > Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>\r\n");
    >
    > In Windows cmd.exe, none of these match when I enter
    >
    > <t1>foo</t1>
    >
    > as standard input.
    >
    > Any advice would be greatly appreciated.
    >
    > Mitch



    It works.

    Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>");

    You may be putting whitespace in the text of the element, which \S+
    will not match. Try revising the regexp to look for anything that is
    not the terminator. E.g. this works as is:
    <i>test</i>

    Yet this does not:
    <i>test two</i>
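    A sketch of that revision, assuming "anything not the terminator" is
    written as [^<]+ (any run of characters up to the next '<', spaces
    included). ElementTextDemo and its members are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ElementTextDemo {

    // Original middle group: \S+ refuses any whitespace in the element text.
    static final Pattern NO_SPACES = Pattern.compile("<(\\S+)>(\\S+)</\\1>");

    // Assumed revision: [^<]+ accepts everything up to the next '<'.
    static final Pattern UP_TO_NEXT_TAG = Pattern.compile("<(\\S+)>([^<]+)</\\1>");

    // Returns the element text, or null when the whole line does not match.
    static String text(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        for (String line : new String[] { "<i>test</i>", "<i>test two</i>" }) {
            System.out.println(line + "  \\S+: " + text(NO_SPACES, line)
                    + "  [^<]+: " + text(UP_TO_NEXT_TAG, line));
        }
    }
}
```

    The \S+ version returns null for <i>test two</i>, while the [^<]+
    version returns "test two"; both handle <i>test</i>.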


    TimJOwers
     
    timjowers, Jul 2, 2007
    #3
  4. kaldrenon Guest

    On Jul 2, 4:46 pm, timjowers <> wrote:
    > E.g. this
    > works as is:
    > <i>test</i>
    >
    > Yet this does not.
    > <i>test two</i>
    >
    > TimJOwers


    Which could easily be fixed by replacing the (\\S+) in the middle with
    (.+), or its reluctant form (.+?), I believe. (.?) on its own would
    match at most one character.
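    A sketch comparing the greedy (.+) and reluctant (.+?) forms. Both
    match a single element whose text contains spaces; they differ once a
    line holds more than one element and find() is used instead of
    matches(). GreedyVsReluctant and firstText are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {

    static final Pattern GREEDY = Pattern.compile("<(\\S+)>(.+)</\\1>");
    static final Pattern RELUCTANT = Pattern.compile("<(\\S+)>(.+?)</\\1>");

    // Returns group 2 of the first match found anywhere in the line, or null.
    static String firstText(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String oneElement = "<i>test two</i>";
        String twoElements = "<i>a</i><i>b</i>";
        System.out.println(firstText(GREEDY, oneElement));     // test two
        System.out.println(firstText(RELUCTANT, oneElement));  // test two
        System.out.println(firstText(GREEDY, twoElements));    // a</i><i>b
        System.out.println(firstText(RELUCTANT, twoElements)); // a
    }
}
```

    The greedy group runs on to the last </i> it can find, which is why
    the reluctant form is usually the safer choice for this kind of
    extraction.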
     
    kaldrenon, Jul 2, 2007
    #4
  5. Roedy Green Guest

    On Mon, 02 Jul 2007 11:31:08 -0700, ""
    <> wrote, quoted or indirectly quoted someone who
    said :

    >
    > Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>");
    > Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>\n");
    > Pattern p1 = Pattern.compile("<(\\S+)>(\\S+)</\\1>\r\n");


    You have 4 things that have to work for your regex as a whole to work.
    Chop your pattern down to just match <t1>, then when you get that
    working, add the next bit.
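    A sketch of that chop-it-down approach applied to the final pattern
    from the question: each step adds one piece and is tested with
    Matcher.lookingAt(), which only requires a match at the start of the
    input. IncrementalRegex and firstFailingStep are hypothetical names,
    and the input is assumed to be the line as readLine() returns it.

```java
import java.util.regex.Pattern;

class IncrementalRegex {

    // Each step adds one more piece of the pattern from the question.
    static final String[] STEPS = {
        "<(\\S+)>",
        "<(\\S+)>(\\S+)",
        "<(\\S+)>(\\S+)</\\1>",
        "<(\\S+)>(\\S+)</\\1>\r\n"
    };

    // Index of the first step that fails to match at the start of the
    // input, or -1 if every step matches.
    static int firstFailingStep(String input) {
        for (int i = 0; i < STEPS.length; i++) {
            if (!Pattern.compile(STEPS[i]).matcher(input).lookingAt()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String line = "<t1>foo</t1>"; // terminator already stripped by readLine()
        int failing = firstFailingStep(line);
        System.out.println(failing < 0 ? "all steps match"
                : "breaks at step " + failing + ": "
                  + STEPS[failing].replace("\r", "\\r").replace("\n", "\\n"));
    }
}
```

    For the console line, the first three steps match and the one that
    adds \r\n is where it breaks.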

    Instead of trying every variant of \n, have a look at your string and
    see what is actually on the end. Use charAt to examine it.
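    A small sketch of that charAt check: it prints the numeric codes of
    the last few characters, where 13 is CR and 10 is LF. TailInspector
    and tailCodes are hypothetical names.

```java
class TailInspector {

    // Describes the last n characters of s as numeric char codes via charAt.
    static String tailCodes(String s, int n) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = Math.max(0, s.length() - n); i < s.length(); i++) {
            if (sb.length() > 1) {
                sb.append(", ");
            }
            sb.append((int) s.charAt(i));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(tailCodes("<t1>foo</t1>\r\n", 3)); // [62, 13, 10]
        System.out.println(tailCodes("<t1>foo</t1>", 3));     // [116, 49, 62]
    }
}
```

    Applied to a line from readLine(), the output ends in 62 (the '>'),
    with no 13 or 10.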

    see http://mindprod.com/jgloss/regex.html

    --
    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
     
    Roedy Green, Jul 2, 2007
    #5
