REGEX: Problem with representing a SPACE


Ramon

Hi all,

I have a problem writing a regex pattern that matches a
space (i.e. ' ').

On the 'net I have found the following:
* \s : "The set consisting of all space characters"
* \s : "A whitespace character".

So, I tried to do the following:
Scanner scan = new Scanner("XXhello hereXX");
System.err.println( scan.next("XX([\\sa-z]+)XX") );
// AIM: extract "hello here".

Can someone show me what I am doing wrong?

Thanks
 

John B. Matthews

Ramon said:
[...] I tried to do the following:
Scanner scan = new Scanner("XXhello hereXX");
System.err.println( scan.next("XX([\\sa-z]+)XX") );
// AIM: extract "hello here".

Can someone show me what I am doing wrong?

Following the example here:

<http://java.sun.com/javase/6/docs/api/java/util/Scanner.html>

<sscce>
import java.util.Scanner;

public class ScannerTest {
    public static void main(String[] args) {
        Scanner scan = new Scanner("XXhello hereXX");
        // Treat the XX markers as the token delimiter instead of whitespace.
        scan.useDelimiter("XX");
        System.out.println(scan.next());
    }
}
</sscce>

<console>
hello here
</console>
 

Arved Sandstrom

Ramon said:
Hi all,

I have a problem writing a regex pattern that matches a
space (i.e. ' ').

On the 'net I have found the following:
* \s : "The set consisting of all space characters"
* \s : "A whitespace character".

So, I tried to do the following:
Scanner scan = new Scanner("XXhello hereXX");
System.err.println( scan.next("XX([\\sa-z]+)XX") );
// AIM: extract "hello here".

Can someone show me what I am doing wrong?

Thanks

I'm not particularly familiar with Scanner, but you might be better off
using the 'findInLine' method rather than 'next' for grabbing chunks of
text that contain delimiters. The 'findInLine' method ignores delimiters
and advances the scanner past the text it matched. For example:

String testStr = " hello here whassup ";
Scanner sc = new Scanner(testStr).useDelimiter("\\s+");
String hh = sc.findInLine("[a-zA-Z]+\\s+([a-zA-Z]+)");
MatchResult sw = sc.match();
if (sw.groupCount() == 1) {
    System.out.printf("Matched word = '%s'\n", sw.group(1));
}
System.out.printf("'%s'\n", hh);
System.out.printf("'%s'\n", sc.next());
sc.close();

produces

Matched word = 'here'
'hello here'
'whassup'
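
Applied to the string from the original question, the same idea might look like the sketch below. The class and method names (FindInLineDemo, betweenMarkers) are made up for illustration; the pattern is the one Ramon posted, and group 1 holds the text between the XX markers.

```java
import java.util.Scanner;
import java.util.regex.MatchResult;

public class FindInLineDemo {

    // Hypothetical helper: returns the text between the XX markers,
    // or null when the pattern is not found on the current line.
    static String betweenMarkers(String input) {
        try (Scanner sc = new Scanner(input)) {
            // findInLine ignores the scanner's delimiters, so the space
            // inside "hello here" does not split the match.
            if (sc.findInLine("XX([\\sa-z]+)XX") == null) {
                return null;
            }
            MatchResult mr = sc.match();
            return mr.group(1);
        }
    }

    public static void main(String[] args) {
        System.out.println(betweenMarkers("XXhello hereXX"));
    }
}
```

Checking for null before calling match() matters here, since match() throws IllegalStateException when the last scanning operation did not succeed.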

AHS
 

Ramon

Thanks

Solved the problem using regular expressions. It is *really* a pity
that Java does not have a method similar to the sscanf() of the
C language, since in my opinion parsing strings with sscanf() is easier
than with regular expressions.
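
The regular-expression solution itself was not posted. One way it could look, assuming plain java.util.regex with the pattern from the original question (the class and method names are invented for this sketch):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtract {

    // Pattern from the original question; group 1 is the text between the XX markers.
    private static final Pattern BETWEEN_XX = Pattern.compile("XX([\\sa-z]+)XX");

    // Hypothetical helper: returns group 1, or null if the input does not match.
    static String extract(String input) {
        Matcher m = BETWEEN_XX.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("XXhello hereXX")); // prints: hello here
    }
}
```

Using Matcher directly avoids Scanner's tokenizing altogether, so whitespace in the input is matched by the pattern rather than consumed as a delimiter.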
 

Arved Sandstrom

Ramon said:
Thanks

Solved the problem using regular expressions. It is *really* a pity
that Java does not have a method similar to the sscanf() of the
C language, since in my opinion parsing strings with sscanf() is easier
than with regular expressions.

To the best of my knowledge there isn't one either. But you can always
write something close...even sscanf in C is just a library function.

For example:

**************************************************
package org.ahs.scanner;

import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScannerUtils {

    public static Object[] sscanf(String input, String format) {

        String formatCopy = new String(format);

        // Turn each conversion specifier into a capturing group.
        Scanner sc = new Scanner(input);
        String afterStrings = format.replaceAll("%s", "(\\\\w+)");
        String afterDoubles =
            afterStrings.replaceAll("%f", "(\\\\d+\\\\.\\\\d+)");
        String finalFormat = afterDoubles.replaceAll("%d", "(\\\\d+)");
        sc.findInLine(finalFormat);
        MatchResult sw = sc.match();

        Object[] results = new Object[sw.groupCount()];

        // Walk the specifiers in order and convert the matching group.
        Pattern formatPattern = Pattern.compile("(%[dfs])");
        Matcher formatMatcher = formatPattern.matcher(formatCopy);

        int i = 0;
        while (formatMatcher.find()) {
            int start = formatMatcher.start();
            String found = formatCopy.substring(start, start + 2);
            // Regex groups are 1-based, the results array is 0-based.
            if (found.equals("%s")) {
                results[i] = sw.group(i + 1);
            } else if (found.equals("%d")) {
                results[i] = Integer.parseInt(sw.group(i + 1));
            } else if (found.equals("%f")) {
                results[i] = Double.parseDouble(sw.group(i + 1));
            } else {
                break;
            }
            i++;
        }

        return results;
    }
}
**************************************************
public static void main(String[] args) {
    Object[] values = ScannerUtils.sscanf(
        "Arved has 5 boxes weighing a total of 15.55 kg",
        "%s has %d %s weighing a total of %f %s");
    System.out.printf("Name = '%s'\n", values[0]);
    System.out.printf("Item number = '%d'\n", values[1]);
    System.out.printf("Item Type = '%s'\n", values[2]);
    System.out.printf("Mass = '%f'\n", values[3]);
    System.out.printf("Unit = '%s'\n", values[4]);
}
**************************************************

results in:

Name = 'Arved'
Item number = '5'
Item Type = 'boxes'
Mass = '15.550000'
Unit = 'kg'

It's not identical, of course...you don't have nice variable names to
refer to, rather array elements. But you've got a format string you're
familiar with.

Don't take the above as a robust piece of code - it's more illustrative
than anything. Some extra error checking or informative messages would be
good.
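
As a rough idea of what that extra checking could involve, here is a sketch only: the class name CheckedSscanf is hypothetical, it handles just %s, %d and %f, and it uses Pattern/Matcher directly rather than Scanner. Literal text in the format is quoted so regex metacharacters in it are taken literally, and a failed match raises an exception with a message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CheckedSscanf {

    private static final Pattern SPEC = Pattern.compile("%[dfs]");

    public static Object[] sscanf(String input, String format) {
        StringBuilder regex = new StringBuilder();
        List<Character> kinds = new ArrayList<>();
        Matcher spec = SPEC.matcher(format);
        int last = 0;
        while (spec.find()) {
            // Quote the literal text between two specifiers.
            regex.append(Pattern.quote(format.substring(last, spec.start())));
            char kind = format.charAt(spec.start() + 1);
            kinds.add(kind);
            if (kind == 'd') {
                regex.append("([+-]?\\d+)");
            } else if (kind == 'f') {
                regex.append("([+-]?\\d+(?:\\.\\d+)?)");
            } else {
                regex.append("(\\S+)");
            }
            last = spec.end();
        }
        regex.append(Pattern.quote(format.substring(last)));

        Matcher m = Pattern.compile(regex.toString()).matcher(input);
        if (!m.find()) {
            throw new IllegalArgumentException(
                "Input '" + input + "' does not match format '" + format + "'");
        }

        // Convert each captured group according to its specifier.
        Object[] results = new Object[kinds.size()];
        for (int i = 0; i < results.length; i++) {
            String text = m.group(i + 1);
            char kind = kinds.get(i);
            if (kind == 'd') {
                results[i] = Integer.parseInt(text);
            } else if (kind == 'f') {
                results[i] = Double.parseDouble(text);
            } else {
                results[i] = text;
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Object[] values = sscanf(
            "Arved has 5 boxes weighing a total of 15.55 kg",
            "%s has %d %s weighing a total of %f %s");
        for (Object value : values) {
            System.out.println(value);
        }
    }
}
```

Unlike C's sscanf(), this sketch does not skip arbitrary whitespace between items; a single space in the format matches exactly one space in the input.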

AHS
 

Ramon

Yes Arved, it is possible to implement a method that is similar to
sscanf() (as you have pointed out). My point was that it is a pity that
Sun did not implement a method that is similar to sscanf() in the
"standard" libraries. Maybe one day they will... :)


 

Arved Sandstrom

Yes Arved, it is possible to implement a method that is similar to
sscanf() (as you have pointed out). My point was that it is a pity that
Sun did not implement a method that is similar to sscanf() in the
"standard" libraries. Maybe one day they will... :)
[ SNIP ]

I'm guessing that if they haven't done it by now they never will...I'm
willing to bet that Scanner and regular expressions support is about as
far as you'll see the libraries go.

AHS
 

Martin Gregorie

Thanks

Solved the problem using regular expressions. It is *really* a pity
that Java does not have a method that is similar to the sscanf() of the
C language, since in my opinion, sscanf() is easier to parse strings
than regular exp.

Fair point, since they did more or less this for sprintf() in 1.5 with
the Formatter class.
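
For comparison, the output side with java.util.Formatter might look like this small sketch (the class and method names are invented for illustration; Locale.ROOT is an assumption made so the decimal separator stays a dot):

```java
import java.util.Formatter;
import java.util.Locale;

public class FormatterDemo {

    // Hypothetical helper: builds a sentence with printf-style conversions.
    static String describe(String name, int count, double mass) {
        StringBuilder sb = new StringBuilder();
        try (Formatter f = new Formatter(sb, Locale.ROOT)) {
            f.format("%s has %d boxes weighing a total of %.2f kg", name, count, mass);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("Arved", 5, 15.55));
    }
}
```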
 

Roedy Green

([\\sa-z]

I think \s has magic meaning only outside [].
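
For what it's worth, a quick check suggests that java.util.regex does honour \s inside []: the pattern matches the whole string when handed to Pattern.matches(). What seems to trip up the original code is that Scanner.next(String) tests the pattern against the next token only, and with the default whitespace delimiter that token is "XXhello". A sketch to try this out (class and method names are made up):

```java
import java.util.NoSuchElementException;
import java.util.Scanner;
import java.util.regex.Pattern;

public class WhitespaceClassCheck {

    static final String PATTERN = "XX([\\sa-z]+)XX";

    // True when the pattern matches the entire input, no Scanner involved.
    static boolean matchesWholeString(String input) {
        return Pattern.matches(PATTERN, input);
    }

    // True when Scanner.next(pattern) accepts the first token of the input.
    static boolean scannerNextSucceeds(String input) {
        try (Scanner scan = new Scanner(input)) {
            scan.next(PATTERN);
            return true;
        } catch (NoSuchElementException e) {
            // Also covers its subclass InputMismatchException.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesWholeString("XXhello hereXX"));
        System.out.println(scannerNextSucceeds("XXhello hereXX"));
    }
}
```

The first call reports a match and the second does not, which points at the tokenizing rather than at the character class.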
--
Roedy Green Canadian Mind Products
http://mindprod.com
 

Lew
