String split question.

Dan

Hello all,

I have a string with this format:
en_20080722_al_jazirah_018124500030, how can I split it into:

en,
20080722,
al_jazirah,
018124500030,

I cannot use "_" as the delimiter, since the third part contains
a "_" itself.
The first part is always two letters.
The second part is always 8 digits.
The lengths of the third and fourth parts are variable.

Should I use a regular expression?
In Perl, it can be done with the match (\w{2})_(\d{8})_(\w+)_(\d+);
can I do this in Java as well? Thanks a lot.
 
Mark Space

Dan said:
Hello all,

I have a string with this format:
en_20080722_al_jazirah_018124500030, how can I split it into:
Should I use regular expression?
In perl, it can be done with string match (\w{2})_(\d{8})_(\w+)_(\d+),
can I do this in Java also? Thanks a lot.

I think so, although you might have to play around a bit with \w+ to set
the greediness. See here:

<http://java.sun.com/docs/books/tutorial/essential/regex/pre_char_classes.html>

and here:

<http://java.sun.com/docs/books/tutorial/essential/regex/>
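
On the greediness point, a minimal sketch (the class and method names here are made up for illustration) suggests that the pattern from the question can be used unchanged: the greedy \w+ first swallows the rest of the string, then backtracks until the trailing _(\d+) can match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns the four captured fields, or null if the input does not match.
    static String[] parse(String s) {
        Pattern p = Pattern.compile("(\\w{2})_(\\d{8})_(\\w+)_(\\d+)");
        Matcher m = p.matcher(s);
        // matches() requires the whole string to fit the pattern.
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        // Prints en, 20080722, al_jazirah, 018124500030 on separate lines.
        for (String part : parse("en_20080722_al_jazirah_018124500030")) {
            System.out.println(part);
        }
    }
}
```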
 
Roedy Green

The easiest way is just to use '_' as the delimiter and split, then
glue the accidentally split pieces back together outside the regex.

Another way is to use a regex to capture the pieces separately. You
can then have different rules for different fields.

You can use X{n,m} to limit the size of a chunk.

For the fixed-length fields, just use substring. It will probably be
around an order of magnitude faster.
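
A rough sketch of the split-then-glue idea, assuming Java 8 or later for String.join; the class and method names are hypothetical. Everything between the second field and the last one is treated as the third field and rejoined.

```java
import java.util.Arrays;

class SplitAndGlue {
    // Splits on every '_' and then rejoins the pieces of the third field.
    static String[] parse(String s) {
        String[] raw = s.split("_");
        if (raw.length < 4) {
            throw new IllegalArgumentException("expected at least 4 fields: " + s);
        }
        // Pieces 2 .. length-2 belong to the third field.
        String middle = String.join("_", Arrays.copyOfRange(raw, 2, raw.length - 1));
        return new String[] { raw[0], raw[1], middle, raw[raw.length - 1] };
    }

    public static void main(String[] args) {
        for (String part : parse("en_20080722_al_jazirah_018124500030")) {
            System.out.println(part);
        }
    }
}
```

Note that this does not check that the first field is two letters or that the second is eight digits; it only fixes up the accidental split.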
 
Arne Vajhøj

For inspiration:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Splitting {
    public static void main(String[] args) {
        String s = "en_20080722_al_jazirah_018124500030";

        // Approach 1: capture the four fields with groups.
        Pattern p = Pattern.compile("(\\w{2})_(\\d{8})_(\\w+)_(\\d+)");
        Matcher m = p.matcher(s);
        while (m.find()) {
            System.out.println(m.group(1));
            System.out.println(m.group(2));
            System.out.println(m.group(3));
            System.out.println(m.group(4));
        }

        // Approach 2: split only on the underscores that separate fields,
        // picked out with lookbehind and lookahead.
        String[] parts =
            s.split("((?<=^\\w{2})_)|((?<=^\\w{2}_\\d{8})_)|(_(?=\\d+$))");
        for (int i = 0; i < parts.length; i++) {
            System.out.println(parts[i]);
        }
    }
}

Arne
 
Jean-Baptiste Nizet

You could. But I personally avoid regexes for such simple cases (not
least because it takes much more time to find the appropriate regex than
it takes me to write the 5 lines of Java code):

String toParse = "en_20080722_al_jazirah_018124500030";
String language = toParse.substring(0, 2);
String date = toParse.substring(3, 11);
int lastUnderscoreIndex = toParse.lastIndexOf('_');
String channel = toParse.substring(12, lastUnderscoreIndex);
String suffix = toParse.substring(lastUnderscoreIndex + 1);

I'm pretty sure this code is significantly more efficient than a regex,
as well.

JB.
 
Arne Vajhøj

Jean-Baptiste Nizet said:
You could. But I personally avoid regexes for such simple cases (not
least because it takes much more time to find the appropriate regex than
it takes me to write the 5 lines of Java code):

String toParse = "en_20080722_al_jazirah_018124500030";
String language = toParse.substring(0, 2);
String date = toParse.substring(3, 11);
int lastUnderscoreIndex = toParse.lastIndexOf('_');
String channel = toParse.substring(12, lastUnderscoreIndex);
String suffix = toParse.substring(lastUnderscoreIndex + 1);

I'm pretty sure this code is significantly more efficient than a regex,
as well.

Me too.

But the code is not very maintenance-friendly. Small changes to the
format will require multiple changes to the code. And with one error
you will get a StringIndexOutOfBoundsException.
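
To illustrate the failure mode, here is a small sketch; the class and method names and the truncated test string are assumptions, not from the thread. With an input that lacks the last two fields, the fixed-offset substring calls throw at runtime (a StringIndexOutOfBoundsException in this case), so a guard in front of them is one way to get a clearer error.

```java
class SubstringParse {
    // The fixed-offset approach quoted above, wrapped in a method.
    static String[] unguarded(String s) {
        int last = s.lastIndexOf('_');
        return new String[] {
            s.substring(0, 2), s.substring(3, 11),
            s.substring(12, last), s.substring(last + 1)
        };
    }

    // Same parsing, but the layout is checked before any substring call.
    static String[] guarded(String s) {
        int last = s.lastIndexOf('_');
        if (s.length() < 12 || s.charAt(2) != '_' || s.charAt(11) != '_'
                || last <= 12 || last == s.length() - 1) {
            throw new IllegalArgumentException("unexpected format: " + s);
        }
        return unguarded(s);
    }

    public static void main(String[] args) {
        System.out.println(String.join(",", guarded("en_20080722_al_jazirah_018124500030")));
        try {
            unguarded("en_20080722"); // third and fourth fields missing
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("unguarded parse failed: " + e.getMessage());
        }
    }
}
```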

Arne
 
Arne Vajhøj

Mark said:
One could say the same about regex, too. ;-)

It is possible.

But a regex can be written so that it iterates only over
the matches actually found.
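
A sketch of what that could look like, with made-up class and method names and made-up sample text: a find() loop simply skips anything that does not fit the pattern, so malformed input produces no results rather than an exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindLoop {
    private static final Pattern ID =
        Pattern.compile("\\b(\\w{2})_(\\d{8})_(\\w+)_(\\d+)");

    // Collects the four fields of every identifier found in the text.
    static List<String[]> findAll(String text) {
        List<String[]> found = new ArrayList<>();
        Matcher m = ID.matcher(text);
        while (m.find()) {
            found.add(new String[] { m.group(1), m.group(2), m.group(3), m.group(4) });
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "en_20080722_al_jazirah_018124500030 garbage fr_20080723_le_monde_42";
        for (String[] fields : findAll(text)) {
            System.out.println(String.join(",", fields));
        }
    }
}
```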

Arne
 
Daniel Pitts

It sounds like your string format is poorly designed :)

In any case, if you can ensure that the last field doesn't contain "_",
then you can use lastIndexOf("_") to find the last field, and then
indexOf a couple of times to separate the rest of the fields.
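
One possible sketch of this approach, assuming (as stated) that only the third field may contain "_"; the class and method names are hypothetical.

```java
class IndexOfParse {
    // Locates the first two underscores from the left and the last one
    // from the right; whatever lies between them is the third field.
    static String[] parse(String s) {
        int first = s.indexOf('_');
        int second = s.indexOf('_', first + 1);
        int last = s.lastIndexOf('_');
        if (first < 0 || second < 0 || last <= second) {
            throw new IllegalArgumentException("expected at least 4 fields: " + s);
        }
        return new String[] {
            s.substring(0, first),
            s.substring(first + 1, second),
            s.substring(second + 1, last),
            s.substring(last + 1)
        };
    }

    public static void main(String[] args) {
        for (String part : parse("en_20080722_al_jazirah_018124500030")) {
            System.out.println(part);
        }
    }
}
```

Unlike the fixed-offset version, this does not depend on the first two fields having a particular length.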
 
