Christophe Vanfleteren
Hello,
I'm having trouble finding the right regex for the following
problem:
Assume you have files in a directory in the following form:
*/GROUP1 - GROUP2/GROUP3 - GROUP4.extension
Group 1 can consist of alphanumeric characters plus a few others
(space, underscore, ...). Group 3 consists only of digits.
Groups 2 and 4 can contain anything except a File.separator (since that is
what I split on).
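For comparison, here is a minimal non-regex sketch of the layout described above. It assumes '/' as the separator, exactly one space on each side of the dash, and a file name that always ends in an extension; `PathParts`, `split`, `cutAtFirstDash` and the sample path are hypothetical, and group 3 is not checked for digits. Cutting each path component at its *first* " - " is what keeps a bare '-' (as in "X XX-ZZZ") inside group 1.

```java
import java.util.Arrays;

// Hypothetical non-regex helper for paths shaped like
// .../GROUP1 - GROUP2/GROUP3 - GROUP4.extension
public class PathParts {

    // Returns {group1, group2, group3, group4}, or null if the path doesn't fit.
    // Assumes '/' separates path components and " - " (one space on each side)
    // separates the groups; group 3 is NOT checked for digits here.
    static String[] split(String path) {
        int fileSlash = path.lastIndexOf('/');
        if (fileSlash <= 0) return null;                      // no directory part
        int dirSlash = path.lastIndexOf('/', fileSlash - 1);  // -1 if none
        String dir = path.substring(dirSlash + 1, fileSlash);
        String file = path.substring(fileSlash + 1);
        int dot = file.lastIndexOf('.');
        if (dot >= 0) file = file.substring(0, dot);          // drop ".extension"

        String[] d = cutAtFirstDash(dir);
        String[] f = cutAtFirstDash(file);
        if (d == null || f == null) return null;
        return new String[] { d[0], d[1], f[0], f[1] };
    }

    // Cuts s at its FIRST " - ", so a bare '-' as in "X XX-ZZZ" stays in group 1.
    static String[] cutAtFirstDash(String s) {
        int i = s.indexOf(" - ");
        if (i < 0) return null;
        return new String[] { s.substring(0, i), s.substring(i + 3) };
    }

    public static void main(String[] args) {
        String[] parts = split("/music/X XX-ZZZ - Live - Tour/01 - Intro.mp3");
        System.out.println(Arrays.toString(parts));
        // prints [X XX-ZZZ, Live - Tour, 01, Intro]
    }
}
```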
This is the regex (using java.util.regex) I use to split all this (all on
one line):
.*/([\w\s'&_,\.\-]+)\s+-\s+([\p{Graph}\s&&[^/]]+)/(\d+)\s+-\s+([\p{Graph}\s&&[^/]]+).*
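A minimal harness for trying the regex above with `Pattern` and `Matcher`. The group-1 class is assumed to be `[\w\s'&_,\.\-]` (the class quoted further down in this post); `OriginalRegexDemo`, `groups` and the sample path are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalRegexDemo {

    // The regex from the post as a Java string literal (backslashes doubled).
    // The group-1 class [\w\s'&_,\.\-] is an assumption.
    static final Pattern ORIGINAL = Pattern.compile(
        ".*/([\\w\\s'&_,\\.\\-]+)\\s+-\\s+([\\p{Graph}\\s&&[^/]]+)"
        + "/(\\d+)\\s+-\\s+([\\p{Graph}\\s&&[^/]]+).*");

    // Returns the four groups, or null when the whole path does not match.
    static String[] groups(Pattern p, String path) {
        Matcher m = p.matcher(path);
        if (!m.matches()) return null;
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] g = groups(ORIGINAL, "/music/Some Band - Live/01 - Intro.mp3");
        System.out.println(Arrays.toString(g));
        // prints [Some Band, Live, 01, Intro.mp3]
    }
}
```

Note that group 4 comes back as "Intro.mp3": `\p{Graph}` includes '.', so the greedy group 4 swallows the extension and the trailing `.*` matches nothing.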
I am able to retrieve these 4 separate groups, but I run into problems once
the first group also contains a "-" (minus) character.
When group 1 looks like "X XX-ZZZ", the whole string should still be treated
as the first group. At the moment, though, the regex doesn't match, since I
don't allow "-" in the first group. But if I do allow it, I can no longer
split on " - " (since I also allow spaces in the first group).
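A sketch of both failure modes, using a made-up path whose GROUP2 itself contains " - ". `STRICT` is an assumed variant without '-' in group 1 and `PERMISSIVE` one with it; both names, like `GreedySplitProblem`, are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedySplitProblem {

    // Shared tail: " - GROUP2/GROUP3 - GROUP4" exactly as in the post.
    static final String TAIL =
        "\\s+-\\s+([\\p{Graph}\\s&&[^/]]+)/(\\d+)\\s+-\\s+([\\p{Graph}\\s&&[^/]]+).*";

    // Group 1 WITHOUT '-': a name like "X XX-ZZZ" can never match.
    static final Pattern STRICT = Pattern.compile(".*/([\\w\\s'&_,\\.]+)" + TAIL);

    // Group 1 WITH '-' and spaces: the greedy '+' only backs off as far as the
    // LAST " - " that still lets the rest of the pattern match.
    static final Pattern PERMISSIVE = Pattern.compile(".*/([\\w\\s'&_,\\.\\-]+)" + TAIL);

    static String[] groups(Pattern p, String path) {
        Matcher m = p.matcher(path);
        if (!m.matches()) return null;
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String path = "/music/X XX-ZZZ - Live - Tour/01 - Intro.mp3";
        System.out.println(Arrays.toString(groups(STRICT, path)));
        // prints null (no match)
        System.out.println(Arrays.toString(groups(PERMISSIVE, path)));
        // prints [X XX-ZZZ - Live, Tour, 01, Intro.mp3]
    }
}
```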
So I need a regex that allows spaces and "-" in the first group, but
still starts the second group as soon as it finds " - ".
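One common way to express "stop at the first " - "" is a reluctant quantifier. This sketch swaps group 1's `+` for `+?` and leaves the rest of the regex unchanged; the group-1 class `[\w\s'&_,\.\-]`, the class name `ReluctantGroupOne` and the sample paths are assumptions.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantGroupOne {

    // Same shape as the post's regex, but group 1 uses the reluctant "+?", so it
    // tries the SHORTEST group 1 first and stops at the first " - " that works.
    static final Pattern LAZY = Pattern.compile(
        ".*/([\\w\\s'&_,\\.\\-]+?)\\s+-\\s+([\\p{Graph}\\s&&[^/]]+)"
        + "/(\\d+)\\s+-\\s+([\\p{Graph}\\s&&[^/]]+).*");

    // Returns the four groups, or null when the whole path does not match.
    static String[] groups(String path) {
        Matcher m = LAZY.matcher(path);
        if (!m.matches()) return null;
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            groups("/music/X XX-ZZZ - Live - Tour/01 - Intro.mp3")));
        // prints [X XX-ZZZ, Live - Tour, 01, Intro.mp3]
    }
}
```

The reluctant match is a preference, not a guarantee: if splitting at the first " - " leaves a remainder the rest of the pattern cannot match, the engine will still extend group 1 past it.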
I tried messing around with non-capturing groups, like ([\w\s'&_,\.\-]+),
but I guess that's not the way it should be done.
Do any regex experts have tips on how I should construct this regex?
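Non-capturing groups can do the job when combined with a negative lookahead: `(?:(?!\s+-\s)[...])+` repeats the allowed class but refuses any character that starts a whitespace-dash-whitespace run, so group 1 may contain "X XX-ZZZ" but never " - ". This is a sketch of that technique (sometimes called a tempered greedy token), with the same assumed group-1 class; `LookaheadGroupOne` and the sample paths are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadGroupOne {

    // Group 1 is a repeated NON-capturing group: each character must be in the
    // allowed class AND must not begin a "\s+-\s" run, so group 1 can hold a
    // bare '-' but can never contain " - " itself.
    static final Pattern NO_SPACED_DASH = Pattern.compile(
        ".*/((?:(?!\\s+-\\s)[\\w\\s'&_,\\.\\-])+)\\s+-\\s+([\\p{Graph}\\s&&[^/]]+)"
        + "/(\\d+)\\s+-\\s+([\\p{Graph}\\s&&[^/]]+).*");

    // Returns the four groups, or null when the whole path does not match.
    static String[] groups(String path) {
        Matcher m = NO_SPACED_DASH.matcher(path);
        if (!m.matches()) return null;
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
            groups("/music/X XX-ZZZ - Live - Tour/01 - Intro.mp3")));
        // prints [X XX-ZZZ, Live - Tour, 01, Intro.mp3]
    }
}
```

Unlike a reluctant quantifier, which only prefers a shorter group 1, this makes " - " impossible inside group 1, so an unexpected name fails to match instead of being split in the wrong place.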