Hi,
I need to parse text files to extract data records. The files will
consist of a header, zero or more data records, and a trailer. I can
discard the header and trailer, but I must split the data records up
and return them to an application.
The complexity here is that I won't know the exact format of the files
until run time. The files may or may not contain headers and trailers.
The records may have clearly defined start and end markers, or they may
not. There may be a fixed separator between the records, or there may
not (separators will be used if there are no record start and end
markers).
The current idea is to use UNIX regular expressions to define the
format of each part of the file and match them at run time. However,
it is not clear whether a single expression could describe the whole
file, or whether I would need a separate regular expression for each
part (header, trailer, separator, record begin/end, etc.). If a single
expression is used, I imagine it would match all the data records as
one block rather than recognise the individual records.
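To make the separate-expression variant concrete, here is a rough sketch
of what I have in mind. It assumes the POSIX <regex.h> interface
(regcomp/regexec) is available, which I have not verified for OpenVMS.
The function name next_record and its cursor-style interface are my own
invention for illustration, not anything in the existing application.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: find the next match of a compiled record
 * expression at or after *cursor. On success, *rec and *len describe
 * the matched record, *cursor is moved past it, and 1 is returned.
 * Returns 0 when no further record is found.
 *
 * Text that does not match (header, trailer, separators) is skipped.
 * Note: eflags is 0, so '^' can match at the cursor on every call.
 */
int next_record(const regex_t *re, const char **cursor,
                const char **rec, size_t *len)
{
    regmatch_t m[1];
    const char *p = *cursor;

    while (*p != '\0' && regexec(re, p, 1, m, 0) == 0) {
        if (m[0].rm_eo > m[0].rm_so) {
            *rec = p + m[0].rm_so;
            *len = (size_t)(m[0].rm_eo - m[0].rm_so);
            *cursor = p + m[0].rm_eo;
            return 1;
        }
        /* Empty match: step one character so the loop always advances. */
        p += m[0].rm_eo;
        if (*p != '\0')
            p++;
    }
    *cursor = p;
    return 0;
}
```

The caller would compile the run-time record expression once with
regcomp(&re, pattern, REG_EXTENDED), call next_record in a loop until it
returns 0, and then call regfree(&re). This only covers records that one
expression can describe from start to end; the separator-only case would
need a different loop.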
This code is to extend an application already written in C, running on
UNIX (and OpenVMS) platforms.
I would be grateful for some thoughts on how this could be achieved.