Find Paths in log text - How to?


citizenkahn

I am trying to parse a build log for errors. I figure I can do this
one of three ways:
- find the absolute, platonic form of an error and search for that
- create definitions of the patterns that describe errors for each tool
  used (ant, MSDEV, etc.)
- rework the build so that the return codes of all 3rd-party tools are
  captured and logged using my own error description

The return-code method takes a lot of effort and spreads responsibility
for the task quite widely, unless I can write a small command-pattern
style wrapper script (which is a possibility). Even so, every one of the
hundreds of tool calls would have to be wrapped, and if a developer
changed any of them anywhere, the build would leak errors.
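A minimal sketch of the wrapper-script idea, assuming Python 3's `subprocess.run`. The function name `run_logged` and the `BUILD-ERROR:` prefix are hypothetical; the point is that every tool call funnels through one place that reports a non-zero return code in a single, known format:

```python
import subprocess
import sys


def run_logged(cmd, log=sys.stdout):
    """Run one build step and report a failure in a single, known format.

    Hypothetical wrapper: each tool call in the build would go through this
    instead of invoking the tool directly, e.g. run_logged(["ant", "compile"]).
    """
    result = subprocess.run(cmd)
    if result.returncode != 0:
        # Our own error description, so the log parser needs only one pattern.
        log.write("BUILD-ERROR: %r exited with code %d\n" % (cmd, result.returncode))
    return result.returncode
```

The catch raised above still applies: any call that bypasses the wrapper is invisible to the parser.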

The relativist/definition-based approach means that I must be sure to
capture every error case, which may prove difficult, and false negatives
are a really dangerous problem.
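For the definition-based approach, a sketch might keep a table of regexes per tool. The two patterns below are illustrative assumptions based on Ant's `BUILD FAILED` summary line and MSVC-style `error C2065:` codes; a real table would have to be built from each tool's actual output:

```python
import re

# Hypothetical per-tool error patterns; extend per tool as cases are found.
TOOL_ERROR_PATTERNS = {
    "ant": [re.compile(r"^BUILD FAILED")],
    "msdev": [re.compile(r"\berror [A-Z]+\d+:")],
}


def find_errors(lines, tool):
    """Return (line_number, line) for every line matching one of the tool's patterns."""
    patterns = TOOL_ERROR_PATTERNS.get(tool, [])
    return [
        (n, line)
        for n, line in enumerate(lines, start=1)
        if any(p.search(line) for p in patterns)
    ]
```

A tool missing from the table silently yields no errors, which is exactly the false-negative risk described above.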

In the Platonic/absolutist camp, I could define an error as an instance
of a word or phrase from the "bad list" that is not in a filename or
path.

Bad List: [error, fatal, killed, not found].
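Searching for the bad list itself is the easy part. A sketch, assuming a case-insensitive match on whole words so that "error" does not fire inside a word like "terrorize":

```python
import re

BAD_LIST = ["error", "fatal", "killed", "not found"]

# One case-insensitive alternation; \b keeps matches to whole words, and
# "not found" is matched as a phrase.
BAD_RE = re.compile(
    r"\b(?:%s)\b" % "|".join(re.escape(w) for w in BAD_LIST), re.IGNORECASE
)


def bad_word_spans(text):
    """Return (start, end, word) for every bad-list hit in text."""
    return [(m.start(), m.end(), m.group()) for m in BAD_RE.finditer(text)]
```

On its own this also flags the "Error" in a file name, which is why the path text has to be excluded first.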

Were I to go this way, I'd face a major problem: in a world where
symbols and whitespace can appear in a path, how can I extract a path
from a line of text?

Ugly Valid Paths:
C:\Program Files\A File Named Error .txt
/usr/#a file named error #.txt


This means that determining the boundaries of a path is non-trivial.
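To make the boundary problem concrete, here is a naive pattern (an assumption for illustration: a path start followed by any run of non-whitespace) applied to the two ugly paths. Both get cut at the first space:

```python
import re

# Naive guess: a path is a start ([A-Z]:\ or /) plus non-whitespace characters.
NAIVE_PATH = re.compile(r"(?:[A-Z]:\\|/)\S*")


def naive_paths(line):
    return NAIVE_PATH.findall(line)

# naive_paths(r"copying C:\Program Files\A File Named Error .txt")
#   -> ['C:\\Program']
# naive_paths("rm /usr/#a file named error #.txt")
#   -> ['/usr/#a']
```

Everything after the cut, including "Error" and "error", is left looking like ordinary log text.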

FileNames:
Many build tools list filenames without their full path. All of my
product's files are <text>.<ext>, so that is a pattern I might be able
to locate, perhaps with .+\..+
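As a regex, `.+\..+` is greedy and crosses spaces, so on a whole log line it usually matches the entire line. A tighter sketch, assuming product file names use only letters, digits, underscores and hyphens before the dot (an assumption to adjust to the real naming rules):

```python
import re

# Assumed file-name shape: [letters/digits/_/-] then a dot and an extension.
FILENAME = re.compile(r"[\w-]+\.\w+")


def loose_filenames(line):
    # The .+\..+ idea: greedy, so it normally spans the whole line.
    return re.findall(r".+\..+", line)


def tight_filenames(line):
    return FILENAME.findall(line)
```

The tight pattern still matches things like version numbers (`1.2`), so its hits are candidates rather than confirmed file names.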

Paths:
On Windows, all of my paths will start with [A-Z]:\ or \\.
On Unix, they will tend to start with ./ or /.
Finding the starting point is not too difficult, but it's the ending
that's hard.
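The four starting points above can be written as one regex with a named group per kind. This is a sketch under assumptions: `./` and `/` only count as path starts at the beginning of a whitespace-separated token, and the group names are made up for illustration:

```python
import re

# One alternation for the four path starts; the group name records which
# kind matched. ./ and / only count at the start of a token (assumption).
PATH_START = re.compile(
    r"(?P<letter>\b[A-Z]:\\)"       # C:\...
    r"|(?P<unc>\\\\)"               # \\server\share
    r"|(?P<unixpwd>(?<!\S)\./)"     # ./relative
    r"|(?P<unixrooted>(?<!\S)/)"    # /absolute
)


def path_starts(line):
    """Return (offset, kind) for each place in line where a path seems to start."""
    return [(m.start(), m.lastgroup) for m in PATH_START.finditer(line)]
```

This only gives the offset where a path begins; where it ends is still the open question.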


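I could generate a substring for each of the starting types and then
look at what came before:

import re

for sep in [letterStart, uncStart, unixrootedStart, unixpwdStart]:
    # each *Start is a regex for one of the path starts listed above
    m = re.search(sep, line)
    if m:  # only check lines where this path start actually occurs
        prePath, postPath = line[:m.start()], line[m.start():]
        checkForBadWords(prePath)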
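A self-contained version of the "look at what came before" idea might combine the path starts into one regex (the exact patterns are assumptions based on the list of starts in this post) and check only the text before the first path, or the whole line if there is none. The function name is hypothetical:

```python
import re

BAD_RE = re.compile(r"\b(?:error|fatal|killed|not found)\b", re.IGNORECASE)
# [A-Z]:\  or  \\  or  ./  or  /  (the last two only at the start of a token)
PATH_START = re.compile(r"\b[A-Z]:\\|\\\\|(?<!\S)\./|(?<!\S)/")


def bad_words_before_first_path(line):
    """Return bad-list hits in the text before the first path start."""
    m = PATH_START.search(line)
    pre_path = line[:m.start()] if m else line
    return BAD_RE.findall(pre_path)
```

One limitation: compiler messages often put the path first (`foo.cpp(10) : error C2065: ...`), so the text after the path also has to be checked, which brings back the end-of-path problem.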

I could then split the postPath segment on os.sep and check the
resulting list elements for unlikely cases:
- double spaces within a path element
- symbol characters within an element (although this is a little dicey)
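A sketch of those two checks. It splits on both `/` and `\` rather than `os.sep` so one function handles either style (swap in `os.sep` if only native paths matter), and what counts as a "symbol" is an assumption to tune:

```python
import re

SEPARATORS = re.compile(r"[\\/]")
# Assumed definition of "symbol": anything other than letters, digits,
# '_', '-', '.', or a space.
SYMBOL = re.compile(r"[^\w.\- ]")


def suspicious_elements(post_path):
    """Return (index, element, reason) for path elements that look unlikely."""
    flagged = []
    for i, elem in enumerate(SEPARATORS.split(post_path)):
        if "  " in elem:
            flagged.append((i, elem, "double space"))
        elif SYMBOL.search(elem):
            flagged.append((i, elem, "symbol"))
    return flagged
```

The dicey part shows up right away: the legitimate `/usr/#a file named error #.txt` gets its file-name element flagged for the `#`.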

Since I am parsing the log on the system on which it was generated, I
could call os.path.exists on each potential path.
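A sketch of the `os.path.exists` idea: from a known path start, try successively shorter candidate ends and keep the longest one that exists. The function name and the rule about trailing whitespace are assumptions; it costs one filesystem check per candidate, which should be acceptable for log-sized lines:

```python
import os


def longest_existing_path(line, start):
    """Return the longest prefix of line[start:] that exists on disk, or None.

    Every possible end position is tried, longest first, so a path with
    spaces or symbols in it is kept whole instead of being cut at the first
    space. Candidates ending in whitespace are skipped (assumption: a path
    in the log never ends with a space).
    """
    rest = line[start:]
    for end in range(len(rest), 0, -1):
        candidate = rest[:end]
        if candidate[-1].isspace():
            continue
        if os.path.exists(candidate):
            return candidate
    return None
```

Once a path is found, its span can be blanked out of the line before the bad-list search. One caveat: for a "not found" message the file usually does not exist, so this returns the longest existing prefix (typically the containing directory), and the end of the missing name is still unknown.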

If someone happens to know of a good method of extracting weird paths
out of logs, I'd be interested in hearing about it.