Find Paths in log text - How to?

citizenkahn · Mar 23, 2006

I am trying to parse a build log for errors. I figure I can do this
one of three ways:
- find the absolute platonic form of an error and search for that item
- create definitions of what patterns describe errors for each tool
that is used (ant, MSDEV, etc.)
- rework the build so that the return codes of all third-party tools are
captured and logged using my own error description

The return code method has a high level of effort attached and spreads
the responsibility for the task quite widely, unless I can write a
little command-pattern-style wrapper script (which is a possibility).
Still, it would mean all of the hundreds of calls to tools would have to
be wrapped, and if a developer bypassed the wrapper anywhere, the build
would leak errors.
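
A minimal sketch of what such a wrapper might look like, assuming Python 3.7+ and the standard subprocess module. The name run_tool, the logger name "build" and the BUILD-ERROR line format are all hypothetical, chosen only for illustration:

```python
import logging
import subprocess
import sys

log = logging.getLogger("build")  # hypothetical logger name


def run_tool(args, description):
    """Run one external tool and log a uniform error line if it fails.

    args is the command as a list of strings; description is free text
    supplied by the caller. Returns the tool's exit code.
    """
    result = subprocess.run(args, capture_output=True, text=True)
    if result.returncode != 0:
        # One fixed, greppable format for every tool in the build.
        log.error("BUILD-ERROR tool=%s rc=%d desc=%s",
                  args[0], result.returncode, description)
    return result.returncode
```

The log parser would then only need to search for the single BUILD-ERROR marker, but every tool invocation has to go through run_tool for that to hold.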

The relativism/definition-based approach means that I must be sure to
capture all error cases, which may prove difficult, and false negatives
are a really dangerous problem.

In the Platonic/absolutist camp, I could define an error as an instance
of a word or phrase from the "bad list" that is not in a filename or
path.

Bad List: [error, fatal, killed, not found].
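
As a rough sketch, the bad-list check on its own could be a case-insensitive regex search. The names BAD_WORDS, BAD_RE and find_bad_words are hypothetical, and this version does not yet exclude hits that fall inside a filename or path:

```python
import re

BAD_WORDS = ["error", "fatal", "killed", "not found"]

# Leading \b only, so "errors" still counts but "terror" does not.
BAD_RE = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in BAD_WORDS) + r")",
    re.IGNORECASE,
)


def find_bad_words(text):
    """Return the bad-list entries found in text, lowercased, in order."""
    return [m.group(1).lower() for m in BAD_RE.finditer(text)]
```

The intent is to call this only on the part of a line that is known not to be a path.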

Were I to go this way, I'd be faced with a major problem: in a world
where symbols and whitespace can be included in a path how can I
extract a path from a line of text?

Ugly Valid Paths:
C:\Program Files\A File Named Error .txt
/usr/#a file named error #.txt

This means that determining the boundaries of a path is nontrivial.

FileNames:
Many build tools list filenames without their full path. All of my
product's files are <text>.<ext>, so that is a pattern that I might be
able to locate:
.+\..+ perhaps
Paths:
On Windows all of my paths will start with [A-Z]:\ or \\
On Unix they will tend to start with ./ or /
Finding the starting point is not too difficult, but it's the ending
that's hard.

I could generate a substring for each of the starting types and then
look at what came before.
for start in [letterStart, uncStart, unixrootedStart, unixpwdStart]:
    # text before the first occurrence of this path start
    prePath = line.split(start)[0]
    checkForBadWords(prePath)

I could then split the postPath segment on os.sep and then check for
unlikely cases in the list elements:
- double spaces within a path element
- symbol characters within an element (although this is a little dicey)
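
These element checks might be sketched as follows. The function name suspicious_elements is hypothetical, the separator is passed in rather than taken from os.sep so both platforms can be tried, and the set of "normal" characters (word characters, space, dot, hyphen) is an assumption that would need tuning:

```python
import re


def suspicious_elements(post_path, sep):
    """Flag path elements that look unlikely to be part of a real path.

    post_path is the text after the path start (drive letter removed);
    sep is the path separator to split on.
    """
    flagged = []
    for element in post_path.split(sep):
        if not element:
            continue  # skip empty pieces from leading or doubled separators
        if "  " in element:
            flagged.append((element, "double space"))
        elif re.search(r"[^\w .\-]", element):
            flagged.append((element, "symbol"))
    return flagged
```

Note that the second ugly path above is flagged as "symbol" even though it is valid, which is exactly why this check is dicey.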

Since I am parsing the log on the system on which it was generated, for
each path I could do an os.path.exists on the potential path.
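
A sketch of how os.path.exists could settle the ending: take the text from the path start onwards and shrink it one character at a time until something exists on disk. The name longest_existing_path is hypothetical:

```python
import os


def longest_existing_path(text):
    """Return the longest prefix of text that exists on disk, or None.

    text is assumed to begin at a path start; anything after the real
    path (messages, line numbers) is trimmed away character by character.
    """
    candidate = text.rstrip()
    while candidate:
        if os.path.exists(candidate):
            return candidate
        candidate = candidate[:-1]
    return None
```

This costs one filesystem check per trimmed character, and if the file itself is missing it falls back to the deepest existing parent directory, so the result should be compared against what was expected before trusting it.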

If someone happens to know of a good method of extracting weird paths
out of logs, I'd be interested in hearing about it.
