String Template

t.giuseppe

Hello

I'm rewriting a program previously written in C#, and I'm trying to keep the same configuration file, but I have a problem with untapped strings.

The previous configuration files provide an input template string of this type:

<input><![CDATA[{ip} - [{date}] "HTTP/1.1 GET {url}" {?} {?} "{referer}" "{useragent}"]]></input>


This string is parsed and the placeholders are matched against the actual values written to a log file (Apache); each value is then assigned to the corresponding variable name.

Taking for example a classic line of apache log:

0.0.0.0 - [27/Dec/2013: 00:56:51 +0100] "GET / webdav / HTTP/1.1" 404 524 "-" "Mozilla/5.0 (Windows, U, Windows NT 5.1, en-US , rv: 1.9.2.12) Gecko/20101026 Firefox/3.6.12 "

Is there any way to pull out the values, arranged as follows:

ip = 0.0.0.0
date = 27/Dec/2013: 00:56:51 +0100
url = / webdav /

Tnx
 
Chris Angelico

I'm rewriting a program previously written in C#, and trying to keep the same configuration file, I have a problem with untapped strings.

Not sure what you mean by "untapped" here?

Taking for example a classic line of apache log:

0.0.0.0 - [27/Dec/2013: 00:56:51 +0100] "GET / webdav / HTTP/1.1" 404 524 "-" "Mozilla/5.0 (Windows, U, Windows NT 5.1, en-US , rv: 1.9.2.12) Gecko/20101026 Firefox/3.6.12 "

Is there any way to pull out the values so arranged as follows:

ip = 0.0.0.0
date = 27/Dec/2013: 00:56:51 +0100
url = / webdav /

(Aside: Do you really have spaces in your URLs? That seems odd.)

One common way to implement this sort of thing is with a regular
expression. You can either derive a regex from your config file, or
have users directly manage the regex.
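
As a rough illustration of the hand-written route, here is a minimal sketch of a regex with named groups applied to the sample line from the original post. The group names `method`, `protocol`, `status` and `size` are my own labels, not something taken from the config file, and the pattern is only an assumption about what these lines look like.

```python
import re

# Hand-written pattern for the sample line; group names are illustrative.
LINE_RE = re.compile(
    r'(?P<ip>\S+) - \[(?P<date>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>.*?) (?P<protocol>HTTP/[\d.]+)" '
    r'(?P<status>\d+) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<useragent>[^"]*)"'
)

# Sample line copied from the original post, odd spacing included.
line = ('0.0.0.0 - [27/Dec/2013: 00:56:51 +0100] "GET / webdav / HTTP/1.1" '
        '404 524 "-" "Mozilla/5.0 (Windows, U, Windows NT 5.1, en-US , '
        'rv: 1.9.2.12) Gecko/20101026 Firefox/3.6.12 "')

match = LINE_RE.match(line)
fields = match.groupdict() if match else {}

for name in ('ip', 'date', 'url'):
    print(name, '=', fields.get(name))
```

`match.groupdict()` returns one dictionary entry per named group, which maps fairly directly onto the "ip = ..., date = ..." output asked for.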

For the specific case of parsing the Apache common log format, there's
plenty of material around. This page [1] has a tidy regex that'll do
the job, and this module [2] purports to create a parser by reading
the configuration line that creates it. I don't know anything about
either, save that they came up in a Google search for 'python apache
common log', along with a whole lot of other decent-looking results.

But for a more general solution - supposing you have piles and piles
of those parser strings - I'd be inclined to write a preparser that
reads your config file and derives regex patterns. It needs to figure
out what's a placeholder and what's literal text, then escape the
literal text (if there are regex metacharacters in it) and come up
with some sort of capturing sequence for the placeholder. I don't know
what you'd want there; possibly (.*?) will be the best (that means
"capture any number of characters, as few as possible"). But you know
your data far better than I do.
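
A minimal sketch of such a preparser, under a few assumptions of mine: `{name}` becomes a named group matching `(.*?)`, `{?}` means "match but ignore this field", and placeholder names are valid Python identifiers. The function name `template_to_regex` is hypothetical. Note also that the template in the original post reads `"HTTP/1.1 GET {url}"` while the sample line has the method first, so the sketch uses a template reordered to `"GET {url} HTTP/1.1"`; that reordering is my guess, not something the config file confirms.

```python
import re

# Matches one placeholder such as {ip} or {?}.
PLACEHOLDER = re.compile(r'\{([^{}]+)\}')

def template_to_regex(template):
    """Escape literal text and turn each {name} into a lazy named group."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(template):
        parts.append(re.escape(template[pos:m.start()]))  # literal text
        name = m.group(1)
        if name == '?':
            parts.append('.*?')                 # assumed: '{?}' is ignored
        else:
            parts.append('(?P<%s>.*?)' % name)  # capture under its name
        pos = m.end()
    parts.append(re.escape(template[pos:]))     # trailing literal text
    return re.compile(''.join(parts) + '$')     # anchor so the last group is not empty

# Template adjusted (assumption) so the request part matches the sample line.
template = '{ip} - [{date}] "GET {url} HTTP/1.1" {?} {?} "{referer}" "{useragent}"'

line = ('0.0.0.0 - [27/Dec/2013: 00:56:51 +0100] "GET / webdav / HTTP/1.1" '
        '404 524 "-" "Mozilla/5.0 (Windows, U, Windows NT 5.1, en-US , '
        'rv: 1.9.2.12) Gecko/20101026 Firefox/3.6.12 "')

match = template_to_regex(template).match(line)
values = match.groupdict() if match else {}
print(values)
```

The trailing `$` matters: without it, a lazy group at the very end of the pattern would be free to match as little as possible. If a log format can contain the literal separators inside a field, `(.*?)` may split in the wrong place and a stricter per-field pattern would be needed.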

ChrisA

[1] http://www.seehuhn.de/blog/52
[2] https://pypi.python.org/pypi/apachelog/1.0
 
Giuseppe Tripoli

The problem is that I have a huge number of logs (Apache, Akamai, Cotendo, IIS ...) all mixed together.

What the program does is use the file name to decide which part of the configuration file to use when reading each log.
Certainly the quickest and easiest method is to use the regex, but I did not really intend to change the configuration file.

I think I will change my approach.

Thank you very much for your answer
 
Chris Angelico

Certainly the quickest and easiest method is to use the regex, but I did not really intend to change the configuration file.

Then all you need is a way to convert your config file string into a
regex, which shouldn't be too difficult. Try the translation I
described in the earlier post; it might be all you need.
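
Getting the template string out of the existing config file unchanged is the other half of the job. A small sketch, assuming the file is XML with an `<input>` element holding a CDATA section as in the original post; the `<config>` root element is hypothetical, since the rest of the file was not shown.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical config; only the <input> element comes from the original post.
CONFIG = '''<config>
  <input><![CDATA[{ip} - [{date}] "HTTP/1.1 GET {url}" {?} {?} "{referer}" "{useragent}"]]></input>
</config>'''

root = ET.fromstring(CONFIG)

# ElementTree hands back the CDATA content as ordinary element text.
template = root.findtext('input')

# List the placeholders in order, '?' entries included.
names = re.findall(r'\{([^{}]+)\}', template)

print(template)
print(names)
```

From there the template text can be fed to whatever translation step builds the regex, so the config file itself never has to change.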

ChrisA
 
Cristiano Araujo

Maybe this will be helpful: http://stackoverflow.com/questions/12544510/parsing-apache-log-files
 
