Re for Apache log file format

Sam Giraffe · Oct 8, 2013

Hi,

I am trying to split up the re pattern for the Apache log file format and seem
to be having some trouble getting Python to understand a multi-line
pattern:

#!/usr/bin/python

import re

#this is a single line
string = '192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" 302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"'

#trying to break up the pattern match for easy to read code
pattern = re.compile(r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+'
                     r'(?P<ident>\-)\s+'
                     r'(?P<username>\-)\s+'
                     r'(?P<TZ>\[(.*?)\])\s+'
                     r'(?P<url>\"(.*?)\")\s+'
                     r'(?P<httpcode>\d{3})\s+'
                     r'(?P<size>\d+)\s+'
                     r'(?P<referrer>\"\")\s+'
                     r'(?P<agent>$(.*?)$)')

match = re.search(pattern, string)

if match:
    print match.group('ip')
else:
    print 'not found'

The Python interpreter skips to the 'match = re.search' line and then the
'if' statement right after it looks at <ip>, instead of moving on to
<ident> and so on.

mybox:~ user$ python -m pdb /Users/user/Documents/Python/apache.py

/Users/user/Documents/Python/apache.py(3)<module>()
-> import re
(Pdb) n

/Users/user/Documents/Python/apache.py(5)<module>()
-> string = '192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" 302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"'
(Pdb) n

/Users/user/Documents/Python/apache.py(7)<module>()
-> pattern = re.compile(r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+'
(Pdb) n

/Users/user/Documents/Python/apache.py(17)<module>()
-> match = re.search(pattern, string)
(Pdb)

Thank you.

Neil Cerutti · Oct 8, 2013

I recommend using the re.VERBOSE flag when explicating an re.
It'll make your life incrementally easier.

pattern = re.compile(
    r"""(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+
        (?P<ident>\-)\s+
        (?P<username>\-)\s+
        (?P<TZ>\[(.*?)\])\s+ # You can even insert comments.
        (?P<url>\"(.*?)\")\s+
        (?P<httpcode>\d{3})\s+
        (?P<size>\d+)\s+
        (?P<referrer>\"\")\s+
        (?P<agent>$(.*?)$)""", re.VERBOSE)
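
To illustrate how re.VERBOSE treats whitespace and comments, here is a minimal sketch covering only the first four fields. The shortened pattern, the name `prefix` and the group name `time` are made up for this example and are not from the original post. Inside a verbose pattern, unescaped whitespace and anything after a # are ignored, so a literal space or # has to be escaped or placed in a character class.

```python
import re

# Hypothetical, shortened pattern: only the first four log fields.
prefix = re.compile(r"""
    (?P<ip>\d{1,3}(?:\.\d{1,3}){3})  \s+   # client address
    (?P<ident>\S+)                   \s+   # identd field, a dash when absent
    (?P<username>\S+)                \s+   # authenticated user, a dash when absent
    \[ (?P<time>[^\]]+) \]                 # timestamp, captured without the brackets
    """, re.VERBOSE)

line = '192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0"'
m = prefix.match(line)

if m:
    print(m.group('ip'))
    print(m.group('time'))
else:
    print('not found')
```

The spaces used to line up the columns are not part of the regex; only the explicit \s+ tokens consume whitespace in the log line.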

Denis McMahon · Oct 8, 2013

Sam Giraffe said:
I am trying to split up the re pattern for Apache log file format and
seem to be having some trouble in getting Python to understand
multi-line pattern:

AIUI the Apache log format uses a space as the delimiter, encloses strings in
'"' characters, and uses '-' for an empty field.

So I think every element should match: (\S+|"[^"]+"|-) and there should
be \s+ between elements.
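
A rough sketch of that field-by-field idea, using re.findall on the sample line from the question. Two things here are assumptions beyond what is written above: the quoted alternative is tried before \S+ (otherwise \S+ would grab `"GET` and stop at the first space), and a bracketed alternative is added because the timestamp contains a space but is wrapped in [] rather than quotes. Quoted strings containing escaped quotes are not handled.

```python
import re

line = ('192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" '
        '302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"')

# One alternative per kind of field: [bracketed], "quoted", or a bare token.
# A lone '-' is already covered by \S+.
FIELD = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

fields = FIELD.findall(line)

for number, field in enumerate(fields):
    print('%d: %s' % (number, field))
```

Since findall simply skips whatever lies between matches, the \s+ separators do not need to be spelled out in this variant.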

Skip Montanaro · Oct 8, 2013

Denis McMahon said:
Aiui apache log format uses space as delimiter, encapsulates strings in
'"' characters, and uses '-' as an empty field.

Specifying the field delimiter as a space, you might be able to use
the csv module to read these. I haven't done any Apache log file work
since long before the csv module was available, but it just might
work.

Skip
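
A minimal sketch of the csv idea, tried only against the sample line from the question, so treat it as a starting point rather than a tested log parser. The one wrinkle it shows is that the timestamp is wrapped in [] instead of quotes, so csv splits it at the inner space; rejoining the two pieces by position is an assumption about the field layout.

```python
import csv

line = ('192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" '
        '302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"')

# csv.reader accepts any iterable of lines, so a one-element list works here.
reader = csv.reader([line], delimiter=' ', quotechar='"')
row = next(reader)

# The bracketed timestamp arrives as two fields; glue them back together.
timestamp = row[3] + ' ' + row[4]
fields = row[:3] + [timestamp] + row[5:]

print(fields)
```

Note that csv strips the surrounding double quotes from the request, referrer and agent fields, which the regex approaches above keep.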

Piet van Oostrum · Oct 9, 2013

Sam Giraffe said:
The python interpreter is skipping to the 'match = re.search' and then the 'if' statement right
after it looks at the <ip>, instead of moving onto <ident> and so on.

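Although you have written the regexp as a sequence of lines, it is really a single string: Python joins adjacent string literals when it compiles the script. The whole re.compile(...) call is therefore one statement, so pdb takes a single step over it and does not go into its "parts", which are not really parts.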
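
A small sketch of that point, using a shortened version of the pattern (the variable names are made up for the example): the split literals and the one-line literal produce exactly the same string, so there is nothing for the debugger to step through between them.

```python
import re

# Adjacent string literals inside the parentheses are joined into one string.
split_source = (r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+'
                r'(?P<ident>-)\s+'
                r'(?P<username>-)')

# The same pattern written on a single line.
single_source = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(?P<ident>-)\s+(?P<username>-)'

same = (split_source == single_source)
print(same)

# re.compile sees one string either way.
compiled = re.compile(split_source)
print(compiled.pattern == single_source)
```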

Also, as Andreas has noted, the r'(?P<referrer>\"\")\s+' part is wrong: it only matches an empty pair of quotes. It should probably be
r'(?P<referrer>\".*?\")\s+'

And the r'(?P<agent>$(.*?)$)' will also not match as there is text outside the (). Should probably also be
r'(?P<agent>\".*?\")' or something like it.
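
Putting those two corrections into the pattern from the question gives something like the sketch below. It is checked only against the one sample line; the inner unnamed groups are dropped for brevity, and ident and username still match only a literal '-', as in the original, so logs that fill in those fields would need a looser pattern such as \S+.

```python
import re

line = ('192.168.122.3 - - [29/Sep/2013:03:52:33 -0700] "GET / HTTP/1.0" '
        '302 276 "-" "check_http/v1.4.16 (nagios-plugins 1.4.16)"')

pattern = re.compile(
    r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+'
    r'(?P<ident>-)\s+'
    r'(?P<username>-)\s+'
    r'(?P<TZ>\[.*?\])\s+'
    r'(?P<url>".*?")\s+'
    r'(?P<httpcode>\d{3})\s+'
    r'(?P<size>\d+)\s+'
    r'(?P<referrer>".*?")\s+'   # was \"\", which allowed nothing between the quotes
    r'(?P<agent>".*?")'         # was $(.*?)$, where $ anchors at the end of the string
)

match = pattern.search(line)

if match:
    for name in ('ip', 'TZ', 'url', 'httpcode', 'size', 'referrer', 'agent'):
        print('%s: %s' % (name, match.group(name)))
else:
    print('not found')
```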
