regex to get OS from combined log


Anno Siegel

[...]
Can I rephrase the question please! Could someone show me a better way
to get the OS out of the User-Agent please?:

Mozilla/4.0 (compatible; MSIE 5.5; Windows 95)
Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)
Microsoft-WebDAV-MiniRedir/5.1.2600
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)
Mozilla/4.0 (compatible; MSIE 5.5; Windows 95)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)
Mozilla/4.0 (compatible; MSIE 5.01; Windows 95)
Mozilla/4.08 [en] (WinNT; U ;Nav)
Mozilla/4.0 (compatible; MSIE 5.01; Windows 95)

Before one can think of a method, there would have to be a way to
identify the OS name. There doesn't seem to be much rhyme or reason
in the samples above (even ignoring the line beginning "Microsoft-Web...").
Neither is it always the third semicolon-separated item in (), nor is
it always the last one.

The only way I see is to have a collection of patterns that match
possible OS names (like /^Windows/, /^WinNT/, ...). Split the part
in parens on /;\s*/ and see if one matches. Resort to guessing if
none of the expected OS names is found. Mark the guesses as such, so
you can update the pattern collection if something new appears.
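
The approach above could be sketched roughly as follows. This is only an illustration: the sub name os_from_user_agent, the contents of @os_patterns, and the choice of the last item as the fallback guess are all assumptions, not something taken from the log or the original code.

```perl
use strict;
use warnings;

# Hypothetical pattern collection; extend it as new agents show up in the log.
my @os_patterns = (
    qr/^Windows\b/,    # "Windows 95", "Windows NT 5.1", ...
    qr/^WinNT\b/,      # Netscape 4.x style
);

# Returns ($os, $is_guess). $os is undef when the agent string has no
# parenthesised part at all (e.g. "Microsoft-WebDAV-MiniRedir/5.1.2600").
sub os_from_user_agent {
    my ($ua) = @_;

    # Take the first (...) group; [^)]* cannot run past the closing paren.
    my ($inside) = $ua =~ /\(([^)]*)\)/ or return (undef, 1);

    # Split on semicolons and trim stray whitespace from each item.
    my @items = split /\s*;\s*/, $inside;
    s/^\s+|\s+$//g for @items;

    for my $item (@items) {
        for my $re (@os_patterns) {
            return ($item, 0) if $item =~ $re;
        }
    }

    # No known OS name found: guess the last item (an assumption) and flag it.
    return ($items[-1], 1);
}

for my $ua ('Mozilla/4.0 (compatible; MSIE 5.5; Windows 95)',
            'Mozilla/4.08 [en] (WinNT; U ;Nav)') {
    my ($os, $guess) = os_from_user_agent($ua);
    printf "%s%s\n", $os // 'unknown', $guess ? ' (guess)' : '';
}
```

Logging the flagged guesses separately makes it easy to see which patterns are still missing from the collection.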

Anno
 

JS

Hi,

I have the following regex to pull the IP address and Operating System
from a log in combined logformat:

if (/^(\d*.\d*.\d*.\d*).*\((.*;.*;.*)\)"/){
$ip=$1;
($j1,$j2,$os,$j3)=split /;/,$2;
$os=~s/^\s//;


For the most part this works, but every now and again I get an OS with
characters on the end of it, e.g. Windows NT 5.1)" "CTG=1065689351

Can anyone help me fix the regex above please? Here are some example lines
from the log, just to show how difficult this is:

12.110.129.108 - u768912 [07/Oct/2003:00:02:07 +0000] "GET http://www.anpe.fr/offres/unitary/menu_cand_offre.jsp;jsessionid=1B0v5E29JcsNTVJlJPuDyvAP3E4O8zs1aeCZAVqvtCp8CCxYcZCk!1647636474!-1062731341!10000!7002 HTTP/1.1" 200 1733 "http://www.anpe.fr/offres/index.html?option=1&zone=64D&submit=1" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)"

166.2.123.85 - u617856 [07/Oct/2003:00:04:30 +0000] "GET http://www.toptable.co.uk/images/restaurant/small/1389.jpg HTTP/1.1" 200 1926 "http://www.toptable.co.uk/details.cfm?bf=172,1296,1361,394,1472,1284,40,1389,1322,1229,829,941,1163,1387,785,1295,701,1092&amp&bfh=133" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)" "CFID=1218747;CFTOKEN=ca5a76%2Dbfd444bd%2Db2d4%2D4b80%2Da588%2Df24fdd8e1484"

13.59.16.245 - - [07/Oct/2003:00:08:57 +0000] "OPTIONS http://161.2.67.76/ HTTP/1.0" 504 1548 "-" "Microsoft-WebDAV-MiniRedir/5.1.2600"

12.32.34.43 - - [07/Oct/2003:00:10:41 +0000] "GET http://www.nyc.com/web/website.nsf/Images/$file/header.gif HTTP/1.1" 200 1023 "http://www.nyc.com/web/website.nsf" "Mozilla/4.0 (compatible; MSIE 5.01; Windows 95)"

Thanks for any help.

JS.
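
One likely cause of the trailing characters in the post above: every .* in the original pattern is greedy, so (.*;.*;.*)\)" runs to the last )" on the line, and a later quoted field (a cookie, say) can end up inside $2. The unescaped dots in the IP part also match any character. A minimal sketch that walks the combined-format fields one by one and captures only the quoted User-Agent might look like this; the sub name ip_and_agent is hypothetical, and the field layout is assumed to match the sample lines.

```perl
use strict;
use warnings;

# Returns ($ip, $user_agent), or an empty list if the line does not
# look like a combined-format entry.
sub ip_and_agent {
    my ($line) = @_;
    return unless $line =~ m{
        ^(\d{1,3}(?:\.\d{1,3}){3})   # client address, dots escaped
        \s+\S+\s+\S+                 # ident and user name
        \s+\[[^\]]*\]                # timestamp in square brackets
        \s+"[^"]*"                   # quoted request line
        \s+\d{3}\s+\S+               # status code and byte count
        \s+"[^"]*"                   # quoted referer
        \s+"([^"]*)"                 # quoted user agent (captured)
    }x;
    return ($1, $2);
}
```

Because each quoted field is matched with [^"]*, the capture cannot spill into whatever follows the User-Agent. The OS can then be picked out of the returned agent string in a separate step.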
 

JS

Can I rephrase the question please! Could someone show me a better way
to get the OS out of the User-Agent please?:

Mozilla/4.0 (compatible; MSIE 5.5; Windows 95)
Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)
Microsoft-WebDAV-MiniRedir/5.1.2600
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)
Mozilla/4.0 (compatible; MSIE 5.5; Windows 95)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)
Mozilla/4.0 (compatible; MSIE 5.01; Windows 95)
Mozilla/4.08 [en] (WinNT; U ;Nav)
Mozilla/4.0 (compatible; MSIE 5.01; Windows 95)


Thanks,

JS.
 
