Help with pyparsing and dealing with null values

avidfan

I am trying to parse a log file (web.out) similar to this:

-----------------------------------------------------------

MBeanName: "mtg-model:Name=mtg-model_managed2,Type=Server"
AcceptBacklog: 50
AdministrationPort: 0
AutoKillIfFailed: false
AutoRestart: true
COM: mtg-model_managed2
COMEnabled: false
CachingDisabled: true
ClasspathServletDisabled: false
ClientCertProxyEnabled: false
Cluster: mtg-model-cluster
ClusterRuntime: mtg-model-cluster
ClusterWeight: 100
CompleteCOMMessageTimeout: -1
CompleteHTTPMessageTimeout: -1
CompleteIIOPMessageTimeout: -1
CompleteMessageTimeout: 60
CompleteT3MessageTimeout: -1
CustomIdentityKeyStoreFileName:
CustomIdentityKeyStorePassPhrase:
CustomIdentityKeyStorePassPhraseEncrypted:
CustomIdentityKeyStoreType:
CustomTrustKeyStoreFileName:
CustomTrustKeyStorePassPhrase:
CustomTrustKeyStorePassPhraseEncrypted:
CustomTrustKeyStoreType:
DefaultIIOPPassword:
DefaultIIOPPasswordEncrypted:
DefaultIIOPUser:
DefaultInternalServletsDisabled: false
DefaultProtocol: t3
DefaultSecureProtocol: t3s
DefaultTGIOPPassword:
DefaultTGIOPPasswordEncrypted: ******
DefaultTGIOPUser: guest
DomainLogFilter:
EnabledForDomainLog: true
ExecuteQueues: weblogic.kernel.Default,foglight
ExpectedToRun: false
ExternalDNSName:
ExtraEjbcOptions:
ExtraRmicOptions:
GracefulShutdownTimeout: 0

-----------------------------------------------------------

and I need the indented values (eventually) in a dictionary. As you
can see, some of the fields have a value, and some do not. It appears
that the code I have so far is not dealing with the null values and
colons as I had planned. Here is the code:

-----------------------------------------------------------

from pyparsing import *

input = open("web.out", 'r')
data = input.read()

end = Literal("\n").suppress()
all = SkipTo(end)
colon = Literal(":").suppress()
MBeanName = Literal("MBeanName:")
ServerName = dblQuotedString
identity = Word(alphas, alphanums+"._*/,-")
pairs = Group(identity + colon + Optional(identity) + all)

logEntry = (MBeanName + ServerName.setResultsName("servername") +
            OneOrMore(pairs))

for tokens in logEntry.searchString(data):
    print
    print "ServerName =\t "+ tokens.servername
    for t in tokens:
        print t
    print
    print 50*"-"

-------------------------------------------------------------

which is giving me this:

-------------------------------------------------------------

ServerName = "mtg-model:Name=mtg-modelserver_map501,Type=Server"
MBeanName:
"mtg-model:Name=mtg-modelserver_map501,Type=Server"
['AcceptBacklog', '50']
['AdministrationPort', '0']
['AutoKillIfFailed', 'false', 'AutoRestart: true']
['COM', 'mtg-modelserver_map501', 'COMEnabled: false']
['CachingDisabled', 'true', 'ClasspathServletDisabled: false']
['ClientCertProxyEnabled', 'false', 'Cluster:']
['ClusterRuntime', 'ClusterWeight', ': 100']
['CompleteCOMMessageTimeout', '-1']
['CompleteHTTPMessageTimeout', '-1']
['CompleteIIOPMessageTimeout', '-1']
['CompleteMessageTimeout', '60']
['CompleteT3MessageTimeout', '-1']
['CustomIdentityKeyStoreFileName', 'CustomIdentityKeyStorePassPhrase', ':']
['CustomIdentityKeyStorePassPhraseEncrypted', 'CustomIdentityKeyStoreType', ':']
['CustomTrustKeyStoreFileName', 'CustomTrustKeyStorePassPhrase', ':']
['CustomTrustKeyStorePassPhraseEncrypted', 'CustomTrustKeyStoreType', ':']
['DefaultIIOPPassword', 'DefaultIIOPPasswordEncrypted', ':']
['DefaultIIOPUser', 'DefaultInternalServletsDisabled', ': false']
['DefaultProtocol', 't3', 'DefaultSecureProtocol: t3s']
['DefaultTGIOPPassword', 'DefaultTGIOPPasswordEncrypted', ': ******']
['DefaultTGIOPUser', 'guest', 'DomainLogFilter:']
['EnabledForDomainLog', 'true', 'ExecuteQueues: weblogic.kernel.Default,foglight']
['ExpectedToRun', 'false', 'ExternalDNSName:']
['ExtraEjbcOptions', 'ExtraRmicOptions', ':']
['GracefulShutdownTimeout', '0']

----------------------------------------------------------------

instead of this (one to one):

----------------------------------------------------------------

ServerName = "mtg-model:Name=mtg-modelserver_map501,Type=Server"
MBeanName:
"mtg-model:Name=mtg-modelserver_map501,Type=Server"
['AcceptBacklog', '50']
['AdministrationPort', '0']
['AutoKillIfFailed', 'false']
['AutoRestart', 'true']
['COM', 'mtg-modelserver_map501']
['COMEnabled', 'false']
['CachingDisabled', 'true']
['ClasspathServletDisabled', 'false']
['ClientCertProxyEnabled', 'false']
['Cluster', 'mtg-model-cluster']
['ClusterRuntime', 'mtg-model-cluster']
['ClusterWeight', '100']
['CompleteCOMMessageTimeout', '-1']
['CompleteHTTPMessageTimeout', '-1']
['CompleteIIOPMessageTimeout', '-1']
['CompleteMessageTimeout', '60']
['CompleteT3MessageTimeout', '-1']
['CustomIdentityKeyStoreFileName', '']
['CustomIdentityKeyStorePassPhrase', '']
['CustomIdentityKeyStorePassPhraseEncrypted', '']
['CustomIdentityKeyStoreType', '']
['CustomTrustKeyStoreFileName', '']
['CustomTrustKeyStorePassPhrase', '']
['CustomTrustKeyStorePassPhraseEncrypted', '']
['CustomTrustKeyStoreType', '']
['DefaultIIOPPassword', '']
['DefaultIIOPPasswordEncrypted', '']
['DefaultIIOPUser', '']
['DefaultInternalServletsDisabled', 'false']
['DefaultProtocol', 't3']
['DefaultSecureProtocol', 't3s']
['DefaultTGIOPPassword', '']
['DefaultTGIOPPasswordEncrypted', '******']
['DefaultTGIOPUser', 'guest']
['DomainLogFilter', '']
['EnabledForDomainLog', 'true']
['ExecuteQueues', 'weblogic.kernel.Default,foglight']
['ExpectedToRun', 'false']
['ExternalDNSName', '']
['ExtraEjbcOptions', '']
['ExtraRmicOptions', '']
['GracefulShutdownTimeout', '0']

------------------------------------------------------------------

Can anyone offer any advice on this? I would certainly appreciate any
input.

Thanks!
 
Paul McGuire

This is a very good first cut at the problem. Here are some tips to
get you going again:

1. Literal("\n") won't work; use LineEnd() instead. Literals are for
non-whitespace literal strings.
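
To see the difference in isolation, here is a minimal sketch. The one-line input "abc\n" and the variable names are made up for illustration, and it assumes a pyparsing release that still accepts the camelCase method names used in this thread. By default pyparsing skips whitespace, newlines included, before it tries each expression, so Literal("\n") never gets to see the newline; LineEnd() leaves "\n" out of the characters it skips.

```python
from pyparsing import LineEnd, Literal, ParseException, Word, alphas

text = "abc\n"  # made-up one-line input

# Literal("\n"): the newline is skipped as whitespace before the match is tried,
# so the literal has nothing left to match and the parse fails.
try:
    (Word(alphas) + Literal("\n")).parseString(text)
    literal_matched = True
except ParseException:
    literal_matched = False

# LineEnd(): does not treat "\n" as skippable whitespace, so it matches here.
tokens = (Word(alphas) + LineEnd()).parseString(text)

print(literal_matched)
print(tokens.asList())
```

Adding .suppress() to LineEnd(), as in the original definition of end, keeps the newline out of the results.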


2. "all = SkipTo(end)" can be removed, use restOfLine instead of all.
("all" as a variable name masks Python 2.5's "all" builtin function.)


3. In addition to identity, you might consider defining some other
known value types:

boolean = oneOf("true false")
boolean.setParseAction(lambda toks: toks[0]=="true")

integer = Combine(Optional("-") + Word(nums))
integer.setParseAction(lambda toks: int(toks[0]))

These will do data conversion for you at parse time, so that the
values are already in int or bool form when you access them later.
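
As a quick check of what those parse actions hand back, here is a small sketch that runs the two expressions on their own. The input strings "false" and "-1" are just sample values taken from the log above, and the names flag and timeout are illustrative.

```python
from pyparsing import Combine, Optional, Word, nums, oneOf

# Same definitions as above: the parse actions convert the matched text.
boolean = oneOf("true false")
boolean.setParseAction(lambda toks: toks[0] == "true")

integer = Combine(Optional("-") + Word(nums))
integer.setParseAction(lambda toks: int(toks[0]))

flag = boolean.parseString("false")[0]
timeout = integer.parseString("-1")[0]

# The tokens are real Python values, not strings.
print(type(flag).__name__)
print(type(timeout).__name__)
```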


4. The significant change is to this line (I've replaced all with
restOfLine):

pairs = Group(identity + colon + Optional(identity) + restOfLine)

The problem is that pyparsing's whitespace skipping also skips
newlines, so Optional(identity) will read an identity even if it's not
on the same line. For keys that have no value given, you end up
reading past the end-of-line and taking the next key name as the value
for the previous key. To work around this, define the value as
something which must be on the same line, using the NotAny lookahead,
which you can abbreviate using the ~ operator.

pairs = Group(identity + colon + Optional(~end + (identity | restOfLine)) + end)

If we add in the other known value types, this gets a bit unwieldy, so
I recommend you define value separately:

value = boolean | integer | identity | restOfLine
pairs = Group(identity + colon + Optional(~end + value) + end )

At this point, I think you have a working parser for your log data.
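
To make the before and after concrete, here is a rough sketch that runs both versions of the pair expression on a few lines in the style of the log. The sample string and the names naive_pairs and fixed_pairs are made up for this illustration, and the naive version is cut down to just the Optional(identity) part that causes the trouble. It assumes a pyparsing release that still accepts the camelCase names used in this thread.

```python
from pyparsing import (Combine, Group, LineEnd, Literal, OneOrMore, Optional,
                       Word, alphanums, alphas, nums, oneOf, restOfLine)

# Made-up sample; the first key has no value.
sample = ("DefaultIIOPUser:\n"
          "AutoRestart: true\n"
          "AcceptBacklog: 50\n"
          "DefaultProtocol: t3\n")

end = LineEnd().suppress()
colon = Literal(":").suppress()
identity = Word(alphas, alphanums + "._*/,-")

boolean = oneOf("true false").setParseAction(lambda toks: toks[0] == "true")
integer = Combine(Optional("-") + Word(nums)).setParseAction(
    lambda toks: int(toks[0]))
value = boolean | integer | identity | restOfLine

# Naive: Optional(identity) skips the newline and grabs the next key name.
naive_pairs = Group(identity + colon + Optional(identity))
naive = naive_pairs.parseString(sample).asList()

# Fixed: the ~end lookahead refuses a value when the line is already over.
fixed_pairs = Group(identity + colon + Optional(~end + value) + end)
fixed = OneOrMore(fixed_pairs).parseString(sample).asList()

print(naive)
print(fixed)
```

In the naive result the empty key picks up 'AutoRestart' as its value; in the fixed result each key keeps only what is on its own line.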


5. (Extra Credit) Lastly, to create a dictionary, you are all set to
just add pyparsing's Dict class. Change:

logEntry = MBeanName + ServerName("servername") + OneOrMore(pairs)

to:

logEntry = (MBeanName + ServerName("servername") +
            Dict(OneOrMore(pairs)))

(I've also removed ".setResultsName", using the new shortened form for
setting results names.)

Dict will return the parsed tokens as-is, but it will also define
results names using the tokens[0] element of each list of tokens
returned by pairs - the values will be the tokens[1:], so that if a
value expression contains multiple tokens, they all will be associated
with the results name key.

Now you can replace the results listing code:

for t in tokens:
    print t

with

print tokens.dump()

And you can access the tokens as if they are a dict, using:

print tokens.keys()
print tokens.values()
print tokens["ClasspathServletDisabled"]

If you prefer, for keys that are valid Python identifiers (all of
yours appear to be), you can just use object.attribute notation:

print tokens.ClasspathServletDisabled
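
Putting the pieces together, here is a self-contained sketch of the Dict version run against a trimmed-down copy of the log. The sample string is shortened from the post for illustration, it uses parseString on that string rather than searchString on the file, and print is written with parentheses so it also runs on Python 3. It assumes a pyparsing release that still accepts the camelCase names used in this thread.

```python
from pyparsing import (Combine, Dict, Group, LineEnd, Literal, OneOrMore,
                       Optional, Word, alphanums, alphas, dblQuotedString,
                       nums, oneOf, restOfLine)

# Trimmed-down sample in the format of web.out (illustrative only).
sample = ('MBeanName: "mtg-model:Name=mtg-model_managed2,Type=Server"\n'
          'AcceptBacklog: 50\n'
          'ClasspathServletDisabled: false\n'
          'ExternalDNSName:\n'
          'Cluster: mtg-model-cluster\n')

end = LineEnd().suppress()
colon = Literal(":").suppress()
identity = Word(alphas, alphanums + "._*/,-")

boolean = oneOf("true false").setParseAction(lambda toks: toks[0] == "true")
integer = Combine(Optional("-") + Word(nums)).setParseAction(
    lambda toks: int(toks[0]))
value = boolean | integer | identity | restOfLine
pairs = Group(identity + colon + Optional(~end + value) + end)

# Dict turns the first token of each group into a results name.
logEntry = (Literal("MBeanName:") + dblQuotedString("servername") +
            Dict(OneOrMore(pairs)))

tokens = logEntry.parseString(sample)

print(tokens["AcceptBacklog"])
print(tokens.ClasspathServletDisabled)
print(repr(tokens["ExternalDNSName"]))
print(tokens.Cluster)
print(tokens.servername)
```

Keys with no value come back as an empty string, and servername keeps its surrounding double quotes because dblQuotedString does not strip them.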

Here is some sample output, using dump(), keys(), and attribute
lookup:

tokens.dump() -> ['MBeanName:',
'"mtg-model:Name=mtg-model_managed2,Type=Server"', ['AcceptBacklog', 50],
['AdministrationPort', 0], ['AutoKillIfFailed', False],
['AutoRestart', True], ['COM', 'mtg-model_managed2'], ['COMEnabled',
False], ['CachingDisabled', True], ['ClasspathServletDisabled',
False], ['ClientCertProxyEnabled', False],
['Cluster', 'mtg-model-cluster'], ['ClusterRuntime', 'mtg-model-cluster'], ['ClusterWeight',
100], ['CompleteCOMMessageTimeout', -1],
['CompleteHTTPMessageTimeout', -1], ['CompleteIIOPMessageTimeout',
-1], ['CompleteMessageTimeout', 60], ['CompleteT3MessageTimeout', -1],
['CustomIdentityKeyStoreFileName'],
['CustomIdentityKeyStorePassPhrase'],
['CustomIdentityKeyStorePassPhraseEncrypted'],
['CustomIdentityKeyStoreType'], ['CustomTrustKeyStoreFileName'],
['CustomTrustKeyStorePassPhrase'],
['CustomTrustKeyStorePassPhraseEncrypted'],
['CustomTrustKeyStoreType'], ['DefaultIIOPPassword'],
['DefaultIIOPPasswordEncrypted'], ['DefaultIIOPUser'],
['DefaultInternalServletsDisabled', False], ['DefaultProtocol', 't3'],
['DefaultSecureProtocol', 't3s'], ['DefaultTGIOPPassword'],
['DefaultTGIOPPasswordEncrypted', ' ****** '], ['DefaultTGIOPUser',
'guest'], ['DomainLogFilter'], ['EnabledForDomainLog', True],
['ExecuteQueues', 'weblogic.kernel.Default,foglight'],
['ExpectedToRun', False], ['ExternalDNSName'], ['ExtraEjbcOptions'],
['ExtraRmicOptions'], ['GracefulShutdownTimeout', 0]]
- AcceptBacklog: 50
- AdministrationPort: 0
- AutoKillIfFailed: False
- AutoRestart: True
- COM: mtg-model_managed2
- COMEnabled: False
- CachingDisabled: True
- ClasspathServletDisabled: False
- ClientCertProxyEnabled: False
- Cluster: mtg-model-cluster
- ClusterRuntime: mtg-model-cluster
- ClusterWeight: 100
- CompleteCOMMessageTimeout: -1
- CompleteHTTPMessageTimeout: -1
- CompleteIIOPMessageTimeout: -1
- CompleteMessageTimeout: 60
- CompleteT3MessageTimeout: -1
- CustomIdentityKeyStoreFileName:
- CustomIdentityKeyStorePassPhrase:
- CustomIdentityKeyStorePassPhraseEncrypted:
- CustomIdentityKeyStoreType:
- CustomTrustKeyStoreFileName:
- CustomTrustKeyStorePassPhrase:
- CustomTrustKeyStorePassPhraseEncrypted:
- CustomTrustKeyStoreType:
- DefaultIIOPPassword:
- DefaultIIOPPasswordEncrypted:
- DefaultIIOPUser:
- DefaultInternalServletsDisabled: False
- DefaultProtocol: t3
- DefaultSecureProtocol: t3s
- DefaultTGIOPPassword:
- DefaultTGIOPPasswordEncrypted: ******
- DefaultTGIOPUser: guest
- DomainLogFilter:
- EnabledForDomainLog: True
- ExecuteQueues: weblogic.kernel.Default,foglight
- ExpectedToRun: False
- ExternalDNSName:
- ExtraEjbcOptions:
- ExtraRmicOptions:
- GracefulShutdownTimeout: 0
- servername: "mtg-model:Name=mtg-model_managed2,Type=Server"

tokens.keys() -> ['ClasspathServletDisabled', 'servername',
'ExternalDNSName', 'CustomTrustKeyStoreFileName', 'DefaultIIOPUser',
'ExpectedToRun', 'CachingDisabled', 'CompleteHTTPMessageTimeout',
'CompleteIIOPMessageTimeout', 'AutoKillIfFailed',
'ClientCertProxyEnabled', 'ExtraEjbcOptions',
'CustomTrustKeyStorePassPhraseEncrypted', 'COM',
'CompleteMessageTimeout', 'CustomIdentityKeyStoreType',
'CustomTrustKeyStoreType', 'EnabledForDomainLog', 'AutoRestart',
'DefaultTGIOPPasswordEncrypted', 'CompleteCOMMessageTimeout',
'DefaultInternalServletsDisabled', 'DefaultProtocol', 'ClusterWeight',
'ExecuteQueues', 'ExtraRmicOptions', 'CompleteT3MessageTimeout',
'DefaultTGIOPUser', 'AcceptBacklog', 'DefaultIIOPPassword',
'DefaultSecureProtocol', 'COMEnabled',
'CustomIdentityKeyStoreFileName', 'DefaultTGIOPPassword',
'CustomIdentityKeyStorePassPhraseEncrypted',
'GracefulShutdownTimeout', 'DefaultIIOPPasswordEncrypted',
'CustomIdentityKeyStorePassPhrase', 'ClusterRuntime', 'Cluster',
'DomainLogFilter', 'CustomTrustKeyStorePassPhrase',
'AdministrationPort']

tokens.ClasspathServletDisabled -> False


Cheers,
-- Paul
 

avidfan

Thanks, Paul! That's exactly what I needed!
 
