Design choices and patterns for passing contextual runtime information.


Daniel Pitts

Hello fellow Engineers,

So, I'm working on a (hand-written) English imperative statement
parser in Java, and I was thinking that eventually I might need to
pass context to different nodes in the "parse tree". Having to pass
context information is an even more general problem than this parser,
so I was thinking about the best way(s) to approach this sub-problem.

I personally dislike the needless use of Singleton, since it's often
not truly required and is often confused with the "locator" pattern.

The straightforward solution is to pass a context object to all
methods on all objects that need it. Some methods might only need it
in order to forward it to the other objects. This seems like a bit of
a waste to me, although it does keep it clear and explicit where the
context comes from.
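A minimal sketch of this first option, assuming a hypothetical `ParseContext` class and a toy two-token grammar in place of the real parser: every parse method takes the context as an explicit parameter, including `parseSentence`, which only forwards it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical per-sentence context; a real one would carry whatever the parser needs.
final class ParseContext {
    private final List<String> log = new ArrayList<>();

    void note(String message) {
        log.add(message);
    }

    List<String> log() {
        return log;
    }
}

// Toy recursive-descent parser over pre-split tokens of the form "<verb> <noun>".
final class ExplicitParser {
    static String parseSentence(List<String> tokens, ParseContext ctx) {
        // parseSentence never reads ctx itself; it only forwards it downward.
        String verb = parseVerb(tokens.get(0), ctx);
        String noun = parseNoun(tokens.get(1), ctx);
        return verb + "(" + noun + ")";
    }

    static String parseVerb(String token, ParseContext ctx) {
        ctx.note("verb:" + token);
        return token.toUpperCase(Locale.ROOT);
    }

    static String parseNoun(String token, ParseContext ctx) {
        ctx.note("noun:" + token);
        return token;
    }
}
```

The forwarding cost is visible in `parseSentence`: it has to accept and pass along `ctx` even though it never uses it, which is exactly the "bit of a waste" described above, in exchange for making the source of the context explicit at every call.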

The other alternative is to pass the context to the constructor of all
the classes involved, so that they have a reference to it at all
times. This is a slightly more useful approach, but many of my
parsing methods are "static": they return an object of the type that
matches the parsed text. Maybe that's the wrong approach (comments
welcome on *that* problem as well).

The third alternative is to use a ThreadLocal variable. This thread
local variable would be the context object. This is a little close to
a global variable, but it is thread-safe, and it seems like it would
be (if properly encapsulated) a cleaner approach.

I'm sure I'm going to get a lot of strong opinions on the right way to
do this, so I look forward to reading the reasoning behind those
opinions.

Thanks,
Daniel.

P.S.
x-posted to comp.lang.java.programmer, comp.software.patterns, and
comp.object.
follow-up to comp.lang.java.programmer.
 

H. S. Lahman

Responding to Pitts...
> So, I'm working on a (hand-written) English imperative statement
> parser in Java, and I was thinking that eventually I might need to
> pass context to different nodes in the "parse tree". Having to pass
> context information is an even more general problem than this parser,
> so I was thinking about the best way(s) to approach this sub-problem.
>
> I personally dislike the needless use of Singleton, since it's often
> not truly required and is often confused with the "locator" pattern.

I am not sure why Singleton would be needed here at all. Since English
isn't conveniently LALR(1) you will, indeed, need to capture context as
a statement is processed. But I would expect the life cycle of such an
object to be quite limited by the processing context (e.g., it is born
at the start of processing of a new statement and it dies at the end of
processing that statement). IOW, the natural flow of control of the
parsing enforces the scope of the object to one instance at a time.

[Because Java is a GC language, you might have to be careful to remove
all references to the object at the end of scope to avoid referential
integrity problems. But the point where that is necessary would be
defined by the processing.]

> The straightforward solution is to pass a context object to all
> methods on all objects that need it. Some methods might only need it
> in order to forward it to the other objects. This seems like a bit of
> a waste to me, although it does keep it clear and explicit where the
> context comes from.

This is a very different thing. As a general rule passing object
references to methods is a poor OOA/D practice, especially when there
may be chains of such calls. It raises data integrity issues concerning
the timeliness of the data that make it more difficult to provide things
like thread safety. It is also the worst form of coupling.

But the big reason is that the caller needs to know too much about what
the receiver does. Effectively one has:

     1         R1      1     1      R2         1
    [A] ---------------- [B] ---------------- [C]
                        + doIt()

When A passes a C to B::doIt, A is creating a temporary relationship
between B and C. That means A needs to know (a) that B needs to talk to
a C and (b) exactly which C it is that B needs to talk to. The
application will be more robust to change if the collaboration between B
and C is a personal matter between them.

The reason is that the rules and policies that govern instantiation of
objects and relationships in the problem space are quite often distinct
from the rules and policies that govern collaboration. IOW, relationship
instantiation is about Who participates in collaborations while the
collaborations themselves are about When something should be done.

Thus A necessarily needs to know when to collaborate with B (more
precisely, the developer knows when A has done something that requires
an announcement message to be sent to B who responds to that condition).
But A should not know who responds or what they do, particularly their
carnal relations with third parties.

So we usually encapsulate instantiation in dedicated responsibilities
that live in objects that understand other context information in the
problem space. That allows someone else entirely to instantiate the R2
relationship. Then if those rules and policies change, A does not need
to be touched.

The corollary is that methods navigate relationship paths to get to the
data that they need on an as-needed basis. As it happens, this tends to
make it much, much easier to implement concurrency (e.g. threads) and
true asynchronous processing correctly. That's because the scope of
access is limited to the method.

So the OOA/D paradigm is that R1 and R2 are instantiated prior to the
collaboration. Then A sends a message to B that is addressed via R1. B,
in turn, sends a message to C by navigating R2. That removes all
knowledge of the B/C collaboration from A's implementation.
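A rough Java rendering of that paragraph, reusing the A, B, and C names from the diagram and adding a hypothetical `Assembler` class that instantiates R1 and R2 before any collaboration happens. A only knows *when* to call `b.doIt()`; it never learns that B talks to a C, let alone which one.

```java
// C: the third party that B collaborates with.
final class C {
    private int hits;

    void handle() {
        hits++;
    }

    int hits() {
        return hits;
    }
}

// B owns its end of R2; nobody passes a C into doIt().
final class B {
    private C c; // R2, instantiated by whoever assembles the objects

    void linkR2(C c) {
        this.c = c;
    }

    void doIt() {
        c.handle(); // B navigates R2 on its own
    }
}

// A owns its end of R1 and knows only when to talk to B.
final class A {
    private B b; // R1

    void linkR1(B b) {
        this.b = b;
    }

    void somethingHappened() {
        b.doIt();
    }
}

// Hypothetical assembler: the single place that knows who is related to whom.
final class Assembler {
    static A build(C c) {
        A a = new A();
        B b = new B();
        b.linkR2(c);
        a.linkR1(b);
        return a;
    }
}
```

If the rules for which C belongs with which B change, only `Assembler` changes; `A.somethingHappened()` is untouched.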

Now let's assume we have a [Context] object that A, B, and C all need to
access. Superficially that suggests:

     1         R1      1     1      R2         1
    [A] ---------------- [B] ---------------- [C]
     | 1                + doIt()               | 1
     |                    | 1                  |
     |                    |                    |
     | R3                 | R4                 | R5
     |                    |                    |
     | 1                  | 1                1 |
     +--------------- [Context] ---------------+

However, we don't necessarily need to do that. Since everyone accesses
the same Context, all we need to ensure is that there is a viable
relationship path for each object to navigate to get to the Context:

     1         R1      1     1      R2         1
    [A] ---------------- [B] ---------------- [C]
     | 1                + doIt()
     |
     | R3
     |
     | 1
 [Context]

Now C can navigate R2 -> R1 -> R3 to get to the context when it needs to
do so. IOW, OOA/D relationships are usually two-way.

For example, full code generators for UML treat relationships very much
like aspects, and they generate quite generic code for their
implementation, instantiation, and navigation. That is only possible
because in the OO paradigm relationships are orthogonal to particular
collaborations and the rules and policies are encapsulated. The paradigm
also allows one to manage complexity by solving separate, smaller,
independent problems like Who and When.

So if, say, R2 is across a distributed boundary, it becomes relatively
trivial to provide the appropriate handshaking code to ensure the access
of Context by C /appears/ to have been synchronous from C's perspective
and the Context doesn't go away while C is accessing it. And if C and
Context are in different concurrent threads, then data integrity is
managed by simply looking at what data C needs (e.g., blocking Context
while C executes) and referential integrity is managed orthogonally by
making sure the right relationships are instantiated (e.g., blocking C
until they are).

On a more general note, I would bet that you will need different flavors
of context objects when processing a grammar as bad as English. For that
you might want to think about the GoF State pattern. The idea is that
you encapsulate both the context data and the specific operations on
that data (e.g., BNF productions) in a State and then dynamically assign
the right one based on previously determined parsing context.
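A minimal GoF State sketch of that suggestion, with hypothetical state classes and a deliberately tiny grammar (verb, optional article, object). Each state holds its own context data and the production it applies to the next token, and the driver simply swaps in whichever state the previous token selected. It illustrates the pattern only; it is not a grammar for English.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// A state bundles its own context data with the production it applies to the next token.
interface ParseState {
    ParseState accept(String token, List<String> output);
}

// Start state: the first word of an imperative clause is taken to be the verb.
final class ExpectVerb implements ParseState {
    @Override
    public ParseState accept(String token, List<String> output) {
        return new ExpectObject(token);
    }
}

// Carries the verb seen so far; skips articles, then emits "verb(object)".
final class ExpectObject implements ParseState {
    private final String verb; // context data owned by this state

    ExpectObject(String verb) {
        this.verb = verb;
    }

    @Override
    public ParseState accept(String token, List<String> output) {
        if (token.equals("the") || token.equals("a")) {
            return this; // article: stay in the same state
        }
        output.add(verb + "(" + token + ")");
        return new ExpectVerb(); // ready for the next clause
    }
}

// The driver only delegates; which production applies is decided by the current state.
final class StateParser {
    static List<String> parse(String sentence) {
        List<String> output = new ArrayList<>();
        ParseState state = new ExpectVerb();
        for (String token : sentence.toLowerCase(Locale.ROOT).split("\\s+")) {
            state = state.accept(token, output);
        }
        return output;
    }
}
```

For example, `StateParser.parse("Open the door close a window")` yields `[open(door), close(window)]`.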


*************
There is nothing wrong with me that could
not be cured by a capful of Drano.

H. S. Lahman
(e-mail address removed)
Pathfinder Solutions
http://www.pathfindermda.com
blog: http://pathfinderpeople.blogs.com/hslahman
"Model-Based Translation: The Next Step in Agile Development". Email
(e-mail address removed) for your copy.
Pathfinder is hiring:
http://www.pathfindermda.com/about_us/careers_pos3.php.
(888)OOA-PATH
 

Daniel Pitts

> Responding to Pitts...
>
> I am not sure why Singleton would be needed here at all. Since English
> isn't conveniently LALR(1) you will, indeed, need to capture context as
> a statement is processed. But I would expect the life cycle of such an
> object to be quite limited by the processing context (e.g., it is born
> at the start of processing of a new statement and it dies at the end of
> processing that statement). IOW, the natural flow of control of the
> parsing enforces the scope of the object to one instance at a time.

Yes, that's where the ThreadLocal comes into play. The top-level
parser (I'm building this as a top-down recursive descent parser) will
set the Context for the current thread, and then clear the context
once it has completed.

> [Because Java is a GC language, you might have to be careful to remove
> all references to the object at the end of scope to avoid referential
> integrity problems. But the point where that is necessary would be
> defined by the processing.]

Yes, this would have to be handled by the top level. That said, I don't
think one extra Context object per thread would actually cause a
memory problem, since I'm parsing only one sentence at a time.

> This is a very different thing. As a general rule passing object
> references to methods is a poor OOA/D practice, especially when there
> may be chains of such calls. It raises data integrity issues concerning
> the timeliness of the data that make it more difficult to provide things
> like thread safety. It is also the worst form of coupling.

I hadn't thought of it that way. Basically, I have an
ImperativeSentence class that has a Verb object and a list of
AdpositionalPhrase objects. I didn't have any link from the
AdpositionalPhrase objects to the owning ImperativeSentence, but such
a link might make sense. (Actually, the AdpositionalPhrase has a
Positional object and a NounClause object, so they would need to have
a parent reference too.) Unless I'm misunderstanding you.
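A sketch of what those parent references could look like, assuming the class names above and a hypothetical `ParseContext` held only by the root `ImperativeSentence` (playing the role of A and R3 in the diagram). A `NounClause` reaches the context by navigating up its parent chain instead of having it passed in. `Verb` and `Positional` are reduced to plain strings for brevity, and the fields and constructors are guesses, not the real classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-sentence context.
final class ParseContext {
    final String source;

    ParseContext(String source) {
        this.source = source;
    }
}

// Root of the tree: the only node that holds the context directly.
final class ImperativeSentence {
    private final ParseContext context;
    private final String verb; // stand-in for the Verb class
    private final List<AdpositionalPhrase> phrases = new ArrayList<>();

    ImperativeSentence(ParseContext context, String verb) {
        this.context = context;
        this.verb = verb;
    }

    AdpositionalPhrase addPhrase(String positional, String noun) {
        AdpositionalPhrase phrase = new AdpositionalPhrase(this, positional, noun);
        phrases.add(phrase);
        return phrase;
    }

    ParseContext context() {
        return context;
    }
}

final class AdpositionalPhrase {
    private final ImperativeSentence parent; // back-reference up the tree
    private final String positional;         // stand-in for the Positional class
    private final NounClause nounClause;

    AdpositionalPhrase(ImperativeSentence parent, String positional, String noun) {
        this.parent = parent;
        this.positional = positional;
        this.nounClause = new NounClause(this, noun);
    }

    NounClause nounClause() {
        return nounClause;
    }

    ParseContext context() {
        return parent.context(); // navigate up one level
    }
}

final class NounClause {
    private final AdpositionalPhrase parent;
    private final String noun;

    NounClause(AdpositionalPhrase parent, String noun) {
        this.parent = parent;
        this.noun = noun;
    }

    String noun() {
        return noun;
    }

    // NounClause -> AdpositionalPhrase -> ImperativeSentence -> ParseContext
    ParseContext context() {
        return parent.context();
    }
}
```

Only the root is told about the context; every other node finds it on demand, which is the "navigate R2 -> R1 -> R3" idea applied to a parse tree.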

> But the big reason is that the caller needs to know too much about what
> the receiver does. [...]
>
> Thus A necessarily needs to know when to collaborate with B (more
> precisely, the developer knows when A has done something that requires
> an announcement message to be sent to B who responds to that condition).
> But A should not know who responds or what they do, particularly their
> carnal relations with third parties.

That's a good point, and I guess my original question was more about
the best design to avoid this coupling of A to the knowledge of the
B-C relationship.

> So we usually encapsulate instantiation in dedicated responsibilities
> that live in objects that understand other context information in the
> problem space. That allows someone else entirely to instantiate the R2
> relationship. Then if those rules and policies change, A does not need
> to be touched.

So, in my case, the Context class might have a static
getCurrentContext() method, as well as newContext() and
disposeContext() methods, which are called solely from my top-level
parsing method.
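A sketch of that API backed by `java.lang.ThreadLocal`, using the three method names from the paragraph above; the body of `Context` and the `TopLevelParser` class are hypothetical. Putting `disposeContext()` in a `finally` block clears the thread's slot even if parsing throws, and `ThreadLocal.remove()` drops the entry rather than leaving a stale object behind on pooled threads. Refusing a nested `newContext()` on the same thread is an assumed policy, not a requirement.

```java
import java.util.ArrayList;
import java.util.List;

final class Context {
    private static final ThreadLocal<Context> CURRENT = new ThreadLocal<>();

    private final List<String> notes = new ArrayList<>(); // hypothetical payload

    private Context() {}

    // Called only by the top-level parse method.
    static Context newContext() {
        if (CURRENT.get() != null) {
            throw new IllegalStateException("context already active on this thread");
        }
        Context ctx = new Context();
        CURRENT.set(ctx);
        return ctx;
    }

    // Callable from anywhere inside the parse, as long as it is on the same thread.
    static Context getCurrentContext() {
        Context ctx = CURRENT.get();
        if (ctx == null) {
            throw new IllegalStateException("no context: not inside a parse");
        }
        return ctx;
    }

    // Removes this thread's entry entirely rather than setting it to null.
    static void disposeContext() {
        CURRENT.remove();
    }

    void note(String s) {
        notes.add(s);
    }

    List<String> notes() {
        return notes;
    }
}

final class TopLevelParser {
    static List<String> parse(String sentence) {
        Context.newContext();
        try {
            for (String word : sentence.split("\\s+")) {
                parseWord(word); // deeper calls never receive the context as a parameter
            }
            return new ArrayList<>(Context.getCurrentContext().notes());
        } finally {
            Context.disposeContext(); // runs even if parsing throws
        }
    }

    private static void parseWord(String word) {
        Context.getCurrentContext().note(word);
    }
}
```

After `TopLevelParser.parse(...)` returns, `Context.getCurrentContext()` throws again, so a forgotten cleanup cannot silently leak one sentence's context into the next.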

> The corollary is that methods navigate relationship paths to get to the
> data that they need on an as-needed basis. As it happens, this tends to
> make it much, much easier to implement concurrency (e.g. threads) and
> true asynchronous processing correctly. That's because the scope of
> access is limited to the method.

I'm not so concerned with concurrency in this project, but it is good
to know that anyway.

> [...]
>
> On a more general note, I would bet that you will need different flavors
> of context objects when processing a grammar as bad as English. For that
> you might want to think about the GoF State pattern. The idea is that
> you encapsulate both the context data and the specific operations on
> that data (e.g., BNF productions) in a State and then dynamically assign
> the right one based on previously determined parsing context.

As I mentioned earlier, I'm designing this initially as a recursive
descent parser. I don't necessarily see that the State pattern
matches this design. That said, I'm well practiced at refactoring to
design patterns. Often I'll see a switch/case tree, refactor it to
enum flyweights (thanks, Java, for making enums full classes!), and use
that as a State or Strategy pattern.
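For reference, the switch-to-enum refactoring described above might end up like this; `Preposition`, its constants, and the grid semantics are hypothetical. Each constant is a single shared instance (a flyweight) with its own constant-specific method body, so callers invoke `apply(...)` instead of switching on the constant, and the enum serves as a set of interchangeable Strategy objects.

```java
import java.util.Locale;

// Hypothetical: each preposition is a flyweight that knows how to move a grid position.
enum Preposition {
    ABOVE {
        @Override
        int[] apply(int x, int y) { return new int[] {x, y + 1}; }
    },
    BELOW {
        @Override
        int[] apply(int x, int y) { return new int[] {x, y - 1}; }
    },
    LEFT_OF {
        @Override
        int[] apply(int x, int y) { return new int[] {x - 1, y}; }
    },
    RIGHT_OF {
        @Override
        int[] apply(int x, int y) { return new int[] {x + 1, y}; }
    };

    // Constant-specific bodies replace a switch over the constants.
    abstract int[] apply(int x, int y);

    // Maps a phrase such as "left of" to its constant; valueOf throws
    // IllegalArgumentException for unknown phrases.
    static Preposition fromPhrase(String phrase) {
        return valueOf(phrase.trim().toUpperCase(Locale.ROOT).replace(' ', '_'));
    }
}
```

Adding a new preposition then means adding one constant with its body, rather than finding and extending every `switch` that mentions the others.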


Thanks for your feedback, it does give me some things to think about.
I hope that limiting the scope to imperative sentences will make the
parser feasible to implement. It does give me all the capability I
need for my larger project :)

Thanks,
Daniel.
 

Russell Wallace

Daniel said:
> The other alternative is to pass the context to the constructor of all
> the classes involved, so that they have a reference to it at all
> times.

That's the way I normally do it in those circumstances, if I understand
the setup correctly.

> This is a slightly more useful approach, but many of my
> parsing methods are "static": they return an object of the type that
> matches the parsed text. Maybe that's the wrong approach (comments
> welcome on *that* problem as well).

Well in that case those methods are standing in for constructors, so you
could just pass the context to those?
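A sketch of that suggestion, with a hypothetical `ParseContext` that remembers which nouns have been mentioned: the static `NounClause.parse` method stands in for the constructor, so it takes the context and hands it to a private constructor, and the resulting node keeps it for later use without it being passed again.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical context: remembers every noun parsed so far in this sentence.
final class ParseContext {
    final List<String> mentioned = new ArrayList<>();
}

final class NounClause {
    private final ParseContext context; // kept for the node's lifetime
    private final String noun;

    // Private: callers go through the static factory below.
    private NounClause(ParseContext context, String noun) {
        this.context = context;
        this.noun = noun;
    }

    // The static factory plays the constructor's role, so it takes the context too.
    static NounClause parse(String text, ParseContext context) {
        String noun = text.trim();
        if (noun.startsWith("the ")) {
            noun = noun.substring("the ".length());
        }
        context.mentioned.add(noun);
        return new NounClause(context, noun);
    }

    // Later behaviour consults the stored context instead of receiving it again.
    boolean mentionedMoreThanOnce() {
        return context.mentioned.indexOf(noun) != context.mentioned.lastIndexOf(noun);
    }

    String noun() {
        return noun;
    }
}
```

The context still has to reach `parse(...)` once per node, but after construction the node's own methods no longer need it as a parameter.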

> The third alternative is to use a ThreadLocal variable. This thread
> local variable would be the context object. This is a little close to
> a global variable, but it is thread-safe, and it seems like it would
> be (if properly encapsulated) a cleaner approach.

Seems a reasonable choice also; I don't have a strong opinion between
the two options.
 

Roland Pibinger

> So, I'm working on a (hand-written) English imperative statement
> parser in Java, and I was thinking that eventually I might need to
> pass context to different nodes in the "parse tree". Having to pass
> context information is an even more general problem than this parser,
> so I was thinking about the best way(s) to approach this sub-problem.
>
> I personally dislike the needless use of Singleton, since it's often
> not truly required and is often confused with the "locator" pattern.
>
> The straightforward solution is to pass a context object to all
> methods on all objects that need it. [...]
> The other alternative is to pass the context to the constructor of all
> the classes involved, [...]
> The third alternative is to use a ThreadLocal variable.

try:
http://www.two-sdg.demon.co.uk/curbralan/papers/europlop/ContextEncapsulation.pdf
 
