Change character in string

  • Thread starter Dirk Bruere at NeoPax

Little Green Man

Actually, no, I did not. That is incorrect. All of those lines were a
maximum of 72 characters long and therefore should not have wrapped.
The responsible party will now apologize in public and publicly
absolve me of any responsibility for *their* poor formatting of *my*
post.

[implied insults deleted]

None of the nasty things that you have said or implied about me are at
all true.
 

Little Green Man

You did not what, violate thread safety?

No, I did not write the above -- not with the line breaks shown,
anyway.
What does it matter anyhow?  The line wrap in no way impedes the sense of your
post.

No, it only makes it look less polished, but that's still not very
good.

The longest lines that seem to be intact are 70 characters. The long-
established standard is to wrap at no more than 80 characters, with a
preference for 72 characters to allow for some levels of eventual
quoting. Not 70, 72.

If lines shorter than 73 characters are going to be adulterated there
must be a preview function. If there is no preview function lines
shorter than 73 characters must be left alone. It is that simple.
 

Little Green Man

See if you can find the thread where we dug up and interpreted the
implementation of identityHashCode - it's almost exactly as you've
described. Except that numbers are only assigned when an object's identity
hash is first asked for, not at allocation, thus avoiding the work of
assigning numbers to objects which will never need them.

This would be compatible with the above, as well, so long as
synchronizing on an object together with other objects counted as
asking for it, and a 64-bit globally-unique integer was used under the
hood (while the System.identityHashCode() method truncated it to 32
bits).
The trouble with that is when you're dealing with mutable objects: a
change in state may mean a change in hashCode order, which means that two
attempts to lock the same pair of objects at different times may go in
different orders, which could lead to deadlock.
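
For reference, the ordering idiom being debated looks roughly like
this in today's Java -- a minimal sketch only: it uses
System.identityHashCode(), whose value never changes for a given
object, plus a conventional global tie-breaker lock for the rare
collision. The names are illustrative, not from any real API.

public final class OrderedLocking {
    // Held only while breaking identity-hash ties.
    private static final Object TIE_BREAKER = new Object();

    /** Runs action while holding the monitors of both a and b,
        always acquiring them in identity-hash order. */
    public static void withBothLocks(Object a, Object b, Runnable action) {
        int ha = System.identityHashCode(a);
        int hb = System.identityHashCode(b);
        if (ha < hb) {
            synchronized (a) { synchronized (b) { action.run(); } }
        } else if (ha > hb) {
            synchronized (b) { synchronized (a) { action.run(); } }
        } else {
            // Collision: impose an arbitrary but global order.
            synchronized (TIE_BREAKER) {
                synchronized (a) { synchronized (b) { action.run(); } }
            }
        }
    }
}

Any two threads calling withBothLocks(x, y, ...) and
withBothLocks(y, x, ...) acquire the two monitors in the same order,
so they cannot deadlock against each other.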

These are objects being synchronized on. If one's being mutated while
a synchronized (x, y, z...) has a lock on it and is still trying to
get another lock, there's already a thread-safety bug somewhere else
and fixing that fixes the potential deadlock. If on the other hand the
multi-synch already has all of its locks, that multi-synch can't cause
a deadlock, and if it has none, the order rearrangement has no effect.
It can only cause a deadlock if one thread locks one of some pair of
objects, their order changes, and then the other thread locks the
other of the pair, then they each try to get the remaining locks.

There may still be a risk, if one object isn't yet locked by either
but its hashCode changes.

On the other hand, there are already problems with having mutable
objects whose hashCode changes (for example, if it's actually used as
a key in a hash table anywhere, it gets lost). IMO hashCode,
compareTo, and equals should only be overridden on immutable objects.
(The JRE classes violate that, notably the collection classes,
unfortunately; it would have been interesting if they used Object's
equals and hashCode except for a set of unmodifiable versions,
distinct from the current wrappers in that they were immutable copies
rather than read-only views of a possibly-still-changing collection.)
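
The "key gets lost" failure mode is easy to demonstrate with a small,
self-contained example (any mutable object with a content-based
hashCode works as the key; a List is used here):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LostKeyDemo {
    public static void main(String[] args) {
        List<String> key = new ArrayList<>(); // List.hashCode() depends on contents
        key.add("a");

        Map<List<String>, String> map = new HashMap<>();
        map.put(key, "value");

        key.add("b"); // mutating the key changes its hashCode

        // The entry is still physically in the map, but it is filed
        // under the old hash, so lookups can no longer find it:
        System.out.println(map.get(key));         // null
        System.out.println(map.containsKey(key)); // false
        System.out.println(map.size());           // 1 -- the entry is "lost"
    }
}
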
There's also a consideration with long-running systems eventually
going through the entire pool of numbers (though it might take an
astronomically long time in the 64-bit case). Two solutions to *that*:
* Use 64 bits no matter what, and pray to one of your cultures'
deities.

No - if we're going to come up with [suspected insult deleted] schemes

None of the nasty things that you have said or implied about me are at
all true.
Another option might be to have some kind of monotonically increasing
number - something like time, although it only has to increase when things
get moved - and use that in the high bits, so that objects allocated in
the same place at different times don't get the same number.

That'll work. At least until January 19 2038, 03:14:07 UTC. :)

(Actually, it suffices that the bits used don't recycle for a very
long time, rather than that they don't at all. If the program runs
across one wrap-around, but never until the number represented by the
used bits actually returns to a value that previously occurred DURING
THE RUN, the wrap-around is not a problem. The used bits are likely to
be fewer than 32, so wrapping around sooner than 2038.)
In fact, if you just use the offset of the object in the block in the low
bits, then this works as long as every block into which things are
allocated gets a unique prefix for the high bits. So, you have a global
counter, and every time you designate a block as a nursery, you use that
counter to issue it with a prefix.
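
A sketch of that numbering scheme (the names and bit-widths are
illustrative assumptions; no real VM exposes anything like this):

import java.util.concurrent.atomic.AtomicLong;

public final class BlockIds {
    private static final AtomicLong NEXT_PREFIX = new AtomicLong();

    /** Low bits reserved for an object's offset within its block. */
    private static final int OFFSET_BITS = 20; // e.g. 1 MiB blocks

    /** Called once whenever a block is designated as a nursery. */
    public static long issuePrefix() {
        return NEXT_PREFIX.getAndIncrement();
    }

    /** Identity number: unique block prefix in the high bits,
        offset within the block in the low bits. */
    public static long identityNumber(long blockPrefix, long offsetInBlock) {
        return (blockPrefix << OFFSET_BITS) | offsetInBlock;
    }
}

Since a prefix is never reissued, two objects allocated at the same
offset at different times still get different numbers, which is the
property wanted above.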

This is similar to the notion I had of assigning a range to each
nursery.
As long as pages are bigger than objects, this is going to lead to
fragmentation - you end up keeping a whole page in core just for one
object.

Objects might be allocated within a single page that are likely to
have similar lifespans. Simply allocating objects created sequentially
by the same thread adjacent to one another may have that effect. But
actually I was hypothesizing an OS that had some more sophisticated
notion of memory structure than pages. I doubt such exists yet on your
planet.
Normal speed efficiency can be restored by using integers of machine-
native size up until -1 (all Fs in hex; the "maximum" before it wraps
to numbers already used) and then layering on an alternative scheme
for all subsequent objects (which are marked by all having -1 in the
"object number" field). Ternary strings is one such alternative
scheme. So is a second integer field, used until it becomes -1, then a
third, and so forth, or a zero-terminated string of bytes or words
used as a bignum, or something. (A bignum that only has to support
order-comparison and addition, simplifying things somewhat.)

I am filing this under very clever but [implied insult deleted] idea.

None of the nasty things that you have said or implied about me are at
all true.
Exactly. I think the idea of allocating persistent, unique-for-all-time
identity numbers to objects isn't necessary - as long as you can get the
GC and locking subsystems to cooperate. Bear in mind that it doesn't
matter in the slightest if we lock a pair of objects in a different order
at different times, as long as we never do it at overlapping times. If we
do lock a-then-b, unlock both, then lock b-then-a, we're fine.

That was what I was thinking.
Risky - a contended lock could hold up movement, which sounds like bad
news.

A frequently-locked object that lives a long time is a good candidate
for tenuring. G1 can work around this by simply using blocks
containing such objects for tenuring *other* objects, and emptying
*other* blocks to recycle into nurseries. It won't play as nice with
the current GC of course.

I thought so too.
This reminds me of Richard Gabriel's PC losering story, and the
winning solution adopted by unix:

http://www.jwz.org/doc/worse-is-better.html

My solution would be to be a bit more MIT, and say that (a) multi-locks
are acquired in allocation order and (b) if objects which are currently
multi-locked (or perhaps just locked at all, for simplicity) are moved,
they must maintain their allocation order with respect to other locked
objects. If we arrange things so that memory is divided into zones with
known non-overlapping ranges of the allocation order (which corresponds
nicely to generations), then that just means taking care when moving
objects from one zone to another, or within a zone, to keep locked ones in
the same order in memory. This avoids the need for identity numbers at
all.

It might require some tricky juggling by the GC though.
Although i don't know what happens if a multi-lock is underway during a
move; i worry that two threads could end up with different ideas about the
order, but i *think* that as long as they pay attention to when moves
occur, and adjust their locking sequence accordingly, it's okay.

Making all moves, regardless of locking, order-preserving would avoid any
problems. There's even a fighting chance that this is already what
happens.

I doubt it. What happens if two objects are allocated in the order A,
then B, and then B (the slightly-newer) is tenured by the GC? Tenured
are presumably assumed to be older than eden objects, so B now looks
older than A, and their order has reversed. Oops.

This could happen; B might be reached first in a traversal of the
object graph and so tenured first, though A is bound to be tenured
soon as well in this case. A narrow window of opportunity for a
deadlock nonetheless arises.

Elsewhere in this thread, a different problem with these schemes was
pointed out: the risk that a lock was already acquired earlier, and
out of order. But it will always be the case that separate, nested
synchronized statements create a risk of deadlock. One thought is to
try to use multisynch around such sites, rather than inside them, to
remove deadlocks. On the other hand, that doesn't work for cases like
List.equals(). If the equals is in a bigger synchronized block that
has to do some reads and modifies of the list atomically, there's a
problem.

Perhaps the long-term solution is to make a collections framework that
supports a notion of "transactions". Some of the java.util.collections
classes seem to be steps in that direction.

Regardless, it goes to show that there is no magic bullet for
concurrency correctness. Slapping synchronizeds on everything
certainly doesn't work except in the most trivial cases, and slapping
multi-synchronizeds willy-nilly is not much better. Thinking about the
objects that are shared and how they are used and in what order is
important. I worked a year or two ago on a network app that had large
pools of threads that had to access a handful of shared collections --
some unique sets and maps and a priority queue, I believe, and some
non-unique (lots of same-role) objects -- and the solution there was
to lock these separately and in a fixed order, the non-unique objects
last, and only one non-unique object at a time. Fortunately, the non-
unique objects never needed to be locked other than individually and
separately; no operations were performed atomically on pairs or larger
sets of them. Eclipse's thread monitor proved invaluable for
identifying the causes of the deadlocks that did occur during
development, when developers got the locking order wrong. (So did
searching for the word "synchronized" in each commit, going backwards
in time, sometimes. Invariably the deadlocks occurred when the locks
were acquired in separate methods, but one got called from another, so
the fact that the critical sections were nested was not immediately
obvious. Eventually we started using method javadocs to record
what locks were typically held, and what locks must not be held, going
into each method. Sometimes significant redesign and/or refactoring
was needed to make a lock allowable going into a method that formerly
wasn't. The order of locking for some pairs of resources got reversed
repeatedly during redesigns until eventually everything worked.)

In general, I think a design like the above will usually be
applicable. If groups of non-unique-role objects sometimes need atomic
operations performed on them, but always PARTICULAR groups, those
groups can have locks of their own that are acquired before locking
more than one of their individual members (the "napkin" metaphor from
elsethread). Another option is to go the Swing route of having some
complex set of objects handled on a dedicated thread, with atomic jobs
(similar to Swing event handlers) handed to this thread to be run in
some sequence or another (similarly to SwingUtilities.invokeLater()).
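
A sketch of that dedicated-thread option, using a single-threaded
executor in place of Swing's event dispatch thread (the "job" type and
the confined state are hypothetical stand-ins):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConfinedState {
    // Touched only from the single worker thread, so it needs no locks.
    private final Deque<String> jobs = new ArrayDeque<>();

    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Callers on any thread submit atomic jobs instead of locking. */
    public void enqueue(String job) {
        worker.execute(() -> jobs.addLast(job)); // runs on the worker thread
    }

    /** Drains the queue as one atomic job, like an invokeLater() body. */
    public void process() {
        worker.execute(() -> {
            while (!jobs.isEmpty()) {
                System.out.println("processing " + jobs.removeFirst());
            }
        });
    }
}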

Multisynch would be a handy additional tool in the kit for cases where
some sets of objects needing to be acted on atomically were
essentially arbitrary, but it could never be a substitute for having a
sound design, though it could simplify such a design.
 

John B. Matthews

Tom Anderson said:

If by 'three other schemes' you mean the philosopher types listed in
the second link, i don't think those are solutions to the problem,
they're different problematic kinds of philosopher behaviour. I could
be missing something here.

You are correct. These are the original and two variations on a
diner's strategy.
 

Lew

Little said:
Actually, no, I did not. That is incorrect. All of those lines were a
maximum of 72 characters long and therefore should not have wrapped.
The responsible party will now apologize in public and publicly
absolve me of any responsibility for *their* poor formatting of *my*
post.
[implied insults deleted]

None of the nasty things that you have said or implied about me are at
all true.

Plonk.
 

Lew

Little said:
No, I did not write the above -- not with the line breaks shown,
anyway.


No, it only makes it look less polished, but that's still not very
good.

The longest lines that seem to be intact are 70 characters. The long-
established standard is to wrap at no more than 80 characters, with a
preference for 72 characters to allow for some levels of eventual
quoting. Not 70, 72.

If lines shorter than 73 characters are going to be adulterated there
must be a preview function. If there is no preview function lines
shorter than 73 characters must be left alone. It is that simple.

Plonk.
 

Little Green Man

Little said:
Actually, no, I did not. That is incorrect. All of those lines were a
maximum of 72 characters long and therefore should not have wrapped.
The responsible party will now apologize in public and publicly
absolve me of any responsibility for *their* poor formatting of *my*
post.
[implied insults deleted]
None of the nasty things that you have said or implied about me are at
all true.

[implied insult deleted]

None of the nasty things that you have said or implied about me are at
all true.
 

Little Green Man

No, I did not write the above -- not with the line breaks shown,
anyway.
No, it only makes it look less polished, but that's still not very
good.
The longest lines that seem to be intact are 70 characters. The long-
established standard is to wrap at no more than 80 characters, with a
preference for 72 characters to allow for some levels of eventual
quoting. Not 70, 72.
If lines shorter than 73 characters are going to be adulterated there
must be a preview function. If there is no preview function lines
shorter than 73 characters must be left alone. It is that simple.

[implied insult deleted; implies his previous post was factually untrue]

None of the nasty things that you have said or implied about me are at
all true.
 

Little Green Man

Little said:
None of the nasty things that you have said or implied about me are at
all true.

[implied insult deleted; implies his previous two posts were factually
untrue]

None of the nasty things that you have said or implied about me are at
all true.
 

Andreas Leitgeb

Tom Anderson said:
That's exactly what an identity hash code is.

I meant something without "hash" ;-). There must be such a thing, or a
reference couldn't point to a unique object. I also don't think(*) that
during a GC run, the GC steps through all references /again/ to *update*
them to the new location of the object. So, I do believe that whatever
bunch of bits comprises a reference, that would be it. And for this
multisync these bits wouldn't even need to be exposed to the programmer.

*: If I think wrong there, then this whole posting is moot. Let me know.
Eh, it's threading, unpredictable is par for the course.

Well, this thread is about seeing what could possibly be done to
alleviate the problem. Allowing certain pre-held locks for a multisync
would leak information about internal layout, which I don't see as a big
problem, but Java generally seems to do so.

[I largely snipped paragraphs that I think described problems w.r.t
hashcode or determination of other unique numbers, as I think that
problem is already solved internally to the VM, unless I'm mistaken
w.r.t what GC really does, but then this strategy is moot, anyway]
 

Andreas Leitgeb

Tom Anderson said:
By that point, Vector implemented List, so when List became generic, they
had a choice between adding a binding or a type parameter to Vector, or
having it generate warnings on compilation.

Got me: I didn't remember that Vector already implemented List before 1.5.
(I had only a little contact with Java programming before 1.5, and also
neglected to look into 1.4's javadocs, despite having them bookmarked.)

The final(*) score is that Vector lost, because it may even bomb on you if
neither of the two Vectors involved in .equals is modified concurrently.

I intend to no longer(*) participate in the "Vector (was Re: ...)" part of
this thread.

*: Unless things evolve unexpectedly.
 

Little Green Man

Little said:
Little Green Man wrote:
None of the nasty things that you have said or implied about me are at
all true.
[implied insult deleted; implies his previous two posts were factually
untrue]

Plonking does not take effect until the next download of posts, [vicious
insult deleted]

No, you're the stupid one.

None of the nasty things that you have said or implied about me are at
all true.

Objection 1: plonking really should re-filter the displayed posts
immediately; if it doesn't in some newsreaders, that's just silly (and
won't be universal behavior).

Objection 2: There is still no reason to *announce* it multiple times
in a row for the same target. Especially when even one such
announcement is in effect an implied insult...
 

Andreas Leitgeb

Thomas Pornin said:
Actually, the GC is able to move objects in RAM, which indeed entails
updating the relevant references. [...]

I believed there'd be some extra indirection layer between the references
and the actual memory location.

Even though you don't make any claims about Sun's implementation, it
was an enlightening read, in that it showed that what I considered
to be unlikely is not all that unlikely. On the contrary, it now seems
more likely that *no* such lifetime object ID (not hash!) exists or is
maintained.


One further approach remains: active deadlock detection by the VM.
If the current Thread (t1) is holding the lock of (at least) one object
(o1), and attempts to acquire another lock (of o2), and this object (o2)'s
lock is currently held by another thread (t2), then (after t1 is registered
as waiting for o2) a cycle-search is performed, by going from Object (o_n)
to its holding Thread (t_n), to that thread's newly-waited-for Object
(o_(n+1)), to that object's holding Thread (t_(n+1)) ... until either of
these conditions is met:

1) a Thread is found that is *not* currently waiting for any object's lock, or
2) the first Thread (t1) is re-encountered.

In case 1), everything's normal, and t1 remains in wait state for o2.
In case 2), t1's waiting for o2 inevitably leads to a deadlock, so
an error would be thrown from the synchronized() statement (or
monitorenter at bytecode level).
There should not exist a case 3) where the chain leads into a cycle
not involving t1, unless something went awfully wrong earlier.
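
In sketch form, the cycle-search might look like this (the two maps
stand in for the VM's internal monitor bookkeeping, which is not
exposed as any real API; this illustrates the walk, nothing more):

import java.util.IdentityHashMap;
import java.util.Map;

final class DeadlockCheck {
    // Hypothetical views of VM state: who holds each monitor, and
    // which monitor each thread is currently blocked on (if any).
    final Map<Object, Thread> holderOf = new IdentityHashMap<>();
    final Map<Thread, Object> waitingFor = new IdentityHashMap<>();

    /** True if t1 blocking on o2 closes a cycle -- case 2) above. */
    boolean wouldDeadlock(Thread t1, Object o2) {
        Object o = o2;
        while (o != null) {
            Thread holder = holderOf.get(o);
            if (holder == null) return false; // monitor is free: no cycle
            if (holder == t1) return true;    // case 2): chain returned to t1
            o = waitingFor.get(holder);       // null ends the walk: case 1)
        }
        return false; // reached a thread that isn't waiting on anything
    }
    // Case 3) -- a cycle not involving t1 -- is assumed impossible, as
    // argued above, so the walk is assumed to terminate.
}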

It would still be the programmer's responsibility not to catch that
error too early (i.e. inside the block still holding the first lock)
and immediately reattempt getting the other lock.

The nice thing is that all the necessary information to follow the
chain is uniquely and readily available, and it can be expected to be
a very short chain (not a cycle) in almost all practical cases.

The not-so-nice thing being the thrown error, which would be
changed behaviour for existing code and there may exist code that
sends two threads into deadlock on purpose for whatever reason.

If two threads together close some synchronisation cycle at exactly
the same time then both might throw that Error.

PS: Not sure if the current thread knows of all its currently held
locks, or at least whether it holds one at all. Even if that isn't
known, this isn't all that crucial to the design - it just thwarts
the attempt to short-circuit extra checks in the most common and
simple cases.

PPS: that "multisync" I previously suggested was broken, anyway,
as it may still run into deadlocks for objects not in the list,
but currently held.
 

Larry K. Wollensham

Andreas said:
One further approach remains: active deadlock detection by the VM.
If the current Thread (t1) is holding the lock of (at least) one object
(o1), and attempts to acquire another lock (of o2), and this object (o2)'s
lock is currently held by another thread (t2), then (after t1 is registered
as waiting for o2) a cycle-search is performed, by going from Object (o_n)
to its holding Thread (t_n), to that thread's newly-waited-for Object
(o_(n+1)), to that object's holding Thread (t_(n+1)) ... until either of
these conditions is met:

1) a Thread is found that is *not* currently waiting for any object's lock, or
2) the first Thread (t1) is re-encountered.

In case 1), everything's normal, and t1 remains in wait state for o2.
In case 2), t1's waiting for o2 inevitably leads to a deadlock, so
an error would be thrown from the synchronized() statement (or
monitorenter at bytecode level).
There should not exist a case 3) where the chain leads into a cycle
not involving t1, unless something went awfully wrong earlier.

It would still be the programmer's responsibility not to catch that
error too early (i.e. inside the block still holding the first lock)
and immediately reattempt getting the other lock.

Actually, that is exactly what I would do. I'd note which synchronized
generated the error, then wrap it in a try-catch and a loop and rerun it.

Then I'd note which *other* synchronized now generated the error and had
it go uncaught, and know which two places were causing the deadlock (I
find, in practice it's rarely more than two). Then fix it and remove the
try-catch and loop.
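
In sketch form (DeadlockError is the hypothetical error proposed in
this thread; no such class exists in real Java, so it is declared here
just to make the example self-contained):

class DeadlockError extends Error {}

public class RetryOnDeadlock {
    static final Object sharedList = new Object();

    static void doAtomicWork(Object o) { /* the critical section */ }

    public static void main(String[] args) {
        while (true) {
            try {
                synchronized (sharedList) {
                    doAtomicWork(sharedList);
                }
                break; // success: leave the debugging retry loop
            } catch (DeadlockError e) {
                // Note which synchronized closed the cycle, then retry;
                // the *other* culprit will now error out and go uncaught.
                e.printStackTrace();
            }
        }
    }
}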

The error would be useful for debugging more than anything else. Though
a system might conceivably use them for recovery at run-time, given some
notion of transactions that could be rolled back and re-attempted on
error. Even then, many cases of doing that might look to me like
sloppiness. :)
The not-so-nice thing being the thrown error, which would be
changed behaviour for existing code

I'm not sure changed behavior of bugs bothers people much, particularly
when it makes these bugs easier to diagnose. In the specific case that
the event dispatch thread deadlocks, it might even cause an application
to still-sort-of-work rather than hang after the bug got provoked,
perhaps working well enough for the user to save their work before
exiting, so a data-losing bug might even become milder.
and there may exist code that sends two threads into deadlock on
purpose for whatever reason.

The only such code that I know of is the dining philosophers applet at
the tutorial site, whose purpose is to demonstrate deadlock. Its
function could be preserved by adding try-catch in a loop. Another place
to do that. :)
PS: Not sure if the current thread knows of all its currently held
locks, or at least whether it holds one at all.

It should, as well as what lock it's waiting on if any. There are
debuggers available that display all of that information.
PPS: that "multisync" I previously suggested was broken, anyway,
as it may still run into deadlocks for objects not in the list,
but currently held.

For values of "broken" that include "not being a magic bullet"; I
thought it was a pretty nifty idea, and though it wouldn't be a cure-all
it could very well prove *useful*.
 

Andreas Leitgeb

Larry K. Wollensham said:
Actually, that is exactly what I would do. I'd note which synchronized
generated the error, then wrap it in a try-catch and a loop and rerun it.

If you catch it inside the block that already (and still) holds the first
lock, then on next iteration of that loop, you'll definitely see exactly
the same error at the same spot again. Why? All the other threads are
directly or indirectly still waiting on the lock held by our current thread,
so it will be a completely deterministic case of the error being rethrown
for the same thread at the same place.

If you catch'n'retry far enough outside, i.e. outside the synchronized scope
of the first relevant lock, then there's a chance for the next attempt to be
successful, or at least for a different thread to hit the error next time,
disclosing valuable data for diagnosis.
The error would be useful for debugging more than anything else. Though
a system might conceivably use them for recovery at run-time, given some
notion of transactions that could be rolled back and re-attempted on
error. Even then, many cases of doing that might look to me like
sloppiness. :)

Yes, that's it. It's not supposed to be a feature to rely on, but an
extra safety net to detect logic errors. The same concept as those
"fail-fast" iterators.
 

Larry K. Wollensham

Andreas said:
If you catch it inside the block that already (and still) holds the first
lock, then on next iteration of that loop, you'll definitely see exactly
the same error at the same spot again. Why? All the other threads are
directly or indirectly still waiting on the lock held by our current thread,
so it will be a completely deterministic case of the error being rethrown
for the same thread at the same place.

Why do you think that? The VM doing deadlock detection will find a
different deadlocked thread sometimes if it starts from a different
thread and looks for a cycle. The implementation I was thinking of had a
VM thread that sleeps for a while, then looks at a thread, then sleeps
for a while, then looks at another thread, etc.; which thread in a
deadlock got the error would tend to be pseudo-random.
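
For what it's worth, the JDK already exposes this style of detection
to monitoring code via java.lang.management, though it can only report
monitor deadlocks, not break them by throwing into the stuck threads.
A minimal watchdog along the lines described:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockWatchdog implements Runnable {
    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();

    @Override
    public void run() {
        while (true) {
            long[] ids = threads.findMonitorDeadlockedThreads();
            if (ids != null) {
                for (ThreadInfo info : threads.getThreadInfo(ids)) {
                    System.err.println("Deadlocked: " + info.getThreadName());
                }
            }
            try {
                Thread.sleep(5_000); // sleep, scan, sleep, as described
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}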

Were you thinking of a different implementation?
 

Andreas Leitgeb

Larry K. Wollensham said:
Why do you think that? The VM doing deadlock detection will find a
different deadlocked thread sometimes if it starts from a different
thread and looks for a cycle. The implementation I was thinking of had a
VM thread that sleeps for a while, then looks at a thread, then sleeps
for a while, then looks at another thread, etc.; which thread in a
deadlock got the error would tend to be pseudo-random.

That explains it. With that implementation it would work as you meant.
In that case, however, I'd suggest always letting all the involved
threads throw that error, which voids any use of catch'n'rerun loops.
To really resolve the deadlock, at least one thread would have to
give up its first fork, so letting the error propagate far enough
in at least one of the threads is still a necessity, if recovery
from the deadlock is wanted.
Were you thinking of a different implementation?

Yes: the one I had been describing. That one implied that always the
thread that *closes* the loop would be the one error-struck, and on
re-closing the lock-loop it would also be the one throwing the error
again and again.

I'm not saying that any one is better than the other.

In both cases, there's still the problem of how to atomically walk
the chain of locked threads. If it's indeed a loop, there's no
problem, since all the threads are standing still anyway; but in the
normal case (of no deadlock), the gap between obtaining the holding
thread of some object and seeing whether, and on what, that thread is
itself waiting may need some super-sync to safely prevent false alarms.
 

Larry K. Wollensham

Andreas said:
Yes: the one I had been describing. That one implied that always the
thread that *closes* the loop would be the one error-struck, and on
re-closing the lock-loop it would also be the one throwing the error
again and again.

So you meant to have it deterministically check for a cycle *every* time
a lock acquisition was attempted? That would produce deterministic
repeat of the error in the case noted, but it also would make acquiring
a lock noticeably more expensive than it already is. That's why I was
thinking of a lighter-weight implementation that was also tunable as to
exactly how much time it took from any other operations (at the risk of
being slower to discover any given deadlock and break it by throwing an
error in one of the affected threads).

Perhaps that in turn reflects a server-side bias; you wouldn't want a
deadlock lasting several seconds or even minutes involving the EDT in a
rich client, but on the server side, you generally want to maximize
throughput. Also, having threads occasionally error out and get
restarted instead of deadlocking turns a "the server hangs, or slows to
a crawl and eventually hangs, or leaks threads and eventually crashes"
level bug (exact behavior dependent on how thread pools are managed)
into a "the server runs a bit inefficiently and maybe drops some
requests on the floor" level bug, which is a big improvement. It's like
replacing Ebola with the common cold. Remains a nuisance but you'll live.

If you tend to work more on the client side, or work on formal
systems/scientific apps, I can see why you'd take a different view of
the subject matter, and why you might favor a fast-fail and
deterministic implementation.
 
