Mars Rover Controlled By Java

Dmitry A. Kazakov · Jan 28, 2004

David said:

[snip]


"[Mission manager Jennifer] Trosper said the problem appeared to be that
the rover's flash memory couldn't handle the number of files it was
storing. ... She pointed out that the scientists had thoroughly tested
the rover's systems on Earth, but that the longest trial for the file
system was nine days, half of the 18 days Spirit operated before running
into the problem."

http://www.cnn.com/2004/TECH/space/01/26/mars.rovers/

"Thoroughly tested"? If you're going to send any object, and especially
an object with a computer and software, to a distant planet where it is
supposed to survive for about 90 days, wouldn't it seem prudent to run
at least a 90 day test of the object on earth before liftoff?


HERETIC!

Any child of BillGates' Windoze knows that

1) Memory is unlimited, and

"64K should be enough for everybody."
Bill Gates (1981)

I do not think that the above contradicts in any way!

:-))

Stefan Monnier · Jan 28, 2004

Chaffee -- by dedicating the hills surrounding the Mars Exploration
Rover Spirit's landing site to the astronauts. The crew of Apollo 1
perished in flash fire during a launch pad test of their Apollo
            ^^^^^

-- Stefan

Dimitri Maziuk · Jan 28, 2004

Dmitry A Kazakov sez:

David said:

[snip]


"[Mission manager Jennifer] Trosper said the problem appeared to be that
the rover's flash memory couldn't handle the number of files it was
storing. ... She pointed out that the scientists had thoroughly tested
the rover's systems on Earth, but that the longest trial for the file
system was nine days, half of the 18 days Spirit operated before running
into the problem."

http://www.cnn.com/2004/TECH/space/01/26/mars.rovers/

"Thoroughly tested"? If you're going to send any object, and especially
an object with a computer and software, to a distant planet where it is
supposed to survive for about 90 days, wouldn't it seem prudent to run
at least a 90 day test of the object on earth before liftoff?


HERETIC!

Any child of BillGates' Windoze knows that

1) Memory is unlimited, and


"64K should be enough for everybody."
Bill Gates (1981)

Now google for exact source of this quote.

Dima

Yoyoma_2 · Jan 28, 2004

Dimitri said:
Dmitry A Kazakov sez:

David C DiNucci wrote:

[snip]

"[Mission manager Jennifer] Trosper said the problem appeared to be that
the rover's flash memory couldn't handle the number of files it was
storing. ... She pointed out that the scientists had thoroughly tested
the rover's systems on Earth, but that the longest trial for the file
system was nine days, half of the 18 days Spirit operated before running
into the problem."

http://www.cnn.com/2004/TECH/space/01/26/mars.rovers/

"Thoroughly tested"? If you're going to send any object, and especially
an object with a computer and software, to a distant planet where it is
supposed to survive for about 90 days, wouldn't it seem prudent to run
at least a 90 day test of the object on earth before liftoff?

HERETIC!

Any child of BillGates' Windoze knows that

1) Memory is unlimited, and


"64K should be enough for everybody."
Bill Gates (1981)


Now google for exact source of this quote.

Dima

You're right; we have an expression at school that "memory is cheap".
Even our larger apps don't take much more than 100 MB of memory (that's
for huge simulations).

But it's a special case. When you have to lift the memory into space,
where 1 lb of equipment costs 5 lb of propellant, it's another issue.
They have to skimp on weight, and that includes having small amounts of
memory. As a tradeoff I bet they have a bit of a faster processor and
transmitter.

The mars rover is an example of Software Engineering vs Computer Science
vs Computer Engineering vs Electrical Engineering.

Bruce Bowen · Jan 29, 2004

Uncle Al said:
There are no hard drives on Mars! With ambient air pressure being
only 7 torr, there isn't nearly enough air pressure to levitate the
read/write heads when the hard drive spins up. If you seal the hard
drive, it overheats.

This is a trivial engineering problem to address. Encase the HD in a
larger sealed and pressurized container, with enough surface area and/or
internal air circulation to keep it cool. A low power 2.5" HD
shouldn't take that much larger of a container. What about the flash
sized microdrives?

-Bruce

Yoyoma_2 · Jan 29, 2004

Bruce said:
This is a trivial engineering problem to address. Encase the HD in a
larger sealed and pressurized container, with enough surface area and/or
internal air circulation to keep it cool. A low power 2.5" HD
shouldn't take that much larger of a container. What about the flash
sized microdrives?

From what I was taught, all hard drives must be kept sealed. This is
because dust and air pollutants could accumulate on the read head, and
if disaster strikes, a particle of dust could get between the head and
the disk (which are almost touching, but not quite).

Tony Hill · Jan 30, 2004

This is a trivial engineering problem to address. Encase the HD in a
larger sealed and pressurized container, with enough surface area and/or
internal air circulation to keep it cool. A low power 2.5" HD
shouldn't take that much larger of a container. What about the flash
sized microdrives?

Yeah, and the 4+ Gs that the drive would experience during take-off
would do wonders for that drive! Not to mention the high levels of
radiation in space would probably fry any drive (to the best of my
knowledge, no one makes rad-hardened hard drives).

Just stick everything on a disk-on-chip, much easier and cheaper than
trying to jerry-rig some sort of hard drive contraption.

Jon Leech · Jan 30, 2004

Yeah, and the 4+ Gs that the drive would experience during take-off
would do wonders for that drive!

Don't worry about the launch so much as the transients on landing
(which is to say, hitting and bouncing a couple of dozen times

Jon
__@/

The Ghost In The Machine · Jan 30, 2004

In sci.physics, Yoyoma_2 wrote:

From what I was taught, all hard drives must be kept sealed. This is
because dust and air pollutants could accumulate on the read head, and
if disaster strikes, a particle of dust could get between the head and
the disk (which are almost touching, but not quite).

At the scale of a disk drive's heads a human hair would
be the equivalent of hitting a mountain -- but with a
slightly different result; instead of simply destroying
the item hitting the platter, it would leave a very nasty
gouge in the platter -- a head crash. Therefore all drives
of this type have to be sealed.

I was under the impression that drives had to survive
10-G impulse tests (e.g., dropping a laptop on the floor
from the height of a desk). So 4G wouldn't be much of a
problem to handle.

I don't know about radiation.

Uncle Al has a point; here on Earth there's a good
(relatively speaking) connection between ambient air and
the drive's internals. At 1 kPa the thermal flow is much
more tenuous; of course, one could deploy the drive with
a large cooling panel which doubles as its power supply
(the sun powers the drive; the water circulates among
the panels, heating them and cooling the drive). A large
forced-air fan might complete the ensemble, making for a
Frankenstein's monster that looks like a cross between a
jet engine and a tail-dragging Godzilla... though I
suspect the primary cooling method is radiative.

Modern computers are starting to use water-cooling, which
could get interesting if people get the bright idea of
hooking the heat exchanger into the cold-water supply as
opposed to letting it exhaust noisily into the ambient air.
Of course all that does is transfer the heat elsewhere
(and possibly waste water), probably into the sea if the
water is allowed to drain, or to one's neighbors if the
water goes back into the line (not recommended for various
reasons; in fact, backflow check valves are required for
certain equipment).

I would think, though, that, absent radiation concerns,
flash memory is a nice solution -- and the radiation
presumably can be mitigated by proper shielding.

Yoyoma_2 · Jan 30, 2004

Tony said:
Yeah, and the 4+ Gs that the drive would experience during take-off
would do wonders for that drive! Not to mention the high levels of
radiation in space would probably fry any drive (to the best of my
knowledge, no one makes rad-hardened hard drives).

I actually know of some rad-proofed drives. Actually I hear there are
a lot of them; they are mostly used in the military (mil-spec drives).
For example, hard drives on an aircraft carrier have to be able to take
a direct nuclear assault and still function. Little piece of cold war
trivia for you.

Rad proofing isn't a very big deal.

According to this page I just pulled up at random
(http://www.westerndigital.com/en/products/products.asp?DriveID=41),
normal desktop "run of the mill" drives can take impulses of up to 250G
when non-operating, and 60G while operating at a delta-t of 2 seconds.
That's pretty good.

So if that's for ordinary hard drives, imagine a mil-spec drive. I
doubt carriers kept all of their data on flash memory in the 1970s.
Even if they use mag tape reel, it implies that they have developed some
sort of rad-proofing for it.

BarryNL · Jan 30, 2004

Yoyoma_2 said:
I actually know of some rad-proofed drives. Actually I hear there are
a lot of them; they are mostly used in the military (mil-spec drives).
For example, hard drives on an aircraft carrier have to be able to take
a direct nuclear assault and still function. Little piece of cold war
trivia for you.

Rad proofing isn't a very big deal.

According to this page I just pulled up at random
(http://www.westerndigital.com/en/products/products.asp?DriveID=41),
normal desktop "run of the mill" drives can take impulses of up to 250G
when non-operating, and 60G while operating at a delta-t of 2 seconds.
That's pretty good.

So if that's for ordinary hard drives, imagine a mil-spec drive. I
doubt carriers kept all of their data on flash memory in the 1970s. Even
if they use mag tape reel, it implies that they have developed some sort
of rad-proofing for it.

And the 4Gs thing is a non-issue. Most normal ATA drives can take around
300Gs when not operating or 30Gs when running without damage.

Stanley Krute · Jan 30, 2004

Howdy Edward

I am almost flabbergasted into textlessness. The fact that a system
... any system, not just a computer ... may work correctly in some one
or two delta range yet fail in some 10 or 20 delta range ... some
newbie tyro university graduate wet behind the ears neophyte kid might
make this mistake in a small project, and the old seasoned pro salt
seen it all manager would take this as a teaching opportunity. But in
an entire organization, a huge project putting a robot on a distant
planet, and not once did this occur to anybody!?

Yep, you nailed it.

My 5-word software testing book: Run 'er Hard & Long

-- stan

Double-A · Jan 31, 2004

Re: Mars Rover Not Responding!

Maybe it's your cologne, you Martian perv!

Nick Maclaren · Jan 31, 2004

Howdy Edward

Yep, you nailed it.

My 5-word software testing book: Run 'er Hard & Long

Sigh. That is very likely the CAUSE of the problem :-(

Any particular test schedule (artificial or natural) will create a
distribution of circumstances over the space of all that are handled
differently by the program. And, remember, we are talking about a
space of cardinality 10^(10^4) to 10^(10^8). Any particular, broken
logic (usually a combination of sections of code and data) may be
invoked only once every millennium, or perhaps never.

Now, change the test schedule in an apparently trivial way, or use
the program for real, and that broken logic may be invoked once a
day. Ouch. Incidentally, another way of looking at this is the
probability of distinguishing two finite-state automata by feeding
in test strings and comparing the results. It was studied some
decades back, and the conclusions are not pretty.
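
To make the point about test distributions concrete, here is a minimal Python sketch. The routine `triggers_bug`, the 900-file threshold, and the `burst` flag are all invented for illustration and are not taken from the rover software. The same broken logic is never reached under one input distribution and is reached regularly under another.

```python
import random

def triggers_bug(file_count, burst):
    """Hypothetical defect: the broken path runs only when both factors coincide."""
    return file_count > 900 and burst

def failure_rate(max_file_count, trials=20_000, seed=0):
    """Fraction of random trials that reach the broken path."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    failures = 0
    for _ in range(trials):
        file_count = rng.randint(0, max_file_count)  # inclusive on both ends
        burst = rng.random() < 0.5
        if triggers_bug(file_count, burst):
            failures += 1
    return failures / trials

# Test schedule: file counts never exceed 500, so the defect cannot fire.
lab_rate = failure_rate(max_file_count=500)
# Real use: file counts range up to 1000, so the defect is reached regularly.
field_rate = failure_rate(max_file_count=1000)

print(f"lab: {lab_rate:.4f}  field: {field_rate:.4f}")
```

Nothing in the code changes between the two runs; only the distribution of inputs does.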

The modern, unsystematic approach to testing is hopeless as an
engineering technique, though fine as a political or marketing one.
For high-reliability codes, we need to go back to the approaches
used in the days when most computer people were also mathematicians,
engineers or both.

Regards,
Nick Maclaren.

Stanley Krute · Jan 31, 2004

Howdy Nick

The modern, unsystematic approach to testing is hopeless as an
engineering technique, though fine as a political or marketing one.
For high-reliability codes, we need to go back to the approaches
used in the days when most computer people were also mathematicians,
engineers or both.

Deep agreement as to importance of math-smart testing.

-- stan

Edward Green · Jan 31, 2004

Sigh. That is very likely the CAUSE of the problem :-(

Any particular test schedule (artificial or natural) will create a
distribution of circumstances over the space of all that are handled
differently by the program. And, remember, we are talking about a
space of cardinality 10^(10^4) to 10^(10^8). Any particular, broken
logic (usually a combination of sections of code and data) may be
invoked only once every millennium, or perhaps never.

Now, change the test schedule in an apparently trivial way, or use
the program for real, and that broken logic may be invoked once a
day. Ouch. Incidentally, another way of looking at this is the
probability of distinguishing two finite-state automata by feeding
in test strings and comparing the results. It was studied some
decades back, and the conclusions are not pretty.

The modern, unsystematic approach to testing is hopeless as an
engineering technique, though fine as a political or marketing one.
For high-reliability codes, we need to go back to the approaches
used in the days when most computer people were also mathematicians,
engineers or both.

Going back to the case at hand: it still sounds to me like the stated
cause of the bug -- more files were written to flash memory than were
anticipated in design -- is at least retrospectively obviously a
possibility which should have been addressed at the design stage, and
maybe should have been prospectively obvious also.

I'm not quite sure how to jibe this with your theoretical insight that
we are searching a space of a cardinality of 10 followed by many, many
zeroes, and similar observations which make the problem sound
hopeless: maybe I'm naively wrong, or not (well that covers all the
possibilities ;-).

Is asking that when the program exhausts some resource it handles the
problem in some minimally damaging way really an impossibly hard
problem in the space of all the impossibly hard problems with which
computer science abounds, or is it merely a challenging but tractable
engineering problem?

For example, suppose we had a machine running around some play pen,
and the space of possible joint states of the machine and the play pen
were of cardinality 10 followed by some humungous number of zeroes.
And suppose that when the machine leaves the play pen, that is a
"crash". Now, we might ask why the machine crashed, and the designer
might respond with language about the cardinality of the joint state
space, and the impossibility of complete testing. But now we might
ask why he did not put a _fence_ around the play pen, and this answer
is no longer sufficient, and the answer "well, we let it run around
for a while, and it didn't seem likely to cross the boundaries, so we
didn't bother with a fence", is marginal.

Is the problem of building a number of internal fences in complex
systems sufficient to provide timely alerts to unanticipated operating
conditions itself an intractably hard problem, or merely hard?
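
One way to picture such an internal fence is the Python sketch below. Everything in it is hypothetical (the `FencedFileTable` class, the `FileTableFull` exception, the 80 percent warning level); it only illustrates the idea of refusing and reporting at a known boundary instead of running past it.

```python
class FileTableFull(Exception):
    """Raised when a new file would cross the fence."""

class FencedFileTable:
    """Toy file table with an explicit limit and an early warning level."""

    def __init__(self, max_files, warn_percent=80):
        self.max_files = max_files
        self.warn_at = max_files * warn_percent // 100
        self.files = {}
        self.alerts = []  # stand-in for telemetry sent to operators

    def create(self, name, data):
        is_new = name not in self.files
        if is_new and len(self.files) >= self.max_files:
            # The fence: refuse, record why, and leave existing data intact.
            self.alerts.append(f"file table full, refused {name}")
            raise FileTableFull(name)
        self.files[name] = data
        if is_new and len(self.files) == self.warn_at:
            # Timely alert well before the hard limit is reached.
            self.alerts.append(f"file table at {len(self.files)}/{self.max_files}")

table = FencedFileTable(max_files=10)
for i in range(10):
    table.create(f"f{i}", b"x")
try:
    table.create("one_too_many", b"x")
    raised = False
except FileTableFull:
    raised = True
print(table.alerts, raised)
```

The check costs one comparison per file creation, which is the sense in which a fence is cheap compared with exploring the whole state space.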

Nick Maclaren · Feb 1, 2004

Going back to the case at hand: it still sounds to me like the stated
cause of the bug -- more files were written to flash memory than were
anticipated in design -- is at least retrospectively obviously a
possibility which should have been addressed at the design stage, and
maybe should have been prospectively obvious also.

Perhaps. Without investigating the problem carefully, it is impossible
to tell.

I'm not quite sure how to jibe this with your theoretical insight that
we are searching a space of a cardinality of 10 followed by many, many
zeroes, and similar observations which make the problem sound
hopeless: maybe I'm naively wrong, or not (well that covers all the
possibilities ;-).

Is asking that when the program exhausts some resource it handles the
problem in some minimally damaging way really an impossibly hard
problem in the space of all the impossibly hard problems with which
computer science abounds, or is it merely a challenging but tractable
engineering problem?

There are several aspects here. Minimising damage IS an impossibly
hard problem (not just exponentially expensive, but insoluble). But
that is often used as an excuse to avoid even attempting to constrain
the consequent damages. Think of it this way.

Identifying and logging resource exhaustion takes a fixed time, so
there is no excuse not to do it. Yes, that can exhaust space in the
logging files, so there is a recursive issue, but there are known
partial solutions to that.
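
One commonly used partial solution to the recursive logging problem is a fixed-size ring buffer, sketched below in Python. This is an assumption about the kind of technique meant here, not something stated in the thread, and the `BoundedLog` class is hypothetical. Each `record` call takes constant time and the log can never grow past its capacity; the price is that the oldest entries are overwritten.

```python
from collections import deque

class BoundedLog:
    """Fixed-capacity log: constant-time writes, oldest entries overwritten."""

    def __init__(self, capacity):
        self.entries = deque(maxlen=capacity)
        self.dropped = 0  # how many old entries have been overwritten

    def record(self, message):
        if len(self.entries) == self.entries.maxlen:
            self.dropped += 1  # the append below will discard the oldest entry
        self.entries.append(message)

log = BoundedLog(capacity=3)
for n in range(5):
    log.record(f"resource exhausted, event {n}")

print(list(log.entries), log.dropped)
```

Keeping a count of dropped entries means that the loss of information is itself visible in the diagnostics.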

Identifying the cause is simple if there is one factor, harder if
there are two, and so on. To a great extent, that is also true of
predicting the resources needed, but that can be insoluble even with
one factor. This is confused by the fact that, the fewer the factors,
the more likely a bug is to be removed in initial testing.

Most of my bug-tracking time is spent on ones with 3-10 factors, on
a system with (say) 100-1,000 relevant factors. It isn't surprising
that the vendors' testing has failed to detect them. There are only
two useful approaches to such issues:

1) To design the system using a precise mathematical model, so
that you can eliminate, minimise or constrain interactions. This
also needs PRECISE interface specifications, of course, not the sloppy
rubbish that is almost universal.

2) To provide good detection and diagnostic facilities, to help
locating the causes and effects of such problems. This is even more
neglected nowadays, and is of limited help for systems like Mars
missions.
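
To give a feel for the 3-10 factors out of 100-1,000 figures above, the short Python sketch below counts how many distinct factor combinations exist, using `math.comb` from the standard library (Python 3.8 or later). Treating factors as simple on/off choices is a simplifying assumption, and `interaction_count` is just an illustrative name.

```python
from math import comb

def interaction_count(total_factors, k):
    """Number of distinct ways to pick k interacting factors out of total_factors."""
    return comb(total_factors, k)

for total in (100, 1000):
    for k in (3, 10):
        print(f"{k} of {total} factors: {interaction_count(total, k):,} combinations")
```

Even the smallest case, 3 factors out of 100, gives 161,700 combinations, which is why untargeted testing is unlikely to stumble on any particular one.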

Regards,
Nick Maclaren.

A. G. McDowell · Feb 1, 2004

Nick Maclaren said:
Sigh. That is very likely the CAUSE of the problem :-(

Any particular test schedule (artificial or natural) will create a
distribution of circumstances over the space of all that are handled
differently by the program. And, remember, we are talking about a
space of cardinality 10^(10^4) to 10^(10^8). Any particular, broken
logic (usually a combination of sections of code and data) may be
invoked only once every millennium, or perhaps never.

Now, change the test schedule in an apparently trivial way, or use
the program for real, and that broken logic may be invoked once a
day. Ouch. Incidentally, another way of looking at this is the
probability of distinguishing two finite-state automata by feeding
in test strings and comparing the results. It was studied some
decades back, and the conclusions are not pretty.

The modern, unsystematic approach to testing is hopeless as an
engineering technique, though fine as a political or marketing one.
For high-reliability codes, we need to go back to the approaches
used in the days when most computer people were also mathematicians,
engineers or both.

Regards,
Nick Maclaren.

I agree that more could be done before thorough testing, but I would not
attempt to replace even random and soak testing. Formal methods alone
will never be enough because they prove only the correctness of a
specification implemented by a model, not that the specification or the
model are accurate enough representations of the real world. The speed
with which NASA have claimed to replicate the problem suggests something
widely enough spread through the state space to be replicable by soak
testing. Furthermore, a planetary probe is a pretty good match to even
random testing, because (given the relative costs of putting something
on Mars and of conducting automatic testing in simulation or in a
warehouse) it may be possible to run for longer in test than in real
life, reducing the chance of bugs that show up only in real life.
Example: the priority inversion bug that hit a previous probe had
apparently shown up in testing but been ignored, because it wasn't what
they were looking for at the time. My impression of current good
practice is that black box testing, white box testing, and code review
are good at finding different sorts of bugs, and so should be used
together. I would lump pre-testing approaches into code review.
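
The argument for running longer in test than in real life can be sketched as a toy soak test in Python. The numbers (50 files per sol, a capacity of 875) are invented so that the failure lands on day 18, echoing the 9-day and 18-day figures quoted earlier in the thread; they are not real rover parameters, and `soak_test` is a hypothetical helper.

```python
def soak_test(sols, files_per_sol, capacity):
    """Return the first sol on which the file table overflows, or None if it never does."""
    stored = 0
    for sol in range(1, sols + 1):
        stored += files_per_sol  # files accumulate and are never deleted
        if stored > capacity:
            return sol
    return None

short_run = soak_test(sols=9, files_per_sol=50, capacity=875)
mission_length_run = soak_test(sols=90, files_per_sol=50, capacity=875)

print(short_run, mission_length_run)
```

A 9-sol run reports no failure, while a run covering the planned 90 sols finds the overflow on sol 18. This only works for defects that depend on elapsed time or accumulated state, which is the limitation raised in the replies.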

Nick Maclaren · Feb 1, 2004

I agree that more could be done before thorough testing, but I would not
attempt to replace even random and soak testing. Formal methods alone
will never be enough because they prove only the correctness of a
specification implemented by a model, not that the specification or the
model are accurate enough representations of the real world. The speed
with which NASA have claimed to replicate the problem suggests something
widely enough spread through the state space to be replicable by soak
testing. Furthermore, a planetary probe is a pretty good match to even
random testing, because (given the relative costs of putting something
on Mars and of conducting automatic testing in simulation or in a
warehouse) it may be possible to run for longer in test than in real
life, reducing the chance of bugs that show up only in real life.
Example: the priority inversion bug that hit a previous probe had
apparently shown up in testing but been ignored, because it wasn't what
they were looking for at the time. My impression of current good
practice is that black box testing, white box testing, and code review
are good at finding different sorts of bugs, and so should be used
together. I would lump pre-testing approaches into code review.

That is a classic example of what I say is mistaken methodology!
Yes, pretty well everything that you say is true, but you have missed
the fact that interactions WILL change the failure syndromes in ways
that mean untargeted testing will miss even the most glaringly
obvious errors. There are just TOO MANY possible combinations of
conditions to rely on random testing.

For a fairly simple or pervasive problem, unintelligent 'soak' testing
will work. For a more complex one, it won't. Unless you target the
testing fairly accurately, certain syndromes will either not occur or
be so rare as not to happen in a month of Sundays. I see this problem
on a daily basis :-(

An aspect of a mathematical design that I did not say explicitly (but
only hinted at) is that you can identify areas to check that the code
matches the model and other areas where the analysis descends from
mathematics to hand-waving. You can then design precise tests for the
former, and targeted soak tests for the latter. It isn't uncommon
for such an approach to increase the effectiveness of testing beyond
all recognition.

Regards,
Nick Maclaren.

A. G. McDowell · Feb 1, 2004

That is a classic example of what I say is mistaken methodology!
Yes, pretty well everything that you say is true, but you have missed
the fact that interactions WILL change the failure syndromes in ways
that mean untargeted testing will miss even the most glaringly
obvious errors. There are just TOO MANY possible combinations of
conditions to rely on random testing.
(trimmed)

An aspect of a mathematical design that I did not say explicitly (but
only hinted at) is that you can identify areas to check that the code
matches the model and other areas where the analysis descends from
mathematics to hand-waving. You can then design precise tests for the
former, and targeted soak tests for the latter. It isn't uncommon
for such an approach to increase the effectiveness of testing beyond
all recognition.

Regards,
Nick Maclaren.

I would be very interested to hear more about increasing the
effectiveness of testing beyond all recognition. I am a professional
programmer in an area where we routinely estimate the testing effort as
about equal to the programming effort (in terms of staff time, but not
necessarily staff cost). Do you have references? As a token of sincerity
I will provide references for what we seem to agree is commercial
practice (whether it should be or not):

The main reference establishing commercial practice that I can find
online seems to be "A Controlled Experiment in Program Testing and Code
Walkthroughs/Inspections" by Myers, CACM Volume 21, Number 9. The date
shows that some technology transfer might indeed be overdue - Volume 21
translates to 1978! However, I think automated testing, especially
regression testing, has become a lot easier, or at least more popular,
than it was (JUnit, Rational Robot, etc.). The notion of test coverage
tools seems to be of similar vintage and actually became less accessible
for a while as no version of tcov appeared for setups other than K&R C
on Unix. I spent part of last week trying out Hansel, an open source
test coverage tool for Java, available on www.sourceforge.net.

References from within books to hand:
Black box and white box complementary: "Testing Computer Software", by
Kaner, Falk, and Nguyen, Chapter 12 P271.
Code Review invaluable (but few details on how to do one): "Software
Assessments, Benchmarks, and Best Practices", by Capers Jones, e.g.
Chapter on Best Practices for Systems Software, P367
Mars Pathfinder bug ignored during pre-launch tests: "The Practice of
Programming", by Kernighan and Pike, Section 5.2 P121. (The next chapter
is a good short overview of commercial-practice test design circa 1999).
