Proper Validation

Immortal Nephi

I have seen many websites that always state the same thing: most
programmers want high performance. They are tired of writing OOP with
C++ classes. They complain about CPU overhead because indirection has
to be done to modify data members. They want to use direct memory
access instead, so they decide to stick with the old habit of writing
C source code with global variables and global functions.
Set and Get member functions are very important because you want to
validate data members to avoid incorrect values. Sometimes programmers
use the if keyword; sometimes they find a way to modify data members
without needing it.
If Set and Get member functions are not used, then you have to examine
each of hundreds of member functions to find out why a data member
contains an incorrect value. Reducing debugging is better software
engineering.
Let me give you my example class code with validation. You may notice
incomplete code, such as the NULL ram and p_opcode_array data members.
You do not need to comment on the incomplete code; you know what I
mean.
Tell me what you think: is this class design good software
engineering? Compare opcode_1() and opcode_1_bad(). The execution of
this program can be an overhead on the CPU. How can you find a way to
use direct memory access instead of indirection? The inline keyword
and the static keyword are the answer.
Tell me your opinion: why do programmers want to avoid C++ classes if
they want to write an emulator with high performance?

class CPU
{
public:
    CPU() : reg_a(0), reg_x(0), reg_y(0), reg_pc(0), reg_opcode(0), ram(0) {}
    ~CPU() {}

    void run()
    {
        (this->*p_opcode_array[reg_opcode])(); // run opcode_1(), ..., opcode_256()
    }

private:
    void opcode_1() // Good Software Engineering Practice
    {
        set_reg_a( ram[reg_pc] );
        increment_reg_pc();
    }

    void opcode_1_bad() // Modifying data member directly is bad practice
    {
        reg_a = ram[reg_pc];
        reg_a &= 0xFF;

        reg_pc++;
        reg_pc &= 0xFFFF;
    }

    void set_reg_a( unsigned reg )
    {
        reg_a = reg & 0xFF; // Validate to 8 bits only
    }

    unsigned int get_reg_a()
    {
        return reg_a;
    }

    void increment_reg_pc()
    {
        reg_pc++;
        reg_pc &= 0xFFFF;
    }

    unsigned int reg_a;
    unsigned int reg_x;
    unsigned int reg_y;
    unsigned int reg_pc;
    unsigned int reg_opcode;
    unsigned char* ram;
    void (CPU::*p_opcode_array[0x100])();
};
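For what it's worth, here is a trimmed-down sketch (the class name and values are mine, not from the post) of why the setter route need not cost anything: member functions defined inside the class body are implicitly inline, so a compiler can collapse set_reg_a() into the same single masked store that opcode_1_bad() performs by hand.

```cpp
#include <cassert>

// Hypothetical cut-down register pair. set_reg_a() and
// increment_reg_pc() are defined in the class body, so they are
// implicitly inline: the validation can compile down to the same
// masked store as writing "reg_a = v & 0xFF" directly, with no
// function-call overhead.
class TinyCpu {
public:
    TinyCpu() : reg_a(0), reg_pc(0) {}

    void set_reg_a(unsigned v) { reg_a = v & 0xFF; }       // implicitly inline
    unsigned get_reg_a() const { return reg_a; }

    void increment_reg_pc() { reg_pc = (reg_pc + 1) & 0xFFFF; }
    unsigned get_reg_pc() const { return reg_pc; }

private:
    unsigned reg_a;
    unsigned reg_pc;
};
```

With optimization on, the generated code for set_reg_a(0x1FF) and for a hand-written masked assignment is typically identical, so the validation is free.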
 
James Kanze

* Immortal Nephi:

[...]
You have hit on an important software engineering principle:
* In programming, as opposed to some other engineering disciplines,
redundancy (repetition of the same or similar code) is generally Bad.

Actually, I don't think that there's any real difference between
software engineering and other disciplines here. Redundancy is
good, duplication of effort is bad. They're both really the
same thing, however: the difference is why and how they are
used. Thus, having a low level function validate its arguments
is good, even if it is redundant with the input validation
(hopefully) done in higher level code. And putting two
identical beams in a building, where one would have done the
job, is bad.

(Redundant is actually a strange word in English. In general
use, it has decidedly negative overtones: "Exceeding what is
necessary or natural; superfluous. 2. Needlessly wordy or
repetitive in expression: a student paper filled with redundant
phrases", according to the online version of The American
Heritage Dictionary. And no one wants to be made redundant.
But in most technical fields, it implies some sort of back-up,
resulting in additional safety or reliability---a CRC in a
transmitted message, for example, or a redundant system backing
up the main one.)
 
Alf P. Steinbach

* James Kanze:
* Immortal Nephi:
[...]
You have hit on an important software engineering principle:
* In programming, as opposed to some other engineering disciplines,
redundancy (repetition of the same or similar code) is generally Bad.

Actually, I don't think that there's any real difference between
software engineering and other disciplines here. Redundancy is
good, duplication of effort is bad. They're both really the
same thing, however: the difference is why and how they are
used. Thus, having a low level function validate its arguments
is good, even if it is redundant with the input validation
(hopefully) done in higher level code. And putting two
identical beams in a building, where one would have done the
job, is bad.

Don't know if redundancy and effort duplication are the same thing, but perhaps.

Anyways, I used the weasel word "generally" mainly because, as you note,
redundancy can be good for detecting errors. That disregards the Non-Redundancy
Principle of the Design By Contract paradigm, which holds, counter to one's
intuition, that a routine body should under no circumstances check the routine's
precondition. I think that principle should be disregarded because it assumes
language support where such checks can be turned on/off and are not regarded as
part of the routine's "body".

However, for the purpose of normal-case code's computation of some result,
redundancy is generally bad in programming, because all such computations of the
same result should yield the *exact* same result (or state, whatever), and so
the redundancy can generally not achieve a more correct or robust computation.
It can improve efficiency in some cases by letting each place do the computation
in whatever way is most efficient in the local context, but it can also, in
general, introduce more failure modes and maintenance problems, and it generally
requires more effort at the outset as well; usually these factors/problems
outweigh any efficiency gain. All IMHO. But I think you'd agree, yes?


Cheers,

- Alf
 
gw7rib

Set and Get member functions are very important because you want to
validate data members to avoid incorrect values.

While they are a good start, you also have to make sure that the
functions (especially set) correspond to concepts that exist in the
outside world (i.e. not just inside your class) and that they are not
simply tied to the data members you happen to be using.

For example, consider the following code. What could possibly go
wrong?

Date date;

date.setyear(2009);
date.setmon(7);
date.setday(31);
date.print(); // prints "31 July, 2009"
date.setmon(4);
date.setday(26);
date.print(); // prints "26 April, 2009"

Looks simple enough, doesn't it? But suppose we make a small change -
deleting one line - and call the first version code A and this version
code B:

date.setyear(2009);
date.setmon(7);
date.setday(31);
date.print(); // prints "31 July, 2009"
date.setmon(4);
date.print(); // what does this print?

Now, one possibility is that code B will print "31 April, 2009". But
this is not a real date. The point of doing the validation is to avoid
this.

Another possibility is that the program objects at the date.setmon(4);
line. But that would stop code A from working.

Another possibility is that changing the month to 4 silently changes
the day to 30. But this silent change may come back to bite you. You
will get unexpected results with:

date.setyear(2009);
date.setmon(7);
date.setday(31);
date.setmon(4);
date.setmon(7);
date.print();

unless you do some very careful record keeping.

The real solution is not to allow the user to change one of the
variables alone. You might have a function to set them all at once, e.g.
date.setdate(31, 7, 2009); and you might also have functions to set it
to today's date, or to move forward a week, etc.
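A minimal sketch of the whole-date approach (the class layout and names are my own guess, not Paul's code): the object validates the complete day/month/year triple in one call, so it can never hold "31 April".

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical Date that only accepts a complete, validated date.
// There is deliberately no setday()/setmon() pair: a partial update
// could pass through an invalid intermediate state such as 31 April.
class Date {
public:
    Date(int day, int month, int year) { setdate(day, month, year); }

    void setdate(int day, int month, int year) {
        static const int days_in[] =
            { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };
        if (month < 1 || month > 12)
            throw std::invalid_argument("bad month");
        int max_day = days_in[month - 1];
        if (month == 2 && leap(year))
            max_day = 29;
        if (day < 1 || day > max_day)
            throw std::invalid_argument("bad day");
        d = day; m = month; y = year;
    }

    int day() const   { return d; }
    int month() const { return m; }
    int year() const  { return y; }

private:
    static bool leap(int yr) {
        return (yr % 4 == 0 && yr % 100 != 0) || yr % 400 == 0;
    }
    int d, m, y;
};
```

With this interface, the "code B" sequence above simply cannot be written: changing the month always means restating the whole date, so the validation sees the final state, not an intermediate one.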

Hope this is useful.
Paul.
 
Jerry Coffin

[ ... ]
Set and Get member functions are very important because you want to
validate data members to avoid incorrect values. Sometimes programmers
use the if keyword; sometimes they find a way to modify data members
without needing it.

I've said for years, and remain convinced, that Get and Set member
functions are usually a mistake. They usually result from starting with
a variable of the wrong type (i.e. a type that doesn't have the desired
semantics) and then attempting to enforce those semantics externally.

I think it's generally better to define a type that has the semantics you
really want, and then use it directly.
class CPU
{

[ ... ]

private:
    void opcode_1() // Good Software Engineering Practice
    {
        set_reg_a( ram[reg_pc] );
        increment_reg_pc();
    }

I don't think this is really good practice of software engineering or
much of anything else. You're doing exactly what I've described above:
starting with an unsigned variable, and then attempting to enforce
different semantics (in this case, an unsigned of a specified number of
bits) externally. C and C++ provide that particular possibility
directly:

unsigned reg_a:8;
unsigned reg_x:8;
unsigned reg_y:8;
unsigned reg_pc:16;
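As a quick illustration of the bit-field route (the struct and test values are mine, not from the post), stores to a bit-field are masked by the compiler automatically:

```cpp
#include <cassert>

// Hypothetical register file using the bit-field widths suggested
// above: every store is truncated to the field width by the compiler,
// so no setter function is needed for this particular constraint.
struct Regs {
    unsigned reg_a  : 8;
    unsigned reg_x  : 8;
    unsigned reg_y  : 8;
    unsigned reg_pc : 16;
};
```

Assigning 0x1FF to reg_a leaves 0xFF, and incrementing reg_pc past 0xFFFF wraps to 0, with no explicit masking in sight.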

However, in many cases you want semantics that aren't already directly
supported, and in such a case you normally have to use a rather
different route to enforce the desired constraints. In this case, you
could define your code something like this:

// constraint: 'bits' must be no larger than the number of bits in an
// unsigned in whatever implementation you're using. Could probably be
// enforced, but code to do so would hide most of what we care about.
//
template <unsigned bits>
class reg {
    unsigned data;
    static const unsigned max = (1 << bits) - 1;
public:
    reg const &operator=(unsigned value) {
        data = value & max;
        return *this;
    }

    // technically unnecessary micro-optimization.
    // see commentary below for more.
    reg const &operator=(reg value) {
        data = value.data;
        return *this;
    }

    operator unsigned() {
        return data;
    }
    void operator++() {
        data = (data + 1) & max;
    }
    void operator++(int) {
        data = (data + 1) & max;
    }

    reg() : data(0) {}
};
void opcode_1_bad() // Modifying data member directly is bad practice
{
    reg_a = ram[reg_pc];
    reg_a &= 0xFF;

    reg_pc++;
    reg_pc &= 0xFFFF;
}

While I agree that this is far from ideal, IMO, your code for opcode_1
wasn't much better. In particular, it still required ALL the code in the
CPU class to be aware of the chosen names for assigning, incrementing,
etc., the registers, rather than using the names that are native to C++.

OTOH, if we define our registers using the template above:

reg<8> reg_a, reg_x, reg_y;
reg<16> reg_pc;

then the opcode implementation can look like this:

void opcode_1_good() {
    reg_a = ram[reg_pc];
    reg_pc++;
}

The assignment looks like an assignment and the increment looks like an
increment. It's much easier for client code to follow the desired
semantics than to bypass them. We've reduced coupling and increased
cohesion. This isn't purely theoretical either -- it's easy (at least
for me) to imagine a quick hack in the original code bypassing the
desired constraints, but in this version you'd have to do quite a bit of
extra work to do so.

As a bonus, this version will generally improve run-time performance as
well. The second assignment operator in the code isn't entirely
necessary, but speeds up (many) assignments by eliminating range
enforcement when the source and destination operands match, assuring us
that the source will fit into the destination as-is. There's often
another improvement as well: giving nearly absolute assurance that the
proper checks happen in one place can nearly eliminate the desire to add
them elsewhere "just in case".
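A small self-contained exercise of the reg template (a trimmed reproduction of the class above; I've marked the conversion operator const, and the test values are mine) showing the masking in action:

```cpp
#include <cassert>

// Trimmed reproduction of the reg<> template from the post, with the
// conversion operator marked const so it also works on const registers.
// Every assignment and increment is masked to 'bits' bits, so the
// range constraint is enforced in exactly one place.
template <unsigned bits>
class reg {
    unsigned data;
    static const unsigned max = (1u << bits) - 1;
public:
    reg() : data(0) {}
    reg const &operator=(unsigned value) { data = value & max; return *this; }
    operator unsigned() const { return data; }
    void operator++()    { data = (data + 1) & max; }
    void operator++(int) { data = (data + 1) & max; }
};
```

Client code reads like ordinary arithmetic on built-in types, yet an out-of-range store or a wrap past the top of the range is handled by the type itself.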
 
