Proper Validation

Discussion in 'C++' started by Immortal Nephi, Apr 23, 2009.

  1. I have seen many websites. They always state. Most programmers want
    high performance. They are tired of writing OOP with C++ class. They
    complain about CPU’s overheads because indirection has to be done to
    modify data members. They want to use direct memory instead. They
    decide to stick with old habit of writing C source code to use global
    variables and global functions.
    Set and Get member functions are very important because you want to
    validate data members to avoid incorrect value. Sometimes, they use
    if keyword. Sometimes, they find a way to modify data members without
    needing to use if keyword.
    If Set and Get member functions are not used, then you have to
    examine each hundreds of member functions and try to find why data
    member contains incorrect value. To reduce debugging is better
    software engineering.
    Let me give you my example class code with validation. You may
    notice incomplete code such as NULL on data member—ram and
    p_opcode_array. You do not need to say something about incomplete
    code, but you know what I mean.
    Tell me what you think if class design is to be good software
    engineering. Compare opcode_1() and opcode_1_bad(). The execution on
    this program can be overhead on the CPU. How can you find a way to
    use direct memory instead of indirection? Inline keyword and static
    keyword are the answer.
    Tell me your opinion. Why do programmers want to avoid C++ class if
    they want to write emulator with high performance?

    class CPU
    {
    public:
    CPU() : reg_a(0), reg_x(0), reg_y(0), reg_pc(0), reg_opcode(0), ram
    (0) {}
    ~CPU() {}

    void run()
    {
    (*p_opcode_array[reg_opcode])(); // run opcode_1(), ...,, opcode_256
    ()
    }

    private:
    void opcode_1() // Good Software Engineering Practice
    {
    set_reg_a( ram[reg_pc] );
    increment_reg_pc();
    }

    void opcode_1_bad() // Modifying data member directly is bad practice
    {
    reg_a = ram[reg_pc];
    reg_a &= 0xFF;

    reg_pc++;
    reg_pc &= 0xFFFF;
    }

    void set_reg_a( unsigned reg)
    {
    reg_a = reg & 0xFF; // Validate to 8 bits only
    }

    unsigned int get_reg_a()
    {
    return reg_a;
    }

    void increment_reg_pc()
    {
    reg_pc++;
    reg_pc &= 0xFFFF;
    }

    unsigned int reg_a;
    unsigned int reg_x;
    unsigned int reg_y;
    unsigned int reg_pc;
    unsigned int reg_opcode;
    unsigned char* ram;
    void (*p_opcode_array[0x100])();
    };
     
    Immortal Nephi, Apr 23, 2009
    #1
    1. Advertising

  2. Immortal Nephi

    James Kanze Guest

    On Apr 23, 5:20 am, "Alf P. Steinbach" <> wrote:
    > * Immortal Nephi:


    [...]
    > You have hit on an important software engineering principle:


    > * In programming, as opposed to some other engineering disciplines,
    > redundancy (repetition of the same or similar code) is generally Bad..


    Actually, I don't think that there's any real difference between
    software engineering and other disciplines here. Redundency is
    good, duplication of effort is bad. They're both really the
    same thing, however: the difference is why and how they are
    used. Thus, having a low level function validate its arguments
    is good, even if it is redundant with the input validation
    (hopefully) done in higher level code. And putting two
    identical beams in a building, where one would have done the
    job, is bad.

    (Redundant is actually a strange word in English. In general
    use, it has decidedly negative overtones: "Exceeding what is
    necessary or natural; superfluous. 2. Needlessly wordy or
    repetitive in expression: a student paper filled with redundant
    phrases", according to the online version of The American
    Heritage Dictionary. And no one wants to be made redundant.
    But in most technical fields, it implies some sort of back-up,
    resulting in additional safety or reliability---a CRC in a
    transmitted message, for example, or a redundant system backing
    up the main one.)

    --
    James Kanze (GABI Software) email:
    Conseils en informatique orientée objet/
    Beratung in objektorientierter Datenverarbeitung
    9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
     
    James Kanze, Apr 23, 2009
    #2
    1. Advertising

  3. * James Kanze:
    > On Apr 23, 5:20 am, "Alf P. Steinbach" <> wrote:
    >> * Immortal Nephi:

    >
    > [...]
    >> You have hit on an important software engineering principle:

    >
    >> * In programming, as opposed to some other engineering disciplines,
    >> redundancy (repetition of the same or similar code) is generally Bad.

    >
    > Actually, I don't think that there's any real difference between
    > software engineering and other disciplines here. Redundency is
    > good, duplication of effort is bad. They're both really the
    > same thing, however: the difference is why and how they are
    > used. Thus, having a low level function validate its arguments
    > is good, even if it is redundant with the input validation
    > (hopefully) done in higher level code. And putting two
    > identical beams in a building, where one would have done the
    > job, is bad.


    Don't know if redundancy and effort duplication is the same thing, but perhaps.

    Anyways, I used the weasel word "generally" mainly because as you note, for
    detecting errors redundancy can be good. Disregarding the Non Redundancy
    Principle of the Design By Contract paradigm, which holds that counter to one's
    intuition a routine body should under no circumstance check the routine's
    precondition. I think that principl should be disregarded because I think that
    assumes language support where such checks can be turned on/off and are not
    regarded as part of the routine's "body".

    However, for the purpose of normal case code's computation of some result,
    redundancy is generally bad in programming, because all such computations of the
    same result should yield the *exact* same result (or state, whatever). And so
    the redundancy can generally not achieve a more correct or robust computation.
    It can improve efficiency in some cases by letting each place do the computation
    in some way most efficient in the local context, but it can also, in general,
    introduce more failure modes and maintainance problems, and it generally
    requires more effort also at the outset, and usually these factors/problems
    outweight any efficiency gain. All IMHO. But I think you'd agree, yes?


    Cheers,

    - Alf

    --
    Due to hosting requirements I need visits to <url: http://alfps.izfree.com/>.
    No ads, and there is some C++ stuff! :) Just going there is good. Linking
    to it is even better! Thanks in advance!
     
    Alf P. Steinbach, Apr 23, 2009
    #3
  4. Immortal Nephi

    Guest

    On 23 Apr, 03:58, Immortal Nephi <> wrote:
    > Set and Get member functions are very important because you want to
    > validate data members to avoid incorrect value.


    While they are a good start, you also have to make sure that the
    functions (especially set) correspond to concepts that exist in the
    outside world (ie not just inside your class) and that they are not
    simply tied to the data members you happen to be using.

    For example, consider the following code. What could possibly go
    wrong?

    Date date;

    date.setyear(2009);
    date.setmon(7);
    date.setday(31);
    date.print(); // prints "31 July, 2009"
    date.setmon(4);
    date.setday(26);
    date.print(); // prints "26 April, 2009"

    Looks simple enough, doesn't it? But suppose we make a small change -
    deleting one line - and call the first version code A and this version
    code B:

    date.setyear(2009);
    date.setmon(7);
    date.setday(31);
    date.print(); // prints "31 July, 2009"
    date.setmon(4);
    date.print(); // what does this print?

    Now, one possibility is that code B will print "31 April, 2009". But
    this is not a real date. The point of doing the validation is to avoid
    this.

    Another possibility is that the program objects at the date.setmon(4);
    line. But that would stop code A from working.

    Another possibility is that changing the month to 4 silently changes
    the day to 30. But this silent change may come back to bite you. You
    will get unexpected results with:

    date.setyear(2009);
    date.setmon(7);
    date.setday(31);
    date.setmon(4);
    date.setmon(7);
    date.print();

    unless you do some very careful record keeping.

    The real solution is not to allow the user to change one of the
    variables alone. You might have a function to set them all at once, eg
    date.setdate(31, 7, 2009); and you might also have functions to set it
    to today's date, or to move forward a week, etc.

    Hope this is useful.
    Paul.
     
    , Apr 23, 2009
    #4
  5. Immortal Nephi

    Jerry Coffin Guest

    In article <482abea5-e8c1-4905-892a-3844645e73a1
    @o20g2000vbh.googlegroups.com>, says...

    [ ... ]

    > Set and Get member functions are very important because you want to
    > validate data members to avoid incorrect value. Sometimes, they use
    > if keyword. Sometimes, they find a way to modify data members without
    > needing to use if keyword.


    I've said for years, and remain convinced, that Get and Set member
    functions are usually a mistake. They usually result from starting with
    a variable of the wrong type (i.e. a type that doesn't have the desired
    semantics) and then attempting to enforce those semantics externally.

    I think it's generally better to define a type that as the semantics you
    really want, and then use it directly.

    > class CPU
    > {


    [ ... ]

    > private:
    > void opcode_1() // Good Software Engineering Practice
    > {
    > set_reg_a( ram[reg_pc] );
    > increment_reg_pc();
    > }


    I don't think this is really good practice of software engineering or
    much of anything else. You're doing exactly what I've described above:
    starting with an unsigned variable, and then attempting to enforce
    different semantics (in this case, an unsigned of a specified number of
    bits) externally. C and C++ provide that particular possibility
    directly:

    unsigned reg_a:8;
    unsigned reg_x:8;
    unsigned reg_y:8;
    unsigned reg_pc:16;

    However, in many cases you want semantics that aren't already directly
    suppoorted, and in such a case you normally have to use a rather
    different route to enforce the desired constraints. In this case, you
    could define your code something like this:

    // constraint: 'bits' must be no larger than the number of bits in an
    // unsigned in whatever implementation you're using. Could probably be
    // enforced, but code to do so would hide most of what we care about.
    //
    template <unsigned bits>
    class reg {
    unsigned data;
    static const unsigned max = (1 << bits) - 1;
    public:
    reg const &operator=(unsigned value) {
    data = value & max;
    return *this;
    }

    // technically unnecessary micro-optimization.
    // see commmetary below for more.
    reg const &operator=(reg value) {
    data = value.data;
    return *this;
    }

    operator unsigned() {
    return data;
    }
    void operator++() {
    data = (data + 1) & max;
    }
    void operator++(int) {
    data = (data + 1) & max;
    }

    reg() : data(0) {}
    };

    > void opcode_1_bad() // Modifying data member directly is bad practice
    > {
    > reg_a = ram[reg_pc];
    > reg_a &= 0xFF;
    >
    > reg_pc++;
    > reg_pc &= 0xFFFF;
    > }


    While I agree that this is far from ideal, IMO, your code for opcode_1
    wasn't much better. In particular, it still required ALL the code in the
    CPU class to be aware of the chosen names for assigning, incrementing,
    etc., the registers, rather than using the names that are native to C++.

    OTOH, if we define our registers using the template above:

    reg<8> reg_a, reg_x, reg_y;
    reg<16> pc;

    then the opcode implementation can look like this:

    void opcode_1_good() {
    reg_a = ram[reg_pc];
    reg_pc++;
    }

    The assignment looks like an assignment and the increment looks like an
    increment. It's much easier for client code to follow the desired
    semantics than to bypass them. We've reduced coupling and increased
    cohesion. This isn't purely theoretical either -- it's easy (at least
    for me) to imagine a quick hack in the original code bypassing the
    desired constraints, but in this version you'd have to do quite a bit of
    extra work to do so.

    As a bonus, this version will generally improve run-time performance as
    well. The second assignment operator in the code isn't entirely
    necessary, but speeds up (many) assignments by eliminating range
    enforcement when the source and destination operands match, assuring us
    that the source will fit into the destination as-is. There's often
    another improvement as well: giving nearly absolute assurance that the
    proper checks happen in one place can nearly eliminate the desire to add
    them elsewhere "just in case".

    --
    Later,
    Jerry.

    The universe is a figment of its own imagination.
     
    Jerry Coffin, Apr 24, 2009
    #5
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Colin Mackay
    Replies:
    0
    Views:
    2,728
    Colin Mackay
    Jun 25, 2003
  2. Libs
    Replies:
    0
    Views:
    1,565
  3. Colin Basterfield

    Web form validation vs object validation

    Colin Basterfield, Nov 28, 2003, in forum: ASP .Net
    Replies:
    1
    Views:
    450
    Tommy
    Nov 29, 2003
  4. Matt
    Replies:
    14
    Views:
    4,219
    Chad Z. Hower aka Kudzu
    Jan 30, 2004
  5. Lucas Tam
    Replies:
    2
    Views:
    1,147
    Lucas Tam
    Feb 26, 2004
Loading...

Share This Page