Simple and clear ways of creating distinct types

Jorgen Grahn

I have recently realized that I want more distinct types in the code I
work with, to get better help from the compiler.

The code I work with now is like much real-life code: dozens of
logically different kinds of identities are mapped to simple integer
types, e.g. u_int16_t. These can be various kinds of IDs, like UDP
port numbers. Some of them are my-side/remote-side pairs, where it's
easy to confuse them and e.g. send a UDP datagram to my port rather
than the remote one.

I can do something like:

template<char TAG> struct Id { unsigned value; ... };
typedef Id<'x'> MyId;
typedef Id<'y'> RemoteId;

But that means I have to manually create the distinct tags, and the
compiler cannot detect my mistakes.

- How do people generally accomplish the thing I want?

- How do people do it in the more general case, when the real values
are of class type? For example, if my application uses
std::strings, but I want to classify them in three different, distinct
types?

- Am I too fixated on static type checking? :)

/Jorgen
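
One common way to approach the std::string question above is a small wrapper template parameterized on an empty tag type, so that each typedef names a different type. The following is only a sketch under assumptions: Distinct, the three tag structs, and describe() are made-up names for illustration, not something proposed in this thread.

```cpp
#include <cassert>
#include <string>

// Hypothetical wrapper: Tag is never used inside the class; it only
// makes each instantiation a separate type.
template <typename T, typename Tag>
class Distinct {
public:
    explicit Distinct(const T& v) : value_(v) {}
    const T& get() const { return value_; }
private:
    T value_;
};

// Tag types only need to be declared, never defined.
struct HostNameTag;
struct UserNameTag;
struct FilePathTag;

typedef Distinct<std::string, HostNameTag> HostName;
typedef Distinct<std::string, UserNameTag> UserName;
typedef Distinct<std::string, FilePathTag> FilePath;

// Hypothetical function that takes two of the string kinds.
std::string describe(const HostName& host, const UserName& user) {
    return user.get() + "@" + host.get();
}

// Small self-check showing typical use.
void demo() {
    HostName host("example.org");
    UserName user("jorgen");
    assert(describe(host, user) == "jorgen@example.org");
    // describe(user, host);  // does not compile: argument types swapped
    // describe(std::string("example.org"), std::string("jorgen"));
    //                        // does not compile: constructor is explicit
}
```

Named tag structs still have to be written by hand, but a descriptive name is harder to reuse by accident than a character like 'x'.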
 
Michael DOUBEZ

Jorgen said:
I have recently realized that I want more distinct types in the code I
work with, to get better help from the compiler.

The code I work with now is like much real-life code: dozens of
logically different kinds of identities are mapped to simple integer
types, e.g. u_int16_t. These can be various kinds of IDs, like UDP
port numbers. Some of them are my-side/remote-side pairs, where it's
easy to confuse them and e.g. send an UDP datagram to my port rather
than the remote one.

I can do something like:

template<char TAG> struct Id { unsigned value; ... };
typedef Id<'x'> MyId;
typedef Id<'y'> RemoteId;

But that means I have to manually create the distinct tags, and the
compiler cannot detect my mistakes.

- How do people generally accomplish the thing I want?

Using named parameters:

// For the local_data/remote_data distinction.
// Note: each wrapper holds a reference, so it is only meant to live
// for the duration of the call it is passed to.
template<typename T>
struct Local
{
    explicit Local(const T& v):value(v){}
    const T& value;
};

template<typename T>
struct Remote
{
    explicit Remote(const T& v):value(v){}
    const T& value;
};

typedef Local<u_int16_t> LocalPort;
typedef Remote<u_int16_t> RemotePort;

void udp_send(const LocalPort& lp,const RemotePort& rp);

udp_send(1689,23); //error: no implicit conversion from int
udp_send(LocalPort(1689),RemotePort(23)); //ok

- How do people do it in the more general case, when the real values
are of class type? For example, if my application uses
std::strings, but want to classify them in three different, distinct
types?

idem.

Alternatively, you can drop the 'explicit' restriction or use typedefs.
- Am I too fixated on static type checking? :)

Just like with salt; don't overdo it.
 
Jorgen Grahn

Using named parameters:

//for the local_data/remote_data distinction
template<typename T>
struct Local
{
explicit Local(const T& v):value(v){}
const T& value;
};

template<typename T>
struct Remote
{
explicit Remote(const T& v):value(v){}
const T& value;
};

typedef Local<u_int16_t> LocalPort;
typedef Remote<u_int16_t> RemotePort;

udp_send(const LocalPort& lp,const RemotePort& rp);

udp_send(1689,23); //error
udp_send(LocalPort(1689),RemotePort(23));//ok

Ok, thanks. I was going to complain first that it makes LocalUdpPort
and LocalTcpPort the same type -- but I can solve that by inventing
TcpPort and UdpPort first, and then

udp_send(Local<UdpPort>(1689),Remote<UdpPort>(23));

That strategy could work in my application since I only deal with a
dozen or so integer-like identifiers ... but if anyone has other
suggestions, I'm still interested!

/Jorgen
 
Jorgen Grahn

Ok, thanks. I was going to complain first that it makes LocalUdpPort
and LocalTcpPort the same type -- but I can solve that by inventing
TcpPort and UdpPort first, and then

udp_send(Local<UdpPort>(1689),Remote<UdpPort>(23));

Although come to think of it, I would have to do something clever
to avoid having to say "Local<UdpPort>(UdpPort(1689))" and
"myport.value.value" at the edges of my code.

Maybe that's not too important; the "identifier"-style objects I'm
thinking of are mostly stored, passed around, used as keys in std::map
lookups and so on.
That strategy could work in my application since I only deal with a
dozen or so integer-like identifiers ... but if anyone has other
suggestions, I'm still interested!

/Jorgen
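
For the stored-and-looked-up use described above, a distinct ID type mainly needs an ordering in order to serve as a std::map key. A minimal sketch, assuming a tag-parameterized Id; the tag names, TunnelNames, and lookup() are hypothetical and chosen only for illustration:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical ID wrapper; Tag only distinguishes the types.
template <typename Tag>
struct Id {
    explicit Id(unsigned v) : value(v) {}
    unsigned value;
};

// Ordering is defined only between IDs of the same kind.
template <typename Tag>
bool operator<(const Id<Tag>& a, const Id<Tag>& b) {
    return a.value < b.value;
}

struct LocalTunnelTag;
struct RemoteTunnelTag;
typedef Id<LocalTunnelTag> LocalTunnelId;
typedef Id<RemoteTunnelTag> RemoteTunnelId;

typedef std::map<LocalTunnelId, std::string> TunnelNames;

// Look up a name by local ID; returns "(unknown)" when absent.
std::string lookup(const TunnelNames& names, const LocalTunnelId& id) {
    TunnelNames::const_iterator it = names.find(id);
    return it == names.end() ? std::string("(unknown)") : it->second;
}

void demo() {
    TunnelNames names;
    names.insert(std::make_pair(LocalTunnelId(7), std::string("tunnel seven")));
    assert(lookup(names, LocalTunnelId(7)) == "tunnel seven");
    assert(lookup(names, LocalTunnelId(8)) == "(unknown)");
    // lookup(names, RemoteTunnelId(7));  // does not compile: wrong kind of ID
}
```

Because Id has no default constructor, operator[] on the map is not available; insert() and find() are used instead.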
 
Michael DOUBEZ

This example is not a good one because TCP port and UDP port represent
the same concept and making the distinction between local port and
remote port is not that important (local ports are seldom specified
unless they are negotiated).
Although come to think of it, I would have to do something clever
to avoid having to say "Local<UdpPort>(UdpPort(1689))" and
"myport.value.value" at the edges of my code.

If you really want to enforce this kind of composition, you can also use
a free function:

template<class T>
Local<T> local(const T& t){return Local<T>(t);}
// remote() is defined the same way for Remote<T>

udp_send(local(UdpPort(1689)),
         remote(UdpPort(23)));

Which is already ugly enough.

Or with an identifier based system which seems to be what you want. You
can use a combination of trait and named parameters to achieve the same:

//ids of resource
enum ress_id
{
    UDP_PORT,
    TCP_PORT,
    PIPE_PORT
};

//traits of resources
template<ress_id rid> struct ress_trait;

template<> struct ress_trait<UDP_PORT>
{ typedef u_int16_t value_type; };
template<> struct ress_trait<TCP_PORT>
{ typedef u_int16_t value_type; };
template<> struct ress_trait<PIPE_PORT>
{ typedef std::string value_type; };


//property of resource
enum ress_prop
{
    UNSPECIFIED,
    LOCAL,
    REMOTE
};

//named resource template
template<ress_id rid,ress_prop rprop=UNSPECIFIED>
struct Named
{
    typedef typename ress_trait<rid>::value_type value_type;

    explicit Named(const value_type& v):value(v){}

    //explicit conversion from same type with different property
    template<ress_prop rprop2>
    explicit Named(const Named<rid,rprop2>& v):value(v.value){}

    const value_type value;
};

//and then your typedefs
typedef Named<UDP_PORT> UdpPort;
typedef Named<TCP_PORT> TcpPort;
typedef Named<PIPE_PORT> PipePort;

typedef Named<UDP_PORT,LOCAL> LocalUdpPort;
typedef Named<UDP_PORT,REMOTE> RemoteUdpPort;


That way, you can write:

void foo(const UdpPort& p);

void udp_send(const LocalUdpPort& lp,const RemoteUdpPort& rp)
{
    foo(UdpPort(lp)); //convert local->plain
    foo(UdpPort(rp)); //convert remote->plain
}

You have to find the right amount, or you will spend more time writing
wrappers and clever structs than coding your application.
Maybe that's not too important; the "identifier"-style objects I'm
thinking of are mostly stored, passed around, used as keys in std::map
lookups and so on.

Named parameters are mainly used for enforcing that the caller passes the
parameters in the right order. Storing them isn't useful.
 
Jorgen Grahn

This example is not a good one because TCP port and UDP port represent
the same concept

Do they really, or are we just used to the weak type system of the BSD
sockets API? In almost all application code, TCP-or-UDP is a static
property: you can pick a variable or parameter named 'port' at random
and tell if it's a TCP or UDP port. Same with the local/remote
distinction. That's why I'm turning to the type system.
and making the distinction between local port and
remote port is not that important (local port are seldom specified
unless they are negotiated).

But yes, the example became a bit forced when I added TCP/UDP to the
mix. You're not even likely to handle both in the same application.

My problem domain is in reality an IP-over-UDP tunneling protocol
(GTP) where you have to wrangle four tunnel identifiers and half a
dozen IP addresses for each tunnel. I guess this is no big issue if
your design is sound otherwise. But if it isn't, I believe introducing
distinct types is a simple, low-risk way of improving it.

[snip example code which I will have to read more closely later]
You have to find the right amount or you will spend more time writing
wrappers and clever struct rather than coding your application.

I do not mind the time, which I think would pay off quickly. I am
mostly worried that lots of code duplication and extra indirection
would scare readers, and/or make me look odd.

regards,
/Jorgen
 
Michael DOUBEZ

Jorgen said:
Do they really, or are we just used to the weak type system of the BSD
sockets API?

It is true that TCP and UDP are defined in different RFCs and represent
different namespaces, but they are twins. If you look at the well-known port
numbers (see IANA: http://www.iana.org/assignments/port-numbers), you
will notice that the port assignment for an application is often
duplicated in the UDP space, although the UDP flavor is often less used.
An example is tftp, which is FTP over UDP.
In almost all application code, TCP-or-UDP is a static
property: you can pick a variable or parameter named 'port' at random
and tell if it's a TCP or UDP port. Same with the local/remote
distinction.

No, but I can say which application it represents, which is the whole point.
That's why I'm turning to the type system.

In this instance, you are breaking the layering. If one day, you choose
to change the transport layer (from TCP to UDP), you may have some
problems with your system.
But yes, the example became a bit forced when I added TCP/UDP to the
mix. You're not even likely to handle both in the same application.

I have worked on an application that used both and the decision for a
message to use one channel or the other was based on real-time/QoS specs.
My problem domain is in reality an IP-over-UDP tunneling protocol
(GTP) where you have to wrangle four tunnel identifiers and half a
dozen IP addresses for each tunnel. I guess this is no big issue if
your design is sound otherwise. But if it isn't, I believe introducing
distinct types is a simple, low-risk way of improving it.

Then maybe the issue is not to represent the local/remote port in a
procedural way, but to find a descriptive way of handling this definition.
When working with layered tunnels, it might help to have a notion of
endpoints and a way to compose them with templates. You can then pass the
structure as a parameter (to a template function) and fill in the
individual parameters in the function(s) taking single parameters.

The template function is likely to be optimized out if it just delegates
to the low-level function.
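
One possible reading of the endpoint idea above, as a rough sketch only: typed endpoints are composed into a structure, and a thin template function unpacks it for the existing integer-based call. Everything here (low_level_send, Endpoint, Tunnel, send_via, the side tags) is hypothetical and stands in for whatever the real code provides.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Stand-in for an existing procedural call that takes bare integers.
// Hypothetical; it just reports what it was given.
std::string low_level_send(unsigned local_port, unsigned remote_port) {
    std::ostringstream os;
    os << local_port << "->" << remote_port;
    return os.str();
}

struct LocalSide;
struct RemoteSide;

// An endpoint whose side (local or remote) is part of its type.
template <typename Side>
struct Endpoint {
    explicit Endpoint(unsigned p) : port(p) {}
    unsigned port;
};

// Composition of one local and one remote endpoint.
template <typename L, typename R>
struct Tunnel {
    Tunnel(const L& l, const R& r) : local(l), remote(r) {}
    L local;
    R remote;
};

typedef Tunnel<Endpoint<LocalSide>, Endpoint<RemoteSide> > UdpTunnel;

// Thin facade: the only place where the typed structure is unpacked
// into the positional integer parameters.
template <typename TunnelT>
std::string send_via(const TunnelT& tunnel) {
    return low_level_send(tunnel.local.port, tunnel.remote.port);
}

void demo() {
    UdpTunnel tunnel(Endpoint<LocalSide>(1689), Endpoint<RemoteSide>(23));
    assert(send_via(tunnel) == "1689->23");
    // UdpTunnel bad(Endpoint<RemoteSide>(23), Endpoint<LocalSide>(1689));
    //                        // does not compile: sides swapped
}
```

With this shape, the local/remote order can only be confused inside send_via, which is a single small function to review and test.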
[snip example code which I will have to read more closely later]
You have to find the right amount or you will spend more time writing
wrappers and clever struct rather than coding your application.

I do not mind the time, which I think would pay off quickly. I am
mostly worried that lots of code duplication and extra indirection
would scare readers, and/or make me look odd.

If you have code duplication AND extra indirection then I agree you have
a problem :).
What I meant is that I don't think you have to secure all interfaces,
only those that are likely to bring confusion but you may have a lot of
those.
 
coal

It is true that TCP and UDP are defined in different RFCs and represent
different namespaces, but they are twins. If you look at the well-known port
numbers (see IANA: http://www.iana.org/assignments/port-numbers), you
will notice that the port assignment for an application is often
duplicated in the UDP space, although the UDP flavor is often less used.
An example is tftp, which is FTP over UDP.


No, but I can say which application it represents, which is the whole point.


In this instance, you are breaking the layering. If one day, you choose
to change the transport layer (from TCP to UDP), you may have some
problem with your system.



I have worked on an application that used both and the decision for a
message to use one channel or the other was based on real-time/QoS specs.

I've worked on an application that used both also. TCP was
used to receive commands and UDP was used to send data.

Brian Wood
Ebenezer Enterprises
www.webEbenezer.net
 
Jorgen Grahn

Jorgen said:
Jorgen Grahn wrote: [...]
Ok, thanks. I was going to complain first that it makes LocalUdpPort
and LocalTcpPort the same type -- but I can solve that by inventing
TcpPort and UdpPort first, and then
This example is not a good one because TCP port and UDP port represent
the same concept

Do they really, or are we just used to the weak type system of the BSD
sockets API?

It is true that TCP and UDP are defined in different RFCs and represent
different namespaces, but they are twins. If you look at the well-known port
numbers (see IANA: http://www.iana.org/assignments/port-numbers), you
will notice that the port assignment for an application is often
duplicated in the UDP space, although the UDP flavor is often less used.
An example is tftp, which is FTP over UDP.

This is off-topic, but tftp isn't really FTP -- it's file transfer
alright, but very unlike FTP in many ways -- and not likely to be used
for the same things.

If I put it this way: I think BSD sockets are polymorphic not because
many applications use them that way, but for other reasons: weak
typing in old C compilers, expensive to add Unix kernel calls, hard to
predict which new protocols would be added.

(It's good that you can select(2) on any descriptor, of course.)
No, but I can say which application it represents, which is the whole point.


In this instance, you are breaking the layering. If one day, you choose
to change the transport layer (from TCP to UDP), you may have some
problem with your system.

While maintaining code, I have become fed up with code which tries to
be future-proof. It's a rich source of bugs and vagueness.

Maybe I'm going to the other extreme, but I more or less aim for as
little flexibility as possible. I don't mind if I have to change a lot
of code when I unexpectedly have to move from TCP to UDP -- provided I
have good help by the compiler errors.
I have worked on an application that used both and the decision for a
message to use one channel or the other was based on real-time/QoS specs.

Ok. Well, I never claimed that they didn't *exist*, just that they are
too rare to be a good example.
My problem domain is in reality an IP-over-UDP tunneling protocol
(GTP) where you have to wrangle four tunnel identifiers and half a
dozen IP addresses for each tunnel. I guess this is no big issue if
your design is sound otherwise. But if it isn't, I believe introducing
distinct types is a simple, low-risk way of improving it.

Then maybe the issue is not to represent the local/remote port in a
procedural way, but to find a descriptive way of handling this definition.
When working with layered tunnels it might help to have a notion of
endpoints and a way to compose them with template [...]

Yes, that's a design which makes more sense -- dealing with objects
which are larger than single identifiers. But I often have to maintain
code which is in practice procedural, and I cannot always afford heavy
refactoring. Tightening up the existing types is a safe change
compared to that -- and stronger types will make further refactoring
later more safe.

/Jorgen
 
Jorgen Grahn

A minimalist way would be to use Hungarian notation. I.e. all variables
and struct fields would have prefixes, e.g.

int lp_port1; // a local port
int rp_port2; // a remote port

Using this style consistently all over the place ought to make spotting
mistakes much easier. I'm not saying this is better than static
typechecks, but it's an option.

Yes, but that's what we usually start out with, isn't it? Maybe not a
prefix, but *some* kind of half-sane naming convention.
My question was more about the cases when this isn't enough.

/Jorgen
 
Michael DOUBEZ

Jorgen said:
Jorgen said:
Jorgen Grahn wrote: [...]
Ok, thanks. I was going to complain first that it makes LocalUdpPort
and LocalTcpPort the same type -- but I can solve that by inventing
TcpPort and UdpPort first, and then
This example is not a good one because TCP port and UDP port represent
the same concept
Do they really, or are we just used to the weak type system of the BSD
sockets API?
It is true that TCP and UDP are defined in different RFCs and represent
different namespaces, but they are twins. If you look at the well-known port
numbers (see IANA: http://www.iana.org/assignments/port-numbers), you
will notice that the port assignment for an application is often
duplicated in the UDP space, although the UDP flavor is often less used.
An example is tftp, which is FTP over UDP.

This is off-topic, but tftp isn't really FTP -- it's file transfer
alright, but very unlike FTP in many ways -- and not likely to be used
for the same things.

No, but tftp uses the same port and represents the same application,
although there are adaptations at the application layer to account for
UDP's guarantees (or lack of them).
If I put it this way: I think BSD sockets are polymorphic not because
many applications use them that way, but for other reasons: weak
typing in old C compilers, expensive to add Unix kernel calls, hard to
predict which new protocols would be added.

(It's good that you can select(2) on any descriptor, of course.)

I select on ports and pipes, which have completely different addressing
systems. Socket is another level of abstraction IMO.

While maintaining code, I have become fed up with code which tries to
be future-proof. It's a rich source of bugs and vagueness.

OK, agreed. Especially with GTP, which AFAIK isn't defined over TCP. I
wasn't thinking in terms of reuse but rather in terms of concept.
Separating UDP and TCP is a bit of a surprise to me, but perhaps it is
only me.
Maybe I'm going to the other extreme, but I more or less aim for as
little flexibility as possible. I don't mind if I have to change a lot
of code when I unexpectedly have to move from TCP to UDP -- provided I
have good help by the compiler errors.
I have worked on an application that used both and the decision for a
message to use one channel or the other was based on real-time/QoS specs.

Ok. Well, I never claimed that they didn't *exist*, just that they are
too rare to be a good example.
My problem domain is in reality an IP-over-UDP tunneling protocol
(GTP) where you have to wrangle four tunnel identifiers and half a
dozen IP addresses for each tunnel. I guess this is no big issue if
your design is sound otherwise. But if it isn't, I believe introducing
distinct types is a simple, low-risk way of improving it.
Then maybe the issue is not to represent the local/remote port in a
procedural way, but to find a descriptive way of handling this definition.
When working with layered tunnels it might help to have a notion of
endpoints and a way to compose them with template [...]

Yes, that's a design which makes more sense -- dealing with objects
which are larger than single identifiers. But I often have to maintain
code which is in practice procedural, and I cannot always afford heavy
refactoring. Tightening up the existing types is a safe change
compared to that -- and stronger types will make further refactoring
later more safe.

Yes. That's why I was suggesting a facade: using the C++ expression system
to express the composition, with the procedural calls in a back end
that is validated by unit tests. Well, this is more of a design issue.
 
Jorgen Grahn

I have recently realized that I want more distinct types in the code I
work with, to get better help from the compiler.

The code I work with now is like much real-life code: dozens of
logically different kinds of identities are mapped to simple integer
types, e.g. u_int16_t. These can be various kinds of IDs, like UDP
port numbers. Some of them are my-side/remote-side pairs, where it's
easy to confuse them and e.g. send an UDP datagram to my port rather
than the remote one.

I can do something like:

template<char TAG> struct Id { unsigned value; ... };
typedef Id<'x'> MyId;
typedef Id<'y'> RemoteId;

But that means I have to manually create the distinct tags, and the
compiler cannot detect my mistakes.

- How do people generally accomplish the thing I want?

- How do people do it in the more general case, when the real values
are of class type? For example, if my application uses
std::strings, but want to classify them in three different, distinct
types?

- Am I too fixated on static type checking? :)

I thought it would be polite to mention what I ended up doing.

It turned out (perhaps not surprisingly) that almost all those
distinct integer- or ID-like types I envisioned actually *did* have
things which made them distinct C++ types.

For example, in the struct Id example above, I did not have to make up
fake tags like 'x' and 'y' -- "My" and "Remote" IDs had to be encoded
with different tags anyway, in the binary protocol I implement.

And some of my other "types" turned out not to be very important. I
let them remain typedefed integers for now.

/Jorgen
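
The outcome described above, where the distinguishing tag is the one the binary protocol already requires, might look roughly like the following. This is a sketch under assumptions: the tag values 0x10 and 0x11, the 32-bit big-endian layout, and the encode() member are placeholders, not taken from the actual protocol or code.

```cpp
#include <cassert>
#include <vector>

// WIRE_TAG is the byte that identifies this kind of ID on the wire,
// so the template argument is real protocol data rather than a made-up label.
template <unsigned char WIRE_TAG>
struct Id {
    explicit Id(unsigned long v) : value(v) {}

    // Append the tag byte, then the low 32 bits of the value, big-endian.
    void encode(std::vector<unsigned char>& out) const {
        out.push_back(WIRE_TAG);
        for (int shift = 24; shift >= 0; shift -= 8)
            out.push_back(static_cast<unsigned char>((value >> shift) & 0xffUL));
    }

    unsigned long value;
};

// Placeholder tag values, not taken from any real protocol specification.
typedef Id<0x10> MyId;
typedef Id<0x11> RemoteId;

void demo() {
    std::vector<unsigned char> buf;
    MyId(0x01020304UL).encode(buf);
    assert(buf.size() == 5);
    assert(buf[0] == 0x10 && buf[1] == 0x01 && buf[4] == 0x04);
}
```

Since MyId and RemoteId are different instantiations, passing one where the other is expected fails to compile, and the encoder cannot emit the wrong tag for a given ID.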
 
