Discussion:
Is type-punning with unions legal?
Andrzej Krzemieński
2015-07-14 09:19:59 UTC
Hi,
I am unable to figure out from the Standard what it has to say about
type-punning with unions.

I have the following union:
union String
{
char as_array[sizeof(int)];
int as_int;
};

I intend to initialize member as_array, but later access member as_int. The
goal is to perform a sort of reinterpret cast by accessing memory through a
different member. The question is: what does the Standard have to say about
it?

Is this undefined behavior? If so, can you point me to the relevant
sections?

Or is this part of the standard underspecified? In that case, does someone
know what the intention is?

Regards,
&rzej
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-discussion/.
Fabio Fracassi
2015-07-14 09:42:03 UTC
Post by Andrzej Krzemieński
Hi,
I am unable to figure out from the Standard what it has to say about
type-punning with unions.
union String
{
char as_array[sizeof(int)];
int as_int;
};
I intend to initialize member as_array, but later access member
as_int. The goal is to perform a sort of reinterpret cast by accessing
memory through a different member. The question is: what does the
Standard have to say about it?
Is this undefined behavior? If so, can you point me to the
relevant sections?
Or is this part of the standard underspecified? In that case, does
someone know what the intention is?
If I remember the last discussion (on the undefined behavior list) about
this correctly, it is intentionally undefined behavior.
I also interpret §[class.union]/1 as forbidding it, although it is not very
explicit about it. The first sentence reads:
"In a union, at most one of the non-static data members can be active at
any time, that is, the value of at most one of the non-static data
members can be stored in a union at any time."
Which I interpret as forbidding access to a union's member that is not
the currently active one. The following note supports this reading, as it
explicitly defines an exception to this rule:
" [ Note: One special guarantee is made in order to simplify the use of
unions: If a standard-layout union contains several standard-layout
structs that share a common initial sequence (9.2), and if an object of
this standard-layout union type contains one of the standard-layout
structs, it is permitted to inspect the common initial sequence of any
of standard-layout struct members; see 9.2. — end note ]"

so it is legal to do
struct A { int u; int v; };
struct B { int x; char y; };
union legal {
A a;
B b;
};

legal l;
l.a = A{1,1};
read(l.b.x); // x is in the common initial sequence of A and B

Both read(l.b.y) and your example would be illegal.

IIRC the only way to do legal type punning is using memcpy.
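
The memcpy approach mentioned here can be sketched as follows. This is a minimal illustration rather than code from the thread: the name pun_to_int is hypothetical, and the value it returns depends on the platform's byte order and the size of int.

```cpp
#include <cstring>  // std::memcpy

// Hypothetical helper: reads the bytes of a char array as an int
// without ever touching an inactive union member.
inline int pun_to_int(const char (&bytes)[sizeof(int)])
{
    int result;
    // Copy the object representation; no int lvalue refers to the
    // char array itself, so the aliasing rule is not involved.
    std::memcpy(&result, bytes, sizeof(int));
    return result;
}
```

Calling pun_to_int on an array that was filled from an existing int gives back that int. Mainstream optimizing compilers typically turn a fixed-size memcpy like this into a plain load, so the sketch should not cost more than the union version.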

Best
Fabio
'Johannes Schaub' via ISO C++ Standard - Discussion
2015-07-14 09:46:11 UTC
Post by Fabio Fracassi
Post by Andrzej Krzemieński
Hi,
I am unable to figure out from the Standard what it has to say about
type-punning with unions.
union String
{
char as_array[sizeof(int)];
int as_int;
};
I intend to initialize member as_array, but later access member as_int.
The goal is to perform a sort of reinterpret cast by accessing memory
through a different member. The question is: what does the Standard have to
say about it?
Is this undefined behavior? If so, can you point me to the
relevant sections?
Or is this part of the standard underspecified? In that case, does
someone know what the intention is?
If I remember the last discussion (on the undefined behavior list) about
this correctly, it is intentionally undefined behavior.
I also interpret §[class.union]/1 as forbidding it, although it is not very
explicit about it. The first sentence reads:
"In a union, at most one of the non-static data members can be active at
any time, that is, the value of at most one of the non-static data members
can be stored in a union at any time."
Which I interpret as forbidding access to a union's member that is not the
currently active one.

If so, this would not need to be banned in constant expressions directly, but
merely by its undefined behavior (and reading a common initial sequence is
intended to be allowed there as well).

Richard?
David Krauss
2015-07-14 09:47:08 UTC
Post by Fabio Fracassi
IIRC the only way to do legal type punning is using memcpy.
You can pun to character types with reinterpret_cast. Other types require something like memcpy, which works because it treats each of the two objects as a character sequence. Anything that behaves that way is sufficient, though, including std::copy<char const*, char*>.
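
As a rough sketch of both halves of this point: the first function below inspects an int through unsigned char (the direction reinterpret_cast allows), and the second rebuilds an int with a character-wise std::copy instead of memcpy. The names bytes_of and int_from are hypothetical and do not come from the thread.

```cpp
#include <algorithm>  // std::copy
#include <array>      // std::array

// Hypothetical helper: returns the bytes that make up an int.
inline std::array<unsigned char, sizeof(int)> bytes_of(const int& value)
{
    // Reading any object through unsigned char is permitted by the aliasing rule.
    const unsigned char* first = reinterpret_cast<const unsigned char*>(&value);
    std::array<unsigned char, sizeof(int)> out{};
    std::copy(first, first + sizeof(int), out.begin());
    return out;
}

// Hypothetical helper: the reverse direction, written as a
// character-wise copy into the storage of a fresh int.
inline int int_from(const std::array<unsigned char, sizeof(int)>& bytes)
{
    int result;
    std::copy(bytes.begin(), bytes.end(),
              reinterpret_cast<unsigned char*>(&result));
    return result;
}
```

A round trip such as int_from(bytes_of(x)) yields x again; the individual byte values in between depend on the platform's byte order.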
'Johannes Schaub' via ISO C++ Standard - Discussion
2015-07-14 09:50:44 UTC
Post by David Krauss
Post by Fabio Fracassi
IIRC the only way to do legal type punning is using memcpy.
You can pun to character types with reinterpret_cast. Other types require
something like memcpy, which works because it treats each of the two
objects as a character sequence. Anything that behaves that way is
sufficient, though, including std::copy<char const*, char*>.
For this to work, aliasing rules need to be transitive or am I missing
something?
David Krauss
2015-07-14 09:57:06 UTC
Post by David Krauss
You can pun to character types with reinterpret_cast. Other types require something like memcpy, which works because it treats each of the two objects as a character sequence. Anything that behaves that way is sufficient, though, including std::copy<char const*, char*>.
For this to work, aliasing rules need to be transitive or am I missing something?
Nothing is transitive. The bytes of object A are copied into storage buffer B. Now B can be treated as another object of some other POD type, provided that the object representation is valid. Upon the first access, though, that permanently becomes the type of B.

All I’m saying is, std::memcpy isn’t specially blessed. The standard only specifies what you can do with object representations. It doesn’t prescribe specific library functions.
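
To illustrate the claim that nothing here is specific to std::memcpy, here is a minimal sketch that copies an object representation with a hand-written loop over unsigned char. The name copy_representation and the particular static_assert checks are assumptions made for this illustration. They stand in for the "POD type" and "valid object representation" conditions above, and the second of those cannot be checked by the compiler.

```cpp
#include <cstddef>      // std::size_t
#include <type_traits>  // std::is_trivially_copyable

// Hypothetical helper: copies the bytes of `from` into a new object of type To.
template <class To, class From>
To copy_representation(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "both types must be trivially copyable");

    To to;
    // Both objects are accessed only as sequences of unsigned char.
    const unsigned char* src = reinterpret_cast<const unsigned char*>(&from);
    unsigned char* dst = reinterpret_cast<unsigned char*>(&to);
    for (std::size_t i = 0; i != sizeof(To); ++i)
        dst[i] = src[i];
    return to;
}
```

The caller remains responsible for making sure the copied bytes form a valid value of To; the sketch only rules out size mismatches and non-trivially-copyable types.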
'Johannes Schaub' via ISO C++ Standard - Discussion
2015-07-14 10:05:49 UTC
Post by David Krauss
You can pun to character types with reinterpret_cast. Other types
require something like memcpy, which works because it treats each of the
two objects as a character sequence. Anything that behaves that way is
sufficient, though, including std::copy<char const*, char*>.
Post by David Krauss
For this to work, aliasing rules need to be transitive or am I missing something?
Nothing is transitive. The bytes of object A are copied into storage
buffer B. Now B can be treated as another object of some other POD type,
provided that the object representation is valid. Upon the first access,
though, that permanently becomes the type of B.
I guess this only works because any simple object has sizeof(T) char
objects in its object representation.

Otherwise the copy relies entirely on the aliasing rule. And since you ask
for copying char objects, the copy function can decide to read the object T
using a type compatible with char.

To have that not be UB, you would need a transitive aliasing blessing from
the compatible type to T, with char as the link.
David Krauss
2015-07-14 10:09:05 UTC
Post by David Krauss
Nothing is transitive. The bytes of object A are copied into storage buffer B.
Just to be clear, the union member of type char[N] is not suitable as such a storage buffer. A sanitizer would be free to complain about the non-active member being used. I don’t suppose a real-world C++ compiler would break with common C programming practice, though, regardless of undefined behavior.

Really, what value is added by the union as opposed to a reinterpret_cast? At the moment I can’t recall the C rules for union punning, but as for C compatibility, you could define an inline function to perform the cast.
Post by David Krauss
To have that not be UB you would need a transitive aliasing blessing from compatible-type to T by link over char.
Yes, the character types char and unsigned char are specified by the aliasing rule to be valid ways of accessing any object representation. See §3.10/10.8.
Andrzej Krzemieński
2015-07-14 12:00:00 UTC
Post by Fabio Fracassi
IIRC the only way to do legal type punning is using memcpy.
You can pun to character types with reinterpret_cast. Other types require
something like memcpy, which works because it treats each of the two
objects as a character sequence. Anything that behaves that way is
sufficient, though, including std::copy<char const*, char*>.
Does this provision work only one way (from type T to char[])? Or is it
also possible to reinterpret_cast from char[]?

It looks like a "safe" use of reinterpret_cast is to temporarily cast value
v (of type T) to some other type U, and later back to T. But is there any
legal way in the Standard to observe a (properly aligned) array of
characters as int? (I ask because computing equality of integers appears to
be faster than performing memcmp on the same region.)

For instance, is it legal to use the same union in combination with
reinterpret cast:

union String
{
char as_array[sizeof(int)];
int as_int;
};

String s;
memcpy(&s.as_array, "12345678", sizeof(int));
int & i = reinterpret_cast<int&>(s.as_array);
read(i);

?
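
For the integer comparison motivating this question, one option that avoids both the union and the reinterpret_cast is to memcpy each region into a local integer and compare those. The sketch below is only an illustration: equal_prefix is a hypothetical name, both pointers are assumed to refer to at least sizeof(unsigned) readable bytes, and unsigned is assumed to have no padding bits, as on mainstream platforms.

```cpp
#include <cstring>  // std::memcpy

// Hypothetical helper: true if the first sizeof(unsigned) bytes
// at `a` and `b` are identical.
inline bool equal_prefix(const char* a, const char* b)
{
    unsigned x, y;
    // Copy the bytes into real unsigned objects, then compare as integers.
    std::memcpy(&x, a, sizeof x);
    std::memcpy(&y, b, sizeof y);
    return x == y;
}
```

Unlike the reinterpret_cast version, this places no alignment requirement on the character arrays.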
David Krauss
2015-07-15 02:59:11 UTC
Post by David Krauss
You can pun to character types with reinterpret_cast. Other types require something like memcpy, which works because it treats each of the two objects as a character sequence. Anything that behaves that way is sufficient, though, including std::copy<char const*, char*>.
Does this provision work only one way (from type T to char[])? Or is it also possible to reinterpret_cast from char[]?
Yes, sort-of. This is an ill-specified corner of the standard and a focus of the undefined behavior study group. For example, std::aligned_storage::type only works if its object representation is not committed to any type. To be sure, given the choice, it’s clearer to call operator new(N) instead of e.g. new char[N]. And all bets are off once the machine evaluates an lvalue-to-rvalue conversion or assignment expression with non-char type, or a class member access expression.
Post by David Krauss
It looks like a "safe" use of reinterpret_cast is to temporarily cast value v (of type T) to some other type U, and later back to T. But is there any legal way in the Standard to observe a (properly aligned) array of characters as int? (because computing equality of integers appears to be faster than performing memcmp on the same region).
union String
{
char as_array[sizeof(int)];
int as_int;
};
String s;
memcpy(&s.as_array, "12345678", sizeof(int));
int & i = reinterpret_cast<int&>(s.as_array);
read(i);
This looks OK. Although as_int is never accessed, it guarantees that the union is correctly aligned for int. It would be easier to use std::aligned_storage, though.
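
A minimal sketch of the aligned-storage idea, under some assumptions: it uses an alignas(int) byte array in the role of std::aligned_storage (which has been deprecated since C++23), it creates the int explicitly with placement new rather than relying on reinterpret_cast alone, and the name read_via_aligned_buffer is hypothetical.

```cpp
#include <cstring>  // std::memcpy
#include <new>      // placement new

// Hypothetical helper: builds an int in suitably aligned raw storage
// from the first sizeof(int) bytes of `text`.
inline int read_via_aligned_buffer(const char* text)
{
    // Raw storage with the alignment of int, not committed to any type yet.
    alignas(int) unsigned char buffer[sizeof(int)];

    // Start the lifetime of an int in that storage, then fill in its bytes.
    int* p = new (buffer) int;
    std::memcpy(p, text, sizeof(int));
    return *p;
}
```

Using the pointer returned by placement new sidesteps the question of whether a reinterpret_cast of the buffer address may be dereferenced as int.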