Discussion:
Pointer arithmetic and aliasing within a struct
Edward Rosten
2016-05-18 11:26:38 UTC
I've been trying to wrangle the standardese on a bit of code with no
success. If you want to know why I'm trying to do this, skip to the end.
I also started a discussion here:

https://www.reddit.com/r/cpp/comments/42cbd2/
i_think_pointer_arithmetic_on_struct_members_is/

but I don't feel that a definitive conclusion has been reached.

For this I'm going off N4582. I wrote this a while back after studying
an older draft, so I may have missed something that came up in the
meantime.

Take the following code:

////////////////////////////////////////////////
struct RGB
{
    float r, g, b;

    float& operator[](int i)
    {
        return *(&r + i); //Is this legal?
    }
};
static_assert(sizeof(RGB) == sizeof(float)*3);
///////////////////////////////////////////////

So, [class.mem] (9.2)/14 says that r, g, b must be in increasing address
order. It also says that padding might result from alignment
requirements or due to virtual functions and virtual bases. There aren't
any of the latter two. If padding for alignment were required, then I can't
see how a simple float array would be able to work, so I don't believe
padding is possible. Nonetheless the static assert guarantees that there
is none if the code compiles.

[basic.lval] (3.10)/10 doesn't forbid it in terms of aliasing since
we're accessing a float via a float*.

[expr.add] (5.7) might forbid it but only partly. 5.7/4 is about array
objects, but footnote 85 says that for the purposes of the rules, r is
equivalent to an array of length 1. Interestingly 5.7/4 also says that
the pointers &r and &r+1 are valid and shall not overflow, but &r+2 is
undefined. Combining the rules appears to allow the indexing to work,
but only for the first two elements. I admit this is a somewhat perverse
reading, but I can't find an error.
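
As a minimal sketch of the uncontroversial part of this reading: treating r as a one-element array means &r + 1 can be formed and used as an end pointer, provided it is never dereferenced. The helper name sum_of_r_range is hypothetical and exists only for illustration.

```cpp
#include <cassert>
#include <numeric>

struct RGB
{
    float r, g, b;
};

// Sums the one-element range [&c.r, &c.r + 1).
// The end pointer is formed and compared against, but never dereferenced.
inline float sum_of_r_range(const RGB& c)
{
    const float* first = &c.r;
    const float* last  = &c.r + 1; // one past the end of the "array" {r}
    return std::accumulate(first, last, 0.0f);
}
```

Only r contributes to the result; nothing here reads g or b through the pointer, which is the step the rest of the post is asking about.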

Alternatively:

RGB rgb;
int i;
//...
*(reinterpret_cast<char*>(&rgb.r)+i);

appears to be valid for i < sizeof(rgb), especially considering how
memcpy is implemented:

RGB a, b;
for(int i=0; i < sizeof(a); i++)
    reinterpret_cast<char*>(&a)[i] = reinterpret_cast<char*>(&b)[i];

I'm not sure I could quote the exact section allowing this.
Given that, I cannot find any standardese that forbids the following:

*reinterpret_cast<float*>(reinterpret_cast<char*>(&rgb.r) +
sizeof(float)*i)



To (de) muddy the waters further, an alternative was suggested:

struct RGB
{
    union
    {
        struct
        {
            float r, g, b;
        };
        float my_data[3];
    };

    float& operator[](int i)
    {
        return my_data[i];
    }
};

Naturally alternating between r and my_data will involve accessing a
non-active member of the union: [class.union] (9.5)/1 says at most one
member of a union can be active, so if r has been written to, clearly
my_data is inactive. I'm not entirely sure how [basic.lval] (3.10)/10.6
applies, especially given the comment in footnote 85.

However, as far as I can tell the C++ standard simply has nothing to say
about that (it is neither explicitly allowed nor explicitly forbidden),
though the C standard does specify (ISO 9899:2011 6.2.6.1/7 and 6.5/7, I
think, though I'm rather less familiar with the C standard).

Is there anything in the standard that specifies what one should assume
when the standard is silent on a matter?

Can anyone shed any light on this? I've dug around but have hit a bit of
a wall determining if it is definitely allowed or definitely disallowed by
the standard. Have I missed any relevant sections?


If you want to know why, it's for a numerics library
(https://github.com/edrosten/TooN), in order to allow (for example)
correctly defined 3-vectors to be able to be accessed with either
pointer arithmetic internally (the implementation of operator[]) or via
named elements such as r, g, b. With a little bit of fun with variadic
macros and the ## operator, it allows you to declare named-element
vectors of any length, with any legal names, which behave exactly as
vectors of the same size declared through the more usual means.


Regards

-Ed
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
Nicol Bolas
2016-05-18 13:56:39 UTC
Post by Edward Rosten
I've been trying to wrangle the standardese on a bit of code with no
success. If you want to know why I'm trying to do this, skip to the end.
https://www.reddit.com/r/cpp/comments/42cbd2/
i_think_pointer_arithmetic_on_struct_members_is/
but I don't feel that a definitive conclusion has been reached.
For this I'm going off N4582. I wrote this a while back after studying
an older draft, so I may have missed something that came up in the mean
time.
////////////////////////////////////////////////
struct RGB
{
float r, g, b;
float& operator[](int i)
{
return *(&r + i); //Is this legal?
}
};
static_assert(sizeof(RGB) == sizeof(float)*3);
///////////////////////////////////////////////
So, [class.mem] (9.2)/14 says that r, g, b must be in increasing address
order. It also says that padding might result from alignment
requirements or due to virtual functions and virtual bases. There aren't
any of the latter two. If padding for alignment were required, then I can't
see how a simple float array would be able to work, so I don't believe
padding is possible.
You seem to forget how a specification works. If it says that something
happens, then it must be so. If it says that something cannot happen, then
it must not be so.

The specification does not say that address allocation will be contiguous
unless otherwise noted. The specification does not say that the *only*
reasons why it would *not* be contiguous are those that it states.
Therefore, you cannot assume anything about the contiguity of adjacent
members.

Post by Edward Rosten
Nonetheless the static assert guarantees that there
is none if the code compiles.
Irrelevant. Even if you know for a fact that two objects are allocated
adjacent to one another, you cannot use pointer arithmetic to get from one
to the other without invoking undefined behavior. Not unless those two
objects are sub-objects of an *array*.
Post by Edward Rosten
[basic.lval] (3.10)/10 doesn't forbid it in terms of aliasing since
we're accessing a float via a float*.
[expr.add] (5.7) might forbid it but only partly. 5.7/4 is about array
objects, but footnote 85 says that for the purposes of the rules, r is
equivalent to an array of length 1. Interestingly 5.7/4 also says that
the pointers &r and &r+1 are valid and shall not overflow, but &r+2 is
undefined. Combining the rules appears to allow the indexing to work,
but only for the first two elements.
No, it allows you to get a pointer to &r+1. It does not allow you to
*dereference* that pointer and still get defined behavior. It is saying
that you can treat any object as a one-element array, and therefore you can
get a pointer to the end of that one-element array.

But just like any end pointer/iterator, you aren't allowed to dereference
it, since it points to past the end of the array.

Post by Edward Rosten
I admit this is a somewhat perverse
reading, but I can't find an error.
RGB rgb;
int i;
//...
*(reinterpret_cast<char*>(&rgb.r)+i);
appears to be valid for i < sizeof(rgb), especially considering how
memcpy is implemented:
RGB a, b;
for(int i=0; i < sizeof(a); i++)
reinterpret_cast<char*>(&a)[i] = reinterpret_cast<char*>(&b)[i];
I'm not sure I could quote the exact section allowing this.
This is allowed per strict-aliasing rules. Or rather, per the rules
permitting aliasing with `char*` and `unsigned char*`. (3.10, p10.8).

The copy is permitted per the rules regarding copying bytes with trivially
copyable types (3.9, p2-4).
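
A minimal sketch of the byte-wise copy being discussed, assuming (as both posts do) that stepping an unsigned char* through an object's bytes is permitted. The helper name copy_bytes is hypothetical; the static_assert documents the trivially-copyable requirement that 3.9 relies on.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct RGB
{
    float r, g, b;
};

static_assert(std::is_trivially_copyable<RGB>::value,
              "byte-wise copying is only meaningful for trivially copyable types");

// Hypothetical helper: copies the object representation of src into dst,
// one byte at a time, through unsigned char pointers.
inline void copy_bytes(RGB& dst, const RGB& src)
{
    unsigned char* d = reinterpret_cast<unsigned char*>(&dst);
    const unsigned char* s = reinterpret_cast<const unsigned char*>(&src);
    for (std::size_t i = 0; i < sizeof(RGB); ++i)
        d[i] = s[i];
}
```

After the loop, dst holds the same value as src, which is the guarantee 3.9 gives for trivially copyable types.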
Post by Edward Rosten
*reinterpret_cast<float*>(reinterpret_cast<char*>(&rgb.r) +
sizeof(float)*i)
Converting a pointer to the top of a structure to a `char*`, then
offsetting it by an offset, then casting it back to a pointer to the type
of a member, that will work, so long as the offset is the exact
`offsetof` a member of that type.

However, your code is dubious because there is no guarantee that
`sizeof(float)*i` will be the exact `offsetof` a member.

What you could do is use macro gymnastics to generate a table that can
convert from `i` to the `offsetof` a member, thus allowing you to access a
member by index rather than by name.
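
The table idea could be sketched roughly as follows. This is one possible reading of the suggestion, not code from the thread: the name rgb_offsets is hypothetical, the table is written by hand rather than generated by macros, and whether the final cast back to float* is strictly conforming is the very question under discussion.

```cpp
#include <cassert>
#include <cstddef>

struct RGB
{
    float r, g, b;

    float& operator[](int i);
};

// Hypothetical lookup table: index -> byte offset of the matching member.
// offsetof is applied to a standard-layout type here.
constexpr std::size_t rgb_offsets[3] = {
    offsetof(RGB, r), offsetof(RGB, g), offsetof(RGB, b)
};

inline float& RGB::operator[](int i)
{
    assert(i >= 0 && i < 3);
    // Step from the start of the struct to the member's exact offset,
    // then cast back to the member's type.
    char* base = reinterpret_cast<char*>(this);
    return *reinterpret_cast<float*>(base + rgb_offsets[i]);
}
```

Because every offset comes from offsetof, the sketch does not depend on the members being contiguous.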
Post by Edward Rosten
struct RGB
{
union
{
struct
{
float r, g, b;
};
float my_data[3];
};
float& operator[](int i)
{
return my_data[i];
}
};
Naturally alternating between r and my_data will involve accessing a
non-active member of the union: [class.union] (9.5)/1 says at most one
member of a union can be active, so if r has been written to, clearly
my_data is inactive. I'm not entirely sure how [basic.lval] (3.10.10.6),
applies, especially given the comment in footnote 85.
However, as far as I can tell the C++ standard simply has nothing to say
about that (it is neither explicitly allowed nor explicitly forbidden),
though the C standard does specify (ISO 9899:2011/6.2.6.1.7, 6.5.7 I
think though I'm rather less familiar with the C standard).
It *is* explicitly forbidden. You just said how: you are accessing a value
through a non-active union member. The common-initial-sequence rules don't
apply to you, because a struct is never layout compatible with an array.
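
For the goal the union was meant to serve (named members plus indexed access), one option that avoids both the union and pointer arithmetic is a table of pointers to data members. This is a general C++ technique rather than anything proposed in the thread, and the table name members is hypothetical.

```cpp
#include <cassert>

struct RGB
{
    float r, g, b;

    float& operator[](int i)
    {
        assert(i >= 0 && i < 3);
        // Hypothetical table of pointers to data members; each entry names
        // a member directly, so no address arithmetic between members occurs.
        static float RGB::* const members[3] = { &RGB::r, &RGB::g, &RGB::b };
        return this->*members[i];
    }
};
```

The cost is an extra table lookup per access, which an optimizer may or may not remove; the sketch makes no claim either way.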
Edward Rosten
2016-05-19 12:33:45 UTC
Post by Nicol Bolas
You seem to forget how a specification works. If it says that something
happens, then it must be so. If it says that something cannot happen, then
it must not be so.
The standard states:

Implementation alignment requirements might cause two adjacent members not
to be allocated immediately after each other; so might requirements for
space for managing virtual functions (10.3) and virtual base classes (10.1).

To me that is open to interpretation as to whether that's an exhaustive
list or not. If it's not, then it's probably worth removing that wording
from the clause, or amending it to state that it's a non-exhaustive list.

Nonetheless that's also irrelevant to the rest of the post, because of the
static assert. If there's padding then the code won't compile, so there's
no need to debate how it might behave (clearly it would be UB).
Post by Nicol Bolas
Irrelevant. Even if you know for a fact that two objects are allocated
adjacent to one another, you cannot use pointer arithmetic to get from one
to the other without invoking undefined behavior. Not unless those two
objects are sub-objects of an *array*.
Where is that actually stated in the standard? My argument is there seems
to be enough of a gap here that what I have written is not explicitly
forbidden.
Post by Nicol Bolas
Post by Edward Rosten
*reinterpret_cast<float*>(reinterpret_cast<char*>(&rgb.r) +
sizeof(float)*i)
Converting a pointer to the top of a structure to a `char*`, then
offsetting it by an offset, then casting it back to a pointer to the type
of a member, that will work, so long as the offset is the exact
`offsetof` a member of that type.
That's what I figured, or at least I couldn't find anything forbidding it.
Post by Nicol Bolas
However, your code is dubious because there is no guarantee that
`sizeof(float)*i` will be the exact `offsetof` a member.
Recall the static assert. It can be arranged such that the code will only
compile if there is no padding.
Post by Nicol Bolas
What you could do is use macro gymnastics to generate a table that can
convert from `i` to the `offsetof` a member, thus allowing you to access a
member by index rather than by name.
Indeed. However as far as I can see, it's only the numeric value of
offsetof that matters. If we have some oracle that gives us that number,
then we don't need to use the offsetof macro.
Post by Nicol Bolas
It *is* explicitly forbidden. You just said how: you are accessing a
value through a non-active union member. The common-initial-sequence rules
don't apply to you, because a struct is never layout compatible with an
array.
*Where* is it explicitly forbidden? As far as I can see there is no wording
actually forbidding it: at no point does it say that accessing an inactive
member of a union is undefined behaviour.


-Ed
Edward Catmur
2016-05-19 13:39:15 UTC
Post by Edward Rosten
Post by Nicol Bolas
Irrelevant. Even if you know for a fact that two objects are allocated
adjacent to one another, you cannot use pointer arithmetic to get from one
to the other without invoking undefined behavior. Not unless those two
objects are sub-objects of an *array*.
Where is that actually stated in the standard? My argument is there seems
to be enough of a gap here that what I have written is not explicitly
forbidden.
[expr.add]/4.
Post by Edward Rosten
Post by Nicol Bolas
Post by Edward Rosten
*reinterpret_cast<float*>(reinterpret_cast<char*>(&rgb.r) +
sizeof(float)*i)
Converting a pointer to the top of a structure to a `char*`, then
offsetting it by an offset, then casting it back to a pointer to the type
of a member, that will work, so long as the offset is the exact
`offsetof` a member of that type.
That's what I figured, or at least I couldn't find anything forbidding it.
Although you have to be extremely careful in how you manufacture a pointer
to the correct location, once you have it you can use it per
[basic.compound]/3.
Post by Edward Rosten
Post by Nicol Bolas
What you could do is use macro gymnastics to generate a table that can
convert from `i` to the `offsetof` a member, thus allowing you to access a
member by index rather than by name.
Indeed. However as far as I can see, it's only the numeric value of
offsetof that matters. If we have some oracle that gives us that number,
then we don't need to use the offsetof macro.
You could also check the offsets with another static_assert.
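
Such a check might look like the following sketch. It uses offsetof from <cstddef> on the plain RGB struct from the original post; the assertion messages are illustrative.

```cpp
#include <cassert>
#include <cstddef>

struct RGB
{
    float r, g, b;
};

// Compilation fails if the layout differs from a packed float[3].
static_assert(sizeof(RGB) == 3 * sizeof(float), "RGB contains padding");
static_assert(offsetof(RGB, r) == 0, "r is not at the start of RGB");
static_assert(offsetof(RGB, g) == sizeof(float), "g does not directly follow r");
static_assert(offsetof(RGB, b) == 2 * sizeof(float), "b does not directly follow g");
```

These assertions only pin down the layout; they say nothing about whether arithmetic from one member to the next is defined.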
Post by Edward Rosten
Post by Nicol Bolas
It *is* explicitly forbidden. You just said how: you are accessing a
value through a non-active union member. The common-initial-sequence rules
don't apply to you, because a struct is never layout compatible with an
array.
*Where* is it explicitly forbidden? As far as I can see there is no
wording actually forbidding it: at no point does it say that accessing an
inactive member of a union is undefined behaviour.
Access is not a problem for primitives; that's how we switch active member.
Undefined behavior (again, for primitives) occurs when you perform
lvalue-to-rvalue conversion; this is [basic.life]/7.1.
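
A minimal sketch of that distinction, under the reading given above. The type names Named and Pixel and the function name are hypothetical, and a named struct member is used instead of the anonymous struct from the original post, since anonymous structs are a compiler extension rather than standard C++.

```cpp
#include <cassert>

struct Named
{
    float r, g, b;
};

union Pixel
{
    Named named;
    float data[3];
};

// Assigns to the elements of data, then reads the same member back.
inline float write_then_read_same_member()
{
    Pixel p;
    p.data[0] = 1.0f; // assignment, not a read: this is how the active member is switched
    p.data[1] = 2.0f;
    p.data[2] = 3.0f;
    // Reading p.named.r at this point would be the lvalue-to-rvalue
    // conversion on an inactive member that the post above calls undefined.
    return p.data[0] + p.data[1] + p.data[2];
}
```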
Edward Rosten
2016-05-19 14:30:19 UTC
On Thursday, 19 May 2016 14:39:15 UTC+1, Edward Catmur wrote:

Thanks for your reply.
Post by Edward Catmur
Post by Edward Rosten
Post by Nicol Bolas
Irrelevant. Even if you know for a fact that two objects are allocated
adjacent to one another, you cannot use pointer arithmetic to get from one
to the other without invoking undefined behavior. Not unless those two
objects are sub-objects of an *array*.
Where is that actually stated in the standard? My argument is there seems
to be enough of a gap here that what I have written is not explicitly
forbidden.
[expr.add]/4.
I don't see how: that clause forbids making a pointer to >= 2 beyond the
end of an array, but doesn't say what happens when you try to dereference
one beyond the end of an array (such a pointer is explicitly allowed). The
clause doesn't mention dereferencing.
Post by Edward Catmur
Post by Edward Rosten
Post by Nicol Bolas
Post by Edward Rosten
*reinterpret_cast<float*>(reinterpret_cast<char*>(&rgb.r) +
sizeof(float)*i)
Converting a pointer to the top of a structure to a `char*`, then
offsetting it by an offset, then casting it back to a pointer to the type
of a member, that will work, so long as the offset is the exact
`offsetof` a member of that type.
That's what I figured, or at least I couldn't find anything forbidding it.
Although you have to be extremely careful in how you manufacture a pointer
to the correct location, once you have it you can use it per
[basic.compound]/3.
OK that's interesting! Thanks for the link, I wasn't familiar with that
section.

So my interpretation is now this:

1. *if* you can legally get a pointer to X, then the pointer is valid.

2. Getting a pointer to 1 beyond the end of an array is a legal way of
getting a pointer value (going 2 beyond might overflow, so it's always
counted as UB).

3. The pointer is a valid value since it represents an address in memory
and it points to an object of the correct type (there's no padding), so
it's dereferencable(?). We did not invoke UB by doing anything illegal to
pointer values.

4. If you apply 1-3 recursively, you can say &rgb.r+1 is a valid pointer to
g. If (&rgb.r+1) is a valid pointer to g then (&rgb.r+1) + 1 is a valid
pointer to b. Because in that case we're simply going 1 beyond the end of
the single element "g" array, which is explicitly allowed (if my reasoning
is correct). One possible interpretation is that &rgb.r+2 is UB, but
&rgb.r+1+1 is well defined.
Post by Edward Catmur
Post by Edward Rosten
Post by Nicol Bolas
What you could do is use macro gymnastics to generate a table that can
convert from `i` to the `offsetof` a member, thus allowing you to access a
member by index rather than by name.
Indeed. However as far as I can see, it's only the numeric value of
offsetof that matters. If we have some oracle that gives us that number,
then we don't need to use the offsetof macro.
You could also check the offsets with another static_assert.
Is that necessary? Given all the constraints imposed with the
static_assert, I can't see a way for it to be possible that in the struct
above offsetof g could be anything other than sizeof(float), and likewise
2*sizeof(float) for offsetof b.
Post by Edward Catmur
Post by Edward Rosten
Post by Nicol Bolas
It *is* explicitly forbidden. You just said how: you are accessing a
value through a non-active union member. The common-initial-sequence rules
don't apply to you, because a struct is never layout compatible with an
array.
*Where* is it explicitly forbidden? As far as I can see there is no
wording actually forbidding it: at no point does it say that accessing an
inactive member of a union is undefined behaviour.
Access is not a problem for primitives; that's how we switch active
member. Undefined behavior (again, for primitives) occurs when you perform
lvalue-to-rvalue conversion; this is [basic.life]/7.1.
Ah interesting, though if I read correctly the standard never explicitly
ties "active member" to the lifetime of an object. These are all floats, so
[basic.life]/1.3 doesn't apply and as per 1.4 the storage hasn't been
released. As for "reuse" in 1.4, the standard seems a little vague on what
the precise meaning of that is.


-Ed
'Edward Catmur' via ISO C++ Standard - Discussion
2016-05-19 16:43:58 UTC
Post by Edward Catmur
[expr.add]/4.
I don't see how: that clause forbids making a pointer to >= 2 beyond the
end of an array, but doesn't say what happens when you try to dereference
one beyond the end of an array (such a pointer is explicitly allowed). The
clause doesn't mention dereferencing.
Sorry, my mistake.
Post by Edward Catmur
Although you have to be extremely careful in how you manufacture a pointer
to the correct location, once you have it you can use it per
[basic.compound]/3.
OK that's interesting! Thanks for the link, I wasn't familiar with that
section.
1. *if* you can legally get a pointer to X, then the pointer is valid.
2. Getting a pointer to 1 beyond the end of an array is a legal way of
getting a pointer value (going 2 beyond might overflow, so it's always
counted as UB).
3. The pointer is a valid value since it represents an address in memory
and it points to an object of the correct type (there's no padding), so
it's dereferencable(?). We did not invoke UB by doing anything illegal to
pointer values.
4. If you apply 1-3 recursively, you can say &rgb.r+1 is a valid pointer
to g. If (&rgb.r+1) is a valid pointer to g then (&rgb.r+1) + 1 is a valid
pointer to b. Because in that case we're simply going 1 beyond the end of
the single element "g" array, which is explicitly allowed (if my reasoning
is correct). One possible interpretation is that &rgb.r+2 is UB, but
&rgb.r+1+1 is well defined.
I would have to agree, though I would urge caution, especially with (4) -
you don't want to get in a fight with an over-optimizing compiler, even if
you feel your interpretation is valid.
Post by Edward Catmur
You could also check the offsets with another static_assert.
Is that necessary? Given all the constraints imposed with the
static_assert, I can't see a way for it to be possible that in the struct
above offsetof g could be anything other than sizeof(float), and likewise
2*sizeof(float) for offsetof b.
Probably not. It wouldn't hurt, though.
Post by Edward Catmur
Post by Edward Rosten
Post by Nicol Bolas
It *is* explicitly forbidden. You just said how: you are accessing a
value through a non-active union member. The common-initial-sequence rules
don't apply to you, because a struct is never layout compatible with an
array.
*Where* is it explicitly forbidden? As far as I can see there is no
wording actually forbidding it: at no point does it say that accessing an
inactive member of a union is undefined behaviour.
Access is not a problem for primitives; that's how we switch active
member. Undefined behavior (again, for primitives) occurs when you perform
lvalue-to-rvalue conversion; this is [basic.life]/7.1.
Ah interesting, though if I read correctly the standard never explicitly
ties "active member" to the lifetime of an object. These are all floats, so
[basic.life]/1.3 doesn't apply and as per 1.4 the storage hasn't been
released. As for "reuse" in 1.4, the standard seems a little vague on what
the precise meaning of that is.
"Reuse" is the construction of another object in that place - either by
assignment for primitives, placement new for class types, or memcpy for
trivially copyable types. I think [basic.life]/7.2 might be relevant here;
you're aliasing a class type so accessing its NSDMs is verboten,
irrespective that the class type is trivial. An interesting if tangential
question would be whether it's allowed to alias an int[2][3] with an
int[3][2] in a union.
Richard Smith
2016-05-19 21:18:53 UTC
Post by Edward Rosten
Thanks for your reply.
Post by Edward Catmur
Post by Edward Rosten
Post by Nicol Bolas
Irrelevant. Even if you know for a fact that two objects are allocated
adjacent to one another, you cannot use pointer arithmetic to get from one
to the other without invoking undefined behavior. Not unless those two
objects are sub-objects of an *array*.
Where is that actually stated in the standard? My argument is there
seems to be enough of a gap here that what I have written is not explicitly
forbidden.
[expr.add]/4.
I don't see how: that clause forbids making a pointer to >= 2 beyond the
end of an array, but doesn't say what happens when you try to dereference
one beyond the end of an array (such a pointer is explicitly allowed). The
clause doesn't mention dereferencing.
Post by Edward Catmur
Post by Edward Rosten
Post by Nicol Bolas
Post by Edward Rosten
*reinterpret_cast<float*>(reinterpret_cast<char*>(&rgb.r) +
sizeof(float)*i)
Converting a pointer to the top of a structure to a `char*`, then
offsetting it by an offset, then casting it back to a pointer to the type
of a member, that will work, so long as the offset is the exact
`offsetof` a member of that type.
That's what I figured, or at least I couldn't find anything forbidding it.
Although you have to be extremely careful in how you manufacture a
pointer to the correct location, once you have it you can use it per
[basic.compound]/3.
OK that's interesting! Thanks for the link, I wasn't familiar with that
section.
Beware, that wording is defective, and we're in the process of fixing it.
Here's the draft wording that we're currently working on in CWG:
https://htmlpreview.github.io/?https://raw.github.com/zygoloid/wg21papers/blob/master/wip/d0137r1.html
Post by Edward Rosten
1. *if* you can legally get a pointer to X, then the pointer is valid.
That will no longer be true once we fix the wording.
Post by Edward Rosten
2. Getting a pointer to 1 beyond the end of an array is a legal way of
getting a pointer value (going 2 beyond might overflow, so it's always
counted as UB).
3. The pointer is a valid value since it represents an address in memory
and it points to an object of the correct type (there's no padding), so
it's dereferencable(?). We did not invoke UB by doing anything illegal to
pointer values.
4. If you apply 1-3 recursively, you can say &rgb.r+1 is a valid pointer
to g. If (&rgb.r+1) is a valid pointer to g then (&rgb.r+1) + 1 is a valid
pointer to b. Because in that case we're simply going 1 beyond the end of
the single element "g" array, which is explicitly allowed (if my reasoning
is correct). One possible interpretation is that &rgb.r+2 is UB, but
&rgb.r+1+1 is well defined.
Post by Edward Catmur
Post by Edward Rosten
Post by Nicol Bolas
What you could do is use macro gymnastics to generate a table that can
convert from `i` to the `offsetof` a member, thus allowing you to access a
member by index rather than by name.
Indeed. However as far as I can see, it's only the numeric value of
offsetof that matters. If we have some oracle that gives us that number,
then we don't need to use the offsetof macro.
You could also check the offsets with another static_assert.
Is that necessary? Given all the constraints imposed with the
static_assert, I can't see a way for it to be possible that in the struct
above offsetof g could be anything other than sizeof(float), and likewise
2*sizeof(float) for offsetof b.
Post by Edward Catmur
Post by Edward Rosten
Post by Nicol Bolas
It *is* explicitly forbidden. You just said how: you are accessing a
value through a non-active union member. The common-initial-sequence rules
don't apply to you, because a struct is never layout compatible with an
array.
*Where* is it explicitly forbidden? As far as I can see there is no
wording actually forbidding it: at no point does it say that accessing an
inactive member of a union is undefined behaviour.
Access is not a problem for primitives; that's how we switch active
member. Undefined behavior (again, for primitives) occurs when you perform
lvalue-to-rvalue conversion; this is [basic.life]/7.1.
Ah interesting, though if I read correctly the standard never explicitly
ties "active member" to the lifetime of an object. These are all floats, so
[basic.life]/1.3 doesn't apply and as per 1.4 the storage hasn't been
released. As for "reuse" in 1.4, the standard seems a little vague on what
the precise meaning of that is.
-Ed
Edward Rosten
2016-05-20 07:54:31 UTC
Although you have to be extremely careful in how you manufacture a pointer to the correct location, once you have it you can use it per [basic.compound]/3.
OK that's interesting! Thanks for the link, I wasn't familiar with that section.
Beware, that wording is defective, and we're in the process of fixing it. Here's the draft wording that we're currently working on in CWG: https://htmlpreview.github.io/?https://raw.github.com/zygoloid/wg21papers/blob/master/wip/d0137r1.html
I see: you've effectively inverted the meaning by adding a "not" in
there. I can see the rationale because the current wording is poor and
leads to some entertaining conclusions about the validity of 1+1
versus 2.

That said, I do wonder if this is the correct fix at least in
isolation. I have seen a fair amount of code in the wild which assumes
that arrays of structs with one member type T (and no padding) can be
accessed via a T*. I wouldn't call that non-conformant since I'd say
the standard isn't exactly crystal clear on the matter currently.

It still allows of course *(float*)(sizeof(float)*index +
(char*)&some_instance_of_rgb), provided you don't escape the chunk of
storage that some_instance happens to sit in.

I'm now also less sure about the invalidity of
((float*)(char*)&instance_of_rgb)[index].

I also noticed that you've explicitly disallowed aliasing via a union
in [basic.life]/1, though I'd probably switch to "active" from
"initialized" in that paragraph for consistency with the wording on
union. If you don't switch, for example if you initialize one member
of a union (that member is active) and then initialize another member,
if I'm really pedantic (I am), then I could argue that you've
initialized both.

However, before you change the wording, could you explain the
rationale for the change? Obviously the wording is unclear and you
want to fix that. However, with this fix you're explicitly breaking
compatibility with C code, so I could see this potentially breaking
code which makes use of C-based headers. I think the change also goes
against common practice, since many compilers take the C language
interpretation in the absence of a definitive word from the C++
standard, and also because the C and C++ compilers tend to share a lot
of code.

In addition, it still leaves open a hole if you memcpy into the union
via a pointer to the union, not a pointer to one of the members. If the
wording does not allow for aliasing or multiple active/initialized
members, then it's ambiguous which member you could legally use as an
rvalue and which you couldn't after a memcpy.

Personally, I'd advocate going the other way and making aliasing
allowed, partly because I can't think of a sensible way of plugging
the memcpy hole. For types with vacuous initialization (I love that
term!) it could be made explicitly defined behaviour if two objects of
the same type exist at the same offset. For everything else, e.g. a
float aliasing a uint32, the behaviour could be implementation
defined.
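
For comparison, the float/uint32 case can already be handled without any union by copying bytes. A minimal sketch, assuming float is 32 bits wide (checked by the static_assert); float_bits is a hypothetical helper name, and the value it returns depends on the implementation's float representation.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Hypothetical helper: return the object representation of a float as a
// uint32_t by copying bytes, with no union and no cast pointer involved.
inline std::uint32_t float_bits(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```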


-Ed
1. *if* you can legally get a pointer to X, then the pointer is valid.
That will no longer be true once we fix the wording.
2. Getting a pointer to 1 beyond the end of an array is a legal way of getting a pointer value (going 2 beyond might overflow, so it's always counted as UB).
3. The pointer is a valid value since it represents an address in memory and it points to an object of the correct type (there's no padding), so it's dereferenceable(?). We did not invoke UB by doing anything illegal to pointer values.
4. If you apply 1-3 recursively, you can say &rgb.r+1 is a valid pointer to g. If (&rgb.r+1) is a valid pointer to g then (&rgb.r+1) + 1 is a valid pointer to b, because in that case we're simply going 1 beyond the end of the single-element "g" array, which is explicitly allowed (if my reasoning is correct). One possible interpretation is that &rgb.r+2 is UB, but &rgb.r+1+1 is well defined.
What you could do is use macro gymnastics to generate a table that can convert from `i` to the `offsetof` a member, thus allowing you to access a member by index rather than by name.
Indeed. However as far as I can see, it's only the numeric value of offsetof that matters. If we have some oracle that gives us that number, then we don't need to use the offsetof macro.
You could also check the offsets with another static_assert.
Is that necessary? Given all the constraints imposed with the static_assert, I can't see a way for it to be possible that in the struct above offsetof g could be anything other than sizeof(float), and likewise 2*sizeof(float) for offsetof b.
It is explicitly forbidden. You just said how: you are accessing a value through a non-active union member. The common-initial-sequence rules don't apply to you, because a struct is never layout compatible with an array.
*Where* is it explicitly forbidden? As far as I can see there is no wording actually forbidding it: at no point does it say that accessing an inactive member of a union is undefined behaviour.
Access is not a problem for primitives; that's how we switch active member. Undefined behavior (again, for primitives) occurs when you perform lvalue-to-rvalue conversion; this is [basic.life]/7.1.
Ah interesting, though if I read correctly the standard never explicitly ties "active member" to the lifetime of an object. These are all floats, so [basic.life]/1.3 doesn't apply and as per 1.4 the storage hasn't been released. As for "reuse" in 1.4, the standard seems a little vague on what the precise meaning of that is.
-Ed
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
'Edward Catmur' via ISO C++ Standard - Discussion
2016-05-20 09:55:13 UTC
Permalink
Post by Edward Catmur
Post by Richard Smith
Post by Edward Rosten
Post by Edward Catmur
Although you have to be extremely careful in how you manufacture a
pointer to the correct location, once you have it you can use it per
[basic.compound]/3.
Post by Richard Smith
Post by Edward Rosten
OK that's interesting! Thanks for the link, I wasn't familiar with that
section.
Post by Richard Smith
Beware, that wording is defective, and we're in the process of fixing it.
Here's the draft wording that we're currently working on in CWG:
https://htmlpreview.github.io/?https://raw.github.com/zygoloid/wg21papers/blob/master/wip/d0137r1.html
I see: you've effectively inverted the meaning by adding a "not" in
there. I can see the rationale because the current wording is poor and
leads to some entertaining conclusions about the validity of 1+1
versus 2.
That said, I do wonder if this is the correct fix at least in
isolation. I have seen a fair amount of code in the wild which assumes
that arrays of structs with one member type T (and no padding) can be
accessed via a T*. I wouldn't call that non-conformant since I'd say
the standard isn't exactly crystal clear on the matter currently.
It still allows of course *(float*)(sizeof(float)*index +
(char*)&some_instance_of_rgb), provided you don't escape the chunk of
storage that some_instance happens to sit in.
I'm now also less sure about the invalidity of
((float*)(char*)&instance_of_rgb)[index].
My understanding is that you'd use std::launder (see "Library wording").
Post by Edward Catmur
In addition, it still leaves open a hole if you memcpy into the union
via a pointer to the union, not a pointer to one of the members. If the
wording does not allow for aliasing or multiple active/initialized
members, then it's ambiguous which member you could legally use as an
rvalue and which you couldn't after a memcpy.
It's the active member of the union you performed the memcpy from, I would
hope. If on the other hand you're performing a memcpy from an object of
type one of the data members, I don't see that that would change the active
member.
Edward Rosten
2016-05-20 10:16:35 UTC
Permalink
Post by Edward Rosten
I'm now also less sure about the invalidity of
((float*)(char*)&instance_of_rgb)[index].
My understanding is that you'd use std::launder (see "Library wording").
That's good: it appears to provide a standard mechanism for what people
sometimes want to do. I also didn't know about that until today, so thanks!
However, I'm still not sure either way if the expression I wrote is
undefined or not, regardless of the good sense of using it.
Post by Edward Rosten
It's the active member of the union you performed the memcpy from, I would
hope. If on the other hand you're performing a memcpy from an object of
type one of the data members, I don't see that that would change the active
member.
Take for example:

union {
uint32_t a;
float b;
} u;

memset(&u, 0, sizeof(u));

or a memcpy equivalent to that.


-Ed
'Edward Catmur' via ISO C++ Standard - Discussion
2016-05-20 10:57:52 UTC
Permalink
Post by Edward Rosten
Post by Edward Rosten
I'm now also less sure about the invalidity of
((float*)(char*)&instance_of_rgb)[index].
My understanding is that you'd use std::launder (see "Library wording").
That's good: it appears to provide a standard mechanism for what people
sometimes want to do. I also didn't know about that until today, so thanks!
However, I'm still not sure either way if the expression I wrote is
undefined or not, regardless of the good sense of using it.
I think it's undefined, because (float*)(char*)&instance_of_rgb does not
point to an element of an array of type float (nor to an object of type
float). You'd have to write
std::launder(std::launder(std::launder((float*)(char*)&instance_of_rgb) +
1) [...] + 1), or more sensibly
std::launder((float*)(((char*)&instance_of_rgb) + (index * sizeof(float)))).
Actually, you might need to launder the char* preparatory to performing
pointer arithmetic on it; I'm not sure.
Post by Edward Rosten
It's the active member of the union you performed the memcpy from, I would
hope. If on the other hand you're performing a memcpy from an object of
type one of the data members, I don't see that that would change the active
member.
union {
uint32_t a;
float b;
} u;
memset(&u, 0, sizeof(u));
or a memcpy equivalent to that.
u.a is active before the memset, and continues to be active afterward, so
you can read u.a but not u.b.
Edward Rosten
2016-05-20 11:26:00 UTC
Permalink
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Post by Edward Rosten
Post by Edward Rosten
I'm now also less sure about the invalidity of
((float*)(char*)&instance_of_rgb)[index].
My understanding is that you'd use std::launder (see "Library wording").
That's good: it appears to provide a standard mechanism for what people
sometimes want to do. I also didn't know about that until today, so thanks!
However, I'm still not sure either way if the expression I wrote is
undefined or not, regardless of the good sense of using it.
I think it's undefined, because (float*)(char*)&instance_of_rgb does not
point to an element of an array of type float (nor to an object of type
float).
I thought it was allowed to cast from pointer-to-struct to
pointer-to-first-member provided it's standard-layout (?). I was sure that
was a rule, but I've not checked in detail so I'll have to go read the
relevant section.
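
A minimal sketch of the rule I have in mind, assuming RGB is standard-layout (checked by the static_assert); first_member is a hypothetical helper name. It only covers getting at the first member, and says nothing about whether arithmetic on the resulting pointer can reach g or b.

```cpp
#include <type_traits>

struct RGB
{
    float r, g, b;
};

static_assert(std::is_standard_layout<RGB>::value,
              "the first-member conversion relies on standard layout");

// Hypothetical helper: convert a pointer to the struct into a pointer to
// its first non-static data member.
inline float* first_member(RGB& rgb)
{
    return reinterpret_cast<float*>(&rgb);
}
```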
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
You'd have to write
std::launder(std::launder(std::launder((float*)(char*)&instance_of_rgb) +
1) [...] + 1), or more sensibly
std::launder((float*)(((char*)&instance_of_rgb) + (index * sizeof(float)))).
Actually, you might need to launder the char* preparatory to performing
pointer arithmetic on it; I'm not sure.
Could you do:

//Not sure if the laundering is correct here
float* f = (float*)std::launder((char*)&instance_of_rgb);
f[1]; //Does this work if the previous line is correct?
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Post by Edward Rosten
It's the active member of the union you performed the memcpy from, I
would hope. If on the other hand you're performing a memcpy from an object
of type one of the data members, I don't see that that would change the
active member.
union {
uint32_t a;
float b;
} u;
memset(&u, 0, sizeof(u));
or a memcpy equivalent to that.
u.a is active before the memset, and continues to be active afterward, so
you can read u.a but not u.b.
I'm not sure I agree: I can't find any wording that indicates that the
first member's lifetime has begun or is active. But on the subjects of
unions...

A thought occurred with regards to the common initial sequence. Consider:

struct A
{
float f;
};

struct B
{
float f[1];
};

union U
{
A a;
B b;
} u;

I think that u.a.f and u.b.f[0] can alias as they share a common initial
sequence since it could be interpreted that scalars are equivalent to
arrays of length 1 (footnote 85 in [expr.add]).


Given that the new wording on padding possibly precludes padding in the
following case:

class D
{
float f, g;
};

since it only lists virtual stuff and alignment as reasons for padding,
then it might be worth tweaking the definition of "layout compatible" to
cover aliasing D with float[2] in a union (and tightening up the wording
about padding).

-Ed
'Edward Catmur' via ISO C++ Standard - Discussion
2016-05-20 11:51:50 UTC
Permalink
Post by Edward Rosten
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Post by Edward Rosten
Post by Edward Rosten
I'm now also less sure about the invalidity of
((float*)(char*)&instance_of_rgb)[index].
My understanding is that you'd use std::launder (see "Library wording").
That's good: it appears to provide a standard mechanism for what people
sometimes want to do. I also didn't know about that until today, so thanks!
However, I'm still not sure either way if the expression I wrote is
undefined or not, regardless of the good sense of using it.
I think it's undefined, because (float*)(char*)&instance_of_rgb does not
point to an element of an array of type float (nor to an object of type
float).
I thought it was allowed to cast from pointer-to-struct to
pointer-to-first-member provided it's standard-layout (?). I was sure that
was a rule, but I've not checked in detail so I'll have to go read the
relevant section.
Oops, you're absolutely right. (This is "pointer interconvertibility" in
the new language.) But that still only allows you to add 1 to get the
past-the-end pointer for the first float NSDM.
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
You'd have to write
std::launder(std::launder(std::launder((float*)(char*)&instance_of_rgb) +
1) [...] + 1), or more sensibly
std::launder((float*)(((char*)&instance_of_rgb) + (index * sizeof(float)))).
Actually, you might need to launder the char* preparatory to performing
pointer arithmetic on it; I'm not sure.
//Not sure if the laundering is correct here
float* f = (float*)std::launder((char*)&instance_of_rgb);
f[1]; //Does this work if the previous line is correct?
Nope, because f + 1 does not point to an object of type float. You need
*std::launder(f + 1).

Post by Edward Rosten
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
It's the active member of the union you performed the memcpy from, I would
hope. If on the other hand you're performing a memcpy from an object of
type one of the data members, I don't see that that would change the active
member.
union {
uint32_t a;
float b;
} u;
memset(&u, 0, sizeof(u));
or a memcpy equivalent to that.
u.a is active before the memset, and continues to be active afterward,
so you can read u.a but not u.b.
I'm not sure I agree: I can't find any wording that indicates that the
first member's lifetime has begun or is active. But on the subjects of
unions...
I assumed u was being zero-initialized ([dcl.init]/6); that would depend on
its scope. If it is default-initialized there is no active member, so no
member can be read.
Post by Edward Rosten
struct A
{
float f;
};
struct B
{
float f[1];
};
union U
{
A a;
B b;
} u;
I think that u.a.f and u.b.f[0] can alias as they share a common initial
sequence since it could be interpreted that scalars are equivalent to
arrays of length 1 (footnote 85 in [expr.add]).
The single-element array rule (introduced in [expr.unary.op]/3) holds only
for pointer arithmetic and comparison.
Post by Edward Rosten
Given that the new wording on padding possibly precludes padding in the
following case:
class D
{
float f, g;
};
since it only lists virtual stuff and alignment as reasons for padding,
then it might be worth tweaking the definition of "layout compatible" to
cover aliasing D with float[2] in a union (and tightening up the wording
about padding).
I think it's probably on purpose that the layout compatibility rules do not
mention arrays. Precluding aliasing between arrays of different sizes and
between arrays and scalars is worth a lot for performance.
Edward Rosten
2016-05-20 15:48:01 UTC
Permalink
On 20 May 2016 at 12:51, 'Edward Catmur' via ISO C++ Standard - Discussion
u.a is active before the memset, and continues to be active afterward, so you can read u.a but not u.b.
I'm not sure I agree: I can't find any wording that indicates that the first member's lifetime has begun or is active. But on the subjects of unions...
I assumed u was being zero-initialized ([dcl.init]/6); that would depend on its scope. If it is default-initialized there is no active member, so no member can be read.
Yes, I was thinking of default initializing it, or specifically
something like the following code:

union U
{
uint32_t a;
float f;
} u1, u2;

//u1, u2 are default initialized here, so AFAICT neither member is active.

u1.a = 0xdeadbeef;
memcpy(&u2, &u1, sizeof(U));

At that point it appears that it ought to be legal to read u2.a, but
with the new wording about lifetime, it isn't because the life of u2.a
has not begun. In fact it gets stranger, according to the wording, it
would appear that:

U u3, u4; //as above
u3.a = 0xdeadbeef;
u4.f = 1.0;
memcpy(&u4, &u3, sizeof(U));

At this point, u4.f is still alive/active, and u4.a isn't, so it's UB
to read u4.a, and something (implementation defined?) to read u4.f.

To me, that doesn't make much sense, behaviour wise, so I think it's
either an unintended consequence of the current proposed wording (in
d0137) or I've misread it!

I suspect that the change in wording will break any code that dumps
trivially copyable things containing unions to a file and then reads
them back. I believe that's legal behaviour (or neither forbidden nor
allowed in the current standard). With the new wording, the union member
would have to be assigned to or placement new'd before the read in
order to activate it/begin its lifetime for the code to not be UB.
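
To make that concrete, here is a minimal sketch of what reading such a union back from raw bytes might have to look like under that reading: copy the bytes into a temporary of the member's type, then assign to the member. U mirrors the union above; load_a is a hypothetical helper name, and whether the assignment is strictly required is exactly the open question.

```cpp
#include <cstdint>
#include <cstring>

union U
{
    std::uint32_t a;
    float f;
};

// Hypothetical helper: restore u.a from a byte buffer holding at least
// sizeof(std::uint32_t) bytes. Assigning to u.a (instead of memcpy-ing
// into the union as a whole) is what makes `a` the active member.
inline void load_a(U& u, const unsigned char* bytes)
{
    std::uint32_t tmp;
    std::memcpy(&tmp, bytes, sizeof tmp);
    u.a = tmp;
}
```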


I'm going to go out on a limb and claim that there are too many edge
cases with unions to outright ban aliasing via them without
introducing some very strange unintended consequences.
I think that u.a.f and u.b.f[0] can alias as they share a common initial sequence since it could be interpreted that scalars are equivalent to arrays of length 1 (footnote 85 in [expr.add]).
The single-element array rule (introduced in [expr.unary.op]/3) holds only for pointer arithmetic and comparison.
I was thinking of expr.add/4 footnote 84, but yes, I think your
interpretation is correct and mine was not.
class D
{
float f, g;
};
since it only lists virtual stuff and alignment as reasons for padding, then it might be worth tweaking the definition of "layout compatible" to cover aliasing D with float[2] in a union (and tightening up the wording about padding).
I think it's probably on purpose that the layout compatibility rules do not mention arrays. Precluding aliasing between arrays of different sizes and between arrays and scalars is worth a lot for performance.
I think my brain flamed out for a minute there. Yes, that's absolutely
right. I meant to say "common initial sequence", not "layout
compatible".

-Ed
'Edward Catmur' via ISO C++ Standard - Discussion
2016-05-20 16:12:26 UTC
Permalink
Post by Edward Rosten
On 20 May 2016 at 12:51, 'Edward Catmur' via ISO C++ Standard - Discussion
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Post by Edward Rosten
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
u.a is active before the memset, and continues to be active
afterward, so you can read u.a but not u.b.
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Post by Edward Rosten
I'm not sure I agree: I can't find any wording that indicates that the
first member's lifetime has begun or is active. But on the subjects of
unions...
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
I assumed u was being zero-initialized ([dcl.init]/6); that would depend
on its scope. If it is default-initialized there is no active member, so no
member can be read.
Yes, I was thinking of default initializing it, or specifically
something like the following code:
union U
{
uint32_t a;
float f;
} u1, u2;
//u1, u2 are default initialized here, so AFAICT neither member is active.
u1.a = 0xdeadbeef;
memcpy(&u2, &u1, sizeof(U));
At that point it appears that it ought to be legal to read u2.a, but
with the new wording about lifetime, it isn't because the life of u2.a
has not begun. In fact it gets stranger, according to the wording, it
would appear that:
U u3, u4; //as above
u3.a = 0xdeadbeef;
u4.f = 1.0;
memcpy(&u4, &u3, sizeof(U));
At this point, u4.f is still alive/active, and u4.a isn't, so it's UB
to read u4.a, and something (implementation defined?) to read u4.f.
To me, that doesn't make much sense, behaviour wise, so I think it's
either an unintended consequence of the current proposed wording (in
d0137) or I've misread it!
I suspect that the change in wording will break any code that dumps
trivially copyable things containing unions to a file and then reads
them back. I believe that's legal behaviour (or neither forbidden nor
allowed in the current standard). With the new wording, the union member
would have to be assigned to or placement new'd before the read in
order to activate it/begin its lifetime for the code to not be UB.
Yes, I agree with your analysis of d0137r1. Fortunately it's still a draft
at this point.

Richard, would you agree that this is an unintended consequence of the
current proposed wording? I think it could be patched by appending to
[basic.types]/2 something like "and, for each subobject of union type, the
active member of that union shall subsequently be the original active
member, if any", and similarly for [basic.types]/3 "and, for each subobject
of obj2 of union type, the active member of that union shall subsequently
be the active member of the corresponding subobject of obj1, if any". Then
[basic.life]/1 could refer forward to [basic.types].
Richard Smith
2016-05-20 21:54:16 UTC
Permalink
Post by Edward Catmur
Post by Richard Smith
Post by Edward Rosten
Post by Edward Catmur
Although you have to be extremely careful in how you manufacture a
pointer to the correct location, once you have it you can use it per
[basic.compound]/3.
Post by Richard Smith
Post by Edward Rosten
OK that's interesting! Thanks for the link, I wasn't familiar with that
section.
Post by Richard Smith
Beware, that wording is defective, and we're in the process of fixing it.
Here's the draft wording that we're currently working on in CWG:
https://htmlpreview.github.io/?https://raw.github.com/zygoloid/wg21papers/blob/master/wip/d0137r1.html
I see: you've effectively inverted the meaning by adding a "not" in
there. I can see the rationale because the current wording is poor and
leads to some entertaining conclusions about the validity of 1+1
versus 2.
That said, I do wonder if this is the correct fix at least in
isolation. I have seen a fair amount of code in the wild which assumes
that arrays of structs with one member type T (and no padding) can be
accessed via a T*. I wouldn't call that non-conformant since I'd say
the standard isn't exactly crystal clear on the matter currently.
That is still allowed for a standard-layout struct (for non-standard-layout
structs, you don't have any guarantee that the struct and its first member
share the same address) -- reinterpret_cast between a pointer to the type
and a pointer to its first member is required to do the right thing (as is
static_cast via void*).

Post by Edward Catmur
It still allows of course *(float*)(sizeof(float)*index +
(char*)&some_instance_of_rgb), provided you don't escape the chunk of
storage that some_instance happens to sit in.
I'm now also less sure about the invalidity of
((float*)(char*)&instance_of_rgb)[index].
I also noticed that you've explicitly disallowed aliasing via a union
in [basic.life]/1, though I'd probably switch to "active" from
"initialized" in that paragraph for consistency with the wording on
union.
That would break the wording. "active" is defined in terms of lifetime; we
cannot define the start of the lifetime in terms of a member being active.
Post by Edward Catmur
If you don't switch, for example if you initialize one member
of a union (that member is active) and then initialize another member,
if I'm really pedantic (I am), then I could argue that you've
initialized both.
I'll join you in some pedantry. There's no way to initialize another
member. The only way to initialize a union member is as part of
initializing the union object. What you *can* do is to start the lifetime
of a new object that overlays the storage of some other union member. When
you do so, the name of the union member can (usually) then be used to
denote the new object, but the original union member and the new object are
distinct objects.
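
A minimal sketch of that distinction, using placement new to start the lifetime of a new float in the storage of a union member. U mirrors the union from earlier in the thread; switch_to_float is a hypothetical helper name.

```cpp
#include <cstdint>
#include <new>

union U
{
    std::uint32_t a;
    float b;
};

// Hypothetical helper: create a new float object in the storage that u.b
// names, then read it back through the member name.
inline float switch_to_float(U& u, float value)
{
    ::new (static_cast<void*>(&u.b)) float(value); // starts the lifetime of a new float
    return u.b; // the member name now denotes the new object
}
```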

Post by Edward Catmur
However, before you change the wording, could you explain the
rationale for the change? Obviously the wording is unclear and you
want to fix that; however, with this fix you're explicitly breaking
compatibility with C code, so I could see this potentially breaking
code which makes use of C-based headers. I think the change also goes
against common practice since many compilers take the C language
interpretation in the absence of a definitive word from the C++
standard and also because the C and C++ compilers tend to share a lot
of code.
The C language does not have a clear and unambiguous rule for union member
lifetime. In fact, there are at least three different and incompatible
rules, and different parts of different editions of the C standard contain
fragments of them. And in any case, the C++ language has, and always had,
significantly stronger type safety rules than C.

Also, you should read through the current wording of [basic.life]. C++
already has a notion that pointers point to objects rather than merely
storage (other than some bad wording that was introduced in wg21.link/cwg73
as a way of specifying pointer equality, which had some rather problematic
side-effects on the aliasing rules and type safety).

And finally, I'd note that several compilers actually use the C++ aliasing
rules even in C. (C's effective type rules don't allow most of the more
sophisticated subobject-based alias analysis that modern compilers like to
do, except in cases where the compiler can track the lvalue back to the
original object declaration that created it).

Post by Edward Catmur
In addition, it still leaves open a hole if you memcpy into the union
via a pointer to the union, not a pointer to one of the members. If the
wording does not allow for aliasing or multiple active/initialized
members, then it's ambiguous which member you could legally use as an
rvalue and which you couldn't after a memcpy.
This should be considered in the context of wg21.link/n3751. I agree it
ought to work, but due to the lack of any sound definition of the active
member of a union in the current standard, I'm not convinced this is a
regression. It seems like we really want memcpy and malloc to implicitly
create (trivially-constructible) objects of the type of the programmer's
choosing. (Put another way, the rule I'm suggesting is: if you can come up
with a list of objects that could have been created by the malloc / memcpy,
and creating that list of objects would have given the program defined
behavior, then the program has defined behavior. You could imagine this as
if the implementations of these functions look into the future, see what
objects would be needed, and then execute the relevant placement new
expressions to create them. This results in something very much like C's
effective type rules.)
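
For illustration, this is the explicit form of what the suggested rule would have malloc do implicitly: obtain raw storage, then create the object with a placement new expression. It is a minimal sketch, not proposed wording; make_int is a hypothetical helper name.

```cpp
#include <cstdlib>
#include <new>

// Hypothetical helper: allocate raw storage with malloc and explicitly
// create an int object in it. Returns nullptr if the allocation fails.
inline int* make_int(int value)
{
    void* storage = std::malloc(sizeof(int));
    if (storage == nullptr)
        return nullptr;
    return ::new (storage) int(value); // object creation made explicit
}
```

The caller releases the storage with std::free; int is trivially destructible, so no destructor call is needed first.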

Post by Edward Catmur
Personally, I'd advocate going the other way and making aliasing
allowed, partly because I can't think of a sensible way of plugging
the memcpy hole. For types with vacuous initialization (I love that
term!) it could be made explicitly defined behaviour if two objects of
the same type exist at the same offset. For everything else, e.g. a
float aliasing a uint32, the behaviour could be implementation
defined.
I think it's unlikely that you will succeed in persuading the people who
want something like the current rules for type safety and higher-level
language semantics, and the people who want something like the current
rules for type-based alias analysis and related optimization techniques,
that this is a good idea.

-Ed
Post by Edward Catmur
Post by Richard Smith
Post by Edward Rosten
1. *if* you can legally get a pointer to X, then the pointer is valid.
That will no longer be true once we fix the wording.
Post by Edward Rosten
2. Getting a pointer to 1 beyond the end of an array is a legal way of
getting a pointer value (going 2 beyond might overflow, so it's always
counted as UB).
Post by Richard Smith
Post by Edward Rosten
3. The pointer is a valid value since it represents an address in
memory and it points to an object of the correct type (there's no padding),
so it's dereferenceable(?). We did not invoke UB by doing anything illegal
to pointer values.
Post by Richard Smith
Post by Edward Rosten
4. If you apply 1-3 recursively, you can say &rgb.r+1 is a valid
pointer to g. If (&rgb.r+1) is a valid pointer to g then (&rgb.r+1) + 1 is
a valid pointer to b. Because in that case we're simply going 1 beyond the
end of the single element "g" array, which is explicitly allowed (if my
reasoning is correct). One possible interpretation is that &rgb.r+2 is UB,
but &rgb.r+1+1 is well defined.
Post by Richard Smith
Post by Edward Rosten
Post by Edward Catmur
Post by Edward Rosten
Post by Nicol Bolas
What you could do is use macro gymnastics to generate a table that
can convert from `i` to the `offsetof` a member, thus allowing you to
access a member by index rather than by name.
Post by Richard Smith
Post by Edward Rosten
Post by Edward Catmur
Post by Edward Rosten
Indeed. However as far as I can see, it's only the numeric value of
offsetof that matters. If we have some oracle that gives us that number,
then we don't need to use the offsetof macro.
Post by Richard Smith
Post by Edward Rosten
Post by Edward Catmur
You could also check the offsets with another static_assert.
Is that necessary? Given all the constraints imposed with the
static_assert, I can't see a way for it to be possible that in the struct
above offsetof g could be anything other than sizeof(float), and likewise
2*sizeof(float) for offsetof b.
Post by Richard Smith
Post by Edward Rosten
Post by Edward Catmur
Post by Edward Rosten
Post by Nicol Bolas
It is explicitly forbidden. You just said how: you are accessing a
value through a non-active union member. The common-initial-sequence rules
don't apply to you, because a struct is never layout compatible with an
array.
Post by Richard Smith
Post by Edward Rosten
Post by Edward Catmur
Post by Edward Rosten
*Where* is it explicitly forbidden? As far as I can see there is no
wording actually forbidding it: at no point does it say that accessing an
inactive member of a union is undefined behaviour.
Post by Richard Smith
Post by Edward Rosten
Post by Edward Catmur
Access is not a problem for primitives; that's how we switch active
member. Undefined behavior (again, for primitives) occurs when you perform
lvalue-to-rvalue conversion; this is [basic.life]/7.1.
Post by Richard Smith
Post by Edward Rosten
Ah interesting, though if I read correctly the standard never
explicitly ties "active member" to the lifetime of an object. These are all
floats, so [basic.life]/1.3 doesn't apply and as per 1.4 the storage hasn't
been released. As for "reuse" in 1.4, the standard seems a little vague on
what the precise meaning of that is.
Post by Richard Smith
Post by Edward Rosten
-Ed
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
Edward Rosten
2016-05-23 10:50:54 UTC
Permalink
Post by Richard Smith
Post by Edward Rosten
That said, I do wonder if this is the correct fix at least in
isolation. I have seen a fair amount of code in the wild which assumes
that arrays of structs with one member type T (and no padding) can be
accessed via a T*. I wouldn't call that non-conformant since I'd say
the standard isn't exactly crystal clear on the matter currently.
That is still allowed for a standard-layout struct (for non-standard-layout
structs, you don't have any guarantee that the struct and its first member
share the same address) -- reinterpret_cast between a pointer to the type
and a pointer to its first member is required to do the right thing (as is
static_cast via void*).
OK: I'm only talking about standard layout structs here. I'm also
referring to accessing members other than the first one. If you're
also referring to that can you say why you think it's allowed, because
I'm not sure from the standard.
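
A minimal sketch of the first-member guarantee being described, assuming the RGB struct from the original post. The helper names first_member and first_member_demo are made up for illustration. It only covers the first member; whether g and b can be reached the same way is the open question here.

```cpp
#include <cassert>
#include <type_traits>

struct RGB { float r, g, b; };

// The first-member guarantee only applies to standard-layout classes.
static_assert(std::is_standard_layout<RGB>::value, "RGB must be standard-layout");

// Hypothetical helper: converts a pointer to the struct into a pointer to
// its first member.
inline float* first_member(RGB* p)
{
    return reinterpret_cast<float*>(p);
}

// Small self-check: the converted pointer compares equal to &rgb.r and
// reads the value stored in r.
inline bool first_member_demo()
{
    RGB rgb{1.0f, 2.0f, 3.0f};
    float* p = first_member(&rgb);
    assert(p == &rgb.r);
    return *p == 1.0f;
}
```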
Post by Richard Smith
Post by Edward Rosten
I also noticed that you've explicitly disallowed aliasing via a union
in [basic.life]/1, though I'd probably switch to "active" from
"initialized" in that paragraph for consistency with the wording on
union.
That would break the wording. "active" is defined in terms of lifetime; we
cannot define the start of the lifetime in terms of a member being active.
No it isn't. There's no wording which links "active" to lifetime directly.
Post by Richard Smith
Post by Edward Rosten
If you don't switch, for example if you initialize one member
of a union (that member is active) and then initialize another member,
if I'm really pedantic (I am), then I could argue that you've
initialized both.
I'll join you in some pedantry.
That's good: the standard is essentially an exercise in pedantry.
Post by Richard Smith
There's no way to initialize another member.
The only way to initialize a union member is as part of initializing the
union object. What you *can* do is to start the lifetime of a new object
that overlays the storage of some other union member. When you do so, the
name of the union member can (usually) then be used to denote the new
object, but the original union member and the new object are distinct
objects.
The only problem is that [class.union] doesn't define active in terms of
lifetime. And I think nor should it, because I think that would lead
to some very strange consequences.
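
To make the "new object overlaying the storage" description concrete, here is a rough sketch using placement new, which starts a lifetime under any reading of the wording. The union U and the helper names are assumptions for illustration only.

```cpp
#include <cassert>
#include <new>

union U { int n; float f; };

// Hypothetical helper: starts the lifetime of a new float object in the
// storage that u.n occupied, then reads it back through the pointer that
// placement new returned.
inline float switch_to_float(U& u, float value)
{
    float* p = ::new (static_cast<void*>(&u.f)) float(value);
    return *p;
}

inline bool union_lifetime_demo()
{
    U u = {1};            // u.n is initialized as part of initializing the union
    assert(u.n == 1);
    // After the placement new, u.f names the new float object.
    return switch_to_float(u, 2.5f) == 2.5f && u.f == 2.5f;
}
```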
Post by Richard Smith
Post by Edward Rosten
However, before you change the wording, could you explain the
rationale for the change? Obviously the wording is unclear and you
want to fix that; however, with this fix you're explicitly breaking
compatibility with C code, so I could see this potentially breaking
code which makes use of C based headers. I think the change also goes
against common practice since many compilers take the C language
interpretation in the absence of a definitive word from the C++
standard and also because the C and C++ compilers tend to share a lot
of code.
The C language does not have a clear and unambiguous rule for union member
lifetime. In fact, there are at least three different and incompatible
rules, and different parts of different editions of the C standard contain
fragments of them.
Sure, but it's clearer than the C++ standard. In N1570:6.2.6.2.7, for
example can be read to be reasonably clear. But yes, it's still not
perfect, by any means.
Post by Richard Smith
And in any case, the C++ language has, and always had,
significantly stronger type safety rules than C.
Quite so, but it's not like the C++ standard is any better when it
comes to unions. That's something I'd very much like to fix.
Post by Richard Smith
Also, you should read through the current wording of [basic.life].
I already have. Please note that me disagreeing with your
interpretation or even me simply being mistaken in my interpretation
is not the same as me not reading it, and making such claims and
assumptions is not conducive to a productive discussion on the topic.
Post by Richard Smith
C++
already has a notion that pointers point to objects rather than merely
storage (other than some bad wording that was introduced in wg21.link/cwg73
as a way of specifying pointer equality, which had some rather problematic
side-effects on the aliasing rules and type safety).
Yes, but a union isn't a pointer, it's an object.
Post by Richard Smith
And finally, I'd note that several compilers actually use the C++ aliasing
rules even in C. (C's effective type rules don't allow most of the more
sophisticated subobject-based alias analysis that modern compilers like to
do, except in cases where the compiler can track the lvalue back to the
original object declaration that created it).
Are you referring to unions here?
Post by Richard Smith
Post by Edward Rosten
In addition, it still leaves open a hole if you memcpy into the union
via a pointer to the union, not a pointer to one of the members. If the
wording does not allow for aliasing or multiple active/initialized
members, then it's ambiguous which member you could legally use as an
rvalue and which you couldn't after a memcpy.
This should be considered in the context of wg21.link/n3751. I agree it
ought to work, but due to the lack of any sound definition of the active
member of a union in the current standard, I'm not convinced this is a
regression.
Interesting document. I think it could serve as a basis for something
more, though it's still got some holes.

Also note that N3751 refers to copying bytes from one type to another
unrelated type. I'm pointing out that copying bytes from one union to
an identical one gives UB if you've used the "wrong" member in the
destination before the copy, given the current draft and proposed
rules, which is certainly unintended.

As for it being a regression, that all depends. Personally, I think
code which depended explicitly on UB is one thing. Breaking code which
depends on an ambiguously worded part of the standard is quite
another.
Post by Richard Smith
It seems like we really want memcpy and malloc to implicitly
create (trivially-constructible) objects of the type of the programmer's
choosing.
That sounds like plugging the hole with something too small. The
following code is legal:

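#include <cstddef>

// Hand-written byte-wise copy, equivalent in effect to memcpy.
void eds_memcpy(char* dest, const char* src, std::size_t n)
{
    for (std::size_t i = 0; i < n; i++)
        dest[i] = src[i];
}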

And I could use that function or any variant thereof (with appropriate
casts) instead of memcpy. Special-casing memcpy and malloc would seem
to leave a gap nearly as large as if there were no special case.
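
For comparison, the usual memcpy idiom for moving bytes between unrelated types looks roughly like the sketch below. It assumes a 32-bit IEEE 754 float; float_bits is a made-up name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumptions of this sketch: float is 32 bits wide and uses IEEE 754.
static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559, "sketch assumes IEEE 754 floats");

// Hypothetical helper: returns the object representation of f as a
// 32-bit unsigned integer.
inline std::uint32_t float_bits(float f)
{
    std::uint32_t bits;
    // Only bytes are copied; the float is never read through an integer lvalue.
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

inline bool float_bits_demo()
{
    assert(float_bits(0.0f) == 0u);
    return float_bits(1.0f) == 0x3f800000u;
}
```

The hand-written loop above performs the same byte copy; the question is whether the wording treats the two identically.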
Post by Richard Smith
(Put another way, the rule I'm suggesting is: if you can come up
with a list of objects that could have been created by the malloc / memcpy,
and creating that list of objects would have given the program defined
behavior, then the program has defined behavior. You could imagine this as
if the implementations of these functions look into the future, see what
objects would be needed, and then execute the relevant placement new
expressions to create them. This results in something very much like C's
effective type rules.)
Post by Edward Rosten
Personally, I'd advocate going the other way and making aliasing
allowed, partly because I can't think of a sensible way of plugging
the memcpy hole. For types with vacuous initialization (I love that
term!) it could be made explicitly defined behaviour if two objects of
the same type exist at the same offset. For everything else, e.g. a
float aliasing a uint32, the behaviour could be implementation
defined.
I think it's unlikely that you will succeed in persuading the people who
want something like the current rules for type safety
The current rules (as in the current standard, not drafts or
proposals) arguably already allow what I'm suggesting, and they
certainly don't appear to forbid it. I'd also hope they wouldn't want
to introduce some very peculiar semantics in an attempt to plug the
holes.
Post by Richard Smith
and higher-level
language semantics, and the people who want something like the current rules
for type-based alias analysis and related optimization techniques, that this
is a good idea.
I'm not suggesting weakening the current rules, and I'm not suggesting
messing around with pointer based alias analysis. I'm referring to a
particular problem with unions here. Unions are also a pretty low
level feature, and you're never going to get much in the way of type
safety with them.

-Ed
Post by Richard Smith
Post by Edward Rosten
1. *if* you can legally get a pointer to X, then the pointer is valid.
That will no longer be true once we fix the wording.
Post by Edward Rosten
2. Getting a pointer to 1 beyond the end of an array is a legal way of
getting a pointer value (going 2 beyond might overflow, so it's always
counted as UB).
3. The pointer is a valid value since it represents an address in
memory and it points to an object of the correct type (there's no padding),
so it's dereferencable(?). We did not invoke UB by doing anything illegal to
pointer values.
4. If you apply 1-3 recursively, you can say &rgb.r+1 is a valid
pointer to g. If (&rgb.r+1) is a valid pointer to g then (&rgb.r+1) + 1 is a
valid pointer to b. Because in that case we're simply going 1 beyond the end
of the single element "g" array, which is explicitly allowed (if my
reasoning is correct). One possible interpretation is that &rgb.r+2 is UB,
but &rgb.r+1+1 is well defined.
Post by Edward Catmur
Post by Edward Rosten
Post by Nicol Bolas
What you could do is use macro gymnastics to generate a table that
can convert from `i` to the `offsetof` a member, thus allowing you to access
a member by index rather than by name.
Indeed. However as far as I can see, it's only the numeric value of
offsetof that matters. If we have some oracle that gives us that number,
then we don't need to use the offsetof macro.
You could also check the offsets with another static_assert.
Is that necessary? Given all the constraints imposed with the
static_assert, I can't see a way for it to be possible that in the struct
above offsetof g could be anything other than sizeof(float), and likewise
2*sizeof(float) for offsetof b.
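
A sketch of what the extra offset checks could look like, assuming the RGB struct from the original post. The indexing itself uses a table of pointers to members, which is a swapped-in alternative to the offsetof table suggested above, chosen because it needs no pointer arithmetic across members.

```cpp
#include <cassert>
#include <cstddef>

struct RGB
{
    float r, g, b;
    float& operator[](std::size_t i);
};

// Layout checks: no padding, members at the offsets an array would use.
static_assert(sizeof(RGB) == 3 * sizeof(float), "no padding expected");
static_assert(offsetof(RGB, r) == 0, "r must be first");
static_assert(offsetof(RGB, g) == sizeof(float), "g must follow r");
static_assert(offsetof(RGB, b) == 2 * sizeof(float), "b must follow g");

inline float& RGB::operator[](std::size_t i)
{
    // Table of pointers to members, indexed instead of offsetting &r.
    static float RGB::* const members[] = { &RGB::r, &RGB::g, &RGB::b };
    assert(i < 3);
    return this->*members[i];
}
```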
Post by Edward Catmur
Post by Edward Rosten
Post by Nicol Bolas
It is explicitly forbidden. You just said how: you are accessing a
value through a non-active union member. The common-initial-sequence rules
don't apply to you, because a struct is never layout compatible with an
array.
*Where* is it explicitly forbidden? As far as I can see there is no
wording actually forbidding it: at no point does it say that accessing an
inactive member of a union is undefined behaviour.
Access is not a problem for primitives; that's how we switch active
member. Undefined behavior (again, for primitives) occurs when you perform
lvalue-to-rvalue conversion; this is [basic.life]/7.1.
Ah interesting, though if I read correctly the standard never
explicitly ties "active member" to the lifetime of an object. These are all
floats, so [basic.life]/1.3 doesn't apply and as per 1.4 the storage hasn't
been released. As for "reuse" in 1.4, the standard seems a little vague on
what the precise meaning of that is.
-Ed
Richard Smith
2016-05-23 19:23:16 UTC
Permalink
Post by Richard Smith
Post by Richard Smith
Post by Edward Rosten
That said, I do wonder if this is the correct fix at least in
isolation. I have seen a fair amount of code in the wild which assumes
that arrays of structs with one member type T (and no padding) can be
accessed via a T*. I wouldn't call that non-conformant since I'd say
the standard isn't exactly crystal clear on the matter currently.
That is still allowed for a standard-layout struct (for non-standard-layout
structs, you don't have any guarantee that the struct and its first member
share the same address) -- reinterpret_cast between a pointer to the type
and a pointer to its first member is required to do the right thing (as is
static_cast via void*).
OK: I'm only talking about standard layout structs here. I'm also
referring to accessing members other than the first one. If you're
also referring to that can you say why you think it's allowed, because
I'm not sure from the standard.
I'm pretty sure we're talking past each other. I'm talking about the
wording in d0137r1 (in direct response to your comment about the same).
Post by Richard Smith
Post by Edward Rosten
I also noticed that you've explicitly disallowed aliasing via a union
in [basic.life]/1, though I'd probably switch to "active" from
"initialized" in that paragraph for consistency with the wording on
union.
That would break the wording. "active" is defined in terms of lifetime; we
cannot define the start of the lifetime in terms of a member being active.
No it isn't. There's no wording which links "active" to lifetime directly.
In d0137r1, this is in 9.5/1. Prior to d0137r1, "active" is not defined at
all.
Post by Richard Smith
Post by Edward Rosten
If you don't switch, for example if you initialize one member
of a union (that member is active) and then initialize another member,
if I'm really pedantic (I am), then I could argue that you've
initialized both.
I'll join you in some pedantry.
That's good: the standard is essentially an exercise in pedantry.
Post by Richard Smith
There's no way to initialize another member.
The only way to initialize a union member is as part of initializing the
union object. What you *can* do is to start the lifetime of a new object
that overlays the storage of some other union member. When you do so, the
name of the union member can (usually) then be used to denote the new
object, but the original union member and the new object are distinct
objects.
The only problem is that [class.union] doesn't define active in terms of
lifetime. And I think nor should it, because I think that would lead
to some very strange consequences.
Post by Richard Smith
Post by Edward Rosten
However, before you change the wording, could you explain the
rationale for the change? Obviously the wording is unclear and you
want to fix that; however, with this fix you're explicitly breaking
compatibility with C code, so I could see this potentially breaking
code which makes use of C based headers. I think the change also goes
against common practice since many compilers take the C language
interpretation in the absence of a definitive word from the C++
standard and also because the C and C++ compilers tend to share a lot
of code.
The C language does not have a clear and unambiguous rule for union member
lifetime. In fact, there are at least three different and incompatible
rules, and different parts of different editions of the C standard contain
fragments of them.
Sure, but it's clearer than the C++ standard. In N1570:6.2.6.2.7, for
example can be read to be reasonably clear.
I'll need to take a moment to enjoy the phrase "can be read to be
reasonably clear". =)

Also, there is no 6.2.6.2.7 nor 6.2.6.2/7 in N1570, so I'm not really sure
what you're referring to.

Let me quote the self-contradictory parts for you:

6.5.2.3/3: "A postfix expression followed by the . operator and an
identifier designates a member of a structure or union object. The value is
that of the named member, [Footnote: If the member used to read the
contents of a union object is not the same as the member last used to store
a value in the object, the appropriate part of the object representation of
the value is reinterpreted as an object representation in the new type as
described in 6.2.6 (a process sometimes called ''type punning''). This
might be a trap representation.]"

6.5/7: "An object shall have its stored value accessed only by an lvalue
expression that has one of the following types [Footnote: The intent of
this list is to specify those circumstances in which an object may or may
not be aliased.]: [restricted list of types]"

So for an example like

union U { int n; float f; } u = {0};
float f = u.f;

... 6.5.2.3/3 gives semantics and 6.5/7 says it's UB because we're
accessing an object of type int through an lvalue expression of type float.

(And that's without getting into the "visible union" rule described in
6.5.2.3/6, that applies only to the common initial sequence rule but is
misinterpreted by some as applying to union type punning in general. C++ is
more generous here, and doesn't require the union to be visible for the
common initial sequence to be accessible.)
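
As a rough illustration of the common initial sequence rule on the C++ side, here is a small sketch. The types Circle, Square and Shape and the function kind_of are invented for the example.

```cpp
#include <cassert>
#include <type_traits>

// Two standard-layout structs whose first member has the same type:
// their common initial sequence is the single member `kind`.
struct Circle { int kind; float radius; };
struct Square { int kind; int side; };

static_assert(std::is_standard_layout<Circle>::value, "Circle must be standard-layout");
static_assert(std::is_standard_layout<Square>::value, "Square must be standard-layout");

union Shape { Circle circle; Square square; };

// Reads `kind` through `square`, whichever struct is active. Only members
// in the common initial sequence may be inspected this way.
inline int kind_of(const Shape& s)
{
    return s.square.kind;
}

inline bool common_initial_sequence_demo()
{
    Shape s = {{1, 2.0f}};   // initializes s.circle, the first member
    assert(s.circle.kind == 1);
    return kind_of(s) == 1;
}
```

Reading s.square.side while circle is active would fall outside the common initial sequence and is not covered by this rule.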

But yes, it's still not
perfect, by any means.
That's a significant understatement.
Post by Richard Smith
And in any case, the C++ language has, and always had,
significantly stronger type safety rules than C.
Quite so, but it's not like the C++ standard is any better when it
comes to unions. That's something I'd very much like to fix.
Post by Richard Smith
Also, you should read through the current wording of [basic.life].
I already have. Please note that me disagreeing with your
interpretation or even me simply being mistaken in my interpretation
is not the same as me not reading it, and making such claims and
assumptions is not conducive to a productive discussion on the topic.
Post by Richard Smith
C++
already has a notion that pointers point to objects rather than merely
storage (other than some bad wording that was introduced in wg21.link/cwg73
as a way of specifying pointer equality, which had some rather problematic
side-effects on the aliasing rules and type safety).
Yes, but a union isn't a pointer, it's an object.
Right, but pointers to union members are pointers.
Post by Richard Smith
And finally, I'd note that several compilers actually use the C++ aliasing
rules even in C. (C's effective type rules don't allow most of the more
sophisticated subobject-based alias analysis that modern compilers like to
do, except in cases where the compiler can track the lvalue back to the
original object declaration that created it).
Are you referring to unions here?
I'm talking about the general case, which includes unions.
Post by Richard Smith
Post by Edward Rosten
In addition, it still leaves open a hole if you memcpy into the union
via a pointer to the union, not a pointer to one of the members. If the
wording does not allow for aliasing or multiple active/initialized
members, then it's ambiguous which member you could legally use as an
rvalue and which you couldn't after a memcpy.
This should be considered in the context of wg21.link/n3751. I agree it
ought to work, but due to the lack of any sound definition of the active
member of a union in the current standard, I'm not convinced this is a
regression.
Interesting document. I think it could serve as a basis for something
more, though it's still got some holes.
Also note that N3751 refers to copying bytes from one type to another
unrelated type. I'm pointing out that copying bytes from one union to
an identical one gives UB if you've used the "wrong" member in the
destination before the copy, given the current draft and proposed
rules, which is certainly unintended.
As for it being a regression, that all depends. Personally, I think
code which depended explicitly on UB is one thing. Breaking code which
depends on an ambiguously worded part of the standard is quite
another.
You and I have different ideas of what it means to break code. If the
standard didn't previously explicitly say how something worked, and a
change results in it still not saying, then it was and still is up to
implementations how they choose to handle that case.
Post by Richard Smith
It seems like we really want memcpy and malloc to implicitly
create (trivially-constructible) objects of the type of the programmer's
choosing.
That sounds like plugging the hole with something too small. The
following code is legal:

#include <cstddef>

// Hand-written byte-wise copy, equivalent in effect to memcpy.
void eds_memcpy(char* dest, const char* src, std::size_t n)
{
    for (std::size_t i = 0; i < n; i++)
        dest[i] = src[i];
}

And I could use that function or any variant thereof (with appropriate
casts) instead of memcpy. Special-casing memcpy and malloc would seem
to leave a gap nearly as large as if there were no special case.
For standard-layout or trivially-copyable types, I agree that it's
desirable for your roll-your-own memcpy to work. My preference would be for
the rule to apply in all cases (for those types), not just to be implicitly
triggered by malloc/memcpy.
Post by Richard Smith
(Put another way, the rule I'm suggesting is: if you can come up
with a list of objects that could have been created by the malloc / memcpy,
and creating that list of objects would have given the program defined
behavior, then the program has defined behavior. You could imagine this as
if the implementations of these functions look into the future, see what
objects would be needed, and then execute the relevant placement new
expressions to create them. This results in something very much like C's
effective type rules.)
Post by Edward Rosten
Personally, I'd advocate going the other way and making aliasing
allowed, partly because I can't think of a sensible way of plugging
the memcpy hole. For types with vacuous initialization (I love that
term!) it could be made explicitly defined behaviour if two objects of
the same type exist at the same offset. For everything else, e.g. a
float aliasing a uint32, the behaviour could be implementation
defined.
I think it's unlikely that you will succeed in persuading the people who
want something like the current rules for type safety
The current rules (as in the current standard, not drafts or
proposals) arguably already allow what I'm suggesting, and they
certainly don't appear to forbid it.
3.10/10 says that aliasing is not allowed in the general case. 3.8/8 says
under which circumstances a pointer/reference/glvalue denoting one object
can be used to denote a different object that later gets created in the
same storage.

I'd also hope they wouldn't want
to introduce some very peculiar semantics in an attempt to plug the
holes.
Post by Richard Smith
and higher-level
language semantics, and the people who want something like the current rules
for type-based alias analysis and related optimization techniques, that this
is a good idea.
I'm not suggesting weakening the current rules, and I'm not suggesting
messing around with pointer based alias analysis. I'm referring to a
particular problem with unions here. Unions are also a pretty low
level feature, and you're never going to get much in the way of type
safety with them.
The problem is that it's very difficult to weaken the rules for unions
without significantly damaging the rules for the general case. Consider
this classic example:

// TU 1
int f(int *p, float *q) {
    int k = *p;
    *q = 1.0;
    return k;
}

// TU 2
union U { int n; float f; };
int f(int *p, float *q);
int main() {
    union U u = { 1 };
    return f(&u.n, &u.f);
}

C has traditionally had a very hard time specifying the semantics for this
code. If it has defined behavior, that has surprising aliasing effects on
TU 1 (which contains no union types). And if not, we must somehow
distinguish this from a similar case where the code is inline in TU 2:

int main() {
    union U u = { 1 };
    int k = u.n;
    u.f = 1.0;
    return k;
}

... which it seems uncontroversial to claim has defined behavior.

-Ed
Post by Richard Smith
Post by Richard Smith
Post by Edward Rosten
-Ed
Post by Richard Smith
Post by Edward Rosten
1. *if* you can legally get a pointer to X, then the pointer is
valid.
Post by Richard Smith
Post by Edward Rosten
Post by Richard Smith
That will no longer be true once we fix the wording.
Post by Edward Rosten
2. Getting a pointer to 1 beyond the end of an array is a legal way
of
Post by Richard Smith
Post by Edward Rosten
Post by Richard Smith
Post by Edward Rosten
getting a pointer value (going 2 beyond might overflow, so it's
always
Post by Richard Smith
Post by Edward Rosten
Post by Richard Smith
Post by Edward Rosten
counted as UB).
3. The pointer is a valid value since it represents an address in
memory and it points to an object of the correct type (there's no
padding),
Post by Richard Smith
Post by Edward Rosten
Post by Richard Smith
Post by Edward Rosten
so it's dereferencable(?). We did not invoke UB by doing anything
illegal to
Post by Richard Smith
Post by Edward Rosten
Post by Richard Smith
Post by Edward Rosten
pointer values.
4. If you apply 1-3 recursively, you can say &rgb.r+1 is a valid
pointer to g. If (&rgb.r+1) is a valid pointer to g then (&rgb.r+1)
+ 1 is a
Post by Richard Smith
Post by Edward Rosten
Post by Richard Smith
Post by Edward Rosten
valid pointer to b. Because in that case we're simply going 1 beyond
the end
Post by Richard Smith
Post by Edward Rosten
Post by Richard Smith
Post by Edward Rosten
of the single element "g" array, which is explicitly allowed (if my
reasoning is correct). One possible interpretation is that &rgb.r+2
is UB,
Post by Richard Smith
Post by Edward Rosten
Post by Richard Smith
Post by Edward Rosten
but &rgb+1+1 is well defined.
Post by Edward Catmur
Post by Edward Rosten
Post by Nicol Bolas
What you could do is use macro gymnastics to generate a table that
can convert from `i` to the `offsetof` a member, thus allowing
you to access
Post by Richard Smith
Post by Edward Rosten
Post by Richard Smith
Post by Edward Rosten
Post by Edward Catmur
Post by Edward Rosten
Post by Nicol Bolas
a member by index rather than by name.
Indeed. However as far as I can see, it's only the numeric value of
offsetof that matters. If we have some oracle that gives us that
number,
Post by Richard Smith
Post by Edward Rosten
Post by Richard Smith
Post by Edward Rosten
Post by Edward Catmur
Post by Edward Rosten
then we don't need to use the offsetof macro.
You could also check the offsets with another static_assert.
Is that necessary? Given all the constraints imposed with the
static_assert, I can't see a way for it to be possible that in the
struct above offsetof g could be anything other than sizeof(float),
and likewise 2*sizeof(float) for offsetof b.
Post by Edward Catmur
Post by Edward Rosten
Post by Nicol Bolas
It is explicitly forbidden. You just said how: you are accessing a
value through a non-active union member. The common-initial-sequence
rules don't apply to you, because a struct is never layout compatible
with an array.
*Where* is it explicitly forbidden? As far as I can see there is no
wording actually forbidding it: at no point does it say that accessing
an inactive member of a union is undefined behaviour.
Access is not a problem for primitives; that's how we switch active
member. Undefined behavior (again, for primitives) occurs when you
perform lvalue-to-rvalue conversion; this is [basic.life]/7.1.
Ah interesting, though if I read correctly the standard never
explicitly ties "active member" to the lifetime of an object. These
are all floats, so [basic.life]/1.3 doesn't apply and as per 1.4 the
storage hasn't been released. As for "reuse" in 1.4, the standard
seems a little vague on what the precise meaning of that is.
-Ed
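
The layout claim in the quoted exchange (offsetof g being sizeof(float) and offsetof b being 2*sizeof(float)) can at least be checked at compile time. The following is a minimal sketch, assuming the RGB struct from the start of the thread; channel_offset is a hypothetical helper added only for illustration. It verifies the layout and nothing more: it does not settle whether pointer arithmetic from &r across the members is defined.

```cpp
#include <cassert>
#include <cstddef>

struct RGB {
    float r, g, b;
};

// Compilation fails if the implementation inserts any padding.
static_assert(sizeof(RGB) == 3 * sizeof(float), "RGB must have no padding");
static_assert(offsetof(RGB, r) == 0, "r is the first member");
static_assert(offsetof(RGB, g) == sizeof(float), "g follows r directly");
static_assert(offsetof(RGB, b) == 2 * sizeof(float), "b follows g directly");

// Hypothetical helper: byte offset of channel i (0 = r, 1 = g, 2 = b),
// valid only because the static_asserts above hold.
std::size_t channel_offset(int i) {
    assert(i >= 0 && i < 3);
    return static_cast<std::size_t>(i) * sizeof(float);
}
```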
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
Myriachan
2016-05-26 01:00:50 UTC
Post by Richard Smith
For standard-layout or trivially-copyable types, I agree that it's
desirable for your roll-your-own memcpy to work. My preference would be for
the rule to apply in all cases (for those types), not just to be implicitly
triggered by malloc/memcpy.
Why doesn't the proposed fix have equivalent rules for standard-layout and
trivially-copyable types? As I've proven before, the pointer and offset
tricks *must* work for trivially-copyable types just as they do for
standard-layout types, or you get a contradiction, due to std::memcpy.
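
For readers unfamiliar with the "roll-your-own memcpy" being discussed, it looks roughly like the sketch below. Pixel, byte_copy and copy_pixel are hypothetical names used only for illustration. Whether the unsigned char pointer arithmetic inside byte_copy is formally sanctioned by the wording is exactly the question in this thread; the sketch only shows the pattern that people expect to behave like std::memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct Pixel {
    float r, g, b;
};

static_assert(std::is_trivially_copyable<Pixel>::value,
              "byte-wise copying is only meaningful for trivially copyable types");

// Hand-written byte copy: walks both objects as arrays of unsigned char.
void byte_copy(void* dst, const void* src, std::size_t n) {
    assert(dst != nullptr && src != nullptr);
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i) {
        d[i] = s[i];
    }
}

// Copies the object representation of one Pixel into another.
Pixel copy_pixel(const Pixel& in) {
    Pixel out;
    byte_copy(&out, &in, sizeof(Pixel));
    return out;
}
```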

Melissa
Nicol Bolas
2016-05-26 03:04:59 UTC
Post by Myriachan
Post by Richard Smith
For standard-layout or trivially-copyable types, I agree that it's
desirable for your roll-your-own memcpy to work. My preference would be for
the rule to apply in all cases (for those types), not just to be implicitly
triggered by malloc/memcpy.
Why doesn't the proposed fix have equivalent rules for standard-layout and
trivially-copyable types?
Because standard-layout types are not necessarily trivially-copyable.
Standard layout is about permitting layout compatibility between the
non-static data members (NSDMs) of two separate types. Trivial
copyability is about a type's lifetime and the ability to copy that
object's value representation via a memcpy.

You can write types which are standard layout but not trivially copyable;
just declare a copy constructor. And you can write types which are
trivially copyable and not standard layout; just give them two base classes
with NSDMs. The standard's wording under discussion has nothing to do with
the classification of such objects.
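
The two counterexamples described above can be written down and checked with the standard type traits. This is a small sketch; the type names WithCopyCtor, BaseA, BaseB and TwoBases are made up for illustration.

```cpp
#include <type_traits>

// Standard layout, but the user-provided copy constructor
// makes it not trivially copyable.
struct WithCopyCtor {
    int x;
    WithCopyCtor() = default;
    WithCopyCtor(const WithCopyCtor& other) : x(other.x) {}
};

// Two base classes that both carry non-static data members:
// trivially copyable, but not standard layout.
struct BaseA { int a; };
struct BaseB { int b; };
struct TwoBases : BaseA, BaseB {};

static_assert(std::is_standard_layout<WithCopyCtor>::value,
              "WithCopyCtor is standard layout");
static_assert(!std::is_trivially_copyable<WithCopyCtor>::value,
              "WithCopyCtor is not trivially copyable");
static_assert(std::is_trivially_copyable<TwoBases>::value,
              "TwoBases is trivially copyable");
static_assert(!std::is_standard_layout<TwoBases>::value,
              "TwoBases is not standard layout");
```
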
Post by Myriachan
As I've proven before, the pointer and offset tricks *must* work for
trivially-copyable types just as they do for standard-layout types, or you
get a contradiction, due to std::memcpy.
If you are talking about this post
<https://groups.google.com/a/isocpp.org/d/msg/std-proposals/85dm5nwmYmk/O9y9H4NPEdsJ>,
then as stated in that thread, all you proved was that NSDMs in classes
have static offsets. That doesn't actually prove anything that is under
discussion here.
'Edward Catmur' via ISO C++ Standard - Discussion
2016-05-20 10:44:45 UTC
Post by Richard Smith
Post by Edward Rosten
Post by Edward Catmur
Although you have to be extremely careful in how you manufacture a
pointer to the correct location, once you have it you can use it per
[basic.compound]/3.
OK that's interesting! Thanks for the link, I wasn't familiar with that
section.
Beware, that wording is defective, and we're in the process of fixing it.
https://htmlpreview.github.io/?https://raw.github.com/zygoloid/wg21papers/blob/master/wip/d0137r1.html
Good to see that's coming along; thanks for working on it!

If you're accepting comment, I have a few queries: the
[expr.static.cast]/13 wording would appear to imply that for struct S { int
i; int j; } s a converted past-the-end pointer (int*)(&s + 1) would equal
&s.i + 1, which would be a pretty significant change. I assume past-the-end
conversions are only intended to work where the pointer-interconvertible
objects have the same size.
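
To make the concern concrete: the two past-the-end pointers mentioned above refer to different addresses whenever S is larger than int. The sketch below measures both as byte distances from the start of s. distance_from is a hypothetical helper, and it assumes the usual flat mapping from pointers to std::uintptr_t, which is implementation-defined rather than guaranteed.

```cpp
#include <cassert>
#include <cstdint>

struct S {
    int i;
    int j;
};

// Hypothetical helper: byte distance from the start of s to p,
// computed on integer representations of the addresses.
std::uintptr_t distance_from(const S& s, const void* p) {
    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(&s);
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    assert(addr >= base);
    return addr - base;
}
```

On typical implementations distance_from(s, &s + 1) is sizeof(S) while distance_from(s, &s.i + 1) is sizeof(int), so a converted (int*)(&s + 1) comparing equal to &s.i + 1 would not match the addresses.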

Could you explain why pointer interconvertibility doesn't work with base
class subobjects of a standard-layout class with NSDMs? It seems odd that
adding an NSDM to a previously empty standard-layout class would cause it
to cease being pointer-interconvertible with its first base class
subobjects, while they continue to be located at the same memory address;
e.g. in struct B {}; struct C : B {} c; struct D : B { int i; } d; we have &c
~ (B*) &c but &d !~ (B*) &d, even though (void*) &d == (void*) (B*) &d. I'm
also unsure why a derived class couldn't be pointer-interconvertible with
its later base classes; e.g. in struct S {}; struct T {}; struct U : S, T
{} u; &u ~ (S*) &u !~ (T*) &u even though (void*) &u == (void*) (S*) &u ==
(void*) (T*) &u.
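
The address equality for the single-base case can be observed directly. The sketch below reuses B and D from the paragraph above; base_shares_address and check_on_this_platform are hypothetical helpers. It reports what a given implementation does and is not a claim about what the standard guarantees. The multiple-empty-base case (U : S, T) is deliberately left out, since the placement of a second empty base is ABI-dependent.

```cpp
#include <cassert>

struct B {};
struct D : B {
    int i;
};

// True if the B base subobject of d starts at the same address as d itself.
bool base_shares_address(const D& d) {
    const B* base = &d;  // implicit derived-to-base conversion
    return static_cast<const void*>(base) == static_cast<const void*>(&d);
}

// Hypothetical check: aborts if the observation does not hold here.
void check_on_this_platform() {
    D d{};
    assert(base_shares_address(d));
}
```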

Finally, it's not obvious that the single-element array and x[n] rules as
referred to in [expr.add]/4 continue to apply to [expr.add]/5.