Discussion:
Core issue 2182 and pointer arithmetic
Myriachan
2017-09-08 19:47:40 UTC
Core issue 2182:

2182. Pointer arithmetic in array-like containers
Section: 5.7 [expr.add]; Status: drafting; Submitter: Jonathan Wakely; Date: 2015-10-20
The current direction for issue 1776
<http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1776> (see
paper P0137) calls into question the validity of doing pointer arithmetic
to address separately-allocated but contiguous objects in a container like
std::vector. A related question is whether there should be some allowance
made for allowing pointer arithmetic using a pointer to a base class if the
derived class is a standard-layout class with no non-static data members.
It is possible that std::launder could play a part in the resolution of
this issue.
*Notes from the February, 2016 meeting:*
This issue is expected to be resolved by the resolution of issue 1776
<http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1776>. The
major problem is when the elements of the vector contain constant or
reference members; 3.8 [basic.life] paragraph 7 implies that pointer
arithmetic leading to such an object produces undefined behavior, and CWG
expects this to continue. Some changes to the interface of std::vector
may be required, perhaps using std::launder as part of iterator
processing.
It seems incredible that the direction of the Standard would be toward
making pointer arithmetic undefined for objects inside an std::vector just
because they have a const member or reference member.


class C {
    int &member;
    void f();
    ...
};
...
std::vector<C> v;
v.push_back(C());
v.push_back(C());
v.push_back(C());
(&v[0] + 2)->f();



This would seriously be undefined by what was stated above? This is
completely ridiculous to me.
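The snippet above elides the constructor and leaves `f()` private, so it does not compile as written. Here is a filled-in sketch. The constructor, the public `f()`, its `int` return value, and the helper name `third_element_via_pointer_arithmetic` are all hypothetical additions for illustration. Compiling it does not settle the question being asked, because whether `&v[0] + 2` is valid under the P0137 wording is exactly what is in dispute.

```cpp
#include <cassert>
#include <vector>

// Hypothetical completion of the elided class C: a constructor binds the
// reference member, and f() is public and returns a value so it can be checked.
class C {
public:
    explicit C(int &r) : member(r) {}
    int f() const { return member; }

private:
    int &member;  // reference member: C can be copy/move-constructed but not assigned
};

// Builds the three-element vector from the example and reaches the third
// element by pointer arithmetic from &v[0], the operation under discussion.
// The reference members all bind to `value`, which outlives `v`.
int third_element_via_pointer_arithmetic(int value) {
    std::vector<C> v;
    v.push_back(C(value));
    v.push_back(C(value));
    v.push_back(C(value));
    return (&v[0] + 2)->f();
}
```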


First of all, it seems that P0137R1 didn't solve the arithmetic issue,
making &v[0] + 2 illegal pointer arithmetic in the first place, because
std::vector (most likely) constructed the objects separately. Second,
making this undefined just because of a const or reference nonstatic member
would break an unbelievable amount of existing C++ code if this arithmetic
were to suddenly require a call to std::launder.
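To make "constructed the objects separately" concrete, here is a minimal sketch of the usual pattern, assuming a container built on `std::allocator`. Storage for several `T` is obtained in one call, but no array object of type `T[n]` is ever created; each element is constructed on its own. The class name `TinyBuffer` is hypothetical, and this is not any particular library's implementation. Note that `data_ + size_` and `data() + 2` are themselves instances of the arithmetic whose validity the issue questions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical minimal vector-like buffer. allocate() returns storage for
// `capacity` objects of type T, but the elements are created one at a time.
template <class T>
class TinyBuffer {
    using Traits = std::allocator_traits<std::allocator<T>>;

public:
    explicit TinyBuffer(std::size_t capacity)
        : capacity_(capacity), data_(Traits::allocate(alloc_, capacity)) {}

    ~TinyBuffer() {
        for (std::size_t i = 0; i < size_; ++i) Traits::destroy(alloc_, data_ + i);
        Traits::deallocate(alloc_, data_, capacity_);
    }

    TinyBuffer(const TinyBuffer &) = delete;
    TinyBuffer &operator=(const TinyBuffer &) = delete;

    void push_back(const T &value) {
        assert(size_ < capacity_);
        // Each element is a separately constructed complete object.
        Traits::construct(alloc_, data_ + size_, value);
        ++size_;
    }

    T *data() { return data_; }
    std::size_t size() const { return size_; }

private:
    std::allocator<T> alloc_;  // declared first because data_'s initializer uses it
    std::size_t capacity_;
    T *data_;
    std::size_t size_ = 0;
};
```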


For the first issue, it seems like we should formally define pointer
arithmetic as working across adjacent array objects, with individual
objects being treated as an array of size 1 for this purpose as usual.
This does have implications, though, because it would allow "bad" code such
as the following to be well-defined:


struct S {
    int a, b, c;
};
...
static_assert(offsetof(S, b) == offsetof(S, a) + sizeof(S::a));
static_assert(offsetof(S, c) == offsetof(S, b) + sizeof(S::b));
S s;
(&s.a)[2] = 2;
assert(s.c == 2);

If we solve the pointer arithmetic with std::vector, it seems as if this
code must either be correct (or be ill-formed because the static_asserts
fire). I don't think that compiler optimizers would appreciate this very
much, though. This seems like the "right" solution to the whole problem to
me, but I can see why there would be objections. (I would propose
additionally that char * / unsigned char * / std::byte * be allowed to
cross among objects even of different types, so long as the arithmetic
remains within the bounds of that allocated storage block...but I'm a
radical around here.)
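For code that genuinely needs to select `a`, `b`, or `c` by index, a table of pointers to members avoids arithmetic across distinct member subobjects altogether. This is a sketch of one conventional alternative, not something proposed in the thread, and the helper name `field` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

struct S {
    int a, b, c;
};

// Selects a member of S by index through pointers to members, so no pointer
// ever steps from one member subobject to another.
int &field(S &s, std::size_t i) {
    static constexpr int S::*members[] = {&S::a, &S::b, &S::c};
    assert(i < sizeof(members) / sizeof(members[0]));
    return s.*members[i];
}
```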


For the second issue, this would be a strongly breaking change; at least, it
would be a breaking change in the sense that too much existing code relies
on this, even if it is technically undefined behavior. I think a lot of C++
programmers would be unhappy if this suddenly became undefined behavior and
compilers started emitting code that generates nasal demons.


An option that would technically work would be to require that std::vector
allocate its whole collection of objects as an array, meaning that any
push_back would necessarily require moving the entire array. This
obviously won't fly for performance reasons.
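The cost is easy to see in a sketch. Assume a hypothetical container, `ArrayOnlyVector` (an illustration, not a proposal from the thread), that keeps every element inside a single array object created with `new T[n]`. An array's bound is fixed when it is created, so each `push_back` has to create a new, larger array and move every existing element into it, which makes n insertions quadratic.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical "array-only" vector: all elements always live in one array
// object created by new T[n]. Since that array cannot grow, every push_back
// builds a new array one element larger and moves the old elements over.
template <class T>
class ArrayOnlyVector {
public:
    void push_back(T value) {
        // Requires T to be default-constructible and move-assignable.
        std::unique_ptr<T[]> bigger(new T[size_ + 1]);
        for (std::size_t i = 0; i < size_; ++i) {
            bigger[i] = std::move(data_[i]);
            ++moves_;  // count only the moves of pre-existing elements
        }
        bigger[size_] = std::move(value);
        data_ = std::move(bigger);
        ++size_;
    }

    T *data() { return data_.get(); }
    std::size_t size() const { return size_; }
    std::size_t moves() const { return moves_; }

private:
    std::unique_ptr<T[]> data_;
    std::size_t size_ = 0;
    std::size_t moves_ = 0;
};
```

Besides the quadratic moving, this design needs `T` to be default-constructible and move-assignable, which the class `C` with a reference member above is not.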


Thanks, and sorry for the long text,

Melissa
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
Richard Smith
2017-09-08 20:19:43 UTC
Post by Myriachan
It seems incredible that the direction of the Standard would be toward
making pointer arithmetic undefined for objects inside an std::vector
just because they have a const member or reference member.
class C {
int &member;
void f();
...
};
...
std::vector<C> v;
v.push_back(C());
v.push_back(C());
v.push_back(C());
(&v[0] + 2)->f();
This would seriously be undefined by what was stated above?
No, I doubt that's the intent; rather, the requisite calls to std::launder
would be in the implementation of vector itself. However, this case would
be undefined:

std::vector<C> v;
v.reserve(1);
v.push_back(C());
C *p = v.data(); // or "auto p = v.begin();"
v.pop_back();
v.push_back(C());
p->f(); // invalid, that C object went away
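A minimal sketch of the mechanism, assuming a hypothetical element type `K` with a `const` member and a hypothetical helper `replace_and_read`. Under the C++17 [basic.life] rules discussed here, after an object is destroyed and a new one is created in the same storage, a pointer to the old object cannot be used to reach the new one when the type has a const or reference member, but `std::launder` (C++17, `<new>`) yields a pointer that can. In a container, that call would live in the element accessors; code that keeps an old pointer like `p` and dereferences it directly still has the problem.

```cpp
#include <cassert>
#include <new>

// Hypothetical element type with a const member, the case the CWG notes single out.
struct K {
    const int value;
};

// Ends the lifetime of the K at `slot`, creates a new K in the same storage,
// and reads it back through std::launder.
int replace_and_read(K *slot, int new_value) {
    slot->~K();
    ::new (static_cast<void *>(slot)) K{new_value};
    // Reading slot->value directly here would use a pointer to the old object,
    // which the C++17 rules do not allow because K has a const member.
    return std::launder(slot)->value;
}
```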
Post by Myriachan
This is completely ridiculous to me.
First of all, it seems that P0137R1 didn't solve the arithmetic issue,
making &v[0] + 2 illegal pointer arithmetic in the first place, because
std::vector (most likely) constructed the objects separately. Second,
making this undefined just because of a const or reference nonstatic member
would break an unbelievable amount of existing C++ code if this arithmetic
were to suddenly require a call to std::launder.
For the first issue, it seems like we should formally define pointer
arithmetic as working across adjacent array objects, with individual
objects being treated as an array of size 1 for this purpose as usual.
SG12 is already exploring this direction, but only for adjacent objects
whose storage is provided by the same array.
Post by Myriachan
This does have implications, though, because it would allow "bad" code
such as the following to be well-defined:
struct S {
int a, b, c;
};
...
static_assert(offsetof(S, b) == offsetof(S, a) + sizeof(S::a));
static_assert(offsetof(S, c) == offsetof(S, b) + sizeof(S::b));
S s;
(&s.a)[2] = 2;
assert(s.c == 2);
... which does not happen in this case, so this would continue to be UB
under that approach.
Nicol Bolas
2017-09-08 20:35:57 UTC
Post by Richard Smith
Post by Myriachan
This would seriously be undefined by what was stated above?
No, I doubt that's the intent; rather, the requisite calls to std::launder
would be in the implementation of vector itself. However, this case would
be undefined:
std::vector<C> v;
v.reserve(1);
v.push_back(C());
C *p = v.data(); // or "auto p = v.begin();"
v.pop_back();
v.push_back(C());
p->f(); // invalid, that C object went away
That being undefined, to me, is perfectly valid, but only for certain types
of `C`, namely those mentioned above: types containing references or
`const` objects.

Post by Richard Smith
SG12 is already exploring this direction, but only for adjacent objects
whose storage is provided by the same array.
What does "provided by the same array" mean, exactly? Right now, we already
have that.

The problem is that we don't allow pointer arithmetic to work across
adjacent objects of the same type whose storage is provided by the same
*allocation*.
Richard Smith
2017-09-08 21:08:38 UTC
Post by Nicol Bolas
What does "provided by the same array" mean, exactly?
See the definition of "provides storage" here:
http://eel.is/c++draft/intro.object#3
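As a sketch of what that definition covers, assume a hypothetical fixed-capacity container, `ByteArrayStorage`, whose elements live in a member array of `unsigned char`, one of the two array types that [intro.object]/3 allows to provide storage. Element addresses here are computed by stepping through that single `unsigned char` array and converting with `std::launder`, rather than by `T*` arithmetic between separately created `T` objects. Whether the latter would also become valid is the SG12 direction mentioned above, and the sketch deliberately does not rely on it.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-capacity storage: an unsigned char array provides
// storage for up to N objects of type T, each created by placement new.
template <class T, std::size_t N>
class ByteArrayStorage {
public:
    ByteArrayStorage() = default;
    ByteArrayStorage(const ByteArrayStorage &) = delete;
    ByteArrayStorage &operator=(const ByteArrayStorage &) = delete;
    ~ByteArrayStorage() {
        for (std::size_t i = 0; i < size_; ++i) at(i)->~T();
    }

    void push_back(const T &value) {
        assert(size_ < N);
        ::new (static_cast<void *>(bytes_ + size_ * sizeof(T))) T(value);
        ++size_;
    }

    // The byte offset stays inside bytes_; std::launder obtains a pointer
    // to the T object that currently lives at that address.
    T *at(std::size_t i) {
        assert(i < size_);
        return std::launder(reinterpret_cast<T *>(bytes_ + i * sizeof(T)));
    }

    std::size_t size() const { return size_; }

private:
    alignas(T) unsigned char bytes_[N * sizeof(T)];
    std::size_t size_ = 0;
};
```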
Nicol Bolas
2017-09-08 21:36:45 UTC
Post by Richard Smith
Post by Nicol Bolas
What does "provided by the same array" mean, exactly?
http://eel.is/c++draft/intro.object#3
Right, but the allocator functions don't create objects "of type “array of
N unsigned char” or of type “array of N std::byte”". Nor do
`std::aligned_storage_t`/`std::aligned_union_t`. So neither can "provide
storage" for an object under those rules. You can certainly create objects
in that storage, but that isn't the same as "providing storage".

Unless you're saying that `vector` has to allocate memory, then do
`new() char[]` on the allocation, and only then perform construction on any
types in the memory. Or unless you're saying that every allocation of memory,
*every object*, is also an array of bytes in addition to being whatever it
currently is.
Myriachan
2017-09-09 00:25:48 UTC
Post by Nicol Bolas
Right, but the allocator functions don't create objects "of type “array of
N unsigned char” or of type “array of N std::byte”". Nor does
`std::aligned_storage/union_t`. So neither can "provide storage" for an
object under those rules. You can certainly create objects in that storage.
But that won't be the same as "provide storage".
Unless you're saying that `vector` has to allocate memory, then do
`new() char[]` on the allocation, and only then perform construction on any
types in the memory. Or unless you're saying that every allocation of memory,
*every object*, is also an array of bytes in addition to being whatever it
currently is.
I know you're replying to Richard, but I personally would say that every
object of type T ought to be considered an array of characters of size
sizeof(T). However, that definition implies that a fix for std::vector
would also allow the adjacent-member shenanigans I mentioned above.
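For comparison, the standard already gives one guaranteed route to an object's bytes: for a trivially copyable type, copying its bytes into an `unsigned char` array with `std::memcpy` and copying them back restores the object's original value. The sketch below assumes that case; the helper names `save_bytes` and `restore_bytes` are hypothetical. It does not treat the object itself as an array of characters, which is the broader change being suggested.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct S {
    int a, b, c;
};

// Copies the object representation of `obj` into `out`.
template <class T>
void save_bytes(const T &obj, unsigned char (&out)[sizeof(T)]) {
    static_assert(std::is_trivially_copyable<T>::value, "requires a trivially copyable type");
    std::memcpy(out, &obj, sizeof(T));
}

// Copies a previously saved representation back into `obj`.
template <class T>
void restore_bytes(T &obj, const unsigned char (&in)[sizeof(T)]) {
    static_assert(std::is_trivially_copyable<T>::value, "requires a trivially copyable type");
    std::memcpy(&obj, in, sizeof(T));
}
```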

I think that the permissive route is better overall, but it does impede
some optimizations.

Melissa
Richard Smith
2017-09-09 00:32:17 UTC
Post by Myriachan
I know you're replying to Richard, but I personally would say that every
object of type T ought to be considered an array of characters of size
sizeof(T). However, that definition implies that a fix to the std::vector
would also allow the shenanigans I mentioned above.
There's a delicate balance here. On the one hand, we would like C++ to
support the low-level memory operations necessary to implement something
like vector, and on the other hand, we would like C++ to support high-level
semantics in which abstract reasoning about the behavior of a UB-free
program can be performed.

Finding a middle ground is not simple, but in this case one does seem like
it might be available (allowing vector but not your adjacent fields case),
and that's the direction that I'm currently pursuing with encouragement
from SG12. If that doesn't work out, maybe a blunter instrument will be
warranted.
Nicol Bolas
2017-09-09 01:38:15 UTC
Post by Richard Smith
Post by Myriachan
Post by Nicol Bolas
Post by Richard Smith
Post by Nicol Bolas
Post by Richard Smith
Post by Myriachan
This would seriously be undefined by what was stated above?
No, I doubt that's the intent; rather, the requisite calls to
std::launder would be in the implementation of vector itself. However, this
std::vector<C> v;
v.reserve(1);
v.push_back(C());
C *p = v.data(); // or "auto p = v.begin();"
v.pop_back();
v.push_back(C());
p->f(); // invalid, that C object went away
That being undefined, to me, is perfectly valid, but only for certain
types of `C`. Namely those mentioned above: types containing references or
`const` objects.
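The sketch below shows the remedy that the CWG note alludes to. It assumes the C++17 [basic.life] wording under discussion. An element type with a `const` member is destroyed and then re-created in the same storage. The stale pointer is passed through `std::launder` before it is used again. `Elem` is a hypothetical stand-in for `C`, and the single-slot buffer in `reuse_slot` is illustrative only; no real `std::vector` is written this way.

```cpp
#include <cassert>
#include <memory>  // std::destroy_at
#include <new>     // placement new, std::launder

// Hypothetical stand-in for the thread's `C`: the const member is what makes
// a stale pointer unusable after the object is replaced (C++17 [basic.life]).
struct Elem {
    const int id;
    explicit Elem(int i) : id(i) {}
};

// Mimics reserve(1); push_back(); pop_back(); push_back() on a single slot.
int reuse_slot() {
    alignas(Elem) unsigned char storage[sizeof(Elem)];

    Elem* p = ::new (static_cast<void*>(storage)) Elem(1);
    assert(p->id == 1);
    std::destroy_at(p);                           // like pop_back()
    ::new (static_cast<void*>(storage)) Elem(2);  // like push_back()

    // Reading p->id here would be undefined under the C++17 wording;
    // std::launder returns a pointer to the object that now lives there.
    Elem* q = std::launder(p);
    const int id = q->id;
    std::destroy_at(q);
    return id;
}
```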
This is completely ridiculous to me.
Post by Richard Smith
Post by Myriachan
First of all, it seems that P0137R1 didn't solve the arithmetic
issue, making &v[0] + 2 illegal pointer arithmetic in the first
place, because std::vector (most likely) constructed the objects
separately. Second, making this undefined just because of a const or
reference nonstatic member would break an unbelievable amount of existing
C++ code if this arithmetic were to suddenly require a call to
std::launder.
For the first issue, it seems like we should formally define pointer
arithmetic as working across adjacent array objects, with individual
objects being treated as an array of size 1 for this purpose as usual.
SG12 is already exploring this direction, but only for adjacent
objects whose storage is provided by the same array.
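Myriachan relies on an existing rule: for pointer arithmetic, a non-array object counts as an array of one element. The sketch below illustrates that rule, and its function names are hypothetical. `&x + 1` is a valid past-the-end pointer, so a lone `int` can be passed as a one-element range.

```cpp
#include <cassert>

// Sums the half-open range [first, last).
int sum_range(const int* first, const int* last) {
    assert(first <= last);
    int total = 0;
    for (const int* it = first; it != last; ++it) {
        total += *it;
    }
    return total;
}

// A non-array object behaves like an array of one element for pointer
// arithmetic, so &x + 1 may be formed and compared, but not dereferenced.
int sum_single(const int& x) {
    return sum_range(&x, &x + 1);
}
```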
What does "provided by the same array" mean, exactly?
http://eel.is/c++draft/intro.object#3
Right, but the allocator functions don't create objects "of type “array
of N unsigned char” or of type “array of N std::byte”". Nor does
`std::aligned_storage_t`/`std::aligned_union_t`. So neither can "provide storage" for an
object under those rules. You can certainly create objects in that storage.
But that won't be the same as "provide storage".
Unless you're saying that `vector` has to allocate memory, then do
`new() char[]` on the allocation, and only then perform construction on any
types in the memory. Or unless you're saying that every allocation of
memory, *every object*, is also an array of bytes in addition to being
whatever it currently is.
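The sketch below is one way to read Richard's pointer to [intro.object]/3. It assumes a fixed-capacity container, the hypothetical `FixedBuffer`, which is not a real library type. An `unsigned char` array provides storage, and elements are created in it with placement new. Each element is reached by doing the arithmetic on the byte array and then calling `std::launder`. This avoids `T*` arithmetic across separately created elements, which is the step issue 2182 questions. It does not settle whether `&buf[0] + 2` would itself be valid.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>  // std::destroy_at
#include <new>     // placement new, std::launder

// Hypothetical fixed-capacity buffer: an unsigned char array provides storage
// and elements are created in it one at a time, as a vector would do.
template <class T, std::size_t N>
class FixedBuffer {
public:
    FixedBuffer() = default;
    FixedBuffer(const FixedBuffer&) = delete;
    FixedBuffer& operator=(const FixedBuffer&) = delete;
    ~FixedBuffer() {
        for (std::size_t i = 0; i < size_; ++i) std::destroy_at(slot(i));
    }

    void push_back(const T& value) {
        assert(size_ < N);
        ::new (static_cast<void*>(bytes_ + size_ * sizeof(T))) T(value);
        ++size_;
    }

    T& operator[](std::size_t i) {
        assert(i < size_);
        return *slot(i);
    }

    std::size_t size() const { return size_; }

private:
    // The arithmetic is done on the byte array, which really is an array;
    // std::launder then yields a pointer to the T object living there.
    T* slot(std::size_t i) {
        return std::launder(reinterpret_cast<T*>(bytes_ + i * sizeof(T)));
    }

    alignas(T) unsigned char bytes_[N * sizeof(T)];
    std::size_t size_ = 0;
};
```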
I know you're replying to Richard, but I personally would say that every
object of type T ought to be considered an array of characters of size
sizeof(T). However, that definition implies that a fix to the std::vector
would also allow the shenanigans I mentioned above.
There's a delicate balance here. On the one hand, we would like C++ to
support the low-level memory operations necessary to implement something
like vector, and on the other hand, we would like C++ to support high-level
semantics in which abstract reasoning about the behavior of a UB-free
program can be performed.
Finding a middle ground is not simple, but in this case one does seem like
it might be available (allowing vector but not your adjacent fields case),
and that's the direction that I'm currently pursuing with encouragement
from SG12. If that doesn't work out, maybe a blunter instrument will be
warranted.
Here's the thing though. If all allocations (dynamic, automatic, static,
whatever) are *not* byte arrays, then that means you cannot perform
byte-pointer arithmetic on them to move from pointer to pointer. Which
means `offsetof` is useless. This makes many existing forms of automatic
serialization (that is, iterating through the subobjects of a type)
unworkable.
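For concreteness, here is a sketch of the kind of `offsetof`-driven serialization Nicol describes. It uses a hypothetical `Record` type and a hypothetical field table. Each field is located by adding its `offsetof` value to a byte pointer to the object. That byte arithmetic, on an object never created as a byte array, is precisely the step whose validity is in question here. The sketch shows the pattern; it is not a ruling on it.

```cpp
#include <cassert>
#include <cstddef>  // offsetof, std::size_t
#include <cstring>  // std::memcpy
#include <vector>

// Hypothetical standard-layout record, so offsetof is supported.
struct Record {
    int id;
    double score;
};

// Hypothetical descriptor: where a member lives and how many bytes it has.
struct FieldDesc {
    std::size_t offset;
    std::size_t size;
};

const FieldDesc record_fields[] = {
    {offsetof(Record, id), sizeof(int)},
    {offsetof(Record, score), sizeof(double)},
};

// Appends each field's bytes to `out`, skipping any padding between fields.
void serialize(const Record& r, std::vector<unsigned char>& out) {
    // Byte-pointer arithmetic into `r`: the contested step.
    const unsigned char* base = reinterpret_cast<const unsigned char*>(&r);
    for (const FieldDesc& f : record_fields) {
        out.insert(out.end(), base + f.offset, base + f.offset + f.size);
    }
}

// Reads the fields back in the same order they were written.
Record deserialize(const std::vector<unsigned char>& in) {
    Record r{};
    unsigned char* base = reinterpret_cast<unsigned char*>(&r);
    std::size_t pos = 0;
    for (const FieldDesc& f : record_fields) {
        assert(pos + f.size <= in.size());
        std::memcpy(base + f.offset, in.data() + pos, f.size);
        pos += f.size;
    }
    return r;
}
```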

But if all objects everywhere really are byte arrays, then the definition
of pointer arithmetic you want (based on "provides storage" wording) can
just as easily be applied to any contiguous sequence of objects of the same
type, so long as they're in the same storage allocation.

Where is the middle ground here? It seems like you either make automatic
serialization impossible (ignoring what reflection might bring), or you
make jumping between contiguous members well-defined.
Post by Richard Smith
Post by Myriachan
I think that the permissive route is better overall, but it does impede
some optimizations.
Melissa
Richard Smith
2017-09-09 01:40:26 UTC
Post by Nicol Bolas
Post by Richard Smith
Post by Myriachan
Post by Nicol Bolas
Post by Richard Smith
Post by Nicol Bolas
Post by Richard Smith
Post by Myriachan
2182. Pointer arithmetic in array-like containers Section: 5.7
[expr.add] Status: drafting Submitter: Jonathan Wakely
Date: 2015-10-20
The current direction for issue 1776
<http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1776>
(see paper P0137) calls into question the validity of doing pointer
arithmetic to address separately-allocated but contiguous objects in a
container like std::vector. A related question is whether there
should be some allowance made for allowing pointer arithmetic using a
pointer to a base class if the derived class is a standard-layout class
with no non-static data members. It is possible that std::launder
could play a part in the resolution of this issue.
*Notes from the February, 2016 meeting:*
This issue is expected to be resolved by the resolution of issue 1776
<http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1776>.
The major problem is when the elements of the vector contain constant or
reference members; 3.8 [basic.life] paragraph 7 implies that pointer
arithmetic leading to such an object produces undefined behavior, and CWG
expects this to continue. Some changes to the interface of
std::vector may be required, perhaps using std::launder as part
of iterator processing.
It seems incredible that the direction of the Standard would be
toward making pointer arithmetic undefined for objects inside an
std::vector just because they have a const member or reference
member.
class C {
int &member;
void f();
...
};
...
std::vector<C> v;
v.push_back(C());
v.push_back(C());
v.push_back(C());
(&v[0] + 2)->f();
This would seriously be undefined by what was stated above?
No, I doubt that's the intent; rather, the requisite calls to
std::launder would be in the implementation of vector itself. However, this
std::vector<C> v;
v.reserve(1);
v.push_back(C());
C *p = v.data(); // or "auto p = v.begin();"
v.pop_back();
v.push_back(C());
p->f(); // invalid, that C object went away
That being undefined, to me, is perfectly valid, but only for certain
types of `C`. Namely those mentioned above: types containing references or
`const` objects.
This is completely ridiculous to me.
Post by Richard Smith
Post by Myriachan
First of all, it seems that P0137R1 didn't solve the arithmetic
issue, making &v[0] + 2 illegal pointer arithmetic in the first
place, because std::vector (most likely) constructed the objects
separately. Second, making this undefined just because of a const or
reference nonstatic member would break an unbelievable amount of existing
C++ code if this arithmetic were to suddenly require a call to
std::launder.
For the first issue, it seems like we should formally define
pointer arithmetic as working across adjacent array objects, with
individual objects being treated as an array of size 1 for this purpose as
usual.
SG12 is already exploring this direction, but only for adjacent
objects whose storage is provided by the same array.
What does "provided by the same array" mean, exactly?
http://eel.is/c++draft/intro.object#3
Right, but the allocator functions don't create objects "of type “array
of N unsigned char” or of type “array of N std::byte”". Nor does
`std::aligned_storage_t`/`std::aligned_union_t`. So neither can "provide storage" for
an object under those rules. You can certainly create objects in that
storage. But that won't be the same as "provide storage".
Unless you're saying that `vector` has to allocate memory, then do
`new() char[]` on the allocation, and only then perform construction on any
types in the memory. Or unless you're saying that every allocation of
memory, *every object*, is also an array of bytes in addition to being
whatever it currently is.
I know you're replying to Richard, but I personally would say that every
object of type T ought to be considered an array of characters of size
sizeof(T). However, that definition implies that a fix to the std::vector
would also allow the shenanigans I mentioned above.
There's a delicate balance here. On the one hand, we would like C++ to
support the low-level memory operations necessary to implement something
like vector, and on the other hand, we would like C++ to support high-level
semantics in which abstract reasoning about the behavior of a UB-free
program can be performed.
Finding a middle ground is not simple, but in this case one does seem
like it might be available (allowing vector but not your adjacent fields
case), and that's the direction that I'm currently pursuing with
encouragement from SG12. If that doesn't work out, maybe a blunter
instrument will be warranted.
Here's the thing though. If all allocations (dynamic, automatic, static,
whatever) are *not* byte arrays, then that means you cannot perform
byte-pointer arithmetic on them to move from pointer to pointer. Which
means `offsetof` is useless. This makes many existing forms of automatic
serialization (that is, iterating through the subobjects of a type)
unworkable.
But if all objects everywhere really are byte arrays, then the definition
of pointer arithmetic you want (based on "provides storage" wording) can
just as easily be applied to any contiguous sequence of objects of the same
type, so long as they're in the same storage allocation.
Where is the middle ground here? It seems like you either make automatic
serialization impossible (ignoring what reflection might bring), or you
make jumping between contiguous members well-defined.
The middle ground is that this only applies to a contiguous sequence of
*complete* objects.
Post by Nicol Bolas
Post by Richard Smith
Post by Myriachan
I think that the permissive route is better overall, but it does impede
some optimizations.
Melissa
Nicol Bolas
2017-09-09 04:34:20 UTC
Post by Richard Smith
Post by Nicol Bolas
Post by Richard Smith
Post by Myriachan
I know you're replying to Richard, but I personally would say that
every object of type T ought to be considered an array of characters of
size sizeof(T). However, that definition implies that a fix to the
std::vector would also allow the shenanigans I mentioned above.
There's a delicate balance here. On the one hand, we would like C++ to
support the low-level memory operations necessary to implement something
like vector, and on the other hand, we would like C++ to support high-level
semantics in which abstract reasoning about the behavior of a UB-free
program can be performed.
Finding a middle ground is not simple, but in this case one does seem
like it might be available (allowing vector but not your adjacent fields
case), and that's the direction that I'm currently pursuing with
encouragement from SG12. If that doesn't work out, maybe a blunter
instrument will be warranted.
Here's the thing though. If all allocations (dynamic, automatic, static,
whatever) are *not* byte arrays, then that means you cannot perform
byte-pointer arithmetic on them to move from pointer to pointer. Which
means `offsetof` is useless. This makes many existing forms of automatic
serialization (that is, iterating through the subobjects of a type)
unworkable.
But if all objects everywhere really are byte arrays, then the definition
of pointer arithmetic you want (based on "provides storage" wording) can
just as easily be applied to any contiguous sequence of objects of the same
type, so long as they're in the same storage allocation.
Where is the middle ground here? It seems like you either make automatic
serialization impossible (ignoring what reflection might bring), or you
make jumping between contiguous members well-defined.
The middle ground is that this only applies to a contiguous sequence of
*complete* objects.
... now, I'm very confused as to what you're saying.

My initial understanding of the idea is that there are some allocations of
memory that are naturally byte arrays and some that may or may not be. If
you declare an `int` on the stack, then there is no byte array providing
storage for that `int`. But if you heap-allocate it, then there will be a
byte array providing storage for it, through the use of special wording for
`::operator new`.

But now, with this "complete object" wording, I'm not sure why it is that
you cannot say that all allocations of memory are byte arrays. After all,
if only complete objects can form a sequence (outside of actual arrays, of
course), then you've directly forbidden the ability to jump from subobject
to subobject via pointer arithmetic.

Given that, why do you need to explicitly declare some kinds of allocations
to make byte arrays but not other kinds? Why can't all memory be byte
arrays? If you keep this dichotomy between byte-array storage and
non-byte-array storage, then you create this situation:

struct S
{
int x;
float y;
};

void do_something(S &s);

//...

auto ptr_s = new S;
S obj_s;

do_something(*ptr_s);
do_something(obj_s);

//...

void do_something(S &s)
{
auto ptr = &s;
auto member_ptr = reinterpret_cast<float*>(reinterpret_cast<std::byte*>(ptr) + offsetof(S, y));
//Do something with member_ptr.
}

You're basically saying that `do_something(*ptr_s)` works, but
`do_something(obj_s)` does not, simply because one object was created
dynamically and one was not. So if I'm using `offsetof`, it only works for
dynamically allocated memory or cases where the user explicitly creates
them, since those are the only ones where I can cast to a byte array and do
pointer arithmetic.

For *any* object of type `S`, I think that `do_something` should either be
well-formed or ill-formed. It shouldn't matter how the object was created
or what kind of storage it is in. And the only way to make that always be
well-formed is to make all allocations be byte arrays.
Myriachan
2017-09-11 19:48:48 UTC
Post by Nicol Bolas
Post by Richard Smith
The middle ground is that this only applies to a contiguous sequence of
*complete* objects.
... now, I'm very confused as to what you're saying.
My initial understanding of the idea is that there are some allocations of
memory that are naturally byte arrays and some that may or may not be. If
you declare an `int` on the stack, then there is no byte array providing
storage for that `int`. But if you heap-allocate it, then there will be a
byte array providing storage for it, through the use of special wording for
`::operator new`.
But now, with this "complete object" wording, I'm not sure why it is that
you cannot say that all allocations of memory are byte arrays. After all,
if only complete objects can form a sequence (outside of actual arrays, of
course), then you've directly forbidden the ability to jump from subobject
to subobject via pointer arithmetic.
Given that, why do you need to explicitly declare some kinds of
allocations to make byte arrays but not other kinds? Why can't all memory
be byte arrays? If you keep this dichotomy between byte-array storage and
non-byte-array storage, then you create this situation:
struct S
{
int x;
float y;
};
void do_something(S &s);
//...
auto ptr_s = new S;
S obj_s;
do_something(*ptr_s);
do_something(obj_s);
//...
void do_something(S &s)
{
auto ptr = &s;
auto member_ptr = reinterpret_cast<float*>(reinterpret_cast<std::byte*>(ptr) + offsetof(S, y));
//Do something with member_ptr.
}
You're basically saying that `do_something(*ptr_s)` works, but
`do_something(obj_s)` does not, simply because one object was created
dynamically and one was not. So if I'm using `offsetof`, it only works for
dynamically allocated memory or cases where the user explicitly creates
them, since those are the only ones where I can cast to a byte array and do
pointer arithmetic.
For *any* object of type `S`, I think that `do_something` should either
be well-formed or ill-formed. It shouldn't matter how the object was
created or what kind of storage it is in. And the only way to make that
always be well-formed is to make all allocations be byte arrays.
It is already the case that at least trivially copyable types have to be
considered byte arrays, since basic.types/2 depends on it. The Standard is
self-inconsistent here: it says that copying the underlying bytes copies
the value if it's a trivially copyable type, but it's undefined behavior to
do the pointer arithmetic for this.

struct S {
int a, b;
};

S d;
S s = { 1, 2 };
std::byte *dest = reinterpret_cast<std::byte *>(&d);
std::byte *src = reinterpret_cast<std::byte *>(&s);

for (std::size_t x = 0; x < sizeof(S); ++x) {
dest[x] = src[x];
}

This is undefined behavior, because dest[x] and src[x] resolve to *(dest +
x) and *(src + x) respectively, and those pointer additions are undefined
behavior, because dest and src do not point to std::byte arrays.
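For contrast, here is the same copy written with `std::memcpy`, which the [basic.types] footnotes cite as an example of copying the underlying bytes. This is a sketch, and `copy_bytes` is a hypothetical name. It performs the same byte-for-byte copy without the hand-written `std::byte*` indexing whose validity is in question.

```cpp
#include <cassert>
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable_v

struct S {
    int a, b;
};

// Copies the object representation of `src` into a fresh S. For a trivially
// copyable type, [basic.types] says this copies the value as well.
S copy_bytes(const S& src) {
    static_assert(std::is_trivially_copyable_v<S>,
                  "byte copying is only blessed for trivially copyable types");
    S dest;
    std::memcpy(&dest, &src, sizeof(S));
    return dest;
}
```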

It seems that at the least, we should define all storage to be byte arrays
in addition to whatever type is constructed within. There are side effects
to that decision, however.

One side effect is that if you have a pointer to a struct object containing
a char array, the compiler cannot assume that you never overflow that char
array within the object, because the object is within some storage, and
that storage is a char array covering at least the whole object.

If we wanted to preserve certain optimizations, we might have to design
pointers such that it matters how the pointer is derived, rather than just
its value. This is already the case in certain respects, such as the rule
that you can't access a second array using a pointer to just past the end
of a first array, even if the compiler happened to make the two arrays
adjacent, and thus the pointers are equal in the middle.

Melissa
Hyman Rosen
2017-09-11 20:14:21 UTC
Post by Myriachan
If we wanted to preserve certain optimizations...
Language design driven by optimization is a fundamental error which has
poisoned many aspects of C++.
Nicol Bolas
2017-09-11 22:00:24 UTC
Post by Myriachan
Post by Nicol Bolas
Post by Richard Smith
The middle ground is that this only applies to a contiguous sequence of
*complete* objects.
... now, I'm very confused as to what you're saying.
My initial understanding of the idea is that there are some allocations
of memory that are naturally byte arrays and some that may or may not be.
If you declare an `int` on the stack, then there is no byte array providing
storage for that `int`. But if you heap-allocate it, then there will be a
byte array providing storage for it, through the use of special wording for
`::operator new`.
But now, with this "complete object" wording, I'm not sure why it is that
you cannot say that all allocations of memory are byte arrays. After all,
if only complete objects can form a sequence (outside of actual arrays, of
course), then you've directly forbidden the ability to jump from subobject
to subobject via pointer arithmetic.
Given that, why do you need to explicitly declare some kinds of
allocations to make byte arrays but not other kinds? Why can't all memory
be byte arrays? If you keep this dichotomy between byte-array storage and
non-byte-array storage, then you create this situation:
struct S
{
int x;
float y;
};
void do_something(S &s);
//...
auto ptr_s = new S;
S obj_s;
do_something(*ptr_s);
do_something(obj_s);
//...
void do_something(S &s)
{
auto ptr = &s;
auto member_ptr = reinterpret_cast<float*>(reinterpret_cast<std::byte*>(ptr) + offsetof(S, y));
//Do something with member_ptr.
}
You're basically saying that `do_something(*ptr_s)` works, but
`do_something(obj_s)` does not, simply because one object was created
dynamically and one was not. So if I'm using `offsetof`, it only works for
dynamically allocated memory or cases where the user explicitly creates
them, since those are the only ones where I can cast to a byte array and do
pointer arithmetic.
For *any* object of type `S`, I think that `do_something` should either
be well-formed or ill-formed. It shouldn't matter how the object was
created or what kind of storage it is in. And the only way to make that
always be well-formed is to make all allocations be byte arrays.
It is already the case that at least trivially copyable types have to be
considered byte arrays, since basic.types/2 depends on it. The Standard is
self-inconsistent here: it says that copying the underlying bytes copies
the value if it's a trivially copyable type, but it's undefined behavior to
do the pointer arithmetic for this.
It's illegal for you to do it manually. But it's *not* illegal to call
`memcpy` or `memmove` to cause it to happen. The standard is just saying
that those are the only ways to copy the underlying bytes.
Post by Myriachan
This is undefined behavior, because dest[x] and src[x] resolve to *(dest
+ x) and *(src + x) respectively, and those pointer additions are
undefined behavior, because dest and src do not point to std::byte arrays.
It seems that at the least, we should define all storage to be byte arrays
in addition to whatever type is constructed within. There are side effects
to that decision, however.
One side effect is that if you have a pointer to a struct object
containing a char array, the compiler cannot assume that you never overflow
that char array within the object, because the object is within some
storage, and that storage is a char array covering at least the whole
object.
Note that this would have to be an `unsigned char` or `std::byte` array. A
plain `char` array does not qualify.

Post by Myriachan
If we wanted to preserve certain optimizations, we might have to design
pointers such that it matters how the pointer is derived, rather than just
its value. This is already the case in certain respects, such as the rule
that you can't access a second array using a pointer to just past the end
of a first array, even if the compiler happened to make the two arrays
adjacent, and thus the pointers are equal in the middle.
Melissa
Hyman Rosen
2017-09-11 22:22:06 UTC
Post by Nicol Bolas
It's illegal for you to do it manually. But it's *not* illegal to call
`memcpy` or `memmove` to cause it to happen. The standard is just saying
that those are the only ways to copy the underlying bytes.
That's false. N4687 [basic.types] talks about copying the bytes.
It does not require that only certain functions may do that copying.
The footnotes in that section, describing how bytes are copied, say
that *for example* using memcpy or memmove.
Nicol Bolas
2017-09-11 22:57:20 UTC
Post by Hyman Rosen
Post by Nicol Bolas
It's illegal for you to do it manually. But it's *not* illegal to call
`memcpy` or `memmove` to cause it to happen. The standard is just saying
that those are the only ways to copy the underlying bytes.
That's false. N4687 [basic.types] talks about copying the bytes.
It does not require that only certain functions may do that copying.
The footnotes in that section, describing how bytes are copied, say
that *for example* using memcpy or memmove.
My point is that this part of the standard makes it legal to copy the bytes
of certain types. But that doesn't make it wrong for *other* parts of the
standard to forbid copying bytes in certain ways. So long as there is still *some
way* to do it, the standard is fine.

That's not to say that I disagree with the idea of all allocations being
byte arrays. But it doesn't make the standard inconsistent to have it allow
byte copying in one place while forbidding a certain type of byte copying
in another.
Myriachan
2017-09-12 01:13:31 UTC
Post by Nicol Bolas
Post by Hyman Rosen
Post by Nicol Bolas
It's illegal for you to do it manually. But it's *not* illegal to call
`memcpy` or `memmove` to cause it to happen. The standard is just saying
that those are the only ways to copy the underlying bytes.
That's false. N4687 [basic.types] talks about copying the bytes.
It does not require that only certain functions may do that copying.
The footnotes in that section, describing how bytes are copied, say
that *for example* using memcpy or memmove.
My point is that this part of the standard makes it legal to copy the
bytes of certain types. But that doesn't make it wrong for *other* parts
of the standard to forbid copying bytes in certain ways. So long as there
is still *some way* to do it, the standard is fine.
That's not to say that I disagree with the idea of all allocations being
byte arrays. But it doesn't make the standard inconsistent to have it allow
byte copying in one place while forbidding a certain type of byte copying
in another.
I suppose that if you consider memcpy and memmove to be magic functions,
then you're right.

I feel as though the community/committee needs to decide whether to go down
the route of having abstract objects or having a concrete memory model. If
an extreme abstract model is there, and it's not possible (or undefined
behavior) to drop down to raw memory when the programmer wants to, we might
as well be coding C# or Java. The ability to drop down to lower-level code
while still having high-level code is what attracts modern developers to
C++, because the other languages are safer and easier.

Melissa
Nicol Bolas
2017-09-12 14:48:13 UTC
Post by Myriachan
Post by Nicol Bolas
Post by Hyman Rosen
Post by Nicol Bolas
It's illegal for you to do it manually. But it's *not* illegal to call
`memcpy` or `memmove` to cause it to happen. The standard is just saying
that those are the only ways to copy the underlying bytes.
That's false. N4687 [basic.types] talks about copying the bytes.
It does not require that only certain functions may do that copying.
The footnotes in that section, describing how bytes are copied, say
that *for example* using memcpy or memmove.
My point is that this part of the standard makes it legal to copy the
bytes of certain types. But that doesn't make it wrong for *other* parts
of the standard to forbid copying bytes in certain ways. So long as there
is still *some way* to do it, the standard is fine.
That's not to say that I disagree with the idea of all allocations being
byte arrays. But it doesn't make the standard inconsistent to have it allow
byte copying in one place while forbidding a certain type of byte copying
in another.
I suppose that if you consider memcpy and memmove to be magic functions,
then you're right.
I feel as though the community/committee needs to decide whether to go
down the route of having abstract objects or having a concrete memory model.
I don't believe that such a choice needs to be made. I see no reason why
you can't have a concrete memory model and have a concrete object model.

The main problem we have is that the concept of indexing memory through
bytes has to happen via the object model, requiring an explicit "byte
array" object. What we want is for it to be able to happen outside of the
object model.
Myriachan
2017-09-12 19:00:05 UTC
Post by Myriachan
I suppose that if you consider memcpy and memmove to be magic functions,
Post by Myriachan
then you're right.
I feel as though the community/committee needs to decide whether to go
down the route of having abstract objects or having a concrete memory model.
I don't believe that such a choice needs to be made. I see no reason why
you can't have a concrete memory model and have a concrete object model.
The main problem we have is that the concept of indexing memory through
bytes has to happen via the object model, requiring an explicit "byte
array" object. What we want is for it to be able to happen outside of the
object model.
It's sometimes difficult to reconcile the two worlds, particularly when
considering compiler optimizations and exotic implementations.

How do we allow bytewise access and allow std::vector::data() to be used
with pointer arithmetic simultaneously with disallowing garbage like
accessing the wrong element of a class?

// Assume no padding in this implementation.

struct S { int a, b, c; };
S s{ 0, 0, 0 };

(&s.a)[2] = 4; // writes s.c???

If s is to be treated as a byte array, how could this indirection be
disallowed? A reasonable implementation of std::vector<int> would be to
allocate a suitably-aligned std::byte array, then construct ints
individually within the byte array. How would the above differ from this?

std::vector<int> v;
v.reserve(3);
v.push_back(0);
v.push_back(0);
v.push_back(0);
v.data()[2] = 4;

The current model is rather screwy in that the + 2 within v.data()[2] = 4;
is technically undefined pointer arithmetic, even though it's intended to
be allowed. The alternative would be screwy in that you could do stuff
you're really not supposed to do, and compilers may have to be pessimistic.

A technically correct interpretation of the current standard would be to
say that std::vector magically constructs objects sequentially while
simultaneously allowing pointer arithmetic through data(). An inability to
make custom containers would irritate many C++ programmers, so I don't
think that that is a viable solution.
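A minimal sketch of the container pattern in question, assuming a hypothetical fixed-capacity `IntBuffer` (not how any particular `std::vector` is written). Storage comes from `std::allocator<int>`, each element is constructed separately with placement new as it is pushed, and `data()` hands out a pointer that callers then index. The `data()[2]` access is exactly the arithmetic whose validity is being debated; in practice, mainstream compilers do what the programmer expects.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <stdexcept>

// Hypothetical fixed-capacity container used only for illustration.
class IntBuffer {
public:
    explicit IntBuffer(std::size_t capacity)
        : alloc_(), data_(alloc_.allocate(capacity)), size_(0), capacity_(capacity) {}
    ~IntBuffer() { alloc_.deallocate(data_, capacity_); }  // ints need no destructor call
    IntBuffer(const IntBuffer&) = delete;
    IntBuffer& operator=(const IntBuffer&) = delete;

    void push_back(int value) {
        if (size_ == capacity_) throw std::length_error("IntBuffer is full");
        // Each element is created on its own: one placement new per push_back.
        ::new (static_cast<void*>(data_ + size_)) int(value);
        ++size_;
    }
    int* data() { return data_; }
    std::size_t size() const { return size_; }

private:
    std::allocator<int> alloc_;
    int* data_;
    std::size_t size_;
    std::size_t capacity_;
};
```

Used like the vector example: `IntBuffer v(3);`, three `push_back(0)` calls, then `v.data()[2] = 4;`.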

Which way is better?

Melissa
Hyman Rosen
2017-09-12 20:06:53 UTC
Post by Myriachan
How do we allow bytewise access and allow std::vector::data() to be used
with pointer arithmetic simultaneously with disallowing garbage like
accessing the wrong element of a class?
You don't. Stop trying to disallow things, and stop letting optimization
drive language design.

It is a reality that objects sit at locations in memory, and that adding an
offset to a pointer to one object can make it point to a different object.
Instead of trying to find brilliant ways to avoid that reality, acknowledge
it, and stop contorting the language so that optimizers can assume that
writing through a pointer leaves some object untouched. If the compiler can
prove that, fine, but that shouldn't be a part of the language.

Optimization by the assumption that undefined behavior does not happen has
been a curse on the language for decades. Failing to specify left-to-right
order of evaluation has been a curse on the language for decades. Treating
uninitialized variables as untouchable has been a curse on the language for
decades.

When you think you need std::launder, your language design has gone off the
deep end.
Myriachan
2017-09-13 00:36:43 UTC
Post by Hyman Rosen
Post by Myriachan
How do we allow bytewise access and allow std::vector::data() to be used
with pointer arithmetic simultaneously with disallowing garbage like
accessing the wrong element of a class?
You don't. Stop trying to disallow things, and stop letting optimization
drive language design.
It is a reality that objects sit at locations in memory, and that adding
an offset to a pointer to one object can make it point to a different
object. Instead of trying to find brilliant ways to avoid that reality,
acknowledge it, and stop contorting the language so that optimizers can
assume that writing through a pointer leaves some object untouched. If
the compiler can prove that, fine, but that shouldn't be a part of the language.
Optimization by the assumption that undefined behavior does not happen has
been a curse on the language for decades. Failing to specify left-to-right
order of evaluation has been a curse on the language for decades. Treating
uninitialized variables as untouchable has been a curse on the language for
decades.
When you think you need std::launder, your language design has gone off
the deep end.
I agree with most of what you're saying, but not everyone in the community
and Committee agrees, so I was trying to find a consensus resolution most
are happy with.

Melissa
Hyman Rosen
2017-09-13 16:20:42 UTC
Post by Myriachan
I agree with most of what you're saying, but not everyone in the community
and Committee agrees, so I was trying to find a consensus resolution most
are happy with.
It's a noble cause, but doing that has gotten us

a() << b();             // a() is called before b()
a() <= b();             // a() and b() are called in unspecified order
a() += b();             // b() is called before a()
a().operator+=(b());    // a() is called before b()
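The ordering rules in this list can be observed directly. A minimal sketch, assuming a C++17 (or later) compiler: each operand function appends a letter to a log when it runs, and the function names are illustrative only. The `<=` case is left out because its order is unspecified and so cannot be checked portably.

```cpp
#include <string>

std::string call_log;  // records which operand function ran first

int storage = 0;
int a_int() { call_log += 'a'; return 1; }
int& a_ref() { call_log += 'a'; return storage; }
int b() { call_log += 'b'; return 1; }

struct Acc {
    int total = 0;
    Acc& operator+=(int x) { total += x; return *this; }
};
Acc acc;
Acc& a_obj() { call_log += 'a'; return acc; }

// a() << b(): the left operand is sequenced before the right operand.
std::string shift_order() { call_log.clear(); (void)(a_int() << b()); return call_log; }

// a() += b(): the right operand is sequenced before the left operand.
std::string compound_assign_order() { call_log.clear(); a_ref() += b(); return call_log; }

// a().operator+=(b()): the object expression is sequenced before the argument.
std::string member_call_order() { call_log.clear(); a_obj().operator+=(b()); return call_log; }
```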

Language design needs vision, not compromise. That's how Jean Ichbiah
designed Ada, for example. In that language you don't see a hodgepodge
mess of cases and exceptions.
Richard Hodges
2017-09-14 09:28:24 UTC
Post by Hyman Rosen
Language design needs vision, not compromise.
hear hear.

It saddens me that I see so many discussions on this age-old unsolvable
problem of "is it bytes or is it an object".

There is only one correct answer.

It's an object unless you say it's bytes.

proposal: modify the class/struct syntax to allow the *volatile* notation.

struct volatile X { ... };

now a compiler can understand that all members can be potentially aliased
by a pointer and it must ensure that the memory model of the X is
consistent at the beginning and end of a statement involving an X. A 'soft'
thread-local memory fence, if you will.

Otherwise the compiler is safe to assume no aliasing and can make sweeping
optimisations.

At the moment of course we can write
volatile X vx;

But that eliminates *all* optimisation, including within one statement.
That's overkill.

result:

every member of X with the same level of access shall be treated as both an
array of bytes and as discrete objects by the compiler. Aliasing into
members shall be legal and consistent.

pros:

no new keywords
no new attributes
no existing code breakage
obvious intent

cons:

I'll leave that to you. I can't see any.
Ville Voutilainen
2017-09-14 09:41:23 UTC
Post by Richard Hodges
proposal: modify the class/struct syntax to allow the volatile notation.
struct volatile X { ... };
now a compiler can understand that all members can be potentially aliased by
a pointer and it must ensure that the memory model of the X is consistent at
the beginning and end of a statement involving an X. A 'soft' thread-local
memory fence, if you will.
I don't know why you started talking about threads and memory fences when
you seemed to want to promise that the type should not allow type-based
aliasing optimizations, but as soon as you start talking about threads,
volatile automatically becomes unacceptable as a keyword choice.
Richard Hodges
2017-09-14 09:43:45 UTC
Post by Ville Voutilainen
I don't know why you started talking about threads and memory fences
I didn't.

I'm talking about a 'soft' fence, one that ensures consistency of the memory
model between statements. This has nothing to do with concurrency. I chose
the term 'soft' fence because in my mind, it's a similar *kind of concept*.
What term would be less confusing?



FrankHB1989
2017-09-14 10:13:31 UTC
圚 2017幎9月13日星期䞉 UTC+8䞊午4:07:17Hyman Rosen写道
Post by Hyman Rosen
Post by Myriachan
How do we allow bytewise access and allow std::vector::data() to be used
with pointer arithmetic simultaneously with disallowing garbage like
accessing the wrong element of a class?
You don't. Stop trying to disallow things, and stop letting optimization
drive language design.
It is a reality that objects sit at locations in memory, and that adding
an offset to a pointer to one object can make it point to a different object.
Why "disallowing"? You're requesting new rules to be put into the language
that it did not provide previously, aren't you?

If not, show your reality in the language. Which rules have granted these
properties to you? The C++ object model does not expose such properties. The
C++ memory model only allows the location to be addressed as some bytes,
rather than as any arbitrary object or subobject.

Post by Hyman Rosen
Instead of trying to find brilliant ways to avoid that reality, acknowledge
it, and stop contorting the language so that optimizers can assume that
writing through a pointer leaves some object untouched. If the compiler can
prove that, fine, but that shouldn't be a part of the language.
False. It still needs to be in the language rules to ensure the leeway a
conforming implementation has. Currently, that leeway comes from the as-if rule.

Post by Hyman Rosen
Optimization by the assumption that undefined behavior does not happen has
been a curse on the language for decades. Failing to specify left-to-right
order of evaluation has been a curse on the language for decades.
False. You have had the semicolon and the built-in comma operator for decades.
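A minimal sketch of the sequencing FrankHB is pointing to: like separate statements ending in semicolons, the built-in comma operator guarantees that its left operand is fully evaluated before its right operand. The names `f`, `g`, `trace`, and `comma_result` are illustrative only.

```cpp
#include <string>

std::string trace;  // records evaluation order
int f() { trace += 'f'; return 1; }
int g() { trace += 'g'; return 2; }

// The built-in comma operator sequences f() before g(); the result is g()'s value.
int comma_result() {
    trace.clear();
    return (f(), g());
}
```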
Post by Hyman Rosen
Treating uninitialized variables as untouchable has been
a curse on the language for decades.
Not to mention that newbies' inability to correctly use fundamental
abstractions like "volatile" has been a curse on the language for decades.

Post by Hyman Rosen
When you think you need std::launder, your language design has gone off the
deep end.
Edward Catmur
2017-09-14 09:04:28 UTC
Post by Myriachan
Post by Myriachan
I suppose that if you consider memcpy and memmove to be magic functions,
Post by Myriachan
then you're right.
I feel as though the community/committee needs to decide whether to go
down the route of having abstract objects or having a concrete memory model.
I don't believe that such a choice needs to be made. I see no reason why
you can't have a concrete memory model and have a concrete object model.
The main problem we have is that the concept of indexing memory through
bytes has to happen via the object model, requiring an explicit "byte
array" object. What we want is for it to be able to happen outside of the
object model.
It's sometimes difficult to reconcile the two worlds, particularly when
considering compiler optimizations and exotic implementations.
How do we allow bytewise access and allow std::vector::data() to be used
with pointer arithmetic simultaneously with disallowing garbage like
accessing the wrong element of a class?
// Assume no padding in this implementation.
struct S { int a, b, c; };
S s{ 0, 0, 0 };
(&s.a)[2] = 4; // writes s.c???
If s is to be treated as a byte array, how could this indirection be
disallowed? A reasonable implementation of std::vector<int> would be to
allocate a suitably-aligned std::byte array, then construct ints
individually within the byte array. How would the above differ from this?
std::vector<int> v;
v.reserve(3);
v.push_back(0);
v.push_back(0);
v.push_back(0);
v.data()[2] = 4;
The current model is rather screwy in that the + 2 within v.data()[2] = 4;
is technically undefined pointer arithmetic, even though it's intended to
be allowed. The alternative would be screwy in that you could do stuff
you're really not supposed to do, and compilers may have to be pessimistic.
A technically correct interpretation of the current standard would be to
say that std::vector magically constructs objects sequentially while
simultaneously allowing pointer arithmetic through data(). An inability
to make custom containers would irritate many C++ programmers, so I don't
think that that is a viable solution.
std::vector doesn't need to be itself magic; it can call a language support
facility available to users as well as to implementors. This would be
called whenever updating data() or size() to mark the range [data(), data()
+ size()) as amenable to pointer arithmetic of the pointer type.
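To make Edward's suggestion concrete, here is a sketch built around a purely hypothetical `declare_contiguous_range` function. No such facility exists in the standard. At run time it would do nothing; its only role would be to tell the abstract machine that `[first, first + n)` may be traversed with pointer arithmetic. `FixedVec` is likewise an illustrative name, and it calls the hook whenever `data()` or `size()` changes.

```cpp
#include <cstddef>
#include <new>
#include <stdexcept>

// HYPOTHETICAL language-support hook; no such function exists in the standard.
// It generates no code. Its only job would be to tell the abstract machine that
// [first, first + n) may be traversed with pointer arithmetic like an array.
template <class T>
void declare_contiguous_range(T* first, std::size_t n) noexcept {
    (void)first;
    (void)n;
}

// Fixed-capacity buffer that calls the hook every time data() or size() changes.
template <class T, std::size_t N>
class FixedVec {
    static_assert(N > 0, "capacity must be positive");
public:
    FixedVec() = default;
    FixedVec(const FixedVec&) = delete;             // first_ points into storage_
    FixedVec& operator=(const FixedVec&) = delete;
    ~FixedVec() {
        for (std::size_t i = 0; i < size_; ++i) first_[i].~T();
    }

    void push_back(const T& value) {
        if (size_ == N) throw std::length_error("FixedVec is full");
        T* p = ::new (static_cast<void*>(storage_ + size_ * sizeof(T))) T(value);
        if (size_ == 0) first_ = p;                  // remember the first element
        ++size_;
        declare_contiguous_range(first_, size_);     // the extra call being proposed
    }
    T* data() { return first_; }
    std::size_t size() const { return size_; }

private:
    alignas(T) unsigned char storage_[N * sizeof(T)];
    T* first_ = nullptr;
    std::size_t size_ = 0;
};
```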
Post by Myriachan
Which way is better?
Melissa
Nicol Bolas
2017-09-14 15:38:29 UTC
Post by Edward Catmur
Post by Myriachan
Post by Myriachan
I suppose that if you consider memcpy and memmove to be magic functions,
Post by Myriachan
then you're right.
I feel as though the community/committee needs to decide whether to go
down the route of having abstract objects or having a concrete memory model.
I don't believe that such a choice needs to be made. I see no reason why
you can't have a concrete memory model and have a concrete object model.
The main problem we have is that the concept of indexing memory through
bytes has to happen via the object model, requiring an explicit "byte
array" object. What we want is for it to be able to happen outside of the
object model.
It's sometimes difficult to reconcile the two worlds, particularly when
considering compiler optimizations and exotic implementations.
How do we allow bytewise access and allow std::vector::data() to be used
with pointer arithmetic simultaneously with disallowing garbage like
accessing the wrong element of a class?
// Assume no padding in this implementation.
struct S { int a, b, c; };
S s{ 0, 0, 0 };
(&s.a)[2] = 4; // writes s.c???
If s is to be treated as a byte array, how could this indirection be
disallowed? A reasonable implementation of std::vector<int> would be to
allocate a suitably-aligned std::byte array, then construct ints
individually within the byte array. How would the above differ from this?
std::vector<int> v;
v.reserve(3);
v.push_back(0);
v.push_back(0);
v.push_back(0);
v.data()[2] = 4;
The current model is rather screwy in that the + 2 within v.data()[2] =
4; is technically undefined pointer arithmetic, even though it's
intended to be allowed. The alternative would be screwy in that you could
do stuff you're really not supposed to do, and compilers may have to be
pessimistic.
A technically correct interpretation of the current standard would be to
say that std::vector magically constructs objects sequentially while
simultaneously allowing pointer arithmetic through data(). An inability
to make custom containers would irritate many C++ programmers, so I don't
think that that is a viable solution.
std::vector doesn't need to be itself magic; it can call a language
support facility available to users as well as to implementors. This would
be called whenever updating data() or size() to mark the range [data(),
data() + size()) as amenable to pointer arithmetic of the pointer type.
This is a really bad way of thinking. Implementing `vector` or
`vector`-like constructs should not require such an expert level of
understanding of the object model and the use of exceedingly esoteric
functions.

Users have good reason to expect that, if you explicitly construct two
objects of the same type beside each other in memory, then you can use
pointer arithmetic to jump from one to another. Code that does this exists
and is *extremely prevalent*.

You're effectively proposing to tell all of these people that they have to
call some function (which, FYI, doesn't actually do anything) in order to
make code work. Even though it already works. People will simply not do it,
and therefore compiler writers will refuse to optimize for it since it
would break the world.

So, what exactly have you gained over just making the code work?

I'm against breaking the object model just to allow certain C-isms to work.
But I don't see how it's breaking the object model to say that two
non-subobjects of the same dynamic type, constructed adjacently in the same
storage, can have pointer arithmetic used on them as though they were in an
array. That seems like a perfectly coherent object model to me.
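Nicol's case can be written down directly. In this sketch, which assumes an illustrative `Widget` type, two complete objects are constructed side by side in one aligned byte buffer, and a pointer to the first is incremented to reach the second. Under the wording being criticized, that increment is not covered, because the two `Widget`s are not elements of a `Widget` array. The rule Nicol describes would make it valid.

```cpp
#include <new>

struct Widget { int id; };  // hypothetical element type

// Constructs two Widgets adjacently and returns the id reached by stepping from the first.
inline int id_of_neighbour() {
    alignas(Widget) unsigned char buffer[2 * sizeof(Widget)];
    Widget* first  = ::new (static_cast<void*>(buffer)) Widget{1};
    Widget* second = ::new (static_cast<void*>(buffer + sizeof(Widget))) Widget{2};
    Widget* stepped = first + 1;  // the disputed arithmetic: no Widget array exists here
    return (stepped == second) ? stepped->id : -1;
}
```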
'Edward Catmur' via ISO C++ Standard - Discussion
2017-09-14 18:03:52 UTC
Post by Nicol Bolas
Post by Edward Catmur
Post by Myriachan
Post by Myriachan
I suppose that if you consider memcpy and memmove to be magic
functions, then you're right.
I feel as though the community/committee needs to decide whether to go
down the route of having abstract objects or having a concrete memory model.
I don't believe that such a choice needs to be made. I see no reason
why you can't have a concrete memory model and have a concrete object model.
The main problem we have is that the concept of indexing memory through
bytes has to happen via the object model, requiring an explicit "byte
array" object. What we want is for it to be able to happen outside of the
object model.
It's sometimes difficult to reconcile the two worlds, particularly when
considering compiler optimizations and exotic implementations.
How do we allow bytewise access and allow std::vector::data() to be
used with pointer arithmetic simultaneously with disallowing garbage like
accessing the wrong element of a class?
// Assume no padding in this implementation.
struct S { int a, b, c; };
S s{ 0, 0, 0 };
(&s.a)[2] = 4; // writes s.c???
If s is to be treated as a byte array, how could this indirection be
disallowed? A reasonable implementation of std::vector<int> would be
to allocate a suitably-aligned std::byte array, then construct ints
individually within the byte array. How would the above differ from this?
std::vector<int> v;
v.reserve(3);
v.push_back(0);
v.push_back(0);
v.push_back(0);
v.data()[2] = 4;
The current model is rather screwy in that the + 2 within v.data()[2] =
4; is technically undefined pointer arithmetic, even though it's
intended to be allowed. The alternative would be screwy in that you could
do stuff you're really not supposed to do, and compilers may have to be
pessimistic.
A technically correct interpretation of the current standard would be to
say that std::vector magically constructs objects sequentially while
simultaneously allowing pointer arithmetic through data(). An
inability to make custom containers would irritate many C++ programmers, so
I don't think that that is a viable solution.
std::vector doesn't need to be itself magic; it can call a language
support facility available to users as well as to implementors. This would
be called whenever updating data() or size() to mark the range [data(),
data() + size()) as amenable to pointer arithmetic of the pointer type.
This is a really bad way of thinking. Implementing `vector` or
`vector`-like constructs should not require such an expert level of
understanding of the object model and the use of exceedingly esoteric
functions.
Users have good reason to expect that, if you explicitly construct two
objects of the same type beside each other in memory, then you can use
pointer arithmetic to jump from one to another. Code that does this exists
and is *extremely prevalent*.
You're effectively proposing to tell all of these people that they have to
call some function (which, FYI, doesn't actually do anything) in order to
make code work. Even though it already works. People will simply not do it,
and therefore compiler writers will refuse to optimize for it since it
would break the world.
OK, I take your point. It's certainly worth exploring what the minimal
change to the Standard would be to make existing containers work without
changing code.

Post by Nicol Bolas
So, what exactly have you gained over just making the code work?
I'm against breaking the object model just to allow certain C-isms to
work. But I don't see how it's breaking the object model to say that two
non-subobjects of the same dynamic type, constructed adjacently in the same
storage, can have pointer arithmetic used on them as though they were in an
array. That seems like a perfectly coherent object model to me.
Yes, and this should hold whether the storage is a byte array or provided
by an allocation function.

So, if [expr.add]/4 is additionally allowed to work on the ith element of a
sequence of n complete objects of the same type as *P, where either the
same array provides storage for all the elements of the sequence or the
elements of the sequence are constructed within the same block of storage
returned by an allocation function, would that be enough? It would appear
to allow any sensible implementation of vector and other contiguous
containers, as well as SSO vectors and so forth. And it would not appear to
leave any space for jumping between class data members as shown above.
There's no need to require everything to be a byte array.
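One way to read Edward's proposed extension is as the same bounds rule [expr.add]/4 already applies to arrays. From element i of a sequence of n objects, adding j is valid only when 0 <= i + j <= n, where n itself means one past the end. A pointer to a data member such as `s.a` belongs to no such sequence, so it behaves as a sequence of one. The following is a toy model with hypothetical names, not a description of what any compiler does.

```cpp
#include <cstddef>
#include <optional>

// Toy model of a pointer's position within a sequence of same-typed objects.
struct ModelPtr {
    std::ptrdiff_t index;  // i: which element the pointer designates
    std::ptrdiff_t count;  // n: number of objects in the sequence
};

// Returns the model of p + j, or std::nullopt where the rule leaves it undefined.
inline std::optional<ModelPtr> model_add(ModelPtr p, std::ptrdiff_t j) {
    const std::ptrdiff_t target = p.index + j;
    if (target < 0 || target > p.count) return std::nullopt;  // outside [0, n]
    return ModelPtr{target, p.count};
}
```

For example, `&v[0] + 2` in a three-element vector is `model_add({0, 3}, 2)` and is allowed, while `(&s.a)[2]` is `model_add({0, 1}, 2)` and is not.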
Hyman Rosen
2017-09-14 20:02:09 UTC
On Thu, Sep 14, 2017 at 2:03 PM, 'Edward Catmur' via ISO C++ Standard -
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
So, if [expr.add]/4 is additionally allowed to work on the ith element of
a sequence of n complete objects of the same type
What about code that constructs a sequence of variously typed objects in a
buffer (say, for serialization)? For that matter, the underlying <stdarg>
code used to just treat the stack as such a buffer, and would increment a
pointer through the parameters having cast the pointer to the current
parameter type.

Pointer addition of T *p and ptrdiff_t offset should just be defined as
(T*)((intptr_t)p + offset * sizeof(T)), with wraparound semantics for the
arithmetic. Then if the result is equal to some valid pointer, the result
*is* that pointer. Yes, that may let you jump between different objects via
arithmetic. So what? The language has offsetof already, so it's not like
this is an undesirable result.
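Hyman's proposed definition can be sketched as an ordinary function. This assumes a flat address space in which `std::uintptr_t` round-trips pointers, which holds on mainstream platforms but is only implementation-defined in the standard. Unsigned `std::uintptr_t` stands in for the `intptr_t` in the formula because in C++ only unsigned arithmetic wraps; signed overflow is undefined. `flat_add` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical "flat" pointer addition: computes p + offset on the integer
// address, with unsigned (wrapping) arithmetic, and converts back to T*.
// Assumes uintptr_t round-trips pointers, as on mainstream flat-memory targets.
template <class T>
T* flat_add(T* p, std::ptrdiff_t offset) {
    const auto addr  = reinterpret_cast<std::uintptr_t>(p);
    const auto delta = static_cast<std::uintptr_t>(offset) * sizeof(T);  // wraps modulo 2^N
    return reinterpret_cast<T*>(addr + delta);  // a negative offset wraps back downward
}
```

Within an ordinary array this agrees with built-in arithmetic; the proposal's point is that it would also be defined outside one.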
'Edward Catmur' via ISO C++ Standard - Discussion
2017-09-14 21:13:21 UTC
Post by Hyman Rosen
On Thu, Sep 14, 2017 at 2:03 PM, 'Edward Catmur' via ISO C++ Standard -
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
So, if [expr.add]/4 is additionally allowed to work on the ith element of
a sequence of n complete objects of the same type
What about code that constructs a sequence of variously typed objects in a
buffer (say, for serialization)?
If you want to do that safely, you access the buffer as a byte array and
construct the sequence of objects using memcpy or equivalent. You need a
pointer to the buffer anyway, so you may as well just use that as the
destination for each memcpy and update as you go.
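A minimal sketch of the approach Edward describes, assuming only trivially copyable payloads. Values of different types are written into one `unsigned char` array with `std::memcpy`, a cursor is advanced through that array (which is an array object, so the arithmetic is covered), and the values are read back the same way. `put_bytes` and `get_bytes` are illustrative names.

```cpp
#include <cstring>
#include <type_traits>

// Writes the bytes of value at out and returns the cursor just past them.
template <class T>
unsigned char* put_bytes(unsigned char* out, const T& value) {
    static_assert(std::is_trivially_copyable_v<T>, "memcpy needs a trivially copyable type");
    std::memcpy(out, &value, sizeof(T));
    return out + sizeof(T);  // arithmetic stays inside the caller's byte array
}

// Reads sizeof(T) bytes at in into value and returns the advanced cursor.
template <class T>
const unsigned char* get_bytes(const unsigned char* in, T& value) {
    static_assert(std::is_trivially_copyable_v<T>, "memcpy needs a trivially copyable type");
    std::memcpy(&value, in, sizeof(T));
    return in + sizeof(T);
}
```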
Post by Hyman Rosen
For that matter, the underlying <stdarg> code used to just treat the
stack as such a buffer, and would increment a pointer through the
parameters having cast the pointer to the current parameter type.
va_arg yields a prvalue, so even that doesn't need to perform unsafe
pointer arithmetic, whatever other magic it's doing.
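For comparison, here is how <cstdarg> looks from user code. Each `va_arg(ap, int)` expression yields the next argument as a value, and the code never forms or increments a pointer into the argument area itself. A minimal sketch; `sum_ints` is an illustrative name.

```cpp
#include <cstdarg>

// Sums `count` int arguments; each va_arg(ap, int) yields a prvalue int.
int sum_ints(int count, ...) {
    va_list ap;
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; ++i) total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```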
Post by Hyman Rosen
Pointer addition of T *p and ptrdiff_t offset should just be defined
as (T*)((intptr_t)p + offset * sizeof(T)), with wraparound semantics for
the arithmetic. Then if the result is equal to some valid pointer, the
result *is* that pointer. Yes, that may let you jump between different
objects via arithmetic. So what? The language has offsetof already,
so it's not like this is an undesirable result.
offsetof is still useful when following the rules; it allows you to memcpy
a data member into or out of a standard-layout object in a type-erased
manner, knowing only its offset and size.
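A sketch of the type-erased pattern Edward describes, using a hypothetical standard-layout `Record` and a hypothetical `FieldDesc` that stores only a member's offset and size. Note that Myriachan's reply argues that adding the offset to a byte pointer into the object is itself technically undefined under the wording of the time, even though code like this works on mainstream implementations.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical standard-layout type used only for illustration.
struct Record {
    int id;
    double score;
};

// Type-erased description of one data member: its offset and size, nothing else.
struct FieldDesc {
    std::size_t offset;
    std::size_t size;
};

// Copies f.size bytes from src into the member described by f.
inline void write_field(void* object, FieldDesc f, const void* src) {
    // Adding f.offset to a byte pointer into a Record is the disputed step.
    std::memcpy(static_cast<unsigned char*>(object) + f.offset, src, f.size);
}

// Copies the member described by f out into dst.
inline void read_field(const void* object, FieldDesc f, void* dst) {
    std::memcpy(dst, static_cast<const unsigned char*>(object) + f.offset, f.size);
}
```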
Myriachan
2017-09-14 21:57:46 UTC
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Post by Hyman Rosen
On Thu, Sep 14, 2017 at 2:03 PM, 'Edward Catmur' via ISO C++ Standard -
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
So, if [expr.add]/4 is additionally allowed to work on the ith element
of a sequence of n complete objects of the same type
What about code that constructs a sequence of variously typed objects in
a buffer (say, for serialization)?
If you want to do that safely, you access the buffer as a byte array and
construct the sequence of objects using memcpy or equivalent. You need a
pointer to the buffer anyway, so you may as well just use that as the
destination for each memcpy and update as you go.
Post by Hyman Rosen
For that matter, the underlying <stdarg> code used to just treat the
stack as such a buffer, and would increment a pointer through the
parameters having cast the pointer to the current parameter type.
va_arg yields a prvalue, so even that doesn't need to perform unsafe
pointer arithmetic, whatever other magic it's doing.
Post by Hyman Rosen
Pointer addition of T *p and ptrdiff_t offset should just be defined
as (T*)((intptr_t)p + offset * sizeof(T)), with wraparound semantics
for the arithmetic. Then if the result is equal to some valid pointer, the
result *is* that pointer. Yes, that may let you jump between different
objects via arithmetic. So what? The language has offsetof already,
so it's not like this is an undesirable result.
offsetof is still useful when following the rules; it allows you to memcpy
a data member into or out of a standard-layout object in a type-erased
manner, knowing only its offset and size.
Currently, that is usually not true. The below is undefined behavior:

struct S { int a; int b; } s;
int c = 4;
std::memcpy(reinterpret_cast<std::byte *>(&s) + offsetof(S, b), &c, sizeof(int));

The reason that it's undefined behavior is that the reinterpret_cast
pointer does not point to an array of std::byte, so pointer arithmetic on
it is undefined behavior. This is undefined behavior even before memcpy
gets involved.
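One way to get the same effect while staying within the current wording, sketched under the assumption that S is trivially copyable (it is here): copy the object representation into a genuine unsigned char array, patch the bytes there, where the arithmetic is ordinary array arithmetic, and copy the result back. `store_int_at` is a hypothetical helper name. The price is two extra whole-object copies.

```cpp
#include <cstddef>
#include <cstring>

struct S { int a; int b; };

// Writes `value` into the int member found at byte `offset` of `s`, doing
// pointer arithmetic only on a real unsigned char array, never on &s.
void store_int_at(S& s, std::size_t offset, int value) {
    unsigned char bytes[sizeof(S)];
    std::memcpy(bytes, &s, sizeof(S));                  // object -> byte array
    std::memcpy(bytes + offset, &value, sizeof value);  // patch inside the array
    std::memcpy(&s, bytes, sizeof(S));                  // byte array -> object
}
```

Usage: `store_int_at(s, offsetof(S, b), 4);` sets `s.b` to 4 and leaves `s.a` unchanged.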

That's primarily why I created this thread: the status quo is broken.
Since programmers do this often, and even more expect pointers to work
across adjacent separately constructed objects, it seems that the Standard
is broken here, not the programmers.

Additionally, there is an active proposal to extend offsetof to all classes
that don't have virtual functions or virtual bases. It passed LEWG in
Toronto. (Implementations could choose to support classes with virtual
functions as well, and most probably will, since virtual functions don't
break offsetof in the ABIs I can think of. Virtual base classes, not so
much.) I hope that this proposal passes Core.

Melissa
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
'Edward Catmur' via ISO C++ Standard - Discussion
2017-09-15 11:39:03 UTC
Post by Myriachan
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Post by Hyman Rosen
On Thu, Sep 14, 2017 at 2:03 PM, 'Edward Catmur' via ISO C++ Standard -
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
So, if [expr.add]/4 is additionally allowed to work on the ith element
of a sequence of n complete objects of the same type
What about code that constructs a sequence of variously typed objects in
a buffer (say, for serialization)?
If you want to do that safely, you access the buffer as a byte array and
construct the sequence of objects using memcpy or equivalent. You need a
pointer to the buffer anyway, so you may as well just use that as the
destination for each memcpy and update as you go.
Post by Hyman Rosen
For that matter, the underlying <stdarg> code used to just treat the
stack as such a buffer, and would increment a pointer through the
parameters having cast the pointer to the current parameter type.
va_arg yields a prvalue, so even that doesn't need to perform unsafe
pointer arithmetic, whatever other magic it's doing.
Post by Hyman Rosen
Pointer addition of *T *p* and *ptrdiff_t offset* should just be
defined as *(T*)((intptr_t)p + offset * sizeof(T))* with wraparound
semantics for the arithmetic. Then if the result is equal to some valid
pointer, the result *is* that pointer. Yes, that may let you jump
between different objects via arithmetic. So what? The language has
*offsetof* already, so it's not like this is an undesirable result.
offsetof is still useful when following the rules; it allows you to
memcpy a data member into or out of a standard-layout object in a
type-erased manner, knowing only its offset and size.
struct S { int a; int b; } s;
int c = 4;
std::memcpy(reinterpret_cast<std::byte *>(&s) + offsetof(S, b), &c, sizeof(int));
The reason that it's undefined behavior is that the reinterpret_cast
pointer does not point to an array of std::byte, so pointer arithmetic on
it is undefined behavior. This is undefined behavior even before memcpy
gets involved.
That's primarily why I created this thread: the status quo is broken.
Since programmers do this often, and even more expect pointers to work
across adjacent separately constructed objects, it seems that the Standard
is broken here, not the programmers.
Right; that's the "sequence of N [bytes]" in [basic.types]/4. I agree that
both of these are issues, but as far as I can tell they could be resolved
by relaxing [expr.add] slightly; in this case by allowing pointer
arithmetic on byte pointers within a complete object or allocation.
Hopefully this need not open any holes elsewhere; e.g. the pointer
interconvertibility rules [basic.compound] would still prevent using
offsetof to construct a pointer to a class data member.

Post by Myriachan
Additionally, there is an active proposal to extend offsetof to all classes
that don't have virtual functions or virtual bases. It passed LEWG in
Toronto. (Classes with virtual functions could be supported in an
implementation if they want to, which probably most will, since virtual
functions don't break offsetof in the ABIs I can think of. Virtual base
classes, not so much.) I hope that this proposal passes Core.
Hyman Rosen
2017-09-14 22:01:14 UTC
On Thu, Sep 14, 2017 at 5:13 PM, 'Edward Catmur' via ISO C++ Standard -
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
va_arg yields a prvalue, so even that doesn't need to perform unsafe
pointer arithmetic, whatever other magic it's doing.
It's not unsafe, just forbidden. This is all a weird sort of political
correctness. The overwhelming majority of computing takes place on systems
that present a single flat address space to programs, and where pointer
arithmetic is just integer arithmetic. But we are supposed to pretend that
this isn't so, and that there is special fairy dust that gets sprinkled on
some additions but not on others. It's even weirder because actual
programs do that sort of unblessed pointer arithmetic all the time, and
have been doing so forever. Overlaying data representations in order to
treat the same segment of memory as objects of different types has been
around even longer, in Fortran's EQUIVALENCE and probably earlier, but we
can't do that with unions any more, even though we've been doing that with
unions forever.
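For reference, the portable replacement for union-based punning under the current rules is to copy the object representation with std::memcpy (C++20 later added std::bit_cast for the same job). A minimal sketch, assuming a 32-bit IEEE 754 float; `float_bits` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>

// Returns the bit pattern of `f`. Reading an inactive union member would be
// undefined; copying the bytes of a trivially copyable object is not.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```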

All of this in misguided service to the optimizationists. The compiler
should translate our code as written, without making assumptions that we're
not engaging in activities that we are, in fact, engaging in. If it
can prove something to itself, that's fine, but otherwise it needs to leave
our code alone. The language should do away with the undefined behavior
that lets compilers indulge in these shenanigans.
FrankHB1989
2017-09-15 10:13:26 UTC
On Friday, September 15, 2017 at 6:01:39 AM UTC+8, Hyman Rosen wrote:
Post by Hyman Rosen
On Thu, Sep 14, 2017 at 5:13 PM, 'Edward Catmur' via ISO C++ Standard -
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
va_arg yields a prvalue, so even that doesn't need to perform unsafe
pointer arithmetic, whatever other magic it's doing.
It's not unsafe, just forbidden.
How do you know it is safe when not guaranteed?
Post by Hyman Rosen
This is all a weird sort of political correctness.
Not all. History counts.
Post by Hyman Rosen
The overwhelming majority of computing takes place on systems that present
a single flat address space to programs, and where pointer arithmetic is
just integer arithmetic.
No. Pointers in C++ are typed: pointer values carry their types. If you
want arithmetic on addresses, propose an address type instead, or break the
type system as a whole.

Post by Hyman Rosen
But we are supposed to pretend that this isn't so, and that there is
special fairy dust that gets sprinkled on some additions but not on others.
No. This is the status quo: unless someone adds *more* rules, or *limitations*,
you can't rely on that assumption in the sense of the language.

Once you have done that in the standard, someone can then accuse you of
*forbidding* such formerly conforming implementations. Should all of them
die? This is also more or less "a weird sort of political correctness".

Of course such a change is not totally unacceptable; the removal of
trigraphs is an example. But in general, such changes should make the
specification simpler and impose fewer limitations on conformance, to
overcome the net cost of losing functionality or portability. Assuming a flat
address space is not an evolution in that direction. (This can be a non-issue
for a new language, but that is never the case for C++.)

Post by Hyman Rosen
It's even weirder because actual programs do that sort of unblessed pointer
arithmetic all the time, and have been doing so forever.
No. High-level languages should usually never rely on such low-level
operations. From a system-design point of view, they are obliged to provide
and use proper high-level abstractions that hide implementation details. If
they have to leak the abstraction, they are flawed by design.

People have faced such unblessed features for historical reasons; the status
quo is just a compromise. Blessing such things after the fact leads to more
mess.

If you're talking about the language implementation itself, no rules forbid
them. If there is some difference, then it is you who needs to guarantee
that the code is correct (and portable), rather than the vendors of the
implementations. Is it unfair to have to do more detailed work when you want
to gain more dirty corners?
Post by Hyman Rosen
Overlaying data representations in order to treat the same segment of
memory as objects of different types has been around even longer, in
Fortran's EQUIVALENCE and probably earlier, but we can't do that with
unions any more, even though we've been doing that with unions forever.
All of this in misguided service to the optimizationists. The compiler
should translate our code as written, without making assumptions that we're
not engaging in activities that we are, in fact, engaging in. If it
can prove something to itself, that's fine, but otherwise it needs to leave
our code alone. The language should do away with the undefined behavior
that lets compilers indulge in these shenanigans.
Without any optimization, the reasons above still apply. Rules that model
"as written" are far from sufficient to describe behavior accurately, and
are usually useless for a practical programming language even when there
are no portability problems.

To discipline most side effects exposed by the abstract machine "as
written", we already have `volatile`. So what exact change do you want if it
does not offer more flexible abstraction and easier conformance (and also
room to "optimize")? Just to be more noob-friendly? ... There would be no
end to that. You can only get it done with more anti-engineering practices.
Richard Hodges
2017-09-15 11:11:07 UTC
It seems to me that the problem of is-it-bytes-or-is-it-objects is only a
concern because optimisers have to know whether there is any pointer
aliasing going on.

Might it be fair to allow a compiler to assume there is no aliasing unless
told there might be?

Humour me for a moment and imagine a construct called the "as_if_fence".

The optimiser would be free to optimise and do "as_if" stuff, but not
across such a fence.

Now the programmer has a way to prevent aliasing problems when he's
treating objects as bytes.

Of course there would need to be a way to mark a function as using a fence.

For example, imagine a function that takes a reference and a pointer to
potentially the same object:

auto foo(Bar& bar, std::uint8_t* pbar)
{
bar.some_updates();
more_updates(pbar); // this could be reordered in either direction
auto result = bar.result();
return result;
}

If aggressive re-ordering were allowed, optimising this code could cause
breakage because of as-if reordering where *pbar and bar are aliases.

So consider:

auto foo(Bar& bar, Bar* pbar)
{
bar.some_updates();

// compiler may not reorder memory state changes over this fence
// as a result of as-if-rule
std::as_if_fence();

pbar->more_updates(); // so the side effects of this will be consistent

std::as_if_fence(); // and will "happen before" the stuff below

auto result = bar.result();
return result;
}

Doesn't that solve the issue of is it memory or is it objects?
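One observation on the first version of foo, sketched below: the current aliasing rules ([basic.lval]) already let an unsigned char lvalue access an object of any type, so a compiler must assume that a store through such a pointer may modify other objects. If std::uint8_t is unsigned char, as it is on common implementations but is not guaranteed to be, that version already gets this protection. The function name is hypothetical, and this addresses only aliasing, not the pointer-arithmetic question.

```cpp
#include <cstdint>

// Because *byte may alias `value`, the compiler cannot fold the return
// value to 0x01010101; it has to re-read `value` after the byte store.
std::int32_t store_then_reload(std::int32_t& value, unsigned char* byte) {
    value = 0x01010101;
    *byte = 0;      // may overwrite one byte of `value`
    return value;   // must observe that store if the two alias
}
```

Passing a byte pointer to `value` itself clears one byte, whichever end it is on, so the result differs from 0x01010101 on both little- and big-endian targets.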
Richard Hodges
2017-09-15 11:15:28 UTC
Sorry, the second function above should be:

auto foo(Bar& bar, std::uint8_t* pbar)
{
bar.some_updates();

// compiler may not reorder memory state changes over this fence
// as a result of as-if-rule
std::as_if_fence();

more_updates(pbar); // so the side effects of this will be consistent

std::as_if_fence(); // and will "happen before" the stuff below

auto result = bar.result();
return result;
}
Nicol Bolas
2017-09-15 15:01:26 UTC
Post by FrankHB1989
On Friday, September 15, 2017 at 6:01:39 AM UTC+8, Hyman Rosen wrote:
Post by Hyman Rosen
On Thu, Sep 14, 2017 at 5:13 PM, 'Edward Catmur' via ISO C++ Standard -
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
va_arg yields a prvalue, so even that doesn't need to perform unsafe
pointer arithmetic, whatever other magic it's doing.
It's not unsafe, just forbidden.
How do you know it is safe when not guaranteed?
Post by Hyman Rosen
This is all a weird sort of political correctness.
Not all. History counts.
So why are you ignoring the long history of C++ programmers doing
operations like this and expecting them to work?

Post by FrankHB1989
Post by Hyman Rosen
The overwhelming majority of computing takes place on systems that present
a single flat address space to programs, and where pointer arithmetic is
just integer arithmetic.
No. Pointers in C++ are basically types. Pointer values are typed. If you
want arithmetic on address, instead propose address type or break the type
system as a whole.
You're misrepresenting his statement. His statement is about the systems
that the implementation runs on. Your response is as if he were making a
declaration about what the standard says.

His statement is correct: most systems that C++ code executes on have a flat
address space, with pointer arithmetic being integer arithmetic. That is an
accurate description of most computer systems.

Yes, the standard's memory model does not model this fact. He knows that,
which is why he's arguing that it should be changed.

Post by FrankHB1989
Post by Hyman Rosen
But we are supposed to pretend that this isn't so, and that there is
special fairy dust that gets sprinkled on some additions but not on others.
No. This is the status quo: if nobody adds *more* rules, or *limitations*,
you can't rely on that assumption in the sense of the language.
Sure, but since we're talking about *changing the language*, that is kind
of a non-sequitur.

Post by FrankHB1989
Once you have done that in the standard, someone can then accuse you of
*forbidding* such formerly conforming implementations. Should all of them go
to die? This is also more or less "a weird sort of political correctness".
That assumes that such implementations cannot be changed. That there are
systems where it is fundamentally impossible to implement a revised C++
object model on them.

Do you have evidence of the existence of such systems?

Post by FrankHB1989
Of course this is not totally unacceptable, e.g. removal of trigraphs. But
in general, such changes should better make the specification simpler and
be with less limitations to conform, to overcome the net cost of losing
functionality or portability. Assuming a flat address space fails to be the
evolution in that direction. (This can be a non-issue for a new language.
This is never the case in C++.)
Post by Hyman Rosen
It's even weirder because actual programs do that sort of unblessed
pointer arithmetic all the time, and have been doing so forever.
No. High-level languages should usually never rely on such low-level
operations.
It doesn't matter what you think ought to happen. It has happened, is
happening, and will continue to happen because it is a very useful thing to
do.

Also, I disagree with your characterization here. Being able to drop down
to low-level operations when appropriate is one of the defining
characteristics of C++.

Post by FrankHB1989
They are obliged by the view of system design to provide and utilize proper
high-level abstractions to escape away from implementation details. If they
have to leak the abstraction, they are flawed by design.
People have faced such unblessed features for historical reasons. The case
is just compromise. To bless such things back leads to more mess.
Please explain why allowing the following would make the object model a
"mess":

alignas(T) std::byte storage[sizeof(T) * 2];
T *p1 = new(storage) T;
T *p2 = new(p1 + 1) T; // Getting the one-past-the-end pointer is always valid pointer arithmetic.
p1[1].x = 5;           // The disputed step: reaching *p2 through p1 + 1.
assert(p2->x == 5);

It seems perfectly clear to me what this code is expressing. You want to
create two Ts right next to each other and do pointer arithmetic between
them. The object model seems to be preserved here; objects are initialized
in storage. They have a well-defined lifetime. And so forth.

What is wrong with allowing this?
FrankHB1989
2017-09-15 04:11:48 UTC
On Thursday, September 14, 2017 at 11:38:30 PM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by Edward Catmur
Post by Myriachan
Post by Myriachan
I suppose that if you consider memcpy and memmove to be magic
functions, then you're right.
I feel as though the community/committee needs to decide whether to go
down the route of having abstract objects or having a concrete memory model.
I don't believe that such a choice needs to be made. I see no reason
why you can't have a concrete memory model and have a concrete object model.
The main problem we have is that the concept of indexing memory through
bytes has to happen via the object model, requiring an explicit "byte
array" object. What we want is for it to be able to happen outside of the
object model.
It's sometimes difficult to reconcile the two worlds, particularly when
considering compiler optimizations and exotic implementations.
How do we allow bytewise access and allow std::vector::data() to be
used with pointer arithmetic simultaneously with disallowing garbage like
accessing the wrong element of a class?
// Assume no padding in this implementation.
struct S { int a, b, c; };
S s{ 0, 0, 0 };
(&s.a)[2] = 4; // writes s.c???
If s is to be treated as a byte array, how could this indirection be
disallowed? A reasonable implementation of std::vector<int> would be
to allocate a suitably-aligned std::byte array, then construct ints
std::vector<int> v;
v.reserve(3);
v.push_back(0);
v.push_back(0);
v.push_back(0);
v.data()[2] = 4;
The current model is rather screwy in that the + 2 within v.data()[2] =
4; is technically undefined pointer arithmetic, even though it's
intended to be allowed. The alternative would be screwy in that you could
do stuff you're really not supposed to do, and compilers may have to be
pessimistic.
A technically correct interpretation of the current standard would be to
say that std::vector magically constructs objects sequentially while
simultaneously allowing pointer arithmetic through data(). An
inability to make custom containers would irritate many C++ programmers, so
I don't think that that is a viable solution.
std::vector doesn't need to be itself magic; it can call a language
support facility available to users as well as to implementors. This would
be called whenever updating data() or size() to mark the range [data(),
data() + size()) as amenable to pointer arithmetic of the pointer type.
This is a really bad way of thinking. Implementing `vector` or
`vector`-like constructs should not require such an expert-level of
understanding of the object model and the use of exceedingly esoteric
functions.
Users have good reason to expect that, if you explicitly construct two
objects of the same type beside each other in memory, then you can use
pointer arithmetic to jump from one to another. Code that does this exists
and is *extremely prevalent*.
You're effectively proposing to tell all of these people that they have to
call some function (which, FYI, doesn't actually do anything) in order to
make code work. Even though it already works. People will simply not do it,
and therefore compiler writers will refuse to optimize for it since it
would break the world.
So, what exactly have you gained over just making the code work?
Did it ever work?
Post by Nicol Bolas
I'm against breaking the object model just to allow certain C-isms to work.
But I don't see how it's breaking the object model to say that two
non-subobjects of the same dynamic type, constructed adjacently in the same
storage, can have pointer arithmetic used on them as though they were in an
array. That seems like a perfectly coherent object model to me.
Did it ever work in ISO C?
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
Nicol Bolas
2017-09-15 04:45:43 UTC
Post by FrankHB1989
On Thursday, September 14, 2017 at 11:38:30 PM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by Edward Catmur
std::vector doesn't need to be itself magic; it can call a language
support facility available to users as well as to implementors. This would
be called whenever updating data() or size() to mark the range [data(),
data() + size()) as amenable to pointer arithmetic of the pointer type.
This is a really bad way of thinking. Implementing `vector` or
`vector`-like constructs should not require such an expert level of
understanding of the object model and the use of exceedingly esoteric
functions.
Users have good reason to expect that, if you explicitly construct two
objects of the same type beside each other in memory, then you can use
pointer arithmetic to jump from one to another. Code that does this exists
and is *extremely prevalent*.
You're effectively proposing to tell all of these people that they have
to call some function (which, FYI, doesn't actually do anything) in order
to make code work. Even though it already works. People will simply not do
it, and therefore compiler writers will refuse to optimize for it since it
would break the world.
So, what exactly have you gained over just making the code work?
Did it ever work?
... what are you asking here? Are you asking if non-standard library people
have ever written implementations of `vector`? Yes, they have.

The C++ object model has not been defined to allow it. My point is that
this is a pattern of usage that our object model *ought* to allow. We
shouldn't have to rely on esoteric functions to be able to implement
something like this.
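For reference, here is a minimal sketch of the kind of user-written `vector` being discussed. `tiny_vec` is a hypothetical name and the class is deliberately simplified (no copying, no exception safety during growth). Each element is constructed separately into storage obtained from `std::allocator`, and `operator[]` then steps through those elements with ordinary pointer arithmetic, which is exactly the step whose validity CWG 2182 calls into question.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Illustrative user-written vector-like container (hypothetical name).
// Storage comes from std::allocator; elements are constructed one at a time
// with placement new, so no T[] array is ever created by a new-expression.
// Simplifications: no copy support and no exception safety during growth.
template <class T>
class tiny_vec {
public:
    tiny_vec() = default;
    tiny_vec(const tiny_vec&) = delete;
    tiny_vec& operator=(const tiny_vec&) = delete;
    ~tiny_vec() { destroy_and_free(); }

    void push_back(T value) {  // by value, so growing cannot invalidate it
        if (size_ == cap_) grow(cap_ == 0 ? 1 : cap_ * 2);
        ::new (static_cast<void*>(data_ + size_)) T(std::move(value));
        ++size_;
    }

    T* data() noexcept { return data_; }
    std::size_t size() const noexcept { return size_; }

    // data_ + i is the pointer arithmetic whose validity is being debated.
    T& operator[](std::size_t i) { return data_[i]; }

private:
    void grow(std::size_t new_cap) {
        T* fresh = std::allocator<T>().allocate(new_cap);
        for (std::size_t i = 0; i < size_; ++i)
            ::new (static_cast<void*>(fresh + i)) T(std::move(data_[i]));
        destroy_and_free();  // destroys the old elements, releases old buffer
        data_ = fresh;
        cap_ = new_cap;
    }

    void destroy_and_free() noexcept {
        for (std::size_t i = 0; i < size_; ++i) data_[i].~T();
        if (data_) std::allocator<T>().deallocate(data_, cap_);
    }

    T* data_ = nullptr;
    std::size_t size_ = 0;
    std::size_t cap_ = 0;
};
```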

I'm against breaking the object model just to allow certain C-isms to work.
Post by FrankHB1989
Post by Nicol Bolas
But I don't see how it's breaking the object model to say that two
non-subobjects of the same dynamic type, constructed adjacently in the same
storage, can have pointer arithmetic used on them as though they were in an
array. That seems like a perfectly coherent object model to me.
Did it ever work in ISO C?
I don't see why that matters, since the C object model is completely
different from C++'s. But yes, `vector`-like types work in C too.
FrankHB1989
2017-09-15 07:49:00 UTC
On Friday, September 15, 2017 at 12:45:44 PM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Thursday, September 14, 2017 at 11:38:30 PM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by Edward Catmur
std::vector doesn't need to be itself magic; it can call a language
support facility available to users as well as to implementors. This would
be called whenever updating data() or size() to mark the range [data(),
data() + size()) as amenable to pointer arithmetic of the pointer type.
This is a really bad way of thinking. Implementing `vector` or
`vector`-like constructs should not require such an expert level of
understanding of the object model and the use of exceedingly esoteric
functions.
Users have good reason to expect that, if you explicitly construct two
objects of the same type beside each other in memory, then you can use
pointer arithmetic to jump from one to another. Code that does this exists
and is *extremely prevalent*.
You're effectively proposing to tell all of these people that they have
to call some function (which, FYI, doesn't actually do anything) in order
to make code work. Even though it already works. People will simply not do
it, and therefore compiler writers will refuse to optimize for it since it
would break the world.
So, what exactly have you gained over just making the code work?
Did it ever work?
... what are you asking here? Are you asking if non-standard library
people have ever written implementations of `vector`? Yes, they have.
The C++ object model has not been defined to allow it. My point is that
this is a pattern of usage that our object model *ought* to allow. We
shouldn't have to rely on esoteric functions to be able to implement
something like this.
Well, to make the problem clearer, let me borrow some terms from ISO C.
Did it work in a *strictly conforming* way? That is, being directly
portable to any conforming C++ implementation without relying on any
assumptions that ISO C++ does not provide.

If it did work like that in the sense of C++, why do you propose the change?
(Otherwise it does not make much sense: things in namespace `std` already
have the right to be implemented by magic, with or without the
guarantees provided by ISO C++.)

You were arguing that you want it to work with assumptions not provided by the
C++ object model or any other rules in the current standard. Please define them
first, e.g. as an alternative object model, to replace the need for the
"exceedingly esoteric functions" you don't want to see. I'm curious whether
it would be more complicated.

(BTW, the naive way you seem to expect does not work in ISO C, either.)

I'm against breaking the object model just to allow certain C-isms to work.
Post by Nicol Bolas
Post by FrankHB1989
Post by Nicol Bolas
But I don't see how it's breaking the object model to say that two
non-subobjects of the same dynamic type, constructed adjacently in the same
storage, can have pointer arithmetic used on them as though they were in an
array. That seems like a perfectly coherent object model to me.
Did it ever work in ISO C?
Nicol Bolas
2017-09-15 15:12:17 UTC
Post by FrankHB1989
On Friday, September 15, 2017 at 12:45:44 PM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Thursday, September 14, 2017 at 11:38:30 PM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by Edward Catmur
std::vector doesn't need to be itself magic; it can call a language
support facility available to users as well as to implementors. This would
be called whenever updating data() or size() to mark the range [data(),
data() + size()) as amenable to pointer arithmetic of the pointer type.
This is a really bad way of thinking. Implementing `vector` or
`vector`-like constructs should not require such an expert level of
understanding of the object model and the use of exceedingly esoteric
functions.
Users have good reason to expect that, if you explicitly construct two
objects of the same type beside each other in memory, then you can use
pointer arithmetic to jump from one to another. Code that does this exists
and is *extremely prevalent*.
You're effectively proposing to tell all of these people that they have
to call some function (which, FYI, doesn't actually do anything) in order
to make code work. Even though it already works. People will simply not do
it, and therefore compiler writers will refuse to optimize for it since it
would break the world.
So, what exactly have you gained over just making the code work?
Did it ever work?
... what are you asking here? Are you asking if non-standard library
people have ever written implementations of `vector`? Yes, they have.
The C++ object model has not been defined to allow it. My point is that
this is a pattern of usage that our object model *ought* to allow. We
shouldn't have to rely on esoteric functions to be able to implement
something like this.
Well, to make the problem clearer, let me borrow some terms from ISO C.
Did it work in a *strictly conforming* way? That is, being directly
portable to any conforming C++ implementation without relying on any
assumptions that ISO C++ does not provide.
The question is irrelevant, since we're talking about whether we should *change
the standard* to *make* it conforming. That is, we *know* that the standard
doesn't allow it, and we're saying that *it should*.
Post by FrankHB1989
If it did work like that in the sense of C++, why do you propose the change?
(Otherwise it does not make much sense: things in namespace `std` already
have the right to be implemented by magic, with or without the
guarantees provided by ISO C++.)
You were arguing that you want it to work with assumptions not provided by the
C++ object model or any other rules in the current standard. Please define them
first, e.g. as an alternative object model, to replace the need for the
"exceedingly esoteric functions" you don't want to see. I'm curious whether
it would be more complicated.
The changes have already been discussed in this thread. It's in the e-mail
Post by FrankHB1989
two non-subobjects of the same dynamic type, constructed adjacently in
the same storage, can have pointer arithmetic used on them as though they
were in an array

It doesn't matter if it's "more complicated" than the current system. What
matters is if the change:

1) Provides genuine benefit to users.

2) Does not make the object model nonsensical.

It clearly provides genuine benefit, since users are *already doing it*. So
clearly that's something people want to do. And I don't see how it damages
the object model. Each object's lifetime is still clear, and we already
have the concept of "nested within" to allow objects to be dynamically
created within other objects' storage. All we're doing is allowing you to
effectively dynamically create arrays of objects.
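The "nested within" situation can be illustrated with a small sketch (the struct name `two_ints` is illustrative): an `unsigned char` buffer provides storage for two `int` objects created side by side with placement new. Using the pointers returned by placement new, or byte arithmetic within the buffer followed by `std::launder` (C++17), reaches the second `int` without any `int*` arithmetic. Stepping from the first `int` to the second with `first + 1` is the operation the proposal would make valid.

```cpp
#include <new>

// Illustrative: a byte buffer provides storage for two adjacent ints, so both
// ints are "nested within" the buffer without being subobjects of anything.
struct two_ints {
    alignas(int) unsigned char buf[2 * sizeof(int)];
    int* first;
    int* second;

    two_ints(int a, int b)
        : first(::new (static_cast<void*>(buf)) int(a)),
          second(::new (static_cast<void*>(buf + sizeof(int))) int(b)) {}
    two_ints(const two_ints&) = delete;
    two_ints& operator=(const two_ints&) = delete;

    // Route that avoids int* arithmetic: step through the unsigned char array,
    // then use std::launder to obtain a pointer to the int living there.
    int* second_via_bytes() {
        return std::launder(reinterpret_cast<int*>(buf + sizeof(int)));
    }

    // The route under debate. Computing first + 1 is fine (one past the end of
    // a single int); dereferencing it to reach the neighboring int is the
    // step whose validity is in question.
    int* second_via_int_arithmetic() { return first + 1; }
};
```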
FrankHB1989
2017-09-15 18:23:36 UTC
On Friday, September 15, 2017 at 11:12:17 PM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Friday, September 15, 2017 at 12:45:44 PM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Thursday, September 14, 2017 at 11:38:30 PM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by Edward Catmur
std::vector doesn't need to be itself magic; it can call a language
support facility available to users as well as to implementors. This would
be called whenever updating data() or size() to mark the range [data(),
data() + size()) as amenable to pointer arithmetic of the pointer type.
This is a really bad way of thinking. Implementing `vector` or
`vector`-like constructs should not require such an expert level of
understanding of the object model and the use of exceedingly esoteric
functions.
Users have good reason to expect that, if you explicitly construct two
objects of the same type beside each other in memory, then you can use
pointer arithmetic to jump from one to another. Code that does this exists
and is *extremely prevalent*.
You're effectively proposing to tell all of these people that they
have to call some function (which, FYI, doesn't actually do anything) in
order to make code work. Even though it already works. People will simply
not do it, and therefore compiler writers will refuse to optimize for it
since it would break the world.
So, what exactly have you gained over just making the code work?
Did it ever work?
... what are you asking here? Are you asking if non-standard library
people have ever written implementations of `vector`? Yes, they have.
The C++ object model has not been defined to allow it. My point is that
this is a pattern of usage that our object model *ought* to allow. We
shouldn't have to rely on esoteric functions to be able to implement
something like this.
Well, to make the problem clearer, let me borrow some terms from ISO C.
Did it work in a *strictly conforming* way? That is, being directly
portable to any conforming C++ implementation without relying on any
assumptions that ISO C++ does not provide.
The question is irrelevant, since we're talking about whether we should *change
the standard* to *make* it conforming. That is, we *know* that the
standard doesn't allow it, and we're saying that *it should*.
Sure, but that alone is not a reason why it should.

If it did work like that in the sense of C++, why do you propose the change?
Post by Nicol Bolas
Post by FrankHB1989
(Otherwise it does not make much sense: things in namespace `std` already
have the right to be implemented by magic, with or without the
guarantees provided by ISO C++.)
You were arguing that you want it to work with assumptions not provided by
the C++ object model or any other rules in the current standard. Please define
them first, e.g. as an alternative object model, to replace the need for the
"exceedingly esoteric functions" you don't want to see. I'm curious whether
it would be more complicated.
The changes have already been discussed in this thread. It's in the e-mail
Post by FrankHB1989
two non-subobjects of the same dynamic type, constructed adjacently in
the same storage, can have pointer arithmetic used on them as though they
were in an array
It doesn't matter if it's "more complicated" than the current system. What
Is this enough? For example, can "constructed adjacently" be used
normatively in the standard without a definition of the term?
How does "contiguous sequence" apply?

I see the point, but there is no formal wording. I suspect bad things could
slip in easily until an exhaustive list of the actual changes is checked.

1) Provides genuine benefit to users.
Post by Nicol Bolas
2) Does not make the object model nonsensical.
It clearly provides genuine benefit, since users are *already doing it*.
It provides compatibility with old code that was written without this
issue in mind. That may or may not be a benefit. Also note that verbosity
and limits on the allowed operations do not matter much, as in the case
where `observer_ptr` is considered superior to raw pointers.

So clearly that's something people want to do.
So, no.

And I don't see how it damages the object model. Each object's lifetime is
Post by Nicol Bolas
still clear, and we already have the concept of "nested within" to allow
objects to be dynamically created within other objects' storage. All we're
doing is allowing you to effectively dynamically create arrays of objects.
This effectively changes the notion of validity of pointer values. Consider
comparing two iterator values from different vector objects of the same
type. You can't assume the result is meaningful, because they can be from
unrelated sequences. In this case the sequences assumed are array objects.
The change claims that any two "adjacent" objects are in the same
sequence, which sounds like it requires the language to define an
infinite number of implicit array lvalues of unknown bound (which can
alias each other arbitrarily) on the fly. This paints a very strange
picture of the object model, especially when [intro.object] is left
unchanged.
Myriachan
2017-09-15 20:02:56 UTC
Post by Nicol Bolas
Is this enough? For example, can "constructed adjacently" be used
normatively in the standard without a definition of the term?
How does "contiguous sequence" apply?
I see the point, but there is no formal wording. I suspect bad things could
slip in easily until an exhaustive list of the actual changes is checked.
The formal wording is a formality; once we decide what we want, then we can
write a proposal with formal wording.
Post by Nicol Bolas
1) Provides genuine benefit to users.
Post by Nicol Bolas
2) Does not make the object model nonsensical.
It clearly provides genuine benefit, since users are *already doing it*.
It provides compatibility with old code that was written without this
issue in mind. That may or may not be a benefit. Also note that verbosity
and limits on the allowed operations do not matter much, as in the case
where `observer_ptr` is considered superior to raw pointers.
"Old code which was written without mind of this issue" - you mean, every
current implementation of std::vector out there? I don't know that it's
possible to implement std::vector entirely correctly without violating this
rule, due to the presence of the reserve() function. The current state of
things is that std::vector is necessarily a magic class.
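A sketch of why `reserve()` is the sticking point, relying only on documented `std::vector` behavior: after `reserve(3)`, no reallocation happens while the size stays within the capacity, so the three elements are constructed one at a time into one pre-allocated buffer. Under the object model as described in this thread, nothing ever creates a three-element array of `C`, so reaching `data()[2]` rests on the library's guarantee rather than on core-language rules. The type `C` and the function name are illustrative.

```cpp
#include <vector>

// Illustrative element type with a const member, the case CWG singled out.
struct C {
    const int id;
    explicit C(int i) : id(i) {}
};

// After reserve(3), std::vector guarantees no reallocation while size() stays
// within capacity(), so all three elements are constructed one by one into the
// same pre-allocated buffer. Returns true if that buffer never moved.
bool elements_built_in_place() {
    std::vector<C> v;
    v.reserve(3);
    v.emplace_back(1);
    const C* buffer = v.data();  // address of the first element
    v.emplace_back(2);
    v.emplace_back(3);
    // Reaching element 2 relies on the library's guarantee for data().
    return v.data() == buffer && v.data()[2].id == 3;
}
```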
Post by Nicol Bolas
And I don't see how it damages the object model. Each object's lifetime is
Post by Nicol Bolas
still clear, and we already have the concept of "nested within" to allow
objects to be dynamically created within other objects' storage. All we're
doing is allowing you to effectively dynamically create arrays of objects.
This effectively changes the notion of validity of pointer values.
Consider comparing two iterator values from different vector objects of
the same type. You can't assume the result is meaningful, because they
can be from unrelated sequences. In this case the sequences assumed are
array objects. The change claims that any two "adjacent" objects are in
the same sequence, which sounds like it requires the language to define
an infinite number of implicit array lvalues of unknown bound (which can
alias each other arbitrarily) on the fly. This paints a very strange
picture of the object model, especially when [intro.object] is left
unchanged.
This is why one of the rules I propose is that you can only do this for
pointers that are part of the same block of storage. Two std::vectors
would allocate memory separately from their allocator, meaning that even if
the two arrays end up adjacent, the pointer arithmetic would still be
undefined behavior. Similarly for two automatic arrays of the same type,
even if the compiler happens to put them adjacent in stack memory, because
such allocation is considered to be separate storage.
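This restriction can be pictured as a toy model in which every pointer remembers which allocation it came from. The sketch is purely illustrative: `model_ptr`, `advance`, and `difference` are hypothetical names, and no real addresses are involved. Arithmetic stays meaningful only within one block, so two vector buffers that happen to sit next to each other in memory still count as different blocks.

```cpp
#include <cstddef>
#include <optional>

// Toy model (hypothetical): a pointer is an allocation id plus an element
// index. Physical adjacency of allocations is deliberately not represented.
struct model_ptr {
    int block_id;          // which allocation (e.g. one vector's buffer)
    std::ptrdiff_t index;  // element index within that allocation
};

// p + n stays meaningful only while it remains in [0, block_size] of the same
// block (block_size itself being the one-past-the-end position).
std::optional<model_ptr> advance(model_ptr p, std::ptrdiff_t n,
                                 std::ptrdiff_t block_size) {
    const std::ptrdiff_t target = p.index + n;
    if (target < 0 || target > block_size) return std::nullopt;
    return model_ptr{p.block_id, target};
}

// a - b is only meaningful when both come from the same block.
std::optional<std::ptrdiff_t> difference(model_ptr a, model_ptr b) {
    if (a.block_id != b.block_id) return std::nullopt;
    return a.index - b.index;
}
```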

Melissa
FrankHB1989
2017-09-16 18:13:06 UTC
On Saturday, September 16, 2017 at 4:02:56 AM UTC+8, Myriachan wrote:
Post by Myriachan
Post by Nicol Bolas
Is this enough? For example, can "constructed adjacently" be used
normatively in the standard without a definition of the term?
How does "contiguous sequence" apply?
I see the point, but there is no formal wording. I suspect bad things could
slip in easily until an exhaustive list of the actual changes is checked.
The formal wording is a formality; once we decide what we want, then we
can write a proposal with formal wording.
Post by Nicol Bolas
1) Provides genuine benefit to users.
Post by Nicol Bolas
2) Does not make the object model nonsensical.
It clearly provides genuine benefit, since users are *already doing it*.
It provides compatibility with old code that was written without this
issue in mind. That may or may not be a benefit. Also note that verbosity
and limits on the allowed operations do not matter much, as in the case
where `observer_ptr` is considered superior to raw pointers.
"Old code which was written without mind of this issue" - you mean, every
current implementation of std::vector out there? I don't know that it's
possible to implement std::vector entirely correctly without violating this
rule, due to the presence of the reserve() function. The current state of
things is that std::vector is necessarily a magic class.
Probably yes. But that case is different: since it is part of the
implementation, vendors can provide additional guarantees to avoid
portability problems in user code. Ask them if you are worried about bugs
here and can't provide these guarantees yourself.
I don't think making std::vector rely on such magic was a deliberate
design decision, but every standard library implementation already has the
right to rely on magic, since no rule says it must avoid it. If you
don't like that, propose new rules in [library] requiring all library
components, except those in [support.general] and implementation-defined
ones, to always be implementable in portable C++; or move them out of the
standard entirely. In any case, this should not apply only to std::vector.
(Though this is probably another topic.)
Post by Myriachan
Post by Nicol Bolas
And I don't see how it damages the object model. Each object's lifetime
Post by Nicol Bolas
is still clear, and we already have the concept of "nested within" to allow
objects to be dynamically created within other objects' storage. All we're
doing is allowing you to effectively dynamically create arrays of objects.
This effectively changes the notion of validity of pointer values.
Consider comparing two iterator values from different vector objects of
the same type. You can't assume the result is meaningful, because they
can be from unrelated sequences. In this case the sequences assumed are
array objects. The change claims that any two "adjacent" objects are in
the same sequence, which sounds like it requires the language to define
an infinite number of implicit array lvalues of unknown bound (which can
alias each other arbitrarily) on the fly. This paints a very strange
picture of the object model, especially when [intro.object] is left
unchanged.
This is why one of the rules I propose is that you can only do this for
pointers that are part of the same block of storage. Two std::vectors
would allocate memory separately from their allocator, meaning that even if
the two arrays end up adjacent, the pointer arithmetic would still be
undefined behavior. Similarly for two automatic arrays of the same type,
even if the compiler happens to put them adjacent in stack memory, because
such allocation is considered to be separate storage.
Well, I see: you actually have to bypass the object model for limited
cases (otherwise there would be a chicken-and-egg problem in defining "same
block of storage" without changing [intro.object]). This might be
technically doable, but as a user I am not comfortable relying on these
rules to reason about a program in terms of "object pointers". It seems
like just a hack. I hope there could be a saner address space model to
resolve the problem, if possible.
Post by Myriachan
Melissa
Nicol Bolas
2017-09-15 21:15:30 UTC
Post by FrankHB1989
On Friday, September 15, 2017 at 11:12:17 PM UTC+8, Nicol Bolas wrote:
Post by FrankHB1989
And I don't see how it damages the object model. Each object's lifetime
is still clear, and we already have the concept of "nested within" to allow
objects to be dynamically created within other objects' storage. All we're
doing is allowing you to effectively dynamically create arrays of objects.
This effectively changes the notion of validity of pointer values.
Consider comparing two iterator values from different vector objects
of the same type.
This is already explicitly undefined behavior. Iterators from different
containers cannot be compared to one another. However, that's irrelevant
Post by FrankHB1989
You can't assume the result is meaningful, because they can be from
unrelated sequences.
two non-subobjects of the same dynamic type, constructed adjacently *in
the same storage*, can have pointer arithmetic used on them as though they
were in an array

So unless those two containers got their allocations from the same storage,
that can't happen.

In this case the sequences assumed are array objects. The change claims
Post by FrankHB1989
that any two "adjacent" objects are in the same sequence, which sounds
like it requires the language to define an infinite number of implicit
array lvalues of unknown bound (which can alias each other arbitrarily)
on the fly. This paints a very strange picture of the object model,
especially when [intro.object] is left unchanged.
It implies only the following.

A top-level object (one which is not explicitly a subobject) lives in a
piece of storage. If there is a top-level object of the same type directly
adjacent to it in that same storage, then it is legal to access that
top-level object via pointer arithmetic from this object.

Or to put it another way, you're not assuming anything is an array object.
Pointer arithmetic now works for things that *aren't* arrays. It's now
explicitly for accessing adjacent objects, either sibling top-level objects
or sibling array elements.
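Read as an algorithm, the rule above amounts to walking from the starting object toward the target and checking each object passed over. The toy model below sketches that reading under stated assumptions: one block of storage is divided into element-sized slots, types are represented by strings, and `slot` and `step_allowed` are hypothetical names. Ordinary array elements, which the existing rules already cover, are not modeled.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model (hypothetical) of one block of storage, divided into equally
// sized slots. Each slot records what object, if any, currently lives there.
struct slot {
    bool occupied = false;
    bool top_level = false;  // false for subobjects (members, base classes)
    std::string type;        // dynamic type of the object in the slot
};

// Proposed rule, as stated above: from the object in slot `from`, arithmetic
// by `n` may reach slot `from + n` only if the origin and every slot on the
// way, including the destination, hold top-level objects of the same type.
bool step_allowed(const std::vector<slot>& storage, std::size_t from,
                  std::ptrdiff_t n) {
    if (from >= storage.size()) return false;
    const slot& origin = storage[from];
    if (!origin.occupied || !origin.top_level) return false;

    const std::ptrdiff_t target = static_cast<std::ptrdiff_t>(from) + n;
    if (target < 0 || target >= static_cast<std::ptrdiff_t>(storage.size()))
        return false;

    const std::ptrdiff_t dir = n < 0 ? -1 : 1;
    for (std::ptrdiff_t i = static_cast<std::ptrdiff_t>(from); i != target;) {
        i += dir;  // move one slot toward the target, then check it
        const slot& s = storage[static_cast<std::size_t>(i)];
        if (!s.occupied || !s.top_level || s.type != origin.type) return false;
    }
    return true;
}
```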
FrankHB1989
2017-09-16 18:48:11 UTC
On Saturday, September 16, 2017 at 5:15:31 AM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Friday, September 15, 2017 at 11:12:17 PM UTC+8, Nicol Bolas wrote:
Post by FrankHB1989
And I don't see how it damages the object model. Each object's lifetime
is still clear, and we already have the concept of "nested within" to allow
objects to be dynamically created within other objects' storage. All we're
doing is allowing you to effectively dynamically create arrays of objects.
This effectively changes the notion of validity of pointer values.
Consider comparing two iterator values from different vector objects
of the same type.
This is already explicitly undefined behavior. Iterators from different
containers cannot be compared to one another. However, that's irrelevant
Post by FrankHB1989
You can't assume the result is meaningful, because they can be from
unrelated sequences.
two non-subobjects of the same dynamic type, constructed adjacently *in
the same storage*, can have pointer arithmetic used on them as though
they were in an array
So unless those two containers got their allocations from the same
storage, that can't happen.
In this case the sequences assumed are array objects. The change claims
Post by FrankHB1989
that any two "adjacent" objects are in the same sequence, which sounds
like it requires the language to define an infinite number of implicit
array lvalues of unknown bound (which can alias each other arbitrarily)
on the fly. This paints a very strange picture of the object model,
especially when [intro.object] is left unchanged.
It implies only the following.
A top-level object (one which is not explicitly a subobject) lives in a
piece of storage. If there is a top-level object of the same type directly
adjacent to it in that same storage, then it is legal to access that
top-level object via pointer arithmetic from this object.
Or to put it another way, you're not assuming anything is an array object.
Pointer arithmetic now works for things that *aren't* arrays. It's now
explicitly for accessing adjacent objects, either sibling top-level objects
or sibling array elements.
I think I see your point now. That's why I illustrated the limitations on
iterators: just *as* with an iterator, a valid non-null object pointer is
conceptually always implicitly bound to a sequence, i.e. an array object.
This is the basis on which the semantics of pointer arithmetic operations,
or *random access iteration operations*, are built. The proposed change
rebuilds those semantics on layout properties *occasionally* provided by the
underlying memory model instead of the object model. This severely
undermines the ability to reason about well-behaved pointer arithmetic,
because it then needs more information about siblings, which often
can't be collected from a single context. I don't see it as a desirable
abstraction, even if the wording can be patched for limited cases.
Nicol Bolas
2017-09-17 00:35:23 UTC
Post by FrankHB1989
On Saturday, September 16, 2017 at 5:15:31 AM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Friday, September 15, 2017 at 11:12:17 PM UTC+8, Nicol Bolas wrote:
Post by FrankHB1989
And I don't see how it damages the object model. Each object's lifetime
is still clear, and we already have the concept of "nested within" to allow
objects to be dynamically created within other objects' storage. All we're
doing is allowing you to effectively dynamically create arrays of objects.
This effectively changes the notion of validity of pointer values.
Consider comparing two iterator values from different vector objects
of the same type.
This is already explicitly undefined behavior. Iterators from different
containers cannot be compared to one another. However, that's irrelevant
Post by FrankHB1989
You can't assume the result is meaningful, because they can be from
unrelated sequences.
two non-subobjects of the same dynamic type, constructed adjacently *in
the same storage*, can have pointer arithmetic used on them as though
they were in an array
So unless those two containers got their allocations from the same
storage, that can't happen.
In this case the sequences assumed are array objects. The change claims
Post by FrankHB1989
that any two "adjacent" objects are in the same sequence, which sounds
like it requires the language to define an infinite number of implicit
array lvalues of unknown bound (which can alias each other arbitrarily)
on the fly. This paints a very strange picture of the object model,
especially when [intro.object] is left unchanged.
It implies only the following.
A top-level object (one which is not explicitly a subobject) lives in a
piece of storage. If there is a top-level object of the same type directly
adjacent to it in that same storage, then it is legal to access that
top-level object via pointer arithmetic from this object.
Or to put it another way, you're not assuming anything is an array
object. Pointer arithmetic now works for things that *aren't* arrays.
It's now explicitly for accessing adjacent objects, either sibling
top-level objects or sibling array elements.
I think I see your point now. That's why I illustrated the limitations on
iterators: just *as* with an iterator, a valid non-null object pointer is
conceptually always implicitly bound to a sequence, i.e. an array object.
This is the basis on which the semantics of pointer arithmetic operations,
or *random access iteration operations*, are built. The proposed change
rebuilds those semantics on layout properties *occasionally* provided by the
underlying memory model instead of the object model. This severely
undermines the ability to reason about well-behaved pointer arithmetic,
because it then needs more information about siblings, which often
can't be collected from a single context.
Any function which takes a pointer as a parameter lacks knowledge that the
pointer is "bound to a sequence, i.e. an array object". So it is already
the case that reasoning about pointer arithmetic is impaired.

Under the current wording, in order to dynamically know whether pointer
arithmetic is legitimate, you have to be able to look at the pointer and
know that it points into an array object of `T`s. Under the new wording, in
order to dynamically know whether pointer arithmetic is legitimate, you
have to be able to look at the pointer, and then look in the direction of
the arithmetic to see if there are more `T`s, and that they all share the
same storage and are not themselves subobjects.

I submit that if you can reason about the former, then you have all of the
information needed to reason about the latter. In both cases, you have to
be able to walk through memory and determine what is actually in that
region of storage pointed to by a pointer. You have to be able to turn an
address into the nested sequence of objects that are pointed to by that
address.
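To make the comparison concrete, here is a toy model that is entirely hypothetical: no implementation keeps such a table, and `ObjectRecord`, `find_at`, and the two rule functions are made-up names. One region of storage is described by a list of objects with their type, offset, size, and enclosing array (if any). Both the current rule and the proposed adjacency rule are answered with the same address-to-object lookup, which is the point being argued. Only forward steps are modeled, and one-past-the-end pointers are ignored for brevity.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record of one object living in a single region of storage.
struct ObjectRecord {
    std::string type;    // dynamic type, e.g. "T"
    std::size_t offset;  // byte offset of the object within the storage
    std::size_t size;    // size of the type in bytes
    int array_id;        // id of the enclosing array object, or -1 if top-level
};

// Returns the object of the given type that starts exactly at `offset`.
const ObjectRecord* find_at(const std::vector<ObjectRecord>& table,
                            std::size_t offset, const std::string& type) {
    for (const ObjectRecord& rec : table) {
        if (rec.offset == offset && rec.type == type) return &rec;
    }
    return nullptr;
}

// Current rule (simplified): moving forward `steps` elements from `start`
// lands on elements of the same array object every time.
bool ok_under_array_rule(const std::vector<ObjectRecord>& table,
                         std::size_t start, const std::string& type,
                         std::size_t steps) {
    const ObjectRecord* first = find_at(table, start, type);
    if (first == nullptr || first->array_id < 0) return false;
    for (std::size_t k = 1; k <= steps; ++k) {
        const ObjectRecord* rec = find_at(table, start + k * first->size, type);
        if (rec == nullptr || rec->array_id != first->array_id) return false;
    }
    return true;
}

// Proposed rule (simplified): the array rule, or else every position reached
// is a top-level object of the same type, directly adjacent in this storage.
bool ok_under_proposed_rule(const std::vector<ObjectRecord>& table,
                            std::size_t start, const std::string& type,
                            std::size_t steps) {
    if (ok_under_array_rule(table, start, type, steps)) return true;
    const ObjectRecord* first = find_at(table, start, type);
    if (first == nullptr) return false;
    for (std::size_t k = 0; k <= steps; ++k) {
        const ObjectRecord* rec = find_at(table, start + k * first->size, type);
        if (rec == nullptr || rec->array_id >= 0) return false;
    }
    return true;
}
```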
Post by FrankHB1989
I don't see it as a desired abstraction, even if the wording can be patched
in limited cases.
So, being able to implement `std::vector` in a platform-neutral fashion is
not a desired feature of the language?
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
FrankHB1989
2017-09-17 06:13:09 UTC
Permalink
On Sunday, September 17, 2017 at 8:35:24 AM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Saturday, September 16, 2017 at 5:15:31 AM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Friday, September 15, 2017 at 11:12:17 PM UTC+8, Nicol Bolas wrote:
Post by FrankHB1989
And I don't see how it damages the object model. Each object's
lifetime is still clear, and we already have the concept of "nested within"
to allow objects to be dynamically created within other objects' storage.
All we're doing is allowing you to effectively dynamically create arrays of
objects.
This effectively changes the notion of validity of pointer values.
Consider comparing two iterator values from different vector objects
of the same type.
This is already explicitly undefined behavior. Iterators from different
containers cannot be compared to one another. However, that's irrelevant
You can't assume the result is meaningful because they can be from
unrelated sequences.
two non-subobjects of the same dynamic type, constructed adjacently *in
the same storage*, can have pointer arithmetic used on them as though
they were in an array
So unless those two containers got their allocations from the same
storage, that can't happen.
In this case the sequences assumed are array objects. The change claims
that any two "adjacent" objects are in the same sequence, which sounds
like it requires the language to define, on the fly, an infinite number
of implicit array lvalues of unknown bound (which may alias one another
arbitrarily). This implies a very strange picture of the object model,
especially when [intro.object] is left unchanged.
It implies only the following.
A top-level object (one which is not explicitly a subobject) lives in a
piece of storage. If there is a top-level object of the same type directly
adjacent to it in that same storage, then it is legal to access that
top-level object via pointer arithmetic from this object.
Or to put it another way, you're not assuming anything is an array
object. Pointer arithmetic now works for things that *aren't* arrays.
It's now explicitly for accessing adjacent objects, either sibling
top-level objects or sibling array elements.
I think I see your point now. That's why I illustrated the limitations on
iterators: just *as* an iterator is, a valid non-null object pointer is
conceptually always bound, implicitly, to a sequence, i.e. an array object.
This is the basis on which the semantics of pointer arithmetic operations
(or *random-access iteration operations*) are built. The proposed change
rebuilds those semantics on layout properties *occasionally* provided by
the underlying memory model instead of on the object model. This severely
undermines the ability to reason about well-behaved pointer arithmetic,
because it then needs more information about siblings, which often can't
be collected from a single context.
Any function which takes a pointer as a parameter lacks knowledge that the
pointer is "bound to a sequence, i.e. an array object". So it is already
the case that reasoning about pointer arithmetic is impaired.
This is caused by the design of the type system. Working around semantic
limitations does not fix it.

In general, the C++ type system is not powerful enough to encode such
information in type signatures; some form of gradual typing would be
needed. There would be more problems if you did fix it, though, e.g. ABI.

However, that does not imply that the knowledge cannot be reasoned about
outside of typechecking.

Under the current wording, in order to dynamically know whether pointer
arithmetic is legitimate, you have to be able to look at the pointer and
know that it points into an array object of `T`s. Under the new wording, in
order to dynamically know whether pointer arithmetic is legitimate, you
have to be able to look at the pointer, and then look in the direction of
the arithmetic to see if there are more `T`s, and that they all share the
same storage and are not themselves subobjects.
I submit that if you can reason about the former, then you have all of the
information needed to reason about the latter. In both cases, you have to
be able to walk through memory and determine what is actually in that
region of storage pointed to by a pointer. You have to be able to turn an
address into the nested sequence of objects that are pointed to by that
address.
Really? The former does not depend on the memory model directly. It needs
to know only the length of the array and one of the pointer values to the
*n*th element of the array, not the complete element type, nor the complete
array type. The size of the element is not of interest here, nor is the
layout of the elements in the array. By contrast, the latter is not
possible if you don't know the layout of the members of the complete type
it resides in, and the only way to determine all that information is to
calculate the layout from the complete type definitions and the memory
model, or to simulate the actual allocations.

I don't see it as a desired abstraction, even if the wording can be patched
in limited cases.
So, being able to implement `std::vector` in a platform-neutral fashion
is not a desired feature of the language?
No. To be accurate, I agree that allowing `std::vector` to be implemented
portably (in the sense of a conforming C++ program) is a *convenience*.

However:

1. *Technically*, it is a convenience, not a necessity, even though lacking
it can easily cause bad things we can see *politically*.

2. It has pros and cons. Not all cases need such a convenience, and in
practice it can harm users who do not rely on it, for instance through
insufficient vendor-specific optimization. Users can hardly rescue that
case unless they become vendors themselves.

3. As I have said, it should not apply only to `std::vector`. It is
strange to make `std::vector` the only special case. If the ability is
important to keep, why limit it to `std::vector`?

4. Avoiding the need for magic in order for it to be conforming is the
correct direction. But that does not imply that modifying the core rules is
always the desired choice just for that. (Personally, I prefer the
magic-wrapped-in-another-std-library approach.)
'Edward Catmur' via ISO C++ Standard - Discussion
2017-09-17 10:13:13 UTC
Permalink
Post by FrankHB1989
On Sunday, September 17, 2017 at 8:35:24 AM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Saturday, September 16, 2017 at 5:15:31 AM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Friday, September 15, 2017 at 11:12:17 PM UTC+8, Nicol Bolas wrote:
Post by FrankHB1989
And I don't see how it damages the object model. Each object's
lifetime is still clear, and we already have the concept of "nested within"
to allow objects to be dynamically created within other objects' storage.
All we're doing is allowing you to effectively dynamically create arrays of
objects.
This effectively changes the notion of validity of pointer values.
Consider comparing two iterator values from different vector objects
of the same type.
This is already explicitly undefined behavior. Iterators from different
containers cannot be compared to one another. However, that's irrelevant
You can't assume the result is meaningful because they can be from
unrelated sequences.
two non-subobjects of the same dynamic type, constructed adjacently *in
the same storage*, can have pointer arithmetic used on them as though
they were in an array
So unless those two containers got their allocations from the same
storage, that can't happen.
In this case the sequences assumed are array objects. The change claims
that any two "adjacent" objects are in the same sequence, which sounds
like it requires the language to define, on the fly, an infinite number
of implicit array lvalues of unknown bound (which may alias one another
arbitrarily). This implies a very strange picture of the object model,
especially when [intro.object] is left unchanged.
It implies only the following.
A top-level object (one which is not explicitly a subobject) lives in a
piece of storage. If there is a top-level object of the same type directly
adjacent to it in that same storage, then it is legal to access that
top-level object via pointer arithmetic from this object.
Or to put it another way, you're not assuming anything is an array
object. Pointer arithmetic now works for things that *aren't* arrays.
It's now explicitly for accessing adjacent objects, either sibling
top-level objects or sibling array elements.
I think I see your point now. That's why I illustrated the limitations on
iterators: just *as* an iterator is, a valid non-null object pointer is
conceptually always bound, implicitly, to a sequence, i.e. an array object.
This is the basis on which the semantics of pointer arithmetic operations
(or *random-access iteration operations*) are built. The proposed change
rebuilds those semantics on layout properties *occasionally* provided by
the underlying memory model instead of on the object model. This severely
undermines the ability to reason about well-behaved pointer arithmetic,
because it then needs more information about siblings, which often can't
be collected from a single context.
Any function which takes a pointer as a parameter lacks knowledge that
the pointer is "bound to a sequence, i.e. an array object". So it is
already the case that reasoning about pointer arithmetic is impaired.
This is caused by the design of the type system. Working around semantic
limitations does not fix it.
In general, the C++ type system is not powerful enough to encode such
information in type signatures; some form of gradual typing would be
needed. There would be more problems if you did fix it, though, e.g. ABI.
However, that does not imply that the knowledge cannot be reasoned about
outside of typechecking.
Under the current wording, in order to dynamically know whether pointer
arithmetic is legitimate, you have to be able to look at the pointer and
know that it points into an array object of `T`s. Under the new wording, in
order to dynamically know whether pointer arithmetic is legitimate, you
have to be able to look at the pointer, and then look in the direction of
the arithmetic to see if there are more `T`s, and that they all share the
same storage and are not themselves subobjects.
I submit that if you can reason about the former, then you have all of
the information needed to reason about the latter. In both cases, you have
to be able to walk through memory and determine what is actually in that
region of storage pointed to by a pointer. You have to be able to turn an
address into the nested sequence of objects that are pointed to by that
address.
Really? The former does not depend on the memory model directly. It needs
to know only the length of the array and one of the pointer values to the
*n*th element of the array, not the complete element type, nor the complete
array type. The size of the element is not of interest here, nor is the
layout of the elements in the array. By contrast, the latter is not
possible if you don't know the layout of the members of the complete type
it resides in, and the only way to determine all that information is to
calculate the layout from the complete type definitions and the memory
model, or to simulate the actual allocations.
I don't see it as a desired abstraction, even if the wording can be patched
in limited cases.
So, being able to implement `std::vector` in a platform-neutral fashion
is not a desired feature of the language?
No. To be accurate, I agree that allowing `std::vector` to be implemented
portably (in the sense of a conforming C++ program) is a *convenience*.
1. *Technically*, it is a convenience, not a necessity, even though lacking
it can easily cause bad things we can see *politically*.
2. It has pros and cons. Not all cases need such a convenience, and in
practice it can harm users who do not rely on it, for instance through
insufficient vendor-specific optimization. Users can hardly rescue that
case unless they become vendors themselves.
3. As I have said, it should not apply only to `std::vector`. It is
strange to make `std::vector` the only special case. If the ability is
important to keep, why limit it to `std::vector`?
4. Avoiding the need for magic in order for it to be conforming is the
correct direction. But that does not imply that modifying the core rules is
always the desired choice just for that. (Personally, I prefer the
magic-wrapped-in-another-std-library approach.)
Note that any implementation of std::vector will always require at least as
much magic as std::optional, since optional is equivalent to a vector with
max_size of 1; an object with const or reference data members can be erased
and re-emplaced into the same storage.

So at present there's not much point tweaking rules on pointer arithmetic
to allow std::vector to operate without magic, since it will still need
magic to deal with destroyed and recreated objects.
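The `std::optional` point can be sketched as a single slot of storage in which an object with a `const` member is destroyed and a new one is created in its place. Under the C++17 lifetime rules discussed in this thread ([basic.life]), a pointer to the old object cannot be used to reach the new one when the type has a `const` member, so the container must either keep the pointer returned by placement new or go through `std::launder` (C++17). `Fixed` and `Slot` are hypothetical names; this is not how any particular `optional` is implemented.

```cpp
#include <new>

// Element type with a const member: the case singled out in [basic.life].
struct Fixed {
    const int id;
};

// Hypothetical one-element container, roughly what optional<Fixed> needs.
class Slot {
public:
    explicit Slot(int id) { ::new (static_cast<void*>(storage_)) Fixed{id}; }
    ~Slot() { get()->~Fixed(); }
    Slot(const Slot&) = delete;
    Slot& operator=(const Slot&) = delete;

    // Destroys the current object and creates a new one in the same storage.
    void reemplace(int id) {
        get()->~Fixed();
        ::new (static_cast<void*>(storage_)) Fixed{id};
    }

    // std::launder yields a pointer to whichever Fixed object currently
    // lives in storage_, including one created by reemplace().
    Fixed* get() { return std::launder(reinterpret_cast<Fixed*>(storage_)); }

private:
    alignas(Fixed) unsigned char storage_[sizeof(Fixed)];
};
```

A `vector` faces the same problem for every slot that is erased and re-filled, which is why the relaxed arithmetic rule alone would not remove the need for laundering.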
Nicol Bolas
2017-09-17 13:22:28 UTC
Permalink
Post by FrankHB1989
On Sunday, September 17, 2017 at 8:35:24 AM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Saturday, September 16, 2017 at 5:15:31 AM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Friday, September 15, 2017 at 11:12:17 PM UTC+8, Nicol Bolas wrote:
Post by FrankHB1989
And I don't see how it damages the object model. Each object's
lifetime is still clear, and we already have the concept of "nested within"
to allow objects to be dynamically created within other objects' storage.
All we're doing is allowing you to effectively dynamically create arrays of
objects.
This effectively changes the notion of validity of pointer values.
Consider comparing two iterator values from different vector objects
of the same type.
This is already explicitly undefined behavior. Iterators from different
containers cannot be compared to one another. However, that's irrelevant
You can't assume the result is meaningful because they can be from
unrelated sequences.
two non-subobjects of the same dynamic type, constructed adjacently *in
the same storage*, can have pointer arithmetic used on them as though
they were in an array
So unless those two containers got their allocations from the same
storage, that can't happen.
In this case the sequences assumed are array objects. The change claims
that any two "adjacent" objects are in the same sequence, which sounds
like it requires the language to define, on the fly, an infinite number
of implicit array lvalues of unknown bound (which may alias one another
arbitrarily). This implies a very strange picture of the object model,
especially when [intro.object] is left unchanged.
It implies only the following.
A top-level object (one which is not explicitly a subobject) lives in a
piece of storage. If there is a top-level object of the same type directly
adjacent to it in that same storage, then it is legal to access that
top-level object via pointer arithmetic from this object.
Or to put it another way, you're not assuming anything is an array
object. Pointer arithmetic now works for things that *aren't* arrays.
It's now explicitly for accessing adjacent objects, either sibling
top-level objects or sibling array elements.
I think I see your point now. That's why I illustrated the limitations on
iterators: just *as* an iterator is, a valid non-null object pointer is
conceptually always bound, implicitly, to a sequence, i.e. an array object.
This is the basis on which the semantics of pointer arithmetic operations
(or *random-access iteration operations*) are built. The proposed change
rebuilds those semantics on layout properties *occasionally* provided by
the underlying memory model instead of on the object model. This severely
undermines the ability to reason about well-behaved pointer arithmetic,
because it then needs more information about siblings, which often can't
be collected from a single context.
Any function which takes a pointer as a parameter lacks knowledge that
the pointer is "bound to a sequence, i.e. an array object". So it is
already the case that reasoning about pointer arithmetic is impaired.
This is caused by the design of the type system.
The cause is irrelevant; the fact is that it's the way the system works.
You can pass pointers to functions, and those functions can do pointer
arithmetic on them if those pointers happen to point to the right thing.

Working around semantic limitations does not fix it.
Post by FrankHB1989
In general, the C++ type system is not powerful enough to encode such
information in type signatures; some form of gradual typing would be
needed. There would be more problems if you did fix it, though, e.g. ABI.
However, that does not imply that the knowledge cannot be reasoned about
outside of typechecking.
It doesn't "imply" anything; it outright states it. A `T*` is a pointer;
that's a type. There is no type-based way to distinguish a pointer to an
array element from a pointer to something that isn't an array element.
Therefore, typechecking *alone* cannot be used to determine if pointer
arithmetic is valid.

That is the way C++ works today. Given that fact, there is no reason why we
can't extend the set of conditions in which pointer arithmetic works.
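A minimal illustration with made-up function names: both uses below pass a `const int*`, and the compiler accepts both. Only the one whose pointer actually points into an array of at least two `int`s is well-defined, and nothing in the parameter type records which case applies.

```cpp
// Reads p[0] and p[1]. Nothing in the type `const int*` says whether p points
// into an array with at least two elements; that is a runtime property.
int sum_first_two(const int* p) {
    return p[0] + p[1];
}

int call_with_array() {
    int arr[2] = {1, 2};
    return sum_first_two(arr);  // fine: arr decays to &arr[0]
}

// int single = 3;
// sum_first_two(&single);  // also compiles, but reading p[1] is undefined
```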

Under the current wording, in order to dynamically know whether pointer
arithmetic is legitimate, you have to be able to look at the pointer and
know that it points into an array object of `T`s. Under the new wording, in
order to dynamically know whether pointer arithmetic is legitimate, you
have to be able to look at the pointer, and then look in the direction of
the arithmetic to see if there are more `T`s, and that they all share the
same storage and are not themselves subobjects.
I submit that if you can reason about the former, then you have all of
the information needed to reason about the latter. In both cases, you have
to be able to walk through memory and determine what is actually in that
region of storage pointed to by a pointer. You have to be able to turn an
address into the nested sequence of objects that are pointed to by that
address.
Really? The former does not depend on the memory model directly. It needs
to know only the length of the array and one of the pointer values to the
*n*th element of the array, not the complete element type, nor the complete
array type. The size of the element is not of interest here, nor is the
layout of the elements in the array.
You forget what we're doing here.

We're trying to reason about the object model implications about a
particular piece of code. Specifically, we have a function that takes a
pointer. That function is performing pointer arithmetic on that pointer.
We're trying to see what it would take for the object model to be able to
verify, at runtime, if that pointer arithmetic would result in well-defined
behavior. What must the object model be able to determine from that pointer?

Well, in order to get to "runtime", we must first get past *compile-time*.
And it is ill-formed for code to perform pointer arithmetic on `T*` if `T`
is incomplete. Therefore, we must already know the "size of element".

Furthermore, you do need to know the "layout of elements in the array",
because you need to make sure that the type `T` it points to is the *same
type* as the array element type. After all, the `T*` could be pointing to
the first subobject of the array element type `U`. Since `U*` will have the
same address as the `T*` first subobject, you need to be able to
differentiate between these cases.

Knowing that a `T*` happens to point into an array isn't enough; it must
point into an array of `T`. It must be a direct subobject of an array of
`T`.
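The address-versus-type distinction can be shown with a small sketch, assuming a hypothetical standard-layout element type `U` whose first member is an `int` (playing the role of `T`): the `U*` and the pointer to its first member compare equal as addresses, yet only arithmetic on the `U*` moves between array elements.

```cpp
#include <type_traits>

// Hypothetical element type U whose first member plays the role of T.
struct U {
    int first;
    int second;
};
static_assert(std::is_standard_layout<U>::value,
              "the first member is guaranteed to share U's address only for "
              "standard-layout types");

U elements[2] = {{1, 2}, {3, 4}};

// &elements[0] and &elements[0].first represent the same address...
bool same_address() {
    return static_cast<void*>(&elements[0]) ==
           static_cast<void*>(&elements[0].first);
}

// ...but they are different pointer values. Incrementing the U* moves to the
// next array element. &elements[0].first points to a lone int that is not an
// array element, so &elements[0].first + 1 is only a past-the-end pointer for
// that int and must not be used to reach elements[1].first.
const U* next_element() { return &elements[0] + 1; }
```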

So given a `T*`, we have to be able to ask (at least) if it is a subobject
and, if it is, if that containing object is an array of `T`.

And if you have the length of the array, and the array element type, then *by
definition* you know the "complete array type".

The only way to determine the well-behaved status of this is to have data
structures in memory that can take a typed pointer and get the containing
object. And if you can do this once, you can do this repeatedly until there
is no containing object.

The only difference between the "pointer arithmetic on arrays" and "pointer
arithmetic on top-level sequences" is that the former can have "containing
object" be based only on static subobject definitions, while the latter
requires "containing object" to handle dynamic "nested-within" objects. But
really, if you have a memory system that can compute the former, there's no
reason you can't extend it to compute the latter.
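A toy version of that repeated lookup, again purely hypothetical (`Obj`, `Containment`, and both functions are illustrative names, not part of any implementation): each object records what contains it, either as a declared subobject or because it was created in another object's storage ("nested within"). Walking outward treats both kinds of link the same way, while the array-only question inspects just the first, static, link.

```cpp
#include <string>
#include <vector>

// How an object is related to its container, if it has one.
enum class Containment { none, subobject, nested_within };

// Hypothetical per-object record.
struct Obj {
    std::string type;      // e.g. "T" or "T[]"
    int container;         // index of the containing object, or -1
    Containment relation;  // meaningful only when container >= 0
};

// Returns object indices from `start` outward to the outermost object,
// following declared subobjects and nested-within objects alike.
std::vector<int> containment_chain(const std::vector<Obj>& objs, int start) {
    std::vector<int> chain;
    for (int cur = start; cur >= 0; cur = objs[cur].container) {
        chain.push_back(cur);
    }
    return chain;
}

// The current rule's question: is `idx` a direct subobject of an array of
// `type`?
bool is_element_of_array_of(const std::vector<Obj>& objs, int idx,
                            const std::string& type) {
    const Obj& o = objs[idx];
    return o.type == type && o.container >= 0 &&
           o.relation == Containment::subobject &&
           objs[o.container].type == type + "[]";
}
```

For example, with a byte buffer (index 0), a `T[]` created inside it (1), an element of that array (2), and a lone `T` created directly in the buffer (3), the chain from 2 is 2, 1, 0, and only object 2 counts as an array element.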

By contrast, the latter is not possible if you don't know the layout of
the members of the complete type it resides in, and the only way to
determine all that information is to calculate the layout from the complete
type definitions and the memory model, or to simulate the actual allocations.
I don't see it as a desired abstraction, even if the wording can be patched
in limited cases.
So, being able to implement `std::vector` in a platform-neutral fashion
is not a desired feature of the language?
No. To be accurate, I agree that allowing `std::vector` to be implemented
portably (in the sense of a conforming C++ program) is a *convenience*.
1. *Technically*, it is a convenience, not a necessity, even though lacking
it can easily cause bad things we can see *politically*.
... huh?

2. It has pros and cons. Not all cases need such a convenience, and in
practice it can harm users who do not rely on it, for instance through
insufficient vendor-specific optimization. Users can hardly rescue that
case unless they become vendors themselves.
3. As I have said, it should not apply only to `std::vector`. It is
strange to make `std::vector` the only special case. If the ability is
important to keep, why limit it to `std::vector`?
I don't know what you're talking about. The feature we're discussing here
would not be limited to `vector`. It's simply the most obvious use case for
it and the biggest justification for having it.

4. Avoiding the need for magic in order for it to be conforming is the
correct direction. But that does not imply that modifying the core rules is
always the desired choice just for that. (Personally, I prefer the
magic-wrapped-in-another-std-library approach.)
FrankHB1989
2017-09-17 17:25:13 UTC
Permalink
On Sunday, September 17, 2017 at 9:22:29 PM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Sunday, September 17, 2017 at 8:35:24 AM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Saturday, September 16, 2017 at 5:15:31 AM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Friday, September 15, 2017 at 11:12:17 PM UTC+8, Nicol Bolas wrote:
Post by FrankHB1989
And I don't see how it damages the object model. Each object's
lifetime is still clear, and we already have the concept of "nested within"
to allow objects to be dynamically created within other objects' storage.
All we're doing is allowing you to effectively dynamically create arrays of
objects.
This effectively changes the notion of validity of pointer values.
Consider comparing two iterator values from different vector objects
of the same type.
This is already explicitly undefined behavior. Iterators from
different containers cannot be compared to one another. However, that's
You can't assume the result is meaningful because they can be from
unrelated sequences.
two non-subobjects of the same dynamic type, constructed adjacently *in
the same storage*, can have pointer arithmetic used on them as though
they were in an array
So unless those two containers got their allocations from the same
storage, that can't happen.
In this case the sequences assumed are array objects. The change claims
that any two "adjacent" objects are in the same sequence, which sounds
like it requires the language to define, on the fly, an infinite number
of implicit array lvalues of unknown bound (which may alias one another
arbitrarily). This implies a very strange picture of the object model,
especially when [intro.object] is left unchanged.
It implies only the following.
A top-level object (one which is not explicitly a subobject) lives in
a piece of storage. If there is a top-level object of the same type
directly adjacent to it in that same storage, then it is legal to access
that top-level object via pointer arithmetic from this object.
Or to put it another way, you're not assuming anything is an array
object. Pointer arithmetic now works for things that *aren't* arrays.
It's now explicitly for accessing adjacent objects, either sibling
top-level objects or sibling array elements.
I think I see your point now. That's why I illustrated the limitations on
iterators: just *as* an iterator is, a valid non-null object pointer is
conceptually always bound, implicitly, to a sequence, i.e. an array object.
This is the basis on which the semantics of pointer arithmetic operations
(or *random-access iteration operations*) are built. The proposed change
rebuilds those semantics on layout properties *occasionally* provided by
the underlying memory model instead of on the object model. This severely
undermines the ability to reason about well-behaved pointer arithmetic,
because it then needs more information about siblings, which often can't
be collected from a single context.
Any function which takes a pointer as a parameter lacks knowledge that
the pointer is "bound to a sequence, i.e. an array object". So it is
already the case that reasoning about pointer arithmetic is impaired.
This is caused by the design of the type system.
The cause is irrelevant; the fact is that it's the way the system works.
You can pass pointers to functions, and those functions can do pointer
arithmetic on them if those pointers happen to point to the right thing.
Not necessarily. This is how C-style pointers work with typechecking. A type
like `ptr<T>` can also work like that. But C++ has no rule forcing every
parametric type to work only in this exact style.
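The remark that a class type such as `ptr<T>` need not behave exactly like a raw pointer can be illustrated with a hypothetical bounds-carrying pointer. `bounded_ptr` is an illustrative name, not a standard or library type. It stores the extent of the sequence it was created from, so "is this arithmetic within one sequence?" is answered by the value itself rather than by inspecting memory.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical pointer type that remembers the sequence it is bound to.
template <class T>
class bounded_ptr {
public:
    // Binds to the sequence [first, first + count), positioned at first.
    bounded_ptr(T* first, std::size_t count)
        : first_(first), count_(count), index_(0) {}

    // Arithmetic may land anywhere in [0, count], including one past the end.
    bounded_ptr operator+(std::ptrdiff_t n) const {
        const std::ptrdiff_t target = static_cast<std::ptrdiff_t>(index_) + n;
        if (target < 0 || static_cast<std::size_t>(target) > count_) {
            throw std::out_of_range("bounded_ptr: arithmetic leaves its sequence");
        }
        bounded_ptr result = *this;
        result.index_ = static_cast<std::size_t>(target);
        return result;
    }

    // Dereferencing additionally excludes the one-past-the-end position.
    T& operator*() const {
        if (index_ >= count_) {
            throw std::out_of_range("bounded_ptr: dereference past the end");
        }
        return first_[index_];
    }

    std::size_t index() const { return index_; }

private:
    T* first_;
    std::size_t count_;
    std::size_t index_;
};
```

The design choice is the usual "fat pointer" trade-off: the check becomes local and cheap to reason about, at the cost of a larger pointer representation, which touches the ABI concern raised above.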
Post by Nicol Bolas
Working around semantic limitations does not fix it.
Post by FrankHB1989
In general, the C++ type system is not powerful enough to encode such
information in type signatures; some form of gradual typing would be
needed. There would be more problems if you did fix it, though, e.g. ABI.
However, that does not imply that the knowledge cannot be reasoned about
outside of typechecking.
It doesn't "imply" anything; it outright states it. A `T*` is a pointer;
that's a type. There is no type-based way to distinguish a pointer to an
array element from a pointer to something that isn't an array element.
Therefore, typechecking *alone* cannot be used to determine if pointer
arithmetic is valid.
That is the way C++ works today. Given that fact, there is no reason why
we can't extend the set of conditions in which pointer arithmetic works.
There is also no sufficient reason why it *has to* be extended in this
way.

Under the current wording, in order to dynamically know whether pointer
arithmetic is legitimate, you have to be able to look at the pointer and
know that it points into an array object of `T`s. Under the new wording, in
order to dynamically know whether pointer arithmetic is legitimate, you
have to be able to look at the pointer, and then look in the direction of
the arithmetic to see if there are more `T`s, and that they all share the
same storage and are not themselves subobjects.
I submit that if you can reason about the former, then you have all of
the information needed to reason about the latter. In both cases, you have
to be able to walk through memory and determine what is actually in that
region of storage pointed to by a pointer. You have to be able to turn an
address into the nested sequence of objects that are pointed to by that
address.
Really? The former does not depend on the memory model directly. It needs
to know only the length of the array and one of the pointer values to the
*n*th element of the array, not the complete element type, nor the complete
array type. The size of the element is not of interest here, nor is the
layout of the elements in the array.
You forget what we're doing here.
We're trying to reason about the object model implications about a
particular piece of code. Specifically, we have a function that takes a
pointer. That function is performing pointer arithmetic on that pointer.
We're trying to see what it would take for the object model to be able to
verify, at runtime, if that pointer arithmetic would result in well-defined
behavior. What must the object model be able to determine from that pointer?
Well, in order to get to "runtime", we must first get past *compile-time*.
And it is ill-formed for code to perform pointer arithmetic on `T*` if `T`
is incomplete. Therefore, we must already know the "size of element".
Reasoning about a program does not mean figuring out all the information
that can be determined from the program semantics. The fact that a conforming
implementation must know the exact size of the type in a well-formed
program does not mean that the person reasoning about the program must.
If the program containing the array compiled, the element type does have a
positive size, but its exact value is not necessarily of interest for
reasoning. For example, it is just a common divisor in bounds checking
based on address calculation; so why not use the operand of the pointer
arithmetic directly instead of addresses?

Furthermore, you do need to know the "layout of elements in the array",
because you need to make sure that the type `T` that the pointer points to is the *same
type* as the array element type. After all, the `T*` could be pointing to
the first subobject of the array element type `U`. Since `U*` will have the
same address as the `T*` first subobject, you need to be able to
differentiate between these cases.
Knowing that a `T*` happens to point into an array isn't enough; it must
point into an array of `T`. It must be a direct subobject of an array of
`T`.
So given a `T*`, we have to be able to ask (at least) if it is a subobject
and, if it is, if that containing object is an array of `T`.
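The first-subobject ambiguity described above can be sketched with a hypothetical standard-layout type `U` whose first member is an `int`, with `int` playing the role of `T`. The two pointers share an address, but only the `U*` points into an array of `U`.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical element type: its first member is the "T" subobject.
struct U {
    int first;     // at offset 0 in a standard-layout class
    double second;
};
static_assert(std::is_standard_layout<U>::value, "U is standard-layout");

U elems[2] = {{1, 1.5}, {2, 2.5}};

// A pointer to an array element and a pointer to that element's first
// member designate the same address.
bool same_address() {
    U* as_element = &elems[0];
    int* as_member = &elems[0].first;
    return static_cast<void*>(as_element) == static_cast<void*>(as_member);
}

// Valid: arithmetic on the U* moves between elements of the array of U.
int next_elements_first() {
    U* p = &elems[0];
    return (p + 1)->first;
}
// By contrast, &elems[0].first + 1 is only a one-past-the-end pointer for
// that single int; it must not be dereferenced and does not reach elems[1].first.
```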
Verifying correctness with respect to type-based aliasing is a different
matter, which requires other rules. And if the purpose is to find all the
possible undefined behavior under the rules derived from the object model
(at least, as per [basic.life]), the information provided here is still far
from enough.

And if you have the length of the array, and the array element type, then *by
definition* you know the "complete array type".
The only way to determine the well-behaved status of this is to have data
structures in memory that can take a typed pointer and get the containing
object. And if you can do this once, you can do this repeatedly until there
is no containing object.
The only difference between the "pointer arithmetic on arrays" and
"pointer arithmetic on top-level sequences" is that the former can have
"containing object" be based only on static subobject definitions, while
the latter requires "containing object" to handle dynamic "nested-within"
objects. But really, if you have a memory system that can compute the
former, there's no reason you can't extend it to compute the latter.
Simplicity. Locality.
Probably more importantly, computational complexity matters a lot when the
checking is done in my head while coding.

On the contrary, the latter is not possible if you don't know the layout of
members of the complete type it resides in, and the way to determine all of
that information is to calculate the layout from the complete type
definitions and the memory model, or to simulate the actual allocations.
I don't see it as a desirable abstraction, even if the wording can be
patched for limited cases.
So, being able to implement `std::vector` in a platform-neutral fashion
is not a desired feature of the language?
No. To be accurate, I agree that allowing `std::vector` to be implemented
portably (in the sense of conforming C++ programs) is a *convenience*.
1. *Technically*, it is a convenience, not a must, even though the lack of
that convenience can easily cause bad things, as we can see, *politically*.
... huh?
I did not say that a solution to the CWG issue itself is not needed. But the
proposed rule change is not necessarily needed.

See the proposed alternatives by others.

2. It has pros and cons. Not all cases need such convenience, and in
practice it can harm users who do not rely on it, for instance through
insufficient vendor-specific optimization. Users can hardly rescue such a
case unless they become vendors themselves.
3. As I have said, it should not apply only to `std::vector`. It is
strange to allow `std::vector` as the only special case. If the ability
is important to keep, why only `std::vector`?
I don't know what you're talking about. The feature we're discussing here
would not be limited to `vector`. It's simply the most obvious use case for
it and the biggest justification for having it.
This is like an X-Y problem.
If it is really desirable to allow `std::vector` to be implemented in a
platform-neutral fashion, why not just *guarantee* the "magic-free" property
for implementations of standard library components in general? Even though
`std::vector` is a notable example that frustrates users, it is debatable
whether it is special enough to be the one special case freed from magic.

And if we have generally solved this problem by other means, is the
proposed change still needed?

4. Avoiding the need for magic in order to be conforming is the correct
direction. But that does not imply that modifying the core rules is always
the desired choice just for that. (Personally, I prefer the
magic-wrapped-in-another-std-library approach.)
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
Nicol Bolas
2017-09-17 22:52:02 UTC
Post by FrankHB1989
On Sunday, September 17, 2017 at 9:22:29 PM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Sunday, September 17, 2017 at 8:35:24 AM UTC+8, Nicol Bolas wrote:
Working around semantic limitations does not fix them.
In general, the C++ type system is not powerful enough to encode such
information in type signatures. Some form of gradual typing would be needed,
though there would be more problems if you did fix it, e.g. ABI.
However, that does not imply that the knowledge cannot be reasoned about
beyond typechecking.
It doesn't "imply" anything; it outright states it. A `T*` is a pointer;
that's a type. There is no type-based way to distinguish a pointer to an
array element from a pointer to something that isn't an array element.
Therefore, typechecking *alone* cannot be used to determine if pointer
arithmetic is valid.
That is the way C++ works today. Given that fact, there is no reason why
we can't extend the set of conditions in which pointer arithmetic works.
There is also no sufficient reason why it *has to* be extended in
this way.
The reasons have already been explained. Whether you find them "sufficient"
is a personal choice.

To me, so long as the object model remains reasonable and coherent, fixing
the problem is worth it.
Post by FrankHB1989
Under the current wording, in order to dynamically know whether pointer
arithmetic is legitimate, you have to be able to look at the pointer and
know that it points into an array object of `T`s. Under the new wording, in
order to dynamically know whether pointer arithmetic is legitimate, you
have to be able to look at the pointer, and then look in the direction of
the arithmetic to see if there are more `T`s, and that they all share the
same storage and are not themselves subobjects.
I submit that if you can reason about the former, then you have all of
the information needed to reason about the latter. In both cases, you have
to be able to walk through memory and determine what is actually in that
region of storage pointed to by a pointer. You have to be able to turn an
address into the nested sequence of objects that are pointed to by that
address.
Really? The former does not depend on the memory model directly. It
needs only the length of the array and a pointer to one of its elements
(the *n*th), not the complete element type, nor the complete array type.
The size of an element is not of interest here, nor is the layout of the
elements in the array.
You forget what we're doing here.
We're trying to reason about the object model implications about a
particular piece of code. Specifically, we have a function that takes a
pointer. That function is performing pointer arithmetic on that pointer.
We're trying to see what it would take for the object model to be able to
verify, at runtime, if that pointer arithmetic would result in well-defined
behavior. What must the object model be able to determine from that pointer?
Well, in order to get to "runtime", we must first get past *compile-time*.
And it is ill-formed for code to perform pointer arithmetic on `T*` if `T`
is incomplete. Therefore, we must already know the "size of element".
Reasoning about a program does not mean figuring out all the information
that can be determined from the program semantics. The fact that a conforming
implementation must know the exact size of the type in a well-formed
program does not mean that the person reasoning about the program must.
I disagree. You cannot reason about an ill-formed program, since *by
definition*, an ill-formed program is semantic nonsense. If it's
ill-formed, it is not a C++ program. And I submit that you cannot use C++
logic to reason about things that aren't C++ programs.
Post by FrankHB1989
If the program containing the array compiled, the element type does have a
positive size, but its exact value is not necessarily of interest for
reasoning. For example, it is just a common divisor in bounds checking
based on address calculation; so why not use the operand of the pointer
arithmetic directly instead of addresses?
Furthermore, you do need to know the "layout of elements in the array",
because you need to make sure that the type `T` that the pointer points to is the *same
type* as the array element type. After all, the `T*` could be pointing
to the first subobject of the array element type `U`. Since `U*` will have
the same address as the `T*` first subobject, you need to be able to
differentiate between these cases.
Knowing that a `T*` happens to point into an array isn't enough; it must
point into an array of `T`. It must be a direct subobject of an array of
`T`.
So given a `T*`, we have to be able to ask (at least) if it is a
subobject and, if it is, if that containing object is an array of `T`.
Verifying correctness with respect to type-based aliasing is a different
matter, which requires other rules. And if the purpose is to find all the
possible undefined behavior under the rules derived from the object model
(at least, as per [basic.life]), the information provided here is still far
from enough.
My point is that the proposed solution does not render the object model
incoherent or nonsensical. It is just as reasonable as the old version,
requiring the same reasoning tools that the old version required.

It simply has different answers for different situations. In the original
version, you say that pointer arithmetic moves through an array. In the new
version, you say that pointer arithmetic is moving between sequential
top-level objects (which arrays are a subset of). If you find that
difficult to reason about... I can't really help that.
Post by FrankHB1989
And if you have the length of the array, and the array element type, then *by
definition* you know the "complete array type".
The only way to determine the well-behaved status of this is to have data
structures in memory that can take a typed pointer and get the containing
object. And if you can do this once, you can do this repeatedly until there
is no containing object.
The only difference between the "pointer arithmetic on arrays" and
"pointer arithmetic on top-level sequences" is that the former can have
"containing object" be based only on static subobject definitions, while
the latter requires "containing object" to handle dynamic "nested-within"
objects. But really, if you have a memory system that can compute the
former, there's no reason you can't extend it to compute the latter.
Simplicity.
Necessary functionality trumps simplicity.
Post by FrankHB1989
Locality.
Locality of what? I already demonstrated that the current system has no
greater "locality" than this one.
Post by FrankHB1989
Probably more importantly, computational complexity matters a lot when the
checking is done in my head while coding.
On the contrary, the latter is not possible if you don't know the layout
of members of the complete type it resides in, and the way to determine all
of that information is to calculate the layout from the complete type
definitions and the memory model, or to simulate the actual allocations.
I don't see it as a desirable abstraction, even if the wording can be
patched for limited cases.
So, being able to implement `std::vector` in a platform-neutral
fashion is not a desired feature of the language?
No. To be accurate, I agree that allowing `std::vector` to be implemented
portably (in the sense of conforming C++ programs) is a *convenience*.
1. *Technically*, it is a convenience, not a must, even though the lack of
that convenience can easily cause bad things, as we can see, *politically*.
... huh?
I did not say that a solution to the CWG issue itself is not needed. But the
proposed rule change is not necessarily needed.
See the proposed alternatives by others.
The only alternative on this thread which isn't merely an alternate
statement or limited version of what I described is "use magic function to
declare that a region of sequential top-level objects contains an array".
Which is an expert-only tool, since non-experts would never even guess that
such a thing would be needed.

Dynamic creation of arrays should not be an expert-only thing.
Post by FrankHB1989
2. It has pros and cons. Not all cases need such convenience, and in
practice it can harm users who do not rely on it, for instance through
insufficient vendor-specific optimization. Users can hardly rescue such a
case unless they become vendors themselves.
3. As I have said, it should not apply only to `std::vector`. It is
strange to allow `std::vector` as the only special case. If the ability
is important to keep, why only `std::vector`?
I don't know what you're talking about. The feature we're discussing here
would not be limited to `vector`. It's simply the most obvious use case for
it and the biggest justification for having it.
This is like an X-Y problem.
If it is really desirable to allow `std::vector` to be implemented in a
platform-neutral fashion, why not just *guarantee* the "magic-free" property
for implementations of standard library components in general? Even though
`std::vector` is a notable example that frustrates users, it is debatable
whether it is special enough to be the one special case freed from magic.
What part of "The feature we're discussing here *would not be limited to
`vector`*" eluded you? `vector` already works. The point is not to make
`vector` work. It's to allow users to write their own dynamic arrays, which
includes the possibility of writing `vector`, but also any other kind of
dynamic array type. And to do so without deep knowledge of esoteric C++
functions.
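A sketch of the kind of user-written dynamic array being discussed. `tiny_vec` is hypothetical, with fixed capacity and no copying to keep it short. It allocates raw storage through `std::allocator` and constructs elements one at a time, so its `operator[]` relies on pointer arithmetic across separately constructed objects, the case the proposed rule is meant to cover.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>

// Hypothetical fixed-capacity dynamic array written the way vector-like
// containers typically are: raw allocation first, element construction later.
template <class T>
class tiny_vec {
public:
    explicit tiny_vec(std::size_t cap)
        : data_(std::allocator<T>{}.allocate(cap)), cap_(cap) {}
    ~tiny_vec() {
        for (std::size_t i = 0; i < size_; ++i) data_[i].~T();
        std::allocator<T>{}.deallocate(data_, cap_);
    }
    tiny_vec(const tiny_vec&) = delete;
    tiny_vec& operator=(const tiny_vec&) = delete;

    // Constructs one element in place; returns false when full.
    bool push_back(T value) {
        if (size_ == cap_) return false;
        ::new (static_cast<void*>(data_ + size_)) T(std::move(value));
        ++size_;
        return true;
    }
    // data_[i] is data_ + i: arithmetic over individually constructed elements.
    T& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return size_; }

private:
    T* data_;
    std::size_t cap_;
    std::size_t size_ = 0;
};
```

Nothing here is exotic; that is the point of the argument: ordinary code like this should not need an expert-only tool to be valid.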
FrankHB1989
2017-09-18 04:16:11 UTC
On Monday, September 18, 2017 at 6:52:03 AM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Sunday, September 17, 2017 at 9:22:29 PM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Sunday, September 17, 2017 at 8:35:24 AM UTC+8, Nicol Bolas wrote:
Working around semantic limitations does not fix them.
In general, the C++ type system is not powerful enough to encode such
information in type signatures. Some form of gradual typing would be needed,
though there would be more problems if you did fix it, e.g. ABI.
However, that does not imply that the knowledge cannot be reasoned about
beyond typechecking.
It doesn't "imply" anything; it outright states it. A `T*` is a pointer;
that's a type. There is no type-based way to distinguish a pointer to an
array element from a pointer to something that isn't an array element.
Therefore, typechecking *alone* cannot be used to determine if pointer
arithmetic is valid.
That is the way C++ works today. Given that fact, there is no reason why
we can't extend the set of conditions in which pointer arithmetic works.
There is also no sufficient reason why it *has to* be extended in
this way.
The reasons have already been explained. Whether you find them
"sufficient" is a personal choice.
To me, so long as the object model remains reasonable and coherent, fixing
the problem is worth it.
To me, so long as the object model remains reasonable and coherent,
keeping it from being replaced by the memory model is worth it.
Post by Nicol Bolas
Post by FrankHB1989
Under the current wording, in order to dynamically know whether pointer
arithmetic is legitimate, you have to be able to look at the pointer and
know that it points into an array object of `T`s. Under the new wording, in
order to dynamically know whether pointer arithmetic is legitimate, you
have to be able to look at the pointer, and then look in the direction of
the arithmetic to see if there are more `T`s, and that they all share the
same storage and are not themselves subobjects.
I submit that if you can reason about the former, then you have all of
the information needed to reason about the latter. In both cases, you have
to be able to walk through memory and determine what is actually in that
region of storage pointed to by a pointer. You have to be able to turn an
address into the nested sequence of objects that are pointed to by that
address.
Really? The former does not depend on the memory model directly. It
needs only the length of the array and a pointer to one of its elements
(the *n*th), not the complete element type, nor the complete array type.
The size of an element is not of interest here, nor is the layout of the
elements in the array.
You forget what we're doing here.
We're trying to reason about the object model implications about a
particular piece of code. Specifically, we have a function that takes a
pointer. That function is performing pointer arithmetic on that pointer.
We're trying to see what it would take for the object model to be able to
verify, at runtime, if that pointer arithmetic would result in well-defined
behavior. What must the object model be able to determine from that pointer?
Well, in order to get to "runtime", we must first get past
*compile-time*. And it is ill-formed for code to perform pointer
arithmetic on `T*` if `T` is incomplete. Therefore, we must already know
the "size of element".
Reasoning about a program does not mean figuring out all the information
that can be determined from the program semantics. The fact that a conforming
implementation must know the exact size of the type in a well-formed
program does not mean that the person reasoning about the program must.
I disagree. You cannot reason about an ill-formed program, since *by
definition*, an ill-formed program is semantic nonsense. If it's
ill-formed, it is not a C++ program. And I submit that you cannot use C++
logic to reason about things that aren't C++ programs.
For this to make sense, the program being reasoned about is assumed to be
well-formed. That fact is not necessarily rechecked during the reasoning; in
general, it has to be verified separately if it is not given as a premise.

If the program containing the array compiled, the element type does have a
positive size, but its exact value is not necessarily of interest for
reasoning. For example, it is just a common divisor in bounds checking
based on address calculation; so why not use the operand of the pointer
arithmetic directly instead of addresses?
Furthermore, you do need to know the "layout of elements in the array",
because you need to make sure that the type `T` that the pointer points to is the *same
type* as the array element type. After all, the `T*` could be pointing
to the first subobject of the array element type `U`. Since `U*` will have
the same address as the `T*` first subobject, you need to be able to
differentiate between these cases.
Knowing that a `T*` happens to point into an array isn't enough; it must
point into an array of `T`. It must be a direct subobject of an array of
`T`.
So given a `T*`, we have to be able to ask (at least) if it is a
subobject and, if it is, if that containing object is an array of `T`.
Verifying correctness with respect to type-based aliasing is a different
matter, which requires other rules. And if the purpose is to find all the
possible undefined behavior under the rules derived from the object model
(at least, as per [basic.life]), the information provided here is still far
from enough.
My point is that the proposed solution does not render the object model
incoherent or nonsensical. It is just as reasonable as the old version,
requiring the same reasoning tools that the old version required.
The C++ object model is largely decoupled from the memory model. It is
apparently deliberately designed to be address-space-agnostic, and to be
isolated from assumptions about the layout between arbitrary objects.
Enforcing stronger assumptions on the object model, with rules requiring
properties that better fit the underlying memory model, does not seem like
an effort to make it coherent and sensible.

It simply has different answers for different situations. In the original
version, you say that pointer arithmetic moves through an array. In the new
version, you say that pointer arithmetic is moving between sequential
top-level objects (which arrays are a subset of). If you find that
difficult to reason about... I can't really help that.
This difference is what I meant by "locality": the property that specific
facts can be determined in the context where the pointer value exists,
without non-"local" knowledge. Specifically, that context includes the
pointer value and the identity of the sequence it belongs to (a reference to
the array object and its length). Originally, the pointer is bound to the
sequence directly, so one can simply assume there is always an array that
directly contains it. As for pointer arithmetic, the resulting valid
non-null pointer values are always in the transitive closure of a quite
limited, finite set of well-defined operations that can be resolved on that
array. By contrast, the new version makes it impossible to reason without
simulating the object layout, which needs additional information that is
not "local" at all.

In practice, there is one more difference. The length of the sequence is
often passed to the context where the pointer value exists, which reflects a
quite common API style. In that case, the constraint on the sequence can be
strengthened by a separate length provided by the user, and then only a base
pointer of the range is needed to form the "local" knowledge. (Note that the
sequence can also be a subrange of the array.) This is in fact the checking
logic of several operations on a random access iterator, and it can easily
be reused here. Whether the range is based on pointer values is an
implementation detail. The new version cannot work like this, because
there is no general, portable way to iterate over a sequence consisting of
the offsets of arbitrary subobjects in a given top-level object.
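The "pointer plus length" checking style described above can be sketched as follows. `checked_advance` is a hypothetical helper, not a standard facility. It validates a move using only the index arithmetic, without `sizeof(T)` or any knowledge of the layout of surrounding objects, much like the bounds checks a checked random access iterator performs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: base and length identify the sequence, pos is the
// current index within [0, length], offset is the requested move.
template <class T>
T* checked_advance(T* base, std::size_t length, std::size_t pos, std::ptrdiff_t offset) {
    std::ptrdiff_t target = static_cast<std::ptrdiff_t>(pos) + offset;
    // Reject any result outside [base, base + length] before computing it.
    if (target < 0 || target > static_cast<std::ptrdiff_t>(length))
        return nullptr;
    return base + target;  // in range, including one past the end
}
```

The check needs only `length`, `pos`, and `offset`: exactly the "local" information the argument relies on.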
Post by Nicol Bolas
Post by FrankHB1989
And if you have the length of the array, and the array element type, then *by
definition* you know the "complete array type".
The only way to determine the well-behaved status of this is to have
data structures in memory that can take a typed pointer and get the
containing object. And if you can do this once, you can do this repeatedly
until there is no containing object.
The only difference between the "pointer arithmetic on arrays" and
"pointer arithmetic on top-level sequences" is that the former can have
"containing object" be based only on static subobject definitions, while
the latter requires "containing object" to handle dynamic "nested-within"
objects. But really, if you have a memory system that can compute the
former, there's no reason you can't extend it to compute the latter.
Simplicity.
Necessary functionality trumps simplicity.
Well... less is more :-)
Treating a workaround as necessary functionality is more or less... a
personal choice.
Post by Nicol Bolas
Post by FrankHB1989
Locality.
Locality of what? I already demonstrated that the current system has no
greater "locality" than this one.
See above.

Probably more importantly, computational complexity matters a lot when the
checking is done in my head while coding.
On the contrary, the latter is not possible if you don't know the layout
of members of the complete type it resides in, and the way to determine all
of that information is to calculate the layout from the complete type
definitions and the memory model, or to simulate the actual allocations.
I don't see it as a desirable abstraction, even if the wording can be
patched for limited cases.
So, being able to implement `std::vector` in a platform-neutral
fashion is not a desired feature of the language?
No. To be accurate, I agree that allowing `std::vector` to be implemented
portably (in the sense of conforming C++ programs) is a *convenience*.
1. *Technically*, it is a convenience, not a must, even though the lack of
that convenience can easily cause bad things, as we can see, *politically*.
... huh?
I did not say that a solution to the CWG issue itself is not needed. But the
proposed rule change is not necessarily needed.
See the proposed alternatives by others.
The only alternative on this thread which isn't merely an alternate
statement or limited version of what I described is "use magic function to
declare that a region of sequential top-level objects contains an array".
Which is an expert-only tool, since non-experts would never even guess that
such a thing would be needed.
The magic should rarely be needed. It should not get in the way of writing
everyday code.

Note that nowadays writing a properly optimized vector is already
expert-level work. (Exception guarantees? Allocators? Expansion
policy?)

Even without magic functions, the proposed rules would still be like magic
for non-experts, just implicit and more likely to be ignored by average
users.

Moreover, non-experts are likely to ignore object-model-related UB
entirely. (How many users have learned about strict aliasing?) They take the
right to break type and/or memory safety for granted, without paying
attention to the extra responsibility it requires. I doubt that losing the
proposed rules would make the case worse.

Dynamic creation of arrays should not be an expert-only thing.
Post by Nicol Bolas
Post by FrankHB1989
2. It has pros and cons. Not all cases need such convenience, and in
practice it can harm users who do not rely on it, for instance through
insufficient vendor-specific optimization. Users can hardly rescue such a
case unless they become vendors themselves.
3. As I have said, it should not apply only to `std::vector`. It is
strange to allow `std::vector` as the only special case. If the ability
is important to keep, why only `std::vector`?
I don't know what you're talking about. The feature we're discussing
here would not be limited to `vector`. It's simply the most obvious use
case for it and the biggest justification for having it.
This is like an X-Y problem.
If it is really desirable to allow `std::vector` to be implemented in a
platform-neutral fashion, why not just *guarantee* the "magic-free" property
for implementations of standard library components in general? Even though
`std::vector` is a notable example that frustrates users, it is debatable
whether it is special enough to be the one special case freed from magic.
What part of "The feature we're discussing here *would not be limited to
`vector`*" eluded you? `vector` already works. The point is not to make
`vector` work. It's to allow users to write their own dynamic arrays, which
includes the possibility of writing `vector`, but also any other kind of
dynamic array type.
I was certainly talking about users implementing a `std::vector`-like
interface. My point is not only about `std::vector`; other `std` components
could share the same kind of guarantees. This is a broader problem,
though.

And to do so without deep knowledge of esoteric C++ functions.
It depends. I don't think it needs to be more esoteric than the rules you
proposed.
Richard Smith
2017-09-09 00:27:09 UTC
Post by Nicol Bolas
Post by Nicol Bolas
Post by Richard Smith
Post by Myriachan
2182. Pointer arithmetic in array-like containers *Section:* 5.7
[expr.add] *Status:* drafting *Submitter:* Jonathan Wakely
*Date:* 2015-10-20
The current direction for issue 1776
<http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1776>
(see paper P0137) calls into question the validity of doing pointer
arithmetic to address separately-allocated but contiguous objects in a
container like std::vector. A related question is whether there
should be some allowance made for allowing pointer arithmetic using a
pointer to a base class if the derived class is a standard-layout class
with no non-static data members. It is possible that std::launder
could play a part in the resolution of this issue.
*Notes from the February, 2016 meeting:*
This issue is expected to be resolved by the resolution of issue 1776
<http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1776>.
The major problem is when the elements of the vector contain constant or
reference members; 3.8 [basic.life] paragraph 7 implies that pointer
arithmetic leading to such an object produces undefined behavior, and CWG
expects this to continue. Some changes to the interface of
std::vector may be required, perhaps using std::launder as part of
iterator processing.
It seems incredible that the direction of the Standard would be toward
making pointer arithmetic undefined for objects inside an std::vector
just because they have a const member or reference member.
class C {
int &member;
void f();
...
};
...
std::vector<C> v;
v.push_back(C());
v.push_back(C());
v.push_back(C());
(&v[0] + 2)->f();
This would seriously be undefined by what was stated above?
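A runnable variant of the snippet above. It assumes a constructor that binds the reference member, since the original's `C()` cannot initialize it. It also assumes a public `f()` that returns the referenced value so the result can be checked. Both changes are illustrative, not part of the original post. According to the reply that follows, any `std::launder` calls this needs would live inside `std::vector` itself, not in user code like this.

```cpp
#include <cassert>
#include <vector>

// Element type with a reference member, as in the example above.
class C {
public:
    explicit C(int& r) : member(r) {}
    int f() const { return member; }
private:
    int& member;
};

int call_third(int& shared) {
    std::vector<C> v;
    v.push_back(C(shared));
    v.push_back(C(shared));
    v.push_back(C(shared));
    return (&v[0] + 2)->f();  // the pointer arithmetic under discussion
}
```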
No, I doubt that's the intent; rather, the requisite calls to
std::launder would be in the implementation of vector itself. However, this
std::vector<C> v;
v.reserve(1);
v.push_back(C());
C *p = v.data(); // or "auto p = v.begin();"
v.pop_back();
v.push_back(C());
p->f(); // invalid, that C object went away
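What such a `std::launder` call does can be sketched outside `std::vector`, using a hypothetical element type whose only member is `const`. This is a C++17 sketch of reusing raw storage. It is not a description of how any particular `vector` implementation works.

```cpp
#include <cassert>
#include <new>

// Hypothetical element type with a const member: the case [basic.life] singles out.
struct C {
    const int id;
};

int reuse_storage() {
    alignas(C) unsigned char buf[sizeof(C)];
    C* p = ::new (static_cast<void*>(buf)) C{1};
    p->~C();                               // the first object's lifetime ends
    ::new (static_cast<void*>(buf)) C{2};  // a new object at the same address
    // Because C has a const member, under C++17 the old pointer p does not
    // automatically refer to the new object; std::launder yields one that does.
    return std::launder(p)->id;
}
```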
That being undefined, to me, is perfectly valid, but only for certain
types of `C`. Namely those mentioned above: types containing references or
`const` objects.
This is completely ridiculous to me.
Post by Richard Smith
Post by Myriachan
First of all, it seems that P0137R1 didn't solve the arithmetic issue,
making &v[0] + 2 illegal pointer arithmetic in the first place,
because std::vector (most likely) constructed the objects
separately. Second, making this undefined just because of a const or
reference nonstatic member would break an unbelievable amount of existing
C++ code if this arithmetic were to suddenly require a call to
std::launder.
For the first issue, it seems like we should formally define pointer
arithmetic as working across adjacent array objects, with individual
objects being treated as an array of size 1 for this purpose as usual.
SG12 is already exploring this direction, but only for adjacent objects
whose storage is provided by the same array.
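As a reminder of the baseline the reply above builds on, this is what [expr.add] already permits: arithmetic within one array, forming the one-past-the-end pointer, and treating a non-array object as an array of one element. The function names are illustrative only.

```cpp
#include <cassert>

// Arithmetic within one array object, including forming (not dereferencing)
// the one-past-the-end pointer, is well defined.
int sum_array() {
    int arr[3] = {1, 2, 3};
    int total = 0;
    for (int* p = arr; p != arr + 3; ++p)  // arr + 3: one past the end, compared only
        total += *p;
    return total;
}

// A non-array object counts as an array of one element for pointer arithmetic:
// &x + 1 may be formed and compared, but not dereferenced.
bool single_object_bounds() {
    int x = 42;
    int* first = &x;
    int* last = &x + 1;
    return (last - first) == 1;
}
```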
What does "provided by the same array" mean, exactly?
See the definition of "provides storage" here: http://eel.is/c++draft/intro.object#3
Right, but the allocator functions don't create objects "of type “array of
N unsigned char” or of type “array of N std::byte”". Nor does
`std::aligned_storage/union_t`. So neither can "provide storage" for an
object under those rules. You can certainly create objects in that storage.
But that won't be the same as "provide storage".
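A minimal sketch of the "provides storage" case from [intro.object]: one unsigned char array that holds two Point objects side by side. Point and sum_points_in_byte_array are hypothetical. The second object is reached through its own placement-new pointer rather than first + 1, since arithmetic from one object to the next is exactly what is under discussion.

```cpp
#include <cassert>
#include <new>

struct Point { int x, y; };

int sum_points_in_byte_array() {
    // The byte array provides storage for both Points.
    alignas(Point) unsigned char buf[2 * sizeof(Point)];
    Point* first = new (buf) Point{1, 2};
    Point* second = new (buf + sizeof(Point)) Point{3, 4};  // sizeof is a multiple of alignof
    // Reaching second as first + 1 is the step SG12's direction would cover.
    return first->x + first->y + second->x + second->y;
}
```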
Part of the relevant changes would be specifying that ::operator new and
malloc do in fact create such an array object; this has the nice
side-effect of guaranteeing that pointer arithmetic on dynamically
allocated storage actually works.
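The pattern at stake, roughly as a vector implementation performs it: raw storage from ::operator new, elements constructed one at a time, then indexing by pointer arithmetic. This is a sketch under stated assumptions (Elem and third_id_via_arithmetic are hypothetical). The marked line is the step whose formal validity issue 2182 questions.

```cpp
#include <cassert>
#include <new>

struct Elem {
    int& ref;  // reference member, as in the C example in this thread
    int id;
    Elem(int& r, int i) : ref(r), id(i) {}
};

int third_id_via_arithmetic() {
    int target = 0;
    // Default operator new alignment is sufficient for Elem.
    void* raw = ::operator new(3 * sizeof(Elem));
    Elem* first = static_cast<Elem*>(raw);

    for (int i = 0; i < 3; ++i)
        new (first + i) Elem(target, i + 10);  // each element constructed separately

    int result = (first + 2)->id;  // arithmetic across separately constructed objects

    for (int i = 0; i < 3; ++i)
        first[i].~Elem();
    ::operator delete(raw);
    return result;
}
```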

Post by Nicol Bolas
Unless you're saying that `vector` has to allocate memory, then do
`new() char[]` on the allocation, and only then perform construction on any
types in the memory. Or unless you're saying that every allocation of memory,
*every object*, is also an array of bytes in addition to being whatever it
currently is.
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
Nicol Bolas
2017-09-09 01:05:01 UTC
Post by Richard Smith
Post by Nicol Bolas
Post by Richard Smith
Post by Nicol Bolas
Post by Richard Smith
Post by Myriachan
2182. Pointer arithmetic in array-like containers
Section: 5.7 [expr.add]   Status: drafting   Submitter: Jonathan Wakely
Date: 2015-10-20
The current direction for issue 1776
<http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1776>
(see paper P0137) calls into question the validity of doing pointer
arithmetic to address separately-allocated but contiguous objects in a
container like std::vector. A related question is whether there
should be some allowance made for allowing pointer arithmetic using a
pointer to a base class if the derived class is a standard-layout class
with no non-static data members. It is possible that std::launder
could play a part in the resolution of this issue.
*Notes from the February, 2016 meeting:*
This issue is expected to be resolved by the resolution of issue 1776
<http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1776>.
The major problem is when the elements of the vector contain constant or
reference members; 3.8 [basic.life] paragraph 7 implies that pointer
arithmetic leading to such an object produces undefined behavior, and CWG
expects this to continue. Some changes to the interface of
std::vector may be required, perhaps using std::launder as part of
iterator processing.
It seems incredible that the direction of the Standard would be
toward making pointer arithmetic undefined for objects inside an
std::vector just because they have a const member or reference
member.
class C {
int &member;
void f();
...
};
...
std::vector<C> v;
v.push_back(C());
v.push_back(C());
v.push_back(C());
(&v[0] + 2)->f();
This would seriously be undefined by what was stated above?
No, I doubt that's the intent; rather, the requisite calls to
std::launder would be in the implementation of vector itself. However, this case would be undefined:
std::vector<C> v;
v.reserve(1);
v.push_back(C());
C *p = v.data(); // or "auto p = v.begin();"
v.pop_back();
v.push_back(C());
p->f(); // invalid, that C object went away
That being undefined, to me, is perfectly valid, but only for certain
types of `C`. Namely those mentioned above: types containing references or
`const` objects.
This is completely ridiculous to me.
Post by Richard Smith
Post by Myriachan
First of all, it seems that P0137R1 didn't solve the arithmetic
issue, making &v[0] + 2 illegal pointer arithmetic in the first
place, because std::vector (most likely) constructed the objects
separately. Second, making this undefined just because of a const or
reference nonstatic member would break an unbelievable amount of existing
C++ code if this arithmetic were to suddenly require a call to
std::launder.
For the first issue, it seems like we should formally define pointer
arithmetic as working across adjacent array objects, with individual
objects being treated as an array of size 1 for this purpose as usual.
SG12 is already exploring this direction, but only for adjacent
objects whose storage is provided by the same array.
What does "provided by the same array" mean, exactly?
http://eel.is/c++draft/intro.object#3
Right, but the allocator functions don't create objects "of type “array
of N unsigned char” or of type “array of N std::byte”". Nor does
`std::aligned_storage/union_t`. So neither can "provide storage" for an
object under those rules. You can certainly create objects in that storage.
But that won't be the same as "provide storage".
Part of the relevant changes would be specifying that ::operator new and
malloc do in fact create such an array object; this has the nice
side-effect of guaranteeing that pointer arithmetic on dynamically
allocated storage actually works.
But that also means that pointer arithmetic on non-dynamically allocated
storage does not work. Which means `vector` only works if it uses memory
allocated by `::operator new` or `malloc`. If you have some static storage,
I guess it had better already be a byte array.

And what of `std::aligned_storage/union_t`? Are those now required to be
byte arrays?
Post by Richard Smith
Post by Nicol Bolas
Unless you're saying that `vector` has to allocate memory, then do
`new() char[]` on the allocation, and only then perform construction on any
types in the memory. Or unless you're saying that every allocation of memory,
*every object*, is also an array of bytes in addition to being whatever it
currently is.
Richard Smith
2017-09-09 01:39:10 UTC
Post by Nicol Bolas
Post by Richard Smith
Post by Nicol Bolas
Post by Nicol Bolas
Post by Richard Smith
Post by Myriachan
2182. Pointer arithmetic in array-like containers
Section: 5.7 [expr.add]   Status: drafting   Submitter: Jonathan Wakely
Date: 2015-10-20
The current direction for issue 1776
<http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1776>
(see paper P0137) calls into question the validity of doing pointer
arithmetic to address separately-allocated but contiguous objects in a
container like std::vector. A related question is whether there
should be some allowance made for allowing pointer arithmetic using a
pointer to a base class if the derived class is a standard-layout class
with no non-static data members. It is possible that std::launder
could play a part in the resolution of this issue.
*Notes from the February, 2016 meeting:*
This issue is expected to be resolved by the resolution of issue 1776
<http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1776>.
The major problem is when the elements of the vector contain constant or
reference members; 3.8 [basic.life] paragraph 7 implies that pointer
arithmetic leading to such an object produces undefined behavior, and CWG
expects this to continue. Some changes to the interface of
std::vector may be required, perhaps using std::launder as part of
iterator processing.
It seems incredible that the direction of the Standard would be
toward making pointer arithmetic undefined for objects inside an
std::vector just because they have a const member or reference
member.
class C {
int &member;
void f();
...
};
...
std::vector<C> v;
v.push_back(C());
v.push_back(C());
v.push_back(C());
(&v[0] + 2)->f();
This would seriously be undefined by what was stated above?
No, I doubt that's the intent; rather, the requisite calls to
std::launder would be in the implementation of vector itself. However, this case would be undefined:
std::vector<C> v;
v.reserve(1);
v.push_back(C());
C *p = v.data(); // or "auto p = v.begin();"
v.pop_back();
v.push_back(C());
p->f(); // invalid, that C object went away
That being undefined, to me, is perfectly valid, but only for certain
types of `C`. Namely those mentioned above: types containing references or
`const` objects.
This is completely ridiculous to me.
Post by Richard Smith
Post by Myriachan
First of all, it seems that P0137R1 didn't solve the arithmetic
issue, making &v[0] + 2 illegal pointer arithmetic in the first
place, because std::vector (most likely) constructed the objects
separately. Second, making this undefined just because of a const or
reference nonstatic member would break an unbelievable amount of existing
C++ code if this arithmetic were to suddenly require a call to
std::launder.
For the first issue, it seems like we should formally define pointer
arithmetic as working across adjacent array objects, with individual
objects being treated as an array of size 1 for this purpose as usual.
SG12 is already exploring this direction, but only for adjacent
objects whose storage is provided by the same array.
What does "provided by the same array" mean, exactly?
See the definition of "provides storage" here: http://eel.is/c++draft/intro.object#3
Right, but the allocator functions don't create objects "of type “array
of N unsigned char” or of type “array of N std::byte”". Nor does
`std::aligned_storage/union_t`. So neither can "provide storage" for an
object under those rules. You can certainly create objects in that storage.
But that won't be the same as "provide storage".
Part of the relevant changes would be specifying that ::operator new and
malloc do in fact create such an array object; this has the nice
side-effect of guaranteeing that pointer arithmetic on dynamically
allocated storage actually works.
But that also means that pointer arithmetic on non-dynamically allocated
storage does not work. Which means `vector` only works if it uses memory
allocated by `::operator new` or `malloc`. If you have some static storage,
I guess it had better already be a byte array.
And what of `std::aligned_storage/union_t`? Are those now required to be
byte arrays?
They should be moved to Annex D.
Post by Nicol Bolas
Post by Richard Smith
Post by Nicol Bolas
Unless you're saying that `vector` has to allocate memory, then do
`new() char[]` on the allocation, and only then perform construction on any
types in the memory. Or unless you're saying that every allocation of memory,
*every object*, is also an array of bytes in addition to being whatever it
currently is.
Nicol Bolas
2017-09-09 03:56:32 UTC
Post by Richard Smith
Post by Nicol Bolas
Post by Richard Smith
Part of the relevant changes would be specifying that ::operator new and
malloc do in fact create such an array object; this has the nice
side-effect of guaranteeing that pointer arithmetic on dynamically
allocated storage actually works.
But that also means that pointer arithmetic on non-dynamically allocated
storage does not work. Which means `vector` only works if it uses memory
allocated by `::operator new` or `malloc`. If you have some static storage,
I guess it had better already be a byte array.
And what of `std::aligned_storage/union_t`? Are those now required to be
byte arrays?
They should be moved to Annex D.
To be replaced with... what, exactly? Oh sure, you can probably easily
replace `aligned_storage` with a `sizeof/alignas` array declaration. But
`aligned_union` is not something so easily or concisely replicated.

Wouldn't it be easier to just declare them to *be* byte arrays of the
appropriate size/alignment? Right now, they're stated to be POD types. And
an array of bytes is a POD type, yes? And since we never specified exactly
what type it was, nobody was allowed to write conforming, portable code
that relied on specific properties of the type, other than being POD.
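A sketch of the byte-array alternative described above. byte_storage_for and byte_storage_for_any are hypothetical names standing in for std::aligned_storage_t and std::aligned_union_t. The union case relies on alignas accepting a pack expansion (which applies the strictest alignment) plus std::max over the sizes, so it is less verbose than one might fear.

```cpp
#include <algorithm>
#include <cassert>
#include <new>

// Stand-in for std::aligned_storage_t<sizeof(T), alignof(T)>: a plain byte array.
template <class T>
struct byte_storage_for {
    alignas(T) unsigned char data[sizeof(T)];
};

// Stand-in for std::aligned_union_t<0, Ts...>: strictest alignment, largest size.
template <class... Ts>
struct byte_storage_for_any {
    alignas(Ts...) unsigned char data[std::max({sizeof(Ts)...})];
};

// Usage: construct a double in the storage and read it back.
double roundtrip(double d) {
    byte_storage_for<double> s;
    double* p = new (s.data) double(d);
    return *p;
}
```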
Myriachan
2017-09-08 21:35:41 UTC
Post by Nicol Bolas
Post by Richard Smith
Post by Myriachan
2182. Pointer arithmetic in array-like containers
Section: 5.7
2015-10-20
The current direction for issue 1776
<http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1776>
(see paper P0137) calls into question the validity of doing pointer
arithmetic to address separately-allocated but contiguous objects in a
container like std::vector. A related question is whether there should
be some allowance made for allowing pointer arithmetic using a pointer to a
base class if the derived class is a standard-layout class with no
non-static data members. It is possible that std::launder could play a
part in the resolution of this issue.
*Notes from the February, 2016 meeting:*
This issue is expected to be resolved by the resolution of issue 1776
<http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1776>.
The major problem is when the elements of the vector contain constant or
reference members; 3.8 [basic.life] paragraph 7 implies that pointer
arithmetic leading to such an object produces undefined behavior, and CWG
expects this to continue. Some changes to the interface of std::vector
may be required, perhaps using std::launder as part of iterator
processing.
It seems incredible that the direction of the Standard would be toward
making pointer arithmetic undefined for objects inside an std::vector
just because they have a const member or reference member.
class C {
int &member;
void f();
...
};
...
std::vector<C> v;
v.push_back(C());
v.push_back(C());
v.push_back(C());
(&v[0] + 2)->f();
This would seriously be undefined by what was stated above?
No, I doubt that's the intent; rather, the requisite calls to std::launder
would be in the implementation of vector itself. However, this case would
be undefined:
std::vector<C> v;
v.reserve(1);
v.push_back(C());
C *p = v.data(); // or "auto p = v.begin();"
v.pop_back();
v.push_back(C());
p->f(); // invalid, that C object went away
That being undefined, to me, is perfectly valid, but only for certain
types of `C`. Namely those mentioned above: types containing references or
`const` objects.
I wouldn't want to be the person who has to answer that question on Stack
Overflow...
Post by Nicol Bolas
This is completely ridiculous to me.
Post by Richard Smith
Post by Myriachan
First of all, it seems that P0137R1 didn't solve the arithmetic issue,
making &v[0] + 2 illegal pointer arithmetic in the first place, because
std::vector (most likely) constructed the objects separately. Second,
making this undefined just because of a const or reference nonstatic member
would break an unbelievable amount of existing C++ code if this arithmetic
were to suddenly require a call to std::launder.
For the first issue, it seems like we should formally define pointer
arithmetic as working across adjacent array objects, with individual
objects being treated as an array of size 1 for this purpose as usual.
SG12 is already exploring this direction, but only for adjacent objects
whose storage is provided by the same array.
What does "provided by the same array" mean, exactly? Right now, we
already have that.
The problem is that we don't allow pointer arithmetic to work across
adjacent objects of the same type whose storage is provided by the same
*allocation*.
The "same array" definition has its own issues.

For one thing, it still prevents optimization in a lot of cases. Take my S
example and instead write it as a function taking a reference:

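#include <cassert>
#include <cstddef>

struct S {
    int a, b, c;
};

// Confirm there is no padding between the members.
static_assert(offsetof(S, b) == offsetof(S, a) + sizeof(S::a));
static_assert(offsetof(S, c) == offsetof(S, b) + sizeof(S::b));

void Function(S &s)
{
    (&s.a)[2] = 2;   // reaches s.c by indexing from s.a; undefined under the current rules
    assert(s.c == 2);
}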

A compiler cannot assume that this is undefined behavior, because what if
we call Function like this?

alignas(S) unsigned char storage[sizeof(S)];
S *ps = new(storage) S();
Function(*ps);

In this case, the compiler would not always be able to assume undefined
behavior, because the reference "s" could be entirely backed by one single
array, meeting the proposed conditions for pointer arithmetic.

In addition to this issue, I would say that we are better off treating
every object as also being overlaid on top of an array
basically useless.
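To illustrate the offsetof point, a sketch that reads a member through its byte offset. read_c_via_offsetof is a hypothetical helper; std::memcpy is used so that only the arithmetic on the unsigned char pointer, not type punning, is at issue. That arithmetic is only clearly valid if the object's storage counts as an array of unsigned char.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct S {
    int a, b, c;
};

int read_c_via_offsetof(const S& s) {
    const unsigned char* base = reinterpret_cast<const unsigned char*>(&s);
    int value;
    // base + offsetof(S, c): byte-pointer arithmetic within the object representation.
    std::memcpy(&value, base + offsetof(S, c), sizeof value);
    return value;
}
```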

Melissa