Discussion:
Is this union aliasing code well-defined?
Myriachan
2017-09-25 20:41:55 UTC
This question that "supercat" posted on Stack Overflow ran into an
interesting problem:

https://stackoverflow.com/questions/46205744/is-this-use-of-unions-strictly-conforming/

A copy of the code involved is as follows:

struct s1 {unsigned short x;};
struct s2 {unsigned short x;};
union s1s2 { struct s1 v1; struct s2 v2; };

static int read_s1x(struct s1 *p) { return p->x; }
static void write_s2x(struct s2 *p, int v) { p->x = v; }

int test(union s1s2 *p1, union s1s2 *p2, union s1s2 *p3)
{
    if (read_s1x(&p1->v1))
    {
        unsigned short temp;
        temp = p3->v1.x;
        p3->v2.x = temp;
        write_s2x(&p2->v2, 1234);
        temp = p3->v2.x;
        p3->v1.x = temp;
    }
    return read_s1x(&p1->v1);
}

int test2(int x)
{
    union s1s2 q[2];
    q->v1.x = 4321;
    return test(q, q+x, q+x);
}

#include <stdio.h>
int main(void)
{
    printf("%d\n", test2(0));
}


Both GCC and Clang in -fstrict-aliasing mode with optimizations are acting
as if they ran into undefined behavior, and return 4321 instead of the
expected 1234. This happens in both C and C++ mode. Intel C++ and Visual
C++ return the expected 1234. All four compilers hardwire the result as a
constant parameter to printf rather than call test2 or modify memory at
runtime.

From my reading of the C++ Standard, particularly [class.union]/5, an
assignment expression through a union member access changes the active
member of the union (provided the union member has a trivial default
constructor, which it does here, being C code). Taking the address of
p2->v2 and p1->v1 ought to be legal because those are the active members
of the union at the time their pointers are taken.

Is this a well-defined program, or is there subtle undefined behavior
happening here?

Melissa
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
Thiago Macieira
2017-09-25 22:09:52 UTC
Post by Myriachan
Is this a well-defined program, or is there subtle undefined behavior
happening here?
Reading from an inactive member of the union is UB. However, reading from
members of the struct belonging to a common initial sequence is not. See
12.2 [class.mem]/23

"In a standard-layout union with an active member (12.3) of struct type T1, it
is permitted to read a non-static data member m of another union member of
struct type T2 provided m is part of the common initial sequence of T1 and T2;
the behavior is as if the corresponding member of T1 were nominated."

So my reading is that your code should have perfectly-defined behaviour:
- struct s1 and s2 have a common initial sequence that includes x
(note: it comprises all members, so they are layout-compatible)
- reading from s1s2::v1.x is like reading from s1s2::v2.x if v2 is active
- the parameters to the test() function are not marked "restrict", so they
can all alias one another (be equal)

The only thing I am not so sure of is the read_s1x and write_s2x functions:
since they take pointers to different types, is the compiler allowed to assume
that write_s2x() cannot modify an object of type s1?
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
Chris Hallock
2017-09-25 22:58:38 UTC
Post by Thiago Macieira
Post by Myriachan
Is this a well-defined program, or is there subtle undefined behavior
happening here?
Reading from an inactive member of the union is UB. However, reading from
members of the struct belonging to a common initial sequence is not. See
12.2 [class.mem]/23
The example program slyly avoids reading from inactive members.
Post by Thiago Macieira
since they take pointers to different types, is the compiler allowed to assume
that write_s2x() cannot modify an object of type s1?
Not via the parameter, anyway (not even with reinterpret_cast and launder).
Chris Hallock
2017-09-25 23:02:31 UTC
Post by Thiago Macieira
The only thing I am not so sure of is the read_s1x and write_s2x functions:
since they take pointers to different types, is the compiler allowed to assume
that write_s2x() cannot modify an object of type s1?
Post by Chris Hallock
Not via the parameter, anyway (not even with reinterpret_cast and launder).
(And provided that the argument points to an actual s2 object.)
Myriachan
2017-09-25 23:06:03 UTC
Post by Thiago Macieira
Reading from an inactive member of the union is UB. However, reading from
members of the struct belonging to a common initial sequence is not. See
12.2 [class.mem]/23
There is no case within the code that reads an inactive member of the union
- the active member is changed by each assignment done through a union
member access expression ([class.union]/5).

// active member of q[0] at start is v1.
if (read_s1x(&p1->v1))
{
    unsigned short temp;
    temp = p3->v1.x;  // read of v1, the current active member of q[0].
    p3->v2.x = temp;
    // active member of q[0] is now v2.
    write_s2x(&p2->v2, 1234);
    temp = p3->v2.x;  // read of v2, the current active member of q[0].
    p3->v1.x = temp;
    // active member of q[0] is now v1.
}
// active member of q[0] is v1 regardless of path "if" takes.
return read_s1x(&p1->v1);

Melissa
Hyman Rosen
2017-09-26 14:10:43 UTC
Post by Myriachan
There is no case within the code that reads an inactive member of the
union - the active member is changed by the assignment operators done
through a union access expression ([class.union]/5).
I'm amazed but not surprised that cases like this don't move the
optimizationists to realize that struct aliasing is a fundamentally bad
idea. (And the object model too, for that matter.) Not even compilers
understand the rules.

Unions were always the reinterpret_cast of C. They weren't only used to
save space. They were used to access data of one type as data of another
type. (Picking apart the bits of floating-point numbers is the
paradigmatic example.) Then the optimizationists ruined everything.
Thiago Macieira
2017-09-26 14:38:40 UTC
Post by Hyman Rosen
Unions were always the reinterpret_cast of C. They weren't only used to
save space. They were used to access data of one type as data of another
type. (Picking apart the bits of floating-point numbers is the
paradigmatic example.) Then the optimizationists ruined everything.
That was never officially allowed.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
Nevin Liber
2017-09-26 14:56:38 UTC
Post by Thiago Macieira
Post by Hyman Rosen
Unions were always the reinterpret_cast of C. They weren't only used to
save space. They were used to access data of one type as data of another
type. (Picking apart the bits of floating-point numbers is the
paradigmatic example.) Then the optimizationists ruined everything.
That was never officially allowed.
It kinda was in K&R1: "It is the responsibility of the programmer to keep
track of what type is currently stored in a union; the results are machine
dependent if something is stored as one type and extracted as another."

What I don't get are his endless rants about this. A pessimizing compiler
which defines all undefined behavior is a conforming extension, so there is
no reason he cannot implement this himself or pay his vendor to implement
this.
--
Nevin ":-)" Liber <mailto:***@eviloverlord.com> +1-847-691-1404
Nicol Bolas
2017-09-26 15:26:27 UTC
On Tuesday, September 26, 2017 at 10:57:22 AM UTC-4, Nevin ":-)" Liber
Post by Nevin Liber
What I don't get are his endless rants about this. A pessimizing compiler
which defines all undefined behavior is a conforming extension, so there is
no reason he cannot implement this himself or pay his vendor to implement
this.
Because such code would be non-portable. He wants *everyone* to be able to
write such code and have it mean the same thing everywhere.

Even if they don't want to.
Hyman Rosen
2017-09-26 16:56:04 UTC
Post by Nicol Bolas
Post by Nevin Liber
What I don't get are his endless rants about this.
He wants *everyone* to be able to write such code and have it mean the
same thing everywhere.
Even if they don't want to.
Yes.

<rant>

As I have said (or ranted) many times before, the purpose of a programming
language is to control the operation of a computer. It is best when the
programming language constructs have straightforward and unambiguous
meaning because that enhances the ability of everyone involved - the
authors, the readers, and the programming systems - to agree what the
program does. When language constructs are unclear, ambiguous,
unspecified, or undefined, different parties may understand the meaning of
the program differently, causing errors to go undetected. In the case of
unspecified or undefined behavior, the programming system may initially
appear to agree with the intentions of the programmer, but secretly permit
itself to disagree, so that future builds of the program, perhaps years
later, no longer perform as the programmer intended.

The purpose of optimization is to change some aspect of a program (usually
its speed, sometimes its size) while not changing its meaning. But C and
C++ have allowed optimization opportunities to feed back into the language
design, resulting in a plethora of unspecified and undefined behavior in
the languages just so optimizers may make assumptions about the code,
assumptions that are easily unwarranted because they cover constructs that
have been widely used and have "worked", precisely because these languages
have been used for "low-level" close-to-the-machine system development
where aliasing, bit-fiddling, integer overflow, and wide-ranging pointer
manipulation are important. Moreover, the details of what behaviors are
not allowed are themselves difficult to specify clearly, so programmers
cannot tell whether they are following the rules or not.

We are now in a situation where we supposedly cannot write std::vector in
standard C++.
We are now in a situation where a() += b(), a() << b(), and a() <= b() each
have different rules for the order of calling a() and b().
We are in a situation where the standard cannot even specify the function
prototypes of the classes it defines, but must resort to weasel words like
"this function does not participate in overload resolution when...".
We are in a situation where C++ has become overwhelmingly complex, and
where traps lie in wait for programmers, who cannot even be wary because
the dangerous areas and the safe areas are fractally intertwined.

</rant>
Myriachan
2017-09-26 20:17:13 UTC
Post by Hyman Rosen
<rant>
We are in a situation where C++ has become overwhelmingly complex, and
where traps lie in wait for programmers, who cannot even be wary because
the dangerous areas and the safe areas are fractally intertwined.
</rant>
I kind of wish this were in a different thread, because the code I copied
in the original message appears to me to be well-defined even in the
current Standard.

As for what you said, C++ is always going to have undefined behavior, just
like C. The question is really where to draw the line between what is
defined and what is not, and similarly, what compilers are allowed to
optimize.

I feel that the situation has drifted too far toward leaving more things
undefined in order to give compilers optimization opportunities. But that
does not mean there is nothing that compilers should be allowed to assume.

If we wanted an absolute object model that can't be mucked with at a low
level, we should go code C# or Java. They're safer and much easier.
Conversely, if we need to control every last thing a machine does, we
should code in assembly language, not C/C++.

The niche of C and C++ in modern times is their intermediate position
between assembly language and the managed languages. We just need to
decide where.

Melissa
Nicol Bolas
2017-09-26 20:44:01 UTC
Post by Myriachan
I kind of wish this were in a different thread, because the code I copied
in the original message appears to me to be well-defined even in the
current Standard.
Thus far, everyone seems to agree that it *is* well-defined.
Myriachan
2017-09-26 22:11:27 UTC
Post by Nicol Bolas
Post by Myriachan
I kind of wish this were in a different thread, because the code I copied
in the original message appears to me to be well-defined even in the
current Standard.
Thus far, everyone seems to agree that it *is* well-defined.
OK, thank you.

I filed bugs on GCC and Clang about this issue, since I believed it to be
well-defined.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82224
https://bugs.llvm.org/show_bug.cgi?id=34632

Melissa
Edward Catmur
2017-09-26 22:21:28 UTC
Post by Myriachan
I filed bugs on GCC and Clang about this issue, since I believed it to be
well-defined.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82224
https://bugs.llvm.org/show_bug.cgi?id=34632
Note that [class.union]/5 doesn't seem to be relevant; it is possible to
trigger the bug while switching active union member by aggregate assignment:

int g() { return [i = 0] {
    union { struct { int x; } v1; struct { int x; } v2; } q[2]{{4321}};
    q[0].v2 = { q[0].v1.x };
    [&qv2 = q[i].v2]{ qv2.x = 1234; }();
    [&q3 = q[0]] { q3.v1 = { q3.v2.x }; }();
    return [&qv1 = q[0].v1]{ return qv1.x; }();
}(); }

This program is about as minimal as I can make it for gcc; for clang
(trunk) the last line can be further simplified while preserving the bug.

Changing the data member v2.x to type long (with appropriate casts) avoids
the bug, which to me indicates that it might be an issue with common
initial sequence handling.
Myriachan
2017-09-26 23:05:53 UTC
Post by Edward Catmur
Note that [class.union]/5 doesn't seem to be relevant; it is possible to
trigger the bug while switching active union member by aggregate assignment.
Changing the data member v2.x to type long (with appropriate casts) avoids
the bug, which to me indicates that it might be an issue with common
initial sequence handling.
Yeah, that seems likely. Richard Smith just posted the below simplified
example to the Clang bug. Even though the common initial sequence rule
isn't invoked in the original code, it seems like both GCC and Clang
interpret the situation as invoking that rule, and both get it wrong.
Richard's example uses the common initial sequence rule with similar
results on both compilers. Just as in supercat's code, Visual C++ and
Intel C++ handle Richard's example correctly.

Richard Smith 2017-09-26 15:50:24 PDT

struct s1 {unsigned short x;};
struct s2 {unsigned short x;};
union s1s2 { struct s1 v1; struct s2 v2; };

static int read_s1x(struct s1 *p) { return p->x; }
static void write_s2x(struct s2 *p, int v) { p->x = v; }

int test(union s1s2 *p1, union s1s2 *p2)
{
    if (p1->v1.x)
    {
        write_s2x(&p2->v2, 1234);
        return read_s1x(&p1->v1);
    }
    return 0;
}

int test2(int x)
{
    union s1s2 u = {.v2.x = 4321};
    return test(&u, &u);
}

Note that this never even changes the active union member (it's always v2);
instead it relies on the "common initial sequence" rule for the two loads
through 'v1.x'.
Melissa
Nicol Bolas
2017-09-26 20:48:50 UTC
Post by Hyman Rosen
As I have said (or ranted) many times before, the purpose of a programming
language is to control the operation of a computer.
No, the purpose of a *programmer* is to control the operation of a
computer. A programming language is just an intermediary between the
programmer's desires and the computer.

Post by Hyman Rosen
It is best when the programming language constructs have straightforward
and unambiguous meaning because that enhances the ability of everyone
involved - the authors, the readers, and the programming systems - to agree
what the program does. When language constructs are unclear, ambiguous,
unspecified, or undefined, different parties may understand the meaning of
the program differently, causing errors to go undetected. In the case of
unspecified or undefined behavior, the programming system may initially
appear to agree with the intentions of the programmer, but secretly permit
itself to disagree, so that future builds of the program, perhaps years
later, no longer perform as the programmer intended.
The purpose of optimization is to change some aspect of a program (usually
its speed, sometimes its size) while not changing its meaning. But C and
C++ have allowed optimization opportunities to feed back into the language
design,
"Have allowed"? This has been true *since the beginning* of C's
standardization. Indeed, one could reasonably argue that the "feed back" of
these "optimization opportunities" are a big part of why C has become a
lingua franca among platforms.

It's certainly a big part of what attracted users to C in the early days.
Being able to have undefined behavior made compilers small and fast, and
made compiled executables small and fast, while still allowing low-level
code to be written, and still allowing it to be implemented across a *wide*
variety of platforms.

That's a very specific intersection of features, and I don't think you
could do that without UB rules.

Post by Hyman Rosen
resulting in a plethora of unspecified and undefined behavior in the
languages just so optimizers may make assumptions about the code,
assumptions that are easily unwarranted because they cover constructs that
have been widely used and have "worked", precisely because these languages
have been used for "low-level" close-to-the-machine system development
where aliasing, bit-fiddling, integer overflow, and wide-ranging pointer
manipulation are important. Moreover, the details of what behaviors are
not allowed are themselves difficult to specify clearly, so programmers
cannot tell whether they are following the rules or not.
We are now in a situation where we supposedly cannot write std::vector in
standard C++.
"Now"? You say that as if we haven't *always* been in that situation. Show
me the version of C++ that had an object model that permitted implementing
`std::vector` without invoking UB.

Post by Hyman Rosen
We are now in a situation where a() += b(), a() << b(), and a() <= b() each
have different rules for the order of calling a() and b().
Sure, but at least now *there are rules* for some of those cases. Before,
it was "you can't rely on it". Now you can in specific cases.

Better well-defined behavior in some cases than undefined behavior in all.

Post by Hyman Rosen
We are in a situation where the standard cannot even specify the function
prototypes of the classes it defines, but must resort to weasel words like
"this function does not participate in overload resolution when...".
... what do SFINAE gymnastics have to do with undefined behavior? How
those work is very well defined; what *isn't* defined is how a particular
implementation uses them to achieve a given effect.

Just as we don't define the exact algorithm `std::sort` uses. The standard
describes behavior and visible effects, not implementation.

And again, the language that permitted such "weasel words" in the standard
has *always* been there. Standard library template functions have *never*
been required to exactly match the defined prototypes.

Post by Hyman Rosen
We are in a situation where C++ has become overwhelmingly complex, and
where traps lie in wait for programmers, who cannot even be wary because
the dangerous areas and the safe areas are fractally intertwined.
... how is that different from any other day as a C or C++ programmer?
FrankHB1989
2017-09-27 05:51:27 UTC
Permalink
On Wednesday, September 27, 2017 at 12:56:27 AM UTC+8, Hyman Rosen wrote:
Post by Nicol Bolas
On Tuesday, September 26, 2017 at 10:57:22 AM UTC-4, Nevin ":-)" Liber
Post by Nevin Liber
What I don't get are his endless rants about this.
He wants *everyone* to be able to write such code and have it mean the
same thing everywhere.
Even if they don't want to.
Yes.
I am tired of repeating this. But you always forget the point, so...
(Anyway, you'd better remember that your problems here have almost nothing
to do with C++.)

<rant>
As I have said (or ranted) many times before, the purpose of a programming
language is to control the operation of a computer.
False. You have multiple misconceptions.

First, a programming language in general is always abstract, because the
rules that constitute it cannot be concrete. It can exist without a
computer. (This is similar to an algorithm.)

Second, a programming language in practice often has nothing to do with any
particular computer. Languages deal with *model*s, for example *abstract
machine*s (as C and C++ do) or *formal system*s. For any programming
language that needs to be portable, no computer can serve as the model,
simply because no one can physically manufacture a computer that is
compatible with every other one.

Only the *implementation*s of a programming language can target specific
computers (and even that is not guaranteed in general).

Third, using a programming language to get work done is a matter for
programmers, not for programming languages.

So where does this "purpose" of yours come from?

It is best when the programming language constructs have straightforward
and unambiguous meaning because that enhances the ability of everyone
involved - the authors, the readers, and the programming systems - to agree
what the program does.
False. It is often true in industry that we must reach consensus to avoid
wasting time on *overspecialized* things.

Clarifying the meaning of a program that no one is interested in does not
help. Forbidding such programs from being constructed is in general not
feasible, because you have no way to detect the "interest" or "intention";
only a reasonable cost can be paid toward it.

When language constructs are unclear, ambiguous, unspecified, or undefined,
different parties may understand the meaning of the program differently,
causing errors to go undetected.
These adjectives are not the same; they serve different purposes with
different sets of agreements. Why mix them together? Or is it just because
you failed to distinguish them?

In the case of unspecified or undefined behavior, the programming system
may initially appear to agree with the intentions of the programmer, but
secretly permit itself to disagree, so that future builds of the program,
perhaps years later, no longer perform as the programmer intended.
That's a QoI problem, a fault of the programmer, or both, *by design*.
The domain of programming language design does not include such an
agreement, because it is not feasible in general to *guarantee* serving
every user (however ignorant) well in every case.

When you specify behavior that nobody is interested in, you are annoying
the other programmers, who believe there is no sane use to be exposed by
rules that everyone should follow, and who simply do not bother with those
cases. Code relying on such rules is even more annoying. Rules about
undefined or unspecified behavior should be clear, though.

If a programmer does not agree, he or she is free to change the whole
world, if he or she is able to. That seems not to be... you.

The purpose of optimization is to change some aspect of a program (usually
its speed, sometimes its size) while not changing its meaning.
False. In general, "meaning" is not the invariant of program
transformation, nor the invariant obeyed by the optimizer during
translation. The only key invariant here is specified by the *conformance*
rules, which essentially determine whether an implementation implements the
language defined by the specification or not. In other words, if the
"meaning" is not expected to change, the "meaning" of the program is
*based on* such rules, with the program as an input.
But C and C++ have allowed optimization opportunities to feed back into
the language design, resulting in a plethora of unspecified and undefined
behavior in the languages just so optimizers may make assumptions about the
code, assumptions that are easily unwarranted because they cover constructs
that have been widely used and have "worked", precisely because these
languages have been used for "low-level" close-to-the-machine system
development where aliasing, bit-fiddling, integer overflow, and
wide-ranging pointer manipulation are important. Moreover, the details of
what behaviors are not allowed are themselves difficult to specify clearly,
so programmers cannot tell whether they are following the rules or not.
This shows you don't even clearly know what a C or C++ program can mean.

Based on very similar abstract machines (which you obviously ignored), the
conformance rules in C are roughly equivalent to a subset of the rules in
C++ called the "as-if" rule. To be fair, C++ actually has more surprising
(true "WTF" stuff to newbies) things that allow semantics not obeying the
as-if rule, as a kind of necessary evil. But that seems not relevant to you...
We are now in a situation where we supposedly cannot write std::vector in
standard C++.
We are now in a situation where a() += b(), a() << b(), and a() <= b()
each have different rules for the order of calling a() and b().
We are in a situation where the standard cannot even specify the function
prototypes of the classes it defines, but must resort to weasel words like
"this function does not participate in overload resolution when...".
We are in a situation where C++ has become overwhelmingly complex, and
where traps lie in wait for programmers, who cannot even be wary because
the dangerous areas and the safe areas are fractally intertwined.
I don't see how these points relate to your previous rants.
</rant>
Nicol Bolas
2017-09-27 15:01:48 UTC
Permalink
Post by FrankHB1989
On Wednesday, September 27, 2017 at 12:56:27 AM UTC+8, Hyman Rosen wrote:
Post by Nicol Bolas
On Tuesday, September 26, 2017 at 10:57:22 AM UTC-4, Nevin ":-)" Liber
Post by Nevin Liber
What I don't get are his endless rants about this.
He wants *everyone* to be able to write such code and have it mean the
same thing everywhere.
Even if they don't want to.
Yes.
I am tired of repeating this. But you always forget the point, so...
(Anyway, you'd better remember that your problems here have almost nothing
to do with C++.)
<rant>
As I have said (or ranted) many times before, the purpose of a
programming language is to control the operation of a computer.
False. You have multiple misconceptions.
First, a programming language in general is always abstract, because the
rules that constitute it cannot be concrete. It can exist without a
computer. (This is similar to an algorithm.)
Second, a programming language in practice often has nothing to do with
any computer.
While yes, programming languages often do define "models", "abstract
machines" and the like, those "models" and "abstract machines" are based on *actual
computers* to some degree. Memory in the C++ memory model is laid out as a
sequence of "bytes" because we know that's how memory works on computers.
Java requires 2's complement integer math because all of the platforms that
Java is interested in supporting offer 2's complement integer math
natively. C/C++ do not make 2's complement part of their abstract machines,
because they want to be able to run fast on non-2's complement machines.

Programming languages may be written against models, but those models are
always designed with an eye to actual machines. The reality of where the
implementations are expected to be implemented informs the models we use to
abstract them.

So while it's wrong to say that programming languages are for computers,
it's just as wrong to say that they're purely for models too.

Oh, and "in practice", programming languages *always* have to do with
actual computers. Because "in practice" means that you're writing code that
you intend to run on one or more implementations. "In theory" would be when
you care solely about writing against the abstraction.
Post by FrankHB1989
Languages deal with *model*s, for example *abstract machine*s (as C and
C++ do) or *formal system*s. For any programming language that needs to
be portable, no computer can serve as the model, simply because no one can
physically manufacture a computer that is compatible with every other one.
Only the *implementation*s of a programming language can target specific
computers (and even that is not guaranteed in general).
Third, using a programming language to get work done is a matter for
programmers, not for programming languages.
So where does this "purpose" of yours come from?
It is best when the programming language constructs have straightforward
and unambiguous meaning because that enhances the ability of everyone
involved - the authors, the readers, and the programming systems - to agree
what the program does.
False. It is often true in industry that we must reach consensus to avoid
wasting time on *overspecialized* things.
Clarifying the meaning of a program that no one is interested in does not help.
And yet, people *keep writing them*, so obviously someone is "interested"
in that meaning.

Post by FrankHB1989
Forbidding such programs from being constructed is in general not feasible,
because you have no way to detect the "interest" or "intention"; only a
reasonable cost can be paid toward it.
When language constructs are unclear, ambiguous, unspecified, or
undefined, different parties may understand the meaning of the program
differently, causing errors to go undetected.
These adjectives are not the same; they serve different purposes with
different sets of agreements. Why mix them together? Or is it just because
you failed to distinguish them?
Because the distinctions are essentially irrelevant to his point. That
being that, if you write a program that does certain things, the language
does not clearly state what will happen. From a user perspective, the
code's behavior is unknown. They have a certain expectation of what "ought"
to happen, but the language has (typically esoteric) rules that make the
code not do what they believe they have written.

The specific word you use for such circumstances is irrelevant. What
matters is that you wrote X, and the code looks like it should do X, but it
may not. This creates confusion between the user's intent and the
language's definition. Which leads to the potential for errors, which are
not easy to catch, since such circumstances are allowed to *appear* to work.

Compilation failures tell the user that what they tried is non-functional.
To allow a program to compile, yet for it to still not be functional as
described, creates problems.

Post by FrankHB1989
In the case of unspecified or undefined behavior, the programming system
may initially appear to agree with the intentions of the programmer, but
secretly permit itself to disagree, so that future builds of the program,
perhaps years later, no longer perform as the programmer intended.
That's a QoI problem, a fault of the programmer, or both, *by design*.
That's essentially a tautology. You're saying the programming language is
right because it's right.

If a language, "by design," creates lots of circumstances where useful code
looks like it will work one way, but in fact works another way or has
unpredictable results, that's a problem with the "design" of the language.

Now for C/C++, this is a tradeoff. By leaving certain things undefined, the
language becomes more useful to us. We wouldn't be able to cast things to
`void*` and back (a perfectly well-defined operation), if we had to
*statically* ensure that UB was not possible. So we accept the problem in
the language because doing so offers us benefits which we could not get
another way.

But that acceptance should not be used to say that the problem *doesn't
exist*. If a language frequently promotes misunderstanding, that's a fault
in the *language*, not in the programmer. You may still use it anyway, but
let's not pretend it's not actually a problem.
Hyman Rosen
2017-09-27 16:49:26 UTC
Permalink
Thanks. FrankHB1989 and I have these run-ins repeatedly, so I won't bother
replying to him - our underlying points of view are so different that
nothing useful would result.

Post by Nicol Bolas
Now for C/C++, this is a tradeoff. By leaving certain things undefined, the
language becomes more useful to us. We wouldn't be able to cast things to
`void*` and back (a perfectly well-defined operation), if we had to
*statically* ensure that UB was not possible. So we accept the problem in
the language because doing so offers us benefits which we could not get
another way.
Note that this particular example is the diametric opposite of what I would
want. I want C++ objects to be "bags of bits". I don't want to statically
ensure that no UB is possible, I want to allow dynamic behavior now
considered UB to be well-defined or implementation-defined instead. In my
vision, the standard would say something like this:







The value representation of an uninitialized object contains an
unspecified but definite set of bits (which may be different for each
instance of such an object). The value representation of a
byte-initialized object (that is, an object some of whose bytes have been
replaced by copying, e.g., by memcpy) contains its original data modified
by the copied data. If this representation is not a trap value, then the
object is treated as if it were a constructed object with that
representation. (Note: invariants of the object guaranteed by execution of
a constructor of its class may not hold.) Some objects (notably those of
types with virtual members, virtual base classes, or reference members)
may have value representations with parts that are inaccessible via
mechanisms provided by this Standard other than byte copying. It is
implementation-defined which operations on an object involve use of this
inaccessible data and what this inaccessible data contains. If such an
operation is attempted on an uninitialized or byte-initialized object and
the inaccessible data for that operation is not correct for that object,
the behavior is undefined. All members of a union are treated as if they
were references to byte-initialized objects, using the rules above. The
value representation of each member of the union is a prefix of the value
representation of the union.

A memory segment is either the storage of a declared complete object or a
region of storage returned by an allocator. A pointer into a memory
segment is a correctly-aligned pointer of some object type whose value is
within the segment or one past the end. Two pointers of the same type into
the same memory segment may be subtracted from each other. An integer may
be added to a pointer into a memory segment provided the result is a
pointer into the same memory segment. If the result does not point past
the end of the memory segment then it is considered to be a pointer to a
byte-initialized object, using the rules above.

A pointer or reference to an object of any type may be cast to a pointer
or reference of any other object type provided that the alignment of the
pointer is suitable for the target type. If the memory segment of the
object can fully contain an object of the target type then the object may
be accessed through that pointer as if it were a byte-initialized object
of the target type, using the rules above.
FrankHB1989
2017-09-28 03:30:19 UTC
Permalink
On Wednesday, September 27, 2017 at 11:01:49 PM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Wednesday, September 27, 2017 at 12:56:27 AM UTC+8, Hyman Rosen wrote:
Post by Nicol Bolas
On Tuesday, September 26, 2017 at 10:57:22 AM UTC-4, Nevin ":-)" Liber
Post by Nevin Liber
What I don't get are his endless rants about this.
He wants *everyone* to be able to write such code and have it mean the
same thing everywhere.
Even if they don't want to.
Yes.
I am tired of repeating this. But you always forget the point, so...
(Anyway, you'd better remember, your problems here have almost nothing to
do with C++.)
<rant>
As I have said (or ranted) many times before, the purpose of a
programming language is to control the operation of a computer.
False. You have multiple misconceptions.
First, a programming language in general is always abstract, because the
rules that constitute it cannot be concrete. It can exist without a
computer. (This is similar to an algorithm.)
Second, a programming language in practice often has nothing to do with
any computer.
While yes, programming languages often do define "models", "abstract
machines" and the like, those "models" and "abstract machines" are based on *actual
computers* to some degree. Memory in the C++ memory model is laid out as
a sequence of "bytes" because we know that's how memory works on computers.
Java requires 2's complement integer math because all of the platforms that
Java is interested in supporting offer 2's complement integer math
natively. C/C++ do not make 2's complement part of their abstract machines,
because they want to be able to run fast on non-2's complement machines.
Yes and no.
While yes, a model in a practical programming language should share
characteristics with actual machines (otherwise it fails to be
implementable, and thus is not practical), it is not *based on* the
machine by design. They exist in parallel. Once the design is made, it has
nothing to do with any actual machine, unless the model fails to be a good
abstraction of those targets as a whole. If we do find that something we
need cannot *essentially* be expressed in the model, the model should be
extended or redesigned. But this almost never happens to the base part,
since it can always be *very* generic compared to the actual machines that
users require the language to support; otherwise the need is simply not
consistent with the goals of the language at all (e.g. general side
effects in a pure functional language).

The whole model can be designed simply by taking the intersection of the
characteristics of all target implementations (if any), but this is not the
only way. Norms reflecting the characteristics of the various real machines
are *parametrized* in the model, becoming the intentionally *unspecified*
properties in the rules. The rules limiting the latitude of these
parameters are negotiable; however, changing them does not alter the
rationale at all. So they are naturally developed independently, resulting
in modularity in the model. Some concrete limitations do come from actual
machines, but they have quite limited effects.

Taking C++ as an example, it has the abstract machine as its model of
semantics. The memory model of C++ defines a subset of norms as part of the
abstract machine semantics. Since C++ needs to support the broad set of
machines that actual implementations already target, design by enumeration
of machines is impractical. Instead we limit the common characteristics to
keep it simple, modularized into the memory model, object model, execution
model, etc. The memory model provides a shape for the common
characteristics of part of the memory subsystem in the target machines of
interest to the language's target audience. On the other hand, the object
model has separate concerns, allowing users to care only about use of the
first-class entities of the language... A typical modification to one
component of the model should usually not break another.

Portability can then naturally be measured by comparing against the model
rather than against the machine support in implementations. Compared to
Java, the modeling in ISO C++ actually allows more portability (though
slightly less than ISO C). Java has fixed, or *specialized*, some
parameters of the C++ model, so it is less portable in this sense. Whether
we need to support specific features in a portable way is another story.
The modularized model design should work in both cases.

Post by Nicol Bolas
Programming languages may be written against models, but those models are
always designed with an eye to actual machines. The reality of where the
implementations are expected to be implemented informs the models we use to
abstract them.
No.
Not all programming languages have a machine in mind during design. It
depends on what we expect the language to do and how the language is
designed.

Illustrated below.

Post by Nicol Bolas
So while it's wrong to say that programming languages are for computers,
it's just as wrong to say that they're purely for models too.
Oh, and "in practice", programming languages *always* have to do with
actual computers. Because "in practice" means that you're writing code that
you intend to run on one or more implementations. "In theory" would be when
you care solely about writing against the abstraction.
Still, they do not always have to, even in practice.

The things we usually call native languages target a machine instruction
set architecture (ISA), which is arguably still not a very concrete set of
actual machines compared to languages targeting lower-level abstractions
like a specific processor microarchitecture (which I am working on) or the
register-transfer level (but not a register-transfer language). Languages
targeting levels lower than the ISA are usually machine-specific, but this
is not strictly required (they can still be portable across different
machines to some degree); and the machines are not necessarily computers.

Some other languages are designed to target only specific intermediate
layers above the machine ISA, using them (e.g. JVM, CLR, etc.) instead of
any machine ISA. Some intermediate layers can serve as alternative machine
ISAs (like a hardware JVM), so they are more or less machine-dependent,
but not all of them fall reasonably into this category. For example, the
POSIX shell language relies on an awkward layer far from any actual
machine by design (even things like `CHAR_BIT == 8` are mandated by POSIX,
but they are not explicit parts of the design of the shell language).
These languages are naturally (real-)machine-independent. In such cases,
lack of knowledge of machines can still make the design work.
Implementations are also not required to target machine-level interfaces
directly.
Post by Nicol Bolas
Post by FrankHB1989
Languages deal with *model*s, for example *abstract machine*s (as C and
C++ do) or *formal system*s. For any programming language that needs to
be portable, no computer can serve as the model, simply because no one can
physically manufacture a computer that is compatible with every other one.
Only the *implementation*s of a programming language can target specific
computers (and even that is not guaranteed in general).
Third, using a programming language to get work done is a matter for
programmers, not for programming languages.
So where does this "purpose" of yours come from?
It is best when the programming language constructs have straightforward
and unambiguous meaning because that enhances the ability of everyone
involved - the authors, the readers, and the programming systems - to agree
what the program does.
False. It is often true in industry that we must reach consensus to
avoid wasting time on *overspecialized* things.
Clarifying the meaning of a program that no one is interested in does not help.
And yet, people *keep writing them*, so obviously someone is "interested"
in that meaning.
That is what we had better avoid here.
Post by Nicol Bolas
Post by FrankHB1989
Forbidding such programs from being constructed is in general not feasible,
because you have no way to detect the "interest" or "intention"; only a
reasonable cost can be paid toward it.
When language constructs are unclear, ambiguous, unspecified, or
undefined, different parties may understand the meaning of the program
differently, causing errors to go undetected.
These adjectives are not the same; they serve different purposes with
different sets of agreements. Why mix them together? Or is it just because
you failed to distinguish them?
Because the distinctions are essentially irrelevant to his point. That
being that, if you write a program that does certain things, the language
does not clearly state what will happen. From a user perspective, the
code's behavior is unknown. They have a certain expectation of what "ought"
to happen, but the language has (typically esoteric) rules that make the
code not do what they believe they have written.
The specific word you use for such circumstances is irrelevant. What
matters is that you wrote X, and the code looks like it should do X, but it
may not. This creates confusion between the user's intent and the
language's definition. Which leads to the potential for errors, which are
not easy to catch, since such circumstances are allowed to *appear* to work.
Compilation failures tell the user that what they tried is non-functional.
To allow a program to compile, yet for it to still not be functional as
described, creates problems.
Post by FrankHB1989
In the case of unspecified or undefined behavior, the programming system
may initially appear to agree with the intentions of the programmer, but
secretly permit itself to disagree, so that future builds of the program,
perhaps years later, no longer perform as the programmer intended.
That's a QoI problem, a fault of the programmer, or both, *by design*.
That's essentially a tautology. You're saying the programming language is
right because it's right.
No. Requiring sane knowledge to use the language correctly is the
*premise*, not the conclusion.

Post by Nicol Bolas
If a language, "by design," creates lots of circumstances where useful code
looks like it will work one way, but in fact works another way or has
unpredictable results, that's a problem with the "design" of the language.
Avoiding unpredictable results *unconditionally* cannot be the goal of
the language in practice. It is in general not implementable, because in
practice unpredictable results can be introduced from various sources,
which cannot all be avoided by tuning the language design. (That's also
why we need contracts.) Languages are surely bad when they introduce
unpredictable results gratuitously, but there are cases where those
results are deserved. It is doubtful to call such cases "useful",
especially when they are not avoided merely out of ignorance of sane and
explicit rules. Such cases raise a problem of use (and of teaching,
tooling, and the design of robustness features as extensions provided by
particular implementations, etc.), but not of "design" (in the language
spec).

Post by Nicol Bolas
Now for C/C++, this is a tradeoff. By leaving certain things undefined, the
language becomes more useful to us. We wouldn't be able to cast things to
`void*` and back (a perfectly well-defined operation), if we had to
*statically* ensure that UB was not possible. So we accept the problem in
the language because doing so offers us benefits which we could not get
another way.
But that acceptance should not be used to say that the problem *doesn't
exist*. If a language frequently promotes misunderstanding, that's a
fault in the *language*, not in the programmer. You may still use it
anyway, but let's not pretend it's not actually a problem.
Not quite. Consider another question: if it is really a fault, it is in
mind, in normative text, or in informative text?
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
FrankHB1989
2017-09-28 03:30:19 UTC
Permalink
On Wednesday, September 27, 2017 at 11:01:49 PM UTC+8, Nicol Bolas wrote:
Post by Nicol Bolas
Post by FrankHB1989
On Wednesday, September 27, 2017 at 12:56:27 AM UTC+8, Hyman Rosen wrote:
Post by Nicol Bolas
On Tuesday, September 26, 2017 at 10:57:22 AM UTC-4, Nevin ":-)" Liber
Post by Nevin Liber
What I don't get are his endless rants about this.
He wants *everyone* to be able to write such code and have it mean the
same thing everywhere.
Even if they don't want to.
Yes.
I have tried to repeat this again and again, but you always forget the
point, so...
(Anyway, you'd better remember: your problems here have almost nothing to
do with C++.)
<rant>
As I have said (or ranted) many times before, the purpose of a
programming language is to control the operation of a computer.
False. You have multiple misconceptions.
First, a programming language in general is always abstract, because the
rules that constitute it cannot be concrete. It can live without a
computer. (This is similar to an algorithm.)
Second, a programming language in practice often has nothing to do with
any computer.
While yes, programming languages often do define "models", "abstract
machines" and the like, those "models" and "abstract machines" are based on *actual
computers* to some degree. Memory in the C++ memory model is laid out as
a sequence of "bytes" because we know that's how memory works on computers.
Java requires 2's complement integer math because all of the platforms that
Java is interested in supporting offer 2's complement integer math
natively. C/C++ do not make 2's complement part of their abstract machines,
because they want to be able to run fast on non-2's complement machines.
Yes and no.
While yes, a model in a practical programming language should share
characteristics with actual machines (otherwise it fails to be
implementable, and thus is not practical), it is not *based on* the
machines by design. They are in parallel. Once the design is made, it has
nothing to do with any actual machine, unless the model fails to be a good
abstraction of its targets as a whole. If we do find that something we
need is *essentially* disallowed by the model, the model should be
extended or redesigned. But this almost never happens to the base part,
since it can always be *very* generic compared to the actual machines the
language is required (by its users) to support; or else the need is simply
not consistent with the goals of the language at all (e.g. general side
effects in a pure functional language).

The whole model can be designed simply by taking the intersection of the
characteristics of all target implementations (if any), but this is not
the only way. Norms reflecting the characteristics that vary among real
machines are *parametrized* in the model, as intentionally *unspecified*
properties in the rules. The rules limiting the latitude of these
parameters are negotiable; however, changing them does not alter the
rationale at all. So they are naturally developed independently, which
results in modularity in the model. Some concrete limitations do come
from actual machines, but they have quite limited effect.

Taking C++ as the example: it has the abstract machine as its model of
semantics. The memory model of C++ defines a subset of norms as part of
the abstract machine's semantics. Since C++ needs to support a broad set
of machines, as actual implementations already do, designing by
enumerating machines is impractical. Instead, the common characteristics
are distilled to keep the model simple, modularized into the memory
model, the object model, the execution model, etc. The memory model
captures the common characteristics of part of the memory subsystem of
the machines of interest to the target audience of the language. The
object model, on the other hand, separates concerns so that users need
care only about the use of the language's first-class entities... A
usual modification to one component of the model should normally not
break another.

Portability can then be measured naturally by comparing the models,
rather than the machine support in implementations. Compared to Java,
the modeling in ISO C++ actually enforces more portability (though
slightly less than ISO C). Java has fixed, or *specialized*, some
parameters that the C++ model leaves open, so it is less portable in
this sense. Whether we need to support the specific features in a
portable way is another story. The modularized model design should
work in both cases.

Programming languages may be written against models, but those models are
Post by Nicol Bolas
always designed with an eye to actual machines. The reality of where the
implementations are expected to be implemented informs the models we use to
abstract them.
No.
Not all programming languages have machines in mind during their design.
It depends on what we expect the language to do and how the language is
designed.

Illustrated below.

So while it's wrong to say that programming languages are for computers,
Post by Nicol Bolas
it's just as wrong to say that they're purely for models too.
Oh, and "in practice", programming languages *always* have to do with
actual computers. Because "in practice" means that you're writing code that
you intend to run on one or more implementations. "In theory" would be when
you care solely about writing against the abstraction.
They still do not always have to, even in practice.
 
Things we usually call native languages target the instruction set
architecture (ISA) of machines, which is arguably still not a very
concrete set of actual machines, compared to languages targeting
lower-level abstractions such as a specific processor microarchitecture
(which I am working on) or the register-transfer level (but not the
register-transfer language). Languages targeting levels lower than the
ISA are usually machine-specific, but this is not strictly required
(they can still be portable across different machines to some degree);
and the machines are not necessarily computers.

Some other languages are designed to target only specific intermediate
layers above the machine ISA, using them (e.g. the JVM, the CLR, etc.)
instead of any machine ISA. Some intermediate layers can serve as
alternative machine ISAs (like a hardware JVM), so those are more or
less machine-dependent, but not all of them fall reasonably into this
category. For example, the POSIX shell language relies by design on a
layer far removed from any actual machine (even things like `CHAR_BIT
== 8` are mandated by POSIX, but they are not explicit parts of the
design of the shell language). These languages are naturally
(real-)machine-independent. In such cases the design can work even
without knowledge of machines. Implementations are also not required to
target machine-level interfaces directly.
Post by Nicol Bolas
Post by FrankHB1989
They deal with *model*s, for example *abstract machine*s (as C and
C++ do), or *formal system*s. For any programming language that needs
to be portable, no computer can be the model, simply because no one can
physically manufacture a computer compatible with every other one.
Only the *implementation*s of a programming language can target specific
computers (though even that is not guaranteed in general).
Third, putting a programming language to work is a matter for
programmers, not programming languages.
So where does your "purpose" come from?
It is best when the programming language constructs have straightforward
and unambiguous meaning because that enhances the ability of everyone
involved - the authors, the readers, and the programming systems - to agree
what the program does.
False. It is often true in industry that we must have consensus, to
avoid wasting time on *overspecialized* things.
Clarifying a meaning of a program that no one is interested in does not
help.
And yet, people *keep writing them*, so obviously someone is "interested"
in that meaning.
That is what we had better avoid here.
Forbidding such programs from being constructed is in general not
feasible, because
Post by Nicol Bolas
Post by FrankHB1989
you have no way to detect the "interest" or "intention". Only a
reasonable cost can be paid for it.
When language constructs are unclear, ambiguous, unspecified, or
undefined, different parties may understand the meaning of the program
differently, causing errors to go undetected.
These adjectives are not the same; they specifically serve different
purposes under different sets of agreements. Why mix them together? Or
is it just because you failed to distinguish them?
Because the distinctions are essentially irrelevant to his point. That
being that, if you write a program that does certain things, the language
does not clearly state what will happen. From a user perspective, the
code's behavior is unknown. They have a certain expectation of what "ought"
to happen, but the language has (typically esoteric) rules that make the
code not do what they believe they have written.
The specific word you use for such circumstances is irrelevant. What
matters is that you wrote X, and the code looks like it should do X, but it
may not. This creates confusion between the user's intent and the
language's definition. Which leads to the potential for errors, which are
not easy to catch, since such circumstances are allowed to *appear* to work.
Compilation failures tell the user that what they tried is non-functional.
To allow a program to compile, yet for it to still not be functional as
described, creates problems.
In the case of unspecified or undefined behavior, the programming system
Post by FrankHB1989
may initially appear to agree with the intentions of the programmer, but
secretly permit itself to disagree, so that future builds of the program,
perhaps years later, no longer perform as the programmer intended.
That's a QoI problem, a fault of the programmer, or both, *by design*.
That's essentially a tautology. You're saying the programming language is
right because it's right.
No. Requiring sane knowledge to use the language correctly is the
*premise*, not the conclusion.

If a language, "by design," creates lots of circumstances where useful code
Post by Nicol Bolas
looks like it will work one way, but in fact works another way or has
unpredictable results, that's a problem with the "design" of the language.
Avoiding unpredictable results *unconditionally* cannot be the goal of
the language in practice. It is in general not implementable, because in
practice such results can be introduced from various sources, which
cannot be totally avoided by tuning the language design. (That's also
why we need contracts.)
Languages that introduce unpredictable results randomly are surely bad,
but there are cases where such results are deserved. It is doubtful to
call them "useful", especially when they are not avoided merely out of
ignorance of sane and explicit rules. Such cases raise a problem of use
(and of teaching, tooling, the design of more robustness features as
extensions provided by particular implementations, etc.), but not of
"design" (in the language spec).

Now for C/C++, this is a tradeoff. By leaving certain things undefined, the
Post by Nicol Bolas
language becomes more useful to us. We wouldn't be able to cast things to
`void*` and back (a perfectly well-defined operation), if we had to
*statically* ensure that UB was not possible. So we accept the problem in
the language because doing so offers us benefits which we could not get
another way.
But that acceptance should not be used to say that the problem *doesn't
exist*. If a language frequently promotes misunderstanding, that's a
fault in the *language*, not in the programmer. You may still use it
anyway, but let's not pretend it's not actually a problem.
Not quite. Consider another question: if it is really a fault, is it in
the mind, in normative text, or in informative text?
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
Hyman Rosen
2017-10-17 22:39:59 UTC
Permalink
Post by Nicol Bolas
If a language, "by design," creates lots of circumstances where useful
code looks like it will work one way, but in fact works another way or has
unpredictable results, that's a problem with the "design" of the language.
Just as a particularly relevant example, have a look at
<https://dxr.mozilla.org/mozilla-beta/source/js/src/dtoa.c>.
This is one version of David M. Gay's float/decimal conversion code.
(This code is ubiquitous - it's the foundation for strtod and dtoa variants
including in C and C++ standard libraries.)

Granted that it's in C, it nevertheless makes use of traditional union
punning to access parts of doubles as integers, and it uses memcpy
as part of variable length arrays, copying bytes using a pointer to the
middle of an object and going far beyond its apparent program-defined
end. It's full of undefined behavior, thoroughly utilizing the "bag of
bits" model.
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
i***@gmail.com
2017-10-18 17:25:07 UTC
Permalink
Post by Hyman Rosen
Post by Nicol Bolas
If a language, "by design," creates lots of circumstances where useful
code looks like it will work one way, but in fact works another way or has
unpredictable results, that's a problem with the "design" of the language.
Just as a particularly relevant example, have a look at
<https://dxr.mozilla.org/mozilla-beta/source/js/src/dtoa.c>.
This is one version of David M. Gay's float/decimal conversion code.
(This code is ubiquitous - it's the foundation for strtod and dtoa variants
including in C and C++ standard libraries.)
Granted that it's in C, it nevertheless makes use of traditional union
punning to access parts of doubles as integers, and it uses memcpy
as part of variable length arrays, copying bytes using a pointer to the
middle of an object and going far beyond its apparent program-defined
end. It's full of undefined behavior, thoroughly utilizing the "bag of
bits"
model.
And it takes multiple ifdefs to work correctly. The standard library
does not need to obey C/C++ rules (look at std::vector, or at node
extraction from a map, which changes the constness of a field); it is
the only place where all undefined things are defined.
User code can't do that, because you do not know which version of the
standard library you are using, and each can work differently (even on
the same system and from the same vendor, but a different version).
One goal of C/C++ is portability, and this code is not portable at all;
if you used enough ifdefs you could run the "same" code in C# or any
other language.
Hyman Rosen
2017-10-18 17:44:14 UTC
Permalink
Post by i***@gmail.com
Post by Hyman Rosen
Post by Nicol Bolas
If a language, "by design," creates lots of circumstances where useful
code looks like it will work one way, but in fact works another way or has
unpredictable results, that's a problem with the "design" of the language.
Just as a particularly relevant example, have a look at
<https://dxr.mozilla.org/mozilla-beta/source/js/src/dtoa.c>.
This is one version of David M. Gay's float/decimal conversion code.
(This code is ubiquitous - it's the foundation for strtod and dtoa variants
including in C and C++ standard libraries.)
Granted that it's in C, it nevertheless makes use of traditional union
punning to access parts of doubles as integers, and it uses memcpy
as part of variable length arrays, copying bytes using a pointer to the
middle of an object and going far beyond its apparent program-defined
end. It's full of undefined behavior, thoroughly utilizing the "bag of
bits"
model.
And it takes multiple ifdefs to work correctly. The standard library
does not need to obey C/C++ rules (look at std::vector, or at node
extraction from a map, which changes the constness of a field); it is
the only place where all undefined things are defined.
User code can't do that, because you do not know which version of the
standard library you are using, and each can work differently (even on
the same system and from the same vendor, but a different version).
One goal of C/C++ is portability, and this code is not portable at all;
if you used enough ifdefs you could run the "same" code in C# or any
other language.
What makes you think this code was compiled and tested as anything but user
code?
Which compilers document switches that say "allow undefined behavior for
standard library code"?
Why do you believe that the "bag of bits" model, augmented by
implementation information, is not portable?

C and C++ (and Fortran and...) programmers have been using the "bag of
bits" model for a good
half-century, and their code has behaved as they expected. Compilers that
make believe that code
does not do such things will silently break code that has perfectly
predictable behavior in the "bag
of bits" model. They do this at the behest of the optimizationists, who do
not care if such code breaks
as long as they can come up with examples where some other code can be made
to run faster. But
the optimizationists cannot even come up with a coherent description of
what is allowed by the language
("vector cannot be implemented in standard C++") without tying themselves
in knots.
i***@gmail.com
2017-10-18 18:29:43 UTC
Permalink
Post by Hyman Rosen
Post by i***@gmail.com
Post by Hyman Rosen
Post by Nicol Bolas
If a language, "by design," creates lots of circumstances where useful
code looks like it will work one way, but in fact works another way or has
unpredictable results, that's a problem with the "design" of the language.
Just as a particularly relevant example, have a look at
<https://dxr.mozilla.org/mozilla-beta/source/js/src/dtoa.c>.
This is one version of David M. Gay's float/decimal conversion code.
(This code is ubiquitous - it's the foundation for strtod and dtoa variants
including in C and C++ standard libraries.)
Granted that it's in C, it nevertheless makes use of traditional union
punning to access parts of doubles as integers, and it uses memcpy
as part of variable length arrays, copying bytes using a pointer to the
middle of an object and going far beyond its apparent program-defined
end. It's full of undefined behavior, thoroughly utilizing the "bag of
bits"
model.
And it takes multiple ifdefs to work correctly. The standard library
does not need to obey C/C++ rules (look at std::vector, or at node
extraction from a map, which changes the constness of a field); it is
the only place where all undefined things are defined.
User code can't do that, because you do not know which version of the
standard library you are using, and each can work differently (even on
the same system and from the same vendor, but a different version).
One goal of C/C++ is portability, and this code is not portable at all;
if you used enough ifdefs you could run the "same" code in C# or any
other language.
What makes you think this code was compiled and tested as anything but
user code?
Which compilers document switches that say "allow undefined behavior for
standard library code"?
Why do you believe that the "bag of bits" model, augmented by
implementation information, is not portable?
C and C++ (and Fortran and...) programmers have been using the "bag of
bits" model for a good
half-century, and their code has behaved as they expected. Compilers that
make believe that code
does not do such things will silently break code that has perfectly
predictable behavior in the "bag
of bits" model. They do this at the behest of the optimizationists, who
do not care if such code breaks
as long as they can come up with examples where some other code can be
made to run faster. But
the optimizationists cannot even come up with a coherent description of
what is allowed by the language
("vector cannot be implemented in standard C++") without tying themselves
in knots.
There is no need for a switch or anything in library code to disable
undefined behavior; it is simply NOT PORTABLE, and that is why it can do
anything it wants.
If you want your code to follow the same principle, then you too can
ignore all the things the standard says about UB.
And again, `std::vector` is not a problem if it can't be implemented in
standard C++, because the library writer can use compiler intrinsics to
do it; the problem is if anyone else can't do it without using
intrinsics and losing portability.
 
"Perfectly predictable behavior," but not portable: stay on an old
compiler and it will work. If you used only what the language supports,
then upgrading compilers would not break your code.
Hyman Rosen
2017-10-18 18:41:30 UTC
Permalink
Post by i***@gmail.com
If you want your code to follow the same principle, then you too can
ignore all the things the standard says about UB.
No, that's completely wrong. Because of the optimizationists, compilers
assume
that programs do not execute undefined behavior, and if any code path leads
to
undefined behavior, the compiler assumes that this path will not be taken
and does
not translate that path to behave as the programmer expects in the "bag of
bits"
model. That means that if I do
union { double d; unsigned long long u; }; d = 1; printf("%llu\n", u);
the compiler will notice that it is undefined behavior for me to access u
after setting
d and it can remove the entire call to printf.
i***@gmail.com
2017-10-18 18:57:47 UTC
Permalink
Post by Hyman Rosen
Post by i***@gmail.com
If you want your code to follow the same principle, then you too can
ignore all the things the standard says about UB.
No, that's completely wrong. Because of the optimizationists, compilers
assume
that programs do not execute undefined behavior, and if any code path
leads to
undefined behavior, the compiler assumes that this path will not be taken
and does
not translate that path to behave as the programmer expects in the "bag of
bits"
model. That means that if I do
union { double d; unsigned long long u; }; d = 1; printf("%llu\n", u);
the compiler will notice that it is undefined behavior for me to access u
after setting
d and it can remove the entire call to printf.
You completely miss the point: IF you ditch portability, you can do
anything you want. You can use a compiler that does not do it, use
special flags, not optimize, use intrinsics, etc.
You could even write your own compiler or change existing ones.
The only problem is when you pretend you write portable code and
compilers assume that you did. If you break the contract, do not expect
the compiler to follow it either.
Hyman Rosen
2017-10-18 19:31:35 UTC
Permalink
Post by i***@gmail.com
Post by Hyman Rosen
Post by i***@gmail.com
If you want your code to follow the same principle, then you too can
ignore all the things the standard says about UB.
No, that's completely wrong. Because of the optimizationists, compilers
assume
that programs do not execute undefined behavior, and if any code path
leads to
undefined behavior, the compiler assumes that this path will not be taken
and does
not translate that path to behave as the programmer expects in the "bag
of bits"
model. That means that if I do
union { double d; unsigned long long u; }; d = 1; printf("%llu\n", u);
the compiler will notice that it is undefined behavior for me to access u
after setting
d and it can remove the entire call to printf.
You completely miss the point: IF you ditch portability, you can do
anything you want. You can use a compiler that does not do it, use
special flags, not optimize, use intrinsics, etc.
You could even write your own compiler or change existing ones.
The only problem is when you pretend you write portable code and
compilers assume that you did. If you break the contract, do not expect
the compiler to follow it either.
You completely miss the point. The David Gay code in question was first
written in 1991,
and has worked as expected on a huge variety of machines and compilers.
But new compilers
give themselves permission to destroy paths that involve undefined behavior
and are more likely
to think they have found such paths, and so this code that's a quarter
century old can start breaking
now just by being rebuilt.

And I don't know why you think the code isn't portable. Portable code
behaves the same way
in different environments. You seem to believe that code that configures
itself to its environment
is not portable, but I reject such a claim.
i***@gmail.com
2017-10-18 19:51:50 UTC
Permalink
Post by Hyman Rosen
Post by i***@gmail.com
Post by Hyman Rosen
Post by i***@gmail.com
If you want your code to follow the same principle, then you too can
ignore all the things the standard says about UB.
No, that's completely wrong. Because of the optimizationists, compilers
assume
that programs do not execute undefined behavior, and if any code path
leads to
undefined behavior, the compiler assumes that this path will not be
taken and does
not translate that path to behave as the programmer expects in the "bag
of bits"
model. That means that if I do
union { double d; unsigned long long u; }; d = 1; printf("%llu\n", u);
the compiler will notice that it is undefined behavior for me to access
u after setting
d and it can remove the entire call to printf.
You completely miss the point: IF you ditch portability, you can do
anything you want. You can use a compiler that does not do it, use
special flags, not optimize, use intrinsics, etc.
You could even write your own compiler or change existing ones.
The only problem is when you pretend you write portable code and
compilers assume that you did. If you break the contract, do not expect
the compiler to follow it either.
You completely miss the point. The David Gay code in question was first
written in 1991,
and has worked as expected on a huge variety of machines and compilers.
But new compilers
give themselves permission to destroy paths that involve undefined
behavior and are more likely
to think they have found such paths, and so this code that's a quarter
century old can start breaking
now just by being rebuilt.
And I don't know why you think the code isn't portable. Portable code
behaves the same way
in different environments. You seem to believe that code that configures
itself to its environment
is not portable, but I reject such a claim.
a) was this code allowed by any standard?
b) then use compilers from 1991.
c) ifdefs are not a sign of portable code; it is working but not
portable code, because you need to adjust it in any new environment.
d) and a new ifdef for each new compiler.
e) compilers do not delete your code willy-nilly; you need to have
serious bugs to get that.
Nicol Bolas
2017-10-18 21:48:12 UTC
Permalink
Post by i***@gmail.com
Post by Hyman Rosen
Post by i***@gmail.com
Post by Hyman Rosen
Post by i***@gmail.com
If you want your code to follow the same principle, then you too can
ignore all the things the standard says about UB.
No, that's completely wrong. Because of the optimizationists,
compilers assume
that programs do not execute undefined behavior, and if any code path
leads to
undefined behavior, the compiler assumes that this path will not be
taken and does
not translate that path to behave as the programmer expects in the "bag
of bits"
model. That means that if I do
union { double d; unsigned long long u; }; d = 1; printf("%llu\n", u);
the compiler will notice that it is undefined behavior for me to access
u after setting
d and it can remove the entire call to printf.
You completely miss the point: IF you ditch portability, you can do
anything you want. You can use a compiler that does not do it, use
special flags, not optimize, use intrinsics, etc.
You could even write your own compiler or change existing ones.
The only problem is when you pretend you write portable code and
compilers assume that you did. If you break the contract, do not expect
the compiler to follow it either.
You completely miss the point. The David Gay code in question was first
written in 1991,
and has worked as expected on a huge variety of machines and compilers.
But new compilers
give themselves permission to destroy paths that involve undefined
behavior and are more likely
to think they have found such paths, and so this code that's a quarter
century old can start breaking
now just by being rebuilt.
And I don't know why you think the code isn't portable. Portable code
behaves the same way
in different environments. You seem to believe that code that configures
itself to its environment
is not portable, but I reject such a claim.
a) was this code allowed by any standard?
b) then use compilers from 1991.
c) ifdefs are not a sign of portable code; it is working but not
portable code, because you need to adjust it in any new environment.
d) and a new ifdef for each new compiler.
e) compilers do not delete your code willy-nilly; you need to have
serious bugs to get that.
That last part is most assuredly not true. Integer overflow is not a
"serious bug", but it most assuredly is UB, and there are compilers that
will cull out code on the assumption that signed A + B never overflows -
for example, folding a check like a + 1 > a to true.

UB is not always a "serious bug".
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
Nicol Bolas
2017-10-18 21:43:49 UTC
Permalink
Post by Hyman Rosen
Post by i***@gmail.com
Post by Hyman Rosen
Post by i***@gmail.com
If you want your code follow same principle then you too can ignore all
things that standard say about UB.
No, that's completely wrong. Because of the optimizationists, compilers
assume
that programs do not execute undefined behavior, and if any code path
leads to
undefined behavior, the compiler assumes that this path will not be
taken and does
not translate that path to behave as the programmer expects in the "bag
of bits"
model. That means that if I do
union { double d; unsigned long long u; }; d = 1; printf("%llu\n", u);
the compiler will notice that it is undefined behavior for me to access
u after setting
d and it can remove the entire call to printf.
You completely miss the point, IF you ditch portability you can do any
thing you want. You can use compiler that do not do it, use special flags,
do not optimize, use intrinsic etc. etc.
You could even write your own compiler od change existing ones.
Only problem is when you pretend you write portable code and compilers
assume that you did it. If you break contact do not except compiler to
follow it too.
You completely miss the point. The David Gay code in question was first
written in 1991,
and has worked as expected on a huge variety of machines and compilers.
But new compilers
give themselves permission to destroy paths that involve undefined
behavior and are more likely
to think they have found such paths, and so this code that's a quarter
century old can start breaking
now just by being rebuilt.
New compilers "give themselves" *nothing*; they have *always* had
permission to do so. They are merely exercising that right.

At least, in theory.

And I don't know why you think the code isn't portable.
Because the standard is what defines portability. If your code is correct
against the standard but doesn't work on a compiler, that's unquestionably
a compiler bug. If your code is incorrect against the standard but doesn't
work on a compiler, the compiler team may or may not consider it a bug.

Sure, a piece of code can happen to work as you expect on some number of
compilers. Or indeed all of them. But without the standard explicitly
specifying that it behave as you expect, you're just relying on hope that
it will continue to behave as you expect.

Indeed, is that not exactly the problem you want "corrected"? That the
standard gives compilers the right to make this code not work as you
expect, and you want to deny it that right? What other purpose is there for
putting such things in the standard except to ensure portability?

Hyman Rosen
2017-10-18 22:52:21 UTC
Permalink
Post by Nicol Bolas
Post by Hyman Rosen
And I don't know why you think the code isn't portable.
Because the standard is what defines portability.
Oh, I agree with that. The other poster seemed to think that having
environmental tests made the code non-portable.

Post by Nicol Bolas
Sure, a piece of code can happen to work as you expect on some number of
compilers. Or indeed all of them. But without the standard explicitly
specifying that it behave as you expect, you're just relying on hope that
it will continue to behave as you expect.
Indeed, is that not exactly the problem you want "corrected"? That the
standard gives compilers the right to make this code not work as you
expect, and you want to deny it that right? What other purpose is there for
putting such things in the standard except to ensure portability?
Yes, no argument there. What I'm illustrating here is that working code that
lies at the heart of many standard library implementations relies on the
"bag of bits" model, using both union type punning and copying bytes past
the apparent end of objects. Enshrining the "bag of bits" model into
standard C++ would prevent compilers from breaking this code and would do
away with the silliness that led to std::launder and the notion that
std::vector can't be implemented legally.

I know that will never happen, of course, because the optimizationists hold
sway.
Richard Hodges
2017-10-20 18:21:41 UTC
Permalink
The day C++ stops being, ultimately, prettified C is the day it becomes
obsolete.

There are many high-level languages with abstract memory models to choose
from. The attraction of C++ (at least for me) is that it allows high-level
constructs and expression while touching the real machine.

The removal of type-punning from unions makes unions useless for the only
job they were ever designed to do - to overlay different-shaped views over
the same bag of bits.

If you can't do that with them, the entire keyword is pointless, as you can
get the same behaviour as union by simply reinterpret-casting a
std::aligned_storage::type.

I totally understand the desire to allow optimisations and the rationale
behind the current situation. I happen to think that the current standard
has taken a suboptimal approach.

In my view, if a compiler can prove that every access of a union in a block
of code is through the same type, it should be allowed to optimise the
access to that 'object'. If it can't prove it (i.e. it sees type-punning
happening) it should not optimise it unless the optimisation can take the
type punning into account, which, let's face it, it could - because of
static analysis - we do it for constexpr after all.
'Edward Catmur' via ISO C++ Standard - Discussion
2017-10-20 18:36:46 UTC
Permalink
On 20 Oct 2017 19:21, "Richard Hodges" <***@gmail.com> wrote:

The day c++ stops being ultimately, prettified C, is the day it becomes
obsolete.

There are many high level languages that with abstract memory models to
choose from. The attraction of c++ (at least for me) is that it allows high
level constructs and expression while touching the real machine.

The removal of type-punning from unions makes unions useless for the only
job they were ever designed to do - to overlay different-shaped views over
the same bag of bits.


So their use to implement space-efficient sum types was what, an
unfortunate accident?

If you can't do that with them, the entire keyword is pointless, as you can
get the same behaviour as union by simply reinterpret-casting a
std::aligned_storage::type.


And you can convert bit representations between types using memcpy.

Richard Hodges
2017-10-20 19:07:32 UTC
Permalink
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
So their use to implement space-efficient sum types was what, an
unfortunate accident?

No, it was intended, but this was of course before there were variadic
templates. Since we have those plus type traits, we can implement a union
with nothing more than std::aligned_storage<max_size<Ts...>::value,
max_align<Ts...>::value>::type
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
And you can convert bit representations between types using memcpy.
You can indeed - probably the most obtuse expression of "I just want this
memory to be treated like an int" one could think of. Please remember, I do
understand the rationale for the current standard. I did say that.
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
That isn't what K&R1 says about unions...
Perhaps not, but it is pretty much the only thing we ever used unions for
(oh, apart from the blocks in the K&R malloc implementation's free list).




'Edward Catmur' via ISO C++ Standard - Discussion
2017-10-20 20:58:43 UTC
Permalink
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
So their use to implement space-efficient sum types was what, an
unfortunate accident?

No, it was intended, but this was of course before there were variadic
templates. Since we have those plus type traits, we can implement a union
with nothing more than std::aligned_storage<max_size<Ts...>::value,
max_align<Ts...>::value>>::type


You mean aligned_union_t<0, Ts... >?
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
And you can convert bit representations between types using memcpy.
You can indeed - probably the most obtuse expression of "I just want this
memory to be treated like an int" one could think of. Please remember, I do
understand the rational for the current standard. I did say that.


But that's not something that it makes sense to be able to express. We
don't compute with values in memory; we load them into variables and store
the results.
Nevin Liber
2017-10-20 18:37:29 UTC
Permalink
Post by Richard Hodges
The removal of type-punning from unions makes unions useless for the only
job they were ever designed to do - to overlay different-shaped views over
the same bag of bits.
That isn't what K&R1 says about unions. What is your reference document
which states that was what they were designed to do?

K&R1: "so long as the usage is consistent: the type retrieved must be the
type most recently stored"
--
Nevin ":-)" Liber <mailto:***@eviloverlord.com> +1-847-691-1404
Ville Voutilainen
2017-10-20 18:45:58 UTC
Permalink
Post by Nevin Liber
Post by Richard Hodges
The removal of type-punning from unions makes unions useless for the only
job they were ever designed to do - to overlay different-shaped views over
the same bag of bits.
That isn't what K&R1 says about unions. What is your reference document
which states that was what they were designed to do?
K&R1: "so long as the usage is consistent: the type retrieved must be the
type most recently stored"
There seems to be an obvious mismatch between programmers who want
bit-blasting at all costs and users who want TBAA. It might perhaps be
useful to ask whether there's any hints/remnants/evidence/even hearsay of
how dmr viewed it when C was developed. There's a non-zero chance that some
people might be able to provide more than guesses at that.
Myriachan
2017-10-23 20:05:36 UTC
Permalink
Post by Ville Voutilainen
There seems to be an obvious mismatch between programmers who want
bit-blasting at all costs
and users who want TBAA. It might perhaps be useful to ask whether
there's any hints/remnants/evidence/even hearsay
of how dmr viewed it when C was developed. There's a non-zero chance
that some people might be able
to provide more than guesses at that.
I'm in the middle. I do a lot of low-level stuff, where I want to mess
with the raw bytes, but far more often I do generic programming that
doesn't need it.

I would be happy if there were a way to tell the compiler "here there be
dragons" for a section of code, but the rest can have assumptions. The
difficult part then is the interactions between "normal" code and
"bit-blasting" code. If I write a custom memory manager, the memory
manager itself needs to do evil work, but the consumer of the memory might
use the objects in an ordinary fashion.

This isn't an easy problem to solve >.<

Melissa
Hyman Rosen
2017-10-23 20:48:27 UTC
Permalink
Post by Myriachan
This isn't an easy problem to solve >.<
It is, if you disregard the optimizationists. You solve the problem by
defining the language using the bag-of-bits model, and you let the compilers
optimize whenever they can prove that what they're doing doesn't affect the
behavior of the code. Essentially, everything is volatile, but without the
requirement of actual loads and stores when the compiler can prove they're
unnecessary.

"Here be dragons" is the wrong approach, because dragons are everywhere.
Conceivably there could be some "here be angels" annotation, but it's more
than likely that the behavior would be just as difficult to specify as it
is now.
Myriachan
2017-10-23 21:18:46 UTC
Permalink
Post by Nevin Liber
Post by Richard Hodges
The removal of type-punning from unions makes unions useless for the only
job they were ever designed to do - to overlay different-shaped views over
the same bag of bits.
That isn't what K&R1 says about unions. What is your reference document
which states that was what they were designed to do?
K&R1: "so long as the usage is consistent: the type retrieved must be the
type most recently stored"
I don't have K&R1 to look up, but I do have K&R2. K&R2 page 147 has that
same sentence, and has the following sentence, which extends to page 148:

K&R2: "It is the programmer's responsibility to keep track of which type is
currently stored in a union; the results are implementation-dependent if
something is stored as one type and extracted as another."

"Implementation-dependent" is ambiguous here. It could mean that what
happens is *entirely* up to the implementation, in that it could decide to
make such cases undefined behavior and all hell breaks loose. Or it could
mean that what you get when you extract as the wrong type is whatever the
bits mean to the implementation in that situation.

Melissa
Patrice Roy
2017-10-24 00:19:31 UTC
Permalink
Implementation-dependent is not UB. It's well-defined per-implementation,
but not necessarily portable. The code remains correct for that
implementation.
Patrice Roy
2017-10-24 00:24:58 UTC
Permalink
To be clear: http://eel.is/c++draft/intro.execution#2

That's one risk of UB less :)
Post by Patrice Roy
Implementation-dependent is not UB. It's well-defined per-implementation,
but not necessarily portable. The code remains correct for that
implementation.
Post by Myriachan
Post by Nevin Liber
Post by Richard Hodges
The removal of type-punning from unions makes unions useless for the
only job they were ever designed to do - to overlay different-shaped views
over the same bag of bits.
That isn't what K&R1 says about unions. What is your reference document
which states that was what they were designed to do?
K&R1: "so long as the usage is consistent: the type retrieved must be
the type most recently stored"
I don't have K&R1 to look up, but I do have K&R2. K&R2 page 147 has that
K&R2: "It is the programmer's responsibility to keep track of which type
is currently stored in a union; the results are implementation-dependent if
something is stored as one type and extracted as another."
"Implementation-dependent" is ambiguous here. It could mean that what
happens is *entirely* up to the implementation, in that it could decide
to make such cases undefined behavior and all hell breaks loose. Or it
could mean that what you get when you extract as the wrong type is whatever
the bits mean to the implementation in that situation.
Melissa
Thiago Macieira
2017-10-24 02:54:37 UTC
Permalink
Post by Patrice Roy
Implementation-dependent is not UB. It's well-defined per-implementation,
but not necessarily portable. The code remains correct for that
implementation.
That is clear, today. But did K&R mean that when they wrote the book? Did they
consistently mean the current interpretation, as opposed to current UB?
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
Richard Hodges
2017-10-24 04:15:31 UTC
Permalink
I don’t think it’s particularly useful to ponder what K&R meant by this or
that. They weren’t holy prophets, just guys trying to make assembler less
of a pain to write.

The assembler community caught on and became the c community. C made it
easier to express oneself in the abstract while still being able to
predict, more or less, what code would appear.

Pointers were index registers, unions were blocks of memory, hex numbers
cast as pointers to volatile were memory-mapped I/O addresses. Simple.

Then C compilers started getting a little cleverer - loop unrolling,
automatic register allocation and the like.

Now we have a dichotomy. For complete optimisation we need complete
abstraction, for bit-manipulation it would be nice to have a multi-shaped
bag of bits type. I don’t see those as incompatible.

If there were some way to express to the compiler that stores must “happen
before here” and loads cannot happen “before here” then compilers could
assume abstraction until told to materialise all that abstraction in memory.
Post by Thiago Macieira
Post by Patrice Roy
Implementation-dependent is not UB. It's well-defined per-implementation,
but not necessarily portable. The code remains correct for that
implementation.
That is clear, today. But did K&R mean that when they wrote the book? Did they
consistently mean the current interpretation, as opposed to current UB?
Thiago Macieira
2017-10-24 04:25:59 UTC
Permalink
I don’t think it’s particularly useful to ponder what K&R meant by this or
that. They weren’t holy prophets, just guys trying to make assembler less
of a pain to write.
Ok, then we mustn't interpret their "implementation-dependent" with the
current meaning. They could have meant what we today understand to be UB.

Following that, we conclude that K&R never meant for you to read from an
inactive member of a union.
Richard Hodges
2017-10-24 10:47:53 UTC
Permalink
Post by Thiago Macieira
Following that, we conclude that K&R never meant for you to read from an
inactive member of a union.

We could, but that would be missing the most important thing, which is what
do *I* want and what do other users of C++ want? Mr Kernighan and Mr
Ritchie have had their time bashing keys. I expect they are enjoying a
profitable retirement.

What do we want? Of course we want it all - awesome optimisation plus the
ability to directly address memory bytes through an object-shaped lens.

I personally don't think that's difficult to provide, so why not provide it?

All we need is some rule such as "whenever a union is or could be addressed
through some lens other than the one that was previously written, all
underlying bytes will be deemed to have been written, and the next object
read will be *as if* its corresponding bytes had been written".

Then the union would be perfectly type-punnable and perfectly optimisable. This
would even allow unions to be used for type punning in constexpr
environments - such as for determining endianness.

I have now posted two possible solutions, while the rest of the community
seems intent solely on defending a partisan position.

Anyone else care to approach this in a positive way?

R
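For comparison, the punning Richard wants from unions is already expressible in well-defined C++ by copying the object representation; a minimal sketch (the function name is illustrative, not from the thread):

```cpp
#include <cstdint>
#include <cstring>

// Well-defined type punning without a union: copy the object
// representation byte-for-byte. Compilers typically optimize the
// memcpy away, so this costs the same as the (UB) union read.
std::uint32_t bits_of(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float), "size mismatch");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
```

C++20 later added std::bit_cast, which is additionally usable in constexpr contexts, covering the compile-time endianness case mentioned above.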
Post by Thiago Macieira
Post by Richard Hodges
I don’t think it’s particulalry useful to ponder what K&R meant by this
or
Post by Richard Hodges
that. They weren’t holy prophets, just guys trying to make assembler less
of a pain to write.
Ok, then we mustn't interpret when they write "implementation-defined" as the
current meaning. They could have meant what we today understand to be UB.
Following that, we conclude that K&R never meant for you to read from an
inactive member of a union.
charleyb123 .
2017-10-24 12:27:38 UTC
Permalink
<snip, reading from inactive member of a union>
Post by Richard Hodges
Anyone else care to approach this in a positive way?
* In C11, non-active union member access is well-defined.
* In C++, this is undefined.

It has been suggested that the difference is because C does not have
destructors. However, in C++ we accept that "not" executing a trivial
destructor is well-defined (and this technique is commonly used in memory
allocators for trivial objects).

IMHO, therefore it stands to reason that C++ could make non-active union
member access well-defined if the member objects each had trivial
destructors.
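The difference described here can be made concrete; in this sketch the commented-out line is the access that C11 defines (as a reinterpretation of the stored bytes) but C++ leaves undefined:

```cpp
#include <cstdint>

union Pun {
    float f;
    std::uint32_t u;  // both members have trivial destructors
};

std::uint32_t demo() {
    Pun p;
    p.f = 1.0f;          // f becomes the active member
    // return p.u;       // C11: reinterprets the bytes of f
    //                   // C++: undefined (u is not the active member)
    p.u = 0x3f800000u;   // assignment switches the active member to u
    return p.u;          // well-defined in both languages
}
```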
'Edward Catmur' via ISO C++ Standard - Discussion
2017-10-24 14:39:32 UTC
Permalink
Post by charleyb123 .
<snip, reading from inactive member of a union>
Post by Richard Hodges
Anyone else care to approach this in a positive way?
*- In C11, non-active union member access is well-defined.
*- In C++, this is undefined.
It has been suggested that the difference is because C does not have
destructors. However, in C++ we accept that "not" executing a trivial
destructor is well-defined (and this technique is commonly used in memory
allocators for trivial objects).
IMHO, therefore it stands to reason that C++ could make non-active union
member access well-defined if the member objects each had trivial
destructors.
The common initial sequence rule holds for non-trivial standard-layout
unions, so that suggestion does not give the full picture.

The difference is more that C++ takes a more rigorous approach towards its
data model; this has the happy consequence of allowing type-based aliasing
optimizations. As has been mentioned above (or possibly elsewhere), the
status quo in C11 is not viewed entirely without regret.
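The common-initial-sequence rule referred to here is the one carve-out where C++ does permit reading a non-active member; a minimal illustration (the types are invented for the example):

```cpp
#include <cstdint>

struct AsFloat  { std::int32_t tag; float  value; };
struct AsDouble { std::int32_t tag; double value; };

union Tagged {
    AsFloat  f;
    AsDouble d;
};

std::int32_t peek_tag(const Tagged& t) {
    // Well-defined even when d is the active member: AsFloat and
    // AsDouble are standard-layout structs sharing a common initial
    // sequence (the int32 tag), which may be inspected through
    // either member ([class.mem]).
    return t.f.tag;
}
```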
'Edward Catmur' via ISO C++ Standard - Discussion
2017-10-24 14:50:53 UTC
Permalink
Post by Thiago Macieira
Post by Thiago Macieira
Following that, we conclude that K&R never meant for you to read from an
inactive member of a union.
We could, but that would be missing the most important thing, which is
what do *I* want and what do other users of C++ want? Mr Kernighan and Mr
Ritchie have had their time bashing keys. I expect they are enjoying a
profitable retirement.
What do we want? Of course we want it all - awesome optimisation plus the
ability to directly address memory bytes through an object-shaped lens.
I personally don't think that's difficult to provide, so why not provide it?
Absolutely! It doesn't even have to be provided in the language; a library
solution will serve perfectly well.
Post by Thiago Macieira
All we need is some rule such as "whenever a union is or could be
addressed through some other lens other than the one that was previously
written, all underlying bytes will have deemed to have been written, and
the next read object will be *as if* its corresponding bytes had been
written".
Right; or we could write a union-style class that *actually* writes and
reads the underlying bytes to some appropriate buffer. This would be even
better, as it would not violate the data model or damage optimization
opportunities for other programs or other parts of the same program.
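A sketch of the library solution described here (class and member names are hypothetical): a byte buffer that members are explicitly copied into and out of, so every pun goes through the object representation and stays inside the C++ data model:

```cpp
#include <cstddef>
#include <cstring>

template <std::size_t N>
class PunBuffer {
    alignas(std::max_align_t) unsigned char bytes_[N] = {};
public:
    template <typename T>
    void write(const T& v) {            // store v's object representation
        static_assert(sizeof(T) <= N, "member too large for buffer");
        std::memcpy(bytes_, &v, sizeof v);
    }
    template <typename T>
    T read() const {                    // rebuild a T from the stored bytes
        static_assert(sizeof(T) <= N, "member too large for buffer");
        T v{};
        std::memcpy(&v, bytes_, sizeof v);
        return v;
    }
};
```

Because both directions are ordinary byte copies, the compiler can still optimize them away without weakening aliasing analysis elsewhere in the program.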
Post by Thiago Macieira
Then the union would be perfectly type-punnable and perfectly optimisable. This
would even allow unions to be used for type punning in constexpr
environments - such as for determining endianness.
I have now posted two possible solutions, while the rest of the community
seems intent solely on defending a partisan position.
Anyone else care to approach this in a positive way?
R
Post by Thiago Macieira
Post by Richard Hodges
I don’t think it’s particulalry useful to ponder what K&R meant by this
or
Post by Richard Hodges
that. They weren’t holy prophets, just guys trying to make assembler
less
Post by Richard Hodges
of a pain to write.
Ok, then we mustn't interpret when they write "implementation-defined" as the
current meaning. They could have meant what we today understand to be UB.
Following that, we conclude that K&R never meant for you to read from an
inactive member of a union.
Hyman Rosen
2017-10-24 15:23:33 UTC
Permalink
On Tue, Oct 24, 2017 at 10:50 AM, 'Edward Catmur' via ISO C++ Standard -
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Absolutely! It doesn't even have to be provided in the language; a library
solution will serve perfectly well.
The solution is to treat all variables as volatile unless the compiler can
prove that the memory involved cannot change between accesses.

Note that way before type-based access analysis, optimizationists broke the
ability to reason about C code by allowing intermediate values of
floating-point expressions to be kept in extended precision, leading to
absurd results such as identical expressions comparing unequal because one
had been spilled to memory and the other kept in a register.

To keep flogging my hobby horse, computer programs are written in order to
control the operations of a computer. To the extent that the programming
system fails to convey the intent of the programmer into the operation of
the computer, the programming system is broken. The idea that a compiler
takes upon itself the ability to express such incredulity about intention
that it silently chooses not to implement it should be horrifying.
'Edward Catmur' via ISO C++ Standard - Discussion
2017-10-24 15:30:21 UTC
Permalink
On 24 Oct 2017 16:23, "Hyman Rosen" <***@gmail.com> wrote:

On Tue, Oct 24, 2017 at 10:50 AM, 'Edward Catmur' via ISO C++ Standard -
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Absolutely! It doesn't even have to be provided in the language; a library
solution will serve perfectly well.
The solution is to treat all variables as volatile unless the compiler can
prove
that the memory involved cannot change between accesses.

Note that way before type-based access analysis, optimizationists broke the
ability to
reason about C code by allowing intermediate values of floating-point
expressions to
be kept in extended precision, leading to absurd results such as identical
expressions
comparing unequal because one had been spilled to memory and the other kept
in a
register.

To keep flogging my hobby horse, computer programs are written in order to
control
the operations of a computer. To the extent that the programming system
fails to convey
the intent of the programmer into the operation of the computer, the
programming system
is broken. The idea that a compiler takes upon itself the ability to
express such incredulity
about intention that it silently chooses not to implement it should be
horrifying.

Just as well you've never looked to see what the *processor* vendors are
doing with your code, then.
Hyman Rosen
2017-10-24 15:45:43 UTC
Permalink
On Tue, Oct 24, 2017 at 11:30 AM, 'Edward Catmur' via ISO C++ Standard -
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Just as well you've never looked to see what the *processor* vendors are
doing with your code, then.
Why do you think I've never looked?
<https://www.theregister.co.uk/2017/06/25/intel_skylake_kaby_lake_hyperthreading/>
But the processor vendors are not trying to destroy the programmer's coding
model. From the programmer's point of view, the machine presents registers
and memory, and those are read and written as the program says, regardless
of the machinations that the processor may be doing under the surface. The
compilers, on the other hand, are looking for any excuse to throw away code
that the programmer has written and not do what the programmer intended.
"You couldn't possibly have meant to do that, so I'm just going to ignore
your instructions and not bother to tell you" is something that would get a
human employee fired, and it's no less displeasing in a compiler.

And by the way, a minute after my last post, a colleague called me over
with exactly the floating-point problem I had posted - things that should
have been exactly equal were comparing unequal. I suggested the usual fix,
saving values in volatile double variables, and the problem went away.
(Gcc on Intel using x87 rather than SSE for floating-point.)
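The workaround described above can be sketched as follows; whether it changes anything depends on the target (it matters under x87 code generation, while under SSE2 intermediates are already rounded to double):

```cpp
// Forcing an intermediate through a volatile double makes the compiler
// store it to memory, which rounds any x87 80-bit intermediate down to
// true 64-bit double precision before it is used again.
double rounded_sum_of_products(double a, double b, double c, double d) {
    volatile double p1 = a * b;  // spilled: now exactly a double
    volatile double p2 = c * d;
    return p1 + p2;
}
```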
'Edward Catmur' via ISO C++ Standard - Discussion
2017-10-24 20:47:30 UTC
Permalink
On 24 Oct 2017 16:46, "Hyman Rosen" <***@gmail.com> wrote:

On Tue, Oct 24, 2017 at 11:30 AM, 'Edward Catmur' via ISO C++ Standard -
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Just as well you've never looked to see what the *processor* vendors are
doing with your code, then.
Why do you think I've never looked?
<https://www.theregister.co.uk/2017/06/25/intel_skylake_kaby_lake_hyperthreading/>

But the processor vendors are not trying to destroy the programmer's coding
model.
From the programmer's point of view, the machine presents registers and
memory,
and those are read and written as the program says, regardless of the
machinations
that the processor may be doing under the surface. The compilers, on the
other hand,
are looking for any excuse to throw away code that the programmer has
written and not
do what the programmer intended. "You couldn't possibly have meant to do
that, so I'm
just going to ignore your instructions and not bother to tell you" is
something that would
get a human employee fired, and it's no less displeasing in a compiler.


So you don't find out-of-order and speculative execution disturbing? You
aren't concerned with how memory accesses appear to other threads and to
signal handlers?


And by the way, a minute after my last post, a colleague called me over
with exactly the
floating-point problem I had posted - things that should have been exactly
equal were
comparing unequal. I suggested the usual fix, saving values in volatile
double variables,
and the problem went away. (Gcc on Intel using x87 rather than SSE for
floating-point.)
Hyman Rosen
2017-10-24 22:01:02 UTC
Permalink
On Tue, Oct 24, 2017 at 4:47 PM, 'Edward Catmur' via ISO C++ Standard -
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
So you don't find out-of-order and speculative execution disturbing?
No. Those aren't visible to the programming model.
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
You aren't concerned with how memory accesses appear to other threads and
to signal handlers?
No. Data races and such have always been problematic, so the golden rule
is to always use the synchronization primitives defined by the language or
programming system. (In keeping with my bag-of-bits point of view, I
suppose you think that I think that volatile should serve that purpose,
but I don't - processors have always written long things to memory in
chunks, not atomically. People who thought that volatile meant atomic were
always just wrong.)
Thiago Macieira
2017-10-24 20:56:08 UTC
Permalink
Post by Hyman Rosen
And by the way, a minute after my last post, a colleague called me over
with exactly the
floating-point problem I had posted - things that should have been exactly
equal were
comparing unequal. I suggested the usual fix, saving values in volatile
double variables,
and the problem went away. (Gcc on Intel using x87 rather than SSE for
floating-point.)
Compile with -mfpmath=sse and stop supporting older than Pentium III (1999).
Hyman Rosen
2017-10-24 22:03:25 UTC
Permalink
Post by Thiago Macieira
Compile with -mfpmath=sse and stop supporting older than Pentium III (1999).
How will that fix a badly specified programming language?
Thiago Macieira
2017-10-24 23:06:58 UTC
Permalink
Post by Hyman Rosen
Post by Thiago Macieira
Compile with -mfpmath=sse and stop supporting older than Pentium III (1999).
How will that fix a badly specified programming language?
There's nothing wrong with the language. I was just giving you advice to make
your code faster.
Hyman Rosen
2017-10-24 23:25:46 UTC
Permalink
Post by Thiago Macieira
There's nothing wrong with the language.
A language where a + b == a + b is not required to be true for numbers a
and b is broken.
People who want to calculate in extended precision can use long double.
Forcing this on
everyone was an exercise in optimizationism - producing poorly defined
results because they
can be produced faster.
Thiago Macieira
2017-10-25 00:05:15 UTC
Permalink
Post by Hyman Rosen
Post by Thiago Macieira
There's nothing wrong with the language.
A language where a + b == a + b is not required to be true for numbers a
and b is broken.
This is not required to be true either
a / b * b == a * b / b

So what makes your example special?

Floating point operations can and do produce different results depending on
the order of the operations, due to loss of precision. That has nothing to do
with C++, but with the nature of floating point.
Post by Hyman Rosen
People who want to calculate in extended precision can use long double.
Forcing this on
everyone was an exercise in optimizationism - producing poorly defined
results because they
can be produced faster.
No, it was required because that's how the 8087 co-processor works. Doing it
any other way would be unbearably slow for normal use-cases, pessimising a lot
of people and having them pay for something they don't use.

I think you were advocating recently that the language should not get in the
way of using the processor and co-processors the way they were intended.
Thiago Macieira
2017-10-25 00:08:23 UTC
Permalink
Post by Thiago Macieira
Post by Hyman Rosen
A language where a + b == a + b is not required to be true for numbers a
and b is broken.
This is not required to be true either
a / b * b == a * b / b
Oh, the irony: the above statement is more likely to be true with 8087 than
without it.
Hyman Rosen
2017-10-25 15:07:32 UTC
Permalink
Post by Thiago Macieira
Post by Hyman Rosen
A language where a + b == a + b is not required to be true for numbers a
and b is broken.
This is not required to be true either
a / b * b == a * b / b
So what makes your example special?
Your example isn't even true for integers. My example is special because it
involves identical expressions with identical values evaluating to different
results.

Floating point operations can and do produce different results depending on
the order of the operations, due to loss of precision. That has nothing to do
with C++, but with the nature of floating point.
You are putting up a straw man. In my example, we are varying nothing at
all, not values, not operations, and not order. The language allows
identical arithmetic expressions to produce different results. That's
broken.

No, it was required because that's how the 8087 co-processor works. Doing it
any other way would be unbearably slow for normal use-cases, pessimising a lot
of people and having them pay for something they don't use.
I could speak again about optimizationism and producing bad results
quickly, but that's not even necessary here. All the language has to
require is that identical full floating-point arithmetic expressions
involving the same operand values must give the same result value. (I can
insert pedantry to define a "full floating-point arithmetic expression" if
you want.) Compilers can still use extended precision in the middle if
they really, really want to, but they have to get to the same result every
time.
I think you were advocating recently that the language should not get in the
way of using the processor and co-processors the way they were intended.
No. I was advocating that the language should not willfully break programs
that were written with a certain model in mind and that had been working
correctly according to that model for decades. Breaking a + b == a + b is
exactly the same language problem as type-based alias analysis, namely
not conforming to the programmer's model of how the system works, and
doing so silently.
Thiago Macieira
2017-10-25 16:02:02 UTC
Permalink
Post by Hyman Rosen
Your example isn't even true for integers. My example is special because it
involves identical expressions with identical values evaluating to
different results.
Ok, mine was a bad example. See a better one below.

Your two expressions are not identical. You're thinking that because math
tells you they should be, since you learned in 1st grade that addition is
commutative. If they were identical, they would produce the same result.

In any case, I'm hard-pressed to find a case where changing the order of the
operands in an addition will cause a different result. FP differences happen
when there are at least two operations, causing intermediaries to be spilled
or not spilled; which parts of the significands get lost and at what times,
etc. So, for example:

double sum(double a, double b, double c)
{
    a += b;
    return a + c;
}

If you compile this in debug mode, it will probably produce a different result
compared to release mode if the compiler uses 387 math.

But remember the result could be different too if you had written:

a += c;
return a + b;

or

b += c;
return a + b;

FP math in computers depends on the order of the operations. Period.

Test operands: a = 1<<52; b = .5; c = .5.
Post by Hyman Rosen
You are putting up a straw man. In my example, we are varying nothing at
all,
not values, not operations, and not order. The language allows identical
arithmetic
expressions to produce different results. That's broken.
Your example doesn't produce different results. And you're varying the order
of the operands.
Post by Hyman Rosen
Post by Thiago Macieira
No, it was required because that's how the 8087 co-processor works. Doing it
any other way would be unbearably slow for normal use-cases, pessimising a lot
of people and having them pay for something they don't use.
I could speak again about optimizationism and producing bad results quickly,
but that's not even necessary here. All the language has to require is
that identical full floating-point arithmetic expressions involving the
same operand
values must give the same result value.
The language could have decided on that. It decided instead to follow how
the hardware behaves.
Post by Hyman Rosen
Post by Thiago Macieira
I think you were advocating recently that the language should not get in the
way of using the processor and co-processors the way they were intended.
No. I was advocating that the language should not willfully break programs
that were written with a certain model in mind and that had been working
correctly according to that model for decades. Breaking a + b == a + b is
exactly the same language problem as type-based alias analysis, namely
not conforming to the programmer's model of how the system works, and
doing so silently.
What about the programs that relied on the intermediary extended precision?
Should we break them?
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
Hyman Rosen
2017-10-25 16:32:53 UTC
Permalink
Post by Thiago Macieira
Your two expressions are not identical. You're thinking that because math
tells you they should be, since you learned in 1st grade that addition is
commutative. If they were identical, they would produce the same result.
I think you misread my example. I'm not arguing about a + b == b + a.
I'm arguing about a + b == a + b. The two expressions on either side
of the equality operator are identical. The standard allows this to be
false, and compilers actually generate code that makes this false.
Thiago Macieira
2017-10-25 20:13:23 UTC
Permalink
Post by Hyman Rosen
Post by Thiago Macieira
Your two expressions are not identical. You're thinking that because math
tells you they should be, since you learned in 1st grade that addition is
commutative. If they were identical, they would produce the same result.
I think you misread my example. I'm not arguing about a + b == b + a.
I'm arguing about a + b == a + b. The two expressions on either side
of the equality operator are identical. The standard allows this to be
false, and compilers actually generate code that makes this false.
Setting aside the case of one of the operands being NaN, in which case the
result is legitimately allowed to be false, I don't see why the expression
would ever be false.

You're right, the expressions on either side are identical, therefore they
produce the same result.

See https://godbolt.org/g/nZDUkh for current proof. Three of the four
compilers performed exactly one addition, not two. They only checked to see if
the result was NaN. ICC did perform two additions, but the operands are the
same, so I don't see why the results wouldn't be (I'm not sure it's dealing
with NaN correctly, though).

See also https://godbolt.org/g/5No6pQ for the fast-math case, which MSVC
documents as
fast - "fast" floating-point model; results are less predictable
Three of the four are constant true.
Hyman Rosen
2017-10-25 21:13:47 UTC
Permalink
The following code fails when built by `g++-4.8.5 -m32` on an x86 machine
given 0x1000000000000001, say, as an argument.

#include <stdio.h>
#include <stdlib.h>
double d(long long n) { return n; }
int test(volatile long long n) { return d(n) == d(n); }
int main(int c, char **v) {
    while (--c > 0) {
        long long n = strtoll(*++v, 0, 0);
        printf("%llx %s\n", n, test(n) ? "good" : "bad");
    }
}

Later versions of g++ don't do that for this program, but they could, or
they could for similar ones. A version of this problem showed up in a real
test case in our code, not just something I contrived.

The C11 Standard says, on the other hand,

Implementations employing wide registers have to take care to honor
appropriate semantics. Values are independent of whether they are
represented in a register or in memory. For example, an implicit spilling
of a register is not permitted to alter the value. Also, an explicit store
and load is required to round to the precision of the storage type. In
particular, casts and assignments are required to perform their specified
conversion.

which seems much better. In fact, building exactly the same program above
with the same version of gcc, but as C rather than C++, succeeds when
-std=c11 is specified but fails when it is not. To me, that means that the
implementors gave themselves permission to produce the above result - it
was not an accident.
Thiago Macieira
2017-10-25 23:01:22 UTC
Permalink
Post by Hyman Rosen
The following code fails when built by `g++-4.8.5 -m32` on an x86 machine
given 0x1000000000000001, say, as an argument.
This number cannot be represented with precision in a double, only in a long
double.
Post by Hyman Rosen
#include <stdio.h>
#include <stdlib.h>
double d(long long n) { return n; }
What you're seeing is a side-effect of this function. See
https://godbolt.org/g/8oNtXR

The i386 SysV calling convention returns floating point in ST(0), so both
current Clang and GCC 4.8 simply ask the coprocessor to load it, then return
that. ICC in strict mode as well as current GCC store it to memory first then
reload, to force the value to lose precision.
Post by Hyman Rosen
int test(volatile long long n) { return d(n) == d(n); }
int main(int c, char **v) {
while (--c > 0) {
long long n = strtoll(*++v, 0, 0);
printf("%llx %s\n", n, test(n) ? "good" : "bad");
}
}
Later versions of g++ don't do that for this program, but they could, or
they could for similar ones. A version of this problem showed up in a real
test case in our code, not just something I contrived.
I understand it's not contrived, but it's artificial, because of that volatile.
The parameter to the test function is most definitely not volatile, so this
code is artificial and trying to trick the compiler with something.

And yet I don't see how this code would produce "bad" for any non-NaN value.
Did you forget a subtraction somewhere? Like return d(n) == d(n - 1)? The core
of the comparison with GCC 4.8.5 is:

fildq 24(%esp)
fucomip %st(0), %st

It's comparing d(n) to itself, so the result must be true for any value that
is not NaN. That's independent of whether there was rounding or loss of
precision.
Post by Hyman Rosen
The C11 Standard says, on the other hand,
Implementations employing wide registers have to take care to honor
appropriate semantics. Values are independent of whether they are
represented in a register or in memory. For example, an implicit spilling
of a register is not permitted to alter the value. Also, an explicit store
and load is required to round to the precision of the storage type. In
particular, casts and assignments are required to perform their specified
conversion.
which seems much better. In fact, building exactly the same program above
with the same
version of gcc, but as C rather than C++, succeeds when -std=c11 is
specified but fails when
it is not. To me, that means that the implementors gave themselves
permission to produce
the above result - it was not an accident.
The absence of the text cannot be attributed to conscious deletion. It can
just as likely be lack of addition. That is to say, it's possible it was added
to C at some point but not to C++.

It's actually easy to check this hypothesis: when was the wording you found
introduced to the C language? C99 has the exact same passage as you quoted,
but C89 did not. The example was different and it only said

"Alternatively, an operation involving only ints or floats may be executed
using double-precision operations if neither range nor precision is lost
thereby."

It only disallows *loss* of precision, not gain of it. So it seems C
adopted wording sometime between 1989 and 1999 to disallow this, but the
same change was never added to C++.

What I can't tell you is whether this issue was never brought up or if it was
rejected.
Hyman Rosen
2017-10-26 15:47:08 UTC
Permalink
Post by Thiago Macieira
Post by Hyman Rosen
The following code fails when built by `g++-4.8.5 -m32` on an x86 machine
given 0x1000000000000001, say, as an argument.
This number cannot be represented with precision in a double, only in a long
double.
Yes. That's why I chose it.
Post by Thiago Macieira
Post by Hyman Rosen
double d(long long n) { return n; }
What you're seeing is a side-effect of this function. See
https://godbolt.org/g/8oNtXR
The i386 SysV calling convention returns floating point in ST(0), so both
current Clang and GCC 4.8 simply ask the coprocessor to load it, then return
that. ICC in strict mode as well as current GCC store it to memory first then
reload, to force the value to lose precision.
Yes. My point is to demonstrate that poor language specifications give the
compilers permission to not do the store-and-load, and instead return the
value in extended precision.
Post by Thiago Macieira
Post by Hyman Rosen
int test(volatile long long n) { return d(n) == d(n); }
I understand it's not contrived, but it's artificial, because of that volatile.
The parameter to the test function is most definitely not volatile, so this
code is artificial and trying to trick the compiler with something.
I'm using volatile so that the compiler cannot elide one of the calls to
d(n). It's not "tricky". And I don't know what you mean about the parameter
not "being" volatile; volatile is defined by the standard as

accesses through volatile glvalues are evaluated strictly according to
the rules of the abstract machine

and that's what I wanted to have happen.

Post by Thiago Macieira
And yet I don't see how this code would produce "bad" for any non-NaN value.
It's right here: <https://godbolt.org/g/KhntqH>

d(n) returns its value in a floating-point register without reducing its
precision. The compiler generates the first call to d(n) and spills it to
memory (line 25 in the assembly listing). That reduces its precision. Then
the compiler generates the second call, reloads the result of the first
call into a register (line 31), and compares that register with the one
holding the result of the second call (line 32), resulting in a comparison
of a reduced-precision value with an extended-precision value. And that's
how we get d(n) == d(n) to be false.

Post by Thiago Macieira
It's comparing d(n) to itself, so the result must be true for any value
that is not NaN. That's independent of whether there was rounding or loss
of precision.
So you would think and hope. But it's not required by the C++ Standard.
Post by Thiago Macieira
Post by Hyman Rosen
To me, that means that the implementors gave themselves permission to
produce the above result - it was not an accident.
The absence of the text cannot be attributed to conscious deletion. It can
just as likely be lack of addition. That is to say, it's possible it was added
to C at some point but not to C++.
I'm saying that the implementors of the compiler deliberately used the
absence of this requirement as permission to generate the code that returns
the value without reducing its precision, since the very same compiler does
reduce the precision when building in C11 mode. And I'm saying that a
language spec allowing d(n) == d(n) to be false as in my example is broken.
Thiago Macieira
2017-10-26 16:04:39 UTC
Permalink
Post by Hyman Rosen
Post by Thiago Macieira
The i386 SysV calling convention returns floating point in ST(0), so both
current Clang and GCC 4.8 simply ask the coprocessor to load it, then return
that. ICC in strict mode as well as current GCC store it to memory first then
reload, to force the value to lose precision.
Yes. My point is to demonstrate that poor language specifications give the
compilers permission to not do the store-and-load, and instead return the
value in extended precision.
No doubt. But the converse would assume that the specification authors had
perfect foreknowledge of all situations and perfect ability to write the text.
They can't do that, they're humans. There will be issues in the language.
Post by Hyman Rosen
Post by Thiago Macieira
Post by Hyman Rosen
int test(volatile long long n) { return d(n) == d(n); }
I understand it's not contrived, but it's artificial, because of that volatile.
The parameter to the test function is most definitely not volatile, so this
code is artificial and trying to trick the compiler with something.
I'm using volatile so that the compiler cannot elide one of the calls to
d(n). It's not "tricky". And I don't know what you mean about the parameter
not "being" volatile; volatile is defined by the standard as "accesses
through volatile glvalues are evaluated strictly according to the rules of
the abstract machine" and that's what I wanted to have happen.
That's what I meant by artificial. You artificially chose to make it volatile,
when the data itself is not. The variable's value cannot change behind the
compiler back: the function parameter is not in MMIO memory range, its address
is not passed to other threads of execution, etc.

And besides, all four compilers DID elide one of the calls to d(n).

Therefore, you did not have a good reason to use volatile. It's artificial.
Post by Hyman Rosen
And yet I don't see how this code would produce "bad" for any non-NaN value.
It's right here: <https://godbolt.org/g/KhntqH>
That's debug mode. It never occurred to me to try that.
Post by Hyman Rosen
d(n) returns its value in a floating-point register without reducing its
precision.
The compiler generates the first call to d(n) and spills it to memory (line
25 in
the assembly listing). That reduces its precision. Then the compiler
generates
the second call, reloads the result of the first call into a register (line
31), and
compares that register with the one holding the result of the second call
(line 32),
resulting in a comparison of a reduced-precision value with an
extended-precision
value. And that's how we get d(n) == d(n) to be false.
Understood.
Post by Hyman Rosen
Post by Thiago Macieira
Post by Hyman Rosen
To me, that means that the implementors gave themselves permission to
produce the above result - it was not an accident.
The absence of the text cannot be attributed to conscious deletion. It can
just as likely be lack of addition. That is to say, it's possible it was added
to C at some point but not to C++.
I'm saying that the implementors of the compiler deliberately used the
absence
of this requirement as permission to generate the code that returns the
value
without reducing its precision, since the very same compiler does reduce the
precision when building in C11 mode. And I'm saying that a language spec
allowing d(n) == d(n) to be false as in my example is broken.
The language does allow that, as it stands.

As we've already seen, the requirement to C was added in C99. So it's very
likely that the compilers implemented the current C++ behaviour up until a
point in time when they were forced to lose precision to comply with the C
language. Since C++ did not add the same text, some compiler writers decided
not to apply the same fix to C++.

If the same text is added to C++, those compilers will probably adapt.
Consider filing either a paper or a defect to have the text adopted into C++.
Hyman Rosen
2017-10-26 16:22:18 UTC
Permalink
Post by Thiago Macieira
Post by Hyman Rosen
It's not "tricky". And I don't know what you mean about the parameter not
"being" volatile; volatile is defined by the standard as "accesses through
volatile glvalues are evaluated strictly according to the rules of the
abstract machine" and that's what I wanted to have happen.
That's what I meant by artificial. You artificially chose to make it volatile,
when the data itself is not. The variable's value cannot change behind the
compiler back: the function parameter is not in MMIO memory range, its address
is not passed to other threads of execution, etc.
Where does the Standard impose any such requirement on things declared
volatile?

Post by Thiago Macieira
And besides, all four compilers DID elide one of the calls to d(n).
Where do you see that? There was some inlining of the calls, but as far as
I can see, each compilation instance contains two conversions of an integer
to a double, which is exactly what I wanted to accomplish with volatile. I
could muck about with the code and have it pass and convert two
command-line arguments instead of one, but it would be pointless. Using
volatile is not the locus of the problem here.

Post by Thiago Macieira
That's debug mode. It never occurred to me to try that.
It's not debug mode, it's merely "not optimized" mode.

Post by Thiago Macieira
As we've already seen, the requirement to C was added in C99. So it's very
likely that the compilers implemented the current C++ behaviour up until a
point in time when they were forced to lose precision to comply with the C
language. Since C++ did not add the same text, some compiler writers
decided not to apply the same fix to C++.
My point exactly. Compiler writers are the worst of the optimizationists
because they're the ones trying to come up with every possible trick so
that they can point at the resulting assembly language and admire its
magnificence.
Thiago Macieira
2017-10-26 16:35:42 UTC
Permalink
Post by Hyman Rosen
Post by Thiago Macieira
That's what I meant by artificial. You artificially chose to make it volatile,
when the data itself is not. The variable's value cannot change behind the
compiler back: the function parameter is not in MMIO memory range, its address
is not passed to other threads of execution, etc.
Where does the Standard impose any such requirement on things declared
volatile?
The keyword is you telling the compiler that the data may be changed
asynchronously and therefore every access must be reloaded.

Your data doesn't do that.
Post by Hyman Rosen
And besides, all four compilers DID elide one of the calls to d(n).
Where do you see that? There was some inlining of the calls, but as far as
I can see, each compilation instance contains two conversions of an integer
to a double, which is exactly what I wanted to accomplish with volatile. I
could muck about with the code and have it pass and convert two
command-line arguments instead of one, but it would be pointless. Using
volatile is not the locus of the problem here.
No, volatile isn't the issue.

I was referring to code in release mode, in the links that I sent. There's
exactly one conversion from integer to FP, with the FILD instruction.
Post by Hyman Rosen
Post by Thiago Macieira
As we've already seen, the requirement to C was added in C99. So it's very
likely that the compilers implemented the current C++ behaviour up until a
point in time when they were forced to lose precision to comply with the C
language. Since C++ did not add the same text, some compiler writers decided
not to apply the same fix to C++.
My point exactly. Compiler writers are the worst of the optimizationists
because
they're the ones trying to come up with every possible trick so that they
can point
at the resulting assembly language and admire its magnificence.
I don't see anything wrong with that.
Hyman Rosen
2017-10-26 16:59:05 UTC
Permalink
Post by Thiago Macieira
Post by Hyman Rosen
Where does the Standard impose any such requirement on things declared
volatile?
The keyword is you telling the compiler that the data may be changed
asynchronously and therefore every access must be reloaded.
Your data doesn't do that.
That is an incorrect characterization of volatile. Volatile means that the
program must access the variable as defined by the abstract machine, and
that such accesses are classified as side-effects. Perhaps I'm running the
program on a device which displays memory accesses in lights and I want to
see lots of flashing. Perhaps I'm trying to stress test my RAM. Perhaps I'm
trying to break security by using repeated memory access to affect adjacent
memory cells.

Post by Thiago Macieira
I was referring to code in release mode, in the links that I sent. There's
exactly one conversion from integer to FP, with the FILD instruction.
I don't think you sent that link. The only link I saw contained d() but
not test().
Post by Thiago Macieira
Post by Hyman Rosen
Compiler writers are the worst of the optimizationists because
they're the ones trying to come up with every possible trick so that they
can point at the resulting assembly language and admire its magnificence.
I don't see anything wrong with that.
I know. That's the problem :-)
Thiago Macieira
2017-10-26 20:27:01 UTC
Permalink
Post by Hyman Rosen
Post by Thiago Macieira
Post by Hyman Rosen
Where does the Standard impose any such requirement on things declared
volatile?
The keyword is you telling the compiler that the data may be changed
asynchronously and therefore every access must be reloaded.
Your data doesn't do that.
That is an incorrect characterization of volatile. Volatile means that the
program
must access the variable as defined by the abstract machine, and that such
accesses are classified as side-effects. Perhaps I'm running the program
on a
device which displays memory accesses in lights and I want to see lots of
flashing.
This is a valid reason to use volatile. But that's not your case, since the
variable in question is a parameter to a function, which most architectures
even pass in registers. A register can't be volatile.
Post by Hyman Rosen
Perhaps I'm trying to stress test my RAM.
This is not a valid reason. The abstract machine has no such concept, so you
had better write assembly instead.
Post by Hyman Rosen
Perhaps I'm trying to break
security by
using repeated memory access to affect adjacent memory cells.
Not a valid reason either. If you're trying to force a piece of hardware to do
something it's not supposed to do, then by complete definition this is outside
the parameters of a well-formed program.

You can get it by side-effect, but any minor change anywhere in the compiler,
your sources or any libraries you use could make the effect disappear.
Post by Hyman Rosen
Post by Thiago Macieira
I was referring to code in release mode, in the links that I sent. There's
exactly one conversion from integer to FP, with the FILD instruction.
I don't think you sent that link. The only link I saw contained d() but
not test().
Right, I didn't send the link. Sorry about that.

Anyway, if you compile your code with those four compilers and using -O2, all
four produce one single integer-to-double conversion.
Hyman Rosen
2017-10-26 21:55:19 UTC
Permalink
Post by Thiago Macieira
This is a valid reason to use volatile. But that's not your case, since the
variable in question is a parameter to a function, which most architectures
even pass in registers. A register can't be volatile.
I'm sorry, but you don't get to make up extra reasons and qualifications
outside of what the Standard describes about volatile. The Standard allows
a parameter to be declared as volatile, and then access to the parameter
must occur according to the rules of the abstract machine.

And you're not even right, since regardless of calling convention and ABI
the Standard permits taking the address of a parameter and indirecting
through it just like any other variable.

Post by Thiago Macieira
Anyway, if you compile your code with those four compilers and using -O2,
all four produce one single integer-to-double conversion.
That doesn't help. The problem isn't finding a compiler setting that will
make the code do what I want. The problem is that the language is badly
specified, so that compilers are free to change their behavior at any time
to do what I don't want.

The behavior of the version of gcc that allowed d(n) == d(n) to be false
is legal (maybe?) according to the Standard. That means, once again, that
code that seemingly works, that passes all of its tests, can be broken
willy-nilly when some compiler writers decide to take advantage of another
freedom they discover in the Standard.

I actually wonder if the compiler was behaving legally or not. The
Standard says:

The values of the floating operands and the results of floating
expressions may be represented in greater precision and range than that
required by the type; the types are not changed thereby. [Footnote 64: The
cast and assignment operators must still perform their specific
conversions as described in 8.4, 8.2.9 and 8.18.]

So can the return value of a function be represented in greater precision?
Returning a value is not a cast or an assignment, after all. (And are
footnotes normative?) I don't know the answer, and I don't know how to
figure it out.
Thiago Macieira
2017-10-26 22:21:19 UTC
Permalink
Post by Hyman Rosen
Post by Thiago Macieira
This is a valid reason to use volatile. But that's not your case, since the
variable in question is a parameter to a function, which most architectures
even pass in registers. A register can't be volatile.
I'm sorry, but you don't get to make up extra reasons and qualifications
outside of
what the Standard describes about volatile. The Standard allows a
parameter to
be declared as volatile, and then access to the parameter must occur
according
to the rules of the abstract machine.
I'm not.

A primitive parameter is neither const nor volatile. That's in the spec. You
can add the cv qualification in your function implementation even if it's not
in the declaration. The following is valid and is not an overload:

void test(double d);
void test(const double d)
{
// ....
}

I see the point in declaring const, since it helps you with avoiding
accidentally modifying the parameter in the body of your function.
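The point that top-level cv-qualifiers on a parameter do not participate in the function's type can be checked directly (illustrative sketch; test_fn is a made-up name):

```cpp
#include <type_traits>

// Declaration without const, definition with const: one function,
// not an overload, because top-level cv on a parameter is ignored
// in the function's type.
void test_fn(double d);
void test_fn(const double d) { (void)d; }

static_assert(std::is_same_v<void(double), void(const double)>,
              "top-level const does not participate in the signature");
```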
Post by Hyman Rosen
And you're not even right, since regardless of calling convention and ABI the Standard permits taking the address of a parameter and indirecting through it just like any other variable.
And at that point, having the variable be volatile would make sense.

Your code didn't do that. Hence it was artificially using the qualification.
Post by Hyman Rosen
That doesn't help. The problem isn't finding a compiler setting that will
make the
code do what I want. The problem is that the language is badly specified,
so that
compilers are free to change their behavior at any time to do what I don't
want.
"The language is badly specified" -- sure, the language of the text is not
perfect. We all know that. We're striving to make it better all the time.

"change their behaviour [...] to what I don't want" -- you're not the only C++
user out there. The language does not conform to your wishes alone, but to the
general needs of the user base at large.

In this specific case, my guess is that your wish is what the majority should
want too, as demonstrated by C already having that text.
Post by Hyman Rosen
The behavior of the version of gcc that allowed d(n) == d(n) to be false is
legal
(maybe?) according to the Standard. That means, once again, that code that
seemingly works, that passes all of its tests, can be broken willy-nilly
when some
compiler writers decide to take advantage of another freedom they discover
in the
Standard.
Just because some code "works" today doesn't mean it will work tomorrow, if it depends on unconfirmed assumptions. You're not about to tell me that thread-unsafe code should keep its behaviour as it did in the early 1990s when run today on multi-thread multi-core CPUs, are you?

Not to mention outright bugs in the source code or in the compiler. I hope
you're not suggesting that compiler writers never fix bugs because someone
could be depending on the erroneous outcome.
Post by Hyman Rosen
I actually wonder if the compiler was behaving legally or not. The
*The values of the floating operands and the results of floating
expressions may be*
*represented in greater precision and range than that required by the type;
the types*
*are not changed thereby.*64
*64) The cast and assignment operators must still perform their specific
conversions*
*as described in 8.4, 8.2.9 and 8.18.*
So can the return value of a function be represented in greater precision?
Returning
a value is not a cast or an assignment, after all. (And are footnotes
normative?)
I don't know the answer, and I don't know how to figure it out.
My suggestion is that you treat this as a defect and submit a defect report,
asking that we adopt C99's language that explicitly makes this behaviour
forbidden.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
Hyman Rosen
2017-10-27 15:11:40 UTC
Permalink
Post by Thiago Macieira
Post by Hyman Rosen
I'm sorry, but you don't get to make up extra reasons and qualifications
outside of what the Standard describes about volatile.
I'm not.
That's what you say...
Post by Thiago Macieira
Post by Hyman Rosen
the Standard permits taking the address of a parameter and indirecting through it just like any other variable.
And at that point, having the variable be volatile would make sense.
Your code didn't do that. Hence it was artificially using the
qualification.
...but here you are, making up extra qualifications around the use of volatile. The Standard lets parameters be declared volatile. There are no qualifications for doing so.

The Standard notes in [dcl.type.cv]:

*Note: volatile is a hint to the implementation to avoid aggressive optimization involving the object because the value of the object might be changed by means undetectable by an implementation.*

so it would make no sense for the implementation to look for things that qualify something to be volatile, when those things can be undetectable!

"change their behaviour [...] to what I don't want" -- you're not the only
Post by Thiago Macieira
C++
user out there. The language does not conform to your wishes alone, but to the
general needs of the user base at large.
If that were true, it would come with a garbage collector :-) :-)? :-(

But I didn't say the language, I said the compiler. Given a badly or ambiguously specified language, compilers can use the specification to compile programs that behave contrary to the expectations of the programmers and the plain intent of the code, and they can adopt this behavior at any time, breaking code that has been tested and working (i.e., behaving according to expectation) for a long time.
Post by Thiago Macieira
Just because some code "works" today doesn't mean it will work tomorrow, if it
depending on unconfirmed assumptions. You're not about to tell me that thread-
unsafe code should keep its behaviour as it did in the early 1990s when run
today on multi-thread multi-core CPUs, are you?
Maybe? There were certainly multiprogramming models that did not use
preemption, but used priority to decide what thread would run. There were
models where preemption was assumed but only one thread at a time would
run. If those programs used implementation-provided facilities, then those
implementations ought to provide support to keep them working.

Post by Thiago Macieira
Not to mention outright bugs in the source code or in the compiler. I hope you're not suggesting that compiler writers never fix bugs because someone could be depending on the erroneous outcome.
In fact, compiler vendors are often reluctant to fix errors if those fixes would cause ABI incompatibilities.
Thiago Macieira
2017-10-27 15:31:18 UTC
Permalink
Post by Hyman Rosen
Post by Thiago Macieira
And at that point, having the variable be volatile would make sense.
Your code didn't do that. Hence it was artificially using the qualification.
...but here you are, making up extra qualifications around the use of
volatile.
The Standard lets parameters be declared volatile. There are no
qualifications
for doing so.
I'm not saying it's ill-formed. I'm saying it's artificial.

It's like using std::atomic<> for all primitives (at least on platforms where
it doesn't use mutex locks). You can do it, but there's no real need for it.
Post by Hyman Rosen
But I didn't say the language, I said the compiler. Given a badly or
ambiguously
specified language, compilers can use the specification to compile programs
that
behave contrary to the expectations of the programmers and the plain intent
of the
code, and they can adopt this behavior at any time, breaking code that has
been
tested and working (i.e., behaving according to expectation) for a long
time.
That I agree with.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
Richard Smith
2017-10-24 17:29:37 UTC
Permalink
Post by Thiago Macieira
Following that, we conclude that K&R never meant for you to read from an
inactive member of a union.

We could, but that would be missing the most important thing, which is what
do *I* want and what do other users of C++ want? Mr Kernighan and Mr
Ritchie have had their time bashing keys. I expect they are enjoying a
profitable retirement.

What do we want? Of course we want it all - awesome optimisation plus the
ability to directly address memory bytes through an object-shaped lens.

I personally don't think that's difficult to provide, so why not provide it?

All we need is some rule such as "whenever a union is or could be addressed through some lens other than the one that was previously written, all underlying bytes will be deemed to have been written, and the next read object will be *as if* its corresponding bytes had been written".

Then the union would be perfectly type-punnable and perfectly optimisable.


Actually, no, this is not perfectly optimizable. In fact, it invalidates a
whole class of profitable optimisations based on type-based alias analysis.
It's also harmful to other aspects of the language (eg, constant expression
evaluation cannot respect these rules in general).
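The class of optimisation at stake can be sketched like this (an assumed example, not taken from the thread):

```cpp
// Under strict aliasing, the compiler may presume the int store cannot
// modify *fp, so it can keep the first load of *fp in a register across
// the store instead of reloading it. A punning rule that deems the bytes
// rewritten would forbid exactly this.
float tbaa_example(float *fp, int *ip) {
    float before = *fp; // may be cached across the store below
    *ip = 42;           // assumed not to alias a float under the type rules
    return before + *fp;
}
```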

This would even allow unions to be used for type punning in constexpr
environments - such as for determining endianness.

I have now posted two possible solutions, while the rest of the community
seems intent solely on defending a partisan position.

Anyone else care to approach this in a positive way?

R
Post by Thiago Macieira
Post by Richard Hodges
I don’t think it’s particulalry useful to ponder what K&R meant by this
or
Post by Richard Hodges
that. They weren’t holy prophets, just guys trying to make assembler less
of a pain to write.
Ok, then we mustn't interpret when they write "implementation-defined" as the
current meaning. They could have meant what we today understand to be UB.
Following that, we conclude that K&R never meant for you to read from an
inactive member of a union.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
Myriachan
2017-10-25 01:01:58 UTC
Permalink
Post by Richard Hodges
All we need is some rule such as "whenever a union is or could be
addressed through some other lens other than the one that was previously
written, all underlying bytes will have deemed to have been written, and
the next read object will be *as if* its corresponding bytes had been
written".
Then the union would be perfectly type-punnable and perfectly optimisable.
Actually, no, this is not perfectly optimizable. In fact, it invalidates a
whole class of profitable optimisations based on type-based alias analysis.
It's also harmful to other aspects of the language (eg, constant expression
evaluation cannot respect these rules in general).
What would be the right solution, then? The proposals from the compiler writers' side so far have been to more or less remove the "common sequence" rule from the language in favor of requiring that all such accesses go through a union type. This would break a lot of system APIs and other existing code without providing a good solution.

Melissa
Myriachan
2017-10-25 02:59:23 UTC
Permalink
Post by Myriachan
Post by Richard Hodges
All we need is some rule such as "whenever a union is or could be
addressed through some other lens other than the one that was previously
written, all underlying bytes will have deemed to have been written, and
the next read object will be *as if* its corresponding bytes had been
written".
Then the union would be perfectly type-punnable and perfectly optimisable.
Actually, no, this is not perfectly optimizable. In fact, it invalidates
a whole class of profitable optimisations based on type-based alias
analysis. It's also harmful to other aspects of the language (eg, constant
expression evaluation cannot respect these rules in general).
What would be the right solution, then? The proposals from the compiler writers' side so far have been to more or less remove the "common sequence" rule from the language in favor of requiring that all such accesses go through a union type. This would break a lot of system APIs and other existing code without providing a good solution.
I just thought of something else: what if the aliasing rules were to ignore classes entirely, and instead only dealt with primitive types? How much would that break type-based aliasing analysis's ability to optimize? The common sequence rule would be implicit, because you're ultimately reading using the correct type for what was written there before.

To me, that's what the rule ought to be for aliasing: if a memory location is written as primitive type X, it must be read back as either cv X or a cv byte type (std::byte, char, unsigned char). (Placement new without initialization would be considered a write of an indefinite value.) It shouldn't matter whether a class was involved, nor the identity of said classes.

The rule would make something like this well-defined:

struct X { ... int i; ... };
alignas(X) unsigned char b[sizeof(X)];
X *x = new(b) X;
x->i = 2;
assert(*reinterpret_cast<int *>(&b[offsetof(X, i)]) == 2);

...which most C++ programmers expect to work, but technically doesn't.
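For what it's worth, C++17's std::launder makes a variant of this pattern expressible in a well-defined way today (a sketch under that assumption; Y is a concrete stand-in for the elided X above):

```cpp
#include <new>

// Concrete stand-in for the X sketched above (other members elided there).
struct Y { int i; };

// std::launder yields a pointer usable for the Y object now living
// inside the buffer, so the read through b's address is well-defined.
int read_via_buffer() {
    alignas(Y) unsigned char b[sizeof(Y)];
    Y *y = new (b) Y;   // placement-new a Y into the byte buffer
    y->i = 2;
    return std::launder(reinterpret_cast<Y *>(b))->i;
}
```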

Melissa
Richard Hodges
2017-10-25 05:51:21 UTC
Permalink
Post by Myriachan
I just thought of something else: what if the aliasing rules were to
ignore classes entirely, and instead only dealt with primitive types?

I think that has a ring of sensibility to it.

Furthermore in the case of a union of a and b, the compiler already knows
that a and b live at the same address. It should treat them as just
different views of the same memory. The actual 'object' is the underlying
byte array holding the entire union. a and b are not 'objects' at all -
just shapes of memory access (or at least should be IMHO).

Noting the above example, the idea that placement new could return an x
that differs from &b seems to me to be just daft. If you can't
placement-new an X at &b then the compiler/runtime should barf at that
point - not just move the object.
Post by Myriachan
Post by Myriachan
Post by Richard Hodges
All we need is some rule such as "whenever a union is or could be
addressed through some other lens other than the one that was previously
written, all underlying bytes will have deemed to have been written, and
the next read object will be *as if* its corresponding bytes had been
written".
Then the union would be perfectly type-punnable and perfectly optimisable.
Actually, no, this is not perfectly optimizable. In fact, it invalidates
a whole class of profitable optimisations based on type-based alias
analysis. It's also harmful to other aspects of the language (eg, constant
expression evaluation cannot respect these rules in general).
What would be the right solution, then? The proposals from the compiler
writes' side so far have been to more or less remove the "common sequence"
rule from the language in favor of requiring that all such accesses go
through a union type. This would break a lot of system APIs and other
existing code without providing a good solution.
I just thought of something else: what if the aliasing rules were to
ignore classes entirely, and instead only dealt with primitive types? How
much would that break type-based aliasing analysis's ability to optimize?
The common subsequence rule would be implicit, because you're ultimately
reading using the correct type for what was written there before.
To me, that's what the rule ought to be for aliasing: if a memory location
is written as primitive type X, it must be read back as either cv X or cv
byte type (std::byte, char, unsigned char). (Placement new without
initialization would be considered a write of an indefinite value.) It
shouldn't matter whether a class was involved, nor the identity of said
classes.
struct X { ... int i; ... };
alignas(X) unsigned char b[sizeof(X)];
X *x = new(b) X;
x->i = 2;
assert(*reinterpret_cast<int *>(&b[offsetof(X, i)]) == 2);
...which most C++ programmers expect to work, but technically doesn't.
Melissa
'Edward Catmur' via ISO C++ Standard - Discussion
2017-10-25 09:42:44 UTC
Permalink
Post by Myriachan
I just thought of something else: what if the aliasing rules were to
ignore classes entirely, and instead only dealt with primitive types?

I think that has a ring of sensibility to it.

Furthermore in the case of a union of a and b, the compiler already knows
that a and b live at the same address. It should treat them as just
different views of the same memory. The actual 'object' is the underlying
byte array holding the entire union. a and b are not 'objects' at all -
just shapes of memory access (or at least should be IMHO).


Another opinion is that C++ has and should have a data model; that the type
of an object determines how its storage can be accessed. Byte-wise storage
access is still available as an escape hatch, differentiating C++ from
other high-level languages.


Noting the above example, the idea that placement new could return an x
that differs from &b seems to me to be just daft. If you can't
placement-new an X at &b then the compiler/runtime should barf at that
point - not just move the object.


The addresses x and b are the same. What this example demonstrates is that
you cannot access subobjects per se without going through the appropriate
access path via the complete object.
Post by Myriachan
Post by Myriachan
Post by Richard Hodges
All we need is some rule such as "whenever a union is or could be
addressed through some other lens other than the one that was previously
written, all underlying bytes will have deemed to have been written, and
the next read object will be *as if* its corresponding bytes had been
written".
Then the union would be perfectly type-punnable and perfectly optimisable.
Actually, no, this is not perfectly optimizable. In fact, it invalidates
a whole class of profitable optimisations based on type-based alias
analysis. It's also harmful to other aspects of the language (eg, constant
expression evaluation cannot respect these rules in general).
What would be the right solution, then? The proposals from the compiler
writes' side so far have been to more or less remove the "common sequence"
rule from the language in favor of requiring that all such accesses go
through a union type. This would break a lot of system APIs and other
existing code without providing a good solution.
I just thought of something else: what if the aliasing rules were to
ignore classes entirely, and instead only dealt with primitive types? How
much would that break type-based aliasing analysis's ability to optimize?
The common subsequence rule would be implicit, because you're ultimately
reading using the correct type for what was written there before.
To me, that's what the rule ought to be for aliasing: if a memory location
is written as primitive type X, it must be read back as either cv X or cv
byte type (std::byte, char, unsigned char). (Placement new without
initialization would be considered a write of an indefinite value.) It
shouldn't matter whether a class was involved, nor the identity of said
classes.
struct X { ... int i; ... };
alignas(X) unsigned char b[sizeof(X)];
X *x = new(b) X;
x->i = 2;
assert(*reinterpret_cast<int *>(&b[offsetof(X, i)]) == 2);
...which most C++ programmers expect to work, but technically doesn't.
Melissa
Richard Hodges
2017-10-25 10:30:34 UTC
Permalink
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Another opinion is that C++ has and should have a data model
Agreed
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
that the type of an object determines how its storage can be accessed
The type should aid the programmer in making correct and intuitive
decisions for the vast majority of cases, which the current c++ data model
does.
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Byte-wise storage access is still available as an escape hatch
It should be, but it's not really, is it? The only way to get defined behaviour is to memcpy from one imaginary object to another. The memcpy-is-really-bit-alias paradigm is verbose, difficult to teach, and creates programs whose source code is basically lying.
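For reference, the memcpy idiom in question as a minimal sketch (float_bits is an illustrative name; assumes a 32-bit float):

```cpp
#include <cstdint>
#include <cstring>

// Copying the object representation is the sanctioned way to read a
// float's bits; std::bit_cast wraps the same operation in C++20.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "assumes 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
```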

For example, what would be wrong with this model?

union U {
int a;
float b;
} u;

u.a = 1;
auto val = u.b; // get the float whose integer representation is binary 1

The compiler is absolutely in a position to determine that a and b are
aliases, and their bitwise configurations are the very same array of N
bits. The object is u, a and b are merely views of it.

Similarly

foo(u); // where foo is declared as extern void foo(U&)

Must surely cause the compiler to assume that the write of u.a *must* be
visible in the bits of u.b prior to and after the call, otherwise the call
might fail and any reads of u after the call might be invalid.

In summary, today's compilers have all the information necessary to know that u.a and u.b *are the same object*. It is counterintuitive (as is seen by the numerous mistakes made by beginners and intermediates) to mandate otherwise.







On 25 October 2017 at 11:42, 'Edward Catmur' via ISO C++ Standard -
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Post by Myriachan
I just thought of something else: what if the aliasing rules were to
ignore classes entirely, and instead only dealt with primitive types?
I think that has a ring of sensibility to it.
Furthermore in the case of a union of a and b, the compiler already knows
that a and b live at the same address. It should treat them as just
different views of the same memory. The actual 'object' is the underlying
byte array holding the entire union. a and b are not 'objects' at all -
just shapes of memory access (or at least should be IMHO).
Another opinion is that C++ has and should have a data model; that the
type of an object determines how its storage can be accessed. Byte-wise
storage access is still available as an escape hatch, differentiating C++
from other high-level languages.
Noting the above example, the idea that placement new could return an x
that differs from &b seems to me to be just daft. If you can't
placement-new an X at &b then the compiler/runtime should barf at that
point - not just move the object.
The addresses x and b are the same. What this example demonstrates is that
you cannot access subobjects per se without going through the appropriate
access path via the complete object.
Post by Myriachan
Post by Myriachan
Post by Richard Hodges
All we need is some rule such as "whenever a union is or could be
addressed through some other lens other than the one that was previously
written, all underlying bytes will have deemed to have been written, and
the next read object will be *as if* its corresponding bytes had been
written".
Then the union would be perfectly type-punnable and perfectly optimisable.
Actually, no, this is not perfectly optimizable. In fact, it
invalidates a whole class of profitable optimisations based on type-based
alias analysis. It's also harmful to other aspects of the language (eg,
constant expression evaluation cannot respect these rules in general).
What would be the right solution, then? The proposals from the compiler
writes' side so far have been to more or less remove the "common sequence"
rule from the language in favor of requiring that all such accesses go
through a union type. This would break a lot of system APIs and other
existing code without providing a good solution.
I just thought of something else: what if the aliasing rules were to
ignore classes entirely, and instead only dealt with primitive types? How
much would that break type-based aliasing analysis's ability to optimize?
The common subsequence rule would be implicit, because you're ultimately
reading using the correct type for what was written there before.
To me, that's what the rule ought to be for aliasing: if a memory
location is written as primitive type X, it must be read back as either cv
X or cv byte type (std::byte, char, unsigned char). (Placement new without
initialization would be considered a write of an indefinite value.) It
shouldn't matter whether a class was involved, nor the identity of said
classes.
struct X { ... int i; ... };
alignas(X) unsigned char b[sizeof(X)];
X *x = new(b) X;
x->i = 2;
assert(*reinterpret_cast<int *>(&b[offsetof(X, i)]) == 2);
...which most C++ programmers expect to work, but technically doesn't.
Melissa
'Edward Catmur' via ISO C++ Standard - Discussion
2017-10-25 11:01:45 UTC
Permalink
Post by Richard Hodges
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Another opinion is that C++ has and should have a data model
Agreed
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
that the type of an object determines how its storage can be accessed
The type should aid the programmer in making correct and intuitive
decisions for the vast majority of cases, which the current c++ data model
does.
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Byte-wise storage access is still available as an escape hatch
It should be, but it's not really, is it? The only way to get defined behaviour is to memcpy from one imaginary object to another. The memcpy-is-really-bit-alias paradigm is verbose, difficult to teach, and creates programs whose source code is basically lying.
Verbosity is easily solved with wrappers - one for load, one for store, one
for cast. I have never found the memcpy idiom difficult to explain, but
admittedly I am not an educator. I don't see the idiom as in any way
deceptive; copying an object representation seems to me an obvious way to
frame the operations.
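A sketch of such a load wrapper (bit_load is an illustrative name; std::bit_cast does the same job in C++20):

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// A thin memcpy that keeps the punning inside defined behaviour.
// The static_asserts restrict it to same-size trivially copyable types.
template <class To, class From>
To bit_load(const From &from) {
    static_assert(sizeof(To) == sizeof(From));
    static_assert(std::is_trivially_copyable_v<To> &&
                  std::is_trivially_copyable_v<From>);
    To to;
    std::memcpy(&to, &from, sizeof to);
    return to;
}
```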

Post by Richard Hodges
For example, what would be wrong with this model?
union U {
int a;
float b;
} u;
u.a = 1;
auto val = u.b; // get the float who's integer representation is binary 1
The issue is that the expression u.b is an lvalue of type float. That means that a reference to float can be bound to it, or a pointer to float formed to it, and there is in that reference or pointer no indication that an access to an inactive union member may be involved.
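The escape problem can be illustrated like this (assumed example; U2 is a made-up name): forming the pointer is well-defined and it aliases the whole union, but its type no longer records that fact.

```cpp
union U2 { int a; float b; };

// Taking &u.b is fine: a union member is pointer-interconvertible with
// the union, so p points at u's storage. But p's type is plain float*,
// so a later *p read looks like an ordinary float access to the
// optimizer, with no trace of the union left.
bool member_shares_address() {
    U2 u;
    u.a = 1;
    float *p = &u.b;
    return static_cast<void *>(p) == static_cast<void *>(&u);
}
```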

Post by Richard Hodges
The compiler is absolutely in a position to determine that a and b are aliases, and their bitwise configurations are the very same array of N bits. The object is u; a and b are merely views of it.
Sure. But once a reference or pointer to u.b is obtained, the compiler
loses that information.

Perhaps we need a syntax to declare union non-static data members that -
like bitfields - cannot be used to form references or pointers.
Post by Richard Hodges
Similarly
foo(u); // where foo is declared as extern void foo(U&)
Must surely cause the compiler to assume that the write of u.a *must* be
visible in the bits of u.b prior to and after the call, otherwise the call
might fail and any reads of u after the call might be invalid.
Yes, but the compiler is entitled to change its mind during link-time
optimization.

In summary, today's compilers have all the information necessary to know
Post by Richard Hodges
that u.a and u.v *are the same object*. It is counterintuitive (as is seen
by the numerous mistakes made by beginners and intermediates) to mandate
otherwise.
Richard Hodges
2017-10-25 11:23:27 UTC
Permalink
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
The issue is that the expression u.b is an lvalue of type float.
And here is the problem.

u::b's type should not be 'float'. It should be 'a float-like-interface on
an array of bytes that represents union { int; float; }'. The 'object' is
better viewed as 'u' - the bag of bits, not 'a' or 'b'. If we view it that
way, all aliasing issues go away. int x = u.a = u.b = 1.0; becomes
perfectly legal.

I appreciate we can do this with a custom class that wraps an aligned byte
buffer. In which case, back to my previous question - why have the union
keyword at all? It's obsolete as of C++11. Kill it and end the argument,
since its only valid use is as the storage for a non-template
discriminated union. std::/boost:: variant already covers that and,
tellingly, cannot be implemented with a union...
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Sure. But once a reference or pointer to u.b is obtained, the compiler
loses that information.

At the present time, since compilers today are programmed to (more or less)
meet the minimum expectations of the standard. We have already established
that I think the standard is short-changing us.
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Yes, but the compiler is entitled to change its mind during link-time
optimization.

If the compiler can carry sufficient contextual information to perform
link-time optimisations, it can carry the information to know that 'u' can
legally represent an int and a float at the same time.

I do appreciate that there are several million lines of code in the world
that argue against changing the behaviour of `union` (even though I'll
wager that roughly half those uses actually transgress the standard).



'Edward Catmur' via ISO C++ Standard - Discussion
2017-10-25 12:09:02 UTC
Permalink
Post by Richard Hodges
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
The issue is that the expression u.b is an lvalue of type float.
And here is the problem.
u::b's type should not be 'float'. It should be 'a float-like-interface on
an array of bytes that represents union { int; float; }'. The 'object' is
better viewed as 'u' - the bag of bits, not 'a' or 'b'. If we view it that
way, all aliasing issues go away. int x = u.a = u.b = 1.0; becomes
perfectly legal.
I appreciate we can do this with a custom class that wraps an aligned byte
buffer. In which case, back to my previous question - why have the union
keyword at all? It's obsolete as of c++11. Kill it and end the argument,
since it's only valid use is as the storage for a non-template
discriminated union. std::/boost:: variant already covers that and,
tellingly, cannot be implemented with a union...
Ah, no: std::variant can be implemented with a union; see, for example,
https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-v3/include/std/variant#L332
ff. Indeed, it must be implemented with a union for constexpr access to
work.
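As a hedged illustration of that last point (this is nowhere near the real libstdc++ implementation; `TinyVariant` and its members are invented), a union-backed alternative store is what lets access work in constant expressions, which a reinterpreted byte buffer cannot do:

```cpp
// Invented two-alternative variant; real implementations are far more general.
struct TinyVariant {
    union { int i; float f; };  // anonymous union supplies overlapped storage
    bool holds_int;

    constexpr TinyVariant(int v)   : i(v), holds_int(true) {}
    constexpr TinyVariant(float v) : f(v), holds_int(false) {}

    // Reading the active member is permitted in a constant expression;
    // a reinterpret_cast of a byte buffer would not be.
    constexpr int as_int() const { return i; }
};

constexpr TinyVariant tv{42};
static_assert(tv.as_int() == 42, "constexpr access through the union member");
```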
Post by Richard Hodges
Sure. But once a reference or pointer to u.b is obtained, the compiler
loses that information.
At the present time, since compilers today are programmed to (more or
less) meet the minimum expectations of the standard. We have already
established that I think the standard is short-changing us.
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Yes, but the compiler is entitled to change its mind during link-time
optimization.
If the compiler can carry sufficient contextual information to perform
link-time optimisations, it can carry the information to know that 'u' can
legally represent an int and a float at the same time.
Not if that information has been erased by the user. Consider:

union U { int i; float f; } u[2];
u[0].i = 42;
u[1].f = 3.14;
unsigned n = 0;
std::cin >> n;
float* p = &u[n % 2].f;

It is not reasonable to expect the compiler to retain and propagate the
information that indirecting p may or may not involve a union inactive
member access, depending on operator input.

Richard Hodges
2017-10-25 13:27:07 UTC
Permalink
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
Ah, no: std::variant can be implemented with a union
I stand corrected. Thank you.
'Edward Catmur' via ISO C++ Standard - Discussion
2017-10-25 09:29:15 UTC
Permalink
Post by Myriachan
Post by Richard Hodges
All we need is some rule such as "whenever a union is or could be
addressed through some lens other than the one that was previously
written, all underlying bytes will be deemed to have been written, and
the next object read will be *as if* its corresponding bytes had been
written".
Then the union would be perfectly type-punnable and perfectly optimisable.
Actually, no, this is not perfectly optimizable. In fact, it invalidates
a whole class of profitable optimisations based on type-based alias
analysis. It's also harmful to other aspects of the language (eg, constant
expression evaluation cannot respect these rules in general).
What would be the right solution, then? The proposals from the compiler
writers' side so far have been to more or less remove the "common initial
sequence" rule from the language in favor of requiring that all such
accesses go through a union type. This would break a lot of system APIs
and other existing code without providing a good solution.
I just thought of something else: what if the aliasing rules were to ignore
classes entirely, and instead only dealt with primitive types? How much
would that break type-based alias analysis's ability to optimize? The
common initial sequence rule would be implicit, because you're ultimately
reading using the correct type for what was written there before.


That would violate the principle that user-defined types (including and
especially class types) should have as close as possible the same behavior
and privileges as built-in types. It would also make it far too easy to
break class invariants.


To me, that's what the rule ought to be for aliasing: if a memory location
is written as primitive type X, it must be read back as either cv X or cv
byte type (std::byte, char, unsigned char). (Placement new without
initialization would be considered a write of an indefinite value.) It
shouldn't matter whether a class was involved, nor the identity of said
classes.

The rule would make something like this well-defined:

struct X { ... int i; ... };
alignas(X) unsigned char b[sizeof(X)];
X *x = new(b) X;
x->i = 2;
assert(*reinterpret_cast<int *>(&b[offsetof(X, i)]) == 2);

...which most C++ programmers expect to work, but technically doesn't.


However the equivalent (and barely any more verbose) code using memcpy to
load the int from its storage location is guaranteed to work.


Melissa
Richard Hodges
2017-10-25 09:37:16 UTC
Permalink
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
However the equivalent (and barely any more verbose) code using memcpy to
load the int from its storage location is guaranteed to work.

Are you saying that following the assignment x->i = 2;

memcpy(&some_int, &b[offsetof(X, i)], sizeof(int)); will copy the value 2
to some_int?

Is that to say that the presence of memcpy causes the compiler to 'flush'
all as-ifs to memory prior to the flow of control going over the memcpy?


'Edward Catmur' via ISO C++ Standard - Discussion
2017-10-25 10:30:54 UTC
Permalink
Post by 'Edward Catmur' via ISO C++ Standard - Discussion
However the equivalent (and barely any more verbose) code using memcpy
to load the int from its storage location is guaranteed to work.
Are you saying that following the assignment x->i = 2;
memcpy(&some_int, &b[offsetof(X, i)], sizeof(int)); will copy the value 2
to some_int?
Yes, that's right.
Is that to say that the presence of memcpy causes the compiler to 'flush'
all as-ifs to memory prior to the flow of control going over the memcpy?
Yes, but only with respect to the memcpy itself; that is, the memcpy will
see all lexically prior writes to that memory. It doesn't mean that e.g. a
subsequent read via a pointer to float will see those writes.
'Edward Catmur' via ISO C++ Standard - Discussion
2017-10-25 06:54:04 UTC
Permalink
Post by Richard Hodges
All we need is some rule such as "whenever a union is or could be
addressed through some lens other than the one that was previously
written, all underlying bytes will be deemed to have been written, and
the next object read will be *as if* its corresponding bytes had been
written".
Then the union would be perfectly type-punnable and perfectly optimisable.
Actually, no, this is not perfectly optimizable. In fact, it invalidates a
whole class of profitable optimisations based on type-based alias analysis.
It's also harmful to other aspects of the language (eg, constant expression
evaluation cannot respect these rules in general).
What would be the right solution, then? The proposals from the compiler
writers' side so far have been to more or less remove the "common initial
sequence" rule from the language in favor of requiring that all such
accesses go through a union type. This would break a lot of system APIs
and other existing code without providing a good solution.


Any such system APIs can be made to work by the vendor, or if not then the
vendor can forswear the appropriate transformations. The question is
whether there are a significant number of cross-platform third-party
sockaddr-style APIs.


Melissa
Richard Smith
2017-10-27 22:13:04 UTC
Permalink
Post by Myriachan
Post by Richard Hodges
All we need is some rule such as "whenever a union is or could be
addressed through some other lens other than the one that was previously
written, all underlying bytes will have deemed to have been written, and
the next read object will be *as if* its corresponding bytes had been
written".
Then the union would be perfectly type-punnable and perfectly optimisable.
Actually, no, this is not perfectly optimizable. In fact, it invalidates
a whole class of profitable optimisations based on type-based alias
analysis. It's also harmful to other aspects of the language (e.g., constant
expression evaluation cannot respect these rules in general).
What would be the right solution, then? The proposals from the compiler
writers' side so far have been to more or less remove the "common initial sequence"
rule from the language in favor of requiring that all such accesses go
through a union type. This would break a lot of system APIs and other
existing code without providing a good solution.
What compilers (GCC and Clang, at least) *actually* do right now is to
permit the aliasing if it's "sufficiently obvious" that the program is
doing it, and otherwise they believe that distinct types and distinct
access paths can never alias. That also appears to match up pretty well
with what users do and expect to work. If we could precisely specify what
"sufficiently obvious" means, then perhaps that would be a way forward.
That's not necessarily the best option, but it's at least something to
consider, and something that has worked out *mostly* OK in the real world.
Myriachan
2017-10-27 23:07:04 UTC
Permalink
Post by Richard Smith
Post by Myriachan
What would be the right solution, then? The proposals from the compiler
writers' side so far have been to more or less remove the "common initial sequence"
rule from the language in favor of requiring that all such accesses go
through a union type. This would break a lot of system APIs and other
existing code without providing a good solution.
What compilers (GCC and Clang, at least) *actually* do right now is to
permit the aliasing if it's "sufficiently obvious" that the program is
doing it, and otherwise they believe that distinct types and distinct
access paths can never alias. That also appears to match up pretty well
with what users do and expect to work. If we could precisely specify what
"sufficiently obvious" means, then perhaps that would be a way forward.
That's not necessarily the best option, but it's at least something to
consider, and something that has worked out *mostly* OK in the real world.
Combined with a standard way to override the aliasing rules, it would
work. Structures like sockaddr and OSVERSIONINFOW are going to exist, and
programmers aren't going to accept memcpy as the way to do it.

Compilers also permit the aliasing if the target of the call is completely
invisible, for example when the call crosses a module boundary into a system
API. That's one reason that sockaddr works.

Melissa
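The sockaddr pattern in question can be sketched like this (bind_ipv4 is a hypothetical helper of mine, illustrating the idiom rather than any particular API's requirements):

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* The classic pattern: build a sockaddr_in, pass it as sockaddr*.
   Formally, bind() then reads a sockaddr_in through a sockaddr lvalue;
   in practice it works because bind()'s body is behind the libc/kernel
   boundary, so the optimizer cannot exploit the aliasing. */
static int bind_ipv4(int fd, uint16_t port) {
    struct sockaddr_in sin;
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);               /* 0 = ephemeral port */
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return bind(fd, (struct sockaddr *)&sin, sizeof sin);
}
```

With LTO the "invisible body" assumption erodes, which is the worry expressed later in the thread.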
Richard Smith
2017-10-27 23:59:01 UTC
Permalink
Post by Myriachan
Post by Richard Smith
Post by Myriachan
What would be the right solution, then? The proposals from the compiler
writers' side so far have been to more or less remove the "common initial sequence"
rule from the language in favor of requiring that all such accesses go
through a union type. This would break a lot of system APIs and other
existing code without providing a good solution.
What compilers (GCC and Clang, at least) *actually* do right now is to
permit the aliasing if it's "sufficiently obvious" that the program is
doing it, and otherwise they believe that distinct types and distinct
access paths can never alias. That also appears to match up pretty well
with what users do and expect to work. If we could precisely specify what
"sufficiently obvious" means, then perhaps that would be a way forward.
That's not necessarily the best option, but it's at least something to
consider, and something that has worked out *mostly* OK in the real world.
Combined with a standard way to override the aliasing rules, it would
work. Structures like sockaddr and OSVERSIONINFOW are going to exist, and
programmers aren't going to accept memcpy as the way to do it.
Compilers also permit the aliasing if the target of the call is completely
invisible, such as the module boundary crossed with system APIs. That's
one reason that sockaddr works.
Yes, some kind of standard alias analysis barrier / "I am changing the
dynamic type of this memory in a way you can't see, but the byte values
persist" seems like a really good idea. (C++17 node handles also need that
functionality.) LTO is only going to get more prevalent, so it's likely
only a matter of time until things like sockaddr actually stop working in
practice, and we really need to have another option in place for when that
happens.
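Until such a barrier is standardized, the portable escape hatch is memcpy, which changes the effective type of the destination bytes and which compilers lower to a register move for small fixed sizes; a minimal sketch:

```c
#include <stdint.h>
#include <string.h>

/* Today's portable "alias barrier": memcpy gives the destination the
   effective type of the lvalue used for the copy (C11 6.5p6), and is
   well-defined in both C and C++. Modern compilers compile this to a
   single move, so there is no runtime cost. */
static uint32_t bits_of_float(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

The complaint in the thread is not that this is slow but that programmers find it unnatural compared to union member access.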
Dilip Ranganathan
2017-10-18 19:06:57 UTC
Permalink
I am a bit confused. Are you saying portable code cannot have UB by
definition?
Post by i***@gmail.com
Post by Hyman Rosen
Post by i***@gmail.com
Post by Hyman Rosen
Post by Nicol Bolas
If a language, "by design," creates lots of circumstances where useful
code looks like it will work one way, but in fact works another way or has
unpredictable results, that's a problem with the "design" of the language.
Just as a particularly relevant example, have a look at
<https://dxr.mozilla.org/mozilla-beta/source/js/src/dtoa.c>.
This is one version of David M. Gay's float/decimal conversion code.
(This code is ubiquitous - it's the foundation for strtod and dtoa variants
including in C and C++ standard libraries.)
Granted that it's in C, it nevertheless makes use of traditional union
punning to access parts of doubles as integers, and it uses memcpy
as part of variable length arrays, copying bytes using a pointer to the
middle of an object and going far beyond its apparent program-defined
end. It's full of undefined behavior, thoroughly utilizing the "bag of
bits"
model.
And it needs multiple ifdefs to work correctly. The standard library does not
need to obey C/C++ rules (look at std::vector, or node extraction from a map
changing the constness of a field);
it is the only place where all undefined things are defined.
User code can't do that, because you do not know which version of the
standard library you are using, and each can behave differently (even on the
same system from the same vendor, across versions).
One goal of C/C++ is portability, and this code is not portable at all;
with enough ifdefs you could run the "same" code in C# or any
other language.
What makes you think this code was compiled and tested as anything but
user code?
Which compilers document switches that say "allow undefined behavior for
standard library code"?
Why do you believe that the "bag of bits" model, augmented by
implementation information, is not portable?
C and C++ (and Fortran and...) programmers have been using the "bag of
bits" model for a good
half-century, and their code has behaved as they expected. Compilers
that make believe that code
does not do such things will silently break code that has perfectly
predictable behavior in the "bag
of bits" model. They do this at the behest of the optimizationists, who
do not care if such code breaks
as long as they can come up with examples where some other code can be
made to run faster. But
the optimizationists cannot even come up with a coherent description of
what is allowed by the language
("vector cannot be implemented in standard C++") without tying
themselves in knots.
There is no need for a switch or anything else in library code to disable
undefined behavior; it is simply NOT PORTABLE, and that is why it can do
anything it wants.
If you want your code to follow the same principle, then you too can ignore
everything the standard says about UB.
And again, `std::vector` is not a problem if it can't be implemented in C++,
because the library writer can use compiler intrinsics to do it;
the problem would be if anyone else couldn't do it without using intrinsics
and losing portability.
"Perfectly predictable behavior," but not portable: stay on an old compiler
and it will work; if you had used only what the language supports, then
upgrading compilers would not break your code.
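The dtoa.c idiom referred to above looks roughly like this (a simplified sketch of its word0/word1 access, assuming IEEE 754 doubles and little-endian word order, as dtoa's IEEE_8087 configuration does):

```c
#include <stdint.h>

/* Roughly the idiom behind dtoa.c's word0()/word1() macros: view a
   double as two 32-bit halves through a union. C blesses this kind of
   punning; a strict reading of C++ does not, which is one reason such
   code traditionally ships inside standard library implementations. */
typedef union { double d; uint32_t L[2]; } DoubleWords;

static uint32_t high_word(double d) {
    DoubleWords u;
    u.d = d;
    return u.L[1];   /* assumes little-endian word order */
}
```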
i***@gmail.com
2017-10-18 19:17:26 UTC
Permalink
Post by Dilip Ranganathan
I am a bit confused. Are you saying portable code cannot have UB by
definition?
Post by i***@gmail.com
There is no need for a switch or anything else in library code to disable
undefined behavior; it is simply NOT PORTABLE, and that is why it can do
anything it wants.
If you want your code to follow the same principle, then you too can ignore
everything the standard says about UB.
And again, `std::vector` is not a problem if it can't be implemented in C++,
because the library writer can use compiler intrinsics to do it;
the problem would be if anyone else couldn't do it without using intrinsics
and losing portability.
"Perfectly predictable behavior," but not portable: stay on an old compiler
and it will work; if you had used only what the language supports, then
upgrading compilers would not break your code.
If code relies on UB then yes, it is not portable code by definition; if it
is a bug, then it is simply buggy portable code that should be fixed.
Chris Hallock
2017-09-25 22:11:39 UTC
Permalink
Post by Myriachan
This question that "supercat" posted on Stack Overflow ran into an
https://stackoverflow.com/questions/46205744/is-this-use-of-unions-strictly-conforming/
struct s1 {unsigned short x;};
struct s2 {unsigned short x;};
union s1s2 { struct s1 v1; struct s2 v2; };
static int read_s1x(struct s1 *p) { return p->x; }
static void write_s2x(struct s2 *p, int v) { p->x=v;}
int test(union s1s2 *p1, union s1s2 *p2, union s1s2 *p3)
{
if (read_s1x(&p1->v1))
{
unsigned short temp;
temp = p3->v1.x;
p3->v2.x = temp;
write_s2x(&p2->v2,1234);
temp = p3->v2.x;
p3->v1.x = temp;
}
return read_s1x(&p1->v1);
}
int test2(int x)
{
union s1s2 q[2];
q->v1.x = 4321;
return test(q,q+x,q+x);
}
#include <stdio.h>
int main(void)
{
printf("%d\n",test2(0));
}
Both GCC and Clang in -fstrict-aliasing mode with optimizations are acting
as if they ran into undefined behavior, and return 4321 instead of the
expected 1234. This happens in both C and C++ mode. Intel C++ and Visual
C++ return the expected 1234. All four compilers hardwire the result as a
constant parameter to printf rather than call test2 or modify memory at
runtime.
From my reading of the C++ Standard, particularly [class.union]/5,
assignment expressions through a union member access changes the active
member of the union (if the union member has a trivial default constructor,
which it does here, being C code). Taking the address of p2->v2 and p1->v1
ought to be legal because those are the active members of the union at the
time their pointers are taken.
Is this a well-defined program, or is there subtle undefined behavior
happening here?
Melissa
Looks valid to me (for C++, at least).
Yubin Ruan
2017-10-27 13:00:40 UTC
Permalink
+Cc gcc-list.

Does any gcc developer have any comments?
Post by Myriachan
This question that "supercat" posted on Stack Overflow ran into an
https://stackoverflow.com/questions/46205744/is-this-use-of-unions-strictly-conforming/
struct s1 {unsigned short x;};
struct s2 {unsigned short x;};
union s1s2 { struct s1 v1; struct s2 v2; };
static int read_s1x(struct s1 *p) { return p->x; }
static void write_s2x(struct s2 *p, int v) { p->x=v;}
int test(union s1s2 *p1, union s1s2 *p2, union s1s2 *p3)
{
if (read_s1x(&p1->v1))
{
unsigned short temp;
temp = p3->v1.x;
p3->v2.x = temp;
write_s2x(&p2->v2,1234);
temp = p3->v2.x;
p3->v1.x = temp;
}
return read_s1x(&p1->v1);
}
int test2(int x)
{
union s1s2 q[2];
q->v1.x = 4321;
return test(q,q+x,q+x);
}
#include <stdio.h>
int main(void)
{
printf("%d\n",test2(0));
}
Both GCC and Clang in -fstrict-aliasing mode with optimizations are acting
as if they ran into undefined behavior, and return 4321 instead of the
expected 1234. This happens in both C and C++ mode. Intel C++ and Visual
C++ return the expected 1234. All four compilers hardwire the result as a
constant parameter to printf rather than call test2 or modify memory at
runtime.
From my reading of the C++ Standard, particularly [class.union]/5,
assignment expressions through a union member access changes the active
member of the union (if the union member has a trivial default constructor,
which it does here, being C code). Taking the address of p2->v2 and p1->v1
ought to be legal because those are the active members of the union at the
time their pointers are taken.
Is this a well-defined program, or is there subtle undefined behavior
happening here?
Melissa
Richard Biener
2017-10-27 08:54:48 UTC
Permalink
Post by Yubin Ruan
+Cc gcc-list.
Does any gcc developer have any comments?
See PR82224. The code is valid.
Post by Yubin Ruan
Post by Myriachan
This question that "supercat" posted on Stack Overflow ran into an
https://stackoverflow.com/questions/46205744/is-this-use-of-unions-strictly-conforming/
struct s1 {unsigned short x;};
struct s2 {unsigned short x;};
union s1s2 { struct s1 v1; struct s2 v2; };
static int read_s1x(struct s1 *p) { return p->x; }
static void write_s2x(struct s2 *p, int v) { p->x=v;}
int test(union s1s2 *p1, union s1s2 *p2, union s1s2 *p3)
{
if (read_s1x(&p1->v1))
{
unsigned short temp;
temp = p3->v1.x;
p3->v2.x = temp;
write_s2x(&p2->v2,1234);
temp = p3->v2.x;
p3->v1.x = temp;
}
return read_s1x(&p1->v1);
}
int test2(int x)
{
union s1s2 q[2];
q->v1.x = 4321;
return test(q,q+x,q+x);
}
#include <stdio.h>
int main(void)
{
printf("%d\n",test2(0));
}
Both GCC and Clang in -fstrict-aliasing mode with optimizations are acting
as if they ran into undefined behavior, and return 4321 instead of the
expected 1234. This happens in both C and C++ mode. Intel C++ and Visual
C++ return the expected 1234. All four compilers hardwire the result as a
constant parameter to printf rather than call test2 or modify memory at
runtime.
From my reading of the C++ Standard, particularly [class.union]/5,
assignment expressions through a union member access changes the active
member of the union (if the union member has a trivial default constructor,
which it does here, being C code). Taking the address of p2->v2 and p1->v1
ought to be legal because those are the active members of the union at the
time their pointers are taken.
Is this a well-defined program, or is there subtle undefined behavior
happening here?
Melissa
Myriachan
2017-10-27 18:47:03 UTC
Permalink
Post by Richard Biener
Post by Yubin Ruan
+Cc gcc-list.
Does any gcc developer have any comments?
See PR82224. The code is valid.
I was about to put a link to this thread on GCC PR82224, but it looks like
Yubin beat me to it =^-^= I would also recommend reading the Clang bug
tracker thread, for the interesting points brought up there.

I feel that this union issue - and the related common initial sequence rule -
are important to the long-term future of C and C++. The answer to the
problem will have big effects on systems programmers.

Melissa
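For reference, a minimal sketch of the common initial sequence guarantee at issue (C11 6.5.2.3p6, and the corresponding [class.mem] rule in C++; the type names here are mine):

```c
/* s1-style common initial sequence: both structs begin with compatible
   members, so while a union containing them is visible, either view may
   inspect the shared leading part. */
struct tagged_int   { int tag; int value; };
struct tagged_float { int tag; float value; };
union  tagged       { struct tagged_int i; struct tagged_float f; };

static int get_tag(union tagged *t) {
    /* Reading .tag through either member is sanctioned because the
       access happens where the union type itself is visible. */
    return t->i.tag;
}
```

The dispute in this thread is precisely about how far that visibility requirement reaches once member pointers are passed to other functions.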