Discussion:
std::promise and future could have some lock-free or non-blocking properties
(too old to reply)
Itaj Sherman
2018-11-08 01:32:27 UTC
Permalink
Just as the atomics expose an is_lock_free property for some operations, because
such support is expected on many systems,
I wonder: what about std::promise/future?

I'd like to have:
lock-free promise::set_value()
lock-free future::valid()
if a future::get() invocation strongly-happens-after promise::set_value()
returned, it could be lock-free

Isn't it a reasonable expectation that the standard promise/future be
implemented this way on some systems?
And that I could check that with some is_lock_free property.

Or else, I should be able to implement such things myself using more
primitive building blocks from the standard library - but such blocks do
not currently exist.
E.g. it's not possible to implement such a "semi lock-free" promise/future
with just atomics+mutex+condition_variable, not without some busy spinning
in future.get() or somewhere.

I hope I'm not abusing the definition of "lock-free" somehow; if I am, I
guess I just mean non-blocking.
I'm especially concerned about threads in future.get() that lock a mutex
and happen to get preempted; they then cause other threads in
promise::set_value() to block too, unnecessarily.

BTW, there's a somewhat similar problem implementing, in standard C++, a
semaphore (like Linux's) that would have a lock-free post() and a lock-free
try_wait().
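As an illustration of that semaphore shape, a hypothetical sketch where post() and try_wait() touch only an atomic counter, so both are lock-free. The blocking wait() slow path is omitted because it would need exactly the futex-like primitive the standard lacks; post() would additionally issue a wake when it knows a waiter is parked.

```cpp
#include <atomic>

// Hypothetical semaphore with lock-free post() and try_wait().
class lf_semaphore {
    std::atomic<int> count_;
public:
    explicit lf_semaphore(int initial = 0) : count_(initial) {}

    void post() {
        count_.fetch_add(1, std::memory_order_release);  // lock-free
        // slow path (omitted): wake one blocked waiter, if any are parked
    }

    bool try_wait() {
        int c = count_.load(std::memory_order_relaxed);
        while (c > 0) {
            if (count_.compare_exchange_weak(c, c - 1,
                                             std::memory_order_acquire))
                return true;                             // got a token, lock-free
        }
        return false;                                    // counter was 0
    }
};
```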
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
Itaj Sherman
2018-11-11 00:58:25 UTC
Permalink
Post by Itaj Sherman
Like other atomics have is_lock_free property for some operations, because
there are expectations for such support on many systems,
I wonder what about std::promise/future?
lock-free promise::set_value()
lock-free future::valid()
if a future::get() invocation strongly-happens-after promise::set_value()
returned, it could be lock-free
Isn't it a reasonable expectation that the standard promise/future be
implemented this way on some systems?
And that I could check that with some is_lock_free property.
Or else, that I should be able to implement such things by myself using
more primitive blocks from that standard library - such do not currently
exist.
E.g. it's not possible to just use atomics+mutex+condition_variable to
implement such a "semi lockfree" promise/future, not without some busy
spinning in future.get() or somewhere.
I hope I'm not abusing the definition of "lock-free" somehow, otherwise I
guess I just mean non-blocking.
I'm especially concerned about threads in future.get() that lock a mutex
and happen to get preempted, then they cause other threads in
promise::set_value() to block too, unnecessarily.
BTW there's a somewhat similar problem implementing in standard c++, a
semaphore (like linux) that would have a lock-free post(), and lock-free
try_wait().
To clarify further why I am asking this:

One could implement such a promise/future by first doing an atomic
compare_exchange on a flag - promise::set_value() would mark it completed,
and future::get/wait would mark that some threads are waiting on the future.
Only when they have to do they then use the scheduler for blocking and
notifying (futex on Linux, Event on Windows).
On Linux it's possible to use sem_post/sem_wait directly, because their
implementation provides the user-mode compare_exchange fast path.

Presumably most systems have support for some mechanism that would enable
this.
However, mutex+condition_variable cannot be used for this, because they
don't remember the notify call if the thread that is going to wait has not
entered the wait yet - or else they force you to always lock the mutex on
the notify side.

So my question boils down to: why does the standard, on the one hand, bother
to define mechanisms like mutex and condition_variable that have these
weaknesses, yet on the other hand neither defines a semaphore nor even
mentions, for promise/future, the possibility of a lock-free fast path?
Tony V E
2018-11-11 06:14:15 UTC
Permalink
Do you think it is currently possible for an implementation to make it
lock-free in the way you suggest? I.e. without changing the current API?

If so, do you know if the implementation you use already does this?

Sent from my BlackBerry portable Babbage Device

*From: *Itaj Sherman
*Sent: *Saturday, November 10, 2018 4:58 PM
*To: *ISO C++ Standard - Discussion
*Reply To: *std-***@isocpp.org
*Subject: *[std-discussion] Re: std::promise and future could have some
lock-free or non-blocking properties
Itaj Sherman
2018-11-11 10:36:25 UTC
Permalink
libstdc++'s promise/future uses std::call_once, which uses pthread_once,
which is implemented that way - a fast-path atomic, then a futex.
boost.thread uses mutex+condition_variable.
MSVC 2017 uses mutex+condition_variable.

Maybe I lack some knowledge, but I can't figure out why they don't do a
fast path. How could it ever be better to always lock a mutex in
set_value?!

So I suppose my question is more about whether there's a specific reason
the standard doesn't mention it or provide some traits - or whether it just
wasn't important enough for anybody to propose.

E.g. I read somewhere that there were specific reasons not to include
semaphores in the standard, because they're complicated/risky to use. And I
can understand that: in MPMC use they cannot provide the memory ordering
for the data they're supposed to protect (say, if you take a lock-free
queue and couple it with a semaphore to implement a blocking pop). But
future/promise protects its result safely with release/acquire.
Post by Tony V E
Do you think it is currently possible for an implementation to make it
lock-free in the way you suggest?‎ ie without changing the current API?
If so, do you know if the implementation you use already does this?
Sent from my BlackBerry portable Babbage Device
*From: *Itaj Sherman
*Sent: *Saturday, November 10, 2018 4:58 PM
*To: *ISO C++ Standard - Discussion
*Subject: *[std-discussion] Re: std::promise and future could have some
lock-free or non-blocking properties
Post by Itaj Sherman
Like other atomics have is_lock_free property for some operations,
because there are expectations for such support on many systems,
I wonder what about std::promise/future?
lock-free promise::set_value()
lock-free future::valid()
if a future::get() invocation strongly-happens-after promise::set_value()
returned, it could be lock-free
Isn't it a reasonable expectation that the standard promise/future be
implemented this way on some systems?
And that I could check that with some is_lock_free property.
Or else, that I should be able to implement such things by myself using
more primitive blocks from that standard library - such do not currently
exist.
E.g. it's not possible to just use atomics+mutex+condition_variable to
implement such a "semi lockfree" promise/future, not without some busy
spinning in future.get() or somewhere.
I hope I'm not abusing the definition of "lock-free" somehow, otherwise I
guess I just mean non-blocking.
I'm especially concerned about threads in future.get() that lock a mutex
and happen to get preempted, then they cause other threads in
promise::set_value() to block too, unnecessarily.
BTW there's a somewhat similar problem implementing in standard c++, a
semaphore (like linux) that would have a lock-free post(), and lock-free
try_wait().
To clarify further the reason why I ask this,
One could implement such promise/future by doing atomic compare_exchange
on a flag first - promise::set_value() would mark it completed,
future::get/wait would mark that some threads are waiting on the future.
Then only if they have to they use the scheduler for blocking and notifying
(futex on linux, Event on Windows).
On linux it’s possible to use directly sem_post/wait because it implements
the user mode compare_exchange fast path.
Presumably most systems would have support for some mechanism that would
enable this.
However, mutex+condition_variable cannot be used for this because they
don’t remember the notify call if the thread that is going to wait did not
enter the wait yet - or else they force you to always lock the mutex on the
notify side.
So my question boils down to: Why does the standard on one hand bother to
define mechanisms like mutex and condition_variable that have these
weaknesses, but on the other hand does not define semaphore, and even in
promise/future avoids mentioning the possibility of lockfree fast path.
Itaj Sherman
2018-11-11 10:48:18 UTC
Permalink
Post by Itaj Sherman
libstdc++ promise/future uses std::call_once that uses pthread_once which
is implemented like that - fast path atomic and then futex
Well, libstdc++'s promise/future does use call_once, but it also uses a
separate fast-path atomic+futex for the notifying and blocking. The fact
that it uses call_once is actually not related to my point.
Post by Itaj Sherman
boost.thread is mutex+condition_variable.
MSVC 2017 is mutex+condition_variable.
Maybe I lack some knowledge, but I can't figure out why they don't do a
fast path. How could it ever be better to just always lock a mutex in
set_value?!
So, I suppose my question is more about whether there's a specific reason
that the standard doesn't mention it or provides some traits - or it just
wasn't important enough for anybody to suggest.
e.g. I read somewhere that they had specific reasons not to include
semaphores in the standard because they're complicated/risky to use. And I
can understand that, because in MPMC they cannot provide memory ordering
for the data they're supposed to protect (like if you take a lock-free
queue and couple it with a semaphore to implement a blocking pop). But
future/promise protects its result safely with release/acquire.
Post by Tony V E
Do you think it is currently possible for an implementation to make it
lock-free in the way you suggest?‎ ie without changing the current API?
If so, do you know if the implementation you use already does this?
Sent from my BlackBerry portable Babbage Device
*From: *Itaj Sherman
*Sent: *Saturday, November 10, 2018 4:58 PM
*To: *ISO C++ Standard - Discussion
*Subject: *[std-discussion] Re: std::promise and future could have some
lock-free or non-blocking properties
Post by Itaj Sherman
Like other atomics have is_lock_free property for some operations,
because there are expectations for such support on many systems,
I wonder what about std::promise/future?
lock-free promise::set_value()
lock-free future::valid()
if a future::get() invocation strongly-happens-after
promise::set_value() returned, it could be lock-free
Isn't it a reasonable expectation that the standard promise/future be
implemented this way on some systems?
And that I could check that with some is_lock_free property.
Or else, that I should be able to implement such things by myself using
more primitive blocks from that standard library - such do not currently
exist.
E.g. it's not possible to just use atomics+mutex+condition_variable to
implement such a "semi lockfree" promise/future, not without some busy
spinning in future.get() or somewhere.
I hope I'm not abusing the definition of "lock-free" somehow, otherwise
I guess I just mean non-blocking.
I'm especially concerned about threads in future.get() that lock a mutex
and happen to get preempted, then they cause other threads in
promise::set_value() to block too, unnecessarily.
BTW there's a somewhat similar problem implementing in standard c++, a
semaphore (like linux) that would have a lock-free post(), and lock-free
try_wait().
To clarify further the reason why I ask this,
One could implement such promise/future by doing atomic compare_exchange
on a flag first - promise::set_value() would mark it completed,
future::get/wait would mark that some threads are waiting on the future.
Then only if they have to they use the scheduler for blocking and notifying
(futex on linux, Event on Windows).
On linux it’s possible to use directly sem_post/wait because it
implements the user mode compare_exchange fast path.
Presumably most systems would have support for some mechanism that would
enable this.
However, mutex+condition_variable cannot be used for this because they
don’t remember the notify call if the thread that is going to wait did not
enter the wait yet - or else they force you to always lock the mutex on the
notify side.
So my question boils down to: Why does the standard on one hand bother to
define mechanisms like mutex and condition_variable that have these
weaknesses, but on the other hand does not define semaphore, and even in
promise/future avoids mentioning the possibility of lockfree fast path.
Itaj Sherman
2018-11-11 14:20:21 UTC
Permalink
I’m sorry, I should explain what I said in more detail.

I wanted to say that libstdc++ does implement future/promise lock-free in
the cases where they don't have to block: promise.set_value() is always
lock-free, and future.get()/wait() are lock-free if they
strongly-happen-after the return from set_value(). They are lock-free
because the implementation uses a futex rather than a condition_variable.
It's also very fast, because it first does a fast-path test with an atomic
load, then a compare_exchange, and only then uses the futex.

futex_wake, unlike a mutex+condition_variable notify, lets set_value()
stay lock-free even when it actually does need to wake a thread blocked on
the futex inside future.get(): futex_wake does not lock a mutex; it only
uses atomic spinlocks in kernel mode with preemption disabled.

I also wanted to mention that call_once is implemented with a lock-free
fast path - that is, any call_once that doesn't run concurrently with other
call_once calls is lock-free, whether it's the active or a passive one. By
"not concurrently" I mean that all other call_once calls either
strongly-happen-before or strongly-happen-after it.

The fact that call_once is also used inside promise.set_value() is not
directly related to the issue of future/promise.

The boost and MSVC implementations use mutex+condition_variable. They
always lock the mutex, on every set_value() and every future.get(). Even if
they added a fast-path atomic variable in front, they would still have to
lock the mutex in set_value() around the notify call, in case threads are
waiting.
Post by Itaj Sherman
libstdc++ promise/future uses std::call_once that uses pthread_once which
is implemented like that - fast path atomic and then futex
boost.thread is mutex+condition_variable.
MSVC 2017 is mutex+condition_variable.
Maybe I lack some knowledge, but I can't figure out why they don't do a
fast path. How could it ever be better to just always lock a mutex in
set_value?!
So, I suppose my question is more about whether there's a specific reason
that the standard doesn't mention it or provides some traits - or it just
wasn't important enough for anybody to suggest.
e.g. I read somewhere that they had specific reasons not to include
semaphores in the standard because they're complicated/risky to use. And I
can understand that, because in MPMC they cannot provide memory ordering
for the data they're supposed to protect (like if you take a lock-free
queue and couple it with a semaphore to implement a blocking pop). But
future/promise protects its result safely with release/acquire.
Post by Tony V E
Do you think it is currently possible for an implementation to make it
lock-free in the way you suggest?‎ ie without changing the current API?
If so, do you know if the implementation you use already does this?
Sent from my BlackBerry portable Babbage Device
*From: *Itaj Sherman
*Sent: *Saturday, November 10, 2018 4:58 PM
*To: *ISO C++ Standard - Discussion
*Subject: *[std-discussion] Re: std::promise and future could have some
lock-free or non-blocking properties
Post by Itaj Sherman
Like other atomics have is_lock_free property for some operations,
because there are expectations for such support on many systems,
I wonder what about std::promise/future?
lock-free promise::set_value()
lock-free future::valid()
if a future::get() invocation strongly-happens-after
promise::set_value() returned, it could be lock-free
Isn't it a reasonable expectation that the standard promise/future be
implemented this way on some systems?
And that I could check that with some is_lock_free property.
Or else, that I should be able to implement such things by myself using
more primitive blocks from that standard library - such do not currently
exist.
E.g. it's not possible to just use atomics+mutex+condition_variable to
implement such a "semi lockfree" promise/future, not without some busy
spinning in future.get() or somewhere.
I hope I'm not abusing the definition of "lock-free" somehow, otherwise
I guess I just mean non-blocking.
I'm especially concerned about threads in future.get() that lock a mutex
and happen to get preempted, then they cause other threads in
promise::set_value() to block too, unnecessarily.
BTW there's a somewhat similar problem implementing in standard c++, a
semaphore (like linux) that would have a lock-free post(), and lock-free
try_wait().
To clarify further the reason why I ask this,
One could implement such promise/future by doing atomic compare_exchange
on a flag first - promise::set_value() would mark it completed,
future::get/wait would mark that some threads are waiting on the future.
Then only if they have to they use the scheduler for blocking and notifying
(futex on linux, Event on Windows).
On linux it’s possible to use directly sem_post/wait because it
implements the user mode compare_exchange fast path.
Presumably most systems would have support for some mechanism that would
enable this.
However, mutex+condition_variable cannot be used for this because they
don’t remember the notify call if the thread that is going to wait did not
enter the wait yet - or else they force you to always lock the mutex on the
notify side.
So my question boils down to: Why does the standard on one hand bother to
define mechanisms like mutex and condition_variable that have these
weaknesses, but on the other hand does not define semaphore, and even in
promise/future avoids mentioning the possibility of lockfree fast path.
Bryce Adelstein Lelbach aka wash
2018-11-12 04:27:40 UTC
Permalink
I don't think we should invest in std::future/std::promise when we are
actively designing their replacement (senders and receivers).
Post by Itaj Sherman
I’m sorry, I should explain what I said in more details.
I wanted to say that libstdc++ does implement future/promise lock-free in
cases where they don’t have to block: promise.set_value() is always
lock-free. future.get()/wait() are lock-free if they strongly-happen-after
the return from set_value(). It is lock-free because they use a futex
rather than condition_variable. It’s also very fast because first they do a
fast path test of an atomic load, and then compare_exchange, and only then
use the futex.
The futex_wake, unlike mutex+condition_variable notify, enables the
set_value() to be lock-free, even if a thread does actually need to waken a
thread on the futex that is in future.get(). futex_wake does not lock a
mutex, it only uses atomic spinlocks in kernel mode with disabled
preemption.
I also wanted to mention that they implement call_once with a fast path
lock-free - that is, any call_once that doesn’t run concurrently with other
call_once calls is lock-free, whether it is an active or passive one. By
not-concurrently I mean all other call_once calls either
strongly-happen-before or after it.
The fact that they also use call_once inside promise.set_value() is not
directly related the issue of future/promise.
boost and MSVC implementations use mutex+condition_variable. They always
lock the mutex on every set_value() and every future.get(). Even if they
added a fast path atomic variable before it, they would still have to lock
the mutex in set_value() around the call to notify, in case there are
threads waiting.
Post by Itaj Sherman
libstdc++ promise/future uses std::call_once that uses pthread_once which
is implemented like that - fast path atomic and then futex
boost.thread is mutex+condition_variable.
MSVC 2017 is mutex+condition_variable.
Maybe I lack some knowledge, but I can't figure out why they don't do a
fast path. How could it ever be better to just always lock a mutex in
set_value?!
So, I suppose my question is more about whether there's a specific reason
that the standard doesn't mention it or provides some traits - or it just
wasn't important enough for anybody to suggest.
e.g. I read somewhere that they had specific reasons not to include
semaphores in the standard because they're complicated/risky to use. And I
can understand that, because in MPMC they cannot provide memory ordering
for the data they're supposed to protect (like if you take a lock-free
queue and couple it with a semaphore to implement a blocking pop). But
future/promise protects its result safely with release/acquire.
Post by Tony V E
Do you think it is currently possible for an implementation to make it
lock-free in the way you suggest?‎ ie without changing the current API?
If so, do you know if the implementation you use already does this?
Sent from my BlackBerry portable Babbage Device
*From: *Itaj Sherman
*Sent: *Saturday, November 10, 2018 4:58 PM
*To: *ISO C++ Standard - Discussion
*Subject: *[std-discussion] Re: std::promise and future could have some
lock-free or non-blocking properties
Post by Itaj Sherman
Like other atomics have is_lock_free property for some operations,
because there are expectations for such support on many systems,
I wonder what about std::promise/future?
lock-free promise::set_value()
lock-free future::valid()
if a future::get() invocation strongly-happens-after
promise::set_value() returned, it could be lock-free
Isn't it a reasonable expectation that the standard promise/future be
implemented this way on some systems?
And that I could check that with some is_lock_free property.
Or else, that I should be able to implement such things by myself using
more primitive blocks from that standard library - such do not currently
exist.
E.g. it's not possible to just use atomics+mutex+condition_variable to
implement such a "semi lockfree" promise/future, not without some busy
spinning in future.get() or somewhere.
I hope I'm not abusing the definition of "lock-free" somehow, otherwise
I guess I just mean non-blocking.
I'm especially concerned about threads in future.get() that lock a
mutex and happen to get preempted, then they cause other threads in
promise::set_value() to block too, unnecessarily.
BTW there's a somewhat similar problem implementing in standard c++, a
semaphore (like linux) that would have a lock-free post(), and lock-free
try_wait().
To clarify further the reason why I ask this:
One could implement such a promise/future by doing an atomic compare_exchange
on a flag first - promise::set_value() would mark it completed, and
future::get/wait would mark that some threads are waiting on the future.
Only if they have to would they then use the scheduler for blocking and
notifying (a futex on Linux, an Event on Windows).
On Linux it's possible to use sem_post/wait directly, because it
implements the user-mode compare_exchange fast path.
Presumably most systems would have support for some mechanism that would
enable this.
However, mutex+condition_variable cannot be used for this, because they
don't remember the notify call if the thread that is going to wait has not
entered the wait yet - or else they force you to always lock the mutex on the
notify side.
So my question boils down to: why does the standard on one hand bother
to define mechanisms like mutex and condition_variable that have these
weaknesses, but on the other hand does not define a semaphore, and even for
promise/future avoids mentioning the possibility of a lock-free fast path?
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
To post to this group, send email to std-***@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
Itaj Sherman
2018-11-14 01:37:33 UTC
Permalink
Post by Bryce Adelstein Lelbach aka wash
I don't think we should invest in std::future/std::promise when we are
actively designing their replacement (senders and receivers).
Certainly. Something redesigned from scratch could answer my question much
better than trying to fix future/promise.

Basically, what I was asking is why we don't have a standard way to
notify/wake another thread without explicitly locking a mutex.
I wasn't connecting the dots before, but now I see that atomic_wait and
atomic_notify are actually equivalent to the Linux futex.

Is there any written draft for these senders and receivers, or other
information?
Post by Bryce Adelstein Lelbach aka wash
Post by Itaj Sherman
I’m sorry, I should explain what I said in more detail.
I wanted to say that libstdc++ does implement future/promise lock-free in
the cases where they don’t have to block: promise.set_value() is always
lock-free, and future.get()/wait() are lock-free if they strongly-happen-after
the return from set_value(). It is lock-free because they use a futex
rather than a condition_variable. It’s also very fast: first they do a
fast-path test with an atomic load, then a compare_exchange, and only then
use the futex.
futex_wake, unlike a mutex+condition_variable notify, allows set_value()
to be lock-free even when it does actually need to wake a thread that is
blocked on the futex inside future.get(). futex_wake does not lock a
mutex; it only uses atomic spinlocks in kernel mode with preemption
disabled.
I also wanted to mention that they implement call_once with a fast path
lock-free - that is, any call_once that doesn’t run concurrently with other
call_once calls is lock-free, whether it is an active or passive one. By
not-concurrently I mean all other call_once calls either
strongly-happen-before or after it.
The fact that they also use call_once inside promise.set_value() is not
directly related to the issue of future/promise.
boost and MSVC implementations use mutex+condition_variable. They always
lock the mutex on every set_value() and every future.get(). Even if they
added a fast path atomic variable before it, they would still have to lock
the mutex in set_value() around the call to notify, in case there are
threads waiting.
Post by Itaj Sherman
libstdc++ promise/future uses std::call_once, which uses pthread_once,
which is implemented like that - a fast-path atomic and then a futex.
boost.thread is mutex+condition_variable.
MSVC 2017 is mutex+condition_variable.
Maybe I lack some knowledge, but I can't figure out why they don't do a
fast path. How could it ever be better to just always lock a mutex in
set_value?!
So, I suppose my question is more about whether there's a specific
reason that the standard doesn't mention it or provides some traits - or it
just wasn't important enough for anybody to suggest.
e.g. I read somewhere that they had specific reasons not to include
semaphores in the standard because they're complicated/risky to use. And I
can understand that, because in MPMC they cannot provide memory ordering
for the data they're supposed to protect (like if you take a lock-free
queue and couple it with a semaphore to implement a blocking pop). But
future/promise protects its result safely with release/acquire.
Post by Tony V E
Do you think it is currently possible for an implementation to make it
lock-free in the way you suggest?‎ ie without changing the current API?
If so, do you know if the implementation you use already does this?
Bryce Adelstein Lelbach aka wash
2018-11-14 03:50:13 UTC
Permalink
P1194
Bryce Adelstein Lelbach aka wash
2018-11-14 03:50:35 UTC
Permalink
Rather, wg21.link/P1194

Itaj Sherman
2018-11-14 22:37:02 UTC
Permalink
Wow, very nice!!!

I have a use case that I'm not able to figure out how to implement using
sender/receiver without incurring an extra allocation.

Because you always split the execute_???() calls from submit(), even if they
happen on the same line, I don't see how to implement the following example.

Unless you add an execute_???_and_submit( ..., receiver ).

If I have an executor implementation that supports the following usage
schemes, I'm supposed to be able to wrap the same algorithm with the
sender/receiver API without incurring any further performance cost on any
of the usage schemes.

This is a simple eager executor with worker threads and a task queue.

The point here is that the tasks pushed onto the queue are initialized with a
pointer to the location where the result has to be written - so they must
be given that pointer before being pushed onto the queue; otherwise further
synchronization would be required to provide it to them.

Or else, is the intention that the executor implementation must also
maintain its own ready-results container, and put iterators into it inside
sender objects?!

use cases:

user1 - wants to allocate a buffer for the result before calling execute,
e.g. on its stack.

user2 - does not want to block immediately, and doesn't have a place to
allocate the result buffer, so it must be allocated with some allocator.

The implementation for the first user can avoid all dynamic allocations;
the second cannot - it must have at least an atomic flag allocated
somewhere.

I don't know if you intended to support user1 being able to avoid
allocation, but I think you can't do that with the current API, which
always splits execute_???() from submit().

Here's the code for how it would be implemented from scratch, and below,
how it might look with sender/receiver.

// This is much like a shared state for type T.
template< typename T >
class synchronized_result
{
private:
    atomic_int_fast_wait_t atomic_flag_;  // waitable atomic (as proposed for atomic_wait)
    std::optional< T > result_;
    std::exception_ptr exception_;
public:
    ~synchronized_result()
    {
        this->wait(); /* must ensure that an asynchronous execution completed,
                         because it's holding a pointer */
    }
    T get();
    void wait();
    void set_value(T);
    template< typename E >
    void set_exception(E);
};


class executor_with_worker_threads
{
private:
    std::vector< std::thread > worker_threads_;
    thread_safe_queue< std::function<void()> > task_queue_;
public:
    template< typename Result, typename Func >
    void execute(Func func, synchronized_result<Result>* sync_res)
    {
        /* The user is responsible for keeping sync_res valid until the
           execution completes. This is not dangerous if the class is
           wrapped correctly. */
        task_queue_.push([res = sync_res, f = std::move(func)] {
            try {
                res->set_value(f());
            }
            catch (...) {
                res->set_exception(std::current_exception());
            }
        });
    }
};



void user_1()
{
    synchronized_result<int> result1; // not allowed to move this until the execution completes
    my_executor.execute([] { return some_calculation1(); }, &result1);

    synchronized_result<int> result2; // not allowed to move this until the execution completes
    my_executor.execute([] { return some_calculation2(); }, &result2);

    std::cout << result1.get() << result2.get();
}

auto user_2()
{
    auto result_handle = std::make_shared< synchronized_result<int> >();
    my_executor.execute([] { return some_calculation(); }, &*result_handle);
    return result_handle;
}


Presumably, with the sender/receiver API it's supposed to look like this:

void user_1()
{
    receiver<int> result1;
    executor.make_value_task([] { return some_calculation1(); }).submit(result1);

    receiver<int> result2;
    executor.make_value_task([] { return some_calculation2(); }).submit(result2);

    std::cout << result1.get() << result2.get();
}

auto user_2()
{
    auto sender = executor.make_value_task([] { return some_calculation2(); });
    return sender; /* however, this must hold a pointer to an actual buffer
                      that must have been already allocated somewhere */
}

The call to make_value_task() does not know when submit() is going to be
called, so it must allocate a buffer for the result.

You cannot implement user_1 with sender/receiver without an extra
allocation. You would need some sort of make_value_task_and_submit().

void user_1()
{
    receiver<int> result1;
    executor.make_value_task_and_submit([] { return some_calculation1(); },
                                        result1);

    receiver<int> result2;
    executor.make_value_task_and_submit([] { return some_calculation2(); },
                                        result2);

    std::cout << result1.get() << result2.get();
}


On Wednesday, 14 November 2018 05:50:48 UTC+2, Bryce Adelstein Lelbach
Post by Bryce Adelstein Lelbach aka wash
Rather, wg21.link/P1194
--
---
You received this message because you are subscribed to the Google
Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send
.
Visit this group at
https://groups.google.com/a/isocpp.org/group/std-discussion/.
Itaj Sherman
2018-11-14 23:00:16 UTC
Permalink
It might seem weird that I would worry about a single allocation when I
already use a queue and threads polling it.
I'm thinking about executors that might behave very lightweight in
certain cases. Suppose, for example, that in some cases the executor realizes
the task is very small and wants to execute it in the local thread, inside make_value_task().
Then it would have to allocate a buffer to store the result.
Post by Itaj Sherman
Wow, very nice!!!
I have a use case that I can't figure out how you intended to
be implemented using sender/receiver without incurring an extra allocation.
Because you always split the execute_???() calls from submit(), even if they
happen on the same line, I don't see how to implement the following example -
unless you add an execute_???_and_submit( ..., receiver ).
If I have an executor implementation that supports the following usage
schemes, I'm supposed to be able to wrap the same algorithm with the
sender/receiver API without incurring any further performance cost on any
of the usage schemes.
This is a simple eager executor with worker threads and a task queue.
The point here is that the tasks pushed onto the queue are initialized with
a pointer to the location where the result has to be written - so they must
be given the pointer before being pushed onto the queue; otherwise further
synchronization would be required to provide them the pointer.
Or else, is the intention that the executor implementation must also
maintain its own ready-results container and put iterators into it inside
sender objects?!
user1 - wants to allocate a buffer for the result before calling execute,
e.g. on its stack.
user2 - does not want to block immediately and doesn't have a
place to allocate the result buffer, so it must be allocated with some
allocator.
The implementation for the first user can avoid all dynamic allocations;
the second user cannot - it must have at least an atomic flag allocated
somewhere.
I don't know if you intended to support user1 avoiding the allocation,
but I think you can't do that with the current API, which always splits
execute_???() from submit().
Here's the code for how it would be implemented from scratch, and below,
how it might look with sender/receiver.
//this is much like a shared state for type T
template< typename T >
class synchronized_result
{
atomic_int_fast_wait_t atomic_flag_;
std::optional< T > result_;
std::exception_ptr exception_;
public:
~synchronized_result()
{
this->wait(); /*must ensure that an asynchronous execution completed,
because it's holding a pointer*/
}
T get();
void wait();
void set_value(T);
template< typename E >
void set_exception(E);
};
class executor_with_worker_threads
{
std::vector< std::thread > worker_threads_;
thread_safe_queue< std::function<void()> > task_queue_;
public:
template< typename Result, typename Func >
void execute(Func func, synchronized_result<Result>* sync_res)
{
/*user is responsible to keep sync_res valid until the execution
completes. It's not dangerous if the class is wrapped correctly*/
task_queue_.push([res = sync_res, f = std::move(func)]{
try {
res->set_value(f());
}
catch (...)
{
res->set_exception(std::current_exception());
}
});
}
};
void user_1()
{
synchronized_result<int> result1; //not allowed to move this until the execution completes
my_executor.execute([] { return some_calculation1(); }, &result1);
synchronized_result<int> result2; //not allowed to move this until the execution completes
my_executor.execute([] { return some_calculation2(); }, &result2);
std::cout << result1.get() << result2.get();
}
auto user_2()
{
auto result_handle = make_shared< synchronized_result<int> >();
my_executor.execute([] { return some_calculation(); }, &*result_handle );
return result_handle;
}
void user_1()
{
receiver<int> result1;
executor.make_value_task([] { return some_calculation1(); }).submit(result1);
receiver<int> result2;
executor.make_value_task([] { return some_calculation2(); }).submit(result2);
std::cout << result1.get() << result2.get();
}
auto user_2()
{
auto sender = executor.make_value_task([] { return some_calculation2(); });
return sender; /*however this must hold a pointer to an actual buffer that
must have been already allocated somewhere*/
}
The call to make_value_task() does not know when submit() is going to be
called, so it must allocate a buffer for the result.
You cannot implement user_1 with sender/receiver without an extra
allocation. You must have some sort of make_value_task_and_submit().
void user_1()
{
receiver<int> result1;
executor.make_value_task_and_submit([] { return some_calculation1(); },
result1 );
receiver<int> result2;
executor.make_value_task_and_submit([] { return some_calculation2(); },
result2 );
std::cout << result1.get() << result2.get();
}
On Wednesday, 14 November 2018 05:50:48 UTC+2, Bryce Adelstein Lelbach
Post by Bryce Adelstein Lelbach aka wash
Rather, wg21.link/P1194
On Tue, Nov 13, 2018, 7:50 PM Bryce Adelstein Lelbach aka wash <
P1194
Post by Itaj Sherman
Post by Bryce Adelstein Lelbach aka wash
I don't think we should invest in std::future/std::promise when we are
actively designing their replacement (senders and receivers).
Certainly. Something redesigned from scratch could answer my question
much better than trying to fix future/promise.
Basically, what I was asking about is why we don't have a standard way
to notify/wake another thread without explicitly locking a mutex.
I wasn't connecting the dots before, but now I see that atomic_wait and
atomic_notify are actually equivalent to the linux futex.
Is there any written draft for these senders and receivers, or other
information?
Post by Bryce Adelstein Lelbach aka wash
Post by Itaj Sherman
I'm sorry, I should explain what I said in more detail.
I wanted to say that libstdc++ does implement future/promise
lock-free in the cases where they don't have to block: promise.set_value() is
always lock-free, and future.get()/wait() are lock-free if they
strongly-happen-after the return from set_value(). They are lock-free because
they use a futex rather than a condition_variable. They're also very fast,
because they first do a fast-path test of an atomic load, then a
compare_exchange, and only then use the futex.
The futex_wake, unlike a mutex+condition_variable notify, enables
set_value() to be lock-free even when it actually needs to wake a
thread blocked on the futex in future.get(). futex_wake does not lock a
mutex; it only uses atomic spinlocks in kernel mode with preemption
disabled.
I also wanted to mention that they implement call_once with a
lock-free fast path - that is, any call_once that doesn't run concurrently with
other call_once calls is lock-free, whether it is an active or a passive one.
By not-concurrently I mean that all other call_once calls either
strongly-happen-before or strongly-happen-after it.
The fact that they also use call_once inside promise.set_value() is
not directly related to the issue of future/promise.
boost and MSVC implementations use mutex+condition_variable. They
always lock the mutex on every set_value() and every future.get(). Even if
they added a fast path atomic variable before it, they would still have to
lock the mutex in set_value() around the call to notify, in case
there are threads waiting.
Post by Itaj Sherman
libstdc++ promise/future uses std::call_once that uses pthread_once
which is implemented like that - fast path atomic and then futex
boost.thread is mutex+condition_variable.
MSVC 2017 is mutex+condition_variable.
Maybe I lack some knowledge, but I can't figure out why they don't
do a fast path. How could it ever be better to just always lock a mutex in
set_value?!
So, I suppose my question is more about whether there's a specific
reason that the standard doesn't mention it or provides some traits - or it
just wasn't important enough for anybody to suggest.
e.g. I read somewhere that they had specific reasons not to include
semaphores in the standard because they're complicated/risky to use. And I
can understand that, because in MPMC they cannot provide memory ordering
for the data they're supposed to protect (like if you take a lock-free
queue and couple it with a semaphore to implement a blocking pop). But
future/promise protects its result safely with release/acquire.
Post by Tony V E
Do you think it is currently possible for an implementation to make
it lock-free in the way you suggest? i.e. without changing the current API?
If so, do you know if the implementation you use already does this?
Sent from my BlackBerry portable Babbage Device
Itaj Sherman
2018-11-14 23:35:56 UTC
Permalink
Post by Itaj Sherman
It might seem weird that I would worry about a single allocation when I
already use a queue and threads polling it.
I'm thinking about executors that might behave very lightweight in
certain cases. Suppose, for example, that in some cases the executor realizes
the task is very small and wants to execute it in the local thread, inside make_value_task().
Then it would have to allocate a buffer to store the result.
Something along these lines:

template< typename Result, typename Func >
void execute(Func func, synchronized_result<Result>* sync_res)
{
/*user is responsible to keep sync_res valid until the execution completes.
It's not dangerous if the class is wrapped correctly*/
auto task = [res = sync_res, f = std::move(func)]{
try {
res->set_value(f());
}
catch (...)
{
res->set_exception(std::current_exception());
}
};

if constexpr (is_tiny_function<Func>::value) {
task();
}
else {
task_queue_.push(task);
}
}
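A self-contained toy version of that dispatch can be run single-threaded (C++17 for if constexpr). Here a plain std::queue drained by run_all() stands in for the thread-safe queue and worker threads, and is_tiny_function is a hypothetical trait that a user specializes for functors known to be cheap - all names are illustrative:

```cpp
#include <functional>
#include <queue>
#include <type_traits>

// Hypothetical trait: defaults to "not tiny"; users specialize it for
// functors they know are cheap enough to run inline.
template <typename F> struct is_tiny_function : std::false_type {};

struct tiny_add {                      // sample functor marked as tiny below
    int operator()() const { return 1 + 1; }
};
template <> struct is_tiny_function<tiny_add> : std::true_type {};

// Toy executor: tiny tasks run inside execute() and write straight into
// the caller-supplied result slot; everything else is queued and only
// runs when run_all() drains the queue.
class toy_executor {
    std::queue<std::function<void()>> task_queue_;
public:
    template <typename Result, typename Func>
    void execute(Func f, Result* out) {
        auto task = [out, f] { *out = f(); };
        if constexpr (is_tiny_function<Func>::value)
            task();                    // inline: no queueing round-trip
        else
            task_queue_.push(task);    // deferred until run_all()
    }
    void run_all() {
        while (!task_queue_.empty()) { task_queue_.front()(); task_queue_.pop(); }
    }
};
```

With this shape, the tiny case writes the result into the caller's buffer before execute() even returns - which is the situation where the extra allocation discussed above would be pure waste.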
Itaj Sherman
2018-11-15 03:20:48 UTC
Permalink
I'm sorry, I totally botched explaining the use case. I can phrase it much
more clearly than I did:

Suppose there's an executor that, for some execute calls, can decide (at
runtime, actually, rather than at compile time) that the task is lightweight
enough to run locally in the calling thread, that is, inside the execute_*()
function itself.

One reason it might know that is that the task's runtime will be very short -
for example, if it can deduce that the task does some calculation on a
range, and in many calls the range happens to be very small while in
others it is big. Another option is that the executor caches the results of
previous calls.
Is it reasonable for these optimizations to be done in the executor
implementation? I'm not sure. It seems like much more of a user issue,
because the user knows more about the functions he's running - if a task is
short, the user just wouldn't use the executor. Maybe. The user can more
easily estimate how long the function will take, but it is the
executor that is able to decide whether running it locally or
handing it to another thread is worth it.
Anyway, that's basically the use case that raises this issue.

In these use cases, if the execution is both eager and able to write the
result immediately into a target buffer that the user supplied, you save
dynamic allocations.
This requires both that the user has the result buffer allocated before
calling execute_*(), and that execute_*() can write to it.
In terms of sender/receiver, I think this means the receiver has to be
supplied to the execute_*() call.

So, we have this my_executor instance, and at runtime different user calls
use it differently:
void user_1()
{
/*This user happens to always call make_value_task() and submit() together.*/
receiver<Result> result1;
my_executor.make_value_task( my_function_object_1 ).submit(result1);

receiver<Result> result2;
my_executor.make_value_task( my_function_object_2 ).submit(result2);

std::cout << result1.get() << result2.get();
}


auto user_2()
{
auto sender = my_executor.make_value_task( my_function_object );
return sender; /*however, this must hold a pointer to an actual buffer that
must have been already allocated somewhere*/
}
}

In user_1, when make_value_task() is called, the executor cannot know when
submit() will be called, so even if it decides it should run the function
locally, it has no buffer to write the result into - it has to allocate one
dynamically and put a unique_ptr to it inside the returned sender; in
submit() the data is passed to the receiver and the unique_ptr is released.
Will the compiler be able to optimize away this alloc+dealloc?
Or else it could defer the function call into the sender, to be invoked in
submit() - but then the sender has to become a more complicated type, because
the decision is made only at runtime, so the sender type has to handle both
asynchronous executions and local ones.

However, if you had a function like make_value_task_and_submit( func,
receiver ), the executor could optimize this case by writing the result
directly to the given receiver.

void user_1()
{
//this user happens to always call make_value_task() and submit() together:
receiver<Result> result1;
my_executor.make_value_task_and_submit( my_function_object_1, result1 );

receiver<Result> result2;
my_executor.make_value_task_and_submit( my_function_object_2, result2 );

std::cout << result1.get() << result2.get();
}
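A runnable single-threaded sketch of this combined call, under hypothetical names (receiver_slot for a caller-owned receiver, a run_inline flag standing in for whatever runtime heuristic the executor applies, and a plain queue drained by drain() standing in for worker threads - none of this is from P0443/P1194):

```cpp
#include <functional>
#include <queue>

// Caller-owned receiver: the result buffer lives wherever the caller put
// it (e.g. on the stack), so the inline path needs no dynamic allocation.
template <typename T>
struct receiver_slot {
    T value{};
    bool ready = false;
    void set_value(T v) { value = v; ready = true; }
};

// Toy executor demonstrating the combined make_value_task_and_submit():
// since the receiver is known at submission time, a task the executor
// decides (at runtime) to run locally writes straight into it. The
// caller must keep the slot alive until the task has run.
class toy_executor {
    std::queue<std::function<void()>> deferred_;
public:
    template <typename Func, typename T>
    void make_value_task_and_submit(Func f, receiver_slot<T>& r, bool run_inline) {
        if (run_inline)
            r.set_value(f());                          // zero-allocation path
        else
            deferred_.push([&r, f] { r.set_value(f()); });  // deferred path
    }
    void drain() {
        while (!deferred_.empty()) { deferred_.front()(); deferred_.pop(); }
    }
};
```

The inline branch touches no allocator at all, whereas the split make_value_task()/submit() API forces the sender to carry a dynamically allocated buffer even when the executor would have run the task locally.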

itaj

On Wednesday, 14 November 2018 05:50:48 UTC+2, Bryce Adelstein Lelbach
Post by Bryce Adelstein Lelbach aka wash
Rather, wg21.link/P1194
On Tue, Nov 13, 2018, 7:50 PM Bryce Adelstein Lelbach aka wash <
P1194
Post by Itaj Sherman
Post by Bryce Adelstein Lelbach aka wash
I don't think we should invest in std::future/std::please when we are
actively designing their replacement (senders and receivers).
Certainly. Something redesigned from scratch could answer my question
much better than trying to fix future/promise.
Basically what I was asking about is why don't we have a standard way to
notify/wake another thread without explicitly locking a mutex.
I wasn't connecting the dots before, but now I see that atomic_wait and
atomic_notify are actually equivalent to the linux futex.
Is there any written draft for these senders and receivers, or other
information?
Post by Bryce Adelstein Lelbach aka wash
Post by Itaj Sherman
I’m sorry, I should explain what I said in more details.
I wanted to say that libstdc++ does implement future/promise lock-free
in cases where they don’t have to block: promise.set_value() is always
lock-free. future.get()/wait() are lock-free if they strongly-happen-after
the return from set_value(). It is lock-free because they use a futex
rather than condition_variable. It’s also very fast because first they do a
fast path test of an atomic load, and then compare_exchange, and only then
use the futex.
The futex_wake, unlike mutex+condition_variable notify, enables the
set_value() to be lock-free, even if a thread does actually need to waken a
thread on the futex that is in future.get(). futex_wake does not lock a
mutex, it only uses atomic spinlocks in kernel mode with disabled
preemption.
I also wanted to mention that they implement call_once with a fast
path lock-free - that is, any call_once that doesn’t run concurrently with
other call_once calls is lock-free, whether it is an active or passive one.
By not-concurrently I mean all other call_once calls either
strongly-happen-before or after it.
The fact that they also use call_once inside promise.set_value() is
not directly related the issue of future/promise.
boost and MSVC implementations use mutex+condition_variable. They
always lock the mutex on every set_value() and every future.get(). Even if
they added a fast path atomic variable before it, they would still have to
lock the mutex in set_value() around the call to notify, in case
there are threads waiting.
Post by Itaj Sherman
libstdc++ promise/future uses std::call_once that uses pthread_once
which is implemented like that - fast path atomic and then futex
boost.thread is mutex+condition_variable.
MSVC 2017 is mutex+condition_variable.
Maybe I lack some knowledge, but I can't figure out why they don't do
a fast path. How could it ever be better to just always lock a mutex in
set_value?!
So, I suppose my question is more about whether there's a specific
reason that the standard doesn't mention it or provides some traits - or it
just wasn't important enough for anybody to suggest.
e.g. I read somewhere that they had specific reasons not to include
semaphores in the standard because they're complicated/risky to use. And I
can understand that, because in MPMC they cannot provide memory ordering
for the data they're supposed to protect (like if you take a lock-free
queue and couple it with a semaphore to implement a blocking pop). But
future/promise protects its result safely with release/acquire.
Post by Tony V E
Do you think it is currently possible for an implementation to make
it lock-free in the way you suggest?‎ ie without changing the current API?
If so, do you know if the implementation you use already does this?
Sent from my BlackBerry portable Babbage Device
To clarify further the reason why I ask this,
One could implement such promise/future by doing atomic
compare_exchange on a flag first - promise::set_value() would mark it
completed, future::get/wait would mark that some threads are waiting on the
future. Then only if they have to they use the scheduler for blocking and
notifying (futex on linux, Event on Windows).
On linux it’s possible to use directly sem_post/wait because it
implements the user mode compare_exchange fast path.
Presumably most systems would have support for some mechanism that
would enable this.
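One possible shape of that compare_exchange protocol, as a sketch (the
state names and return-value convention are assumptions; the actual
blocking/wakeup primitive - futex, Event - is left out):

```cpp
#include <atomic>
#include <cassert>

// EMPTY -> READY   when set_value() arrives first (nobody to wake), or
// EMPTY -> WAITING when a future::wait() arrives first (wake needed later).
enum : int { EMPTY = 0, WAITING = 1, READY = 2 };

struct semi_lockfree_state {
    std::atomic<int> s{EMPTY};

    // promise::set_value() side; lock-free in every case.
    // Returns true when a sleeping waiter must be woken via the OS.
    bool set_ready() {
        int prev = s.exchange(READY, std::memory_order_acq_rel);
        return prev == WAITING;
    }

    // future::wait() side.
    // Returns true when the caller must go to sleep on the OS primitive;
    // false means the result is already there (fast, non-blocking path).
    bool announce_wait() {
        int expected = EMPTY;
        return s.compare_exchange_strong(expected, WAITING,
                                         std::memory_order_acq_rel);
    }
};
```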
However, mutex+condition_variable cannot be used for this because
they don’t remember the notify call if the thread that is going to wait did
not enter the wait yet - or else they force you to always lock the mutex on
the notify side.
So my question boils down to: Why does the standard on one hand
bother to define mechanisms like mutex and condition_variable that have
these weaknesses, but on the other hand does not define semaphore, and even
in promise/future avoids mentioning the possibility of lockfree fast path.
itaj
--
---
You received this message because you are subscribed to the Google
Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+***@isocpp.org.
Visit this group at
https://groups.google.com/a/isocpp.org/group/std-discussion/.
Jesse Towner
2018-11-15 19:49:03 UTC
Permalink
I know from experience that it's possible to implement most operations on
future/promise in such a way that they do not block, including more advanced
features like cancellation, continuations, etc. Only future::get()/wait()
are potentially blocking calls. This is achieved by using what's known as
an eventcount[1] instead of a mutex + condition_variable. It's basically a
futex or semaphore fifo waitset and an atomic event or generation counter.
Using a 64-bit atomic integer, you can encode the event counter in the
upper 32-bits and use the lower 32-bits for asynchronous state, future
status, cancellation, forked child task or continuation dependency
counters, etc. Appropriate memory barriers on this atomic variable will
ensure proper visibility of the future value or error state on all threads.
If needed, you can split it into two 32-bit atomic variables so that you
can use a Linux futex or Windows keyed event, but you'll need an additional
store-load memory barrier with atomic_thread_fence(memory_order_seq_cst).
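A sketch of the packing described here (the field layout and names are
illustrative assumptions, not the actual implementation):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Upper 32 bits: eventcount generation; lower 32 bits: status flags.
constexpr std::uint64_t STATUS_MASK = 0xFFFFFFFFull;

struct packed_state {
    std::atomic<std::uint64_t> word{0};

    std::uint32_t status() const {
        return static_cast<std::uint32_t>(
            word.load(std::memory_order_acquire) & STATUS_MASK);
    }
    std::uint32_t generation() const {
        return static_cast<std::uint32_t>(
            word.load(std::memory_order_acquire) >> 32);
    }
    // Merge new status flags and bump the generation in one atomic RMW,
    // so a waiter can detect a change from the counter alone.
    void signal(std::uint32_t flags) {
        std::uint64_t old = word.load(std::memory_order_relaxed);
        std::uint64_t desired;
        do {
            desired = (((old >> 32) + 1) << 32)
                    | ((old | flags) & STATUS_MASK);
        } while (!word.compare_exchange_weak(old, desired,
                                             std::memory_order_acq_rel));
    }
};
```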

In my micro-benchmarks on x86/x86-64 and ARM/AArch64 devices, I've achieved
a factor of 7x performance improvement in asynchronous throughput over the
libc++ and MSVC++ implementation, 3-4x improvement over the libstdc++
implementation, and 2-3x improvement over Microsoft's PPL tasks. Although,
I also use a specialized memory allocation scheme for the internal
asynchronous state of the future/promise that utilizes thread-local caches
and lock-oscillation.

All said, senders and receivers definitely look like a better approach from
a design perspective, however it would be nice if library implementors
would invest in something akin to a futex or eventcount instead of
condition_variable for boundary cases where blocking/waiting is required. I
imagine that P0514 could be implemented with a shared bank of futexes or
eventcounts instead of binary_semaphores, to guarantee thread fairness.

[1] David P. Reed, Rajendra K. Kanodia (1979). "Synchronization with
eventcounts and sequencers." Communications of the ACM, vol. 22, no. 2,
pp. 115-123, February 1979.
Itaj Sherman
2018-11-15 20:29:33 UTC
Permalink
Interesting.
Well, the improvement over MSVC is clear, because it uses
condition_variable without fast path atomic, and thus locks the mutex on
every set_value() get() and wait().

I don't understand what exactly you mean by "eventcount". Is it a
futex-like concept or a specific implementation?
From your description anyway it seems similar to the linux usage of an
atomic counter + futex_wait + futex_wake.
I expect that the intention is for std::atomic_wait/notify to be
implemented just like that: by futex on linux, and by similar mechanisms
elsewhere.

Anyway, I suppose that "eventcount" should have the same performance as
such futex usage, therefore:
The improvement over libstdc++ is less obvious, because they do use a
fast-path atomic over a futex.
But they do it twice inside set_value(), because they also use call_once()
for the construction of the result. Doing that ensures that
promise.set_value() can be called concurrently from many threads, and the
first one wins - but this is not required by the standard AFAIK.
Did your implementation support allowed calls of promise.set_value()?
As you say, possibly your improved allocation of the shared state made that
difference.

thanks for the info
itaj
Itaj Sherman
2018-11-15 20:31:55 UTC
Permalink
of course, I meant
Post by Itaj Sherman
Interesting.
Well, the improvement over MSVC is clear, because it uses
condition_variable without fast path atomic, and thus locks the mutex on
every set_value() get() and wait().
I don't understand what exactly you mean by "eventcount". Is it a futex
like concept or a specific implementation?
From your description anyway it seems similar to linux usage of an atomic
counter + futex_wait + futex_wake.
I expect that the intention of std::atomic_wait/notify is to be
implemented by just like that by futex on linux, and by similar mechanisms
elsewhere.
Anyway, I suppose that "eventcount" should have the same performance as
The improvement over libstdc++ is less obvious, because they do use a fast
path atomic, over a futex.
But, they do it twice inside set_value() because they also use call_once()
for the construction. Doing that assures that promise.set_value() can be
called concurrently from many threads, and the first one wins - but this is
not required by the standard AFAIK.
Did your implementation support allowed calls of promise.set_value()?
I meant to ask: did your implementation allow concurrent calls of
promise.set_value() (such that the first one wins)?
j***@smokingguninc.com
2018-11-15 21:14:51 UTC
Permalink
Yeah, an eventcount is similar to a futex, at least in terms of its
performance characteristics, but has slightly different semantics. See the
paper I cited in my previous post for the details. You can implement the
eventcount on top of a futex or with a semaphore waitset. And just like you
can implement a mutex or a condition_variable with a futex, you can also
implement them with an eventcount. The benefits of an eventcount
implemented with a semaphore waitset, over a futex, are that it is more
portable (it only requires that the operating system provide a binary
semaphore), you can use atomic variables larger than 32 bits (both the
Linux futex and Windows keyed events only allow 32-bit values), and you
have more options for how you wake up threads (FIFO, LIFO, or fine-grained
wakeup, which is great for NUMA systems).

I imagine part of the performance difference with libstdc++ is because I'm
using a single 64-bit atomic variable and not a futex + once_flag +
atomic_flag (for the retrieved flag). I've done away with need to track the
retrieved state by implementing something similar to promise_contract_t
from P1054. I don't make any syscalls on my fast paths, only when there are
potential waiters are any syscalls made, so nothing equivalent to a
futex_wait happens in most cases. The store-load memory barrier is only
required when using a dual-value eventcount to ensure proper ordering of
operations with the event counter. Like you surmise, the other part of the
performance difference is the custom memory management I'm doing rather
than using the freestore.
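The role of the store-load barrier in the dual-variable case can be
sketched as follows (names and layout are illustrative):

```cpp
#include <atomic>
#include <cassert>

std::atomic<unsigned> waiters{0};  // eventcount side: announced waiters
std::atomic<int>      ready{0};    // future-state side: result published?

// Returns true if the caller should proceed to block (e.g. futex_wait);
// false if the result arrived while announcing, so no sleep is needed.
bool prepare_wait_then_recheck() {
    waiters.fetch_add(1, std::memory_order_relaxed);      // announce intent
    // Store-load barrier: without it, the load of 'ready' below could be
    // reordered before the store to 'waiters', and a concurrent signaler
    // could see waiters == 0, skip the wake, and the wait would be lost.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    if (ready.load(std::memory_order_relaxed) != 0) {
        waiters.fetch_sub(1, std::memory_order_relaxed);  // cancel the wait
        return false;
    }
    return true;
}
```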
Itaj Sherman
2018-11-15 23:00:19 UTC
Permalink
Post by j***@smokingguninc.com
Yeah, an eventcount is similar to a futex, at least in terms of its
performance characteristics, but has slightly different semantics. See the
paper I cited in my previous post for the details. You can implement the
eventcount on top of a futex or with a semaphore waitset. And just like you
can implement a mutex or a condition_variable with a futex, you can also
implement them with an eventcount. The benefits of the eventcount
implemented with a semaphore waitset over a futex is that it is more
portable as it only requires that the operating system provide you with a
binary semaphore, you can use atomic variables larger than 32-bits (both
Linux futex and Windows keyed events only allow 32-bit values), and you
have more options with how you wake up threads (FIFO, LIFO, or fine-grained
wakeup which is great for NUMA systems).
I imagine part of the performance difference with libstdc++ is because I'm
using a single 64-bit atomic variable and not a futex + once_flag +
atomic_flag (for the retrieved flag). I've done away with need to track the
retrieved state by implementing something similar to promise_contract_t
from P1054. I don't make any syscalls on my fast paths, only when there are
potential waiters are any syscalls made, so nothing equivalent to a
futex_wait happens in most cases. The store-load memory barrier is only
required when using a dual-value eventcount to ensure proper ordering of
operations with the event counter. Like you surmise, the other part of the
performance difference is the custom memory management I'm doing rather
than using the freestore.
libstdc++ also does a fast-path atomic release/acquire - and only if that
fails does it use the kernel futex. But it does always allocate the result
buffer and shared state on the heap.
In the fast path of set_value() they CAS two atomics instead of one: first
they write the value using std::call_once(), then they use the second to
notify the future.

I don't understand your point about not tracking the retrieved state.
Either the future and promise share ownership of the state (much like
shared_ptr), and the last one destroys it.
Or, if only one of them owns the state and it happens to get destructed
before the other - what happens then? Do you need some RCU? Were they both
movable?
Itaj Sherman
2018-11-15 20:48:03 UTC
Permalink
Another question: why do you need memory_order_seq_cst?
Isn't it enough to use release in set_value() and acquire in get()?

That's what the libstdc++ implementation does, and the standard also
says (30.6.5/9):

  Calls to functions that successfully set the stored result of a shared
  state synchronize with (6.8.2) calls to functions successfully
  detecting the ready state resulting from that setting. The storage of
  the result (whether normal or exceptional) into the shared state
  synchronizes with (6.8.2) the successful return from a call to a
  waiting function on the shared state.
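Under that wording, a release store paired with an acquire load is
indeed enough to establish the required synchronizes-with edge;
seq_cst additionally imposes a single total order across unrelated
atomics, which 30.6.5/9 does not ask for. A minimal, hypothetical
single-shot sketch (spin-waiting instead of blocking, purely to show
the ordering):

```cpp
#include <atomic>
#include <thread>

// Minimal single-shot "promise/future" pair. The release store to
// ready_ in set_value() synchronizes with the acquire load that
// observes it, so the plain write of value_ happens-before the read
// in get().
template <typename T>
class OneShot {
public:
    void set_value(T v) {
        value_ = std::move(v);                          // plain write
        ready_.store(true, std::memory_order_release);  // publish
    }
    bool valid() const {                                // lock-free probe
        return ready_.load(std::memory_order_acquire);
    }
    T get() {
        // Spin-wait for illustration; a real implementation would
        // block here (futex, eventcount, ...).
        while (!ready_.load(std::memory_order_acquire))
            std::this_thread::yield();
        return value_;
    }
private:
    T value_{};
    std::atomic<bool> ready_{false};
};
```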
Post by Jesse Towner
I know from experience that it's possible to implement most operations
on future/promise in such a way that they do not block, including more
advanced features like cancellation, continuations, etc. Only
future::get()/wait() are potentially blocking calls. This is achieved by
using what's known as an eventcount[1] instead of a mutex +
condition_variable. It's basically a futex- or semaphore-based FIFO
waitset plus an atomic event or generation counter.
Using a 64-bit atomic integer, you can encode the event counter in the
upper 32 bits and use the lower 32 bits for asynchronous state, future
status, cancellation, forked child task or continuation dependency
counters, etc. Appropriate memory barriers on this atomic variable will
ensure proper visibility of the future's value or error state on all
threads. If needed, you can split it into two 32-bit atomic variables so
that you can use a Linux futex or a Windows keyed event, but you'll then
need an additional store-load memory barrier with
atomic_thread_fence(memory_order_seq_cst).
In my micro-benchmarks on x86/x86-64 and ARM/AArch64 devices, I've
achieved a factor of 7x performance improvement in asynchronous throughput
over the libc++ and MSVC++ implementation, 3-4x improvement over the
libstdc++ implementation, and 2-3x improvement over Microsoft's PPL tasks.
Although, I should add that I also use a specialized memory allocation
scheme for the internal asynchronous state of the future/promise that
utilizes thread-local caches and lock-oscillation.
All said, senders and receivers definitely look like a better approach
from a design perspective, however it would be nice if library implementors
would invest in something akin to a futex or eventcount instead of
condition_variable for boundary cases where blocking/waiting is required. I
imagine that P0514 could be implemented with a shared bank of futexes or
eventcounts instead of binary_semaphores, to guarantee thread fairness.
[1] David P.Reed, Rajendra K.Kanodia (1979) Synchronization with
eventcounts and sequencers, Communications of the ACM, v.22 n.2, p.115-123,
February 1979