Discussion:
[PATCH net-next] epoll: add EPOLLEXCLUSIVE support
Hagen Paul Pfeifer
2012-02-14 20:48:04 UTC
High-performance servers sometimes create one listening socket (e.g. port
80), create an epoll file descriptor, and add the socket to it. Afterwards
they create SC_NPROCESSORS_ONLN threads that wait for events. This often
results in a thundering herd problem: every waiting thread is woken, so all
CPUs are scheduled in.

This patch adds an additional flag to epoll_ctl(2) called EPOLLEXCLUSIVE.
If a descriptor is added with this flag, only one CPU is scheduled in.

Signed-off-by: Hagen Paul Pfeifer <***@jauu.net>
Reported-by: Li Yu <***@gmail.com>
Cc: Davide Libenzi <***@xmailserver.org>
Cc: Eric Dumazet <***@gmail.com>
---
fs/eventpoll.c | 7 +++++--
include/linux/eventpoll.h | 3 +++
2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index aabdfc3..bb442b1 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -88,7 +88,7 @@
*/

/* Epoll private bits inside the event mask */
-#define EP_PRIVATE_BITS (EPOLLONESHOT | EPOLLET)
+#define EP_PRIVATE_BITS (EPOLLONESHOT | EPOLLET | EPOLLEXCLUSIVE)

/* Maximum number of nesting allowed inside epoll sets */
#define EP_MAX_NESTS 4
@@ -913,7 +913,10 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
pwq->whead = whead;
pwq->base = epi;
- add_wait_queue(whead, &pwq->wait);
+ if (unlikely(epi->event.events & EPOLLEXCLUSIVE))
+ add_wait_queue_exclusive(whead, &pwq->wait);
+ else
+ add_wait_queue(whead, &pwq->wait);
list_add_tail(&pwq->llink, &epi->pwqlist);
epi->nwait++;
} else {
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 657ab55..d334389 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -26,6 +26,9 @@
#define EPOLL_CTL_DEL 2
#define EPOLL_CTL_MOD 3

+/* Set Exclusive wake up behaviour for the target file descriptor */
+#define EPOLLEXCLUSIVE (1 << 29)
+
/* Set the One Shot behaviour for the target file descriptor */
#define EPOLLONESHOT (1 << 30)
--
1.7.9
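The patch itself only changes how epoll's entry is linked into the target file's wait queue in ep_ptable_queue_proc(). The sketch below is a simplified userspace model of why that matters; every model_* name is hypothetical. It assumes the wait-queue behaviour of kernels from this period: add_wait_queue() clears WQ_FLAG_EXCLUSIVE and links the entry at the head, add_wait_queue_exclusive() sets the flag and links it at the tail, and __wake_up_common() walks from the head, stopping once nr_exclusive exclusive entries have been woken. It also assumes every wake-up callback succeeds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's wait-queue types;
 * this is a userspace model, not kernel code. */
struct model_waiter {
    bool exclusive;               /* models WQ_FLAG_EXCLUSIVE */
    bool woken;                   /* set when the wake-up walk reaches this entry */
    struct model_waiter *next;
};

struct model_queue {
    struct model_waiter *head;
};

/* Models add_wait_queue(): clear the exclusive flag, link at the head. */
static void model_add(struct model_queue *q, struct model_waiter *w)
{
    w->exclusive = false;
    w->woken = false;
    w->next = q->head;
    q->head = w;
}

/* Models add_wait_queue_exclusive(): set the exclusive flag, link at the tail. */
static void model_add_exclusive(struct model_queue *q, struct model_waiter *w)
{
    struct model_waiter **pp = &q->head;

    w->exclusive = true;
    w->woken = false;
    w->next = NULL;
    while (*pp != NULL)
        pp = &(*pp)->next;
    *pp = w;
}

/* Models __wake_up_common() with callbacks that always succeed: wake entries
 * from the head, stopping after nr_exclusive exclusive entries. wake_up()
 * uses nr_exclusive == 1. Returns the number of entries woken. */
static int model_wake_up(struct model_queue *q, int nr_exclusive)
{
    int woken = 0;

    for (struct model_waiter *w = q->head; w != NULL; w = w->next) {
        w->woken = true;
        woken++;
        if (w->exclusive && --nr_exclusive == 0)
            break;
    }
    return woken;
}
```

With three plain entries, model_wake_up(q, 1) wakes all three; with three exclusive entries it wakes only the first. A non-exclusive entry is always woken even if it was queued after exclusive ones, because it sits at the head of the queue.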
Eric Dumazet
2012-02-14 21:06:15 UTC
On Tuesday, 14 February 2012 at 21:48 +0100, Hagen Paul Pfeifer wrote:
High-performance servers sometimes create one listening socket (e.g. port
80), create an epoll file descriptor, and add the socket to it. Afterwards
they create SC_NPROCESSORS_ONLN threads that wait for events. This often
results in a thundering herd problem: every waiting thread is woken, so all
CPUs are scheduled in.

This patch adds an additional flag to epoll_ctl(2) called EPOLLEXCLUSIVE.
If a descriptor is added with this flag, only one CPU is scheduled in.

Seems pretty good to me.

Do you have some performance numbers to share?
Hagen Paul Pfeifer
2012-02-14 21:38:23 UTC
Post by Eric Dumazet
Seems pretty good to me.
Do you have some performance numbers to share?
No, but I did some tests with one of my network performance tools. I could
*construct* test cases and add some 'perf stat cs:u' statistics, but IMHO it
is not fair to present artificially tuned performance numbers. There are use
cases where EPOLLEXCLUSIVE can be really helpful; in fact, I think this flag
SHOULD be the userspace default. ;-)

Hagen
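A test case of the kind mentioned in this reply might look roughly like the following sketch. It is not Hagen's tool, and it does not use this patch: the bit proposed here, (1 << 29), was later assigned to EPOLLWAKEUP, so the sketch relies on the EPOLLEXCLUSIVE that was eventually merged in Linux 4.5 and takes the constant from the system headers. Following that mainline flag, each worker gets its own epoll instance watching the shared listening socket. The names NWORKERS, worker_main and count_wakeups are hypothetical, and the 100 ms sleep before connect() is a crude way to let the workers block, so the counts are indicative only; the mainline epoll_ctl(2) page promises only that one or more instances receive the event.

```c
/* Sketch against the mainline EPOLLEXCLUSIVE (Linux 4.5+), not this 2012 patch. */
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <pthread.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#error "this sketch needs headers that define EPOLLEXCLUSIVE (Linux 4.5+)"
#endif

#define NWORKERS 4                  /* hypothetical; a server would use the CPU count */

static atomic_int wakeups;          /* workers that returned from epoll_wait() with an event */

static void *worker_main(void *arg)
{
    int epfd = *(int *)arg;
    struct epoll_event ev;

    /* 500 ms timeout so a worker that is never woken still exits. */
    if (epoll_wait(epfd, &ev, 1, 500) == 1)
        atomic_fetch_add(&wakeups, 1);
    return NULL;
}

/* Start NWORKERS threads, each blocked on its own epoll instance that watches
 * one shared listening socket, then make a single connection. Returns how many
 * workers saw the event, or -1 if the listener could not be set up. */
static int count_wakeups(int exclusive)
{
    int epfd[NWORKERS];
    pthread_t tid[NWORKERS];
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct timespec settle = { 0, 100 * 1000 * 1000 };   /* 100 ms */
    int lfd, cfd, i, started = 0;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);        /* port 0: kernel picks one */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 16) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        close(lfd);
        return -1;
    }

    atomic_store(&wakeups, 0);
    for (i = 0; i < NWORKERS; i++) {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = lfd };

        if (exclusive)
            ev.events |= EPOLLEXCLUSIVE;          /* only valid with EPOLL_CTL_ADD */
        epfd[i] = epoll_create1(0);
        if (epfd[i] < 0 || epoll_ctl(epfd[i], EPOLL_CTL_ADD, lfd, &ev) < 0 ||
            pthread_create(&tid[i], NULL, worker_main, &epfd[i]) != 0) {
            if (epfd[i] >= 0)
                close(epfd[i]);
            break;
        }
        started++;
    }

    nanosleep(&settle, NULL);   /* crude: give the workers time to block */

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd >= 0)
        connect(cfd, (struct sockaddr *)&addr, len);   /* one new connection */

    for (i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        close(epfd[i]);
    }
    if (cfd >= 0)
        close(cfd);
    close(lfd);
    return atomic_load(&wakeups);
}
```

Calling count_wakeups(1) and count_wakeups(0) from a main() and running the binary under `perf stat -e context-switches` would give the kind of context-switch numbers discussed here. When every worker is already blocked, the non-exclusive run is expected to wake all of them and the exclusive run usually just one. On older glibc, build with -pthread.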

David Miller
2012-02-14 21:23:49 UTC
From: Hagen Paul Pfeifer <***@jauu.net>
Date: Tue, 14 Feb 2012 21:48:04 +0100
Post by Hagen Paul Pfeifer
High-performance servers sometimes create one listening socket (e.g. port
80), create an epoll file descriptor, and add the socket to it. Afterwards
they create SC_NPROCESSORS_ONLN threads that wait for events. This often
results in a thundering herd problem: every waiting thread is woken, so all
CPUs are scheduled in.
This patch adds an additional flag to epoll_ctl(2) called EPOLLEXCLUSIVE.
If a descriptor is added with this flag, only one CPU is scheduled in.
This is not a networking-specific change and therefore should not
be submitted via my tree(s).