Discussion: qdisc running
Jamal Hadi Salim
2014-10-19 19:24:42 UTC
Jesper,

You asked at the meeting about the point of qdisc running.
The original intent is to allow only one CPU to enter the lower half of the
qdisc path. IOW, if one CPU is already in the qdisc, then that guy
can be used to dequeue the packets, i.e. this is good for batching.
Original idea was Herbert's with major improvement from Eric
and a small one from me.
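Roughly, the shape is the following (a simplified sketch of the
__dev_xmit_skb() enqueue path, not the exact kernel code; it assumes the
usual <net/sch_generic.h> definitions):

    /* Every CPU enqueues under the qdisc root lock, but only the CPU
     * that manages to set __QDISC___STATE_RUNNING drops into the
     * dequeue loop and transmits on behalf of everybody else.
     */
    static int xmit_sketch(struct sk_buff *skb, struct Qdisc *q,
                           spinlock_t *root_lock)
    {
            int rc;

            spin_lock(root_lock);

            rc = q->enqueue(skb, q);    /* everybody parks their packet    */

            if (qdisc_run_begin(q))     /* only one CPU "wins" this flag   */
                    __qdisc_run(q);     /* ... and dequeues for everyone   */

            spin_unlock(root_lock);
            return rc;
    }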

For the history of the different approaches that were tried, look at slide 2:
http://vger.kernel.org/netconf2011_slides/jamal_netconf2011.pdf
then download the **amazing** flash animations which describe
that history.
http://vger.kernel.org/netconf2011_slides/netconf-2011-flash.tgz

Follow the bullets in slide 2 and map them to the flash animations.

If you go over them, you'll see it is still needed.

I think someone oughta put those **amazing** animations on some
website;->

cheers,
jamal
Jesper Dangaard Brouer
2014-10-20 16:17:56 UTC
Post by Jamal Hadi Salim
Jesper,
You asked at the meeting about the point of qdisc running.
Talking about __QDISC___STATE_RUNNING, see slide 9/16:
http://people.netfilter.org/hawk/presentations/LinuxPlumbers2014/performance_tx_qdisc_bulk_LPC2014.pdf
Post by Jamal Hadi Salim
The original intent is to allow only one CPU to enter the lower half of the
qdisc path. IOW, if one CPU is already in the qdisc, then that guy
can be used to dequeue the packets, i.e. this is good for batching.
Original idea was Herbert's with major improvement from Eric
and a small one from me.
I guess it is good for our recent dequeue batching. But I think/hope we
can come up with a scheme that does not require 6 lock/unlock
operations (as illustrated on slide 9).
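For reference, my rough reading of where the per-packet lock/unlock
operations come from once a packet is in the qdisc (a simplified sketch of
the qdisc_restart()/sch_direct_xmit() shape, not the exact kernel code, and
the driver call is abbreviated):

    static int restart_sketch(struct Qdisc *q, struct net_device *dev,
                              struct netdev_queue *txq, spinlock_t *root_lock)
    {
            struct sk_buff *skb;
            int ret;

            /* (1)/(2) root lock: dequeue from the qdisc */
            spin_lock(root_lock);
            skb = q->dequeue(q);
            spin_unlock(root_lock);
            if (!skb)
                    return 0;

            /* (3)/(4) TXQ lock: hand the packet to the driver */
            HARD_TX_LOCK(dev, txq, smp_processor_id());
            ret = dev_hard_start_xmit(skb, dev, txq);
            HARD_TX_UNLOCK(dev, txq);

            /* (5)/(6) root lock again: handle the return code, maybe
             * requeue, and decide whether there is more work to do
             */
            spin_lock(root_lock);
            /* ... requeue on failure, check qdisc length ... */
            spin_unlock(root_lock);

            return ret;
    }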

John and I have talked about doing a lockless qdisc, but maintaining
this __QDISC___STATE_RUNNING in a lockless scenario would cost us
extra atomic ops...

Are we still sure that this model of only allowing a single CPU in the
dequeue path is the best solution? (The TXQ lock should already
protect this code path when several CPUs enter it.)
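To make the cost concrete, here is a hypothetical sketch (the bit name and
the state word are made up for illustration only) of what maintaining a
RUNNING-style flag without the root lock would look like; entering and
leaving the dequeue loop becomes an atomic RMW each way:

    #define QDISC_RUNNING_BIT 0     /* made-up bit number, illustration only */

    static bool lockless_run_begin(unsigned long *state)
    {
            /* atomic read-modify-write: fails if another CPU already
             * owns the dequeue role
             */
            return !test_and_set_bit(QDISC_RUNNING_BIT, state);
    }

    static void lockless_run_end(unsigned long *state)
    {
            /* make the dequeue work visible before giving up the flag */
            smp_mb__before_atomic();
            clear_bit(QDISC_RUNNING_BIT, state);
    }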
Post by Jamal Hadi Salim
http://vger.kernel.org/netconf2011_slides/jamal_netconf2011.pdf
I can see that you really needed the budget/fairness in the dequeue
loop that we recently fiddled with.
Post by Jamal Hadi Salim
then download the **amazing** flash animations which describe
that history.
http://vger.kernel.org/netconf2011_slides/netconf-2011-flash.tgz
Follow the bullets in slide 2 and map them to the flash animations.
What tool do I use to play these SWF files? (I tried VLC but no luck).
Post by Jamal Hadi Salim
If you go over them, you'll see it is still needed.
Too bad, I would like to avoid the second
Post by Jamal Hadi Salim
I think someone oughta put those **amazing** animations on some
website;->
I hope someone else can pick that up ;-)
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
Jamal Hadi Salim
2014-10-20 22:17:47 UTC
Post by Jesper Dangaard Brouer
I guess it is good for our recent dequeue batching.
It is, I think ;->
Post by Jesper Dangaard Brouer
But I think/hope we
can come up with a scheme that does not require 6 lock/unlock
operations (as illustrated on slide 9).
To be clear:
2 locks + 2 unlocks and 2 atomic ops.
Post by Jesper Dangaard Brouer
John and I have talked about doing a lockless qdisc, but maintaining
this __QDISC___STATE_RUNNING in a lockless scenario would cost us
extra atomic ops...
In the animation this __QDISC___STATE_RUNNING is shown as an "occupied"
flag. It is like someone is in the toilet and you can't come in ;->
They have to finish dropping the packages into the toilet^Whardware ;->
If it is occupied, you put your package outside and go.
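For reference, the "occupied" flag helpers are roughly the following
(slightly simplified from include/net/sch_generic.h; the __state word is
only touched while the root lock is held, which is why these are plain,
non-atomic operations today):

    static inline bool qdisc_is_running(const struct Qdisc *qdisc)
    {
            return (qdisc->__state & __QDISC___STATE_RUNNING) ? true : false;
    }

    static inline bool qdisc_run_begin(struct Qdisc *qdisc)
    {
            if (qdisc_is_running(qdisc))
                    return false;   /* occupied: leave your packet and go */
            qdisc->__state |= __QDISC___STATE_RUNNING;
            return true;            /* we are now the only dequeuer */
    }

    static inline void qdisc_run_end(struct Qdisc *qdisc)
    {
            qdisc->__state &= ~__QDISC___STATE_RUNNING;
    }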
Post by Jesper Dangaard Brouer
Are we still sure that this model of only allowing a single CPU in the
dequeue path is the best solution?
For sure it is the best if you want to batch. Look at that last orange
guy picking up all the packages (busylock.swf). This is where all the
batching would happen.
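To map the animation to the code: busylock.swf is the q->busylock trick,
where CPUs that find the qdisc occupied first line up on a separate
spinlock, so only one of them competes with the dequeuing CPU for the root
lock at any time. A rough sketch of the contending (N-1 CPUs) path,
simplified from __dev_xmit_skb() and not the exact code:

    static void contended_enqueue_sketch(struct sk_buff *skb, struct Qdisc *q,
                                         spinlock_t *root_lock)
    {
            bool contended = qdisc_is_running(q);   /* someone is in the "toilet" */

            if (contended)
                    spin_lock(&q->busylock);  /* wait in line, off the root lock */

            spin_lock(root_lock);
            q->enqueue(skb, q);               /* leave the package and go */
            spin_unlock(root_lock);

            if (contended)
                    spin_unlock(&q->busylock);
    }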
Post by Jesper Dangaard Brouer
(The TXQ lock should already
protect this code path when several CPUs enter it.)
Note:
Maybe for the orange guy (the dequeuer) the TX lock could
be avoided? Double-check the code. It is important to note that under
a busy period, contention is reduced to:
1 lock + 1 unlock + 2 atomic ops for the N-1 CPUs.
The orange guy, on the other hand, is doing 2 lock/unlock pairs.
Post by Jesper Dangaard Brouer
I can see that you really needed the budget/fairness in the dequeue
loop that we recently fiddled with.
Yes, fairness is needed so the orange guy doesn't spend all his cycles
doing all the work (that was the basis of my presentation); unless
that is not an issue and the scheduler would move things away from
that CPU.
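For reference, that budget/fairness lives in the dequeue loop itself;
roughly this shape (simplified from __qdisc_run() in
net/sched/sch_generic.c, not the exact code):

    /* The "orange guy" keeps dequeuing until the quota is spent or the
     * scheduler wants the CPU, then punts the rest to softirq via
     * __netif_schedule().
     */
    static void qdisc_run_sketch(struct Qdisc *q)
    {
            int quota = weight_p;   /* per-run packet budget */

            while (qdisc_restart(q)) {
                    /* Postpone processing if:
                     * 1. we have exceeded the packet quota, or
                     * 2. another process needs this CPU.
                     */
                    if (--quota <= 0 || need_resched()) {
                            __netif_schedule(q);    /* continue later in softirq */
                            break;
                    }
            }

            qdisc_run_end(q);       /* clear the "occupied" flag */
    }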
Post by Jesper Dangaard Brouer
What tool do I use to play these SWF files? (I tried VLC but no luck).
Firefox should work fine.

cheers,
jamal
