Discussion:
2.6.23-rc4-mm1: e1000e napi lockup
Jiri Slaby
2007-09-07 07:19:30 UTC
Permalink
Hi,

I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e driver.
napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot.

regards,
--
Jiri Slaby (***@gmail.com)
Faculty of Informatics, Masaryk University
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller
2007-09-07 08:03:38 UTC
Permalink
From: Jiri Slaby <***@gmail.com>
Date: Fri, 07 Sep 2007 09:19:30 +0200
Post by Jiri Slaby
I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e driver.
napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot.
Yes, the semantics changed slightly in the net-2.6.24 tree the
other week and someone needs to fix it up.

The netif_napi_add() implicitly does a napi_disable() call. Device
open must explicitly napi_enable() and device close must explicitly
napi_disable(), and if done elsewhere these calls must be strictly
balanced.
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Kok, Auke
2007-09-07 16:24:33 UTC
Permalink
Post by David Miller
Date: Fri, 07 Sep 2007 09:19:30 +0200
Post by Jiri Slaby
I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e driver.
napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot.
Yes, the semantics changed slightly in the net-2.6.24 tree the
other week and someone needs to fix it up.
The netif_napi_add() implicitly does a napi_disable() call. Device
open must explicitly napi_enable() and device close must explicitly
napi_disable(), and if done elsewhere these calls must be strictly
balanced.
I'll fix it... it's my patch that adds the new napi code to it and I need to get
it ready for the merge window anyway.

thanks for testing.

Auke
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Jeff Garzik
2007-09-07 23:31:58 UTC
Permalink
Post by Kok, Auke
Post by David Miller
Date: Fri, 07 Sep 2007 09:19:30 +0200
Post by Jiri Slaby
I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e driver.
napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot.
Yes, the semantics changed slightly in the net-2.6.24 tree the
other week and someone needs to fix it up.
The netif_napi_add() implicitly does a napi_disable() call. Device
open must explicitly napi_enable() and device close must explicitly
napi_disable(), and if done elsewhere these calls must be strictly
balanced.
I'll fix it... it's my patch that adds the new napi code to it and I
need to get it ready for the merge window anyway.
well.... since its close to the merge window opening, we could see what
happens if DaveM pulls branch 'upstream' of
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git

That should make this class of pre-merge-window annoyance go away.

Jeff



-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
Kok, Auke
2007-09-07 23:40:14 UTC
Permalink
Post by Jeff Garzik
Post by Kok, Auke
Post by David Miller
Date: Fri, 07 Sep 2007 09:19:30 +0200
Post by Jiri Slaby
I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e driver.
napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot.
Yes, the semantics changed slightly in the net-2.6.24 tree the
other week and someone needs to fix it up.
The netif_napi_add() implicitly does a napi_disable() call. Device
open must explicitly napi_enable() and device close must explicitly
napi_disable(), and if done elsewhere these calls must be strictly
balanced.
I'll fix it... it's my patch that adds the new napi code to it and I
need to get it ready for the merge window anyway.
well.... since its close to the merge window opening, we could see what
happens if DaveM pulls branch 'upstream' of
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git
That should make this class of pre-merge-window annoyance go away.
If I do that now I get a big merge conflict:

$ git-pull . net-2.6.24
100% (22583/22583) done
Removed Documentation/networking/NAPI_HOWTO.txt
Auto-merged drivers/net/8139cp.c
Auto-merged drivers/net/8139too.c
CONFLICT (content): Merge conflict in drivers/net/8139too.c
Auto-merged drivers/net/Kconfig
Auto-merged drivers/net/Makefile
Auto-merged drivers/net/cxgb3/cxgb3_main.c
Auto-merged drivers/net/cxgb3/sge.c
Auto-merged drivers/net/fs_enet/fs_enet-main.c
Auto-merged drivers/net/gianfar.h
Auto-merged drivers/net/ibmveth.c
CONFLICT (content): Merge conflict in drivers/net/ibmveth.c
Auto-merged drivers/net/ibmveth.h
Auto-merged drivers/net/myri10ge/myri10ge.c
Auto-merged drivers/net/netxen/netxen_nic_main.c
Auto-merged drivers/net/pasemi_mac.c
CONFLICT (content): Merge conflict in drivers/net/pasemi_mac.c
Auto-merged drivers/net/pasemi_mac.h
Auto-merged drivers/net/pcnet32.c
Auto-merged drivers/net/ps3_gelic_net.c
Auto-merged drivers/net/qla3xxx.c
Auto-merged drivers/net/s2io.c
CONFLICT (content): Merge conflict in drivers/net/s2io.c
Auto-merged drivers/net/s2io.h
Auto-merged drivers/net/sb1250-mac.c
Auto-merged drivers/net/sky2.c
Auto-merged drivers/net/sky2.h
Auto-merged drivers/net/tsi108_eth.c
Auto-merged drivers/net/wireless/rtl8187_dev.c
Auto-merged drivers/net/xen-netfront.c
Removed include/net/tcp_ecn.h
Automatic merge failed; fix conflicts and then commit the result.


for e1000e the fixup is just a 1-line change from the previous -mm napi fixup
patch, so I don't really care (it's peanuts), but people might want to start
looking into the above conflicts ;)

Cheers,

Auke

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
Jeff Garzik
2007-09-07 23:49:37 UTC
Permalink
Post by Jeff Garzik
Post by Kok, Auke
Post by David Miller
Date: Fri, 07 Sep 2007 09:19:30 +0200
Post by Jiri Slaby
I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e driver.
napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot.
Yes, the semantics changed slightly in the net-2.6.24 tree the
other week and someone needs to fix it up.
The netif_napi_add() implicitly does a napi_disable() call. Device
open must explicitly napi_enable() and device close must explicitly
napi_disable(), and if done elsewhere these calls must be strictly
balanced.
I'll fix it... it's my patch that adds the new napi code to it and I
need to get it ready for the merge window anyway.
well.... since its close to the merge window opening, we could see
what happens if DaveM pulls branch 'upstream' of
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git
That should make this class of pre-merge-window annoyance go away.
oh you are _guaranteed_ conflicts. most of that is NAPI-area code that
got changed by both.



-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
Kok, Auke
2007-09-07 23:52:54 UTC
Permalink
Post by Jeff Garzik
Post by Jeff Garzik
Post by Kok, Auke
Post by David Miller
Date: Fri, 07 Sep 2007 09:19:30 +0200
Post by Jiri Slaby
I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e driver.
napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot.
Yes, the semantics changed slightly in the net-2.6.24 tree the
other week and someone needs to fix it up.
The netif_napi_add() implicitly does a napi_disable() call. Device
open must explicitly napi_enable() and device close must explicitly
napi_disable(), and if done elsewhere these calls must be strictly
balanced.
I'll fix it... it's my patch that adds the new napi code to it and I
need to get it ready for the merge window anyway.
well.... since its close to the merge window opening, we could see
what happens if DaveM pulls branch 'upstream' of
git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/netdev-2.6.git
That should make this class of pre-merge-window annoyance go away.
oh you are _guaranteed_ conflicts. most of that is NAPI-area code that
got changed by both.
actually that's the only thing it was, and fixing it up was trivial (took me
about 3 minutes). it was 3x the napi code and once a struct indent change...

I'll have a new e1000e napi patch for andrew in a sec.

Auke
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Jiri Slaby
2007-09-09 09:58:17 UTC
Permalink
Post by Jiri Slaby
Hi,
I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e driver.
napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot.
Ok, after these changes:
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index c1c64e2..f8ec537 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
@@ -1693,10 +1693,7 @@ quit_polling:
if (adapter->itr_setting & 3)
e1000_set_itr(adapter);
netif_rx_complete(poll_dev, napi);
- if (test_bit(__E1000_DOWN, &adapter->state))
- atomic_dec(&adapter->irq_sem);
- else
- e1000_irq_enable(adapter);
+ e1000_irq_enable(adapter);
return 0;
}

@@ -4257,7 +4254,6 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
/* tell the stack to leave us alone until e1000_open() is called */
netif_carrier_off(netdev);
netif_stop_queue(netdev);
- napi_disable(&adapter->napi);

strcpy(netdev->name, "eth%d");
err = register_netdev(netdev);


I still have problems with the driver. When I do `ip link set eth0 up', ksoftirq
runs with 100 % cpu time, so I think you endlessly re-schedule some timer (or
the new napi layer?)

regards,
--
Jiri Slaby (***@gmail.com)
Faculty of Informatics, Masaryk University
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Kok, Auke
2007-09-09 22:50:38 UTC
Permalink
Post by Jiri Slaby
Post by Jiri Slaby
Hi,
I found a regression in 2.6.23-rc4-mm1 (since -rc3-mm1) in e1000e driver.
napi_disable(&adapter->napi) in e1000_probe freezes the kernel on boot.
diff --git a/drivers/net/e1000e/netdev.c b/drivers/net/e1000e/netdev.c
index c1c64e2..f8ec537 100644
--- a/drivers/net/e1000e/netdev.c
+++ b/drivers/net/e1000e/netdev.c
if (adapter->itr_setting & 3)
e1000_set_itr(adapter);
netif_rx_complete(poll_dev, napi);
- if (test_bit(__E1000_DOWN, &adapter->state))
- atomic_dec(&adapter->irq_sem);
- else
- e1000_irq_enable(adapter);
+ e1000_irq_enable(adapter);
return 0;
}
@@ -4257,7 +4254,6 @@ static int __devinit e1000_probe(struct pci_dev *pdev,
/* tell the stack to leave us alone until e1000_open() is called */
netif_carrier_off(netdev);
netif_stop_queue(netdev);
- napi_disable(&adapter->napi);
strcpy(netdev->name, "eth%d");
err = register_netdev(netdev);
I still have problems with the driver. When I do `ip link set eth0 up', ksoftirq
runs with 100 % cpu time, so I think you endlessly re-schedule some timer (or
the new napi layer?)
something changed in the logic and e1000e apparently does something wrong. I'll
look into it on monday and resubmit a fixup patch (see robert olsson's mail as
well discussing this issue)

Auke
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Jiri Slaby
2007-09-10 06:24:05 UTC
Permalink
Post by Kok, Auke
Post by Jiri Slaby
I still have problems with the driver. When I do `ip link set eth0 up', ksoftirq
runs with 100 % cpu time, so I think you endlessly re-schedule some timer (or
the new napi layer?)
something changed in the logic and e1000e apparently does something
wrong. I'll look into it on monday and resubmit a fixup patch (see
robert olsson's mail as well discussing this issue)
he's posted me a patch, but no change on my side :(.

Anyway, I'm going away of the box on monday (today). will be back on fri to
test patches :).

thanks,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Kok, Auke
2007-09-10 06:31:47 UTC
Permalink
Post by Jiri Slaby
Post by Kok, Auke
Post by Jiri Slaby
I still have problems with the driver. When I do `ip link set eth0 up', ksoftirq
runs with 100 % cpu time, so I think you endlessly re-schedule some timer (or
the new napi layer?)
something changed in the logic and e1000e apparently does something
wrong. I'll look into it on monday and resubmit a fixup patch (see
robert olsson's mail as well discussing this issue)
he's posted me a patch, but no change on my side :(.
that's what I feared/thought as well when I saw his patch - e1000e and e1000 are
slightly different so the exit condition will be as well.
Post by Jiri Slaby
Anyway, I'm going away of the box on monday (today). will be back on fri to
test patches :).
I'll be on top of it. Since e1000e will be merged I need to get this workign
properly before the merge window opens - no pressure :)

Auke
-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Loading...