Discussion:
2.6.25-rc2-mm1 - several bugs and a crash
(too old to reply)
Tilman Schmidt
2008-02-20 21:14:20 UTC
Permalink
Ok, so here's my story on 2.6.25-rc2-mm1:

Built fine on my Pentium D in 32 bit mode, booted too, although
complaining once already while unpacking the initramfs:

<0>[ 0.069176] BUG: spinlock bad magic on CPU#0, swapper/0
<0>[ 0.069324] lock: c2c19480, .magic: 00000000, .owner: swapper/0, .owner_cpu: 0
<4>[ 0.069559] Pid: 0, comm: swapper Not tainted 2.6.25-rc2-mm1-testing #1
<4>[ 0.069710] [spin_bug+129/140] spin_bug+0x81/0x8c
<4>[ 0.069907] [_raw_spin_unlock+30/118] _raw_spin_unlock+0x1e/0x76
<4>[ 0.069997] [_spin_unlock+34/65] _spin_unlock+0x22/0x41
<4>[ 0.070194] [mnt_want_write+103/138] mnt_want_write+0x67/0x8a
<4>[ 0.070390] [sys_mkdirat+139/219] sys_mkdirat+0x8b/0xdb
<4>[ 0.070584] [clean_path+27/79] ? clean_path+0x1b/0x4f
<4>[ 0.070829] [trace_hardirqs_on+11/13] ? trace_hardirqs_on+0xb/0xd
<4>[ 0.071185] [sys_mkdir+21/23] sys_mkdir+0x15/0x17
<4>[ 0.071378] [do_name+279/440] do_name+0x117/0x1b8
<4>[ 0.071570] [write_buffer+34/49] write_buffer+0x22/0x31
<4>[ 0.071763] [flush_window+105/184] flush_window+0x69/0xb8
<4>[ 0.071996] [unpack_to_rootfs+1585/2238] unpack_to_rootfs+0x631/0x8be
<4>[ 0.072192] [trace_hardirqs_on_caller+248/301] ? trace_hardirqs_on_caller+0xf8/0x12d
<4>[ 0.072440] [restore_nocheck_notrace+0/16] ? restore_nocheck_notrace+0x0/0x10
<4>[ 0.072689] [populate_rootfs+37/270] populate_rootfs+0x25/0x10e
<4>[ 0.072886] [alternative_instructions+344/349] ? alternative_instructions+0x158/0x15d
<4>[ 0.073139] [start_kernel+840/858] start_kernel+0x348/0x35a
<4>[ 0.073335] =======================

Still, X came up fine, I could log in (Gnome feeling subjectively
a bit sluggish), call up a web page from the Internet in Firefox,
and start perusing the logs, when the whole system froze: neither
mouse nor keyboard would react anymore, and only the Wind^Wreset
button would put me back in control. After rebooting into the
previous, non-mm kernel I found this in the syslog:

Feb 20 17:22:40 xenon kernel: [ 48.180297] BUG: using smp_processor_id() in preemptible [00000000] code: ntpdate/3562
Feb 20 17:22:40 xenon kernel: [ 48.180297] caller is __nf_conntrack_find+0x9b/0xeb [nf_conntrack]
Feb 20 17:22:40 xenon kernel: [ 48.180297] Pid: 3562, comm: ntpdate Not tainted 2.6.25-rc2-mm1-testing #1
Feb 20 17:22:40 xenon kernel: [ 48.180297] [<c02015b9>] debug_smp_processor_id+0x99/0xb0
Feb 20 17:22:40 xenon kernel: [ 48.180297] [<fac643a7>] __nf_conntrack_find+0x9b/0xeb [nf_conntrack]
Feb 20 17:22:40 xenon kernel: [ 48.180297] [<fac64410>] nf_conntrack_find_get+0x19/0x56 [nf_conntrack]
Feb 20 17:22:40 xenon kernel: [ 48.180297] [<fac65427>] nf_conntrack_in+0x149/0x42a [nf_conntrack]
Feb 20 17:22:40 xenon kernel: [ 48.180297] [<c0123f5a>] ? sub_preempt_count+0xa2/0xb0
Feb 20 17:22:40 xenon kernel: [ 48.180297] [<c0310541>] ? _spin_unlock_irqrestore+0x5b/0x71
Feb 20 17:22:40 xenon kernel: [ 48.180297] [<c0145a93>] ? trace_hardirqs_on+0xb/0xd
Feb 20 17:22:40 xenon kernel: [ 48.180297] [<c0144441>] ? trace_hardirqs_off+0xb/0xd
Feb 20 17:22:40 xenon kernel: [ 48.180358] [<fad011cc>] ipv4_conntrack_local+0x55/0x5e [nf_conntrack_ipv4]
Feb 20 17:22:40 xenon kernel: [ 48.180387] [<c02ba0ac>] nf_iterate+0x34/0x69
Feb 20 17:22:40 xenon kernel: [ 48.180406] [<c02c17a8>] ? dst_output+0x0/0x10
Feb 20 17:22:40 xenon kernel: [ 48.180440] [<c02ba182>] nf_hook_slow+0x45/0xa2
Feb 20 17:22:40 xenon kernel: [ 48.180463] [<c02c17a8>] ? dst_output+0x0/0x10
Feb 20 17:22:40 xenon kernel: [ 48.180499] [<c02c2ada>] __ip_local_out+0x8c/0x96
Feb 20 17:22:40 xenon kernel: [ 48.180519] [<c02c17a8>] ? dst_output+0x0/0x10
Feb 20 17:22:40 xenon kernel: [ 48.180549] [<c02c2af4>] ip_local_out+0x10/0x20
Feb 20 17:22:40 xenon kernel: [ 48.180570] [<c02c2d5d>] ip_push_pending_frames+0x259/0x2d8
Feb 20 17:22:40 xenon kernel: [ 48.180606] [<c02dbe61>] udp_push_pending_frames+0x2ba/0x328
Feb 20 17:22:40 xenon kernel: [ 48.180653] [<c02dcf9d>] udp_sendmsg+0x456/0x583
Feb 20 17:22:40 xenon kernel: [ 48.180751] [<c02e2799>] inet_sendmsg+0x40/0x4d
Feb 20 17:22:40 xenon kernel: [ 48.180783] [<c02993d5>] sock_sendmsg+0xe3/0xfe
Feb 20 17:22:40 xenon kernel: [ 48.180840] [<c013a405>] ? autoremove_wake_function+0x0/0x35
Feb 20 17:22:40 xenon kernel: [ 48.180875] [<c0167f40>] ? find_lock_page+0x81/0x8a
Feb 20 17:22:40 xenon kernel: [ 48.180902] [<c0166ab9>] ? trace_preempt_on+0x1e/0xaf
Feb 20 17:22:40 xenon kernel: [ 48.180920] [<c011b696>] ? kunmap_atomic+0x89/0xa3
Feb 20 17:22:40 xenon kernel: [ 48.180945] [<c017408c>] ? __do_fault+0x31d/0x378
Feb 20 17:22:40 xenon kernel: [ 48.180974] [<c0123f5a>] ? sub_preempt_count+0xa2/0xb0
Feb 20 17:22:40 xenon kernel: [ 48.181001] [<c011b696>] ? kunmap_atomic+0x89/0xa3
Feb 20 17:22:40 xenon kernel: [ 48.181070] [<c0299d3e>] sys_sendto+0xa9/0xc8
Feb 20 17:22:40 xenon kernel: [ 48.181111] [<c017615b>] ? handle_mm_fault+0x327/0x793
Feb 20 17:22:40 xenon kernel: [ 48.181221] [<c029a4ea>] sys_socketcall+0x100/0x183
Feb 20 17:22:40 xenon kernel: [ 48.181269] [<c0106d00>] sysenter_past_esp+0x6d/0xb9
Feb 20 17:22:40 xenon kernel: [ 48.181341] =======================

This BUG repeated thrice almost identically for ntpdate, and then with
increasing frequency for avahi-daemon, nmbd, smbd and firefox-bin, up
to the point where the system froze.

Looks Netfilter related, CCing netfilter and netdev lists.
Config attached.

HTH
T.
--
Tilman Schmidt E-Mail: ***@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
Patrick McHardy
2008-02-20 21:50:09 UTC
Permalink
Post by Tilman Schmidt
Still, X came up fine, I could log in (Gnome feeling subjectively
a bit sluggish), call up a web page from the Internet in Firefox,
and start perusing the logs, when the whole system froze: neither
mouse nor keyboard would react anymore, and only the Wind^Wreset
button would put me back in control. After rebooting into the
Feb 20 17:22:40 xenon kernel: [ 48.180297] BUG: using smp_processor_id() in preemptible [00000000] code: ntpdate/3562
Feb 20 17:22:40 xenon kernel: [ 48.180297] caller is __nf_conntrack_find+0x9b/0xeb [nf_conntrack]
Feb 20 17:22:40 xenon kernel: [ 48.180297] Pid: 3562, comm: ntpdate Not tainted 2.6.25-rc2-mm1-testing #1
Feb 20 17:22:40 xenon kernel: [ 48.180297] [<c02015b9>] debug_smp_processor_id+0x99/0xb0
Feb 20 17:22:40 xenon kernel: [ 48.180297] [<fac643a7>] __nf_conntrack_find+0x9b/0xeb [nf_conntrack]
I guess the cause for this is a combination of preemtible
RCU and conntrack using RCU since 2.6.25-rc. Using
NF_CT_STAT_INC_ATOMIC should fix it, but I'd prefer
to have a fix that doesn't increase overhead when regular
RCU is used.

I'll see if I can find a better way to fix this tommorrow.

-
To unsubscribe from this list: send the line "unsubscribe netfilter" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Patrick McHardy
2008-02-21 11:28:50 UTC
Permalink
Post by Tilman Schmidt
Still, X came up fine, I could log in (Gnome feeling subjectively
a bit sluggish), call up a web page from the Internet in Firefox,
and start perusing the logs, when the whole system froze: neither
mouse nor keyboard would react anymore, and only the Wind^Wreset
button would put me back in control. After rebooting into the
Feb 20 17:22:40 xenon kernel: [ 48.180297] BUG: using smp_processor_id() in preemptible [00000000] code: ntpdate/3562
Feb 20 17:22:40 xenon kernel: [ 48.180297] caller is __nf_conntrack_find+0x9b/0xeb [nf_conntrack]
Feb 20 17:22:40 xenon kernel: [ 48.180297] Pid: 3562, comm: ntpdate Not tainted 2.6.25-rc2-mm1-testing #1
Feb 20 17:22:40 xenon kernel: [ 48.180297] [<c02015b9>] debug_smp_processor_id+0x99/0xb0
Could you test whether this patch fixes the netfilter
warnings please?
Stephen Hemminger
2008-02-21 16:32:48 UTC
Permalink
On Thu, 21 Feb 2008 12:28:50 +0100
Post by Patrick McHardy
Post by Tilman Schmidt
Still, X came up fine, I could log in (Gnome feeling subjectively
a bit sluggish), call up a web page from the Internet in Firefox,
and start perusing the logs, when the whole system froze: neither
mouse nor keyboard would react anymore, and only the Wind^Wreset
button would put me back in control. After rebooting into the
Feb 20 17:22:40 xenon kernel: [ 48.180297] BUG: using smp_processor_id() in preemptible [00000000] code: ntpdate/3562
Feb 20 17:22:40 xenon kernel: [ 48.180297] caller is __nf_conntrack_find+0x9b/0xeb [nf_conntrack]
Feb 20 17:22:40 xenon kernel: [ 48.180297] Pid: 3562, comm: ntpdate Not tainted 2.6.25-rc2-mm1-testing #1
Feb 20 17:22:40 xenon kernel: [ 48.180297] [<c02015b9>] debug_smp_processor_id+0x99/0xb0
Could you test whether this patch fixes the netfilter
warnings please?
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 327e847..b77eb56 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -256,13 +256,19 @@ __nf_conntrack_find(const struct nf_conntrack_tuple *tuple)
struct hlist_node *n;
unsigned int hash = hash_conntrack(tuple);
+ /* Disable BHs the entire time since we normally need to disable them
+ * at least once for the stats anyway.
+ */
+ local_bh_disable();
Use rcu_read_lock instead. local_bh_disable() won't work with some of the other forms
of RCU alternatives.

-
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Patrick McHardy
2008-02-21 16:34:56 UTC
Permalink
Post by Stephen Hemminger
On Thu, 21 Feb 2008 12:28:50 +0100
Post by Patrick McHardy
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 327e847..b77eb56 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -256,13 +256,19 @@ __nf_conntrack_find(const struct nf_conntrack_tuple *tuple)
struct hlist_node *n;
unsigned int hash = hash_conntrack(tuple);
+ /* Disable BHs the entire time since we normally need to disable them
+ * at least once for the stats anyway.
+ */
+ local_bh_disable();
Use rcu_read_lock instead. local_bh_disable() won't work with some of the other forms
of RCU alternatives.
The caller already calls rcu_read_lock(). This is for the per-cpu
statistics.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Tilman Schmidt
2008-02-22 00:40:35 UTC
Permalink
Post by Patrick McHardy
Could you test whether this patch fixes the netfilter
warnings please?
commit 736b33102292be0d75be1e950ca9bcd5361db7dd
Date: Thu Feb 21 12:26:01 2008 +0100
[NETFILTER]: nf_conntrack: fix smp_processor_id() in preemptible code warning
Yes, it does; and the system also survives substantially longer.
(IOW, it hasn't crashed on me so far.)

Thanks,
T.
--
Tilman Schmidt E-Mail: ***@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
Tilman Schmidt
2008-02-22 00:52:53 UTC
Permalink
Post by Tilman Schmidt
Post by Patrick McHardy
[NETFILTER]: nf_conntrack: fix smp_processor_id() in preemptible code warning
Yes, it does; and the system also survives substantially longer.
(IOW, it hasn't crashed on me so far.)
Which of course it did the second after I sent off that mail. :-(
No message at all this time at the time of the crash, even though
I had "tail -f /var/log/messages" running in an ssh session.

So the nf_conntrack BUG is fixed, but the crash (and of course the
swapper "spinlock bad magic" BUG) persists.
--
Tilman Schmidt E-Mail: ***@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
Paul E. McKenney
2008-02-22 17:09:08 UTC
Permalink
Post by Tilman Schmidt
Post by Tilman Schmidt
Post by Patrick McHardy
[NETFILTER]: nf_conntrack: fix smp_processor_id() in preemptible code warning
Yes, it does; and the system also survives substantially longer.
(IOW, it hasn't crashed on me so far.)
Which of course it did the second after I sent off that mail. :-(
No message at all this time at the time of the crash, even though
I had "tail -f /var/log/messages" running in an ssh session.
So the nf_conntrack BUG is fixed, but the crash (and of course the
swapper "spinlock bad magic" BUG) persists.
Do you have CONFIG_DEBUG_PREEMPT set? That would help find any other
bugs similar to nf_conntrack.

Thanx, Paul
-
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Tilman Schmidt
2008-02-25 08:54:19 UTC
Permalink
Post by Paul E. McKenney
Post by Tilman Schmidt
So the nf_conntrack BUG is fixed, but the crash (and of course the
swapper "spinlock bad magic" BUG) persists.
Do you have CONFIG_DEBUG_PREEMPT set? That would help find any other
bugs similar to nf_conntrack.
CONFIG_DEBUG_PREEMPT=y was set but didn't produce anything.
Or perhaps it did and the message just didn't make it to the disk.
Time to set up a test with netconsole, I guess.
--
Tilman Schmidt E-Mail: ***@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
Tilman Schmidt
2008-02-27 16:37:08 UTC
Permalink
Post by Patrick McHardy
Post by Tilman Schmidt
Feb 20 17:22:40 xenon kernel: [ 48.180297] BUG: using smp_processor_id() in preemptible [00000000] code: ntpdate/3562
Feb 20 17:22:40 xenon kernel: [ 48.180297] caller is __nf_conntrack_find+0x9b/0xeb [nf_conntrack]
Feb 20 17:22:40 xenon kernel: [ 48.180297] Pid: 3562, comm: ntpdate Not tainted 2.6.25-rc2-mm1-testing #1
Feb 20 17:22:40 xenon kernel: [ 48.180297] [<c02015b9>] debug_smp_processor_id+0x99/0xb0
Could you test whether this patch fixes the netfilter
warnings please?
Bad news: With 2.6.25-rc3, that bug has made it into mainline.
Good news: Your patch fixes it there, too.

So I suggest you forward it there as soon as possible.

Thanks,
Tilman
--
Tilman Schmidt E-Mail: ***@imap.cc
Bonn, Germany
Diese Nachricht besteht zu 100% aus wiederverwerteten Bits.
Ungeöffnet mindestens haltbar bis: (siehe Rückseite)
Patrick McHardy
2008-02-27 16:47:11 UTC
Permalink
Post by Tilman Schmidt
Post by Patrick McHardy
Post by Tilman Schmidt
Feb 20 17:22:40 xenon kernel: [ 48.180297] BUG: using smp_processor_id() in preemptible [00000000] code: ntpdate/3562
Feb 20 17:22:40 xenon kernel: [ 48.180297] caller is __nf_conntrack_find+0x9b/0xeb [nf_conntrack]
Feb 20 17:22:40 xenon kernel: [ 48.180297] Pid: 3562, comm: ntpdate Not tainted 2.6.25-rc2-mm1-testing #1
Feb 20 17:22:40 xenon kernel: [ 48.180297] [<c02015b9>] debug_smp_processor_id+0x99/0xb0
Could you test whether this patch fixes the netfilter
warnings please?
Bad news: With 2.6.25-rc3, that bug has made it into mainline.
Good news: Your patch fixes it there, too.
No, its been there since -rc1.
Post by Tilman Schmidt
So I suggest you forward it there as soon as possible.
Already done, should hit upstream soon.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Continue reading on narkive:
Loading...