Discussion:
2.6.14-rc4-mm1
(too old to reply)
Reuben Farrelly
2005-10-17 06:19:31 UTC
Permalink
Hi,
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.14-rc4/2.6.14-rc4-mm1/
- Lots of i2c, PCI and USB updates
- Large input layer update to convert it all to dynamic input_dev allocation
- Significant x86_64 updates
- MD updates
- Lots of core memory management scalability rework
Compiles and runs here, but noting these messages when ethernet link down:

Oct 17 18:49:40 tornado kernel: NEIGH: BUG, double timer add, state is 1
Oct 17 18:51:04 tornado last message repeated 3 times
Oct 17 18:52:05 tornado last message repeated 5 times
Oct 17 18:52:11 tornado last message repeated 2 times

net/core/neighbour.c has this:

static inline void neigh_add_timer(struct neighbour *n, unsigned long when)
{
if (unlikely(mod_timer(&n->timer, when))) {
printk("NEIGH: BUG, double timer add, state is %x\n",
n->nud_state);
}
}


Network guys?

reuben
Herbert Xu
2005-10-23 07:32:15 UTC
Permalink
[NEIGH] Fix add_timer race in neigh_add_timer

neigh_add_timer cannot use add_timer unconditionally. The reason is that
by the time it has obtained the write lock someone else (e.g., neigh_update)
could have already added a new timer.

So it should only use mod_timer and deal with its return value accordingly.

This bug would have led to rare neighbour cache entry leaks.

Signed-off-by: Herbert Xu <***@gondor.apana.org.au>
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Herbert Xu
2005-10-23 07:30:21 UTC
Permalink
Post by Reuben Farrelly
Oct 17 18:49:40 tornado kernel: NEIGH: BUG, double timer add, state is 1
Oct 17 18:51:04 tornado last message repeated 3 times
Oct 17 18:52:05 tornado last message repeated 5 times
Oct 17 18:52:11 tornado last message repeated 2 times
Excellent. Looks like we actually caught something. Pity we don't have
a stack trace which means that there might be more bugs.

Anyway, here are three patches which should fix this. This should go
into 2.6.14.

Arnaldo, you can pull them from

master.kernel.org:/pub/scm/linux/kernel/git/herbert/net-2.6.git

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Herbert Xu
2005-10-23 07:31:12 UTC
Permalink
[NEIGH] Print stack trace in neigh_add_timer

Stack traces are very helpful in determining the exact nature of a bug.
So let's print a stack trace when the timer is added twice.

Signed-off-by: Herbert Xu <***@gondor.apana.org.au>
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Herbert Xu
2005-10-23 07:33:31 UTC
Permalink
[NEIGH] Fix timer leak in neigh_changeaddr

neigh_changeaddr attempts to delete neighbour timers without setting
nud_state. This doesn't work because the timer may have already fired
when we acquire the write lock in neigh_changeaddr. The result is that
the timer may keep firing for quite a while until the entry reaches
NEIGH_FAILED.

It should be setting the nud_state straight away so that if the timer
has already fired it can simply exit once we relinquish the lock.

In fact, this whole function is simply duplicating the logic in
neigh_ifdown which in turn is already doing the right thing when
it comes to deleting timers and setting nud_state.

So all we have to do is take that code out and put it into a common
function and make both neigh_changeaddr and neigh_ifdown call it.

Signed-off-by: Herbert Xu <***@gondor.apana.org.au>
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <***@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Ben Greear
2005-10-23 18:00:17 UTC
Permalink
Post by Herbert Xu
[NEIGH] Fix timer leak in neigh_changeaddr
neigh_changeaddr attempts to delete neighbour timers without setting
nud_state. This doesn't work because the timer may have already fired
when we acquire the write lock in neigh_changeaddr. The result is that
the timer may keep firing for quite a while until the entry reaches
NEIGH_FAILED.
It should be setting the nud_state straight away so that if the timer
has already fired it can simply exit once we relinquish the lock.
In fact, this whole function is simply duplicating the logic in
neigh_ifdown which in turn is already doing the right thing when
it comes to deleting timers and setting nud_state.
So all we have to do is take that code out and put it into a common
function and make both neigh_changeaddr and neigh_ifdown call it.
Thanks for all who reproduced and fixed this...I'm glad to know I wasn't
insane when I first tried to fix it and then couldn't reproduce
the problem anymore! :)

Ben
--
Ben Greear <***@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
Arnaldo Carvalho de Melo
2005-10-23 16:16:18 UTC
Permalink
Post by Herbert Xu
Post by Reuben Farrelly
Oct 17 18:49:40 tornado kernel: NEIGH: BUG, double timer add, state is 1
Oct 17 18:51:04 tornado last message repeated 3 times
Oct 17 18:52:05 tornado last message repeated 5 times
Oct 17 18:52:11 tornado last message repeated 2 times
Excellent. Looks like we actually caught something. Pity we don't have
a stack trace which means that there might be more bugs.
Anyway, here are three patches which should fix this. This should go
into 2.6.14.
Arnaldo, you can pull them from
master.kernel.org:/pub/scm/linux/kernel/git/herbert/net-2.6.git
Thanks, pulled.

I guess at some point I'll try to make the neighbour code more like the
sock one where we have things like sk_reset_timer and sk_stop_timer for
this purpose.

- Arnaldo
Continue reading on narkive:
Loading...