Discussion:
Regarding tx-nocache-copy in the Sheevaplug
Lluís Batlle i Rossell
2014-10-13 10:52:46 UTC
Hello,

On the 7th of January 2014 this patch was applied:
https://lkml.org/lkml/2014/1/7/307

[PATCH v2] net: Do not enable tx-nocache-copy by default

On the Sheevaplug (ARM Feroceon 88FR131 from Marvell) this caused packets to be
sent corrupted. I think this machine has something special about its cache.

After re-enabling tx-nocache-copy (as it was before the patch),
transfers work fine again. I think that most people who encounter this problem
disable tx offload completely instead of re-enabling this setting.

Is this an ARM kernel problem specific to this platform?

Thank you,
Lluís
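
For readers who want to compare the two configurations described in this message, here is a minimal sketch of the matching ethtool invocations. It makes several assumptions: the interface is eth0, "tx offload" refers to ethtool's tx (TX checksumming) flag, and the helper name offload_cmd is made up for illustration. The helper only prints the command, so nothing is changed until the output is run deliberately.

```sh
#!/bin/sh
# Sketch only: print the ethtool command for a feature toggle instead of
# running it, so it can be reviewed first. The helper name is hypothetical.
# usage: offload_cmd <iface> <feature> <on|off>
offload_cmd() {
    case "$3" in
        on|off) printf 'ethtool -K %s %s %s\n' "$1" "$2" "$3" ;;
        *) echo "state must be 'on' or 'off'" >&2; return 1 ;;
    esac
}

# Configuration reported as corrupting packets (assumed mapping of
# "tx offload" to the tx flag):
#   offload_cmd eth0 tx on
#   offload_cmd eth0 tx-nocache-copy off
# Workaround described in the message: turn tx-nocache-copy back on.
#   offload_cmd eth0 tx-nocache-copy on
```

Piping the printed line to sh as root would apply it, and ethtool -k eth0 lists the current state of each feature.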
Eric Dumazet
2014-10-13 12:26:11 UTC
Which NIC and driver is this exactly?
Lluís Batlle i Rossell
2014-10-13 12:32:17 UTC
Post by Eric Dumazet
Which NIC and driver is this exactly?

According to dmesg in 3.10.1:
[ 7.858872] mv643xx_eth: MV-643xx 10/100/1000 ethernet driver version 1.4
[ 7.866001] mv643xx_eth_port mv643xx_eth_port.0 eth0: port 0 with MAC address 00:50:43:01:d1:bb

Regards,
Lluís.
Andrew Lunn
2014-10-13 14:21:56 UTC
Hi Lluís,

Could you please describe your test setup? I would like to try to
reproduce the problem. I have a machine based on the Kirkwood 6282 and the
same Ethernet.

Thanks,
Andrew
Lluís Batlle i Rossell
2014-10-13 14:31:38 UTC
Enabling tx offload and disabling tx-nocache-copy, then making the machine *send* a
lot of ssh traffic (sftp, for example), makes ssh fail its HMAC check. It's quite easy to
reproduce here.

As for the hardware, it's an old Sheevaplug board.
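
The reproduction described here can be approximated with a small loop that repeats a transfer and counts how often it fails. This is a sketch, not the procedure actually used in the thread: the function name count_failures, the host name peer and the file path in the usage comment are placeholders. It assumes that ssh exits with a non-zero status when the session is aborted, for example after a failed MAC check.

```sh
#!/bin/sh
# Run a command N times and print how many of the runs exited non-zero.
# usage: count_failures <runs> <command> [args...]
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only its exit status matters here.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Example with placeholder names: push a large file from the board to a peer.
#   send() { ssh peer 'cat >/dev/null' < /tmp/bigfile; }
#   count_failures 20 send
```

Because the corruption is intermittent, a result of 0 after only a few runs says little; more runs or a larger file give a more trustworthy count.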
Eric Dumazet
2014-10-13 14:49:19 UTC
Have you tried disabling only TSO, and are you using the latest kernel?

Ezequiel Garcia added a lot of changes recently.
Lluís Batlle i Rossell
2014-10-13 15:48:07 UTC
Is TSO TCP segmentation offload? It's disabled. The kernel is 3.16.3 (Debian):
https://packages.debian.org/testing/kernel/linux-image-3.16-2-kirkwood
Benjamin Poirier
2014-10-15 21:57:01 UTC
Post by Lluís Batlle i Rossell
Is this an ARM kernel problem regarding this platform?

This is odd: only x86 defines ARCH_HAS_NOCACHE_UACCESS. On ARM,
skb_do_copy_data_nocache() should end up using __copy_from_user()
regardless of tx-nocache-copy.
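
This statement can be checked against a kernel source tree. The following is a minimal sketch, assuming the macro is introduced by a #define in some file under arch/<name>/; the function name and the tree path in the usage comment are placeholders, and grep -r is assumed to be available (it is in GNU, BSD and BusyBox grep).

```sh
#!/bin/sh
# List the architectures whose sources define ARCH_HAS_NOCACHE_UACCESS.
# usage: archs_with_nocache_uaccess <kernel-source-dir>
archs_with_nocache_uaccess() {
    (
        cd "$1/arch" || exit 1
        # -r: recurse, -l: print only the names of matching files.
        # Paths look like ./x86/include/...; field 2 is the architecture.
        grep -rlE '^[[:space:]]*#[[:space:]]*define[[:space:]]+ARCH_HAS_NOCACHE_UACCESS' . |
            cut -d/ -f2 | sort -u
    )
}

# Example with a placeholder path:
#   archs_with_nocache_uaccess ~/src/linux
```

If the statement holds for the tree being inspected, the only line printed is x86.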
Eric Dumazet
2014-10-15 22:45:27 UTC
Post by Benjamin Poirier
This is odd, only x86 defines ARCH_HAS_NOCACHE_UACCESS. On arm,
skb_do_copy_data_nocache() should end up using __copy_from_user()
regardless of tx-nocache-copy.

kmap_atomic()/kunmap_atomic() is missing, so we lack
__cpuc_flush_dcache_area() operations.
Benjamin Poirier
2014-10-16 17:34:01 UTC
Post by Eric Dumazet
kmap_atomic()/kunmap_atomic() is missing, so we lack
__cpuc_flush_dcache_area() operations.

You lost me there.
1) I don't see the link
2) It seems kmap_atomic and so on are there:
$ grep kmap_atomic System.map-3.16-2-kirkwood
c0014838 T kmap_atomic
c001491c T kmap_atomic_pfn
c00149a4 T kmap_atomic_to_page

MACH_KIRKWOOD selects CPU_FEROCEON which has
__cpuc_flush_dcache_area ->
cpu_cache.flush_kern_dcache_area ->
feroceon_flush_kern_dcache_area
Lluís Batlle i Rossell
2014-10-16 17:46:28 UTC
Hello all,

It seems I was a bit wrong: although re-enabling tx-nocache-copy makes the
tx errors (ssh complaining about HMAC) happen much less often, they still
happen. It seems that something introduced in recent kernels broke
tx offload.

I have no idea what it could be, but from 2.6 until at least 3.10 the network
driver worked fine with tx offload on this Sheevaplug board.

Regards,
Lluís.
Benjamin Poirier
2014-10-17 20:55:30 UTC
It's not the most pleasant alternative, but if you can tell reliably
whether the problem is occurring, you could try bisecting,
possibly limiting the bisection to mv643xx:

$ git bisect start v3.16.3 v3.10 -- drivers/net/ethernet/marvell/mv643xx_eth.c
Bisecting: 16 revisions left to test after this (roughly 4 steps)

The problem might be outside of the driver, though.
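
Such a bisection can be partly automated with git bisect run, which reads the exit status of a test script: 0 marks the commit as good, 125 asks git to skip it, and any other status from 1 to 127 marks it as bad. Below is a sketch of that verdict logic only. Building the kernel, booting the board and counting failed transfers are left out, and the function name bisect_verdict is hypothetical.

```sh
#!/bin/sh
# Map a test result to the exit status that `git bisect run` expects.
# usage: bisect_verdict <build_status> <failed_transfers>
bisect_verdict() {
    if [ "$1" -ne 0 ]; then
        return 125      # kernel did not build: ask git to skip this commit
    fi
    if [ "$2" -gt 0 ]; then
        return 1        # corrupted transfers were seen: commit is bad
    fi
    return 0            # no failures observed: commit is good
}
```

Since the corruption is intermittent, a commit should only be reported as good after enough transfers to make a false "good" unlikely; one wrong verdict sends the bisection down the wrong half of the history.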

Eric Dumazet
2014-10-16 17:48:25 UTC
Post by Benjamin Poirier
Post by Eric Dumazet
kmap_atomic()/kunmap_atomic() is missing, so we lack
__cpuc_flush_dcache_area() operations.
You lost me there.
1) I don't see the link
$ grep kmap_atomic System.map-3.16-2-kirkwood
c0014838 T kmap_atomic
c001491c T kmap_atomic_pfn
c00149a4 T kmap_atomic_to_page
MACH_KIRKWOOD selects CPU_FEROCEON which has
__cpuc_flush_dcache_area ->
cpu_cache.flush_kern_dcache_area ->
feroceon_flush_kern_dcache_area
I meant to put a '?' instead of a '.'

Note that tcp does a copy, using :