Discussion:
Linux, tcpdump and vlan
Patrick McHardy
2007-07-18 22:57:12 UTC
[...]
1). If set promiscuous, the e1000 should disable any vlan rx filtering, so that it can receive vlan frames of other vlan IDs. Other ethernet drivers probably need to be fixed as well.
2). The packet layer should change the rx skb device from the vlan 'fake' device (eth0.2) to the corresponding physical device (eth0), so when we run tcpdump on eth0 we see all vlan-tagged and non-vlan-tagged frames
3). The packet socket layer should insert the vlan tag header before passing frames to the upper layer, so tcpdump can display them.
Put another way, once you enable VLAN header stripping, you
won't see the headers for *any* VLAN, not only for those you're
actually running locally. This is also a problem for devices
like macvlan, where it would be desirable to make use of
hardware VLAN acceleration. I was thinking about storing the
information somewhere in the packet's meta-data on both RX and
TX paths, that would also allow tcpdump to properly display
packets.
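
A minimal user-space sketch of what rebuilding the header from such
meta-data could look like, e.g. before handing a frame to a sniffer.
Assumptions: the stripped tag is available as a plain 16-bit TCI value,
and vlan_reinsert_tag() is a hypothetical helper, not an existing
kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN     6        /* bytes in one MAC address */
#define VLAN_HLEN    4        /* TPID (2 bytes) + TCI (2 bytes) */
#define ETH_P_8021Q  0x8100   /* 802.1Q tag protocol identifier */

/*
 * Hypothetical helper: copy an untagged frame into 'out' and put an
 * 802.1Q tag carrying 'tci' back between the source MAC and the
 * original EtherType. Returns the new length, or 0 if the frame is
 * shorter than an Ethernet header or 'out' is too small.
 */
size_t vlan_reinsert_tag(const uint8_t *frame, size_t len, uint16_t tci,
                         uint8_t *out, size_t out_cap)
{
    const size_t mac_len = 2 * ETH_ALEN;   /* dst MAC + src MAC */

    assert(frame != NULL && out != NULL);
    if (len < mac_len + 2 || out_cap < len + VLAN_HLEN)
        return 0;

    memcpy(out, frame, mac_len);                    /* MACs unchanged */
    out[mac_len]     = (uint8_t)(ETH_P_8021Q >> 8); /* TPID, network order */
    out[mac_len + 1] = (uint8_t)(ETH_P_8021Q & 0xff);
    out[mac_len + 2] = (uint8_t)(tci >> 8);         /* TCI, network order */
    out[mac_len + 3] = (uint8_t)(tci & 0xff);
    /* original EtherType and payload follow the tag */
    memcpy(out + mac_len + VLAN_HLEN, frame + mac_len, len - mac_len);
    return len + VLAN_HLEN;
}
```

The point of the sketch is only that the 16-bit tag has to survive
somewhere next to the packet; once it is gone there is nothing to
rebuild the header from.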

I have planned to look into this when I find some time.
Your suggestion of disabling VLAN acceleration in promiscuous
mode sounds like a reasonable solution until then ..

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Ben Greear
2007-07-18 23:22:57 UTC
Post by Patrick McHardy
[...]
1). If set promiscuous, the e1000 should disable any vlan rx filtering, so that it can receive vlan frames of other vlan id's. Other ethernet drivers probably need fixed as well.
2). The packet layer should change the rx skb device from the vlan 'fake' device (eth0.2) to the corresponding physical device (eth0), so when we run tcpdump on eth0 we see all vlan-tagged and non-vlan-tagged frames
3). The packet socket layer should insert the vlan tag header before passing frames to the upper layer, so tcpdump can display them.
Put another way, once you enable VLAN header stripping, you
won't see the headers for *any* VLAN, not only for those you're
actually running locally. This is also a problem for devices
like macvlan, where it would be desirable to make use of
hardware VLAN accerlation. I was thinking about storing the
information somewhere in the packets meta-data on both RX and
TX paths, that would also allow tcpdump to properly display
packets.
MAC-VLAN could gather this information based on its parent
device (i.e., if parent-dev has VID 7, then add VID 7 to the meta
data). There would be no need for any driver changes I think.

Other than tcpdump, or some other raw protocol that wants to see
the VLAN header in user-space, I can't think of what use this would
be, however. And, if you just disable VLAN accel in the NIC (see below),
wouldn't that make this mac-vlan hackery unnecessary?
Post by Patrick McHardy
I have planned to look into this when I find some time.
Your suggestion of disabling VLAN acceleration in promiscous
mode sounds like a reasonable solution until then ..
I think a better method would be to allow disabling VLAN HW accel for a
NIC with ethtool. Then, the packets will be received by the software
stack with the vlan header intact. Something sniffing on the physical
dev will automatically get the VLAN header.

Thanks,
Ben
--
Ben Greear <***@candelatech.com>
Candela Technologies Inc http://www.candelatech.com


Patrick McHardy
2007-07-18 23:34:08 UTC
Post by Ben Greear
Post by Patrick McHardy
Put another way, once you enable VLAN header stripping, you
won't see the headers for *any* VLAN, not only for those you're
actually running locally. This is also a problem for devices
like macvlan, where it would be desirable to make use of
hardware VLAN accerlation. I was thinking about storing the
information somewhere in the packets meta-data on both RX and
TX paths, that would also allow tcpdump to properly display
packets.
MAC-VLAN could gather this information based on it's parent
device (ie, if parent-dev has VID 7, then add VID 7 to the meta
data. There would be no need for any driver changes I think.
It's actually more a problem on the RX path. VLAN acceleration
works (at least with some drivers) by enabling HW header stripping
and using the VLAN ID for an immediate lookup in the VLAN devices
configured on that device. So if the VLAN is not configured on the
real device but something like macvlan, it will get the packet
without a header and without any indication that this was a VLAN
packet. This is also what causes the tcpdump problem.
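
A toy user-space model of the RX behaviour described above, purely for
illustration. Everything prefixed fake_ is hypothetical, and what a
real driver does with a VID that has no VLAN device may differ; the
sketch only mirrors the description in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VLAN_VID_MASK 0x0fff   /* low 12 bits of the TCI hold the VID */
#define VLAN_N_VID    4096

struct fake_dev {
    const char *name;
};

/* VLAN devices configured on one real device, indexed by VID. */
struct fake_vlan_group {
    struct fake_dev *real_dev;
    struct fake_dev *vlan_devs[VLAN_N_VID];   /* NULL if not configured */
};

/* Note that the packet has no field that remembers the stripped tag. */
struct fake_pkt {
    struct fake_dev *dev;   /* device the packet is attributed to */
};

/*
 * The hardware has already removed the 802.1Q header; the tag is only
 * available from the RX descriptor. Returns 1 if a VLAN device claimed
 * the packet, 0 if it stays on the real device. In the second case
 * nothing in 'pkt' says that it ever was a VLAN packet.
 */
int fake_hwaccel_rx(struct fake_vlan_group *grp, struct fake_pkt *pkt,
                    uint16_t tci_from_descriptor)
{
    uint16_t vid = tci_from_descriptor & VLAN_VID_MASK;
    struct fake_dev *vdev;

    assert(grp != NULL && pkt != NULL);
    vdev = grp->vlan_devs[vid];      /* immediate lookup by VID */
    if (vdev != NULL) {
        pkt->dev = vdev;             /* delivered on e.g. eth0.2 */
        return 1;
    }
    pkt->dev = grp->real_dev;        /* untagged-looking packet on eth0 */
    return 0;
}
```

Anything that looks at the packet after this point (macvlan, tcpdump,
a bridge) sees the same thing for an unconfigured VLAN as for an
untagged frame.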

On the TX path, it could simply use the CB, but this is actually
also wrong (for both macvlan and real devices) since qdiscs have
ownership of the skb in between, and at least netem *does* modify
the CB, breaking VLAN.
Post by Ben Greear
Other than TCP-dump, or some other raw protocol that wants to see
the VLAN header in user-space, I can't think of what use this would
be, however. And, if you just disable VLAN accel in the NIC (see below),
that would make this mac-vlan hackery not needed at all?
Optimizations for macvlan are not too important, I agree. But for
tcpdump I consider it a bug.
Post by Ben Greear
Post by Patrick McHardy
I have planned to look into this when I find some time.
Your suggestion of disabling VLAN acceleration in promiscous
mode sounds like a reasonable solution until then ..
I think a better method would be to allow disabling VLAN HW accel for a
NIC with
ethtool. Then, the packets will be received by the software stack with
the vlan
header intact. Something sniffing on the physical dev will
automatically get the
VLAN header.
That would also be fine. But considering that the TX path is
problematic too, a clean solution for all of this would be
to store the VLAN id in the skb. And we do have some holes
to plug currently :)

Ben Greear
2007-07-19 00:01:03 UTC
Post by Patrick McHardy
Post by Ben Greear
Post by Patrick McHardy
Put another way, once you enable VLAN header stripping, you
won't see the headers for *any* VLAN, not only for those you're
actually running locally. This is also a problem for devices
like macvlan, where it would be desirable to make use of
hardware VLAN accerlation. I was thinking about storing the
information somewhere in the packets meta-data on both RX and
TX paths, that would also allow tcpdump to properly display
packets.
MAC-VLAN could gather this information based on it's parent
device (ie, if parent-dev has VID 7, then add VID 7 to the meta
data. There would be no need for any driver changes I think.
Its actually more a problem on the RX path. VLAN acceleration
works (at least with some drivers) by enabling HW header striping
and using the VLAN ID for an immediate lookup in the VLAN devices
configured on that device. So if the VLAN is not configured on the
real device but something like macvlan, it will get the packet
without a header and without any indication that this was a VLAN
packet. This is also what causes the tcpdump problem.
This reminded me of something:

If we are using VLAN HW-Accel, then the skb hits the mac-vlan check with
skb->dev == vlan-device. So, in this case, we can put mac-vlans on top
of 802.1Q VLANs.

But, if we are not using VLAN hw-accel, the skb hits the mac-vlan check
with skb->dev == ethernet-device. In this case, we could NOT have the
mac-vlan on top of the 802.1Q VLAN, but we can have a MAC-VLAN on the
raw ethernet and we could add 802.1Q vlans on top of the mac-vlan. This
is because the .1Q vlan will only be found once we go into the protocol
handler logic, which is necessarily after the MAC-VLAN check logic.

Unless I am confused in my conjecture above, this is likely to confuse
others who try to mix and match MAC-VLANs and 802.1Q VLANs.
Post by Patrick McHardy
Post by Ben Greear
Post by Patrick McHardy
I have planned to look into this when I find some time.
Your suggestion of disabling VLAN acceleration in promiscous
mode sounds like a reasonable solution until then ..
I think a better method would be to allow disabling VLAN HW accel for a
NIC with
ethtool. Then, the packets will be received by the software stack with
the vlan
header intact. Something sniffing on the physical dev will
automatically get the
VLAN header.
That would also be fine. But considering that the TX path is
problematic too, a clean solution for all of this would be
to store the VLAN id in the skb. And we do have some holes
to plug currently :)
With VLAN HW accel disabled, the skb will have the VLAN header in it by
the time it hits the ethX interface, so sniffing there should still show
the header. It won't show when sniffing the VLAN device, but I think
that is OK.

Thanks,
Ben
--
Ben Greear <***@candelatech.com>
Candela Technologies Inc http://www.candelatech.com


Patrick McHardy
2007-07-19 00:19:27 UTC
Post by Ben Greear
Post by Patrick McHardy
Its actually more a problem on the RX path. VLAN acceleration
works (at least with some drivers) by enabling HW header striping
and using the VLAN ID for an immediate lookup in the VLAN devices
configured on that device. So if the VLAN is not configured on the
real device but something like macvlan, it will get the packet
without a header and without any indication that this was a VLAN
packet. This is also what causes the tcpdump problem.
If we are using VLAN HW-Accel, then the skb hits the mac-vlan check with
the skb->dev == vlan-device.
So, in this case, we can put mac-vlans on top of 802.1Q VLANs.
But, if we are not using VLAN hw-accel, the skb hits the mac-vlan check
with skb->dev == ethernet-device.
In this case, we could NOT have the mac-vlan on top of the 802.1Q VLAN,
but we can have a MAC-VLAN
on the raw ethernet and we could add 802.1Q vlans on top of the
mac-vlan. This is because the
.1Q vlan will only be found once we go into the protocol handler logic,
which is necessarily after the
MAC-VLAN check logic.
Unless I am confused in my conjecture above, this is likely to confuse
others who try to mix and
match MAC-VLANs and 802.1Q VLANs.
The current code doesn't use hardware acceleration and works fine
in all combinations where only vlan *or* macvlan devices are used
on the underlying device.

If you mix them macvlan won't get to see vlan headers anymore,
same as for tcpdump, bridge devices, or anything else that
might care. A bridge eating VLAN headers should be a clearer
indication of a bug than an inaccurate tcpdump ..

The real problem is that the device removes the header for all
vlans, not only for those that are configured on the device.
This is a result of how the hardware works. But since we don't
have the data available later, we can't even fix it up in
software.

Krzysztof Halasa
2007-07-19 13:28:46 UTC
Post by Patrick McHardy
Your suggestion of disabling VLAN acceleration in promiscous
mode sounds like a reasonable solution until then ..
I'm not sure promiscuous mode is related to the problem.
Tcpdump without promiscuous mode makes perfect sense.

I don't know the VLAN code internals very well, but I think
the VLAN # is used for looking up the interface, so
presenting the "original" packet on the trunk device
would IMHO involve some skb cloning, and perhaps some
ethtool option could control that.

Not sure about untagged frames vs. tagged frames with
the default VLAN id - can the hardware differentiate
between them at all?


Or, perhaps it should be left (almost) as is - with "software"
VLANs the traffic always goes through the master interface,
but with "accelerated" mode it only goes through logical
interfaces and doesn't show up on master? Probably with
exception of invalid VLANs, which could be injected back to
master (because no logical device exists)?
--
Krzysztof Halasa
Stephen Hemminger
2007-07-19 13:41:31 UTC
On Thu, 19 Jul 2007 15:28:46 +0200
Post by Krzysztof Halasa
Post by Patrick McHardy
Your suggestion of disabling VLAN acceleration in promiscous
mode sounds like a reasonable solution until then ..
I'm not sure promiscous mode is related to the problem.
Tcpdump without promiscous mode makes perfect sense.
I don't know very well VLAN code internals, but I think
the VLAN # is used for looking up the interface, so
presenting the "original" packet on the trunk device
would IMHO involve some skb cloning, and perhaps some
ethtool option could probably control that.
Not sure about untagged frames vs. tagged frames with
the default VLAN id - can the hardware at all differentiate
between them?
Or, perhaps it should be left (almost) as is - with "software"
VLANs the traffic always goes through the master interface,
but with "accelerated" mode it only goes through logical
interfaces and doesn't show up on master? Probably with
exception of invalid VLANs, which could be injected back to
master (because no logical device exists)?
I don't claim to be a VLAN expert, but there are really three cases
for handling tagged frames:

1) non-accelerated device
   * all frames show up in promiscuous mode
   * tag is part of the frame that shows up in tcpdump, and then
     gets stripped by the 8021q module.
2) rx tag stripping device
   * all frames show up in promiscuous mode
   * tag is in skb but NOT passed to tcpdump
3) rx vlan acceleration
   * only frames for vlans that are registered show up in
     promiscuous mode
   * tag is in skb but NOT passed to tcpdump

Unfortunately, the tag is lost as part of the VLAN acceleration process
so it is not a simple matter of changing code in AF_PACKET receive
to restore the tag.
Patrick McHardy
2007-07-19 14:00:19 UTC
Post by Stephen Hemminger
On Thu, 19 Jul 2007 15:28:46 +0200
Post by Krzysztof Halasa
Post by Patrick McHardy
Your suggestion of disabling VLAN acceleration in promiscous
mode sounds like a reasonable solution until then ..
I'm not sure promiscous mode is related to the problem.
Tcpdump without promiscous mode makes perfect sense.
Good point.
Post by Stephen Hemminger
Post by Krzysztof Halasa
I don't know very well VLAN code internals, but I think
the VLAN # is used for looking up the interface, so
presenting the "original" packet on the trunk device
would IMHO involve some skb cloning, and perhaps some
ethtool option could probably control that.
Not sure about untagged frames vs. tagged frames with
the default VLAN id - can the hardware at all differentiate
between them?
Or, perhaps it should be left (almost) as is - with "software"
VLANs the traffic always goes through the master interface,
but with "accelerated" mode it only goes through logical
interfaces and doesn't show up on master? Probably with
exception of invalid VLANs, which could be injected back to
master (because no logical device exists)?
The last case is the problematic one, the tag might be gone.
Post by Stephen Hemminger
I don't claim to be a VLAN expert but there are really three cases
for handling tagged frames
1) non-accelerated device
* all frames show in promiscious mode
* tag is part of the frame that shows up
in tcpdump, and then gets stripped by the 8021q module.
2) rx tag stripping device
* all frames show in promiscious mode
* tag is in skb but NOT passed to tcpdump
3) rx vlan acceleration
* only frames that for vlan's that are registered show up
in promisicous mode
* tag is in skb but NOT passed to tcpdump
Unfortunately, the tag is lost as part of the VLAN acceleration process
so it is not a simple matter of changing code in AF_PACKET receive
to restore the tag.
I think case 2) is not correct, the tag is stripped and is not in the
skb. Check out sky2 for example :)

	if (sky2->vlgrp && (status & GMR_FS_VLAN)) {
		vlan_hwaccel_receive_skb(skb,
					 sky2->vlgrp,
					 be16_to_cpu(sky2->rx_tag));
	}

The tag it uses for the lookup comes from the descriptor. I don't
know any examples for case 3), but I would expect that the header
is also removed.

Anyway, I think what we should do is store the VLAN tag in the skb
meta data. That would not only allow tcpdump to reconstruct it, it
would also fix the invalid use of skb->cb on the TX path. It would
also fix the bridge eating VLAN headers case (bridge on eth0 + eth1,
additionally eth0.1 on eth0 using vlan RX acceleration with header
stripping) and would allow simply forwarding the vlan tag to the
outgoing device in case it supports hardware accelerated vlan tagging.

Krzysztof Halasa
2007-07-19 14:23:48 UTC
Post by Stephen Hemminger
1) non-accelerated device
* all frames show in promiscious mode
* tag is part of the frame that shows up
in tcpdump, and then gets stripped by the 8021q module.
Sure. It's IMHO good and working, modulo the tag being removed
on the master device (optional cloning or something, IIRC).
Post by Stephen Hemminger
2) rx tag stripping device
* all frames show in promiscious mode
* tag is in skb but NOT passed to tcpdump
3) rx vlan acceleration
* only frames that for vlan's that are registered show up
in promisicous mode
* tag is in skb but NOT passed to tcpdump
I wasn't aware of devices doing 3. Aren't we able to tell them
to receive all packets anyway (even unknown VLANs#)?
Post by Stephen Hemminger
Unfortunately, the tag is lost as part of the VLAN acceleration process
so it is not a simple matter of changing code in AF_PACKET receive
to restore the tag.
I'm not sure if we really want it. If needed we can disable
acceleration, can't we? While accelerated we can see the packets
(without tags) on logical devices.

However seeing unknown tags on master device (with tcpdump etc)
would certainly be useful.
--
Krzysztof Halasa
Stephen Hemminger
2007-07-19 15:00:33 UTC
On Thu, 19 Jul 2007 16:23:48 +0200
Post by Krzysztof Halasa
Post by Stephen Hemminger
1) non-accelerated device
* all frames show in promiscious mode
* tag is part of the frame that shows up
in tcpdump, and then gets stripped by the 8021q module.
Sure. It's IMHO good and working, modulo the tag being removed
on the master device (optional cloning or something, IIRC).
Post by Stephen Hemminger
2) rx tag stripping device
* all frames show in promiscious mode
* tag is in skb but NOT passed to tcpdump
3) rx vlan acceleration
* only frames that for vlan's that are registered show up
in promisicous mode
* tag is in skb but NOT passed to tcpdump
I wasn't aware of devices doing 3. Aren't we able to tell them
to receive all packets anyway (even unknown VLANs#)?
See NETIF_F_HW_VLAN_FILTER (e1000, etc).
Post by Krzysztof Halasa
Post by Stephen Hemminger
Unfortunately, the tag is lost as part of the VLAN acceleration process
so it is not a simple matter of changing code in AF_PACKET receive
to restore the tag.
I'm not sure if we really want it. If needed we can disable
acceleration, can't we? While accelerated we can see the packets
(without tags) on logical devices.
Not at runtime; acceleration is always on if you compile the kernel with
vlan support. That is a design mistake as far as I can tell.
Post by Krzysztof Halasa
However seeing unknown tags on master device (with tcpdump etc)
would certainly be useful.
Only in promiscuous mode. In some sense the tag is part of the MAC address.
Krzysztof Halasa
2007-07-19 15:45:13 UTC
Post by Stephen Hemminger
Not at runtime, acceleration is always on if you compile kernel with vlan
support. That is a design mistake as far as I can tell.
I think so.
Post by Stephen Hemminger
Post by Krzysztof Halasa
However seeing unknown tags on master device (with tcpdump etc)
would certainly be useful.
Only in promiscuous mode. In some sense tag is part of the mac address.
Well, in "some sense" maybe, though the MAC address is rather
strictly defined to be a 6-octet value. I can live with
promiscuous anyway, it's a really minor issue.
--
Krzysztof Halasa
Stephen Hemminger
2007-07-19 15:20:08 UTC
On Thu, 19 Jul 2007 16:23:48 +0200
Post by Krzysztof Halasa
Post by Stephen Hemminger
1) non-accelerated device
* all frames show in promiscious mode
* tag is part of the frame that shows up
in tcpdump, and then gets stripped by the 8021q module.
Sure. It's IMHO good and working, modulo the tag being removed
on the master device (optional cloning or something, IIRC).
Post by Stephen Hemminger
2) rx tag stripping device
* all frames show in promiscious mode
* tag is in skb but NOT passed to tcpdump
3) rx vlan acceleration
* only frames that for vlan's that are registered show up
in promisicous mode
* tag is in skb but NOT passed to tcpdump
I wasn't aware of devices doing 3. Aren't we able to tell them
to receive all packets anyway (even unknown VLANs#)?
See NETIF_F_HW_VLAN_FILTER (e1000, etc).
Post by Krzysztof Halasa
Post by Stephen Hemminger
Unfortunately, the tag is lost as part of the VLAN acceleration process
so it is not a simple matter of changing code in AF_PACKET receive
to restore the tag.
I'm not sure if we really want it. If needed we can disable
acceleration, can't we? While accelerated we can see the packets
(without tags) on logical devices.
Not at runtime, acceleration is always on if you compile kernel with vlan
support. That is a design mistake as far as I can tell.
Post by Krzysztof Halasa
However seeing unknown tags on master device (with tcpdump etc)
would certainly be useful.
Only in promiscuous mode. In some sense tag is part of the mac address.
andrei radulescu-banu
2007-07-19 15:47:01 UTC
The consensus seems to be that skb's need to carry vlan accelerated tags in their cb's, on rx as well as tx. VLAN_TX_SKB_CB() is perfect for that.
[Patrick] On the TX path, it could simply use the CB, but this is actually
also wrong (for both macvlan and real devices) since qdiscs have
ownership of the skb in between, and at least netem *does* modify
the CB, breaking VLAN.

Thanks for pointing that out... It appears to me that qdisc/netem already breaks the vlan implementation, in the path

vlan_dev_hwaccel_hard_start_xmit(): sets accelerated vlan tag in skb->cb, calls
dev_queue_xmit(): may pass skb to qdisc/netem, which may mangle skb->cb before calling
dev->hard_start_xmit(), resulting in a tx frame without its vlan tag.

So netem needs to look for hw accelerated vlan metadata and insert it in the skb... Don't see any other way around this.
[Patrick] Your suggestion of disabling VLAN acceleration in promiscuous
mode sounds like a reasonable solution until then ..

I was rather thinking of keeping hw vlan acceleration in promiscuous mode. Upon becoming promisc, the driver will be changed to disable vlan filters - it will reenable them when leaving promisc mode.

My 2 cents on vlan hw acceleration: it does not save much in computing cycles, if software is written carefully. It is vlan filtering that saves computing time.
[Ben] I think a better method would be to allow disabling VLAN HW accel for a NIC with ethtool.
This requires changes to ethtool and e1000 driver, +other drivers. It is a handy thing to have. I don't view it as a solution to tcpdump - or to the vlan bridging problem. One concern: if we're switching hw accel mode on the fly, we need to carefully protect tx frames that are just about going out and have already been set up for the opposite mode.

Any comments on what is the expected behavior of 'tcpdump -i eth0.2' vs. 'tcpdump -i eth0'?

Andrei Radulescu-Banu
Brix Networks
Stephen Hemminger
2007-07-19 16:21:05 UTC
On Thu, 19 Jul 2007 08:47:01 -0700 (PDT)
Post by andrei radulescu-banu
The consensus seems to be that skb's need to carry vlan accelerated tags in their cb's, on rx as well as tx. VLAN_TX_SKB_CB() is perfect for that.
[Patrick] On the TX path, it could simply use the CB, but this is actually
also wrong (for both macvlan and real devices) since qdiscs have
ownership of the skb in between, and at least netem *does* modify
the CB, breaking VLAN.
No, VLAN is wrong to expect the CB to survive through layers. The CB is
a private scribble area that can be used by whichever piece of code currently
"owns" the skb. If data needs to be passed from layer to layer, it needs to
be done as separate fields in the skb itself. If A passes an skb to B, then
the CB can be changed by B (or things it calls) before it arrives at C.
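
To make the A -> B -> C point concrete, here is a small user-space
model. It is only a sketch: struct toy_skb, its vlan_tci/vlan_present
fields and the layer_* functions are all made up for illustration and
are not the kernel's sk_buff or any existing API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy packet buffer with a scratch area and a dedicated tag field. */
struct toy_skb {
    uint8_t  cb[48];         /* scratch: belongs to the current owner */
    uint16_t vlan_tci;       /* hypothetical dedicated field */
    uint8_t  vlan_present;   /* set when vlan_tci is valid */
};

/* Layer A, fragile variant: hide the tag in the scratch area. */
void layer_a_tag_in_cb(struct toy_skb *skb, uint16_t tci)
{
    assert(skb != NULL);
    memcpy(skb->cb, &tci, sizeof tci);
}

/* Layer A, robust variant: store the tag in its own field. */
void layer_a_tag_in_field(struct toy_skb *skb, uint16_t tci)
{
    assert(skb != NULL);
    skb->vlan_tci = tci;
    skb->vlan_present = 1;
}

/*
 * Layer B: a queueing stage that owns the skb for a while and uses the
 * scratch area for its own state, which it is entitled to do.
 */
void layer_b_enqueue(struct toy_skb *skb, uint64_t time_to_send)
{
    assert(skb != NULL);
    memcpy(skb->cb, &time_to_send, sizeof time_to_send);
}

/* Layer C: the driver tries to read what A left in the scratch area. */
uint16_t layer_c_read_cb(const struct toy_skb *skb)
{
    uint16_t tci;

    assert(skb != NULL);
    memcpy(&tci, skb->cb, sizeof tci);
    return tci;
}
```

If A stores the tag both ways and B then enqueues the packet, C reads
B's bytes back from cb, while the dedicated field still holds the tag.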



Patrick McHardy
2007-07-19 16:33:15 UTC
Post by andrei radulescu-banu
The consensus seems to be that skb's need to carry vlan accelerated tags in their cb's, on rx as well as tx. VLAN_TX_SKB_CB() is perfect for that.
No, it's not. It's only legal to use while something has ownership
of the skb. Between VLAN devices and real devices qdiscs are
free to use it.
Post by andrei radulescu-banu
[Patrick] On the TX path, it could simply use the CB, but this is actually
also wrong (for both macvlan and real devices) since qdiscs have
ownership of the skb in between, and at least netem *does* modify
the CB, breaking VLAN.
Thanks for pointing that out... It appears to me that qdisc/netem already breaks the vlan implementation, in the path
vlan_dev_hwaccel_hard_start_xmit(): sets accelerated vlan tag in skb->cb, calls
dev_queue_xmit(): may pass skb to qdisc/netem, which may mangle skb->cb before calling
dev->hard_start_xmit(), resulting in a tx frame without its vlan tag.
So netem needs to look for hw accelerated vlan metadata and insert it in the skb... Don't see any other way around this.
No, we might want to put other data in the cb in the future.
VLAN should follow the rules instead.
Ben Greear
2007-07-19 16:47:24 UTC
Post by andrei radulescu-banu
[Ben] I think a better method would be to allow disabling VLAN HW accel for a NIC with ethtool.
This requires changes to ethtool and e1000 driver, +other drivers. It is a handy thing to have. I don't view it as a solution to tcpdump - or to the vlan bridging problem. One concern: if we're switching hw accel mode on the fly, we need to carefully protect tx frames that are just about going out and have already been set up for the opposite mode.
I think it would be valid to let a few packets slip through on the old
behaviour during changeover, or perhaps to drop them entirely if that
is required.

Turning off vlan hw-accel when the nic goes promisc is also going to
require driver changes, I believe, so either way you have to do that
work.

If tcpdump and/or bridging needs to disable the hw-accel, then it can
explicitly do so by some API. That is better than overloading the
promisc flag in my opinion. This is especially true since promisc is
not easily readable by user-space and things like tcpdump cannot have
full control of promisc (if a mac-vlan has the NIC in promisc mode, for
instance, then tcpdump can never disable it.)
Post by andrei radulescu-banu
Any comments on what is the expected behavior of 'tcpdump -i eth0.2' vs. 'tcpdump -i eth0'?
I would expect that you see tags with -i eth0, but not with -i eth0.2

That is the way it currently works with non-hw-accel VLANs (or it was
the last time I checked).

Ben
--
Ben Greear <***@candelatech.com>
Candela Technologies Inc http://www.candelatech.com


andrei radulescu-banu
2007-07-19 16:02:38 UTC
One additional thought: with the proposed changes in my prev message, the driver can be set to hw vlan accelerated mode, even if no vlan interfaces are configured. We would not have to switch hw vlan accelerated mode anymore, when vlan interfaces are created or destroyed.







Krzysztof Halasa
2007-07-20 19:58:27 UTC
Another idea - perhaps we could make the software VLANs behave
the same as hw ones? I.e., stripping the tag on RX while setting
some magic skb field?

The packets could go via main interface first (normal path, with
eth_type_trans stripping the tag and setting protocol = some 802.1Q),
netif_rx | netif_receive_skb, then through the VLAN device with
finally eth_type_trans setting the IPv4 etc. protocol to pass to
L3 layers.

I can see potential problems on TX, the packets would have to be
presented without the tag (but with VLAN ID set somewhere in the skb)
and that probably means all drivers would have to be modified.

Seems a bit of work, I know my message is missing the patch...
--
Krzysztof Halasa
Ben Greear
2007-07-20 20:34:28 UTC
Post by Krzysztof Halasa
Another idea - perhaps we could make the software VLANs behave
the same as hw ones? I.e., stripping the tag on RX while setting
some magic skb field?
The packets could go via main interface first (normal path, with
eth_type_trans stripping the tag and setting protocol = some 802.1Q),
netif_rx | netif_receive_skb, then through the VLAN device with
finally eth_type_trans setting the IPv4 etc. protocol to pass to
L3 layers.
There is already a flag you can set on vlan devices (reorder-header)
that strips the VLAN tag before presenting it to user-space.
Post by Krzysztof Halasa
I can see potential problems on TX, the packets would have to be
presented without the tag (but with VLAN ID set somewhere in the skb)
and that probably means all drivers would have to be modified.
On tx, if it shows up on the vlan device, we add that device's VID to
the header if no VID is currently in the SKB. If it is in the SKB header
we change the VID to be the tx dev's VID (if it was different). This allows user-space
to send a raw ethernet frame on a vlan device and have it automatically
go out of the box on the correct vlan. User-space can also send raw VLAN frames
and have those also go out on the correct VLAN.
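
The TX rule described above can be sketched on raw frame bytes as follows. This is a user-space illustration in Python, not the kernel's VLAN code: tag_for_vlan_device() is a hypothetical name, and preserving the priority bits when rewriting the VID is an assumption the text does not state.

```python
import struct

ETH_P_8021Q = 0x8100   # 802.1Q tag protocol identifier (TPID)
VID_MASK = 0x0FFF      # low 12 bits of the TCI carry the VLAN ID


def tag_for_vlan_device(frame: bytes, dev_vid: int) -> bytes:
    """Make a frame sent on a VLAN device leave on that device's VLAN.

    - No tag present: insert one carrying dev_vid.
    - Tag present: rewrite its VID to dev_vid (assumption: the upper
      four TCI bits, priority and DEI/CFI, are left untouched).
    """
    if not 0 <= dev_vid <= VID_MASK:
        raise ValueError("VLAN ID must fit in 12 bits")
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETH_P_8021Q:
        if len(frame) < 18:
            raise ValueError("truncated 802.1Q header")
        (tci,) = struct.unpack_from("!H", frame, 14)
        new_tci = (tci & 0xF000) | dev_vid
        return frame[:14] + struct.pack("!H", new_tci) + frame[16:]
    # Untagged: the tag goes between the source MAC and the EtherType.
    return frame[:12] + struct.pack("!HH", ETH_P_8021Q, dev_vid) + frame[12:]
```

Because an existing tag is rewritten rather than stacked, this rule never produces a double-tagged frame.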
Post by Krzysztof Halasa
Seems a bit of work, I know my message is missing the patch...
Unless I misunderstand, this has been working since the 2.4 days :)

Ben
--
Ben Greear <***@candelatech.com>
Candela Technologies Inc http://www.candelatech.com

Krzysztof Halasa
2007-07-21 11:32:31 UTC
Post by Ben Greear
There is already a flag you can set on vlan devices (reorder-header)
that strips the VLAN tag before presenting it to user-space.
Sure, but isn't it only valid for VLAN device (not the main ethX)?
I.e., can you have the tag stripped from frames captured on ethX?
Post by Ben Greear
On tx, if it shows up on the vlan device, we add that device's VID to
the header if no VID is currently in the SKB. If it is in the SKB header
we change the VID to be the tx dev's VID (if it was different). This allows user-space
to send a raw ethernet frame on a vlan device and have it automatically
go out of the box on the correct vlan. User-space can also send raw VLAN frames
and have those also go out on the correct VLAN.
Well... I think the tag should be added unconditionally (for things like
QinQ) but that's trivial and minor.

IOW: I think all Ethernet interfaces should always be VLAN-aware,
stripping the tag (only one) early on RX and adding it late on TX.
That means tcpdump would see packets with exactly one tag removed
(unless there was no tag), in both RX and TX.

Tcpdump would need other means to get VLAN id...
--
Krzysztof Halasa
Ben Greear
2007-07-21 17:57:14 UTC
Post by Krzysztof Halasa
Post by Ben Greear
There is already a flag you can set on vlan devices (reorder-header)
that strips the VLAN tag before presenting it to user-space.
Sure, but isn't it only valid for VLAN device (not the main ethX)?
I.e., can you have the tag stripped from frames captured on ethX?
No. I don't see a good reason to strip on ethX. That hardware-accelerated VLANs
strip the tag is an inconvenience in my opinion; no need to force it in software as well.
Post by Krzysztof Halasa
Post by Ben Greear
On tx, if it shows up on the vlan device, we add that device's VID to
the header if no VID is currently in the SKB. If it is in the SKB header
we change the VID to be the tx dev's VID (if it was different). This allows user-space
to send a raw ethernet frame on a vlan device and have it automatically
go out of the box on the correct vlan. User-space can also send raw VLAN frames
and have those also go out on the correct VLAN.
Well... I think the tag should be added unconditionally (for things like
QinQ) but that's trivial and minor.
I think that for Q in Q, we would need some explicit flag on each skb to
know when to add or modify the VID. I was never able to think of an
automatic solution that worked in all cases (bridging, writing raw packets
from user space, normal receive, normal transmit, ...)

Modifying the bridging code would fix that path, and adding a socket opt
to deal with writing raw packets from user-space should handle the other
tricky case I believe.
Post by Krzysztof Halasa
IOW: I think all Ethernet interfaces should always be VLAN-aware,
stripping the tag (only one) early on RX and adding it late on TX.
That means tcpdump would see packets with exactly one tag removed
(unless there was no tag), in both RX and TX.
Tcpdump would need other means to get VLAN id...
What benefit will this add? It will certainly decrease performance to
copy around the header for every VLAN packet, so there would have to be
a good reason to add this logic...

Ben
--
Ben Greear <***@candelatech.com>
Candela Technologies Inc http://www.candelatech.com


Krzysztof Halasa
2007-07-21 21:15:43 UTC
Post by Ben Greear
Post by Krzysztof Halasa
IOW: I think all Ethernet interfaces should always be VLAN-aware,
stripping the tag (only one) early on RX and adding it late on TX.
That means tcpdump would see packets with exactly one tag removed
(unless there was no tag), in both RX and TX.
Tcpdump would need other means to get VLAN id...
What benefit will this add? It will certainly decrease performance to
copy around the header for every VLAN packet, so there would have to be
a good reason to add this logic...
I'd have to do some tests... Hopefully in this decade, forget it for
now.

The primary reason - consistency with hw VLAN cards -> simpler
code.

The performance is already decreased (not sure if it's noticeable)
most of the time, i.e., when not transparently bridging VLAN
trunks. Bridging VLAN trunks is, of course, theoretically possible,
but it's rather not a common operation when using .1Q.
That is, with header reordering, of course.

Anyway, -ENOPATCH from me for now.
--
Krzysztof Halasa
andrei radulescu-banu
2007-07-19 17:46:52 UTC
[Andrei] VLAN_TX_SKB_CB() is perfect for that.
[Patrick, Stephen] No, it's not. It's only legal to use while something has ownership
of the skb. Between VLAN devices and real devices qdiscs are
free to use it.

All right, using VLAN_TX_SKB_CB() is a bad idea. In that case, we need to amend the skb struct; I don't see another way.






andrei radulescu-banu
2007-07-19 18:20:43 UTC
[Ben] If tcpdump and/or bridging needs to disable the hw-accel, then it can
explicitly do so by some API. That is better than overloading
the promisc flag in my opinion.

I guess I could be persuaded in the end. But let me still play devil's advocate. The semantics of 'promiscuous', in my opinion, mean 'receive everything', including vlan.
[Ben] This is especially true since promisc
is not easily readable by user-space and things like tcpdump
cannot have full control of promisc (if a mac-vlan has the NIC in
promisc mode, for instance, then tcpdump can never disable it.)

I agree with all the above. For example when you run 'ifconfig' during 'tcpdump', the interface does not have the promiscuous flag set!!

This confused me for a while, until I realized that tcpdump's packet socket was using an obscure packet_dev_mc() API (af_packet.c) to put the interface into promiscuous mode. The reason for this is that packet_mc_add() implements a reference-counted mechanism for promiscuous mode, so that:
- starting tcpdump instance 1 sets promiscuous mode
- starting tcpdump instance 2 bumps the ref count in packet_mc_add()
- killing tcpdump instance 1 bumps down the ref count; the interface stays promiscuous
- killing tcpdump instance 2 truly clears promiscuous mode.

The trick here is that when you kill tcpdump, the kernel clears the packet socket, and in the process bumps down the ref count. Had tcpdump manually set/cleared the promisc flag, the interface would have stayed promisc after tcpdump was killed.

(The mac-vlan driver must have this corner problem as well. If a mac-vlan interface is disabled while tcpdump runs, it may yank promiscuousness from under tcpdump.)
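
The reference-counting behaviour described above can be modelled in a few lines of Python. This is a toy model, not kernel code: the Interface and PacketSocket classes and their method names are hypothetical stand-ins for the net device's promiscuity counter and for what packet_mc_add() and socket teardown do.

```python
class Interface:
    """Toy net device: promiscuous while at least one user holds a reference."""

    def __init__(self, name):
        self.name = name
        self.promiscuity = 0  # reference count, not a boolean flag

    @property
    def promiscuous(self):
        return self.promiscuity > 0

    def adjust_promiscuity(self, delta):
        self.promiscuity += delta


class PacketSocket:
    """Toy packet socket that remembers which devices it made promiscuous."""

    def __init__(self):
        self._memberships = []

    def add_promisc_membership(self, dev):
        # Stand-in for the packet_mc_add() path: take one reference.
        self._memberships.append(dev)
        dev.adjust_promiscuity(+1)

    def close(self):
        # Runs even when the owning process is killed: the socket is torn
        # down and every reference it took is dropped.
        for dev in self._memberships:
            dev.adjust_promiscuity(-1)
        self._memberships.clear()
```

Two sockets on the same interface reproduce the four steps in the list: the interface leaves promiscuous mode only when the second socket is closed.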

So if you want to create an ethtool API to set vlan-promiscuous mode, one problem to grapple with is that we need a mechanism similar to the above, so you can run two concurrent tcpdumps (or tcpdump while bridging vlans) and the vlan-promiscuous mode gets set correctly each time. For tcpdump at least, the new ethtool API needs to be called from packet_mc_add().









Stephen Hemminger
2007-07-19 19:28:10 UTC
On Thu, 19 Jul 2007 11:20:43 -0700 (PDT)
Post by Ben Greear
[Ben] If tcpdump and/or bridging needs to disable the hw-accel, then it can
explicitly do so by some API. That is better than overloading
the promisc flag in my opinion.
I guess I could be persuaded in the end. But let me still play devil's advocate. The semantics of 'promiscuous', in my opinion, mean 'receive everything', including vlan.
[Ben] This is especially true since promisc
is not easily readable by user-space and things like tcpdump
cannot have full control of promisc (if a mac-vlan has the NIC in
promisc mode, for instance, then tcpdump can never disable it.)
I agree with all the above. For example when you run 'ifconfig' during 'tcpdump', the interface does not have the promiscuous flag set!!
In kernel it is a nice atomic counter, no problem.
Post by Ben Greear
- starting tcpdump instance 1 sets promiscuous mode
- starting tcpdump instance 2 bumps the ref count in packet_mc_add()
- killing tcpdump instance 1 bumps down the ref count, the interface stays promiscuous
- killing tcpdump instance 2 truly clears promiscuous mode.
The trick here is that when you kill tcpdump, the kernel clears the packet socket, and in the process bumps down the ref count. Had tcpdump manually set/cleared the promisc flag, the interface would have stayed promisc after tcpdump was killed.
(The mac-vlan driver must have this corner problem as well. If a mac-vlan interface is disabled while tcpdump runs, it may yank promiscuousness from under tcpdump.)
The kernel has no such problem
Post by Ben Greear
So if you want to create an ethtool API to set vlan-promiscuous mode, one problem to grapple with is that we need a mechanism similar to the above, so you can run two concurrent tcpdumps (or tcpdump while bridging vlans) and the vlan-promiscuous mode gets set correctly each time. For tcpdump at least, the new ethtool API needs to be called from packet_mc_add().
andrei radulescu-banu
2007-07-19 21:38:20 UTC
During debugging, I noticed that dev_queue_xmit() is called twice for tx vlan frames. This results in a frame being passed twice to a packet socket bound to 'any' interface. If the packet socket is bound to a specific interface, though, it will get only one copy of the tx frame, which is good.

In more detail: suppose we're tx'ing a frame, and the route table lookup yields a vlan outgoing device eth0.2. dev_queue_xmit() is called, which calls dev_queue_xmit_nit() for dev = eth0.2 then dev->hard_start_xmit() for dev = eth0.2.

The latter call gets into the vlan layer, which attaches the vlan id 2 (accelerated or not... in my e1000 case accelerated) then calls dev_queue_xmit() again. This time around dev_queue_xmit_nit() is called for dev = eth0, and dev->hard_start_xmit() actually calls the ethernet driver.

The net result is that dev_queue_xmit_nit() is called twice, once for dev=eth0.2 then for dev=eth0.
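
The double delivery can be reproduced with a toy model of the transmit path. This is a Python simulation under simplifying assumptions, not kernel code: Device, Tap and the Python dev_queue_xmit() below are hypothetical stand-ins that only mirror the call order described above (the tap hook first, then the device's transmit routine, which for a VLAN device re-enters the transmit path on the underlying device).

```python
class Device:
    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower  # underlying real device for a VLAN device, else None


class Tap:
    """Toy packet socket; bound_to=None models a socket bound to 'any' interface."""

    def __init__(self, bound_to=None):
        self.bound_to = bound_to
        self.seen = []

    def deliver(self, dev, frame):
        if self.bound_to is None or self.bound_to == dev.name:
            self.seen.append((dev.name, frame))


def dev_queue_xmit(dev, frame, taps):
    # Stand-in for dev_queue_xmit_nit(): every tap is offered the frame
    # as seen on this device.
    for tap in taps:
        tap.deliver(dev, frame)
    # Stand-in for hard_start_xmit(): a VLAN device passes the frame to the
    # real device by calling the transmit path again.
    if dev.lower is not None:
        dev_queue_xmit(dev.lower, frame, taps)
```

A tap bound to 'any' records the frame once for eth0.2 and once for eth0, while a tap bound to either single device records it once.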



Ben Greear
2007-07-19 23:38:57 UTC
Post by andrei radulescu-banu
During debugging, I noticed that dev_queue_xmit() is called twice for tx vlan frames. This results in a frame being passed twice to a packet socket bound to 'any' interface. If the packet socket is bound to a specific interface, though, it will get only one copy of the tx frame, which is good.
In more detail: suppose we're tx'ing a frame, and the route table lookup yields a vlan outgoing device eth0.2. dev_queue_xmit() is called, which calls dev_queue_xmit_nit() for dev = eth0.2 then dev->hard_start_xmit() for dev = eth0.2.
The latter call gets into the vlan layer, which attaches the vlan id 2 (accelerated or not... in my e1000 case accelerated) then calls dev_queue_xmit() again. This time around dev_queue_xmit_nit() is called for dev = eth0, and dev->hard_start_xmit() actually calls the ethernet driver.
The net result is that dev_queue_xmit_nit() is called twice, once for dev=eth0.2 then for dev=eth0.
Maybe binding to all isn't such a good idea then.

Ben
--
Ben Greear <***@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
Krzysztof Halasa
2007-07-20 20:19:45 UTC
Post by Ben Greear
Post by andrei radulescu-banu
The net result is that dev_queue_xmit_nit() is called twice, once
for dev=eth0.2 then for dev=eth0.
Maybe binding to all isn't such a good idea then.
Anyway I would expect the frame on eth0.2 and then on eth0 as well.
Anything different is crazy.
--
Krzysztof Halasa