Discussion:
[PATCH 0/3] net: minor gso encapsulation fixes
Florian Westphal
2014-10-19 20:42:17 UTC
Permalink
The following series fixes a minor bug in the gro segmentation handlers
when handling encapsulation offloads.

Theoretically this could cause kernel panic when the stack tries
to software-segment such a GRE offload packet, but it looks like there
is only one affected call site (tbf scheduler) and it handles NULL
return value.

I've included a followup patch to add IS_ERR_OR_NULL checks to all
the various skb_gso_segment call sites.

While looking into this, I also found that size computation of the individual
segments is incorrect as we do not consider skb->encapsulation.

core/skbuff.c | 13 ++++++++++---
ipv4/af_inet.c | 2 +-
ipv4/gre_offload.c | 2 +-
ipv4/ip_output.c | 2 +-
ipv4/udp_offload.c | 2 +-
ipv6/ip6_offload.c | 2 +-
mpls/mpls_gso.c | 2 +-
netfilter/nfnetlink_queue_core.c | 2 +-
openvswitch/datapath.c | 2 +-
xfrm/xfrm_output.c | 2 +-
10 files changed, 19 insertions(+), 12 deletions(-)
Florian Westphal
2014-10-19 20:42:18 UTC
Permalink
skb_gso_segment() has a 'features' argument representing offload features
available to the output path.

A few handlers, e.g. GRE, instead re-fetch the features of skb->dev and use
those instead of the provided ones when handing encapsulation/tunnels.

Depending on dev->hw_enc_features of the output device skb_gso_segment() can
then return NULL even when the caller has disabled all GSO feature bits,
as segmentation of inner header thinks device will take care of segmentation.

This e.g. affects the tbf scheduler, which will silently drop GRE-encap GSO skbs
that did not fit the remaining token quota as the segmentation does not work
when device supports corresponding hw offload capabilities.

Cc: Eric Dumazet <***@google.com>
Cc: Pravin B Shelar <***@nicira.com>
Signed-off-by: Florian Westphal <***@strlen.de>
---
net/ipv4/af_inet.c | 2 +-
net/ipv4/gre_offload.c | 2 +-
net/ipv4/udp_offload.c | 2 +-
net/ipv6/ip6_offload.c | 2 +-
net/mpls/mpls_gso.c | 2 +-
5 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 92db7a6..8b7fe5b 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1246,7 +1246,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,

encap = SKB_GSO_CB(skb)->encap_level > 0;
if (encap)
- features = skb->dev->hw_enc_features & netif_skb_features(skb);
+ features &= skb->dev->hw_enc_features;
SKB_GSO_CB(skb)->encap_level += ihl;

skb_reset_transport_header(skb);
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index a777295..e084534 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -68,7 +68,7 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
skb->mac_len = skb_inner_network_offset(skb);

/* segment inner packet. */
- enc_features = skb->dev->hw_enc_features & netif_skb_features(skb);
+ enc_features = skb->dev->hw_enc_features & features;
segs = skb_mac_gso_segment(skb, enc_features);
if (IS_ERR_OR_NULL(segs)) {
skb_gso_error_unwind(skb, protocol, ghl, mac_offset, mac_len);
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 507310e..6480cea 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -58,7 +58,7 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
skb->encap_hdr_csum = 1;

/* segment inner packet. */
- enc_features = skb->dev->hw_enc_features & netif_skb_features(skb);
+ enc_features = skb->dev->hw_enc_features & features;
segs = gso_inner_segment(skb, enc_features);
if (IS_ERR_OR_NULL(segs)) {
skb_gso_error_unwind(skb, protocol, tnl_hlen, mac_offset,
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 9034f76..820cdb1 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -89,7 +89,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,

encap = SKB_GSO_CB(skb)->encap_level > 0;
if (encap)
- features = skb->dev->hw_enc_features & netif_skb_features(skb);
+ features &= skb->dev->hw_enc_features;
SKB_GSO_CB(skb)->encap_level += sizeof(*ipv6h);

ipv6h = ipv6_hdr(skb);
diff --git a/net/mpls/mpls_gso.c b/net/mpls/mpls_gso.c
index e28ed2e..f0f5309 100644
--- a/net/mpls/mpls_gso.c
+++ b/net/mpls/mpls_gso.c
@@ -48,7 +48,7 @@ static struct sk_buff *mpls_gso_segment(struct sk_buff *skb,
__skb_push(skb, skb->mac_len);

/* Segment inner packet. */
- mpls_features = skb->dev->mpls_features & netif_skb_features(skb);
+ mpls_features = skb->dev->mpls_features & features;
segs = skb_mac_gso_segment(skb, mpls_features);
--
2.0.4
Florian Westphal
2014-10-19 20:42:19 UTC
Permalink
skb_gso_segment has three possible return values:
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL. This can happen when GSO is used for header verification.

However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.

Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.

However, in the past, there have been issues with some protocol handlers
erronously not respecting the specified feature mask in some cases.

Signed-off-by: Florian Westphal <***@strlen.de>
---
net/ipv4/ip_output.c | 2 +-
net/netfilter/nfnetlink_queue_core.c | 2 +-
net/openvswitch/datapath.c | 2 +-
net/xfrm/xfrm_output.c | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index e35b712..88a0ee3 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -231,7 +231,7 @@ static int ip_finish_output_gso(struct sk_buff *skb)
*/
features = netif_skb_features(skb);
segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
- if (IS_ERR(segs)) {
+ if (IS_ERR_OR_NULL(segs)) {
kfree_skb(skb);
return -ENOMEM;
}
diff --git a/net/netfilter/nfnetlink_queue_core.c b/net/netfilter/nfnetlink_queue_core.c
index a82077d..7c60ccd 100644
--- a/net/netfilter/nfnetlink_queue_core.c
+++ b/net/netfilter/nfnetlink_queue_core.c
@@ -665,7 +665,7 @@ nfqnl_enqueue_packet(struct nf_queue_entry *entry, unsigned int queuenum)
* returned by nf_queue. For instance, callers rely on -ECANCELED to
* mean 'ignore this hook'.
*/
- if (IS_ERR(segs))
+ if (IS_ERR_OR_NULL(segs))
goto out_err;
queued = 0;
err = 0;
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 2e31d9e..f98f708 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -322,7 +322,7 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb,
int err;

segs = __skb_gso_segment(skb, NETIF_F_SG, false);
- if (IS_ERR(segs))
+ if (IS_ERR_OR_NULL(segs))
return PTR_ERR(segs);

/* Queue all of the segments. */
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 499d6c1..eef4334 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -155,7 +155,7 @@ static int xfrm_output_gso(struct sk_buff *skb)

segs = skb_gso_segment(skb, 0);
kfree_skb(skb);
- if (IS_ERR(segs))
+ if (IS_ERR_OR_NULL(segs))
return PTR_ERR(segs);

do {
--
2.0.4
David Miller
2014-10-20 00:39:43 UTC
Permalink
From: Florian Westphal <***@strlen.de>
Date: Sun, 19 Oct 2014 22:42:19 +0200
Post by Florian Westphal
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL. This can happen when GSO is used for header verification.
However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.
Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.
However, in the past, there have been issues with some protocol handlers
erronously not respecting the specified feature mask in some cases.
I don't think it makes sense to return PTR_ERR(p) when
p is NULL.
Florian Westphal
2014-10-20 07:05:04 UTC
Permalink
Post by David Miller
Date: Sun, 19 Oct 2014 22:42:19 +0200
Post by Florian Westphal
1. a pointer to the first segmented skb
2. an errno value (IS_ERR())
3. NULL. This can happen when GSO is used for header verification.
However, several callers currently test IS_ERR instead of IS_ERR_OR_NULL
and would oops when NULL is returned.
Note that these call sites should never actually see such a NULL return
value; all callers mask out the GSO bits in the feature argument.
However, in the past, there have been issues with some protocol handlers
erronously not respecting the specified feature mask in some cases.
I don't think it makes sense to return PTR_ERR(p) when
p is NULL.
Good point. Will respin.

Florian Westphal
2014-10-19 20:42:20 UTC
Permalink
if ->encapsulation is set we have to use inner_tcp_hdrlen and add the
size of the inner network headers too.

This is 'mostly harmless'; tbf might send skb that is slightly over
quota or drop skb even if it would have fit.

Signed-off-by: Florian Westphal <***@strlen.de>
---
net/core/skbuff.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7b3df0d..eb4b48b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4059,15 +4059,22 @@ EXPORT_SYMBOL_GPL(skb_scrub_packet);
unsigned int skb_gso_transport_seglen(const struct sk_buff *skb)
{
const struct skb_shared_info *shinfo = skb_shinfo(skb);
+ unsigned int thlen = 0;

- if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)))
- return tcp_hdrlen(skb) + shinfo->gso_size;
+ if (skb->encapsulation) {
+ thlen = skb_inner_transport_header(skb) -
+ skb_transport_header(skb);

+ if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)))
+ thlen += inner_tcp_hdrlen(skb);
+ } else if (likely(shinfo->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) {
+ thlen = tcp_hdrlen(skb);
+ }
/* UFO sets gso_size to the size of the fragmentation
* payload, i.e. the size of the L4 (UDP) header is already
* accounted for.
*/
- return shinfo->gso_size;
+ return thlen + shinfo->gso_size;
}
EXPORT_SYMBOL_GPL(skb_gso_transport_seglen);
--
2.0.4
Florian Westphal
2014-10-19 20:55:11 UTC
Permalink
Post by Florian Westphal
The following series fixes a minor bug in the gro segmentation handlers
when handling encapsulation offloads.
In case you're wondering, 'is there a 4th patch in this series'?

There was, I intentionally did not send it (IMO its -next material)
but forgot to adjust the "PATCH x/y" part of the subject lines.
Loading...