[PATCH net-next] tcp: Add TCP_FREEZE socket option
Kristian Evensen
2014-10-22 15:36:36 UTC
From: Kristian Evensen <***@gmail.com>

This patch introduces support for Freeze-TCP [1].

Devices that are mobile frequently experience temporary disconnects, for example
due to signal fading or a technology change. These changes can last for a
substantial amount of time (>10 seconds), potentially causing multiple RTOs to
expire and the sender to enter slow start. Even though a device has
reconnected, it can take a long time for the TCP connection to recover.

Operators of mobile broadband networks mitigate this issue by placing TCP
splitters at the edge of their networks. However, the splitters typically only
operate on some ports (mostly only port 80) and violate the end-to-end
principle. The operator's TCP splitter receives a notification when a temporary
disconnect occurs and starts sending Zero Window Announcements (ZWA) to the
remote end of the connection. When a device regains connectivity, the window
is reopened.

Freeze-TCP is a client-side only approach for enabling application developers to
trigger sending ZWAs. It is implemented as a socket option and accepts three
different values. If the value is set to one, the connection is frozen: a ZWA is
sent and the window size is set to 0 in any reply to additional packets arriving
from the remote party. If the value is set to two, the connection is unfrozen and
a window update announcement is sent. If the value is set to three, the
connection is unfrozen and two additional window update announcements are sent
after the first one. This is referred to as TR-ACK in the paper and is used to
increase the probability that a window update announcement will be received.
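
As a rough illustration of how an application might drive the option, here is a
minimal user-space sketch. It assumes a kernel with this patch applied. The
helper freeze_tcp_set() and the enum names are hypothetical and not part of the
patch; TCP_FREEZE is defined locally with the value 26 from the patch, and
kernels without the patch may assign that number to a different option.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Assumption: TCP_FREEZE only exists with this patch applied. The value 26
 * is taken from the patch; unpatched kernels may use 26 for something else.
 */
#ifndef TCP_FREEZE
#define TCP_FREEZE 26
#endif

/* Hypothetical names for the three values described in the text. */
enum freeze_tcp_mode {
	FREEZE_TCP_FREEZE = 1,         /* send a ZWA, advertise window 0 */
	FREEZE_TCP_UNFREEZE = 2,       /* unfreeze, send one window update */
	FREEZE_TCP_UNFREEZE_TRACK = 3, /* unfreeze, window update + two ACKs */
};

/* Hypothetical wrapper: returns 0 on success, -1 with errno set on failure.
 * Out-of-range modes are rejected before the system call is made.
 */
int freeze_tcp_set(int fd, int mode)
{
	if (mode < FREEZE_TCP_FREEZE || mode > FREEZE_TCP_UNFREEZE_TRACK) {
		errno = EINVAL;
		return -1;
	}
	return setsockopt(fd, IPPROTO_TCP, TCP_FREEZE, &mode, sizeof(mode));
}
```

An application would call freeze_tcp_set(fd, FREEZE_TCP_FREEZE) on an
established socket when it expects a disconnect, and
freeze_tcp_set(fd, FREEZE_TCP_UNFREEZE_TRACK) once connectivity is back.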

When to trigger Freeze-TCP depends on the application requirements and the
underlying network, and is not the responsibility of the kernel. One approach is
to have the application, or a daemon, analyze the metadata exported from a mobile
broadband modem. A temporary disconnect can often be detected in advance by
looking at different statistics.

[1] - T. Goff, J. Moronski, D. S. Phatak, and V. Gupta, "Freeze-TCP: a True
End-to-end TCP Enhancement Mechanism for Mobile Environments," In Proceedings of
IEEE INFOCOM 2000. URL: http://www.csee.umbc.edu/~phatak/publications/ftcp.pdf

Signed-off-by: Kristian Evensen <***@gmail.com>
---
include/linux/tcp.h | 3 ++-
include/uapi/linux/tcp.h | 1 +
net/ipv4/tcp.c | 33 +++++++++++++++++++++++++++++++++
net/ipv4/tcp_output.c | 8 +++++++-
4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index c2dee7d..7ed26c1 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -187,7 +187,8 @@ struct tcp_sock {
syn_data:1, /* SYN includes data */
syn_fastopen:1, /* SYN includes Fast Open option */
syn_data_acked:1,/* data in SYN is acked by SYN-ACK */
- is_cwnd_limited:1;/* forward progress limited by snd_cwnd? */
+ is_cwnd_limited:1,/* forward progress limited by snd_cwnd? */
+ frozen:1; /* Artificially deflate announced window to 0 */
u32 tlp_high_seq; /* snd_nxt at the time of TLP retransmit. */

/* RTT measurement */
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 3b97183..bc0684d 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -112,6 +112,7 @@ enum {
#define TCP_FASTOPEN 23 /* Enable FastOpen on listeners */
#define TCP_TIMESTAMP 24
#define TCP_NOTSENT_LOWAT 25 /* limit number of unsent bytes in write queue */
+#define TCP_FREEZE 26 /* Freeze TCP connection by sending ZWA */

struct tcp_repair_opt {
__u32 opt_code;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1bec4e7..5bf30d0 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2339,6 +2339,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
struct inet_connection_sock *icsk = inet_csk(sk);
int val;
int err = 0;
+ u8 itr = 0;

/* These are data/string values, all the others are ints */
switch (optname) {
@@ -2600,6 +2601,35 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
tp->notsent_lowat = val;
sk->sk_write_space(sk);
break;
+ case TCP_FREEZE:
+ if (val < 1 || val > 3 ||
+ !((1 << sk->sk_state) & TCPF_ESTABLISHED)) {
+ err = -EINVAL;
+ break;
+ }
+
+ if (val == 1) {
+ tp->frozen = 1;
+ tcp_send_ack(sk);
+ break;
+ } else if (!tp->frozen) {
+ err = -EINVAL;
+ break;
+ }
+
+ tp->frozen = 0;
+ tcp_send_ack(sk);
+
+ if (val == 2)
+ break;
+
+ /* If val is three, send two additional reconnection ACKs to
+ * increase chance of a non-zero window announcement arriving.
+ */
+ for (itr = 0; itr < 2; itr++)
+ tcp_send_ack(sk);
+
+ break;
default:
err = -ENOPROTOOPT;
break;
@@ -2832,6 +2862,9 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
case TCP_NOTSENT_LOWAT:
val = tp->notsent_lowat;
break;
+ case TCP_FREEZE:
+ val = tp->frozen;
+ break;
default:
return -ENOPROTOOPT;
}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 3af2129..9c1429b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -958,7 +958,13 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
*/
th->window = htons(min(tp->rcv_wnd, 65535U));
} else {
- th->window = htons(tcp_select_window(sk));
+ /* Because window is only artificially deflated to zero, we
+ * postpone updating tcp state until connection is unfrozen
+ */
+ if (unlikely(tp->frozen))
+ th->window = 0;
+ else
+ th->window = htons(tcp_select_window(sk));
}
th->check = 0;
th->urg_ptr = 0;
--
1.8.3.2
Eric Dumazet
2014-10-22 15:54:32 UTC
Post by Kristian Evensen
This patch introduces support for Freeze-TCP [1].
Devices that are mobile frequently experience temporary disconnects, for example
due to signal fading or a technology change. These changes can last for a
substantial amount of time (>10 seconds), potentially causing multiple RTOs to
expire and the sender to enter slow start. Even though a device has
reconnected, it can take a long time for the TCP connection to recover.
Operators of mobile broadband networks mitigate this issue by placing TCP
splitters at the edge of their networks. However, the splitters typically only
operate on some ports (mostly only port 80) and violate the end-to-end
principle. The operator's TCP splitter receives a notification when a temporary
disconnect occurs and starts sending Zero Window Announcements (ZWA) to the
remote end of the connection. When a device regains connectivity, the window
is reopened.
Freeze-TCP is a client-side only approach for enabling application developers to
trigger sending ZWAs. It is implemented as a socket option and accepts three
different values. If the value is set to one, the connection is frozen: a ZWA is
sent and the window size is set to 0 in any reply to additional packets arriving
from the remote party. If the value is set to two, the connection is unfrozen and
a window update announcement is sent. If the value is set to three, the
connection is unfrozen and two additional window update announcements are sent
after the first one. This is referred to as TR-ACK in the paper and is used to
increase the probability that a window update announcement will be received.
When to trigger Freeze-TCP depends on the application requirements and the
underlying network, and is not the responsibility of the kernel. One approach is
to have the application, or a daemon, analyze the metadata exported from a mobile
broadband modem. A temporary disconnect can often be detected in advance by
looking at different statistics.
[1] - T. Goff, J. Moronski, D. S. Phatak, and V. Gupta, "Freeze-TCP: a True
End-to-end TCP Enhancement Mechanism for Mobile Environments," In Proceedings of
IEEE INFOCOM 2000. URL: http://www.csee.umbc.edu/~phatak/publications/ftcp.pdf
---
include/linux/tcp.h | 3 ++-
include/uapi/linux/tcp.h | 1 +
net/ipv4/tcp.c | 33 +++++++++++++++++++++++++++++++++
net/ipv4/tcp_output.c | 8 +++++++-
4 files changed, 43 insertions(+), 2 deletions(-)
diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index c2dee7d..7ed26c1 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -187,7 +187,8 @@ struct tcp_sock {
syn_data:1, /* SYN includes data */
syn_fastopen:1, /* SYN includes Fast Open option */
syn_data_acked:1,/* data in SYN is acked by SYN-ACK */
- is_cwnd_limited:1;/* forward progress limited by snd_cwnd? */
+ is_cwnd_limited:1,/* forward progress limited by snd_cwnd? */
+ frozen:1; /* Artificially deflate announced window to 0 */
u32 tlp_high_seq; /* snd_nxt at the time of TLP retransmit. */
/* RTT measurement */
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 3b97183..bc0684d 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -112,6 +112,7 @@ enum {
#define TCP_FASTOPEN 23 /* Enable FastOpen on listeners */
#define TCP_TIMESTAMP 24
#define TCP_NOTSENT_LOWAT 25 /* limit number of unsent bytes in write queue */
+#define TCP_FREEZE 26 /* Freeze TCP connection by sending ZWA */
struct tcp_repair_opt {
__u32 opt_code;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 1bec4e7..5bf30d0 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2339,6 +2339,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
struct inet_connection_sock *icsk = inet_csk(sk);
int val;
int err = 0;
+ u8 itr = 0;
/* These are data/string values, all the others are ints */
switch (optname) {
@@ -2600,6 +2601,35 @@ static int do_tcp_setsockopt(struct sock *sk, int level,
tp->notsent_lowat = val;
sk->sk_write_space(sk);
break;
+ case TCP_FREEZE:
+ if (val < 1 || val > 3 ||
+ !((1 << sk->sk_state) & TCPF_ESTABLISHED)) {
+ err = -EINVAL;
+ break;
+ }
+
+ if (val == 1) {
+ tp->frozen = 1;
+ tcp_send_ack(sk);
+ break;
+ } else if (!tp->frozen) {
+ err = -EINVAL;
+ break;
+ }
+
+ tp->frozen = 0;
+ tcp_send_ack(sk);
+
+ if (val == 2)
+ break;
+
+ /* If val is three, send two additional reconnection ACKs to
+ * increase chance of a non-zero window announcement arriving.
+ */
+ for (itr = 0; itr < 2; itr++)
+ tcp_send_ack(sk);
+
+ break;
default:
err = -ENOPROTOOPT;
break;
@@ -2832,6 +2862,9 @@ static int do_tcp_getsockopt(struct sock *sk, int level,
case TCP_NOTSENT_LOWAT:
val = tp->notsent_lowat;
break;
+ case TCP_FREEZE:
+ val = tp->frozen;
+ break;
default:
return -ENOPROTOOPT;
}
This asymmetry looks strange

The following sequence should be allowed:

getsockopt(... TCP_FREEZE, &val, ...)
setsockopt(... TCP_FREEZE, &val, ...)

So setsockopt() should accept val = 0
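
The asymmetry can be seen in a small user-space model of the two handlers. This
is only a sketch: struct model_sock and the model_* functions are hypothetical
stand-ins that mirror the checks in the posted patch, with a counter in place of
the real tcp_send_ack() calls.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the socket state the patch looks at. */
struct model_sock {
	int established; /* models sk_state == TCP_ESTABLISHED */
	int frozen;      /* models tp->frozen */
	int acks_sent;   /* counts what would be tcp_send_ack() calls */
};

/* Mirrors the TCP_FREEZE case in do_tcp_getsockopt(): returns 0 or 1. */
int model_getsockopt(const struct model_sock *sk)
{
	return sk->frozen;
}

/* Mirrors the TCP_FREEZE case in do_tcp_setsockopt() as posted. */
int model_setsockopt(struct model_sock *sk, int val)
{
	if (val < 1 || val > 3 || !sk->established)
		return -EINVAL;
	if (val == 1) {
		sk->frozen = 1;
		sk->acks_sent++;
		return 0;
	}
	if (!sk->frozen)
		return -EINVAL;
	sk->frozen = 0;
	sk->acks_sent += (val == 3) ? 3 : 1;
	return 0;
}

/* Read the current value and write it straight back. */
int model_round_trip(struct model_sock *sk)
{
	int val = model_getsockopt(sk);

	return model_setsockopt(sk, val);
}
```

In this model, a round trip on a socket that is not frozen reads 0 and then
fails with -EINVAL, because 0 is outside the accepted range of 1 to 3.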
Kristian Evensen
2014-10-22 16:10:25 UTC
Hi,
Post by Eric Dumazet
This asymmetry looks strange
getsockopt(... TCP_FREEZE, &val, ...)
setsockopt(... TCP_FREEZE, &val, ...)
So setsockopt() should accept val = 0
Thanks for your comment, and I agree. The reasoning behind my original
ordering was that I wanted the values to be in the order which made
most logical sense to me, which is Enable (1), Disable (2) and Disable
with TR-ACK (3). However, I see now that this does not make much sense
when the option is combined with getsockopt(). I will
wait for some more feedback and send a revised version tomorrow with
the following ordering: Disable (0), Enable (1), Disable with TR-ACK
(2).
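
A user-space sketch of how the revised ordering could behave is shown below. It
is not the revised patch: the enum names, struct freeze_state and
model_setsockopt_v2() are hypothetical, a counter stands in for tcp_send_ack(),
and treating "disable" on a connection that is not frozen as a successful no-op
is an assumption made so that a value read with getsockopt() can be written back.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical names for the revised values. */
enum freeze_mode {
	FREEZE_DISABLE = 0,       /* unfreeze, one window update */
	FREEZE_ENABLE = 1,        /* freeze, send a ZWA */
	FREEZE_DISABLE_TRACK = 2, /* unfreeze, window update + two ACKs */
};

/* Hypothetical stand-in for the socket state involved. */
struct freeze_state {
	int established; /* models sk_state == TCP_ESTABLISHED */
	int frozen;      /* models tp->frozen, also what getsockopt() returns */
	int acks_sent;   /* counts what would be tcp_send_ack() calls */
};

int model_setsockopt_v2(struct freeze_state *sk, int val)
{
	if (val < FREEZE_DISABLE || val > FREEZE_DISABLE_TRACK ||
	    !sk->established)
		return -EINVAL;

	if (val == FREEZE_ENABLE) {
		sk->frozen = 1;
		sk->acks_sent++;
		return 0;
	}

	/* Assumption: disabling a connection that is not frozen does
	 * nothing and succeeds, instead of returning -EINVAL.
	 */
	if (!sk->frozen)
		return 0;

	sk->frozen = 0;
	sk->acks_sent += (val == FREEZE_DISABLE_TRACK) ? 3 : 1;
	return 0;
}
```

With this ordering, writing back the value of the frozen flag (0 or 1) is
accepted in both states.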

Thanks again,
Kristian
David Miller
2014-10-22 16:14:18 UTC
From: Kristian Evensen <***@gmail.com>
Date: Wed, 22 Oct 2014 17:36:36 +0200
Post by Kristian Evensen
This patch introduces support for Freeze-TCP [1].
By your description I would not expect the application to get involved
with the actual final zero window advertisement decision at all.

Instead, I would expect the device layer to trigger a notification
during a "technology change" or whatever you want to call losing
connectivity, which TCP can receive and use to start sending zero
windows over all TCP connections using that path.

So the socket option enables or disables the facility, but doesn't
actually trigger the zero window advertisement. A real device based
event does that.

The application has no business watching for the loss of connectivity,
and I am certain you do not want that logic in every application in
order for it to take advantage of this.

And therefore there should be a global option that turns this on for
the entire system by default.

This requires a lot more work than you have done here, you need to
add all the notification handling, the logic in TCP to look at the
attached route on send and trigger zero window probes if the device
event has happened, etc.
Kristian Evensen
2014-10-22 17:08:03 UTC
Hi,
Post by David Miller
Instead, I would expect the device layer to trigger a notification
during a "technology change" or whatever you want to call losing
connectivity, which TCP can receive and use to start sending zero
windows over all TCP connections using that path.
I totally agree that this is ideally something that should be
controlled by the device layer. However, these temporary disconnects
are not visible through any normal link events (like link down, loss
of address, ...). The only way to detect the events is to parse metadata
coming from devices and look at traffic statistics. This would
involve, for example, adding parsing of the different mobile broadband
protocols (QMI, MBIM, and so on) to the device layer. Judging by the
commits for the QMI driver, for example, parsing QMI messages seems
to have intentionally been left to user space applications to avoid
bloating the driver.
Post by David Miller
And therefore there should be a global option that turns this on for
the entire system by default.
This requires a lot more work than you have done here, you need to
add all the notification handling, the logic in TCP to look at the
attached route on send and trigger zero window probes if the device
event has happened, etc.
Another approach I designed was to have a separate TCP Freeze module
and trigger the freeze/unfreeze through genetlink messages. A user
space application would be responsible for monitoring the devices and
deciding when to trigger the ZWAs. Would a design like that be
acceptable?

-Kristian
Hagen Paul Pfeifer
2014-10-22 19:50:14 UTC
Post by Kristian Evensen
Another approach I designed was to have a separate TCP Freeze module
and trigger the freeze/unfreeze through genetlink messages. A user
space application would be responsible for monitoring the devices and
deciding when to trigger the ZWAs. Would a design like that be
acceptable?
At least better. But what userspace daemon would configure this?
Likely NetworkManager and friends. But under what conditions?

- When the WIFI signal strength is below some threshold?
- When switched to another AP?
- When switched from 802.11 to 802.3?
- ...

In a NATed scenario there is no gain because IP addresses change and
the connection is lost anyway. For the signal strength thing there
might be an advantage but it has costs:

a) How long do you freeze the connection? What if NetworkManager
stops? The connection hangs forever.
b) Is it not better to inform the upper layer - the application - that
something happened with the link?

I mean, when the application experiences disruptions, the application
can decide what to do: reconnect, reconnect and resend, or inform the
user. This possibility is now lost/hidden. Maybe it is no problem -
maybe it is for some applications.

I have no fundamental problems with TCP Freeze, but what is missing is
a complete story line: the use cases where it makes sense, and whether
it is safe.

Have you considered bringing this to the IETF (TCPM WG)?

Hagen
Kristian Evensen
2014-10-22 20:33:39 UTC
Hi,

I am very sorry for not explaining the scenario/use-case properly.
Freeze-TCP is mostly targeted at TCP connections established through
mobile broadband networks. One example scenario is when a user
moves outside of an area with LTE coverage. The mobile broadband
connection will then be downgraded to 2G/3G, and this process takes
10-15 seconds in the networks I have been able to measure. During this
handover, the modem/device will in most cases report that it is still
connected to LTE. So just looking at the state of the link is not good
enough, as it will appear to be working fine (except for no data
coming through it). The device does not change IP address, so TCP
connections will resume normal operation as soon as the network
connection is re-established and a packet is retransmitted. However,
because of the large "idle" period, this can take another 10-15
seconds.
Post by Hagen Paul Pfeifer
At least better. But what userspace daemon would configure this?
Likely NetworkManager and friends. But under what conditions?
Yes, that would be my suggestion for tools too. The conditions would
depend on the kind of network, available information and so on.
Post by Hagen Paul Pfeifer
In a NATed scenario there is no gain because IP addresses change and
the connection is lost anyway. For the signal strength thing there
might be an advantage but it has costs:
a) How long do you freeze the connection? What if NetworkManager
stops? The connection hangs forever.
b) Is it not better to inform the upper layer - the application - that
something happened with the link?
I mean, when the application experiences disruptions, the application
can decide what to do: reconnect, reconnect and resend, or inform the
user. This possibility is now lost/hidden. Maybe it is no problem -
maybe it is for some applications.
This is the main reason why I went with a socket option. While I
worked on this patch I wrote a small daemon for testing purposes. This
daemon analyses data exported from a mobile broadband modem (QMI),
looks at total interface throughput and then multicasts a netlink
message when it determines that a handover might happen. This message
is only a hint, and it is then up to the application developer to
decide what to do. Another solution would be a hybrid: the module would
work as I described and the socket option would be used as an opt-in
for Freeze-TCP.
Post by Hagen Paul Pfeifer
Have you considered bringing this to the IETF (TCPM WG)?
Yes, I am currently considering it, or whether I should look into different
solutions before bringing it up for discussion. The ideal solution
would be a way to force a retransmit when the handover
period is over, but that opens a whole new set of problems, including
potential security problems, and changes TCP semantics a bit. An advantage of
Freeze-TCP is that it works fine with what we have today.

Thanks for your detailed comments!

Kristian

Cong Wang
2014-10-22 16:56:34 UTC
On Wed, Oct 22, 2014 at 8:36 AM, Kristian Evensen
Post by Kristian Evensen
This patch introduces support for Freeze-TCP [1].
Devices that are mobile frequently experience temporary disconnects, for example
due to signal fading or a technology change. These changes can last for a
substantial amount of time (>10 seconds), potentially causing multiple RTOs to
expire and the sender to enter slow start. Even though a device has
reconnected, it can take a long time for the TCP connection to recover.
Operators of mobile broadband networks mitigate this issue by placing TCP
splitters at the edge of their networks. However, the splitters typically only
operate on some ports (mostly only port 80) and violate the end-to-end
principle. The operator's TCP splitter receives a notification when a temporary
disconnect occurs and starts sending Zero Window Announcements (ZWA) to the
remote end of the connection. When a device regains connectivity, the window
is reopened.
At least split TCP is transparent to applications, while your approach is not.
I don't understand why you said it typically operates on some ports, since
TCP is stateful.

BTW, AFAIK Linux doesn't support split TCP.
Kristian Evensen
2014-10-22 17:11:30 UTC
Hi,
Post by Cong Wang
At least split TCP is transparent to applications, while your approach is not.
I don't understand why you said it typically operates on some ports, since
TCP is stateful.
I see that I might have used the wrong word here. I am used to calling
them TCP splitters, but I see that the devices are also referred to as
transparent TCP proxies. Anyhow, they are still transparent, but they
violate end-to-end (even though I guess that is pretty common
nowadays).

What I mean by the port comment is that only connections to some ports
are proxied/split. For example, one of the operators in Norway only
proxies port 80, so any HTTPS transfer risks getting stuck after a
temporary disconnect.

-Kristian