Discussion:
Routing BUG with ppp over l2tp
Alan Stern
2014-10-20 16:39:18 UTC
James and Michal:

I'm having a problem setting up a VPN connection that uses ppp over l2tp
over ipsec (this shows up under both 3.16 and 3.17-rc7). The ipsec
part is working fine, and xl2tpd sets up its connection okay. The
problem arises when ppp starts up.

As far as I can tell, the problem is caused by bad routing. The kernel
gets confused because the IP address assigned by the VPN server to the
server's end of the ppp tunnel is the _same_ as the server's actual IP
address.

Here are some details. My local address is 192.168.0.203 (behind a
NAT-ing wireless router). The VPN server is 140.247.233.37, as you can
see from this entry in the system log:

Oct 13 17:10:27 saphir NetworkManager: xl2tpd[2616]: Connecting to host 140.247.233.37, port 1701

The addresses of the ppp tunnel endpoints are given later in the log:

Oct 13 17:10:30 saphir pppd[2618]: local IP address 10.170.30.1
Oct 13 17:10:30 saphir pppd[2618]: remote IP address 140.247.233.37

The overall status from NetworkManager shows up in the log like this:

Oct 13 17:10:30 saphir NetworkManager: ** Message: L2TP service (IP Config Get) reply received.
Oct 13 17:10:30 saphir NetworkManager[439]: <info> VPN connection 'Rowland' (IP4 Config Get) reply received from old-style plugin.
Oct 13 17:10:30 saphir NetworkManager[439]: <info> VPN Gateway: 140.247.233.37
Oct 13 17:10:30 saphir NetworkManager[439]: <info> Tunnel Device: ppp0
Oct 13 17:10:30 saphir NetworkManager[439]: <info> IPv4 configuration:
Oct 13 17:10:30 saphir NetworkManager[439]: <info> Internal Address: 10.170.30.1
Oct 13 17:10:30 saphir NetworkManager[439]: <info> Internal Prefix: 32
Oct 13 17:10:30 saphir NetworkManager[439]: <info> Internal Point-to-Point Address: 140.247.233.37
Oct 13 17:10:30 saphir NetworkManager[439]: <info> Maximum Segment Size (MSS): 0
Oct 13 17:10:30 saphir NetworkManager[439]: <info> Forbid Default Route: no
Oct 13 17:10:30 saphir NetworkManager[439]: <info> Internal DNS: 10.160.0.2
Oct 13 17:10:30 saphir NetworkManager[439]: <info> Internal DNS: 10.160.0.3
Oct 13 17:10:30 saphir NetworkManager[439]: <info> DNS Domain: '(none)'
Oct 13 17:10:30 saphir NetworkManager[439]: <info> No IPv6 configuration
Oct 13 17:10:30 saphir NetworkManager[439]: <info> VPN connection 'Rowland' (IP Config Get) complete.

Once the ppp tunnel was set up, xl2tpd started getting errors:

Oct 13 17:11:31 saphir NetworkManager: xl2tpd[2616]: network_thread: select timeout
Oct 13 17:11:32 saphir NetworkManager: xl2tpd[2616]: network_thread: select timeout
Oct 13 17:11:32 saphir NetworkManager: xl2tpd[2616]: Maximum retries exceeded for tunnel 33716. Closing.
Oct 13 17:11:32 saphir NetworkManager: xl2tpd[2616]: Connection 147 closed to 140.247.233.37, port 1701 (Timeout)

Packet-level debugging showed that once I reached this stage, the
control messages sent by xl2tpd were not received by the server. I
believe this is because they were not routed correctly.

Unfortunately, at the moment I don't have a copy of the routing table.
Nevertheless, it definitely appears that the packets xl2tpd wanted to
send directly to the VPN server were instead routed back through the
ppp tunnel! Presumably this was because the routing table contained
two entries with their destinations both set to 140.247.233.37/32 (one
for the l2tp connection and one for the ppp tunnel), and the kernel
used the wrong entry.

In fact, on several occasions during testing, the system deadlocked. I
was able to get a stack dump:

[ 2214.970639] BUG: soft lockup - CPU#1 stuck for 22s! [pppd:9423]
[ 2214.970648] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core pppoe pppox ppp_generic slhc authenc cmac rmd160 crypto_null ip_vti ip_tunnel af_key ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_ro xfrm6_mode_beet xfrm6_mode_tunnel ipcomp ipcomp6 xfrm6_tunnel tunnel6 xfrm_ipcomp salsa20_i586 camellia_generic cast6_generic cast5_generic cast_common deflate cts gcm ccm serpent_sse2_i586 serpent_generic glue_helper blowfish_generic blowfish_common twofish_generic twofish_i586 twofish_common xcbc sha512_generic des_generic geode_aes tpm_rng tpm timeriomem_rng virtio_rng uas usb_storage fuse ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 ip6table_filter xt_conntrack ip6_tables nf_conntrack vfat
[ 2214.970769] fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel arc4 iwldvm snd_hda_controller snd_hda_codec uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core v4l2_common videodev mac80211 snd_hwdep coretemp kvm_intel kvm media snd_seq snd_seq_device iTCO_wdt iTCO_vendor_support snd_pcm snd_timer snd joydev iwlwifi microcode serio_raw cfg80211 asus_laptop lpc_ich atl1c soundcore sparse_keymap rfkill input_polldev acpi_cpufreq binfmt_misc i915 i2c_algo_bit drm_kms_helper drm i2c_core video
[ 2214.970854] CPU: 1 PID: 9423 Comm: pppd Tainted: G W 3.16.3-200.fc20.i686 #1
[ 2214.970860] Hardware name: ASUSTeK Computer Inc. UL20A /UL20A , BIOS 207 11/02/2009
[ 2214.970866] task: f0706a00 ti: e359c000 task.ti: e359c000
[ 2214.970873] EIP: 0060:[<c0a077b8>] EFLAGS: 00200287 CPU: 1
[ 2214.970885] EIP is at _raw_spin_lock_bh+0x28/0x40
[ 2214.970890] EAX: e5ff02a4 EBX: e5ff02a4 ECX: 00000060 EDX: 0000005f
[ 2214.970895] ESI: e5ff02b0 EDI: e3470d40 EBP: e359dc34 ESP: e359dc34
[ 2214.970900] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 2214.970906] CR0: 8005003b CR2: b72eb000 CR3: 25f28000 CR4: 000407d0
[ 2214.970910] Stack:
[ 2214.970914] e359dc9c f94efe2a f5140000 e359dc50 c045aba6 f5140000 00200286 e359dc78
[ 2214.970929] c045c553 00000001 f5140000 001f9076 00200286 628a17c6 f075acc0 f075acc0
[ 2214.970942] 00000000 00200246 f4464848 00200246 00200246 e359dc9c e3470d84 f075acc0
[ 2214.970957] Call Trace:
[ 2214.970973] [<f94efe2a>] ppp_push+0x32a/0x550 [ppp_generic]
[ 2214.970986] [<c045aba6>] ? internal_add_timer+0x26/0x60
[ 2214.970994] [<c045c553>] ? mod_timer_pending+0x63/0x130
[ 2214.971005] [<f94f288d>] ppp_xmit_process+0x3cd/0x5e0 [ppp_generic]
[ 2214.971007] [<c0914ae1>] ? harmonize_features+0x31/0x1d0
[ 2214.971007] [<f94f2c78>] ppp_start_xmit+0x108/0x180 [ppp_generic]
[ 2214.971007] [<c0915024>] dev_hard_start_xmit+0x2c4/0x540
[ 2214.971007] [<c093244f>] sch_direct_xmit+0x9f/0x170
[ 2214.971007] [<c091546a>] __dev_queue_xmit+0x1ca/0x430
[ 2214.971007] [<c094c9b0>] ? ip_fragment+0x930/0x930
[ 2214.971007] [<c09156df>] dev_queue_xmit+0xf/0x20
[ 2214.971007] [<c091bacf>] neigh_direct_output+0xf/0x20
[ 2214.971007] [<c094cb5a>] ip_finish_output+0x1aa/0x850
[ 2214.971007] [<c094c9b0>] ? ip_fragment+0x930/0x930
[ 2214.971007] [<c094dbbf>] ip_output+0x8f/0xe0
[ 2214.971007] [<c094c9b0>] ? ip_fragment+0x930/0x930
[ 2214.971007] [<c09a4f52>] xfrm_output_resume+0x342/0x3a0
[ 2214.971007] [<c09a5013>] xfrm_output+0x43/0xf0
[ 2214.971007] [<c0998f4d>] xfrm4_output_finish+0x3d/0x40
[ 2214.971007] [<c0998e25>] __xfrm4_output+0x25/0x40
[ 2214.971007] [<c0998f7f>] xfrm4_output+0x2f/0x70
[ 2214.971007] [<c0998e00>] ? xfrm4_udp_encap_rcv+0x1b0/0x1b0
[ 2214.971007] [<c094d2e7>] ip_local_out_sk+0x27/0x30
[ 2214.971007] [<c094d5f4>] ip_queue_xmit+0x124/0x3f0
[ 2214.971007] [<c0999f04>] ? xfrm_bundle_ok+0x64/0x170
[ 2214.971007] [<c099a0ab>] ? xfrm_dst_check+0x1b/0x30
[ 2214.971007] [<f94fd618>] l2tp_xmit_skb+0x298/0x4b0 [l2tp_core]
[ 2214.971007] [<f950cd04>] pppol2tp_xmit+0x124/0x1d0 [l2tp_ppp]
[ 2214.971007] [<f94f2adb>] ppp_channel_push+0x3b/0xb0 [ppp_generic]
[ 2214.971007] [<f94f2d77>] ppp_write+0x87/0xc8 [ppp_generic]
[ 2214.971007] [<f94f2cf0>] ? ppp_start_xmit+0x180/0x180 [ppp_generic]
[ 2214.971007] [<c057723d>] vfs_write+0x9d/0x1d0
[ 2214.971007] [<c0577951>] SyS_write+0x51/0xb0
[ 2214.971007] [<c0a07b9f>] sysenter_do_call+0x12/0x12
[ 2214.971007] Code: 00 00 00 55 89 e5 66 66 66 66 90 64 81 05 90 b6 dc c0 00 02 00 00 ba 00 01 00 00 f0 66 0f c1 10 0f b6 ce 38 d1 75 04 5d c3 f3 90 <0f> b6 10 38 ca 75 f7 5d c3 90 90 90 90 90 90 90 90 90 90 90 90

The deadlock occurs because ppp_channel_push() (near the end of the
stack listing) holds the pch->downl spinlock while calling
pch->chan->ops->start_xmit(). The dump shows this call filtering down
through the routing layer and into ppp_push() (near the top of the
listing), which tries to acquire the same spinlock.
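A minimal user-space sketch of the lock re-entry described above, assuming a POSIX threads environment. It is not the ppp_generic code. An error-checking pthread mutex stands in for the pch->downl spinlock, and channel_push() and push_frame() are hypothetical stand-ins for ppp_channel_push() and ppp_push(). The mutex reports EDEADLK when the same thread tries to take it again. A kernel spinlock has no owner check, so it simply spins, and that spinning is what the soft-lockup watchdog reported.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Stand-in for pch->downl. An error-checking mutex returns EDEADLK on
 * self-relock instead of hanging, so the re-entry can be observed. */
static pthread_mutex_t downl;

static void init_downl(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&downl, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Hypothetical analogue of ppp_push(): needs the same lock. */
static int push_frame(void)
{
    int rc = pthread_mutex_lock(&downl);
    if (rc == 0)
        pthread_mutex_unlock(&downl);
    return rc;                 /* 0 normally, EDEADLK if re-entered */
}

/* Hypothetical analogue of ppp_channel_push(): holds the lock while
 * "transmitting", and the misrouted packet loops back into push_frame()
 * on the same thread. */
static int channel_push(void)
{
    pthread_mutex_lock(&downl);
    int rc = push_frame();     /* re-entry while the lock is held */
    pthread_mutex_unlock(&downl);
    return rc;
}
```

After init_downl(), channel_push() returns EDEADLK. A plain push_frame() call made outside that path still succeeds.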

It sure looks like a ppp data packet was put into an l2tp wrapper
and then sent back to the ppp layer for transmission, rather than
getting sent out through the wlan0 interface.

Unfortunately, I can't work around this problem by reconfiguring the
VPN server -- there's no way to tell it to use a different IP address
for its end of the VPN tunnel. Furthermore, the server works just fine
with clients running Windows or OS-X.

So it looks like the problem has to be fixed either in the kernel or in
the way pppd sets up its routing entry. Can you guys help?

Thanks,

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-ppp" in
the body of a message to ***@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
James Carlson
2014-10-20 17:19:40 UTC
Post by Alan Stern
As far as I can tell, the problem is caused by bad routing. The kernel
gets confused because the IP address assigned by the VPN server to the
server's end of the ppp tunnel is the _same_ as the server's actual IP
address.
Indeed! That's pretty darned lame behavior by that peer. It would
probably be workable if you had a virtual router instance and were able
to put the L2TP connection in one routing instance and the PPP
connection in another routing instance, but that's likely not at all
simple to achieve.
Post by Alan Stern
Unfortunately, I can't work around this problem by reconfiguring the
VPN server -- there's no way to tell it to use a different IP address
for its end of the VPN tunnel. Furthermore, the server works just fine
with clients running Windows or OS-X.
Really? That seems ... improbable.
Post by Alan Stern
So it looks like the problem has to be fixed either in the kernel or in
the way pppd sets up its routing entry. Can you guys help?
I think the easiest solution is to configure pppd to lie to the kernel
about the remote address. Who cares what the remote address is on a
point-to-point link anyway?

There's currently no option to do this, but the code change in ipcp_up()
in pppd/ipcp.c would be rather simple. Just make the "noremoteip" code
run all the time:

/* Deliberately falsify the remote address. We don't care. */
ho->hisaddr = htonl(0x0aa00002);

As long as you don't need to contact that specific remote server using
the badly-assigned "internal" VPN address and can live with the fact
that you'll either go through the regular Internet to that address or be
forced to use some other address configured on that server, you should
be good.

(The address I used above is 10.160.0.2. That was one of the internal
DNS server addresses provided in the log you posted. It's not necessary
that the address used here is exactly that, but it may well be helpful.)
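The constant 0x0aa00002 is a host-order IPv4 address with one byte per octet: 0x0a = 10, 0xa0 = 160, 0x00 = 0, 0x02 = 2. Here is a minimal sketch that checks this using the standard htonl() and inet_ntop() calls. hisaddr_to_string() and hisaddr_equals() are hypothetical helpers, not part of pppd.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: render a host-order IPv4 constant as a dotted
 * quad, after the same htonl() conversion as ho->hisaddr = htonl(...). */
static const char *hisaddr_to_string(uint32_t host_order, char *buf, socklen_t len)
{
    struct in_addr a;
    a.s_addr = htonl(host_order);
    return inet_ntop(AF_INET, &a, buf, len);
}

/* Hypothetical helper: does the constant spell the expected address? */
static int hisaddr_equals(uint32_t host_order, const char *dotted)
{
    char buf[INET_ADDRSTRLEN];
    const char *s = hisaddr_to_string(host_order, buf, sizeof buf);
    return s != NULL && strcmp(s, dotted) == 0;
}
```

By the same arithmetic, the server's real address 140.247.233.37 would be 0x8cf7e925.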

If you can't do that for some reason, then I suppose it would be
possible to use IP Chains (or whatever packet-modification tool du
jour is used in your Linux distribution) to nail up an exception so that
the outside packets go to the outside interface and the inside ones go
to the PPP interface. Doing that likely requires selecting on (at
least!) source address, so it's messy and ugly and possibly error-prone,
but it might be doable.

Otherwise, contact the maintainer of that VPN server. It's just plain
old broken, and life's too short for broken software.
--
James Carlson 42.703N 71.076W <***@workingcode.com>
Alan Stern
2014-10-20 19:45:23 UTC
Post by James Carlson
Post by Alan Stern
As far as I can tell, the problem is caused by bad routing. The kernel
gets confused because the IP address assigned by the VPN server to the
server's end of the ppp tunnel is the _same_ as the server's actual IP
address.
Indeed! That's pretty darned lame behavior by that peer. It would
probably be workable if you had a virtual router instance and were able
to put the L2TP connection in one routing instance and the PPP
connection in another routing instance, but that's likely not at all
simple to achieve.
I'd like to find the simplest solution. Ideally it should "just work",
like the Windows and OS-X clients do.
Post by James Carlson
Post by Alan Stern
Unfortunately, I can't work around this problem by reconfiguring the
VPN server -- there's no way to tell it to use a different IP address
for its end of the VPN tunnel. Furthermore, the server works just fine
with clients running Windows or OS-X.
Really? That seems ... improbable.
I guess that depends on how you judge probabilities. :-)

As evidence to convince you, here's a log of a session on a rather old
Mac PowerBook G4 running Mac OS X 10.4.11. The situation isn't exactly the
same as with my Linux system, because for this test the client and the
VPN server are on the same subnet -- I don't think that should make any
difference. The client's IP address is 140.247.233.41, the server's is
.37, and the router to the outside world is .33. The client's ppp IP
address (assigned by the server) is 10.170.30.1.

The following commands were carried out while the VPN was connected:


------------------------------------------------------------------------
michael-burns-powerbook-g4:~ stern$ netstat -rn -f inet
Routing tables

Internet:
Destination        Gateway            Flags   Refs    Use  Netif  Expire
default            140.247.233.37     UGSc       2      4   ppp0
10                 ppp0               USc        1      0   ppp0
127                127.0.0.1          UCS        0      0    lo0
127.0.0.1          127.0.0.1          UH        12   2278    lo0
140.247.233.32/27  link#4             UCS        2      0    en0
140.247.233.33     0:8:e3:ff:fc:b8    UHLW       0      0    en0    1198
140.247.233.37     0:1e:f7:15:53:a8   UHLW       3     10    en0    1153
140.247.233.37/32  link#4             UCS        1      0    en0
140.247.233.41     127.0.0.1          UHS        0      0    lo0
169.254            link#4             UCS        0      0    en0

michael-burns-powerbook-g4:~ stern$ ping -c1 -n 10.160.0.2
PING 10.160.0.2 (10.160.0.2): 56 data bytes
64 bytes from 10.160.0.2: icmp_seq=0 ttl=64 time=1.368 ms

--- 10.160.0.2 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.368/1.368/1.368/nan ms

michael-burns-powerbook-g4:~ stern$ ifconfig en0
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
inet 140.247.233.41 netmask 0xffffffe0 broadcast 140.247.233.63
ether 00:03:93:12:da:48
media: autoselect (100baseTX <full-duplex>) status: active
supported media: none autoselect 10baseT/UTP <half-duplex> 10baseT/UTP <full-duplex> 10baseT/UTP <full-duplex,hw-loopback> 100baseTX <half-duplex> 100baseTX <full-duplex> 100baseTX <full-duplex,hw-loopback>

michael-burns-powerbook-g4:~ stern$ ifconfig ppp0
ppp0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1280
inet 10.170.30.1 --> 140.247.233.37 netmask 0xff000000
------------------------------------------------------------------------


I don't understand (and can't be bothered to look up) those arcane
symbols in the netstat output. The IP address used for the ping test
(10.160.0.2) is a system on the VPN's private network.

Here's comparable output for a connection from a computer running
Windows 7 (same IP addresses as before):


------------------------------------------------------------------------
C:\Users\stern>netstat -rn
===========================================================================
Interface List
20...........................Rowland VPN
10...00 1a 6b 57 30 02 ......Intel(R) 82566DM Gigabit Network Connection
1...........................Software Loopback Interface 1
11...00 00 00 00 00 00 00 e0 Microsoft ISATAP Adapter
12...00 00 00 00 00 00 00 e0 Teredo Tunneling Pseudo-Interface
19...00 00 00 00 00 00 00 e0 Microsoft 6to4 Adapter
21...00 00 00 00 00 00 00 e0 Microsoft ISATAP Adapter #2
===========================================================================

IPv4 Route Table
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0   140.247.233.33  140.247.233.41    4491
          0.0.0.0          0.0.0.0          On-link     10.170.30.1      11
      10.170.30.1  255.255.255.255          On-link     10.170.30.1     266
        127.0.0.0        255.0.0.0          On-link       127.0.0.1    4531
        127.0.0.1  255.255.255.255          On-link       127.0.0.1    4531
  127.255.255.255  255.255.255.255          On-link       127.0.0.1    4531
   140.247.233.32  255.255.255.224          On-link  140.247.233.41    4491
   140.247.233.37  255.255.255.255          On-link  140.247.233.41    4236
   140.247.233.41  255.255.255.255          On-link  140.247.233.41    4491
   140.247.233.63  255.255.255.255          On-link  140.247.233.41    4491
        224.0.0.0        240.0.0.0          On-link       127.0.0.1    4531
        224.0.0.0        240.0.0.0          On-link  140.247.233.41    4492
        224.0.0.0        240.0.0.0          On-link     10.170.30.1      11
  255.255.255.255  255.255.255.255          On-link       127.0.0.1    4531
  255.255.255.255  255.255.255.255          On-link  140.247.233.41    4491
  255.255.255.255  255.255.255.255          On-link     10.170.30.1     266
===========================================================================
Persistent Routes:
  Network Address          Netmask  Gateway Address  Metric
          0.0.0.0          0.0.0.0   140.247.233.33 Default
===========================================================================

C:\Users\stern>ping -n 1 10.160.0.2

Pinging 10.160.0.2 with 32 bytes of data:
Reply from 10.160.0.2: bytes=32 time<1ms TTL=64

Ping statistics for 10.160.0.2:
Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 0ms, Average = 0ms

C:\Users\stern>ipconfig /all

Windows IP Configuration

Host Name . . . . . . . . . . . . : Windows-test
Primary Dns Suffix . . . . . . . : rowland.org
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No
DNS Suffix Search List. . . . . . : rowland.org

PPP adapter Rowland VPN:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Rowland VPN
Physical Address. . . . . . . . . :
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
IPv4 Address. . . . . . . . . . . : 10.170.30.1(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.255
Default Gateway . . . . . . . . . : 0.0.0.0
DNS Servers . . . . . . . . . . . : 10.160.0.2
10.160.0.3
Primary WINS Server . . . . . . . : 10.160.0.2
NetBIOS over Tcpip. . . . . . . . : Disabled

Ethernet adapter Local Area Connection:

Connection-specific DNS Suffix . :
Description . . . . . . . . . . . : Intel(R) 82566DM Gigabit Network Connection
Physical Address. . . . . . . . . : 00-1A-6B-57-30-02
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
Link-local IPv6 Address . . . . . : fe80::1426:1891:bf83:3982%10(Preferred)
IPv4 Address. . . . . . . . . . . : 140.247.233.41(Preferred)
Subnet Mask . . . . . . . . . . . : 255.255.255.224
Default Gateway . . . . . . . . . : 140.247.233.33
DHCPv6 IAID . . . . . . . . . . . : 234887787
DHCPv6 Client DUID. . . . . . . . : 00-01-00-01-1B-A2-3E-B1-00-1A-6B-57-30-02

DNS Servers . . . . . . . . . . . : 8.8.8.8
NetBIOS over Tcpip. . . . . . . . : Enabled
------------------------------------------------------------------------


Although the Windows ipconfig output doesn't show the IP address of the
server side of the ppp tunnel, it does show up in a Details window
under the Network control panel, and it is indeed set to
140.247.233.37.
Post by James Carlson
Post by Alan Stern
So it looks like the problem has to be fixed either in the kernel or in
the way pppd sets up its routing entry. Can you guys help?
I think the easiest solution is to configure pppd to lie to the kernel
about the remote address. Who cares what the remote address is on a
point-to-point link anyway?
There's currently no option to do this, but the code change in ipcp_up()
in pppd/ipcp.c would be rather simple. Just make the "noremoteip" code
run all the time:
/* Deliberately falsify the remote address. We don't care. */
ho->hisaddr = htonl(0x0aa00002);
As long as you don't need to contact that specific remote server using
the badly-assigned "internal" VPN address and can live with the fact
that you'll either go through the regular Internet to that address or be
forced to use some other address configured on that server, you should
be good.
(The address I used above is 10.160.0.2. That was one of the internal
DNS server addresses provided in the log you posted. It's not necessary
that the address used here is exactly that, but it may well be helpful.)
That might work. But using a nonstandard version of pppd would be
awkward, and I would prefer to avoid it.
Post by James Carlson
If you can't do that for some reason, then I suppose it would be
possible to use IP Chains (or whatever packet-modification tool du
jour is used in your Linux distribution) to nail up an exception so that
the outside packets go to the outside interface and the inside ones go
to the PPP interface. Doing that likely requires selecting on (at
least!) source address, so it's messy and ugly and possibly error-prone,
but it might be doable.
That sounds like a fairly easy thing to try. But it would still
require manual intervention instead of just working. Fixing the kernel
would be preferable, IMO.
Post by James Carlson
Otherwise, contact the maintainer of that VPN server. It's just plain
old broken, and life's too short for broken software.
It is an old Cisco security appliance, no doubt well past End-Of-Life.
I'm starting to think it might be preferable to throw the thing away
and start up a VPN server on the department's firewall (which is a
Linux box) instead.

Alan Stern

James Carlson
2014-10-20 20:22:45 UTC
Post by Alan Stern
Post by James Carlson
Indeed! That's pretty darned lame behavior by that peer. It would
probably be workable if you had a virtual router instance and were able
to put the L2TP connection in one routing instance and the PPP
connection in another routing instance, but that's likely not at all
simple to achieve.
I'd like to find the simplest solution. Ideally it should "just work",
like the Windows and OS-X clients do.
I'm not an expert on Windows networking internals. I assume OS X is BSD
+ whatever the folks in Cupertino have done to it. :-/

At a guess, it's living on the edge. It works because the L2TP
connection establishment caches a pointer to the output forwarding table
entry ("route") and just keeps living with it no matter what actually
happens down the line.

On Linux (and likely many other systems), the output computation is a
bit more dynamic, and the establishment of a direct point-to-point link
to a given IP address (as the PPP link represents) causes existing
cached pointers to get flushed away. Future packets to that destination
(IP _always_ forwards based on destination, not source) go down the most
direct path. Point-to-point is as direct as you can get.
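Here is a toy model of destination-based forwarding with longest-prefix match, built on a tiny hypothetical route table. struct route, lookup() and the metric tie-break are all illustrative. This is not the Linux FIB, which weighs more criteria. Once the ppp peer route 140.247.233.37/32 appears, it beats the default route. In this toy it also beats the /32 host route via wlan0 that Alan suspected was present. As a result, L2TP packets addressed to the server head into ppp0.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                          ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Toy route entry (hypothetical layout, host byte order). */
struct route {
    uint32_t dst;        /* network address */
    int prefix_len;      /* 0..32 */
    int metric;          /* toy rule: lower wins among equal prefix lengths */
    const char *dev;     /* output interface */
};

static uint32_t prefix_mask(int len)
{
    /* Avoid the undefined 32-bit shift when len == 0. */
    return len == 0 ? 0 : 0xffffffffu << (32 - len);
}

/* Pick the output device purely by destination: the longest matching
 * prefix wins, and the metric breaks ties. Returns NULL if nothing matches. */
static const char *lookup(const struct route *tbl, size_t n, uint32_t dst)
{
    const struct route *best = NULL;
    for (size_t i = 0; i < n; i++) {
        uint32_t m = prefix_mask(tbl[i].prefix_len);
        if ((dst & m) != (tbl[i].dst & m))
            continue;
        if (best == NULL || tbl[i].prefix_len > best->prefix_len ||
            (tbl[i].prefix_len == best->prefix_len && tbl[i].metric < best->metric))
            best = &tbl[i];
    }
    return best ? best->dev : NULL;
}
```

Take a table holding a default route via wlan0, a host route to 140.247.233.37 via wlan0, and the ppp0 peer route to the same /32. For that table, lookup() returns "ppp0" for the server address and "wlan0" for any other destination.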

It may be possible to modify the L2TP code to use flags to avoid the PPP
link (MSG_DONTROUTE?), but I suspect that's probably a bad rather than a
good thing to do.
Post by Alan Stern
Post by James Carlson
Post by Alan Stern
Unfortunately, I can't work around this problem by reconfiguring the
VPN server -- there's no way to tell it to use a different IP address
for its end of the VPN tunnel. Furthermore, the server works just fine
with clients running Windows or OS-X.
Really? That seems ... improbable.
I guess that depends on how you judge probabilities. :-)
:-/
Post by Alan Stern
Destination Gateway Flags Refs Use Netif Expire
default 140.247.233.37 UGSc 2 4 ppp0
10 ppp0 USc 1 0 ppp0
That's *quite* interesting! The PPP link doesn't have an interface
route as you'd find on most other systems. Instead, it has what appears
to be an effectively unnumbered link. Note the "ppp0" there instead of
an actual output address and the happy use of "10" for the local address
+ mask.

For what it's worth, the forced IP address option I've suggested is
morally equivalent to what's being done here on OS X, so that's a fair
reason to recommend it.

I checked out the pppd on Mac OS X (Darwin 13.4.0; Mavericks), and it
looks to be a variant of the SAMBA/ANU/CMU pppd, but I'm not sure what's
different with it, and I know of no contributions from them. And the
BSD support is long gone from the main source base ...
Post by Alan Stern
I don't understand (and can't be bothered to look up) those arcane
symbols in the netstat output. The IP address used for the ping test
(10.160.0.2) is a system on the VPN's private network.
The flags aren't all that interesting. "Up" "Gateway" "Static"
"cloning" are all expected in this context.
Post by Alan Stern
Post by James Carlson
As long as you don't need to contact that specific remote server using
the badly-assigned "internal" VPN address and can live with the fact
that you'll either go through the regular Internet to that address or be
forced to use some other address configured on that server, you should
be good.
(The address I used above is 10.160.0.2. That was one of the internal
DNS server addresses provided in the log you posted. It's not necessary
that the address used here is exactly that, but it may well be helpful.)
That might work. But using a nonstandard version of pppd would be
awkward, and I would prefer to avoid it.
What's "non-standard"?

Having the ability to force a given remote IP address looks to me like a
perfectly reasonable thing to do. We allow the remote IP address to be
set arbitrarily when the peer (for whatever reason) refuses to divulge
its address, and this is just an extension of that idea.
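Here is a sketch of the option logic described above, assuming a hypothetical "force remote address" switch. As the thread notes, pppd currently has no such option. struct remote_addr_opts and choose_peer_addr() are invented names for illustration. When the switch is set, the forced address wins unconditionally. Otherwise the existing "peer did not divulge its address" fallback still applies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical option state; not an existing pppd structure. */
struct remote_addr_opts {
    bool force;             /* always hand the kernel forced_addr */
    uint32_t forced_addr;   /* e.g. htonl(0x0aa00002), i.e. 10.160.0.2 */
};

/* Choose the peer address to install on the point-to-point interface.
 * peer_addr == 0 means the peer did not supply an address. */
static uint32_t choose_peer_addr(const struct remote_addr_opts *o,
                                 uint32_t peer_addr, uint32_t fallback)
{
    if (o->force)
        return o->forced_addr;   /* ignore whatever the peer claimed */
    if (peer_addr == 0)
        return fallback;         /* today's "peer won't say" case */
    return peer_addr;            /* normal case: trust the peer */
}
```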
Post by Alan Stern
Post by James Carlson
If you can't do that for some reason, then I suppose it would be
possible to use IP Chains (or whatever packet-modification tool du
jour is used in your Linux distribution) to nail up an exception so that
the outside packets go to the outside interface and the inside ones go
to the PPP interface. Doing that likely requires selecting on (at
least!) source address, so it's messy and ugly and possibly error-prone,
but it might be doable.
That sounds like a fairly easy thing to try. But it would still
require manual intervention instead of just working. Fixing the kernel
would be preferable, IMO.
I don't quite agree that it's necessarily "broken."

I do agree that it's bad to crash due to this misconfiguration. That's
certainly a bug of some sort. But making the kernel "naturally" accept
that the same unicast remote IP address refers to different outputs
depending on phase-of-moon in order to make this weird server happy
sounds like adding a bug rather than fixing one.

Routing based on destination is a good thing.
Post by Alan Stern
Post by James Carlson
Otherwise, contact the maintainer of that VPN server. It's just plain
old broken, and life's too short for broken software.
It is an old Cisco security appliance, no doubt well past End-Of-Life.
I'm starting to think it might be preferable to throw the thing away
and start up a VPN server on the department's firewall (which is a
Linux box) instead.
That sounds like a good (and easier to support) solution.
--
James Carlson 42.703N 71.076W <***@workingcode.com>
Alan Stern
2014-10-21 14:15:57 UTC
Post by James Carlson
Post by Alan Stern
Post by James Carlson
Otherwise, contact the maintainer of that VPN server. It's just plain
old broken, and life's too short for broken software.
It is an old Cisco security appliance, no doubt well past End-Of-Life.
I'm starting to think it might be preferable to throw the thing away
and start up a VPN server on the department's firewall (which is a
Linux box) instead.
That sounds like a good (and easier to support) solution.
Okay. I looked into iptables, but it doesn't seem to provide any way
to prevent a packet from being routed through a particular interface.
:-(

On the other hand, I tried writing a short /etc/ppp/ip-up.local script
that changes the destination address of the ppp interface and adds a
default route to the new, correct address. It worked! It's not a
perfect solution, because there's still a short window in which the
interface is up with the wrong address. A few packets get lost and a
deadlock could occur. But at least it's simple and non-invasive, and
it definitely proves the address conflict was indeed the cause of the
problem.

Changing pppd would be more foolproof. But then I'd also have to
change the programs that call it (xl2tpd and then NetworkManager), and
doing all that doesn't seem worthwhile.

In the end, I think the best solution will be to replace the VPN
server.

Thanks for your help,

Alan Stern
