aboutsummaryrefslogtreecommitdiffstats
path: root/net/l2tp
AgeCommit message (Collapse)AuthorFilesLines
11 daysl2tp: correct debugfs label for tunnel tx statsAlok Tiwari1-1/+1
l2tp_dfs_seq_tunnel_show prints two groups of tunnel statistics. The first group reports transmit counters, but the code labels it as rx. Set the label to "tx" so the debugfs output reflects the actual meaning. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251128085300.3377210-1-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-3/+3
Cross-merge networking fixes after downstream PR (net-6.18-rc7). No conflicts, adjacent changes: tools/testing/selftests/net/af_unix/Makefile e1bb28bf13f4 ("selftest: af_unix: Add test for SO_PEEK_OFF.") 45a1cd8346ca ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-20l2tp: reset skb control buffer on xmitDavid Bauer1-3/+3
The L2TP stack did not reset the skb control buffer before sending the encapsulated package. In a setup with an ath10k radio and batman-adv over an L2TP tunnel massive fragmentations happen sporadically if the L2TP tunnel is established over IPv4. L2TP might reset some of the fields in the IP control buffer, but L2TP assumes the type of the control buffer to be of an IPv4 packet. In case the L2TP interface is used as a batadv hardif or the packet is an IPv6 packet, this assumption breaks. Clear the entire control buffer to avoid such mishaps altogether. Fixes: f77ae9390438 ("[PPPOL2TP]: Reset meta-data in xmit function") Signed-off-by: David Bauer <mail@david-bauer.net> Link: https://patch.msgid.link/20251118001619.242107-1-mail@david-bauer.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-04net: Convert proto callbacks from sockaddr to sockaddr_unsizedKees Cook2-4/+7
Convert struct proto pre_connect(), connect(), bind(), and bind_add() callback function prototypes from struct sockaddr to struct sockaddr_unsized. This does not change per-implementation use of sockaddr for passing around an arbitrarily sized sockaddr struct. Those will be addressed in future patches. Additionally removes the no longer referenced struct sockaddr from include/net/inet_common.h. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-5-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: Convert proto_ops connect() callbacks to use sockaddr_unsizedKees Cook2-3/+3
Update all struct proto_ops connect() callback function prototypes from "struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-3-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: Convert proto_ops bind() callbacks to use sockaddr_unsizedKees Cook1-2/+2
Update all struct proto_ops bind() callback function prototypes from "struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-08-27l2tp: do not use sock_hold() in pppol2tp_session_get_sock()Eric Dumazet1-17/+8
pppol2tp_session_get_sock() is using RCU, it must be ready for sk_refcnt being zero. Commit ee40fb2e1eb5 ("l2tp: protect sock pointer of struct pppol2tp_session with RCU") was correct because it had a call_rcu(..., pppol2tp_put_sk) which was later removed in blamed commit. pppol2tp_recv() can use pppol2tp_session_get_sock() as well. Fixes: c5cbaef992d6 ("l2tp: refactor ppp socket/session relationship") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Chapman <jchapman@katalix.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/20250826134435.1683435-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-23net: annotate races around sk->sk_uidEric Dumazet1-1/+1
sk->sk_uid can be read while another thread changes its value in sockfs_setattr(). Add sk_uid(const struct sock *sk) helper to factorize the needed READ_ONCE() annotations, and add corresponding WRITE_ONCE() where needed. Fixes: 86741ec25462 ("net: core: Add a UID field to struct sock.") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Reviewed-by: Maciej Żenczykowski <maze@google.com> Link: https://patch.msgid.link/20250620133001.4090592-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-08net: move misc netdev_lock flavors to a separate headerJakub Kicinski1-0/+1
Move the more esoteric helpers for netdev instance lock to a dedicated header. This avoids growing netdevice.h to infinity and makes rebuilding the kernel much faster (after touching the header with the helpers). The main netdev_lock() / netdev_unlock() functions are used in static inlines in netdevice.h and will probably be used most commonly, so keep them in netdevice.h. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250307183006.2312761-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-04ppp: use IFF_NO_QUEUE in virtual interfacesQingfang Deng1-0/+1
For PPPoE, PPTP, and PPPoL2TP, the start_xmit() function directly forwards packets to the underlying network stack and never returns anything other than 1. So these interfaces do not require a qdisc, and the IFF_NO_QUEUE flag should be set. Introduces a direct_xmit flag in struct ppp_channel to indicate when IFF_NO_QUEUE should be applied. The flag is set in ppp_connect_channel() for relevant protocols. While at it, remove the usused latency member from struct ppp_channel. Signed-off-by: Qingfang Deng <dqfext@gmail.com> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://patch.msgid.link/20250301135517.695809-1-dqfext@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-18ipv6: replace ipcm6_init calls with ipcm6_init_skWillem de Bruijn1-7/+1
This initializes tclass and dontfrag before cmsg parsing, removing the need for explicit checks against -1 in each caller. Leave hlimit set to -1, because its full initialization (in ip6_sk_dst_hoplimit) requires more state (dst, flowi6, ..). This also prepares for calling sockcm_init in a follow-on patch. Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20250214222720.3205500-7-willemdebruijn.kernel@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-20l2tp: Use inet_sk_init_flowi4() in l2tp_ip_sendmsg().Guillaume Nault1-13/+6
Use inet_sk_init_flowi4() to automatically initialise the flowi4 structure in l2tp_ip_sendmsg() instead of passing parameters manually to ip_route_output_ports(). Override ->daddr with the value passed in the msghdr structure if provided. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: James Chapman <jchapman@katalix.com> Link: https://patch.msgid.link/2ff22a3560c5050228928456662b80b9c84a8fe4.1734357769.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-11l2tp: Handle eth stats using NETDEV_PCPU_STAT_DSTATS.James Chapman1-5/+4
l2tp_eth uses the TSTATS infrastructure (dev_sw_netstats_*()) for RX and TX packet counters and DEV_STATS_INC for dropped counters. Consolidate that using the DSTATS infrastructure, which can handle both packet counters and packet drops. Statistics that don't fit DSTATS are still updated atomically with DEV_STATS_INC(). This change is inspired by the introduction of DSTATS helpers and their use in other udp tunnel drivers: Link: https://lore.kernel.org/all/cover.1733313925.git.gnault@redhat.com/ Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-26net/l2tp: fix warning in l2tp_exit_net found by syzbotJames Chapman1-3/+19
In l2tp's net exit handler, we check that an IDR is empty before destroying it: WARN_ON_ONCE(!idr_is_empty(&pn->l2tp_tunnel_idr)); idr_destroy(&pn->l2tp_tunnel_idr); By forcing memory allocation failures in idr_alloc_32, syzbot is able to provoke a condition where idr_is_empty returns false despite there being no items in the IDR. This turns out to be because the radix tree of the IDR contains only internal radix-tree nodes and it is this that causes idr_is_empty to return false. The internal nodes are cleaned by idr_destroy. Use idr_for_each to check that the IDR is empty instead of idr_is_empty to avoid the problem. Reported-by: syzbot+332fe1e67018625f63c9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=332fe1e67018625f63c9 Fixes: 73d33bd063c4 ("l2tp: avoid using drain_workqueue in l2tp_pre_exit_net") Signed-off-by: James Chapman <jchapman@katalix.com> Link: https://patch.msgid.link/20241118140411.1582555-1-jchapman@katalix.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-10-15genetlink: hold RCU in genlmsg_mcast()Eric Dumazet1-2/+2
While running net selftests with CONFIG_PROVE_RCU_LIST=y I saw one lockdep splat [1]. genlmsg_mcast() uses for_each_net_rcu(), and must therefore hold RCU. Instead of letting all callers guard genlmsg_multicast_allns() with a rcu_read_lock()/rcu_read_unlock() pair, do it in genlmsg_mcast(). This also means the @flags parameter is useless, we need to always use GFP_ATOMIC. [1] [10882.424136] ============================= [10882.424166] WARNING: suspicious RCU usage [10882.424309] 6.12.0-rc2-virtme #1156 Not tainted [10882.424400] ----------------------------- [10882.424423] net/netlink/genetlink.c:1940 RCU-list traversed in non-reader section!! [10882.424469] other info that might help us debug this: [10882.424500] rcu_scheduler_active = 2, debug_locks = 1 [10882.424744] 2 locks held by ip/15677: [10882.424791] #0: ffffffffb6b491b0 (cb_lock){++++}-{3:3}, at: genl_rcv (net/netlink/genetlink.c:1219) [10882.426334] #1: ffffffffb6b49248 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg (net/netlink/genetlink.c:61 net/netlink/genetlink.c:57 net/netlink/genetlink.c:1209) [10882.426465] stack backtrace: [10882.426805] CPU: 14 UID: 0 PID: 15677 Comm: ip Not tainted 6.12.0-rc2-virtme #1156 [10882.426919] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [10882.427046] Call Trace: [10882.427131] <TASK> [10882.427244] dump_stack_lvl (lib/dump_stack.c:123) [10882.427335] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6822) [10882.427387] genlmsg_multicast_allns (net/netlink/genetlink.c:1940 (discriminator 7) net/netlink/genetlink.c:1977 (discriminator 7)) [10882.427436] l2tp_tunnel_notify.constprop.0 (net/l2tp/l2tp_netlink.c:119) l2tp_netlink [10882.427683] l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:253) l2tp_netlink [10882.427748] genl_family_rcv_msg_doit (net/netlink/genetlink.c:1115) [10882.427834] genl_rcv_msg (net/netlink/genetlink.c:1195 net/netlink/genetlink.c:1210) [10882.427877] ? __pfx_l2tp_nl_cmd_tunnel_create (net/l2tp/l2tp_netlink.c:186) l2tp_netlink [10882.427927] ? __pfx_genl_rcv_msg (net/netlink/genetlink.c:1201) [10882.427959] netlink_rcv_skb (net/netlink/af_netlink.c:2551) [10882.428069] genl_rcv (net/netlink/genetlink.c:1220) [10882.428095] netlink_unicast (net/netlink/af_netlink.c:1332 net/netlink/af_netlink.c:1357) [10882.428140] netlink_sendmsg (net/netlink/af_netlink.c:1901) [10882.428210] ____sys_sendmsg (net/socket.c:729 (discriminator 1) net/socket.c:744 (discriminator 1) net/socket.c:2607 (discriminator 1)) Fixes: 33f72e6f0c67 ("l2tp : multicast notification to the registered listeners") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Chapman <jchapman@katalix.com> Cc: Tom Parkin <tparkin@katalix.com> Cc: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20241011171217.3166614-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-04l2tp: remove unneeded null check in l2tp_v2_session_get_nextJames Chapman1-1/+1
Commit aa92c1cec92b ("l2tp: add tunnel/session get_next helpers") uses idr_get_next APIs to iterate over l2tp session IDR lists. Sessions in l2tp_v2_session_idr always have a non-null session->tunnel pointer since l2tp_session_register sets it before inserting the session into the IDR. Therefore the null check on session->tunnel in l2tp_v2_session_get_next is redundant and can be removed. Removing the check avoids a warning from lkp. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/r/202408111407.HtON8jqa-lkp@intel.com/ CC: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: James Chapman <jchapman@katalix.com> Acked-by: Tom Parkin <tparkin@katalix.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240903113547.1261048-1-jchapman@katalix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-09-03netdev_features: convert NETIF_F_LLTX to dev->lltxAlexander Lobakin1-1/+1
NETIF_F_LLTX can't be changed via Ethtool and is not a feature, rather an attribute, very similar to IFF_NO_QUEUE (and hot). Free one netdev_features_t bit and make it a "hot" private flag. Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-08-27l2tp: avoid using drain_workqueue in l2tp_pre_exit_netJames Chapman1-6/+9
Recent commit fc7ec7f554d7 ("l2tp: delete sessions using work queue") incorrectly uses drain_workqueue. The use of drain_workqueue in l2tp_pre_exit_net is flawed because the workqueue is shared by all nets and it is therefore possible for new work items to be queued for other nets while drain_workqueue runs. Instead of using drain_workqueue, use __flush_workqueue twice. The first one will run all tunnel delete work items and any work already queued. When tunnel delete work items are run, they may queue new session delete work items, which the second __flush_workqueue will run. In l2tp_exit_net, warn if any of the net's idr lists are not empty. Fixes: fc7ec7f554d7 ("l2tp: delete sessions using work queue") Signed-off-by: James Chapman <jchapman@katalix.com> Link: https://patch.msgid.link/20240823142257.692667-1-jchapman@katalix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-26l2tp: avoid overriding sk->sk_user_dataCong Wang1-0/+3
Although commit 4a4cd70369f1 ("l2tp: don't set sk_user_data in tunnel socket") removed sk->sk_user_data usage, setup_udp_tunnel_sock() still touches sk->sk_user_data, this conflicts with sockmap which also leverages sk->sk_user_data to save psock. Restore this sk->sk_user_data check to avoid such conflicts. Fixes: 4a4cd70369f1 ("l2tp: don't set sk_user_data in tunnel socket") Reported-by: syzbot+8dbe3133b840c470da0e@syzkaller.appspotmail.com Cc: Tom Parkin <tparkin@katalix.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Tested-by: James Chapman <jchapman@katalix.com> Reviewed-by: James Chapman <jchapman@katalix.com> Link: https://patch.msgid.link/20240822182544.378169-1-xiyou.wangcong@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-20l2tp: use skb_queue_purge in l2tp_ip_destroy_sockJames Chapman1-3/+1
Recent commit ed8ebee6def7 ("l2tp: have l2tp_ip_destroy_sock use ip_flush_pending_frames") was incorrect in that l2tp_ip does not use socket cork and ip_flush_pending_frames is for sockets that do. Use __skb_queue_purge instead and remove the unnecessary lock. Also unexport ip_flush_pending_frames since it was originally exported in commit 4ff8863419cd ("ipv4: export ip_flush_pending_frames") for l2tp and is not used by other modules. Suggested-by: xiyou.wangcong@gmail.com Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20240819143333.3204957-1-jchapman@katalix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-11l2tp: flush workqueue before draining itJames Chapman1-1/+9
syzbot exposes a race where a net used by l2tp is removed while an existing pppol2tp socket is closed. In l2tp_pre_exit_net, l2tp queues TUNNEL_DELETE work items to close each tunnel in the net. When these are run, new SESSION_DELETE work items are queued to delete each session in the tunnel. This all happens in drain_workqueue. However, drain_workqueue allows only new work items if they are queued by other work items which are already in the queue. If pppol2tp_release runs after drain_workqueue has started, it may queue a SESSION_DELETE work item, which results in the warning below in drain_workqueue. Address this by flushing the workqueue before drain_workqueue such that all queued TUNNEL_DELETE work items run before drain_workqueue is started. This will queue SESSION_DELETE work items for each session in the tunnel, hence pppol2tp_release or other API requests won't queue SESSION_DELETE requests once drain_workqueue is started. WARNING: CPU: 1 PID: 5467 at kernel/workqueue.c:2259 __queue_work+0xcd3/0xf50 kernel/workqueue.c:2258 Modules linked in: CPU: 1 UID: 0 PID: 5467 Comm: syz.3.43 Not tainted 6.11.0-rc1-syzkaller-00247-g3608d6aca5e7 #0 Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024 RIP: 0010:__queue_work+0xcd3/0xf50 kernel/workqueue.c:2258 Code: ff e8 11 84 36 00 90 0f 0b 90 e9 1e fd ff ff e8 03 84 36 00 eb 13 e8 fc 83 36 00 eb 0c e8 f5 83 36 00 eb 05 e8 ee 83 36 00 90 <0f> 0b 90 48 83 c4 60 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc RSP: 0018:ffffc90004607b48 EFLAGS: 00010093 RAX: ffffffff815ce274 RBX: ffff8880661fda00 RCX: ffff8880661fda00 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: 0000000000000000 R08: ffffffff815cd6d4 R09: 0000000000000000 R10: ffffc90004607c20 R11: fffff520008c0f85 R12: ffff88802ac33800 R13: ffff88802ac339c0 R14: dffffc0000000000 R15: 0000000000000008 FS: 00005555713eb500(0000) GS:ffff8880b9300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000001eda6000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> queue_work_on+0x1c2/0x380 kernel/workqueue.c:2392 pppol2tp_release+0x163/0x230 net/l2tp/l2tp_ppp.c:445 __sock_release net/socket.c:659 [inline] sock_close+0xbc/0x240 net/socket.c:1421 __fput+0x24a/0x8a0 fs/file_table.c:422 task_work_run+0x24f/0x310 kernel/task_work.c:228 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline] exit_to_user_mode_loop kernel/entry/common.c:114 [inline] exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline] __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline] syscall_exit_to_user_mode+0x168/0x370 kernel/entry/common.c:218 do_syscall_64+0x100/0x230 arch/x86/entry/common.c:89 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f061e9779f9 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffff1c1fce8 EFLAGS: 00000246 ORIG_RAX: 00000000000001b4 RAX: 0000000000000000 RBX: 000000000001017d RCX: 00007f061e9779f9 RDX: 0000000000000000 RSI: 000000000000001e RDI: 0000000000000003 RBP: 00007ffff1c1fdc0 R08: 0000000000000001 R09: 00007ffff1c1ffcf R10: 00007f061e800000 R11: 0000000000000246 R12: 0000000000000032 R13: 00007ffff1c1fde0 R14: 00007ffff1c1fe00 R15: ffffffffffffffff </TASK> Fixes: fc7ec7f554d7 ("l2tp: delete sessions using work queue") Reported-by: syzbot+0e85b10481d2f5478053@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0e85b10481d2f5478053 Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-11l2tp: l2tp_eth: use per-cpu counters from dev->tstatsJames Chapman1-22/+10
l2tp_eth uses old-style dev->stats for fastpath packet/byte counters. Convert it to use dev->tstats per-cpu counters. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-11l2tp: improve tunnel/session refcount helpersJames Chapman8-93/+79
l2tp_tunnel_inc_refcount and l2tp_session_inc_refcount wrap refcount_inc. They add no value so just use the refcount APIs directly and drop l2tp's helpers. l2tp already uses refcount_inc_not_zero anyway. Rename l2tp_tunnel_dec_refcount and l2tp_session_dec_refcount to l2tp_tunnel_put and l2tp_session_put to better match their use pairing various _get getters. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-11l2tp: use get_next APIs for management requests and procfs/debugfsJames Chapman5-69/+39
l2tp netlink and procfs/debugfs iterate over tunnel and session lists to obtain data. They currently use very inefficient get_nth functions to do so. Replace these with get_next. For netlink, use nl cb->ctx[] for passing state instead of the obsolete cb->args[]. l2tp_tunnel_get_nth and l2tp_session_get_nth are no longer used so they can be removed. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-11l2tp: add tunnel/session get_next helpersJames Chapman2-0/+129
l2tp management APIs and procfs/debugfs iterate over l2tp tunnel and session lists. Since these lists are now implemented using IDR, we can use IDR get_next APIs to iterate them. Add tunnel/session get_next functions to do so. The session get_next functions get the next session in a given tunnel and need to account for l2tpv2 and l2tpv3 differences: * l2tpv2 sessions are keyed by tunnel ID / session ID. Iteration for a given tunnel ID, TID, can therefore start with a key given by TID/0 and finish when the next entry's tunnel ID is not TID. This is possible only because the tunnel ID part of the key is the upper 16 bits and the session ID part the lower 16 bits; when idr_next increments the key value, it therefore finds the next sessions of the current tunnel before those of the next tunnel. Entries with session ID 0 are always skipped because they are used internally by pppol2tp. * l2tpv3 sessions are keyed by session ID. Iteration starts at the first IDR entry and skips entries where the tunnel does not match. Iteration must also consider session ID collisions and walk the list of colliding sessions (if any) for one which matches the supplied tunnel. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-11l2tp: handle hash key collisions in l2tp_v3_session_getJames Chapman1-1/+2
To handle colliding l2tpv3 session IDs, l2tp_v3_session_get searches a hashed list keyed by ID and sk. Although unlikely, if hash keys collide, it is possible that hash_for_each_possible loops over a session which doesn't have the ID that we are searching for. So check for session ID match when looping over possible hash key matches. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-11l2tp: move l2tp_ip and l2tp_ip6 data to pernetJames Chapman2-50/+168
l2tp_ip[6] have always used global socket tables. It is therefore not possible to create l2tpip sockets in different namespaces with the same socket address. To support this, move l2tpip socket tables to pernet data. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-11l2tp: remove inline from functions in c sourcesJames Chapman4-6/+6
Update l2tp to remove the inline keyword from several functions in C sources, since this is now discouraged. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-08-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-2/+13
Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Link: https://patch.msgid.link/20240808170148.3629934-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-08l2tp: fix lockdep splatJames Chapman1-2/+13
When l2tp tunnels use a socket provided by userspace, we can hit lockdep splats like the below when data is transmitted through another (unrelated) userspace socket which then gets routed over l2tp. This issue was previously discussed here: https://lore.kernel.org/netdev/87sfialu2n.fsf@cloudflare.com/ The solution is to have lockdep treat socket locks of l2tp tunnel sockets separately than those of standard INET sockets. To do so, use a different lockdep subclass where lock nesting is possible. ============================================ WARNING: possible recursive locking detected 6.10.0+ #34 Not tainted -------------------------------------------- iperf3/771 is trying to acquire lock: ffff8881027601d8 (slock-AF_INET/1){+.-.}-{2:2}, at: l2tp_xmit_skb+0x243/0x9d0 but task is already holding lock: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(slock-AF_INET/1); lock(slock-AF_INET/1); *** DEADLOCK *** May be due to missing lock nesting notation 10 locks held by iperf3/771: #0: ffff888102650258 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_sendmsg+0x1a/0x40 #1: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0 #2: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130 #3: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: process_backlog+0x28b/0x9f0 #4: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0xf9/0x260 #5: ffff888102650d98 (slock-AF_INET/1){+.-.}-{2:2}, at: tcp_v4_rcv+0x1848/0x1e10 #6: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: __ip_queue_xmit+0x4b/0xbc0 #7: ffffffff822ac220 (rcu_read_lock){....}-{1:2}, at: ip_finish_output2+0x17a/0x1130 #8: ffffffff822ac1e0 (rcu_read_lock_bh){....}-{1:2}, at: __dev_queue_xmit+0xcc/0x1450 #9: ffff888101f33258 (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock#2){+...}-{2:2}, at: __dev_queue_xmit+0x513/0x1450 stack backtrace: CPU: 2 UID: 0 PID: 771 Comm: iperf3 Not tainted 6.10.0+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <IRQ> dump_stack_lvl+0x69/0xa0 dump_stack+0xc/0x20 __lock_acquire+0x135d/0x2600 ? srso_alias_return_thunk+0x5/0xfbef5 lock_acquire+0xc4/0x2a0 ? l2tp_xmit_skb+0x243/0x9d0 ? __skb_checksum+0xa3/0x540 _raw_spin_lock_nested+0x35/0x50 ? l2tp_xmit_skb+0x243/0x9d0 l2tp_xmit_skb+0x243/0x9d0 l2tp_eth_dev_xmit+0x3c/0xc0 dev_hard_start_xmit+0x11e/0x420 sch_direct_xmit+0xc3/0x640 __dev_queue_xmit+0x61c/0x1450 ? ip_finish_output2+0xf4c/0x1130 ip_finish_output2+0x6b6/0x1130 ? srso_alias_return_thunk+0x5/0xfbef5 ? __ip_finish_output+0x217/0x380 ? srso_alias_return_thunk+0x5/0xfbef5 __ip_finish_output+0x217/0x380 ip_output+0x99/0x120 __ip_queue_xmit+0xae4/0xbc0 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? tcp_options_write.constprop.0+0xcb/0x3e0 ip_queue_xmit+0x34/0x40 __tcp_transmit_skb+0x1625/0x1890 __tcp_send_ack+0x1b8/0x340 tcp_send_ack+0x23/0x30 __tcp_ack_snd_check+0xa8/0x530 ? srso_alias_return_thunk+0x5/0xfbef5 tcp_rcv_established+0x412/0xd70 tcp_v4_do_rcv+0x299/0x420 tcp_v4_rcv+0x1991/0x1e10 ip_protocol_deliver_rcu+0x50/0x220 ip_local_deliver_finish+0x158/0x260 ip_local_deliver+0xc8/0xe0 ip_rcv+0xe5/0x1d0 ? __pfx_ip_rcv+0x10/0x10 __netif_receive_skb_one_core+0xce/0xe0 ? process_backlog+0x28b/0x9f0 __netif_receive_skb+0x34/0xd0 ? process_backlog+0x28b/0x9f0 process_backlog+0x2cb/0x9f0 __napi_poll.constprop.0+0x61/0x280 net_rx_action+0x332/0x670 ? srso_alias_return_thunk+0x5/0xfbef5 ? find_held_lock+0x2b/0x80 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 handle_softirqs+0xda/0x480 ? __dev_queue_xmit+0xa2c/0x1450 do_softirq+0xa1/0xd0 </IRQ> <TASK> __local_bh_enable_ip+0xc8/0xe0 ? __dev_queue_xmit+0xa2c/0x1450 __dev_queue_xmit+0xa48/0x1450 ? ip_finish_output2+0xf4c/0x1130 ip_finish_output2+0x6b6/0x1130 ? srso_alias_return_thunk+0x5/0xfbef5 ? __ip_finish_output+0x217/0x380 ? srso_alias_return_thunk+0x5/0xfbef5 __ip_finish_output+0x217/0x380 ip_output+0x99/0x120 __ip_queue_xmit+0xae4/0xbc0 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? tcp_options_write.constprop.0+0xcb/0x3e0 ip_queue_xmit+0x34/0x40 __tcp_transmit_skb+0x1625/0x1890 tcp_write_xmit+0x766/0x2fb0 ? __entry_text_end+0x102ba9/0x102bad ? srso_alias_return_thunk+0x5/0xfbef5 ? __might_fault+0x74/0xc0 ? srso_alias_return_thunk+0x5/0xfbef5 __tcp_push_pending_frames+0x56/0x190 tcp_push+0x117/0x310 tcp_sendmsg_locked+0x14c1/0x1740 tcp_sendmsg+0x28/0x40 inet_sendmsg+0x5d/0x90 sock_write_iter+0x242/0x2b0 vfs_write+0x68d/0x800 ? __pfx_sock_write_iter+0x10/0x10 ksys_write+0xc8/0xf0 __x64_sys_write+0x3d/0x50 x64_sys_call+0xfaf/0x1f50 do_syscall_64+0x6d/0x140 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f4d143af992 Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 01 cc ff ff 41 54 b8 02 00 00 0 RSP: 002b:00007ffd65032058 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f4d143af992 RDX: 0000000000000025 RSI: 00007f4d143f3bcc RDI: 0000000000000005 RBP: 00007f4d143f2b28 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4d143f3bcc R13: 0000000000000005 R14: 0000000000000000 R15: 00007ffd650323f0 </TASK> Fixes: 0b2c59720e65 ("l2tp: close all race conditions in l2tp_tunnel_register()") Suggested-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+6acef9e0a4d1f46c83d4@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6acef9e0a4d1f46c83d4 CC: gnault@redhat.com CC: cong.wang@bytedance.com Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Link: https://patch.msgid.link/20240806160626.1248317-1-jchapman@katalix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-08-03l2tp: Don't assign net->gen->ptr[] for pppol2tp_net_ops.Kuniyuki Iwashima1-3/+0
Commit fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") converted net->gen->ptr[pppol2tp_net_id] in l2tp_ppp.c to net->gen->ptr[l2tp_net_id] in l2tp_core.c. Now the leftover wastes one entry of net->gen->ptr[] in each netns. Let's avoid the unwanted allocation. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-31l2tp: use pre_exit pernet hook to avoid rcu_barrierJames Chapman1-2/+7
Move the work of closing all tunnels from the pernet exit hook to pre_exit since the core does rcu synchronisation between these steps and we can therefore remove rcu_barrier from l2tp code. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-31l2tp: cleanup eth/ppp pseudowire setup codeJames Chapman2-4/+6
l2tp eth/ppp pseudowire setup/cleanup uses kfree() in some error paths. Drop the refcount instead such that the session object is always freed when the refcount reaches 0. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-31l2tp: add idr consistency check in session_registerJames Chapman1-2/+9
l2tp_session_register uses an idr_alloc then idr_replace pattern to insert sessions into the session IDR. To catch invalid locking, add a WARN_ON_ONCE if the IDR entry is modified by another thread between alloc and replace steps. Also add comments to make expectations clear. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-31l2tp: use rcu list add/del when updating listsJames Chapman1-6/+6
l2tp_v3_session_htable and tunnel->session_list are read by lockless getters using RCU. Use rcu list variants when adding or removing list items. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-31l2tp: prevent possible tunnel refcount underflowJames Chapman4-10/+24
When a session is created, it sets a backpointer to its tunnel. When the session refcount drops to 0, l2tp_session_free drops the tunnel refcount if session->tunnel is non-NULL. However, session->tunnel is set in l2tp_session_create, before the tunnel refcount is incremented by l2tp_session_register, which leaves a small window where session->tunnel is non-NULL when the tunnel refcount hasn't been bumped. Moving the assignment to l2tp_session_register is trivial but l2tp_session_create calls l2tp_session_set_header_len which uses session->tunnel to get the tunnel's encap. Add an encap arg to l2tp_session_set_header_len to avoid using session->tunnel. If l2tpv3 sessions have colliding IDs, it is possible for l2tp_v3_session_get to race with l2tp_session_register and fetch a session which doesn't yet have session->tunnel set. Add a check for this case. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-31l2tp: refactor ppp socket/session relationshipJames Chapman1-55/+39
Each l2tp ppp session has an associated pppox socket. l2tp_ppp uses the session's pppox socket refcount to manage session lifetimes; the pppox socket holds a ref on the session which is dropped by the socket destructor. This complicates session cleanup. Given l2tp sessions are refcounted, it makes more sense to reverse this relationship such that the session keeps the socket alive, not the other way around. So refactor l2tp_ppp to have the session hold a ref on its socket while it references it. When the session is closed, it drops its socket ref when it detaches from its socket. If the socket is closed first, it initiates the closing of its session, if one is attached. The socket/session can then be freed asynchronously when their refcounts drop to 0. Use the session's session_close callback to detach the pppox socket since this will be done on the work queue together with the rest of the session cleanup via l2tp_session_delete. Also, since l2tp_ppp uses the pppox socket's sk_user_data, use the rcu sk_user_data access helpers when accessing it and set the socket's SOCK_RCU_FREE flag to have pppox sockets freed by rcu. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-31l2tp: free sessions using rcuJames Chapman2-3/+2
l2tp sessions may be accessed under an rcu read lock. Have them freed via rcu and remove the now unneeded synchronize_rcu when a session is removed. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-31l2tp: delete sessions using work queueJames Chapman2-16/+21
When a tunnel is closed, l2tp_tunnel_closeall closes all sessions in the tunnel. Move the work of deleting each session to the work queue so that sessions are deleted using the same codepath whether they are closed by user API request or their parent tunnel is closing. This also avoids the locking dance in l2tp_tunnel_closeall where the tunnel's session list lock was unlocked and relocked in the loop. In l2tp_exit_net, use drain_workqueue instead of flush_workqueue because the processing of tunnel_delete work may queue session_delete work items which must also be processed. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-31l2tp: simplify tunnel and socket cleanupJames Chapman2-62/+21
When the l2tp tunnel socket used sk_user_data to point to its associated l2tp tunnel, socket and tunnel cleanup had to make use of the socket's destructor to free the tunnel only when the socket could no longer be accessed. Now that sk_user_data is no longer used, we can simplify socket and tunnel cleanup: * If the tunnel closes first, it cleans up and drops its socket ref when the tunnel refcount drops to zero. If its socket was provided by userspace, the socket is closed and freed asynchronously, when userspace closes it. If its socket is a kernel socket, the tunnel closes the socket itself during cleanup and drops its socket ref when the tunnel's refcount drops to zero. * If the socket closes first, we initiate the closing of its associated tunnel. For UDP sockets, this is via the socket's encap_destroy hook. For L2TPIP sockets, this is via the socket's destroy callback. The tunnel holds a socket ref while it references the sock. When the tunnel is freed, it drops its socket ref and the socket will be cleaned up when its own refcount drops to zero, asynchronous to the tunnel free. * The tunnel socket destructor is no longer needed since the tunnel is no longer freed through the socket destructor. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-31l2tp: remove unused tunnel magic fieldJames Chapman2-4/+0
Since l2tp no longer derives tunnel pointers directly via sk_user_data, it is no longer useful for l2tp to check tunnel pointers using a magic feather. Drop the tunnel's magic field. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-31l2tp: don't set sk_user_data in tunnel socketJames Chapman1-4/+6
l2tp no longer uses the tunnel socket's sk_user_data so drop the code which sets it. In l2tp_validate_socket use l2tp_sk_to_tunnel to check whether a given socket is already attached to an l2tp tunnel since we can no longer use non-null sk_user_data to indicate this. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-31l2tp: don't use tunnel socket sk_user_data in ppp procfs outputJames Chapman1-1/+1
l2tp's ppp procfs output can be used to show internal state of pppol2tp. It includes a 'user-data-ok' field, which is derived from the tunnel socket's sk_user_data being non-NULL. Use tunnel->sock being non-NULL to indicate this instead. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-31l2tp: have l2tp_ip_destroy_sock use ip_flush_pending_framesJames Chapman1-3/+3
Use the recently exported ip_flush_pending_frames instead of a free-coded version and lock the socket while we call it. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-31l2tp: lookup tunnel from socket without using sk_user_dataJames Chapman4-17/+54
l2tp_sk_to_tunnel derives the tunnel from sk_user_data. Instead, lookup the tunnel by walking the tunnel IDR for a tunnel using the indicated sock. This is slow but l2tp_sk_to_tunnel is not used in the datapath so performance isn't critical. l2tp_tunnel_destruct needs a variant of l2tp_sk_to_tunnel which does not bump the tunnel refcount since the tunnel refcount is already 0. Change l2tp_sk_to_tunnel sk arg to const since it does not modify sk. Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-23l2tp: make session IDR and tunnel session list coherentJames Chapman1-18/+14
Modify l2tp_session_register and l2tp_session_unhash so that the session IDR and tunnel session lists remain coherent. To do so, hold the session IDR lock and the tunnel's session list lock when making any changes to either list. Without this change, a rare race condition could hit the WARN_ON_ONCE in l2tp_session_unhash if a thread replaced the IDR entry while another thread was registering the same ID. [ 7126.151795][T17511] WARNING: CPU: 3 PID: 17511 at net/l2tp/l2tp_core.c:1282 l2tp_session_delete.part.0+0x87e/0xbc0 [ 7126.163754][T17511] ? show_regs+0x93/0xa0 [ 7126.164157][T17511] ? __warn+0xe5/0x3c0 [ 7126.164536][T17511] ? l2tp_session_delete.part.0+0x87e/0xbc0 [ 7126.165070][T17511] ? report_bug+0x2e1/0x500 [ 7126.165486][T17511] ? l2tp_session_delete.part.0+0x87e/0xbc0 [ 7126.166013][T17511] ? handle_bug+0x99/0x130 [ 7126.166428][T17511] ? exc_invalid_op+0x35/0x80 [ 7126.166890][T17511] ? asm_exc_invalid_op+0x1a/0x20 [ 7126.167372][T17511] ? l2tp_session_delete.part.0+0x87d/0xbc0 [ 7126.167900][T17511] ? l2tp_session_delete.part.0+0x87e/0xbc0 [ 7126.168429][T17511] ? __local_bh_enable_ip+0xa4/0x120 [ 7126.168917][T17511] l2tp_session_delete+0x40/0x50 [ 7126.169369][T17511] pppol2tp_release+0x1a1/0x3f0 [ 7126.169817][T17511] __sock_release+0xb3/0x270 [ 7126.170247][T17511] ? __pfx_sock_close+0x10/0x10 [ 7126.170697][T17511] sock_close+0x1c/0x30 [ 7126.171087][T17511] __fput+0x40b/0xb90 [ 7126.171470][T17511] task_work_run+0x16c/0x260 [ 7126.171897][T17511] ? __pfx_task_work_run+0x10/0x10 [ 7126.172362][T17511] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7126.172863][T17511] ? do_raw_spin_unlock+0x174/0x230 [ 7126.173348][T17511] do_exit+0xaae/0x2b40 [ 7126.173730][T17511] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7126.174235][T17511] ? __pfx_lock_release+0x10/0x10 [ 7126.174690][T17511] ? srso_alias_return_thunk+0x5/0xfbef5 [ 7126.175190][T17511] ? do_raw_spin_lock+0x12c/0x2b0 [ 7126.175650][T17511] ? __pfx_do_exit+0x10/0x10 [ 7126.176072][T17511] ? _raw_spin_unlock_irq+0x23/0x50 [ 7126.176543][T17511] do_group_exit+0xd3/0x2a0 [ 7126.176990][T17511] __x64_sys_exit_group+0x3e/0x50 [ 7126.177456][T17511] x64_sys_call+0x1821/0x1830 [ 7126.177895][T17511] do_syscall_64+0xcb/0x250 [ 7126.178317][T17511] entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: aa5e17e1f5ec ("l2tp: store l2tpv3 sessions in per-net IDR") Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240718134348.289865-1-jchapman@katalix.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-12l2tp: fix l2tp_session_register with colliding l2tpv3 IDsJames Chapman1-8/+10
When handling colliding L2TPv3 session IDs, we use the existing session IDR entry and link the new session on that using session->coll_list. However, when using an existing IDR entry, we must not do the idr_replace step. Fixes: aa5e17e1f5ec ("l2tp: store l2tpv3 sessions in per-net IDR") Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-07-09l2tp: fix possible UAF when cleaning up tunnelsJames Chapman1-4/+7
syzbot reported a UAF caused by a race when the L2TP work queue closes a tunnel at the same time as a userspace thread closes a session in that tunnel. Tunnel cleanup is handled by a work queue which iterates through the sessions contained within a tunnel, and closes them in turn. Meanwhile, a userspace thread may arbitrarily close a session via either netlink command or by closing the pppox socket in the case of l2tp_ppp. The race condition may occur when l2tp_tunnel_closeall walks the list of sessions in the tunnel and deletes each one. Currently this is implemented using list_for_each_safe, but because the list spinlock is dropped in the loop body it's possible for other threads to manipulate the list during list_for_each_safe's list walk. This can lead to the list iterator being corrupted, leading to list_for_each_safe spinning. One sequence of events which may lead to this is as follows: * A tunnel is created, containing two sessions A and B. * A thread closes the tunnel, triggering tunnel cleanup via the work queue. * l2tp_tunnel_closeall runs in the context of the work queue. It removes session A from the tunnel session list, then drops the list lock. At this point the list_for_each_safe temporary variable is pointing to the other session on the list, which is session B, and the list can be manipulated by other threads since the list lock has been released. * Userspace closes session B, which removes the session from its parent tunnel via l2tp_session_delete. Since l2tp_tunnel_closeall has released the tunnel list lock, l2tp_session_delete is able to call list_del_init on the session B list node. * Back on the work queue, l2tp_tunnel_closeall resumes execution and will now spin forever on the same list entry until the underlying session structure is freed, at which point UAF occurs. The solution is to iterate over the tunnel's session list using list_first_entry_not_null to avoid the possibility of the list iterator pointing at a list item which may be removed during the walk. Also, have l2tp_tunnel_closeall ref each session while it processes it to prevent another thread from freeing it. cpu1 cpu2 --- --- pppol2tp_release() spin_lock_bh(&tunnel->list_lock); for (;;) { session = list_first_entry_or_null(&tunnel->session_list, struct l2tp_session, list); if (!session) break; list_del_init(&session->list); spin_unlock_bh(&tunnel->list_lock); l2tp_session_delete(session); l2tp_session_delete(session); spin_lock_bh(&tunnel->list_lock); } spin_unlock_bh(&tunnel->list_lock); Calling l2tp_session_delete on the same session twice isn't a problem per-se, but if cpu2 manages to destruct the socket and unref the session to zero before cpu1 progresses then it would lead to UAF. Reported-by: syzbot+b471b7c936301a59745b@syzkaller.appspotmail.com Reported-by: syzbot+c041b4ce3a6dfd1e63e2@syzkaller.appspotmail.com Fixes: d18d3f0a24fc ("l2tp: replace hlist with simple list for per-tunnel session list") Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: Tom Parkin <tparkin@katalix.com> Link: https://patch.msgid.link/20240704152508.1923908-1-jchapman@katalix.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-07-04l2tp: Remove duplicate included header file trace.hThorsten Blum1-1/+0
Remove duplicate included header file trace.h and the following warning reported by make includecheck: trace.h is included more than once Compile-tested only. Signed-off-by: Thorsten Blum <thorsten.blum@toblux.com> Reviewed-by: Michal Kubiak <michal.kubiak@intel.com> Link: https://patch.msgid.link/20240703061147.691973-2-thorsten.blum@toblux.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-06-25l2tp: remove incorrect __rcu attributeJames Chapman1-1/+1
This fixes a sparse warning. Fixes: d18d3f0a24fc ("l2tp: replace hlist with simple list for per-tunnel session list") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202406220754.evK8Hrjw-lkp@intel.com/ Signed-off-by: James Chapman <jchapman@katalix.com> Link: https://patch.msgid.link/20240624082945.1925009-1-jchapman@katalix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-21l2tp: replace hlist with simple list for per-tunnel session listJames Chapman3-91/+50
The per-tunnel session list is no longer used by the datapath. However, we still need a list of sessions in the tunnel for l2tp_session_get_nth, which is used by management code. (An alternative might be to walk each session IDR list, matching only sessions of a given tunnel.) Replace the per-tunnel hlist with a per-tunnel list. In functions which walk a list of sessions of a tunnel, walk this list instead. Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21l2tp: drop the now unused l2tp_tunnel_get_sessionJames Chapman2-24/+0
All users of l2tp_tunnel_get_session are now gone so it can be removed. Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21l2tp: use IDR for all session lookupsJames Chapman4-4/+20
Add generic session getter which uses IDR. Replace all users of l2tp_tunnel_get_session which uses the per-tunnel session list to use the generic getter. Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21l2tp: don't use sk_user_data in l2tp_udp_encap_err_recvJames Chapman1-6/+0
If UDP sockets are aliased, sk might be the wrong socket. There's no benefit to using sk_user_data to do some checks on the associated tunnel context. Just report the error anyway, like udp core does. Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21l2tp: refactor udp recv to lookup to not use sk_user_dataJames Chapman1-75/+21
Modify UDP decap to not use the tunnel pointer which comes from the sock's sk_user_data when parsing the L2TP header. By looking up the destination session using only the packet contents we avoid potential UDP 5-tuple aliasing issues which arise from depending on the socket that received the packet. Drop the useless error messages on short packet or on failing to find a session since the tunnel pointer might point to a different tunnel if multiple sockets use the same 5-tuple. Short packets (those not big enough to contain an L2TP header) are no longer counted in the tunnel's invalid counter because we can't derive the tunnel until we parse the l2tp header to lookup the session. l2tp_udp_encap_recv was a small wrapper around l2tp_udp_recv_core which used sk_user_data to derive a tunnel pointer in an RCU-safe way. But we no longer need the tunnel pointer, so remove that code and combine the two functions. Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21l2tp: store l2tpv2 sessions in per-net IDRJames Chapman2-15/+56
L2TPv2 sessions are currently kept in a per-tunnel hashlist, keyed by 16-bit session_id. When handling received L2TPv2 packets, we need to first derive the tunnel using the 16-bit tunnel_id or sock, then lookup the session in a per-tunnel hlist using the 16-bit session_id. We want to avoid using sk_user_data in the datapath and double lookups on every packet. So instead, use a per-net IDR to hold L2TPv2 sessions, keyed by a 32-bit value derived from the 16-bit tunnel_id and session_id. This will allow the L2TPv2 UDP receive datapath to lookup a session with a single lookup without deriving the tunnel first. L2TPv2 sessions are held in their own IDR to avoid potential key collisions with L2TPv3 sessions. Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21l2tp: store l2tpv3 sessions in per-net IDRJames Chapman4-74/+188
L2TPv3 sessions are currently held in one of two fixed-size hash lists: either a per-net hashlist (IP-encap), or a per-tunnel hashlist (UDP-encap), keyed by the L2TPv3 32-bit session_id. In order to lookup L2TPv3 sessions in UDP-encap tunnels efficiently without finding the tunnel first via sk_user_data, UDP sessions are now kept in a per-net session list, keyed by session ID. Convert the existing per-net hashlist to use an IDR for better performance when there are many sessions and have L2TPv3 UDP sessions use the same IDR. Although the L2TPv3 RFC states that the session ID alone identifies the session, our implementation has allowed the same session ID to be used in different L2TP UDP tunnels. To retain support for this, a new per-net session hashtable is used, keyed by the sock and session ID. If on creating a new session, a session already exists with that ID in the IDR, the colliding sessions are added to the new hashtable and the existing IDR entry is flagged. When looking up sessions, the approach is to first check the IDR and if no unflagged match is found, check the new hashtable. The sock is made available to session getters where session ID collisions are to be considered. In this way, the new hashtable is used only for session ID collisions so can be kept small. For managing session removal, we need a list of colliding sessions matching a given ID in order to update or remove the IDR entry of the ID. This is necessary to detect session ID collisions when future sessions are created. The list head is allocated on first collision of a given ID and refcounted. Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21l2tp: remove unused list_head member in l2tp_tunnelJames Chapman2-3/+0
Remove an unused variable in struct l2tp_tunnel which was left behind by commit c4d48a58f32c5 ("l2tp: convert l2tp_tunnel_list to idr"). Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-17l2tp: fix ICMP error handling for UDP-encap socketsTom Parkin1-11/+33
Since commit a36e185e8c85 ("udp: Handle ICMP errors for tunnels with same destination port on both endpoints") UDP's handling of ICMP errors has allowed for UDP-encap tunnels to determine socket associations in scenarios where the UDP hash lookup could not. Subsequently, commit d26796ae58940 ("udp: check udp sock encap_type in __udp_lib_err") subtly tweaked the approach such that UDP ICMP error handling would be skipped for any UDP socket which has encapsulation enabled. In the case of L2TP tunnel sockets using UDP-encap, this latter modification effectively broke ICMP error reporting for the L2TP control plane. To a degree this isn't catastrophic inasmuch as the L2TP control protocol defines a reliable transport on top of the underlying packet switching network which will eventually detect errors and time out. However, paying attention to the ICMP error reporting allows for more timely detection of errors in L2TP userspace, and aids in debugging connectivity issues. Reinstate ICMP error handling for UDP encap L2TP tunnels: * implement struct udp_tunnel_sock_cfg .encap_err_rcv in order to allow the L2TP code to handle ICMP errors; * only implement error-handling for tunnels which have a managed socket: unmanaged tunnels using a kernel socket have no userspace to report errors back to; * flag the error on the socket, which allows for userspace to get an error such as -ECONNREFUSED back from sendmsg/recvmsg; * pass the error into ip[v6]_icmp_error() which allows for userspace to get extended error information via. MSG_ERRQUEUE. Fixes: d26796ae5894 ("udp: check udp sock encap_type in __udp_lib_err") Signed-off-by: Tom Parkin <tparkin@katalix.com> Link: https://lore.kernel.org/r/20240513172248.623261-1-tparkin@katalix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-13l2tp: Support different protocol versions with same IP/port quadrupleSamuel Thibault1-8/+10
628bc3e5a1be ("l2tp: Support several sockets with same IP/port quadruple") added support for several L2TPv2 tunnels using the same IP/port quadruple, but if an L2TPv3 socket exists it could eat all the trafic. We thus have to first use the version from the packet to get the proper tunnel, and only then check that the version matches. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Reviewed-by: James Chapman <jchapman@katalix.com> Link: https://lore.kernel.org/r/20240509205812.4063198-1-samuel.thibault@ens-lyon.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-09l2tp: Support several sockets with same IP/port quadrupleSamuel Thibault1-0/+21
Some l2tp providers will use 1701 as origin port and open several tunnels for the same origin and target. On the Linux side, this may mean opening several sockets, but then trafic will go to only one of them, losing the trafic for the tunnel of the other socket (or leaving it up to userland, consuming a lot of cpu%). This can also happen when the l2tp provider uses a cluster, and load-balancing happens to migrate from one origin IP to another one, for which a socket was already established. Managing reassigning tunnels from one socket to another would be very hairy for userland. Lastly, as documented in l2tpconfig(1), as client it may be necessary to use 1701 as origin port for odd firewalls reasons, which could prevent from establishing several tunnels to a l2tp server, for the same reason: trafic would get only on one of the two sockets. With the V2 protocol it is however easy to route trafic to the proper tunnel, by looking up the tunnel number in the network namespace. This fixes the three cases altogether. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Link: https://lore.kernel.org/r/20240506215336.1470009-1-samuel.thibault@ens-lyon.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+3
Cross-merge networking fixes after downstream PR. Conflicts: include/linux/filter.h kernel/bpf/core.c 66e13b615a0c ("bpf: verifier: prevent userspace memory access") d503a04f8bc0 ("bpf: Add support for certain atomics in bpf_arena to x86 JIT") https://lore.kernel.org/all/20240429114939.210328b0@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-30inet: introduce dst_rtable() helperEric Dumazet1-1/+1
I added dst_rt6_info() in commit e8dfd42c17fa ("ipv6: introduce dst_rt6_info() helper") This patch does a similar change for IPv4. Instead of (struct rtable *)dst casts, we can use : #define dst_rtable(_ptr) \ container_of_const(_ptr, struct rtable, dst) Patch is smaller than IPv6 one, because IPv4 has skb_rtable() helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/20240429133009.1227754-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-29ipv6: introduce dst_rt6_info() helperEric Dumazet1-1/+1
Instead of (struct rt6_info *)dst casts, we can use : #define dst_rt6_info(_ptr) \ container_of_const(_ptr, struct rt6_info, dst) Some places needed missing const qualifiers : ip6_confirm_neigh(), ipv6_anycast_destination(), ipv6_unicast_destination(), has_gateway() v2: added missing parts (David Ahern) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-26net l2tp: drop flow hash on forwardDavid Bauer1-0/+3
Drop the flow-hash of the skb when forwarding to the L2TP netdev. This avoids the L2TP qdisc from using the flow-hash from the outer packet, which is identical for every flow within the tunnel. This does not affect every platform but is specific for the ethernet driver. It depends on the platform including L4 information in the flow-hash. One such example is the Mediatek Filogic MT798x family of networking processors. Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support") Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David Bauer <mail@david-bauer.net> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240424171110.13701-1-mail@david-bauer.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-11l2tp: fix incorrect parameter validation in the pppol2tp_getsockopt() functionGavrilov Ilia1-2/+2
The 'len' variable can't be negative when assigned the result of 'min_t' because all 'min_t' parameters are cast to unsigned int, and then the minimum one is chosen. To fix the logic, check 'len' as read from 'optlen', where the types of relevant variables are (signed) int. Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core") Reviewed-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: Gavrilov Ilia <Ilia.Gavrilov@infotecs.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/udp.c f796feabb9f5 ("udp: add local "peek offset enabled" flag") 56667da7399e ("net: implement lockless setsockopt(SO_PEEK_OFF)") Adjacent changes: net/unix/garbage.c aa82ac51d633 ("af_unix: Drop oob_skb ref before purging queue in GC.") 11498715f266 ("af_unix: Remove io_uring code for GC.") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22l2tp: pass correct message length to ip6_append_dataTom Parkin1-1/+1
l2tp_ip6_sendmsg needs to avoid accounting for the transport header twice when splicing more data into an already partially-occupied skbuff. To manage this, we check whether the skbuff contains data using skb_queue_empty when deciding how much data to append using ip6_append_data. However, the code which performed the calculation was incorrect: ulen = len + skb_queue_empty(&sk->sk_write_queue) ? transhdrlen : 0; ...due to C operator precedence, this ends up setting ulen to transhdrlen for messages with a non-zero length, which results in corrupted packets on the wire. Add parentheses to correct the calculation in line with the original intent. Fixes: 9d4c75800f61 ("ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()") Cc: David Howells <dhowells@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Tom Parkin <tparkin@katalix.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240220122156.43131-1-tparkin@katalix.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-21net: l2tp: constify the struct device_type usageRicardo B. Marliere1-1/+1
Since commit aed65af1cc2f ("drivers: make device_type const"), the driver core can properly handle constant struct device_type. Move the l2tpeth_type variable to be a constant structure as well, placing it into read-only memory which can not be modified at runtime. Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-12ipv4: Set the routing scope properly in ip_route_output_ports().Guillaume Nault1-1/+1
Set scope automatically in ip_route_output_ports() (using the socket SOCK_LOCALROUTE flag). This way, callers don't have to overload the tos with the RTO_ONLINK flag, like RT_CONN_FLAGS() does. For callers that don't pass a struct sock, this doesn't change anything as the scope is still set to RT_SCOPE_UNIVERSE when sk is NULL. Callers that passed a struct sock and used RT_CONN_FLAGS(sk) or RT_CONN_FLAGS_TOS(sk, tos) for the tos are modified to use ip_sock_tos(sk) and RT_TOS(tos) respectively, as overloading tos with the RTO_ONLINK flag now becomes unnecessary. In drivers/net/amt.c, all ip_route_output_ports() calls use a 0 tos parameter, ignoring the SOCK_LOCALROUTE flag of the socket. But the sk parameter is a kernel socket, which doesn't have any configuration path for setting SOCK_LOCALROUTE anyway. Therefore, ip_route_output_ports() will continue to initialise scope with RT_SCOPE_UNIVERSE and amt.c doesn't need to be modified. Also, remove RT_CONN_FLAGS() and RT_CONN_FLAGS_TOS() from route.h as these macros are now unused. The objective is to eventually remove RTO_ONLINK entirely to allow converting ->flowi4_tos to dscp_t. This will ensure proper isolation between the DSCP and ECN bits, thus minimising the risk of introducing bugs where TOS values interfere with ECN. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/dacfd2ab40685e20959ab7b53c427595ba229e7d.1707496938.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-11ipv6: annotate data-races around np->ucast_oifEric Dumazet1-1/+1
np->ucast_oif is read locklessly in some contexts. Make all accesses to this field lockless, adding appropriate annotations. This also makes setsockopt( IPV6_UNICAST_IF ) lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-11ipv6: annotate data-races around np->mcast_oifEric Dumazet1-1/+1
np->mcast_oif is read locklessly in some contexts. Make all accesses to this field lockless, adding appropriate annotations. This also makes setsockopt( IPV6_MULTICAST_IF ) lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
Cross-merge networking fixes after downstream PR. No conflicts (or adjacent changes of note). Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-01ipv4, ipv6: Fix handling of transhdrlen in __ip{,6}_append_data()David Howells1-1/+1
Including the transhdrlen in length is a problem when the packet is partially filled (e.g. something like send(MSG_MORE) happened previously) when appending to an IPv4 or IPv6 packet as we don't want to repeat the transport header or account for it twice. This can happen under some circumstances, such as splicing into an L2TP socket. The symptom observed is a warning in __ip6_append_data(): WARNING: CPU: 1 PID: 5042 at net/ipv6/ip6_output.c:1800 __ip6_append_data.isra.0+0x1be8/0x47f0 net/ipv6/ip6_output.c:1800 that occurs when MSG_SPLICE_PAGES is used to append more data to an already partially occupied skbuff. The warning occurs when 'copy' is larger than the amount of data in the message iterator. This is because the requested length includes the transport header length when it shouldn't. This can be triggered by, for example: sfd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_L2TP); bind(sfd, ...); // ::1 connect(sfd, ...); // ::1 port 7 send(sfd, buffer, 4100, MSG_MORE); sendfile(sfd, dfd, NULL, 1024); Fix this by only adding transhdrlen into the length if the write queue is empty in l2tp_ip6_sendmsg(), analogously to how UDP does things. l2tp_ip_sendmsg() looks like it won't suffer from this problem as it builds the UDP packet itself. Fixes: a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6") Reported-by: syzbot+62cbf263225ae13ff153@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/0000000000001c12b30605378ce8@google.com/ Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Eric Dumazet <edumazet@google.com> cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> cc: "David S. Miller" <davem@davemloft.net> cc: David Ahern <dsahern@kernel.org> cc: Paolo Abeni <pabeni@redhat.com> cc: Jakub Kicinski <kuba@kernel.org> cc: netdev@vger.kernel.org cc: bpf@vger.kernel.org cc: syzkaller-bugs@googlegroups.com Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-10-01net: l2tp_eth: use generic dev->stats fieldsEric Dumazet1-22/+12
Core networking has opt-in atomic variant of dev->stats, simply use DEV_STATS_INC(), DEV_STATS_ADD() and DEV_STATS_READ(). v2: removed @priv local var in l2tp_eth_dev_recv() (Simon) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15ipv6: lockless IPV6_FLOWINFO_SEND implementationEric Dumazet1-2/+2
np->sndflow reads are racy. Use one bit ftom atomic inet->inet_flags instead, IPV6_FLOWINFO_SEND setsockopt() can be lockless. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-15ipv6: lockless IPV6_DONTFRAG implementationEric Dumazet1-1/+1
Move np->dontfrag flag to inet->inet_flags to fix data-races. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-09-14udp: annotate data-races around udp->encap_typeEric Dumazet1-3/+3
syzbot/KCSAN complained about UDP_ENCAP_L2TPINUDP setsockopt() racing. Add READ_ONCE()/WRITE_ONCE() to document races on this lockless field. syzbot report was: BUG: KCSAN: data-race in udp_lib_setsockopt / udp_lib_setsockopt read-write to 0xffff8881083603fa of 1 bytes by task 16557 on cpu 0: udp_lib_setsockopt+0x682/0x6c0 udp_setsockopt+0x73/0xa0 net/ipv4/udp.c:2779 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read-write to 0xffff8881083603fa of 1 bytes by task 16554 on cpu 1: udp_lib_setsockopt+0x682/0x6c0 udp_setsockopt+0x73/0xa0 net/ipv4/udp.c:2779 sock_common_setsockopt+0x61/0x70 net/core/sock.c:3697 __sys_setsockopt+0x1c9/0x230 net/socket.c:2263 __do_sys_setsockopt net/socket.c:2274 [inline] __se_sys_setsockopt net/socket.c:2271 [inline] __x64_sys_setsockopt+0x66/0x80 net/socket.c:2271 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x01 -> 0x05 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 16554 Comm: syz-executor.5 Not tainted 6.5.0-rc7-syzkaller-00004-gf7757129e3de #0 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-08-16inet: introduce inet->inet_flagsEric Dumazet1-1/+1
Various inet fields are currently racy. do_ip_setsockopt() and do_ip_getsockopt() are mostly holding the socket lock, but some (fast) paths do not. Use a new inet->inet_flags to hold atomic bits in the series. Remove inet->cmsg_flags, and use instead 9 bits from inet_flags. Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
Cross-merge networking fixes after downstream PR. Conflicts: net/dsa/port.c 9945c1fb03a3 ("net: dsa: fix older DSA drivers using phylink") a88dd7538461 ("net: dsa: remove legacy_pre_march2020 detection") https://lore.kernel.org/all/20230731102254.2c9868ca@canb.auug.org.au/ net/xdp/xsk.c 3c5b4d69c358 ("net: annotate data-races around sk->sk_mark") b7f72a30e9ac ("xsk: introduce wrappers and helpers for supporting multi-buffer in Tx path") https://lore.kernel.org/all/20230731102631.39988412@canb.auug.org.au/ drivers/net/ethernet/broadcom/bnxt/bnxt.c 37b61cda9c16 ("bnxt: don't handle XDP in netpoll") 2b56b3d99241 ("eth: bnxt: handle invalid Tx completions more gracefully") https://lore.kernel.org/all/20230801101708.1dc7faac@canb.auug.org.au/ Adjacent changes: drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec_fs.c 62da08331f1a ("net/mlx5e: Set proper IPsec source port in L4 selector") fbd517549c32 ("net/mlx5e: Add function to get IPsec offload namespace") drivers/net/ethernet/sfc/selftest.c 55c1528f9b97 ("sfc: fix field-spanning memcpy in selftest") ae9d445cd41f ("sfc: Miscellaneous comment removals") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-07-29net: annotate data-races around sk->sk_markEric Dumazet1-1/+1
sk->sk_mark is often read while another thread could change the value. Fixes: 4a19ec5800fc ("[NET]: Introducing socket mark socket option.") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-07-24ipv6: remove hard coded limitation on ipv6_pinfoEric Dumazet1-3/+1
IPv6 inet sockets are supposed to have a "struct ipv6_pinfo" field at the end of their definition, so that inet6_sk_generic() can derive from socket size the offset of the "struct ipv6_pinfo". This is very fragile, and prevents adding bigger alignment in sockets, because inet6_sk_generic() does not work if the compiler adds padding after the ipv6_pinfo component. We are currently working on a patch series to reorganize TCP structures for better data locality and found issues similar to the one fixed in commit f5d547676ca0 ("tcp: fix tcp_inet6_sk() for 32bit kernels") Alternative would be to force an alignment on "struct ipv6_pinfo", greater or equal to __alignof__(any ipv6 sock) to ensure there is no padding. This does not look great. v2: fix typo in mptcp_proto_v6_init() (Paolo) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Chao Wu <wwchao@google.com> Cc: Wei Wang <weiwan@google.com> Cc: Coco Li <lixiaoyan@google.com> Cc: YiFei Zhu <zhuyifei@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-06-24sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES)David Howells2-2/+0
Remove ->sendpage() and ->sendpage_locked(). sendmsg() with MSG_SPLICE_PAGES should be used instead. This allows multiple pages and multipage folios to be passed through. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can cc: Jens Axboe <axboe@kernel.dk> cc: Matthew Wilcox <willy@infradead.org> cc: linux-afs@lists.infradead.org cc: mptcp@lists.linux.dev cc: rds-devel@oss.oracle.com cc: tipc-discussion@lists.sourceforge.net cc: virtualization@lists.linux-foundation.org Link: https://lore.kernel.org/r/20230623225513.2732256-16-dhowells@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-06-15net: ioctl: Use kernel memory on protocol ioctl callbacksBreno Leitao2-6/+5
Most of the ioctls to net protocols operates directly on userspace argument (arg). Usually doing get_user()/put_user() directly in the ioctl callback. This is not flexible, because it is hard to reuse these functions without passing userspace buffers. Change the "struct proto" ioctls to avoid touching userspace memory and operate on kernel buffers, i.e., all protocol's ioctl callbacks is adapted to operate on a kernel memory other than on userspace (so, no more {put,get}_user() and friends being called in the ioctl callback). This changes the "struct proto" ioctl format in the following way: int (*ioctl)(struct sock *sk, int cmd, - unsigned long arg); + int *karg); (Important to say that this patch does not touch the "struct proto_ops" protocols) So, the "karg" argument, which is passed to the ioctl callback, is a pointer allocated to kernel space memory (inside a function wrapper). This buffer (karg) may contain input argument (copied from userspace in a prep function) and it might return a value/buffer, which is copied back to userspace if necessary. There is not one-size-fits-all format (that is I am using 'may' above), but basically, there are three type of ioctls: 1) Do not read from userspace, returns a result to userspace 2) Read an input parameter from userspace, and does not return anything to userspace 3) Read an input from userspace, and return a buffer to userspace. The default case (1) (where no input parameter is given, and an "int" is returned to userspace) encompasses more than 90% of the cases, but there are two other exceptions. Here is a list of exceptions: * Protocol RAW: * cmd = SIOCGETVIFCNT: * input and output = struct sioc_vif_req * cmd = SIOCGETSGCNT * input and output = struct sioc_sg_req * Explanation: for the SIOCGETVIFCNT case, userspace passes the input argument, which is struct sioc_vif_req. Then the callback populates the struct, which is copied back to userspace. * Protocol RAW6: * cmd = SIOCGETMIFCNT_IN6 * input and output = struct sioc_mif_req6 * cmd = SIOCGETSGCNT_IN6 * input and output = struct sioc_sg_req6 * Protocol PHONET: * cmd == SIOCPNADDRESOURCE | SIOCPNDELRESOURCE * input int (4 bytes) * Nothing is copied back to userspace. For the exception cases, functions sock_sk_ioctl_inout() will copy the userspace input, and copy it back to kernel space. The wrapper that prepare the buffer and put the buffer back to user is sk_ioctl(), so, instead of calling sk->sk_prot->ioctl(), the callee now calls sk_ioctl(), which will handle all cases. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20230609152800.830401-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-03-31l2tp: generate correct module alias stringsAndrea Righi2-8/+8
Commit 65b32f801bfb ("uapi: move IPPROTO_L2TP to in.h") moved the definition of IPPROTO_L2TP from a define to an enum, but since __stringify doesn't work properly with enums, we ended up breaking the modalias strings for the l2tp modules: $ modinfo l2tp_ip l2tp_ip6 | grep alias alias: net-pf-2-proto-IPPROTO_L2TP alias: net-pf-2-proto-2-type-IPPROTO_L2TP alias: net-pf-10-proto-IPPROTO_L2TP alias: net-pf-10-proto-2-type-IPPROTO_L2TP Use the resolved number directly in MODULE_ALIAS_*() macros (as we already do with SOCK_DGRAM) to fix the alias strings: $ modinfo l2tp_ip l2tp_ip6 | grep alias alias: net-pf-2-proto-115 alias: net-pf-2-proto-115-type-2 alias: net-pf-10-proto-115 alias: net-pf-10-proto-115-type-2 Moreover, fix the ordering of the parameters passed to MODULE_ALIAS_NET_PF_PROTO_TYPE() by switching proto and type. Fixes: 65b32f801bfb ("uapi: move IPPROTO_L2TP to in.h") Link: https://lore.kernel.org/lkml/ZCQt7hmodtUaBlCP@righiandr-XPS-13-7390 Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Tested-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-02-20l2tp: Avoid possible recursive deadlock in l2tp_tunnel_register()Shigeru Yoshida1-58/+67
When a file descriptor of pppol2tp socket is passed as file descriptor of UDP socket, a recursive deadlock occurs in l2tp_tunnel_register(). This situation is reproduced by the following program: int main(void) { int sock; struct sockaddr_pppol2tp addr; sock = socket(AF_PPPOX, SOCK_DGRAM, PX_PROTO_OL2TP); if (sock < 0) { perror("socket"); return 1; } addr.sa_family = AF_PPPOX; addr.sa_protocol = PX_PROTO_OL2TP; addr.pppol2tp.pid = 0; addr.pppol2tp.fd = sock; addr.pppol2tp.addr.sin_family = PF_INET; addr.pppol2tp.addr.sin_port = htons(0); addr.pppol2tp.addr.sin_addr.s_addr = inet_addr("192.168.0.1"); addr.pppol2tp.s_tunnel = 1; addr.pppol2tp.s_session = 0; addr.pppol2tp.d_tunnel = 0; addr.pppol2tp.d_session = 0; if (connect(sock, (const struct sockaddr *)&addr, sizeof(addr)) < 0) { perror("connect"); return 1; } return 0; } This program causes the following lockdep warning: ============================================ WARNING: possible recursive locking detected 6.2.0-rc5-00205-gc96618275234 #56 Not tainted -------------------------------------------- repro/8607 is trying to acquire lock: ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: l2tp_tunnel_register+0x2b7/0x11c0 but task is already holding lock: ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: pppol2tp_connect+0xa82/0x1a30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(sk_lock-AF_PPPOX); lock(sk_lock-AF_PPPOX); *** DEADLOCK *** May be due to missing lock nesting notation 1 lock held by repro/8607: #0: ffff8880213c8130 (sk_lock-AF_PPPOX){+.+.}-{0:0}, at: pppol2tp_connect+0xa82/0x1a30 stack backtrace: CPU: 0 PID: 8607 Comm: repro Not tainted 6.2.0-rc5-00205-gc96618275234 #56 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x100/0x178 __lock_acquire.cold+0x119/0x3b9 ? lockdep_hardirqs_on_prepare+0x410/0x410 lock_acquire+0x1e0/0x610 ? l2tp_tunnel_register+0x2b7/0x11c0 ? lock_downgrade+0x710/0x710 ? __fget_files+0x283/0x3e0 lock_sock_nested+0x3a/0xf0 ? l2tp_tunnel_register+0x2b7/0x11c0 l2tp_tunnel_register+0x2b7/0x11c0 ? sprintf+0xc4/0x100 ? l2tp_tunnel_del_work+0x6b0/0x6b0 ? debug_object_deactivate+0x320/0x320 ? lockdep_init_map_type+0x16d/0x7a0 ? lockdep_init_map_type+0x16d/0x7a0 ? l2tp_tunnel_create+0x2bf/0x4b0 ? l2tp_tunnel_create+0x3c6/0x4b0 pppol2tp_connect+0x14e1/0x1a30 ? pppol2tp_put_sk+0xd0/0xd0 ? aa_sk_perm+0x2b7/0xa80 ? aa_af_perm+0x260/0x260 ? bpf_lsm_socket_connect+0x9/0x10 ? pppol2tp_put_sk+0xd0/0xd0 __sys_connect_file+0x14f/0x190 __sys_connect+0x133/0x160 ? __sys_connect_file+0x190/0x190 ? lockdep_hardirqs_on+0x7d/0x100 ? ktime_get_coarse_real_ts64+0x1b7/0x200 ? ktime_get_coarse_real_ts64+0x147/0x200 ? __audit_syscall_entry+0x396/0x500 __x64_sys_connect+0x72/0xb0 do_syscall_64+0x38/0xb0 entry_SYSCALL_64_after_hwframe+0x63/0xcd This patch fixes the issue by getting/creating the tunnel before locking the pppol2tp socket. Fixes: 0b2c59720e65 ("l2tp: close all race conditions in l2tp_tunnel_register()") Cc: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-18l2tp: prevent lockdep issue in l2tp_tunnel_register()Eric Dumazet1-3/+2
lockdep complains with the following lock/unlock sequence: lock_sock(sk); write_lock_bh(&sk->sk_callback_lock); [1] release_sock(sk); [2] write_unlock_bh(&sk->sk_callback_lock); We need to swap [1] and [2] to fix this issue. Fixes: 0b2c59720e65 ("l2tp: close all race conditions in l2tp_tunnel_register()") Reported-by: syzbot+bbd35b345c7cab0d9a08@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/netdev/20230114030137.672706-1-xiyou.wangcong@gmail.com/T/#m1164ff20628671b0f326a24cb106ab3239c70ce3 Cc: Cong Wang <cong.wang@bytedance.com> Cc: Guillaume Nault <gnault@redhat.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16l2tp: close all race conditions in l2tp_tunnel_register()Cong Wang1-14/+14
The code in l2tp_tunnel_register() is racy in several ways: 1. It modifies the tunnel socket _after_ publishing it. 2. It calls setup_udp_tunnel_sock() on an existing socket without locking. 3. It changes sock lock class on fly, which triggers many syzbot reports. This patch amends all of them by moving socket initialization code before publishing and under sock lock. As suggested by Jakub, the l2tp lockdep class is not necessary as we can just switch to bh_lock_sock_nested(). Fixes: 37159ef2c1ae ("l2tp: fix a lockdep splat") Fixes: 6b9f34239b00 ("l2tp: fix races in tunnel creation") Reported-by: syzbot+52866e24647f9a23403f@syzkaller.appspotmail.com Reported-by: syzbot+94cc2a66fc228b23f360@syzkaller.appspotmail.com Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Guillaume Nault <gnault@redhat.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Tom Parkin <tparkin@katalix.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-01-16l2tp: convert l2tp_tunnel_list to idrCong Wang1-43/+42
l2tp uses l2tp_tunnel_list to track all registered tunnels and to allocate tunnel ID's. IDR can do the same job. More importantly, with IDR we can hold the ID before a successful registration so that we don't need to worry about late error handling, it is not easy to rollback socket changes. This is a preparation for the following fix. Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Guillaume Nault <gnault@redhat.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Tom Parkin <tparkin@katalix.com> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Reviewed-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-8/+9
tools/lib/bpf/ringbuf.c 927cbb478adf ("libbpf: Handle size overflow for ringbuf mmap") b486d19a0ab0 ("libbpf: checkpatch: Fixed code alignments in ringbuf.c") https://lore.kernel.org/all/20221121122707.44d1446a@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-23l2tp: Don't sleep and disable BH under writer-side sk_callback_lockJakub Sitnicki1-8/+9
When holding a reader-writer spin lock we cannot sleep. Calling setup_udp_tunnel_sock() with write lock held violates this rule, because we end up calling percpu_down_read(), which might sleep, as syzbot reports [1]: __might_resched.cold+0x222/0x26b kernel/sched/core.c:9890 percpu_down_read include/linux/percpu-rwsem.h:49 [inline] cpus_read_lock+0x1b/0x140 kernel/cpu.c:310 static_key_slow_inc+0x12/0x20 kernel/jump_label.c:158 udp_tunnel_encap_enable include/net/udp_tunnel.h:187 [inline] setup_udp_tunnel_sock+0x43d/0x550 net/ipv4/udp_tunnel_core.c:81 l2tp_tunnel_register+0xc51/0x1210 net/l2tp/l2tp_core.c:1509 pppol2tp_connect+0xcdc/0x1a10 net/l2tp/l2tp_ppp.c:723 Trim the writer-side critical section for sk_callback_lock down to the minimum, so that it covers only operations on sk_user_data. Also, when grabbing the sk_callback_lock, we always need to disable BH, as Eric points out. Failing to do so leads to deadlocks because we acquire sk_callback_lock in softirq context, which can get stuck waiting on us if: 1) it runs on the same CPU, or CPU0 ---- lock(clock-AF_INET6); <Interrupt> lock(clock-AF_INET6); 2) lock ordering leads to priority inversion CPU0 CPU1 ---- ---- lock(clock-AF_INET6); local_irq_disable(); lock(&tcp_hashinfo.bhash[i].lock); lock(clock-AF_INET6); <Interrupt> lock(&tcp_hashinfo.bhash[i].lock); ... as syzbot reports [2,3]. Use the _bh variants for write_(un)lock. [1] https://lore.kernel.org/netdev/0000000000004e78ec05eda79749@google.com/ [2] https://lore.kernel.org/netdev/000000000000e38b6605eda76f98@google.com/ [3] https://lore.kernel.org/netdev/000000000000dfa31e05eda76f75@google.com/ v2: - Check and set sk_user_data while holding sk_callback_lock for both L2TP encapsulation types (IP and UDP) (Tetsuo) Cc: Tom Parkin <tparkin@katalix.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Fixes: b68777d54fac ("l2tp: Serialize access to sk_user_data with sk_callback_lock") Reported-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+703d9e154b3b58277261@syzkaller.appspotmail.com Reported-by: syzbot+50680ced9e98a61f7698@syzkaller.appspotmail.com Reported-by: syzbot+de987172bb74a381879b@syzkaller.appspotmail.com Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-11-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-6/+13
include/linux/bpf.h 1f6e04a1c7b8 ("bpf: Fix offset calculation error in __copy_map_value and zero_map_value") aa3496accc41 ("bpf: Refactor kptr_off_tab into btf_record") f71b2f64177a ("bpf: Refactor map->off_arr handling") https://lore.kernel.org/all/20221114095000.67a73239@canb.auug.org.au/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-11-16l2tp: Serialize access to sk_user_data with sk_callback_lockJakub Sitnicki1-6/+13
sk->sk_user_data has multiple users, which are not compatible with each other. Writers must synchronize by grabbing the sk->sk_callback_lock. l2tp currently fails to grab the lock when modifying the underlying tunnel socket fields. Fix it by adding appropriate locking. We err on the side of safety and grab the sk_callback_lock also inside the sk_destruct callback overridden by l2tp, even though there should be no refs allowing access to the sock at the time when sk_destruct gets called. v4: - serialize write to sk_user_data in l2tp sk_destruct v3: - switch from sock lock to sk_callback_lock - document write-protection for sk_user_data v2: - update Fixes to point to origin of the bug - use real names in Reported/Tested-by tags Cc: Tom Parkin <tparkin@katalix.com> Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core") Reported-by: Haowei Yan <g1042620637@gmail.com> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-10-24inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().Kuniyuki Iwashima1-2/+0
After commit d38afeec26ed ("tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in sk->sk_destruct() by setting inet6_sock_destruct() to it to make sure we do not leak inet6-specific resources. Now we can remove unnecessary inet6_destroy_sock() calls in sk->sk_prot->destroy(). DCCP and SCTP have their own sk->sk_destruct() function, so we change them separately in the following patches. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-29genetlink: start to validate reserved header bytesJakub Kicinski1-0/+1
We had historically not checked that genlmsghdr.reserved is 0 on input which prevents us from using those precious bytes in the future. One use case would be to extend the cmd field, which is currently just 8 bits wide and 256 is not a lot of commands for some core families. To make sure that new families do the right thing by default put the onus of opting out of validation on existing families. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel) Signed-off-by: David S. Miller <davem@davemloft.net>
2022-08-22l2tp: move from strlcpy with unused retval to strscpyWolfram Sang1-2/+2
Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20220818210222.8515-1-wsa+renesas@sang-engineering.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-08l2tp: l2tp_debugfs: fix Clang -Wformat warningsJustin Stitt1-3/+3
When building with Clang we encounter the following warnings: | net/l2tp/l2tp_debugfs.c:187:40: error: format specifies type 'unsigned | short' but the argument has type 'u32' (aka 'unsigned int') | [-Werror,-Wformat] seq_printf(m, " nr %hu, ns %hu\n", session->nr, | session->ns); - | net/l2tp/l2tp_debugfs.c:196:32: error: format specifies type 'unsigned | short' but the argument has type 'int' [-Werror,-Wformat] | session->l2specific_type, l2tp_get_l2specific_len(session)); - | net/l2tp/l2tp_debugfs.c:219:6: error: format specifies type 'unsigned | short' but the argument has type 'u32' (aka 'unsigned int') | [-Werror,-Wformat] session->nr, session->ns, Both session->nr and ->nc are of type `u32`. The currently used format specifier is `%hu` which describes a `u16`. My proposed fix is to listen to Clang and use the correct format specifier `%u`. For the warning at line 196, l2tp_get_l2specific_len() returns an int and should therefore be using the `%d` format specifier. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Justin Stitt <justinstitt@google.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-07-07net: l2tp: fix clang -Wformat warningJustin Stitt1-1/+1
When building with clang we encounter this warning: | net/l2tp/l2tp_ppp.c:1557:6: error: format specifies type 'unsigned | short' but the argument has type 'u32' (aka 'unsigned int') | [-Werror,-Wformat] session->nr, session->ns, Both session->nr and session->ns are of type u32. The format specifier previously used is `%hu` which would truncate our unsigned integer from 32 to 16 bits. This doesn't seem like intended behavior, if it is then perhaps we need to consider suppressing the warning with pragma clauses. This patch should get us closer to the goal of enabling the -Wformat flag for Clang builds. Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Justin Stitt <justinstitt@google.com> Acked-by: Guillaume Nault <gnault@redhat.com> Link: https://lore.kernel.org/r/20220706230833.535238-1-justinstitt@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-08ipv6: Fix signed integer overflow in l2tp_ip6_sendmsgWang Yufen1-2/+3
When len >= INT_MAX - transhdrlen, ulen = len + transhdrlen will be overflow. To fix, we can follow what udpv6 does and subtract the transhdrlen from the max. Signed-off-by: Wang Yufen <wangyufen@huawei.com> Link: https://lore.kernel.org/r/20220607120028.845916-2-wangyufen@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-16l2tp: use add READ_ONCE() to fetch sk->sk_bound_dev_ifEric Dumazet2-4/+8
Use READ_ONCE() in paths not holding the socket lock. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-12net: remove noblock parameter from recvmsg() entitiesOliver Hartkopp2-4/+2
The internal recvmsg() functions have two parameters 'flags' and 'noblock' that were merged inside skb_recv_datagram(). As a follow up patch to commit f4b41f062c42 ("net: remove noblock parameter from skb_recv_datagram()") this patch removes the separate 'noblock' parameter for recvmsg(). Analogue to the referenced patch for skb_recv_datagram() the 'flags' and 'noblock' parameters are unnecessarily split up with e.g. err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len); or in err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udp_recvmsg, sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len); instead of simply using only flags all the time and check for MSG_DONTWAIT where needed (to preserve for the formerly separated no(n)block condition). Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://lore.kernel.org/r/20220411124955.154876-1-socketcan@hartkopp.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-04-06net: remove noblock parameter from skb_recv_datagram()Oliver Hartkopp3-4/+5
skb_recv_datagram() has two parameters 'flags' and 'noblock' that are merged inside skb_recv_datagram() by 'flags | (noblock ? MSG_DONTWAIT : 0)' As 'flags' may contain MSG_DONTWAIT as value most callers split the 'flags' into 'flags' and 'noblock' with finally obsolete bit operations like this: skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc); And this is not even done consistently with the 'flags' parameter. This patch removes the obsolete and costly splitting into two parameters and only performs bit operations when really needed on the caller side. One missing conversion thankfully reported by kernel test robot. I missed to enable kunit tests to build the mctp code. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-10l2tp: add netns refcount tracker to l2tp_dfs_seq_dataEric Dumazet1-4/+5
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-29net/l2tp: convert tunnel rwlock_t to rcuTom Parkin3-36/+31
Previously commit e02d494d2c60 ("l2tp: Convert rwlock to RCU") converted most, but not all, rwlock instances in the l2tp subsystem to RCU. The remaining rwlock protects the per-tunnel hashlist of sessions which is used for session lookups in the UDP-encap data path. Convert the remaining rwlock to rcu to improve performance of UDP-encap tunnels. Note that the tunnel and session, which both live on RCU-protected lists, use slightly different approaches to incrementing their refcounts in the various getter functions. The tunnel has to use refcount_inc_not_zero because the tunnel shutdown process involves dropping the refcount to zero prior to synchronizing RCU readers (via. kfree_rcu). By contrast, the session shutdown removes the session from the list(s) it is on, synchronizes with readers, and then decrements the session refcount. Since the getter functions increment the session refcount with the RCU read lock held we prevent getters seeing a zero session refcount, and therefore don't need to use refcount_inc_not_zero. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-09net/l2tp: Fix reference count leak in l2tp_udp_recv_coreXiyu Yang1-1/+3
The reference count leak issue may take place in an error handling path. If both conditions of tunnel->version == L2TP_HDR_VER_3 and the return value of l2tp_v3_ensure_opt_in_linear is nonzero, the function would directly jump to label invalid, without decrementing the reference count of the l2tp_session object session increased earlier by l2tp_tunnel_get_session(). This may result in refcount leaks. Fix this issue by decrease the reference count before jumping to the label invalid. Fixes: 4522a70db7aa ("l2tp: fix reading optional fields of L2TPv3") Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-07l2tp: Fix spelling mistakesZheng Yongjun2-2/+2
Fix some spelling mistakes in comments: negociated ==> negotiated dont ==> don't Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-05-17net: Remove the member netns_okYejune Deng1-1/+0
Every protocol has the 'netns_ok' member and it is euqal to 1. The 'if (!prot->netns_ok)' always false in inet_add_protocol(). Signed-off-by: Yejune Deng <yejunedeng@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-27net: fix a concurrency bug in l2tp_tunnel_register()Gong, Sishuai1-5/+5
l2tp_tunnel_register() registers a tunnel without fully initializing its attribute. This can allow another kernel thread running l2tp_xmit_core() to access the uninitialized data and then cause a kernel NULL pointer dereference error, as shown below. Thread 1 Thread 2 //l2tp_tunnel_register() list_add_rcu(&tunnel->list, &pn->l2tp_tunnel_list); //pppol2tp_connect() tunnel = l2tp_tunnel_get(sock_net(sk), info.tunnel_id); // Fetch the new tunnel ... //l2tp_xmit_core() struct sock *sk = tunnel->sock; ... bh_lock_sock(sk); //Null pointer error happens tunnel->sock = sk; Fix this bug by initializing tunnel->sock before adding the tunnel into l2tp_tunnel_list. Reviewed-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: Sishuai Gong <sishuai@purdue.edu> Reported-by: Sishuai Gong <sishuai@purdue.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-22net: l2tp: Fix a typoBhaskar Chowdhury1-1/+1
s/verifed/verified/ Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-03net: l2tp: reduce log level of messages in receive path, add counter insteadMatthias Schiffer3-19/+29
Commit 5ee759cda51b ("l2tp: use standard API for warning log messages") changed a number of warnings about invalid packets in the receive path so that they are always shown, instead of only when a special L2TP debug flag is set. Even with rate limiting these warnings can easily cause significant log spam - potentially triggered by a malicious party sending invalid packets on purpose. In addition these warnings were noticed by projects like Tunneldigger [1], which uses L2TP for its data path, but implements its own control protocol (which is sufficiently different from L2TP data packets that it would always be passed up to userspace even with future extensions of L2TP). Some of the warnings were already redundant, as l2tp_stats has a counter for these packets. This commit adds one additional counter for invalid packets that are passed up to userspace. Packets with unknown session are not counted as invalid, as there is nothing wrong with the format of these packets. With the additional counter, all of these messages are either redundant or benign, so we reduce them to pr_debug_ratelimited(). [1] https://github.com/wlanslovenija/tunneldigger/issues/160 Fixes: 5ee759cda51b ("l2tp: use standard API for warning log messages") Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-11-23lsm,selinux: pass flowi_common instead of flowi to the LSM hooksPaul Moore1-1/+1
As pointed out by Herbert in a recent related patch, the LSM hooks do not have the necessary address family information to use the flowi struct safely. As none of the LSMs currently use any of the protocol specific flowi information, replace the flowi pointers with pointers to the address family independent flowi_common struct. Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: James Morris <jamorris@linux.microsoft.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2020-10-02genetlink: move to smaller ops wherever possibleJakub Kicinski1-3/+3
Bulk of the genetlink users can use smaller ops, move them. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-29l2tp: report rx cookie discards in netlink getTom Parkin1-0/+6
When an L2TPv3 session receives a data frame with an incorrect cookie l2tp_core logs a warning message and bumps a stats counter to reflect the fact that the packet has been dropped. However, the stats counter in question is missing from the l2tp_netlink get message for tunnel and session instances. Include the statistic in the netlink get response. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-18l2tp: fix up inconsistent rx/tx statisticsTom Parkin1-3/+8
Historically L2TP core statistics count the L2TP header in the per-session and per-tunnel byte counts tracked for transmission and receipt. Now that l2tp_xmit_skb updates tx stats, it is necessary for l2tp_xmit_core to pass out the length of the transmitted packet so that the statistics can be updated correctly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-03l2tp: avoid duplicated code in l2tp_tunnel_closeallTom Parkin1-13/+1
l2tp_tunnel_closeall is called as a part of tunnel shutdown in order to close all the sessions held by the tunnel. The code it uses to close a session duplicates what l2tp_session_delete does. Rather than duplicating the code, have l2tp_tunnel_closeall call l2tp_session_delete instead. This involves a very minor change to locking in l2tp_tunnel_closeall. Previously, l2tp_tunnel_closeall checked the session "dead" flag while holding tunnel->hlist_lock. This allowed for the code to step to the next session in the list without releasing the lock if the current session happened to be in the process of closing already. By calling l2tp_session_delete instead, l2tp_tunnel_closeall must now drop and regain the hlist lock for each session in the tunnel list. Given that the likelihood of a session being in the process of closing when the tunnel is closed, it seems worth this very minor potential loss of efficiency to avoid duplication of the session delete code. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-03l2tp: make magic feather checks more usefulTom Parkin5-20/+38
The l2tp tunnel and session structures contain a "magic feather" field which was originally intended to help trace lifetime bugs in the code. Since the introduction of the shared kernel refcount code in refcount.h, and l2tp's porting to those APIs, we are covered by the refcount code's checks and warnings. Duplicating those checks in the l2tp code isn't useful. However, magic feather checks are still useful to help to detect bugs stemming from misuse/trampling of the sk_user_data pointer in struct sock. The l2tp code makes extensive use of sk_user_data to stash pointers to the tunnel and session structures, and if another subsystem overwrites sk_user_data it's important to detect this. As such, rework l2tp's magic feather checks to focus on validating the tunnel and session data structures when they're extracted from sk_user_data. * Add a new accessor function l2tp_sk_to_tunnel which contains a magic feather check, and is used by l2tp_core and l2tp_ip[6] * Comment l2tp_udp_encap_recv which doesn't use this new accessor function because of the specific nature of the codepath it is called in * Drop l2tp_session_queue_purge's check on the session magic feather: it is called from code which is walking the tunnel session list, and hence doesn't need validation * Drop l2tp_session_free's check on the tunnel magic feather: the intention of this check is covered by refcount.h's reference count sanity checking * Add session magic validation in pppol2tp_ioctl. On failure return -EBADF, which mirrors the approach in pppol2tp_[sg]etsockopt. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-03l2tp: capture more tx errors in data plane statsTom Parkin1-34/+37
l2tp_xmit_skb has a number of failure paths which are not reflected in the tunnel and session statistics because the stats are updated by l2tp_xmit_core. Hence any errors occurring before l2tp_xmit_core is called are missed from the statistics. Refactor the transmit path slightly to capture all error paths. l2tp_xmit_skb now leaves all the actual work of transmission to l2tp_xmit_core, and updates the statistics based on l2tp_xmit_core's return code. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-03l2tp: drop net argument from l2tp_tunnel_createTom Parkin4-4/+4
The argument is unused, so remove it. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-03l2tp: drop data_len argument from l2tp_xmit_coreTom Parkin1-3/+2
The data_len argument passed to l2tp_xmit_core is no longer used, so remove it. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-03l2tp: remove header length param from l2tp_xmit_skbTom Parkin4-10/+9
All callers pass the session structure's hdr_len field as the header length parameter to l2tp_xmit_skb. Since we're passing a pointer to the session structure to l2tp_xmit_skb anyway, there's not much point breaking the header length out as a separate argument. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-22l2tp: remove tunnel and session debug flags fieldTom Parkin5-35/+12
The l2tp subsystem now uses standard kernel logging APIs for informational and warning messages, and tracepoints for debug information. Now that the tunnel and session debug flags are unused, remove the field from the core structures. Various system calls (in the case of l2tp_ppp) and netlink messages handle the getting and setting of debug flags. To avoid userspace breakage don't modify the API of these calls; simply ignore set requests, and send dummy data for get requests. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-22l2tp: remove custom logging macrosTom Parkin1-13/+0
All l2tp's informational and warning logging is now carried out using standard kernel APIs. Debugging information is now handled using tracepoints. Now that no code is using the custom logging macros, remove them from l2tp_core.h. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-22l2tp: add tracepoints to l2tp_core.cTom Parkin1-52/+31
Add lifetime event tracing for tunnel and session instances, tracking tunnel and session registration, deletion, and eventual freeing. Port the data path sequence number debug logging to use trace points rather than custom debug macros. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-22l2tp: add tracepoint definitions in trace.hTom Parkin2-2/+200
l2tp can provide a better debug experience using tracepoints rather than printk-style logging. Add tracepoint definitions in trace.h for use in the l2tp subsystem code. Add preprocessor definitions for the length of session and tunnel names in l2tp_core.h so we can reuse these in trace.h. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-22l2tp: add tracepoint infrastructure to coreTom Parkin3-0/+20
The l2tp subsystem doesn't currently make use of tracepoints. As a starting point for adding tracepoints, add skeleton infrastructure for defining tracepoints for the subsystem, and for having them build appropriately whether compiled into the kernel or built as a module. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-22l2tp: use standard API for warning log messagesTom Parkin2-20/+14
The l2tp_* log wrappers only emit messages of a given category if the tunnel or session structure has the appropriate flag set in its debug field. Flags default to being unset. For warning messages, this doesn't make a lot of sense since an administrator is likely to want to know about datapath warnings without needing to tweak the debug flags setting for a given tunnel or session instance. Modify l2tp_warn callsites to use pr_warn_ratelimited instead for unconditional output of warning messages. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-22l2tp: remove noisy logging, use appropriate log levelsTom Parkin2-43/+5
l2tp_ppp in particular had a lot of log messages for tracing [get|set]sockopt calls. These aren't especially useful, so remove these messages. Several log messages flagging error conditions were logged using l2tp_info: they're better off as l2tp_warn. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-22l2tp: don't log data framesTom Parkin5-101/+6
l2tp had logging to trace data frame receipt and transmission, including code to dump packet contents. This was originally intended to aid debugging of core l2tp packet handling, but is of limited use now that code is stable. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-30l2tp: improve API documentation in l2tp_core.hTom Parkin1-14/+72
* Improve the description of the key l2tp subsystem data structures. * Add high-level description of the main APIs for interacting with l2tp core. * Add documentation for the l2tp netlink session command callbacks. * Document the session pseudowire callbacks. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-30l2tp: tweak exports for l2tp_recv_common and l2tp_ioctlTom Parkin2-2/+2
All of the l2tp subsystem's exported symbols are exported using EXPORT_SYMBOL_GPL, except for l2tp_recv_common and l2tp_ioctl. These functions alone are not useful without the rest of the l2tp infrastructure, so there's no practical benefit to these symbols using a different export policy. Change these exports to use EXPORT_SYMBOL_GPL for consistency with the rest of l2tp. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-30l2tp: remove build_header callback in struct l2tp_sessionTom Parkin2-7/+4
The structure of an L2TP data packet header varies depending on the version of the L2TP protocol being used. struct l2tp_session used to have a build_header callback to abstract this difference away. It's clearer to simply choose the correct function to use when building the data packet (and we save on the function pointer in the session structure). This approach does mean dereferencing the parent tunnel structure in order to determine the tunnel version, but we're doing that in the transmit path in any case. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-30l2tp: return void from l2tp_session_deleteTom Parkin3-10/+5
l2tp_session_delete is used to schedule a session instance for deletion. The function itself always returns zero, and none of its direct callers check its return value, so have the function return void. This change de-facto changes the l2tp netlink session_delete callback prototype since all pseudowires currently use l2tp_session_delete for their implementation of that operation. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-30l2tp: don't export tunnel and session free functionsTom Parkin2-47/+46
Tunnel and session instances are reference counted, and shouldn't be directly freed by pseudowire code. Rather than exporting l2tp_tunnel_free and l2tp_session_free, make them private to l2tp_core.c, and export the refcount functions instead. In order to do this, the refcount functions cannot be declared as inline. Since the codepaths which take and drop tunnel and session references are not directly in the datapath this shouldn't cause performance issues. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-30l2tp: don't export __l2tp_session_unhashTom Parkin2-32/+26
When __l2tp_session_unhash was first added it was used outside of l2tp_core.c, but that's no longer the case. As such, there's no longer a need to export the function. Make it private inside l2tp_core.c, and relocate it to avoid having to declare the function prototype in l2tp_core.h. Since the function is no longer used outside l2tp_core.c, remove the "__" prefix since we don't need to indicate anything special about its expected use to callers. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24l2tp: WARN_ON rather than BUG_ON in l2tp_session_freeTom Parkin1-1/+3
l2tp_session_free called BUG_ON if the tunnel magic feather value wasn't correct. The intent of this was to catch lifetime bugs; for example early tunnel free due to incorrect use of reference counts. Since the tunnel magic feather being wrong indicates either early free or structure corruption, we can avoid doing more damage by simply leaving the tunnel structure alone. If the tunnel refcount isn't dropped when it should be, the tunnel instance will remain in the kernel, resulting in the tunnel structure and socket leaking. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24l2tp: remove BUG_ON refcount value in l2tp_session_freeTom Parkin1-2/+0
l2tp_session_free is only called by l2tp_session_dec_refcount when the reference count reaches zero, so it's of limited value to validate the reference count value in l2tp_session_free itself. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24l2tp: WARN_ON rather than BUG_ON in l2tp_session_queue_purgeTom Parkin1-3/+4
l2tp_session_queue_purge is used during session shutdown to drop any skbs queued for reordering purposes according to L2TP dataplane rules. The BUG_ON in this function checks the session magic feather in an attempt to catch lifetime bugs. Rather than crashing the kernel with a BUG_ON, we can simply WARN_ON and refuse to do anything more -- in the worst case this could result in a leak. However this is highly unlikely given that the session purge only occurs from codepaths which have obtained the session by means of a lookup via. the parent tunnel and which check the session "dead" flag to protect against shutdown races. While we're here, have l2tp_session_queue_purge return void rather than an integer, since neither of the callsites checked the return value. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24l2tp: don't BUG_ON seqfile checks in l2tp_pppTom Parkin1-1/+5
checkpatch advises that WARN_ON and recovery code are preferred over BUG_ON which crashes the kernel. l2tp_ppp has a BUG_ON check of struct seq_file's private pointer in pppol2tp_seq_start prior to accessing data through that pointer. Rather than crashing, we can simply bail out early and return NULL in order to terminate the seq file processing in much the same way as we do when reaching the end of tunnel/session instances to render. Retain a WARN_ON to help trace possible bugs in this area. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24l2tp: don't BUG_ON session magic checks in l2tp_pppTom Parkin1-3/+7
checkpatch advises that WARN_ON and recovery code are preferred over BUG_ON which crashes the kernel. l2tp_ppp.c's BUG_ON checks of the l2tp session structure's "magic" field occur in code paths where it's reasonably easy to recover: * In the case of pppol2tp_sock_to_session, we can return NULL and the caller will bail out appropriately. There is no change required to any of the callsites of this function since they already handle pppol2tp_sock_to_session returning NULL. * In the case of pppol2tp_session_destruct we can just avoid decrementing the reference count on the suspect session structure. In the worst case scenario this results in a memory leak, which is preferable to a crash. Convert these uses of BUG_ON to WARN_ON accordingly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24l2tp: remove BUG_ON in l2tp_tunnel_closeallTom Parkin1-2/+0
l2tp_tunnel_closeall is only called from l2tp_core.c, and it's easy to statically analyse the code path calling it to validate that it should never be passed a NULL tunnel pointer. Having a BUG_ON checking the tunnel pointer triggers a checkpatch warning. Since the BUG_ON is of no value, remove it to avoid the warning. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24l2tp: remove BUG_ON in l2tp_session_queue_purgeTom Parkin1-1/+0
l2tp_session_queue_purge is only called from l2tp_core.c, and it's easy to statically analyse the code paths calling it to validate that it should never be passed a NULL session pointer. Having a BUG_ON checking the session pointer triggers a checkpatch warning. Since the BUG_ON is of no value, remove it to avoid the warning. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24l2tp: WARN_ON rather than BUG_ON in l2tp_dfs_seq_startTom Parkin1-1/+4
l2tp_dfs_seq_start had a BUG_ON to catch a possible programming error in l2tp_dfs_seq_open. Since we can easily bail out of l2tp_dfs_seq_start, prefer to do that and flag the error with a WARN_ON rather than crashing the kernel. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24l2tp: avoid multiple assignmentsTom Parkin3-9/+15
checkpatch warns about multiple assignments. Update l2tp accordingly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-24net: pass a sockptr_t into ->setsockoptChristoph Hellwig1-2/+2
Rework the remaining setsockopt code to pass a sockptr_t instead of a plain user pointer. This removes the last remaining set_fs(KERNEL_DS) outside of architecture specific code. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154] Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23l2tp: cleanup kzalloc callsTom Parkin1-2/+2
Passing "sizeof(struct blah)" in kzalloc calls is less readable, potentially prone to future bugs if the type of the pointer is changed, and triggers checkpatch warnings. Tweak the kzalloc calls in l2tp which use this form to avoid the warning. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23l2tp: cleanup netlink tunnel create address handlingTom Parkin1-24/+33
When creating an L2TP tunnel using the netlink API, userspace must either pass a socket FD for the tunnel to use (for managed tunnels), or specify the tunnel source/destination address (for unmanaged tunnels). Since source/destination addresses may be AF_INET or AF_INET6, the l2tp netlink code has conditionally compiled blocks to support IPv6. Rather than embedding these directly into l2tp_nl_cmd_tunnel_create (where it makes the code difficult to read and confuses checkpatch to boot) split the handling of address-related attributes into a separate function. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23l2tp: cleanup netlink send of tunnel address informationTom Parkin1-56/+70
l2tp_nl_tunnel_send has conditionally compiled code to support AF_INET6, which makes the code difficult to follow and triggers checkpatch warnings. Split the code out into functions to handle the AF_INET v.s. AF_INET6 cases, which both improves readability and resolves the checkpatch warnings. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23l2tp: check socket address type in l2tp_dfs_seq_tunnel_showTom Parkin1-3/+5
checkpatch warns about indentation and brace balancing around the conditionally compiled code for AF_INET6 support in l2tp_dfs_seq_tunnel_show. By adding another check on the socket address type we can make the code more readable while removing the checkpatch warning. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23l2tp: cleanup unnecessary braces in if statementsTom Parkin2-17/+12
These checks are all simple and don't benefit from extra braces to clarify intent. Remove them for easier-reading code. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-23l2tp: cleanup comparisons to NULLTom Parkin6-48/+47
checkpatch warns about comparisons to NULL, e.g. CHECK: Comparison to NULL could be written "!rt" #474: FILE: net/l2tp/l2tp_ip.c:474: + if (rt == NULL) { These sort of comparisons are generally clearer and more readable the way checkpatch suggests, so update l2tp accordingly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22l2tp: avoid precidence issues in L2TP_SKB_CB macroTom Parkin1-1/+1
checkpatch warned about the L2TP_SKB_CB macro's use of its argument: add braces to avoid the problem. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22l2tp: line-break long function prototypesTom Parkin1-2/+4
In l2tp_core.c both l2tp_tunnel_create and l2tp_session_create take quite a number of arguments and have a correspondingly long prototype. This is both quite difficult to scan visually, and triggers checkpatch warnings. Add a line break to make these function prototypes more readable. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22l2tp: prefer seq_puts for unformatted outputTom Parkin1-2/+2
checkpatch warns about use of seq_printf where seq_puts would do. Modify l2tp_debugfs accordingly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22l2tp: prefer using BIT macroTom Parkin1-2/+2
Use BIT(x) rather than (1<<x), reported by checkpatch.pl. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22l2tp: add identifier name in function pointer prototypeTom Parkin1-1/+1
Reported by checkpatch: "WARNING: function definition argument 'struct sock *' should also have an identifier name" Add an identifier name to help document the prototype. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22l2tp: cleanup suspect code indentTom Parkin1-2/+2
l2tp_core has conditionally compiled code in l2tp_xmit_skb for IPv6 support. The structure of this code triggered a checkpatch warning due to incorrect indentation. Fix up the indentation to address the checkpatch warning. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22l2tp: cleanup wonky alignment of line-broken function callsTom Parkin3-8/+8
Arguments should be aligned with the function call open parenthesis as per checkpatch. Tweak some function calls which were not aligned correctly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22l2tp: cleanup difficult-to-read line breaksTom Parkin2-44/+31
Some l2tp code had line breaks which made the code more difficult to read. These were originally motivated by the 80-character line width coding guidelines, but were actually a negative from the perspective of trying to follow the code. Remove these linebreaks for clearer code, even if we do exceed 80 characters in width in some places. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22l2tp: cleanup commentsTom Parkin8-68/+47
Modify some l2tp comments to better adhere to kernel coding style, as reported by checkpatch.pl. Add descriptive comments for the l2tp per-net spinlocks to document their use. Fix an incorrect comment in l2tp_recv_common: RFC2661 section 5.4 states that: "The LNS controls enabling and disabling of sequence numbers by sending a data message with or without sequence numbers present at any time during the life of a session." l2tp handles this correctly in l2tp_recv_common, but the comment around the code was incorrect and confusing. Fix up the comment accordingly. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-22l2tp: cleanup whitespace useTom Parkin7-48/+48
Fix up various whitespace issues as reported by checkpatch.pl: * remove spaces around operators where appropriate, * add missing blank lines following declarations, * remove multiple blank lines, or trailing blank lines at the end of functions. Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19net/ipv6: remove compat_ipv6_{get,set}sockoptChristoph Hellwig1-4/+0
Handle the few cases that need special treatment in-line using in_compat_syscall(). This also removes all the now unused compat_{get,set}sockopt methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19net/ipv4: remove compat_ip_{get,set}sockoptChristoph Hellwig1-4/+0
Handle the few cases that need special treatment in-line using in_compat_syscall(). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-19net: remove compat_sock_common_{get,set}sockoptChristoph Hellwig2-6/+0
Add the compat handling to sock_common_{get,set}sockopt instead, keyed of in_compat_syscall(). This allow to remove the now unused ->compat_{get,set}sockopt methods from struct proto_ops. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-4/+1
All conflicts seemed rather trivial, with some guidance from Saeed Mameed on the tc_ct.c one. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-08l2tp: remove skb_dst_set() from l2tp_xmit_skb()Xin Long1-4/+1
In the tx path of l2tp, l2tp_xmit_skb() calls skb_dst_set() to set skb's dst. However, it will eventually call inet6_csk_xmit() or ip_queue_xmit() where skb's dst will be overwritten by: skb_dst_set_noref(skb, dst); without releasing the old dst in skb. Then it causes dst/dev refcnt leak: unregister_netdevice: waiting for eth0 to become free. Usage count = 1 This can be reproduced by simply running: # modprobe l2tp_eth && modprobe l2tp_ip # sh ./tools/testing/selftests/net/l2tp.sh So before going to inet6_csk_xmit() or ip_queue_xmit(), skb's dst should be dropped. This patch is to fix it by removing skb_dst_set() from l2tp_xmit_skb() and moving skb_dst_drop() into l2tp_xmit_core(). Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core") Reported-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: James Chapman <jchapman@katalix.com> Tested-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-28l2tp: fix l2tp_eth_dev_xmit()'s return typeLuc Van Oostenryck1-1/+1
The method ndo_start_xmit() is defined as returning an 'netdev_tx_t', which is a typedef for an enum type, but the implementation in this driver returns an 'int'. Fix this by returning 'netdev_tx_t' in this driver too. Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-14treewide: replace '---help---' in Kconfig files with 'help'Masahiro Yamada1-1/+1
Since commit 84af7a6194e4 ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' has been gradually decreasing, but there are still more than 2400 instances. This commit finishes the conversion. While I touched the lines, I also fixed the indentation. There are a variety of indentation styles found. a) 4 spaces + '---help---' b) 7 spaces + '---help---' c) 8 spaces + '---help---' d) 1 space + 1 tab + '---help---' e) 1 tab + '---help---' (correct indentation) f) 1 tab + 1 space + '---help---' g) 1 tab + 2 spaces + '---help---' In order to convert all of them to 1 tab + 'help', I ran the following commend: $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/' Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2020-05-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller3-15/+47
xdp_umem.c had overlapping changes between the 64-bit math fix for the calculation of npgs and the removal of the zerocopy memory type which got rid of the chunk_size_nohdr member. The mlx5 Kconfig conflict is a case where we just take the net-next copy of the Kconfig entry dependency as it takes on the ESWITCH dependency by one level of indirection which is what the 'net' conflicting change is trying to ensure. Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-30l2tp: add sk_family checks to l2tp_validate_socketEric Dumazet1-0/+3
syzbot was able to trigger a crash after using an ISDN socket and fool l2tp. Fix this by making sure the UDP socket is of the proper family. BUG: KASAN: slab-out-of-bounds in setup_udp_tunnel_sock+0x465/0x540 net/ipv4/udp_tunnel.c:78 Write of size 1 at addr ffff88808ed0c590 by task syz-executor.5/3018 CPU: 0 PID: 3018 Comm: syz-executor.5 Not tainted 5.7.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x188/0x20d lib/dump_stack.c:118 print_address_description.constprop.0.cold+0xd3/0x413 mm/kasan/report.c:382 __kasan_report.cold+0x20/0x38 mm/kasan/report.c:511 kasan_report+0x33/0x50 mm/kasan/common.c:625 setup_udp_tunnel_sock+0x465/0x540 net/ipv4/udp_tunnel.c:78 l2tp_tunnel_register+0xb15/0xdd0 net/l2tp/l2tp_core.c:1523 l2tp_nl_cmd_tunnel_create+0x4b2/0xa60 net/l2tp/l2tp_netlink.c:249 genl_family_rcv_msg_doit net/netlink/genetlink.c:673 [inline] genl_family_rcv_msg net/netlink/genetlink.c:718 [inline] genl_rcv_msg+0x627/0xdf0 net/netlink/genetlink.c:735 netlink_rcv_skb+0x15a/0x410 net/netlink/af_netlink.c:2469 genl_rcv+0x24/0x40 net/netlink/genetlink.c:746 netlink_unicast_kernel net/netlink/af_netlink.c:1303 [inline] netlink_unicast+0x537/0x740 net/netlink/af_netlink.c:1329 netlink_sendmsg+0x882/0xe10 net/netlink/af_netlink.c:1918 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6e6/0x810 net/socket.c:2352 ___sys_sendmsg+0x100/0x170 net/socket.c:2406 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x45ca29 Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007effe76edc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00000000004fe1c0 RCX: 000000000045ca29 RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000005 RBP: 000000000078bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff R13: 000000000000094e R14: 00000000004d5d00 R15: 00007effe76ee6d4 Allocated by task 3018: save_stack+0x1b/0x40 mm/kasan/common.c:49 set_track mm/kasan/common.c:57 [inline] __kasan_kmalloc mm/kasan/common.c:495 [inline] __kasan_kmalloc.constprop.0+0xbf/0xd0 mm/kasan/common.c:468 __do_kmalloc mm/slab.c:3656 [inline] __kmalloc+0x161/0x7a0 mm/slab.c:3665 kmalloc include/linux/slab.h:560 [inline] sk_prot_alloc+0x223/0x2f0 net/core/sock.c:1612 sk_alloc+0x36/0x1100 net/core/sock.c:1666 data_sock_create drivers/isdn/mISDN/socket.c:600 [inline] mISDN_sock_create+0x272/0x400 drivers/isdn/mISDN/socket.c:796 __sock_create+0x3cb/0x730 net/socket.c:1428 sock_create net/socket.c:1479 [inline] __sys_socket+0xef/0x200 net/socket.c:1521 __do_sys_socket net/socket.c:1530 [inline] __se_sys_socket net/socket.c:1528 [inline] __x64_sys_socket+0x6f/0xb0 net/socket.c:1528 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 Freed by task 2484: save_stack+0x1b/0x40 mm/kasan/common.c:49 set_track mm/kasan/common.c:57 [inline] kasan_set_free_info mm/kasan/common.c:317 [inline] __kasan_slab_free+0xf7/0x140 mm/kasan/common.c:456 __cache_free mm/slab.c:3426 [inline] kfree+0x109/0x2b0 mm/slab.c:3757 kvfree+0x42/0x50 mm/util.c:603 __free_fdtable+0x2d/0x70 fs/file.c:31 put_files_struct fs/file.c:420 [inline] put_files_struct+0x248/0x2e0 fs/file.c:413 exit_files+0x7e/0xa0 fs/file.c:445 do_exit+0xb04/0x2dd0 kernel/exit.c:791 do_group_exit+0x125/0x340 kernel/exit.c:894 get_signal+0x47b/0x24e0 kernel/signal.c:2739 do_signal+0x81/0x2240 arch/x86/kernel/signal.c:784 exit_to_usermode_loop+0x26c/0x360 arch/x86/entry/common.c:161 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline] syscall_return_slowpath arch/x86/entry/common.c:279 [inline] do_syscall_64+0x6b1/0x7d0 arch/x86/entry/common.c:305 entry_SYSCALL_64_after_hwframe+0x49/0xb3 The buggy address belongs to the object at ffff88808ed0c000 which belongs to the cache kmalloc-2k of size 2048 The buggy address is located 1424 bytes inside of 2048-byte region [ffff88808ed0c000, ffff88808ed0c800) The buggy address belongs to the page: page:ffffea00023b4300 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 flags: 0xfffe0000000200(slab) raw: 00fffe0000000200 ffffea0002838208 ffffea00015ba288 ffff8880aa000e00 raw: 0000000000000000 ffff88808ed0c000 0000000100000001 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88808ed0c480: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88808ed0c500: 00 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88808ed0c580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ^ ffff88808ed0c600: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88808ed0c680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc Fixes: 6b9f34239b00 ("l2tp: fix races in tunnel creation") Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Chapman <jchapman@katalix.com> Cc: Guillaume Nault <gnault@redhat.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-30l2tp: do not use inet_hash()/inet_unhash()Eric Dumazet2-15/+44
syzbot recently found a way to crash the kernel [1] Issue here is that inet_hash() & inet_unhash() are currently only meant to be used by TCP & DCCP, since only these protocols provide the needed hashinfo pointer. L2TP uses a single list (instead of a hash table) This old bug became an issue after commit 610236587600 ("bpf: Add new cgroup attach type to enable sock modifications") since after this commit, sk_common_release() can be called while the L2TP socket is still considered 'hashed'. general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 PID: 7063 Comm: syz-executor654 Not tainted 5.7.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:inet_unhash+0x11f/0x770 net/ipv4/inet_hashtables.c:600 Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e dd 04 00 00 48 8d 7d 08 44 8b 73 08 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 55 05 00 00 48 8d 7d 14 4c 8b 6d 08 48 b8 00 00 RSP: 0018:ffffc90001777d30 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff88809a6df940 RCX: ffffffff8697c242 RDX: 0000000000000001 RSI: ffffffff8697c251 RDI: 0000000000000008 RBP: 0000000000000000 R08: ffff88809f3ae1c0 R09: fffffbfff1514cc1 R10: ffffffff8a8a6607 R11: fffffbfff1514cc0 R12: ffff88809a6df9b0 R13: 0000000000000007 R14: 0000000000000000 R15: ffffffff873a4d00 FS: 0000000001d2b880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000006cd090 CR3: 000000009403a000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: sk_common_release+0xba/0x370 net/core/sock.c:3210 inet_create net/ipv4/af_inet.c:390 [inline] inet_create+0x966/0xe00 net/ipv4/af_inet.c:248 __sock_create+0x3cb/0x730 net/socket.c:1428 sock_create net/socket.c:1479 [inline] __sys_socket+0xef/0x200 net/socket.c:1521 __do_sys_socket net/socket.c:1530 [inline] __se_sys_socket net/socket.c:1528 [inline] __x64_sys_socket+0x6f/0xb0 net/socket.c:1528 do_syscall_64+0xf6/0x7d0 arch/x86/entry/common.c:295 entry_SYSCALL_64_after_hwframe+0x49/0xb3 RIP: 0033:0x441e29 Code: e8 fc b3 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb 08 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007ffdce184148 EFLAGS: 00000246 ORIG_RAX: 0000000000000029 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441e29 RDX: 0000000000000073 RSI: 0000000000000002 RDI: 0000000000000002 RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000402c30 R14: 0000000000000000 R15: 0000000000000000 Modules linked in: ---[ end trace 23b6578228ce553e ]--- RIP: 0010:inet_unhash+0x11f/0x770 net/ipv4/inet_hashtables.c:600 Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e dd 04 00 00 48 8d 7d 08 44 8b 73 08 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 55 05 00 00 48 8d 7d 14 4c 8b 6d 08 48 b8 00 00 RSP: 0018:ffffc90001777d30 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: ffff88809a6df940 RCX: ffffffff8697c242 RDX: 0000000000000001 RSI: ffffffff8697c251 RDI: 0000000000000008 RBP: 0000000000000000 R08: ffff88809f3ae1c0 R09: fffffbfff1514cc1 R10: ffffffff8a8a6607 R11: fffffbfff1514cc0 R12: ffff88809a6df9b0 R13: 0000000000000007 R14: 0000000000000000 R15: ffffffff873a4d00 FS: 0000000001d2b880(0000) GS:ffff8880ae600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000006cd090 CR3: 000000009403a000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: 0d76751fad77 ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Chapman <jchapman@katalix.com> Cc: Andrii Nakryiko <andriin@fb.com> Reported-by: syzbot+3610d489778b57cc8031@syzkaller.appspotmail.com
2020-05-18ipv6: move SIOCADDRT and SIOCDELRT handling into ->compat_ioctlChristoph Hellwig1-0/+1
To prepare removing the global routing_ioctl hack start lifting the code into a newly added ipv6 ->compat_ioctl handler. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-04net: partially revert dynamic lockdep key changesCong Wang1-0/+1
This patch reverts the folowing commits: commit 064ff66e2bef84f1153087612032b5b9eab005bd "bonding: add missing netdev_update_lockdep_key()" commit 53d374979ef147ab51f5d632dfe20b14aebeccd0 "net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()" commit 1f26c0d3d24125992ab0026b0dab16c08df947c7 "net: fix kernel-doc warning in <linux/netdevice.h>" commit ab92d68fc22f9afab480153bd82a20f6e2533769 "net: core: add generic lockdep keys" but keeps the addr_list_lock_key because we still lock addr_list_lock nestedly on stack devices, unlikely xmit_lock this is safe because we don't take addr_list_lock on any fast path. Reported-and-tested-by: syzbot+aaa6fa4949cc5d9b7b25@syzkaller.appspotmail.com Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-04-08l2tp: Allow management of tunnels and session in user namespaceMichael Weiß1-8/+8
Creation and management of L2TPv3 tunnels and session through netlink requires CAP_NET_ADMIN. However, a process with CAP_NET_ADMIN in a non-initial user namespace gets an EPERM due to the use of the genetlink GENL_ADMIN_PERM flag. Thus, management of L2TP VPNs inside an unprivileged container won't work. We replaced the GENL_ADMIN_PERM by the GENL_UNS_ADMIN_PERM flag similar to other network modules which also had this problem, e.g., openvswitch (commit 4a92602aa1cd "openvswitch: allow management from inside user namespaces") and nl80211 (commit 5617c6cd6f844 "nl80211: Allow privileged operations from user namespaces"). I tested this in the container runtime trustm3 (trustm3.github.io) and was able to create l2tp tunnels and sessions in unpriviliged (user namespaced) containers using a private network namespace. For other runtimes such as docker or lxc this should work, too. Signed-off-by: Michael Weiß <michael.weiss@aisec.fraunhofer.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-28l2tp: Replace zero-length array with flexible-array memberGustavo A. R. Silva1-1/+1
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] Lastly, fix the following checkpatch warning: CHECK: Prefer kernel type 'u8' over 'uint8_t' #50: FILE: net/l2tp/l2tp_core.h:119: + uint8_t priv[]; /* private data */ This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-04l2tp: Allow duplicate session creation with UDPRidge Kennedy1-1/+6
In the past it was possible to create multiple L2TPv3 sessions with the same session id as long as the sessions belonged to different tunnels. The resulting sessions had issues when used with IP encapsulated tunnels, but worked fine with UDP encapsulated ones. Some applications began to rely on this behaviour to avoid having to negotiate unique session ids. Some time ago a change was made to require session ids to be unique across all tunnels, breaking the applications making use of this "feature". This change relaxes the duplicate session id check to allow duplicates if both of the colliding sessions belong to UDP encapsulated tunnels. Fixes: dbdbc73b4478 ("l2tp: fix duplicate session creation") Signed-off-by: Ridge Kennedy <ridge.kennedy@alliedtelesis.co.nz> Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-03l2tp: Remove redundant BUG_ON() check in l2tp_pernetXu Wang1-2/+0
Passing NULL to l2tp_pernet causes a crash via BUG_ON. Dereferencing net in net_generic() also has the same effect. This patch removes the redundant BUG_ON check on the same parameter. Signed-off-by: Xu Wang <vulab@iscas.ac.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-04net: ipv6: add net argument to ip6_dst_lookup_flowSabrina Dubroca1-1/+1
This will be used in the conversion of ipv6_stub to ip6_dst_lookup_flow, as some modules currently pass a net argument without a socket to ip6_dst_lookup. This is equivalent to commit 343d60aada5a ("ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument"). Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-24net: core: add generic lockdep keysTaehee Yoo1-1/+0
Some interface types could be nested. (VLAN, BONDING, TEAM, MACSEC, MACVLAN, IPVLAN, VIRT_WIFI, VXLAN, etc..) These interface types should set lockdep class because, without lockdep class key, lockdep always warn about unexisting circular locking. In the current code, these interfaces have their own lockdep class keys and these manage itself. So that there are so many duplicate code around the /driver/net and /net/. This patch adds new generic lockdep keys and some helper functions for it. This patch does below changes. a) Add lockdep class keys in struct net_device - qdisc_running, xmit, addr_list, qdisc_busylock - these keys are used as dynamic lockdep key. b) When net_device is being allocated, lockdep keys are registered. - alloc_netdev_mqs() c) When net_device is being free'd llockdep keys are unregistered. - free_netdev() d) Add generic lockdep key helper function - netdev_register_lockdep_key() - netdev_unregister_lockdep_key() - netdev_update_lockdep_key() e) Remove unnecessary generic lockdep macro and functions f) Remove unnecessary lockdep code of each interfaces. After this patch, each interface modules don't need to maintain their lockdep keys. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-01netfilter: drop bridge nf reset from nf_resetFlorian Westphal4-4/+4
commit 174e23810cd31 ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi recycle always drop skb extensions. The additional skb_ext_del() that is performed via nf_reset on napi skb recycle is not needed anymore. Most nf_reset() calls in the stack are there so queued skb won't block 'rmmod nf_conntrack' indefinitely. This removes the skb_ext_del from nf_reset, and renames it to a more fitting nf_reset_ct(). In a few selected places, add a call to skb_ext_reset to make sure that no active extensions remain. I am submitting this for "net", because we're still early in the release cycle. The patch applies to net-next too, but I think the rename causes needless divergence between those trees. Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-07-30compat_ioctl: pppoe: fix PPPOEIOCSFWD handlingArnd Bergmann1-0/+3
Support for handling the PPPOEIOCSFWD ioctl in compat mode was added in linux-2.5.69 along with hundreds of other commands, but was always broken sincen only the structure is compatible, but the command number is not, due to the size being sizeof(size_t), or at first sizeof(sizeof((struct sockaddr_pppox)), which is different on 64-bit architectures. Guillaume Nault adds: And the implementation was broken until 2016 (see 29e73269aa4d ("pppoe: fix reference counting in PPPoE proxy")), and nobody ever noticed. I should probably have removed this ioctl entirely instead of fixing it. Clearly, it has never been used. Fix it by adding a compat_ioctl handler for all pppoe variants that translates the command number and then calls the regular ioctl function. All other ioctl commands handled by pppoe are compatible between 32-bit and 64-bit, and require compat_ptr() conversion. This should apply to all stable kernels. Acked-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-08ipv6: elide flowlabel check if no exclusive leases existWillem de Bruijn1-2/+2
Processes can request ipv6 flowlabels with cmsg IPV6_FLOWINFO. If not set, by default an autogenerated flowlabel is selected. Explicit flowlabels require a control operation per label plus a datapath check on every connection (every datagram if unconnected). This is particularly expensive on unconnected sockets multiplexing many flows, such as QUIC. In the common case, where no lease is exclusive, the check can be safely elided, as both lease request and check trivially succeed. Indeed, autoflowlabel does the same even with exclusive leases. Elide the check if no process has requested an exclusive lease. fl6_sock_lookup previously returns either a reference to a lease or NULL to denote failure. Modify to return a real error and update all callers. On return NULL, they can use the label and will elide the atomic_dec in fl6_sock_release. This is an optimization. Robust applications still have to revert to requesting leases if the fast path fails due to an exclusive lease. Changes RFC->v1: - use static_key_false_deferred to rate limit jump label operations - call static_key_deferred_flush to stop timers on exit - move decrement out of RCU context - defer optimization also if opt data is associated with a lease - updated all fp6_sock_lookup callers, not just udp Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-12/+3
Minor SPDX change conflict. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-06-19treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500Thomas Gleixner3-12/+3
Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-14l2tp: no need to check return value of debugfs_create functionsGreg Kroah-Hartman1-18/+3
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Also, there is no need to store the individual debugfs file name, just remove the whole directory all at once, saving a local variable. Cc: "David S. Miller" <davem@davemloft.net> Cc: Guillaume Nault <g.nault@alphalink.fr> Cc: netdev@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-30treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152Thomas Gleixner5-25/+5
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-21treewide: Add SPDX license identifier - Makefile/KconfigThomas Gleixner1-0/+1
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-1/+2
Minor conflict with the DSA legacy code removal. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07l2tp: Fix possible NULL pointer dereferenceYueHaibing1-1/+2
BUG: unable to handle kernel NULL pointer dereference at 0000000000000128 PGD 0 P4D 0 Oops: 0000 [#1 CPU: 0 PID: 5697 Comm: modprobe Tainted: G W 5.1.0-rc7+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:__lock_acquire+0x53/0x10b0 Code: 8b 1c 25 40 5e 01 00 4c 8b 6d 10 45 85 e4 0f 84 bd 06 00 00 44 8b 1d 7c d2 09 02 49 89 fe 41 89 d2 45 85 db 0f 84 47 02 00 00 <48> 81 3f a0 05 70 83 b8 00 00 00 00 44 0f 44 c0 83 fe 01 0f 86 3a RSP: 0018:ffffc90001c07a28 EFLAGS: 00010002 RAX: 0000000000000000 RBX: ffff88822f038440 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000128 RBP: ffffc90001c07a88 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001 R13: 0000000000000000 R14: 0000000000000128 R15: 0000000000000000 FS: 00007fead0811540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000128 CR3: 00000002310da000 CR4: 00000000000006f0 Call Trace: ? __lock_acquire+0x24e/0x10b0 lock_acquire+0xdf/0x230 ? flush_workqueue+0x71/0x530 flush_workqueue+0x97/0x530 ? flush_workqueue+0x71/0x530 l2tp_exit_net+0x170/0x2b0 [l2tp_core ? l2tp_exit_net+0x93/0x2b0 [l2tp_core ops_exit_list.isra.6+0x36/0x60 unregister_pernet_operations+0xb8/0x110 unregister_pernet_device+0x25/0x40 l2tp_init+0x55/0x1000 [l2tp_core ? 0xffffffffa018d000 do_one_initcall+0x6c/0x3cc ? do_init_module+0x22/0x1f1 ? rcu_read_lock_sched_held+0x97/0xb0 ? kmem_cache_alloc_trace+0x325/0x3b0 do_init_module+0x5b/0x1f1 load_module+0x1db1/0x2690 ? m_show+0x1d0/0x1d0 __do_sys_finit_module+0xc5/0xd0 __x64_sys_finit_module+0x15/0x20 do_syscall_64+0x6b/0x1d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fead031a839 Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe8d9acca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 0000560078398b80 RCX: 00007fead031a839 RDX: 0000000000000000 RSI: 000056007659dc2e RDI: 0000000000000003 RBP: 000056007659dc2e R08: 0000000000000000 R09: 0000560078398b80 R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000 R13: 00005600783a04a0 R14: 0000000000040000 R15: 0000560078398b80 Modules linked in: l2tp_core(+) e1000 ip_tables ipv6 [last unloaded: l2tp_core CR2: 0000000000000128 ---[ end trace 8322b2b8bf83f8e1 If alloc_workqueue fails in l2tp_init, l2tp_net_ops is unregistered on failure path. Then l2tp_exit_net is called which will flush NULL workqueue, this patch add a NULL check to fix it. Fixes: 67e04c29ec0d ("l2tp: unregister l2tp_net_ops on failure path") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-5/+5
Three trivial overlapping conflicts. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-30l2ip: fix possible use-after-freeEric Dumazet1-4/+4
Before taking a refcount on a rcu protected structure, we need to make sure the refcount is not zero. syzbot reported : refcount_t: increment on 0; use-after-free. WARNING: CPU: 1 PID: 23533 at lib/refcount.c:156 refcount_inc_checked lib/refcount.c:156 [inline] WARNING: CPU: 1 PID: 23533 at lib/refcount.c:156 refcount_inc_checked+0x61/0x70 lib/refcount.c:154 Kernel panic - not syncing: panic_on_warn set ... CPU: 1 PID: 23533 Comm: syz-executor.2 Not tainted 5.1.0-rc7+ #93 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 panic+0x2cb/0x65c kernel/panic.c:214 __warn.cold+0x20/0x45 kernel/panic.c:571 report_bug+0x263/0x2b0 lib/bug.c:186 fixup_bug arch/x86/kernel/traps.c:179 [inline] fixup_bug arch/x86/kernel/traps.c:174 [inline] do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:272 do_invalid_op+0x37/0x50 arch/x86/kernel/traps.c:291 invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:973 RIP: 0010:refcount_inc_checked lib/refcount.c:156 [inline] RIP: 0010:refcount_inc_checked+0x61/0x70 lib/refcount.c:154 Code: 1d 98 2b 2a 06 31 ff 89 de e8 db 2c 40 fe 84 db 75 dd e8 92 2b 40 fe 48 c7 c7 20 7a a1 87 c6 05 78 2b 2a 06 01 e8 7d d9 12 fe <0f> 0b eb c1 90 90 90 90 90 90 90 90 90 90 90 55 48 89 e5 41 57 41 RSP: 0018:ffff888069f0fba8 EFLAGS: 00010286 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 000000000000f353 RSI: ffffffff815afcb6 RDI: ffffed100d3e1f67 RBP: ffff888069f0fbb8 R08: ffff88809b1845c0 R09: ffffed1015d23ef1 R10: ffffed1015d23ef0 R11: ffff8880ae91f787 R12: ffff8880a8f26968 R13: 0000000000000004 R14: dffffc0000000000 R15: ffff8880a49a6440 l2tp_tunnel_inc_refcount net/l2tp/l2tp_core.h:240 [inline] l2tp_tunnel_get+0x250/0x580 net/l2tp/l2tp_core.c:173 pppol2tp_connect+0xc00/0x1c70 net/l2tp/l2tp_ppp.c:702 __sys_connect+0x266/0x330 net/socket.c:1808 __do_sys_connect net/socket.c:1819 [inline] __se_sys_connect net/socket.c:1816 [inline] __x64_sys_connect+0x73/0xb0 net/socket.c:1816 Fixes: 54652eb12c1b ("l2tp: hold tunnel while looking up sessions in l2tp_netlink") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27genetlink: optionally validate strictly/dumpsJohannes Berg1-0/+9
Add options to strictly validate messages and dump messages, sometimes perhaps validating dump messages non-strictly may be required, so add an option for that as well. Since none of this can really be applied to existing commands, set the options everwhere using the following spatch: @@ identifier ops; expression X; @@ struct genl_ops ops[] = { ..., { .cmd = X, + .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP, ... }, ... }; For new commands one should just not copy the .validate 'opt-out' flags and thus get strict validation. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-27netlink: make nla_nest_start() add NLA_F_NESTED flagMichal Kubecek1-2/+2
Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most netlink based interfaces (including recently added ones) are still not setting it in kernel generated messages. Without the flag, message parsers not aware of attribute semantics (e.g. wireshark dissector or libmnl's mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display the structure of their contents. Unfortunately we cannot just add the flag everywhere as there may be userspace applications which check nlattr::nla_type directly rather than through a helper masking out the flags. Therefore the patch renames nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start() as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually are rewritten to use nla_nest_start(). Except for changes in include/net/netlink.h, the patch was generated using this semantic patch: @@ expression E1, E2; @@ -nla_nest_start(E1, E2) +nla_nest_start_noflag(E1, E2) @@ expression E1, E2; @@ -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED) +nla_nest_start(E1, E2) Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-26l2tp: use rcu_dereference_sk_user_data() in l2tp_udp_encap_recv()Eric Dumazet1-1/+1
Canonical way to fetch sk_user_data from an encap_rcv() handler called from UDP stack in rcu protected section is to use rcu_dereference_sk_user_data(), otherwise compiler might read it multiple times. Fixes: d00fa9adc528 ("il2tp: fix races with tunnel socket close") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-19net: rework SIOCGSTAMP ioctl handlingArnd Bergmann2-0/+2
The SIOCGSTAMP/SIOCGSTAMPNS ioctl commands are implemented by many socket protocol handlers, and all of those end up calling the same sock_get_timestamp()/sock_get_timestampns() helper functions, which results in a lot of duplicate code. With the introduction of 64-bit time_t on 32-bit architectures, this gets worse, as we then need four different ioctl commands in each socket protocol implementation. To simplify that, let's add a new .gettstamp() operation in struct proto_ops, and move ioctl implementation into the common sock_ioctl()/compat_sock_ioctl_trans() functions that these all go through. We can reuse the sock_get_timestamp() implementation, but generalize it so it can deal with both native and compat mode, as well as timeval and timespec structures. Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/lkml/CAK8P3a038aDQQotzua_QtKGhq8O9n+rdiz2=WDCp82ys8eUT+A@mail.gmail.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-18l2tp: fix set but not used variableJakub Kicinski1-2/+1
GCC complains: net/l2tp/l2tp_ppp.c: In function ‘pppol2tp_ioctl’: net/l2tp/l2tp_ppp.c:1073:6: warning: variable ‘val’ set but not used [-Wunused-but-set-variable] int val; ^~~ Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-22genetlink: make policy common to familyJohannes Berg1-9/+1
Since maxattr is common, the policy can't really differ sanely, so make it common as well. The only user that did in fact manage to make a non-common policy is taskstats, which has to be really careful about it (since it's still using a common maxattr!). This is no longer supported, but we can fake it using pre_doit. This reduces the size of e.g. nl80211.o (which has lots of commands): text data bss dec hex filename 398745 14323 2240 415308 6564c net/wireless/nl80211.o (before) 397913 14331 2240 414484 65314 net/wireless/nl80211.o (after) -------------------------------- -832 +8 0 -824 Which is obviously just 8 bytes for each command, and an added 8 bytes for the new policy pointer. I'm not sure why the ops list is counted as .text though. Most of the code transformations were done using the following spatch: @ops@ identifier OPS; expression POLICY; @@ struct genl_ops OPS[] = { ..., { - .policy = POLICY, }, ... }; @@ identifier ops.OPS; expression ops.POLICY; identifier fam; expression M; @@ struct genl_family fam = { .ops = OPS, .maxattr = M, + .policy = POLICY, ... }; This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing the cb->data as ops, which we want to change in a later genl patch. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-13l2tp: fix infoleak in l2tp_ip6_recvmsg()Eric Dumazet1-3/+1
Back in 2013 Hannes took care of most of such leaks in commit bceaa90240b6 ("inet: prevent leakage of uninitialized memory to user in recv syscalls") But the bug in l2tp_ip6_recvmsg() has not been fixed. syzbot report : BUG: KMSAN: kernel-infoleak in _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32 CPU: 1 PID: 10996 Comm: syz-executor362 Not tainted 5.0.0+ #11 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x173/0x1d0 lib/dump_stack.c:113 kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:600 kmsan_internal_check_memory+0x9f4/0xb10 mm/kmsan/kmsan.c:694 kmsan_copy_to_user+0xab/0xc0 mm/kmsan/kmsan_hooks.c:601 _copy_to_user+0x16b/0x1f0 lib/usercopy.c:32 copy_to_user include/linux/uaccess.h:174 [inline] move_addr_to_user+0x311/0x570 net/socket.c:227 ___sys_recvmsg+0xb65/0x1310 net/socket.c:2283 do_recvmmsg+0x646/0x10c0 net/socket.c:2390 __sys_recvmmsg net/socket.c:2469 [inline] __do_sys_recvmmsg net/socket.c:2492 [inline] __se_sys_recvmmsg+0x1d1/0x350 net/socket.c:2485 __x64_sys_recvmmsg+0x62/0x80 net/socket.c:2485 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291 entry_SYSCALL_64_after_hwframe+0x63/0xe7 RIP: 0033:0x445819 Code: e8 6c b6 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 2b 12 fc ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:00007f64453eddb8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b RAX: ffffffffffffffda RBX: 00000000006dac28 RCX: 0000000000445819 RDX: 0000000000000005 RSI: 0000000020002f80 RDI: 0000000000000003 RBP: 00000000006dac20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dac2c R13: 00007ffeba8f87af R14: 00007f64453ee9c0 R15: 20c49ba5e353f7cf Local variable description: ----addr@___sys_recvmsg Variable was created at: ___sys_recvmsg+0xf6/0x1310 net/socket.c:2244 do_recvmmsg+0x646/0x10c0 net/socket.c:2390 Bytes 0-31 of 32 are uninitialized Memory access of size 32 starts at ffff8880ae62fbb0 Data copied to user address 0000000020000000 Fixes: a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-31l2tp: copy 4 more bytes to linear part if necessaryJacob Wen1-3/+2
The size of L2TPv2 header with all optional fields is 14 bytes. l2tp_udp_recv_core only moves 10 bytes to the linear part of a skb. This may lead to l2tp_recv_common read data outside of a skb. This patch make sure that there is at least 14 bytes in the linear part of a skb to meet the maximum need of l2tp_udp_recv_core and l2tp_recv_common. The minimum size of both PPP HDLC-like frame and Ethernet frame is larger than 14 bytes, so we are safe to do so. Also remove L2TP_HDR_SIZE_NOSEQ, it is unused now. Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Suggested-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jacob Wen <jian.w.wen@oracle.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-01-30l2tp: fix reading optional fields of L2TPv3Jacob Wen4-0/+30
Use pskb_may_pull() to make sure the optional fields are in skb linear parts, so we can safely read them later. It's easy to reproduce the issue with a net driver that supports paged skb data. Just create a L2TPv3 over IP tunnel and then generates some network traffic. Once reproduced, rx err in /sys/kernel/debug/l2tp/tunnels will increase. Changes in v4: 1. s/l2tp_v3_pull_opt/l2tp_v3_ensure_opt_in_linear/ 2. s/tunnel->version != L2TP_HDR_VER_2/tunnel->version == L2TP_HDR_VER_3/ 3. Add 'Fixes' in commit messages. Changes in v3: 1. To keep consistency, move the code out of l2tp_recv_common. 2. Use "net" instead of "net-next", since this is a bug fix. Changes in v2: 1. Only fix L2TPv3 to make code simple. To fix both L2TPv3 and L2TPv2, we'd better refactor l2tp_recv_common. It's complicated to do so. 2. Reloading pointers after pskb_may_pull Fixes: f7faffa3ff8e ("l2tp: Add L2TPv3 protocol support") Fixes: 0d76751fad77 ("l2tp: Add L2TPv3 IP encapsulation (no UDP) support") Fixes: a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6") Signed-off-by: Jacob Wen <jian.w.wen@oracle.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-11-14l2tp: fix a sock refcnt leak in l2tp_tunnel_registerXin Long1-5/+4
This issue happens when trying to add an existent tunnel. It doesn't call sock_put() before returning -EEXIST to release the sock refcnt that was held by calling sock_hold() before the existence check. This patch is to fix it by holding the sock after doing the existence check. Fixes: f6cd651b056f ("l2tp: fix race in duplicate tunnel detection") Reported-by: Jianlin Shi <jishi@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>