aboutsummaryrefslogtreecommitdiffstats
AgeCommit message (Collapse)AuthorFilesLines
2025-11-26ACPI: processor_core: fix map_x2apic_id for amd-pstate on am4René Rebe1-1/+1
On all AMD AM4 systems I have seen, e.g ASUS X470-i, Pro WS X570 Ace and equivalent Gigabyte, amd-pstate does not initialize when the x2apic is enabled in the BIOS. Kernel debug messages include: [ 0.315438] acpi LNXCPU:00: Failed to get CPU physical ID. [ 0.354756] ACPI CPPC: No CPC descriptor for CPU:0 [ 0.714951] amd_pstate: the _CPC object is not present in SBIOS or ACPI disabled I tracked this down to map_x2apic_id() checking device_declaration passed in via the type argument of acpi_get_phys_id() via map_madt_entry() while map_lapic_id() does not. It appears these BIOSes use Processor statements for declaring the CPUs in the ACPI namespace instead of processor device objects (which should have been used). CPU declarations via Processor statements were deprecated in ACPI 6.0 that was released 10 years ago. They should not be used any more in any contemporary platform firmware. I tried to contact Asus support multiple times, but never received a reply nor did any BIOS update ever change this. Fix amd-pstate w/ x2apic on am4 by allowing map_x2apic_id() to work with CPUs declared via Processor statements for IDs less than 255, which is consistent with ACPI 5.0 that still allowed Processor statements to be used for declaring CPUs. Fixes: 7237d3de78ff ("x86, ACPI: add support for x2apic ACPI extensions") Signed-off-by: René Rebe <rene@exactco.de> [ rjw: Changelog edits ] Link: https://patch.msgid.link/20251126.165513.1373131139292726554.rene@exactco.de Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-26watchdog: diag288_wdt: Remove KMSG_COMPONENT macroHeiko Carstens1-2/+1
The KMSG_COMPONENT macro is a leftover of the s390 specific "kernel message catalog" from 2008 [1] which never made it upstream. The macro was added to s390 code to allow for an out-of-tree patch which used this to generate unique message ids. Also this out-of-tree doesn't exist anymore. Remove the macro in order to get rid of a pointless indirection. [1] https://lwn.net/Articles/292650/ Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-26net/sched: em_canid: fix uninit-value in em_canid_matchShaurya Rane1-0/+3
Use pskb_may_pull() to ensure a complete CAN frame is present in the linear data buffer before reading the CAN ID. A simple skb->len check is insufficient because it only verifies the total data length but does not guarantee the data is present in skb->data (it could be in fragments). pskb_may_pull() both validates the length and pulls fragmented data into the linear buffer if necessary, making it safe to directly access skb->data. Reported-by: syzbot+5d8269a1e099279152bc@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=5d8269a1e099279152bc Fixes: f057bbb6f9ed ("net: em_canid: Ematch rule to match CAN frames according to their identifiers") Signed-off-by: Shaurya Rane <ssrane_b23@ee.vjti.ac.in> Link: https://patch.msgid.link/20251126085718.50808-1-ssranevjti@gmail.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: rcar_canfd: Fix CAN-FD mode as defaultBiju Das1-22/+31
The commit 5cff263606a1 ("can: rcar_canfd: Fix controller mode setting") has aligned with the flow mentioned in the hardware manual for all SoCs except R-Car Gen3 and RZ/G2L SoCs. On R-Car Gen4 and RZ/G3E SoCs, due to the wrong logic in the commit[1] sets the default mode to FD-Only mode instead of CAN-FD mode. This patch sets the CAN-FD mode as the default for all SoCs by dropping the rcar_canfd_set_mode() as some SoC requires mode setting in global reset mode, and the rest of the SoCs in channel reset mode and update the rcar_canfd_reset_controller() to take care of these constraints. Moreover, the RZ/G3E and R-Car Gen4 SoCs support 3 modes compared to 2 modes on the R-Car Gen3. Use inverted logic in rcar_canfd_reset_controller() to simplify the code later to support FD-only mode. [1] commit 45721c406dcf ("can: rcar_canfd: Add support for r8a779a0 SoC") Fixes: 5cff263606a1 ("can: rcar_canfd: Fix controller mode setting") Cc: stable@vger.kernel.org Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Link: https://patch.msgid.link/20251118123926.193445-1-biju.das.jz@bp.renesas.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26thermal/drivers/imx91: Add support for i.MX91 thermal monitoring unitPengfei Li3-0/+395
Introduce support for the i.MX91 thermal monitoring unit, which features a single sensor for the CPU. The register layout differs from other chips, necessitating the creation of a dedicated file for this. This sensor provides a resolution of 1/64°C (6-bit fraction). For actual accuracy, refer to the datasheet, as it varies depending on the chip grade. Provide an interrupt for end of measurement and threshold violation and Contain temperature threshold comparators, in normal and secure address space, with direction and threshold programmability. Datasheet Link: https://www.nxp.com/docs/en/data-sheet/IMX91CEC.pdf Signed-off-by: Pengfei Li <pengfei.li_1@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20251020-imx91tmu-v7-2-48d7d9f25055@nxp.com
2025-11-26dt-bindings: thermal: fsl,imx91-tmu: add bindings for NXP i.MX91 thermal modulePengfei Li1-0/+87
Add bindings documentation for i.MX91 thermal modules. Signed-off-by: Pengfei Li <pengfei.li_1@nxp.com> Signed-off-by: Frank Li <Frank.Li@nxp.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://patch.msgid.link/20251020-imx91tmu-v7-1-48d7d9f25055@nxp.com
2025-11-26dt-bindings: thermal: tsens: Add QCS8300 compatibleGaurav Kohli1-0/+1
Add compatibility string for the thermal sensors on QCS8300 platform. Signed-off-by: Gaurav Kohli <quic_gkohli@quicinc.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Rob Herring (Arm) <robh@kernel.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Reviewed-by: Akhil P Oommen <akhilpo@oss.qualcomm.com> Link: https://patch.msgid.link/20250822042316.1762153-2-quic_gkohli@quicinc.com
2025-11-26Merge tag 'timers-v6.19-rc1' of ↵Thomas Gleixner14-56/+291
git://git.kernel.org/pub/scm/linux/kernel/git/daniel.lezcano/linux into timers/clocksource Pull clocksource/event changes from Daniel Lezcano: - Use 64-bits for timer compensation for IoT usage where the suspend time is much longer than what 32-bits can provide (Enlin Mu) - Add delay support on sp804 for ARM32 platforms (Stephen Eta Zhou) - Fix missing resource release on error in the probe path of in the ralink driver (Haotian Zhang) - Fix double deregistration on probe failure in the NXP STM driver (Johan Hovold) - Disable runtime PM for the Renesas SH CMT timer because it is incompatible with PREEMPT_RT=y (Niklas Söderlund) - Fix section mismatches in the NXP STM driver (Johan Hovold) - Preventing unbinding the NXP PIT, STM and MMIO ARM Arch timers as the code does not suppport bind/unbind (Johan Hovold) - Use the clocksource instead of ticks on the RDA8810PL platform (Enlin Mu) - Drop the unused module alias for the STM32-LP (Johan Hovold) - Add Realtek system timer driver (Hao-Wen Ting) Link: https://lore.kernel.org/all/9303b790-28d4-4bd9-b01d-28fb05493596@linaro.org
2025-11-26Merge patch series "fs: tidy up step_into() & friends before inlining"Christian Brauner1-14/+44
Cleanup step_into() and walk_component() and inline them both. * patches from https://patch.msgid.link/20251120003803.2979978-1-mjguzik@gmail.com: fs: inline step_into() and walk_component() fs: tidy up step_into() & friends before inlining Link: https://patch.msgid.link/20251120003803.2979978-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-26fs: inline step_into() and walk_component()Mateusz Guzik1-3/+28
The primary consumer is link_path_walk(), calling walk_component() every time which in turn calls step_into(). Inlining these saves overhead of 2 function calls per path component, along with allowing the compiler to do better job optimizing them in place. step_into() had absolutely atrocious assembly to facilitate the slowpath. In order to lessen the burden at the callsite all the hard work is moved into step_into_slowpath() and instead an inline-able fastpath is implemented for rcu-walk. The new fastpath is a stripped down step_into() RCU handling with a d_managed() check from handle_mounts(). Benchmarked as follows on Sapphire Rapids: 1. the "before" was a kernel with not-yet-merged optimizations (notably elision of calls to security_inode_permission() and marking ext4 inodes as not having acls as applicable) 2. "after" is the same + the prep patch + this patch 3. benchmark consists of issuing 205 calls to access(2) in a loop with pathnames lifted out of gcc and the linker building real code, most of which have several path components and 118 of which fail with -ENOENT. Result in terms of ops/s: before: 21619 after: 22536 (+4%) profile before: 20.25% [kernel] [k] __d_lookup_rcu 10.54% [kernel] [k] link_path_walk 10.22% [kernel] [k] entry_SYSCALL_64 6.50% libc.so.6 [.] __GI___access 6.35% [kernel] [k] strncpy_from_user 4.87% [kernel] [k] step_into 3.68% [kernel] [k] kmem_cache_alloc_noprof 2.88% [kernel] [k] walk_component 2.86% [kernel] [k] kmem_cache_free 2.14% [kernel] [k] set_root 2.08% [kernel] [k] lookup_fast after: 23.38% [kernel] [k] __d_lookup_rcu 11.27% [kernel] [k] entry_SYSCALL_64 10.89% [kernel] [k] link_path_walk 7.00% libc.so.6 [.] __GI___access 6.88% [kernel] [k] strncpy_from_user 3.50% [kernel] [k] kmem_cache_alloc_noprof 2.01% [kernel] [k] kmem_cache_free 2.00% [kernel] [k] set_root 1.99% [kernel] [k] lookup_fast 1.81% [kernel] [k] do_syscall_64 1.69% [kernel] [k] entry_SYSCALL_64_safe_stack While walk_component() and step_into() of course disappear from the profile, the link_path_walk() barely gets more overhead despite the inlining thanks to the fast path added and while completing more walks per second. I did not investigate why overhead grew a lot on __d_lookup_rcu(). Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251120003803.2979978-2-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-26fs: tidy up step_into() & friends before inliningMateusz Guzik1-13/+18
Symlink handling is already marked as unlikely and pushing out some of it into pick_link() reduces register spillage on entry to step_into() with gcc 14.2. The compiler needed additional convincing that handle_mounts() is unlikely to fail. At the same time neither clang nor gcc could be convinced to tail-call into pick_link(). While pick_link() takes an address of stack-based object as an argument (which definitely prevents the optimization), splitting it into separate <dentry, mount> tuple did not help. The issue persists even when compiled without stack protector. As such nothing was done about this for the time being to not grow the diff. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251120003803.2979978-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-26Merge patch series "re-enable IOCB_NOWAIT writes to files v2"Christian Brauner5-49/+29
Christoph Hellwig <hch@lst.de> says: [Fix] the layering bypass in btrfs when updating timestamps on device files for devices removed from btrfs usage, and FMODE_NOCMTIME handling in the VFS now that nfsd started using it. Note that I'm still not sure that nfsd usage is fully correct for all file systems, as only XFS explicitly supports FMODE_NOCMTIME, but at least the generic code does the right thing now. * patches from https://patch.msgid.link/20251120064859.2911749-1-hch@lst.de: orangefs: use inode_update_timestamps directly btrfs: fix the comment on btrfs_update_time btrfs: use vfs_utimes to update file timestamps fs: export vfs_utimes fs: lift the FMODE_NOCMTIME check into file_update_time_flags fs: refactor file timestamp update logic Link: https://patch.msgid.link/20251120064859.2911749-1-hch@lst.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-26orangefs: use inode_update_timestamps directlyChristoph Hellwig1-1/+3
Orangefs has no i_version handling and __orangefs_setattr already explicitly marks the inode dirty. So instead of the using the flags return value from generic_update_time, just call the lower level inode_update_timestamps helper directly. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251120064859.2911749-7-hch@lst.de Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-26btrfs: fix the comment on btrfs_update_timeChristoph Hellwig1-2/+2
Since commit e41f941a2311 ("Btrfs: move over to use ->update_time") this is not a copy of the high-level file_update_time helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251120064859.2911749-6-hch@lst.de Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-26btrfs: use vfs_utimes to update file timestampsChristoph Hellwig1-7/+4
Btrfs updates the device node timestamps for block device special files when it stop using the device. Commit 8f96a5bfa150 ("btrfs: update the bdev time directly when closing") switch that update from the correct layering to directly call the low-level helper on the bdev inode. This is wrong and got fixed in commit 54fde91f52f5 ("btrfs: update device path inode time instead of bd_inode") by updating the file system inode instead of the bdev inode, but this kept the incorrect bypassing of the VFS interfaces and file system ->update_times method. Fix this by using the propet vfs_utimes interface. Fixes: 8f96a5bfa150 ("btrfs: update the bdev time directly when closing") Fixes: 54fde91f52f5 ("btrfs: update device path inode time instead of bd_inode") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251120064859.2911749-5-hch@lst.de Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-26fs: export vfs_utimesChristoph Hellwig1-0/+1
This will be used to replace an incorrect direct call into generic_update_time in btrfs. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251120064859.2911749-4-hch@lst.de Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-26fs: lift the FMODE_NOCMTIME check into file_update_time_flagsChristoph Hellwig1-2/+2
FMODE_NOCMTIME used to be just a hack for the legacy XFS handle-based "invisible I/O", but commit e5e9b24ab8fa ("nfsd: freeze c/mtime updates with outstanding WRITE_ATTRS delegation") started using it from generic callers. I'm not sure other file systems are actually read for this in general, so the above commit should get a closer look, but for it to make any sense, file_update_time needs to respect the flag. Lift the check from file_modified_flags to file_update_time so that users of file_update_time inherit the behavior and so that all the checks are done in one place. Fixes: e5e9b24ab8fa ("nfsd: freeze c/mtime updates with outstanding WRITE_ATTRS delegation") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251120064859.2911749-3-hch@lst.de Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-26fs: refactor file timestamp update logicChristoph Hellwig1-37/+17
Currently the two high-level APIs use two helper functions to implement almost all of the logic. Refactor the two helpers and the common logic into a new file_update_time_flags routine that gets the iocb flags or 0 in case of file_update_time passed so that the entire logic is contained in a single function and can be easily understood and modified. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251120064859.2911749-2-hch@lst.de Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-26Merge back ACPI processor driver changes for 6.19Rafael J. Wysocki1-26/+16
2025-11-26spi: tegra114: remove Kconfig dependency on TEGRA20_APB_DMAFrancesco Lavra1-2/+2
This driver runs also on Tegra SoCs without a Tegra20 APB DMA controller (e.g. Tegra234). Remove the Kconfig dependency on TEGRA20_APB_DMA; in addition, amend the help text to reflect the fact that this driver works on SoCs different from Tegra114. Fixes: bb9667d8187b ("arm64: tegra: Add SPI device tree nodes for Tegra234") Signed-off-by: Francesco Lavra <flavra@baylibre.com> Link: https://patch.msgid.link/20251126095027.4102004-1-flavra@baylibre.com Signed-off-by: Mark Brown <broonie@kernel.org>
2025-11-26serial: 8250: Fix 8250_rsa symbol loopIlpo Järvinen4-13/+21
Depmod fails for a kernel made with: make allnoconfig echo -e "CONFIG_MODULES=y\nCONFIG_SERIAL_8250=m\nCONFIG_SERIAL_8250_EXTENDED=y\nCONFIG_SERIAL_8250_RSA=y" >> .config make olddefconfig ...due to a dependency loop: depmod: ERROR: Cycle detected: 8250 -> 8250_base -> 8250 depmod: ERROR: Found 2 modules in dependency cycles! This is caused by the move of 8250 RSA code from 8250_port.c (in 8250_base.ko) into 8250_rsa.c (in 8250.ko) by the commit 5a128fb475fb ("serial: 8250: move RSA functions to 8250_rsa.c"). The commit b20d6576cdb3 ("serial: 8250: export RSA functions") tried to fix a missing symbol issue with EXPORTs but those then cause this dependency cycle. Break dependency loop by moving 8250_rsa.o from 8250.ko to 8250_base.ko and by passing univ8250_port_base_ops to univ8250_rsa_support() that can make a local copy of it. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reported-by: Alex Davis <alex47794@gmail.com> Fixes: 5a128fb475fb ("serial: 8250: move RSA functions to 8250_rsa.c") Fixes: b20d6576cdb3 ("serial: 8250: export RSA functions") Cc: stable <stable@kernel.org> Link: https://lore.kernel.org/all/87frc3sd8d.fsf@posteo.net/ Link: https://lore.kernel.org/all/CADiockCvM6v+d+UoFZpJSMoLAdpy99_h-hJdzUsdfaWGn3W7-g@mail.gmail.com/ Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://patch.msgid.link/20251110105043.4062-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-11-26s390/entry: Use lay instead of aghikHeiko Carstens1-1/+1
Use the lay instruction instead of aghik. aghik is only available since z196, therefore compiling the kernel for z10 results in this error: arch/s390/kernel/entry.S: Assembler messages: arch/s390/kernel/entry.S:165: Error: Unrecognized opcode: `aghik' Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202511261518.nBbQN5h7-lkp@intel.com/ Fixes: f5730d44e05e ("s390: Add stackprotector support") Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-26dt-bindings: can: mpfs: document resetsConor Dooley1-0/+5
The CAN cores on Polarfire SoC both have a reset. The platform firmware brings both cores out of reset, but the linux driver must use them during normal operation. The resets should have been made required, but this is one of the things that can happen when the binding is written without driver support. Fixes: c878d518d7b6 ("dt-bindings: can: mpfs: document the mpfs CAN controller") Signed-off-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Link: https://patch.msgid.link/20251121-sample-footsore-743d81772efc@spud Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26clocksource/drivers: Add Realtek system timer driverHao-Wen Ting4-0/+167
Add a system timer driver for Realtek SoCs. This driver registers the 1 MHz global hardware counter on Realtek platforms as a clock event device. Since this hardware counter starts counting automatically after SoC power-on, no clock initialization is required. Because the counter does not stop or get affected by CPU power down, and it supports oneshot mode, it is typically used as a tick broadcast timer. Signed-off-by: Hao-Wen Ting <haowen.ting@realtek.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251126060110.198330-3-haowen.ting@realtek.com
2025-11-26dt-bindings: timer: Add Realtek SYSTIMERHao-Wen Ting1-0/+47
The Realtek SYSTIMER (System Timer) is a 64-bit global hardware counter operating at a fixed 1MHz frequency. Thanks to its compare match interrupt capability, the timer natively supports oneshot mode for tick broadcast functionality. Signed-off-by: Hao-Wen Ting <haowen.ting@realtek.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Link: https://patch.msgid.link/20251126060110.198330-2-haowen.ting@realtek.com
2025-11-26clocksource/drivers/stm32-lp: Drop unused module aliasJohan Hovold1-1/+0
The driver cannot be built as a module so drop the unused platform module alias. Note that platform aliases are not needed for OF probing should it ever become possible to build the driver as a module. Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20251111154516.1698-1-johan@kernel.org
2025-11-26clocksource/drivers/rda: Add sched_clock_register for RDA8810PL SoCEnlin Mu1-1/+8
The current system log timestamp accuracy is tick based, which can not meet the usage requirements and needs to reach nanoseconds. Therefore, the sched_clock_register function needs to be added. [ dlezcano: Fixed typos ] Signed-off-by: Enlin Mu <enlin.mu@unisoc.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20251107063347.3692-1-enlin.mu@linux.dev
2025-11-26clocksource/drivers/nxp-stm: Prevent driver unbindJohan Hovold1-1/+2
Clockevents cannot be deregistered so suppress the bind attributes to prevent the driver from being unbound and releasing the underlying resources after registration. Even if the driver can currently only be built-in, also switch to builtin_platform_driver() to prevent it from being unloaded should modular builds ever be enabled. Fixes: cec32ac75827 ("clocksource/drivers/nxp-timer: Add the System Timer Module for the s32gx platforms") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20251111153226.579-4-johan@kernel.org
2025-11-26Merge patch series "MAINTAINERS: Add myself as m_can maintainer"Marc Kleine-Budde1-5/+3
Markus Schneider-Pargmann <msp@baylibre.com> says: these two patches are updating the m_can maintainer entry, replacing the current maintainer and simplifying the files mentioned. Link: https://patch.msgid.link/20251119-topic-mcan-reviewer-v6-18-v2-0-f842c3094b18@baylibre.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26MAINTAINERS: Simplify m_can sectionMarkus Schneider-Pargmann1-4/+2
Simplify the section by using the whole m_can directory. This includes a few new files, e.g. Kconfig, Makefile, m_can_pci.c and tcan4x5x* which are all closely coupled to the m_can driver core. Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com> Link: https://patch.msgid.link/20251119-topic-mcan-reviewer-v6-18-v2-2-f842c3094b18@baylibre.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26MAINTAINERS: Add myself as m_can maintainerMarkus Schneider-Pargmann1-1/+1
As I have contributed to the m_can driver over the past years, I would like to continue maintaining the driver. As Chandrasekar is currently not responsive, I will replace him as the maintainer of the driver. Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com> Link: https://patch.msgid.link/20251119-topic-mcan-reviewer-v6-18-v2-1-f842c3094b18@baylibre.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26clocksource/drivers/nxp-pit: Prevent driver unbindJohan Hovold1-1/+2
The driver does not support unbinding (e.g. as clockevents cannot be deregistered) so suppress the bind attributes to prevent the driver from being unbound and rebound after registration (and disabling the timer when reprobing fails). Even if the driver can currently only be built-in, also switch to builtin_platform_driver() to prevent it from being unloaded should modular builds ever be enabled. Fixes: bee33f22d7c3 ("clocksource/drivers/nxp-pit: Add NXP Automotive s32g2 / s32g3 support") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20251111153226.579-3-johan@kernel.org
2025-11-26clocksource/drivers/arm_arch_timer_mmio: Prevent driver unbindJohan Hovold1-0/+2
Clockevents cannot be deregistered so suppress the bind attributes to prevent the driver from being unbound and releasing the underlying resources after registration. Fixes: 4891f01527bb ("clocksource/drivers/arm_arch_timer: Add standalone MMIO driver") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://patch.msgid.link/20251111153226.579-2-johan@kernel.org
2025-11-26clocksource/drivers/nxp-stm: Fix section mismatchesJohan Hovold1-9/+9
Platform drivers can be probed after their init sections have been discarded (e.g. on probe deferral or manual rebind through sysfs) so the probe function must not live in init. Device managed resource actions similarly cannot be discarded. The "_probe" suffix of the driver structure name prevents modpost from warning about this so replace it to catch any similar future issues. Fixes: cec32ac75827 ("clocksource/drivers/nxp-timer: Add the System Timer Module for the s32gx platforms") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: stable@vger.kernel.org # 6.16 Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20251017054943.7195-1-johan@kernel.org
2025-11-26clocksource/drivers/sh_cmt: Always leave device running after probeNiklas Söderlund1-33/+3
The CMT device can be used as both a clocksource and a clockevent provider. The driver tries to be smart and power itself on and off, as well as enabling and disabling its clock when it's not in operation. This behavior is slightly altered if the CMT is used as an early platform device in which case the device is left powered on after probe, but the clock is still enabled and disabled at runtime. This has worked for a long time, but recent improvements in PREEMPT_RT and PROVE_LOCKING have highlighted an issue. As the CMT registers itself as a clockevent provider, clockevents_register_device(), it needs to use raw spinlocks internally as this is the context of which the clockevent framework interacts with the CMT driver. However in the context of holding a raw spinlock the CMT driver can't really manage its power state or clock with calls to pm_runtime_*() and clk_*() as these calls end up in other platform drivers using regular spinlocks to control power and clocks. This mix of spinlock contexts trips a lockdep warning. ============================= [ BUG: Invalid wait context ] 6.17.0-rc3-arm64-renesas-03071-gb3c4f4122b28-dirty #21 Not tainted ----------------------------- swapper/1/0 is trying to lock: ffff00000898d180 (&dev->power.lock){-...}-{3:3}, at: __pm_runtime_resume+0x38/0x88 ccree e6601000.crypto: ARM CryptoCell 630P Driver: HW version 0xAF400001/0xDCC63000, Driver version 5.0 other info that might help us debug this: ccree e6601000.crypto: ARM ccree device initialized context-{5:5} 2 locks held by swapper/1/0: #0: ffff80008173c298 (tick_broadcast_lock){-...}-{2:2}, at: __tick_broadcast_oneshot_control+0xa4/0x3a8 #1: ffff0000089a5858 (&ch->lock){....}-{2:2} usbcore: registered new interface driver usbhid , at: sh_cmt_start+0x30/0x364 stack backtrace: CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.17.0-rc3-arm64-renesas-03071-gb3c4f4122b28-dirty #21 PREEMPT Hardware name: Renesas Salvator-X 2nd version board based on r8a77965 (DT) Call trace: show_stack+0x14/0x1c (C) dump_stack_lvl+0x6c/0x90 dump_stack+0x14/0x1c __lock_acquire+0x904/0x1584 lock_acquire+0x220/0x34c _raw_spin_lock_irqsave+0x58/0x80 __pm_runtime_resume+0x38/0x88 sh_cmt_start+0x54/0x364 sh_cmt_clock_event_set_oneshot+0x64/0xb8 clockevents_switch_state+0xfc/0x13c tick_broadcast_set_event+0x30/0xa4 __tick_broadcast_oneshot_control+0x1e0/0x3a8 tick_broadcast_oneshot_control+0x30/0x40 cpuidle_enter_state+0x40c/0x680 cpuidle_enter+0x30/0x40 do_idle+0x1f4/0x26c cpu_startup_entry+0x34/0x40 secondary_start_kernel+0x11c/0x13c __secondary_switched+0x74/0x78 For non-PREEMPT_RT builds this is not really an issue, but for PREEMPT_RT builds where normal spinlocks can sleep this might be an issue. Be cautious and always leave the power and clock running after probe. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/20251016182022.1837417-1-niklas.soderlund+renesas@ragnatech.se
2025-11-26clocksource/drivers/stm: Fix double deregistration on probe failureJohan Hovold1-3/+1
The purpose of the devm_add_action_or_reset() helper is to call the action function in case adding an action ever fails so drop the clock source deregistration from the error path to avoid deregistering twice. Fixes: cec32ac75827 ("clocksource/drivers/nxp-timer: Add the System Timer Module for the s32gx platforms") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20251017055039.7307-1-johan@kernel.org
2025-11-26clocksource/drivers/ralink: Fix resource leaks in init error pathHaotian Zhang1-2/+9
The ralink_systick_init() function does not release all acquired resources on its error paths. If irq_of_parse_and_map() or a subsequent call fails, the previously created I/O memory mapping and IRQ mapping are leaked. Add goto-based error handling labels to ensure that all allocated resources are correctly freed. Fixes: 1f2acc5a8a0a ("MIPS: ralink: Add support for systick timer found on newer ralink SoC") Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20251030090710.1603-1-vulab@iscas.ac.cn
2025-11-26clocksource/drivers/timer-sp804: Fix read_current_timer() issue when clock ↵Stephen Eta Zhou1-0/+24
source is not registered Register a valid read_current_timer() function for the SP804 timer on ARM32. On ARM32 platforms, when the SP804 timer is selected as the clocksource, the driver does not register a valid read_current_timer() function. As a result, features that rely on this API—such as rdseed—consistently return incorrect values. To fix this, a delay_timer structure is registered during the SP804 driver's initialization. The read_current_timer() function is implemented using the existing sp804_read() logic, and the timer frequency is reused from the already-initialized clocksource. Signed-off-by: Stephen Eta Zhou <stephen.eta.zhou@gmail.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20250525-sp804-fix-read_current_timer-v4-1-87a9201fa4ec@gmail.com
2025-11-26clocksource/drivers/sprd: Enable register for timer counter from 32 bit to ↵Enlin Mu1-6/+18
64 bit Using 32 bit for suspend compensation, the max compensation time is 36 hours(working clock is 32k).In some IOT devices, the suspend time may be long, even exceeding 36 hours. Therefore, a 64 bit timer counter is needed for counting. Signed-off-by: Enlin Mu <enlin.mu@unisoc.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Link: https://patch.msgid.link/20251106021830.34846-1-enlin.mu@linux.dev
2025-11-26Merge patch series "Add R-Car CAN-FD suspend/resume support"Marc Kleine-Budde1-78/+168
Biju <biju.das.au@gmail.com> says: From: Biju Das <biju.das.jz@bp.renesas.com> This patch series adds proper suspend/resume support to the Renesas R-Car CAN-FD controller driver, after the customary cleanups and fixes. It aims to fix CAN-FD operation after resume from s2ram on systems where PSCI powers down the SoC. This patch series has been tested on RZ/G3E SMARC EVK and RZ/G2L SMARC EVK. This patch series depend upon [1] [1] https://lore.kernel.org/all/20251123112326.128448-1-biju.das.jz@bp.renesas.com/ v2->v3: * Updated commit header and description for patch#3 * Collected tags. v1->v2: * Added logs from RZ/G3E * Collected tags. * Moved enabling of RAM clk from probe(). * Added RAM clk handling in rcar_canfd_global_{,de}init(). * Fixed the typo in error path of rcar_canfd_resume(). Logs from RZ/G3E: root@smarc-rzg3e:~# /canfd_t_003_all.sh [INFO] Testing can0<->can1 with bitrate 1000000 and dbitrate 4000000 [INFO] Bringing down can0 can1 [INFO] Bringing up can0 can1 [INFO] Testing can1 as producer and can0 as consumer [ 541.705921] can: controller area network core [ 541.710369] NET: Registered PF_CAN protocol family [ 541.753974] can: raw protocol [INFO] Testing can0 as producer and can1 as consumer [INFO] Testing can0<->can1 with bitrate 500000 and dbitrate 2000000 [INFO] Bringing down can0 can1 [INFO] Bringing up can0 can1 [INFO] Testing can1 as producer and can0 as consumer [INFO] Testing can0 as producer and can1 as consumer [INFO] Testing can0<->can1 with bitrate 250000 and dbitrate 1000000 [INFO] Bringing down can0 can1 [INFO] Bringing up can0 can1 [INFO] Testing can1 as producer and can0 as consumer [INFO] Testing can0 as producer and can1 as consumer EXIT|PASS|canfd_t_003.sh|[00:00:25] || bind/unbind ---------- [ 566.821475] rcar_canfd 12440000.can: can_clk rate is 80000000 [ 566.828076] rcar_canfd 12440000.can: device registered (channel 1) [ 566.834361] rcar_canfd 12440000.can: can_clk rate is 80000000 [ 566.841842] rcar_canfd 12440000.can: device registered (channel 4) [ 566.848093] rcar_canfd 12440000.can: global operational state (canfd clk, fd mode) [INFO] Testing can0<->can1 with bitrate 1000000 and dbitrate 4000000 [INFO] Bringing down can0 can1 [INFO] Bringing up can0 can1 [INFO] Testing can1 as producer and can0 as consumer [INFO] Testing can0 as producer and can1 as consumer [INFO] Testing can0<->can1 with bitrate 500000 and dbitrate 2000000 [INFO] Bringing down can0 can1 [INFO] Bringing up can0 can1 [INFO] Testing can1 as producer and can0 as consumer [INFO] Testing can0 as producer and can1 as consumer [INFO] Testing can0<->can1 with bitrate 250000 and dbitrate 1000000 [INFO] Bringing down can0 can1 [INFO] Bringing up can0 can1 [INFO] Testing can1 as producer and can0 as consumer [INFO] Testing can0 as producer and can1 as consumer EXIT|PASS|canfd_t_003.sh|[00:00:25] || s2idle ----- [ 592.182479] PM: suspend entry (s2idle) [ 592.187031] Filesystems sync: 0.000 seconds [ 592.193221] Freezing user space processes [ 592.199425] Freezing user space processes completed (elapsed 0.002 seconds) [ 592.206450] OOM killer disabled. [ 592.209843] Freezing remaining freezable tasks [ 592.215775] Freezing remaining freezable tasks completed (elapsed 0.001 seconds) [ 592.223247] printk: Suspending console(s) (use no_console_suspend to debug) [ 592.260524] sd 0:0:0:0: [sda] Synchronizing SCSI cache [ 592.322759] renesas-gbeth 15c30000.ethernet end0: Link is Down [ 596.070955] dwmac4: Master AXI performs any burst length [ 596.072307] renesas-gbeth 15c30000.ethernet end0: No Safety Features support found [ 596.072376] renesas-gbeth 15c30000.ethernet end0: IEEE 1588-2008 Advanced Timestamp supported [ 596.077470] renesas-gbeth 15c30000.ethernet end0: configuring for phy/rgmii-id link mode [ 596.087503] dwmac4: Master AXI performs any burst length [ 596.088817] renesas-gbeth 15c40000.ethernet end1: No Safety Features support found [ 596.088881] renesas-gbeth 15c40000.ethernet end1: IEEE 1588-2008 Advanced Timestamp supported [ 596.093997] renesas-gbeth 15c40000.ethernet end1: configuring for phy/rgmii-id link mode [ 596.141986] usb usb1: root hub lost power or was reset [ 596.142031] usb usb2: root hub lost power or was reset [ 598.304525] usb 2-1: reset SuperSpeed Plus Gen 2x1 USB device number 2 using xhci-renesas-hcd [ 598.414846] OOM killer enabled. [ 598.418002] Restarting tasks: Starting [ 598.422518] Restarting tasks: Done [ 598.425999] random: crng reseeded on system resumption [ 598.431248] PM: suspend exit [ 598.661875] renesas-gbeth 15c30000.ethernet end0: Link is Up - 1Gbps/Full - flow control rx/tx [INFO] Testing can0<->can1 with bitrate 1000000 and dbitrate 4000000 [INFO] Bringing down can0 can1 [INFO] Bringing up can0 can1 [INFO] Testing can1 as producer and can0 as consumer [INFO] Testing can0 as producer and can1 as consumer [INFO] Testing can0<->can1 with bitrate 500000 and dbitrate 2000000 [INFO] Bringing down can0 can1 [INFO] Bringing up can0 can1 [INFO] Testing can1 as producer and can0 as consumer [INFO] Testing can0 as producer and can1 as consumer [INFO] Testing can0<->can1 with bitrate 250000 and dbitrate 1000000 [INFO] Bringing down can0 can1 [INFO] Bringing up can0 can1 [INFO] Testing can1 as producer and can0 as consumer [INFO] Testing can0 as producer and can1 as consumer EXIT|PASS|canfd_t_003.sh|[00:00:25] || Link: https://patch.msgid.link/20251124102837.106973-1-biju.das.jz@bp.renesas.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: rcar_canfd: Add suspend/resume supportGeert Uytterhoeven1-0/+53
On R-Car Gen3 using PSCI, s2ram powers down the SoC. After resume, the CAN-FD interface no longer works. Trying to bring it up again fails: # ip link set can0 up RTNETLINK answers: Connection timed out # dmesg ... channel 0 communication state failed Fix this by populating the (currently empty) suspend and resume callbacks, to stop/start the individual CAN-FD channels, and (de)initialize the CAN-FD controller. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Biju Das <biju.das.jz@bp.renesas.com> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Link: https://patch.msgid.link/20251124102837.106973-8-biju.das.jz@bp.renesas.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: rcar_canfd: Convert to DEFINE_SIMPLE_DEV_PM_OPS()Geert Uytterhoeven1-5/+5
Convert the Renesas R-Car CAN-FD driver from SIMPLE_DEV_PM_OPS() to DEFINE_SIMPLE_DEV_PM_OPS() and pm_sleep_ptr(). This lets us drop the __maybe_unused annotations from its suspend and resume callbacks, and reduces kernel size in case CONFIG_PM or CONFIG_PM_SLEEP is disabled. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Link: https://patch.msgid.link/20251124102837.106973-7-biju.das.jz@bp.renesas.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: rcar_canfd: Invert CAN clock and close_candev() orderGeert Uytterhoeven1-1/+1
The CAN clock is enabled before calling open_candev(), and disabled before calling close_candev(). Invert the order of the latter, to restore symmetry. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Link: https://patch.msgid.link/20251124102837.106973-6-biju.das.jz@bp.renesas.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: rcar_canfd: Extract rcar_canfd_global_{,de}init()Geert Uytterhoeven1-78/+104
Extract the code to (de)initialize global state into separate functions, for future reuse. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Link: https://patch.msgid.link/20251124102837.106973-5-biju.das.jz@bp.renesas.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: rcar_canfd: Use devm_clk_get_optional() for RAM clkBiju Das1-6/+17
Replace devm_clk_get_optional_enabled()->devm_clk_get_optional() as the RAM clk needs to be enabled in resume for proper operation in STR mode for RZ/G3E SoC. Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://patch.msgid.link/20251124102837.106973-4-biju.das.jz@bp.renesas.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: rcar_canfd: Invert global vs. channel teardownGeert Uytterhoeven1-3/+3
Global state is initialized and torn down before per-channel state. Invert the order to restore symmetry. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Vincent Mailhol <mailhol@kernel.org> Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Link: https://patch.msgid.link/20251124102837.106973-3-biju.das.jz@bp.renesas.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: rcar_canfd: Invert reset assert orderGeert Uytterhoeven1-2/+2
The two resets are asserted during cleanup in the same order as they were deasserted during probe. Invert the order to restore symmetry. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Vincent Mailhol <mailhol@kernel.org> Reviewed-by: Biju Das <biju.das.jz@bp.renesas.com> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Link: https://patch.msgid.link/20251124102837.106973-2-biju.das.jz@bp.renesas.com Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26Merge patch series "can: netlink: add CAN XL support"Marc Kleine-Budde11-88/+990
Marc Kleine-Budde <mkl@pengutronix.de> says: Similarly to how CAN FD reuses the bittiming logic of Classical CAN, CAN XL also reuses the entirety of CAN FD features, and, on top of that, adds new features which are specific to CAN XL. A so-called 'mixed-mode' is intended to have (XL-tolerant) CAN FD nodes and CAN XL nodes on one CAN segment, where the FD-controllers can talk CC/FD and the XL-controllers can talk CC/FD/XL. This mixed-mode utilizes the known error-signalling (ES) for sending CC/FD/XL frames. For CAN FD and CAN XL the tranceiver delay compensation (TDC) is supported to use common CAN and CAN-SIG transceivers. The CANXL-only mode disables the error-signalling in the CAN XL controller. This mode does not allow CC/FD frames to be sent but additionally offers a CAN XL transceiver mode switching (TMS) to send CAN XL frames with up to 20Mbit/s data rate. The TMS utilizes a PWM configuration which is added to the netlink interface. Configured with CAN_CTRLMODE_FD and CAN_CTRLMODE_XL this leads to: FD=0 XL=0 CC-only mode (ES=1) FD=1 XL=0 FD/CC mixed-mode (ES=1) FD=1 XL=1 XL/FD/CC mixed-mode (ES=1) FD=0 XL=1 XL-only mode (ES=0, TMS optional) Patch #1 print defined ctrlmode strings capitalized to increase the readability and to be in line with the 'ip' tool (iproute2). Patch #2 is a small clean-up which makes can_calc_bittiming() use NL_SET_ERR_MSG() instead of netdev_err(). Patch #3 adds a check in can_dev_dropped_skb() to drop CAN FD frames when CAN FD is turned off. Patch #4 adds CAN_CTRLMODE_RESTRICTED. Note that contrary to the other CAN_CTRL_MODE_XL_* that are introduced in the later patches, this control mode is not specific to CAN XL. The nuance is that because this restricted mode was only added in ISO 11898-1:2024, it is made mandatory for CAN XL devices but optional for other protocols. This is why this patch is added as a preparation before introducing the core CAN XL logic. Patch #5 adds all the CAN XL features which are inherited from CAN FD: the nominal bittiming, the data bittiming and the TDC. Patch #6 add a new CAN_CTRLMODE_XL_TMS control mode which is specific to CAN XL to enable the transceiver mode switching (TMS) in XL-only mode. Patch #7 adds a check in can_dev_dropped_skb() to drop CAN CC/FD frames when the CAN XL controller is in CAN XL-only mode. The introduced can_dev_in_xl_only_mode() function also determines the error-signalling configuration for the CAN XL controllers. Patch #8 to #11 add the PWM logic for the CAN XL TMS mode. Patch #12 to #14 add different default sample-points for standard CAN and CAN SIG transceivers (with TDC) and CAN XL transceivers using PWM in the CAN XL TMS mode. Patch #15 add a dummy_can driver for netlink testing and debugging. Patch #16 check CAN frame type (CC/FD/XL) when writing those frames to the CAN_RAW socket and reject them if it's not supported by the CAN interface. Patch #17 increase the resolution when printing the bitrate error and round-up the value to 0.01% in the case the resolution would still provide values which would lead to 0.00%. Link: https://patch.msgid.link/20251126-canxl-v8-0-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: dev: print bitrate error with two decimal digitsOliver Hartkopp1-6/+9
Increase the resolution when printing the bitrate error and round-up the value to 0.01% in the case the resolution would still provide values which would lead to 0.00%. Suggested-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-17-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: raw: instantly reject unsupported CAN framesOliver Hartkopp1-8/+46
For real CAN interfaces the CAN_CTRLMODE_FD and CAN_CTRLMODE_XL control modes indicate whether an interface can handle those CAN FD/XL frames. In the case a CAN XL interface is configured in CANXL-only mode with disabled error-signalling neither CAN CC nor CAN FD frames can be sent. The checks are performed on CAN_RAW sockets to give an instant feedback to the user when writing unsupported CAN frames to the interface. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-16-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: add dummy_can driverVincent Mailhol3-0/+303
During the development of CAN XL, we found the need of creating a dummy CAN XL driver in order to test the new netlink interface. While this code was initially intended to be some throwaway, it received some positive feedback. Add the dummy_can driver. This driver acts similarly to the vcan interface in the sense that it will echo back any packet it receives. The difference is that it exposes a set on bittiming parameters as a real device would and thus must be configured as if it was a real physical interface. The driver comes with a debug mode. If debug message are enabled (for example by enabling CONFIG_CAN_DEBUG_DEVICES), it will print in the kernel log all the bittiming values, similar to what a: ip --details link show can0 would do. This driver is mostly intended for debugging and testing, but some developers also may want to look at it as a simple reference implementation. Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-15-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: calc_bittiming: add can_calc_sample_point_pwm()Vincent Mailhol1-0/+18
The optimum sample point value depends on the bit symmetry. The more asymmetric the bit is, the more the sample point would be located towards the end of the bit. On the contrary, if the transceiver only has a small asymmetry, the optimal sample point would be slightly after the centre of the bit. For NRZ encoding (used by Classical CAN, CAN FD and CAN XL with TMS off), the optimum sample points values are above 70% as implemented in can_calc_sample_point_nrz(). When TMS is on, CAN XL optimum sample points are near to 50% or 60% [1]. Add can_calc_sample_point_pwm() which returns a sample point which is suitable for PWM encoding. We crafted the formula to make it return the same values as below table (source: table 3 of [1]). Bit rate (Mbits/s) Sample point ------------------------------------- 2.0 51.3% 5.0 53.1% 8.0 55.0% 10.0 56.3% 12.3 53.8% 13.3 58.3% 14.5 54.5% 16.0 60.0% 17.7 55.6% 20.0 62.5% The calculation simply consists of setting a slightly too high sample point and then letting can_update_sample_point() correct the values. For now, it is just a formula up our sleeves which matches the empirical observations of [1]. Once CiA recommendations become available, can_calc_sample_point_pwm() should be updated accordingly. [1] CAN XL system design: Clock tolerances and edge deviations edge deviations Link: https://www.can-cia.org/fileadmin/cia/documents/publications/cnlm/december_2024/cnlm_24-4_p18_can_xl_system_design_clock_tolerances_and_edge_deviations_dr_arthur_mutter_bosch.pdf Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-14-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: calc_bittiming: add can_calc_sample_point_nrz()Vincent Mailhol1-10/+15
CAN XL optimal sample point for PWM encoding (when TMS is on) differs from the NRZ optimal one. There is thus a need to calculate a different sample point depending whether TMS is on or off. This is a preparation change: move the sample point calculation from can_calc_bittiming() into the new can_calc_sample_point_nrz() function. In an upcoming change, a function will be added to calculate the sample point for PWM encoding. Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-13-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: calc_bittiming: replace misleading "nominal" by "reference"Vincent Mailhol1-13/+13
The functions can_update_sample_point() and can_calc_bittiming() are generic and meant to be used for both the nominal and the data bittiming calculation. However, those functions use misleading terminologies such as "bitrate nominal" or "sample point nominal". Replace all places where the word "nominal" appears with "reference" in order to better distinguish it from the calculated values. Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-12-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: netlink: add PWM netlink interfaceVincent Mailhol2-2/+215
When the TMS is switched on, the node uses PWM (Pulse Width Modulation) during the data phase instead of the classic NRZ (Non Return to Zero) encoding. PWM is configured by three parameters: - PWMS: Pulse Width Modulation Short phase - PWML: Pulse Width Modulation Long phase - PWMO: Pulse Width Modulation Offset time For each of these parameters, define three IFLA symbols: - IFLA_CAN_PWM_PWM*_MIN: the minimum allowed value. - IFLA_CAN_PWM_PWM*_MAX: the maximum allowed value. - IFLA_CAN_PWM_PWM*: the runtime value. This results in a total of nine IFLA symbols which are all nested in a parent IFLA_CAN_XL_PWM symbol. IFLA_CAN_PWM_PWM*_MIN and IFLA_CAN_PWM_PWM*_MAX define the range of allowed values and will match the value statically configured by the device in struct can_pwm_const. IFLA_CAN_PWM_PWM* match the runtime values stored in struct can_pwm. Those parameters may only be configured when the tms mode is on. If the PWMS, PWML and PWMO parameters are provided, check that all the needed parameters are present using can_validate_pwm(), then check their value using can_validate_pwm_bittiming(). PWMO defaults to zero if omitted. Otherwise, if CAN_CTRLMODE_XL_TMS is true but none of the PWM parameters are provided, calculate them using can_calc_pwm(). Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-11-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: calc_bittiming: add PWM calculationVincent Mailhol2-0/+46
Perform the PWM calculation according to CiA recommendations. Note that for databitrates greater than 5 MBPS, tqmin is less than CAN_PWM_NS_MAX (which is defined to 200 nano seconds), consequently, the result of the division: DIV_ROUND_UP(xl_ns, CAN_PWM_NS_MAX) is one and thus the for loop automatically stops on the first iteration giving a single PWM symbol per bit as expected. Because of that, there is no actual need for a separate conditional branch for when the databitrate is greater than 5 MBPS. Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-10-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: bittiming: add PWM validationVincent Mailhol2-0/+85
Add can_validate_pwm() to validate the values pwms, pwml and pwml. Error messages are added to each of the checks to inform the user on what went wrong. Refer to those error messages to understand the validation logic. The boundary values CAN_PWM_DECODE_NS (the transceiver minimum decoding margin) and CAN_PWM_NS_MAX (the maximum PWM symbol duration) are hardcoded for the moment. Note that a transceiver capable of bitrates higher than 20 Mbps may be able to handle a CAN_PWM_DECODE_NS below 5 ns. If such transceivers become commercially available, this code could be revisited to make this parameter configurable. For now, leave it static. Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-9-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: bittiming: add PWM parametersVincent Mailhol1-2/+39
In CAN XL, higher data bit rates require the CAN transceiver to switch its operation mode to use Pulse-Width Modulation (PWM) transmission mode instead of the classic dominant/recessive transmission mode. The PWM parameters are: - PWMS: pulse width modulation short phase - PWML: pulse width modulation long phase - PWMO: pulse width modulation offset CiA 612-2 specifies PWMS and PWML to be at least 1 (arguably, PWML shall be at least 2 to respect the PWMS < PWML rule). PWMO's minimum is expected to always be zero. It is added more for consistency than anything else. Add struct can_pwm_const so that the different devices can provide their minimum and maximum values. When TMS is on, the runtime PWMS, PWML and PWMO are needed (either calculated or provided by the user): add struct can_pwm to store these. TDC and PWM can not be used at the same time (TDC can only be used when TMS is off and PWM only when TMS is on). struct can_pwm is thus put together with struct can_tdc inside a union to save some space. The netlink logic will be added in an upcoming change. Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-8-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: dev: can_dev_dropped_skb: drop CC/FD frames in CANXL-only modeOliver Hartkopp1-0/+19
The error-signalling (ES) is a mandatory functionality for CAN CC and CAN FD to report CAN frame format violations by sending an error-frame signal on the bus. A so-called 'mixed-mode' is intended to have (XL-tolerant) CAN FD nodes and CAN XL nodes on one CAN segment, where the FD-controllers can talk CC/FD and the XL-controllers can talk CC/FD/XL. This mixed-mode utilizes the error-signalling for sending CC/FD/XL frames. The CANXL-only mode disables the error-signalling in the CAN XL controller. This mode does not allow CC/FD frames to be sent but additionally offers a CAN XL transceiver mode switching (TMS). Configured with CAN_CTRLMODE_FD and CAN_CTRLMODE_XL this leads to: FD=0 XL=0 CC-only mode (ES=1) FD=1 XL=0 FD/CC mixed-mode (ES=1) FD=1 XL=1 XL/FD/CC mixed-mode (ES=1) FD=0 XL=1 XL-only mode (ES=0, TMS optional) The helper function can_dev_in_xl_only_mode() determines the required value to disable error signalling in the CAN XL controller. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-7-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: netlink: add CAN_CTRLMODE_XL_TMS flagVincent Mailhol3-3/+48
The Transceiver Mode Switching (TMS) indicates whether the CAN XL controller shall use the PWM or NRZ encoding during the data phase. The term "transceiver mode switching" is used in both ISO 11898-1 and CiA 612-2 (although only the latter one uses the abbreviation TMS). We adopt the same naming convention here for consistency. Add the CAN_CTRLMODE_XL_TMS flag to the list of the CAN control modes. Add can_validate_xl_flags() to check the coherency of the TMS flag. That function will be reused in upcoming changes to validate the other CAN XL flags. Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-6-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: netlink: add initial CAN XL supportVincent Mailhol5-20/+90
CAN XL uses bittiming parameters different from Classical CAN and CAN FD. Thus, all the data bittiming parameters, including TDC, need to be duplicated for CAN XL. Add the CAN XL netlink interface for all the features which are common with CAN FD. Any new CAN XL specific features are added later on. The first time CAN XL is activated, the MTU is set by default to CANXL_MAX_MTU. The user may then configure a custom MTU within the CANXL_MIN_MTU to CANXL_MAX_MTU range, in which case, the custom MTU value will be kept as long as CAN XL remains active. Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-5-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: netlink: add CAN_CTRLMODE_RESTRICTEDVincent Mailhol4-24/+36
ISO 11898-1:2024 adds a new restricted operation mode. This mode is added as a mandatory feature for nodes which support CAN XL and is retrofitted as optional for legacy nodes (i.e. the ones which only support Classical CAN and CAN FD). The restricted operation mode is nearly the same as the listen only mode: the node can not send data frames or remote frames and can not send dominant bits if an error occurs. The only exception is that the node shall still send the acknowledgment bit. A second niche exception is that the node may still send a data frame containing a time reference message if the node is a primary time provider, but because the time provider feature is not yet implemented in the kernel, this second exception is not relevant to us at the moment. Add the CAN_CTRLMODE_RESTRICTED control mode flag and update the can_dev_dropped_skb() helper function accordingly. Finally, bail out if both CAN_CTRLMODE_LISTENONLY and CAN_CTRLMODE_RESTRICTED are provided. Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-4-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: dev: can_dev_dropped_skb: drop CAN FD skbs if FD is offVincent Mailhol1-3/+11
Currently, the CAN FD skb validation logic is based on the MTU: the interface is deemed FD capable if and only if its MTU is greater or equal to CANFD_MTU. This logic is showing its limit with the introduction of CAN XL. For example, consider the two scenarios below: 1. An interface configured with CAN FD on and CAN XL on 2. An interface configured with CAN FD off and CAN XL on In those two scenarios, the interfaces would have the same MTU: CANXL_MTU making it impossible to differentiate which one has CAN FD turned on and which one has it off. Because of the limitation, the only non-UAPI-breaking workaround is to do the check at the device level using the can_priv->ctrlmode flags. Unfortunately, the virtual interfaces (vcan, vxcan), which do not have a can_priv, are left behind. Add a check on the CAN_CTRLMODE_FD flag in can_dev_dropped_skb() and drop FD frames whenever the feature is turned off. Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-3-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: bittiming: apply NL_SET_ERR_MSG() to can_calc_bittiming()Vincent Mailhol1-1/+1
When CONFIG_CAN_CALC_BITTIMING is disabled, the can_calc_bittiming() functions can not be used and the user needs to provide all the bittiming parameters. Currently, can_calc_bittiming() prints an error message to the kernel log. Instead use NL_SET_ERR_MSG() to make it return the error message through the netlink interface so that the user can directly see it. Signed-off-by: Vincent Mailhol <mailhol@kernel.org> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-2-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26can: dev: can_get_ctrlmode_str: use capitalized ctrlmode stringsOliver Hartkopp1-12/+12
Unify the ctrlmode related strings to the command line options of the 'ip' tool from the iproute2 package. The capitalized strings are also shown when the detailed interface configuration is printed by 'ip'. Suggested-by: Stephane Grosjean <stephane.grosjean@hms-networks.com> Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Link: https://patch.msgid.link/20251126-canxl-v8-1-e7e3eb74f889@pengutronix.de Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-11-26wifi: mac80211: allow sharing identical chanctx for S1G interfacesLachlan Hodges2-3/+15
Introduce support for sharing identical channel contexts for S1G interfaces. Additionally, do not downgrade channel requests for S1G interfaces. Signed-off-by: Lachlan Hodges <lachlan.hodges@morsemicro.com> Link: https://patch.msgid.link/20251126015758.149034-1-lachlan.hodges@morsemicro.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2025-11-26docs/zh_CN: Add wd719x.rst translationYujie Zhang2-1/+36
Translate .../scsi/wd719x.rst into Chinese. Add wd719x into .../scsi/index.rst. Update the translation through commit 40ee63091a40 ("scsi: docs: convert wd719x.txt to ReST") Signed-off-by: Yujie Zhang <yjzhang@leap-io-kernel.com> Signed-off-by: Alex Shi <alexs@kernel.org>
2025-11-26docs/zh_CN: Add libsas.rst translationYujie Zhang2-1/+426
Translate .../scsi/libsas.rst into Chinese. Add libsas into .../scsi/index.rst. Update the translation through commit 25882c82f850 ("scsi: libsas: Delete lldd_clear_aca callback") Signed-off-by: Yujie Zhang <yjzhang@leap-io-kernel.com> Signed-off-by: Alex Shi <alexs@kernel.org>
2025-11-26ALSA: hda/realtek: Add quirk for HP ProBook 450 G8Ilyas Gasanov1-0/+1
My laptop, HP ProBook 450 G8 (32M40EA), has Realtek ALC236 codec on its integrated sound card, and uses GPIO pins 0x2 and 0x1 for speaker mute and mic mute LEDs correspondingly, as found out by me through hda-verb invocations. This matches the GPIO masks used by the alc236_fixup_hp_gpio_led() function. PCI subsystem vendor and device IDs happen to be 0x103c and 0x8a75, which has not been covered in the ALC2xx driver code yet. Signed-off-by: Ilyas Gasanov <public@gsnoff.com> Link: https://patch.msgid.link/20251125235441.53629-1-public@gsnoff.com Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-11-26PM / devfreq: Fix typo in DFSO_DOWNDIFFERENTIAL macro nameRiwen Lu1-2/+2
Correct the spelling error in the DFSO_DOWNDIFFERENTIAL macro definition and update the corresponding variable assignment. The macro was previously misspelled as DFSO_DOWNDIFFERENCTIAL. This change ensures consistent and correct spelling throughout the simpleondemand governor implementation. Signed-off-by: Riwen Lu <luriwen@kylinos.cn> Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Link: https://patchwork.kernel.org/project/linux-pm/patch/20251118032339.2799230-1-luriwen@kylinos.cn/
2025-11-25drivers: net: fbnic: Return the true error in fbnic_alloc_napi_vectors.Dimitri Daskalakis1-1/+1
The error case in fbnic_alloc_napi_vectors defaulted to returning ENOMEM. This can mask the true error case, causing confusion. Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com> Link: https://patch.msgid.link/20251124200518.1848029-1-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25Merge branch 'selftest-af_unix-misc-updates'Jakub Kicinski3-10/+10
Kuniyuki Iwashima says: ==================== selftest: af_unix: Misc updates. Patch 1 add .gitignore under tools/testing/selftests/net/af_unix/. Patch 2 make so_peek_off.c less flaky. ==================== Link: https://patch.msgid.link/20251124212805.486235-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25selftest: af_unix: Extend recv() timeout in so_peek_off.c.Kuniyuki Iwashima1-2/+2
so_peek_off.c is reported to be flaky on NIPA: # # so_peek_off.c:149:two_chunks_overlap_blocking:Expected -1 (-1) != bytes (-1) # # two_chunks_overlap_blocking: Test terminated by assertion # # FAIL so_peek_off.stream.two_chunks_overlap_blocking The test fork()s a child process to send() data after 1ms to wake up the parent process being blocked (up to 3ms) on recv(). But, from the log, the parent woke up after 3ms timeout, so it could be too short when the host is overloaded. Let's extend it to 5s. Reported-by: Jakub Kicinski <kuba@kernel.org> Closes: https://lore.kernel.org/netdev/20251124070722.1e828c53@kernel.org/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124212805.486235-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25selftest: af_unix: Create its own .gitignore.Kuniyuki Iwashima2-8/+8
Somehow AF_UNIX tests have reused ../.gitignore, but now NIPA warns about it. Let's create .gitignore under af_unix/. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124212805.486235-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25xsk: avoid data corruption on cq descriptor numberFernando Fernandez Mancera1-55/+88
Since commit 30f241fcf52a ("xsk: Fix immature cq descriptor production"), the descriptor number is stored in skb control block and xsk_cq_submit_addr_locked() relies on it to put the umem addrs onto pool's completion queue. skb control block shouldn't be used for this purpose as after transmit xsk doesn't have control over it and other subsystems could use it. This leads to the following kernel panic due to a NULL pointer dereference. BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] SMP NOPTI CPU: 2 UID: 1 PID: 927 Comm: p4xsk.bin Not tainted 6.16.12+deb14-cloud-amd64 #1 PREEMPT(lazy) Debian 6.16.12-1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014 RIP: 0010:xsk_destruct_skb+0xd0/0x180 [...] Call Trace: <IRQ> ? napi_complete_done+0x7a/0x1a0 ip_rcv_core+0x1bb/0x340 ip_rcv+0x30/0x1f0 __netif_receive_skb_one_core+0x85/0xa0 process_backlog+0x87/0x130 __napi_poll+0x28/0x180 net_rx_action+0x339/0x420 handle_softirqs+0xdc/0x320 ? handle_edge_irq+0x90/0x1e0 do_softirq.part.0+0x3b/0x60 </IRQ> <TASK> __local_bh_enable_ip+0x60/0x70 __dev_direct_xmit+0x14e/0x1f0 __xsk_generic_xmit+0x482/0xb70 ? __remove_hrtimer+0x41/0xa0 ? __xsk_generic_xmit+0x51/0xb70 ? _raw_spin_unlock_irqrestore+0xe/0x40 xsk_sendmsg+0xda/0x1c0 __sys_sendto+0x1ee/0x200 __x64_sys_sendto+0x24/0x30 do_syscall_64+0x84/0x2f0 ? __pfx_pollwake+0x10/0x10 ? __rseq_handle_notify_resume+0xad/0x4c0 ? restore_fpregs_from_fpstate+0x3c/0x90 ? switch_fpu_return+0x5b/0xe0 ? do_syscall_64+0x204/0x2f0 ? do_syscall_64+0x204/0x2f0 ? do_syscall_64+0x204/0x2f0 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> [...] Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x1c000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) Instead use the skb destructor_arg pointer along with pointer tagging. As pointers are always aligned to 8B, use the bottom bit to indicate whether this a single address or an allocated struct containing several addresses. Fixes: 30f241fcf52a ("xsk: Fix immature cq descriptor production") Closes: https://lore.kernel.org/netdev/0435b904-f44f-48f8-afb0-68868474bf1c@nop.hu/ Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20251124171409.3845-1-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25virtio_net: enhance wake/stop tx queue statistics accountingLiming Wu1-18/+26
This patch refines and strengthens the statistics collection of TX queue wake/stop events introduced by commit c39add9b2423 ("virtio_net: Add TX stopped and wake counters"). Previously, the driver only recorded partial wake/stop statistics for TX queues. Some wake events triggered by 'skb_xmit_done()' or resume operations were not counted, which made the per-queue metrics incomplete. Signed-off-by: Liming Wu <liming.wu@jaguarmicro.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20251120015320.1418-1-liming.wu@jaguarmicro.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25Merge branch 'tcp-provide-better-locality-for-retransmit-timer'Jakub Kicinski13-69/+74
Eric Dumazet says: ==================== tcp: provide better locality for retransmit timer TCP stack uses three timers per flow, currently spread this way: - sk->sk_timer : keepalive timer - icsk->icsk_retransmit_timer : retransmit timer - icsk->icsk_delack_timer : delayed ack timer This series moves the retransmit timer to sk->sk_timer location, to increase data locality in TX paths. keepalive timers are not often used, this change should be neutral for them. After the series we have following fields: - sk->tcp_retransmit_timer : retransmit timer, in sock_write_tx group - icsk->icsk_delack_timer : delayed ack timer - icsk->icsk_keepalive_timer : keepalive timer Moving icsk_delack_timer in a beter location would also be welcomed. ==================== Link: https://patch.msgid.link/20251124175013.1473655-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25tcp: remove icsk->icsk_retransmit_timerEric Dumazet8-30/+25
Now sk->sk_timer is no longer used by TCP keepalive, we can use its storage for TCP and MPTCP retransmit timers for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25tcp: introduce icsk->icsk_keepalive_timerEric Dumazet11-24/+35
sk->sk_timer has been used for TCP keepalives. Keepalive timers are not in fast path, we want to use sk->sk_timer storage for retransmit timers, for better cache locality. Create icsk->icsk_keepalive_timer and change keepalive code to no longer use sk->sk_timer. Added space is reclaimed in the following patch. This includes changes to MPTCP, which was also using sk_timer. Alias icsk->mptcp_tout_timer and icsk->icsk_keepalive_timer for inet_sk_diag_fill() sake. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: move sk_dst_pending_confirm and sk_pacing_status to sock_read_tx groupEric Dumazet2-4/+4
These two fields are mostly read in TCP tx path, move them in an more appropriate group for better cache locality. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25tcp: rename icsk_timeout() to tcp_timeout_expires()Eric Dumazet6-13/+12
In preparation of sk->tcp_timeout_timer introduction, rename icsk_timeout() helper and change its argument to plain 'const struct sock *sk'. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251124175013.1473655-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25ice: fix broken Rx on VFsAlexander Lobakin1-0/+3
Since the tagged commit, ice stopped respecting Rx buffer length passed from VFs. At that point, the buffer length was hardcoded in ice, so VFs still worked up to some point (until, for example, a VF wanted an MTU larger than its PF). The next commit 93f53db9f9dc ("ice: switch to Page Pool"), broke Rx on VFs completely since ice started accounting per-queue buffer lengths again, but now VF queues always had their length zeroed, as ice was already ignoring what iavf was passing to it. Restore the line that initializes the buffer length on VF queues basing on the virtchnl messages. Fixes: 3a4f419f7509 ("ice: drop page splitting and recycling") Reported-by: Jakub Slepecki <jakub.slepecki@intel.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Jakub Slepecki <jakub.slepecki@intel.com> Link: https://patch.msgid.link/20251124170735.3077425-1-aleksander.lobakin@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25chtls: Avoid -Wflex-array-member-not-at-end warningGustavo A. R. Silva1-7/+1
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. Use the `DEFINE_RAW_FLEX()` helper for on-stack definitions of a flexible structure where the size of the flexible-array member is known at compile-time, and refactor the rest of the code, accordingly. So, with these changes, fix the following warning: drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c:163:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Link: https://patch.msgid.link/aSQocKoJGkN0wzEj@kspp Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25Merge branch 'tools-ynl-gen-regeneration-comment-function-prefix'Jakub Kicinski41-9/+57
Asbjørn Sloth Tønnesen says: ==================== tools: ynl-gen: regeneration comment + function prefix It looks like these two patches are the last ones needed for YNL, before the WireGuard patches can go in. These patches was both requested by Jason, during review of the WireGuard YNL conversion patchset[1]. ==================== Link: https://patch.msgid.link/20251120174429.390574-1-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25tools: ynl-gen: add regeneration commentAsbjørn Sloth Tønnesen41-0/+41
Add a comment on regeneration to the generated files. The comment is placed after the YNL-GEN line[1], as to not interfere with ynl-regen.sh's detection logic. [1] and after the optional YNL-ARG line. Link: https://lore.kernel.org/r/aR5m174O7pklKrMR@zx2c4.com/ Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251120174429.390574-3-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25tools: ynl-gen: add function prefix argumentAsbjørn Sloth Tønnesen1-9/+16
This patch adds a new CLI argument for overriding the default function prefix, as used for naming the doit/dumpit functions in the generated kernel code. When not specified the default "$(FAMILY)-nl" is used. This can also be specified persistently in generated files: /* YNL-ARG --function-prefix wg */ In the above example it causes the following changes: wireguard_nl_get_device_dumpit() -> wg_get_device_dumpit() wireguard_nl_get_device_doit() -> wg_get_device_doit() The variable name fn_prefix, was chosen as it relates to op_prefix which is used to prefix the UAPI commands enum entries. Link: https://lore.kernel.org/r/aRvWzC8qz3iXDAb3@zx2c4.com/ Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Link: https://patch.msgid.link/20251120174429.390574-2-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25Merge branch 'ptp-ocp-a-fix-and-refactoring'Jakub Kicinski1-28/+18
Andy Shevchenko says: ==================== ptp: ocp: A fix and refactoring Here is the fix for incorrect use of %ptT with the associated refactoring and additional cleanups. Note, %ptS, which is introduced in another series that is already applied to PRINTK tree, doesn't fit here, that's why this fix is separated from that series. ==================== Link: https://patch.msgid.link/20251124084816.205035-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25ptp: ocp: Reuse META's PCI vendor IDAndy Shevchenko1-3/+2
The META's PCI vendor ID is listed already in the pci_ids.h. Reuse it here. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251124084816.205035-5-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25ptp: ocp: Apply standard pattern for cleaning up loopAndy Shevchenko1-2/+1
The while (i--) is a standard pattern for the cleaning up loops. Apply this pattern where it makes sense in the driver. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251124084816.205035-4-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25ptp: ocp: Make ptp_ocp_unregister_ext() NULL-awareAndy Shevchenko1-14/+10
It's a common practice to make resource release functions be NULL-aware. Make ptp_ocp_unregister_ext() NULL-aware. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251124084816.205035-3-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25ptp: ocp: Refactor signal_show() and fix %ptT misuseAndy Shevchenko1-9/+5
Refactor signal_show() to avoid sequential calls to sysfs_emit*() and use the same pattern to get the index of a signal as it's done in signal_store(). While at it, fix wrong use of %ptT against struct timespec64. It's kinda lucky that it worked just because the first member there 64-bit and it's of time64_t type. Now with %ptS it may be used correctly. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251124084816.205035-2-andriy.shevchenko@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25vsock/test: Extend transport change null-ptr-deref testMichal Luczaj1-1/+6
syzkaller reported a lockdep lock order inversion warning[1] due to commit 687aa0c5581b ("vsock: Fix transport_* TOCTOU"). This was fixed in commit f7c877e75352 ("vsock: fix lock inversion in vsock_assign_transport()"). Redo syzkaller's repro by piggybacking on a somewhat related test implemented in commit 3a764d93385c ("vsock/test: Add test for null ptr deref when transport changes"). [1]: https://lore.kernel.org/netdev/68f6cdb0.a70a0220.205af.0039.GAE@google.com/ Signed-off-by: Michal Luczaj <mhal@rbox.co> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://patch.msgid.link/20251123-vsock_test-linger-lockdep-warn-v1-1-4b1edf9d8cdc@rbox.co Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25r8169: improve MAC EEE handlingHeiner Kallweit1-37/+36
Let phydev->enable_tx_lpi control whether MAC enables TX LPI, instead of enabling it unconditionally. This way TX LPI is disabled if e.g. link partner doesn't support EEE. This helps to avoid potential issues like link flaps. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/91bcb837-3fab-4b4e-b495-038df0932e44@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25r8169: fix RTL8127 hang on suspend/shutdownHeiner Kallweit1-5/+14
There have been reports that RTL8127 hangs on suspend and shutdown, partially disappearing from lspci until power-cycling. According to Realtek disabling PLL's when switching to D3 should be avoided on that chip version. Fix this by aligning disabling PLL's with the vendor drivers, what in addition results in PLL's not being disabled when switching to D3hot on other chip versions. Fixes: f24f7b2f3af9 ("r8169: add support for RTL8127A") Tested-by: Fabio Baltieri <fabio.baltieri@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/d7faae7e-66bc-404a-a432-3a496600575f@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: phy: mxl-gpy: add support for MxL86252 and MxL86282Daniel Golle1-2/+89
Add PHY driver support for Maxlinear MxL86252 and MxL86282 switches. The PHYs built-into those switches are just like any other GPY 2.5G PHYs with the exception of the temperature sensor data being encoded in a different way. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/a6cd7fe461b011cec2b59dffaf34e9c8b0819059.1763818120.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: phy: mxl-gpy: add support for MxL86211CChad Monroe1-0/+24
MxL86211C is a smaller and more efficient version of the GPY211C. Add the PHY ID and phy_driver instance to the mxl-gpy driver. Signed-off-by: Chad Monroe <chad@monroe.io> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/cabf3559d6511bed6b8a925f540e3162efc20f6b.1763818120.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: sxgbe: fix potential NULL dereference in sxgbe_rx()Alexey Kodanev1-1/+3
Currently, when skb is null, the driver prints an error and then dereferences skb on the next line. To fix this, let's add a 'break' after the error message to switch to sxgbe_rx_refill(), which is similar to the approach taken by the other drivers in this particular case, e.g. calxeda with xgmac_rx(). Found during a code review. Fixes: 1edb9ca69e8a ("net: sxgbe: add basic framework for Samsung 10Gb ethernet driver") Signed-off-by: Alexey Kodanev <aleksei.kodanev@bell-sw.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251121123834.97748-1-aleksei.kodanev@bell-sw.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: mdio: remove redundant fwnode cleanupBuday Csaba2-7/+1
Remove redundant fwnode cleanup in of_mdiobus_register_device() and xpcs_plat_init_dev(). mdio_device_free() eventually calls mdio_device_release(), which already performs fwnode_handle_put(), making the manual cleanup unnecessary. Combine fwnode_handle_get() with device_set_node() in of_mdiobus_register_device() for clarity. Signed-off-by: Buday Csaba <buday.csaba@prolan.hu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/00847693daa8f7c8ff5dfa19dd35fc712fa4e2b5.1763995734.git.buday.csaba@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: mdio: eliminate kdoc warnings in mdio_device.c and mdio_bus.cBuday Csaba2-7/+55
Fix all warnings reported by scripts/kernel-doc in mdio_device.c and mdio_bus.c Signed-off-by: Buday Csaba <buday.csaba@prolan.hu> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/7ef7b80669da2b899d38afdb6c45e122229c3d8c.1763968667.git.buday.csaba@prolan.hu Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25Merge branch 'net-enetc-add-port-mdio-support-for-both-i-mx94-and-i-mx95'Jakub Kicinski3-3/+215
Wei Fang says: ==================== net: enetc: add port MDIO support for both i.MX94 and i.MX95 The NETC IP has one external master MDIO interface (eMDIO) for managing external PHYs, all ENETC ports share this eMDIO. The EMDIO function and the ENETC port MDIO are the virtual ports of this eMDIO, ENETC can use these virtual ports to access their PHYs. The difference is that EMDIO function is a 'global port', it can access all the PHYs on the eMDIO, so it provides a means for different software modules to share a single set of MDIO signals to access their PHYs. The ENETC port MDIO can only access its own external PHY. Furthermore, its PHY address must be set to its corresponding LaBCR register in IERB module, which is is a 64 KB size page containing registers that are used for pre-boot initialization for all NETC PCIe functions. And this IERB is owned by the host OS and it will be locked after the initialization, so it cannot be configured at running time any more. The port MDIO can only work properly when the PHY address accessed by it matches the value of its corresponding LaBCR[MDIO_PHYAD_PRTAD]. Otherwise, the MDIO access by the port MDIO will not take effect. Note that the same PHY is either controlled by port MDIO or by the EMDIO function. The netc-blk-ctrl driver will only set the PHY address in the LaBCR register corresponding to the ENETC when the ENETC node contains an mdio child node, and the ENETC driver will only create the port MDIO bus then. An example in DTS is as follows, the EMDIO function will not\ access this PHY. enetc_port0 { phy-handle = <&ethphy0>; phy-mode = "rgmii-id"; mdio { #address-cells = <1>; #size-cells = <0>; ethphy0: ethernet-phy@1 { reg = <1>; }; }; }; If users want to use EMDIO funtion to manage the PHY, they only need to place the PHY node in the emdio node. The same PHY must not be placed simultaneously within the ENETC node. An example in DTS to use EMDIO is as below. netc_emdio { ethphy0: ethernet-phy@1 { reg = <1>; }; ethphy2: ethernet-phy@8 { reg = <8>; }; }; In the host OS, when there are multiple ENETCs, they can all access their PHYs using their own port MDIO, or they can all access their PHYs using the EMDIO function, or they can partially use port MDIO and partially use the EMDIO function. Another typical use case of port MDIO is the Jailhouse usage. An ENETC is assigned to a guest OS. The EMDIO function will be unavailable in the guest OS because EMDIO is controlled by the host OS. Therefore, the ENETC can use its port MDIO to manage its external PHY in this situation. In this use case, the host OS's root dtb will disable the ENETC node, so the host OS's ENETC driver will not probe the ENETC and its PHY. In addition, this series also adds the internal MDIO bus support, each ENETC has an internal MDIO interface for managing on-die PHY (PCS) if it has PCS layer. ==================== Link: https://patch.msgid.link/20251119102557.1041881-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: enetc: update the base address of port MDIO registers for ENETC v4Wei Fang2-2/+18
Each ENETC has a set of external MDIO registers to access its external PHY based on its port EMDIO bus, these registers are used for MDIO bus access, such as setting the PHY address, PHY register address and value, read or write operations, C22 or C45 format, etc. The base address of this set of registers has been modified in ENETC v4 and is different from that in ENETC v1. So the base address needs to be updated so that ENETC v4 can use port MDIO to manage its own external PHY. Additionally, if ENETC has the PCS layer, it also has a set of internal MDIO registers for managing its on-die PHY (PCS/Serdes). The base address of this set of registers is also different from that of ENETC v1, so the base address also needs to be updated so that ENETC v4 can support the management of on-die PHY through the internal MDIO bus. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://patch.msgid.link/20251119102557.1041881-4-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: enetc: set external PHY address in IERB for i.MX94 ENETCWei Fang1-0/+57
NETC IP has only one external master MDIO interface (eMDIO) for managing the external PHYs. ENETC can use the interfaces provided by the EMDIO function or its port MDIO to access and manage its external PHY. Both the EMDIO function and the port MDIO are all virtual ports of the eMDIO. The difference is that the EMDIO function is a 'global port', it can access all the PHYs on the eMDIO, but port MDIO can only access its own PHY. To ensure that ENETC can only access its own PHY through port MDIO, LaBCR[MDIO_PHYAD_PRTAD] needs to be set, which represents the address of the external PHY connected to ENETC. If the accessed PHY address is not consistent with LaBCR[MDIO_PHYAD_PRTAD], then the MDIO access initiated by port MDIO will be invalid. Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://patch.msgid.link/20251119102557.1041881-3-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: enetc: set the external PHY address in IERB for port MDIO usageWei Fang1-1/+140
The ENETC supports managing its own external PHY through its port MDIO functionality. To use this function, the PHY address needs be set in the corresponding LaBCR register in the Integrated Endpoint Register Block (IERB), which is used for pre-boot initialization of NETC PCIe functions. The port MDIO can only work properly when the PHY address accessed by the port MDIO matches the corresponding LaBCR[MDIO_PHYAD_PRTAD] value. Because the ENETC driver only registers the MDIO bus (port MDIO bus) when it detects an MDIO child node in its node, similarly, the netc-blk-ctrl driver only resolves the PHY address and sets it in the corresponding LaBCR when it detects an MDIO child node in the ENETC node. Co-developed-by: Aziz Sellami <aziz.sellami@nxp.com> Signed-off-by: Aziz Sellami <aziz.sellami@nxp.com> Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com> Link: https://patch.msgid.link/20251119102557.1041881-2-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25Merge branch 'improvements-over-dsa-conduit-ethtool-ops'Jakub Kicinski1-43/+102
Vladimir Oltean says: ==================== Improvements over DSA conduit ethtool ops DSA interceps 'ethtool -S eth0', where eth0 is the host port of the switch (called 'conduit'). It does this because otherwise there is no way to report port counters for the CPU port, which is a MAC like any other of that switch, except Linux exposes no net_device for it, thus no ethtool hook. Having understood all downsides of this debugging interface, when we need it we needed, so the proposed changes here are to make it more useful by dumping more counters in it: not just the switch CPU port, but all other switch ports in the tree which lack a net_device. Not reinventing any wheel, just putting more output in an existing command. That is patch 3/3. The other 2 are cleanup. ==================== Link: https://patch.msgid.link/20251122112311.138784-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: dsa: append ethtool counters of all hidden ports to conduitVladimir Oltean1-33/+93
Currently there is no way to see packet counters on cascade ports, and no clarity on how the API for that would look like. Because it's something that is currently needed, just extend the hack where ethtool -S on the conduit interface dumps CPU port counters, and also use it to dump counters of cascade ports. Note that the "pXX_" naming convention changes to "sXX_pYY", to distinguish between ports having the same index but belonging to different switches. This has a slight chance of causing regressions to existing tooling: - grepping for "p04_counter_name" still works, but might return more than one string now - grepping for " p04_counter_name" no longer works Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251122112311.138784-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: dsa: use kernel data types for ethtool ops on conduitVladimir Oltean1-6/+5
Suppress some checkpatch 'CHECK' messages about u8 being preferable over uint8_t, etc. No functional change. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251122112311.138784-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net: dsa: cpu_dp->orig_ethtool_ops might be NULLVladimir Oltean1-7/+7
In theory this would have been seen by now, but it seems that all drivers used as DSA conduit interfaces thus far have had ethtool_ops set, and it's hard to even find modern Ethernet drivers (and not VF ones) which don't use ethtool. Here is the unfiltered list of drivers which register any sort of net_device but don't set its ethtool_ops pointer. I don't think any of them 'risks' being used as a DSA conduit, maybe except for moxart, rnpbge and icssm, I'm not sure. - drivers/net/can/dev/dev.c - drivers/net/wwan/qcom_bam_dmux.c - drivers/net/wwan/t7xx/t7xx_netdev.c - drivers/net/arcnet/arcnet.c - drivers/net/hamradio/ - drivers/net/slip/slip.c - drivers/net/ethernet/ezchip/nps_enet.c - drivers/net/ethernet/moxa/moxart_ether.c - drivers/net/ethernet/wangxun/txgbevf/txgbevf_main.c - drivers/net/ethernet/wangxun/ngbevf/ngbevf_main.c - drivers/net/ethernet/huawei/hinic3/hinic3_main.c - drivers/net/ethernet/i825xx/ - drivers/net/ethernet/ti/icssm/icssm_prueth.c - drivers/net/ethernet/seeq/ - drivers/net/ethernet/litex/litex_liteeth.c - drivers/net/ethernet/sunplus/spl2sw_driver.c - drivers/net/ethernet/mucse/rnpgbe/rnpgbe_main.c - drivers/net/ipa/ - drivers/net/wireless/microchip/wilc1000/ - drivers/net/wireless/mediatek/mt76/dma.c - drivers/net/wireless/ath/ath12k/ - drivers/net/wireless/ath/ath11k/ - drivers/net/wireless/ath/ath6kl/ - drivers/net/wireless/ath/ath10k/ - drivers/net/wireless/intel/iwlwifi/pcie/gen1_2/trans.c - drivers/net/wireless/virtual/mac80211_hwsim.c - drivers/net/wireless/quantenna/qtnfmac/pcie/pcie.c - drivers/net/wireless/realtek/rtw89/core.c - drivers/net/wireless/realtek/rtw88/pci.c - drivers/net/caif/ - drivers/net/plip/ - drivers/net/wan/ - drivers/net/mctp/ - drivers/net/ppp/ - drivers/net/thunderbolt/ Nonetheless, it's good for the framework not to make such assumptions, and not panic when coming across such kind of host device in the future. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20251122112311.138784-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25team: Move team device type change at the end of team_port_addNikola Z. Ivanov1-8/+15
Attempting to add a port device that is already up will expectedly fail, but not before modifying the team device header_ops. In the case of the syzbot reproducer the gre0 device is already in state UP when it attempts to add it as a port device of team0, this fails but before that header_ops->create of team0 is changed from eth_header to ipgre_header in the call to team_dev_type_check_change. Later when we end up in ipgre_header() struct ip_tunnel* points to nonsense as the private data of the device still holds a struct team. Example sequence of iproute2 commands to reproduce the hang/BUG(): ip link add dev team0 type team ip link add dev gre0 type gre ip link set dev gre0 up ip link set dev gre0 master team0 ip link set dev team0 up ping -I team0 1.1.1.1 Move team_dev_type_check_change down where all other checks have passed as it changes the dev type with no way to restore it in case one of the checks that follow it fail. Also make sure to preserve the origial mtu assignment: - If port_dev is not the same type as dev, dev takes mtu from port_dev - If port_dev is the same type as dev, port_dev takes mtu from dev This is done by adding a conditional before the call to dev_set_mtu to prevent it from assigning port_dev->mtu = dev->mtu and instead letting team_dev_type_check_change assign dev->mtu = port_dev->mtu. The conditional is needed because the patch moves the call to team_dev_type_check_change past dev_set_mtu. Testing: - team device driver in-tree selftests - Add/remove various devices as slaves of team device - syzbot Reported-by: syzbot+a2a3b519de727b0f7903@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a2a3b519de727b0f7903 Fixes: 1d76efe1577b ("team: add support for non-ethernet devices") Signed-off-by: Nikola Z. Ivanov <zlatistiv@gmail.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20251122002027.695151-1-zlatistiv@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25cxgb4: Rename sched_class to avoid type clashAlan Maguire5-32/+32
drivers/net/ethernet/chelsio/cxgb4/sched.h declares a sched_class struct which has a type name clash with struct sched_class in kernel/sched/sched.h (a type used in a field in task_struct). When cxgb4 is a builtin we end up with both sched_class types, and as a result of this we wind up with DWARF (and derived from that BTF) with a duplicate incorrect task_struct representation. When cxgb4 is built-in this type clash can cause kernel builds to fail as resolve_btfids will fail when confused which task_struct to use. See [1] for more details. As such, renaming sched_class to ch_sched_class (in line with other structs like ch_sched_flowc) makes sense. [1] https://lore.kernel.org/bpf/2412725b-916c-47bd-91c3-c2d57e3e6c7b@acm.org/ Reported-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Potnuri Bharat Teja <bharat@chelsio.com> Link: https://patch.msgid.link/20251121181231.64337-1-alan.maguire@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25r8169: add support for RTL9151AJaven Xu1-0/+3
This adds support for chip RTL9151A. Its XID is 0x68b. It is bascially basd on the one with XID 0x688, but with different firmware file. Signed-off-by: Javen Xu <javen_xu@realsil.com.cn> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Link: https://patch.msgid.link/20251121090104.3753-1-javen_xu@realsil.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25net/mlx5e: Fix validation logic in rate limitingDanielle Costantino1-1/+1
The rate limiting validation condition currently checks the output variable max_bw_value[i] instead of the input value maxrate->tc_maxrate[i]. This causes the validation to compare an uninitialized or stale value rather than the actual requested rate. The condition should check the input rate to properly validate against the upper limit: } else if (maxrate->tc_maxrate[i] <= upper_limit_gbps) { This aligns with the pattern used in the first branch, which correctly checks maxrate->tc_maxrate[i] against upper_limit_mbps. The current implementation can lead to unreliable validation behavior: - For rates between 25.5 Gbps and 255 Gbps, if max_bw_value[i] is 0 from initialization, the GBPS path may be taken regardless of whether the actual rate is within bounds - When processing multiple TCs (i > 0), max_bw_value[i] contains the value computed for the previous TC, affecting the validation logic - The overflow check for rates exceeding 255 Gbps may not trigger consistently depending on previous array values This patch ensures the validation correctly examines the requested rate value for proper bounds checking. Fixes: 43b27d1bd88a ("net/mlx5e: Fix wraparound in rate limiting for values above 255 Gbps") Signed-off-by: Danielle Costantino <dcostantino@meta.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Link: https://patch.msgid.link/20251124180043.2314428-1-dcostantino@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25smb: client: fix memory leak in cifs_construct_tcon()Paulo Alcantara1-0/+1
When having a multiuser mount with domain= specified and using cifscreds, cifs_set_cifscreds() will end up setting @ctx->domainname, so it needs to be freed before leaving cifs_construct_tcon(). This fixes the following memory leak reported by kmemleak: mount.cifs //srv/share /mnt -o domain=ZELDA,multiuser,... su - testuser cifscreds add -d ZELDA -u testuser ... ls /mnt/1 ... umount /mnt echo scan > /sys/kernel/debug/kmemleak cat /sys/kernel/debug/kmemleak unreferenced object 0xffff8881203c3f08 (size 8): comm "ls", pid 5060, jiffies 4307222943 hex dump (first 8 bytes): 5a 45 4c 44 41 00 cc cc ZELDA... backtrace (crc d109a8cf): __kmalloc_node_track_caller_noprof+0x572/0x710 kstrdup+0x3a/0x70 cifs_sb_tlink+0x1209/0x1770 [cifs] cifs_get_fattr+0xe1/0xf50 [cifs] cifs_get_inode_info+0xb5/0x240 [cifs] cifs_revalidate_dentry_attr+0x2d1/0x470 [cifs] cifs_getattr+0x28e/0x450 [cifs] vfs_getattr_nosec+0x126/0x180 vfs_statx+0xf6/0x220 do_statx+0xab/0x110 __x64_sys_statx+0xd5/0x130 do_syscall_64+0xbb/0x380 entry_SYSCALL_64_after_hwframe+0x77/0x7f Fixes: f2aee329a68f ("cifs: set domainName when a domain-key is used in multiuser") Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Reviewed-by: David Howells <dhowells@redhat.com> Cc: Jay Shin <jaeshin@redhat.com> Cc: stable@vger.kernel.org Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
2025-11-25Merge branch 'general-enhancements-to-rqspinlock-stress-test'Alexei Starovoitov1-3/+117
Kumar Kartikeya Dwivedi says: ==================== General enhancements to rqspinlock stress test Three enchancements, details in commit messages. First, the CPU requirements are 2 for AA, 3 for ABBA, and 4 for ABBCCA, hence relax the check during module initialization. Second, add a per-CPU histogram to capture lock acquisition times to record which buckets these acquisitions fall into for the normal task context and NMI context. Anything below 10ms is not printed in detail, but above that displays the full breakdown for each context. Finally, make the delay of the NMI and task contexts configurable, set to 10 and 20 ms respectively by default. ==================== Link: https://patch.msgid.link/20251125020749.2421610-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-25selftests/bpf: Make CS length configurable for rqspinlock stress testKumar Kartikeya Dwivedi1-2/+12
Allow users to configure the critical section delay for both task/normal and NMI contexts, and set to 20ms and 10ms as before by default. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20251125020749.2421610-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-25selftests/bpf: Add lock wait time stats to rqspinlock stress testKumar Kartikeya Dwivedi1-0/+104
Add statistics per-CPU broken down by context and various timing windows for the time taken to acquire an rqspinlock. Cases where all acquisitions fit into the 10ms window are skipped from printing, otherwise the full breakdown is displayed when printing the summary. This allows capturing precisely the number of times outlier attempts happened for a given lock in a given context. A critical detail is that time is captured regardless of success or failure, which is important to capture events for failed but long waiting timeout attempts. Output: [ 64.279459] rqspinlock acquisition latency histogram (ms): [ 64.279472] cpu1: total 528426 (normal 526559, nmi 1867) [ 64.279477] 0-1ms: total 524697 (normal 524697, nmi 0) [ 64.279480] 2-2ms: total 3652 (normal 1811, nmi 1841) [ 64.279482] 3-3ms: total 66 (normal 47, nmi 19) [ 64.279485] 4-4ms: total 2 (normal 1, nmi 1) [ 64.279487] 5-5ms: total 1 (normal 1, nmi 0) [ 64.279489] 6-6ms: total 1 (normal 0, nmi 1) [ 64.279490] 101-150ms: total 1 (normal 0, nmi 1) [ 64.279492] >= 251ms: total 6 (normal 2, nmi 4) ... Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20251125020749.2421610-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-25selftests/bpf: Relax CPU requirements for rqspinlock stress testKumar Kartikeya Dwivedi1-1/+1
Only require 2 CPUs for AA, 3 for ABBA, 4 for ABBCCA, which is calculated nicely by adding to the mode enum. Enables running single CPU AA tests. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20251125020749.2421610-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-25bpf: Introduce internal bpf_map_check_op_flags helper functionLeon Hwang2-23/+22
It is to unify map flags checking for lookup_elem, update_elem, lookup_batch and update_batch APIs. Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/r/20251125145857.98134-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-11-25libbpf: Fix some incorrect @param descriptions in the comment of libbpf.hJianyun Gao1-11/+16
Fix up some of missing or incorrect @param descriptions for libbpf public APIs in libbpf.h. Signed-off-by: Jianyun Gao <jianyungao89@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251118033025.11804-1-jianyungao89@gmail.com
2025-11-25selftests/bpf: Call bpf_get_numa_node_id() in trigger_count()Menglong Dong2-4/+6
The bench test "trig-kernel-count" can be used as a baseline comparison for fentry and other benchmarks, and the calling to bpf_get_numa_node_id() should be considered as composition of the baseline. So, let's call it in trigger_count(). Meanwhile, rename trigger_count() to trigger_kernel_count() to make it easier understand. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251116014242.151110-1-dongml2@chinatelecom.cn
2025-11-25docs: bpf: map_array: Specify BPF_MAP_TYPE_PERCPU_ARRAY value size limitAlex Tran1-2/+3
Specify value size limit for BPF_MAP_TYPE_PERCPU_ARRAY which is PCPU_MIN_UNIT_SIZE (32 kb). In percpu allocator (mm: percpu), any request with a size greater than PCPU_MIN_UNIT_SIZE is rejected. Signed-off-by: Alex Tran <alex.t.tran@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20251115063531.2302903-1-alex.t.tran@gmail.com
2025-11-25ACPICA: Avoid walking the Namespace if start_node is NULLCryolitia PukNgae1-3/+6
Although commit 0c9992315e73 ("ACPICA: Avoid walking the ACPI Namespace if it is not there") fixed the situation when both start_node and acpi_gbl_root_node are NULL, the Linux kernel mainline now still crashed on Honor Magicbook 14 Pro [1]. That happens due to the access to the member of parent_node in acpi_ns_get_next_node(). The NULL pointer dereference will always happen, no matter whether or not the start_node is equal to ACPI_ROOT_OBJECT, so move the check of start_node being NULL out of the if block. Unfortunately, all the attempts to contact Honor have failed, they refused to provide any technical support for Linux. The bad DSDT table's dump could be found on GitHub [2]. DMI: HONOR FMB-P/FMB-P-PCB, BIOS 1.13 05/08/2025 Link: https://github.com/acpica/acpica/commit/1c1b57b9eba4554cb132ee658dd942c0210ed20d Link: https://gist.github.com/Cryolitia/a860ffc97437dcd2cd988371d5b73ed7 [1] Link: https://github.com/denis-bb/honor-fmb-p-dsdt [2] Signed-off-by: Cryolitia PukNgae <cryolitia.pukngae@linux.dev> Reviewed-by: WangYuli <wangyl5933@chinaunicom.cn> [ rjw: Subject adjustment, changelog edits ] Link: https://patch.msgid.link/20251125-acpica-v1-1-99e63b1b25f8@linux.dev Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25tracing: Fix WARN_ON in tracing_buffers_mmap_close for split VMAsDeepanshu Kartikey1-0/+10
When a VMA is split (e.g., by partial munmap or MAP_FIXED), the kernel calls vm_ops->close on each portion. For trace buffer mappings, this results in ring_buffer_unmap() being called multiple times while ring_buffer_map() was only called once. This causes ring_buffer_unmap() to return -ENODEV on subsequent calls because user_mapped is already 0, triggering a WARN_ON. Trace buffer mappings cannot support partial mappings because the ring buffer structure requires the complete buffer including the meta page. Fix this by adding a may_split callback that returns -EINVAL to prevent VMA splits entirely. Cc: stable@vger.kernel.org Fixes: cf9f0f7c4c5bb ("tracing: Allow user-space mapping of the ring-buffer") Link: https://patch.msgid.link/20251119064019.25904-1-kartikey406@gmail.com Closes: https://syzkaller.appspot.com/bug?extid=a72c325b042aae6403c7 Tested-by: syzbot+a72c325b042aae6403c7@syzkaller.appspotmail.com Reported-by: syzbot+a72c325b042aae6403c7@syzkaller.appspotmail.com Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-25drm/xe: Fix conversion from clock ticks to millisecondsHarish Chegondi1-6/+1
When tick counts are large and multiplication by MSEC_PER_SEC is larger than 64 bits, the conversion from clock ticks to milliseconds can go bad. Use mul_u64_u32_div() instead. Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Fixes: 49cc215aad7f ("drm/xe: Add xe_gt_clock_interval_to_ms helper") Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patch.msgid.link/1562f1b62d5be3fbaee100f09107f3cc49e40dd1.1763408584.git.harish.chegondi@intel.com (cherry picked from commit 96b93ac214f9dd66294d975d86c5dee256faef91) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-11-25drm/xe/guc: Fix stack_depot usageLucas De Marchi1-0/+3
Add missing stack_depot_init() call when CONFIG_DRM_XE_DEBUG_GUC is enabled to fix the following call stack: [] BUG: kernel NULL pointer dereference, address: 0000000000000000 [] Workqueue: drm_sched_run_job_work [gpu_sched] [] RIP: 0010:stack_depot_save_flags+0x172/0x870 [] Call Trace: [] <TASK> [] fast_req_track+0x58/0xb0 [xe] Fixes: 16b7e65d299d ("drm/xe/guc: Track FAST_REQ H2Gs to report where errors came from") Tested-by: Sagar Ghuge <sagar.ghuge@intel.com> Cc: stable@vger.kernel.org # v6.17+ Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20251118-fix-debug-guc-v1-1-9f780c6bedf8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 64fdf496a6929a0a194387d2bb5efaf5da2b542f) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-11-25drm/xe/guc: Fix resource leak in xe_guc_ct_init_noalloc()Shuicheng Lin1-6/+6
xe_guc_ct_init_noalloc() allocates the CT workqueue and other helpers before it tries to initialize ct->lock. If drmm_mutex_init() fails we currently bail out without releasing those resources because the guc_ct_fini() hasn’t been registered yet. Since destroy_workqueue() in guc_ct_fini() may flush the workqueue, which in turn can take the ct lock, the initialization sequence is restructured to first initialize the ct->lock, then set up all CT state, and finally register guc_ct_fini(). v2: guc_ct_fini() does take ct lock. (Matt) v3: move primelockdep() together with drmm_mutex_init(). (Lucas) Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patch.msgid.link/20251110184522.1581001-2-shuicheng.lin@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 2e4ad5b0667244f496783c58de0995b9562d3344) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-11-25sched/mmcid: Switch over to the new mechanismThomas Gleixner5-116/+103
Now that all pieces are in place, change the implementations of sched_mm_cid_fork() and sched_mm_cid_exit() to adhere to the new strict ownership scheme and switch context_switch() over to use the new mm_cid_schedin() functionality. The common case is that there is no mode change required, which makes fork() and exit() just update the user count and the constraints. In case that a new user would exceed the CID space limit the fork() context handles the transition to per CPU mode with mm::mm_cid::mutex held. exit() handles the transition back to per task mode when the user count drops below the switch back threshold. fork() might also be forced to handle a deferred switch back to per task mode, when a affinity change increased the number of allowed CPUs enough. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172550.280380631@linutronix.de
2025-11-25sched/mmcid: Implement deferred mode changeThomas Gleixner2-7/+59
When affinity changes cause an increase of the number of CPUs allowed for tasks which are related to a MM, that might results in a situation where the ownership mode can go back from per CPU mode to per task mode. As affinity changes happen with runqueue lock held there is no way to do the actual mode change and required fixup right there. Add the infrastructure to defer it to a workqueue. The scheduled work can race with a fork() or exit(). Whatever happens first takes care of it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172550.216484739@linutronix.de
2025-11-25irqwork: Move data struct to a types headerThomas Gleixner2-7/+16
... to avoid header recursion hell. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172550.152813625@linutronix.de
2025-11-25sched/mmcid: Provide CID ownership mode fixup functionsThomas Gleixner2-26/+259
CIDs are either owned by tasks or by CPUs. The ownership mode depends on the number of tasks related to a MM and the number of CPUs on which these tasks are theoretically allowed to run on. Theoretically because that number is the superset of CPU affinities of all tasks which only grows and never shrinks. Switching to per CPU mode happens when the user count becomes greater than the maximum number of CIDs, which is calculated by: opt_cids = min(mm_cid::nr_cpus_allowed, mm_cid::users); max_cids = min(1.25 * opt_cids, nr_cpu_ids); The +25% allowance is useful for tight CPU masks in scenarios where only a few threads are created and destroyed to avoid frequent mode switches. Though this allowance shrinks, the closer opt_cids becomes to nr_cpu_ids, which is the (unfortunate) hard ABI limit. At the point of switching to per CPU mode the new user is not yet visible in the system, so the task which initiated the fork() runs the fixup function: mm_cid_fixup_tasks_to_cpu() walks the thread list and either transfers each tasks owned CID to the CPU the task runs on or drops it into the CID pool if a task is not on a CPU at that point in time. Tasks which schedule in before the task walk reaches them do the handover in mm_cid_schedin(). When mm_cid_fixup_tasks_to_cpus() completes it's guaranteed that no task related to that MM owns a CID anymore. Switching back to task mode happens when the user count goes below the threshold which was recorded on the per CPU mode switch: pcpu_thrs = min(opt_cids - (opt_cids / 4), nr_cpu_ids / 2); This threshold is updated when a affinity change increases the number of allowed CPUs for the MM, which might cause a switch back to per task mode. If the switch back was initiated by a exiting task, then that task runs the fixup function. If it was initiated by a affinity change, then it's run either in the deferred update function in context of a workqueue or by a task which forks a new one or by a task which exits. Whatever happens first. mm_cid_fixup_cpus_to_task() walks through the possible CPUs and either transfers the CPU owned CIDs to a related task which runs on the CPU or drops it into the pool. Tasks which schedule in on a CPU which the walk did not cover yet do the handover themselves. This transition from CPU to per task ownership happens in two phases: 1) mm:mm_cid.transit contains MM_CID_TRANSIT. This is OR'ed on the task CID and denotes that the CID is only temporarily owned by the task. When it schedules out the task drops the CID back into the pool if this bit is set. 2) The initiating context walks the per CPU space and after completion clears mm:mm_cid.transit. After that point the CIDs are strictly task owned again. This two phase transition is required to prevent CID space exhaustion during the transition as a direct transfer of ownership would fail if two tasks are scheduled in on the same CPU before the fixup freed per CPU CIDs. When mm_cid_fixup_cpus_to_tasks() completes it's guaranteed that no CID related to that MM is owned by a CPU anymore. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172550.088189028@linutronix.de
2025-11-25sched/mmcid: Provide new scheduler CID mechanismThomas Gleixner4-10/+168
The MM CID management has two fundamental requirements: 1) It has to guarantee that at no given point in time the same CID is used by concurrent tasks in userspace. 2) The CID space must not exceed the number of possible CPUs in a system. While most allocators (glibc, tcmalloc, jemalloc) do not care about that, there seems to be at least some LTTng library depending on it. The CID space compaction itself is not a functional correctness requirement, it is only a useful optimization mechanism to reduce the memory foot print in unused user space pools. The optimal CID space is: min(nr_tasks, nr_cpus_allowed); Where @nr_tasks is the number of actual user space threads associated to the mm and @nr_cpus_allowed is the superset of all task affinities. It is growth only as it would be insane to take a racy snapshot of all task affinities when the affinity of one task changes just do redo it 2 milliseconds later when the next task changes it's affinity. That means that as long as the number of tasks is lower or equal than the number of CPUs allowed, each task owns a CID. If the number of tasks exceeds the number of CPUs allowed it switches to per CPU mode, where the CPUs own the CIDs and the tasks borrow them as long as they are scheduled in. For transition periods CIDs can go beyond the optimal space as long as they don't go beyond the number of possible CPUs. The current upstream implementation adds overhead into task migration to keep the CID with the task. It also has to do the CID space consolidation work from a task work in the exit to user space path. As that work is assigned to a random task related to a MM this can inflict unwanted exit latencies. Implement the context switch parts of a strict ownership mechanism to address this. This removes most of the work from the task which schedules out. Only during transitioning from per CPU to per task ownership it is required to drop the CID when leaving the CPU to prevent CID space exhaustion. Other than that scheduling out is just a single check and branch. The task which schedules in has to check whether: 1) The ownership mode changed 2) The CID is within the optimal CID space In stable situations this results in zero work. The only short disruption is when ownership mode changes or when the associated CID is not in the optimal CID space. The latter only happens when tasks exit and therefore the optimal CID space shrinks. That mechanism is strictly optimized for the common case where no change happens. The only case where it actually causes a temporary one time spike is on mode changes when and only when a lot of tasks related to a MM schedule exactly at the same time and have eventually to compete on allocating a CID from the bitmap. In the sysbench test case which triggered the spinlock contention in the initial CID code, __schedule() drops significantly in perf top on a 128 Core (256 threads) machine when running sysbench with 255 threads, which fits into the task mode limit of 256 together with the parent thread: Upstream rseq/perf branch +CID rework 0.42% 0.37% 0.32% [k] __schedule Increasing the number of threads to 256, which puts the test process into per CPU mode looks about the same. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172550.023984859@linutronix.de
2025-11-25sched/mmcid: Introduce per task/CPU ownership infrastructureThomas Gleixner4-4/+75
The MM CID management has two fundamental requirements: 1) It has to guarantee that at no given point in time the same CID is used by concurrent tasks in userspace. 2) The CID space must not exceed the number of possible CPUs in a system. While most allocators (glibc, tcmalloc, jemalloc) do not care about that, there seems to be at least librseq depending on it. The CID space compaction itself is not a functional correctness requirement, it is only a useful optimization mechanism to reduce the memory foot print in unused user space pools. The optimal CID space is: min(nr_tasks, nr_cpus_allowed); Where @nr_tasks is the number of actual user space threads associated to the mm and @nr_cpus_allowed is the superset of all task affinities. It is growth only as it would be insane to take a racy snapshot of all task affinities when the affinity of one task changes just do redo it 2 milliseconds later when the next task changes its affinity. That means that as long as the number of tasks is lower or equal than the number of CPUs allowed, each task owns a CID. If the number of tasks exceeds the number of CPUs allowed it switches to per CPU mode, where the CPUs own the CIDs and the tasks borrow them as long as they are scheduled in. For transition periods CIDs can go beyond the optimal space as long as they don't go beyond the number of possible CPUs. The current upstream implementation adds overhead into task migration to keep the CID with the task. It also has to do the CID space consolidation work from a task work in the exit to user space path. As that work is assigned to a random task related to a MM this can inflict unwanted exit latencies. This can be done differently by implementing a strict CID ownership mechanism. Either the CIDs are owned by the tasks or by the CPUs. The latter provides less locality when tasks are heavily migrating, but there is no justification to optimize for overcommit scenarios and thereby penalizing everyone else. Provide the basic infrastructure to implement this: - Change the UNSET marker to BIT(31) from ~0U - Add the ONCPU marker as BIT(30) - Add the TRANSIT marker as BIT(29) That allows to check for ownership trivially and provides a simple check for UNSET as well. The TRANSIT marker is required to prevent CID space exhaustion when switching from per CPU to per task mode. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://patch.msgid.link/20251119172549.960252358@linutronix.de
2025-11-25sched/mmcid: Serialize sched_mm_cid_fork()/exit() with a mutexThomas Gleixner2-0/+24
Prepare for the new CID management scheme which puts the CID ownership transition into the fork() and exit() slow path by serializing sched_mm_cid_fork()/exit() with it, so task list and cpu mask walks can be done in interruptible and preemptible code. The contention on it is not worse than on other concurrency controls in the fork()/exit() machinery. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.895826703@linutronix.de
2025-11-25sched/mmcid: Provide precomputed maximal valueThomas Gleixner4-19/+50
Reading mm::mm_users and mm:::mm_cid::nr_cpus_allowed every time to compute the maximal CID value is just wasteful as that value is only changing on fork(), exit() and eventually when the affinity changes. So it can be easily precomputed at those points and provided in mm::mm_cid for consumption in the hot path. But there is an issue with using mm::mm_users for accounting because that does not necessarily reflect the number of user space tasks as other kernel code can take temporary references on the MM which skew the picture. Solve that by adding a users counter to struct mm_mm_cid, which is modified by fork() and exit() and used for precomputing under mm_mm_cid::lock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.832764634@linutronix.de
2025-11-25sched/mmcid: Move initialization out of lineThomas Gleixner2-14/+15
It's getting bigger soon, so just move it out of line to the rest of the code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.769636491@linutronix.de
2025-11-25signal: Move MMCID exit out of sighand lockThomas Gleixner4-6/+5
There is no need anymore to keep this under sighand lock as the current code and the upcoming replacement are not depending on the exit state of a task anymore. That allows to use a mutex in the exit path. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20251119172549.706439391@linutronix.de
2025-11-25sched/mmcid: Convert mm CID mask to a bitmapThomas Gleixner3-8/+9
This is truly a bitmap and just conveniently uses a cpumask because the maximum size of the bitmap is nr_cpu_ids. But that prevents to do searches for a zero bit in a limited range, which is helpful to provide an efficient mechanism to consolidate the CID space when the number of users decreases. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com> Link: https://patch.msgid.link/20251119172549.642866767@linutronix.de
2025-11-25cpumask: Cache num_possible_cpus()Thomas Gleixner2-2/+27
Reevaluating num_possible_cpus() over and over does not make sense. That becomes a constant after init as cpu_possible_mask is marked ro_after_init. Cache the value during initialization and provide that for consumption. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://patch.msgid.link/20251119172549.578653738@linutronix.de
2025-11-25cpuidle: Warn instead of bailing out if target residency check failsRafael J. Wysocki1-10/+8
It turns out that the change in commit 76934e495cdc ("cpuidle: Add sanity check for exit latency and target residency") goes too far because there are systems in the field on which the check introduced by that commit does not pass. For this reason, change __cpuidle_driver_init() return type back to void and make it print a warning when the check mentioned above does not pass. Fixes: 76934e495cdc ("cpuidle: Add sanity check for exit latency and target residency") Reported-by: Val Packett <val@packett.cool> Closes: https://lore.kernel.org/linux-pm/20251121010756.6687-1-val@packett.cool/ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/2808566.mvXUDI8C0e@rafael.j.wysocki
2025-11-25cpuidle: Update header inclusionAndy Shevchenko1-0/+4
While cleaning up some headers, I got a build error on this file: drivers/cpuidle/poll_state.c:52:2: error: call to undeclared library function 'snprintf' with type 'int (char *restrict, unsigned long, const char *restrict, ...)'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration] Update header inclusions to follow IWYU (Include What You Use) principle. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251124205752.1328701-1-andriy.shevchenko@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25Documentation: power/cpuidle: Document the CPU system wakeup latency QoSUlf Hansson2-4/+14
Let's document how the new CPU system wakeup latency QoS limit can be used from user space, along with how the constraint is taken into account for s2idle and cpuidle. Reviewed-by: Dhruva Gole <d-gole@ti.com> Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com> Tested-by: Kevin Hilman (TI) <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://patch.msgid.link/20251125112650.329269-7-ulf.hansson@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25cpuidle: Respect the CPU system wakeup QoS limit for cpuidleUlf Hansson1-0/+4
The CPU system wakeup QoS limit must be respected for the regular cpuidle state selection. Therefore, let's extend the common governor helper cpuidle_governor_latency_req(), to take the constraint into account. Reviewed-by: Dhruva Gole <d-gole@ti.com> Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com> Tested-by: Kevin Hilman (TI) <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://patch.msgid.link/20251125112650.329269-6-ulf.hansson@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25sched: idle: Respect the CPU system wakeup QoS limit for s2idleUlf Hansson3-12/+18
A CPU system wakeup QoS limit may have been requested by user space. To avoid breaking this constraint when entering a low power state during s2idle, let's start to take into account the QoS limit. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dhruva Gole <d-gole@ti.com> Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com> Tested-by: Kevin Hilman (TI) <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://patch.msgid.link/20251125112650.329269-5-ulf.hansson@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25pmdomain: Respect the CPU system wakeup QoS limit for cpuidleUlf Hansson1-1/+5
The CPU system wakeup QoS limit must be respected for the regular cpuidle state selection. Therefore, let's extend the genpd governor for CPUs to take the constraint into account when it selects a domain idle state for the corresponding PM domain. Reviewed-by: Dhruva Gole <d-gole@ti.com> Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com> Tested-by: Kevin Hilman (TI) <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://patch.msgid.link/20251125112650.329269-4-ulf.hansson@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25pmdomain: Respect the CPU system wakeup QoS limit for s2idleUlf Hansson3-2/+36
A CPU system wakeup QoS limit may have been requested by user space. To avoid breaking this constraint when entering a low power state during s2idle through genpd, let's extend the corresponding genpd governor for CPUs. More precisely, during s2idle let the genpd governor select a suitable domain idle state, by taking into account the QoS limit. Reviewed-by: Dhruva Gole <d-gole@ti.com> Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com> Tested-by: Kevin Hilman (TI) <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://patch.msgid.link/20251125112650.329269-3-ulf.hansson@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25PM: QoS: Introduce a CPU system wakeup QoS limitUlf Hansson3-0/+126
Some platforms supports multiple low power states for CPUs that can be used when entering system-wide suspend. Currently we are always selecting the deepest possible state for the CPUs, which can break the system wakeup latency constraint that may be required for a use case. Let's take the first step towards addressing this problem, by introducing an interface for user space, that allows us to specify the CPU system wakeup QoS limit. Subsequent changes will start taking into account the new QoS limit. Reviewed-by: Dhruva Gole <d-gole@ti.com> Reviewed-by: Kevin Hilman (TI) <khilman@baylibre.com> Tested-by: Kevin Hilman (TI) <khilman@baylibre.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Link: https://patch.msgid.link/20251125112650.329269-2-ulf.hansson@linaro.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25timekeeping: Fix error code in tk_aux_sysfs_init()Dan Carpenter1-1/+3
If kobject_create_and_add() fails on the first iteration, then the error code is set to -ENOMEM which is correct. But if it fails in subsequent iterations then "ret" is zero, which means success, but it should be -ENOMEM. Set the error code to -ENOMEM correctly. Fixes: 7b5ab04f035f ("timekeeping: Fix resource leak in tk_aux_sysfs_init() error paths") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Malaya Kumar Rout <mrout@redhat.com> Link: https://patch.msgid.link/aSW1R8q5zoY_DgQE@stanley.mountain
2025-11-25Merge tag 'arm64-fixes' of ↵Linus Torvalds3-12/+4
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "We've got a revert due to one of the recent CCA commits breaking ACPI firmware-based error reporting, a fix for a hard-lockup introduced by a prior fix affecting non-default (CONFIG_EXPERT) configurations and another ACPI fix for systems using MMIO-based timers. Other than that, we're looking pretty good. - Avoid hardlockup when CONFIG_MITIGATE_SPECTRE_BRANCH_HISTORY=n - Fix regression in APEI/GHES error handling - Fix MMIO timers when probed via ACPI" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: proton-pack: Fix hard lockup when !MITIGATE_SPECTRE_BRANCH_HISTORY ACPI: GTDT: Correctly number platform devices for MMIO timers Revert "arm64: acpi: Enable ACPI CCEL support"
2025-11-25Merge tag 'for-linus-iommufd' of ↵Linus Torvalds2-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd fixes from Jason Gunthorpe: "Two build fixes, no functional change: - Fix a possible compiler error around counted_by() due to wrong initialization order - Fix a -Wflex-array-member-not-at-end" * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd/iommufd_private.h: Avoid -Wflex-array-member-not-at-end warning iommufd/driver: Fix counter initialization for counted_by annotation
2025-11-25Merge tag 'linux-cpupower-6.19-rc1' of ↵Rafael J. Wysocki1-11/+21
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux Pull a cpupower utility update for 6.19-rc1 from Shuah Khan: "Adds support for building libcpupower statically when STATIC=true is specified during build." * tag 'linux-cpupower-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux: tools/power/cpupower: Support building libcpupower statically
2025-11-25Merge tag 'opp-updates-6.19' of ↵Rafael J. Wysocki4-148/+176
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull OPP updates for 6.19 from Viresh Kumar: "- Minor improvements to the Rust interface (Tamir Duberstein). - Fixes to scope-based pointers (Viresh Kumar)." * tag 'opp-updates-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: rust: opp: simplify callers of `to_c_str_array` OPP: Initialize scope-based pointers inline rust: opp: fix broken rustdoc link
2025-11-25Merge tag 'cpufreq-arm-updates-6.19' of ↵Rafael J. Wysocki7-21/+194
git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm Pull CPUFreq updates for 6.19 from Viresh Kumar: "- tegra186: Add OPP / bandwidth support for Tegra186 (Aaron Kling). - Minor improvements to various cpufreq drivers (Christian Marangi, Hal Feng, Jie Zhan, Marco Crivellari, Miaoqian Lin, and Shuhao Fu)." * tag 'cpufreq-arm-updates-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm: cpufreq: qcom-nvmem: fix compilation warning for qcom_cpufreq_ipq806x_match_list cpufreq: tegra194: add WQ_PERCPU to alloc_workqueue users cpufreq: qcom-nvmem: add compatible fallback for ipq806x for no SMEM cpufreq: CPPC: Don't warn if FIE init fails to read counters cpufreq: nforce2: fix reference count leak in nforce2 cpufreq: tegra186: add OPP support and set bandwidth cpufreq: dt-platdev: Add JH7110S SOC to the allowlist cpufreq: s5pv210: fix refcount leak
2025-11-25Merge branch 'net_sched-speedup-qdisc-dequeue'Paolo Abeni15-80/+147
Eric Dumazet says: ==================== net_sched: speedup qdisc dequeue Avoid up to two cache line misses in qdisc dequeue() to fetch skb_shinfo(skb)->gso_segs/gso_size while qdisc spinlock is held. Idea is to cache gso_segs at enqueue time before spinlock is acquired, in the first skb cache line, where we already have qdisc_skb_cb(skb)->pkt_len. This series gives a 8 % improvement in a TX intensive workload. (120 Mpps -> 130 Mpps on a Turin host, IDPF with 32 TX queues) v2: https://lore.kernel.org/netdev/20251111093204.1432437-1-edumazet@google.com/ v1: https://lore.kernel.org/netdev/20251110094505.3335073-1-edumazet@google.com/T/#m8f562ed148f807c02fd02c6cd243604d449615b9 ==================== Link: https://patch.msgid.link/20251121083256.674562-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: use qdisc_dequeue_drop() in cake, codel, fq_codelEric Dumazet3-3/+10
cake, codel and fq_codel can drop many packets from dequeue(). Use qdisc_dequeue_drop() so that the freeing can happen outside of the qdisc spinlock scope. Add TCQ_F_DEQUEUE_DROPS to sch->flags. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-15-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: add qdisc_dequeue_drop() helperEric Dumazet3-14/+43
Some qdisc like cake, codel, fq_codel might drop packets in their dequeue() method. This is currently problematic because dequeue() runs with the qdisc spinlock held. Freeing skbs can be extremely expensive. Add qdisc_dequeue_drop() method and a new TCQ_F_DEQUEUE_DROPS so that these qdiscs can opt-in to defer the skb frees after the socket spinlock is released. TCQ_F_DEQUEUE_DROPS is an attempt to not penalize other qdiscs with an extra cache line miss. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-14-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: add tcf_kfree_skb_list() helperEric Dumazet2-10/+16
Using kfree_skb_list_reason() to free list of skbs from qdisc operations seems wrong as each skb might have a different drop reason. Cleanup __dev_xmit_skb() to call tcf_kfree_skb_list() once in preparation of the following patch. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-13-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net: annotate a data-race in __dev_xmit_skb()Eric Dumazet1-1/+1
q->limit is read locklessly, add a READ_ONCE(). Fixes: 100dfa74cad9 ("net: dev_queue_xmit() llist adoption") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-12-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net: prefech skb->priority in __dev_xmit_skb()Eric Dumazet1-0/+1
Most qdiscs need to read skb->priority at enqueue time(). In commit 100dfa74cad9 ("net: dev_queue_xmit() llist adoption") I added a prefetch(next), lets add another one for the second half of skb. Note that skb->priority and skb->hash share a common cache line, so this patch helps qdiscs needing both fields. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-11-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: sch_fq: prefetch one skb ahead in dequeue()Eric Dumazet1-2/+5
prefetch the skb that we are likely to dequeue at the next dequeue(). Also call fq_dequeue_skb() a bit sooner in fq_dequeue(). This reduces the window between read of q.qlen and changes of fields in the cache line that could be dirtied by another cpu trying to queue a packet. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-10-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: sch_fq: move qdisc_bstats_update() to fq_dequeue_skb()Eric Dumazet1-1/+1
Group together changes to qdisc fields to reduce chances of false sharing if another cpu attempts to acquire the qdisc spinlock. qdisc_qstats_backlog_dec(sch, skb); sch->q.qlen--; qdisc_bstats_update(sch, skb); Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-9-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: add Qdisc_read_mostly and Qdisc_write groupsEric Dumazet1-11/+18
It is possible to reorg Qdisc to avoid always dirtying 2 cache lines in fast path by reducing this to a single dirtied cache line. In current layout, we change only four/six fields in the first cache line: - q.spinlock - q.qlen - bstats.bytes - bstats.packets - some Qdisc also change q.next/q.prev In the second cache line we change in the fast path: - running - state - qstats.backlog /* --- cacheline 2 boundary (128 bytes) --- */ struct sk_buff_head gso_skb __attribute__((__aligned__(64))); /* 0x80 0x18 */ struct qdisc_skb_head q; /* 0x98 0x18 */ struct gnet_stats_basic_sync bstats __attribute__((__aligned__(16))); /* 0xb0 0x10 */ /* --- cacheline 3 boundary (192 bytes) --- */ struct gnet_stats_queue qstats; /* 0xc0 0x14 */ bool running; /* 0xd4 0x1 */ /* XXX 3 bytes hole, try to pack */ unsigned long state; /* 0xd8 0x8 */ struct Qdisc * next_sched; /* 0xe0 0x8 */ struct sk_buff_head skb_bad_txq; /* 0xe8 0x18 */ /* --- cacheline 4 boundary (256 bytes) --- */ Reorganize things to have a first cache line mostly read, then a mostly written one. This gives a ~3% increase of performance under tx stress. Note that there is an additional hole because @qstats now spans over a third cache line. /* --- cacheline 2 boundary (128 bytes) --- */ __u8 __cacheline_group_begin__Qdisc_read_mostly[0] __attribute__((__aligned__(64))); /* 0x80 0 */ struct sk_buff_head gso_skb; /* 0x80 0x18 */ struct Qdisc * next_sched; /* 0x98 0x8 */ struct sk_buff_head skb_bad_txq; /* 0xa0 0x18 */ __u8 __cacheline_group_end__Qdisc_read_mostly[0]; /* 0xb8 0 */ /* XXX 8 bytes hole, try to pack */ /* --- cacheline 3 boundary (192 bytes) --- */ __u8 __cacheline_group_begin__Qdisc_write[0] __attribute__((__aligned__(64))); /* 0xc0 0 */ struct qdisc_skb_head q; /* 0xc0 0x18 */ unsigned long state; /* 0xd8 0x8 */ struct gnet_stats_basic_sync bstats __attribute__((__aligned__(16))); /* 0xe0 0x10 */ bool running; /* 0xf0 0x1 */ /* XXX 3 bytes hole, try to pack */ struct gnet_stats_queue qstats; /* 0xf4 0x14 */ /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */ __u8 __cacheline_group_end__Qdisc_write[0]; /* 0x108 0 */ /* XXX 56 bytes hole, try to pack */ Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-8-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: cake: use qdisc_pkt_segs()Eric Dumazet1-9/+3
Use new qdisc_pkt_segs() to avoid a cache line miss in cake_enqueue() for non GSO packets. cake_overhead() does not have to recompute it. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-7-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: use qdisc_skb_cb(skb)->pkt_segs in bstats_update()Eric Dumazet7-4/+16
Avoid up to two cache line misses in qdisc dequeue() to fetch skb_shinfo(skb)->gso_segs/gso_size while qdisc spinlock is held. This gives a 5 % improvement in a TX intensive workload. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-6-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net: use qdisc_pkt_len_segs_init() in sch_handle_ingress()Eric Dumazet1-1/+1
sch_handle_ingress() sets qdisc_skb_cb(skb)->pkt_len. We also need to initialize qdisc_skb_cb(skb)->pkt_segs. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-5-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: initialize qdisc_skb_cb(skb)->pkt_segs in qdisc_pkt_len_init()Eric Dumazet2-5/+12
qdisc_pkt_len_init() is currently initalizing qdisc_skb_cb(skb)->pkt_len. Add qdisc_skb_cb(skb)->pkt_segs initialization and rename this function to qdisc_pkt_len_segs_init(). Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-4-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net: init shinfo->gso_segs from qdisc_pkt_len_init()Eric Dumazet1-1/+2
Qdisc use shinfo->gso_segs for their pkts stats in bstats_update(), but this field needs to be initialized for SKB_GSO_DODGY users. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-3-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net_sched: make room for (struct qdisc_skb_cb)->pkt_segsEric Dumazet5-18/+18
Add a new u16 field, next to pkt_len : pkt_segs This will cache shinfo->gso_segs to speed up qdisc deqeue(). Move slave_dev_queue_mapping at the end of qdisc_skb_cb, and move three bits from tc_skb_cb : - post_ct - post_ct_snat - post_ct_dnat Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251121083256.674562-2-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25Revert "ACPI: processor: idle: Optimize ACPI idle driver registration"Rafael J. Wysocki3-47/+23
Revert commit 7a8c994cbb2d ("ACPI: processor: idle: Optimize ACPI idle driver registration") because it is reported to introduce a cpuidle regression leading to a kernel crash on a platform using the ACPI idle driver. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reported-by: Borislav Petkov <bp@alien8.de> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Closes: https://lore.kernel.org/lkml/20251124200019.GIaSS5U9HhsWBotrQZ@fat_crate.local/
2025-11-25dt-bindings: thermal: qcom-tsens: make ipq5018 tsens standalone compatibleGeorge Moussalem1-1/+6
The tsens IP found in the IPQ5018 SoC should not use qcom,tsens-v1 as fallback since it has no RPM and, as such, must deviate from the standard v1 init routine as this version of tsens needs to be explicitly reset and enabled in the driver. So let's make qcom,ipq5018-tsens a standalone compatible in the bindings. Fixes: 77c6d28192ef ("dt-bindings: thermal: qcom-tsens: Add ipq5018 compatible") Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Reviewed-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: George Moussalem <george.moussalem@outlook.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20250818-ipq5018-tsens-fix-v1-1-0f08cf09182d@outlook.com
2025-11-25ALSA: usb-audio: fix uac2 clock source at terminal parserRené Rebe1-1/+1
Since 8b3a087f7f65 ("ALSA: usb-audio: Unify virtual type units type to UAC3 values") usb-audio is using UAC3_CLOCK_SOURCE instead of bDescriptorSubtype, later refactored with e0ccdef9265 ("ALSA: usb-audio: Clean up check_input_term()") into parse_term_uac2_clock_source(). This breaks the clock source selection for at least my 1397:0003 BEHRINGER International GmbH FCA610 Pro. Fix by using UAC2_CLOCK_SOURCE in parse_term_uac2_clock_source(). Fixes: 8b3a087f7f65 ("ALSA: usb-audio: Unify virtual type units type to UAC3 values") Signed-off-by: René Rebe <rene@exactco.de> Link: https://patch.msgid.link/20251125.154149.1121389544970412061.rene@exactco.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2025-11-25net: lan966x: Fix the initialization of taprioHoratiu Vultur1-1/+4
To initialize the taprio block in lan966x, it is required to configure the register REVISIT_DLY. The purpose of this register is to set the delay before revisit the next gate and the value of this register depends on the system clock. The problem is that the we calculated wrong the value of the system clock period in picoseconds. The actual system clock is ~165.617754MHZ and this correspond to a period of 6038 pico seconds and not 15125 as currently set. Fixes: e462b2717380b4 ("net: lan966x: Add offload support for taprio") Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251121061411.810571-1-horatiu.vultur@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25s390/vdso: Get rid of -m64 flag handlingHeiko Carstens1-7/+3
The compiler/assembler flag -m64 is added and removed at two locations. This pointless exercise is a leftover to keep the 31 and 64 bit vdso Makefiles as symmetrical as possible. Given that the 31 bit vdso code does not exist anymore, remove the -m64 flag handling. Suggested-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-25s390/vdso: Rename vdso64 to vdsoHeiko Carstens18-106/+104
Since compat is gone there is only a 64 bit vdso left. Remove the superfluous "64" suffix everywhere. Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-25s390: Rename head64.S to head.SHeiko Carstens3-2/+2
All the code is 64 bit, therefore remove the superfluous suffix. Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-25s390/vdso: Use common STABS_DEBUG and DWARF_DEBUG macrosJens Remus1-40/+3
This simplifies the vDSO linker script. The ELF_DETAILS macro was not used in addition, as done on arm64 and powerpc, as that would introduce an empty .modinfo section. Note that this rearranges the .comment section to follow after all of the debug sections. Signed-off-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-25Revert "ACPI: processor: Remove unused empty stubs of some functions"Rafael J. Wysocki1-0/+20
Revert commit 5020d05b3476 ("ACPI: processor: Remove unused empty stubs of some functions") because it depends on a problematic one. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25io_uring: fix mixed cqe overflow handlingPavel Begunkov1-0/+2
I started to see zcrx data corruptions. That turned out to be due to CQ tail pointing to a stale entry which happened to be from a zcrx request. I.e. the tail is incremented without the CQE memory being changed. The culprit is __io_cqring_overflow_flush() passing "cqe32=true" to io_get_cqe_overflow() for non-mixed CQE32 setups, which only expects it to be set for mixed 32B CQEs and not for SETUP_CQE32. The fix is slightly hacky, long term it's better to unify mixed and CQE32 handling. Fixes: e26dca67fde19 ("io_uring: add support for IORING_SETUP_CQE_MIXED") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-11-25Revert "ACPI: processor: idle: Rearrange declarations in header file"Rafael J. Wysocki1-2/+5
Revert commit bdf780fbcef5 ("ACPI: processor: idle: Rearrange declarations in header file") because it depends on a problematic one. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25Revert "ACPI: processor: idle: Redefine two functions as void"Rafael J. Wysocki2-21/+24
Revert commit fbd401e95e56 ("ACPI: processor: idle: Redefine two functions as void") because it depends on a problematic one. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25Revert "ACPI: processor: Do not expose global variable acpi_idle_driver"Rafael J. Wysocki3-9/+4
Revert commit 559f2eacc8a2 ACPI: processor: Do not expose global variable acpi_idle_driver" because it depends on a problematic one. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-25Merge branch 'slab/for-6.19/mempool_alloc_bulk' into slab/for-nextVlastimil Babka4-195/+295
Merges series "mempool_alloc_bulk and various mempool improvements v3" from Christoph Hellwig. From the cover letter [1]: This series adds a bulk version of mempool_alloc that makes allocating multiple objects deadlock safe. The initial users is the blk-crypto-fallback code: https://lore.kernel.org/linux-block/20251031093517.1603379-1-hch@lst.de/ with which v1 was posted, but I also have a few other users in mind. Link: https://lore.kernel.org/all/20251113084022.1255121-1-hch@lst.de/ [1]
2025-11-25Merge branch 'slab/for-6.19/freelist_aba_t_cleanups' into slab/for-nextVlastimil Babka14-129/+122
Merge series "slab: cmpxchg cleanups enabled by -fms-extensions" From the cover letter [1]: After learning about -fms-extensions being enabled for 6.19, I realized there is some cleanup potential in slub code by extending the definition and usage of freelist_aba_t, as it can now become an unnamed member of struct slab. This series performs the cleanup, with no functional changes intended. Additionally we turn freelist_aba_t to struct freelist_counters as it doesn't meet any criteria for being a typedef, per Documentation/process/coding-style.rst Based on the tag kbuild-ms-extensions-6.19 from git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linuxV Link: https://lore.kernel.org/all/20251107-slab-fms-cleanup-v1-0-650b1491ac9e@suse.cz/#t [1]
2025-11-25Merge branch 'slab/for-6.19/memdesc_prep' into slab/for-nextVlastimil Babka8-175/+157
Merge series "Prepare slab for memdescs" by Matthew Wilcox. From the cover letter [1]: When we separate struct folio, struct page and struct slab from each other, converting to folios then to slabs will be nonsense. It made sense under the 'folio is just a head page' interpretation, but with full separation, page_folio() will return NULL for a page which belongs to a slab. This patch series removes almost all mentions of folio from slab. There are a few folio_test_slab() invocations left around the tree that I haven't decided how to handle yet. We're not yet quite at the point of separately allocating struct slab, but that's what I'll be working on next. Link: https://lore.kernel.org/all/20251113000932.1589073-1-willy@infradead.org/ [1]
2025-11-25Merge branch 'slab/for-6.19/sheaves_cleanups' into slab/for-nextVlastimil Babka3-166/+173
Merge series "slab: preparatory cleanups before adding sheaves to all caches" [1] Cleanups that were written as part of the full sheaves conversion, which is not fully ready yet, but they are useful on their own. Link: https://lore.kernel.org/all/20251105-sheaves-cleanups-v1-0-b8218e1ac7ef@suse.cz/ [1]
2025-11-25drm/i915/psr: Reject async flips when selective fetch is enabledVille Syrjälä2-6/+8
The selective fetch code doesn't handle asycn flips correctly. There is a nonsense check for async flips in intel_psr2_sel_fetch_config_valid() but that only gets called for modesets/fastsets and thus does nothing for async flips. Currently intel_async_flip_check_hw() is very unhappy as the selective fetch code pulls in planes that are not even async flips capable. Reject async flips when selective fetch is enabled, until someone fixes this properly (ie. disable selective fetch while async flips are being issued). Cc: stable@vger.kernel.org Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patch.msgid.link/20251105171015.22234-1-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com> (cherry picked from commit a5f0cc8e0cd4007370af6985cb152001310cf20c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-11-25slab: Remove unnecessary call to compound_head() in alloc_from_pcs()Matthew Wilcox (Oracle)1-1/+1
Each page knows which node it belongs to, so there's no need to convert to a folio. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20251124142329.1691780-1-willy@infradead.org Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2025-11-25mmc: sdhci-of-dwcmshc: Promote the th1520 reset handling to ip levelJisheng Zhang1-12/+17
Commit 27e8fe0da3b7 ("mmc: sdhci-of-dwcmshc: Prevent stale command interrupt handling") clears pending interrupts when resetting host->pending_reset to ensure no pending stale interrupts after sdhci_threaded_irq restores interrupts. But this fix is only added for th1520 platforms, in fact per my test, this issue exists on all dwcmshc users, such as cv1800b, sg2002, and synaptics platforms. So promote the above reset handling from th1520 to ip level. And keep reset handling on rk, sg2042 and bf3 as is, until it's confirmed that the same issue exists on these platforms too. Fixes: 017199c2849c ("mmc: sdhci-of-dwcmshc: Add support for Sophgo CV1800B and SG2002") Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
2025-11-25Documentation/arm64: Fix the typo of register namesZenon Xiu1-4/+4
The register name 'HWFGWTR_EL2' and 'HWFGRTR_EL2' is wrong, should be 'HFGWTR_EL2' and 'HFGRTR_EL2'. Find the register description on arm website here, https://developer.arm.com/documentation/ddi0601/2025-09/AArch64-Registers/HFGWTR-EL2--Hypervisor-Fine-Grained-Write-Trap-Register https://developer.arm.com/documentation/ddi0601/2025-09/AArch64-Registers/HFGRTR-EL2--Hypervisor-Fine-Grained-Read-Trap-Register?lang=en Signed-off-by: Zenon Xiu <zenonxiu@outlook.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-25ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()Marc Zyngier2-35/+0
Since 0f67b56d84b4c ("clocksource/drivers/arm_arch_timer_mmio: Switch over to standalone driver"), acpi_arch_timer_mem_init() is unused. Remove it. Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Sudeep Holla <sudeep.holla@arm.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Acked-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-11-25net: phy: mxl-gpy: fix link properties on USXGMII and internal PHYsDaniel Golle1-7/+11
gpy_update_interface() returns early in case the PHY is internal or connected via USXGMII. In this case the gigabit master/slave property as well as MDI/MDI-X status also won't be read which seems wrong. Always read those properties by moving the logic to retrieve them to gpy_read_status(). Fixes: fd8825cd8c6fc ("net: phy: mxl-gpy: Add PHY Auto/MDI/MDI-X set driver for GPY211 chips") Fixes: 311abcdddc00a ("net: phy: add support to get Master-Slave configuration") Suggested-by: "Russell King (Oracle)" <linux@armlinux.org.uk> Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/71fccf3f56742116eb18cc070d2a9810479ea7f9.1763650701.git.daniel@makrotopia.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25atm/fore200e: Fix possible data race in fore200e_open()Gui-Dong Han1-0/+2
Protect access to fore200e->available_cell_rate with rate_mtx lock in the error handling path of fore200e_open() to prevent a data race. The field fore200e->available_cell_rate is a shared resource used to track available bandwidth. It is concurrently accessed by fore200e_open(), fore200e_close(), and fore200e_change_qos(). In fore200e_open(), the lock rate_mtx is correctly held when subtracting vcc->qos.txtp.max_pcr from available_cell_rate to reserve bandwidth. However, if the subsequent call to fore200e_activate_vcin() fails, the function restores the reserved bandwidth by adding back to available_cell_rate without holding the lock. This introduces a race condition because available_cell_rate is a global device resource shared across all VCCs. If the error path in fore200e_open() executes concurrently with operations like fore200e_close() or fore200e_change_qos() on other VCCs, a read-modify-write race occurs. Specifically, the error path reads the rate without the lock. If another CPU acquires the lock and modifies the rate (e.g., releasing bandwidth in fore200e_close()) between this read and the subsequent write, the error path will overwrite the concurrent update with a stale value. This results in incorrect bandwidth accounting. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251120120657.2462194-1-hanguidong02@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25Merge branch 'net-dsa-microchip-fix-resource-releases-in-error-path'Paolo Abeni2-29/+24
Bastien Curutchet says: ==================== net: dsa: microchip: Fix resource releases in error path I worked on adding PTP support for the KSZ8463. While doing so, I ran into a few bugs in the resource release process that occur when things go wrong arount IRQ initialization. This small series fixes those bugs. The next series, which will add the PTP support, depend on this one. Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com> --- Bastien Curutchet (Schneider Electric) (5): net: dsa: microchip: common: Fix checks on irq_find_mapping() net: dsa: microchip: ptp: Fix checks on irq_find_mapping() net: dsa: microchip: Don't free uninitialized ksz_irq net: dsa: microchip: Free previously initialized ports on init failures net: dsa: microchip: Fix symetry in ksz_ptp_msg_irq_{setup/free}() drivers/net/dsa/microchip/ksz_common.c | 31 +++++++++++++++---------------- drivers/net/dsa/microchip/ksz_ptp.c | 22 +++++++++------------- 2 files changed, 24 insertions(+), 29 deletions(-) --- base-commit: 09652e543e809c2369dca142fee5d9b05be9bdc7 change-id: 20251031-ksz-fix-db345df7635f Best regards, ==================== Link: https://patch.msgid.link/20251120-ksz-fix-v6-0-891f80ae7f8f@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net: dsa: microchip: Fix symetry in ksz_ptp_msg_irq_{setup/free}()Bastien Curutchet (Schneider Electric)1-11/+7
The IRQ numbers created through irq_create_mapping() are only assigned to ptpmsg_irq[n].num at the end of the IRQ setup. So if an error occurs between their creation and their assignment (for instance during the request_threaded_irq() step), we enter the error path and fail to release the newly created virtual IRQs because they aren't yet assigned to ptpmsg_irq[n].num. Move the mapping creation to ksz_ptp_msg_irq_setup() to ensure symetry with what's released by ksz_ptp_msg_irq_free(). In the error path, move the irq_dispose_mapping to the out_ptp_msg label so it will be called only on created IRQs. Cc: stable@vger.kernel.org Fixes: cc13ab18b201 ("net: dsa: microchip: ptp: enable interrupt for timestamping") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com> Link: https://patch.msgid.link/20251120-ksz-fix-v6-5-891f80ae7f8f@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net: dsa: microchip: Free previously initialized ports on init failuresBastien Curutchet (Schneider Electric)1-12/+11
If a port interrupt setup fails after at least one port has already been successfully initialized, the gotos miss some resource releasing: - the already initialized PTP IRQs aren't released - the already initialized port IRQs aren't released if the failure occurs in ksz_pirq_setup(). Merge 'out_girq' and 'out_ptpirq' into a single 'port_release' label. Behind this label, use the reverse loop to release all IRQ resources for all initialized ports. Jump in the middle of the reverse loop if an error occurs in ksz_ptp_irq_setup() to only release the port IRQ of the current iteration. Cc: stable@vger.kernel.org Fixes: c9cd961c0d43 ("net: dsa: microchip: lan937x: add interrupt support for port phy link") Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com> Link: https://patch.msgid.link/20251120-ksz-fix-v6-4-891f80ae7f8f@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net: dsa: microchip: Don't free uninitialized ksz_irqBastien Curutchet (Schneider Electric)1-1/+1
If something goes wrong at setup, ksz_irq_free() can be called on uninitialized ksz_irq (for example when ksz_ptp_irq_setup() fails). It leads to freeing uninitialized IRQ numbers and/or domains. Use dsa_switch_for_each_user_port_continue_reverse() in the error path to iterate only over the fully initialized ports. Cc: stable@vger.kernel.org Fixes: cc13ab18b201 ("net: dsa: microchip: ptp: enable interrupt for timestamping") Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com> Link: https://patch.msgid.link/20251120-ksz-fix-v6-3-891f80ae7f8f@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net: dsa: microchip: ptp: Fix checks on irq_find_mapping()Bastien Curutchet (Schneider Electric)1-2/+2
irq_find_mapping() returns a positive IRQ number or 0 if no IRQ is found but it never returns a negative value. However, during the PTP IRQ setup, we verify that its returned value isn't negative. Fix the irq_find_mapping() check to enter the error path when 0 is returned. Return -EINVAL in such case. Cc: stable@vger.kernel.org Fixes: cc13ab18b201 ("net: dsa: microchip: ptp: enable interrupt for timestamping") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com> Link: https://patch.msgid.link/20251120-ksz-fix-v6-2-891f80ae7f8f@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net: dsa: microchip: common: Fix checks on irq_find_mapping()Bastien Curutchet (Schneider Electric)1-4/+4
irq_find_mapping() returns a positive IRQ number or 0 if no IRQ is found but it never returns a negative value. However, on each irq_find_mapping() call, we verify that the returned value isn't negative. Fix the irq_find_mapping() checks to enter error paths when 0 is returned. Return -EINVAL in such cases. CC: stable@vger.kernel.org Fixes: c9cd961c0d43 ("net: dsa: microchip: lan937x: add interrupt support for port phy link") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Bastien Curutchet (Schneider Electric) <bastien.curutchet@bootlin.com> Link: https://patch.msgid.link/20251120-ksz-fix-v6-1-891f80ae7f8f@bootlin.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25net: aquantia: Add missing descriptor cache invalidation on ATL2Kai-Heng Feng4-19/+25
ATL2 hardware was missing descriptor cache invalidation in hw_stop(), causing SMMU translation faults during device shutdown and module removal: [ 70.355743] arm-smmu-v3 arm-smmu-v3.5.auto: event 0x10 received: [ 70.361893] arm-smmu-v3 arm-smmu-v3.5.auto: 0x0002060000000010 [ 70.367948] arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000020000000000 [ 70.374002] arm-smmu-v3 arm-smmu-v3.5.auto: 0x00000000ff9bc000 [ 70.380055] arm-smmu-v3 arm-smmu-v3.5.auto: 0x0000000000000000 [ 70.386109] arm-smmu-v3 arm-smmu-v3.5.auto: event: F_TRANSLATION client: 0001:06:00.0 sid: 0x20600 ssid: 0x0 iova: 0xff9bc000 ipa: 0x0 [ 70.398531] arm-smmu-v3 arm-smmu-v3.5.auto: unpriv data write s1 "Input address caused fault" stag: 0x0 Commit 7a1bb49461b1 ("net: aquantia: fix potential IOMMU fault after driver unbind") and commit ed4d81c4b3f2 ("net: aquantia: when cleaning hw cache it should be toggled") fixed cache invalidation for ATL B0, but ATL2 was left with only interrupt disabling. This allowed hardware to write to cached descriptors after DMA memory was unmapped, triggering SMMU faults. Once cache invalidation is applied to ATL2, the translation fault can't be observed anymore. Add shared aq_hw_invalidate_descriptor_cache() helper and use it in both ATL B0 and ATL2 hw_stop() implementations for consistent behavior. Fixes: e54dcf4bba3e ("net: atlantic: basic A2 init/deinit hw_ops") Tested-by: Carol Soto <csoto@nvidia.com> Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20251120041537.62184-1-kaihengf@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25dt-bindings: net: aspeed: add AST2700 MDIO compatibleJacky Chou1-1/+6
Add "aspeed,ast2700-mdio" compatible to the binding schema with a fallback to "aspeed,ast2600-mdio". Although the MDIO controller on AST2700 is functionally the same as the one on AST2600, it's good practice to add a SoC-specific compatible for new silicon. This allows future driver updates to handle any 2700-specific integration issues without requiring devicetree changes or complex runtime detection logic. For now, the driver continues to bind via the existing "aspeed,ast2600-mdio" compatible, so no driver changes are needed. Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Jacky Chou <jacky_chou@aspeedtech.com> Link: https://patch.msgid.link/20251120-aspeed_mdio_ast2700-v2-1-0d722bfb2c54@aspeedtech.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-25tools/thermal/thermal-engine: Fix format string bug in thermal-engineMalaya Kumar Rout1-1/+1
The error message in the daemon() failure path uses %p format specifier without providing a corresponding pointer argument, resulting in undefined behavior and printing garbage values. Replace %p with %m to properly print the errno error message, which is the intended behavior when daemon() fails. This fix ensures proper error reporting when daemonization fails. Signed-off-by: Malaya Kumar Rout <mrout@redhat.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://patch.msgid.link/20251124104401.374856-1-mrout@redhat.com
2025-11-25wifi: nl80211: vendor-cmd: intel: fix a blank kernel-doc line warningRandy Dunlap1-1/+0
Delete an empty line prevent a kernel-doc warning: Warning: ../include/uapi/linux/nl80211-vnd-intel.h:86 bad line: Fixes: 3d2a2544eae9 ("nl80211: vendor-cmd: add Intel vendor commands for iwlmei usage") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20251125022834.3171742-1-rdunlap@infradead.org Signed-off-by: Johannes Berg <johannes.berg@intel.com>