| Age | Commit message (Collapse) | Author | Files | Lines |
|
The pr_read_keys() interface has a u32 num_keys parameter. The SCSI
PERSISTENT RESERVE IN command has a maximum READ KEYS service action
size of 65536 bytes. Reject num_keys values that are too large to fit
into the SCSI command.
This will become important when pr_read_keys() is exposed to untrusted
userspace via an <linux/pr.h> ioctl.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Since after commit 12e4e8c7ab59 ("io_uring/rw: enable bio caches for
IRQ rw"), bio_put is safe for task and irq context, bio_alloc_bioset is
safe for task context and no one calls in irq context, so we can enable
per cpu bio cache by default.
Benchmarked with t/io_uring and ext4+nvme:
taskset -c 6 /root/fio/t/io_uring -p0 -d128 -b4096 -s1 -c1 -F1 -B1 -R1
-X1 -n1 -P1 /mnt/testfile
base IOPS is 562K, patch IOPS is 574K. The CPU usage of bio_alloc_bioset
decrease from 1.42% to 1.22%.
The worst case is allocate bio in CPU A but free in CPU B, still use
t/io_uring and ext4+nvme:
base IOPS is 648K, patch IOPS is 647K.
Also use fio test ext4/xfs with libaio/sync/io_uring on null_blk and
nvme, no obvious performance regression.
Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Use bio_alloc_bioset for passthru IO by default, so that we can enable
bio cache for irq and polled passthru IO in later.
Signed-off-by: Fengnan Chang <changfengnan@bytedance.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The io_uring_queue_async_work tracepoint event stores an int rw field
that represents whether the work item is hashed. Rename it to "hashed"
and change its type to bool to more accurately reflect its value.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
If a task has a pending signal when create_io_thread() is called,
copy_process() will return -ERESTARTNOINTR. io_should_retry_thread()
will request a retry of create_io_thread() up to WORKER_INIT_LIMIT = 3
times. If all retries fail, the io_uring request will fail with
ECANCELED.
Commit 3918315c5dc ("io-wq: backoff when retrying worker creation")
added a linear backoff to allow the thread to handle its signal before
the retry. However, a thread receiving frequent signals may get unlucky
and have a signal pending at every retry. Since the userspace task
doesn't control when it receives signals, there's no easy way for it to
prevent the create_io_thread() failure due to pending signals. The task
may also lack the information necessary to regenerate the canceled SQE.
So always retry the create_io_thread() on the ERESTART* errors,
analogous to what a fork() syscall would do. EAGAIN can occur due to
various persistent conditions such as exceeding RLIMIT_NPROC, so respect
the WORKER_INIT_LIMIT retry limit for EAGAIN errors.
Signed-off-by: Caleb Sander Mateos <csander@purestorage.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
When the core of io_uring was updated to handle completions
consistently and with fixed return codes, the POLL_REMOVE opcode
with updates got slightly broken. If a POLL_ADD is pending and
then POLL_REMOVE is used to update the events of that request, if that
update causes the POLL_ADD to now trigger, then that completion is lost
and a CQE is never posted.
Additionally, ensure that if an update does cause an existing POLL_ADD
to complete, that the completion value isn't always overwritten with
-ECANCELED. For that case, whatever io_poll_add() set the value to
should just be retained.
Cc: stable@vger.kernel.org
Fixes: 97b388d70b53 ("io_uring: handle completions in the core")
Reported-by: syzbot+641eec6b7af1f62f2b99@syzkaller.appspotmail.com
Tested-by: syzbot+641eec6b7af1f62f2b99@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The only crypto-related functionality this codec uses is the sha256()
function, which is provided by CRYPTO_LIB_SHA256. Originally
CRYPTO_LIB_SHA256 was visible only when CRYPTO; however, that was fixed
years ago and the libraries can now be selected on their own.
So, remove the unnecessary selection of CRYPTO.
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20251204052954.488568-1-ebiggers@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Since commit 94fe479fae96 ("drm/rcar-du: dsi: Clean up handling of DRM mode flags")
the driver does not set TXVMVPRMSET0R_VSPOL_LOW and TXVMVPRMSET0R_HSPOL_LOW
for modes which set neither DRM_MODE_FLAG_[PN].SYNC. The previous behavior
was to assume that neither flag means DRM_MODE_FLAG_N.SYNC . Restore the
previous behavior for maximum compatibility.
The change of behavior is visible below, consider Vertical mode->flags
for simplicity sake, although the same applies to Horizontal ones:
Before 94fe479fae96 ("drm/rcar-du: dsi: Clean up handling of DRM mode flags") :
- DRM_MODE_FLAG_PVSYNC => vprmset0r |= 0
- DRM_MODE_FLAG_NVSYNC => vprmset0r |= TXVMVPRMSET0R_VSPOL_LOW
- Neither DRM_MODE_FLAG_[PN]VSYNC => vprmset0r |= TXVMVPRMSET0R_VSPOL_LOW
After 94fe479fae96 ("drm/rcar-du: dsi: Clean up handling of DRM mode flags") :
- DRM_MODE_FLAG_PVSYNC => vprmset0r |= 0
- DRM_MODE_FLAG_NVSYNC => vprmset0r |= TXVMVPRMSET0R_VSPOL_LOW
- Neither DRM_MODE_FLAG_[PN]VSYNC => vprmset0r |= 0 <---------- This broke
The "Neither" case behavior is different, because DRM_MODE_FLAG_N[HV]SYNC is
really not equivalent !DRM_MODE_FLAG_P[HV]SYNC .
Fixes: 94fe479fae96 ("drm/rcar-du: dsi: Clean up handling of DRM mode flags")
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Link: https://patch.msgid.link/20251202181146.138365-1-marek.vasut+renesas@mailbox.org
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
|
|
Merge series from Johan Hovold <johan@kernel.org>:
The original wcd938x driver has a couple of OF node reference leaks
which have been reproduced in the two later added drivers.
|
|
Merge series from Shengjiu Wang <shengjiu.wang@nxp.com>:
Disable regulator when error happens to balance the reference count.
|
|
The check that determines if the message that warns about the per-dentry
timeout being greater than the super block timeout is not correct.
The initial value for this field is -1 and the type of the field is
unsigned long.
I could change the type to long but the message is in the wrong place
too, it should come after the timeout setting. So leave everything else
as it is and move the message and check the timeout is actually set
as an additional condition on issuing the message. Also fix the timeout
comparison.
Signed-off-by: Ian Kent <raven@themaw.net>
Link: https://patch.msgid.link/20251111060439.19593-2-raven@themaw.net
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Events may fail to open as no supported CPUs were specified on the
command line. In this case a confusing "error" message of "success"
can be reported. Let's skip the error in that case.
Before:
```
$ perf stat -C2048 -e cycles -- true
WARNING: A requested CPU in '2048' is not supported by PMU 'cpu' (CPUs 0-7) for event 'cycles'
Error:
No supported events found.
The sys_perf_event_open() syscall returned with 0 (Success) for event (cpu/unknown-hardware/).
"dmesg | grep -i perf" may provide additional information.
```
After:
```
$ perf stat -C2048 -e cycles -- true
WARNING: A requested CPU in '2048' is not supported by PMU 'cpu' (CPUs 0-7) for event 'cycles'
Error:
No supported events found.
```
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Ensure "--null" does a minimal run.
Reported-by: Ingo Molnar <mingo@kernel.org>
Closes: https://lore.kernel.org/linux-perf-users/aSwt7yzFjVJCEmVp@gmail.com/
Tested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
If the perf_cpu_map is empty or is just the any CPU value, then early
return. Don't process the "any" CPU when creating the bitmap.
Tested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Passing an empty map to perf_cpu_map__max triggered a SEGV. Explicitly
test for the empty map.
Reported-by: Ingo Molnar <mingo@kernel.org>
Closes: https://lore.kernel.org/linux-perf-users/aSwt7yzFjVJCEmVp@gmail.com/
Tested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
It is intended that a "--null" run doesn't open any events.
Fixes: 2cc7aa995ce9 ("perf stat: Refactor retry/skip/fatal error handling")
Tested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Previously, the behavior of the multichannel and max_channels mount
options was inconsistent and order-dependent. For example, specifying
"multichannel,max_channels=1" would result in 2 channels, while
"max_channels=1,multichannel" would result in 1 channel. Additionally,
conflicting combinations such as "nomultichannel,max_channels=3" or
"multichannel,max_channels=1" did not produce errors and could lead to
unexpected channel counts.
This commit introduces two new fields in smb3_fs_context to explicitly
track whether multichannel and max_channels were specified during
mount. The option parsing and validation logic is updated to ensure:
- The outcome is no longer dependent on the order of options.
- Conflicting combinations (e.g., "nomultichannel,max_channels=3" or
"multichannel,max_channels=1") are detected and result in an error.
- The number of channels created is consistent with the specified
options.
This improves the reliability and predictability of mount option
handling for SMB3 multichannel support.
Reviewed-by: Shyam Prasad N <sprasad@microsoft.com>
Signed-off-by: Rajasi Mandal <rajasimandal@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
https://github.com/Paragon-Software-Group/linux-ntfs3
Pull ntfs3 updates from Konstantin Komarov:
"New code:
- support timestamps prior to epoch
- do not overwrite uptodate pages
- disable readahead for compressed files
- setting of dummy blocksize to read boot_block when mounting
- the run_lock initialization when loading $Extend
- initialization of allocated memory before use
- support for the NTFS3_IOC_SHUTDOWN ioctl
- check for minimum alignment when performing direct I/O reads
- check for shutdown in fsync
Fixes:
- mount failure for sparse runs in run_unpack()
- use-after-free of sbi->options in cmp_fnames
- KMSAN uninit bug after failed mi_read in mi_format_new
- uninit error after buffer allocation by __getname()
- KMSAN uninit-value in ni_create_attr_list
- double free of sbi->options->nls and ownership of fc->fs_private
- incorrect vcn adjustments in attr_collapse_range()
- mode update when ACL can be reduced to mode
- memory leaks in add sub record
Changes:
- refactor code, updated terminology, spelling
- do not kmap pages in (de)compression code
- after ntfs_look_free_mft(), code that fails must put mft_inode
- default mount options for "acl" and "prealloc"
Replaced:
- use unsafe_memcpy() to avoid memcpy size warning
- ntfs_bio_pages with page cache for compressed files"
* tag 'ntfs3_for_6.19' of https://github.com/Paragon-Software-Group/linux-ntfs3: (26 commits)
fs/ntfs3: check for shutdown in fsync
fs/ntfs3: change the default mount options for "acl" and "prealloc"
fs/ntfs3: Prevent memory leaks in add sub record
fs/ntfs3: out1 also needs to put mi
fs/ntfs3: Fix spelling mistake "recommened" -> "recommended"
fs/ntfs3: update mode in xattr when ACL can be reduced to mode
fs/ntfs3: check minimum alignment for direct I/O
fs/ntfs3: implement NTFS3_IOC_SHUTDOWN ioctl
fs/ntfs3: correct attr_collapse_range when file is too fragmented
ntfs3: fix double free of sbi->options->nls and clarify ownership of fc->fs_private
fs/ntfs3: Initialize allocated memory before use
fs/ntfs3: remove ntfs_bio_pages and use page cache for compressed I/O
ntfs3: avoid memcpy size warning
fs/ntfs3: fix KMSAN uninit-value in ni_create_attr_list
ntfs3: init run lock for extend inode
ntfs: set dummy blocksize to read boot_block when mounting
fs/ntfs3: disable readahead for compressed files
ntfs3: Fix uninit buffer allocated by __getname()
ntfs3: fix uninit memory after failed mi_read in mi_format_new
ntfs3: fix use-after-free of sbi->options in cmp_fnames
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"New features and improvements for the ext4 file system:
- Optimize online defragmentation by using folios instead of
individual buffer heads
- Improve error codes stored in the superblock when the journal
aborts
- Minor cleanups and clarifications in ext4_map_blocks()
- Add documentation of the casefold and encrypt flags
- Add support for file systems with a blocksize greater than the
pagesize
- Improve performance by enabling the caching the fact that an inode
does not have a Posix ACL
Various Bug Fixes:
- Fix false positive complaints from smatch
- Fix error code which is returned by ext4fs_dirhash() when Siphash
is used without the encryption key
- Fix races when writing to inline data files which could trigger a
BUG
- Fix potential NULL dereference when there is an corrupt file system
with an extended attribute value stored in a inode
- Fix false positive lockdep report when syzbot uses ext4 and ocfs2
together
- Fix false positive reported by DEPT by adjusting lock annotation
- Avoid a potential BUG_ON in jbd2 when a file system is massively
corrupted
- Fix a WARN_ON when superblock is corrupted with a non-NULL
terminated mount options field
- Add check if the userspace passes in a non-NULL terminated mount
options field to EXT4_IOC_SET_TUNE_SB_PARAM
- Fix a potential journal checksum failure whena file system is
copied while it is mounted read-only
- Fix a potential potential orphan file tracking error which only
showed on 32-bit systems
- Fix assertion checks in mballoc (which have to be explicitly enbled
by manually enabling AGGRESSIVE_CHECKS and recompiling)
- Avoid complaining about overly large orphan files created by mke2fs
with with file systems with a 64k block size"
* tag 'ext4_for_linus-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (58 commits)
ext4: mark inodes without acls in __ext4_iget()
ext4: enable block size larger than page size
ext4: add checks for large folio incompatibilities when BS > PS
ext4: support verifying data from large folios with fs-verity
ext4: make data=journal support large block size
ext4: support large block size in __ext4_block_zero_page_range()
ext4: support large block size in mpage_prepare_extent_to_map()
ext4: support large block size in mpage_map_and_submit_buffers()
ext4: support large block size in ext4_block_write_begin()
ext4: support large block size in ext4_mpage_readpages()
ext4: rename 'page' references to 'folio' in multi-block allocator
ext4: prepare buddy cache inode for BS > PS with large folios
ext4: support large block size in ext4_mb_init_cache()
ext4: support large block size in ext4_mb_get_buddy_page_lock()
ext4: support large block size in ext4_mb_load_buddy_gfp()
ext4: add EXT4_LBLK_TO_PG and EXT4_PG_TO_LBLK for block/page conversion
ext4: add EXT4_LBLK_TO_B macro for logical block to bytes conversion
ext4: support large block size in ext4_readdir()
ext4: support large block size in ext4_calculate_overhead()
ext4: introduce s_min_folio_order for future BS > PS support
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Major withdraw / error handling overhaul based on dlm's new
DLM_RELEASE_RECOVER feature: this allows gfs to treat withdraws like
node failures. Make withdraws asynchronous
- Fix a bug in commit e4a8b5481c59a that caused 'df' to remain out of
sync. ('df' is still allowed to go slightly out of sync for short
periods of time)
- Prevent recusive memory reclaim in gfs2_unstuff_dinode()
- Clean up SDF_JOURNAL_LIVE flag handling
- Fix remote evict for read-only filesystems
- Fix a misuse of bio_chain()
- Various other minor cleanups
* tag 'gfs2-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (35 commits)
gfs2: Fix use of bio_chain
gfs2: Clean up SDF_JOURNAL_LIVE flag handling
gfs2: No longer thaw filesystems during a withdraw
gfs2: Withdraw immediately in gfs2_trans_add_meta
gfs2: New gfs2_withdraw_helper
gfs2: Clean up properly during a withdraw
gfs2: Rename gfs2_{gl_dq_holders => withdraw_glocks}
Revert "gfs2: fix infinite loop when checking ail item count before go_inval"
Revert "gfs2: Allow some glocks to be used during withdraw"
Revert "gfs2: Check for log write errors before telling dlm to unlock"
Revert "gfs2: fix a deadlock on withdraw-during-mount"
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (6/6)
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (5/6)
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (4/6)
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (3/6)
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (2/6)
Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (1/6)
Revert "gfs2: don't stop reads while withdraw in progress"
gfs2: Rename LM_FLAG_{NOEXP -> RECOVER}
gfs2: Kill gfs2_io_error_bh_wd
...
|
|
Pull smb client and server updates from Steve French:
- server fixes:
- IPC use after free locking fix
- fix locking bug in delete paths
- fix use after free in disconnect
- fix underflow in locking check
- error mapping improvement
- socket listening improvement
- return code mapping fixes
- crypto improvements (use default libraries)
- cleanup patches:
- netfs
- client checkpatch cleanup
- server cleanup
- move server/client duplicate code to common code
- fix some defines to better match protocol specification
- smbdirect (RDMA) fixes
- client debugging improvements for leases
* tag 'v6.19-rc-smb-fixes' of git://git.samba.org/ksmbd: (44 commits)
cifs: Use netfs_alloc/free_folioq_buffer()
smb: client: show smb lease key in open_dirs output
smb: client: show smb lease key in open_files output
ksmbd: ipc: fix use-after-free in ipc_msg_send_request
smb: client: relax WARN_ON_ONCE(SMBDIRECT_SOCKET_*) checks in recv_done() and smbd_conn_upcall()
smb: server: relax WARN_ON_ONCE(SMBDIRECT_SOCKET_*) checks in recv_done() and smb_direct_cm_handler()
smb: smbdirect: introduce SMBDIRECT_CHECK_STATUS_{WARN,DISCONNECT}()
smb: smbdirect: introduce SMBDIRECT_DEBUG_ERR_PTR() helper
ksmbd: vfs: fix race on m_flags in vfs_cache
ksmbd: Replace strcpy + strcat to improve convert_to_nt_pathname
smb: move FILE_SYSTEM_ATTRIBUTE_INFO to common/fscc.h
ksmbd: implement error handling for STATUS_INFO_LENGTH_MISMATCH in smb server
ksmbd: fix use-after-free in ksmbd_tree_connect_put under concurrency
ksmbd: server: avoid busy polling in accept loop
smb: move create_durable_reconn to common/smb2pdu.h
smb: fix some warnings reported by scripts/checkpatch.pl
smb: do some cleanups
smb: move FILE_SYSTEM_SIZE_INFO to common/fscc.h
smb: move some duplicate struct definitions to common/fscc.h
smb: move list of FileSystemAttributes to common/fscc.h
...
|
|
Pull xfs updates from Carlos Maiolino:
"There are no major changes in xfs. This contains mostly some code
cleanups, a few bug fixes and documentation update. Highlights are:
- Quota locking cleanup
- Getting rid of old xlog_in_core_2_t type"
* tag 'xfs-merge-6.19' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (33 commits)
docs: remove obsolete links in the xfs online repair documentation
xfs: move some code out of xfs_iget_recycle
xfs: use zi more in xfs_zone_gc_mount
xfs: remove the unused bv field in struct xfs_gc_bio
xfs: remove xarray mark for reclaimable zones
xfs: remove the xlog_in_core_t typedef
xfs: remove l_iclog_heads
xfs: remove the xlog_rec_header_t typedef
xfs: remove xlog_in_core_2_t
xfs: remove a very outdated comment from xlog_alloc_log
xfs: cleanup xlog_alloc_log a bit
xfs: don't use xlog_in_core_2_t in struct xlog_in_core
xfs: add a on-disk log header cycle array accessor
xfs: add a XLOG_CYCLE_DATA_SIZE constant
xfs: reduce ilock roundtrips in xfs_qm_vop_dqalloc
xfs: move xfs_dquot_tree calls into xfs_qm_dqget_cache_{lookup,insert}
xfs: move quota locking into xrep_quota_item
xfs: move quota locking into xqcheck_commit_dquot
xfs: move q_qlock locking into xqcheck_compare_dquot
xfs: move q_qlock locking into xchk_quota_item
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
- Fix a WARNING caused by a recent FSDAX misdetection regression
- Fix the filesystem stacking limit for file-backed mounts
- Print more informative diagnostics on decompression errors
- Switch the on-disk definition `erofs_fs.h` to the MIT license
- Minor cleanups
* tag 'erofs-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
erofs: switch on-disk header `erofs_fs.h` to MIT license
erofs: get rid of raw bi_end_io() usage
erofs: enable error reporting for z_erofs_fixup_insize()
erofs: enable error reporting for z_erofs_stream_switch_bufs()
erofs: improve Zstd, LZMA and DEFLATE error strings
erofs: improve decompression error reporting
erofs: tidy up z_erofs_lz4_handle_overlap()
erofs: limit the level of fs stacking for file-backed mounts
erofs: correct FSDAX detection
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs
Pull hfs/hfsplus updates from Viacheslav Dubeyko:
"Several fixes for syzbot reported issues, HFS/HFS+ fixes of xfstests
failures, Kunit-based unit-tests introduction, and code cleanup:
- Dan Carpenter fixed a potential use-after-free issue in
hfs_correct_next_unused_CNID() method. Tetsuo Handa has made nice
fix of syzbot reported issue related to incorrect inode->i_mode
management if volume has been corrupted somehow. Yang Chenzhi has
made really good fix of potential race condition in
__hfs_bnode_create() method for HFS+ file system.
- Several fixes to xfstests failures. Particularly, generic/070,
generic/073, and generic/101 test-cases finish successfully for the
case of HFS+ file system right now.
- HFS and HFS+ drivers share multiple structures of on-disk layout
declarations. Some structures are used without any change. However,
we had two independent declarations of the same structures in HFS
and HFS+ drivers.
The on-disk layout declarations have been moved into
include/linux/hfs_common.h with the goal to exclude the
declarations duplication and to keep the HFS/HFS+ on-disk layout
declarations in one place.
Also, this patch prepares the basis for creating a hfslib that can
aggregate common functionality without necessity to duplicate the
same code in HFS and HFS+ drivers.
- HFS/HFS+ really need unit-tests because of multiple xfstests
failures. The first two patches introduce Kunit-based unit-tests
for the case string operations in HFS/HFS+ file system drivers"
* tag 'hfs-v6.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs:
hfs/hfsplus: move on-disk layout declarations into hfs_common.h
hfsplus: fix volume corruption issue for generic/101
hfsplus: introduce KUnit tests for HFS+ string operations
hfs: introduce KUnit tests for HFS string operations
hfsplus: fix volume corruption issue for generic/073
hfsplus: Verify inode mode when loading from disk
hfsplus: fix volume corruption issue for generic/070
hfs/hfsplus: prevent getting negative values of offset/length
hfsplus: fix missing hfs_bnode_get() in __hfs_bnode_create
hfs: fix potential use after free in hfs_correct_next_unused_CNID()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"Features:
- shutdown ioctl support (needs CONFIG_BTRFS_EXPERIMENTAL for now):
- set filesystem state as being shut down (also named going down
in other filesystems), where all active operations return EIO
and this cannot be changed until unmount
- pending operations are attempted to be finished but error
messages may still show up depending on where exactly the
shutdown happened
- scrub (and device replace) vs suspend/hibernate:
- a running scrub will prevent suspend, which can be annoying as
suspend is an immediate request and scrub is not critical
- filesystem freezing before suspend was not sufficient as the
problem was in process freezing
- behaviour change: on suspend scrub and device replace are
cancelled, where scrub can record the last state and continue
from there; the device replace has to be restarted from the
beginning
- zone stats exported in sysfs, from the perspective of the
filesystem this includes active, reclaimable, relocation etc zones
Performance:
- improvements when processing space reservation tickets by
optimizing locking and shrinking critical sections, cumulative
improvements in lockstat numbers show +15%
Notable fixes:
- use vmalloc fallback when allocating bios as high order allocations
can happen with wide checksums (like sha256)
- scrub will always track the last position of progress so it's not
starting from zero after an error
Core:
- under experimental config, checksum calculations are offloaded to
process context, simplifies locking and allows to remove
compression write worker kthread(s):
- speed improvement in direct IO throughput with buffered IO
fallback is +15% when not offloaded but this is more related to
internal crypto subsystem improvements
- this will be probably default in the future removing the sysfs
tunable
- (experimental) block size > page size updates:
- support more operations when not using large folios (encoded
read/write and send)
- raid56
- more preparations for fscrypt support
Other:
- more conversions to auto-cleaned variables
- parameter cleanups and removals
- extended warning fixes
- improved printing of structured values like keys
- lots of other cleanups and refactoring"
* tag 'for-6.19-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (147 commits)
btrfs: remove unnecessary inode key in btrfs_log_all_parents()
btrfs: remove redundant zero/NULL initializations in btrfs_alloc_root()
btrfs: remaining BTRFS_PATH_AUTO_FREE conversions
btrfs: send: do not allocate memory for xattr data when checking it exists
btrfs: send: add unlikely to all unexpected overflow checks
btrfs: reduce arguments to btrfs_del_inode_ref_in_log()
btrfs: remove root argument from btrfs_del_dir_entries_in_log()
btrfs: use test_and_set_bit() in btrfs_delayed_delete_inode_ref()
btrfs: don't search back for dir inode item in INO_LOOKUP_USER
btrfs: don't rewrite ret from inode_permission
btrfs: add orig_logical to btrfs_bio for encryption
btrfs: disable verity on encrypted inodes
btrfs: disable various operations on encrypted inodes
btrfs: remove redundant level reset in btrfs_del_items()
btrfs: simplify leaf traversal after path release in btrfs_next_old_leaf()
btrfs: optimize balance_level() path reference handling
btrfs: factor out root promotion logic into promote_child_to_root()
btrfs: raid56: remove the "_step" infix
btrfs: raid56: enable bs > ps support
btrfs: raid56: prepare finish_parity_scrub() to support bs > ps cases
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block updates from Jens Axboe:
- Fix head insertion for mq-deadline, a regression from when priority
support was added
- Series simplifying and improving the ublk user copy code
- Various ublk related cleanups
- Fixup REQ_NOWAIT handling in loop/zloop, clearing NOWAIT when the
request is punted to a thread for handling
- Merge and then later revert loop dio nowait support, as it ended up
causing excessive stack usage for when the inline issue code needs to
dip back into the full file system code
- Improve auto integrity code, making it less deadlock prone
- Speedup polled IO handling, but manually managing the hctx lookups
- Fixes for blk-throttle for SSD devices
- Small series with fixes for the S390 dasd driver
- Add support for caching zones, avoiding unnecessary report zone
queries
- MD pull requests via Yu:
- fix null-ptr-dereference regression for dm-raid0
- fix IO hang for raid5 when array is broken with IO inflight
- remove legacy 1s delay to speed up system shutdown
- change maintainer's email address
- data can be lost if array is created with different lbs devices,
fix this problem and record lbs of the array in metadata
- fix rcu protection for md_thread
- fix mddev kobject lifetime regression
- enable atomic writes for md-linear
- some cleanups
- bcache updates via Coly
- remove useless discard and cache device code
- improve usage of per-cpu workqueues
- Reorganize the IO scheduler switching code, fixing some lockdep
reports as well
- Improve the block layer P2P DMA support
- Add support to the block tracing code for zoned devices
- Segment calculation improves, and memory alignment flexibility
improvements
- Set of prep and cleanups patches for ublk batching support. The
actual batching hasn't been added yet, but helps shrink down the
workload of getting that patchset ready for 6.20
- Fix for how the ps3 block driver handles segments offsets
- Improve how block plugging handles batch tag allocations
- nbd fixes for use-after-free of the configuration on device clear/put
- Set of improvements and fixes for zloop
- Add Damien as maintainer of the block zoned device code handling
- Various other fixes and cleanups
* tag 'for-6.19/block-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits)
block/rnbd: correct all kernel-doc complaints
blk-mq: use queue_hctx in blk_mq_map_queue_type
md: remove legacy 1s delay in md_notify_reboot
md/raid5: fix IO hang when array is broken with IO inflight
md: warn about updating super block failure
md/raid0: fix NULL pointer dereference in create_strip_zones() for dm-raid
sbitmap: fix all kernel-doc warnings
ublk: add helper of __ublk_fetch()
ublk: pass const pointer to ublk_queue_is_zoned()
ublk: refactor auto buffer register in ublk_dispatch_req()
ublk: add `union ublk_io_buf` with improved naming
ublk: add parameter `struct io_uring_cmd *` to ublk_prep_auto_buf_reg()
kfifo: add kfifo_alloc_node() helper for NUMA awareness
blk-mq: fix potential uaf for 'queue_hw_ctx'
blk-mq: use array manage hctx map instead of xarray
ublk: prevent invalid access with DEBUG
s390/dasd: Use scnprintf() instead of sprintf()
s390/dasd: Move device name formatting into separate function
s390/dasd: Remove unnecessary debugfs_create() return checks
s390/dasd: Fix gendisk parent after copy pair swap
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring updates from Jens Axboe:
- Unify how task_work cancelations are detected, placing it in the
task_work running state rather than needing to check the task state
- Series cleaning up and moving the cancelation code to where it
belongs, in cancel.c
- Cleanup of waitid and futex argument handling
- Add support for mixed sized SQEs. 6.18 added support for mixed sized
CQEs, improving flexibility and efficiency of workloads that need big
CQEs. This adds similar support for SQEs, where the occasional need
for a 128b SQE doesn't necessitate having all SQEs be 128b in size
- Introduce zcrx and SQ/CQ layout queries. The former returns what zcrx
features are available. And both return the ring size information to
help with allocation size calculation for user provided rings like
IORING_SETUP_NO_MMAP and IORING_MEM_REGION_TYPE_USER
- Zcrx updates for 6.19. It includes a bunch of small patches,
IORING_REGISTER_ZCRX_CTRL and RQ flushing and David's work on sharing
zcrx b/w multiple io_uring instances
- Series cleaning up ring initializations, notable deduplicating ring
size and offset calculations. It also moves most of the checking
before doing any allocations, making the code simpler
- Add support for getsockname and getpeername, which is mostly a
trivial hookup after a bit of refactoring on the networking side
- Various fixes and cleanups
* tag 'for-6.19/io_uring-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (68 commits)
io_uring: Introduce getsockname io_uring cmd
socket: Split out a getsockname helper for io_uring
socket: Unify getsockname and getpeername implementation
io_uring/query: drop unused io_handle_query_entry() ctx arg
io_uring/kbuf: remove obsolete buf_nr_pages and update comments
io_uring/register: use correct location for io_rings_layout
io_uring/zcrx: share an ifq between rings
io_uring/zcrx: add io_fill_zcrx_offsets()
io_uring/zcrx: export zcrx via a file
io_uring/zcrx: move io_zcrx_scrub() and dependencies up
io_uring/zcrx: count zcrx users
io_uring/zcrx: add sync refill queue flushing
io_uring/zcrx: introduce IORING_REGISTER_ZCRX_CTRL
io_uring/zcrx: elide passing msg flags
io_uring/zcrx: use folio_nr_pages() instead of shift operation
io_uring/zcrx: convert to use netmem_desc
io_uring/query: introduce rings info query
io_uring/query: introduce zcrx query
io_uring: move cq/sq user offset init around
io_uring: pre-calculate scq layout
...
|
|
__blkdev_issue_discard() always returns 0, making the error assignment
in __submit_discard_cmd() dead code.
Initialize err to 0 and remove the error assignment from the
__blkdev_issue_discard() call to err. Move fault injection code into
already present if branch where err is set to -EIO.
This preserves the fault injection behavior while removing dead error
handling.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch optimizes the tracepoint by replacing these hardcoded strings
with a new enumeration f2fs_cp_phase.
1.Defines enum f2fs_cp_phase with values for each checkpoint phase.
2.Updates trace_f2fs_write_checkpoint to accept a u16 phase argument
instead of a string pointer.
3.Uses __print_symbolic in TP_printk to convert the enum values
back to their corresponding strings for human-readable trace output.
This change reduces the storage overhead for each trace event
by replacing a variable-length string with a 2-byte integer,
while maintaining the same readable output in ftrace.
Signed-off-by: YH Lin <yhli@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
w/ LFS mode, in get_left_section_blocks(), we should not account the
blocks which were used before and now are invalided, otherwise those
blocks will be counted as freed one in has_curseg_enough_space(), result
in missing to trigger GC in time.
Cc: stable@kernel.org
Fixes: 249ad438e1d9 ("f2fs: add a method for calculating the remaining blocks in the current segment in LFS mode.")
Fixes: bf34c93d2645 ("f2fs: check curseg space before foreground GC")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
cat /sys/kernel/debug/f2fs/status
Main area: 17 segs, 17 secs 17 zones
TYPE blkoff segno secno zoneno dirty_seg full_seg valid_blk
- COLD data: 0 4 4 4 0 0 0
- WARM data: 0 7 7 7 0 0 0
- HOT data: 1 5 5 5 2 0 512
- Dir dnode: 3 0 0 0 1 0 2
- File dnode: 0 1 1 1 0 0 0
- Indir nodes: 0 2 2 2 0 0 0
- Pinned file: 0 -1 -1 -1
- ATGC data: 0 -1 -1 -1
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Sphinx's LaTeX builder fails when converting the nested ASCII tables in
f2fs.rst, producing the following error:
"Markup is unsupported in LaTeX: longtable does not support nesting a table."
Wrap the affected ASCII tables in literal code blocks to force Sphinx to
render them verbatim. This prevents nested longtables and fixes the PDF
build failure on Sphinx 8.2.x.
Acked-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Akira Yokosawa <akiyks@gmail.com>
Signed-off-by: Masaharu Noguchi <nogunix@gmail.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
opt field in structure f2fs_mount_info and opt_mask field in structure
f2fs_fs_context is 32-bits variable, now we're running out of available
bits in them, let's expand them to 64-bits for better scalability.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch changes default schedule timeout value from 20ms to 1ms,
in order to give caller more chances to check whether IO or non-IO
congestion condition has already been mitigable.
In addition, default interval of periodical discard submission is
kept to 20ms.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs retry logic, we will call f2fs_io_schedule_timeout() to sleep as
uninterruptible state (waiting for IO) for a while, however, in several
paths below, we are not blocked by IO:
- f2fs_write_single_data_page() return -EAGAIN due to racing on cp_rwsem.
- f2fs_flush_device_cache() failed to submit preflush command.
- __issue_discard_cmd_range() sleeps periodically in between two in batch
discard submissions.
So, in order to reveal state of task more accurate, let's introduce
f2fs_schedule_timeout() and call it in above paths in where we are waiting
for non-IO reasons.
Then we can get real reason of uninterruptible sleep for a thread in
tracepoint, perfetto, etc.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
memalloc_retry_wait() is recommended in memory allocation retry logic,
use it as much as possible.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This patch adds a sysfs entry showing the max zones that F2FS can write
concurrently.
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The usage of unusable_blocks_per_sec is already wrapped by
CONFIG_BLK_DEV_ZONED, except for its declaration and the definitions of
CAP_BLKS_PER_SEC and CAP_SEGS_PER_SEC. This patch ensures that all code
related to unusable_blocks_per_sec is properly wrapped under the
CONFIG_BLK_DEV_ZONED option.
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In f2fs_recover_fsync_data(),use LIST_HEAD() to declare and
initialize the list_head in one step instead of using
INIT_LIST_HEAD() separately.
No functional change.
Signed-off-by: Baolin Liu <liubaolin@kylinos.cn>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The recent increase in the number of Segment Summary Area (SSA) entries
from 512 to 2048 was an unintentional change in logic of 16kb block
support. This commit corrects the issue.
To better utilize the space available from the erroneous 2048-entry
calculation, we are implementing a solution to share the currently
unused SSA space with neighboring segments. This enhances overall
SSA utilization without impacting the established 8MB segment size.
Fixes: d7e9a9037de2 ("f2fs: Support Block Size == Page Size")
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
mkfs.f2fs -f /dev/vdd
mount /dev/vdd /mnt/f2fs
touch /mnt/f2fs/foo
sync # avoid CP_UMOUNT_FLAG in last f2fs_checkpoint.ckpt_flags
touch /mnt/f2fs/bar
f2fs_io fsync /mnt/f2fs/bar
f2fs_io shutdown 2 /mnt/f2fs
umount /mnt/f2fs
blockdev --setro /dev/vdd
mount /dev/vdd /mnt/f2fs
mount: /mnt/f2fs: WARNING: source write-protected, mounted read-only.
For the case if we create and fsync a new inode before sudden power-cut,
without norecovery or disable_roll_forward mount option, the following
mount will succeed w/o recovering last fsynced inode.
The problem here is that we only check inode_list list after
find_fsync_dnodes() in f2fs_recover_fsync_data() to find out whether
there is recoverable data in the iamge, but there is a missed case, if
last fsynced inode is not existing in last checkpoint, then, we will
fail to get its inode due to nat of inode node is not existing in last
checkpoint, so the inode won't be linked in inode_list.
Let's detect such case in dyrun mode to fix this issue.
After this change, mount will fail as expected below:
mount: /mnt/f2fs: cannot mount /dev/vdd read-only.
dmesg(1) may have more information after failed mount system call.
demsg:
F2FS-fs (vdd): Need to recover fsync data, but write access unavailable, please try mount w/ disable_roll_forward or norecovery
Cc: stable@kernel.org
Fixes: 6781eabba1bd ("f2fs: give -EINVAL for norecovery and rw mount")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
With below scripts, it will trigger panic in f2fs:
mkfs.f2fs -f /dev/vdd
mount /dev/vdd /mnt/f2fs
touch /mnt/f2fs/foo
sync
echo 111 >> /mnt/f2fs/foo
f2fs_io fsync /mnt/f2fs/foo
f2fs_io shutdown 2 /mnt/f2fs
umount /mnt/f2fs
mount -o ro,norecovery /dev/vdd /mnt/f2fs
or
mount -o ro,disable_roll_forward /dev/vdd /mnt/f2fs
F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 0
F2FS-fs (vdd): Mounted with checkpoint version = 7f5c361f
F2FS-fs (vdd): Stopped filesystem due to reason: 0
F2FS-fs (vdd): f2fs_recover_fsync_data: recovery fsync data, check_only: 1
Filesystem f2fs get_tree() didn't set fc->root, returned 1
------------[ cut here ]------------
kernel BUG at fs/super.c:1761!
Oops: invalid opcode: 0000 [#1] SMP PTI
CPU: 3 UID: 0 PID: 722 Comm: mount Not tainted 6.18.0-rc2+ #721 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:vfs_get_tree.cold+0x18/0x1a
Call Trace:
<TASK>
fc_mount+0x13/0xa0
path_mount+0x34e/0xc50
__x64_sys_mount+0x121/0x150
do_syscall_64+0x84/0x800
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7fa6cc126cfe
The root cause is we missed to handle error number returned from
f2fs_recover_fsync_data() when mounting image w/ ro,norecovery or
ro,disable_roll_forward mount option, result in returning a positive
error number to vfs_get_tree(), fix it.
Cc: stable@kernel.org
Fixes: 6781eabba1bd ("f2fs: give -EINVAL for norecovery and rw mount")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
This adds a tracepoint in the fadvise call path.
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The age extent cache uses last_blocks (derived from
allocated_data_blocks) to determine data age. However, there's a
conflict between the deletion
marker (last_blocks=0) and legitimate last_blocks=0 cases when
allocated_data_blocks overflows to 0 after reaching ULLONG_MAX.
In this case, valid extents are incorrectly skipped due to the
"if (!tei->last_blocks)" check in __update_extent_tree_range().
This patch fixes the issue by:
1. Reserving ULLONG_MAX as an invalid/deletion marker
2. Limiting allocated_data_blocks to range [0, ULLONG_MAX-1]
3. Using F2FS_EXTENT_AGE_INVALID for deletion scenarios
4. Adjusting overflow age calculation from ULLONG_MAX to (ULLONG_MAX-1)
Reproducer (using a patched kernel with allocated_data_blocks
initialized to ULLONG_MAX - 3 for quick testing):
Step 1: Mount and check initial state
# dd if=/dev/zero of=/tmp/test.img bs=1M count=100
# mkfs.f2fs -f /tmp/test.img
# mkdir -p /mnt/f2fs_test
# mount -t f2fs -o loop,age_extent_cache /tmp/test.img /mnt/f2fs_test
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 18446744073709551612 # ULLONG_MAX - 3
Inner Struct Count: tree: 1(0), node: 0
Step 2: Create files and write data to trigger overflow
# touch /mnt/f2fs_test/{1,2,3,4}.txt; sync
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 18446744073709551613 # ULLONG_MAX - 2
Inner Struct Count: tree: 5(0), node: 1
# dd if=/dev/urandom of=/mnt/f2fs_test/1.txt bs=4K count=1; sync
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 18446744073709551614 # ULLONG_MAX - 1
Inner Struct Count: tree: 5(0), node: 2
# dd if=/dev/urandom of=/mnt/f2fs_test/2.txt bs=4K count=1; sync
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 18446744073709551615 # ULLONG_MAX
Inner Struct Count: tree: 5(0), node: 3
# dd if=/dev/urandom of=/mnt/f2fs_test/3.txt bs=4K count=1; sync
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 0 # Counter overflowed!
Inner Struct Count: tree: 5(0), node: 4
Step 3: Trigger the bug - next write should create node but gets skipped
# dd if=/dev/urandom of=/mnt/f2fs_test/4.txt bs=4K count=1; sync
# cat /sys/kernel/debug/f2fs/status | grep -A 4 "Block Age"
Allocated Data Blocks: 1
Inner Struct Count: tree: 5(0), node: 4
Expected: node: 5 (new extent node for 4.txt)
Actual: node: 4 (extent insertion was incorrectly skipped due to
last_blocks = allocated_data_blocks = 0 in __get_new_block_age)
After this fix, the extent node is correctly inserted and node count
becomes 5 as expected.
Fixes: 71644dff4811 ("f2fs: add block_age-based extent cache")
Cc: stable@kernel.org
Signed-off-by: Xiaole He <hexiaole1994@126.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Add check for inode->i_nlink == 1 for directories during unlink,
as their value is decremented twice, which can trigger a warning in
drop_nlink. In such case mark the filesystem as corrupted and return
from the function call with the relevant failure return value.
Additionally add the check for i_nlink == 1 in
sanity_check_inode in order to detect on-disk corruption early.
Reported-by: syzbot+c07d47c7bc68f47b9083@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c07d47c7bc68f47b9083
Tested-by: syzbot+c07d47c7bc68f47b9083@syzkaller.appspotmail.com
Signed-off-by: Nikola Z. Ivanov <zlatistiv@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Rename "fail" label to "out" as it's used as a default
exit path out of f2fs_unlink as well as error path.
Signed-off-by: Nikola Z. Ivanov <zlatistiv@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When F2FS uses multiple block devices, each device may have a
different discard granularity. The minimum trim granularity must be
at least the maximum discard granularity of all devices, excluding
zoned devices. Use max_t instead of the max() macro to compute the
maximum value.
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
The one_time_gc field in struct victim_sel_policy is conditionally
initialized but unconditionally read, leading to undefined behavior
that triggers UBSAN warnings.
In f2fs_get_victim() at fs/f2fs/gc.c:774, the victim_sel_policy
structure is declared without initialization:
struct victim_sel_policy p;
The field p.one_time_gc is only assigned when the 'one_time' parameter
is true (line 789):
if (one_time) {
p.one_time_gc = one_time;
...
}
However, this field is unconditionally read in subsequent get_gc_cost()
at line 395:
if (p->one_time_gc && (valid_thresh_ratio < 100) && ...)
When one_time is false, p.one_time_gc contains uninitialized stack
memory. Hence p.one_time_gc is an invalid bool value.
UBSAN detects this invalid bool value:
UBSAN: invalid-load in fs/f2fs/gc.c:395:7
load of value 77 is not a valid value for type '_Bool'
CPU: 3 UID: 0 PID: 1297 Comm: f2fs_gc-252:16 Not tainted 6.18.0-rc3
#5 PREEMPT(voluntary)
Hardware name: OpenStack Foundation OpenStack Nova,
BIOS 1.13.0-1ubuntu1.1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x70/0x90
dump_stack+0x14/0x20
__ubsan_handle_load_invalid_value+0xb3/0xf0
? dl_server_update+0x2e/0x40
? update_curr+0x147/0x170
f2fs_get_victim.cold+0x66/0x134 [f2fs]
? sched_balance_newidle+0x2ca/0x470
? finish_task_switch.isra.0+0x8d/0x2a0
f2fs_gc+0x2ba/0x8e0 [f2fs]
? _raw_spin_unlock_irqrestore+0x12/0x40
? __timer_delete_sync+0x80/0xe0
? timer_delete_sync+0x14/0x20
? schedule_timeout+0x82/0x100
gc_thread_func+0x38b/0x860 [f2fs]
? gc_thread_func+0x38b/0x860 [f2fs]
? __pfx_autoremove_wake_function+0x10/0x10
kthread+0x10b/0x220
? __pfx_gc_thread_func+0x10/0x10 [f2fs]
? _raw_spin_unlock_irq+0x12/0x40
? __pfx_kthread+0x10/0x10
ret_from_fork+0x11a/0x160
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
This issue is reliably reproducible with the following steps on a
100GB SSD /dev/vdb:
mkfs.f2fs -f /dev/vdb
mount /dev/vdb /mnt/f2fs_test
fio --name=gc --directory=/mnt/f2fs_test --rw=randwrite \
--bs=4k --size=8G --numjobs=12 --fsync=4 --runtime=10 \
--time_based
echo 1 > /sys/fs/f2fs/vdb/gc_urgent
The uninitialized value causes incorrect GC victim selection, leading
to unpredictable garbage collection behavior.
Fix by zero-initializing the entire victim_sel_policy structure to
ensure all fields have defined values.
Fixes: e791d00bd06c ("f2fs: add valid block ratio not to do excessive GC for one time GC")
Cc: stable@kernel.org
Signed-off-by: Xiaole He <hexiaole1994@126.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
It recommends to use i_size_{read,write}() to access and update i_size,
otherwise, we may get wrong tearing value due to high 32-bits value
and low 32-bits value of i_size field are not updated atomically in
32-bits archicture machine.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Xfstests generic/335, generic/336 sometimes crash with the following message:
F2FS-fs (dm-0): detect filesystem reference count leak during umount, type: 9, count: 1
------------[ cut here ]------------
kernel BUG at fs/f2fs/super.c:1939!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 1 UID: 0 PID: 609351 Comm: umount Tainted: G W 6.17.0-rc5-xfstests-g9dd1835ecda5 #1 PREEMPT(none)
Tainted: [W]=WARN
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:f2fs_put_super+0x3b3/0x3c0
Call Trace:
<TASK>
generic_shutdown_super+0x7e/0x190
kill_block_super+0x1a/0x40
kill_f2fs_super+0x9d/0x190
deactivate_locked_super+0x30/0xb0
cleanup_mnt+0xba/0x150
task_work_run+0x5c/0xa0
exit_to_user_mode_loop+0xb7/0xc0
do_syscall_64+0x1ae/0x1c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
---[ end trace 0000000000000000 ]---
It appears that sometimes it is possible that f2fs_put_super() is called before
all node page reads are completed.
Adding a call to f2fs_wait_on_all_pages() for F2FS_RD_NODE fixes the problem.
Cc: stable@kernel.org
Fixes: 20872584b8c0b ("f2fs: fix to drop all dirty meta/node pages during umount()")
Signed-off-by: Jan Prusakowski <jprusakowski@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
If there are too many background IOs during f2fs_enable_checkpoint(),
sync_inodes_sb() may be blocked for long time due to it will loop to
write dirty datas which are generated by in parallel write()
continuously.
Let's change as below to resolve this issue:
- hold cp_enable_rwsem write lock to block any cache/dio write
- decrease DEF_ENABLE_INTERVAL from 16 to 5
In addition, dump more logs during f2fs_enable_checkpoint().
Testcase:
1. fill data into filesystem until 90% usage.
2. mount -o remount,checkpoint=disable:10% /data
3. fio --rw=randwrite --bs=4kb --size=1GB --numjobs=10 \
--iodepth=64 --ioengine=psync --time_based --runtime=600 \
--directory=/data/fio_dir/ &
4. mount -o remount,checkpoint=enable /data
Before:
F2FS-fs (dm-51): f2fs_enable_checkpoint() finishes, writeback:7232, sync:39793, cp:457
After:
F2FS-fs (dm-51): f2fs_enable_checkpoint end, writeback:5032, lock:0, sync_inode:5552, sync_fs:84
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In order to let userspace detect such error rather than suffering
silent failure.
Fixes: 4354994f097d ("f2fs: checkpoint disabling")
Cc: stable@kernel.org
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Change the type of the unlock parameter of f2fs_put_page to bool.
All callers should consistently pass true or false. No logical change.
Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
F2FS can mount filesystems with corrupted directory depth values that
get runtime-clamped to MAX_DIR_HASH_DEPTH. When RENAME_WHITEOUT
operations are performed on such directories, f2fs_rename performs
directory modifications (updating target entry and deleting source
entry) before attempting to add the whiteout entry via f2fs_add_link.
If f2fs_add_link fails due to the corrupted directory structure, the
function returns an error to VFS, but the partial directory
modifications have already been committed to disk. VFS assumes the
entire rename operation failed and does not update the dentry cache,
leaving stale mappings.
In the error path, VFS does not call d_move() to update the dentry
cache. This results in new_dentry still pointing to the old inode
(new_inode) which has already had its i_nlink decremented to zero.
The stale cache causes subsequent operations to incorrectly reference
the freed inode.
This causes subsequent operations to use cached dentry information that
no longer matches the on-disk state. When a second rename targets the
same entry, VFS attempts to decrement i_nlink on the stale inode, which
may already have i_nlink=0, triggering a WARNING in drop_nlink().
Example sequence:
1. First rename (RENAME_WHITEOUT): file2 → file1
- f2fs updates file1 entry on disk (points to inode 8)
- f2fs deletes file2 entry on disk
- f2fs_add_link(whiteout) fails (corrupted directory)
- Returns error to VFS
- VFS does not call d_move() due to error
- VFS cache still has: file1 → inode 7 (stale!)
- inode 7 has i_nlink=0 (already decremented)
2. Second rename: file3 → file1
- VFS uses stale cache: file1 → inode 7
- Tries to drop_nlink on inode 7 (i_nlink already 0)
- WARNING in drop_nlink()
Fix this by explicitly invalidating old_dentry and new_dentry when
f2fs_add_link fails during whiteout creation. This forces VFS to
refresh from disk on subsequent operations, ensuring cache consistency
even when the rename partially succeeds.
Reproducer:
1. Mount F2FS image with corrupted i_current_depth
2. renameat2(file2, file1, RENAME_WHITEOUT)
3. renameat2(file3, file1, 0)
4. System triggers WARNING in drop_nlink()
Fixes: 7e01e7ad746b ("f2fs: support RENAME_WHITEOUT")
Reported-by: syzbot+632cf32276a9a564188d@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=632cf32276a9a564188d
Suggested-by: Chao Yu <chao@kernel.org>
Link: https://lore.kernel.org/all/20251022233349.102728-1-kartikey406@gmail.com/ [v1]
Cc: stable@vger.kernel.org
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
As Hong Yun reported in mailing list:
loop7: detected capacity change from 0 to 131072
------------[ cut here ]------------
kmem_cache of name 'f2fs_xattr_entry-7:7' already exists
WARNING: CPU: 0 PID: 24426 at mm/slab_common.c:110 kmem_cache_sanity_check mm/slab_common.c:109 [inline]
WARNING: CPU: 0 PID: 24426 at mm/slab_common.c:110 __kmem_cache_create_args+0xa6/0x320 mm/slab_common.c:307
CPU: 0 UID: 0 PID: 24426 Comm: syz.7.1370 Not tainted 6.17.0-rc4 #1 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
RIP: 0010:kmem_cache_sanity_check mm/slab_common.c:109 [inline]
RIP: 0010:__kmem_cache_create_args+0xa6/0x320 mm/slab_common.c:307
Call Trace:
__kmem_cache_create include/linux/slab.h:353 [inline]
f2fs_kmem_cache_create fs/f2fs/f2fs.h:2943 [inline]
f2fs_init_xattr_caches+0xa5/0xe0 fs/f2fs/xattr.c:843
f2fs_fill_super+0x1645/0x2620 fs/f2fs/super.c:4918
get_tree_bdev_flags+0x1fb/0x260 fs/super.c:1692
vfs_get_tree+0x43/0x140 fs/super.c:1815
do_new_mount+0x201/0x550 fs/namespace.c:3808
do_mount fs/namespace.c:4136 [inline]
__do_sys_mount fs/namespace.c:4347 [inline]
__se_sys_mount+0x298/0x2f0 fs/namespace.c:4324
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x8e/0x3a0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The bug can be reproduced w/ below scripts:
- mount /dev/vdb /mnt1
- mount /dev/vdc /mnt2
- umount /mnt1
- mounnt /dev/vdb /mnt1
The reason is if we created two slab caches, named f2fs_xattr_entry-7:3
and f2fs_xattr_entry-7:7, and they have the same slab size. Actually,
slab system will only create one slab cache core structure which has
slab name of "f2fs_xattr_entry-7:3", and two slab caches share the same
structure and cache address.
So, if we destroy f2fs_xattr_entry-7:3 cache w/ cache address, it will
decrease reference count of slab cache, rather than release slab cache
entirely, since there is one more user has referenced the cache.
Then, if we try to create slab cache w/ name "f2fs_xattr_entry-7:3" again,
slab system will find that there is existed cache which has the same name
and trigger the warning.
Let's changes to use global inline_xattr_slab instead of per-sb slab cache
for fixing.
Fixes: a999150f4fe3 ("f2fs: use kmem_cache pool during inline xattr lookups")
Cc: stable@kernel.org
Reported-by: Hong Yun <yhong@link.cuhk.edu.hk>
Tested-by: Hong Yun <yhong@link.cuhk.edu.hk>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Bai, Shuangpeng <sjb7183@psu.edu> reported a bug as below:
Oops: divide error: 0000 [#1] SMP KASAN PTI
CPU: 0 UID: 0 PID: 11441 Comm: syz.0.46 Not tainted 6.17.0 #1 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
RIP: 0010:f2fs_all_cluster_page_ready+0x106/0x550 fs/f2fs/compress.c:857
Call Trace:
<TASK>
f2fs_write_cache_pages fs/f2fs/data.c:3078 [inline]
__f2fs_write_data_pages fs/f2fs/data.c:3290 [inline]
f2fs_write_data_pages+0x1c19/0x3600 fs/f2fs/data.c:3317
do_writepages+0x38e/0x640 mm/page-writeback.c:2634
filemap_fdatawrite_wbc mm/filemap.c:386 [inline]
__filemap_fdatawrite_range mm/filemap.c:419 [inline]
file_write_and_wait_range+0x2ba/0x3e0 mm/filemap.c:794
f2fs_do_sync_file+0x6e6/0x1b00 fs/f2fs/file.c:294
generic_write_sync include/linux/fs.h:3043 [inline]
f2fs_file_write_iter+0x76e/0x2700 fs/f2fs/file.c:5259
new_sync_write fs/read_write.c:593 [inline]
vfs_write+0x7e9/0xe00 fs/read_write.c:686
ksys_write+0x19d/0x2d0 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xf7/0x470 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
The bug was triggered w/ below race condition:
fsync setattr ioctl
- f2fs_do_sync_file
- file_write_and_wait_range
- f2fs_write_cache_pages
: inode is non-compressed
: cc.cluster_size =
F2FS_I(inode)->i_cluster_size = 0
- tag_pages_for_writeback
- f2fs_setattr
- truncate_setsize
- f2fs_truncate
- f2fs_fileattr_set
- f2fs_setflags_common
- set_compress_context
: F2FS_I(inode)->i_cluster_size = 4
: set_inode_flag(inode, FI_COMPRESSED_FILE)
- f2fs_compressed_file
: return true
- f2fs_all_cluster_page_ready
: "pgidx % cc->cluster_size" trigger dividing 0 issue
Let's change as below to fix this issue:
- introduce a new atomic type variable .writeback in structure f2fs_inode_info
to track the number of threads which calling f2fs_write_cache_pages().
- use .i_sem lock to protect .writeback update.
- check .writeback before update compression context in f2fs_setflags_common()
to avoid race w/ ->writepages.
Fixes: 4c8ff7095bef ("f2fs: support data compression")
Cc: stable@kernel.org
Reported-by: Bai, Shuangpeng <sjb7183@psu.edu>
Tested-by: Bai, Shuangpeng <sjb7183@psu.edu>
Closes: https://lore.kernel.org/lkml/44D8F7B3-68AD-425F-9915-65D27591F93F@psu.edu
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
As syzbot reported:
F2FS-fs (loop0): __update_extent_tree_range: extent len is zero, type: 0, extent [0, 0, 0], age [0, 0]
------------[ cut here ]------------
kernel BUG at fs/f2fs/extent_cache.c:678!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 5336 Comm: syz.0.0 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
RIP: 0010:__update_extent_tree_range+0x13bc/0x1500 fs/f2fs/extent_cache.c:678
Call Trace:
<TASK>
f2fs_update_read_extent_cache_range+0x192/0x3e0 fs/f2fs/extent_cache.c:1085
f2fs_do_zero_range fs/f2fs/file.c:1657 [inline]
f2fs_zero_range+0x10c1/0x1580 fs/f2fs/file.c:1737
f2fs_fallocate+0x583/0x990 fs/f2fs/file.c:2030
vfs_fallocate+0x669/0x7e0 fs/open.c:342
ioctl_preallocate fs/ioctl.c:289 [inline]
file_ioctl+0x611/0x780 fs/ioctl.c:-1
do_vfs_ioctl+0xb33/0x1430 fs/ioctl.c:576
__do_sys_ioctl fs/ioctl.c:595 [inline]
__se_sys_ioctl+0x82/0x170 fs/ioctl.c:583
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xfa/0x3b0 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f07bc58eec9
In error path of f2fs_zero_range(), it may add a zero-sized extent
into extent cache, it should be avoided.
Fixes: 6e9619499f53 ("f2fs: support in batch fzero in dnode page")
Cc: stable@kernel.org
Reported-by: syzbot+24124df3170c3638b35f@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/68e5d698.050a0220.256323.0032.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
As Jiaming Zhang and syzbot reported, there is potential deadlock in
f2fs as below:
Chain exists of:
&sbi->cp_rwsem --> fs_reclaim --> sb_internal#2
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
rlock(sb_internal#2);
lock(fs_reclaim);
lock(sb_internal#2);
rlock(&sbi->cp_rwsem);
*** DEADLOCK ***
3 locks held by kswapd0/73:
#0: ffffffff8e247a40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat mm/vmscan.c:7015 [inline]
#0: ffffffff8e247a40 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0x951/0x2800 mm/vmscan.c:7389
#1: ffff8880118400e0 (&type->s_umount_key#50){.+.+}-{4:4}, at: super_trylock_shared fs/super.c:562 [inline]
#1: ffff8880118400e0 (&type->s_umount_key#50){.+.+}-{4:4}, at: super_cache_scan+0x91/0x4b0 fs/super.c:197
#2: ffff888011840610 (sb_internal#2){.+.+}-{0:0}, at: f2fs_evict_inode+0x8d9/0x1b60 fs/f2fs/inode.c:890
stack backtrace:
CPU: 0 UID: 0 PID: 73 Comm: kswapd0 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
print_circular_bug+0x2ee/0x310 kernel/locking/lockdep.c:2043
check_noncircular+0x134/0x160 kernel/locking/lockdep.c:2175
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain+0xb9b/0x2140 kernel/locking/lockdep.c:3908
__lock_acquire+0xab9/0xd20 kernel/locking/lockdep.c:5237
lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868
down_read+0x46/0x2e0 kernel/locking/rwsem.c:1537
f2fs_down_read fs/f2fs/f2fs.h:2278 [inline]
f2fs_lock_op fs/f2fs/f2fs.h:2357 [inline]
f2fs_do_truncate_blocks+0x21c/0x10c0 fs/f2fs/file.c:791
f2fs_truncate_blocks+0x10a/0x300 fs/f2fs/file.c:867
f2fs_truncate+0x489/0x7c0 fs/f2fs/file.c:925
f2fs_evict_inode+0x9f2/0x1b60 fs/f2fs/inode.c:897
evict+0x504/0x9c0 fs/inode.c:810
f2fs_evict_inode+0x1dc/0x1b60 fs/f2fs/inode.c:853
evict+0x504/0x9c0 fs/inode.c:810
dispose_list fs/inode.c:852 [inline]
prune_icache_sb+0x21b/0x2c0 fs/inode.c:1000
super_cache_scan+0x39b/0x4b0 fs/super.c:224
do_shrink_slab+0x6ef/0x1110 mm/shrinker.c:437
shrink_slab_memcg mm/shrinker.c:550 [inline]
shrink_slab+0x7ef/0x10d0 mm/shrinker.c:628
shrink_one+0x28a/0x7c0 mm/vmscan.c:4955
shrink_many mm/vmscan.c:5016 [inline]
lru_gen_shrink_node mm/vmscan.c:5094 [inline]
shrink_node+0x315d/0x3780 mm/vmscan.c:6081
kswapd_shrink_node mm/vmscan.c:6941 [inline]
balance_pgdat mm/vmscan.c:7124 [inline]
kswapd+0x147c/0x2800 mm/vmscan.c:7389
kthread+0x70e/0x8a0 kernel/kthread.c:463
ret_from_fork+0x4bc/0x870 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>
The root cause is deadlock among four locks as below:
kswapd
- fs_reclaim --- Lock A
- shrink_one
- evict
- f2fs_evict_inode
- sb_start_intwrite --- Lock B
- iput
- evict
- f2fs_evict_inode
- sb_start_intwrite --- Lock B
- f2fs_truncate
- f2fs_truncate_blocks
- f2fs_do_truncate_blocks
- f2fs_lock_op --- Lock C
ioctl
- f2fs_ioc_commit_atomic_write
- f2fs_lock_op --- Lock C
- __f2fs_commit_atomic_write
- __replace_atomic_write_block
- f2fs_get_dnode_of_data
- __get_node_folio
- f2fs_check_nid_range
- f2fs_handle_error
- f2fs_record_errors
- f2fs_down_write --- Lock D
open
- do_open
- do_truncate
- security_inode_need_killpriv
- f2fs_getxattr
- lookup_all_xattrs
- f2fs_handle_error
- f2fs_record_errors
- f2fs_down_write --- Lock D
- f2fs_commit_super
- read_mapping_folio
- filemap_alloc_folio_noprof
- prepare_alloc_pages
- fs_reclaim_acquire --- Lock A
In order to avoid such deadlock, we need to avoid grabbing sb_lock in
f2fs_handle_error(), so, let's use asynchronous method instead:
- remove f2fs_handle_error() implementation
- rename f2fs_handle_error_async() to f2fs_handle_error()
- spread f2fs_handle_error()
Fixes: 95fa90c9e5a7 ("f2fs: support recording errors into superblock")
Cc: stable@kernel.org
Reported-by: syzbot+14b90e1156b9f6fc1266@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/68eae49b.050a0220.ac43.0001.GAE@google.com
Reported-by: Jiaming Zhang <r772577952@gmail.com>
Closes: https://lore.kernel.org/lkml/CANypQFa-Gy9sD-N35o3PC+FystOWkNuN8pv6S75HLT0ga-Tzgw@mail.gmail.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Use f2fs_filemap_get_folio() instead of __filemap_get_folio() in:
- f2fs_find_data_folio
- f2fs_write_begin
- f2fs_read_merkle_tree_page
So that, we can trigger fault injection in those places.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Let's use f2fs_filemap_get_folio() instead of f2fs_pagecache_get_page() in
ra_data_block() and move_data_block(), then remove f2fs_pagecache_get_page()
since it has no user.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
No logic changes.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
In add_bio_entry(), adding a page to newly allocated bio should never fail,
let's use bio_add_folio_nofail() instead of bio_add_page() & unnecessary
error handling for cleanup.
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Replace busylock at the Tx queuing layer with a lockless list.
Resulting in a 300% (4x) improvement on heavy TX workloads, sending
twice the number of packets per second, for half the cpu cycles.
- Allow constantly busy flows to migrate to a more suitable CPU/NIC
queue.
Normally we perform queue re-selection when flow comes out of idle,
but under extreme circumstances the flows may be constantly busy.
Add sysctl to allow periodic rehashing even if it'd risk packet
reordering.
- Optimize the NAPI skb cache, make it larger, use it in more paths.
- Attempt returning Tx skbs to the originating CPU (like we already
did for Rx skbs).
- Various data structure layout and prefetch optimizations from Eric.
- Remove ktime_get() from the recvmsg() fast path, ktime_get() is
sadly quite expensive on recent AMD machines.
- Extend threaded NAPI polling to allow the kthread busy poll for
packets.
- Make MPTCP use Rx backlog processing. This lowers the lock
pressure, improving the Rx performance.
- Support memcg accounting of MPTCP socket memory.
- Allow admin to opt sockets out of global protocol memory accounting
(using a sysctl or BPF-based policy). The global limits are a poor
fit for modern container workloads, where limits are imposed using
cgroups.
- Improve heuristics for when to kick off AF_UNIX garbage collection.
- Allow users to control TCP SACK compression, and default to 33% of
RTT.
- Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid
unnecessarily aggressive rcvbuf growth and overshot when the
connection RTT is low.
- Preserve skb metadata space across skb_push / skb_pull operations.
- Support for IPIP encapsulation in the nftables flowtable offload.
- Support appending IP interface information to ICMP messages (RFC
5837).
- Support setting max record size in TLS (RFC 8449).
- Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.
- Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.
- Let users configure the number of write buffers in SMC.
- Add new struct sockaddr_unsized for sockaddr of unknown length,
from Kees.
- Some conversions away from the crypto_ahash API, from Eric Biggers.
- Some preparations for slimming down struct page.
- YAML Netlink protocol spec for WireGuard.
- Add a tool on top of YAML Netlink specs/lib for reporting commonly
computed derived statistics and summarized system state.
Driver API:
- Add CAN XL support to the CAN Netlink interface.
- Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as
defined by the OPEN Alliance's "Advanced diagnostic features for
100BASE-T1 automotive Ethernet PHYs" specification.
- Add DPLL phase-adjust-gran pin attribute (and implement it in
zl3073x).
- Refactor xfrm_input lock to reduce contention when NIC offloads
IPsec and performs RSS.
- Add info to devlink params whether the current setting is the
default or a user override. Allow resetting back to default.
- Add standard device stats for PSP crypto offload.
- Leverage DSA frame broadcast to implement simple HSR frame
duplication for a lot of switches without dedicated HSR offload.
- Add uAPI defines for 1.6Tbps link modes.
Device drivers:
- Add Motorcomm YT921x gigabit Ethernet switch support.
- Add MUCSE driver for N500/N210 1GbE NIC series.
- Convert drivers to support dedicated ops for timestamping control,
and away from the direct IOCTL handling. While at it support GET
operations for PHY timestamping.
- Add (and convert most drivers to) a dedicated ethtool callback for
reading the Rx ring count.
- Significant refactoring efforts in the STMMAC driver, which
supports Synopsys turn-key MAC IP integrated into a ton of SoCs.
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support PPS in/out on all pins
- Intel (100G, ice, idpf):
- ice: implement standard ethtool and timestamping stats
- i40e: support setting the max number of MAC addresses per VF
- iavf: support RSS of GTP tunnels for 5G and LTE deployments
- nVidia/Mellanox (mlx5):
- reduce downtime on interface reconfiguration
- disable being an XDP redirect target by default (same as
other drivers) to avoid wasting resources if feature is
unused
- Meta (fbnic):
- add support for Linux-managed PCS on 25G, 50G, and 100G links
- Wangxun:
- support Rx descriptor merge, and Tx head writeback
- support Rx coalescing offload
- support 25G SPF and 40G QSFP modules
- Ethernet virtual:
- Google (gve):
- allow ethtool to configure rx_buf_len
- implement XDP HW RX Timestamping support for DQ descriptor
format
- Microsoft vNIC (mana):
- support HW link state events
- handle hardware recovery events when probing the device
- Ethernet NICs consumer, and embedded:
- usbnet: add support for Byte Queue Limits (BQL)
- AMD (amd-xgbe):
- add device selftests
- NXP (enetc):
- add i.MX94 support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- bcmasp: add support for PHY-based Wake-on-LAN
- Broadcom switches (b53):
- support port isolation
- support BCM5389/97/98 and BCM63XX ARL formats
- Lantiq/MaxLinear switches:
- support bridge FDB entries on the CPU port
- use regmap for register access
- allow user to enable/disable learning
- support Energy Efficient Ethernet
- support configuring RMII clock delays
- add tagging driver for MaxLinear GSW1xx switches
- Synopsys (stmmac):
- support using the HW clock in free running mode
- add Eswin EIC7700 support
- add Rockchip RK3506 support
- add Altera Agilex5 support
- Cadence (macb):
- cleanup and consolidate descriptor and DMA address handling
- add EyeQ5 support
- TI:
- icssg-prueth: support AF_XDP
- Airoha access points:
- add missing Ethernet stats and link state callback
- add AN7583 support
- support out-of-order Tx completion processing
- Power over Ethernet:
- pd692x0: preserve PSE configuration across reboots
- add support for TPS23881B devices
- Ethernet PHYs:
- Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
- Support 50G SerDes and 100G interfaces in Linux-managed PHYs
- micrel:
- support for non PTP SKUs of lan8814
- enable in-band auto-negotiation on lan8814
- realtek:
- cable testing support on RTL8224
- interrupt support on RTL8221B
- motorcomm: support for PHY LEDs on YT853
- microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
- mscc: support for PHY LED control
- CAN drivers:
- m_can: add support for optional reset and system wake up
- remove can_change_mtu() obsoleted by core handling
- mcp251xfd: support GPIO controller functionality
- Bluetooth:
- add initial support for PASTa
- WiFi:
- split ieee80211.h file, it's way too big
- improvements in VHT radiotap reporting, S1G, Channel Switch
Announcement handling, rate tracking in mesh networks
- improve multi-radio monitor mode support, and add a cfg80211
debugfs interface for it
- HT action frame handling on 6 GHz
- initial chanctx work towards NAN
- MU-MIMO sniffer improvements
- WiFi drivers:
- RealTek (rtw89):
- support USB devices RTL8852AU and RTL8852CU
- initial work for RTL8922DE
- improved injection support
- Intel:
- iwlwifi: new sniffer API support
- MediaTek (mt76):
- WED support for >32-bit DMA
- airoha NPU support
- regdomain improvements
- continued WiFi7/MLO work
- Qualcomm/Atheros:
- ath10k: factory test support
- ath11k: TX power insertion support
- ath12k: BSS color change support
- ath12k: statistics improvements
- brcmfmac: Acer A1 840 tablet quirk
- rtl8xxxu: 40 MHz connection fixes/support"
* tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits)
net: page_pool: sanitise allocation order
net: page pool: xa init with destroy on pp init
net/mlx5e: Support XDP target xmit with dummy program
net/mlx5e: Update XDP features in switch channels
selftests/tc-testing: Test CAKE scheduler when enqueue drops packets
net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
wireguard: netlink: generate netlink code
wireguard: uapi: generate header with ynl-gen
wireguard: uapi: move flag enums
wireguard: uapi: move enum wg_cmd
wireguard: netlink: add YNL specification
selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py
selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py
selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py
selftests: drv-net: introduce Iperf3Runner for measurement use cases
selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS
net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive()
Documentation: net: dsa: mention simple HSR offload helpers
Documentation: net: dsa: mention availability of RedBox
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Convert selftests/bpf/test_tc_edt and test_tc_tunnel from .sh to
test_progs runner (Alexis Lothoré)
- Convert selftests/bpf/test_xsk to test_progs runner (Bastien
Curutchet)
- Replace bpf memory allocator with kmalloc_nolock() in
bpf_local_storage (Amery Hung), and in bpf streams and range tree
(Puranjay Mohan)
- Introduce support for indirect jumps in BPF verifier and x86 JIT
(Anton Protopopov) and arm64 JIT (Puranjay Mohan)
- Remove runqslower bpf tool (Hoyeon Lee)
- Fix corner cases in the verifier to close several syzbot reports
(Eduard Zingerman, KaFai Wan)
- Several improvements in deadlock detection in rqspinlock (Kumar
Kartikeya Dwivedi)
- Implement "jmp" mode for BPF trampoline and corresponding
DYNAMIC_FTRACE_WITH_JMP. It improves "fexit" program type performance
from 80 M/s to 136 M/s. With Steven's Ack. (Menglong Dong)
- Add ability to test non-linear skbs in BPF_PROG_TEST_RUN (Paul
Chaignon)
- Do not let BPF_PROG_TEST_RUN emit invalid GSO types to stack (Daniel
Borkmann)
- Generalize buildid reader into bpf_dynptr (Mykyta Yatsenko)
- Optimize bpf_map_update_elem() for map-in-map types (Ritesh
Oedayrajsingh Varma)
- Introduce overwrite mode for BPF ring buffer (Xu Kuohai)
* tag 'bpf-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (169 commits)
bpf: optimize bpf_map_update_elem() for map-in-map types
bpf: make kprobe_multi_link_prog_run always_inline
selftests/bpf: do not hardcode target rate in test_tc_edt BPF program
selftests/bpf: remove test_tc_edt.sh
selftests/bpf: integrate test_tc_edt into test_progs
selftests/bpf: rename test_tc_edt.bpf.c section to expose program type
selftests/bpf: Add success stats to rqspinlock stress test
rqspinlock: Precede non-head waiter queueing with AA check
rqspinlock: Disable spinning for trylock fallback
rqspinlock: Use trylock fallback when per-CPU rqnode is busy
rqspinlock: Perform AA checks immediately
rqspinlock: Enclose lock/unlock within lock entry acquisitions
bpf: Remove runqslower tool
selftests/bpf: Remove usage of lsm/file_alloc_security in selftest
bpf: Disable file_alloc_security hook
bpf: check for insn arrays in check_ptr_alignment
bpf: force BPF_F_RDONLY_PROG on insn array creation
bpf: Fix exclusive map memory leak
selftests/bpf: Make CS length configurable for rqspinlock stress test
selftests/bpf: Add lock wait time stats to rqspinlock stress test
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kunit updates from Shuah Khan:
- Make filter parameters configurable via Kconfig
- Add description of kunit.enable parameter to documentation
* tag 'linux_kselftest-kunit-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kunit: Make filter parameters configurable via Kconfig
Documentation: kunit: add description of kunit.enable parameter
|
|
The error path of copying the old config used the wrong variable in the
error message:
$ mkdir /tmp/build
$ ./tools/testing/ktest/config-bisect.pl -b /tmp/build config-good /tmp/config-bad
$ chmod 0 /tmp/build
$ ./tools/testing/ktest/config-bisect.pl -b /tmp/build config-good /tmp/config-bad good
cp /tmp/build//.config config-good.tmp ... [0 seconds] FAILED!
Use of uninitialized value $config in concatenation (.) or string at ./tools/testing/ktest/config-bisect.pl line 744.
failed to copy to config-good.tmp
When it should have shown:
failed to copy /tmp/build//.config to config-good.tmp
Cc: stable@vger.kernel.org
Cc: John 'Warthog9' Hawley <warthog9@kernel.org>
Fixes: 0f0db065999cf ("ktest: Add standalone config-bisect.pl program")
Link: https://patch.msgid.link/20251203180924.6862bd26@gandalf.local.home
Reported-by: "John W. Krahn" <jwkrahn@shaw.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The JH7110 pinctrl driver currently sets a static GPIO base number from
platform data:
sfp->gc.base = info->gc_base;
Static base assignment is deprecated and results in the following warning:
gpio gpiochip0: Static allocation of GPIO base is deprecated,
use dynamic allocation.
Set `sfp->gc.base = -1` to let the GPIO core dynamically allocate
the base number. This removes the warning and aligns the driver
with current GPIO guidelines.
Since the GPIO base is now allocated dynamically, remove `gc_base` field in
`struct jh7110_pinctrl_soc_info` and the associated `JH7110_SYS_GC_BASE`
and `JH7110_AON_GC_BASE` constants as they are no longer used anywhere
in the driver.
Tested on VisionFive 2 (JH7110 SoC).
Signed-off-by: Ali Tariq <alitariq45892@gmail.com>
Reviewed-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Linus Walleij <linusw@kernel.org>
|
|
pcs_pinconf_get() and pcs_pinconf_set() declare ret as unsigned int,
but assign it the return values of pcs_get_function() that may return
negative error codes. This causes negative error codes to be
converted to large positive values.
Change ret from unsigned int to int in both functions.
Fixes: 9dddb4df90d1 ("pinctrl: single: support generic pinconf")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Signed-off-by: Linus Walleij <linusw@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest updates from Shuah Khan:
- Add basic test for trace_marker_raw file to tracing selftest
- Fix invalid array access in printf dma_map_benchmark selftest
- Add tprobe enable/disable testcase to tracing selftest
- Update fprobe selftest for ftrace based fprobe
* tag 'linux_kselftest-next-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: tracing: Update fprobe selftest for ftrace based fprobe
selftests: tracing: Add tprobe enable/disable testcase
selftests/run_kselftest.sh: exit with error if tests fail
selftests/dma: fix invalid array access in printf
selftests/tracing: Add basic test for trace_marker_raw file
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild updates from Nicolas Schier:
- Enable -fms-extensions, allowing anonymous use of tagged struct or
union in struct/union (tag kbuild-ms-extensions-6.19). An exemplary
conversion patch is added here, too (btrfs).
[ Editor's note: the core of this actually came in early through a
shared branch and a few other trees - Linus ]
- Introduce architecture-specific CC_CAN_LINK and flags for userprogs
- Add new packaging target 'modules-cpio-pkg' for building a initramfs
cpio w/ kmods
- Handle included .c files in gen_compile_commands
- Minor kbuild changes:
- Use objtree for module signing key path, fixing oot kmod signing
- Improve documentation of KBUILD_BUILD_TIMESTAMP
- Reuse KBUILD_USERCFLAGS for UAPI, instead of defining twice
- Rename scripts/Makefile.extrawarn to Makefile.warn
- Drop obsolete types.h check from headers_check.pl
- Remove outdated config leak ignore entries
* tag 'kbuild-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: add target to build a cpio containing modules
initramfs: add gen_init_cpio to hostprogs unconditionally
kbuild: allow architectures to override CC_CAN_LINK
init: deduplicate cc-can-link.sh invocations
kbuild: don't enable CC_CAN_LINK if the dummy program generates warnings
scripts: headers_install.sh: Remove two outdated config leak ignore entries
scripts/clang-tools: Handle included .c files in gen_compile_commands
kbuild: uapi: Drop types.h check from headers_check.pl
kbuild: Rename Makefile.extrawarn to Makefile.warn
MAINTAINERS, .mailmap: Update mail address for Nicolas Schier
kbuild: uapi: reuse KBUILD_USERCFLAGS
kbuild: doc: improve KBUILD_BUILD_TIMESTAMP documentation
kbuild: Use objtree for module signing key path
btrfs: send: make use of -fms-extensions for defining struct fs_path
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull Rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- Add support for 'syn'.
Syn is a parsing library for parsing a stream of Rust tokens into a
syntax tree of Rust source code.
Currently this library is geared toward use in Rust procedural
macros, but contains some APIs that may be useful more generally.
'syn' allows us to greatly simplify writing complex macros such as
'pin-init' (Benno has already prepared the 'syn'-based version). We
will use it in the 'macros' crate too.
'syn' is the most downloaded Rust crate (according to crates.io),
and it is also used by the Rust compiler itself. While the amount
of code is substantial, there should not be many updates needed for
these crates, and even if there are, they should not be too big,
e.g. +7k -3k lines across the 3 crates in the last year.
'syn' requires two smaller dependencies: 'quote' and 'proc-macro2'.
I only modified their code to remove a third dependency
('unicode-ident') and to add the SPDX identifiers. The code can be
easily verified to exactly match upstream with the provided
scripts.
They are all licensed under "Apache-2.0 OR MIT", like the other
vendored 'alloc' crate we had for a while.
Please see the merge commit with the cover letter for more context.
- Allow 'unreachable_pub' and 'clippy::disallowed_names' for
doctests.
Examples (i.e. doctests) may want to do things like show public
items and use names such as 'foo'.
Nevertheless, we still try to keep examples as close to real code
as possible (this is part of why running Clippy on doctests is
important for us, e.g. for safety comments, which userspace Rust
does not support yet but we are stricter).
'kernel' crate:
- Replace our custom 'CStr' type with 'core::ffi::CStr'.
Using the standard library type reduces our custom code footprint,
and we retain needed custom functionality through an extension
trait and a new 'fmt!' macro which replaces the previous 'core'
import.
This started in 6.17 and continued in 6.18, and we finally land the
replacement now. This required quite some stamina from Tamir, who
split the changes in steps to prepare for the flag day change here.
- Replace 'kernel::c_str!' with C string literals.
C string literals were added in Rust 1.77, which produce '&CStr's
(the 'core' one), so now we can write:
c"hi"
instead of:
c_str!("hi")
- Add 'num' module for numerical features.
It includes the 'Integer' trait, implemented for all primitive
integer types.
It also includes the 'Bounded' integer wrapping type: an integer
value that requires only the 'N' least significant bits of the
wrapped type to be encoded:
// An unsigned 8-bit integer, of which only the 4 LSBs are used.
let v = Bounded::<u8, 4>::new::<15>();
assert_eq!(v.get(), 15);
'Bounded' is useful to e.g. enforce guarantees when working with
bitfields that have an arbitrary number of bits.
Values can also be constructed from simple non-constant expressions
or, for more complex ones, validated at runtime.
'Bounded' also comes with comparison and arithmetic operations
(with both their backing type and other 'Bounded's with a
compatible backing type), casts to change the backing type,
extending/shrinking and infallible/fallible conversions from/to
primitives as applicable.
- 'rbtree' module: add immutable cursor ('Cursor').
It enables to use just an immutable tree reference where
appropriate. The existing fully-featured mutable cursor is renamed
to 'CursorMut'.
kallsyms:
- Fix wrong "big" kernel symbol type read from procfs.
'pin-init' crate:
- A couple minor fixes (Benno asked me to pick these patches up for
him this cycle).
Documentation:
- Quick Start guide: add Debian 13 (Trixie).
Debian Stable is now able to build Linux, since Debian 13 (released
2025-08-09) packages Rust 1.85.0, which is recent enough.
We are planning to propose that the minimum supported Rust version
in Linux follows Debian Stable releases, with Debian 13 being the
first one we upgrade to, i.e. Rust 1.85.
MAINTAINERS:
- Add entry for the new 'num' module.
- Remove Alex as Rust maintainer: he hasn't had the time to
contribute for a few years now, so it is a no-op change in
practice.
And a few other cleanups and improvements"
* tag 'rust-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (53 commits)
rust: macros: support `proc-macro2`, `quote` and `syn`
rust: syn: enable support in kbuild
rust: syn: add `README.md`
rust: syn: remove `unicode-ident` dependency
rust: syn: add SPDX License Identifiers
rust: syn: import crate
rust: quote: enable support in kbuild
rust: quote: add `README.md`
rust: quote: add SPDX License Identifiers
rust: quote: import crate
rust: proc-macro2: enable support in kbuild
rust: proc-macro2: add `README.md`
rust: proc-macro2: remove `unicode_ident` dependency
rust: proc-macro2: add SPDX License Identifiers
rust: proc-macro2: import crate
rust: kbuild: support using libraries in `rustc_procmacro`
rust: kbuild: support skipping flags in `rustc_test_library`
rust: kbuild: add proc macro library support
rust: kbuild: simplify `--cfg` handling
rust: kbuild: introduce `core-flags` and `core-skip_flags`
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
Pull livepatching updates from Petr Mladek:
- Support both paths where tracefs is typically mounted in selftests
- Make old_sympos 0 and 1 equal. They both are valid when there is only
one symbol with the given name.
* tag 'livepatching-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching:
selftests: livepatch: use canonical ftrace path
livepatch: Match old_sympos 0 and 1 in klp_find_func()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:
- Improve recovery from misbehaving BPF schedulers.
When a scheduler puts many tasks with varying affinity restrictions
on a shared DSQ, CPUs scanning through tasks they cannot run can
overwhelm the system, causing lockups.
Bypass mode now uses per-CPU DSQs with a load balancer to avoid this,
and hooks into the hardlockup detector to attempt recovery.
Add scx_cpu0 example scheduler to demonstrate this scenario.
- Add lockless peek operation for DSQs to reduce lock contention for
schedulers that need to query queue state during load balancing.
- Allow scx_bpf_reenqueue_local() to be called from anywhere in
preparation for deprecating cpu_acquire/release() callbacks in favor
of generic BPF hooks.
- Prepare for hierarchical scheduler support: add
scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime() kfuncs,
make scx_bpf_dsq_insert*() return bool, and wrap kfunc args in
structs for future aux__prog parameter.
- Implement cgroup_set_idle() callback to notify BPF schedulers when a
cgroup's idle state changes.
- Fix migration tasks being incorrectly downgraded from
stop_sched_class to rt_sched_class across sched_ext enable/disable.
Applied late as the fix is low risk and the bug subtle but needs
stable backporting.
- Various fixes and cleanups including cgroup exit ordering,
SCX_KICK_WAIT reliability, and backward compatibility improvements.
* tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (44 commits)
sched_ext: Fix incorrect sched_class settings for per-cpu migration tasks
sched_ext: tools: Removing duplicate targets during non-cross compilation
sched_ext: Use kvfree_rcu() to release per-cpu ksyncs object
sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs
sched_ext: Update comments replacing breather with aborting mechanism
sched_ext: Implement load balancer for bypass mode
sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked()
sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR
sched_ext: Add scx_cpu0 example scheduler
sched_ext: Hook up hardlockup detector
sched_ext: Make handle_lockup() propagate scx_verror() result
sched_ext: Refactor lockup handlers into handle_lockup()
sched_ext: Make scx_exit() and scx_vexit() return bool
sched_ext: Exit dispatch and move operations immediately when aborting
sched_ext: Simplify breather mechanism with scx_aborting flag
sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode
sched_ext: Refactor do_enqueue_task() local and global DSQ paths
sched_ext: Use shorter slice in bypass mode
sched_ext: Mark racy bitfields to prevent adding fields that can't tolerate races
sched_ext: Minor cleanups to scx_task_iter
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- Defer task cgroup unlink until after the dying task's final context
switch so that controllers see the cgroup properly populated until
the task is truly gone
- cpuset cleanups and simplifications.
Enforce that domain isolated CPUs stay in root or isolated partitions
and fail if isolated+nohz_full would leave no housekeeping CPU. Fix
sched/deadline root domain handling during CPU hot-unplug and race
for tasks in attaching cpusets
- Misc fixes including memory reclaim protection documentation and
selftest KTAP conformance
* tag 'cgroup-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (21 commits)
cpuset: Treat cpusets in attaching as populated
sched/deadline: Walk up cpuset hierarchy to decide root domain when hot-unplug
cgroup/cpuset: Introduce cpuset_cpus_allowed_locked()
docs: cgroup: No special handling of unpopulated memcgs
docs: cgroup: Note about sibling relative reclaim protection
docs: cgroup: Explain reclaim protection target
selftests/cgroup: conform test to KTAP format output
cpuset: remove need_rebuild_sched_domains
cpuset: remove global remote_children list
cpuset: simplify node setting on error
cgroup: include missing header for struct irq_work
cgroup: Fix sleeping from invalid context warning on PREEMPT_RT
cgroup/cpuset: Globally track isolated_cpus update
cgroup/cpuset: Ensure domain isolated CPUs stay in root or isolated partition
cgroup/cpuset: Move up prstate_housekeeping_conflict() helper
cgroup/cpuset: Fail if isolated and nohz_full don't leave any housekeeping
cgroup/cpuset: Rename update_unbound_workqueue_cpumask() to update_isolation_cpumasks()
cgroup: Defer task cgroup unlink until after the task is done switching out
cgroup: Move dying_tasks cleanup from cgroup_task_release() to cgroup_task_free()
cgroup: Rename cgroup lifecycle hooks to cgroup_task_*()
...
|
|
Use tpm_ret_to_err() to transmute TPM return codes in trusted_tpm2.
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
|
Using -EFAULT as the tpm_ret_to_err() fallback error code causes makes it
incompatible on how trusted keys transmute TPM return codes.
Change the fallback as -EPERM in order to gain compatibility with trusted
keys. In addition, map TPM_RC_HASH to -EINVAL in order to be compatible
with tpm2_seal_trusted() return values.
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
|
tpm2_get_pcr_allocation() does not cap any upper limit for the number of
banks. Cap the limit to eight banks so that out of bounds values coming
from external I/O cause on only limited harm.
Cc: stable@vger.kernel.org # v5.10+
Fixes: bcfff8384f6c ("tpm: dynamically allocate the allocated_banks array")
Tested-by: Lai Yi <yi1.lai@linux.intel.com>
Reviewed-by: Jonathan McDowell <noodles@meta.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@opinsys.com>
|
|
tpm_find_get_ops() looks for the first valid TPM if the caller passes in
NULL. All internal users have been converted to either associate
themselves with a TPM directly, or call tpm_default_chip() as part of
their setup. Remove the no longer necessary tpm_find_get_ops().
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jonathan McDowell <noodles@meta.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
|
Currently if a user enqueues a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistency cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This continues the effort to refactor workqueue APIs, which began with
the introduction of new workqueues and a new alloc_workqueue flag in:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
|
Update the kerneldoc parameter definitions for __crb_go_idle
and __crb_cmd_ready to include the loc parameter.
Signed-off-by: Stuart Yoder <stuart.yoder@arm.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
|
The spelling of the word "requrest" is incorrect; it should be "request".
Signed-off-by: Chu Guangqing <chuguangqing@inspur.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
|
Remove parentheses around assert statements in Python. With parentheses,
assert always evaluates to True, making the checks ineffective.
Signed-off-by: Maurice Hieronymus <mhi@mailbox.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
|
|
Pull workqueue updates from Tejun Heo:
- Rescuer affinity management: Affinity is now updated only when
detached using wq_unbound_cpumask consistently. DISASSOCIATED workers
also follow unbound cpumask changes to avoid breaking CPU isolation
- Rescuer cleanups preparing for fetching work items one by one from
pool list: Work assignment factored out, optimized to skip pwqs no
longer needing rescue, and shutdown logic simplified
- Unused assert_rcu_or_wq_mutex_or_pool_mutex() removed
* tag 'wq-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Don't rely on wq->rescuer to stop rescuer
workqueue: Only assign rescuer work when really needed
workqueue: Factor out assign_rescuer_work()
workqueue: Init rescuer's affinities as wq_unbound_cpumask
workqueue: Let DISASSOCIATED workers follow unbound wq cpumask changes
workqueue: Update the rescuer's affinity only when it is detached
workqueue: Remove unused assert_rcu_or_wq_mutex_or_pool_mutex
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux
Pull printk updates from Petr Mladek:
- Allow creaing nbcon console drivers with an unsafe write_atomic()
callback that can only be called by the final nbcon_atomic_flush_unsafe().
Otherwise, the driver would rely on the kthread.
It is going to be used as the-best-effort approach for an
experimental nbcon netconsole driver, see
https://lore.kernel.org/r/20251121-nbcon-v1-2-503d17b2b4af@debian.org
Note that a safe .write_atomic() callback is supposed to work in NMI
context. But some networking drivers are not safe even in IRQ
context:
https://lore.kernel.org/r/oc46gdpmmlly5o44obvmoatfqo5bhpgv7pabpvb6sjuqioymcg@gjsma3ghoz35
In an ideal world, all networking drivers would be fixed first and
the atomic flush would be blocked only in NMI context. But it brings
the question how reliable networking drivers are when the system is
in a bad state. They might block flushing more reliable serial
consoles which are more suitable for serious debugging anyway.
- Allow to use the last 4 bytes of the printk ring buffer.
- Prevent queuing IRQ work and block printk kthreads when consoles are
suspended. Otherwise, they create non-necessary churn or even block
the suspend.
- Release console_lock() between each record in the kthread used for
legacy consoles on RT. It might significantly speed up the boot.
- Release nbcon context between each record in the atomic flush. It
prevents stalls of the related printk kthread after it has lost the
ownership in the middle of a record
- Add support for NBCON consoles into KDB
- Add %ptsP modifier for printing struct timespec64 and use it where
possible
- Misc code clean up
* tag 'printk-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux: (48 commits)
printk: Use console_is_usable on console_unblank
arch: um: kmsg_dump: Use console_is_usable
drivers: serial: kgdboc: Drop checks for CON_ENABLED and CON_BOOT
lib/vsprintf: Unify FORMAT_STATE_NUM handlers
printk: Avoid irq_work for printk_deferred() on suspend
printk: Avoid scheduling irq_work on suspend
printk: Allow printk_trigger_flush() to flush all types
tracing: Switch to use %ptSp
scsi: snic: Switch to use %ptSp
scsi: fnic: Switch to use %ptSp
s390/dasd: Switch to use %ptSp
ptp: ocp: Switch to use %ptSp
pps: Switch to use %ptSp
PCI: epf-test: Switch to use %ptSp
net: dsa: sja1105: Switch to use %ptSp
mmc: mmc_test: Switch to use %ptSp
media: av7110: Switch to use %ptSp
ipmi: Switch to use %ptSp
igb: Switch to use %ptSp
e1000e: Switch to use %ptSp
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull lkmm documentation update from Paul McKenney:
- Sort the memory-barriers.txt file's wait_event* and wait_on_bit* list
alphabetically
* tag 'lkmm.2025.12.01a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
memory-barriers.txt: Sort wait_event* and wait_on_bit* list alphabetically
|
|
- Use max() instead of max_t() to ease static analysis (David Laight)
- Add Manivannan Sadhasivam as PCI/pwrctrl maintainer (Bartosz Golaszewski)
* pci/misc:
MAINTAINERS: Add Manivannan Sadhasivam as PCI/pwrctrl maintainer
PCI: Use max() instead of max_t() to ease static analysis
|
|
- Add a struct pci_ops.assert_perst() function pointer to assert/deassert
PCIe PERST# and implement it for the qcom driver (Krishna Chaitanya
Chundru)
- Add DT binding and pwrctrl driver for the Toshiba TC9563 PCIe switch,
which must be held in reset after poweron so the pwrctrl driver can
configure the switch via I2C before bringing up the links (Krishna
Chaitanya Chundru)
* pci/pwrctrl-tc9563:
PCI: pwrctrl: Add power control driver for TC9563
PCI: qcom: Implement .assert_perst()
PCI: dwc: Implement .assert_perst() for dwc glue drivers
PCI: Add .assert_perst() to control PCIe PERST#
dt-bindings: PCI: Add binding for Toshiba TC9563 PCIe switch
|
|
- Fix a race between link training and endpoint register initialization
(Christian Bruel)
- Align endpoint allocations to match the ATU requirements (Christian
Bruel)
- Add #includes to avoid depending on 'proxy' headers (Andy Shevchenko)
* pci/controller/stm32:
PCI: stm32: Don't use 'proxy' headers
PCI: stm32: Fix EP page_size alignment
PCI: stm32: Fix LTSSM EP race with start link
|
|
- Add DT binding and driver for SpacemiT K1 (Alex Elder)
* pci/controller/spacemit-k1:
PCI: spacemit: Add SpacemiT PCIe host driver
dt-bindings: pci: spacemit: Introduce PCIe host controller
|
|
- Add module support for platform controller driver (Manikandan K Pillai)
- Split headers into 'legacy' (LGA) and 'high perf' (HPA) (Manikandan K
Pillai)
- Add DT binding and driver for CIX Sky1 (Hans Zhang)
* pci/controller/sky1:
MAINTAINERS: Add CIX Sky1 PCIe controller driver maintainer
PCI: sky1: Add PCIe host support for CIX Sky1
dt-bindings: PCI: Add CIX Sky1 PCIe Root Complex bindings
PCI: cadence: Add support for High Perf Architecture (HPA) controller
PCI: cadence: Move PCIe RP common functions to a separate file
PCI: cadence: Split PCIe controller header file
PCI: cadence: Add module support for platform controller driver
|
|
- Fix sg2042_pcie_remove() reference count issue (Christophe JAILLET)
* pci/controller/sg2042:
PCI: sg2042: Fix a reference count issue in sg2042_pcie_remove()
|
|
- Add NXP S32G host controller DT binding and driver (Vincent Guittot)
* pci/controller/s32g:
MAINTAINERS: Add NXP S32G PCIe controller driver maintainer
PCI: s32g: Add NXP S32G PCIe controller driver (RC)
PCI: dwc: Add register and bitfield definitions
dt-bindings: PCI: s32g: Add NXP S32G PCIe controller
|
|
- Add Renesas RZ/G3S host controller DT binding and driver (Claudiu Beznea)
* pci/controller/rzg3s-host:
PCI: Add Renesas RZ/G3S host controller driver
dt-bindings: PCI: Add Renesas RZ/G3S PCIe controller binding
|
|
- Drop ARM dependency so we can build test on other arches (Geert
Uytterhoeven)
* pci/controller/rcar-gen2:
PCI: rcar-gen2: Drop ARM dependency from PCI_RCAR_GEN2
|
|
- Look up OPP using both frequency and data rate (not just frequency) so
RPMh votes can account for both (Krishna Chaitanya Chundru)
* pci/controller/qcom:
PCI: qcom: Use frequency and level based OPP lookup
|
|
- Update DT binding to name DBI region "dbi", not "elbi", and update driver
to support both (Manivannan Sadhasivam)
* pci/controller/meson:
PCI: meson: Fix parsing the DBI register region
dt-bindings: PCI: amlogic: Fix the register name of the DBI region
|
|
- Convert DT binding to YAML schema (Christian Marangi)
- Add Airoha AN7583 DT compatible and driver support (Christian Marangi)
* pci/controller/mediatek:
PCI: mediatek: Add support for Airoha AN7583 SoC
PCI: mediatek: Use generic MACRO for TPVPERL delay
PCI: mediatek: Convert bool to single quirks entry and bitmap
dt-bindings: PCI: mediatek: Add support for Airoha AN7583
dt-bindings: PCI: mediatek: Convert to YAML schema
|
|
- Fail the probe instead of silently succeeding if ks_pcie_of_data
didn't specify Root Complex or Endpoint mode (Siddharth Vadapalli)
- Make keystone buildable as a loadable module, except on ARM32 where
hook_fault_code() is __init (Siddharth Vadapalli)
* pci/controller/keystone:
PCI: keystone: Add support to build as a loadable module
PCI: dwc: Export dw_pcie_allocate_domains() and dw_pcie_ep_raise_msix_irq()
PCI: Export pci_get_host_bridge_device() for use by pci-keystone
PCI: keystone: Exit ks_pcie_probe() for invalid mode
|
|
- Use devm_clk_get_optional_enabled() instead of open-coding
devm_clk_get_optional() and clk_prepare_enable() (Anand Moon)
* pci/controller/j721e:
PCI: j721e: Use 'pcie->reset_gpio' directly and drop the local variable
PCI: j721e: Use devm_clk_get_optional_enabled() to get and enable the clock
|
|
- Guard ARM32-specific hook_fault_code() with ifdefs so we can build test
on other arches (Bjorn Helgaas)
* pci/controller/ixp4xx:
PCI: ixp4xx: Guard ARM32-specific hook_fault_code()
|
|
- Use devm_regulator_get_enable_optional() to simplify probing (Anand Moon)
* pci/controller/dw-rockchip:
PCI: dw-rockchip: Simplify regulator setup with devm_regulator_get_enable_optional()
|
|
- Update PORT_LOGIC_LTSSM_STATE_MASK to be a 6-bit mask as per spec, not a
5-bit mask (Shawn Lin)
- Clear L1 PM Substate Capability 'Supported' bits unless glue driver says
it's supported, which prevents users from enabling non-working L1SS.
Currently only qcom and tegra194 support L1SS (Bjorn Helgaas)
- Remove now-superfluous L1SS disable code from tegra194 (Bjorn Helgaas)
- Configure L1SS support in dw-rockchip when DT says 'supports-clkreq'
(Shawn Lin)
* pci/controller/dwc:
PCI: dw-rockchip: Configure L1SS support
PCI: tegra194: Remove unnecessary L1SS disable code
PCI: dwc: Advertise L1 PM Substates only if driver requests it
PCI: dwc: Fix wrong PORT_LOGIC_LTSSM_STATE_MASK definition
|
|
- Disable advertising ASPM L0s support correctly (Jim Quinlan)
- Add a panic/die handler to print diagnostic info in case PCIe caused an
unrecoverable abort (Jim Quinlan)
* pci/controller/brcmstb:
PCI: brcmstb: Add panic/die handler to driver
PCI: brcmstb: Add a way to indicate if PCIe bridge is active
PCI: brcmstb: Fix disabling L0s capability
|
|
- Move struct pci_host_bridge allocation from pci_host_common_init() to
callers, which significantly simplifies pcie-apple (Marc Zyngier)
* pci/controller/host-common:
PCI: host-generic: Move bridge allocation outside of pci_host_common_init()
|
|
- Convert the endpoint doorbell test to use a threaded IRQ to fix a
'sleeping while atomic' issue (Bhanu Seshu Kumar Valluri)
- Add endpoint VNTB MSI doorbell support to reduce latency between host and
endpoint (Frank Li)
* pci/endpoint:
PCI: endpoint: pci-epf-vntb: Add MSI doorbell support
PCI: endpoint: Add pci_epf_assign_bar_space() API
PCI: endpoint: Add pci_epf_get_required_bar_size() helper
PCI: endpoint: Rename 'epf_bar::aligned_size' to 'epf_bar:mem_size'
PCI: endpoint: pci-epf-test: Fix sleeping function being called from atomic context
|
|
- Add Rockchip RK3528 compatible strings in DT binding (Yao Zi)
- Add Qualcomm Kaanapali to SM8550 DT binding (Qiang Yu)
- Add 'contains' to the 'select' schema to enable the amlogic,axg-pcie
binding (Rob Herring)
- Update Manivannan Sadhasivam's email address in bindings (Manivannan
Sadhasivam)
- Add required 'power-domains' and 'resets' to qcom sa8775p, sc7280,
sc8280xp, sm8150, sm8250, sm8350, sm8450, sm8550, x1e80100 DT schemas
(Krzysztof Kozlowski)
* pci/dt-binding:
dt-bindings: PCI: qcom,pcie-x1e80100: Add missing required power-domains and resets
dt-bindings: PCI: qcom,pcie-sm8550: Add missing required power-domains and resets
dt-bindings: PCI: qcom,pcie-sm8450: Add missing required power-domains and resets
dt-bindings: PCI: qcom,pcie-sm8350: Add missing required power-domains and resets
dt-bindings: PCI: qcom,pcie-sm8250: Add missing required power-domains and resets
dt-bindings: PCI: qcom,pcie-sm8150: Add missing required power-domains and resets
dt-bindings: PCI: qcom,pcie-sc8280xp: Add missing required power-domains and resets
dt-bindings: PCI: qcom,pcie-sc7280: Add missing required power-domains and resets
dt-bindings: PCI: qcom,pcie-sa8775p: Add missing required power-domains and resets
dt-bindings: PCI: Update the email address for Manivannan Sadhasivam
dt-bindings: PCI: amlogic,axg-pcie: Fix select schema
dt-bindings: PCI: qcom,pcie-sm8550: Add Kaanapali compatible
dt-bindings: PCI: dwc: rockchip: Add RK3528 variant
|
|
- Prevent resource tree corruption when BAR resize fails (Ilpo Järvinen)
- Restore BARs to the original size if a BAR resize fails (Ilpo Järvinen)
- Remove BAR release from BAR resize attempts by the xe, i915, and amdgpu
drivers so the PCI core can restore BARs if the resize fails (Ilpo
Järvinen)
- Move Resizable BAR code to rebar.c (Ilpo Järvinen)
- Add pci_rebar_size_supported() and use it in i915 and xe (Ilpo Järvinen)
- Add pci_rebar_get_max_size() and use it in xe and amdgpu (Ilpo Järvinen)
* pci/resource:
PCI: Validate pci_rebar_size_supported() input
PCI: Convert BAR sizes bitmasks to u64
drm/amdgpu: Use pci_rebar_get_max_size()
drm/xe/vram: Use pci_rebar_get_max_size()
PCI: Add pci_rebar_get_max_size()
drm/xe/vram: Use PCI rebar helpers in resize_vram_bar()
drm/i915/gt: Use pci_rebar_size_supported()
PCI: Add pci_rebar_size_supported() helper
PCI: Improve Resizable BAR functions kernel doc
PCI: Move pci_rebar_size_to_bytes() and export it
PCI: Move pci_rebar_bytes_to_size() and clean it up
PCI: Move Resizable BAR code to rebar.c
PCI: Prevent restoring assigned resources
drm/amdgpu: Remove driver side BAR release before resize
drm/i915: Remove driver side BAR release before resize
drm/xe: Remove driver side BAR release before resize
PCI: Add kerneldoc for pci_resize_resource()
PCI: Fix restoring BARs on BAR resize rollback path
PCI: Free saved list without holding pci_bus_sem
PCI: Try BAR resize even when no window was released
PCI: Change pci_dev variable from 'bridge' to 'dev'
PCI/IOV: Adjust ->barsz[] when changing BAR size
PCI: Prevent resource tree corruption when BAR resize fails
|
|
- Enable PTM only if device advertises support for a relevant role, to
prevent invalid PTM Requests that cause ACS violations that are reported
as AER Uncorrectable Non-Fatal errors (Mika Westerberg)
* pci/ptm:
PCI/PTM: Enable only if device advertises relevant role
|
|
- For drivers using PCI legacy suspend, save config state at suspend so
that state (not any earlier state from enumeration, probe, or error
recovery) will be restored when resuming (Lukas Wunner)
- For devices with no driver or a driver that lacks PM, save config state
at hibernate so that state (not any earlier state from enumeration,
probe, or error recovery) will be restored when resuming (Lukas Wunner)
- Save device config space on device addition, before driver binding, so
error recovery works more reliably (Lukas Wunner)
- Drop pci_save_state() from several drivers that no longer need it since
the PCI core always does it and pci_restore_state() no longer invalidates
the saved state (Lukas Wunner)
- Document use of pci_save_state() by drivers to capture the state they
want restored during error recovery (Lukas Wunner)
* pci/err:
Documentation: PCI: Amend error recovery doc with pci_save_state() rules
treewide: Drop pci_save_state() after pci_restore_state()
PCI/ERR: Ensure error recoverability at all times
PCI/PM: Stop needlessly clearing state_saved on enumeration and thaw
PCI/PM: Reinstate clearing state_saved in legacy and !PM codepaths
|
|
- Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms (Dan
Williams)
- Switch vmd from custom domain number allocator to the common allocator
(Dan Williams)
* pci/enumeration:
PCI: vmd: Switch to pci_bus_find_emul_domain_nr()
PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Frederic Weisbecker:
"SRCU:
- Properly handle SRCU readers within IRQ disabled sections in tiny
SRCU
- Preparation to reimplement RCU Tasks Trace on top of SRCU fast:
- Introduce API to expedite a grace period and test it through
rcutorture
- Split srcu-fast in two flavours: SRCU-fast and SRCU-fast-updown.
Both are still targeted toward faster readers (without full
barriers on LOCK and UNLOCK) at the expense of heavier write
side (using full RCU grace period ordering instead of simply
full ordering) as compared to "traditional" non-fast SRCU. But
those srcu-fast flavours are going to be optimized in two
different ways:
- SRCU-fast will become the reimplementation basis for
RCU-TASK-TRACE for consolidation. Since RCU-TASK-TRACE must
be NMI safe, SRCU-fast must be as well.
- SRCU-fast-updown will be needed for uretprobes code in order
to get rid of the read-side memory barriers while still
allowing entering the reader at task level while exiting it
in a timer handler. It is considered semaphore-like in that
it can have different owners between LOCK and UNLOCK.
However it is not NMI-safe.
The actual optimizations are work in progress for the next
cycle. Only the new interfaces are added for now, along with
related torture and scalability test code.
- Create/document/debug/torture new proper initializers for RCU fast:
DEFINE_SRCU_FAST() and init_srcu_struct_fast()
This allows for using right away the proper ordering on the write
side (either full ordering or full RCU grace period ordering)
without waiting for the read side to tell which to use.
This also optimizes the read side altogether with moving flavour
debug checks under debug config and with removing a costly RmW
operation on their first call.
- Make some diagnostic functions tracing safe
Refscale:
- Add performance testing for common context synchronizations
(Preemption, IRQ, Softirq) and per-cpu increments. Those are
relevant comparisons against SRCU-fast read side APIs, especially
as they are planned to synchronize further tracing fast-path code
Miscellanous:
- In order to prepare the layout for nohz_full work deferral to user
exit, the context tracking state must shrink the counter of
transitions to/from RCU not watching. The only possible hazard is
to trigger wrap-around more easily, delaying a bit grace periods
when that happens. This should be a rare event though. Yet add
debugging and torture code to test that assumption
- Fix memory leak on locktorture module
- Annotate accesses in rculist_nulls.h to prevent from KCSAN
warnings. On recent discussions, we also concluded that all those
WRITE_ONCE() and READ_ONCE() on list APIs deserve appropriate
comments. Something to be expected for the next cycle
- Provide a script to apply several configs to several commits with
torture
- Allow torture to reuse a build directory in order to save needless
rebuild time
- Various cleanups"
* tag 'rcu.release.v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (29 commits)
refscale: Add SRCU-fast-updown readers
refscale: Exercise DEFINE_STATIC_SRCU_FAST() and init_srcu_struct_fast()
rcutorture: Make srcu{,d}_torture_init() announce the SRCU type
srcu: Create an SRCU-fast-updown API
refscale: Do not disable interrupts for tests involving local_bh_enable()
refscale: Add non-atomic per-CPU increment readers
refscale: Add this_cpu_inc() readers
refscale: Add preempt_disable() readers
refscale: Add local_bh_disable() readers
refscale: Add local_irq_disable() and local_irq_save() readers
torture: Permit negative kvm.sh --kconfig numberic arguments
srcu: Add SRCU_READ_FLAVOR_FAST_UPDOWN CPP macro
rcu: Mark diagnostic functions as notrace
rcutorture: Make TREE04 use CONFIG_RCU_DYNTICKS_TORTURE
rcutorture: Remove redundant rcutorture_one_extend() from rcu_torture_one_read()
rcutorture: Permit kvm-again.sh to re-use the build directory
torture: Add kvm-series.sh to test commit/scenario combination
rcu: use WRITE_ONCE() for ->next and ->pprev of hlist_nulls
locktorture: Fix memory leak in param_set_cpumask()
doc: Update for SRCU-fast definitions and initialization
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- mempool_alloc_bulk() support for upcoming users in the block layer
that need to allocate multiple objects at once with the mempool's
guaranteed progress semantics, which is not achievable with an
allocation single objects in a loop. Along with refactoring and
various improvements (Christoph Hellwig)
- Preparations for the upcoming separation of struct slab from struct
page, mostly by removing the struct folio layer, as the purpose of
struct folio has shifted since it became used in slab code (Matthew
Wilcox)
- Modernisation of slab's boot param API usage, which removes some
unexpected parsing corner cases (Petr Tesarik)
- Refactoring of freelist_aba_t (now struct freelist_counters) and
associated functions for double cmpxchg, enabled by -fms-extensions
(Vlastimil Babka)
- Cleanups and improvements related to sheaves caching layer, that were
part of the full conversion to sheaves, which is planned for the next
release (Vlastimil Babka)
* tag 'slab-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (42 commits)
slab: Remove unnecessary call to compound_head() in alloc_from_pcs()
mempool: clarify behavior of mempool_alloc_preallocated()
mempool: drop the file name in the top of file comment
mempool: de-typedef
mempool: remove mempool_{init,create}_kvmalloc_pool
mempool: legitimize the io_schedule_timeout in mempool_alloc_from_pool
mempool: add mempool_{alloc,free}_bulk
mempool: factor out a mempool_alloc_from_pool helper
slab: Remove references to folios from virt_to_slab()
kasan: Remove references to folio in __kasan_mempool_poison_object()
memcg: Convert mem_cgroup_from_obj_folio() to mem_cgroup_from_obj_slab()
mempool: factor out a mempool_adjust_gfp helper
mempool: add error injection support
mempool: improve kerneldoc comments
mm: improve kerneldoc comments for __alloc_pages_bulk
fault-inject: make enum fault_flags available unconditionally
usercopy: Remove folio references from check_heap_object()
slab: Remove folio references from kfree_nolock()
slab: Remove folio references from kfree_rcu_sheaf()
slab: Remove folio references from build_detached_freelist()
...
|
|
Pull documentation updates from Jonathan Corbet:
"This has been another busy cycle for documentation, with a lot of
build-system thrashing. That work should slow down from here on out.
- The various scripts and tools for documentation were spread out in
several directories; now they are (almost) all coalesced under
tools/docs/. The holdout is the kernel-doc script, which cannot be
easily moved without some further thought.
- As the amount of Python code increases, we are accumulating modules
that are imported by multiple programs. These modules have been
pulled together under tools/lib/python/ -- at least, for
documentation-related programs. There is other Python code in the
tree that might eventually want to move toward this organization.
- The Perl kernel-doc.pl script has been removed. It is no longer
used by default, and nobody has missed it, least of all anybody who
actually had to look at it.
- The docs build was controlled by a complex mess of makefilese that
few dared to touch. Mauro has moved that logic into a new program
(tools/docs/sphinx-build-wrapper) that, with any luck at all, will
be far easier to understand and maintain.
- The get_feat.pl program, used to access information under
Documentation/features/, has been rewritten in Python, bringing an
end to the use of Perl in the docs subsystem.
- The top-level README file has been reorganized into a more
reader-friendly presentation.
- A lot of Chinese translation additions
- Typo fixes and documentation updates as usual"
* tag 'docs-6.19' of git://git.lwn.net/linux: (164 commits)
docs: makefile: move rustdoc check to the build wrapper
README: restructure with role-based documentation and guidelines
docs: kdoc: various fixes for grammar, spelling, punctuation
docs: kdoc_parser: use '@' for Excess enum value
docs: submitting-patches: Clarify that removal of Acks needs explanation too
docs: kdoc_parser: add data/function attributes to ignore
docs: MAINTAINERS: update Mauro's files/paths
docs/zh_CN: Add wd719x.rst translation
docs/zh_CN: Add libsas.rst translation
get_feat.pl: remove it, as it got replaced by get_feat.py
Documentation/sphinx/kernel_feat.py: use class directly
tools/docs/get_feat.py: convert get_feat.pl to Python
Documentation/admin-guide: fix typo and comment in cscope example
docs/zh_CN: Add data-integrity.rst translation
docs/zh_CN: Add blk-mq.rst translation
docs/zh_CN: Add block/index.rst translation
docs/zh_CN: Update the Chinese translation of kbuild.rst
docs: bring some order to our Python module hierarchy
docs: Move the python libraries to tools/lib/python
Documentation/kernel-parameters: Move the kernel build options
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Rewrite memcpy_sglist from scratch
- Add on-stack AEAD request allocation
- Fix partial block processing in ahash
Algorithms:
- Remove ansi_cprng
- Remove tcrypt tests for poly1305
- Fix EINPROGRESS processing in authenc
- Fix double-free in zstd
Drivers:
- Use drbg ctr helper when reseeding xilinx-trng
- Add support for PCI device 0x115A to ccp
- Add support of paes in caam
- Add support for aes-xts in dthev2
Others:
- Use likely in rhashtable lookup
- Fix lockdep false-positive in padata by removing a helper"
* tag 'v6.19-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (71 commits)
crypto: zstd - fix double-free in per-CPU stream cleanup
crypto: ahash - Zero positive err value in ahash_update_finish
crypto: ahash - Fix crypto_ahash_import with partial block data
crypto: lib/mpi - use min() instead of min_t()
crypto: ccp - use min() instead of min_t()
hwrng: core - use min3() instead of nested min_t()
crypto: aesni - ctr_crypt() use min() instead of min_t()
crypto: drbg - Delete unused ctx from struct sdesc
crypto: testmgr - Add missing DES weak and semi-weak key tests
Revert "crypto: scatterwalk - Move skcipher walk and use it for memcpy_sglist"
crypto: scatterwalk - Fix memcpy_sglist() to always succeed
crypto: iaa - Request to add Kanchana P Sridhar to Maintainers.
crypto: tcrypt - Remove unused poly1305 support
crypto: ansi_cprng - Remove unused ansi_cprng algorithm
crypto: asymmetric_keys - fix uninitialized pointers with free attribute
KEYS: Avoid -Wflex-array-member-not-at-end warning
crypto: ccree - Correctly handle return of sg_nents_for_len
crypto: starfive - Correctly handle return of sg_nents_for_len
crypto: iaa - Fix incorrect return value in save_iaa_wq()
crypto: zstd - Remove unnecessary size_t cast
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe
Pull IPE udates from Fan Wu:
"The primary change is the addition of support for the AT_EXECVE_CHECK
flag. This allows interpreters to signal the kernel to perform IPE
security checks on script files before execution, extending IPE
enforcement to indirectly executed scripts.
Update documentation for it, and also fix a comment"
* tag 'ipe-pr-20251202' of git://git.kernel.org/pub/scm/linux/kernel/git/wufan/ipe:
ipe: Update documentation for script enforcement
ipe: Add AT_EXECVE_CHECK support for script enforcement
ipe: Drop a duplicated CONFIG_ prefix in the ifdeffery
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar:
"Bug fixes:
- defer credentials checking from the bprm_check_security hook to the
bprm_creds_from_file security hook
- properly ignore IMA policy rules based on undefined SELinux labels
IMA policy rule extensions:
- extend IMA to limit including file hashes in the audit logs
(dont_audit action)
- define a new filesystem subtype policy option (fs_subtype)
Misc:
- extend IMA to support in-kernel module decompression by deferring
the IMA signature verification in kernel_read_file() to after the
kernel module is decompressed"
* tag 'integrity-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity:
ima: Handle error code returned by ima_filter_rule_match()
ima: Access decompressed kernel module to verify appended signature
ima: add fs_subtype condition for distinguishing FUSE instances
ima: add dont_audit action to suppress audit actions
ima: Attach CREDS_CHECK IMA hook to bprm_creds_from_file LSM hook
|
|
Setup qemu with KVM then run kvm stat and some host
recording/reporting/build-id tests.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add test that evlist reports expected events from perf record.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Compile a simple dlfilter and make sure it remove samples from
everything other than a test_loop.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add test that kallsyms finds a well known symbol and fails for
another.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Basic coverage for `perf timechart` doing a record and then a basic
sanity test of the generated SVG file.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The test starts a backgroup thloop workload and monitors it using
cpu-clock ensuring test_loop appears in the output.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add testing for the purge and remove commands. Use the noploop
workload rather than just a return to avoid missing samples in the
workload in perf record. Tidy up the cleanup code to cleanup when
signals happen.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add basic c2c record and report testing to gain some coverage.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
To deal with histogram code that had missing gets the c2c code had
some defensive gets. Those other issues were cleaned up by the
reference count checker, clean them up for the c2c command here.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Reference count checking caught a missing dso__put following a
machine__findnew_dso_id.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Reference count checking found the online CPU map was being gotten but
not put. Add in the missing put.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Rather than exit the internal map_symbols directly, put the mem-info
that does this and also lowers the reference count on the mem-info
itself otherwise the mem-info is being leaked.
Fixes: 56e144fe98260a0f ("perf mem_info: Add and use map_symbol__exit and addr_map_symbol__exit")
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Move nsinfo__zput from cleanup_perf_probe_events to
clear_perf_probe_event so it is always executed. Clean up
clear_perf_probe_events to not call nsinfo__zput and use the pev
variable to avoid repeated array accesses.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add missing dso__put for the dso created in maps__split_kallsyms.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
In dso__process_kernel_symbol if inserting a map fails, probably
ENOMEM, then the reference count puts were missing on the dso and map.
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The '-o' option exists for the SVG creation but not for `perf
timechart record`. Add to better allow testing.
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
There are 2 slots left for kvm_add_default_arch_event, fix the
assertion so that debug builds don't fail the assert and to agree with
the comment.
Fixes: 45ff39f6e70aa55d0 ("perf tools kvm: Fix the potential out of range memory access issue")
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The updated events were published in:
https://github.com/intel/perfmon/commit/445e38f5128592f8b5c38da30267fff025e37613
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The updated events were published in:
https://github.com/intel/perfmon/commit/6edacf434dffa046435de2f6a182c00df3cf4edc
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The updated events were published in:
https://github.com/intel/perfmon/commit/348f33fae477f281812c32e1c07812b7e35614dd
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The updated events were published in:
https://github.com/intel/perfmon/commit/09a0c74b23b5d20adf1f97e5022856568d05494c
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The updated events were published in:
https://github.com/intel/perfmon/commit/dc6ffee20c74bfd21d7a7e338345578d4b7ca9ca
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The updated events were published in:
https://github.com/intel/perfmon/commit/b4acc3fd520eb098db41083010b65b75ae906c96
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The updated metrics were published in:
https://github.com/intel/perfmon/pull/348/commits/2dce436130ddfb8b442fc373d103f970de26cb78
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The updated events were published in:
https://github.com/intel/perfmon/commit/588dd77675039e1aaacee27a414cbcf3625c58a3
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The updated events were published in:
https://github.com/intel/perfmon/commit/c74f1cefa94d224cb3338507961b59d8a2a1c4e9
Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add the following CPU variants to the list for data source decoding:
- Cortex-A715 [1]
- Cortex-A78C [2]
- Cortex-X1 [3]
- Cortex-X4 [4]
- Neoverse V3 [5]
[1] https://developer.arm.com/documentation/101590/0103/Statistical-Profiling-Extension-Support/Statistical-Profiling-Extension-data-source-packet
[2] https://developer.arm.com/documentation/102226/0002/Debug-descriptions/Statistical-Profiling-Extension/implementation-defined-features-of-SPE
[3] https://developer.arm.com/documentation/101433/0102/Debug-descriptions/Statistical-Profiling-Extension/implementation-defined-features-of-SPE
[4] https://developer.arm.com/documentation/102484/0003/Statistical-Profiling-Extension-support/Statistical-Profiling-Extension-data-source-packet
[5] https://developer.arm.com/documentation/107734/0002/Statistical-Profiling-Extension-support/Statistical-Profiling-Extension-data-source-packet
Signed-off-by: Leo Yan <leo.yan@arm.com>
Reviewed-by: James Clark <james.clark@linaro.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
In 754187ad73b73bcb ("perf build: Remove NO_AUXTRACE build option")
sys/types.h was removed, which broke the build in all Alpine Linux
releases, as musl libc has pid_t defined via sys/types.h, add it back.
Fixes: 754187ad73b73bcb ("perf build: Remove NO_AUXTRACE build option")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Pull smack updates from Casey Schaufler:
- fix several cases where labels were treated inconsistently when
imported from user space
- clean up the assignment of extended attributes
- documentation improvements
* tag 'Smack-for-6.19' of https://github.com/cschaufler/smack-next:
Smack: function parameter 'gfp' not described
smack: fix kernel-doc warnings for smk_import_valid_label()
smack: fix bug: setting task label silently ignores input garbage
smack: fix bug: unprivileged task can create labels
smack: fix bug: invalid label of unix socket file
smack: always "instantiate" inode in smack_inode_init_security()
smack: deduplicate xattr setting in smack_inode_init_security()
smack: fix bug: SMACK64TRANSMUTE set on non-directory
smack: deduplicate "does access rule request transmutation"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit updates from Paul Moore:
- Consolidate the loops in __audit_inode_child() to improve performance
When logging a child inode in __audit_inode_child(), we first run
through the list of recorded inodes looking for the parent and then
we repeat the search looking for a matching child entry. This pull
request consolidates both searches into one pass through the recorded
inodes, resuling in approximately a 50% reduction in audit overhead.
See the commit description for the testing details.
- Combine kmalloc()/memset() into kzalloc() in audit_krule_to_data()
- Comment fixes
* tag 'audit-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: merge loops in __audit_inode_child()
audit: Use kzalloc() instead of kmalloc()/memset() in audit_krule_to_data()
audit: fix comment misindentation in audit.h
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux updates from Paul Moore:
- Improve the granularity of SELinux labeling for memfd files
Currently when creating a memfd file, SELinux treats it the same as
any other tmpfs, or hugetlbfs, file. While simple, the drawback is
that it is not possible to differentiate between memfd and tmpfs
files.
This adds a call to the security_inode_init_security_anon() LSM hook
and wires up SELinux to provide a set of memfd specific access
controls, including the ability to control the execution of memfds.
As usual, the commit message has more information.
- Improve the SELinux AVC lookup performance
Adopt MurmurHash3 for the SELinux AVC hash function instead of the
custom hash function currently used. MurmurHash3 is already used for
the SELinux access vector table so the impact to the code is minimal,
and performance tests have shown improvements in both hash
distribution and latency.
See the commit message for the performance measurments.
- Introduce a Kconfig option for the SELinux AVC bucket/slot size
While we have the ability to grow the number of AVC hash buckets
today, the size of the buckets (slot size) is fixed at 512. This pull
request makes that slot size configurable at build time through a new
Kconfig knob, CONFIG_SECURITY_SELINUX_AVC_HASH_BITS.
* tag 'selinux-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: improve bucket distribution uniformity of avc_hash()
selinux: Move avtab_hash() to a shared location for future reuse
selinux: Introduce a new config to make avc cache slot size adjustable
memfd,selinux: call security_inode_init_security_anon()
|
|
Remove the superfluous section name quotes, and combine the longs into a
single command.
Before:
911: .pushsection ".discard.annotate_insn", "M", @progbits, 8; .long 911b - .; .long 2; .popsection
After:
911: .pushsection .discard.annotate_insn, "M", @progbits, 8; .long 911b - ., 2; .popsection
No change in functionality.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/hpsfcihgqmhcdrg7pop7z73ptymakgjq7qlxrawrjxilosk43l@xikqif3ievj4
|
|
overflows
When the kernel build fails due to an objtool segfault, the error
message is a bit obtuse and confusing:
make[5]: *** [scripts/Makefile.build:503: drivers/scsi/qla2xxx/qla2xxx.o] Error 139
^^^^^^^^^
make[5]: *** Deleting file 'drivers/scsi/qla2xxx/qla2xxx.o'
make[4]: *** [scripts/Makefile.build:556: drivers/scsi/qla2xxx] Error 2
make[3]: *** [scripts/Makefile.build:556: drivers/scsi] Error 2
make[2]: *** [scripts/Makefile.build:556: drivers] Error 2
make[1]: *** [/home/jpoimboe/git/linux/Makefile:2013: .] Error 2
make: *** [Makefile:248: __sub-make] Error 2
Add a signal handler to objtool which prints an error message like if
the local stack has overflown (for which there's a chance as objtool
makes heavy use of recursion):
drivers/scsi/qla2xxx/qla2xxx.o: error: SIGSEGV: objtool stack overflow!
or:
drivers/scsi/qla2xxx/qla2xxx.o: error: SIGSEGV: objtool crash!
Also, re-raise the signal so the core dump still gets triggered.
[ mingo: Applied a build fix, added more comments and prettified the code. ]
Suggested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Alexandre Chartre <alexandre.chartre@oracle.com>
Cc: David Laight <david.laight.linux@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://patch.msgid.link/mi4tihk4dbncn7belrhp6ooudhpw4vdggerktu5333w3gqf3uf@vqlhc3y667mg
|
|
Remove newlines and tabs from the annotation macros so the invoking code
can insert them as needed to match the style of the surrounding code.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://patch.msgid.link/66305834c2eb78f082217611b756231ae9c0b555.1764694625.git.jpoimboe@kernel.org
|
|
Consolidate __ASM_ANNOTATE into a single macro which is used by both C
and asm. This also makes the code generation a bit more palatable by
putting it all on a single line.
Turn this:
911:
.pushsection .discard.annotate_insn,"M", @progbits, 8
.long 911b - .
.long 1
.popsection
jmp __x86_return_thunk
Into:
911: .pushsection ".discard.annotate_insn", "M", @progbits, 8; .long 911b - .; .long 1; .popsection
jmp __x86_return_thunk
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://patch.msgid.link/c05ff40d3383e85c3b59018ef0b3c7aaf993a60d.1764694625.git.jpoimboe@kernel.org
|
|
A few cases of space-Tab noise snuck in.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://patch.msgid.link/176478594889.498.15611228524880763978.tip-bot2@tip-bot2
|
|
and 'clk-qcom' into clk-next
* clk-visconti:
clk: visconti: Add VIIF clocks
dt-bindings: clock: tmpv770x: Add VIIF clocks
dt-bindings: clock: tmpv770x: Remove definition of number of clocks
clk: visconti: Do not define number of clocks in bindings
* clk-imx:
clk: imx: add driver for imx8ulp's sim lpav
dt-bindings: clock: document 8ULP's SIM LPAV
clk: imx: imx8mp-audiomix: use devm_auxiliary_device_create() to simple code
clk: imx: Add some delay before deassert the reset
* clk-microchip:
reset: mpfs: add non-auxiliary bus probing
clk: lan966x: remove unused dt-bindings include
clk: microchip: mpfs: use regmap for clocks
dt-bindings: clk: microchip: mpfs: remove first reg region
* clk-rockchip:
clk: rockchip: Add clock and reset driver for RK3506
dt-bindings: clock: rockchip: Add RK3506 clock and reset unit
clk: rockchip: Add clock controller for the RV1126B
dt-bindings: clock, reset: Add support for rv1126b
clk: rockchip: Implement rockchip_clk_register_armclk_multi_pll()
dt-bindings: clock: rk3568: Drop CLK_NR_CLKS define
clk: rockchip: rk3568: Drop CLK_NR_CLKS usage
dt-bindings: clock: rk3568: Add SCMI clock ids
* clk-qcom: (48 commits)
clk: qcom: Mark camcc_sm7150_hws static
clk: qcom: x1e80100-dispcc: Add USB4 router link resets
dt-bindings: clock: qcom: x1e80100-dispcc: Add USB4 router link resets
clk: qcom: videocc-sm8750: Add video clock controller driver for SM8750
dt-bindings: clock: qcom: Add SM8750 video clock controller
clk: qcom: branch: Extend invert logic for branch2 mem clocks
clk: qcom: ecpricc-qdu100: Add mem_enable_mask to the clock memory branch
clk: qcom: clk_mem_branch: add enable mask and invert flags
clk: qcom: mmcc-sdm660: Add missing MDSS reset
dt-bindings: clock: mmcc-sdm660: Add missing MDSS reset
clk: qcom: use different Kconfig prompts for APSS IPQ5424/6018 drivers
clk: qcom: apss-ipq5424: remove unused 'apss_clk' structure
dt-bindings: clock: qcom: Add Kaanapali Global clock controller
dt-bindings: clock: qcom: Document the Kaanapali TCSR Clock Controller
dt-bindings: clock: qcom-rpmhcc: Add RPMHCC for Kaanapali
clk: qcom: tcsrcc-glymur: Update register offsets for clock refs
clk: qcom: gcc-qcs615: Update the SDCC clock to use shared_floor_ops
clk: qcom: camcc-sm7150: Fix PLL config of PLL2
clk: qcom: camcc-sm6350: Fix PLL config of PLL2
clk: qcom: Add NSS clock controller driver for IPQ5424
...
|
|
and 'clk-mediatek' into clk-next
* clk-socfpga:
clk: socfpga: agilex5: add clock driver for Agilex5
* clk-renesas: (35 commits)
clk: renesas: r9a09g077: Add SPI module clocks
clk: renesas: r9a09g056: Add USB3.0 clocks/resets
clk: renesas: r9a09g057: Add USB3.0 clocks/resets
clk: renesas: r9a09g047: Add RSCI clocks/resets
dt-bindings: clock: renesas,r9a09g056-cpg: Add USB3.0 core clocks
dt-bindings: clock: renesas,r9a09g057-cpg: Add USB3.0 core clocks
clk: renesas: r9a06g032: Fix memory leak in error path
clk: renesas: r9a09g077: Use devm_ helpers for divider clock registration
clk: renesas: r9a09g077: Remove stray blank line
clk: renesas: r9a09g077: Propagate rate changes to parent clocks
clk: renesas: r8a779a0: Add 3DGE module clock
clk: renesas: r8a779a0: Add ZG Core clock
clk: renesas: rcar-gen4: Add support for clock dividers in FRQCRB
dt-bindings: clock: r8a779a0: Add ZG core clock
clk: renesas: r9a09g056: Add clock and reset entries for ISP
clk: renesas: r9a09g056: Add support for PLLVDO, CRU clocks, and resets
clk: renesas: r9a09g056: Add clocks and resets for DSI and LCDC modules
clk: renesas: r9a09g077: Add TSU module clock
clk: renesas: r9a09g057: Add clock and reset entries for DSI and LCDC
clk: renesas: rzv2h: Add support for DSI clocks
...
* clk-cleanup:
clk: keystone: fix compile testing
clk: keystone: syscon-clk: fix regmap leak on probe failure
clk: samsung: exynos-clkout: Assign .num before accessing .hws
clk: actions: Fix discarding const qualifier by 'container_of' macro
clk: spacemit: Set clk_hw_onecell_data::num before using flex array
clk: spacemit: fix comment typo
clk: keystone: Fix discarded const qualifiers
clk: sprd: sc9860: Simplify with of_device_get_match_data()
* clk-samsung:
firmware: exynos-acpm: add empty method to allow compile test
MAINTAINERS: add ACPM clock bindings and driver
clk: samsung: add Exynos ACPM clock driver
firmware: exynos-acpm: register ACPM clocks pdev
firmware: exynos-acpm: add DVFS protocol
dt-bindings: firmware: google,gs101-acpm-ipc: add ACPM clocks
clk: samsung: clk-pll: simplify samsung_pll_lock_wait()
clk: samsung: exynosautov920: add block mfc clock support
clk: samsung: exynosautov920: add clock support
dt-bindings: clock: exynosautov920: add mfc clock definitions
dt-bindings: clock: exynosautov920: add m2m clock definitions
dt-bindings: clock: google,gs101-clock: add power-domains
* clk-mediatek:
clk: en7523: Add reset-controller support for EN7523 SoC
dt-bindings: clock: airoha: Add reset support to EN7523 clock binding
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm
Pull LSM updates from Paul Moore:
- Rework the LSM initialization code
What started as a "quick" patch to enable a notification event once
all of the individual LSMs were initialized, snowballed a bit into a
30+ patch patchset when everything was done. Most of the patches, and
diffstat, is due to splitting out the initialization code into
security/lsm_init.c and cleaning up some of the mess that was there.
While not strictly necessary, it does cleanup the code signficantly,
and hopefully makes the upkeep a bit easier in the future.
Aside from the new LSM_STARTED_ALL notification, these changes also
ensure that individual LSM initcalls are only called when the LSM is
enabled at boot time. There should be a minor reduction in boot times
for those who build multiple LSMs into their kernels, but only enable
a subset at boot.
It is worth mentioning that nothing at present makes use of the
LSM_STARTED_ALL notification, but there is work in progress which is
dependent upon LSM_STARTED_ALL.
- Make better use of the seq_put*() helpers in device_cgroup
* tag 'lsm-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (36 commits)
lsm: use unrcu_pointer() for current->cred in security_init()
device_cgroup: Refactor devcgroup_seq_show to use seq_put* helpers
lsm: add a LSM_STARTED_ALL notification event
lsm: consolidate all of the LSM framework initcalls
selinux: move initcalls to the LSM framework
ima,evm: move initcalls to the LSM framework
lockdown: move initcalls to the LSM framework
apparmor: move initcalls to the LSM framework
safesetid: move initcalls to the LSM framework
tomoyo: move initcalls to the LSM framework
smack: move initcalls to the LSM framework
ipe: move initcalls to the LSM framework
loadpin: move initcalls to the LSM framework
lsm: introduce an initcall mechanism into the LSM framework
lsm: group lsm_order_parse() with the other lsm_order_*() functions
lsm: output available LSMs when debugging
lsm: cleanup the debug and console output in lsm_init.c
lsm: add/tweak function header comment blocks in lsm_init.c
lsm: fold lsm_init_ordered() into security_init()
lsm: cleanup initialize_lsm() and rename to lsm_init_single()
...
|
|
The x86 bootloader ID specification text uses hexadecimal
values without a 0x prefix:
D kexec-tools
E Extended (see ext_loader_type)
F Special (0xFF = undefined)
10 Reserved
11 Minimal Linux Bootloader
<http://sebastian-plotz.blogspot.de>
12 OVMF UEFI virtualization stack
13 barebox
Which beyond the ambiguity of '13' in isolation, also
made me fail a grep -wi '0xd' when I was looking for
the kexec bootloader ID definition and caused quite
a bit of head-scratching before I found out why it
didn't show up.
Furthermore, the actual explanatory text uses the 0x
prefix:
For boot loader IDs above T = 0xD, write T = 0xE to this field and
write the extended ID minus 0x10 to the ext_loader_type field.
Similarly, the ext_loader_ver field can be used to provide more than
four bits for the bootloader version.
So make it all both unambiguous, easy to grep and consistent
across the entire documentation by prefixing the IDs with 0x.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org
|
|
The bootloader ID specification text uses 2 capitalization
variants for the same thing: 'id', 'ids', 'ID' and 'IDs'.
Use 'ID/IDs' consistently.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: linux-kernel@vger.kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull trusted key updates from Jarkko Sakkinen:
- Remove duplicate 'tpm2_hash_map' in favor of 'tpm2_find_hash_alg()'
- Fix a memory leak on failure paths of 'tpm2_load_cmd'
* tag 'keys-trusted-next-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
KEYS: trusted: Fix a memory leak in tpm2_load_cmd
KEYS: trusted: Replace a redundant instance of tpm2_hash_map
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull keys update from Jarkko Sakkinen:
"This contains only three fixes"
* tag 'keys-next-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
keys: Fix grammar and formatting in 'struct key_type' comments
keys: Replace deprecated strncpy in ecryptfs_fill_auth_tok
keys: Remove redundant less-than-zero checks
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc
Pull nolibc updates from Thomas Weißschuh:
- Preparations to the use of nolibc in UML:
- Cleanup of sparse warnings
- Library mode without _start()
- More consistency when disabling errno
- Unconditional installation of all architecture support files
- Always 64-bit wide ino_t and off_t
- Various cleanups and bug fixes
* tag 'nolibc-20251130-for-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc: (25 commits)
selftests/nolibc: error out on linker warnings
selftests/nolibc: use lld to link loongarch binaries
tools/nolibc: remove more __nolibc_enosys() fallbacks
tools/nolibc: remove now superfluous overflow check in llseek
tools/nolibc: use 64-bit off_t
tools/nolibc: prefer the llseek syscall
tools/nolibc: handle 64-bit off_t for llseek
tools/nolibc: use 64-bit ino_t
tools/nolibc: avoid using plain integer as NULL pointer
tools/nolibc: add support for fchdir()
tools/nolibc: clean up outdated comments in generic arch.h
tools/nolibc: make the "headers" target install all supported archs
tools/nolibc: add the more portable inttypes.h
tools/nolibc: provide the portable sys/select.h
tools/nolibc: add missing memchr() to string.h
tools/nolibc: fix misleading help message regarding installation path
tools/nolibc: add uio.h with readv and writev
tools/nolibc: add option to disable runtime
tools/nolibc: use __fallthrough__ rather than fallthrough
tools/nolibc: implement %m if errno is not defined
...
|
|
A recent spinlock guard conversion used the wrong guard so that
interrupts are no longer disabled while holding the port list lock.
Based on a cursory look this appears to be safe currently, but it could
cause a deadlock if one of these helpers are ever called in interrupt
context.
Fixes: 4b1edbb028fb ("ASoC: qcom: q6afe: Use guard() for spin locks")
Cc: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@oss.qualcomm.com>
Link: https://patch.msgid.link/20251203105542.24765-2-johan@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Commit 1ee90870ce79 ("dt-bindings: thermal: tsens: Add QCS8300
compatible") uses a tab character which is illegal in YAML (at the
beginning of a line). The original patch was correct, so this got
corrupted when applied.
Fixes: 1ee90870ce79 ("dt-bindings: thermal: tsens: Add QCS8300 compatible")
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This version includes the following changes:
- Check feature status to check if the feature enablement was successful
- Reset SST-TF bucket structure to display valid bucket info
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
|
|
buckets
With SST-TF version 2 only 3 buckets are present. The information in
others buckets can be junk. So initialize the info structure of type
isst_turbo_freq_info, before issing ioctl to get bucket information.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
|
|
After change of enable/disable status of SST-CP, SST-TF and SST-BF
check if the hardware status change was successful. If not successful
even after retries, return failure.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
|
|
Instead of manually annotating each __ex_table entry, just make the
section mergeable and store the entry size in the ELF section header.
Either way works for objtool create_fake_symbols(), this way produces
cleaner code generation.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://patch.msgid.link/b858cb7891c1ba0080e22a9c32595e6c302435e2.1764694625.git.jpoimboe@kernel.org
|
|
Instead of manually annotating each .altinstructions entry, just make
the section mergeable and store the entry size in the ELF section
header.
Either way works for objtool create_fake_symbols(), this way produces
cleaner code generation.
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://patch.msgid.link/5ac04e6db5be6453dce8003a771ebb0c47b4cd7a.1764694625.git.jpoimboe@kernel.org
|
|
Extracting empty examples results in just the empty template being
generated and then validated. That's pointless and not free, so filter
out the schemas without any examples from the targets.
There's currently a little less than 10% of the binding schema files
without examples. Removing them improves the build time by ~6%.
Link: https://patch.msgid.link/20251201175030.3785060-1-robh@kernel.org
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
|
|
'version' is an enum, thus cast of pointer on 64-bit compile test with
clang W=1 causes:
rockchip_pdm.c:583:17: error: cast to smaller integer type 'enum rk_pdm_version' from 'const void *' [-Werror,-Wvoid-pointer-to-enum-cast]
This was already fixed in commit 49a4a8d12612 ("ASoC: rockchip: Fix
Wvoid-pointer-to-enum-cast warning") but then got bad in
commit 9958d85968ed ("ASoC: Use device_get_match_data()").
Discussion on LKML also pointed out that 'uintptr_t' is not the correct
type and either 'kernel_ulong_t' or 'unsigned long' should be used,
with several arguments towards the latter [1].
Link: https://lore.kernel.org/r/CAMuHMdX7t=mabqFE5O-Cii3REMuyaePHmqX+j_mqyrn6XXzsoA@mail.gmail.com/ [1]
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20251203141644.106459-2-krzysztof.kozlowski@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
clang W=1 builds warn:
nau8325.c:430:13: error: variable 'n2_max' is uninitialized when used here [-Werror,-Wuninitialized]
which are false positive, because the variables will be always
initialized when used (guarded by mclk_max!=0 check). However
initializing them upfront makes the code more obvious and easier, plus
it silences the warning.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20251203140611.87191-2-krzysztof.kozlowski@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Sphinx reports htmldocs indentation warnings:
Documentation/filesystems/nfs/nfsd-io-modes.rst:58: ERROR: Unexpected indentation. [docutils]
Documentation/filesystems/nfs/nfsd-io-modes.rst:59: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
These caused the lists to be shown as long running paragraphs merged
with their previous paragraphs.
Fix these by separating the lists with a blank line.
Fixes: fa8d4e6784d1b6 ("NFSD: add Documentation/filesystems/nfs/nfsd-io-modes.rst")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20251202152506.7a2d2d41@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Sphinx reports htmldocs indentation warnings:
Documentation/filesystems/nfs/nfsd-io-modes.rst:29: ERROR: Unexpected indentation. [docutils]
Documentation/filesystems/nfs/nfsd-io-modes.rst:34: ERROR: Unexpected indentation. [docutils]
Fix these by wrapping shell snippets in literal code blocks.
Fixes: fa8d4e6784d1b6 ("NFSD: add Documentation/filesystems/nfs/nfsd-io-modes.rst")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20251202152506.7a2d2d41@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Commit fa8d4e6784d1b6 ("NFSD: add
Documentation/filesystems/nfs/nfsd-io-modes.rst") adds documentation for
NFSD I/O modes, but it forgets to add toctree entry for it. Hence,
Sphinx reports:
Documentation/filesystems/nfs/nfsd-io-modes.rst: WARNING: document isn't included in any toctree [toc.not_included]
Add the entry.
Fixes: fa8d4e6784d1b6 ("NFSD: add Documentation/filesystems/nfs/nfsd-io-modes.rst")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20251202152506.7a2d2d41@canb.auug.org.au/
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
gpiod_set_value_cansleep() now returns an integer and can indicate
failures in the GPIO layer. Propagate any potential errors to regulator
core.
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Link: https://patch.msgid.link/20251203084737.15891-1-bartosz.golaszewski@oss.qualcomm.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Disable regulator in runtime resume when error happens to balance
the reference count of regulator.
Fixes: 2ff6d5a108c6 ("ASoC: ak5558: Add regulator support")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Link: https://patch.msgid.link/20251203100529.3841203-3-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Disable regulator in runtime resume when error happens to balance
the reference count of regulator.
Fixes: 7e3096e8f823 ("ASoC: ak4458: Add regulator support")
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Link: https://patch.msgid.link/20251203100529.3841203-2-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
A recent spinlock guard conversion consistently used the wrong guard so
that interrupts are no longer disabled while holding the chip lock
(which can cause deadlocks).
Fixes: 7e061b462b3d ("gpio: mmio: use lock guards")
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/r/20251203105206.24453-1-johan@kernel.org
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
|
|
Audio fails to resume after system exits suspend mode
due to accessing incorrect ring buffer address during
resume. This patch resolves issue by selecting correct
address based on the ACP version.
Fixes: f6f7d25b11033 ("ASoC: amd: acp: Add pte configuration for ACP7.0 platform")
Signed-off-by: Hemalatha Pinnamreddy <hemalatha.pinnamreddy2@amd.com>
Signed-off-by: Raghavendra Prasad Mallela <raghavendraprasad.mallela@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Link: https://patch.msgid.link/20251203064650.2554625-1-raghavendraprasad.mallela@amd.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Reference the dai-common.yaml schema to allow '#sound-dai-cells' and
"sound-name-prefix' to be used because cirrus,cs42xx8 is codec DAI.
Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://patch.msgid.link/20251203102836.3856471-1-shengjiu.wang@nxp.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
bcm63xx_soc_pcm_new() does not check the return value of
of_dma_configure(), which may fail with -EPROBE_DEFER or
other errors, allowing PCM setup to continue with incomplete
DMA configuration.
Add error checking for of_dma_configure() and return on failure.
Fixes: 88eb404ccc3e ("ASoC: brcm: Add DSL/PON SoC audio driver")
Signed-off-by: Haotian Zhang <vulab@iscas.ac.cn>
Link: https://patch.msgid.link/20251202101642.492-1-vulab@iscas.ac.cn
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
When CONFIG_PCI_MSI is disabled pci_alloc_irq_vectors() and
pci_free_irq_vectors() are defined as inline functions and hence require
a Rust helper.
error[E0425]: cannot find function `pci_alloc_irq_vectors` in crate `bindings`
--> rust/kernel/pci/irq.rs:144:23
|
144 | ...s::pci_alloc_irq_vectors(dev.as_raw(), min_vecs, max_vecs, irq_types.as_raw())
| ^^^^^^^^^^^^^^^^^^^^^ help: a function with a similar name exists: `pci_irq_vector`
|
::: .../rust/bindings/bindings_helpers_generated.rs:1197:5
|
1197 | pub fn pci_irq_vector(pdev: *mut pci_dev, nvec: ffi::c_uint) -> ffi::c_int;
| --------------------------------------------------------------------------- similarly named function `pci_irq_vector` defined here
error[E0425]: cannot find function `pci_free_irq_vectors` in crate `bindings`
--> rust/kernel/pci/irq.rs:170:28
|
170 | unsafe { bindings::pci_free_irq_vectors(self.dev.as_raw()) };
| ^^^^^^^^^^^^^^^^^^^^ help: a function with a similar name exists: `pci_irq_vector`
|
::: .../rust/bindings/bindings_helpers_generated.rs:1197:5
|
1197 | pub fn pci_irq_vector(pdev: *mut pci_dev, nvec: ffi::c_uint) -> ffi::c_int;
| --------------------------------------------------------------------------- similarly named function `pci_irq_vector` defined here
error: aborting due to 2 previous errors
Fix this by adding the corresponding helpers.
Fixes: 340ccc973544 ("rust: pci: Allocate and manage PCI interrupt vectors")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512012238.YgVvRRUx-lkp@intel.com/
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Link: https://patch.msgid.link/20251202210501.40998-1-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Link: https://patch.msgid.link/20251201132037.22835-1-mjguzik@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
The freeze_all_ptr check in filesystems_freeze_callback() introduced by
commit a3f8f8662771 ("power: always freeze efivarfs") is reverse which
quite confusingly causes all file systems to be frozen when
filesystem_freeze_enabled is false.
On my systems it causes the WARN_ON_ONCE() in __set_task_frozen() to
trigger, most likely due to an attempt to freeze a file system that is
not ready for that.
Add a logical negation to the check in question to reverse it as
appropriate.
Fixes: a3f8f8662771 ("power: always freeze efivarfs")
Cc: 6.18+ <stable@vger.kernel.org> # 6.18+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Link: https://patch.msgid.link/12788397.O9o76ZdvQC@rafael.j.wysocki
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Using drm_mode_size_dumb() to compute the size of dumb buffers introduced
an 8-byte alignment constraint on the pitch that wasn’t present before.
Let’s remove this constraint, which isn’t necessarily required and may
cause buffers to be allocated larger than needed.
Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 4977dcecb931 ("drm/gem-shmem: Compute dumb-buffer sizes with drm_mode_size_dumb()")
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20251126-lcd_pitch_alignment-v1-2-991610a1e369@microchip.com
|
|
Using drm_mode_size_dumb() to compute the size of dumb buffers introduced
an 8-byte alignment constraint on the pitch that wasn’t present before.
Let’s remove this constraint, which isn’t necessarily required and may
cause buffers to be allocated larger than needed.
Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: dcacfcd35cef ("drm/gem-dma: Compute dumb-buffer sizes with drm_mode_size_dumb()")
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20251126-lcd_pitch_alignment-v1-1-991610a1e369@microchip.com
|
|
kbd_led_set() can sleep, and so may not be used as the brightness_set()
callback.
Otherwise using this led with a trigger leads to system hangs
accompanied by:
BUG: scheduling while atomic: acpi_fakekeyd/2588/0x00000003
CPU: 4 UID: 0 PID: 2588 Comm: acpi_fakekeyd Not tainted 6.17.9+deb14-amd64 #1 PREEMPT(lazy) Debian 6.17.9-1
Hardware name: ASUSTeK COMPUTER INC. ASUS EXPERTBOOK B9403CVAR/B9403CVAR, BIOS B9403CVAR.311 12/24/2024
Call Trace:
<TASK>
[...]
schedule_timeout+0xbd/0x100
__down_common+0x175/0x290
down_timeout+0x67/0x70
acpi_os_wait_semaphore+0x57/0x90
[...]
asus_wmi_evaluate_method3+0x87/0x190 [asus_wmi]
led_trigger_event+0x3f/0x60
[...]
Fixes: 9fe44fc98ce4 ("platform/x86: asus-wmi: Simplify the keyboard brightness updating process")
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Denis Benato <benato.denis96@gmail.com>
Link: https://patch.msgid.link/20251129101307.18085-3-anton@khirnov.net
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
|
|
Building htmldocs generates a warning:
WARNING: include/uapi/linux/media/amlogic/c3-isp-config.h:199
error: Cannot parse struct or union!
Which correctly highlights that the c3_isp_params_block_header symbol
is wrongly documented as a struct while it's a plain #define instead.
Fix this by removing the 'struct' identifier from the documentation of
the c3_isp_params_block_header symbol.
[ribalda: Add Closes:]
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/all/20251127131425.4b5b6644@canb.auug.org.au/
Fixes: 45662082855c ("media: uapi: Convert Amlogic C3 to V4L2 extensible params")
Cc: stable@vger.kernel.org
Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
|
|
scanned in again after probe failed"
This reverts commit ab2068a6fb84751836a84c26ca72b3beb349619d.
When probing the exp-attached sata device, libsas/libata will issue a
hard reset in sas_probe_sata() -> ata_sas_async_probe(), then a
broadcast event will be received after the disk probe fails, and this
commit causes the probe will be re-executed on the disk, and a faulty
disk may get into an indefinite loop of probe.
Therefore, revert this commit, although it can fix some temporary issues
with disk probe failure.
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Link: https://patch.msgid.link/20251202065627.140361-1-yangxingui@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
When CONFIG_SCSI_UFSHCD=y and CONFIG_RPMB=m, the kernel fails to link
with undefined references to ufs_rpmb_probe() and ufs_rpmb_remove():
ld: drivers/ufs/core/ufshcd.c:8950: undefined reference to `ufs_rpmb_probe'
ld: drivers/ufs/core/ufshcd.c:10505: undefined reference to `ufs_rpmb_remove'
The issue is that RPMB depends on its consumers (MMC, UFS) in Kconfig,
which is backwards. This prevents proper module dependency handling when
the library is modular but consumers are built-in.
Fix by reversing the dependency:
- Remove 'depends on MMC || SCSI_UFSHCD' from RPMB Kconfig
- Add 'depends on RPMB || !RPMB' to SCSI_UFSHCD Kconfig
This allows RPMB to be an independent library while ensuring correct
linking in all module/built-in combinations.
Fixes: b06b8c421485 ("scsi: ufs: core: Add OP-TEE based RPMB driver for UFS devices")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511300443.h7sotuL0-lkp@intel.com/
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Jens Wiklander <jens.wiklander@linaro.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jens Wiklander <jens.wiklander@linaro.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://patch.msgid.link/20251202155138.2607210-1-beanhuo@iokpp.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Create a fake root directory for /proc/{version,modules,kallsyms} in
/tmp for testing. The kallsyms has a bad symbol in the module and it
causes the main map splitted. The test ensures it only has two maps -
kernel and the module and it finds the initial map after the module
without creating the split maps like [kernel].0 and so on.
$ perf test -vv "split kallsyms"
69: split kallsyms:
--- start ---
test child forked, pid 1016196
try to create fake root directory
create kernel maps from the fake root directory
maps__set_modules_path_dir: cannot open /tmp/perf-test.Zrv6Sy/lib/modules/X.Y.Z dir
Problems setting modules path maps, continuing anyway...
Failed to open /tmp/perf-test.Zrv6Sy/proc/kcore. Note /proc/kcore requires CAP_SYS_RAWIO capability to access.
Using /tmp/perf-test.Zrv6Sy/proc/kallsyms for symbols
kernel map loaded - check symbol and map
---- end(0) ----
69: split kallsyms : Ok
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
This is for test functions to find the kallsyms correctly. It can find
the machine from the kernel maps and use its root_dir. This is helpful
to setup fake /proc directory for testing.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
In maps__split_kallsyms(), it assumes new kernel map when it finds a
symbol without module after any module and the initial kernel map has
some symbols. Because it expects modules are out of the kernel map so
modules should not have symbols in the kernel map.
For example, the following memory map shows symbols and maps. Any
symbols in the module 1 area will go to the module 1. The main kernel
map starts at 0xffffffffbc200000. But if any symbol has a module
between the symbols in that area, next symbols after 0xffffffffbd008000
will generate new kernel maps like [kernel].1.
kernel address | |
| |
0xffffffffc0000000 |---------------------|
| (symbols) |
| ... | <--- [kernel].N
0xffffffffbc400000 |---------------------|
| (symbols) |
| module 2 | <--- bad?
0xffffffffbc380000 |---------------------|
| ... |
| (symbols) |
| [kernel.kallsyms] | <--- initial map
0xffffffffbc200000 |---------------------|
| |
| |
0xffffffffabcde000 |---------------------|
| (symbols) |
| module 1 |
0xffffffffabcd0000 |---------------------|
This is very fragile when the module has a symbol that falls into the
main kernel map for some reason. My system has a livepatch module with
such symbols. And it created a lot of new kernel maps after those
symbols. But the symbol may have broken addresses and the later symbols
can still be found in the initial kernel map.
Let's check the symbol address in the initial map and use it if found.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
It's counted twice as it's increased after calling maps__insert(). I
guess we want to increase it only after it's added properly.
Reviewed-by: Ian Rogers <irogers@google.com>
Fixes: 2e538c4a1847291cf ("perf tools: Improve kernel/modules symbol lookup")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
The maps__split_kallsyms() will split symbols to module DSOs if it comes
from a module. It also handled some unusual kernel symbols after modules
by creating new kernel maps like "[kernel].0".
But they are pseudo DSOs to have those unexpected symbols. They should
not be considered as unloaded kernel DSOs. Otherwise the dso__load()
for them will end up calling dso__load_kallsyms() and then
maps__split_kallsyms() again and again.
Reviewed-by: Ian Rogers <irogers@google.com>
Fixes: 2e538c4a1847291cf ("perf tools: Improve kernel/modules symbol lookup")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
It's possible that some kernel samples don't have matching deferred
callchain records when the profiling session was ended before the
threads came back to userspace. Let's flush the samples before
finish the session.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Save samples with deferred callchains in a separate list and deliver
them after merging the user callchains. If users don't want to merge
they can set tool->merge_deferred_callchains to false to prevent the
behavior.
With previous result, now perf script will show the merged callchains.
$ perf script
...
pwd 2312 121.163435: 249113 cpu/cycles/P:
ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
...
The old output can be get using --no-merge-callchain option.
Also perf report can get the user callchain entry at the end.
$ perf report --no-children --stdio -q -S __build_id_parse.isra.0
# symbol: __build_id_parse.isra.0
8.40% pwd [kernel.kallsyms]
|
---__build_id_parse.isra.0
perf_event_mmap
mprotect_fixup
do_mprotect_pkey
__x64_sys_mprotect
do_syscall_64
entry_SYSCALL_64_after_hwframe
mprotect
_dl_sysdep_start
_dl_start_user
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Handle the deferred callchains in the script output.
$ perf script
...
pwd 2312 121.163435: 249113 cpu/cycles/P:
ffffffff845b78d8 __build_id_parse.isra.0+0x218 ([kernel.kallsyms])
ffffffff83bb5bf6 perf_event_mmap+0x2e6 ([kernel.kallsyms])
ffffffff83c31959 mprotect_fixup+0x1e9 ([kernel.kallsyms])
ffffffff83c31dc5 do_mprotect_pkey+0x2b5 ([kernel.kallsyms])
ffffffff83c3206f __x64_sys_mprotect+0x1f ([kernel.kallsyms])
ffffffff845e6692 do_syscall_64+0x62 ([kernel.kallsyms])
ffffffff8360012f entry_SYSCALL_64_after_hwframe+0x76 ([kernel.kallsyms])
b00000006 (cookie) ([unknown])
pwd 2312 121.163447: DEFERRED CALLCHAIN [cookie: b00000006]
7f18fe337fa7 mprotect+0x7 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
7f18fe330e0f _dl_sysdep_start+0x7f (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
7f18fe331448 _dl_start_user+0x0 (/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
Add a new callchain record mode option for deferred callchains. For now
it only works with FP (frame-pointer) mode.
And add the missing feature detection logic to clear the flag on old
kernels.
$ perf record --call-graph fp,defer -vv true
...
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|CALLCHAIN|PERIOD
read_format ID|LOST
disabled 1
inherit 1
mmap 1
comm 1
freq 1
enable_on_exec 1
task 1
sample_id_all 1
mmap2 1
comm_exec 1
ksymbol 1
bpf_event 1
defer_callchain 1
defer_output 1
------------------------------------------------------------
sys_perf_event_open: pid 162755 cpu 0 group_fd -1 flags 0x8
sys_perf_event_open failed, error -22
switching off deferred callchain support
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
|
|
This patch adds explanation of script enforcement mechanism in admin
guide documentation. Describes how IPE supports integrity enforcement
for indirectly executed scripts through the AT_EXECVE_CHECK flag, and
how this differs from kernel enforcement for compiled executables.
Signed-off-by: Yanzhu Huang <yanzhuhuang@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@kernel.org>
|
|
This patch adds a new ipe_bprm_creds_for_exec() hook that integrates
with the AT_EXECVE_CHECK mechanism. To enable script enforcement,
interpreters need to incorporate the AT_EXECVE_CHECK flag when
calling execveat() on script files before execution.
When a userspace interpreter calls execveat() with the AT_EXECVE_CHECK
flag, this hook triggers IPE policy evaluation on the script file. The
hook only triggers IPE when bprm->is_check is true, ensuring it's
being called from an AT_EXECVE_CHECK context. It then builds an
evaluation context for an IPE_OP_EXEC operation and invokes IPE policy.
The kernel returns the policy decision to the interpreter, which can
then decide whether to proceed with script execution.
This extends IPE enforcement to indirectly executed scripts, permitting
trusted scripts to execute while denying untrusted ones.
Signed-off-by: Yanzhu Huang <yanzhuhuang@linux.microsoft.com>
Signed-off-by: Fan Wu <wufan@kernel.org>
|