| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
usb: renesas_usbhs: fix use-after-free in ISR during device removal
In usbhs_remove(), the driver frees resources (including the pipe array)
while the interrupt handler (usbhs_interrupt) is still registered. If an
interrupt fires after usbhs_pipe_remove() but before the driver is fully
unbound, the ISR may access freed memory, causing a use-after-free.
Fix this by calling devm_free_irq() before freeing resources. This ensures
the interrupt handler is both disabled and synchronized (waits for any
running ISR to complete) before usbhs_pipe_remove() is called. |
| In the Linux kernel, the following vulnerability has been resolved:
ALSA: mixer: oss: Add card disconnect checkpoints
ALSA OSS mixer layer calls the kcontrol ops rather individually, and
pending calls might be not always caught at disconnecting the device.
For avoiding the potential UAF scenarios, add sanity checks of the
card disconnection at each entry point of OSS mixer accesses. The
rwsem is taken just before that check, hence the rest context should
be covered by that properly. |
| In the Linux kernel, the following vulnerability has been resolved:
9p/xen: protect xen_9pfs_front_free against concurrent calls
The xenwatch thread can race with other back-end change notifications
and call xen_9pfs_front_free() twice, hitting the observed general
protection fault due to a double-free. Guard the teardown path so only
one caller can release the front-end state at a time, preventing the
crash.
This is a fix for the following double-free:
[ 27.052347] Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI
[ 27.052357] CPU: 0 UID: 0 PID: 32 Comm: xenwatch Not tainted 6.18.0-02087-g51ab33fc0a8b-dirty #60 PREEMPT(none)
[ 27.052363] RIP: e030:xen_9pfs_front_free+0x1d/0x150
[ 27.052368] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 41 55 41 54 55 48 89 fd 48 c7 c7 48 d0 92 85 53 e8 cb cb 05 00 48 8b 45 08 48 8b 55 00 <48> 3b 28 0f 85 f9 28 35 fe 48 3b 6a 08 0f 85 ef 28 35 fe 48 89 42
[ 27.052377] RSP: e02b:ffffc9004016fdd0 EFLAGS: 00010246
[ 27.052381] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88800d66e400 RCX: 0000000000000000
[ 27.052385] RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000000 RDI: 0000000000000000
[ 27.052389] RBP: ffff88800a887040 R08: 0000000000000000 R09: 0000000000000000
[ 27.052393] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888009e46b68
[ 27.052397] R13: 0000000000000200 R14: 0000000000000000 R15: ffff88800a887040
[ 27.052404] FS: 0000000000000000(0000) GS:ffff88808ca57000(0000) knlGS:0000000000000000
[ 27.052408] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 27.052412] CR2: 00007f9714004360 CR3: 0000000004834000 CR4: 0000000000050660
[ 27.052418] Call Trace:
[ 27.052420] <TASK>
[ 27.052422] xen_9pfs_front_changed+0x5d5/0x720
[ 27.052426] ? xenbus_otherend_changed+0x72/0x140
[ 27.052430] ? __pfx_xenwatch_thread+0x10/0x10
[ 27.052434] xenwatch_thread+0x94/0x1c0
[ 27.052438] ? __pfx_autoremove_wake_function+0x10/0x10
[ 27.052442] kthread+0xf8/0x240
[ 27.052445] ? __pfx_kthread+0x10/0x10
[ 27.052449] ? __pfx_kthread+0x10/0x10
[ 27.052452] ret_from_fork+0x16b/0x1a0
[ 27.052456] ? __pfx_kthread+0x10/0x10
[ 27.052459] ret_from_fork_asm+0x1a/0x30
[ 27.052463] </TASK>
[ 27.052465] Modules linked in:
[ 27.052471] ---[ end trace 0000000000000000 ]--- |
| In the Linux kernel, the following vulnerability has been resolved:
net: wan: farsync: Fix use-after-free bugs caused by unfinished tasklets
When the FarSync T-series card is being detached, the fst_card_info is
deallocated in fst_remove_one(). However, the fst_tx_task or fst_int_task
may still be running or pending, leading to use-after-free bugs when the
already freed fst_card_info is accessed in fst_process_tx_work_q() or
fst_process_int_work_q().
A typical race condition is depicted below:
CPU 0 (cleanup) | CPU 1 (tasklet)
| fst_start_xmit()
fst_remove_one() | tasklet_schedule()
unregister_hdlc_device()|
| fst_process_tx_work_q() //handler
kfree(card) //free | do_bottom_half_tx()
| card-> //use
The following KASAN trace was captured:
==================================================================
BUG: KASAN: slab-use-after-free in do_bottom_half_tx+0xb88/0xd00
Read of size 4 at addr ffff88800aad101c by task ksoftirqd/3/32
...
Call Trace:
<IRQ>
dump_stack_lvl+0x55/0x70
print_report+0xcb/0x5d0
? do_bottom_half_tx+0xb88/0xd00
kasan_report+0xb8/0xf0
? do_bottom_half_tx+0xb88/0xd00
do_bottom_half_tx+0xb88/0xd00
? _raw_spin_lock_irqsave+0x85/0xe0
? __pfx__raw_spin_lock_irqsave+0x10/0x10
? __pfx___hrtimer_run_queues+0x10/0x10
fst_process_tx_work_q+0x67/0x90
tasklet_action_common+0x1fa/0x720
? hrtimer_interrupt+0x31f/0x780
handle_softirqs+0x176/0x530
__irq_exit_rcu+0xab/0xe0
sysvec_apic_timer_interrupt+0x70/0x80
...
Allocated by task 41 on cpu 3 at 72.330843s:
kasan_save_stack+0x24/0x50
kasan_save_track+0x17/0x60
__kasan_kmalloc+0x7f/0x90
fst_add_one+0x1a5/0x1cd0
local_pci_probe+0xdd/0x190
pci_device_probe+0x341/0x480
really_probe+0x1c6/0x6a0
__driver_probe_device+0x248/0x310
driver_probe_device+0x48/0x210
__device_attach_driver+0x160/0x320
bus_for_each_drv+0x101/0x190
__device_attach+0x198/0x3a0
device_initial_probe+0x78/0xa0
pci_bus_add_device+0x81/0xc0
pci_bus_add_devices+0x7e/0x190
enable_slot+0x9b9/0x1130
acpiphp_check_bridge.part.0+0x2e1/0x460
acpiphp_hotplug_notify+0x36c/0x3c0
acpi_device_hotplug+0x203/0xb10
acpi_hotplug_work_fn+0x59/0x80
...
Freed by task 41 on cpu 1 at 75.138639s:
kasan_save_stack+0x24/0x50
kasan_save_track+0x17/0x60
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x43/0x70
kfree+0x135/0x410
fst_remove_one+0x2ca/0x540
pci_device_remove+0xa6/0x1d0
device_release_driver_internal+0x364/0x530
pci_stop_bus_device+0x105/0x150
pci_stop_and_remove_bus_device+0xd/0x20
disable_slot+0x116/0x260
acpiphp_disable_and_eject_slot+0x4b/0x190
acpiphp_hotplug_notify+0x230/0x3c0
acpi_device_hotplug+0x203/0xb10
acpi_hotplug_work_fn+0x59/0x80
...
The buggy address belongs to the object at ffff88800aad1000
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 28 bytes inside of
freed 1024-byte region
The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xaad0
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
flags: 0x100000000000040(head|node=0|zone=1)
page_type: f5(slab)
raw: 0100000000000040 ffff888007042dc0 dead000000000122 0000000000000000
raw: 0000000000000000 0000000080100010 00000000f5000000 0000000000000000
head: 0100000000000040 ffff888007042dc0 dead000000000122 0000000000000000
head: 0000000000000000 0000000080100010 00000000f5000000 0000000000000000
head: 0100000000000003 ffffea00002ab401 00000000ffffffff 00000000ffffffff
head: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88800aad0f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88800aad0f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88800aad1000: fa fb
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: nfnetlink_queue: make hash table per queue
Sharing a global hash table among all queues is tempting, but
it can cause crash:
BUG: KASAN: slab-use-after-free in nfqnl_recv_verdict+0x11ac/0x15e0 [nfnetlink_queue]
[..]
nfqnl_recv_verdict+0x11ac/0x15e0 [nfnetlink_queue]
nfnetlink_rcv_msg+0x46a/0x930
kmem_cache_alloc_node_noprof+0x11e/0x450
struct nf_queue_entry is freed via kfree, but parallel cpu can still
encounter such an nf_queue_entry when walking the list.
Alternative fix is to free the nf_queue_entry via kfree_rcu() instead,
but as we have to alloc/free for each skb this will cause more mem
pressure. |
| In the Linux kernel, the following vulnerability has been resolved:
rpmsg: core: fix race in driver_override_show() and use core helper
The driver_override_show function reads the driver_override string
without holding the device_lock. However, the store function modifies
and frees the string while holding the device_lock. This creates a race
condition where the string can be freed by the store function while
being read by the show function, leading to a use-after-free.
To fix this, replace the rpmsg_string_attr macro with explicit show and
store functions. The new driver_override_store uses the standard
driver_set_override helper. Since the introduction of
driver_set_override, the comments in include/linux/rpmsg.h have stated
that this helper must be used to set or clear driver_override, but the
implementation was not updated until now.
Because driver_set_override modifies and frees the string while holding
the device_lock, the new driver_override_show now correctly holds the
device_lock during the read operation to prevent the race.
Additionally, since rpmsg_string_attr has only ever been used for
driver_override, removing the macro simplifies the code. |
| In the Linux kernel, the following vulnerability has been resolved:
ocfs2: fix use-after-free in ocfs2_fault() when VM_FAULT_RETRY
filemap_fault() may drop the mmap_lock before returning VM_FAULT_RETRY,
as documented in mm/filemap.c:
"If our return value has VM_FAULT_RETRY set, it's because the mmap_lock
may be dropped before doing I/O or by lock_folio_maybe_drop_mmap()."
When this happens, a concurrent munmap() can call remove_vma() and free
the vm_area_struct via RCU. The saved 'vma' pointer in ocfs2_fault() then
becomes a dangling pointer, and the subsequent trace_ocfs2_fault() call
dereferences it -- a use-after-free.
Fix this by saving ip_blkno as a plain integer before calling
filemap_fault(), and removing vma from the trace event. Since
ip_blkno is copied by value before the lock can be dropped, it
remains valid regardless of what happens to the vma or inode
afterward. |
| In the Linux kernel, the following vulnerability has been resolved:
erofs: set fileio bio failed in short read case
For file-backed mount, IO requests are handled by vfs_iocb_iter_read().
However, it can be interrupted by SIGKILL, returning the number of
bytes actually copied. Unused folios in bio are unexpectedly marked
as uptodate.
vfs_read
filemap_read
filemap_get_pages
filemap_readahead
erofs_fileio_readahead
erofs_fileio_rq_submit
vfs_iocb_iter_read
filemap_read
filemap_get_pages <= detect signal
erofs_fileio_ki_complete <= set all folios uptodate
This patch addresses this by setting short read bio with an error
directly. |
| In the Linux kernel, the following vulnerability has been resolved:
xfrm: prevent policy_hthresh.work from racing with netns teardown
A XFRM_MSG_NEWSPDINFO request can queue the per-net work item
policy_hthresh.work onto the system workqueue.
The queued callback, xfrm_hash_rebuild(), retrieves the enclosing
struct net via container_of(). If the net namespace is torn down
before that work runs, the associated struct net may already have
been freed, and xfrm_hash_rebuild() may then dereference stale memory.
xfrm_policy_fini() already flushes policy_hash_work during teardown,
but it does not synchronize policy_hthresh.work.
Synchronize policy_hthresh.work in xfrm_policy_fini() as well, so the
queued work cannot outlive the net namespace teardown and access a
freed struct net. |
| In the Linux kernel, the following vulnerability has been resolved:
net: fix fanout UAF in packet_release() via NETDEV_UP race
`packet_release()` has a race window where `NETDEV_UP` can re-register a
socket into a fanout group's `arr[]` array. The re-registration is not
cleaned up by `fanout_release()`, leaving a dangling pointer in the fanout
array.
`packet_release()` does NOT zero `po->num` in its `bind_lock` section.
After releasing `bind_lock`, `po->num` is still non-zero and `po->ifindex`
still matches the bound device. A concurrent `packet_notifier(NETDEV_UP)`
that already found the socket in `sklist` can re-register the hook.
For fanout sockets, this re-registration calls `__fanout_link(sk, po)`
which adds the socket back into `f->arr[]` and increments `f->num_members`,
but does NOT increment `f->sk_ref`.
The fix sets `po->num` to zero in `packet_release` while `bind_lock` is
held to prevent NETDEV_UP from linking, preventing the race window.
This bug was found following an additional audit with Claude Code based
on CVE-2025-38617. |
| In the Linux kernel, the following vulnerability has been resolved:
media: mc, v4l2: serialize REINIT and REQBUFS with req_queue_mutex
MEDIA_REQUEST_IOC_REINIT can run concurrently with VIDIOC_REQBUFS(0)
queue teardown paths. This can race request object cleanup against vb2
queue cancellation and lead to use-after-free reports.
We already serialize request queueing against STREAMON/OFF with
req_queue_mutex. Extend that serialization to REQBUFS, and also take
the same mutex in media_request_ioctl_reinit() so REINIT is in the
same exclusion domain.
This keeps request cleanup and queue cancellation from running in
parallel for request-capable devices. |
| In the Linux kernel, the following vulnerability has been resolved:
can: isotp: fix tx.buf use-after-free in isotp_sendmsg()
isotp_sendmsg() uses only cmpxchg() on so->tx.state to serialize access
to so->tx.buf. isotp_release() waits for ISOTP_IDLE via
wait_event_interruptible() and then calls kfree(so->tx.buf).
If a signal interrupts the wait_event_interruptible() inside close()
while tx.state is ISOTP_SENDING, the loop exits early and release
proceeds to force ISOTP_SHUTDOWN and continues to kfree(so->tx.buf)
while sendmsg may still be reading so->tx.buf for the final CAN frame
in isotp_fill_dataframe().
The so->tx.buf can be allocated once when the standard tx.buf length needs
to be extended. Move the kfree() of this potentially extended tx.buf to
sk_destruct time when either isotp_sendmsg() and isotp_release() are done. |
| In the Linux kernel, the following vulnerability has been resolved:
media: as102: fix to not free memory after the device is registered in as102_usb_probe()
In as102_usb driver, the following race condition occurs:
```
CPU0 CPU1
as102_usb_probe()
kzalloc(); // alloc as102_dev_t
....
usb_register_dev();
fd = sys_open("/path/to/dev"); // open as102 fd
....
usb_deregister_dev();
....
kfree(); // free as102_dev_t
....
sys_close(fd);
as102_release() // UAF!!
as102_usb_release()
kfree(); // DFB!!
```
When a USB character device registered with usb_register_dev() is later
unregistered (via usb_deregister_dev() or disconnect), the device node is
removed so new open() calls fail. However, file descriptors that are
already open do not go away immediately: they remain valid until the last
reference is dropped and the driver's .release() is invoked.
In as102, as102_usb_probe() calls usb_register_dev() and then, on an
error path, does usb_deregister_dev() and frees as102_dev_t right away.
If userspace raced a successful open() before the deregistration, that
open FD will later hit as102_release() --> as102_usb_release() and access
or free as102_dev_t again, occur a race to use-after-free and
double-free vuln.
The fix is to never kfree(as102_dev_t) directly once usb_register_dev()
has succeeded. After deregistration, defer freeing memory to .release().
In other words, let release() perform the last kfree when the final open
FD is closed. |
| In the Linux kernel, the following vulnerability has been resolved:
media: hackrf: fix to not free memory after the device is registered in hackrf_probe()
In hackrf driver, the following race condition occurs:
```
CPU0 CPU1
hackrf_probe()
kzalloc(); // alloc hackrf_dev
....
v4l2_device_register();
....
fd = sys_open("/path/to/dev"); // open hackrf fd
....
v4l2_device_unregister();
....
kfree(); // free hackrf_dev
....
sys_ioctl(fd, ...);
v4l2_ioctl();
video_is_registered() // UAF!!
....
sys_close(fd);
v4l2_release() // UAF!!
hackrf_video_release()
kfree(); // DFB!!
```
When a V4L2 or video device is unregistered, the device node is removed so
new open() calls are blocked.
However, file descriptors that are already open-and any in-flight I/O-do
not terminate immediately; they remain valid until the last reference is
dropped and the driver's release() is invoked.
Therefore, freeing device memory on the error path after hackrf_probe()
has registered dev it will lead to a race to use-after-free vuln, since
those already-open handles haven't been released yet.
And since release() free memory too, race to use-after-free and
double-free vuln occur.
To prevent this, if device is registered from probe(), it should be
modified to free memory only through release() rather than calling
kfree() directly. |
| In the Linux kernel, the following vulnerability has been resolved:
NFSD: Defer sub-object cleanup in export put callbacks
svc_export_put() calls path_put() and auth_domain_put() immediately
when the last reference drops, before the RCU grace period. RCU
readers in e_show() and c_show() access both ex_path (via
seq_path/d_path) and ex_client->name (via seq_escape) without
holding a reference. If cache_clean removes the entry and drops the
last reference concurrently, the sub-objects are freed while still
in use, producing a NULL pointer dereference in d_path.
Commit 2530766492ec ("nfsd: fix UAF when access ex_uuid or
ex_stats") moved kfree of ex_uuid and ex_stats into the
call_rcu callback, but left path_put() and auth_domain_put() running
before the grace period because both may sleep and call_rcu
callbacks execute in softirq context.
Replace call_rcu/kfree_rcu with queue_rcu_work(), which defers the
callback until after the RCU grace period and executes it in process
context where sleeping is permitted. This allows path_put() and
auth_domain_put() to be moved into the deferred callback alongside
the other resource releases. Apply the same fix to expkey_put(),
which has the identical pattern with ek_path and ek_client.
A dedicated workqueue scopes the shutdown drain to only NFSD
export release work items; flushing the shared
system_unbound_wq would stall on unrelated work from other
subsystems. nfsd_export_shutdown() uses rcu_barrier() followed
by flush_workqueue() to ensure all deferred release callbacks
complete before the export caches are destroyed.
Reviwed-by: Jeff Layton <jlayton@kernel.org> |
| In the Linux kernel, the following vulnerability has been resolved:
iommu/sva: Fix crash in iommu_sva_unbind_device()
domain->mm->iommu_mm can be freed by iommu_domain_free():
iommu_domain_free()
mmdrop()
__mmdrop()
mm_pasid_drop()
After iommu_domain_free() returns, accessing domain->mm->iommu_mm may
dereference a freed mm structure, leading to a crash.
Fix this by moving the code that accesses domain->mm->iommu_mm to before
the call to iommu_domain_free(). |
| In the Linux kernel, the following vulnerability has been resolved:
nfc: rawsock: cancel tx_work before socket teardown
In rawsock_release(), cancel any pending tx_work and purge the write
queue before orphaning the socket. rawsock_tx_work runs on the system
workqueue and calls nfc_data_exchange which dereferences the NCI
device. Without synchronization, tx_work can race with socket and
device teardown when a process is killed (e.g. by SIGKILL), leading
to use-after-free or leaked references.
Set SEND_SHUTDOWN first so that if tx_work is already running it will
see the flag and skip transmitting, then use cancel_work_sync to wait
for any in-progress execution to finish, and finally purge any
remaining queued skbs. |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: bpf: defer hook memory release until rcu readers are done
Yiming Qian reports UaF when concurrent process is dumping hooks via
nfnetlink_hooks:
BUG: KASAN: slab-use-after-free in nfnl_hook_dump_one.isra.0+0xe71/0x10f0
Read of size 8 at addr ffff888003edbf88 by task poc/79
Call Trace:
<TASK>
nfnl_hook_dump_one.isra.0+0xe71/0x10f0
netlink_dump+0x554/0x12b0
nfnl_hook_get+0x176/0x230
[..]
Defer release until after concurrent readers have completed. |
| In the Linux kernel, the following vulnerability has been resolved:
net: sched: avoid qdisc_reset_all_tx_gt() vs dequeue race for lockless qdiscs
When shrinking the number of real tx queues,
netif_set_real_num_tx_queues() calls qdisc_reset_all_tx_gt() to flush
qdiscs for queues which will no longer be used.
qdisc_reset_all_tx_gt() currently serializes qdisc_reset() with
qdisc_lock(). However, for lockless qdiscs, the dequeue path is
serialized by qdisc_run_begin/end() using qdisc->seqlock instead, so
qdisc_reset() can run concurrently with __qdisc_run() and free skbs
while they are still being dequeued, leading to UAF.
This can easily be reproduced on e.g. virtio-net by imposing heavy
traffic while frequently changing the number of queue pairs:
iperf3 -ub0 -c $peer -t 0 &
while :; do
ethtool -L eth0 combined 1
ethtool -L eth0 combined 2
done
With KASAN enabled, this leads to reports like:
BUG: KASAN: slab-use-after-free in __qdisc_run+0x133f/0x1760
...
Call Trace:
<TASK>
...
__qdisc_run+0x133f/0x1760
__dev_queue_xmit+0x248f/0x3550
ip_finish_output2+0xa42/0x2110
ip_output+0x1a7/0x410
ip_send_skb+0x2e6/0x480
udp_send_skb+0xb0a/0x1590
udp_sendmsg+0x13c9/0x1fc0
...
</TASK>
Allocated by task 1270 on cpu 5 at 44.558414s:
...
alloc_skb_with_frags+0x84/0x7c0
sock_alloc_send_pskb+0x69a/0x830
__ip_append_data+0x1b86/0x48c0
ip_make_skb+0x1e8/0x2b0
udp_sendmsg+0x13a6/0x1fc0
...
Freed by task 1306 on cpu 3 at 44.558445s:
...
kmem_cache_free+0x117/0x5e0
pfifo_fast_reset+0x14d/0x580
qdisc_reset+0x9e/0x5f0
netif_set_real_num_tx_queues+0x303/0x840
virtnet_set_channels+0x1bf/0x260 [virtio_net]
ethnl_set_channels+0x684/0xae0
ethnl_default_set_doit+0x31a/0x890
...
Serialize qdisc_reset_all_tx_gt() against the lockless dequeue path by
taking qdisc->seqlock for TCQ_F_NOLOCK qdiscs, matching the
serialization model already used by dev_reset_queue().
Additionally clear QDISC_STATE_NON_EMPTY after reset so the qdisc state
reflects an empty queue, avoiding needless re-scheduling. |
| In the Linux kernel, the following vulnerability has been resolved:
perf/x86: Move event pointer setup earlier in x86_pmu_enable()
A production AMD EPYC system crashed with a NULL pointer dereference
in the PMU NMI handler:
BUG: kernel NULL pointer dereference, address: 0000000000000198
RIP: x86_perf_event_update+0xc/0xa0
Call Trace:
<NMI>
amd_pmu_v2_handle_irq+0x1a6/0x390
perf_event_nmi_handler+0x24/0x40
The faulting instruction is `cmpq $0x0, 0x198(%rdi)` with RDI=0,
corresponding to the `if (unlikely(!hwc->event_base))` check in
x86_perf_event_update() where hwc = &event->hw and event is NULL.
drgn inspection of the vmcore on CPU 106 showed a mismatch between
cpuc->active_mask and cpuc->events[]:
active_mask: 0x1e (bits 1, 2, 3, 4)
events[1]: 0xff1100136cbd4f38 (valid)
events[2]: 0x0 (NULL, but active_mask bit 2 set)
events[3]: 0xff1100076fd2cf38 (valid)
events[4]: 0xff1100079e990a90 (valid)
The event that should occupy events[2] was found in event_list[2]
with hw.idx=2 and hw.state=0x0, confirming x86_pmu_start() had run
(which clears hw.state and sets active_mask) but events[2] was
never populated.
Another event (event_list[0]) had hw.state=0x7 (STOPPED|UPTODATE|ARCH),
showing it was stopped when the PMU rescheduled events, confirming the
throttle-then-reschedule sequence occurred.
The root cause is commit 7e772a93eb61 ("perf/x86: Fix NULL event access
and potential PEBS record loss") which moved the cpuc->events[idx]
assignment out of x86_pmu_start() and into step 2 of x86_pmu_enable(),
after the PERF_HES_ARCH check. This broke any path that calls
pmu->start() without going through x86_pmu_enable() -- specifically
the unthrottle path:
perf_adjust_freq_unthr_events()
-> perf_event_unthrottle_group()
-> perf_event_unthrottle()
-> event->pmu->start(event, 0)
-> x86_pmu_start() // sets active_mask but not events[]
The race sequence is:
1. A group of perf events overflows, triggering group throttle via
perf_event_throttle_group(). All events are stopped: active_mask
bits cleared, events[] preserved (x86_pmu_stop no longer clears
events[] after commit 7e772a93eb61).
2. While still throttled (PERF_HES_STOPPED), x86_pmu_enable() runs
due to other scheduling activity. Stopped events that need to
move counters get PERF_HES_ARCH set and events[old_idx] cleared.
In step 2 of x86_pmu_enable(), PERF_HES_ARCH causes these events
to be skipped -- events[new_idx] is never set.
3. The timer tick unthrottles the group via pmu->start(). Since
commit 7e772a93eb61 removed the events[] assignment from
x86_pmu_start(), active_mask[new_idx] is set but events[new_idx]
remains NULL.
4. A PMC overflow NMI fires. The handler iterates active counters,
finds active_mask[2] set, reads events[2] which is NULL, and
crashes dereferencing it.
Move the cpuc->events[hwc->idx] assignment in x86_pmu_enable() to
before the PERF_HES_ARCH check, so that events[] is populated even
for events that are not immediately started. This ensures the
unthrottle path via pmu->start() always finds a valid event pointer. |