| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
iommu: Fix WARN_ON in __iommu_group_set_domain_nofail() due to reset
In __iommu_group_set_domain_internal(), concurrent domain attachments are
rejected when any device in the group is recovering. This is necessary to
fence concurrent attachments to a multi-device group where devices might
share the same RID due to PCI DMA alias quirks, but triggers the WARN_ON in
__iommu_group_set_domain_nofail().
Other IOMMU_SET_DOMAIN_MUST_SUCCEED callers in detach/teardown paths, such
as __iommu_group_set_core_domain and __iommu_release_dma_ownership, should
not be rejected, as the domain would be freed anyway in these nofail paths
while group->domain is still pointing to it. So pci_dev_reset_iommu_done()
could trigger a UAF when re-attaching group->domain.
Honor the IOMMU_SET_DOMAIN_MUST_SUCCEED flag, allowing the callers through
the group->recovery_cnt fence, so as to update the group->domain pointer.
Instead add a gdev->blocked check in the device iteration loop, to prevent
any concurrent per-device detachment. |
| In the Linux kernel, the following vulnerability has been resolved:
drm/xe/dma-buf: handle empty bo and UAF races
There look to be some nasty races here when triggering the
invalidate_mappings hook:
1) We do xe_bo_alloc() followed by the attach, before the actual full bo
init step in xe_dma_buf_init_obj(). However the bo is visible on the
attachments list after the attach. This is bad since exporter driver,
say amdgpu, can at any time call back into our invalidate_mappings hook,
with an empty/bogus bo, leading to potential bugs/crashes.
2) Similar to 1) but here we get a UAF, when the invalidate_mappings
hook is triggered. For example, we get as far as xe_bo_init_locked()
but this fails in some way. But here the bo will be freed on error, but
we still have it attached from dma-buf pov, so if the
invalidate_mappings is now triggered then the bo we access is gone and
we trigger UAF and more bugs/crashes.
To fix this, move the attach step until after we actually have a fully
set up buffer object. Note that the bo is not published to userspace
until later, so not sure what the comment "Don't publish the bo
until we have a valid attachment", is referring to.
We have at least two different customers reporting hitting a NULL ptr
deref in evict_flags when importing something from amdgpu, followed by
triggering the evict flow. Hit rate is also pretty low, which would
hint at some kind of race, so something like 1) or 2) might explain
this.
v2:
- Shuffle the order of the ops slightly (no functional change)
- Improve the comment to better explain the ordering (Matt B)
(cherry picked from commit af1f2ad0c59fe4e2f924c526f66e968289d77971) |
| In the Linux kernel, the following vulnerability has been resolved:
drm/xe/dma-buf: fix UAF with retry loop
Retry doesn't work here, since bo will be freed on error, leading to
UAF. However, now that we do the alloc & init before the attach, we can
now combine this as one unit and have the init do the alloc for us. This
should make the retry safe.
Reported by Sashiko.
v2: Fix up the error unwind (CI)
(cherry picked from commit 479669418253e0f27f8cf5db01a731352ea592e7) |
| In the Linux kernel, the following vulnerability has been resolved:
net: qrtr: fix refcount saturation and potential UAF in qrtr_port_remove
In qrtr_port_remove(), the socket reference count is decremented via
__sock_put() before the port is removed from the qrtr_ports XArray and
before the RCU grace period elapses.
This breaks the fundamental RCU update paradigm. It exposes a race
window where a concurrent RCU reader (such as qrtr_reset_ports() or
qrtr_port_lookup()) can obtain a pointer to the socket from the XArray,
and attempt to call sock_hold() on a socket whose reference count has
already dropped to zero.
This exact race condition was hit during syzkaller fuzzing, leading to
the following refcount saturation warning and a potential Use-After-Free:
refcount_t: saturated; leaking memory.
WARNING: CPU: 3 PID: 1273 at lib/refcount.c:22 refcount_warn_saturate+0xae/0x1d0
Modules linked in: qrtr(+) bochs drm_shmem_helper ...
Call Trace:
<TASK>
qrtr_reset_ports net/qrtr/af_qrtr.c:768 [inline] [qrtr]
__qrtr_bind.isra.0+0x48b/0x570 net/qrtr/af_qrtr.c:805 [qrtr]
qrtr_bind+0x17d/0x210 net/qrtr/af_qrtr.c:901 [qrtr]
kernel_bind+0xe4/0x120 net/socket.c:3592
qrtr_ns_init+0x1a6/0x380 net/qrtr/ns.c:715 [qrtr]
qrtr_proto_init+0x3b/0xff0 net/qrtr/af_qrtr.c:169 [qrtr]
do_one_initcall+0xf5/0x5e0 init/main.c:1283
...
</TASK>
Fix this by deferring the reference count decrement until after the
xa_erase() and the synchronize_rcu() complete.
(Note: The v1 of this patch incorrectly replaced __sock_put() with
sock_put(). As Simon Horman pointed out, the callers of qrtr_port_remove()
still hold a reference to the socket, so freeing the socket memory here
would lead to a subsequent UAF in the caller. Thus, the __sock_put() is
kept, but only repositioned to close the RCU race.) |
| In the Linux kernel, the following vulnerability has been resolved:
Revert "wireguard: device: enable threaded NAPI"
This reverts commit 933466fc50a8e4eb167acbd0d8ec96a078462e9c which is
commit db9ae3b6b43c79b1ba87eea849fd65efa05b4b2e upstream.
We have had three independent production user reports in combination
with Cilium utilizing WireGuard as encryption underneath that k8s Pod
E/W traffic to certain peer nodes fully stalled. The situation appears
as follows:
- Occurs very rarely but at random times under heavy networking load.
- Once the issue triggers the decryption side stops working completely
for that WireGuard peer, other peers keep working fine. The stall
happens also for newly initiated connections towards that particular
WireGuard peer.
- Only the decryption side is affected, never the encryption side.
- Once it triggers, it never recovers and remains in this state,
the CPU/mem on that node looks normal, no leak, busy loop or crash.
- bpftrace on the affected system shows that wg_prev_queue_enqueue
fails, thus the MAX_QUEUED_PACKETS (1024 skbs!) for the peer's
rx_queue is reached.
- Also, bpftrace shows that wg_packet_rx_poll for that peer is never
called again after reaching this state for that peer. For other
peers wg_packet_rx_poll does get called normally.
- Commit db9ae3b ("wireguard: device: enable threaded NAPI")
switched WireGuard to threaded NAPI by default. The default has
not been changed for triggering the issue, neither did CPU
hotplugging occur (i.e. 5bd8de2 ("wireguard: queueing: always
return valid online CPU in wg_cpumask_choose_online()")).
- The issue has been observed with stable kernels of v5.15 as well as
v6.1. It was reported to us that v5.10 stable is working fine, and
no report on v6.6 stable either (somewhat related discussion in [0]
though).
- In the WireGuard driver the only material difference between v5.10
stable and v5.15 stable is the switch to threaded NAPI by default.
[0] https://lore.kernel.org/netdev/CA+wXwBTT74RErDGAnj98PqS=wvdh8eM1pi4q6tTdExtjnokKqA@mail.gmail.com/
Breakdown of the problem:
1) skbs arriving for decryption are enqueued to the peer->rx_queue in
wg_packet_consume_data via wg_queue_enqueue_per_device_and_peer.
2) The latter only moves the skb into the MPSC peer queue if it does
not surpass MAX_QUEUED_PACKETS (1024) which is kept track in an
atomic counter via wg_prev_queue_enqueue.
3) In case enqueueing was successful, the skb is also queued up
in the device queue, round-robin picks a next online CPU, and
schedules the decryption worker.
4) The wg_packet_decrypt_worker, once scheduled, picks these up
from the queue, decrypts the packets and once done calls into
wg_queue_enqueue_per_peer_rx.
5) The latter updates the state to PACKET_STATE_CRYPTED on success
and calls napi_schedule on the per peer->napi instance.
6) NAPI then polls via wg_packet_rx_poll. wg_prev_queue_peek checks
on the peer->rx_queue. It will wg_prev_queue_dequeue if the
queue->peeked skb was not cached yet, or just return the latter
otherwise. (wg_prev_queue_drop_peeked later clears the cache.)
7) From an ordering perspective, the peer->rx_queue has skbs in order
while the device queue with the per-CPU worker threads from a
global ordering PoV can finish the decryption and signal the skb
PACKET_STATE_CRYPTED out of order.
8) A situation can be observed that the first packet coming in will
be stuck waiting for the decryption worker to be scheduled for
a longer time when the system is under pressure.
9) While this is the case, the other CPUs in the meantime finish
decryption and call into napi_schedule.
10) Now in wg_packet_rx_poll it picks up the first in-order skb
from the peer->rx_queue and sees that its state is still
PACKET_STATE_UNCRYPTED. The NAPI poll routine then exits e
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
net: skbuff: fix missing zerocopy reference in pskb_carve helpers
pskb_carve_inside_header() and pskb_carve_inside_nonlinear() both copy
the old skb_shared_info header into a new buffer via memcpy(), which
includes the destructor_arg pointer (uarg) for MSG_ZEROCOPY skbs.
Neither function calls net_zcopy_get() for the new shinfo, creating an
unaccounted holder: every skb_shared_info with destructor_arg set will
call skb_zcopy_clear() once when freed, but the corresponding
net_zcopy_get() was never called for the new copy. Repeated calls
drive uarg->refcnt to zero prematurely, freeing ubuf_info_msgzc while
TX skbs still hold live destructor_arg pointers.
KASAN reports use-after-free on a freed ubuf_info_msgzc:
BUG: KASAN: slab-use-after-free in skb_release_data+0x77b/0x810
Read of size 8 at addr ffff88801574d3e8 by task poc/220
Call Trace:
skb_release_data+0x77b/0x810
kfree_skb_list_reason+0x13e/0x610
skb_release_data+0x4cd/0x810
sk_skb_reason_drop+0xf3/0x340
skb_queue_purge_reason+0x282/0x440
rds_tcp_inc_free+0x1e/0x30
rds_recvmsg+0x354/0x1780
__sys_recvmsg+0xdf/0x180
Allocated by task 219:
msg_zerocopy_realloc+0x157/0x7b0
tcp_sendmsg_locked+0x2892/0x3ba0
Freed by task 219:
ip_recv_error+0x74a/0xb10
tcp_recvmsg+0x475/0x530
The skb consuming the late access still referenced the same uarg via
shinfo->destructor_arg copied by pskb_carve_inside_nonlinear() without
a refcount bump. This has been verified to be reliably exploitable: a
working proof-of-concept achieves full root privilege escalation from
an unprivileged local user on a default kernel configuration.
The fix follows the pattern of pskb_expand_head() which has the same
memcpy/cloned structure. For pskb_carve_inside_header(), net_zcopy_get()
is placed after skb_orphan_frags() succeeds, so the orphan error path
needs no cleanup. For pskb_carve_inside_nonlinear(), net_zcopy_get() is
placed after all failure points and just before skb_release_data(), so
no error path needs cleanup at all -- matching pskb_expand_head() more
closely and avoiding the need for a balancing net_zcopy_put(). |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_log: validate MAC header was set before dumping it
The fallback path of dump_mac_header() guards the MAC header access
only with "skb->mac_header != skb->network_header", without checking
skb_mac_header_was_set(). When the MAC header is unset, mac_header is
0xffff, so the test passes and skb_mac_header(skb) returns
skb->head + 0xffff, ~64 KiB past the buffer; the loop then reads
dev->hard_header_len bytes out of bounds into the kernel log.
This is reachable via the netdev logger: nf_log_unknown_packet() calls
dump_mac_header() unconditionally, and an skb sent through AF_PACKET
with PACKET_QDISC_BYPASS reaches the egress hook with mac_header still
unset (__dev_queue_xmit(), which would reset it, is bypassed).
Add the skb_mac_header_was_set() check the ARPHRD_ETHER path already
uses, and replace the open-coded MAC header length test with
skb_mac_header_len(). Only skbs with an unset MAC header are affected;
valid ones are dumped as before.
BUG: KASAN: slab-out-of-bounds in dump_mac_header (net/netfilter/nf_log_syslog.c:831)
Read of size 1 at addr ffff88800ea49d3f by task exploit/148
Call Trace:
kasan_report (mm/kasan/report.c:595)
dump_mac_header (net/netfilter/nf_log_syslog.c:831)
nf_log_netdev_packet (net/netfilter/nf_log_syslog.c:938 net/netfilter/nf_log_syslog.c:963)
nf_log_packet (net/netfilter/nf_log.c:260)
nft_log_eval (net/netfilter/nft_log.c:60)
nft_do_chain (net/netfilter/nf_tables_core.c:285)
nft_do_chain_netdev (net/netfilter/nft_chain_filter.c:307)
nf_hook_slow (net/netfilter/core.c:619)
nf_hook_direct_egress (net/packet/af_packet.c:257)
packet_xmit (net/packet/af_packet.c:280)
packet_sendmsg (net/packet/af_packet.c:3114)
__sys_sendto (net/socket.c:2265) |
| In the Linux kernel, the following vulnerability has been resolved:
xfrm: espintcp: do not reuse an in-progress partial send
espintcp keeps a single in-flight transmit in ctx->partial.
Before building a new sk_msg, espintcp_sendmsg() first tries to flush
that state through espintcp_push_msgs().
For blocking callers, espintcp_push_msgs() may return success even when
the previous partial send is still pending. espintcp_sendmsg() would
then reinitialize emsg->skmsg and reuse ctx->partial while the old
transfer still owns that state.
Do not rebuild the send message when ctx->partial is still in progress.
If espintcp_push_msgs() returns with emsg->len still set, fail the new
send instead of overwriting the live partial state.
This is a memory-safety fix: reusing the live partial-send state can
leave a stale offset attached to a new sk_msg and lead to an out-of-
bounds read in the send path.
tcp_sendmsg_locked() already handles waiting for send buffer memory, so
the fix here is just to preserve espintcp's one-message-at-a-time
transmit state. |
| In the Linux kernel, the following vulnerability has been resolved:
batman-adv: tvlv: reject oversized TVLV packets
batadv_tvlv_container_ogm_append() builds a TVLV packet section from
the tvlv.container_list. The total size of this section is computed by
batadv_tvlv_container_list_size(), which sums the sizes of all registered
containers.
The return type and accumulator in batadv_tvlv_container_list_size() were
u16. If the accumulated size exceeds U16_MAX, the value wraps around,
causing the subsequent allocation in batadv_tvlv_container_ogm_append()
to be undersized. The memcpy-style copy that follows would then write
beyond the end of the allocated buffer, corrupting kernel memory.
Fix this by widening the return type of batadv_tvlv_container_list_size()
to size_t. In batadv_tvlv_container_ogm_append(), check the computed length
against U16_MAX before proceeding, and bail out as if the allocation had
failed when the limit is exceeded. |
| In the Linux kernel, the following vulnerability has been resolved:
io_uring/poll: fix signed comparison in io_poll_get_ownership()
io_poll_get_ownership() uses a signed comparison to check whether
poll_refs has reached the threshold for the slowpath:
if (unlikely(atomic_read(&req->poll_refs) >= IO_POLL_REF_BIAS))
atomic_read() returns int (signed). When IO_POLL_CANCEL_FLAG
(BIT(31)) is set in poll_refs, the value becomes negative in
signed arithmetic, so the >= 128 comparison always evaluates to
false and the slowpath is never taken.
Fix this by casting the atomic_read() result to unsigned int
before the comparison, so that the cancel flag is treated as a
large positive value and correctly triggers the slowpath. |
| In the Linux kernel, the following vulnerability has been resolved:
xfrm: ipcomp: Free destination pages on acomp errors
Move the out_free_req label up by a couple of lines so that the
allocated dst SG list gets freed on error as well as success. |
| In the Linux kernel, the following vulnerability has been resolved:
batman-adv: tp_meter: avoid use of uninit sender vars
batadv_tp_recv_ack() and batadv_tp_stop() are only valid for tp_vars in the
BATADV_TP_SENDER role. When called with a BATADV_TP_RECEIVER role, it
proceeds to read sender-only members that were never initialized, leading
to undefined behavior.
This can be triggered when a node that is currently acting as a receiver in
an ongoing tp_meter session receives a malicious ACK packet.
Guard against this by checking tp_vars->role immediately after the
lookup and bailing out if it is not BATADV_TP_SENDER, before any of
those members are accessed. |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: ebtables: fix OOB read in compat_mtw_from_user
Luxiao Xu says:
The function compat_mtw_from_user() converts ebtables extensions from
32-bit user structures to kernel native structures. However, it lacks
proper validation of the user-supplied match_size/target_size.
When certain extensions are processed, the kernel-side translation
logic may perform memory accesses based on the extension's expected
size. If the user provides a size smaller than what the extension
requires, it results in an out-of-bounds read as reported by KASAN.
This fix introduces a check to ensure match_size is at least as large
as the extension's required compatsize. This covers matches, watchers,
and targets, while maintaining compatibility with standard targets.
AFAIU this is relevant for matches that need to go though
match->compat_from_user() call. Those that use plain memcpy with the
user-provided size are ok because the caller checks that size vs the
start of the next rule entry offset (which itself is checked vs. total
size copied from userspace).
The ->compat_from_user() callbacks assume they can read compatsize bytes,
so they need this extra check.
Based on an earlier patch from Luxiao Xu. |
| In the Linux kernel, the following vulnerability has been resolved:
ipc: limit next_id allocation to the valid ID range
The checkpoint/restore sysctl path can request the next SysV IPC id
through ids->next_id. ipc_idr_alloc() currently forwards that request to
idr_alloc() with an open-ended upper bound.
If the valid tail of the SysV IPC id space is full, the allocation can
spill beyond ipc_mni. The returned SysV IPC id still uses the normal
index encoding, so later lookup and removal can target the wrong slot.
This leaves the real IDR entry behind and breaks the IDR state for the
object.
The bug is in ipc_idr_alloc() in the checkpoint/restore path.
1. ids->next_id is passed to:
idr_alloc(&ids->ipcs_idr, new, ipcid_to_idx(next_id), 0, ...)
2. The zero upper bound makes the allocation effectively open-ended.
Once the valid SysV IPC tail is occupied, idr_alloc() can spill past
ipc_mni and allocate an entry beyond the valid IPC id range.
3. The new object id is still encoded with the narrower SysV IPC index
width:
new->id = (new->seq << ipcmni_seq_shift()) + idx
4. Later removal goes through ipc_rmid(), which uses:
ipcid_to_idx(ipcp->id)
That truncates the real IDR index. An object actually stored at a
high index can then be removed as if it lived at a low in-range
index.
5. For shared memory, shm_destroy() frees the current object anyway, but
the real high IDR slot is left behind as a dangling pointer.
6. A subsequent walk of /proc/sysvipc/shm reaches the stale IDR entry
and dereferences freed memory.
Prevent this by bounding the requested allocation to ipc_mni so the
checkpoint/restore path fails once the valid range is exhausted. |
| In the Linux kernel, the following vulnerability has been resolved:
batman-adv: dat: handle forward allocation error
batadv_dat_forward_data() calls pskb_copy_for_clone() to duplicate an skb
for each DHT candidate, but does not check the return value before passing
it to batadv_send_skb_prepare_unicast_4addr(). That function dereferences
the skb unconditionally, so a failed allocation triggers a NULL pointer
dereference.
Skip forwarding to the current DHT candidate on allocation failure. |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: xt_policy: fix strict mode inbound policy matching
match_policy_in() walks sec_path entries from the last transform to the
first one, but strict policy matching needs to consume info->pol[] in
the same forward order as the rule layout.
Derive the strict-match policy position from the number of transforms
already consumed so that multi-element inbound rules are matched
consistently. |
| In the Linux kernel, the following vulnerability has been resolved:
batman-adv: fix tp_meter counter underflow during shutdown
batadv_tp_sender_shutdown() unconditionally decrements the "sending"
atomic counter. If multiple paths (e.g. timeout, user cancel, and
normal finish) call this function, the counter can underflow to -1.
Since the sender logic treats any non-zero value as "still sending",
a negative value causes the sender kthread to loop indefinitely.
This leads to a use-after-free when the interface is removed while
the zombie thread is still active.
Fix this by using atomic_xchg() to ensure the counter only transitions
from 1 to 0 once.
[sven: added missing change in batadv_tp_send] |
| In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: serialize accept_q access
bt_sock_poll() walks the accept queue without synchronization, while
child teardown can unlink the same socket and drop its last reference.
The unsynchronized accept queue walk has existed since the initial
Bluetooth import.
Protect accept_q with a dedicated lock for queue updates and polling.
Also rework bt_accept_dequeue() to take temporary child references under
the queue lock before dropping it and locking the child socket. |
| In the Linux kernel, the following vulnerability has been resolved:
sctp: diag: reject stale associations in dump_one path
The SCTP exact sock_diag lookup can hold a transport reference, block on
lock_sock(sk), and then resume after sctp_association_free() has marked
the association dead and freed its bind address list.
When that happens, inet_assoc_attr_size() and
inet_diag_msg_sctpasoc_fill() can still dereference association state
that is no longer valid for reporting. In particular,
inet_diag_msg_sctpasoc_fill() may read an empty bind-address list as a
real sctp_sockaddr_entry and trigger an out-of-bounds read from
unrelated association memory.
Reject the association after taking the socket lock if it has been
reaped or detached from the endpoint, and report the lookup as stale.
This keeps the exact dump-one path from formatting torn association
state. |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: ip6t_hbh: reject oversized option lists
struct ip6t_opts stores at most IP6T_OPTS_OPTSNR option descriptors,
but hbh_mt6_check() does not reject larger optsnr values supplied from
userspace.
Validate optsnr in the rule setup path so only match data that fits the
fixed-size opts array can be installed. This follows the existing xtables
pattern of rejecting invalid user-provided counts in checkentry() and
keeps the packet matching path unchanged.
`struct ip6t_opts` has a fixed `opts[IP6T_OPTS_OPTSNR]` array,
where `IP6T_OPTS_OPTSNR` is 16, then off-by-one array access is possible:
[ 137.924693][ T8692] UBSAN: array-index-out-of-bounds in ../net/ipv6/netfilter/ip6t_hbh.c:110:29
[ 137.926167][ T8692] index 16 is out of range for type '__u16 [16]' |