| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix to do sanity check on ino and xnid
syzbot reported a f2fs bug as below:
INFO: task syz-executor140:5308 blocked for more than 143 seconds.
Not tainted 6.14.0-rc7-syzkaller-00069-g81e4f8d68c66 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor140 state:D stack:24016 pid:5308 tgid:5308 ppid:5306 task_flags:0x400140 flags:0x00000006
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5378 [inline]
__schedule+0x190e/0x4c90 kernel/sched/core.c:6765
__schedule_loop kernel/sched/core.c:6842 [inline]
schedule+0x14b/0x320 kernel/sched/core.c:6857
io_schedule+0x8d/0x110 kernel/sched/core.c:7690
folio_wait_bit_common+0x839/0xee0 mm/filemap.c:1317
__folio_lock mm/filemap.c:1664 [inline]
folio_lock include/linux/pagemap.h:1163 [inline]
__filemap_get_folio+0x147/0xb40 mm/filemap.c:1917
pagecache_get_page+0x2c/0x130 mm/folio-compat.c:87
find_get_page_flags include/linux/pagemap.h:842 [inline]
f2fs_grab_cache_page+0x2b/0x320 fs/f2fs/f2fs.h:2776
__get_node_page+0x131/0x11b0 fs/f2fs/node.c:1463
read_xattr_block+0xfb/0x190 fs/f2fs/xattr.c:306
lookup_all_xattrs fs/f2fs/xattr.c:355 [inline]
f2fs_getxattr+0x676/0xf70 fs/f2fs/xattr.c:533
__f2fs_get_acl+0x52/0x870 fs/f2fs/acl.c:179
f2fs_acl_create fs/f2fs/acl.c:375 [inline]
f2fs_init_acl+0xd7/0x9b0 fs/f2fs/acl.c:418
f2fs_init_inode_metadata+0xa0f/0x1050 fs/f2fs/dir.c:539
f2fs_add_inline_entry+0x448/0x860 fs/f2fs/inline.c:666
f2fs_add_dentry+0xba/0x1e0 fs/f2fs/dir.c:765
f2fs_do_add_link+0x28c/0x3a0 fs/f2fs/dir.c:808
f2fs_add_link fs/f2fs/f2fs.h:3616 [inline]
f2fs_mknod+0x2e8/0x5b0 fs/f2fs/namei.c:766
vfs_mknod+0x36d/0x3b0 fs/namei.c:4191
unix_bind_bsd net/unix/af_unix.c:1286 [inline]
unix_bind+0x563/0xe30 net/unix/af_unix.c:1379
__sys_bind_socket net/socket.c:1817 [inline]
__sys_bind+0x1e4/0x290 net/socket.c:1848
__do_sys_bind net/socket.c:1853 [inline]
__se_sys_bind net/socket.c:1851 [inline]
__x64_sys_bind+0x7a/0x90 net/socket.c:1851
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Let's dump and check metadata of corrupted inode, it shows its xattr_nid
is the same to its i_ino.
dump.f2fs -i 3 chaseyu.img.raw
i_xattr_nid [0x 3 : 3]
So that, during mknod in the corrupted directory, it tries to get and
lock inode page twice, result in deadlock.
- f2fs_mknod
- f2fs_add_inline_entry
- f2fs_get_inode_page --- lock dir's inode page
- f2fs_init_acl
- f2fs_acl_create(dir,..)
- __f2fs_get_acl
- f2fs_getxattr
- lookup_all_xattrs
- __get_node_page --- try to lock dir's inode page
In order to fix this, let's add sanity check on ino and xnid. |
| In the Linux kernel, the following vulnerability has been resolved:
perf/x86/intel: Fix crash in icl_update_topdown_event()
The perf_fuzzer found a hard-lockup crash on a RaptorLake machine:
Oops: general protection fault, maybe for address 0xffff89aeceab400: 0000
CPU: 23 UID: 0 PID: 0 Comm: swapper/23
Tainted: [W]=WARN
Hardware name: Dell Inc. Precision 9660/0VJ762
RIP: 0010:native_read_pmc+0x7/0x40
Code: cc e8 8d a9 01 00 48 89 03 5b cd cc cc cc cc 0f 1f ...
RSP: 000:fffb03100273de8 EFLAGS: 00010046
....
Call Trace:
<TASK>
icl_update_topdown_event+0x165/0x190
? ktime_get+0x38/0xd0
intel_pmu_read_event+0xf9/0x210
__perf_event_read+0xf9/0x210
CPUs 16-23 are E-core CPUs that don't support the perf metrics feature.
The icl_update_topdown_event() should not be invoked on these CPUs.
It's a regression of commit:
f9bdf1f95339 ("perf/x86/intel: Avoid disable PMU if !cpuc->enabled in sample read")
The bug introduced by that commit is that the is_topdown_event() function
is mistakenly used to replace the is_topdown_count() call to check if the
topdown functions for the perf metrics feature should be invoked.
Fix it. |
| In the Linux kernel, the following vulnerability has been resolved:
bpf: Avoid __bpf_prog_ret0_warn when jit fails
syzkaller reported an issue:
WARNING: CPU: 3 PID: 217 at kernel/bpf/core.c:2357 __bpf_prog_ret0_warn+0xa/0x20 kernel/bpf/core.c:2357
Modules linked in:
CPU: 3 UID: 0 PID: 217 Comm: kworker/u32:6 Not tainted 6.15.0-rc4-syzkaller-00040-g8bac8898fe39
RIP: 0010:__bpf_prog_ret0_warn+0xa/0x20 kernel/bpf/core.c:2357
Call Trace:
<TASK>
bpf_dispatcher_nop_func include/linux/bpf.h:1316 [inline]
__bpf_prog_run include/linux/filter.h:718 [inline]
bpf_prog_run include/linux/filter.h:725 [inline]
cls_bpf_classify+0x74a/0x1110 net/sched/cls_bpf.c:105
...
When creating bpf program, 'fp->jit_requested' depends on bpf_jit_enable.
This issue is triggered because of CONFIG_BPF_JIT_ALWAYS_ON is not set
and bpf_jit_enable is set to 1, causing the arch to attempt JIT the prog,
but jit failed due to FAULT_INJECTION. As a result, incorrectly
treats the program as valid, when the program runs it calls
`__bpf_prog_ret0_warn` and triggers the WARN_ON_ONCE(1). |
| In the Linux kernel, the following vulnerability has been resolved:
x86/mm: Check return value from memblock_phys_alloc_range()
At least with CONFIG_PHYSICAL_START=0x100000, if there is < 4 MiB of
contiguous free memory available at this point, the kernel will crash
and burn because memblock_phys_alloc_range() returns 0 on failure,
which leads memblock_phys_free() to throw the first 4 MiB of physical
memory to the wolves.
At a minimum it should fail gracefully with a meaningful diagnostic,
but in fact everything seems to work fine without the weird reserve
allocation. |
| In the Linux kernel, the following vulnerability has been resolved:
rseq: Fix segfault on registration when rseq_cs is non-zero
The rseq_cs field is documented as being set to 0 by user-space prior to
registration, however this is not currently enforced by the kernel. This
can result in a segfault on return to user-space if the value stored in
the rseq_cs field doesn't point to a valid struct rseq_cs.
The correct solution to this would be to fail the rseq registration when
the rseq_cs field is non-zero. However, some older versions of glibc
will reuse the rseq area of previous threads without clearing the
rseq_cs field and will also terminate the process if the rseq
registration fails in a secondary thread. This wasn't caught in testing
because in this case the leftover rseq_cs does point to a valid struct
rseq_cs.
What we can do is clear the rseq_cs field on registration when it's
non-zero which will prevent segfaults on registration and won't break
the glibc versions that reuse rseq areas on thread creation. |
| In the Linux kernel, the following vulnerability has been resolved:
dm: fix unconditional IO throttle caused by REQ_PREFLUSH
When a bio with REQ_PREFLUSH is submitted to dm, __send_empty_flush()
generates a flush_bio with REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC,
which causes the flush_bio to be throttled by wbt_wait().
An example from v5.4, similar problem also exists in upstream:
crash> bt 2091206
PID: 2091206 TASK: ffff2050df92a300 CPU: 109 COMMAND: "kworker/u260:0"
#0 [ffff800084a2f7f0] __switch_to at ffff80004008aeb8
#1 [ffff800084a2f820] __schedule at ffff800040bfa0c4
#2 [ffff800084a2f880] schedule at ffff800040bfa4b4
#3 [ffff800084a2f8a0] io_schedule at ffff800040bfa9c4
#4 [ffff800084a2f8c0] rq_qos_wait at ffff8000405925bc
#5 [ffff800084a2f940] wbt_wait at ffff8000405bb3a0
#6 [ffff800084a2f9a0] __rq_qos_throttle at ffff800040592254
#7 [ffff800084a2f9c0] blk_mq_make_request at ffff80004057cf38
#8 [ffff800084a2fa60] generic_make_request at ffff800040570138
#9 [ffff800084a2fae0] submit_bio at ffff8000405703b4
#10 [ffff800084a2fb50] xlog_write_iclog at ffff800001280834 [xfs]
#11 [ffff800084a2fbb0] xlog_sync at ffff800001280c3c [xfs]
#12 [ffff800084a2fbf0] xlog_state_release_iclog at ffff800001280df4 [xfs]
#13 [ffff800084a2fc10] xlog_write at ffff80000128203c [xfs]
#14 [ffff800084a2fcd0] xlog_cil_push at ffff8000012846dc [xfs]
#15 [ffff800084a2fda0] xlog_cil_push_work at ffff800001284a2c [xfs]
#16 [ffff800084a2fdb0] process_one_work at ffff800040111d08
#17 [ffff800084a2fe00] worker_thread at ffff8000401121cc
#18 [ffff800084a2fe70] kthread at ffff800040118de4
After commit 2def2845cc33 ("xfs: don't allow log IO to be throttled"),
the metadata submitted by xlog_write_iclog() should not be throttled.
But due to the existence of the dm layer, throttling flush_bio indirectly
causes the metadata bio to be throttled.
Fix this by conditionally adding REQ_IDLE to flush_bio.bi_opf, which makes
wbt_should_throttle() return false to avoid wbt_wait(). |
| In the Linux kernel, the following vulnerability has been resolved:
btrfs: adjust subpage bit start based on sectorsize
When running machines with 64k page size and a 16k nodesize we started
seeing tree log corruption in production. This turned out to be because
we were not writing out dirty blocks sometimes, so this in fact affects
all metadata writes.
When writing out a subpage EB we scan the subpage bitmap for a dirty
range. If the range isn't dirty we do
bit_start++;
to move onto the next bit. The problem is the bitmap is based on the
number of sectors that an EB has. So in this case, we have a 64k
pagesize, 16k nodesize, but a 4k sectorsize. This means our bitmap is 4
bits for every node. With a 64k page size we end up with 4 nodes per
page.
To make this easier this is how everything looks
[0 16k 32k 48k ] logical address
[0 4 8 12 ] radix tree offset
[ 64k page ] folio
[ 16k eb ][ 16k eb ][ 16k eb ][ 16k eb ] extent buffers
[ | | | | | | | | | | | | | | | | ] bitmap
Now we use all of our addressing based on fs_info->sectorsize_bits, so
as you can see the above our 16k eb->start turns into radix entry 4.
When we find a dirty range for our eb, we correctly do bit_start +=
sectors_per_node, because if we start at bit 0, the next bit for the
next eb is 4, to correspond to eb->start 16k.
However if our range is clean, we will do bit_start++, which will now
put us offset from our radix tree entries.
In our case, assume that the first time we check the bitmap the block is
not dirty, we increment bit_start so now it == 1, and then we loop
around and check again. This time it is dirty, and we go to find that
start using the following equation
start = folio_start + bit_start * fs_info->sectorsize;
so in the case above, eb->start 0 is now dirty, and we calculate start
as
0 + 1 * fs_info->sectorsize = 4096
4096 >> 12 = 1
Now we're looking up the radix tree for 1, and we won't find an eb.
What's worse is now we're using bit_start == 1, so we do bit_start +=
sectors_per_node, which is now 5. If that eb is dirty we will run into
the same thing, we will look at an offset that is not populated in the
radix tree, and now we're skipping the writeout of dirty extent buffers.
The best fix for this is to not use sectorsize_bits to address nodes,
but that's a larger change. Since this is a fs corruption problem fix
it simply by always using sectors_per_node to increment the start bit. |
| In the Linux kernel, the following vulnerability has been resolved:
usb: gadget: f_midi: fix MIDI Streaming descriptor lengths
While the MIDI jacks are configured correctly, and the MIDIStreaming
endpoint descriptors are filled with the correct information,
bNumEmbMIDIJack and bLength are set incorrectly in these descriptors.
This does not matter when the numbers of in and out ports are equal, but
when they differ the host will receive broken descriptors with
uninitialized stack memory leaking into the descriptor for whichever
value is smaller.
The precise meaning of "in" and "out" in the port counts is not clearly
defined and can be confusing. But elsewhere the driver consistently
uses this to match the USB meaning of IN and OUT viewed from the host,
so that "in" ports send data to the host and "out" ports receive data
from it. |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: reject mismatching sum of field_len with set key length
The field length description provides the length of each separated key
field in the concatenation, each field gets rounded up to 32-bits to
calculate the pipapo rule width from pipapo_init(). The set key length
provides the total size of the key aligned to 32-bits.
Register-based arithmetics still allows for combining mismatching set
key length and field length description, eg. set key length 10 and field
description [ 5, 4 ] leading to pipapo width of 12. |
| In the Linux kernel, the following vulnerability has been resolved:
net: let net.core.dev_weight always be non-zero
The following problem was encountered during stability test:
(NULL net_device): NAPI poll function process_backlog+0x0/0x530 \
returned 1, exceeding its budget of 0.
------------[ cut here ]------------
list_add double add: new=ffff88905f746f48, prev=ffff88905f746f48, \
next=ffff88905f746e40.
WARNING: CPU: 18 PID: 5462 at lib/list_debug.c:35 \
__list_add_valid_or_report+0xf3/0x130
CPU: 18 UID: 0 PID: 5462 Comm: ping Kdump: loaded Not tainted 6.13.0-rc7+
RIP: 0010:__list_add_valid_or_report+0xf3/0x130
Call Trace:
? __warn+0xcd/0x250
? __list_add_valid_or_report+0xf3/0x130
enqueue_to_backlog+0x923/0x1070
netif_rx_internal+0x92/0x2b0
__netif_rx+0x15/0x170
loopback_xmit+0x2ef/0x450
dev_hard_start_xmit+0x103/0x490
__dev_queue_xmit+0xeac/0x1950
ip_finish_output2+0x6cc/0x1620
ip_output+0x161/0x270
ip_push_pending_frames+0x155/0x1a0
raw_sendmsg+0xe13/0x1550
__sys_sendto+0x3bf/0x4e0
__x64_sys_sendto+0xdc/0x1b0
do_syscall_64+0x5b/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The reproduction command is as follows:
sysctl -w net.core.dev_weight=0
ping 127.0.0.1
This is because when the napi's weight is set to 0, process_backlog() may
return 0 and clear the NAPI_STATE_SCHED bit of napi->state, causing this
napi to be re-polled in net_rx_action() until __do_softirq() times out.
Since the NAPI_STATE_SCHED bit has been cleared, napi_schedule_rps() can
be retriggered in enqueue_to_backlog(), causing this issue.
Making the napi's weight always non-zero solves this problem.
Triggering this issue requires system-wide admin (setting is
not namespaced). |
| In the Linux kernel, the following vulnerability has been resolved:
NFSD: fix hang in nfsd4_shutdown_callback
If nfs4_client is in courtesy state then there is no point to send
the callback. This causes nfsd4_shutdown_callback to hang since
cl_cb_inflight is not 0. This hang lasts about 15 minutes until TCP
notifies NFSD that the connection was dropped.
This patch modifies nfsd4_run_cb_work to skip the RPC call if
nfs4_client is in courtesy state. |
| In the Linux kernel, the following vulnerability has been resolved:
ipv4: use RCU protection in __ip_rt_update_pmtu()
__ip_rt_update_pmtu() must use RCU protection to make
sure the net structure it reads does not disappear. |
| In the Linux kernel, the following vulnerability has been resolved:
ipv6: use RCU protection in ip6_default_advmss()
ip6_default_advmss() needs rcu protection to make
sure the net structure it reads does not disappear. |
| In the Linux kernel, the following vulnerability has been resolved:
ipv6: mcast: add RCU protection to mld_newpack()
mld_newpack() can be called without RTNL or RCU being held.
Note that we no longer can use sock_alloc_send_skb() because
ipv6.igmp_sk uses GFP_KERNEL allocations which can sleep.
Instead use alloc_skb() and charge the net->ipv6.igmp_sk
socket under RCU protection. |
| In the Linux kernel, the following vulnerability has been resolved:
bpf: Send signals asynchronously if !preemptible
BPF programs can execute in all kinds of contexts and when a program
running in a non-preemptible context uses the bpf_send_signal() kfunc,
it will cause issues because this kfunc can sleep.
Change `irqs_disabled()` to `!preemptible()`. |
| In the Linux kernel, the following vulnerability has been resolved:
pfifo_tail_enqueue: Drop new packet when sch->limit == 0
Expected behaviour:
In case we reach scheduler's limit, pfifo_tail_enqueue() will drop a
packet in scheduler's queue and decrease scheduler's qlen by one.
Then, pfifo_tail_enqueue() enqueue new packet and increase
scheduler's qlen by one. Finally, pfifo_tail_enqueue() return
`NET_XMIT_CN` status code.
Weird behaviour:
In case we set `sch->limit == 0` and trigger pfifo_tail_enqueue() on a
scheduler that has no packet, the 'drop a packet' step will do nothing.
This means the scheduler's qlen still has value equal 0.
Then, we continue to enqueue new packet and increase scheduler's qlen by
one. In summary, we can leverage pfifo_tail_enqueue() to increase qlen by
one and return `NET_XMIT_CN` status code.
The problem is:
Let's say we have two qdiscs: Qdisc_A and Qdisc_B.
- Qdisc_A's type must have '->graft()' function to create parent/child relationship.
Let's say Qdisc_A's type is `hfsc`. Enqueue packet to this qdisc will trigger `hfsc_enqueue`.
- Qdisc_B's type is pfifo_head_drop. Enqueue packet to this qdisc will trigger `pfifo_tail_enqueue`.
- Qdisc_B is configured to have `sch->limit == 0`.
- Qdisc_A is configured to route the enqueued's packet to Qdisc_B.
Enqueue packet through Qdisc_A will lead to:
- hfsc_enqueue(Qdisc_A) -> pfifo_tail_enqueue(Qdisc_B)
- Qdisc_B->q.qlen += 1
- pfifo_tail_enqueue() return `NET_XMIT_CN`
- hfsc_enqueue() check for `NET_XMIT_SUCCESS` and see `NET_XMIT_CN` => hfsc_enqueue() don't increase qlen of Qdisc_A.
The whole process lead to a situation where Qdisc_A->q.qlen == 0 and Qdisc_B->q.qlen == 1.
Replace 'hfsc' with other type (for example: 'drr') still lead to the same problem.
This violate the design where parent's qlen should equal to the sum of its childrens'qlen.
Bug impact: This issue can be used for user->kernel privilege escalation when it is reachable. |
| In the Linux kernel, the following vulnerability has been resolved:
fs/proc: fix softlockup in __read_vmcore (part 2)
Since commit 5cbcb62dddf5 ("fs/proc: fix softlockup in __read_vmcore") the
number of softlockups in __read_vmcore at kdump time have gone down, but
they still happen sometimes.
In a memory constrained environment like the kdump image, a softlockup is
not just a harmless message, but it can interfere with things like RCU
freeing memory, causing the crashdump to get stuck.
The second loop in __read_vmcore has a lot more opportunities for natural
sleep points, like scheduling out while waiting for a data write to
happen, but apparently that is not always enough.
Add a cond_resched() to the second loop in __read_vmcore to (hopefully)
get rid of the softlockups. |
| In the Linux kernel, the following vulnerability has been resolved:
gtp: Destroy device along with udp socket's netns dismantle.
gtp_newlink() links the device to a list in dev_net(dev) instead of
src_net, where a udp tunnel socket is created.
Even when src_net is removed, the device stays alive on dev_net(dev).
Then, removing src_net triggers the splat below. [0]
In this example, gtp0 is created in ns2, and the udp socket is created
in ns1.
ip netns add ns1
ip netns add ns2
ip -n ns1 link add netns ns2 name gtp0 type gtp role sgsn
ip netns del ns1
Let's link the device to the socket's netns instead.
Now, gtp_net_exit_batch_rtnl() needs another netdev iteration to remove
all gtp devices in the netns.
[0]:
ref_tracker: net notrefcnt@000000003d6e7d05 has 1/2 users at
sk_alloc (./include/net/net_namespace.h:345 net/core/sock.c:2236)
inet_create (net/ipv4/af_inet.c:326 net/ipv4/af_inet.c:252)
__sock_create (net/socket.c:1558)
udp_sock_create4 (net/ipv4/udp_tunnel_core.c:18)
gtp_create_sock (./include/net/udp_tunnel.h:59 drivers/net/gtp.c:1423)
gtp_create_sockets (drivers/net/gtp.c:1447)
gtp_newlink (drivers/net/gtp.c:1507)
rtnl_newlink (net/core/rtnetlink.c:3786 net/core/rtnetlink.c:3897 net/core/rtnetlink.c:4012)
rtnetlink_rcv_msg (net/core/rtnetlink.c:6922)
netlink_rcv_skb (net/netlink/af_netlink.c:2542)
netlink_unicast (net/netlink/af_netlink.c:1321 net/netlink/af_netlink.c:1347)
netlink_sendmsg (net/netlink/af_netlink.c:1891)
____sys_sendmsg (net/socket.c:711 net/socket.c:726 net/socket.c:2583)
___sys_sendmsg (net/socket.c:2639)
__sys_sendmsg (net/socket.c:2669)
do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
WARNING: CPU: 1 PID: 60 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179)
Modules linked in:
CPU: 1 UID: 0 PID: 60 Comm: kworker/u16:2 Not tainted 6.13.0-rc5-00147-g4c1224501e9d #5
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
Workqueue: netns cleanup_net
RIP: 0010:ref_tracker_dir_exit (lib/ref_tracker.c:179)
Code: 00 00 00 fc ff df 4d 8b 26 49 bd 00 01 00 00 00 00 ad de 4c 39 f5 0f 85 df 00 00 00 48 8b 74 24 08 48 89 df e8 a5 cc 12 02 90 <0f> 0b 90 48 8d 6b 44 be 04 00 00 00 48 89 ef e8 80 de 67 ff 48 89
RSP: 0018:ff11000009a07b60 EFLAGS: 00010286
RAX: 0000000000002bd3 RBX: ff1100000f4e1aa0 RCX: 1ffffffff0e40ac6
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8423ee3c
RBP: ff1100000f4e1af0 R08: 0000000000000001 R09: fffffbfff0e395ae
R10: 0000000000000001 R11: 0000000000036001 R12: ff1100000f4e1af0
R13: dead000000000100 R14: ff1100000f4e1af0 R15: dffffc0000000000
FS: 0000000000000000(0000) GS:ff1100006ce80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f9b2464bd98 CR3: 0000000005286005 CR4: 0000000000771ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
? __warn (kernel/panic.c:748)
? ref_tracker_dir_exit (lib/ref_tracker.c:179)
? report_bug (lib/bug.c:201 lib/bug.c:219)
? handle_bug (arch/x86/kernel/traps.c:285)
? exc_invalid_op (arch/x86/kernel/traps.c:309 (discriminator 1))
? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
? _raw_spin_unlock_irqrestore (./arch/x86/include/asm/irqflags.h:42 ./arch/x86/include/asm/irqflags.h:97 ./arch/x86/include/asm/irqflags.h:155 ./include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
? ref_tracker_dir_exit (lib/ref_tracker.c:179)
? __pfx_ref_tracker_dir_exit (lib/ref_tracker.c:158)
? kfree (mm/slub.c:4613 mm/slub.c:4761)
net_free (net/core/net_namespace.c:476 net/core/net_namespace.c:467)
cleanup_net (net/core/net_namespace.c:664 (discriminator 3))
process_one_work (kernel/workqueue.c:3229)
worker_thread (kernel/workqueue.c:3304 kernel/workqueue.c:3391
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
dm thin: make get_first_thin use rcu-safe list first function
The documentation in rculist.h explains the absence of list_empty_rcu()
and cautions programmers against relying on a list_empty() ->
list_first() sequence in RCU safe code. This is because each of these
functions performs its own READ_ONCE() of the list head. This can lead
to a situation where the list_empty() sees a valid list entry, but the
subsequent list_first() sees a different view of list head state after a
modification.
In the case of dm-thin, this author had a production box crash from a GP
fault in the process_deferred_bios path. This function saw a valid list
head in get_first_thin() but when it subsequently dereferenced that and
turned it into a thin_c, it got the inside of the struct pool, since the
list was now empty and referring to itself. The kernel on which this
occurred printed both a warning about a refcount_t being saturated, and
a UBSAN error for an out-of-bounds cpuid access in the queued spinlock,
prior to the fault itself. When the resulting kdump was examined, it
was possible to see another thread patiently waiting in thin_dtr's
synchronize_rcu.
The thin_dtr call managed to pull the thin_c out of the active thins
list (and have it be the last entry in the active_thins list) at just
the wrong moment which lead to this crash.
Fortunately, the fix here is straight forward. Switch get_first_thin()
function to use list_first_or_null_rcu() which performs just a single
READ_ONCE() and returns NULL if the list is already empty.
This was run against the devicemapper test suite's thin-provisioning
suites for delete and suspend and no regressions were observed. |
| In the Linux kernel, the following vulnerability has been resolved:
net_sched: cls_flow: validate TCA_FLOW_RSHIFT attribute
syzbot found that TCA_FLOW_RSHIFT attribute was not validated.
Right shitfing a 32bit integer is undefined for large shift values.
UBSAN: shift-out-of-bounds in net/sched/cls_flow.c:329:23
shift exponent 9445 is too large for 32-bit type 'u32' (aka 'unsigned int')
CPU: 1 UID: 0 PID: 54 Comm: kworker/u8:3 Not tainted 6.13.0-rc3-syzkaller-00180-g4f619d518db9 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Workqueue: ipv6_addrconf addrconf_dad_work
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
ubsan_epilogue lib/ubsan.c:231 [inline]
__ubsan_handle_shift_out_of_bounds+0x3c8/0x420 lib/ubsan.c:468
flow_classify+0x24d5/0x25b0 net/sched/cls_flow.c:329
tc_classify include/net/tc_wrapper.h:197 [inline]
__tcf_classify net/sched/cls_api.c:1771 [inline]
tcf_classify+0x420/0x1160 net/sched/cls_api.c:1867
sfb_classify net/sched/sch_sfb.c:260 [inline]
sfb_enqueue+0x3ad/0x18b0 net/sched/sch_sfb.c:318
dev_qdisc_enqueue+0x4b/0x290 net/core/dev.c:3793
__dev_xmit_skb net/core/dev.c:3889 [inline]
__dev_queue_xmit+0xf0e/0x3f50 net/core/dev.c:4400
dev_queue_xmit include/linux/netdevice.h:3168 [inline]
neigh_hh_output include/net/neighbour.h:523 [inline]
neigh_output include/net/neighbour.h:537 [inline]
ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:236
iptunnel_xmit+0x55d/0x9b0 net/ipv4/ip_tunnel_core.c:82
udp_tunnel_xmit_skb+0x262/0x3b0 net/ipv4/udp_tunnel_core.c:173
geneve_xmit_skb drivers/net/geneve.c:916 [inline]
geneve_xmit+0x21dc/0x2d00 drivers/net/geneve.c:1039
__netdev_start_xmit include/linux/netdevice.h:5002 [inline]
netdev_start_xmit include/linux/netdevice.h:5011 [inline]
xmit_one net/core/dev.c:3590 [inline]
dev_hard_start_xmit+0x27a/0x7d0 net/core/dev.c:3606
__dev_queue_xmit+0x1b73/0x3f50 net/core/dev.c:4434 |