| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
ASoC: SOF: ipc4-topology: Correct the allocation size for bytes controls
The size of the data behind of scontrol->ipc_control_data for bytes
controls is:
[1] sizeof(struct sof_ipc4_control_data) + // kernel only struct
[2] sizeof(struct sof_abi_hdr)) + payload
The max_size specifies the size of [2] and it is coming from topology.
Change the function to take this into account and allocate adequate amount
of memory behind scontrol->ipc_control_data.
With the change we will allocate [1] amount more memory to be able to hold
the full size of data. |
| In the Linux kernel, the following vulnerability has been resolved:
net: qrtr: Drop the MHI auto_queue feature for IPCR DL channels
MHI stack offers the 'auto_queue' feature, which allows the MHI stack to
auto queue the buffers for the RX path (DL channel). Though this feature
simplifies the client driver design, it introduces race between the client
drivers and the MHI stack. For instance, with auto_queue, the 'dl_callback'
for the DL channel may get called before the client driver is fully probed.
This means, by the time the dl_callback gets called, the client driver's
structures might not be initialized, leading to NULL ptr dereference.
Currently, the drivers have to workaround this issue by initializing the
internal structures before calling mhi_prepare_for_transfer_autoqueue().
But even so, there is a chance that the client driver's internal code path
may call the MHI queue APIs before mhi_prepare_for_transfer_autoqueue() is
called, leading to similar NULL ptr dereference. This issue has been
reported on the Qcom X1E80100 CRD machines affecting boot.
So to properly fix all these races, drop the MHI 'auto_queue' feature
altogether and let the client driver (QRTR) manage the RX buffers manually.
In the QRTR driver, queue the RX buffers based on the ring length during
probe and recycle the buffers in 'dl_callback' once they are consumed. This
also warrants removing the setting of 'auto_queue' flag from controller
drivers.
Currently, this 'auto_queue' feature is only enabled for IPCR DL channel.
So only the QRTR client driver requires the modification. |
| In the Linux kernel, the following vulnerability has been resolved:
ocfs2: fix out-of-bounds write in ocfs2_write_end_inline
KASAN reports a use-after-free write of 4086 bytes in
ocfs2_write_end_inline, called from ocfs2_write_end_nolock during a
copy_file_range splice fallback on a corrupted ocfs2 filesystem mounted on
a loop device. The actual bug is an out-of-bounds write past the inode
block buffer, not a true use-after-free. The write overflows into an
adjacent freed page, which KASAN reports as UAF.
The root cause is that ocfs2_try_to_write_inline_data trusts the on-disk
id_count field to determine whether a write fits in inline data. On a
corrupted filesystem, id_count can exceed the physical maximum inline data
capacity, causing writes to overflow the inode block buffer.
Call trace (crash path):
vfs_copy_file_range (fs/read_write.c:1634)
do_splice_direct
splice_direct_to_actor
iter_file_splice_write
ocfs2_file_write_iter
generic_perform_write
ocfs2_write_end
ocfs2_write_end_nolock (fs/ocfs2/aops.c:1949)
ocfs2_write_end_inline (fs/ocfs2/aops.c:1915)
memcpy_from_folio <-- KASAN: write OOB
So add id_count upper bound check in ocfs2_validate_inode_block() to
alongside the existing i_size check to fix it. |
| In the Linux kernel, the following vulnerability has been resolved:
ocfs2: validate inline data i_size during inode read
When reading an inode from disk, ocfs2_validate_inode_block() performs
various sanity checks but does not validate the size of inline data. If
the filesystem is corrupted, an inode's i_size can exceed the actual
inline data capacity (id_count).
This causes ocfs2_dir_foreach_blk_id() to iterate beyond the inline data
buffer, triggering a use-after-free when accessing directory entries from
freed memory.
In the syzbot report:
- i_size was 1099511627576 bytes (~1TB)
- Actual inline data capacity (id_count) is typically <256 bytes
- A garbage rec_len (54648) caused ctx->pos to jump out of bounds
- This triggered a UAF in ocfs2_check_dir_entry()
Fix by adding a validation check in ocfs2_validate_inode_block() to ensure
inodes with inline data have i_size <= id_count. This catches the
corruption early during inode read and prevents all downstream code from
operating on invalid data. |
| In the Linux kernel, the following vulnerability has been resolved:
crypto: af_alg - Fix page reassignment overflow in af_alg_pull_tsgl
When page reassignment was added to af_alg_pull_tsgl the original
loop wasn't updated so it may try to reassign one more page than
necessary.
Add the check to the reassignment so that this does not happen.
Also update the comment which still refers to the obsolete offset
argument. |
| In the Linux kernel, the following vulnerability has been resolved:
net: ioam6: fix OOB and missing lock
When trace->type.bit6 is set:
if (trace->type.bit6) {
...
queue = skb_get_tx_queue(dev, skb);
qdisc = rcu_dereference(queue->qdisc);
This code can lead to an out-of-bounds access of the dev->_tx[] array
when is_input is true. In such a case, the packet is on the RX path and
skb->queue_mapping contains the RX queue index of the ingress device. If
the ingress device has more RX queues than the egress device (dev) has
TX queues, skb_get_queue_mapping(skb) will exceed dev->num_tx_queues.
Add a check to avoid this situation since skb_get_tx_queue() does not
clamp the index. This issue has also revealed that per queue visibility
cannot be accurate and will be replaced later as a new feature.
While at it, add missing lock around qdisc_qstats_qlen_backlog(). The
function __ioam6_fill_trace_data() is called from both softirq and
process contexts, hence the use of spin_lock_bh() here. |
| In the Linux kernel, the following vulnerability has been resolved:
pinctrl: mcp23s08: Disable all pin interrupts during probe
A chip being probed may have the interrupt-on-change feature enabled on
some of its pins, for example after a reboot. This can cause the chip to
generate interrupts for pins that don't have a registered nested handler,
which leads to a kernel crash such as below:
[ 7.928897] Unable to handle kernel read from unreadable memory at virtual address 00000000000000ac
[ 7.932314] Mem abort info:
[ 7.935081] ESR = 0x0000000096000004
[ 7.938808] EC = 0x25: DABT (current EL), IL = 32 bits
[ 7.944094] SET = 0, FnV = 0
[ 7.947127] EA = 0, S1PTW = 0
[ 7.950247] FSC = 0x04: level 0 translation fault
[ 7.955101] Data abort info:
[ 7.957961] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 7.963421] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 7.968447] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 7.973734] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000089b7000
[ 7.980148] [00000000000000ac] pgd=0000000000000000, p4d=0000000000000000
[ 7.986913] Internal error: Oops: 0000000096000004 [#1] SMP
[ 7.992545] Modules linked in:
[ 8.073678] CPU: 0 UID: 0 PID: 81 Comm: irq/18-4-0025 Not tainted 7.0.0-rc6-gd2b5a1f931c8-dirty #199
[ 8.073689] Hardware name: Khadas VIM3 (DT)
[ 8.073692] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 8.094639] pc : _raw_spin_lock_irq+0x40/0x80
[ 8.098970] lr : handle_nested_irq+0x2c/0x168
[ 8.098979] sp : ffff800082b2bd20
[ 8.106599] x29: ffff800082b2bd20 x28: ffff800080107920 x27: ffff800080104d88
[ 8.106611] x26: ffff000003298080 x25: 0000000000000001 x24: 000000000000ff00
[ 8.113707] x23: 0000000000000001 x22: 0000000000000000 x21: 000000000000000e
[ 8.120850] x20: 0000000000000000 x19: 00000000000000ac x18: 0000000000000000
[ 8.135046] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 8.135062] x14: ffff800081567ea8 x13: ffffffffffffffff x12: 0000000000000000
[ 8.135070] x11: 00000000000000c0 x10: 0000000000000b60 x9 : ffff800080109e0c
[ 8.135078] x8 : 1fffe0000069dbc1 x7 : 0000000000000001 x6 : ffff0000034ede00
[ 8.135086] x5 : 0000000000000000 x4 : ffff0000034ede08 x3 : 0000000000000001
[ 8.163460] x2 : 0000000000000000 x1 : 0000000000000001 x0 : 00000000000000ac
[ 8.170560] Call trace:
[ 8.180094] _raw_spin_lock_irq+0x40/0x80 (P)
[ 8.184443] mcp23s08_irq+0x248/0x358
[ 8.184462] irq_thread_fn+0x34/0xb8
[ 8.184470] irq_thread+0x1a4/0x310
[ 8.195093] kthread+0x13c/0x150
[ 8.198309] ret_from_fork+0x10/0x20
[ 8.201850] Code: d65f03c0 d2800002 52800023 f9800011 (885ffc01)
[ 8.207931] ---[ end trace 0000000000000000 ]---
This issue has always been present, but has been latent until commit
"f9f4fda15e72" ("pinctrl: mcp23s08: init reg_defaults from HW at probe and
switch cache type"), which correctly removed reg_defaults from the regmap
and as a side effect changed the behavior of the interrupt handler so that
the real value of the MCP_GPINTEN register is now being read from the chip
instead of using a bogus 0 default value; a non-zero value for this
register can trigger the invocation of a nested handler which may not exist
(yet).
Fix this issue by disabling all pin interrupts during initialization. |
| Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting') vulnerability in Apache Wicket.
This issue affects Apache Wicket: from 8.0.0 through 8.17.0, 9.0.0, from 10.0.0 through 10.8.0.
Users are recommended to upgrade to version 10.9.0, which fixes the issue. |
| In the Linux kernel, the following vulnerability has been resolved:
bnxt_en: Fix RSS context delete logic
We need to free the corresponding RSS context VNIC
in FW everytime an RSS context is deleted in driver.
Commit 667ac333dbb7 added a check to delete the VNIC
in FW only when netif_running() is true to help delete
RSS contexts with interface down.
Having that condition will make the driver leak VNICs
in FW whenever close() happens with active RSS contexts.
On the subsequent open(), as part of RSS context restoration,
we will end up trying to create extra VNICs for which we
did not make any reservation. FW can fail this request,
thereby making us lose active RSS contexts.
Suppose an RSS context is deleted already and we try to
process a delete request again, then the HWRM functions
will check for validity of the request and they simply
return if the resource is already freed. So, even for
delete-when-down cases, netif_running() check is not
necessary.
Remove the netif_running() condition check when deleting
an RSS context. |
| In the Linux kernel, the following vulnerability has been resolved:
9p/xen: protect xen_9pfs_front_free against concurrent calls
The xenwatch thread can race with other back-end change notifications
and call xen_9pfs_front_free() twice, hitting the observed general
protection fault due to a double-free. Guard the teardown path so only
one caller can release the front-end state at a time, preventing the
crash.
This is a fix for the following double-free:
[ 27.052347] Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI
[ 27.052357] CPU: 0 UID: 0 PID: 32 Comm: xenwatch Not tainted 6.18.0-02087-g51ab33fc0a8b-dirty #60 PREEMPT(none)
[ 27.052363] RIP: e030:xen_9pfs_front_free+0x1d/0x150
[ 27.052368] Code: 90 90 90 90 90 90 90 90 90 90 90 90 90 41 55 41 54 55 48 89 fd 48 c7 c7 48 d0 92 85 53 e8 cb cb 05 00 48 8b 45 08 48 8b 55 00 <48> 3b 28 0f 85 f9 28 35 fe 48 3b 6a 08 0f 85 ef 28 35 fe 48 89 42
[ 27.052377] RSP: e02b:ffffc9004016fdd0 EFLAGS: 00010246
[ 27.052381] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88800d66e400 RCX: 0000000000000000
[ 27.052385] RDX: 6b6b6b6b6b6b6b6b RSI: 0000000000000000 RDI: 0000000000000000
[ 27.052389] RBP: ffff88800a887040 R08: 0000000000000000 R09: 0000000000000000
[ 27.052393] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888009e46b68
[ 27.052397] R13: 0000000000000200 R14: 0000000000000000 R15: ffff88800a887040
[ 27.052404] FS: 0000000000000000(0000) GS:ffff88808ca57000(0000) knlGS:0000000000000000
[ 27.052408] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 27.052412] CR2: 00007f9714004360 CR3: 0000000004834000 CR4: 0000000000050660
[ 27.052418] Call Trace:
[ 27.052420] <TASK>
[ 27.052422] xen_9pfs_front_changed+0x5d5/0x720
[ 27.052426] ? xenbus_otherend_changed+0x72/0x140
[ 27.052430] ? __pfx_xenwatch_thread+0x10/0x10
[ 27.052434] xenwatch_thread+0x94/0x1c0
[ 27.052438] ? __pfx_autoremove_wake_function+0x10/0x10
[ 27.052442] kthread+0xf8/0x240
[ 27.052445] ? __pfx_kthread+0x10/0x10
[ 27.052449] ? __pfx_kthread+0x10/0x10
[ 27.052452] ret_from_fork+0x16b/0x1a0
[ 27.052456] ? __pfx_kthread+0x10/0x10
[ 27.052459] ret_from_fork_asm+0x1a/0x30
[ 27.052463] </TASK>
[ 27.052465] Modules linked in:
[ 27.052471] ---[ end trace 0000000000000000 ]--- |
| In the Linux kernel, the following vulnerability has been resolved:
soc: ti: k3-socinfo: Fix regmap leak on probe failure
The mmio regmap allocated during probe is never freed.
Switch to using the device managed allocator so that the regmap is
released on probe failures (e.g. probe deferral) and on driver unbind. |
| In the Linux kernel, the following vulnerability has been resolved:
hfs: Replace BUG_ON with error handling for CNID count checks
In a06ec283e125 next_id, folder_count, and file_count in the super block
info were expanded to 64 bits, and BUG_ONs were added to detect
overflow. This triggered an error reported by syzbot: if the MDB is
corrupted, the BUG_ON is triggered. This patch replaces this mechanism
with proper error handling and resolves the syzbot reported bug.
Singed-off-by: Jori Koolstra <jkoolstra@xs4all.nl> |
| In the Linux kernel, the following vulnerability has been resolved:
media: verisilicon: AV1: Fix tile info buffer size
Each tile info is composed of: row_sb, col_sb, start_pos
and end_pos (4 bytes each). So the total required memory
is AV1_MAX_TILES * 16 bytes.
Use the correct #define to allocate the buffer and avoid
writing tile info in non-allocated memory. |
| In the Linux kernel, the following vulnerability has been resolved:
cifs: Fix locking usage for tcon fields
We used to use the cifs_tcp_ses_lock to protect a lot of objects
that are not just the server, ses or tcon lists. We later introduced
srv_lock, ses_lock and tc_lock to protect fields within the
corresponding structs. This was done to provide a more granular
protection and avoid unnecessary serialization.
There were still a couple of uses of cifs_tcp_ses_lock to provide
tcon fields. In this patch, I've replaced them with tc_lock. |
| In the Linux kernel, the following vulnerability has been resolved:
net/mlx5e: Fix "scheduling while atomic" in IPsec MAC address query
Fix a "scheduling while atomic" bug in mlx5e_ipsec_init_macs() by
replacing mlx5_query_mac_address() with ether_addr_copy() to get the
local MAC address directly from netdev->dev_addr.
The issue occurs because mlx5_query_mac_address() queries the hardware
which involves mlx5_cmd_exec() that can sleep, but it is called from
the mlx5e_ipsec_handle_event workqueue which runs in atomic context.
The MAC address is already available in netdev->dev_addr, so no need
to query hardware. This avoids the sleeping call and resolves the bug.
Call trace:
BUG: scheduling while atomic: kworker/u112:2/69344/0x00000200
__schedule+0x7ab/0xa20
schedule+0x1c/0xb0
schedule_timeout+0x6e/0xf0
__wait_for_common+0x91/0x1b0
cmd_exec+0xa85/0xff0 [mlx5_core]
mlx5_cmd_exec+0x1f/0x50 [mlx5_core]
mlx5_query_nic_vport_mac_address+0x7b/0xd0 [mlx5_core]
mlx5_query_mac_address+0x19/0x30 [mlx5_core]
mlx5e_ipsec_init_macs+0xc1/0x720 [mlx5_core]
mlx5e_ipsec_build_accel_xfrm_attrs+0x422/0x670 [mlx5_core]
mlx5e_ipsec_handle_event+0x2b9/0x460 [mlx5_core]
process_one_work+0x178/0x2e0
worker_thread+0x2ea/0x430 |
| In the Linux kernel, the following vulnerability has been resolved:
net: consume xmit errors of GSO frames
udpgro_frglist.sh and udpgro_bench.sh are the flakiest tests
currently in NIPA. They fail in the same exact way, TCP GRO
test stalls occasionally and the test gets killed after 10min.
These tests use veth to simulate GRO. They attach a trivial
("return XDP_PASS;") XDP program to the veth to force TSO off
and NAPI on.
Digging into the failure mode we can see that the connection
is completely stuck after a burst of drops. The sender's snd_nxt
is at sequence number N [1], but the receiver claims to have
received (rcv_nxt) up to N + 3 * MSS [2]. Last piece of the puzzle
is that senders rtx queue is not empty (let's say the block in
the rtx queue is at sequence number N - 4 * MSS [3]).
In this state, sender sends a retransmission from the rtx queue
with a single segment, and sequence numbers N-4*MSS:N-3*MSS [3].
Receiver sees it and responds with an ACK all the way up to
N + 3 * MSS [2]. But sender will reject this ack as TCP_ACK_UNSENT_DATA
because it has no recollection of ever sending data that far out [1].
And we are stuck.
The root cause is the mess of the xmit return codes. veth returns
an error when it can't xmit a frame. We end up with a loss event
like this:
-------------------------------------------------
| GSO super frame 1 | GSO super frame 2 |
|-----------------------------------------------|
| seg | seg | seg | seg | seg | seg | seg | seg |
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
-------------------------------------------------
x ok ok <ok>| ok ok ok <x>
\\
snd_nxt
"x" means packet lost by veth, and "ok" means it went thru.
Since veth has TSO disabled in this test it sees individual segments.
Segment 1 is on the retransmit queue and will be resent.
So why did the sender not advance snd_nxt even tho it clearly did
send up to seg 8? tcp_write_xmit() interprets the return code
from the core to mean that data has not been sent at all. Since
TCP deals with GSO super frames, not individual segment the crux
of the problem is that loss of a single segment can be interpreted
as loss of all. TCP only sees the last return code for the last
segment of the GSO frame (in <> brackets in the diagram above).
Of course for the problem to occur we need a setup or a device
without a Qdisc. Otherwise Qdisc layer disconnects the protocol
layer from the device errors completely.
We have multiple ways to fix this.
1) make veth not return an error when it lost a packet.
While this is what I think we did in the past, the issue keeps
reappearing and it's annoying to debug. The game of whack
a mole is not great.
2) fix the damn return codes
We only talk about NETDEV_TX_OK and NETDEV_TX_BUSY in the
documentation, so maybe we should make the return code from
ndo_start_xmit() a boolean. I like that the most, but perhaps
some ancient, not-really-networking protocol would suffer.
3) make TCP ignore the errors
It is not entirely clear to me what benefit TCP gets from
interpreting the result of ip_queue_xmit()? Specifically once
the connection is established and we're pushing data - packet
loss is just packet loss?
4) this fix
Ignore the rc in the Qdisc-less+GSO case, since it's unreliable.
We already always return OK in the TCQ_F_CAN_BYPASS case.
In the Qdisc-less case let's be a bit more conservative and only
mask the GSO errors. This path is taken by non-IP-"networks"
like CAN, MCTP etc, so we could regress some ancient thing.
This is the simplest, but also maybe the hackiest fix?
Similar fix has been proposed by Eric in the past but never committed
because original reporter was working with an OOT driver and wasn't
providing feedback (see Link). |
| In the Linux kernel, the following vulnerability has been resolved:
gpio: sysfs: fix chip removal with GPIOs exported over sysfs
Currently if we export a GPIO over sysfs and unbind the parent GPIO
controller, the exported attribute will remain under /sys/class/gpio
because once we remove the parent device, we can no longer associate the
descriptor with it in gpiod_unexport() and never drop the final
reference.
Rework the teardown code: provide an unlocked variant of
gpiod_unexport() and remove all exported GPIOs with the sysfs_lock taken
before unregistering the parent device itself. This is done to prevent
any new exports happening before we unregister the device completely. |
| In the Linux kernel, the following vulnerability has been resolved:
usb: dwc3: gadget: Move vbus draw to workqueue context
Currently dwc3_gadget_vbus_draw() can be called from atomic
context, which in turn invokes power-supply-core APIs. And
some these PMIC APIs have operations that may sleep, leading
to kernel panic.
Fix this by moving the vbus_draw into a workqueue context. |
| In the Linux kernel, the following vulnerability has been resolved:
xfrm: always flush state and policy upon NETDEV_UNREGISTER event
syzbot is reporting that "struct xfrm_state" refcount is leaking.
unregister_netdevice: waiting for netdevsim0 to become free. Usage count = 2
ref_tracker: netdev@ffff888052f24618 has 1/1 users at
__netdev_tracker_alloc include/linux/netdevice.h:4400 [inline]
netdev_tracker_alloc include/linux/netdevice.h:4412 [inline]
xfrm_dev_state_add+0x3a5/0x1080 net/xfrm/xfrm_device.c:316
xfrm_state_construct net/xfrm/xfrm_user.c:986 [inline]
xfrm_add_sa+0x34ff/0x5fa0 net/xfrm/xfrm_user.c:1022
xfrm_user_rcv_msg+0x58e/0xc00 net/xfrm/xfrm_user.c:3507
netlink_rcv_skb+0x158/0x420 net/netlink/af_netlink.c:2550
xfrm_netlink_rcv+0x71/0x90 net/xfrm/xfrm_user.c:3529
netlink_unicast_kernel net/netlink/af_netlink.c:1318 [inline]
netlink_unicast+0x5aa/0x870 net/netlink/af_netlink.c:1344
netlink_sendmsg+0x8c8/0xdd0 net/netlink/af_netlink.c:1894
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
____sys_sendmsg+0xa5d/0xc30 net/socket.c:2592
___sys_sendmsg+0x134/0x1d0 net/socket.c:2646
__sys_sendmsg+0x16d/0x220 net/socket.c:2678
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xcd/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
This is because commit d77e38e612a0 ("xfrm: Add an IPsec hardware
offloading API") implemented xfrm_dev_unregister() as no-op despite
xfrm_dev_state_add() from xfrm_state_construct() acquires a reference
to "struct net_device".
I guess that that commit expected that NETDEV_DOWN event is fired before
NETDEV_UNREGISTER event fires, and also assumed that xfrm_dev_state_add()
is called only if (dev->features & NETIF_F_HW_ESP) != 0.
Sabrina Dubroca identified steps to reproduce the same symptoms as below.
echo 0 > /sys/bus/netdevsim/new_device
dev=$(ls -1 /sys/bus/netdevsim/devices/netdevsim0/net/)
ip xfrm state add src 192.168.13.1 dst 192.168.13.2 proto esp \
spi 0x1000 mode tunnel aead 'rfc4106(gcm(aes))' $key 128 \
offload crypto dev $dev dir out
ethtool -K $dev esp-hw-offload off
echo 0 > /sys/bus/netdevsim/del_device
Like these steps indicate, the NETIF_F_HW_ESP bit can be cleared after
xfrm_dev_state_add() acquired a reference to "struct net_device".
Also, xfrm_dev_state_add() does not check for the NETIF_F_HW_ESP bit
when acquiring a reference to "struct net_device".
Commit 03891f820c21 ("xfrm: handle NETDEV_UNREGISTER for xfrm device")
re-introduced the NETDEV_UNREGISTER event to xfrm_dev_event(), but that
commit for unknown reason chose to share xfrm_dev_down() between the
NETDEV_DOWN event and the NETDEV_UNREGISTER event.
I guess that that commit missed the behavior in the previous paragraph.
Therefore, we need to re-introduce xfrm_dev_unregister() in order to
release the reference to "struct net_device" by unconditionally flushing
state and policy. |
| In the Linux kernel, the following vulnerability has been resolved:
Revert "PCI/IOV: Add PCI rescan-remove locking when enabling/disabling SR-IOV"
This reverts commit 05703271c3cd ("PCI/IOV: Add PCI rescan-remove locking
when enabling/disabling SR-IOV"), which causes a deadlock by recursively
taking pci_rescan_remove_lock when sriov_del_vfs() is called as part of
pci_stop_and_remove_bus_device(). For example with the following sequence
of commands:
$ echo <NUM> > /sys/bus/pci/devices/<pf>/sriov_numvfs
$ echo 1 > /sys/bus/pci/devices/<pf>/remove
A trimmed trace of the deadlock on a mlx5 device is as below:
zsh/5715 is trying to acquire lock:
000002597926ef50 (pci_rescan_remove_lock){+.+.}-{3:3}, at: sriov_disable+0x34/0x140
but task is already holding lock:
000002597926ef50 (pci_rescan_remove_lock){+.+.}-{3:3}, at: pci_stop_and_remove_bus_device_locked+0x24/0x80
...
Call Trace:
[<00000259778c4f90>] dump_stack_lvl+0xc0/0x110
[<00000259779c844e>] print_deadlock_bug+0x31e/0x330
[<00000259779c1908>] __lock_acquire+0x16c8/0x32f0
[<00000259779bffac>] lock_acquire+0x14c/0x350
[<00000259789643a6>] __mutex_lock_common+0xe6/0x1520
[<000002597896413c>] mutex_lock_nested+0x3c/0x50
[<00000259784a07e4>] sriov_disable+0x34/0x140
[<00000258f7d6dd80>] mlx5_sriov_disable+0x50/0x80 [mlx5_core]
[<00000258f7d5745e>] remove_one+0x5e/0xf0 [mlx5_core]
[<00000259784857fc>] pci_device_remove+0x3c/0xa0
[<000002597851012e>] device_release_driver_internal+0x18e/0x280
[<000002597847ae22>] pci_stop_bus_device+0x82/0xa0
[<000002597847afce>] pci_stop_and_remove_bus_device_locked+0x5e/0x80
[<00000259784972c2>] remove_store+0x72/0x90
[<0000025977e6661a>] kernfs_fop_write_iter+0x15a/0x200
[<0000025977d7241c>] vfs_write+0x24c/0x300
[<0000025977d72696>] ksys_write+0x86/0x110
[<000002597895b61c>] __do_syscall+0x14c/0x400
[<000002597896e0ee>] system_call+0x6e/0x90
This alone is not a complete fix as it restores the issue the cited commit
tried to solve. A new fix will be provided as a follow on. |