Sequoia Linux文件系统提权漏洞|CVE-2021-33909

目录导航

Qualys 研究团队在影响大多数 Linux 操作系统的 Linux 内核文件系统层中发现了 size_t-to-int 类型转换漏洞。通过在默认配置中利用此漏洞，任何非特权用户都可以在易受攻击的主机上获得 root 权限。

关于 Linux 文件系统

文件系统是存储设备上数据和元数据的组织。它控制数据的存储和检索方式，其最重要的功能是管理用户数据。Linux 文件系统接口实现为分层架构，将用户接口层与文件系统实现以及操作存储设备的驱动程序分开。它是任何操作系统中最重要的功能，并且在所有主要的 Linux 操作系统中无处不在。

影响

成功利用此漏洞允许任何非特权用户在易受攻击的主机上获得 root 权限。

Qualys 安全研究人员已经能够在 Ubuntu 20.04、Ubuntu 20.10、Ubuntu 21.04、Debian 11 和 Fedora 34 Workstation 的默认安装上独立验证漏洞、利用漏洞并获得完全 root 权限。其他 Linux 发行版可能容易受到攻击并且可能被利用。

Qualys 研究团队确认该漏洞后，Qualys 立即进行负责任的漏洞披露，并与供应商和开源发行版协调公布该漏洞。

披露时间表

2021-06-09：Qualys 研究团队 (QRT) 向 Red Hat Product Security 发送了关于 CVE-2021-33909 和 CVE-2021-33910 的建议（这两个漏洞密切相关，systemd-security 邮件列表由红色的帽子）。
2021-07-06：QRT 向 linux-distros@openwall 邮件列表发送了建议，Red Hat 将他们编写的补丁发送到了 linux-distros@openwall 邮件列表。
2021-07-13：QRT 发送了 CVE-2021-33909 的咨询，Red Hat 将他们编写的补丁发送到 security@kernel 邮件列表。
2021 年 7 月 20 日：协调发布日期（UTC 下午 12:00）。

概念验证视频

技术细节

Linux 内核的 seq_file 接口产生包含记录序列的虚拟文件（例如，/proc 中的许多文件是 seq_files，而记录通常是行）。每个记录都必须适合一个 seq_file 缓冲区，因此可以根据需要放大，方法是在第 242 行将其大小加倍（seq_buf_alloc() 是 kvmalloc() 的简单包装）：

 ------------------------------------------------------------------------
168 ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter)
169 {
170         struct seq_file m = iocb->ki_filp->private_data; 
… 
205         / grab buffer if we didn't have one */
206         if (!m->buf) {
207                 m->buf = seq_buf_alloc(m->size = PAGE_SIZE);
…
210         }
…
220         // get a non-empty record in the buffer
…
223         while (1) {
…
227                 err = m->op->show(m, p);
…
236                 if (!seq_has_overflowed(m)) // got it
237                         goto Fill;
238                 // need a bigger buffer
…
240                 kvfree(m->buf);
…
242                 m->buf = seq_buf_alloc(m->size <<= 1);
…
246         }
------------------------------------------------------------------------

这个大小乘法本身并不是一个漏洞，因为 m->size 是一个 size_t（一个无符号的 64 位整数，在 x86_64 上），并且系统会在这个乘法溢出整数 m->size 之前很久就耗尽内存。不幸的是，这个 size_t 也被传递给 size 参数是 int（有符号的 32 位整数）而不是 size_t 的函数。例如，show_mountinfo() 函数（在第 227 行调用以格式化 /proc/self/mountinfo 中的记录）调用 seq_dentry()（在第 150 行），它调用 dentry_path()（在第 530 行），它调用prepend()（在第 387 行）：

 ------------------------------------------------------------------------
135 static int show_mountinfo(struct seq_file *m, struct vfsmount *mnt)
136 {
…
150                 seq_dentry(m, mnt->mnt_root, " \t\n\");
------------------------------------------------------------------------
523 int seq_dentry(struct seq_file *m, struct dentry *dentry, const char *esc)
524 {
525         char *buf;
526         size_t size = seq_get_buf(m, &buf);
…
529         if (size) {
530                 char *p = dentry_path(dentry, buf, size);
------------------------------------------------------------------------
380 char *dentry_path(struct dentry *dentry, char *buf, int buflen)
381 {
382         char *p = NULL;
…
385         if (d_unlinked(dentry)) {
386                 p = buf + buflen;
387                 if (prepend(&p, &buflen, "//deleted", 10) != 0)
------------------------------------------------------------------------
11 static int prepend(char **buffer, int *buflen, const char *str, int namelen)
12 {
13         *buflen -= namelen;
14         if (*buflen < 0)
15                 return -ENAMETOOLONG;
16         *buffer -= namelen;
17         memcpy(*buffer, str, namelen);
------------------------------------------------------------------------

因此，如果无特权的本地攻击者创建、挂载和删除总路径长度超过 1GB 的深层目录结构，并且如果攻击者 open()s 和 read()s /proc/self/mountinfo，则：

在 seq_read_iter() 中，vmalloc() 化了 2GB 缓冲区（第 242 行），并调用了 show_mountinfo()（第 227 行）；
在 show_mountinfo() 中，使用空的 2GB 缓冲区调用 seq_dentry()（第 150 行）；
在 seq_dentry() 中，dentry_path() 以 2GB 大小调用（第 530 行）；
在 dentry_path() 中，因此 int buflen 为负数 (INT_MIN, -2GB)，p 指向 vmalloc()ated 缓冲区下方 -2GB 的偏移量（第 386 行），并调用 prepend()（第 387 行）；
在 prepend() 中，*buflen 减少了 10 个字节，变成了一个很大但为正的 int（第 13 行），*buffer 减少了 10 个字节并指向 vmalloc()ated 缓冲区下方的 -2GB-10B 偏移量（行16)，并且 10 字节的字符串“//deleted”被越界写入（第 17 行）。

解决方案

鉴于此漏洞的攻击面广度，Qualys 建议用户立即针对此漏洞应用补丁。

Qualys 客户可以搜索 CVE-2021-33909 的漏洞知识库，以识别所有易受此漏洞影响的 QID 和资产。

如果您不是客户，请开始免费的 Qualys VMDR 试用，以完全访问 CVE-2021-33909 的 QID（检测），以便识别易受攻击的资产。

漏洞覆盖范围

Qualys 将发布下表中的 QID，因为它们从 vulnsigs 版本 VULNSIGS-2.5.237-3 和 Linux 云代理清单版本 lx_manifest-2.5.237.3-2 开始可用

合格证	标题	VulnSigs 版本
750847	适用于 Linux 内核的 OpenSUSE 安全更新 (openSUSE-SU-2021:2409-1)	VULNSIGS-2.5.237-3 / lx_manifest-2.5.237.3-2
750844	SUSE Enterprise Linux 内核安全更新 (SUSE-SU-2021:2407-1)	VULNSIGS-2.5.237-3 / lx_manifest-2.5.237.3-2
178710	用于 Linux 的 Debian 安全更新 (DSA 4941-1)	VULNSIGS-2.5.237-3 / lx_manifest-2.5.237.3-2
375710	Linux 内核本地提权漏洞 (Sequoia)	VULNSIGS-2.5.237-3 / lx_manifest-2.5.237.3-2

常见问题 (FAQ)

哪些版本易受攻击？

2014 年以后的所有 Linux 内核版本都存在漏洞。

Qualys 研究团队是否会发布此漏洞的利用代码？

PoC 随咨询附有，可从https://www.qualys.com/research/security-advisories/ 获得。


Qualys Security Advisory

Sequoia: A deep root in Linux's filesystem layer (CVE-2021-33909)


========================================================================
Contents
========================================================================

Summary
Analysis
Exploitation overview
Exploitation details
Mitigations
Acknowledgments
Timeline


========================================================================
Summary
========================================================================

We discovered a size_t-to-int conversion vulnerability in the Linux
kernel's filesystem layer: by creating, mounting, and deleting a deep
directory structure whose total path length exceeds 1GB, an unprivileged
local attacker can write the 10-byte string "//deleted" to an offset of
exactly -2GB-10B below the beginning of a vmalloc()ated kernel buffer.

We successfully exploited this uncontrolled out-of-bounds write, and
obtained full root privileges on default installations of Ubuntu 20.04,
Ubuntu 20.10, Ubuntu 21.04, Debian 11, and Fedora 34 Workstation; other
Linux distributions are certainly vulnerable, and probably exploitable.
Our exploit requires approximately 5GB of memory and 1M inodes; we will
publish it in the near future. A basic proof of concept (a crasher) is
attached to this advisory and is available at:

https://www.qualys.com/research/security-advisories/

To the best of our knowledge, this vulnerability was introduced in July
2014 (Linux 3.16) by commit 058504ed ("fs/seq_file: fallback to vmalloc
allocation").


========================================================================
Analysis
========================================================================

The Linux kernel's seq_file interface produces virtual files that
contain sequences of records (for example, many files in /proc are
seq_files, and records are usually lines). Each record must fit into a
seq_file buffer, which is therefore enlarged as needed, by doubling its
size at line 242 (seq_buf_alloc() is a simple wrapper around
kvmalloc()):

------------------------------------------------------------------------
 168 ssize_t seq_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 169 {
 170         struct seq_file *m = iocb->ki_filp->private_data;
 ...
 205         /* grab buffer if we didn't have one */
 206         if (!m->buf) {
 207                 m->buf = seq_buf_alloc(m->size = PAGE_SIZE);
 ...
 210         }
 ...
 220         // get a non-empty record in the buffer
 ...
 223         while (1) {
 ...
 227                 err = m->op->show(m, p);
 ...
 236                 if (!seq_has_overflowed(m)) // got it
 237                         goto Fill;
 238                 // need a bigger buffer
 ...
 240                 kvfree(m->buf);
 ...
 242                 m->buf = seq_buf_alloc(m->size <<= 1);
 ...
 246         }
------------------------------------------------------------------------

This size multiplication is not a vulnerability in itself, because
m->size is a size_t (an unsigned 64-bit integer, on x86_64), and the
system would run out of memory long before this multiplication overflows
the integer m->size.

Unfortunately, this size_t is also passed to functions whose size
argument is an int (a signed 32-bit integer), not a size_t. For example,
the show_mountinfo() function (which is called at line 227 to format the
records in /proc/self/mountinfo) calls seq_dentry() (at line 150), which
calls dentry_path() (at line 530), which calls prepend() (at line 387):

------------------------------------------------------------------------
135 static int show_mountinfo(struct seq_file *m, struct vfsmount *mnt)
136 {
...
150                 seq_dentry(m, mnt->mnt_root, " \t\n\\");
------------------------------------------------------------------------
 523 int seq_dentry(struct seq_file *m, struct dentry *dentry, const char *esc)
 524 {
 525         char *buf;
 526         size_t size = seq_get_buf(m, &buf);
 ...
 529         if (size) {
 530                 char *p = dentry_path(dentry, buf, size);
------------------------------------------------------------------------
380 char *dentry_path(struct dentry *dentry, char *buf, int buflen)
381 {
382         char *p = NULL;
...
385         if (d_unlinked(dentry)) {
386                 p = buf + buflen;
387                 if (prepend(&p, &buflen, "//deleted", 10) != 0)
------------------------------------------------------------------------
 11 static int prepend(char **buffer, int *buflen, const char *str, int namelen)
 12 {
 13         *buflen -= namelen;
 14         if (*buflen < 0)
 15                 return -ENAMETOOLONG;
 16         *buffer -= namelen;
 17         memcpy(*buffer, str, namelen);
------------------------------------------------------------------------

As a result, if an unprivileged local attacker creates, mounts, and
deletes a deep directory structure whose total path length exceeds 1GB,
and if the attacker open()s and read()s /proc/self/mountinfo, then:

- in seq_read_iter(), a 2GB buffer is vmalloc()ated (line 242), and
  show_mountinfo() is called (line 227);

- in show_mountinfo(), seq_dentry() is called with the empty 2GB buffer
  (line 150);

- in seq_dentry(), dentry_path() is called with a 2GB size (line 530);

- in dentry_path(), the int buflen is therefore negative (INT_MIN,
  -2GB), p points to an offset of -2GB below the vmalloc()ated buffer
  (line 386), and prepend() is called (line 387);

- in prepend(), *buflen is decreased by 10 bytes and becomes a large but
  positive int (line 13), *buffer is decreased by 10 bytes and points to
  an offset of -2GB-10B below the vmalloc()ated buffer (line 16), and
  the 10-byte string "//deleted" is written out of bounds (line 17).


========================================================================
Exploitation overview
========================================================================

1/ We mkdir() a deep directory structure (roughly 1M nested directories)
whose total path length exceeds 1GB, we bind-mount it in an unprivileged
user namespace, and rmdir() it.

2/ We create a thread that vmalloc()ates a small eBPF program (via
BPF_PROG_LOAD), and we block this thread (via userfaultfd or FUSE) after
our eBPF program has been validated by the kernel eBPF verifier but
before it is JIT-compiled by the kernel.

3/ We open() /proc/self/mountinfo in our unprivileged user namespace,
and start read()ing the long path of our bind-mounted directory, thereby
writing the string "//deleted" to an offset of exactly -2GB-10B below
the beginning of a vmalloc()ated buffer.

4/ We arrange for this "//deleted" string to overwrite an instruction of
our validated eBPF program (and therefore nullify the security checks of
the kernel eBPF verifier), and transform this uncontrolled out-of-bounds
write into an information disclosure, and into a limited but controlled
out-of-bounds write.

5/ We transform this limited out-of-bounds write into an arbitrary read
and write of kernel memory, by reusing Manfred Paul's beautiful btf and
map_push_elem techniques from:

https://www.thezdi.com/blog/2020/4/8/cve-2020-8835-linux-kernel-privilege-escalation-via-improper-ebpf-program-verification

6/ We use this arbitrary read to locate the modprobe_path[] buffer in
kernel memory, and use the arbitrary write to replace the contents of
this buffer ("/sbin/modprobe" by default) with a path to our own
executable, thus obtaining full root privileges.


========================================================================
Exploitation details
========================================================================

a/ We create a directory whose total path length exceeds 1GB: in theory,
we need to create over 1GB/256B=4M nested directories (NAME_MAX is 255);
in practice, show_mountinfo() replaces each '\\' character in our long
directory with the 4-byte string "\\134", and we therefore need to
create only 1M nested directories.

b/ We fill all large vmalloc holes: we bind-mount (MS_BIND) various
parts of our long directory in several unprivileged user namespaces and
vmalloc()ate large seq_file buffers by read()ing /proc/self/mountinfo.
For example, we vmalloc()ate 768MB of large buffers in our exploit.

c/ We vmalloc()ate two 1GB buffers and one 2GB buffer (by bind-mounting
our long directory in three different user namespaces, and by read()ing
/proc/self/mountinfo), and we check that "//deleted" is indeed written
to an offset of -2GB-10B below the beginning of our 2GB buffer (i.e.,
8182B above the beginning of our first 1GB buffer -- the "XXX"s are
guard pages):

        "//deleted"
             |
      4KB    v   1GB        4KB        1GB        4KB        2GB
-----|---|---+-------------|---|-----------------|---|-----------------|
 ... |XXX| seq_file buffer |XXX| seq_file buffer |XXX| seq_file buffer |
-----|---|---+-------------|---|-----------------|---|-----------------|
         |   |                                       |
         |   \----<----<----<----<----<----<----<----/
         8182B               -2GB-10B

d/ We fill all small vmalloc holes: we vmalloc()ate various small socket
buffers by send()ing numerous NETLINK_USERSOCK messages. For example, we
vmalloc()ate 256MB of small buffers in our exploit.

e/ We create 1024 user-space threads; each thread starts loading an eBPF
program into the kernel, but (via userfaultfd or FUSE) we block every
thread in kernel space (at line 2101), before our eBPF programs are
actually vmalloc()ated (at line 2162):

------------------------------------------------------------------------
2076 static int bpf_prog_load(union bpf_attr *attr, union bpf_attr __user *uattr)
2077 {
....
2100         /* copy eBPF program license from user space */
2101         if (strncpy_from_user(license, u64_to_user_ptr(attr->license),
....
2161         /* plain bpf_prog allocation */
2162         prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER);
------------------------------------------------------------------------

f/ We vfree() our first 1GB seq_file buffer (where "//deleted" was
written out of bounds), and we immediately unblock all 1024 threads; our
eBPF programs are vmalloc()ated into the 1GB hole that we just vfree()d:

      4KB        1GB        4KB        1GB        4KB        2GB
-----|---|-----------------|---|-----------------|---|-----------------|
 ... |XXX|  eBPF programs  |XXX| seq_file buffer |XXX| seq_file buffer |
-----|---|-----------------|---|-----------------|---|-----------------|

g/ Next, (again via userfaultfd or FUSE) we block one of our threads (at
line 12795) after its eBPF program has been validated by the kernel eBPF
verifier but before it is JIT-compiled by the kernel:

------------------------------------------------------------------------
12640 int bpf_check(struct bpf_prog **prog, union bpf_attr *attr,
12641               union bpf_attr __user *uattr)
12642 {
.....
12795         print_verification_stats(env);
------------------------------------------------------------------------

h/ Last, we overwrite an instruction of this eBPF program with an
out-of-bounds "//deleted" string (again via our 2GB seq_file buffer),
and therefore nullify the security checks of the kernel eBPF verifier:

        "//deleted"
             |
      4KB    v   1GB        4KB        1GB        4KB        2GB
-----|---|---+-------------|---|-----------------|---|-----------------|
 ... |XXX|  eBPF programs  |XXX| seq_file buffer |XXX| seq_file buffer |
-----|---|---+-------------|---|-----------------|---|-----------------|
         |   |                                       |
         |   \----<----<----<----<----<----<----<----/
         8182B               -2GB-10B

First, we transform this uncontrolled eBPF-program corruption into an
information disclosure. Our first, uncorrupted eBPF program is deemed
safe by the kernel eBPF verifier ("storage" and "control" are two basic
BPF_MAP_TYPE_ARRAYs, readable and writable from user space via
BPF_MAP_LOOKUP_ELEM and BPF_MAP_UPDATE_ELEM):

- BPF_LD_IMM64_RAW(BPF_REG_2, BPF_PSEUDO_MAP_VALUE, storage) loads the
  address of our storage map (which resides in kernel space and whose
  address is unknown to us) into the eBPF register BPF_REG_2;

- BPF_MOV64_IMM(BPF_REG_2, 0) immediately replaces the contents of
  BPF_REG_2 (the address of our storage map) with the constant value 0;

- BPF_LD_IMM64_RAW(BPF_REG_3, BPF_PSEUDO_MAP_VALUE, control) loads the
  address of our control map into BPF_REG_3;

- BPF_STX_MEM(BPF_DW, BPF_REG_3, BPF_REG_2, 0) stores the contents of
  BPF_REG_2 (the constant value 0) into our control map.

However, our eBPF-program corruption overwrites the instruction
BPF_MOV64_IMM(BPF_REG_2, 0) with the 8-byte string "deleted", which
translates into the instruction BPF_ALU32_IMM(BPF_LSH, BPF_REG_5, 0x74):
a NOP ("no operation"), because our program does not use BPF_REG_5. As a
result, we do not store the constant value 0 into our control map:
instead, we store and disclose the address of our storage map.

(This information disclosure allowed us to greatly reduce the number of
hardcoded kernel offsets in our exploit: our Ubuntu 20.04 exploit worked
out of the box on Ubuntu 20.10, Ubuntu 21.04, Debian 11, and Fedora 34.)

Second, we transform our uncontrolled eBPF-program corruption into a
limited but controlled out-of-bounds write. Our second, uncorrupted eBPF
program is also deemed safe by the kernel eBPF verifier ("corrupt" is a
3*64KB BPF_MAP_TYPE_ARRAY):

- BPF_LD_IMM64_RAW(BPF_REG_4, BPF_PSEUDO_MAP_VALUE, corrupt) loads the
  address of our corrupt map into BPF_REG_4;

- BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, 3*64KB/2) points BPF_REG_4 to the
  middle of our corrupt map;

- BPF_ALU64_IMM(BPF_SUB, BPF_REG_4, 3*64KB/4) points BPF_REG_4 to the
  first quarter of our corrupt map;

- BPF_LD_IMM64_RAW(BPF_REG_3, BPF_PSEUDO_MAP_VALUE, control) loads the
  address of our control map into BPF_REG_3;

- BPF_LDX_MEM(BPF_H, BPF_REG_7, BPF_REG_3, 0) loads a variable 16-bit
  offset from our control map into BPF_REG_7;

- BPF_ALU64_REG(BPF_ADD, BPF_REG_4, BPF_REG_7) adds BPF_REG_7 (our
  variable 16-bit offset) to BPF_REG_4, which therefore points safely
  within the bounds of our corrupt map (because BPF_REG_7 is in the
  [0,64KB] range).

However, our eBPF-program corruption overwrites the instruction
BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, 3*64KB/2) with the string "deleted",
which translates into BPF_ALU32_IMM(BPF_LSH, BPF_REG_5, 0x74) (a NOP).
As a result, the following BPF_ALU64_IMM(BPF_SUB, BPF_REG_4, 3*64KB/4)
points BPF_REG_4 out of bounds and allows us to read from and write to
the struct bpf_map that precedes our corrupt map in kernel space.

Finally, we transform this limited out-of-bounds read and write into an
arbitrary read and write of kernel memory, by reusing Manfred Paul's btf
and map_push_elem techniques:

- With the arbitrary kernel read we locate the symbol "__request_module"
  and hence the function __request_module(), disassemble this function,
  and extract the address of modprobe_path[] from the instructions for
  "if (!modprobe_path[0])".

- With the arbitrary kernel write we overwrite the contents of
  modprobe_path[] ("/sbin/modprobe" by default) with a path to our own
  executable, and call request_module() (by creating a netlink socket),
  which executes modprobe_path, and hence our own executable, as root.


========================================================================
Mitigations
========================================================================

Important note: the following mitigations prevent only our specific
exploit from working (but other exploitation techniques may exist); to
completely fix this vulnerability, the kernel must be patched.

- Set /proc/sys/kernel/unprivileged_userns_clone to 0, to prevent an
  attacker from mounting a long directory in a user namespace. However,
  the attacker may mount a long directory via FUSE instead; we have not
  fully explored this possibility, because we accidentally stumbled upon
  CVE-2021-33910 in systemd: if an attacker FUSE-mounts a long directory
  (longer than 8MB), then systemd exhausts its stack, crashes, and
  therefore crashes the entire operating system (a kernel panic).

- Set /proc/sys/kernel/unprivileged_bpf_disabled to 1, to prevent an
  attacker from loading an eBPF program into the kernel. However, the
  attacker may corrupt other vmalloc()ated objects instead (for example,
  thread stacks), but we have not investigated this possibility.


========================================================================
Acknowledgments
========================================================================

We thank the PaX Team for answering our many questions about the Linux
kernel. We also thank Manfred Paul, Jann Horn, Brandon Azad, Simon
Scannell, and Bruce Leidl for their exploits and write-ups:

https://www.thezdi.com/blog/2020/4/8/cve-2020-8835-linux-kernel-privilege-escalation-via-improper-ebpf-program-verification
https://googleprojectzero.blogspot.com/2016/06/exploiting-recursion-in-linux-kernel_20.html
https://googleprojectzero.blogspot.com/2020/12/an-ios-hacker-tries-android.html
https://scannell.io/posts/ebpf-fuzzing/
https://github.com/brl/grlh

We thank Red Hat Product Security and the members of
linux-distros@openwall and security@kernel for their work on this
coordinated disclosure. We also thank Mitre's CVE Assignment Team.
Finally, we thank Marco Ivaldi for his continued support.


========================================================================
Timeline
========================================================================

2021-06-09: We sent our advisories for CVE-2021-33909 and CVE-2021-33910
to Red Hat Product Security (the two vulnerabilities are closely related
and the systemd-security mailing list is hosted by Red Hat).

2021-07-06: We sent our advisories, and Red Hat sent the patches they
wrote, to the linux-distros@openwall mailing list.

2021-07-13: We sent our advisory for CVE-2021-33909, and Red Hat sent
the patch they wrote, to the security@kernel mailing list.

2021-07-20: Coordinated Release Date (12:00 PM UTC).

是否有针对此漏洞的缓解措施？

以下缓解措施仅阻止我们特定的漏洞利用（但可能存在其他漏洞利用技术）；要完全修复此漏洞，必须修补内核。

将 /proc/sys/kernel/unprivileged_userns_clone 设置为 0，以防止攻击者在用户命名空间中挂载长目录。但是，攻击者可能会通过 FUSE 挂载一个长目录；我们还没有完全探索这种可能性，因为我们无意中在 systemd 中偶然发现了 CVE-2021-33910：如果攻击者 FUSE-mount 一个长目录（超过 8MB），那么 systemd 会耗尽其堆栈，崩溃，从而导致整个操作系统（内核恐慌）。
将 /proc/sys/kernel/unprivileged_bpf_disabled 设置为 1，以防止攻击者将 eBPF 程序加载到内核中。然而，攻击者可能会破坏其他 vmalloc() 化的对象（例如，线程堆栈），但我们尚未调查这种可能性。

为什么漏洞被命名为“Sequoia”？

该错误位于 Linux 的 seq_file 接口中，而“Sequoia sempervirens”是一棵具有广泛根的树：该错误的深层目录树上的双关语，可产生 root 权限。

参考链接

http://packetstormsecurity.com/files/163621/Sequoia-A-Deep-Root-In-Linuxs-Filesystem-Layer.html
https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.13.4
https://github.com/torvalds/linux/commit/8cae8cd89f05f6de223d63e6d15e31c8ba9cf53b
https://lists.debian.org/debian-lts-announce/2021/07/msg00014.html
https://lists.debian.org/debian-lts-announce/2021/07/msg00015.html
https://lists.debian.org/debian-lts-announce/2021/07/msg00016.html
https://www.debian.org/security/2021/dsa-4941
https://www.openwall.com/lists/oss-security/2021/07/20/1

from