[BigData] Troubleshooting BeeGFS Deployment

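The previous post introduced how BeeGFS integrates with k8s machines. This post records the many problems the author (檸檬爸) ran into while installing BeeGFS on an on-premises k8s cluster. It covers troubleshooting for those problems and a practical summary of some BeeGFS tools. BeeGFS is a broad and deep system, and the author is still learning it.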

beegfs-client-dkms

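The BeeGFS CSI Driver requires beegfs-client-dkms to be installed. The first attempt used this command:

yum install -y beegfs-client-dkms

The installation of beegfs-client-dkms failed on Rocky Linux, although the same step raised no error on Ubuntu. After some investigation, it turned out that dkms must be installed first. See the article "How to install DKMS on Rocky Linux 9" for the installation steps.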
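The install order described above can be sketched as a small shell function. This is only a sketch under assumptions that are not in the original post: the host is Rocky Linux 9, dkms is taken from the EPEL repository, a BeeGFS package repository is already configured, and the `run` helper with its `DRY_RUN` variable is a hypothetical convenience for previewing the commands.

```shell
# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Sketch: install DKMS and matching kernel headers before beegfs-client-dkms.
# Assumptions: Rocky Linux 9, dkms from EPEL, BeeGFS repo already configured.
install_beegfs_client_dkms() {
    # EPEL provides the dkms package on RHEL-compatible systems.
    run sudo dnf install -y epel-release || return 1
    # DKMS needs the headers of the running kernel to build the module.
    run sudo dnf install -y dkms "kernel-devel-$(uname -r)" || return 1
    # Only now install the DKMS flavor of the BeeGFS client.
    run sudo dnf install -y beegfs-client-dkms
}
```

Calling the function with DRY_RUN=1 set prints the three dnf commands without executing them, which is a safe way to review the order first.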

Only one of the two packages beegfs-client and beegfs-client-dkms can be installed at a time, because they conflict with each other.

beegfs-client-dkms mount

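Because only one of beegfs-client and beegfs-client-dkms can be used, anyone who wants to manage data directly and also consume BeeGFS through K8S has to mount the file system with beegfs-client-dkms. For details, see the referenced documentation.

Otherwise, run the following commands: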

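sudo modprobe beegfs
sudo mkdir -p /mnt/beegfs
# Append the entry with tee -a; a plain ">" would overwrite the whole /etc/fstab
echo "beegfs_nodev  /mnt/beegfs  beegfs  rw,relatime,cfgFile=/etc/beegfs/beegfs-client.conf,_netdev 0 0" | sudo tee -a /etc/fstab
sudo mount /mnt/beegfs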
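Because editing /etc/fstab is easy to get wrong, the fstab step can be wrapped in a function that adds the entry only once. The following is a minimal sketch; the function name `add_beegfs_fstab_entry` and its two optional parameters are hypothetical, and the entry text is the one used in the commands above.

```shell
# Sketch: add the BeeGFS fstab entry only if it is not already present.
# The fstab path is a parameter so the function can be tried on a copy first.
add_beegfs_fstab_entry() {
    fstab="${1:-/etc/fstab}"
    mountpoint="${2:-/mnt/beegfs}"
    entry="beegfs_nodev  ${mountpoint}  beegfs  rw,relatime,cfgFile=/etc/beegfs/beegfs-client.conf,_netdev 0 0"
    # Skip the append when a beegfs entry for this mount point already exists.
    if grep -Eq "^beegfs_nodev[[:space:]]+${mountpoint}[[:space:]]+beegfs[[:space:]]" "$fstab" 2>/dev/null; then
        return 0
    fi
    # Append with ">>" so existing entries are preserved.
    printf '%s\n' "$entry" >> "$fstab"
}
```

Running it against a copy of /etc/fstab first makes it possible to inspect the result before touching the real file, which requires root privileges.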

kernel-devel version mismatch

[root@rocky9 home]# systemctl status beegfs-client.service
× beegfs-client.service - Start BeeGFS Client
     Loaded: loaded (/usr/lib/systemd/system/beegfs-client.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Wed 2025-04-23 04:01:16 UTC; 5s ago
    Process: 6490 ExecStart=/etc/init.d/beegfs-client start (code=exited, status=2)
   Main PID: 6490 (code=exited, status=2)
        CPU: 1.216s

Apr 23 04:01:15 rocky9 beegfs-client[6490]: - BeeGFS module autobuild
Apr 23 04:01:15 rocky9 beegfs-client[6503]: $OFED_INCLUDE_PATH = []
Apr 23 04:01:15 rocky9 beegfs-client[6749]: $OFED_INCLUDE_PATH = []
Apr 23 04:01:16 rocky9 beegfs-client[7000]: $OFED_INCLUDE_PATH = []
Apr 23 04:01:16 rocky9 beegfs-client[7000]: Makefile:188: *** Linux kernel build directory not found. Please check if the kernel module development packages are installed for the current kernel version. (RHEL: kernel-devel; SLES: kernel-defau>
Apr 23 04:01:16 rocky9 beegfs-client[6503]: make: *** [AutoRebuild.mk:34: auto_rebuild] Error 2
Apr 23 04:01:16 rocky9 systemd[1]: beegfs-client.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Apr 23 04:01:16 rocky9 systemd[1]: beegfs-client.service: Failed with result 'exit-code'.
Apr 23 04:01:16 rocky9 systemd[1]: Failed to start Start BeeGFS Client.
Apr 23 04:01:16 rocky9 systemd[1]: beegfs-client.service: Consumed 1.216s CPU time.

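The error above occurs because the installed kernel module development package (kernel-devel) does not match the version of the running kernel. One fix is to find a matching package in a public repository and install it with rpm -i.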
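A quick way to confirm this diagnosis is to check whether build files exist for the running kernel. The sketch below assumes that the module build looks under /lib/modules/<release>/build, which on RHEL-compatible systems is a link into the files shipped by kernel-devel. The function names and the optional alternative root argument are hypothetical.

```shell
# Sketch: check whether build files exist for a given kernel release.
# Assumption: the module build looks under /lib/modules/<release>/build.
kernel_build_dir_present() {
    release="${1:-$(uname -r)}"
    root="${2:-}"          # optional alternative root, useful for a trial run
    # -e follows symlinks, so a dangling "build" link counts as missing.
    [ -e "${root}/lib/modules/${release}/build" ]
}

# Print a one-line verdict for the given (or running) kernel release.
report_kernel_devel() {
    if kernel_build_dir_present "$@"; then
        echo "kernel build directory found"
    else
        echo "kernel build directory missing: install kernel-devel-${1:-$(uname -r)}"
    fi
}
```

Calling report_kernel_devel without arguments checks the running kernel. If the directory is reported missing, installing the kernel-devel package whose version equals the output of uname -r is the usual remedy.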

modprobe: Key was rejected by service

[root@rocky9 home]# systemctl status beegfs-client.service
× beegfs-client.service - Start BeeGFS Client
     Loaded: loaded (/usr/lib/systemd/system/beegfs-client.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Wed 2025-04-23 04:25:54 UTC; 10s ago
    Process: 66542 ExecStart=/etc/init.d/beegfs-client start (code=exited, status=1/FAILURE)
   Main PID: 66542 (code=exited, status=1/FAILURE)
        CPU: 3min 44.962s

Apr 23 04:22:10 rocky9 beegfs-client[67510]: feature detection gives: -DKERNEL_HAS_INODE_ATIME -DKERNEL_HAS_SCHED_SIG_H -DKERNEL_HAS_LINUX_STDARG_H -DKERNEL_HAS_STATX -DKERNEL_HAS_KREF_READ -DKERNEL_HAS_FILE_DENTRY -DKERNEL_HAS_SUPER_SETUP_BD>
Apr 23 04:25:30 rocky9 beegfs-client[70182]: feature detection gives: -DKERNEL_HAS_INODE_ATIME -DKERNEL_HAS_SCHED_SIG_H -DKERNEL_HAS_LINUX_STDARG_H -DKERNEL_HAS_STATX -DKERNEL_HAS_KREF_READ -DKERNEL_HAS_FILE_DENTRY -DKERNEL_HAS_SUPER_SETUP_BD>
Apr 23 04:25:31 rocky9 beegfs-client[70531]: Skipping BTF generation for /opt/beegfs/src/client/client_module_7/build/../source/beegfs.ko due to unavailability of vmlinux
Apr 23 04:25:31 rocky9 beegfs-client[70539]: $OFED_INCLUDE_PATH = []
Apr 23 04:25:53 rocky9 beegfs-client[70789]: $OFED_INCLUDE_PATH = []
Apr 23 04:25:54 rocky9 beegfs-client[71042]: modprobe: ERROR: could not insert 'beegfs': Key was rejected by service
Apr 23 04:25:54 rocky9 systemd[1]: beegfs-client.service: Main process exited, code=exited, status=1/FAILURE
Apr 23 04:25:54 rocky9 systemd[1]: beegfs-client.service: Failed with result 'exit-code'.
Apr 23 04:25:54 rocky9 systemd[1]: Failed to start Start BeeGFS Client.
Apr 23 04:25:54 rocky9 systemd[1]: beegfs-client.service: Consumed 3min 44.962s CPU time.

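In this case the BeeGFS space could not be mounted. Based on reference articles, the likely cause is that Secure Boot is enabled: the kernel then refuses to load a locally built, unsigned beegfs module.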
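To confirm the Secure Boot guess, the state can be read with mokutil. The sketch below assumes mokutil is installed and that its --sb-state output contains a line such as "SecureBoot enabled" or "SecureBoot disabled". The function names are hypothetical.

```shell
# Sketch: interpret the output of "mokutil --sb-state".
# Assumption: the output starts with "SecureBoot enabled" or "SecureBoot disabled".
secure_boot_enabled() {
    # Reads the mokutil output from stdin; succeeds only when enabled.
    grep -qi '^SecureBoot enabled'
}

# Query the firmware state and print a short verdict.
check_secure_boot() {
    if mokutil --sb-state 2>/dev/null | secure_boot_enabled; then
        echo "Secure Boot is enabled: an unsigned beegfs.ko may be rejected"
    else
        echo "Secure Boot is not reported as enabled"
    fi
}
```

If Secure Boot turns out to be enabled, the usual options are to disable it in the firmware settings or to sign the built module with a key enrolled as a Machine Owner Key. Which one fits depends on the site's security policy.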

BeeGFS Client utilities (beegfs-ctl)

Once a BeeGFS file system is up, it usually has more than one storage daemon. According to the reference material, several commands help show the overall state of the BeeGFS file system:

beegfs-ctl --listnodes --nodetype=meta --nicdetails
beegfs-ctl --listnodes --nodetype=storage --nicdetails

beegfs-net                # Displays connections the client is actually using
beegfs-check-servers      # Displays possible connectivity of the services
beegfs-df                 # Displays free space and inodes of storage and metadata targets

beegfs-ctl is also a tool for managing a BeeGFS file system. Its modes are listed below:

$ beegfs-ctl --<modename> [mode_arguments] [client_arguments]

MODES:
 --listnodes            => List registered clients and servers.
 --listtargets          => List metadata and storage targets.
 --removenode (*)       => Remove (unregister) a node.
 --removetarget (*)     => Remove (unregister) a storage target.
 
 --getentryinfo         => Show file system entry details
 --setpattern (*)       => Set a new striping configuration.
 --mirrormd (*)         => Enable metadata mirroring.
 --find                 => Find files located on certain servers.
 --refreshentryinfo     => Refresh file system entry metadata.
 
 --createfile (*)       => Create a new file.
 --createdir (*)        => Create a new directory.
 --migrate              => Migrate files to other storage servers.
 --disposeunused (*)    => Purge remains of unlinked files.
 
 --serverstats          => Show server IO statistics.
 --clientstats          => Show client IO statistics.
 --userstats            => Show user IO statistics.
 --storagebench (*)     => Run a storage targets benchmark.
 
 --getquota             => Show quota information for users or groups.
 --setquota (*)         => Set the quota limits for users or groups.
 
 --listmirrorgroups     => List mirror buddy groups.
 --addmirrorgroup (*)   => Add a mirror buddy group.
 --startresync (*)      => Start resync of a storage target or metadata node.
 --resyncstats          => Get statistics on a resync.
 --setstate (*)         => Manually set the consistency state of a target or metadata node.
 
 --liststoragepools     => List storage pools
 --addstoragepool (*)   => Add a storage pool.
 --removestoragepool (*)=> Remove a storage pool.
 --modifystoragepool (*)=> Modify a storage pool.
 
(*) Marked modes require root privileges.

USAGE:
 This is the BeeGFS command-line control tool.
 
Choose a control mode from the list above and use the parameter "--help" to 
show arguments and usage examples for that particular mode.

Example: Show help for mode "--listnodes" 
 $ beegfs-ctl --listnodes --help
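
As an illustration of how these modes combine in practice, the following sketch runs a few read-only queries in sequence. The function name `beegfs_overview` and the `BEEGFS_CTL` override variable are hypothetical. The override exists only so the calls can be previewed with echo on a machine without BeeGFS installed, and /mnt/beegfs is the mount point assumed earlier in this post.

```shell
# Sketch: a few read-only beegfs-ctl queries using modes from the list above.
# BEEGFS_CTL is a hypothetical override, e.g. BEEGFS_CTL=echo for a preview.
BEEGFS_CTL="${BEEGFS_CTL:-beegfs-ctl}"

beegfs_overview() {
    path="${1:-/mnt/beegfs}"
    # Registered metadata and storage servers.
    "$BEEGFS_CTL" --listnodes --nodetype=meta
    "$BEEGFS_CTL" --listnodes --nodetype=storage
    # Storage targets and the configured storage pools.
    "$BEEGFS_CTL" --listtargets --nodetype=storage
    "$BEEGFS_CTL" --liststoragepools
    # Striping and metadata details for one path inside the mount.
    "$BEEGFS_CTL" --getentryinfo "$path"
}
```

None of these modes is marked with (*) in the list, so they should not require root privileges. The exact arguments each mode accepts are shown by its own --help output.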