[BigData] Troubleshooting BeeGFS Deployment
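The previous post covered how to connect BeeGFS to k8s machines. This post records the problems that I (檸檬爸) ran into while installing BeeGFS on an on-premises k8s cluster. Below are troubleshooting notes for those problems, plus a practical summary of some BeeGFS tools. BeeGFS is a broad and deep system, and I am still learning it.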
beegfs-client-dkms
The BeeGFS CSI Driver requires beegfs-client-dkms to be installed. My first attempt used the command:
yum install -y beegfs-client-dkms
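Installing beegfs-client-dkms this way failed on Rocky Linux, although the same step raised no error on Ubuntu. After some investigation, it turned out that dkms must be installed first; see the article How to install DKMS on Rocky Linux 9 for the steps.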
The beegfs-client and beegfs-client-dkms packages conflict with each other, so only one of the two can be installed.
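Both constraints (dkms has to be present first, and beegfs-client must not be installed next to beegfs-client-dkms) can be checked before running yum. The following is a minimal sketch, not part of BeeGFS: the function name `check_dkms_preconditions` is hypothetical, and it expects a list of installed package names on stdin.

```shell
# Hypothetical helper: reads installed package names (one per line) on stdin
# and reports anything that would block installing beegfs-client-dkms.
check_dkms_preconditions() {
    pkgs=$(cat)
    status=0
    # dkms has to be installed before beegfs-client-dkms.
    if ! printf '%s\n' "$pkgs" | grep -qxF 'dkms'; then
        echo "missing: dkms"
        status=1
    fi
    # beegfs-client conflicts with beegfs-client-dkms.
    if printf '%s\n' "$pkgs" | grep -qxF 'beegfs-client'; then
        echo "conflict: beegfs-client"
        status=1
    fi
    return "$status"
}
```

On an RPM-based system the input could come from `rpm -qa --qf '%{NAME}\n' | check_dkms_preconditions`. The function exits with a non-zero status if it printed any problem.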
beegfs-client-dkms mount
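Because only one of beegfs-client and beegfs-client-dkms can be installed, anyone who wants to manage data on a node and also use BeeGFS from K8S has to mount the file system through beegfs-client-dkms. See the reference for details.
Otherwise, run the following commands: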
sudo modprobe beegfs
echo "beegfs_nodev /mnt/beegfs beegfs rw,relatime,cfgFile=/etc/beegfs/beegfs-client.conf,_netdev 0 0" | sudo tee -a /etc/fstab
sudo mount /mnt/beegfs
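Re-running the setup should not add the BeeGFS entry to /etc/fstab a second time. A minimal sketch of an idempotent variant, assuming a hypothetical helper name `ensure_fstab_entry` and that the caller has write access to the target file:

```shell
# Hypothetical helper: append an fstab line only if the exact same line is
# not already present, so repeated runs do not create duplicates.
# Usage: ensure_fstab_entry <fstab-path> <entry-line>
ensure_fstab_entry() {
    fstab=$1
    entry=$2
    # -x matches whole lines, -F treats the entry as a fixed string.
    if grep -qxF -- "$entry" "$fstab" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$entry" >> "$fstab"
}
```

Run as root, it would be called with `/etc/fstab` as the first argument and the `beegfs_nodev ...` line as the second. Trying it on a copy of the file first is a cheap safety check.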
kernel-devel version mismatch
[root@rocky9 home]# systemctl status beegfs-client.service
× beegfs-client.service - Start BeeGFS Client
Loaded: loaded (/usr/lib/systemd/system/beegfs-client.service; enabled; preset: disabled)
Active: failed (Result: exit-code) since Wed 2025-04-23 04:01:16 UTC; 5s ago
Process: 6490 ExecStart=/etc/init.d/beegfs-client start (code=exited, status=2)
Main PID: 6490 (code=exited, status=2)
CPU: 1.216s
Apr 23 04:01:15 rocky9 beegfs-client[6490]: - BeeGFS module autobuild
Apr 23 04:01:15 rocky9 beegfs-client[6503]: $OFED_INCLUDE_PATH = []
Apr 23 04:01:15 rocky9 beegfs-client[6749]: $OFED_INCLUDE_PATH = []
Apr 23 04:01:16 rocky9 beegfs-client[7000]: $OFED_INCLUDE_PATH = []
Apr 23 04:01:16 rocky9 beegfs-client[7000]: Makefile:188: *** Linux kernel build directory not found. Please check if the kernel module development packages are installed for the current kernel version. (RHEL: kernel-devel; SLES: kernel-defau>
Apr 23 04:01:16 rocky9 beegfs-client[6503]: make: *** [AutoRebuild.mk:34: auto_rebuild] Error 2
Apr 23 04:01:16 rocky9 systemd[1]: beegfs-client.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Apr 23 04:01:16 rocky9 systemd[1]: beegfs-client.service: Failed with result 'exit-code'.
Apr 23 04:01:16 rocky9 systemd[1]: Failed to start Start BeeGFS Client.
Apr 23 04:01:16 rocky9 systemd[1]: beegfs-client.service: Consumed 1.216s CPU time.
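The error above occurs because the kernel module development package (kernel-devel) does not match the kernel version currently running, so the module autobuild cannot find a kernel build directory. One fix is to find a matching package in a public repository and install it with rpm -i.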
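Before restarting the service, it can be useful to confirm that a build directory exists for the running kernel. The sketch below assumes the conventional location `/lib/modules/<release>/build`, which kernel-devel normally provides as a symlink. The helper name `has_kernel_build_dir` is hypothetical.

```shell
# Hypothetical helper: succeed if a kernel build directory exists for the
# given kernel release (defaults to the running kernel from `uname -r`).
# Usage: has_kernel_build_dir [release] [modules-root]
has_kernel_build_dir() {
    release=${1:-$(uname -r)}
    modules_root=${2:-/lib/modules}
    # -d follows symlinks, so a dangling "build" link counts as missing.
    [ -d "$modules_root/$release/build" ]
}
```

A possible use is `has_kernel_build_dir || echo "kernel-devel for $(uname -r) is missing"`. If the distribution repository still carries the matching package, `dnf install "kernel-devel-$(uname -r)"` may be a simpler alternative to a manual rpm -i.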
modprobe: Key was rejected by service
[root@rocky9 home]# systemctl status beegfs-client.service
× beegfs-client.service - Start BeeGFS Client
Loaded: loaded (/usr/lib/systemd/system/beegfs-client.service; enabled; preset: disabled)
Active: failed (Result: exit-code) since Wed 2025-04-23 04:25:54 UTC; 10s ago
Process: 66542 ExecStart=/etc/init.d/beegfs-client start (code=exited, status=1/FAILURE)
Main PID: 66542 (code=exited, status=1/FAILURE)
CPU: 3min 44.962s
Apr 23 04:22:10 rocky9 beegfs-client[67510]: feature detection gives: -DKERNEL_HAS_INODE_ATIME -DKERNEL_HAS_SCHED_SIG_H -DKERNEL_HAS_LINUX_STDARG_H -DKERNEL_HAS_STATX -DKERNEL_HAS_KREF_READ -DKERNEL_HAS_FILE_DENTRY -DKERNEL_HAS_SUPER_SETUP_BD>
Apr 23 04:25:30 rocky9 beegfs-client[70182]: feature detection gives: -DKERNEL_HAS_INODE_ATIME -DKERNEL_HAS_SCHED_SIG_H -DKERNEL_HAS_LINUX_STDARG_H -DKERNEL_HAS_STATX -DKERNEL_HAS_KREF_READ -DKERNEL_HAS_FILE_DENTRY -DKERNEL_HAS_SUPER_SETUP_BD>
Apr 23 04:25:31 rocky9 beegfs-client[70531]: Skipping BTF generation for /opt/beegfs/src/client/client_module_7/build/../source/beegfs.ko due to unavailability of vmlinux
Apr 23 04:25:31 rocky9 beegfs-client[70539]: $OFED_INCLUDE_PATH = []
Apr 23 04:25:53 rocky9 beegfs-client[70789]: $OFED_INCLUDE_PATH = []
Apr 23 04:25:54 rocky9 beegfs-client[71042]: modprobe: ERROR: could not insert 'beegfs': Key was rejected by service
Apr 23 04:25:54 rocky9 systemd[1]: beegfs-client.service: Main process exited, code=exited, status=1/FAILURE
Apr 23 04:25:54 rocky9 systemd[1]: beegfs-client.service: Failed with result 'exit-code'.
Apr 23 04:25:54 rocky9 systemd[1]: Failed to start Start BeeGFS Client.
Apr 23 04:25:54 rocky9 systemd[1]: beegfs-client.service: Consumed 3min 44.962s CPU time.
Here the BeeGFS file system could not be mounted. Based on a reference article, my guess is that Secure Boot was enabled.
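With Secure Boot enabled, the kernel refuses to load modules that are not signed by a trusted key, which matches the "Key was rejected by service" message. The guess can be checked with `mokutil --sb-state`, which prints either "SecureBoot enabled" or "SecureBoot disabled". A minimal sketch, with the hypothetical helper name `secure_boot_enabled`:

```shell
# Hypothetical helper: read the output of `mokutil --sb-state` on stdin and
# succeed only if it reports Secure Boot as enabled.
secure_boot_enabled() {
    grep -qi '^SecureBoot enabled'
}
```

For example: `mokutil --sb-state | secure_boot_enabled && echo "Secure Boot is on"`. If it is on, the usual options are to sign the beegfs module with an enrolled Machine Owner Key or to disable Secure Boot in the firmware settings.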
BeeGFS Client utilities (beegfs-ctl)
Once a BeeGFS file system is up, it usually has more than one storage daemon. According to the reference material, several commands help show the overall state of the file system:
beegfs-ctl --listnodes --nodetype=meta --nicdetails
beegfs-ctl --listnodes --nodetype=storage --nicdetails
beegfs-net # Displays connections the client is actually using
beegfs-check-servers # Displays possible connectivity of the services
beegfs-df # Displays free space and inodes of storage and metadata targets
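These tools only exist on a node where the BeeGFS client utilities are installed, so a quick check of what is available can save some confusion. A minimal sketch, with the hypothetical helper name `missing_tools`:

```shell
# Hypothetical helper: print each named command that is not found on PATH.
# Usage: missing_tools beegfs-ctl beegfs-net beegfs-check-servers beegfs-df
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

Empty output means every listed tool was found. Anything printed points to a utility that is not installed on this node or not on PATH.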
beegfs-ctl is also the tool for managing a BeeGFS file system. Its full list of modes is shown below:
$ beegfs-ctl --<modename> [mode_arguments] [client_arguments]
MODES:
--listnodes => List registered clients and servers.
--listtargets => List metadata and storage targets.
--removenode (*) => Remove (unregister) a node.
--removetarget (*) => Remove (unregister) a storage target.
--getentryinfo => Show file system entry details
--setpattern (*) => Set a new striping configuration.
--mirrormd (*) => Enable metadata mirroring.
--find => Find files located on certain servers.
--refreshentryinfo => Refresh file system entry metadata.
--createfile (*) => Create a new file.
--createdir (*) => Create a new directory.
--migrate => Migrate files to other storage servers.
--disposeunused (*) => Purge remains of unlinked files.
--serverstats => Show server IO statistics.
--clientstats => Show client IO statistics.
--userstats => Show user IO statistics.
--storagebench (*) => Run a storage targets benchmark.
--getquota => Show quota information for users or groups.
--setquota (*) => Set the quota limits for users or groups.
--listmirrorgroups => List mirror buddy groups.
--addmirrorgroup (*) => Add a mirror buddy group.
--startresync (*) => Start resync of a storage target or metadata node.
--resyncstats => Get statistics on a resync.
--setstate (*) => Manually set the consistency state of a target or metadata node.
--liststoragepools => List storage pools
--addstoragepool (*) => Add a storage pool.
--removestoragepool (*)=> Remove a storage pool.
--modifystoragepool (*)=> Modify a storage pool.
(*) Marked modes require root privileges.
USAGE:
This is the BeeGFS command-line control tool.
Choose a control mode from the list above and use the parameter "--help" to
show arguments and usage examples for that particular mode.
Example: Show help for mode "--listnodes"
$ beegfs-ctl --listnodes --help