[GPU] Native Rapids UDF – Compilation Environment

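In the previous post we showed how to use cudf C++ to write a UDF that runs on the GPU. This post records how to use the build environment provided by Spark Rapids Examples to build your own jar that can run on a GPU-equipped Spark cluster.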

MAVEN Build and Deploy JAR

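Because the UDF is written in C++, it has to be compiled in the Docker environment provided by the Spark Rapids Examples GitHub repository. The process breaks down into the following steps: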

  1. Build a Docker image that has the CUDA environment installed
  2. On a VM with nvidia-docker, compile with Maven: mvn clean package -Dudf-native-examples. This step takes about three hours
  3. Copy the compiled jar into the /home/spark-current/jars directory
  4. Restart the Spark Thrift Server
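The four steps above can be sketched as a small dry-run script. This is only a sketch under stated assumptions, not the exact procedure used here: the image tag `udf-native-build`, the `/work` mount point, the `target/*.jar` location, and `SPARK_HOME=/home/spark-current` (guessed from the jars path in step 3) are all hypothetical. The Maven command is the one from step 2, and `stop-thriftserver.sh` / `start-thriftserver.sh` are the scripts shipped in Spark's `sbin` directory. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Dry-run sketch of the four steps; prints each command unless DRY_RUN=0.
IMAGE="${IMAGE:-udf-native-build}"               # hypothetical image tag
JAR_GLOB="${JAR_GLOB:-target/*.jar}"             # hypothetical location of the built jar
SPARK_HOME="${SPARK_HOME:-/home/spark-current}"  # assumed from the jars path in step 3
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

build_and_deploy() {
  # Step 1: build the CUDA-enabled image from the Dockerfile in the current directory.
  run docker build -t "$IMAGE" . || return 1
  # Step 2: compile inside the container; /work is a hypothetical mount point.
  run nvidia-docker run --rm -v "$PWD:/work" -w /work "$IMAGE" \
      mvn clean package -Dudf-native-examples || return 1
  # Step 3: copy the jar(s) into Spark's jars directory (glob left unquoted on purpose).
  run cp $JAR_GLOB "$SPARK_HOME/jars/" || return 1
  # Step 4: restart the Spark Thrift Server.
  run "$SPARK_HOME/sbin/stop-thriftserver.sh" || return 1
  run "$SPARK_HOME/sbin/start-thriftserver.sh"
}

build_and_deploy
```

Setting `DRY_RUN=0` would execute the commands for real. In that case the Thrift Server would probably need to be started with the same options it was originally launched with, which this sketch does not capture.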

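Once everything was running, the following error showed up right away. After a bit of digging, the cause turned out to be that CUDA was not installed in the environment: the executor could not load libcudart.so.12.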

Driver stacktrace:
06:11:29.150 INFO  DAGScheduler - Job 5 failed: run at AccessController.java:0, took 2.944054 s
06:11:29.152 INFO  DAGScheduler - Asked to cancel job group 8f2e33b9-f530-4c03-93d5-df4832ac6ce4
06:11:29.152 ERROR SparkExecuteStatementOperation - Error executing query with 8f2e33b9-f530-4c03-93d5-df4832ac6ce4, currentState RUNNING, 
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 9.0 failed 4 times, most recent failure: Lost task 0.3 in stage 9.0 (TID 108) (10.0.0.10 executor 0): java.lang.UnsatisfiedLinkError: /tmp/atgxnativeudfjni6117543534348807468.so: libcudart.so.12: cannot open shared object file: No such file or directory
	at java.lang.ClassLoader$NativeLibrary.load(Native Method)
	at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1934)
	at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1817)
	at java.lang.Runtime.load0(Runtime.java:782)
	at java.lang.System.load(System.java:1100)
	at ai.rapids.cudf.NativeDepsLoader.loadDep(NativeDepsLoader.java:187)
	at ai.rapids.cudf.NativeDepsLoader.loadDep(NativeDepsLoader.java:182)
	at ai.rapids.cudf.NativeDepsLoader.loadNativeDeps(NativeDepsLoader.java:129)
	at com.atgenomix.seqslab.piper.plugin.atgenomix.udf.hive.AtgxNativeUDFLoader.ensureLoaded(AtgxNativeUDFLoader.java:14)
	at com.atgenomix.seqslab.piper.plugin.atgenomix.udf.hive.RetrieveRsID.evaluateColumnar(RetrieveRsID.java:71)
	at com.nvidia.spark.rapids.GpuUserDefinedFunction.$anonfun$columnarEval$4(GpuUserDefinedFunction.scala:61)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
	at com.nvidia.spark.rapids.GpuUserDefinedFunction.$anonfun$columnarEval$2(GpuUserDefinedFunction.scala:59)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:47)
	at com.nvidia.spark.rapids.GpuUserDefinedFunction.columnarEval(GpuUserDefinedFunction.scala:57)
	at com.nvidia.spark.rapids.GpuUserDefinedFunction.columnarEval$(GpuUserDefinedFunction.scala:55)
	at org.apache.spark.sql.hive.rapids.GpuHiveGenericUDF.columnarEval(hiveUDFs.scala:60)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$ReallyAGpuExpression.columnarEval(implicits.scala:34)
	at com.nvidia.spark.rapids.GpuAlias.columnarEval(namedExpressions.scala:110)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$ReallyAGpuExpression.columnarEval(implicits.scala:34)
	at com.nvidia.spark.rapids.GpuProjectExec$.$anonfun$project$1(basicPhysicalOperators.scala:111)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.$anonfun$safeMap$1(implicits.scala:220)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.$anonfun$safeMap$1$adapted(implicits.scala:217)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$MapsSafely.safeMap(implicits.scala:217)
	at com.nvidia.spark.rapids.RapidsPluginImplicits$AutoCloseableProducingSeq.safeMap(implicits.scala:252)
	at com.nvidia.spark.rapids.GpuProjectExec$.project(basicPhysicalOperators.scala:111)
	at com.nvidia.spark.rapids.GpuTieredProject.$anonfun$project$2(basicPhysicalOperators.scala:612)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
	at com.nvidia.spark.rapids.GpuTieredProject.recurse$2(basicPhysicalOperators.scala:611)
	at com.nvidia.spark.rapids.GpuTieredProject.project(basicPhysicalOperators.scala:624)
	at com.nvidia.spark.rapids.GpuTieredProject.$anonfun$projectWithRetrySingleBatchInternal$5(basicPhysicalOperators.scala:560)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.withRestoreOnRetry(RmmRapidsRetryIterator.scala:268)
	at com.nvidia.spark.rapids.GpuTieredProject.$anonfun$projectWithRetrySingleBatchInternal$4(basicPhysicalOperators.scala:560)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
	at com.nvidia.spark.rapids.GpuTieredProject.$anonfun$projectWithRetrySingleBatchInternal$3(basicPhysicalOperators.scala:558)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$NoInputSpliterator.next(RmmRapidsRetryIterator.scala:377)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryIterator.next(RmmRapidsRetryIterator.scala:569)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$RmmRapidsRetryAutoCloseableIterator.next(RmmRapidsRetryIterator.scala:495)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.drainSingleWithVerification(RmmRapidsRetryIterator.scala:287)
	at com.nvidia.spark.rapids.RmmRapidsRetryIterator$.withRetryNoSplit(RmmRapidsRetryIterator.scala:185)
	at com.nvidia.spark.rapids.GpuTieredProject.$anonfun$projectWithRetrySingleBatchInternal$1(basicPhysicalOperators.scala:558)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:38)
	at com.nvidia.spark.rapids.GpuTieredProject.projectWithRetrySingleBatchInternal(basicPhysicalOperators.scala:555)
	at com.nvidia.spark.rapids.GpuTieredProject.projectAndCloseWithRetrySingleBatch(basicPhysicalOperators.scala:594)
	at com.nvidia.spark.rapids.GpuProjectExec.$anonfun$internalDoExecuteColumnar$2(basicPhysicalOperators.scala:385)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
	at com.nvidia.spark.rapids.GpuProjectExec.$anonfun$internalDoExecuteColumnar$1(basicPhysicalOperators.scala:381)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.$anonfun$fetchNextBatch$3(GpuColumnarToRowExec.scala:287)
	at com.nvidia.spark.rapids.Arm$.withResource(Arm.scala:29)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.fetchNextBatch(GpuColumnarToRowExec.scala:284)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.loadNextBatch(GpuColumnarToRowExec.scala:257)
	at com.nvidia.spark.rapids.ColumnarToRowIterator.hasNext(GpuColumnarToRowExec.scala:301)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:364)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
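One way to confirm this diagnosis on an executor node is to ask the dynamic loader whether it knows about `libcudart.so.12`. The sketch below assumes a glibc-based Linux where `ldconfig -p` lists the loader cache; the helper name `lib_in_cache` is made up for illustration. A library installed outside the cached directories would not show up here, so a "not found" result is a hint rather than proof.

```shell
#!/bin/sh
# Succeeds if the library name in $1 appears as the first field
# of `ldconfig -p`-style output read from stdin.
lib_in_cache() {
  awk -v lib="$1" '$1 == lib { found = 1 } END { exit !found }'
}

if command -v ldconfig >/dev/null 2>&1; then
  if ldconfig -p | lib_in_cache libcudart.so.12; then
    echo "libcudart.so.12: found in loader cache"
  else
    echo "libcudart.so.12: not found in loader cache"
  fi
else
  echo "ldconfig not available on this machine"
fi
```

Reading the listing from stdin keeps the helper independent of the machine it runs on, so the same check can be applied to `ldconfig -p` output collected from each worker.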

The fix is to install CUDA by running the commands below, which follow the reference link.

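# Register NVIDIA's CUDA apt repository key; set $distro and $arch for the target VM first
wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
# Refresh the package index and install CUDA
sudo apt update
sudo apt install -y cuda
# Confirm the install by printing the CUDA compiler version
nvcc --version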

Note: per the reference link, $distro/$arch in the first step must be chosen to match the VM being deployed to, for example ubuntu2004/x86_64.
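As a rough sketch, the `$distro` value can be derived from `/etc/os-release` by joining `ID` and `VERSION_ID` with the dot removed, which gives `ubuntu2004` for Ubuntu 20.04. The helper name `cuda_repo_distro` is hypothetical. Using `uname -m` for `$arch` is an assumption that holds for `x86_64`; other architectures may use a different directory name in the repository, so the printed URL should be checked against the reference link before use.

```shell
#!/bin/sh
# Build the repository distro string from an os-release ID and VERSION_ID,
# e.g. "ubuntu" and "20.04" give "ubuntu2004".
cuda_repo_distro() {
  printf '%s%s\n' "$1" "$(printf '%s' "$2" | tr -d '.')"
}

if [ -r /etc/os-release ]; then
  # Source os-release in a subshell so its variables do not leak into this shell.
  distro="$(. /etc/os-release && cuda_repo_distro "${ID:-}" "${VERSION_ID:-}")"
  arch="$(uname -m)"  # assumption: matches the repo directory name (true for x86_64)
  echo "https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.0-1_all.deb"
else
  echo "/etc/os-release not found; set distro and arch by hand"
fi
```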