{"id":9480,"date":"2023-12-24T08:53:52","date_gmt":"2023-12-24T07:53:52","guid":{"rendered":"https:\/\/myoceane.fr\/?p=9480"},"modified":"2023-12-28T09:10:07","modified_gmt":"2023-12-28T08:10:07","slug":"spark-define-and-register-hive-udf-with-spark-rapids","status":"publish","type":"post","link":"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/","title":{"rendered":"[Spark] Define and Register Hive UDF with Spark Rapids"},"content":{"rendered":"<div id=\"fb-root\"><\/div>\n\n<p style=\"text-align: justify;\">The previous post showed how to <a href=\"https:\/\/myoceane.fr\/index.php\/rapids-support-spark-sql-with-spark-rapids\/\">use Spark Rapids to accelerate SQL execution on the GPU<\/a>. We ran into several problems and solved them one by one, and in the end we enabled Spark Rapids on the Spark Thrift Server and used pyHive to send SQL requests to the Spark cluster. To make fuller use of the GPU, one more issue has to be addressed: when a SQL command calls a UDF (User-Defined Function) that Spark Rapids does not support, overall execution slows down and the benefit of the GPU is lost. This post therefore records how to implement and define a Hive UDF.<\/p>\n\n\n<h4>Example: Implementing a Hive UDF<\/h4>\n<p>Simple Tutorial: <a href=\"https:\/\/medium.com\/@jackgoettle23\/building-a-hive-user-defined-function-f6abe92f6e56\">Building a Hive User Defined Function<\/a><\/p>\n\n\n<p>The following article provides an example Hive UDF implementation: <a 
href=\"https:\/\/www.congiu.com\/structured-data-in-hive-a-generic-udf-to-sort-arrays-of-structs\/\">Structured data in Hive: a generic UDF to sort arrays of structs<\/a><\/p>\n\n\n\n<pre class=\"wp-block-aphph-prism-block lang:java language-java\"><code>package com.congiu.udf;\n\nimport java.util.ArrayList;\nimport java.util.Collections;\nimport java.util.Comparator;\nimport java.util.HashMap;\nimport java.util.Map;\nimport org.apache.hadoop.hive.ql.exec.Description;\nimport org.apache.hadoop.hive.ql.exec.UDFArgumentException;\nimport org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;\nimport org.apache.hadoop.hive.ql.metadata.HiveException;\nimport org.apache.hadoop.hive.ql.udf.generic.GenericUDF;\nimport org.apache.hadoop.hive.serde.Constants;\nimport org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;\nimport org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;\nimport static org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector.Category.LIST;\nimport org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils;\nimport org.apache.hadoop.hive.serde2.objectinspector.StructField;\nimport org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;\n\n\/**\n *\n * @author rcongiu\n *\/\n@Description(name = \"array_struct_sort\",\n    value = \"_FUNC_(array(struct1,struct2,...), string myfield) - \" +\n    \"returns the passed array struct, ordered by the given field  \",\n    extended = \"Example:\\n\" +\n    \"  &gt; SELECT _FUNC_(str, 'myfield') FROM src LIMIT 1;\\n\" +\n    \" 'b' \")\npublic class ArrayStructSortUDF extends GenericUDF {\n    protected ObjectInspector[] argumentOIs;\n\n    ListObjectInspector loi;\n    StructObjectInspector elOi;\n\n    \/\/ cache comparators for performance\n    Map &lt; String, Comparator &gt; comparatorCache = new HashMap &lt; String, Comparator &gt; ();\n\n    @Override\n    public ObjectInspector initialize(ObjectInspector[] ois) throws UDFArgumentException {\n        
\/\/ all common initialization\n        argumentOIs = ois;\n\n        \/\/ clear comparator cache from previous invocations\n        comparatorCache.clear();\n\n        return checkAndReadObjectInspectors(ois);\n    }\n\n    \/**\n     * Utility method to check that an object inspector is of the correct type,\n     * and returns its element object inspector\n     * @param ois\n     * @return\n     * @throws UDFArgumentTypeException \n     *\/\n    protected ListObjectInspector checkAndReadObjectInspectors(ObjectInspector[] ois)\n    throws UDFArgumentTypeException, UDFArgumentException {\n        \/\/ check number of arguments. We only accept two,\n        \/\/ the list of struct to sort and the name of the struct field\n        \/\/ to sort by\n        if (ois.length != 2) {\n            throw new UDFArgumentException(\"2 arguments needed, found \" + ois.length);\n        }\n\n        \/\/ first argument must be a list\/array\n        if (!ois[0].getCategory().equals(LIST)) {\n            throw new UDFArgumentTypeException(0, \"Argument 1\" +\n                \" of function \" + this.getClass().getCanonicalName() + \" must be \" + Constants.LIST_TYPE_NAME +\n                \", but \" + ois[0].getTypeName() +\n                \" was found.\");\n        }\n\n        \/\/ a list\/array is read by a LIST object inspector\n        loi = (ListObjectInspector) ois[0];\n\n        \/\/ a list has an element type associated to it\n        \/\/ elements must be structs for this UDF\n        if (loi.getListElementObjectInspector().getCategory() != ObjectInspector.Category.STRUCT) {\n            throw new UDFArgumentTypeException(0, \"Argument 1\" +\n                \" of function \" + this.getClass().getCanonicalName() + \" must be an array of structs \" +\n                \" but is an array of \" + loi.getListElementObjectInspector().getCategory().name());\n        }\n\n        \/\/ store the object inspector for the elements\n        elOi = (StructObjectInspector) 
loi.getListElementObjectInspector();\n\n        \/\/ returns the same object inspector\n        return loi;\n    }\n\n    \/\/ to sort a list , we must supply our comparator\n    public class StructFieldComparator implements Comparator {\n        StructField field;\n\n        public StructFieldComparator(String fieldName) {\n            field = elOi.getStructFieldRef(fieldName);\n        }\n\n        public int compare(Object o1, Object o2) {\n\n            \/\/ ok..so both not null\n            Object f1 = elOi.getStructFieldData(o1, field);\n            Object f2 = elOi.getStructFieldData(o2, field);\n            \/\/ compare using hive's utility functions\n            return ObjectInspectorUtils.compare(f1, field.getFieldObjectInspector(),\n                f2, field.getFieldObjectInspector());\n        }\n    }\n\n    \/\/ factory method for cached comparators\n    Comparator getComparator(String field) {\n        if (!comparatorCache.containsKey(field)) {\n            comparatorCache.put(field, new StructFieldComparator(field));\n        }\n        return comparatorCache.get(field);\n    }\n\n    @Override\n    public Object evaluate(DeferredObject[] dos) throws HiveException {\n        \/\/ get list\n        if (dos == null || dos.length != 2) {\n            throw new HiveException(\"received \" + (dos == null ? \"null\" :\n                Integer.toString(dos.length) + \" elements instead of 2\"));\n        }\n\n        \/\/ each object is supposed to be a struct\n        \/\/ we make a shallow copy of the list. 
We don't want to sort \n        \/\/ the list in place since the object could be used elsewhere in the\n        \/\/ hive query\n        ArrayList al = new ArrayList(loi.getList(dos[0].get()));\n\n        \/\/ sort with our comparator, then return\n        \/\/ note that we could get a different field to sort by for every\n        \/\/ invocation\n        Collections.sort(al, getComparator((String) dos[1].get()));\n\n        return al;\n    }\n\n    @Override\n    public String getDisplayString(String[] children) {\n        return (children == null ? null : this.getClass().getCanonicalName() + \"(\" + children[0] + \",\" + children[1] + \")\");\n    }\n\n}<\/code><\/pre>\n\n\n<h4>Implementing RAPIDS Accelerated User Defined Functions<\/h4>\n<p style=\"text-align: justify;\">The Hive UDF above only lets us run a custom UDF on the Spark Thrift Server; it still cannot be moved to the GPU for execution. To make that possible, the class must also implement the RapidsUDF interface and provide an evaluateColumnar method. Interested readers can consult <a href=\"https:\/\/github.com\/NVIDIA\/spark-rapids-examples\/tree\/main\/examples\/UDF-Examples\/RAPIDS-accelerated-UDFs\/src\/main\/java\/com\/nvidia\/spark\/rapids\/udf\/hive\">the Spark Rapids tutorial examples<\/a>, which show four Hive UDFs: DecimalFraction, StringWordCount, URLDecode and URLEncode. DecimalFraction is reproduced below:<\/p>\n\n\n<pre class=\"wp-block-aphph-prism-block lang:java language-java\"><code>\/*\n * Copyright (c) 2021-2022, NVIDIA CORPORATION.\n *\n * Licensed under the Apache License, Version 2.0 (the \"License\");\n * you may 
not use this file except in compliance with the License.\n * You may obtain a copy of the License at\n *\n *     http:\/\/www.apache.org\/licenses\/LICENSE-2.0\n *\n * Unless required by applicable law or agreed to in writing, software\n * distributed under the License is distributed on an \"AS IS\" BASIS,\n * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n * See the License for the specific language governing permissions and\n * limitations under the License.\n *\/\n\npackage com.nvidia.spark.rapids.udf.hive;\n\nimport ai.rapids.cudf.ColumnVector;\nimport ai.rapids.cudf.Scalar;\nimport com.nvidia.spark.RapidsUDF;\nimport org.apache.hadoop.hive.common.type.HiveDecimal;\nimport org.apache.hadoop.hive.ql.exec.UDFArgumentException;\nimport org.apache.hadoop.hive.ql.metadata.HiveException;\nimport org.apache.hadoop.hive.ql.udf.generic.GenericUDF;\nimport org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;\nimport org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;\nimport org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;\nimport org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;\nimport org.apache.hadoop.hive.serde2.typeinfo.DecimalTypeInfo;\n\nimport java.math.BigDecimal;\n\n\n\/**\n * A simple HiveGenericUDF demo for DecimalType, which extracts and returns\n * the fraction part of the input Decimal data. 
So, the output data has the\n * same precision and scale as the input one.\n *\/\npublic class DecimalFraction extends GenericUDF implements RapidsUDF {\n  private transient PrimitiveObjectInspector inputOI;\n\n  @Override\n  public String getDisplayString(String[] strings) {\n    return getStandardDisplayString(\"DecimalFraction\", strings);\n  }\n\n  @Override\n  public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {\n    if (arguments.length != 1) {\n      throw new UDFArgumentException(\"One argument is supported, found: \" + arguments.length);\n    }\n    if (!(arguments[0] instanceof PrimitiveObjectInspector)) {\n      throw new UDFArgumentException(\"Unsupported argument type: \" + arguments[0].getTypeName());\n    }\n\n    inputOI = (PrimitiveObjectInspector) arguments[0];\n    if (inputOI.getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.DECIMAL) {\n      throw new UDFArgumentException(\"Unsupported primitive type: \" + inputOI.getPrimitiveCategory());\n    }\n\n    DecimalTypeInfo inputTypeInfo = (DecimalTypeInfo) inputOI.getTypeInfo();\n\n    return PrimitiveObjectInspectorFactory.getPrimitiveWritableObjectInspector(inputTypeInfo);\n  }\n\n  @Override\n  public Object evaluate(GenericUDF.DeferredObject[] arguments) throws HiveException {\n    if (arguments[0] == null || arguments[0].get() == null) {\n      return null;\n    }\n\n    Object input = arguments[0].get();\n    HiveDecimalWritable decimalWritable = (HiveDecimalWritable) inputOI.getPrimitiveWritableObject(input);\n    BigDecimal decimalInput = decimalWritable.getHiveDecimal().bigDecimalValue();\n    BigDecimal decimalResult = decimalInput.subtract(new BigDecimal(decimalInput.toBigInteger()));\n    HiveDecimalWritable result = new HiveDecimalWritable(decimalWritable);\n    result.set(HiveDecimal.create(decimalResult));\n\n    return result;\n  }\n\n  @Override\n  public ColumnVector evaluateColumnar(int numRows, ColumnVector... 
args) {\n    if (args.length != 1) {\n      throw new IllegalArgumentException(\"Unexpected argument count: \" + args.length);\n    }\n    ColumnVector input = args[0];\n    if (numRows != input.getRowCount()) {\n      throw new IllegalArgumentException(\"Expected \" + numRows + \" rows, received \" + input.getRowCount());\n    }\n    if (!input.getType().isDecimalType()) {\n      throw new IllegalArgumentException(\"Argument type is not a decimal column: \" +\n          input.getType());\n    }\n\n    try (Scalar nullScalar = Scalar.fromNull(input.getType());\n         ColumnVector nullPredicate = input.isNull();\n         ColumnVector integral = input.floor();\n         ColumnVector fraction = input.sub(integral, input.getType())) {\n      return nullPredicate.ifElse(nullScalar, fraction);\n    }\n  }\n}<\/code><\/pre>\n\n\n<p style=\"text-align: justify;\">In essence, evaluateColumnar takes the input ColumnVector, transforms it with the functions NVIDIA provides in the cuDF Java API (<a href=\"https:\/\/docs.rapids.ai\/api\/cudf-java\/stable\/ai\/rapids\/cudf\/columnview\">https:\/\/docs.rapids.ai\/api\/cudf-java\/stable\/ai\/rapids\/cudf\/columnview<\/a>), and returns the resulting ColumnVector. These functions are C++ code exposed to Java through JNI, which lets Spark Rapids call the underlying CUDA code from Java. The figure below (<a href=\"https:\/\/www.slideshare.net\/databricks\/deep-dive-into-gpu-support-in-apache-spark-3x\">from the slides of a 2020 Spark + AI Summit talk<\/a>) shows the overall technology stack behind RAPIDS ETL.<\/p>\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"549\" 
src=\"https:\/\/myoceane.fr\/wp-content\/uploads\/2023\/12\/SparkRapidsETL-1024x549.png\" alt=\"\" class=\"wp-image-9557\" srcset=\"https:\/\/myoceane.fr\/wp-content\/uploads\/2023\/12\/SparkRapidsETL-1024x549.png 1024w, https:\/\/myoceane.fr\/wp-content\/uploads\/2023\/12\/SparkRapidsETL-300x161.png 300w, https:\/\/myoceane.fr\/wp-content\/uploads\/2023\/12\/SparkRapidsETL-768x412.png 768w, https:\/\/myoceane.fr\/wp-content\/uploads\/2023\/12\/SparkRapidsETL-1536x823.png 1536w, https:\/\/myoceane.fr\/wp-content\/uploads\/2023\/12\/SparkRapidsETL-2048x1098.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n<p>The set of ColumnView transformations that NVIDIA provides is ultimately limited, so users with more specialized needs have to write their own code against the cuDF C++ API. For reference, the link below points to the C++ code behind the JNI Java functions that NVIDIA already ships.<\/p>\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>ColumnViewJni as implemented in cuDF<\/p>\n<cite><a href=\"https:\/\/github.com\/rapidsai\/cudf\/blob\/branch-24.02\/java\/src\/main\/native\/src\/ColumnViewJni.cpp\">https:\/\/github.com\/rapidsai\/cudf\/blob\/branch-24.02\/java\/src\/main\/native\/src\/ColumnViewJni.cpp<\/a><\/cite><\/blockquote>\n\n\n<h4>List All Existing Functions<\/h4>\n\n\n<pre class=\"wp-block-aphph-prism-block lang:sql language-sql\"><code>SHOW FUNCTIONS<\/code><\/pre>\n\n\n<h4>Register a UDF in the Hive Metastore. Reference: <a href=\"https:\/\/docs.cloudera.com\/documentation\/enterprise\/latest\/topics\/cm_mc_hive_udf.html#concept_un3_yrm_2r\">Managing Apache Hive 
User-defined Functions<\/a><\/h4>\n\n\n<pre class=\"wp-block-aphph-prism-block lang:sql language-sql\" data-line=\"0\"><code>CREATE FUNCTION &lt;function_name&gt; AS '&lt;fully_qualified_class_name&gt;' USING JAR 'hdfs:\/\/\/&lt;path\/to\/jar\/in\/hdfs&gt;'<\/code><\/pre>\n\n\n<h4>Remove an Existing UDF from the Hive Metastore<\/h4>\n\n\n<pre class=\"wp-block-aphph-prism-block lang:sql language-sql\"><code>DROP FUNCTION &lt;function_name&gt;<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>The previous post showed how to use Spark Rapids to accelerate SQL execution on the GPU. We ran into several problems and solved them one by one, and in the end we enabled Spark Rapids on the Spark Thrift Server and used pyHive to send SQL requests to the Spark cluster. To make fuller use of the GPU, one more issue has to be addressed: when a SQL command calls a UDF (User-Defined Function) that Spark Rapids does not support, overall execution slows down and the benefit of the GPU is lost. This post therefore records how to implement and define a Hive UDF.<\/p>\n","protected":false},"author":1,"featured_media":8700,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9,14],"tags":[1758,1757],"class_list":["post-9480","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-ml","category-it-technology","tag-hive-udf","tag-spark-rapids"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v24.6 - 
https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>[Spark] Define and Register Hive UDF with Spark Rapids - \u60f3\u65b9\u6d89\u6cd5 - \u91cf\u74f6\u5916\u7684\u5929\u7a7a M-Y-Oceane<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"[Spark] Define and Register Hive UDF with Spark Rapids - \u60f3\u65b9\u6d89\u6cd5 - \u91cf\u74f6\u5916\u7684\u5929\u7a7a M-Y-Oceane\" \/>\n<meta property=\"og:description\" content=\"The previous post showed how to use Spark Rapids to accelerate SQL execution on the GPU. We ran into several problems and solved them one by one, and in the end we enabled Spark Rapids on the Spark Thrift Server and used pyHive to send SQL requests to the Spark cluster. To make fuller use of the GPU, one more issue has to be addressed: when a SQL command calls a UDF (User-Defined Function) that Spark Rapids does not support, overall execution slows down and the benefit of the GPU is lost. This post therefore records how to implement and define a Hive UDF.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/\" \/>\n<meta property=\"og:site_name\" content=\"\u60f3\u65b9\u6d89\u6cd5 - 
\u91cf\u74f6\u5916\u7684\u5929\u7a7a M-Y-Oceane\" \/>\n<meta property=\"article:published_time\" content=\"2023-12-24T07:53:52+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-12-28T08:10:07+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/myoceane.fr\/wp-content\/uploads\/2022\/08\/RAPIDSSpark.png\" \/>\n\t<meta property=\"og:image:width\" content=\"936\" \/>\n\t<meta property=\"og:image:height\" content=\"248\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"\u6ab8\u6aac\u7238\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"\u6ab8\u6aac\u7238\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/\"},\"author\":{\"name\":\"\u6ab8\u6aac\u7238\",\"@id\":\"https:\/\/myoceane.fr\/#\/schema\/person\/4a4552fb8c27693083d465e12db7658b\"},\"headline\":\"[Spark] Define and Register Hive UDF with Spark 
Rapids\",\"datePublished\":\"2023-12-24T07:53:52+00:00\",\"dateModified\":\"2023-12-28T08:10:07+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/\"},\"wordCount\":143,\"commentCount\":3,\"publisher\":{\"@id\":\"https:\/\/myoceane.fr\/#\/schema\/person\/4a4552fb8c27693083d465e12db7658b\"},\"image\":{\"@id\":\"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/myoceane.fr\/wp-content\/uploads\/2022\/08\/RAPIDSSpark.png\",\"keywords\":[\"Hive UDF\",\"Spark Rapids\"],\"articleSection\":[\"Big Data &amp; Machine Learning\",\"IT Technology\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/\",\"url\":\"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/\",\"name\":\"[Spark] Define and Register Hive UDF with Spark Rapids - \u60f3\u65b9\u6d89\u6cd5 - \u91cf\u74f6\u5916\u7684\u5929\u7a7a 
M-Y-Oceane\",\"isPartOf\":{\"@id\":\"https:\/\/myoceane.fr\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/myoceane.fr\/wp-content\/uploads\/2022\/08\/RAPIDSSpark.png\",\"datePublished\":\"2023-12-24T07:53:52+00:00\",\"dateModified\":\"2023-12-28T08:10:07+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/#primaryimage\",\"url\":\"https:\/\/myoceane.fr\/wp-content\/uploads\/2022\/08\/RAPIDSSpark.png\",\"contentUrl\":\"https:\/\/myoceane.fr\/wp-content\/uploads\/2022\/08\/RAPIDSSpark.png\",\"width\":936,\"height\":248},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/myoceane.fr\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"[Spark] Define and Register Hive UDF with Spark Rapids\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/myoceane.fr\/#website\",\"url\":\"https:\/\/myoceane.fr\/\",\"name\":\"M-Y-Oceane \u60f3\u65b9\u6d89\u6cd5\u3002\u91cf\u74f6\u5916\u7684\u5929\u7a7a\",\"description\":\"\u60f3\u65b9\u6d89\u6cd5, France, Taiwan, Health, Information 
Technology\",\"publisher\":{\"@id\":\"https:\/\/myoceane.fr\/#\/schema\/person\/4a4552fb8c27693083d465e12db7658b\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/myoceane.fr\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\/\/myoceane.fr\/#\/schema\/person\/4a4552fb8c27693083d465e12db7658b\",\"name\":\"\u6ab8\u6aac\u7238\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/myoceane.fr\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/6cc678684664f8ad45a8d56a6630b183?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/6cc678684664f8ad45a8d56a6630b183?s=96&d=mm&r=g\",\"caption\":\"\u6ab8\u6aac\u7238\"},\"logo\":{\"@id\":\"https:\/\/myoceane.fr\/#\/schema\/person\/image\/\"},\"url\":\"https:\/\/myoceane.fr\/index.php\/author\/johnny5584767gmail-com\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"[Spark] Define and Register Hive UDF with Spark Rapids - \u60f3\u65b9\u6d89\u6cd5 - \u91cf\u74f6\u5916\u7684\u5929\u7a7a M-Y-Oceane","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/","og_locale":"en_US","og_type":"article","og_title":"[Spark] Define and Register Hive UDF with Spark Rapids - \u60f3\u65b9\u6d89\u6cd5 - \u91cf\u74f6\u5916\u7684\u5929\u7a7a M-Y-Oceane","og_description":"The previous post showed how to use Spark Rapids to accelerate SQL execution on the GPU. We ran into several problems and solved them one by one, and in the end we enabled Spark Rapids on the Spark Thrift Server and used pyHive to send SQL requests to the Spark cluster. To make fuller use of the GPU, one more issue has to be addressed: when a SQL command calls a UDF (User-Defined Function) that Spark Rapids does not support, overall execution slows down and the benefit of the GPU is lost. This post therefore records how to implement and define a Hive UDF.","og_url":"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/","og_site_name":"\u60f3\u65b9\u6d89\u6cd5 - \u91cf\u74f6\u5916\u7684\u5929\u7a7a 
M-Y-Oceane","article_published_time":"2023-12-24T07:53:52+00:00","article_modified_time":"2023-12-28T08:10:07+00:00","og_image":[{"width":936,"height":248,"url":"https:\/\/myoceane.fr\/wp-content\/uploads\/2022\/08\/RAPIDSSpark.png","type":"image\/png"}],"author":"\u6ab8\u6aac\u7238","twitter_card":"summary_large_image","twitter_misc":{"Written by":"\u6ab8\u6aac\u7238","Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/#article","isPartOf":{"@id":"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/"},"author":{"name":"\u6ab8\u6aac\u7238","@id":"https:\/\/myoceane.fr\/#\/schema\/person\/4a4552fb8c27693083d465e12db7658b"},"headline":"[Spark] Define and Register Hive UDF with Spark Rapids","datePublished":"2023-12-24T07:53:52+00:00","dateModified":"2023-12-28T08:10:07+00:00","mainEntityOfPage":{"@id":"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/"},"wordCount":143,"commentCount":3,"publisher":{"@id":"https:\/\/myoceane.fr\/#\/schema\/person\/4a4552fb8c27693083d465e12db7658b"},"image":{"@id":"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/#primaryimage"},"thumbnailUrl":"https:\/\/myoceane.fr\/wp-content\/uploads\/2022\/08\/RAPIDSSpark.png","keywords":["Hive UDF","Spark Rapids"],"articleSection":["Big Data &amp; Machine Learning","IT Technology"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/","url":"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/","name":"[Spark] Define and Register Hive UDF with Spark 
Rapids - \u60f3\u65b9\u6d89\u6cd5 - \u91cf\u74f6\u5916\u7684\u5929\u7a7a M-Y-Oceane","isPartOf":{"@id":"https:\/\/myoceane.fr\/#website"},"primaryImageOfPage":{"@id":"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/#primaryimage"},"image":{"@id":"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/#primaryimage"},"thumbnailUrl":"https:\/\/myoceane.fr\/wp-content\/uploads\/2022\/08\/RAPIDSSpark.png","datePublished":"2023-12-24T07:53:52+00:00","dateModified":"2023-12-28T08:10:07+00:00","breadcrumb":{"@id":"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/#primaryimage","url":"https:\/\/myoceane.fr\/wp-content\/uploads\/2022\/08\/RAPIDSSpark.png","contentUrl":"https:\/\/myoceane.fr\/wp-content\/uploads\/2022\/08\/RAPIDSSpark.png","width":936,"height":248},{"@type":"BreadcrumbList","@id":"https:\/\/myoceane.fr\/index.php\/spark-define-and-register-hive-udf-with-spark-rapids\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/myoceane.fr\/"},{"@type":"ListItem","position":2,"name":"[Spark] Define and Register Hive UDF with Spark Rapids"}]},{"@type":"WebSite","@id":"https:\/\/myoceane.fr\/#website","url":"https:\/\/myoceane.fr\/","name":"M-Y-Oceane \u60f3\u65b9\u6d89\u6cd5\u3002\u91cf\u74f6\u5916\u7684\u5929\u7a7a","description":"\u60f3\u65b9\u6d89\u6cd5, France, Taiwan, Health, Information 
Technology","publisher":{"@id":"https:\/\/myoceane.fr\/#\/schema\/person\/4a4552fb8c27693083d465e12db7658b"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/myoceane.fr\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/myoceane.fr\/#\/schema\/person\/4a4552fb8c27693083d465e12db7658b","name":"\u6ab8\u6aac\u7238","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/myoceane.fr\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/6cc678684664f8ad45a8d56a6630b183?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/6cc678684664f8ad45a8d56a6630b183?s=96&d=mm&r=g","caption":"\u6ab8\u6aac\u7238"},"logo":{"@id":"https:\/\/myoceane.fr\/#\/schema\/person\/image\/"},"url":"https:\/\/myoceane.fr\/index.php\/author\/johnny5584767gmail-com\/"}]}},"amp_enabled":false,"_links":{"self":[{"href":"https:\/\/myoceane.fr\/index.php\/wp-json\/wp\/v2\/posts\/9480","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/myoceane.fr\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/myoceane.fr\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/myoceane.fr\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/myoceane.fr\/index.php\/wp-json\/wp\/v2\/comments?post=9480"}],"version-history":[{"count":32,"href":"https:\/\/myoceane.fr\/index.php\/wp-json\/wp\/v2\/posts\/9480\/revisions"}],"predecessor-version":[{"id":9563,"href":"https:\/\/myoceane.fr\/index.php\/wp-json\/wp\/v2\/posts\/9480\/revisions\/9563"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/myoceane.fr\/index.php\/wp-json\/wp\/v2\/media\/8700"}],"wp:attachment":[{"href":"https:\/\/myoceane.fr\/index.php\/wp-json\/wp\/v2\/media?parent=9480"}],"wp:term":[{"taxon
omy":"category","embeddable":true,"href":"https:\/\/myoceane.fr\/index.php\/wp-json\/wp\/v2\/categories?post=9480"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/myoceane.fr\/index.php\/wp-json\/wp\/v2\/tags?post=9480"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}