[Big Data] Setting Up a Local Test Environment for Big Data

Posted on 2019-08-21 by 檸檬爸


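When developing Spark big data programs, you almost always need to test them, but the corresponding service clusters (HDFS, Hive, HBase, and so on) may not have been set up yet, which makes development difficult. HDFS is the easier case: through the FileSystem API, the local disk can to some extent stand in for HDFS when testing how it interacts with a Java program. For databases such as Hive and HBase, however, a way to test your program without a real cluster becomes essential.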

This post introduces a third-party library that helps with unit testing Java programs:

Hadoop-Mini-Cluster

First, create a SparkTestSuite that provides the local test environment and starts a Hadoop cluster inside it, as in the following code:

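package myoceane.testing;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hive.conf.HiveConf;
import org.junit.BeforeClass;

import com.github.sakserv.minicluster.impl.HdfsLocalCluster;

public abstract class SparkTestSuite {

	protected static HdfsLocalCluster hdfsLocalCluster = null;
	protected static FileSystem fsContext;
	protected static String tempDir;
	// Fields assigned by configureHadoop() and startHiveAndMetaStore()
	protected static String command;
	protected static Integer hivePort;
	protected static HiveConf hiveConf;
	
	@BeforeClass
	public static void beforeAll() throws Exception {
		// Set the Hadoop system properties before any cluster starts
		configureHadoop();
		hdfsLocalCluster = new HdfsLocalCluster.Builder()
			.setHdfsNamenodePort(12345)
			.setHdfsNamenodeHttpPort(12341)
			.setHdfsTempDir("target/embedded_hdfs")
			.setHdfsNumDatanodes(1)
			.setHdfsEnablePermissions(false)
			.setHdfsFormat(true)
			.setHdfsEnableRunningUserAsProxyUser(true)
			.setHdfsConfig(new Configuration())
			.build();
		hdfsLocalCluster.start();
		// Handle to the embedded HDFS, shared by all tests
		fsContext = hdfsLocalCluster.getHdfsFileSystemHandle();
		startHiveAndMetaStore();
	}
}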

Once you have fsContext, you can perform HDFS operations, for example:

fsContext.mkdirs(new Path(path));
fsContext.copyFromLocalFile(new Path(file), new Path(inHdfs));

configureHadoop() mainly does the following. It first checks the runtime environment: on Windows, hadoop.dll, hdfs.dll, libwinutils.lib, and winutils.exe must be loaded, and the HDFS cluster only starts successfully once these settings are in place.

// Needs java.io.File and SystemUtils from Apache Commons Lang.
private static void configureHadoop() throws Exception {
	if (SystemUtils.IS_OS_WINDOWS) {
		tempDir = new File("target/").getAbsolutePath() + "/";
		// saveAndLoadResource is a helper of this suite (not shown here)
		saveAndLoadResource(new File(tempDir + "hadoop"), "lib/hadoop.dll", true, true);
		saveAndLoadResource(new File(tempDir + "hadoop"), "lib/hdfs.dll", true, true);
		saveAndLoadResource(new File(tempDir + "hadoop"), "lib/libwinutils.lib", true, false);
		saveAndLoadResource(new File(tempDir + "hadoop"), "bin/winutils.exe", true, false);
	} else {
		tempDir = "/tmp/";
		// Only stored in the suite's command field; this method does not run it
		command = "chmod 777 /tmp/hive";
	}

	// Point Hadoop at the directory prepared above
	String hadoopHome = tempDir + "hadoop";
	System.setProperty("HADOOP_HOME", hadoopHome);
	System.setProperty("hadoop.home.dir", hadoopHome);
	System.setProperty("spark.testing", "true");
}
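
configureHadoop() relies on a saveAndLoadResource helper that the post does not show. A minimal sketch of what such a helper might look like, using only the JDK, is given below. The class name ResourceExtractor and the meaning of the two boolean flags (overwrite an existing copy; load the file as a native library) are assumptions inferred from the call sites, not the author's actual implementation.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class ResourceExtractor {

    /**
     * Copies a classpath resource (for example "bin/winutils.exe") under
     * targetDir, keeping its relative path, and optionally loads the copy
     * as a native library. Both flag meanings are assumptions.
     */
    public static File saveAndLoadResource(File targetDir, String resource,
                                           boolean overwrite, boolean load) throws IOException {
        File out = new File(targetDir, resource);
        if (overwrite || !out.exists()) {
            try (InputStream in = ResourceExtractor.class.getClassLoader()
                    .getResourceAsStream(resource)) {
                if (in == null) {
                    throw new IOException("Resource not found on classpath: " + resource);
                }
                // Create e.g. <targetDir>/bin before writing the file into it
                Files.createDirectories(out.getParentFile().toPath());
                Files.copy(in, out.toPath(), StandardCopyOption.REPLACE_EXISTING);
            }
        }
        if (load) {
            // System.load requires an absolute path to the native library
            System.load(out.getAbsolutePath());
        }
        return out;
    }
}
```

With a helper of this shape, the .dll files are extracted and loaded into the JVM, while winutils.exe and libwinutils.lib are only extracted so that Hadoop can find them under hadoop.home.dir.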

For configuring and starting the Hive MetaStore and warehouse, see startHiveAndMetaStore(). The example below puts the Hive metastore on port 12356, while HDFS is on port 12345.

private static void startHiveAndMetaStore() {

	hivePort = 12356;
	String hostName = "localhost";
	String hiveScratchDir = "hive";
	// Use "/" rather than File.separator: these values end up in HDFS URIs
	String hiveWareHouseDir = "hive/warehouse";
	String hiveMetastoreDir = "hive/metastore_db";
	String hiveUrls = "thrift://" + hostName + ":" + hivePort;
	
	System.setProperty("hive.exec.scratchdir", "hdfs://127.0.0.1:12345/" + hiveScratchDir);
	System.setProperty("hive.warehouse.dir", "hdfs://127.0.0.1:12345/" + hiveWareHouseDir);
	System.setProperty("hive.metastore.derby.db.dir", "hdfs://127.0.0.1:12345/" + hiveMetastoreDir);
	System.setProperty("hive.metastore.hostname", hostName);
	System.setProperty("hive.metastore.port", hivePort.toString());
	System.setProperty("hive.metastore.urls", hiveUrls);
	
	hiveConf = new HiveConf();
	hiveConf.setVar(HiveConf.ConfVars.METASTOREURLS, hiveUrls);
	hiveConf.setVar(HiveConf.ConfVars.HADOOPFS, "hdfs://127.0.0.1:12345/");
	hiveConf.setVar(HiveConf.ConfVars.SCRATCHDIR, "hdfs://127.0.0.1:12345/" + hiveScratchDir);
	// Derby attributes are separated by ";", so ";create=true" needs its own separator
	hiveConf.setVar(HiveConf.ConfVars.METASTORECONNECTURLKEY, "jdbc:derby:;databaseName=" + "target/embedded_hdfs/" + hiveMetastoreDir + ";create=true");
	hiveConf.setVar(HiveConf.ConfVars.METASTOREWAREHOUSE, "hdfs://127.0.0.1:12345/" + hiveWareHouseDir);
	hiveConf.setBoolVar(HiveConf.ConfVars.HIVE_IN_TEST, true);
	hiveConf.set("datanucleus.schema.autoCreateTables", "true");
	hiveConf.set("hive.metastore.schema.verification", "false");
	
	// StartHiveLocalMetaStore is a Runnable that runs the metastore (not shown here)
	StartHiveLocalMetaStore startHiveLocalMetaStore = new StartHiveLocalMetaStore();
	startHiveLocalMetaStore.setHiveConf(hiveConf);
	startHiveLocalMetaStore.setHivePort(hivePort);
	// Daemon thread, so the metastore does not keep the test JVM alive
	Thread t = new Thread(startHiveLocalMetaStore);
	t.setDaemon(true);
	t.start();
}
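
Because the metastore is started on a daemon thread, startHiveAndMetaStore() can return before the Thrift port is actually accepting connections. One way to keep tests from racing the startup is to poll the port first. The sketch below is only an illustration under stated assumptions: PortWaiter and waitForPort are hypothetical names that are not part of the original suite, and the connect timeout and retry interval are arbitrary.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortWaiter {

    /**
     * Repeatedly tries to open a TCP connection to host:port.
     * Returns true as soon as a connection succeeds, or false if
     * no attempt succeeded before timeoutMillis elapsed.
     */
    public static boolean waitForPort(String host, int port, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            try (Socket socket = new Socket()) {
                // 500 ms connect timeout per attempt (arbitrary choice)
                socket.connect(new InetSocketAddress(host, port), 500);
                return true;
            } catch (IOException e) {
                // Not listening yet: pause briefly, then retry
                Thread.sleep(200);
            }
        }
        return false;
    }
}
```

For instance, a call such as PortWaiter.waitForPort("localhost", 12356, 30000) at the end of beforeAll() would block until the metastore port from the example above is reachable, or return false after 30 seconds.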

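At this point SparkTestSuite is more or less complete. Every unit test written afterwards can extend SparkTestSuite to use the temporary HDFS and Hive instances created for testing. The library also supports spinning up other services for tests, such as HBase, Oozie, and Kafka.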

Note: besides hadoop-mini-clusters, another library commonly used to test services in the Big Data ecosystem is Hadoop-Unit; see the following link:

Hadoop Unit library
