[BioI] Cromwell Study Notes

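Cromwell is a workflow engine for running workflows written in WDL (Workflow Description Language). It is an open-source project developed by the Broad Institute. Its main strength is that it can use different cloud platforms, such as Amazon's AWS or Google Cloud, as the backend that runs WDL workflow scripts. AWS also has a video introducing Cromwell. This post records some basic knowledge about Cromwell.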

The Cromwell project is hosted on GitHub. To build it, clone the repository and run:

git clone https://github.com/broadinstitute/cromwell.git
cd cromwell
sbt assembly

This uses SBT (Simple Build Tool) to compile and package Cromwell. If the build finishes successfully, the commands above produce a server jar file, as in the listing below:

user:scala-2.12 user$ ls -l
total 458768
drwxr-xr-x  6 user  staff        192  9 24 10:15 classes
-rw-r--r--  1 user  staff  234582613  9 24 10:17 cromwell-54-a3242c6-SNAP.jar
drwxr-xr-x  3 user  staff         96  9 24 10:11 resource_managed

Note: generated jar files accumulate across builds. To remove them, use:

sbt clean cleanFiles

Starting the Cromwell Swagger API

Once you have cromwell-XXX.jar, the following command starts Cromwell in server mode, which serves the Swagger interface for its REST API.

java -Dwebservice.port=8080 \
     -jar cromwell-54-a3242c6-SNAP.jar \
     server

Log output:

2020-09-24 10:59:07,981  INFO  - Running with database db.url = jdbc:hsqldb:mem:cfaf72e7-f82e-4703-a693-6d46fb6760ef;shutdown=false;hsqldb.tx=mvcc
2020-09-24 10:59:14,934  INFO  - Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
2020-09-24 10:59:14,945  INFO  - [RenameWorkflowOptionsInMetadata] 100%
2020-09-24 10:59:15,037  INFO  - Running with database db.url = jdbc:hsqldb:mem:4c88b2a2-928a-47da-84bb-0caed4b5b91b;shutdown=false;hsqldb.tx=mvcc
2020-09-24 10:59:15,455  INFO  - Slf4jLogger started
2020-09-24 10:59:15,727 cromwell-system-akka.dispatchers.engine-dispatcher-6 INFO  - Workflow heartbeat configuration:
{
  "cromwellId" : "cromid-3543c76",
  "heartbeatInterval" : "2 minutes",
  "ttl" : "10 minutes",
  "failureShutdownDuration" : "5 minutes",
  "writeBatchSize" : 10000,
  "writeThreshold" : 10000
}
2020-09-24 10:59:15,774 cromwell-system-akka.dispatchers.service-dispatcher-12 INFO  - Metadata summary refreshing every 1 second.
2020-09-24 10:59:15,784 cromwell-system-akka.actor.default-dispatcher-13 INFO  - KvWriteActor configured to flush with batch size 200 and process rate 5 seconds.
2020-09-24 10:59:15,784 cromwell-system-akka.dispatchers.service-dispatcher-8 INFO  - WriteMetadataActor configured to flush with batch size 200 and process rate 5 seconds.
2020-09-24 10:59:15,789 cromwell-system-akka.dispatchers.engine-dispatcher-30 INFO  - JobStoreWriterActor configured to flush with batch size 1000 and process rate 1 second.
2020-09-24 10:59:15,800 cromwell-system-akka.dispatchers.engine-dispatcher-30 INFO  - CallCacheWriteActor configured to flush with batch size 100 and process rate 3 seconds.
2020-09-24 10:59:15,800  WARN  - 'docker.hash-lookup.gcr-api-queries-per-100-seconds' is being deprecated, use 'docker.hash-lookup.gcr.throttle' instead (see reference.conf)
2020-09-24 10:59:16,090 cromwell-system-akka.dispatchers.engine-dispatcher-30 INFO  - JobExecutionTokenDispenser - Distribution rate: 50 per 1 seconds.
2020-09-24 10:59:16,999 cromwell-system-akka.dispatchers.engine-dispatcher-6 INFO  - Cromwell 54-a3242c6-SNAP service started on 0:0:0:0:0:0:0:0:8080...
2020-09-24 10:59:21,106 cromwell-system-akka.dispatchers.engine-dispatcher-15 INFO  - Not triggering log of token queue status. Effective log interval = None
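
The log above reports the service listening on port 8080. A quick way to confirm that the server is answering is to query its engine endpoints. The sketch below assumes the server runs on localhost with the port used above; the helper names `engine_url`, `cromwell_version`, and `cromwell_status` are made up for illustration, and the `/engine/v1/version` and `/engine/v1/status` paths are taken from the Cromwell REST API as I understand it, so check them against the Swagger page of your build.

```shell
# Base URL of the Cromwell server (assumption: localhost, port 8080 as above).
CROMWELL_URL="${CROMWELL_URL:-http://localhost:8080}"

# Print the URL of an engine-level endpoint such as "version" or "status".
engine_url() {
  printf '%s/engine/v1/%s\n' "$CROMWELL_URL" "$1"
}

# Ask the running server for its version; the reply is JSON.
cromwell_version() {
  curl -s --header "Accept: application/json" "$(engine_url version)"
}

# Ask the running server for its engine status; the reply is JSON.
cromwell_status() {
  curl -s --header "Accept: application/json" "$(engine_url status)"
}

# Usage, once the server is up:
#   cromwell_version
#   cromwell_status
```

If `cromwell_version` prints nothing, the server is most likely still starting or is listening on a different port.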

The Cromwell Server REST API page after startup.

Running WDL (Workflow Description Language) Directly

We can also run a WDL script directly from the jar file, combined with different cloud platforms to dispatch the workflow's jobs.

# $HOME is used instead of ~, which the shell does not expand after "-Dconfig.file=".
java -Dconfig.file="$HOME/cromwell/cromwell.example.backends/AWS.conf" \
     -jar cromwell-54-1637396-SNAP.jar \
     run ~/cromwell_wdl/hello_aws.wdl

Note: the above is an AWS example.
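
The contents of hello_aws.wdl are not shown in these notes. As a rough idea of what a minimal workflow of this kind could look like, the sketch below writes a hypothetical WDL 1.0 file. The file name `hello_sketch.wdl`, the task name `say_hello`, the workflow name `hello`, and the `ubuntu:20.04` image are all assumptions for illustration, not the original script.

```shell
# Write a minimal, hypothetical WDL 1.0 workflow to hello_sketch.wdl.
# The quoted 'EOF' keeps the shell from touching ~{name} and <<< >>>.
cat > hello_sketch.wdl <<'EOF'
version 1.0

task say_hello {
  input {
    String name
  }
  command <<<
    echo "Hello, ~{name}!"
  >>>
  output {
    String greeting = read_string(stdout())
  }
  runtime {
    # Assumed image; container-based cloud backends need one to be set.
    docker: "ubuntu:20.04"
  }
}

workflow hello {
  input {
    String name = "Cromwell"
  }
  call say_hello { input: name = name }
  output {
    String greeting = say_hello.greeting
  }
}
EOF
```

A file like this could then be passed to the `run` command above in place of hello_aws.wdl.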

Submitting a WDL File with curl

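The server above exposes its REST API through a Swagger interface. Besides submitting workflows through the web page, users can also submit them over HTTP with curl. A curl example follows. The Swagger interface also shows the details of the corresponding command, and the official documentation covers this as well.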

curl -X POST --header "Accept: application/json" \
    -v "localhost:8080/api/workflows/v1" \
    -F workflowSource=@aws.wdl \
    -F workflowInputs=@aws.inputs

Note: screenshot of getting the corresponding curl command from Swagger.
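
The POST request replies with a JSON object containing the workflow's id and an initial status. As a sketch of the follow-up step, the helpers below poll that workflow through the per-workflow `status` and `outputs` endpoints. The function names `workflow_url`, `workflow_status`, and `workflow_outputs` are made up for illustration, and the server address is assumed to be the localhost:8080 used above.

```shell
# Base URL of the Cromwell server (assumption: localhost, port 8080 as above).
CROMWELL_URL="${CROMWELL_URL:-http://localhost:8080}"

# Print the URL of a per-workflow endpoint such as "status" or "outputs".
# $1 = workflow id, $2 = endpoint name.
workflow_url() {
  printf '%s/api/workflows/v1/%s/%s\n' "$CROMWELL_URL" "$1" "$2"
}

# Fetch the current status of a workflow as JSON.
workflow_status() {
  curl -s -X GET --header "Accept: application/json" "$(workflow_url "$1" status)"
}

# Fetch the outputs of a finished workflow as JSON.
workflow_outputs() {
  curl -s -X GET --header "Accept: application/json" "$(workflow_url "$1" outputs)"
}

# Usage, with the id returned by the POST above:
#   workflow_status  <workflow-id>
#   workflow_outputs <workflow-id>
```

Keeping the URL construction in one small function makes it easy to point the same helpers at a remote server by exporting `CROMWELL_URL` first.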