Cloudera Data Engineering (CDE) on ECS
This article explains how to deploy the CDE service on the ECS platform once the CDP PvC Data Services Management Console has been configured successfully.
CDE Deployment
In CM, navigate to Data Services and click Open CDP Private Cloud Data Services. In the page that opens, click Data Engineering.
At the CDE main portal, enable the CDE service: click Enable CDE Service, fill in the required fields, and click Enable.
Next, create a new virtual cluster: click Create DE Cluster, fill in the required fields, and click Create.
The virtual CDE cluster is now ready to run Spark and Airflow jobs.
Register the user credential as a Kubernetes secret object in the CDE virtual cluster namespace.
# ./cdp-cde-utils.sh init-user-in-virtual-cluster -h p2dmnmzb.cde-4c9twhtd.apps.ecs1.cdpkvm.cldr -u ldapuser2 -p ldapuser2.principal -k ldapuser2.keytab
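The -h value in the command above is the virtual cluster host name, the same host that appears in the vcluster-endpoint URL of the cde configuration later in this article. As a minimal sketch, assuming you start from an endpoint URL of that form, the POSIX shell function vcluster_host below (a hypothetical helper, not part of cdp-cde-utils.sh) strips the scheme and path to leave only the host.

```shell
# Hypothetical helper (not part of cdp-cde-utils.sh): derive the value for
# the -h option from a virtual cluster endpoint URL.
vcluster_host() {
  # Remove a leading scheme such as "https://", then everything from the
  # first "/" onwards, leaving only the host name.
  printf '%s\n' "$1" | sed -e 's#^[a-zA-Z][a-zA-Z]*://##' -e 's#/.*$##'
}

# Usage (ENDPOINT is a placeholder for your own virtual cluster endpoint):
#   ENDPOINT=https://p2dmnmzb.cde-4c9twhtd.apps.ecs1.cdpkvm.cldr/dex/api/v1
#   ./cdp-cde-utils.sh init-user-in-virtual-cluster -h "$(vcluster_host "$ENDPOINT")" \
#     -u ldapuser2 -p ldapuser2.principal -k ldapuser2.keytab
```

A value that is already a bare host name passes through unchanged.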
You can also run CDE jobs with the cde CLI, which can be downloaded from the CDE virtual cluster landing page. In the configuration below, the file /root/credentials stores the user's password.
# cat .cde/config.yaml
user: ldapuser2
auth-pass-file: /root/credentials
vcluster-endpoint: https://p2dmnmzb.cde-4c9twhtd.apps.ecs1.cdpkvm.cldr/dex/api/v1
tls-insecure: true
# ./cde job create --type spark --application-file spark_wordcount.py --mount-1-resource resource1 --driver-cores 1 --driver-memory 4g --num-executors 0 --name wordcountjob --log-level DEBUG
# ./cde job run --name wordcountjob
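If you set up the cde CLI on several hosts, the config.yaml shown above could be generated rather than typed by hand. The function below is a hypothetical sketch: write_cde_config is an illustrative name, it writes only the four keys used in this example, and it assumes the endpoint always follows the https://<host>/dex/api/v1 pattern seen in this deployment.

```shell
# Hypothetical helper: write a cde CLI config file containing the same four
# keys as the example in this article.
# Assumption: the virtual cluster endpoint is https://<host>/dex/api/v1.
write_cde_config() {
  user="$1"                     # CDP user, e.g. ldapuser2
  pass_file="$2"                # file holding the user's password, e.g. /root/credentials
  vc_host="$3"                  # virtual cluster host name
  config_dir="${4:-$HOME/.cde}" # target directory, ~/.cde by default

  mkdir -p "$config_dir" || return 1
  # tls-insecure: true is copied from the example; it presumably skips TLS
  # certificate verification, so avoid it outside a lab setup.
  cat > "$config_dir/config.yaml" <<EOF
user: $user
auth-pass-file: $pass_file
vcluster-endpoint: https://$vc_host/dex/api/v1
tls-insecure: true
EOF
}
```

For example, write_cde_config ldapuser2 /root/credentials p2dmnmzb.cde-4c9twhtd.apps.ecs1.cdpkvm.cldr would reproduce the configuration above in ~/.cde/config.yaml. Because the password file holds the user's password in clear text, restricting it with chmod 600 /root/credentials is advisable.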
CDE Artifacts inside ECS Platform
# kubectl get ns | head -1 ; kubectl get ns | grep dex
NAME                STATUS   AGE
dex-app-nqjfkfb2    Active   25m
dex-base-ggmgt8m4   Active   30m
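To reuse the generated namespace names in later kubectl commands instead of copying them by hand, a small filter over the kubectl get ns output could look like the following. It is a sketch: dex_namespace is a hypothetical name, and the dex-app-<id> / dex-base-<id> naming pattern is inferred from this single deployment.

```shell
# Hypothetical filter: print the dex-app-* or dex-base-* namespace names from
# `kubectl get ns` output read on stdin. The naming pattern is inferred from
# the listing in this article.
dex_namespace() {
  kind="$1"  # "app" or "base"
  # Skip the header row, then keep names starting with "dex-<kind>-".
  awk -v kind="$kind" 'NR > 1 && index($1, "dex-" kind "-") == 1 { print $1 }'
}

# Usage:
#   APP_NS=$(kubectl get ns | dex_namespace app)
#   kubectl -n "$APP_NS" get pods
```

If more than one virtual cluster exists, several dex-app namespaces may be printed, one per line, so the result should be checked before it is used as a single value.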
# kubectl -n dex-app-nqjfkfb2 get pods
NAME                                                 READY   STATUS    RESTARTS   AGE
dex-app-nqjfkfb2-airflow-scheduler-79bb7fcc9-2nt5k   1/1     Running   0          31m
dex-app-nqjfkfb2-airflow-web-68bbb47bc8-mqk66        1/1     Running   0          31m
dex-app-nqjfkfb2-airflowapi-6758987794-tplfn         2/2     Running   2          31m
dex-app-nqjfkfb2-api-6cb85f94b9-qmj5b                1/1     Running   0          31m
dex-app-nqjfkfb2-livy-564c8b45c8-4r4ng               1/1     Running   0          31m
dex-app-nqjfkfb2-safari-77fb94577-whrjd              1/1     Running   0          31m
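A quick way to confirm that every pod in a listing like the one above is healthy is to flag rows whose READY counts differ or whose STATUS is not Running. The filter below is a minimal sketch (not_ready_pods is a hypothetical name) that parses the default kubectl get pods table, so it expects the header row to be present.

```shell
# Hypothetical filter: read `kubectl get pods` output on stdin and print the
# names of pods that are not fully ready or not in the Running state.
# Note: pods of finished jobs (STATUS Completed) would also be printed.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")   # READY column, e.g. "2/2" -> ready[1]=2, ready[2]=2
    if (ready[1] != ready[2] || $3 != "Running") print $1
  }'
}

# Usage:
#   kubectl -n dex-app-nqjfkfb2 get pods | not_ready_pods
# No output means all pods are Running with every container ready.
```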
# kubectl -n dex-app-nqjfkfb2 get pvc
NAME                             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
airflow-dags                     Bound    pvc-1a57ffb4-92dc-4c03-a958-38702549ceb1   100Gi      RWX            longhorn-nfs   28m
airflow-logs                     Bound    pvc-d424fdf6-2035-4418-8d95-03769926a069   100Gi      RWX            longhorn-nfs   28m
dex-app-nqjfkfb2-livystate-pvc   Bound    pvc-b9088e9a-1fb6-42a8-9e34-5135f0e1ce07   100Gi      RWX            longhorn-nfs   28m
dex-app-nqjfkfb2-safari-pvc      Bound    pvc-e565af81-424a-4e2c-8b32-ade212159492   100Gi      RWX            longhorn-nfs   28m
dex-app-nqjfkfb2-storage-pvc     Bound    pvc-af9ad8cb-b069-45cd-8338-97351ba0bacd   100Gi      RWX            longhorn-nfs   28m
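PVCs that are not yet Bound can be spotted from a kubectl get pvc listing in the same way. The hypothetical unbound_pvcs filter below is a sketch that prints the name and status of any such claim; it assumes the default table output with its header row. It relies on STATUS being the second column, which still holds when a Pending claim has an empty VOLUME field.

```shell
# Hypothetical filter: read `kubectl get pvc` output on stdin and print the
# name and status of every claim whose STATUS column is not "Bound".
unbound_pvcs() {
  awk 'NR > 1 && $2 != "Bound" { print $1, $2 }'
}

# Usage:
#   kubectl -n dex-app-nqjfkfb2 get pvc | unbound_pvcs
```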
# kubectl -n dex-base-ggmgt8m4 get pods
NAME                                            READY   STATUS    RESTARTS   AGE
cdp-cde-embedded-db-0                           1/1     Running   0          34m
dex-base-configs-manager-686d55b995-992nl       2/2     Running   0          34m
dex-base-dex-downloads-5fb84f65c6-sxqj6         1/1     Running   0          34m
dex-base-ggmgt8m4-controller-6d6c7d598b-79rh9   1/1     Running   0          34m
dex-base-grafana-67d95886cf-kcjpl               1/1     Running   0          34m
dex-base-knox-5d4b8fd79d-f7nxz                  1/1     Running   0          34m
dex-base-management-api-5f76b698f-hqmss         1/1     Running   4          34m
fluentd-forwarder-6747b5b567-bmv5x              1/1     Running   0          34m
# kubectl -n dex-base-ggmgt8m4 get pvc
NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
dex-base-db-pvc    Bound    pvc-a12be944-e529-4f23-afad-d0ec58fb9677   100Gi      RWO            longhorn       34m
dex-base-grafana   Bound    pvc-64a82154-e511-4262-960b-92b4be27d631   10Gi       RWO            longhorn       34m
# kubectl get sc
NAME                 PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-path           rancher.io/local-path   Delete          WaitForFirstConsumer   false                  3d12h
longhorn (default)   driver.longhorn.io      Delete          Immediate              true                   3d12h
longhorn-nfs         nfs.longhorn.io         Delete          Immediate              false                  3d12h
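In the listing above, longhorn is marked as the default storage class, while the dex-app PVCs use longhorn-nfs (RWX) and the dex-base PVCs use longhorn (RWO). To check which class is the default on another cluster, the marker can be extracted from the same table. default_storage_class is a hypothetical helper that relies on kubectl printing "(default)" after the class name, as in the output above.

```shell
# Hypothetical filter: read `kubectl get sc` output on stdin and print the
# name of the storage class marked "(default)".
default_storage_class() {
  # In the default table the NAME column reads "longhorn (default)", so the
  # marker ends up in the second whitespace-separated field.
  awk 'NR > 1 && $2 == "(default)" { print $1 }'
}

# Usage:
#   kubectl get sc | default_storage_class
```

Parsing the printed table is fragile; querying the storageclass.kubernetes.io/is-default-class annotation with kubectl's JSONPath output would be a more robust alternative.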