基於國內阿里雲鏡像解決kubeflow一鍵安裝

google出品在國內都存在牆的問題,而kubeflow作為雲原生的機器學習套件對團隊的幫助很大,對於無翻牆條件的團隊,基於國內鏡像搭建kubeflow可以幫助大家解決不少麻煩,這裡給大家提供一套基於國內阿里雲鏡像的kubeflow 0.6的安裝方案。


環境準備

kubeflow 為環境要求比較高,看官方要求:
at least one worker node with a minimum of:

  • 4 CPU
  • 50 GB storage
  • 12 GB memory

當然,沒達到也能安裝,不過在後面使用中會出現資源問題,因為這是整包安裝方案。

一個已經安裝好的kubernetes集群,這裡我採用的是rancher安裝的集群。

<code>sudo docker run -d --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher/<code>

這裡我選擇的是k8s的1.14版本,kubeflow和k8s之間的版本兼容可以查看官網說明,這裡我的kubeflow採用了0.6版本。

也可以直接創建阿里雲kubernetes(記得需要選擇1.14版本):

基於國內阿里雲鏡像解決kubeflow一鍵安裝

如果直接想安裝可以直接跳到最後kubeflow一鍵安裝部分

kustomize安裝

下載kustomize文件

官方的教程是用 kfclt 安裝的,kfclt 本質上是使用了 kustomize 來安裝,因此這裡我直接下載 kustomize 文件,通過修改鏡像的方式安裝。

官方kustomize文件下載地址

<code>git clone https://github.com/kubeflow/manifests
cd manifests
git checkout v0.6-branch
cd <target>/base
kubectl kustomize . | tee <output>/<target>/<code>

文件比較多,可以用腳本分別導出,也可以用 kfctl 命令生成kfctl generate all -V:

<code>kustomize/
├── ambassador.yaml
├── api-service.yaml
├── argo.yaml
├── centraldashboard.yaml
├── jupyter-web-app.yaml
├── katib.yaml
├── metacontroller.yaml
├── minio.yaml
├── mysql.yaml
├── notebook-controller.yaml
├── persistent-agent.yaml
├── pipelines-runner.yaml
├── pipelines-ui.yaml
├── pipelines-viewer.yaml
├── pytorch-operator.yaml
├── scheduledworkflow.yaml
├── tensorboard.yaml
└── tf-job-operator.yaml/<code>

ambassador 微服務網關
argo 用於任務工作流編排
centraldashboard kubeflow的dashboard看板頁面
tf-job-operator 深度學習框架引擎,一個基於tensorflow構建的CRD,資源類型kind為TFJob
katib 超參數服務器

機器學習套件使用流程

基於國內阿里雲鏡像解決kubeflow一鍵安裝

修改kustomize文件

修改kustomize鏡像

修改鏡像:

<code>grc_image = [
"gcr.io/kubeflow-images-public/ingress-setup:latest",
"gcr.io/kubeflow-images-public/admission-webhook:v20190520-v0-139-gcee39dbc-dirty-0d8f4c",
"gcr.io/kubeflow-images-public/kubernetes-sigs/application:1.0-beta",
"gcr.io/kubeflow-images-public/centraldashboard:v20190823-v0.6.0-rc.0-69-gcb7dab59",
"gcr.io/kubeflow-images-public/jupyter-web-app:9419d4d",
"gcr.io/kubeflow-images-public/katib/v1alpha2/katib-controller:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/katib/v1alpha2/katib-manager:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/katib/v1alpha2/katib-manager-rest:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-bayesianoptimization:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-grid:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-hyperband:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-nasrl:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/katib/v1alpha2/suggestion-random:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/katib/v1alpha2/katib-ui:v0.6.0-rc.0",
"gcr.io/kubeflow-images-public/metadata:v0.1.8",
"gcr.io/kubeflow-images-public/metadata-frontend:v0.1.8",
"gcr.io/ml-pipeline/api-server:0.1.23",
"gcr.io/ml-pipeline/persistenceagent:0.1.23",
"gcr.io/ml-pipeline/scheduledworkflow:0.1.23",
"gcr.io/ml-pipeline/frontend:0.1.23",
"gcr.io/ml-pipeline/viewer-crd-controller:0.1.23",
"gcr.io/kubeflow-images-public/notebook-controller:v20190603-v0-175-geeca4530-e3b0c4",
"gcr.io/kubeflow-images-public/profile-controller:v20190619-v0-219-gbd3daa8c-dirty-1ced0e",
"gcr.io/kubeflow-images-public/kfam:v20190612-v0-170-ga06cdb79-dirty-a33ee4",
"gcr.io/kubeflow-images-public/pytorch-operator:v1.0.0-rc.0",
"gcr.io/google_containers/spartakus-amd64:v1.1.0",
"gcr.io/kubeflow-images-public/tf_operator:v0.6.0.rc0",
"gcr.io/arrikto/kubeflow/oidc-authservice:v0.2"
]

doc_image = [
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.ingress-setup:latest",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.admission-webhook:v20190520-v0-139-gcee39dbc-dirty-0d8f4c",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.kubernetes-sigs.application:1.0-beta",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.centraldashboard:v20190823-v0.6.0-rc.0-69-gcb7dab59",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.jupyter-web-app:9419d4d",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.katib-controller:v0.6.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.katib-manager:v0.6.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.katib-manager-rest:v0.6.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-bayesianoptimization:v0.6.0-rc.0",

"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-grid:v0.6.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-hyperband:v0.6.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-nasrl:v0.6.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.suggestion-random:v0.6.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.katib.v1alpha2.katib-ui:v0.6.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.metadata:v0.1.8",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.metadata-frontend:v0.1.8",
"registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.api-server:0.1.23",
"registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.persistenceagent:0.1.23",
"registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.scheduledworkflow:0.1.23",
"registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.frontend:0.1.23",
"registry.cn-shenzhen.aliyuncs.com/shikanon/ml-pipeline.viewer-crd-controller:0.1.23",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.notebook-controller:v20190603-v0-175-geeca4530-e3b0c4",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.profile-controller:v20190619-v0-219-gbd3daa8c-dirty-1ced0e",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.kfam:v20190612-v0-170-ga06cdb79-dirty-a33ee4",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.pytorch-operator:v1.0.0-rc.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/google_containers.spartakus-amd64:v1.1.0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/kubeflow-images-public.tf_operator:v0.6.0.rc0",
"registry.cn-shenzhen.aliyuncs.com/shikanon/arrikto.kubeflow.oidc-authservice:v0.2"
]/<code>

修改PVC,使用動態存儲

修改pvc存儲,採用local-path-provisioner動態分配PV。

安裝local-path-provisioner:

<code>kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml/<code>

如果想直接在kubeflow中使用,還需要將StorageClass改為默認存儲:

<code>...
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: local-path
annotations: #添加為默認StorageClass
storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
.../<code>

完成後可以建一個PVC試試:

<code>apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: local-path-pvc
namespace: default
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 2Gi/<code>

注:如果沒有設為默認storageclass需要在PVC加入storageClassName: local-path進行綁定

一鍵安裝

這裡我製作了一個一鍵啟動的國內鏡像版kubeflow項目:
https://github.com/shikanon/kubeflow-manifests

基於國內阿里雲鏡像解決kubeflow一鍵安裝


https://developer.aliyun.com/article/740721?spm=a2c6h.12873581.0.dArticle740721.1ba61219pe9YnP&groupCode=mvp


分享到:


相關文章: