
Let's run TensorFlow with Spark (Yahoo's TensorFlowOnSpark)

위지원 2019. 2. 22. 15:50

While playing with TensorFlow I wondered, can this be run together with Spark? A quick search showed that, of course, it can. Let's try it.


Let's start with TensorFlowOnSpark, released by Yahoo: https://github.com/yahoo/TensorFlowOnSpark


The YARN version is covered at https://github.com/yahoo/TensorFlowOnSpark/wiki/GetStarted_YARN

I will run it in standalone mode: https://github.com/yahoo/TensorFlowOnSpark/wiki/GetStarted_Standalone


The guide is well written, so following it as-is should cause no problems.



Installation is simply:


$ pip install tensorflowonspark


Installing collected packages: tensorflowonspark
Successfully installed tensorflowonspark-1.4.2


Next, get Spark running. You can export the settings below each time, but I just fixed them in spark-env.sh.


>>export MASTER=spark://$(hostname):7077
>>export SPARK_WORKER_INSTANCES=2
>>export CORES_PER_WORKER=1 
>>export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES})) 
>>${SPARK_HOME}/sbin/start-master.sh; ${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER}
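

Putting the same variables into ${SPARK_HOME}/conf/spark-env.sh looks roughly like this (just a sketch of what I mean by "fixing" them there; adjust the hostname and counts to your own machine):


# appended to ${SPARK_HOME}/conf/spark-env.sh
export MASTER=spark://$(hostname):7077
export SPARK_WORKER_INSTANCES=2
export CORES_PER_WORKER=1
export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES}))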


Connecting to the web UI (the standalone master UI, http://<hostname>:8080 by default) shows everything running without a problem *_*




I like the number 3, so at first I gave it 3 worker instances.


But later, when training the model, it hit an error and just kept waiting:

2019-02-22 06:27:05,656 INFO (MainThread-15877) waiting for 1 reservations

So I set the worker instances back to 2 and reran it, and the problem went away. The "waiting for N reservations" message means TFCluster is still waiting for that many executors to register, usually because Spark cannot actually provide as many executors as --cluster_size asked for.



Let's also launch pyspark and check that the imports work. No problems at all.


Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.1
      /_/

Using Python version 3.6.7 (default, Oct 22 2018 11:32:17)
SparkSession available as 'spark'.
>>> import tensorflow as tf
>>> from tensorflowonspark import TFCluster
>>> exit()
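

By the way, this is roughly the pattern the example's mnist_spark.py driver follows with that TFCluster import. Just a minimal sketch of the 1.x API as I understand it (map_fun, num_ps and the commented-out calls are placeholders, not a working training script):


from pyspark.context import SparkContext
from pyspark.conf import SparkConf
from tensorflowonspark import TFCluster

def map_fun(args, ctx):
    # runs on every executor; ctx says whether this node is a "ps" or a "worker"
    # the real graph-building/training code lives in examples/mnist/spark/mnist_dist.py
    pass

sc = SparkContext(conf=SparkConf().setAppName("tfos_sketch"))
num_executors = 2   # should match what the Spark cluster can actually provide
num_ps = 1          # parameter servers taken out of those executors

cluster = TFCluster.run(sc, map_fun, None, num_executors, num_ps,
                        False, TFCluster.InputMode.SPARK)
# dataRDD = sc.textFile("examples/mnist/csv/train/images")  # InputMode.SPARK feeds data as RDDs
# cluster.train(dataRDD, 1)                                 # or cluster.inference(dataRDD)
cluster.shutdown()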



Now let's grab the code from GitHub.


>>git clone https://github.com/yahoo/TensorFlowOnSpark.git
>>cd TensorFlowOnSpark
>>export TFoS_HOME=$(pwd)


Download the MNIST dataset as the guide says...


>>mkdir ${TFoS_HOME}/mnist
>>pushd ${TFoS_HOME}/mnist
>>curl -O "http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz"
>>curl -O "http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz"
>>curl -O "http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz"
>>curl -O "http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz"
>>popd


(tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark$ git clone https://github.com/yahoo/TensorFlowOnSpark.git
Cloning into 'TensorFlowOnSpark'...
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 2934 (delta 0), reused 2 (delta 0), pack-reused 2927
Receiving objects: 100% (2934/2934), 2.76 MiB | 2.52 MiB/s, done.
Resolving deltas: 100% (1724/1724), done.
(tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark$ cd TensorFlowOnSpark
(tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark/TensorFlowOnSpark$ export TFoS_HOME=$(pwd)


Move into the cloned directory and convert the MNIST files to CSV.


>>cd ${TFoS_HOME}
>>${SPARK_HOME}/bin/spark-submit \
--master ${MASTER} \
${TFoS_HOME}/examples/mnist/mnist_data_setup.py \
--output examples/mnist/csv \
--format csv

A bunch of logs scroll by, and the conversion completes.


(tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark/TensorFlowOnSpark$ ${SPARK_HOME}/bin/spark-submit --master ${MASTER} ${TFoS_HOME}/examples/mnist/mnist_data_setup.py --output examples/mnist/csv --format csv


Then the following files appear under examples/mnist/csv:


.
├── test
│   ├── images
│   │   ├── part-00000
│   │   ├── part-00001
│   │   ├── part-00002
│   │   ├── part-00003
│   │   ├── part-00004
│   │   ├── part-00005
│   │   ├── part-00006
│   │   ├── part-00007
│   │   ├── part-00008
│   │   ├── part-00009
│   │   └── _SUCCESS
│   └── labels
│       ├── part-00000
│       ├── part-00001
│       ├── part-00002
│       ├── part-00003
│       ├── part-00004
│       ├── part-00005
│       ├── part-00006
│       ├── part-00007
│       ├── part-00008
│       ├── part-00009
│       └── _SUCCESS
└── train
    ├── images
    │   ├── part-00000
    │   ├── part-00001
    │   ├── part-00002
    │   ├── part-00003
    │   ├── part-00004
    │   ├── part-00005
    │   ├── part-00006
    │   ├── part-00007
    │   ├── part-00008
    │   ├── part-00009
    │   └── _SUCCESS
    └── labels
        ├── part-00000
        ├── part-00001
        ├── part-00002
        ├── part-00003
        ├── part-00004
        ├── part-00005
        ├── part-00006
        ├── part-00007
        ├── part-00008
        ├── part-00009
        └── _SUCCESS
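

Before training, you can sanity-check the converted data from pyspark. The counts and shapes here are just what I expect the CSV conversion to produce (60,000 training images, 784 comma-separated pixel values per line, one-hot labels), so take it as a rough check; the paths are relative to where you launched pyspark:


>>> images = sc.textFile("examples/mnist/csv/train/images")
>>> labels = sc.textFile("examples/mnist/csv/train/labels")
>>> images.count()                        # expecting 60000
>>> len(images.first().split(","))        # expecting 784 pixel values
>>> labels.first()                        # expecting a one-hot row of 10 values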


Now let's train the model.


(tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark/TensorFlowOnSpark/examples/mnist/csv$ ${SPARK_HOME}/bin/spark-submit \
> --master ${MASTER} \
> --py-files ${TFoS_HOME}/examples/mnist/spark/mnist_dist.py \
> --conf spark.cores.max=${TOTAL_CORES} \
> --conf spark.task.cpus=${CORES_PER_WORKER} \
> --conf spark.executorEnv.JAVA_HOME="$JAVA_HOME" \
> ${TFoS_HOME}/examples/mnist/spark/mnist_spark.py \
> --cluster_size ${SPARK_WORKER_INSTANCES} \
> --images examples/mnist/csv/train/images \
> --labels examples/mnist/csv/train/labels \
> --format csv \
> --mode train \
> --model mnist_model


When training completes, an mnist_model folder appears, like this:


(tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark/TensorFlowOnSpark/mnist_model$ tree
.
├── checkpoint
├── events.out.tfevents.1550817346.weeserver
├── graph.pbtxt
├── model.ckpt-0.data-00000-of-00001
├── model.ckpt-0.index
├── model.ckpt-0.meta
├── model.ckpt-594.data-00000-of-00001
├── model.ckpt-594.index
├── model.ckpt-594.meta
└── train
    └── done
        └── 0
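

Out of curiosity, you can also poke at that checkpoint with plain TensorFlow 1.x outside of Spark. A tiny sketch, assuming the mnist_model directory above:


import tensorflow as tf

ckpt = tf.train.latest_checkpoint("mnist_model")    # should resolve to mnist_model/model.ckpt-594
print(ckpt)
for name, shape in tf.train.list_variables(ckpt):   # list the saved variables and their shapes
    print(name, shape)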


Let's use the trained model to run inference.


>>${SPARK_HOME}/bin/spark-submit \
--master ${MASTER} \
--py-files ${TFoS_HOME}/examples/mnist/spark/mnist_dist.py \
--conf spark.cores.max=${TOTAL_CORES} \
--conf spark.task.cpus=${CORES_PER_WORKER} \
--conf spark.executorEnv.JAVA_HOME="$JAVA_HOME" \
${TFoS_HOME}/examples/mnist/spark/mnist_spark.py \
--cluster_size ${SPARK_WORKER_INSTANCES} \
--images examples/mnist/csv/test/images \
--labels examples/mnist/csv/test/labels \
--mode inference \
--format csv \
--model mnist_model \
--output predictions

Then a folder called predictions appears, as below.


(tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark/TensorFlowOnSpark/predictions$ tree
.
├── part-00000
├── part-00001
├── part-00002
├── part-00003
├── part-00004
├── part-00005
├── part-00006
├── part-00007
├── part-00008
├── part-00009
└── _SUCCESS


To peek at the contents, less predictions/part-00000 shows something like this:


2019-02-22T06:45:08.484559 Label: 4, Prediction: 4
2019-02-22T06:45:08.484562 Label: 0, Prediction: 0
2019-02-22T06:45:08.484565 Label: 6, Prediction: 6
2019-02-22T06:45:08.484567 Label: 0, Prediction: 0
2019-02-22T06:45:08.484570 Label: 1, Prediction: 1
2019-02-22T06:45:08.484573 Label: 2, Prediction: 2
2019-02-22T06:45:08.484575 Label: 3, Prediction: 3
2019-02-22T06:45:08.484578 Label: 4, Prediction: 4
2019-02-22T06:45:08.484580 Label: 7, Prediction: 7
2019-02-22T06:45:08.484583 Label: 8, Prediction: 8


I looked through other parts too. There are a few misses (a 9 predicted as a 4, for example).


2019-02-22T06:45:08.484446 Label: 3, Prediction: 3
2019-02-22T06:45:08.484479 Label: 9, Prediction: 4
2019-02-22T06:45:08.484482 Label: 9, Prediction: 9
2019-02-22T06:45:08.484485 Label: 8, Prediction: 8
2019-02-22T06:45:08.484488 Label: 4, Prediction: 4
2019-02-22T06:45:08.484490 Label: 1, Prediction: 1
2019-02-22T06:45:08.484493 Label: 0, Prediction: 0
2019-02-22T06:45:08.484496 Label: 6, Prediction: 6
2019-02-22T06:45:08.484499 Label: 0, Prediction: 0
2019-02-22T06:45:08.484502 Label: 9, Prediction: 9
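

Since every line follows the "Label: X, Prediction: Y" format shown above, a rough accuracy is easy to compute with a throwaway Python script (my own little helper, not part of the example):


import glob
import re

total = correct = 0
for path in glob.glob("predictions/part-*"):
    with open(path) as f:
        for line in f:
            m = re.search(r"Label: (\d+), Prediction: (\d+)", line)
            if m:
                total += 1
                correct += m.group(1) == m.group(2)

print("accuracy: %.4f (%d/%d)" % (correct / total, correct, total))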


Fun! The end!