Running TensorFlow on Spark (Yahoo's TensorFlowOnSpark)
While working with TensorFlow I wondered: can this run together with Spark? A quick search showed that, of course, it can. Let's try it.
Let's start with TensorFlowOnSpark, released by Yahoo: https://github.com/yahoo/TensorFlowOnSpark
A YARN guide is available at https://github.com/yahoo/TensorFlowOnSpark/wiki/GetStarted_YARN
I will run it in standalone mode: https://github.com/yahoo/TensorFlowOnSpark/wiki/GetStarted_Standalone
The guide is well written, so following it as-is should cause no problems.
Installation is simply:
$ pip install tensorflowonspark
Installing collected packages: tensorflowonspark
Successfully installed tensorflowonspark-1.4.2
Now start Spark. You can export the settings below each time as the guide shows, but I just pinned them in spark-env.sh.
$ export MASTER=spark://$(hostname):7077
$ export SPARK_WORKER_INSTANCES=2
$ export CORES_PER_WORKER=1
$ export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES}))
$ ${SPARK_HOME}/sbin/start-master.sh; ${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER}
Open the web UI and everything is up and running *_*
I like the number 3, so at first I set the worker instances to 3.
That later caused an error during model training:
2019-02-22 06:27:05,656 INFO (MainThread-15877) waiting for 1 reservations
Setting the worker instances back to 2 and rerunning fixed it. (The training job waits until every requested executor has registered, so if the cluster cannot supply them all, it hangs at this message.)
Launch pyspark to check that everything works. No problems at all:
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.2.1
/_/
Using Python version 3.6.7 (default, Oct 22 2018 11:32:17)
SparkSession available as 'spark'.
>>> import tensorflow as tf
>>> from tensorflowonspark import TFCluster
>>> exit()
Now clone the repo from GitHub:
$ git clone https://github.com/yahoo/TensorFlowOnSpark.git
$ cd TensorFlowOnSpark
$ export TFoS_HOME=$(pwd)
Download the mnist dataset as the guide describes:
$ mkdir ${TFoS_HOME}/mnist
$ pushd ${TFoS_HOME}/mnist
$ curl -O "http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz"
$ curl -O "http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz"
$ curl -O "http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz"
$ curl -O "http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz"
$ popd
(tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark$ git clone https://github.com/yahoo/TensorFlowOnSpark.git
Cloning into 'TensorFlowOnSpark'...
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 2934 (delta 0), reused 2 (delta 0), pack-reused 2927
Receiving objects: 100% (2934/2934), 2.76 MiB | 2.52 MiB/s, done.
Resolving deltas: 100% (1724/1724), done.
(tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark$ cd TensorFlowOnSpark
(tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark/TensorFlowOnSpark$ export TFoS_HOME=$(pwd)
Move into the cloned directory and convert the mnist files to CSV:
$ cd ${TFoS_HOME}
$ ${SPARK_HOME}/bin/spark-submit \
    --master ${MASTER} \
    ${TFoS_HOME}/examples/mnist/mnist_data_setup.py \
    --output examples/mnist/csv \
    --format csv
A bunch of logs scroll by, and the conversion completes.
(tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark/TensorFlowOnSpark$ ${SPARK_HOME}/bin/spark-submit --master ${MASTER} ${TFoS_HOME}/examples/mnist/mnist_data_setup.py --output examples/mnist/csv --format csv
The following files are then created:
.
├── test
│ ├── images
│ │ ├── part-00000
│ │ ├── part-00001
│ │ ├── part-00002
│ │ ├── part-00003
│ │ ├── part-00004
│ │ ├── part-00005
│ │ ├── part-00006
│ │ ├── part-00007
│ │ ├── part-00008
│ │ ├── part-00009
│ │ └── _SUCCESS
│ └── labels
│ ├── part-00000
│ ├── part-00001
│ ├── part-00002
│ ├── part-00003
│ ├── part-00004
│ ├── part-00005
│ ├── part-00006
│ ├── part-00007
│ ├── part-00008
│ ├── part-00009
│ └── _SUCCESS
└── train
├── images
│ ├── part-00000
│ ├── part-00001
│ ├── part-00002
│ ├── part-00003
│ ├── part-00004
│ ├── part-00005
│ ├── part-00006
│ ├── part-00007
│ ├── part-00008
│ ├── part-00009
│ └── _SUCCESS
└── labels
├── part-00000
├── part-00001
├── part-00002
├── part-00003
├── part-00004
├── part-00005
├── part-00006
├── part-00007
├── part-00008
├── part-00009
└── _SUCCESS
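As a rough sketch (my own illustration, not part of the guide): in the csv format produced by mnist_data_setup.py, each line of an images part file should be the 784 pixel values of one 28x28 digit joined by commas, and each line of a labels part file a one-hot vector of 10 values, if I read the setup script right. A minimal parser for such records might look like this:

```python
# Hypothetical illustration of parsing one image line and one label line
# from the CSV part files written above (one record per line).

def parse_image_line(line):
    # 784 comma-separated pixel values (a flattened 28x28 image)
    return [float(v) for v in line.split(",")]

def parse_label_line(line):
    # a one-hot vector of 10 values; argmax recovers the digit
    values = [float(v) for v in line.split(",")]
    return values.index(max(values))

# synthetic example: a one-hot label for the digit 3
label_line = "0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0"
digit = parse_label_line(label_line)   # -> 3
image = parse_image_line(",".join(["0.0"] * 784))
```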
Now let's train the model:
(tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark/TensorFlowOnSpark/examples/mnist/csv$ ${SPARK_HOME}/bin/spark-submit \
> --master ${MASTER} \
> --py-files ${TFoS_HOME}/examples/mnist/spark/mnist_dist.py \
> --conf spark.cores.max=${TOTAL_CORES} \
> --conf spark.task.cpus=${CORES_PER_WORKER} \
> --conf spark.executorEnv.JAVA_HOME="$JAVA_HOME" \
> ${TFoS_HOME}/examples/mnist/spark/mnist_spark.py \
> --cluster_size ${SPARK_WORKER_INSTANCES} \
> --images examples/mnist/csv/train/images \
> --labels examples/mnist/csv/train/labels \
> --format csv \
> --mode train \
> --model mnist_model
When training finishes, an mnist_model directory appears:
(tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark/TensorFlowOnSpark/mnist_model$ tree
.
├── checkpoint
├── events.out.tfevents.1550817346.weeserver
├── graph.pbtxt
├── model.ckpt-0.data-00000-of-00001
├── model.ckpt-0.index
├── model.ckpt-0.meta
├── model.ckpt-594.data-00000-of-00001
├── model.ckpt-594.index
├── model.ckpt-594.meta
└── train
└── done
└── 0
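The plain-text checkpoint file in that directory records which checkpoint TensorFlow considers the latest (model.ckpt-594 here, i.e. training stopped at step 594). Normally you would just call tf.train.latest_checkpoint("mnist_model"); the hand-parsing below is my own illustration of what that file contains:

```python
# The `checkpoint` file is a small text file of key: "value" pairs;
# the model_checkpoint_path line names the most recent checkpoint prefix.

def latest_checkpoint_prefix(text):
    for line in text.splitlines():
        if line.startswith("model_checkpoint_path:"):
            # the value is a quoted prefix like "model.ckpt-594"
            return line.split(":", 1)[1].strip().strip('"')
    return None

# synthetic content mirroring the directory listing above
sample = (
    'model_checkpoint_path: "model.ckpt-594"\n'
    'all_model_checkpoint_paths: "model.ckpt-0"\n'
    'all_model_checkpoint_paths: "model.ckpt-594"\n'
)
prefix = latest_checkpoint_prefix(sample)   # -> "model.ckpt-594"
```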
Let's run inference with the trained model:
$ ${SPARK_HOME}/bin/spark-submit \
--master ${MASTER} \
--py-files ${TFoS_HOME}/examples/mnist/spark/mnist_dist.py \
--conf spark.cores.max=${TOTAL_CORES} \
--conf spark.task.cpus=${CORES_PER_WORKER} \
--conf spark.executorEnv.JAVA_HOME="$JAVA_HOME" \
${TFoS_HOME}/examples/mnist/spark/mnist_spark.py \
--cluster_size ${SPARK_WORKER_INSTANCES} \
--images examples/mnist/csv/test/images \
--labels examples/mnist/csv/test/labels \
--mode inference \
--format csv \
--model mnist_model \
--output predictions
A predictions directory then appears:
(tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark/TensorFlowOnSpark/predictions$ tree
.
├── part-00000
├── part-00001
├── part-00002
├── part-00003
├── part-00004
├── part-00005
├── part-00006
├── part-00007
├── part-00008
├── part-00009
└── _SUCCESS
To peek at the contents, run less predictions/part-00000; it looks like this!
2019-02-22T06:45:08.484559 Label: 4, Prediction: 4
2019-02-22T06:45:08.484562 Label: 0, Prediction: 0
2019-02-22T06:45:08.484565 Label: 6, Prediction: 6
2019-02-22T06:45:08.484567 Label: 0, Prediction: 0
2019-02-22T06:45:08.484570 Label: 1, Prediction: 1
2019-02-22T06:45:08.484573 Label: 2, Prediction: 2
2019-02-22T06:45:08.484575 Label: 3, Prediction: 3
2019-02-22T06:45:08.484578 Label: 4, Prediction: 4
2019-02-22T06:45:08.484580 Label: 7, Prediction: 7
2019-02-22T06:45:08.484583 Label: 8, Prediction: 8
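Since each output line pairs a true label with the model's prediction, a quick accuracy estimate is easy to script. This helper is my own, not part of TensorFlowOnSpark:

```python
import re

# Count matching Label/Prediction pairs across prediction output lines.
LINE_RE = re.compile(r"Label: (\d+), Prediction: (\d+)")

def accuracy(lines):
    total = correct = 0
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            total += 1
            correct += m.group(1) == m.group(2)
    return correct / total if total else 0.0

# synthetic sample: one wrong prediction out of ten
sample = ["2019-02-22T06:45:08 Label: 9, Prediction: 4"] + [
    "2019-02-22T06:45:08 Label: %d, Prediction: %d" % (d, d) for d in range(9)
]
print(accuracy(sample))   # -> 0.9
```

In practice you would feed it the concatenated part files, e.g. `cat predictions/part-* | python score.py` with a small wrapper reading stdin.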
I looked at some other parts too; there are places where the prediction is wrong.
2019-02-22T06:45:08.484446 Label: 3, Prediction: 3
2019-02-22T06:45:08.484479 Label: 9, Prediction: 4
2019-02-22T06:45:08.484482 Label: 9, Prediction: 9
2019-02-22T06:45:08.484485 Label: 8, Prediction: 8
2019-02-22T06:45:08.484488 Label: 4, Prediction: 4
2019-02-22T06:45:08.484490 Label: 1, Prediction: 1
2019-02-22T06:45:08.484493 Label: 0, Prediction: 0
2019-02-22T06:45:08.484496 Label: 6, Prediction: 6
2019-02-22T06:45:08.484499 Label: 0, Prediction: 0
2019-02-22T06:45:08.484502 Label: 9, Prediction: 9
Fun! That's it!