    • Let's run TensorFlow together with Spark (Yahoo's TensorFlowOnSpark)

    2019. 2. 22. 15:50

    by. 위지원

    While playing with TensorFlow I wondered whether it could be run together with Spark. A quick search showed that, sure enough, it can. Let's try it.


    Let's start with TensorFlowOnSpark, released by Yahoo: https://github.com/yahoo/TensorFlowOnSpark


    The YARN version is covered at https://github.com/yahoo/TensorFlowOnSpark/wiki/GetStarted_YARN

    I'll run it in standalone mode: https://github.com/yahoo/TensorFlowOnSpark/wiki/GetStarted_Standalone


    The guide is well written, so following it step by step should cause no problems.



    Installation is simply:


    $ pip install tensorflowonspark


    Installing collected packages: tensorflowonspark
    Successfully installed tensorflowonspark-1.4.2
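

    Before touching Spark, a quick import check inside the same virtualenv doesn't hurt. A tiny sketch (pkg_resources ships with setuptools):


    import pkg_resources
    import tensorflowonspark   # should import cleanly inside the virtualenv

    print(pkg_resources.get_distribution("tensorflowonspark").version)   # 1.4.2 here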


    Then start Spark. You can export the settings as shown below, but I just pinned them in spark-env.sh instead.


    >>export MASTER=spark://$(hostname):7077
    >>export SPARK_WORKER_INSTANCES=2
    >>export CORES_PER_WORKER=1 
    >>export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES})) 
    >>${SPARK_HOME}/sbin/start-master.sh; ${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER}


    Open the web UI and you can see that everything came up without a problem *_*




    I like the number 3, so I gave it 3 worker instances.


    That came back to bite me: later, model training got stuck on the message below.

    2019-02-22 06:27:05,656 INFO (MainThread-15877) waiting for 1 reservations

    Setting the worker instances back to 2 and rerunning solved it. As far as I can tell, the TFoS job waits until it has one reservation per --cluster_size node, so the cluster size and the executors Spark can actually provide have to line up.



    Let's also fire up pyspark to check that the imports work. No problems at all.


    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /__ / .__/\_,_/_/ /_/\_\   version 2.2.1
          /_/

    Using Python version 3.6.7 (default, Oct 22 2018 11:32:17)
    SparkSession available as 'spark'.
    >>> import tensorflow as tf
    >>> from tensorflowonspark import TFCluster
    >>> exit()



    Next, grab the repo from GitHub.


    >>git clone https://github.com/yahoo/TensorFlowOnSpark.git
    >>cd TensorFlowOnSpark
    >>export TFoS_HOME=$(pwd)


    Download the MNIST dataset as described in the guide...


    >>mkdir ${TFoS_HOME}/mnist
    >>pushd ${TFoS_HOME}/mnist
    >>curl -O "http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz"
    >>curl -O "http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz"
    >>curl -O "http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz"
    >>curl -O "http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz"
    >>popd


    (tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark$ git clone https://github.com/yahoo/TensorFlowOnSpark.git
    Cloning into 'TensorFlowOnSpark'...
    remote: Enumerating objects: 7, done.
    remote: Counting objects: 100% (7/7), done.
    remote: Compressing objects: 100% (7/7), done.
    remote: Total 2934 (delta 0), reused 2 (delta 0), pack-reused 2927
    Receiving objects: 100% (2934/2934), 2.76 MiB | 2.52 MiB/s, done.
    Resolving deltas: 100% (1724/1724), done.
    (tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark$ cd TensorFlowOnSpark
    (tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark/TensorFlowOnSpark$ export TFoS_HOME=$(pwd)


    Move into the cloned repo and convert the downloaded MNIST files to CSV:


    >>cd ${TFoS_HOME}
    >>${SPARK_HOME}/bin/spark-submit \
    --master ${MASTER} \
    ${TFoS_HOME}/examples/mnist/mnist_data_setup.py \
    --output examples/mnist/csv \
    --format csv

    A bunch of logs will scroll by, and the conversion finishes.


    (tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark/TensorFlowOnSpark$ ${SPARK_HOME}/bin/spark-submit --master ${MASTER} ${TFoS_HOME}/examples/mnist/mnist_data_setup.py --output examples/mnist/csv --format csv


    The following files are then created:


    .
    ├── test
    │   ├── images
    │   │   ├── part-00000
    │   │   ├── part-00001
    │   │   ├── part-00002
    │   │   ├── part-00003
    │   │   ├── part-00004
    │   │   ├── part-00005
    │   │   ├── part-00006
    │   │   ├── part-00007
    │   │   ├── part-00008
    │   │   ├── part-00009
    │   │   └── _SUCCESS
    │   └── labels
    │       ├── part-00000
    │       ├── part-00001
    │       ├── part-00002
    │       ├── part-00003
    │       ├── part-00004
    │       ├── part-00005
    │       ├── part-00006
    │       ├── part-00007
    │       ├── part-00008
    │       ├── part-00009
    │       └── _SUCCESS
    └── train
        ├── images
        │   ├── part-00000
        │   ├── part-00001
        │   ├── part-00002
        │   ├── part-00003
        │   ├── part-00004
        │   ├── part-00005
        │   ├── part-00006
        │   ├── part-00007
        │   ├── part-00008
        │   ├── part-00009
        │   └── _SUCCESS
        └── labels
            ├── part-00000
            ├── part-00001
            ├── part-00002
            ├── part-00003
            ├── part-00004
            ├── part-00005
            ├── part-00006
            ├── part-00007
            ├── part-00008
            ├── part-00009
            └── _SUCCESS
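

    Before training, it's worth a quick sanity check on the converted data from the pyspark shell (paths are relative to where the conversion job was run; I'm assuming each image row is the 784 comma-separated pixel values of a flattened 28x28 MNIST image):


    images = sc.textFile("examples/mnist/csv/train/images")
    labels = sc.textFile("examples/mnist/csv/train/labels")

    print(images.count(), labels.count())    # MNIST has 60,000 training examples
    print(len(images.first().split(",")))    # 784 pixel values per image, if the layout is as expected
    print(labels.first())                    # the matching label record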


    Now let's train the model.


    (tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark/TensorFlowOnSpark/examples/mnist/csv$ ${SPARK_HOME}/bin/spark-submit \
    > --master ${MASTER} \
    > --py-files ${TFoS_HOME}/examples/mnist/spark/mnist_dist.py \
    > --conf spark.cores.max=${TOTAL_CORES} \
    > --conf spark.task.cpus=${CORES_PER_WORKER} \
    > --conf spark.executorEnv.JAVA_HOME="$JAVA_HOME" \
    > ${TFoS_HOME}/examples/mnist/spark/mnist_spark.py \
    > --cluster_size ${SPARK_WORKER_INSTANCES} \
    > --images examples/mnist/csv/train/images \
    > --labels examples/mnist/csv/train/labels \
    > --format csv \
    > --mode train \
    > --model mnist_model
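

    For reference, this is roughly the flow that mnist_spark.py (the driver) and mnist_dist.py (the per-executor map function) implement with the TFoS 1.x API. The map_fun below is a stripped-down placeholder of mine, not the real example code; the real one starts a TensorFlow server on every executor and runs the actual MNIST graph:


    from pyspark import SparkConf, SparkContext
    from tensorflowonspark import TFCluster

    def map_fun(args, ctx):
        # Runs once per Spark executor; ctx.job_name is "ps" or "worker".
        # (Placeholder: the real mnist_dist.map_fun builds and trains the graph here.)
        from tensorflowonspark import TFNode
        if ctx.job_name == "worker":
            tf_feed = TFNode.DataFeed(ctx.mgr)      # queue that cluster.train() fills
            for _ in range(1000):                   # the real code loops up to args.steps
                if tf_feed.should_stop():
                    break
                batch = tf_feed.next_batch(100)     # list of (image_csv, label_csv) pairs
                # ... run one training step on the batch ...
            tf_feed.terminate()                     # drain whatever is left in the feed

    sc = SparkContext.getOrCreate(SparkConf().setAppName("tfos_sketch"))
    images = sc.textFile("examples/mnist/csv/train/images")
    labels = sc.textFile("examples/mnist/csv/train/labels")
    data = images.zip(labels)

    cluster_size = 2   # must match the executors Spark can actually provide,
    num_ps = 1         # otherwise the driver sits at "waiting for N reservations"
    cluster = TFCluster.run(sc, map_fun, None, cluster_size, num_ps,
                            False, TFCluster.InputMode.SPARK)
    cluster.train(data, 1)                          # feed the zipped RDD for one epoch
    cluster.shutdown()

    The important bit is that cluster_size has to match the executors Spark can actually hand over, which is exactly what bit me above with 3 worker instances.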


    When training completes, an mnist_model directory is created like this:


    (tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark/TensorFlowOnSpark/mnist_model$ tree
    .
    ├── checkpoint
    ├── events.out.tfevents.1550817346.weeserver
    ├── graph.pbtxt
    ├── model.ckpt-0.data-00000-of-00001
    ├── model.ckpt-0.index
    ├── model.ckpt-0.meta
    ├── model.ckpt-594.data-00000-of-00001
    ├── model.ckpt-594.index
    ├── model.ckpt-594.meta
    └── train
        └── done
            └── 0
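

    These are ordinary TensorFlow 1.x checkpoint files, so they can also be inspected locally (run from ${TFoS_HOME}); the events.out.tfevents file is what TensorBoard would read. A small sketch:


    import tensorflow as tf

    ckpt = tf.train.latest_checkpoint("mnist_model")          # e.g. mnist_model/model.ckpt-594
    reader = tf.train.NewCheckpointReader(ckpt)
    for name, shape in sorted(reader.get_variable_to_shape_map().items()):
        print(name, shape)                                    # saved variables and their shapes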


    Let's use the trained model to run inference.


    >>${SPARK_HOME}/bin/spark-submit \
    --master ${MASTER} \
    --py-files ${TFoS_HOME}/examples/mnist/spark/mnist_dist.py \
    --conf spark.cores.max=${TOTAL_CORES} \
    --conf spark.task.cpus=${CORES_PER_WORKER} \
    --conf spark.executorEnv.JAVA_HOME="$JAVA_HOME" \
    ${TFoS_HOME}/examples/mnist/spark/mnist_spark.py \
    --cluster_size ${SPARK_WORKER_INSTANCES} \
    --images examples/mnist/csv/test/images \
    --labels examples/mnist/csv/test/labels \
    --mode inference \
    --format csv \
    --model mnist_model \
    --output predictions

    A predictions directory is then created:


    (tensorEnv) weejiwon@weeserver:~/tensorEnv/tensorOnSpark/TensorFlowOnSpark/predictions$ tree
    .
    ├── part-00000
    ├── part-00001
    ├── part-00002
    ├── part-00003
    ├── part-00004
    ├── part-00005
    ├── part-00006
    ├── part-00007
    ├── part-00008
    ├── part-00009
    └── _SUCCESS


    To peek at the contents, run less predictions/part-00000 and you get output like this!


    2019-02-22T06:45:08.484559 Label: 4, Prediction: 4
    2019-02-22T06:45:08.484562 Label: 0, Prediction: 0
    2019-02-22T06:45:08.484565 Label: 6, Prediction: 6
    2019-02-22T06:45:08.484567 Label: 0, Prediction: 0
    2019-02-22T06:45:08.484570 Label: 1, Prediction: 1
    2019-02-22T06:45:08.484573 Label: 2, Prediction: 2
    2019-02-22T06:45:08.484575 Label: 3, Prediction: 3
    2019-02-22T06:45:08.484578 Label: 4, Prediction: 4
    2019-02-22T06:45:08.484580 Label: 7, Prediction: 7
    2019-02-22T06:45:08.484583 Label: 8, Prediction: 8


    I looked at some other parts too. There are a few wrong predictions here and there.


    2019-02-22T06:45:08.484446 Label: 3, Prediction: 3
    2019-02-22T06:45:08.484479 Label: 9, Prediction: 4
    2019-02-22T06:45:08.484482 Label: 9, Prediction: 9
    2019-02-22T06:45:08.484485 Label: 8, Prediction: 8
    2019-02-22T06:45:08.484488 Label: 4, Prediction: 4
    2019-02-22T06:45:08.484490 Label: 1, Prediction: 1
    2019-02-22T06:45:08.484493 Label: 0, Prediction: 0
    2019-02-22T06:45:08.484496 Label: 6, Prediction: 6
    2019-02-22T06:45:08.484499 Label: 0, Prediction: 0
    2019-02-22T06:45:08.484502 Label: 9, Prediction: 9
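

    Every line follows the same Label/Prediction format, so a rough accuracy number can be pulled straight out of the part files with pyspark. A small sketch, assuming that format holds for every line:


    preds = sc.textFile("predictions")    # run from ${TFoS_HOME}, where the folder was written

    def is_correct(line):
        # e.g. "2019-02-22T06:45:08.484479 Label: 9, Prediction: 4"
        label = line.split("Label:")[1].split(",")[0].strip()
        pred = line.split("Prediction:")[1].strip()
        return label == pred

    total = preds.count()
    correct = preds.filter(is_correct).count()
    print("accuracy: %d/%d = %.4f" % (correct, total, correct / float(total)))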


    That was fun. The end!