Overview

              _                              _
             | |                            (_)
  ___  _   _ | |__   _ __ ___    __ _  _ __  _  _ __    ___
 / __|| | | || '_ \ | '_ ` _ \  / _` || '__|| || '_ \  / _ \
 \__ \| |_| || |_) || | | | | || (_| || |   | || | | ||  __/
 |___/ \__,_||_.__/ |_| |_| |_| \__,_||_|   |_||_| |_| \___|

                             ?
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~|^"~~~~~~~~~~~~~~~~~~~~~~~~~o~~~~~~~~~~~
        o                   |                  o      __o
         o                  |                 o     |X__>
       ___o                 |                __o
     (X___>--             __|__            |X__>     o
                         |     \                   __o
                         |      \                |X__>
  _______________________|_______\________________
 <                                                \____________   _
  \                                                            \ (_)
   \    O       O       O                                       >=)
    \__________________________________________________________/ (_)

Submarine is a project that allows infrastructure engineers and data scientists to run unmodified TensorFlow programs on YARN.

Goals of Submarine:

  • Give jobs easy access to data and models in HDFS and other storage systems.
  • Launch services to serve TensorFlow/MXNet models.
  • Run distributed TensorFlow jobs with simple configuration.
  • Run user-specified Docker images.
  • Specify GPU and other resources.
  • Launch TensorBoard for training jobs when the user requests it.
  • Support customized DNS names for roles (for example, tensorboard.$user.$domain:6006).

To learn how to use this framework, see the QuickStart guide.

If you're a developer, see the Developer guide for more details.