Overview
                  _                              _
                 | |                            (_)
      ___  _   _ | |__   _ __ ___    __ _  _ __  _  _ __    ___
     / __|| | | || '_ \ | '_ ` _ \  / _` || '__|| || '_ \  / _ \
     \__ \| |_| || |_) || | | | | || (_| || |   | || | | ||  __/
     |___/ \__,_||_.__/ |_| |_| |_| \__,_||_|   |_||_| |_| \___|

                                 ?
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~|^"~~~~~~~~~~~~~~~~~~~~~~~~~o~~~~~~~~~~~
            o                   |                  o      __o
             o                  |                 o     |X__>
           ___o                 |                __o
         (X___>--             __|__            |X__>     o
                             |     \                   __o
                             |      \                |X__>
      _______________________|_______\________________
     <                                                \____________   _
      \                                                            \ (_)
       \    O       O       O                                       >=)
        \__________________________________________________________/ (_)

Submarine is a project that lets infrastructure engineers and data scientists run unmodified TensorFlow or PyTorch programs on YARN or Kubernetes.
Goals of Submarine:
- Give jobs easy access to data and models in HDFS and other storage systems.
- Launch services to serve TensorFlow/PyTorch models.
- Run distributed TensorFlow jobs with simple configuration.
- Run user-specified Docker images.
- Let users specify GPUs and other resources.
- Launch TensorBoard for training jobs when the user requests it.
- Support customized DNS names for roles (such as tensorboard.$user.$domain:6006).
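
To illustrate the customized DNS name pattern in the last goal, here is a minimal Python sketch that expands the `tensorboard.$user.$domain:6006` template quoted in the list. The user `alice`, the domain `example.com`, and the helper `role_endpoint` are hypothetical and exist only for this example; it does not show how Submarine itself registers these names.

```python
from string import Template

# Pattern copied from the goals list; $user and $domain are filled in per job.
TENSORBOARD_PATTERN = Template("tensorboard.$user.$domain:6006")


def role_endpoint(user: str, domain: str) -> str:
    """Expand the TensorBoard DNS pattern for one user and domain (hypothetical helper)."""
    return TENSORBOARD_PATTERN.substitute(user=user, domain=domain)


if __name__ == "__main__":
    # Placeholder values, not taken from any real cluster.
    print(role_endpoint("alice", "example.com"))
```

With these placeholder values the sketch prints `tensorboard.alice.example.com:6006`, the address a user would open to reach TensorBoard for their job under this naming scheme.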
See the QuickStart guide for a quick introduction to using this framework.
See Examples for more examples, such as running distributed TensorFlow training on CIFAR-10.