===== Docker =====

Since November 2019, a new way of conducting experiments has been available in CorteXlab.

==== Why ====
+ | |||
+ | The legacy way of running experiments is to run one (or more) commands on each node of the experiment. These commands are run from the minus task. This has some drawbacks: | ||
+ | * The development workflow, to develop and debug an experiment code, is: | ||
+ | * create the task | ||
+ | * submit the task | ||
+ | * wait the task end | ||
+ | * unzip the task's results | ||
+ | * look in the task's results (stdout, stderr) and in the minus log to understand issues, bugs, errors, exceptions | ||
+ | * fix issues | ||
+ | * repeat the process | ||
+ | * ... This workflow is painful, not interactive, | ||
+ | * The experimenter needs to pack in the task everything needed. This includes potentially big datasets, and if these datasets are different for each node, then the task will contain the union of all the datasets, which can be huge. | ||
+ | * The executable code has to be in the task. For simple scripts, it's ok, but for binaries, or as soon as there are some dependencies (libraries, which may also have dependencies of their own), building the task may become pretty difficult or impossible. See [[embedding_oot_modules_or_custom_libraries_binaries_in_minus_scenario]]). In particular, [[https:// | ||
+ | * All the results are gathered with the task directory, compressed, and sent back to airlock. This means that the results may include huge unneeded things, such as experiment code, input datasets, etc. | ||
+ | |||
+ | ==== Proposed solution | ||
- | The legacy | + | The proposed solution is to use [[https:// |
  * [[https://
  * Preparing an image is a much more convenient process than preparing a task when it comes to complex software bundles such as TensorFlow, OpenBTS, etc. One just needs to install the dependencies and build as if on a real machine; there is no need to tweak or hack build steps, it works directly.
  * Image preparation can be an interactive process, or it can be automated with a [[https://
  * The exact same image can be used to test things on an experimenter's machine.
  * When running a task, images are instantiated to [[https://
  * The experiment results are structured differently. For each node, there is one directory per container, containing the stdout/stderr of the container.
  * The users' home directories are NFS mounted.
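
As an illustration of the image-preparation step described above, here is a minimal hypothetical Dockerfile sketch. The base image, the package names and the script name are placeholders chosen for illustration, not taken from the CorteXlab documentation:

<code docker>
# Hypothetical example: base image, packages and script name are placeholders
FROM python:3.8-slim

# Install the experiment's dependencies, exactly as one would on a real machine
RUN pip install --no-cache-dir numpy scipy

# Copy the experiment code into the image
COPY experiment.py /experiment/experiment.py

# Command executed when a container is started from this image
CMD ["python", "/experiment/experiment.py"]
</code>

The image could then be built and tried interactively on the experimenter's own machine with ''docker build -t my-experiment .'' and ''docker run --rm -it my-experiment'' (image name hypothetical), before being used in CorteXlab.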
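
To make the per-node, per-container result structure above concrete, a hypothetical layout (directory and file names are illustrative, not taken from the CorteXlab documentation) could look like:

<code>
task_results/
  node1/
    container1/
      stdout
      stderr
  node2/
    container1/
      stdout
      stderr
</code>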
docker.txt · Last modified: 2023/09/28 17:24 by cmorin