reserve

Differences

This shows you the differences between two versions of the page.

reserve [2015/09/16 10:47] trisset
reserve [2022/11/18 16:43] (current) pgirard
Line 1: Line 1:
 +===== Book the testbed with the Cortexlab web application =====
 +
 +**Booking the Cortexlab platform with the Cortexlab web application saves you from using the OAR commands described in the "Book the testbed with OAR" section below.**
 +
 +When logged in (https://xp.cortexlab.fr/app), you can see:
 +  * the planning (Drawgantt),
 +  * your reservation list (of course you can delete a reservation), 
 +  * the button to book the testbed.
 +You can also make your reservation(s) by clicking on the "Book the testbed" sub-menu in the navigation bar "Reservation" menu.
 +
 +To book the testbed, you must at least:
 +  * select a start date and hour,
 +  * select a duration OR an end date and hour,
 +  * select the "Reservation room" checkbox OR one or more nodes.
 +The "Book the testbed" button is then activated, and you can click it to request your reservation.
 +
 +To go to your reservation list, click on the "My current reservations" sub-menu in the navigation bar "Reservation" menu.
 +
 ===== Book the testbed with OAR =====
  
 Once you have accessed the testbed (i.e. connected to the CorteXlab server), you can reserve the nodes for an experiment.
  
-To avoid cross interference between multiple experiments, only one person can use the whole CorteXlab testbed at the same time. The //OAR// scheduler is used to book nodes on the platform. As soon as you book one or more nodes, the CorteXlab room is reserved for your usage during the requested time.
 +To avoid cross interference between multiple experiments, only one person can use the whole CorteXlab testbed at a time. The //OAR// scheduler is used to book nodes on the platform. As soon as you book one or more nodes, the CorteXlab room is reserved for your usage during the requested time.
  
 The state of reservation of the CorteXlab testbed can be visualized here: http://xp.cortexlab.fr/drawgantt/
  
-The role of //OAR// is to schedule node reservations. It manages **jobs** associated with users, which has a start time, a duration (walltime), and uses some resources (CorteXlab nodes).
 +The role of //OAR// is to schedule node reservations. It manages **jobs** associated with users. A job has a start time, a duration (walltime), and uses some resources (CorteXlab nodes).
 + 
 +==== Submissions ====
  
 The principle of operation of CorteXlab is that users submit jobs to OAR. When the job starts, the user gets exclusive access to the platform, and inside an OAR job, the user can perform (interactively, or in batch) one or several experiments.
Line 15: Line 35:
 <code>$ oarsub -I -l nodes=BEST</code>
  
-This command will wait for the resources to be available, and as soon as they are (i.e. -I sands for interactive), job is allocated, is started, and a subshell is instanciated where you can work on experiments. As soon as the subshell is closed, the job ends. (It can be usefull to work in a [screen](https://www.gnu.org/software/screen/) session to avoid loosing jobs in case of network disconnection).
 +The same example with a maximum duration of 4 hours:
  
-By default OAR submissions are scheduled as soon as possible. It is also possible to ask for an OAR //reservation// where you choose the date at which the job will be scheduled.
 +<code>$ oarsub -I -l nodes=BEST,walltime=4:00:00</code>
  
 +This command waits for the resources to become available (-I stands for interactive). As soon as they are, a job is allocated and started, and a subshell is instantiated where you can work on experiments. As soon as the subshell is closed, the job ends. (It can be useful to work in a [[https://www.gnu.org/software/screen/|screen]] session to avoid losing jobs in case of network disconnection.)
  
-This other simple example is reserving all the nodes on the 18 of september 2015 from 10h to 11h
 +Submissions do not have to be interactive. You can provide the name of a script to execute when the job starts. The strong advantage is that you do not have to wait for the job to start, which can take a long time if the platform is heavily used. For this to work, however, you have to automate everything:
  
-<code>$ oarsub -l nodes=BEST,walltime=1:00:00 -r "2015-09-18 10:00:00" </code>
 +<code>$ oarsub -l nodes=BEST '/path/to/script/to/execute/when/job/starts script args'</code>
  
-This command will wait for the resources to be available, and as soon as they are, a job is allocated, is started, and a subshell is instanciated where you can work on experiments. As soon as the subshell is closed, the job ends. (It can be usefull to work in a [screen](https://www.gnu.org/software/screen/) session to avoid loosing jobs in case of network disconnection).
 +A particular case of this syntax is:
 + 
 +<code>$ oarsub -l nodes=BEST 'sleep 1000000'</code> 
 + 
 +It allows you to have a job which is not tied to a terminal, but you still need to manually submit minus tasks when the job starts.
 + 
 +==== Reservations ====
  
 By default OAR submissions are scheduled as soon as possible. It is also possible to ask for an OAR //reservation// where you choose the date at which the job will be scheduled.
  
 +This other simple example reserves all the nodes on 18 September 2015 from 10 AM to 11 AM:
 +
 +<code>$ oarsub -l nodes=BEST,walltime=1:00:00 -r "2015-09-18 10:00:00"</code>
 +
 +==== Booking specific nodes ====
 +
 +If you want to reserve specific nodes, there are several possible syntaxes.
 +
 +To make a submission using two nodes:
 +
 +<code>$ oarsub -I -l nodes=2</code>
 +
 +But the nodes will be randomly chosen by OAR, so you'll have to adapt your task's scenario to the allocated nodes.
 +
 +It is possible to ask for explicit nodes with a less user-friendly syntax (especially in situations where you need lots of nodes). For example, to make a submission using specifically nodes 4 and 6, for a 30-minute job:
 +
 +<code>$ oarsub -l {"network_address in ('mnode4.cortexlab.fr', 'mnode6.cortexlab.fr')"}/nodes=2,walltime=0:30:00</code>
 +
 +Although this syntax is not user-friendly, we strongly encourage you to use it, since it has many advantages:
 +  * Nodes are often shut down, in energy saving mode. Booking only the nodes you need ensures that only those nodes are woken up. This helps to increase node lifetime and save energy.
 +  * Since only the needed nodes are requested, your OAR job is not canceled or postponed when a node that you don't use is unavailable.
 +  * It allows resource usage to be traced more accurately.
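
Since the explicit-node syntax is tedious to type for many nodes, the resource request can be generated. The following is a minimal sketch, not part of OAR: ''node_request'' is a hypothetical helper, and it assumes that every node is named ''mnode<N>.cortexlab.fr'' as in the example above. It prints the request without the inner double quotes, because the shell removes those quotes anyway before ''oarsub'' sees the argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of OAR): build the resource request for an
# explicit list of CorteXlab nodes.
# Usage: node_request WALLTIME NODE_NUMBER...
# Assumption: nodes are named mnode<N>.cortexlab.fr, as in the example above.
node_request() {
    walltime=$1
    shift
    count=$#
    list=""
    for n in "$@"; do
        # Separate entries with ", " once the list is non-empty.
        if [ -n "$list" ]; then
            list="$list, "
        fi
        list="$list'mnode$n.cortexlab.fr'"
    done
    printf '{network_address in (%s)}/nodes=%s,walltime=%s\n' \
        "$list" "$count" "$walltime"
}

# Print the request for nodes 4 and 6, for a 30-minute job.
node_request 0:30:00 4 6
```

The output would then be passed to ''oarsub'' as a single quoted argument, for example ''oarsub -l "$(node_request 0:30:00 4 6)"'', which should give ''oarsub'' the same argument as the hand-written command above.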
 +
 +==== Booking the room without any node ====
 +
 +You can book only the room, without any node, with the following syntax:
 +
 +<code>$ oarsub -t noop -l {"type='cortexlab-room'"}/nodes=1 "sleep infinity"</code>
 +
 +This is particularly useful if you want to be physically present in the room to experiment with specific hardware, because it avoids waking any nodes that are in energy saving mode.
 +
 +==== A note on OAR job scheduling ====
 +
 +Be aware that OAR behaviour may sometimes be counter-intuitive. You may think that because ''oarsub'' ran successfully and returned a job id, your job is running, but this assumption is wrong: OAR tells you that it accepts your job submission, but it may schedule it later for various reasons:
 +  * because the nodes are currently shut down, so it needs to wake them up, which may take some time
 +  * because another job is running
 +  * because the resources you ask for are currently not available but OAR expects them to be available in the future
 +  * etc.
 +So, the only reliable way to be sure that your job is actually running is to check that the job's state is "//Running//" with the command:
 +
 +<code>oarstat -fj <JOBID></code>
 +
 +Tasks submitted to minus will never start unless the job is "//Running//" anyway.
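
The check above can be automated with a small polling loop. This is only a sketch under stated assumptions: ''state_of_line'' and ''wait_until_running'' are hypothetical helpers, and the loop assumes that ''oarstat -s -j <JOBID>'' prints a single line of the form ''<JOBID>: <State>''. Verify that output format on airlock before relying on it.

```shell
#!/bin/sh
# Hypothetical helpers (not part of OAR).
# Assumption: `oarstat -s -j <JOBID>` prints one line such as "12345: Running".

# Extract the state from a "<JOBID>: <State>" line.
state_of_line() {
    printf '%s\n' "${1#*: }"
}

# Poll the job state until the job is Running.
# Returns 0 when the job is Running, 1 if it has already ended.
wait_until_running() {
    job_id=$1
    while :; do
        state=$(state_of_line "$(oarstat -s -j "$job_id")")
        case $state in
            Running) return 0 ;;
            Error|Terminated) return 1 ;;
        esac
        # Any other state (for example Waiting): check again later.
        sleep 30
    done
}
```

A script could call ''wait_until_running <JOBID>'' and submit its minus tasks only once the function returns successfully.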
 +
 +To sum up:
 +  * //Submissions// versus //Reservations//:
 +    * //Submission//: you ask for the resources as soon as possible. You do not control when the job will be scheduled, and as long as the job hasn't started, the schedule may change. You may get the resources right now if OAR can (and decides to) schedule the job immediately, but there's no guarantee. __With a //Submission// you are sure to get the resources you asked for, but you don't control when__ (in the extreme case, if a resource becomes permanently unavailable, for example when a node is broken, your job will never be scheduled, i.e. it will stay in the "//Waiting//" state indefinitely).
 +    * //Reservation//: you ask for resources at a specific date. __With a //Reservation// you are sure to get the resources at the date you requested (with sometimes a few minutes margin), but you are not sure to get exactly the resources you requested__ (some resources may have become unavailable at that date).
 +  * //Interactive// versus //Non Interactive//:
 +    * in the //non interactive// case, you provide an executable which will be executed by OAR on airlock during your job. This executable can be anything: a ''sleep infinity'' command, a script, or a binary executable. It will be run at job start and killed at job end. If it ends before the walltime of the job, the job terminates. You can for example provide a script that runs one or several ''minus task submit'' commands to fully automate an experimental campaign. Beware that your script needs to wait for the end of the last minus task execution; otherwise the job will be killed (together with the last minus task) when the script terminates.
 +    * in the //interactive// case, the executable is actually an interactive subshell started by OAR on airlock. Beware: if this subshell terminates, the job will be killed as well. You can protect against ssh disconnections by running interactive submissions inside [[https://www.gnu.org/software/screen/|gnu screen]] or [[https://github.com/tmux/tmux/wiki|tmux]]; both are installed on airlock. To do so, open the ssh connection to airlock, start a gnu screen / tmux session, then run your interactive jobs in that session. The gnu screen / tmux session will survive ssh disconnections or shutdowns of your workstation, and can be rejoined later.
 +
 +Note that almost all OAR jobs terminate with status "//Error//". This is a side effect of the fact that the status of a terminated OAR job depends on the return code of the job executable. If it returns 0 (the Unix convention for success, or True), the job is in the "//Terminated//" state; if it returns anything but 0 (the Unix convention for failure, or False), the job is in the "//Error//" state. Most jobs use ''sleep infinity'' as executable, and ''sleep'' returns something other than 0 when it is killed at job end.
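
The non-zero return code of a killed ''sleep'' can be observed locally, without OAR. The sketch below kills a background ''sleep'' with the default signal of ''kill'' (SIGTERM); which signal OAR actually sends at job end is an assumption here, but any signal that kills the process gives a non-zero status.

```shell
#!/bin/sh
# Start a long sleep in the background, as a stand-in for the job executable.
sleep 1000 &
pid=$!

# Kill it, as happens to the executable at job end.
# Assumption: SIGTERM (the default signal of kill) is used for illustration.
kill "$pid"

# Collect the exit status of the killed process.
status=0
wait "$pid" || status=$?

echo "sleep exit status: $status"
```

The printed status is not 0, which is why such a job ends up in the "//Error//" state even though nothing went wrong.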
 +==== A note on energy saving ====
 +
 +When nodes are unused, and after a timeout, they are automatically shut down (and appear as "//Standby//" in the drawgantt).
 +
 +When a job is submitted, nodes that are shut down are woken up. The job therefore does not start immediately: it waits for the nodes to have started. If some nodes have not started after the timeout, they are set to the "//Absent//" state. If the job is a reservation, it will start without these nodes. If it is a submission, it will be scheduled later (possibly never: OAR will wait for the resource to be back, which may never happen).
 +
 +When a job is a reservation, OAR knows the scheduled start and begins waking up nodes before that time, so the job should usually start approximately as planned.
 +
 +Since energy saving is active, you are strongly encouraged to submit/reserve only the nodes you need, and if you don't need any node, to submit/reserve only the //room//. This saves energy, avoids heating the room needlessly, and increases the longevity of our hardware. Also, nodes are woken up in groups of 5, so the more nodes you reserve, the longer the startup time of your job: possibly up to 15 minutes if you reserve all nodes.
 +==== Advanced usage: sharing the platform ====
 +
 +It is possible to share the platform, for specific situations such as tutorials, courses, challenges. In these situations, you want several users to be able to use the platform at the same time. For this, you need to follow these steps:
 +
 +The organizer of the tutorial/course/challenge submits a job or makes a reservation for all (or part) of the platform for the duration of the event, with the ''-t container'' option:
 +
 +<code>$ oarsub -l nodes=BEST,walltime=4:00:00 -t container -r '2018-07-21 14:00:00'</code>
 +
 +This will reserve all available nodes for a 4-hour event, from 14:00 to 18:00 on July 21, 2018.
 +
 +Then, participants can submit jobs inside the container job with this (example) syntax:
 +
 +<code>$ oarsub -t inner=<job_id of the container job> -l {"network_address in ('mnode4.cortexlab.fr', 'mnode6.cortexlab.fr')"}/nodes=2,walltime=0:30:00 -I</code>
 +
 +or (another example):
 +
 +<code>$ oarsub -t inner=<job_id of the container job> -l nodes=2,walltime=0:30:00 'sleep 10000000'</code>
 +
 +==== OAR Documentation ====
 +
 +The complete OAR documentation, with many more details and examples, is available here: https://oar.imag.fr
reserve.1442393243.txt.gz · Last modified: 2015/09/16 10:47 by trisset
