Docker for SAS

 

Getting started: What is Docker?

For the first time we are delivering a Docker image for the XMM-Newton SAS, but what is Docker?

Briefly stated, Docker is a way to run an application on a host OS where it would otherwise not run. Docker encapsulates the application so that the host OS can run it effectively. This is made possible by the Docker engine, a sort of application player, which is always required to run an application once it has been dockerized. The process of dockerizing an application is known as image creation.

Docker image creation is nowadays relatively simple thanks to the availability of a large library of ready-to-use Docker images, each of which can serve as a base image on top of which we can build our own customized image. For example, we can take as base image a bare-bones Ubuntu Linux OS, containing only the minimum elements necessary to run, and install our Linux application on top of it to create the customized image.
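As a rough illustration of layering a customized image on top of a base image, the sketch below writes a minimal Dockerfile and prints the command that would build it. The installed package (curl), the image name my_custom_image and the temporary build directory are hypothetical placeholders and are not part of the recipe used for the SAS image.

```shell
# Write a minimal, hypothetical Dockerfile into a temporary build directory.
build_dir=$(mktemp -d)
cat > "${build_dir}/Dockerfile" <<'EOF'
# Base image pulled from Docker Hub
FROM ubuntu:22.04
# Install the application on top of the base image (placeholder package)
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
# Default command when a container is started from this image
CMD ["bash"]
EOF

# Print (but do not run) the command that would build the customized image.
echo "docker build -t my_custom_image ${build_dir}"
```

The build itself is left as a printed command because it needs a running Docker engine and network access to pull the base image.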

The dockerization of an application is carried out with the same Docker engine that is used to run the resulting customized image. Once the image is built, it can be copied to any OS where it is needed and run there, as long as a Docker engine is available for that OS, regardless of the OS where the image was generated.

When a Docker image is run by the Docker engine, it becomes what is known as a container: an independent process executing our application in the context of a guest OS, isolated from the host OS. As soon as the container stops running, so does the guest OS. Containers are ephemeral, so if we want to preserve state or data between successive runs of a container, we must allow it to read and write its files on the host OS. This is achieved by means of Docker volumes, a special type of disk controlled by the Docker engine, which can share data among containers and the host OS.

Because a Docker image carries only a reduced version of the guest OS, it is much more efficient and easier to use than a virtual machine. It also offers the granularity to run, in simple containers, individual applications that are not available on the host OS.

Creating a customized image for SAS 21.0

A customized image for SAS can be created on Linux, macOS and Windows, as long as a Docker engine is available on them. For SAS we have chosen to create it on macOS by means of the Docker Desktop application, which includes a Docker engine and a very powerful GUI to control and configure it. For example, using the GUI we can configure the amount of RAM and the number of CPU cores allocated to run a container, the total amount of disk space available for all running containers, the directories on the host OS which can be shared with these containers, etc.

As many SAS tasks require Heasoft to work, we needed to find a way to access it from a container running SAS. Given that SAS requires only the Heasoft FTOOLS, we have created an ad hoc Docker image of Heasoft including exclusively these components, which could be used later on to create the customized image for SAS. The image also includes Xspec, to give us the capability of creating new Xspec models.

The Heasoft v6.30 Docker image for SAS was named heasoft_6_30 and has been built on top of the official image of Ubuntu 22.04LTS (jammy), downloaded from Docker Hub.

During image building, the source code of Heasoft v6.30, including only FTOOLS and Xspec, is copied, built and installed in the directory /usr/local/heasoft-6.30 of the image. Once done, heasoft_6_30 can be executed as a container to run any FTOOLS tasks and the Xspec application.

Then we created the sas_21_0 Docker image on top of the heasoft_6_30 image. Unlike Heasoft v6.30, which was built from source code, SAS was installed from the sas_21.0.0-Ubuntu22.04.tgz binary distribution, built on Ubuntu 22.04LTS.

The resulting image also includes:

  • ds9 v8.4.1 and xpa v2.1.20.
  • xmgrace v5.1.25.
  • wcstools v3.9.7.
  • All Python packages required to run SAS Python based tasks, as listed in file sas_21.0.0_python_packages.txt.

The sas_21_0 image is available for download in a compressed file named sas_21_0_docker.tar.gz.

Additional features of the sas_21_0 image: Sharing data, X11 and Jupyter Notebooks

To avoid using root while running the sas_21_0 image, the user xmmsas (password: xmmsas) has been added as the default user, and /home/xmmsas/mywork has been set as the image working directory. The xmmsas user can gain administrator privileges by means of the sudo command.

While sas_21_0 is running in a container, its /home/xmmsas/mywork directory is shared with a specific directory on the host OS named ${PWD}/mywork, where ${PWD} represents the current directory from which the image was launched. For example, if we launched the sas_21_0 image from /mydir, the subdirectory /mydir/mywork on our machine and /home/xmmsas/mywork in the running container are the same directory. This way, any data produced in the running container is shared with our host OS.

Similarly, a subdirectory named ${PWD}/ccf on the host (in the example described earlier this would be /mydir/ccf) is assumed to exist and is shared as /ccf in the running container. All CCFs required to work with SAS should be stored in /mydir/ccf/valid. With this directory arrangement, within the running container for the sas_21_0 image, we can define SAS_CCFPATH as

SAS_CCFPATH=/ccf/valid ( == ${PWD}/ccf/valid in the host OS)
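The directory sharing described above can be sketched with docker bind-mount options. The host and container paths are those given in the text; whether run_sasdocker passes exactly these options is an assumption, and the command is only printed, not executed.

```shell
# Warn about host directories that the container expects but that do not exist yet.
for d in "${PWD}/mywork" "${PWD}/ccf/valid"; do
    [ -d "$d" ] || echo "note: $d does not exist yet" >&2
done

# Bind-mount options: host directory on the left, container path on the right.
# (Assumed form; paths containing blanks would need extra quoting.)
vol_opts="-v ${PWD}/mywork:/home/xmmsas/mywork -v ${PWD}/ccf:/ccf"

# Inside the container the CCFs are then found under /ccf/valid.
env_opts="-e SAS_CCFPATH=/ccf/valid"

# Print the resulting command line instead of running it.
echo "docker run --rm -it ${vol_opts} ${env_opts} docker4sas:sas_21_0"
```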

In a similar way, the X11 display system has been set up so that any display request made by X clients running in the container is shown on the host X server.

Finally, the Jupyter Notebook HTTP server, listening on port 8888 in the running container, can be accessed on the same port from any web browser running on the host, at http://localhost:8888. Access to the notebook requires a password, which has been preset to xmmsas.
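A minimal sketch of the docker options that would give this behaviour follows. MY_HOST_IP is the variable name used by the launch script (see the Windows caveats); the address 192.0.2.10, the display number 0 and the exact option set are assumptions made for illustration only.

```shell
# Host address that X clients in the container should display on.
# 192.0.2.10 is a documentation placeholder, used only if MY_HOST_IP is unset.
MY_HOST_IP="${MY_HOST_IP:-192.0.2.10}"

# Point X clients at display 0 of the host X server (assumed display number).
x11_opts="-e DISPLAY=${MY_HOST_IP}:0"

# Publish container port 8888 on host port 8888 for the Jupyter Notebook server.
port_opts="-p 8888:8888"

# Print the resulting command line instead of running it.
echo "docker run --rm -it ${x11_opts} ${port_opts} docker4sas:sas_21_0"
```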

Installing the SAS Docker image

Once we have downloaded the sas_21_0_docker.tar.gz file, we must load it into the Docker engine of our OS before using it.

On macOS and Windows, we recommend using the Docker engine included in the Docker Desktop application. On Linux, Docker Desktop is not available yet although it seems it will be soon (look here for more information). However, all major Linux distributions provide the Docker engine in their respective package repositories, from which it can be easily installed using the distribution's package manager. Look here for links to specific instructions on how to install the Docker engine on your favorite Linux distribution.

To load the SAS Docker image, run the following commands:

 

# gunzip sas_21_0_docker.tar.gz
# docker load -i sas_21_0_docker.tar
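As an alternative sketch, docker load can also read a gzip-compressed archive directly, so the image can be loaded without unpacking it first. The helper name load_sas_image is hypothetical; it only adds two sanity checks around the docker load call.

```shell
archive="sas_21_0_docker.tar.gz"

# Load a Docker image archive after checking that the file and docker exist.
load_sas_image() {
    if [ ! -f "$1" ]; then
        echo "archive not found: $1" >&2
        return 1
    fi
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker command not found" >&2
        return 2
    fi
    # docker load accepts gzip-compressed tar archives as input.
    docker load -i "$1"
}

load_sas_image "$archive" || echo "image not loaded" >&2
```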

 

As soon as the load is complete, we can check whether the sas_21_0 image is available to run, by listing all the Docker images which are known to your Docker engine. This can be done using the following command:

 

# docker images | grep sas_21_0

 

which will output something similar to the following:

 

REPOSITORY   TAG        IMAGE ID       CREATED       SIZE
docker4sas   sas_21_0   cb701a4778c3   2 hours ago   14.9GB

 

The combination docker4sas:sas_21_0 is considered the fully qualified name of the sas_21_0 image.
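To show how the fully qualified name is put together from the REPOSITORY and TAG columns, the sketch below parses the sample listing shown above with awk. With a live Docker engine, the printf would be replaced by the docker images command itself.

```shell
# Turn `docker images` style output into REPOSITORY:TAG names.
# The two input lines reproduce the sample listing; the header line is skipped.
qualified=$(printf '%s\n' \
    "REPOSITORY TAG IMAGE ID CREATED SIZE" \
    "docker4sas sas_21_0 cb701a4778c3 2 hours ago 14.9GB" |
    awk 'NR > 1 { print $1 ":" $2 }')

echo "$qualified"   # prints docker4sas:sas_21_0
```

The docker images command also offers a --format option with Go templates (for example '{{.Repository}}:{{.Tag}}') that yields the same names without any parsing.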

Running the SAS Docker image

In general Docker images can be run either from the command line, using the docker command, or directly from the Docker Desktop GUI. For SAS we will use mainly the docker command. To facilitate using all the options and parameters available to that command, we are including a shell script named run_sasdocker which can be used to run any SAS task with the required options.

The run_sasdocker script can be downloaded from here.

Armed with this script, we can run SAS in two modes:

  • Interactive: We get access to the shell prompt within the sas_21_0 running container, where we can execute any SAS task, including the SAS GUI, or even any tool that works together with SAS like Heasoft fv, the ds9 image display tool, the xmgrace plotting tool, etc. In a later section we will show several examples of using this mode.
  • Non-interactive (also known as background or detached): We do not get the container's shell prompt but simply ask the running container to execute a given SAS task and get the results. Using this mode, we have access to almost all SAS tasks, as we will show in later examples.

Before running the run_sasdocker script, it is mandatory to define the SAS_DOCKER_IMAGE environment variable, as follows:

# export SAS_DOCKER_IMAGE="docker4sas:sas_21_0"
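Since a missing variable is easy to overlook, a small guard can be run before the script is invoked. This is only a sketch: the function name check_sas_image_var is hypothetical, and nothing is known here about how run_sasdocker itself reacts to an unset variable.

```shell
# Fail with a clear message when SAS_DOCKER_IMAGE is unset or empty.
check_sas_image_var() {
    if [ -z "${SAS_DOCKER_IMAGE:-}" ]; then
        echo "SAS_DOCKER_IMAGE is not set; export it before calling run_sasdocker" >&2
        return 1
    fi
    echo "Using image: ${SAS_DOCKER_IMAGE}"
}

export SAS_DOCKER_IMAGE="docker4sas:sas_21_0"
check_sas_image_var && echo "ready to call ./run_sasdocker"
```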

Examples of interactive usage

The numbered bullets below show examples of using the sas_21_0 container in interactive mode after invoking the proper docker command by running the run_sasdocker script.

Immediately after the invocation of the run_sasdocker script without additional parameters

./run_sasdocker

we get

xmmsas@dcb3fc7251c9:~mywork$

The xmmsas@dcb3fc7251c9:~mywork$ is the interactive shell prompt of the sas_21_0 container. The value dcb3fc7251c9 is the hexadecimal ID assigned to this particular running container. The prompt provides access to the standard Ubuntu interactive Bash shell. Once there, you do not need to initialise either Heasoft or SAS; both are already initialised.

The DISPLAY variable is set in such a way that we can run any X11 application like xterm, xclock, etc, and they will show up on the host's desktop. As a consequence, any command requiring X11 display, like the SAS GUI, ds9, Heasoft fv, etc, will show up as well on the host's desktop.

The interactive prompt also shows the working directory as ~/mywork, which is equivalent to /home/xmmsas/mywork. Remember that this directory is also equivalent to the subdirectory mywork of the directory from where you invoked run_sasdocker.

Here we have 5 examples of running in interactive mode:

  1. Running SAS GUIs: Executing sas at the prompt (xmmsas@dcb3fc7251c9:~mywork$ sas) brings up the SAS GUI on your desktop. Similarly, any SAS task run with option -d should display its own GUI.
  2. Running ds9 and xmgrace: Execute ds9 or xmgrace at the prompt. The respective GUI should show up on your desktop.
  3. Running Heasoft fv: Execute fv and the respective GUI will show up on your desktop.
  4. Running Heasoft Xspec: Execute xspec at the prompt to get the Xspec interface where you may try getting the plotting device with the Xspec command cpd /xwin:
    xmmsas@dcb3fc7251c9:~mywork$ xspec
    Creating a $HOME/.xspec directory for you
    XSPEC version: 12.12.0
    Build Date/Time: Mon Dec 13 09:56:43 2021
    XSPEC12> cpd /xwin
  5. Running SAS from a Jupyter Notebook: The Jupyter Notebook provides a web server that listens by default on port 8888, and the sas_21_0 image is configured so that any web browser running on the host can connect to that server in the running container. The command to launch the Jupyter Notebook from the sas_21_0 container is:

    xmmsas@dcb3fc7251c9:~mywork$ jupyter notebook --no-browser --ip=*

    Then, launch any web browser on your host and connect to URL http://localhost:8888. Use password xmmsas to connect. Once in, you may use the Python interface to SAS by means of importing the pysas.wrapper module.

    To download ODFs from the XMM-Newton Science Archive, you may use the Python task startsas, which uses the astroquery module for XMM-Newton (astroquery.esa.xmm_newton) to carry out the downloads.

Examples of non-interactive use

At any time we can run the run_sasdocker script with a single parameter as the command to be executed in the running container, in detached or non-interactive mode. For example, the following command will run the Heasoft fv tool in detached mode:

 

# ./run_sasdocker fv

 

As a consequence, the fv GUI will be shown on our desktop and we can work with it normally, as if it were a native application of the host OS. However, it is being executed in the background, within the running sas_21_0 container. This is possible because fv is an X11 application and can continue running as long as its GUI is kept open. We could do the same with any other X11 based application like xterm, ds9, xmgrace, the SAS GUI (sas), etc.

But, can we do this with other SAS commands?

In the following numbered bullets, we give some examples to illustrate the capabilities of this mode. The list is not intended to be a comprehensive list of examples.

  1. Running epproc --version: This example illustrates how to get the version number of any task, in this case epproc.
    # ./run_sasdocker epproc --version
    10.63.108.208 being added to access control list
    epproc (epicproc-2.25.1) [xmmsas_20230412_1735-21.0.0]
  2. Running sasversion and sasver without parameters: This example illustrates how to get the identity card of a specific SAS version.
    # ./run_sasdocker sasversion
    sasversion:- Executing (routine): sasversion  -w 1 -V 4
    sasversion:- sasversion (sasversion-1.3)  [xmmsas_20230412_1735-21.0.0] started:  2023-04-14T14:32:59.000
    sasversion:- XMM-Newton SAS release and build information:
    SAS release: xmmsas_20230412_1735-21.0.0
    Compiled on: Fri Apr 14 09:45:46 CET 2023
    Compiled by: xmmsas@sasbld06
    Platform   : Ubuntu22.04
    SAS-related environment variables that are set:
    SAS_DIR = /usr/local/SAS/xmmsas_20230412_1735
    SAS_PATH = /usr/local/SAS/xmmsas_20230412_1735
    sasversion:- sasversion (sasversion-1.3)  [xmmsas_20230412_1735-21.0.0] ended:    2023-04-14T14:32:59.000

    # ./run_sasdocker sasver
    XMM-Newton SAS - release and build information
    SAS release: xmmsas_20230412_1735-21.0.0
    Compiled on: Fri Apr 14 09:45:46 CET 2023
    Compiled by: xmmsas@sasbld06
    Platform   : Ubuntu22.04
    SAS-related environment variables set:
    SAS_DIR        = /usr/local/SAS/xmmsas_20230412_1735
    SAS_PATH       = /usr/local/SAS/xmmsas_20230412_1735
  3. Executing startsas odfid=0727780501 workdir=test

    The execution of the Python task startsas with these two parameters will download ODF ID 0727780501 (we might use any other ODF ID) from the XMM-Newton Science Archive (XSA) using astroquery, and put all results into a subdirectory named test in the working directory of the running container.

  4. Running epproc without parameters:

    To execute epproc in detached mode, the command must find everything it needs to run properly already defined in the context of the running container. For example, SAS_CCF and SAS_ODF must be properly defined. But how do we pass these definitions to the container?

    The run_sasdocker script accepts an additional parameter named add_eopt (short for additional environment options): a string composed of -e VARIABLE=VALUE options separated by blanks, which is passed to the docker command together with the options already defined in the script.

    Assuming we have already produced the respective CIF and SAS summary files in the container, in the directory /home/xmmsas/mywork/test, we can define the additional environment options string as follows:

    add_eopt="-e SAS_CCF=/home/xmmsas/mywork/test/ccf.cif -e SAS_ODF=/home/xmmsas/mywork/test/3242_0727780501_SCX0000SUM.SAS"

    Then we could run the epproc command without parameters as follows

    # ./run_sasdocker epproc
  5. Running jupyter notebook --no-browser --ip=*

    The command works in this mode as well, and we can use our host browser to connect to the URL http://localhost:8888 to work with SAS from Python. The notebook can access any file available in the context of the container; for example, we could open any .ipynb files copied there through the directory shared with the host.
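For the epproc example in item 4, the values in add_eopt must be container paths, not host paths. The sketch below derives them from the corresponding host paths under ${PWD}/mywork, using the directory mapping described earlier. The helper name host_to_container is hypothetical, and how add_eopt is then handed over to run_sasdocker is left as in the text above.

```shell
# Map a host path under ${PWD}/mywork to its location inside the container.
host_to_container() {
    prefix="${PWD}/mywork/"
    case "$1" in
        "$prefix"*)
            rel=${1#"$prefix"}              # strip the host-side prefix
            echo "/home/xmmsas/mywork/${rel}"
            ;;
        *)
            echo "not under ${prefix}: $1" >&2
            return 1
            ;;
    esac
}

# File names as in the epproc example; they are not required to exist here.
cif=$(host_to_container "${PWD}/mywork/test/ccf.cif")
sum=$(host_to_container "${PWD}/mywork/test/3242_0727780501_SCX0000SUM.SAS")

add_eopt="-e SAS_CCF=${cif} -e SAS_ODF=${sum}"
echo "$add_eopt"
```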

Additional directions and caveats

Size of the SAS Docker image

The large size of the docker4sas:sas_21_0 image (sas_21_0_docker.tar.gz is ~9GB) is mainly due to the need to base it on a Heasoft Docker image that includes most of its source code and the development tools (gcc, g++, gfortran, perl, python, etc) required to build it. That image also provides the capability to build new Xspec models from source code.

Some guidelines to handle Docker containers

The following are some helpful guidelines to handle Docker images and containers:

  • Exiting a running interactive container:

    If you are running an interactive container, you may close and exit it with the command exit. The effect of that command will be the total removal of the container from the control of the Docker engine, due to the presence of the option --rm in the docker invocation command in run_sasdocker, which implies the removal of the container as soon as it is terminated.

  • List of containers known to the Docker engine:

    At any time you may list all containers known to the Docker engine by issuing the command:

    	# docker ps -a
  • Remove container with identification number id:

    At any time you may remove any container under control of the Docker engine with the command:

    	# docker rm <id>

    where <id> is the 12-character alphanumeric container ID which appears in the first column of the list produced by the command docker ps -a.

  • List of all Docker images known to the Docker engine:

     

    	# docker images
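The listing and removal commands above can be combined into a small housekeeping helper. This is a sketch only: the function name cleanup_sas_containers is hypothetical, and it is relevant only for containers started without the --rm option, since run_sasdocker already removes its containers on exit.

```shell
image="docker4sas:sas_21_0"

# Remove exited containers that were started from the given image.
cleanup_sas_containers() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker command not found" >&2
        return 1
    fi
    # -q prints only container IDs; the filters keep exited containers of this image.
    for id in $(docker ps -a -q --filter "ancestor=$1" --filter "status=exited"); do
        docker rm "$id"
    done
}

# Usage (not executed here): cleanup_sas_containers "$image"
```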

Setting required to work with XQuartz in macOS

On macOS with XQuartz, to allow X11 clients running in a container to display on your host desktop, you must enable the XQuartz setting "Allow connections from network clients", which is available in Preferences, under Security.

Using the SAS Docker image in Windows

Requirements:

  • Install Docker Desktop for Windows which includes the Docker engine and the docker commands.
  • Install the Windows Subsystem for Linux 2 (WSL 2). The Docker commands are bound to WSL 2. For more details on installing and using WSL 2 on Windows, please look here.
  • To run X11 you will need to install an X server for Windows. We recommend installing and using MobaXterm as explained here. It provides an X11 server on Windows similar to XQuartz on macOS. MobaXterm has a configuration GUI where you need to set the X11 remote access to Full, to allow X11 clients to make use of the X server.

Caveats:

  • WSL 2 runs within a VM on Windows, and therefore has its own network setup, different from that of the Windows host. When we run the SAS Docker image launch script run_sasdocker, the variable MY_HOST_IP is not set to the real IP number of the Windows host but to the IP number of the WSL 2 VM. Thus, before running run_sasdocker, we need to manually set the MY_HOST_IP environment variable to the real IP number of the Windows host.
  • By default the SAS Docker container runs as the user xmmsas, which is not privileged. On Windows, to be able to write files from the container to the local host's disks, we need to run the container as root. This is a condition required by Windows to properly handle any files created and written from the container to an NTFS host file system (the standard Windows file system).
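For the first caveat, MY_HOST_IP can be exported in the WSL 2 shell before calling run_sasdocker. The sketch below adds a simple format check; the helper name is_ipv4 is hypothetical and 192.0.2.10 is a documentation placeholder that must be replaced by the real address of the Windows host (as reported, for instance, by ipconfig on Windows).

```shell
# Check that a string looks like a dotted-quad IPv4 address (each part 0-255).
is_ipv4() {
    echo "$1" | awk -F. '
        NF != 4 { exit 1 }
        { for (i = 1; i <= 4; i++) if ($i !~ /^[0-9]+$/ || $i > 255) exit 1 }'
}

# Placeholder address; replace it with the real IP number of the Windows host.
MY_HOST_IP="192.0.2.10"

if is_ipv4 "$MY_HOST_IP"; then
    export MY_HOST_IP
    echo "MY_HOST_IP=${MY_HOST_IP}"
else
    echo "invalid IPv4 address: ${MY_HOST_IP}" >&2
fi
```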