SAS Thread - startup in Python - XMM-Newton
SAS Startup Thread in Python
Introduction
The SAS Start-up thread provides a detailed explanation of how to get started with SAS. In particular, it shows how to initialize SAS, how to point SAS to the calibration files needed for a given XMM-Newton observation, and how to get the data ready to be processed by any SAS task. With SAS 19, we are introducing a new Python infrastructure that allows Python tasks to be run from the command line, like any other non-Python SAS task, and the same code to be accessed from a Jupyter Notebook. In addition, SAS 19 includes several new Python tasks, among them two that help us start working with SAS: startsas and sasver.
Expected Outcome
The ability to process any XMM-Newton observation with any SAS task.
SAS Tasks to be Used
Prerequisites
It is assumed that SAS has been installed properly, as explained in the current SAS installation pages. HEASOFT must also be initialized before SAS is initialized (see SAS Watchout).
Useful Links
Last Reviewed: 25 May 2023, for SAS v21.0
Last Updated: 15 March 2021
Procedure
Let's begin by asking four questions:
- Where in my system have I installed the SAS software?
- Where in my system have I stored the Calibration files?
- Where have I placed the XMM-Newton Observation data that I want to process?
- Which directory am I going to use to work with SAS?
Before answering these questions, it is worth emphasizing that, regardless of the approach we choose, we will always need to customize our environment so that we can easily access all the SAS software and Calibration files. This process is known as initialisation and normally involves defining some environment variables and executing some shell scripts. That is what we will do once the answers to questions 1 and 2 are known. We will then use the SAS task startsas to handle the setup and answer questions 3 and 4.
Let us assume that you have installed SAS in /some_dir/xmmsas_20201015_1931, put all Calibration files in /ccf/valid, and that you are going to work in /home/user/my_work on observation ID 0104860501, located at /home/user/xmm_obs/0104860501.
It is important for Headas and SAS to be initialized in the terminal from which the Jupyter Notebook is going to be launched. So follow the next two blocks in a terminal and then start the Notebook again from that same terminal.
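Because the notebook inherits its environment from the terminal that launched it, a quick check from a notebook cell can confirm that the initialisation took effect. The sketch below uses a hypothetical helper, missing_env_vars (not part of pysas), that only inspects a mapping such as os.environ; checking HEADAS and SAS_DIR at this stage is an assumption about which variables are most useful to test.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(required: Iterable[str],
                     env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


# Hypothetical check: HEADAS and SAS_DIR should be defined once Heasoft and
# SAS have been initialised in the terminal that launched the notebook.
missing = missing_env_vars(["HEADAS", "SAS_DIR"])
if missing:
    print("Not initialised in this environment:", ", ".join(missing))
else:
    print("Heasoft and SAS variables found.")
```

If anything is reported as missing, close the notebook, initialise Heasoft and SAS in the terminal, and relaunch the notebook from there.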
Initialization of Headas
Some SAS tasks use Heasoft FTOOLS to do their work. Thus, we need to initialise Heasoft before SAS is initialised.
Let the directory where you have installed the Heasoft software be,
/usr/local/heasoft-M.N.P/architecture
where M.N.P is the Heasoft version and architecture is a placeholder for the platform-specific name of the installation directory, e.g. x86_64-pc-linux-gnu-libc2.27. We call this directory HEADAS.
The following sequence of csh/tcsh commands initialises Heasoft,
setenv HEADAS /usr/local/heasoft-M.N.P/architecture
source $HEADAS/headas-init.csh
The equivalent commands for the sh/bash shell are,
export HEADAS=/usr/local/heasoft-M.N.P/architecture
. $HEADAS/headas-init.sh
You may define a shell alias named heainit, such as,
alias heainit "source $HEADAS/headas-init.csh" (csh/tcsh)
alias heainit=". $HEADAS/headas-init.sh" (sh/bash)
which will allow you to initialise Heasoft by simply invoking the alias,
heainit
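The choice between the two shell families above can be summarised in a small Python sketch that, given HEADAS and a shell name, returns the matching initialisation command. heasoft_init_command is a hypothetical helper for illustration only: it formats the commands shown above and does not run them, since a notebook cannot initialise the shell that launched it.

```python
from pathlib import Path

# Init scripts named in the commands above, keyed by shell name.
_INIT_SCRIPTS = {"csh": "headas-init.csh", "tcsh": "headas-init.csh",
                 "sh": "headas-init.sh", "bash": "headas-init.sh"}


def heasoft_init_command(headas: str, shell: str) -> str:
    """Return the command that initialises Heasoft for the given shell."""
    try:
        script = _INIT_SCRIPTS[shell]
    except KeyError:
        raise ValueError(f"unsupported shell: {shell!r}") from None
    path = Path(headas) / script
    # csh/tcsh use 'source'; sh/bash use the '.' builtin.
    verb = "source" if script.endswith(".csh") else "."
    return f"{verb} {path}"
```

For example, heasoft_init_command("/usr/local/heasoft-M.N.P/architecture", "bash") returns the sh/bash line given above, with $HEADAS expanded.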
Initialization of SAS
The environment variable SAS_DIR must point to the directory containing the SAS installation, the one identified in your answer to question 1: /some_dir/xmmsas_20201015_1931.
Depending on the shell you use, you can define SAS_DIR as,
setenv SAS_DIR /some_dir/xmmsas_20201015_1931 (csh/tcsh)
export SAS_DIR=/some_dir/xmmsas_20201015_1931 (sh/bash)
Then, depending on your shell, initialise SAS by issuing the command,
source $SAS_DIR/setsas.csh (csh/tcsh)
. $SAS_DIR/setsas.sh (sh/bash)
Running sasver
You may now try to run your first SAS Python task: sasver. This task provides a sort of "about SAS" and also a test of SAS readiness: if it runs successfully, SAS as a whole is ready to be used.
The purpose of sasver is to show the identity card of the SAS version you are running. It also shows all the SAS shell environment variables defined so far.
The task sasver can be run either from the command line or from a notebook, as can most Python SAS tasks. To run it from the command line, simply invoke it as you would any other SAS command,
sasver
which prints several lines of output in the terminal.
However, to run this task from a Jupyter Notebook, we need to use a different method. Since this method can be used to run any other SAS task, Python or non-Python, we will explain it using sasver as an example.
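When scripting outside a notebook, the command-line form can also be driven from Python's standard subprocess module. This is a generic sketch, not part of pysas; run_cli_task is a hypothetical name, and it returns None when the task is not on the PATH (for example, when SAS has not been initialised in the shell that started Python).

```python
import shutil
import subprocess
from typing import Optional


def run_cli_task(task: str, *args: str) -> Optional[str]:
    """Run a command-line task and return its stdout, or None if not on PATH."""
    executable = shutil.which(task)
    if executable is None:
        return None  # e.g. SAS not initialised in this environment
    result = subprocess.run([executable, *args], capture_output=True,
                            text=True, check=False)
    return result.stdout


output = run_cli_task("sasver")
print(output if output is not None else "sasver not found on PATH")
```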
Invoking SAS Python tasks from notebooks
To work with any specific Python component included in SAS, we need to import it from the core Python package for SAS, named pysas.
To execute any SAS task within a Notebook, we need to import from pysas a component known as Wrapper. The following cell shows how to do that,
from pysas.wrapper import Wrapper as w
Any SAS task accepts arguments, which can be either specific options, e.g. --version, which shows the task's version, or parameters in the format param=value. When the task is invoked from the command line, these arguments follow the name of the task. In Notebooks, however, they have to be passed to the task differently: as a Python list, whose name you are free to choose. Let the name of this list be inargs.
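Building inargs by hand is simple, but a small helper keeps options and param=value pairs in the order described above. build_inargs is a hypothetical sketch, not part of pysas; writing booleans as yes/no is an assumption based on the withccfpath=no example later in this thread.

```python
from typing import Iterable, List, Mapping, Optional


def build_inargs(options: Iterable[str] = (),
                 params: Optional[Mapping[str, object]] = None) -> List[str]:
    """Return options (e.g. '--version') followed by 'param=value' strings."""
    args = list(options)
    for name, value in (params or {}).items():
        if isinstance(value, bool):
            # Assumption: SAS boolean parameters are written yes/no,
            # as in withccfpath=no.
            value = "yes" if value else "no"
        args.append(f"{name}={value}")
    return args
```

The result is meant to be passed as the second argument of the wrapper, e.g. w('sasver', build_inargs(['--version'])).run().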
To pass the option --version to the task to be executed, we must define inargs as,
inargs = ['--version']
To execute the task, we use the Wrapper component imported earlier from pysas under the alias w, as follows,
t = w('sasver', inargs)
In Python terms, t is an instance of the class Wrapper (imported here under the alias w).
To run sasver, we can now do as follows,
t.run()
The output is equivalent to having run sasver on the command line with the argument --version.
Each SAS task, whether it is a Python task or not, accepts a predefined set of options. To list these options, we can always invoke the task with the option --help (or -h).
With sasver, as with some other SAS tasks, we can define inargs as an empty list, which is equivalent to running the task on the command line without options, like this,
inargs = []
t = w('sasver', inargs)
t.run()
That is indeed the desired output of the task sasver.
A similar result can be achieved by combining all the previous steps into a single expression, like this,
w('sasver', []).run()
The output of sasver provides useful information on which version of SAS is being run and which SAS environment variables are defined.
Note: It is important to always use [ ] when passing arguments to a task through the wrapper, as parameters and options have to be passed in the form of a list. For example, w('evselect', ['-h']).run() will execute the SAS task evselect with the option -h.
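To catch the mistake the note warns about before a task is launched, a short guard can reject a bare string. as_inargs is a hypothetical helper, not part of pysas, and the sketch makes no assumption about what the Wrapper itself does when given a string.

```python
from typing import List


def as_inargs(args) -> List[str]:
    """Return args as a list of strings, rejecting a bare string."""
    if isinstance(args, str):
        # A bare string such as '-h' is exactly what the note warns against.
        raise TypeError(f"pass arguments as a list, e.g. [{args!r}]")
    args = list(args)
    if not all(isinstance(a, str) for a in args):
        raise TypeError("every argument must be a string")
    return args
```

Used as w('evselect', as_inargs(['-h'])).run(), it passes the list through unchanged, while as_inargs('-h') fails with a TypeError.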
Should you define any new SAS environment variable, you can immediately check its existence by running sasver again, as shown below for the variable SAS_ODF,
import os
os.environ['SAS_ODF'] = 'myODFs/'
w('sasver', []).run()
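This works because a change to os.environ is inherited by any process started afterwards from the same Python session. The sketch below demonstrates this with the standard library alone, using a child Python interpreter in place of a SAS task; that the wrapper runs tasks as child processes in this way is an assumption, not something stated in this thread.

```python
import os
import subprocess
import sys

os.environ["SAS_ODF"] = "myODFs/"

# A child process started from this interpreter inherits os.environ,
# so a program launched afterwards can see the new variable.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('SAS_ODF'))"],
    capture_output=True, text=True, check=True)
print(child.stdout.strip())  # prints: myODFs/
```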
Listing available options
As noted earlier, we can list all options available to any SAS task with option --help (or -h),
w('sasver', ['-h']).run()
As explained in the help text, if the task had any parameters, they would be listed immediately after the help text.
As shown in the text above, the task sasver has no parameters.
You may try any other of the available options listed above to see how they behave when passed from the notebook.
Calibration files
Our next step is to tell SAS where in the system the calibration data files are placed or, as they are known, the Current Calibration Files (CCF).
According to the answer to question 2 above, the CCFs are in /ccf/valid, so we must tell SAS where to find them. As before, we do this by defining an environment variable, in this case SAS_CCFPATH.
setenv SAS_CCFPATH /ccf/valid (csh/tcsh)
export SAS_CCFPATH=/ccf/valid (sh/bash)
Should you have other CCFs in other directories, you can add these directories to the previous definition. For example, imagine you have some test CCFs in /ccf/test. Then you could define SAS_CCFPATH as,
setenv SAS_CCFPATH /ccf/test:/ccf/valid (csh/tcsh)
export SAS_CCFPATH=/ccf/test:/ccf/valid (sh/bash)
Should two CCFs with the same name be present in both /ccf/test and /ccf/valid, SAS will use the one in /ccf/test, since that directory comes first in SAS_CCFPATH.
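The precedence rule can be illustrated with a short Python sketch that searches the directories of a colon-separated SAS_CCFPATH from left to right and stops at the first match. It models the behaviour described above and is not SAS's own lookup code; find_ccf is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def find_ccf(name: str, ccfpath: str) -> Optional[Path]:
    """Return the first file called name along a colon-separated CCF path."""
    for directory in ccfpath.split(":"):
        if not directory:
            continue  # skip empty entries such as a trailing ':'
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate  # first directory in the list wins
    return None
```

With SAS_CCFPATH set to /ccf/test:/ccf/valid, a file present in both directories resolves to the copy in /ccf/test, while a file present only in /ccf/valid is still found there.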
Running startsas
To begin working with SAS, once we have initialised Heasoft and SAS and defined where to find all Calibration Files, one must complete the following tasks:
- Download the Observation Data File (ODF) you are interested in
- Generate a Calibration Index File for such ODF
- Create the Observation Summary File for such ODF
This can be done step by step, or with a new SAS task called startsas, which is aimed at performing these three steps in an easy way, as we will see further down. Before that, the next three blocks summarize each of these steps, together with the two SAS tasks used to produce the Calibration Index File and the Observation Summary File. If you want to start using startsas right away, you can skip the next three blocks.
The Observation Data Files
XMM-Newton observation data are provided in the form of a bundle of files known as Observation Data Files (aka ODF). The file components of such ODF include information on a single XMM-Newton observation, its different exposures, their modes and filters, etc.
Now that SAS is installed and initialised, one must tell SAS where to find the ODF to be processed. As usual, the location of the data is defined through a shell environment variable, for example,
setenv SAS_ODF /home/user/xmm_obs/0104860501 (csh/tcsh)
export SAS_ODF=/home/user/xmm_obs/0104860501 (sh/bash)
The Calibration Index File
From this point onwards in our analysis, we can move to the working directory, /home/user/my_work/, where all the output products of the different SAS tasks will be placed.
To be able to process your ODF data set, SAS needs to identify which CCF files have to be used among all the files available. This is known as generating the Calibration Index File or CIF file. SAS has a specific task to do it, named cifbuild. This task will access the ODF and look into SAS_CCFPATH for all the CCFs required by your observation. As output, cifbuild will generate a file named ccf.cif in the directory where you have run the task.
The ccf.cif file is in FITS format and can be examined with any FITS viewer, for example the Heasoft FTOOL fv. You may inspect the file to see the list of all the CCFs required to process your ODF, without any reference to SAS_CCFPATH (by default cifbuild uses the parameter withccfpath=no).
Once the CIF file is produced, the SAS_CCF variable must be defined and pointed to this file,
setenv SAS_CCF /home/user/my_work/ccf.cif (csh/tcsh)
export SAS_CCF=/home/user/my_work/ccf.cif (sh/bash)
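From a notebook, the same definition can be made through os.environ, as done for SAS_ODF earlier in this thread. The sketch adds an existence check so that a missing ccf.cif is reported immediately. set_sas_ccf is a hypothetical helper, and it assumes cifbuild was run in workdir so that ccf.cif sits there.

```python
import os
from pathlib import Path


def set_sas_ccf(workdir: str) -> str:
    """Point SAS_CCF at workdir/ccf.cif, failing early if the file is absent."""
    cif = Path(os.path.abspath(workdir)) / "ccf.cif"
    if not cif.is_file():
        raise FileNotFoundError(f"{cif} not found; run cifbuild in {workdir} first")
    os.environ["SAS_CCF"] = str(cif)
    return str(cif)
```

For the example in this thread, set_sas_ccf("/home/user/my_work/") would set SAS_CCF to /home/user/my_work/ccf.cif.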
The SAS Summary File
Within the file components of any XMM-Newton ODF set, there is a file which summarizes all the observational information involved. Such file (with extension ASC) must be updated before processing the data with SAS.
To do the job, SAS provides the task odfingest which you must execute to collect information from all the components in the ODF and produce what we know as SAS summary file, *SUM.SAS.
Once the SAS Summary File is created (in the directory where you have run the task), the SAS_ODF variable must be redefined to point to it. For the example observation ID used here, this is,
setenv SAS_ODF /home/user/my_work/0466_0104860501_SCX00000SUM.SAS (csh/tcsh)
export SAS_ODF=/home/user/my_work/0466_0104860501_SCX00000SUM.SAS (sh/bash)
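Because the Summary File name depends on the observation (0466_0104860501_SCX00000SUM.SAS in the example), it can be convenient to locate it by pattern rather than typing it. A hypothetical sketch, assuming odfingest was run in workdir and left exactly one file matching *SUM.SAS there:

```python
import os
from pathlib import Path


def set_sas_odf_to_summary(workdir: str) -> str:
    """Point SAS_ODF at the single *SUM.SAS file found in workdir."""
    matches = sorted(Path(os.path.abspath(workdir)).glob("*SUM.SAS"))
    if len(matches) != 1:
        raise RuntimeError(
            f"expected one *SUM.SAS file in {workdir}, found {len(matches)}")
    os.environ["SAS_ODF"] = str(matches[0])
    return str(matches[0])
```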
The SAS summary file is a text file (ASCII format), so you can edit it with any available text editor. We recommend you explore it and search for the value assigned to the PATH tag, right at the beginning of the file.
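Since the summary file is plain text, the PATH tag can also be read programmatically. summary_path_tag is a hypothetical reader; it assumes the tag appears at the start of a line in the form PATH followed by whitespace and a value, and it returns only the first such value.

```python
from typing import Optional


def summary_path_tag(summary_file: str) -> Optional[str]:
    """Return the value of the first PATH line in a SAS summary file, if any."""
    with open(summary_file, encoding="ascii", errors="replace") as handle:
        for line in handle:
            fields = line.split(maxsplit=1)
            # Assumed layout: a line of the form 'PATH <value>'.
            if len(fields) == 2 and fields[0] == "PATH":
                return fields[1].strip()
    return None
```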
startsas parameters
As mentioned above, startsas is aimed at performing the three steps just described in an easy way. To start with, we can get a list of the parameters available for this task by means of the options -h or -p.
inargs=['-h']
w('startsas', inargs).run()
The column name lists the parameter names, mandatory shows whether the parameter is optional or mandatory, type shows the type of the parameter (boolean, integer, real, etc.), default shows whether the parameter has a default value, and description provides a short description of the parameter.
Executing startsas
We will show three modes of execution of startsas:
- Using the parameter odfid
- Using the parameters sas_ccf and sas_odf
- Using the parameter odfid with level=PPS
With parameter odfid
The first mode is possibly the most general usage of the task: startsas downloads the ODF identified by odfid into the Working directory, identified by workdir. Once downloaded, the task unpacks the file into a subdirectory named after the ODF identifier and immediately afterwards runs the tasks cifbuild and odfingest (without parameters). This provides the CIF file (ccf.cif) and the Summary file (*SUM.SAS), both in the Working directory. In addition, the environment variables SAS_CCF and SAS_ODF are set to point to those files, respectively.
Assuming that we want to work on ODF 0104860501, let us define the input parameters for startsas as,
Note: work_dir should be the absolute path to the Working directory and end with '/'.
work_dir = 'absolute_path_to_wrk_directory'
inargs = ['odfid=0104860501', f'workdir={work_dir}']
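Since a relative path or a missing trailing '/' is easy to overlook, the note above can be enforced in code. A minimal sketch with a hypothetical helper, startsas_odfid_args, which rejects relative paths and appends the '/' when it is missing; the returned list is meant to be used as inargs.

```python
import os
from typing import List


def startsas_odfid_args(odfid: str, work_dir: str) -> List[str]:
    """Build startsas arguments, enforcing an absolute workdir ending in '/'."""
    if not os.path.isabs(work_dir):
        raise ValueError(f"workdir must be an absolute path: {work_dir!r}")
    if not work_dir.endswith("/"):
        work_dir += "/"
    return [f"odfid={odfid}", f"workdir={work_dir}"]
```

For example, w('startsas', startsas_odfid_args('0104860501', '/home/user/my_work')).run() would pass workdir=/home/user/my_work/.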
Now, we execute startsas as described earlier. Any log information produced by the SAS tasks will appear in the terminal used to initialize SAS and start this notebook.
w('startsas', inargs).run()
The process ends with the variables SAS_CCF and SAS_ODF defined, pointing to the CIF and Summary Files respectively.
From that point onwards we can start running specific SAS commands to obtain the observation event files.
This process has downloaded the ODF into the Working directory as defined by the input parameter workdir and has run cifbuild and odfingest. workdir has to be defined with an absolute path in the call to startsas. In the Working directory a new directory with the identifier of the ODF will be created, and all the ODF constituents will be contained inside. The outputs of running cifbuild and odfingest will be placed directly under the Working directory.
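The layout just described can be checked from the notebook with a short sketch. check_startsas_layout is a hypothetical helper; it only tests for the three items named above: the subdirectory named after the ODF identifier, ccf.cif, and a *SUM.SAS file directly under the Working directory.

```python
from pathlib import Path
from typing import Dict


def check_startsas_layout(work_dir: str, odfid: str) -> Dict[str, bool]:
    """Report which of the expected startsas outputs exist under work_dir."""
    root = Path(work_dir)
    return {
        "ODF directory": (root / odfid).is_dir(),
        "ccf.cif": (root / "ccf.cif").is_file(),
        "*SUM.SAS": any(root.glob("*SUM.SAS")),
    }
```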
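With parameters sas_ccf and sas_odf
If the CIF and Summary files already exist, for example from a previous run, startsas can be pointed to them directly through the parameters sas_ccf and sas_odf. As before, work_dir should be an absolute path ending with '/'.
work_dir = 'absolute_path_to_cifSUMSAS_directory'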
inargs = [f'sas_ccf={work_dir}ccf.cif', f'sas_odf={work_dir}0466_0104860501_SCX00000SUM.SAS', f'workdir={work_dir}']
w('startsas', inargs).run()
As you may have guessed, we could have reached a similar effect by defining these two environment variables by means of os.environ, e.g.,
os.environ['SAS_CCF'] = work_dir+'ccf.cif'
os.environ['SAS_ODF'] = work_dir+'0466_0104860501_SCX00000SUM.SAS'
With parameter odfid and level=PPS
Selecting level=PPS for a given odfid will download all the products resulting from the processing of the ODF by the XMM-Newton Pipeline. All these products are placed in the Working directory, in a subdirectory named odfid/pps.
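Once the download finishes, the contents of that subdirectory can be listed from the notebook. A minimal sketch, assuming the odfid/pps layout just described; list_pps_products is a hypothetical name, and it only lists file names without interpreting them.

```python
from pathlib import Path
from typing import List


def list_pps_products(work_dir: str, odfid: str) -> List[str]:
    """Return the sorted names of the files under work_dir/<odfid>/pps."""
    pps_dir = Path(work_dir) / odfid / "pps"
    if not pps_dir.is_dir():
        return []
    return sorted(p.name for p in pps_dir.iterdir() if p.is_file())
```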
The program ends by providing a direct link to the Observation Summary html file (P
inargs = ['odfid=0104860501', 'level=PPS']
w('startsas', inargs).run()
How to continue from here?
This depends on the type of products you have requested.
If you requested the Pipeline products (level=PPS), you may begin exploring these products directly. Among them, you will find the Observation Event Files for the different instruments and a lot of other information ready to be used.
If you simply requested the ODF (level=ODF), the first step is to run the appropriate SAS tasks to get the Observation Event Files for each instrument. Then, you may have a look at other Threads to get familiar with specific processing tasks for each instrument.
In the next cells we show how to run from here four typical SAS tasks, three `procs` and one `chain` to process exposures taken with the EPIC PN and MOS instruments, RGS and OM.
Given that the execution of these tasks produces a lot of output, we have not run them within the notebook.
We leave this up to you!
First, move to the Working directory,
os.chdir(work_dir)
Running epproc without parameters,
w('epproc', []).run()
Running emproc without parameters,
w('emproc', []).run()
Running rgsproc without parameters,
w('rgsproc', []).run()
Running omichain without parameters,
w('omichain', []).run()