SAS Startup Thread in Python

 

 

Introduction

The SAS Start-up thread provides a detailed explanation of how to get started with SAS. In particular, it shows how to initialize SAS, how to point SAS to the calibration files needed for a given XMM-Newton observation, and how to get the data ready to be processed by any SAS task. With SAS 19, we are introducing a new infrastructure for Python which allows Python tasks to be run from the command line, like any other non-Python SAS task, and the same code to be accessed from a Jupyter Notebook. In addition, SAS 19 includes several new Python tasks, two of which help us to start working with SAS: startsas and sasver.

Expected Outcome

The ability to process any XMM-Newton observation with any SAS task.

SAS Tasks to be Used

Prerequisites

It is assumed that SAS has been installed properly, according to the explanations given in the current SAS installation pages. The HEASOFT software must already be initialized before SAS is initialized (see SAS Watchout).

Useful Links

Caveats

Last Reviewed: 25 May 2023, for SAS v21.0

Last Updated: 15 March 2021

 

 


 

Procedure

 

Let's begin by asking four questions:

  1. Where in my system have I installed the SAS software?
  2. Where in my system have I stored the Calibration files?
  3. Where have I placed the XMM-Newton Observation data that I want to process?
  4. Which directory am I going to use to work with SAS?

Before answering these questions, it is worth emphasizing that, regardless of the approach we choose, we will always need to customize our environment so that we can easily access all the SAS software and calibration files. This process is known as initialisation and normally involves defining some environment variables and executing some shell scripts. That is what we will do once the answers to questions 1 and 2 are known. We will then use the SAS task startsas to handle the setup and answer questions 3 and 4.

Let us assume that you have installed SAS in /some_dir/xmmsas_20201015_1931, put all calibration files in /ccf/valid, and that you are going to work in /home/user/my_work on observation ID 0104860501, located at /home/user/xmm_obs/0104860501.

Heasoft and SAS must be initialized in the terminal from which the Jupyter Notebook is going to be launched, because the notebook inherits that terminal's environment. So run the commands in the next two blocks in a terminal, and then start the notebook again from that same terminal.
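Once the notebook has been restarted from the initialised terminal, a quick check at the top of the notebook can catch a forgotten step. This is a minimal sketch using only os.environ; the helper name missing_env_vars is hypothetical, and it only looks for the two variables defined by hand in the next two blocks, not for everything the initialisation scripts set.

```python
import os

# The two variables this thread defines by hand before initialising.
REQUIRED_VARS = ("HEADAS", "SAS_DIR")

def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names from `names` that are unset or empty in `env`.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

missing = missing_env_vars()
if missing:
    print("Not initialised in the launching terminal:", ", ".join(missing))
else:
    print("HEADAS and SAS_DIR are set.")
```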

 

 

Initialization of Headas

 

 

Some SAS tasks use Heasoft FTOOLS to do their work. Thus, we need to initialise Heasoft before SAS is initialised.

Let the directory where you have installed the Heasoft software be,

  /usr/local/heasoft-M.N.P/architecture

where M.N.P is the Heasoft version and architecture is a placeholder for the platform-specific name of the installation directory, e.g. x86_64-pc-linux-gnu-libc2.27. We call this directory HEADAS.

The following sequence of csh/tcsh commands initialises Heasoft,

  setenv HEADAS /usr/local/heasoft-M.N.P/architecture
  source $HEADAS/headas-init.csh

The equivalent commands for the sh/bash shell are,

  export HEADAS=/usr/local/heasoft-M.N.P/architecture
  . $HEADAS/headas-init.sh

You may define a shell alias named heainit, such as,

  alias heainit "source $HEADAS/headas-init.csh" (csh/tcsh)
  alias heainit=". $HEADAS/headas-init.sh" (sh/bash)

which will allow you to initialise Heasoft by simply invoking the alias,

  heainit
 

 

Initialization of SAS

 

 

The environment variable SAS_DIR must point to the directory containing the SAS installation, i.e. your answer to question 1, /some_dir/xmmsas_20201015_1931.

Depending on the shell you use, you can define SAS_DIR as,

  setenv SAS_DIR /some_dir/xmmsas_20201015_1931 (csh/tcsh)
  export SAS_DIR=/some_dir/xmmsas_20201015_1931 (sh/bash)

Depending on your shell, you can now initialise SAS by issuing the command,

  source $SAS_DIR/setsas.csh (csh/tcsh)
  . $SAS_DIR/setsas.sh (sh/bash)
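Both shell families run the same four steps in the same order, Heasoft first, and differ only in setenv/source versus export/. as shown above. The sketch below writes those lines out for either family from the two installation paths, for example to keep them in a shell start-up file. The function name init_lines is hypothetical, and the paths in the usage line are the placeholders used in this thread.

```python
def init_lines(headas, sas_dir, shell="sh"):
    """Return this thread's initialisation commands for one shell family.

    shell is "sh" (sh/bash) or "csh" (csh/tcsh). Heasoft is initialised
    before SAS, as required above.
    """
    if shell == "csh":
        return [
            f"setenv HEADAS {headas}",
            "source $HEADAS/headas-init.csh",
            f"setenv SAS_DIR {sas_dir}",
            "source $SAS_DIR/setsas.csh",
        ]
    if shell == "sh":
        return [
            f"export HEADAS={headas}",
            ". $HEADAS/headas-init.sh",
            f"export SAS_DIR={sas_dir}",
            ". $SAS_DIR/setsas.sh",
        ]
    raise ValueError(f"unknown shell family: {shell!r}")

print("\n".join(init_lines("/usr/local/heasoft-M.N.P/architecture",
                           "/some_dir/xmmsas_20201015_1931", shell="sh")))
```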

 

 

Running sasver

 

 

You can now try to run your first SAS Python task: sasver. This task provides a sort of *about SAS* and also a test of SAS *readiness*: if it runs successfully, SAS as a whole is ready to be used.

The purpose of sasver is to show the *identity card* of the SAS version you are running. It also shows all SAS shell environment variables defined so far.

The task sasver can be run either from the command line or from a notebook, and most Python SAS tasks behave the same way. To run it from the command line, simply invoke it as you would any other SAS command,

  sasver

which prints several lines of output in the terminal.

However, running this task from a Jupyter Notebook requires a different method. Since that method can be used to run any other SAS task, Python or non-Python, we explain it using sasver as an example.

 

 

Invoking SAS Python tasks from notebooks

 

 

To work with any specific Python component included in SAS, we need to import it from the core Python package for SAS, named pysas.

To execute any SAS task within a Notebook, we need to import from pysas a component known as Wrapper. The following cell shows how to do that,

In [ ]:
from pysas.wrapper import Wrapper as w
            
 

Any SAS task accepts arguments, which can be either options, e.g. --version, which shows the task's version, or parameters in the format param=value. When the task is invoked from the command line, these arguments follow the name of the task. In notebooks, however, they are passed to the task as a Python list, whose name you are free to choose. Here we call it inargs.

To pass the option --version to the task to be executed, we must define inargs as,

In [ ]:
inargs = ['--version']
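When a task takes several parameters, writing each param=value string by hand gets tedious. The hypothetical helper below (not part of pysas) builds the list from a sequence of options and keyword parameters; it only produces strings in the two forms just described, placing options first as they would appear on a command line.

```python
def make_inargs(options=(), **params):
    """Build a Wrapper argument list: options first, then param=value strings."""
    args = list(options)
    args += [f"{name}={value}" for name, value in params.items()]
    return args

print(make_inargs(["--version"]))
print(make_inargs(odfid="0104860501", workdir="/home/user/my_work/"))
```

Keyword arguments keep their order (Python 3.7 and later), so the parameters appear in the list in the order they were written.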
            
 

To execute the task, we use the Wrapper component imported earlier from pysas under the alias w, as follows,

In [ ]:
t = w('sasver', inargs)
            
 

In Python terms, t is an *instance* of the class Wrapper (imported under the alias w).

To run sasver, we can now do as follows,

In [ ]:
t.run()
            
 

This is equivalent to running sasver with the option --version from the command line.

Each SAS task, whether it is a Python task or not, accepts a predefined set of options. To list them, we can always invoke the task with the option --help (or -h).

With sasver, as with some other SAS tasks, we can define inargs as an empty list, which is equivalent to running the task from the command line without arguments, like this,

In [ ]:
inargs = []
            
In [ ]:
t = w('sasver', inargs)
            
In [ ]:
t.run()
            
 

That is indeed the desired output of the task sasver.

A similar result can be achieved by combining all the previous steps into a single expression, like this,

In [ ]:
w('sasver', []).run()
            
 

The output of sasver provides useful information on which version of SAS is being run and which SAS environment variables are defined.

Note: When using the Wrapper, always pass parameters and options to a task as a list, enclosed in [ ]. For example, w('evselect', ['-h']).run() executes the SAS task evselect with the option -h.
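If you already have a command line that works in the terminal, Python's standard shlex module can split it into the task name and the list the Wrapper expects. A sketch; split_command is a hypothetical helper name. Quotes are removed the same way a POSIX shell would remove them, so the list holds the strings the task would have received on the command line.

```python
import shlex

def split_command(cmdline):
    """Turn a SAS command line into (task name, argument list) for the Wrapper."""
    task, *args = shlex.split(cmdline)
    return task, args

task, args = split_command("evselect -h")
print(task, args)
# In the notebook, this would then be run as: w(task, args).run()
```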

If you define a new SAS environment variable, you can immediately check that it exists by running sasver again, as shown below for the variable SAS_ODF,

In [ ]:
import os
            
In [ ]:
os.environ['SAS_ODF'] = 'myODFs/'
            
In [ ]:
w('sasver', []).run()
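Assigning to os.environ changes the environment of the notebook's own Python process, and any program started from that process afterwards inherits the new value. The sketch below demonstrates this general mechanism with a plain Python child process rather than a SAS task, so it runs without SAS. The change does not travel back to the terminal that launched Jupyter.

```python
import os
import subprocess
import sys

# Set a variable in the notebook's process environment...
os.environ["SAS_ODF"] = "myODFs/"

# ...and read it back from a newly started child process.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('SAS_ODF'))"],
    capture_output=True, text=True, check=True,
)
print(child.stdout.strip())
```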
            
 

 

Listing available options

 

 

As noted earlier, we can list all options available to any SAS task with option --help (or -h),

In [ ]:
w('sasver', ['-h']).run()
            
 

As explained in the help text, if the task had any parameters, they would be listed immediately after the help text.

As shown in the text above, the task sasver has no parameters.

You may try any of the other options listed above to see how they behave when passed from the notebook.

 

 

Calibration files

 

 

Our next step is to tell SAS where in the system the calibration files, known as Current Calibration Files (CCFs), are located.

According to the answer to question 2, the CCFs are in /ccf/valid. As before, we tell SAS where to find them by defining an environment variable, in this case SAS_CCFPATH.

  setenv SAS_CCFPATH /ccf/valid (csh/tcsh)
  export SAS_CCFPATH=/ccf/valid (sh/bash)

Should you have other CCFs in other directories, you can add these directories to the previous definition. For example, imagine you have some test CCFs in /ccf/test. Then you could define SAS_CCFPATH as,

  setenv SAS_CCFPATH /ccf/test:/ccf/valid (csh/tcsh)
  export SAS_CCFPATH=/ccf/test:/ccf/valid (sh/bash)

If a CCF with the same name exists in both /ccf/test and /ccf/valid, SAS uses the one in /ccf/test, the directory listed first.
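The precedence rule can be pictured as a left-to-right search along SAS_CCFPATH. The sketch below models only that rule with os.path; it is not how SAS itself is implemented, resolve_ccf is a hypothetical helper name, and the file name in the usage line is a made-up placeholder, not a real CCF.

```python
import os

def resolve_ccf(filename, ccfpath):
    """Return the path of the first copy of `filename` found along `ccfpath`.

    `ccfpath` uses the colon-separated SAS_CCFPATH syntax shown above;
    directories listed earlier take precedence. Returns None if no
    directory in the path contains the file.
    """
    for directory in ccfpath.split(":"):
        if not directory:
            continue  # skip empty entries, e.g. from a trailing ':'
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None

# Placeholder file name, for illustration only.
print(resolve_ccf("EXAMPLE_0001.CCF", os.environ.get("SAS_CCFPATH", "")))
```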

 

 

Running startsas

 

 

To begin working with SAS, once Heasoft and SAS have been initialised and SAS knows where to find all the calibration files, we must complete the following tasks:

  1. Download the Observation Data File (ODF) you are interested in
  2. Generate a Calibration Index File for such ODF
  3. Create the Observation Summary File for such ODF

These steps can be done individually, or all at once with a new SAS task, startsas, which is designed to make them easy, as we will see further down. First, the next three blocks summarize each step and the two SAS tasks used to produce the Calibration Index File and the Observation Summary File. If you want to go straight to startsas, you can skip these three blocks.

 

 

The Observation Data Files

 

 

XMM-Newton observation data are provided in the form of a bundle of files known as Observation Data Files (aka ODF). The file components of such ODF include information on a single XMM-Newton observation, its different exposures, their modes and filters, etc.

Now that SAS is installed and initialised, we must tell SAS where to find the ODF to be processed. As usual, the location of the data is defined through a shell environment variable, for example,

  setenv SAS_ODF /home/user/xmm_obs/0104860501 (csh/tcsh)
  export SAS_ODF=/home/user/xmm_obs/0104860501 (sh/bash)

 

 

The Calibration Index File

 

 

From this point onwards, we can move to the working directory, /home/user/my_work/, where all the output products of the different SAS tasks will be placed.

To be able to process your ODF data set, SAS needs to identify which CCF files have to be used among all the files available. This is known as generating the Calibration Index File or CIF file. SAS has a specific task to do it, named cifbuild. This task will access the ODF and look into SAS_CCFPATH for all the CCFs required by your observation. As output, cifbuild will generate a file named ccf.cif in the directory where you have run the task.

The ccf.cif file is in FITS format and can be examined with any FITS viewer, for example the Heasoft FTOOL fv. You may inspect it to see the list of all the CCFs required to process your ODF; since cifbuild uses the parameter withccfpath=no by default, the files are listed without any reference to SAS_CCFPATH.

Once the CIF file is produced, the SAS_CCF variable must be defined and pointed to this file,

  setenv SAS_CCF /home/user/my_work/ccf.cif (csh/tcsh)
  export SAS_CCF=/home/user/my_work/ccf.cif (sh/bash)

 

 

The SAS Summary File

 

 

Among the files of any XMM-Newton ODF set, there is one which summarizes all the observational information involved. This file (with extension ASC) must be updated before the data are processed with SAS.

For this job, SAS provides the task odfingest, which collects information from all the components of the ODF and produces the SAS summary file, *SUM.SAS.

Once the SAS Summary File is created (in the directory where you ran the task), the SAS_ODF variable must be redefined to point to it. For the example observation, this is,

  setenv SAS_ODF /home/user/my_work/0466_0104860501_SCX00000SUM.SAS (csh/tcsh)
  export SAS_ODF=/home/user/my_work/0466_0104860501_SCX00000SUM.SAS (sh/bash)

The SAS summary file is a text file (ASCII format), so you can edit it with any available text editor. We recommend you explore it and search for the value assigned to the PATH tag, right at the beginning of the file.
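The three manual steps above can be chained from the notebook. This is a sketch under stated assumptions: prepare_observation is a hypothetical helper, run(task, args) is any callable that executes one SAS task (in this notebook it could be lambda task, args: w(task, args).run(), using the Wrapper imported earlier as w), and, as described above, cifbuild writes ccf.cif and odfingest writes a single *SUM.SAS file in the current directory. Raising an error when there is not exactly one summary file is a design choice to avoid silently picking a leftover from another observation.

```python
import glob
import os

def prepare_observation(odf_dir, work_dir, run):
    """Sketch of the manual set-up: cifbuild, then odfingest.

    Returns the final (SAS_CCF, SAS_ODF) values.
    """
    os.chdir(work_dir)                     # task outputs go to the working directory
    os.environ["SAS_ODF"] = odf_dir        # cifbuild reads the ODF from here
    run("cifbuild", [])                    # expected to write ccf.cif in work_dir
    os.environ["SAS_CCF"] = os.path.join(work_dir, "ccf.cif")
    run("odfingest", [])                   # expected to write *SUM.SAS in work_dir
    summaries = sorted(glob.glob(os.path.join(work_dir, "*SUM.SAS")))
    if len(summaries) != 1:
        raise RuntimeError(f"expected one *SUM.SAS file, found {summaries}")
    os.environ["SAS_ODF"] = summaries[0]   # SAS_ODF now points at the summary file
    return os.environ["SAS_CCF"], os.environ["SAS_ODF"]
```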

 

 

startsas parameters

 

 

As mentioned above, startsas is designed to perform the three steps just described in an easy way. To start with, we can list the parameters available for this task with the option -h or -p.

In [ ]:
inargs=['-h']
            
In [ ]:
w('startsas', inargs).run()
            
 

In the output, column name lists the parameter names, column mandatory shows whether the parameter is mandatory or optional, column type shows its type (boolean, integer, real, etc.), column default shows its default value, if any, and, finally, column description gives a short description of the parameter.

 

 

Executing startsas

 

 

We will show three modes of execution of startsas:

  1. Using the parameter odfid
  2. Using the parameters sas_ccf and sas_odf
  3. Using the parameter odfid with level=PPS
 

 

With parameter odfid

 

 

The first mode is probably the most general use of the task: startsas downloads the ODF identified by odfid into the Working directory identified by workdir. Once the ODF is downloaded, the task unpacks it into a subdirectory named after the ODF identifier and immediately runs the tasks cifbuild and odfingest (without parameters). This produces the CIF file (ccf.cif) and the Summary file (*SUM.SAS), both in the Working directory. In addition, the environment variables SAS_CCF and SAS_ODF are set to point to those files, respectively.

 

Assuming that we want to work on ODF 0104860501, let us define the input parameters for startsas as,


Note: work_dir should be the absolute path to the Working directory and end with '/'.
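To avoid both mistakes this note warns about (a relative path, or a missing final '/'), the path can be normalised before it is used in inargs. A sketch with standard os.path functions; as_workdir is a hypothetical helper name, and expanding ~ is an extra convenience not mentioned in this thread.

```python
import os

def as_workdir(path):
    """Return `path` as an absolute directory path ending in a separator."""
    # abspath drops any trailing separator; joining with "" adds exactly one.
    return os.path.join(os.path.abspath(os.path.expanduser(path)), "")

print(as_workdir("~/my_work"))
```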

 

In [ ]:
work_dir = 'absolute_path_to_wrk_directory'  # e.g. '/home/user/my_work/' (absolute, ending in '/')
            
In [ ]:
inargs = ['odfid=0104860501', f'workdir={work_dir}']
            
 

Now, we execute startsas as described earlier. Any log information produced by the SAS tasks will appear in the terminal used to initialize SAS and start this notebook.

In [ ]:
w('startsas', inargs).run()
            
 

The process ends with the variables SAS_CCF and SAS_ODF defined, pointing to the CIF and Summary Files respectively.

From that point onwards we can start running specific SAS commands to obtain the observation event files.

 

This process has downloaded the ODF into the Working directory as defined by the input parameter workdir and has run cifbuild and odfingest. workdir has to be defined with an absolute path in the call to startsas. In the Working directory a new directory with the identifier of the ODF will be created, and all the ODF constituents will be contained inside. The outputs of running cifbuild and odfingest will be placed directly under the Working directory.
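The layout just described can be checked from the notebook before moving on. A minimal sketch using os.path and glob; the function name and the returned keys are hypothetical, and it only tests that the entries exist, not that their contents are valid.

```python
import glob
import os

def check_startsas_outputs(work_dir, odfid):
    """Report which of the expected startsas outputs exist under work_dir."""
    return {
        "odf_dir": os.path.isdir(os.path.join(work_dir, odfid)),
        "ccf.cif": os.path.isfile(os.path.join(work_dir, "ccf.cif")),
        "SUM.SAS": len(glob.glob(os.path.join(work_dir, "*SUM.SAS"))) > 0,
    }

# In the notebook: check_startsas_outputs(work_dir, '0104860501')
```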

 

 

With parameter sas_ccf and sas_odf

 

 

This mode lets you use startsas with an existing CIF and Summary File. If your Working directory already contains a CIF and a Summary File that you want to use from now on, you can tell startsas so by executing,

In [ ]:
work_dir = 'absolute_path_to_cifSUMSAS_directory'
            
In [ ]:
inargs = [f'sas_ccf={work_dir}ccf.cif', f'sas_odf={work_dir}0466_0104860501_SCX00000SUM.SAS', f'workdir={work_dir}']
            
In [ ]:
w('startsas', inargs).run()
            
 

As you may have guessed, a similar effect can be achieved by defining these two environment variables with os.environ, e.g.,

In [ ]:
os.environ['SAS_CCF'] = work_dir+'ccf.cif'
            
In [ ]:
os.environ['SAS_ODF'] = work_dir+'0466_0104860501_SCX00000SUM.SAS'
            
 

 

With parameter odfid and level=PPS

 

 

Selecting level=PPS for a given odfid downloads all the products resulting from the processing of the ODF by the XMM-Newton pipeline. All these products are placed in the Working directory, under a subdirectory named odfid/pps.

The program ends by providing a direct link to the Observation Summary html file (POBX000SUMMAR0000.HTM).
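If the link is lost, the summary page can be located again on disk. A sketch, assuming the page sits in odfid/pps under the Working directory and that its name contains SUMMAR and ends in .HTM, like the file named above; find_pps_summary is a hypothetical helper name. The returned file:// URL could be passed to the standard webbrowser.open.

```python
import glob
import os
from pathlib import Path

def find_pps_summary(work_dir, odfid):
    """Return a file:// URL for the PPS summary page, or None if not found."""
    pattern = os.path.join(work_dir, odfid, "pps", "*SUMMAR*.HTM")
    matches = sorted(glob.glob(pattern))
    return Path(matches[0]).resolve().as_uri() if matches else None

# In the notebook: find_pps_summary(work_dir, '0104860501')
```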

In [ ]:
inargs = ['odfid=0104860501', 'level=PPS']
            
In [ ]:
w('startsas', inargs).run()
            
 

 

How to continue from here?

 

 

This depends on the type of products you have requested.

If you requested the Pipeline products (level=PPS), you may begin exploring these products directly. Among them, you will find the Observation Event Files for the different instruments and a lot of other information ready to be used.

If you simply requested the ODF (level=ODF), the first step is to run the appropriate SAS tasks to obtain the Observation Event Files for each instrument. Then, you may have a look at other threads to get familiar with the specific processing tasks for each instrument.

In the next cells we show how to run four typical SAS tasks from here, three procs (epproc, emproc and rgsproc) and one chain (omichain), to process exposures taken with the EPIC PN and MOS instruments, RGS and OM respectively.

Given that the execution of these tasks produces a lot of output, we have not run them within the notebook.

We leave this up to you! First, move to the working directory and run epproc without parameters,

In [ ]:
os.chdir(work_dir)
            
In [ ]:
w('epproc', []).run()
            
 

Running emproc without parameters,

In [ ]:
w('emproc', []).run()
            
 

Running rgsproc without parameters,

In [ ]:
w('rgsproc', []).run()
            
 

Running omichain without parameters,

In [ ]:
w('omichain', []).run()
            
 

Caveats

When working with SAS, it is recommended not to run the SAS tasks and store their output in the same directory where you keep your ODF. Whenever possible, work in a separate directory where you can create the CIF and summary files for your observation and organize and store the products.
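This recommendation can be enforced with a small path check before running any task. A sketch using os.path.realpath and os.path.commonpath; is_inside is a hypothetical helper name, and it deliberately also flags a working directory nested inside the ODF directory, which is stricter than the caveat itself.

```python
import os

def is_inside(path, directory):
    """True if `path` is `directory` itself or lies somewhere below it."""
    path = os.path.realpath(path)
    directory = os.path.realpath(directory)
    return os.path.commonpath([path, directory]) == directory

# Example paths from this thread: the working directory is not inside the ODF directory.
print(is_inside("/home/user/my_work", "/home/user/xmm_obs/0104860501"))
```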