CAIO to TAP - Cluster Science Archive
Differences between CAIO and TAP
Note that full user pages are available at https://www.cosmos.esa.int/web/csa-guide/home. This is a 'quick start' guide on how to adapt your CAIO script to use TAP instead.
There are several different requests that are possible using the CAIO:
- Synchronous product requests (data), via:
- Asynchronous product requests (data) via a browser
- Streamed data requests (data)
- Metadata requests, including
The above features are already available in the TAP limited-functionality beta release and the sections below describe how to adapt scripts for these services.
The following features, already available in CAIO, will be added shortly to the TAP functionality:
- Header requests (counts as data)
If you require assistance to alter code, please contact us.
Adapting scripts for:
Synchronous Data Download
The format is slightly different, but the development team have worked hard to limit the changes necessary to a request. The two biggest changes are as follows:
Change of downloaded format suffix:
The package downloaded from the CAIO has a .tar.gz extension; the TAP system gives you a .tgz, which can be unpacked in exactly the same way as the .tar.gz package. Scripts that rely on the .tar.gz file name (for example, to find or rename the package) may need updating.
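As a minimal sketch of how a script could cope with both suffixes, the helpers below accept either file name and unpack the package with Python's standard tarfile module. The function names are illustrative only and are not part of any CSA tool.

```python
import tarfile

ARCHIVE_SUFFIXES = (".tar.gz", ".tgz")  # CAIO suffix, TAP suffix


def is_csa_archive(file_name):
    """Return True if the name ends in the CAIO (.tar.gz) or TAP (.tgz) suffix."""
    return file_name.lower().endswith(ARCHIVE_SUFFIXES)


def extract_archive(archive_path, target_dir="."):
    """Unpack a gzipped tar package and return the names of its members."""
    # "r:gz" reads gzip-compressed tar data regardless of the file suffix.
    with tarfile.open(archive_path, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(target_dir)
    return names
```

Because the two suffixes describe the same gzipped tar format, only the file-name checks in a script should need to change.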
Initial URL:
For synchronous data download, the only change needed to the request itself is to replace its start, so that
https://csa.esac.esa.int/csa/aio/product-action?
becomes
https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&
After this, the request will look exactly the same.
An example:
(CAIO:)
becomes
(TAP:)
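The prefix swap described above can be sketched in a few lines of Python. The helper name caio_to_tap is hypothetical; the two prefixes are the ones given in this section, and the dataset and dates are taken from this guide's wget example.

```python
CAIO_PRODUCT_PREFIX = "https://csa.esac.esa.int/csa/aio/product-action?"
TAP_PRODUCT_PREFIX = "https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&"


def caio_to_tap(url):
    """Swap the CAIO product-action prefix for the TAP data prefix."""
    if not url.startswith(CAIO_PRODUCT_PREFIX):
        raise ValueError("not a CAIO product-action URL: " + url)
    # Everything after the prefix is kept unchanged.
    return TAP_PRODUCT_PREFIX + url[len(CAIO_PRODUCT_PREFIX):]


caio_url = ("https://csa.esac.esa.int/csa/aio/product-action?"
            "DATASET_ID=C1_CP_WHI_NATURAL"
            "&START_DATE=2003-03-03T00:00:00Z&END_DATE=2003-03-05T00:00:00Z")
print(caio_to_tap(caio_url))
```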
wget Example for Synchronous Data Download
To use the URL request with wget, put the whole request in double quotes and add wget at the beginning. Adding '--content-disposition' can help with the naming of the downloaded file.
This example:
wget --content-disposition "https://csa.esac.esa.int/csa/aio/product-action?DATASET_ID=C1_CP_WHI_NATURAL&START_DATE=2003-03-03T00:00:00Z&END_DATE=2003-03-05T00:00:00Z"
will change to:
wget --content-disposition "https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&DATASET_ID=C1_CP_WHI_NATURAL&START_DATE=2003-03-03T00:00:00Z&END_DATE=2003-03-05T00:00:00Z"
Python Example
To adapt the example given on the CAIO website to use the TAP server, the URL needs to change and another item must be added to the query_specs dictionary. Note that to get more than one dataset, put the strings of the datasets in a list as the DATASET_ID value. Times including fractions of seconds will be accepted but rounded down.
from requests import get  # to make GET request
import tarfile

def download(url, params, file_name):
    # open in binary mode
    with open(file_name, "wb") as file:
        # get request
        response = get(url, params=params)
        # write to file
        file.write(response.content)

# Update the URL:
myurl = 'https://csa.esac.esa.int/csa-sl-tap/data'
# Add another item to the query parameters dictionary:
query_specs = {'RETRIEVAL_TYPE': 'product',
               'DATASET_ID': 'C1_CP_FGM_SPIN',
               'START_DATE': '2003-03-03T12:00:00Z',
               'END_DATE': '2003-03-04T12:00:00Z',
               'DELIVERY_FORMAT': 'CEF',
               'DELIVERY_INTERVAL': 'hourly'}
download(myurl, query_specs, '2021taptest.tar.gz')
with tarfile.open("2021taptest.tar.gz") as tar:
    tarname = tar.getnames()
    tar.extractall()
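The note about requesting several datasets can be illustrated without contacting the server. The sketch below uses only the standard library to show the URL that a list-valued DATASET_ID produces: urlencode with doseq=True repeats the key once per item, which is also how requests serialises a list value. The helper names csa_time and product_url are hypothetical, and pairing these two particular datasets in one request is only an illustration.

```python
from datetime import datetime
from urllib.parse import urlencode

TAP_DATA_URL = "https://csa.esac.esa.int/csa-sl-tap/data"


def csa_time(moment):
    """Format a datetime (assumed to be UTC) as YYYY-MM-DDThh:mm:ssZ.

    Fractions of a second are dropped, matching the server's rounding down.
    """
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def product_url(dataset_ids, start, end):
    """Build a synchronous product request URL for one or more datasets."""
    params = {"RETRIEVAL_TYPE": "product",
              "DATASET_ID": list(dataset_ids),
              "START_DATE": csa_time(start),
              "END_DATE": csa_time(end)}
    # doseq=True emits DATASET_ID once per list item; safe=":" keeps the
    # colons in the timestamps readable.
    return TAP_DATA_URL + "?" + urlencode(params, doseq=True, safe=":")


print(product_url(["C1_CP_FGM_SPIN", "C1_CP_WHI_NATURAL"],
                  datetime(2003, 3, 3, 12, 0, 0, 250000),
                  datetime(2003, 3, 4, 12, 0, 0)))
```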
MATLAB Example
As in the Python example, change the URL and add RETRIEVAL_TYPE=product to the request parameters.
URL = 'https://csa.esac.esa.int/csa-sl-tap/data';
fileName=tempname;
gzFileName = [fileName '.gz'];
options = weboptions('RequestMethod', 'get', 'Timeout', Inf);
tgzFileName = websave(gzFileName, URL, 'RETRIEVAL_TYPE', 'product', ...
'DATASET_ID', 'C1_CP_FGM_SPIN', ...
'START_DATE', '2003-03-03T00:00:00Z', ...
'END_DATE', '2003-03-04T00:00:00Z', options);
gunzip(gzFileName);
fileNames=untar(fileName);
for iFile = 1:numel(fileNames), disp(fileNames{iFile}); end
IDL Example
Credentials are not needed, since only synchronous download is available and this does not require logging in. As in the Python example, the URL changes and a parameter is added to the query. However, the following code does not untar or gunzip the package. The downloaded package has the extension .tgz (previously .tar.gz) and IDL cannot untar it directly; you can use the function csa_untar.pro to gunzip and untar it. DO NOT USE IDL's FILE_GUNZIP on the .tgz file: it will expand until it fills your hard drive. This has been reported to the owners of IDL.
function csa_product
  ; Construct URL query from parameters and keywords.
  csa_product_query = 'RETRIEVAL_TYPE=product&DATASET_ID=C1_CP_FGM_SPIN&START_DATE=2003-03-03T12:00:00Z&END_DATE=2003-03-03T14:00:00Z&DELIVERY_FORMAT=CDF&DELIVERY_INTERVAL=hourly'
  ; Create IDLnetURL object and set properties
  csa_product_obj = obj_new('IDLnetUrl')
  csa_product_obj->SetProperty, VERBOSE=1
  csa_product_obj->SetProperty, url_scheme = 'https'
  csa_product_obj->SetProperty, url_host = 'csa.esac.esa.int/'
  csa_product_obj->SetProperty, url_path = 'csa-sl-tap/data'
  csa_product_obj->SetProperty, url_query = csa_product_query
  ; send request to CSA TAP system, saving response in csa_buffer.dat
  csa_product_response = csa_product_obj->get(filename='csa_buffer.dat')
  csa_product_obj->getproperty, response_header=csa_product_header
  ; check a .tar.gz file was downloaded and if so rename buffer to correct filename and return correct filename, otherwise return 0
  csa_filestart = strpos(csa_product_header,'filename=')
  if csa_filestart ne -1 then begin
    csa_fileend = strpos(csa_product_header,'gz"')
    csa_filename = strmid(csa_product_header,csa_filestart+10,csa_fileend-csa_filestart-8)
    csa_dir_end = strpos(csa_product_response,'csa_buffer.dat')
    csa_working_dir = strmid(csa_product_response,0,csa_dir_end)
    file_move, csa_product_response, csa_working_dir+csa_filename
    print, 'Downloaded data to '+csa_working_dir+csa_filename
    outfile = csa_working_dir+csa_filename
    return, outfile
  endif else begin
    print, 'Something went wrong.'
    return, 0
  endelse
end
Asynchronous Data Requests
To make a synchronous data request (up to 1 GB) into an asynchronous data request (up to 50 GB), you need to log in at
https://csa.esac.esa.int/csa-sl-tap/login
and then add the following to the request (note that 'DEFERRED' is case sensitive):
RETRIEVAL_ACCESS=DEFERRED
to make:
The log-in will create a session cookie for the request in the browser, and you will receive an email (to the email address registered to your profile) when the package is ready.
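A small sketch of how the extra parameter might be added to an existing request URL in Python follows. It only prepares the URL; logging in still has to happen in the browser as described above. The helper name set_retrieval_access is hypothetical, and the example request reuses the dataset and dates from this guide's wget example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_retrieval_access(url, mode):
    """Return url with RETRIEVAL_ACCESS set to mode, replacing any old value."""
    parts = urlsplit(url)
    pairs = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "RETRIEVAL_ACCESS"]
    pairs.append(("RETRIEVAL_ACCESS", mode))
    # safe=":" keeps the colons in the timestamps readable.
    return urlunsplit(parts._replace(query=urlencode(pairs, safe=":")))


sync_url = ("https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product"
            "&DATASET_ID=C1_CP_WHI_NATURAL"
            "&START_DATE=2003-03-03T00:00:00Z&END_DATE=2003-03-05T00:00:00Z")
# 'DEFERRED' must be upper case, as noted above.
print(set_retrieval_access(sync_url, "DEFERRED"))
```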
Streamed Data Requests
There are restrictions that apply to a streamed request:
- Only CEF products can be downloaded using these requests
- Only one dataset can be requested
- Only one file is delivered for the time period requested, i.e. delivery interval option is not available
- Header only cannot be requested
- If the internet connection is broken before file download has completed, the request must be made again to retrieve the whole file
To make a synchronous data request into a streamed data request, add
RETRIEVAL_ACCESS=streamed
to the request:
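The restrictions listed above can be checked before a request is sent. The following is a sketch only: the function to_streamed is hypothetical, and treating a missing DELIVERY_FORMAT as acceptable is an assumption rather than documented server behaviour.

```python
def to_streamed(params):
    """Return a copy of a synchronous request's parameters, made streamed.

    Raises ValueError if the request breaks one of the streamed restrictions.
    """
    if not isinstance(params.get("DATASET_ID"), str):
        raise ValueError("a streamed request takes exactly one dataset ID")
    # Assumption: a request without DELIVERY_FORMAT is allowed through.
    if params.get("DELIVERY_FORMAT", "CEF") != "CEF":
        raise ValueError("only CEF products can be streamed")
    if "DELIVERY_INTERVAL" in params:
        raise ValueError("DELIVERY_INTERVAL is not available when streaming")
    streamed = dict(params)
    streamed["RETRIEVAL_ACCESS"] = "streamed"
    return streamed


query_specs = {"RETRIEVAL_TYPE": "product",
               "DATASET_ID": "C1_CP_FGM_SPIN",
               "START_DATE": "2003-03-03T12:00:00Z",
               "END_DATE": "2003-03-04T12:00:00Z",
               "DELIVERY_FORMAT": "CEF"}
print(to_streamed(query_specs))
```

The returned dictionary can be passed as the params argument of a GET request in the same way as the synchronous Python example.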
Metadata Requests
A Jupyter Notebook with some examples is also available.
For a metadata request, the URL is:
https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&QUERY=
The default FORMAT is VOTable; JSON and CSV are also available.
Then add the mandatory SELECT <parameter> and FROM <table>,
plus any optional conditions or conditional statements, separating the terms with the appropriate delimiters (+ , =).
The metadata requests are directed at tables of information. For the CSA, there are 11 different tables containing different levels of information, much like the resource_class used in the old CAIO queries. The first example queries the table called csa.v_dataset, which contains a column labelled dataset_id. The inventory example below accesses a different table called csa.v_dataset_inventory. The other tables include those that contain information on files (csa.file) and parameters (csa.parameter). A full list of the tables and their columns will be given in the user manual, but the Jupyter Notebook contains instructions for listing all tables and their columns.
Example: to get a list of all dataset IDs in CSV format, we need to query the csa.v_dataset table (SELECT dataset_id FROM csa.v_dataset):
This list will be unordered. To sort it in ascending order, add +ORDER+BY+1 (if only one field is selected) or +ORDER+BY+<field_name>; for descending order, use +ORDER+BY+1+desc. The ORDER BY clause must go at the end of the query.
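As a sketch of how such metadata URLs could be assembled in Python, the helper below URL-encodes an ADQL string and appends it to the sync endpoint given above. The function name metadata_url is hypothetical; urlencode turns the spaces in the query into the + delimiters used in this guide.

```python
from urllib.parse import urlencode

TAP_SYNC_URL = "https://csa.esac.esa.int/csa-sl-tap/tap/sync"


def metadata_url(adql, fmt="CSV"):
    """Build a synchronous TAP metadata request for an ADQL query string."""
    params = {"REQUEST": "doQuery", "LANG": "ADQL",
              "FORMAT": fmt, "QUERY": adql}
    return TAP_SYNC_URL + "?" + urlencode(params)


unordered = "SELECT dataset_id FROM csa.v_dataset"
# ORDER BY always goes at the end of the query.
for adql in (unordered,
             unordered + " ORDER BY 1",
             unordered + " ORDER BY 1 desc"):
    print(metadata_url(adql))
```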
Example: to get a list of all datasets that include FGM, we need to add the WHERE statement and use quotes and wildcards, where %25 is the URL encoding of % (percentage sign), which is the wildcard (instead of the more usual *).
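The wildcard encoding can be left to Python's standard library rather than typed by hand. In this sketch the helper like_condition is hypothetical, and matching on the dataset_id column of csa.v_dataset is an assumption based on the dataset ID example above.

```python
from urllib.parse import quote_plus

TAP_SYNC_PREFIX = ("https://csa.esac.esa.int/csa-sl-tap/tap/sync"
                   "?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=")


def like_condition(column, fragment):
    """Return an ADQL condition matching values that contain fragment."""
    escaped = fragment.replace("'", "''")  # double any single quotes
    return f"{column} LIKE '%{escaped}%'"


adql = ("SELECT dataset_id FROM csa.v_dataset WHERE "
        + like_condition("dataset_id", "FGM"))
# quote_plus turns spaces into + and each % wildcard into %25;
# safe="'" leaves the single quotes readable.
print(TAP_SYNC_PREFIX + quote_plus(adql, safe="'"))
```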
If you require assistance to alter code, please contact us.
Inventory Requests
In the old (CAIO) system, an inventory request was self-contained: the selected field was DATASET_INVENTORY and this included the fields of dataset_id, start_time, end_time, num_instances and inventory_version. In the new (TAP) system, these fields must be listed separately in the query; however, this also means that it's fully customisable.
The CAIO request asks for the inventory and gives the start time and end time:
https://csa.esac.esa.int/csa/aio/metadata-action?SELECTED_FIELDS=DATASET_INVENTORY&RESOURCE_CLASS=DATASET_INVENTORY&RETURN_TYPE=CSV&QUERY=DATASET_INVENTORY.DATASET_ID%20like%20'C1_CP_FGM_SPIN'%20AND%20DATASET_INVENTORY.START_TIME%20%3C=%20'2002-05-01T00:00:00Z'%20AND%20DATASET_INVENTORY.END_TIME%20%3E=%20'2002-04-01T00:00:00Z'
Broken down into its constituent parts:
https://csa.esac.esa.int/csa/aio/metadata-action?
SELECTED_FIELDS=DATASET_INVENTORY&
RESOURCE_CLASS=DATASET_INVENTORY&
RETURN_TYPE=CSV&
QUERY=
DATASET_INVENTORY.DATASET_ID%20like%20'C1_CP_FGM_SPIN'%20
AND%20
DATASET_INVENTORY.START_TIME%20%3C=%20'2002-05-01T00:00:00Z'%20
AND%20
DATASET_INVENTORY.END_TIME%20%3E=%20'2002-04-01T00:00:00Z'
The closest equivalent TAP command is:
https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+dataset_id,start_time,end_time,num_instances,inventory_version+FROM+csa.v_dataset_inventory+WHERE+dataset_id='C1_CP_FGM_SPIN'+AND+start_time<='2002-05-01T00:00:00'+AND+end_time>='2002-04-01T00:00:00'+ORDER+BY+start_time
Broken down into parts, this looks like:
https://csa.esac.esa.int/csa-sl-tap/tap/sync?
REQUEST=doQuery&
LANG=ADQL&
FORMAT=CSV&
QUERY=
SELECT+dataset_id,start_time,end_time,num_instances,inventory_version+
FROM+csa.v_dataset_inventory+
WHERE+dataset_id='C1_CP_FGM_SPIN'+
AND+
start_time<='2002-05-01T00:00:00'+
AND+
end_time>='2002-04-01T00:00:00'+
ORDER+BY+start_time
Remember that, as stated above for metadata, TAP query results are not ordered by default; you must request that the results be ordered by a given field name. Note also that the start and end times may look counterintuitive: start_time is compared with the end of the requested period and end_time with its start, so that every record overlapping the period is included. This has not changed since the move from CAIO.
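The inventory query above, including its overlap test, can be sketched as a small Python helper. The function names are hypothetical; the field list, table name and URL layout are copied from the TAP command shown in this section.

```python
TAP_SYNC_PREFIX = ("https://csa.esac.esa.int/csa-sl-tap/tap/sync"
                   "?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=")
FIELDS = "dataset_id,start_time,end_time,num_instances,inventory_version"


def inventory_url(dataset_id, period_start, period_end):
    """Build the inventory request for records overlapping the period."""
    adql = (f"SELECT {FIELDS} FROM csa.v_dataset_inventory "
            f"WHERE dataset_id='{dataset_id}' "
            f"AND start_time<='{period_end}' "    # record starts before the period ends
            f"AND end_time>='{period_start}' "    # record ends after the period starts
            "ORDER BY start_time")
    return TAP_SYNC_PREFIX + adql.replace(" ", "+")


def overlaps(record_start, record_end, period_start, period_end):
    """The same overlap test, applied to ISO 8601 time strings."""
    return record_start <= period_end and record_end >= period_start


print(inventory_url("C1_CP_FGM_SPIN",
                    "2002-04-01T00:00:00", "2002-05-01T00:00:00"))
```

A record that begins before the requested period but ends inside it still passes both conditions, which is why the comparisons are written this way round.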