How to Use TAP - Solar Orbiter Archive
How to Use TAP to access metadata and data
This page is a from basics 'how-to' for retrieving information from the SOAR using TAP. First the foundation, Tables, is covered, then metadata queries, and data queries using the data_item_id, the filename, or a query.
There is another example notebook written by Éric Buchlin: Fun with the SOAR TAP API!
Tables
The most important thing to understand when approaching accessing SOAR metadata and data using means other than the web interface is that the metadata and data are stored in TABLES (hence Table Access Protocol). There are many different tables for different groups of data and are basically simple tables, with rows and columns.
The example shown below is a section of a table called v_sc_data_item:
This table contains the names and metadata for all the latest versions of the publicly-available SCIENCE products. Descriptions of the most useful columns in this table are given below.
Other useful tables include a version of the above table but for LOW LATENCY products (v_ll_data_item), and v_sc_repository_file and v_ll_repository_file which contain metadata for all versions of the SCIENCE and LOW LATENCY files, respectively.
These tables can be queried for metadata using URL requests that use ADQL syntax (similar to SQL, used for searching databases).
The columns are the same for v_sc_data_item and v_ll_data_item for the Science and Low Latency data respectively. The most useful columns in the table below, but all columns are given in the page on Tables, views and columns. NOTE is_active=True otherwise deactivated files will be included.
COLUMN NAME |
DESCRIPTION |
EXAMPLE |
begin_time |
Date and time of the beginning of the observation in ISO format |
'2022-03-21T00:00:00.0' |
data_item_id |
The ID for this data item, being the filename up to and including the datetime section of the filename. This is the ID used for downloading the file, and is the - this is the filename minus the version, free-field (if present) and extension. |
'solo_L2_mag-rtn-normal-1-minute_20220321' |
"descriptor" |
The descriptor of the file. Note that this is in quotes because it is a reserved word for TAP. This is in lower-case, as in the filename. |
'mag-rtn-normal-1-minute' |
end_time |
The end date and time of the observation in the file. |
'2022-03-22T00:00:00.0' |
file_format |
BIN, FITS, H5, JP2, CDF |
'CDF' |
filename |
The filename - this is the data_item_id plus the version, free-field (if present) and the extension. |
'solo_L2_mag-rtn-normal-1-minute_20220321_V01.cdf' |
filesize |
Size of the file in bytes |
32605 |
generation_time |
Date and time the file was produced in ISO format |
'2022-09-20T13:50:16.0' |
insertion_time |
Date and time file was ingested into the archive in ISO format |
'2022-09-20T15:19:07.451' |
instrument |
Main instrument name, one of EUI, RPW, EPD, SWA, MAG, SPICE, SOLOHI, METIS, PHI or STIX |
'MAG' |
is_active |
Request only those rows which have NOT been deactivated. |
'True' |
"level" |
Processing level of the data: L0, L1, L2, L3. Note that this is in quotes because it is a reserved word for TAP. |
'L2' |
sensor |
Sensor or telescope used. |
NULL for the example |
Metadata Searches
To perform a metadata search, the first thing needed is the name of the table that holds the information that we want. There are several ways of finding out which table contains which columns, but for now, if we want publicly available science data, the table is v_sc_data_item.
The next thing is knowing what information we want to find, i.e., which columns to request that fulfil our search criteria. We can see from the contents of this table, given above, we can find e.g., the data_item_id for all MAG data (instrument='MAG') for April 2022 (begin_time>='2022-04-01T00:00:00' and end_time<='2022-05-01T00:00:00' - note the '>=' and '<=' so we get the whole month). The request is for data_item_id because that is the detail needed to go on and request the data in a data request.
This information is expressed in an ADQL query as:
SELECT data_item_id FROM v_sc_data_item WHERE instrument='MAG' AND begin_time>='2022-04-01T00:00:00' AND end_time<='2022-05-01T00:00:00' AND is_active='True'
SELECT and FROM are mandatory, what columns are wanted (in this case, the data_item_id) from which table. The WHERE narrows the search with conditions are these are joined with AND.
We will also ask for only Level 2 files (AND level='L2') and ask that the results are sorted since they aren't by default (ORDER BY begin_time)
SELECT data_item_id FROM v_sc_data_item WHERE instrument='MAG' AND begin_time>='2022-04-01T00:00:00' AND end_time<='2022-05-01T00:00:00' AND level='L2' AND is_active='True' ORDER BY begin_time
This can be used in a browser, by giving an address for where to look for this information, what language this information is presented in, in what format to return the results, and our search in a form that doesn't confuse the browser:
So where:
https://soar.esac.esa.int/soar-sl-tap/tap/sync?
what we're doing and in what language:
REQUEST=doQuery&LANG=ADQL
what format to return the results (VOTable is the default, CSV and JSON are the other options):
&FORMAT=json
and then add the ADQL query in a way that the browser can understand it (URL encoding) - the spaces need to be replaced with '+', the '>' with '%3E', and the '<' with '%3C':
&QUERY=SELECT+data_item_id+FROM+v_sc_data_item+WHERE+instrument='MAG'+AND+begin_time>='2022-04-01T00:00:00'+AND+end_time<='2022-05-01T00:00:00'+AND+level='L2'+AND+is_active='True'+ORDER+BY+begin_time
This can be copied and pasted into a browser address bar (the hyperlink may not work in Chrome) to get the results:
https://soar.esac.esa.int/soar-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=json&QUERY=SELECT+data_item_id+FROM+v_sc_data_item+WHERE+instrument='MAG'+AND+begin_time%3E='2022-04-01T00:00:00'+AND+end_time%3C='2022-05-01T00:00:00'+AND+level='L2'+is_active='True'+ORDER+BY+begin_time
Running this request will return a JSON file (with a name similar to 1666622968487OPE-result.json) containing the data_item_id's of the files that fulfil the criteria, in this case, MAG L2 Science files for April 2022. These data_item_id's can then be put into a data request.
Data Request
With the data_item_id:
Knowing the data_item_id of a file we want, we can then download the data by making a data request. This consists of different parts that make up a different URL to put into the browser.
We are going to the same address to get the information, but asking for data ('data?' instead of 'tap/sync?'):
https://soar.esac.esa.int/soar-sl-tap/data?
then product_type=SCIENCE because we haven't specified a table (an alternative might be 'LOW_LATENCY')
product_type=SCIENCE&
then RETRIEVAL_TYPE=PRODUCT or ALL_PRODUCTS, depending if we want the latest version or all versions (which will come in a tar file)
RETRIEVAL_TYPE=PRODUCT&
then a data_item_id from our metadata search, e.g., the first one on the list we retrieved above is solo_L2_mag-srf-burst_20220401:
data_item_id=solo_L2_mag-srf-burst_20220401
Again, this can be copied and pasted into a browser address bar to get the results:
https://soar.esac.esa.int/soar-sl-tap/data?product_type=SCIENCE&RETRIEVAL_TYPE=PRODUCT&data_item_id=solo_L2_mag-srf-burst_20220401
With the filename:
The above the way to get the data without specifying the table or making a query, but the queries can be used to get data too. For example, if we wanted to download an AUXILIARY file, the object of our query is the filepath and filename and the table in this case is v_auxiliary_file:
SELECT filepath,filename FROM soar.v_auxiliary_file WHERE filename='solo_ANC_soc-kernels-complete_V106_20200910_001.zip'
built up into the URL as before:
https://soar.esac.esa.int/soar-sl-tap/data?
retrieval_type=PRODUCT&
QUERY=SELECT+filepath,filename+FROM+soar.v_auxiliary_file+WHERE+filename='solo_ANC_soc-kernels-complete_V106_20200910_001.zip'
Or as a query:
SELECT filepath,filename FROM v_sc_file_repository WHERE filename LIKE 'solo_L2_mag-srf-burst_20220402_%.cdf'
LIKE allows us to use a wildcard, which for strings in tables is '%' for multiple character wildcard (like the usual *, and URL encoded as '%25') and '_' for a single character wildcard (like the usual '?', and URL encoded as '%5F')
https://soar.esac.esa.int/soar-sl-tap/data?
retrieval_type=PRODUCT&
QUERY=SELECT+filepath,filename+FROM+soar.v_sc_repository_file+WHERE+filename LIKE 'solo_L2_mag-srf-burst_20220401_%25.cdf'+is_active='True'
https://soar.esac.esa.int/soar-sl-tap/data?retrieval_type=PRODUCT&QUERY=SELECT+filepath,filename+FROM+soar.v_sc_repository_file+WHERE+filename LIKE 'solo_L2_mag-srf-burst_20220401_%25.cdf'+is_active='True'
This query is actually the same as the first data query example above, but with RETRIEVAL_TYPE=ALL_PRODUCTS, since we put the wildcard on the version number - at the time of writing this, there is only V01 for that file. However, it demonstrates how to use a wildcard in a data request.