How to Use TAP to access metadata and data

 

This page is a from basics 'how-to' for retrieving information from the SOAR using TAP. First the foundation, Tables, is covered, then metadata queries, and data queries using the data_item_id, the filename, or a query.

 

There is another example notebook written by Éric Buchlin: Fun with the SOAR TAP API!

 

Tables

The most important thing to understand when approaching accessing SOAR metadata and data using means other than the web interface is that the metadata and data are stored in TABLES (hence Table Access Protocol). There are many different tables for different groups of data and are basically simple tables, with rows and columns.

 

The example shown below is a section of a table called v_sc_data_item:

This table contains the names and metadata for all the latest versions of the publicly-available SCIENCE products. Descriptions of the most useful columns in this table are given below.

 

Other useful tables include a version of the above table but for LOW LATENCY products (v_ll_data_item), and v_sc_repository_file and v_ll_repository_file which contain metadata for all versions of the SCIENCE and LOW LATENCY files, respectively. 

 

These tables can be queried for metadata using URL requests that use ADQL syntax (similar to SQL, used for searching databases).

 

The columns are the same for v_sc_data_item and v_ll_data_item for the Science and Low Latency data respectively. The most useful columns in the table below, but all columns are given in the page on Tables, views and columns. NOTE is_active=True otherwise deactivated files will be included.

COLUMN NAME

DESCRIPTION

EXAMPLE

begin_time

Date and time of the beginning of the observation in ISO format

'2022-03-21T00:00:00.0'

data_item_id

The ID for this data item, being the filename up to and including the datetime section of the filename. This is the ID used for downloading the file, and is the - this is the filename minus the version, free-field (if present) and extension.

'solo_L2_mag-rtn-normal-1-minute_20220321'

"descriptor"

The descriptor of the file. Note that this is in quotes because it is a reserved word for TAP. This is in lower-case, as in the filename.

'mag-rtn-normal-1-minute'

end_time

The end date and time of the observation in the file.

'2022-03-22T00:00:00.0'

file_format

BIN, FITS, H5, JP2, CDF

'CDF'

filename

The filename - this is the data_item_id plus the version, free-field (if present) and the extension.

'solo_L2_mag-rtn-normal-1-minute_20220321_V01.cdf'

filesize

Size of the file in bytes

32605

generation_time

Date and time the file was produced in ISO format

'2022-09-20T13:50:16.0'

insertion_time

Date and time file was ingested into the archive in ISO format

'2022-09-20T15:19:07.451'

instrument

Main instrument name, one of EUI, RPW, EPD, SWA, MAG, SPICE, SOLOHI, METIS, PHI or STIX

'MAG'

is_active

Request only those rows which have NOT been deactivated.

'True'

"level"

Processing level of the data: L0, L1, L2, L3. Note that this is in quotes because it is a reserved word for TAP.

'L2'

sensor

Sensor or telescope used.

NULL for the example

 

Metadata Searches

To perform a metadata search, the first thing needed is the name of the table that holds the information that we want. There are several ways of finding out which table contains which columns, but for now, if we want publicly available science data, the table is v_sc_data_item.

The next thing is knowing what information we want to find, i.e., which columns to request that fulfil our search criteria. We can see from the contents of this table, given above, we can find e.g., the data_item_id for all MAG data (instrument='MAG') for April 2022 (begin_time>='2022-04-01T00:00:00' and end_time<='2022-05-01T00:00:00' - note the '>=' and '<=' so we get the whole month). The request is for data_item_id because that is the detail needed to go on and request the data in a data request. 

This information is expressed in an ADQL query as:

SELECT data_item_id FROM v_sc_data_item WHERE instrument='MAG' AND begin_time>='2022-04-01T00:00:00' AND end_time<='2022-05-01T00:00:00' AND is_active='True'

SELECT and FROM are mandatory, what columns are wanted (in this case, the data_item_id) from which table. The WHERE narrows the search with conditions are these are joined with AND.

 

We will also ask for only Level 2 files (AND level='L2') and ask that the results are sorted since they aren't by default (ORDER BY begin_time)

SELECT data_item_id FROM v_sc_data_item WHERE instrument='MAG' AND begin_time>='2022-04-01T00:00:00' AND end_time<='2022-05-01T00:00:00' AND level='L2' AND is_active='True' ORDER BY begin_time

 

This can be used in a browser, by giving an address for where to look for this information, what language this information is presented in, in what format to return the results, and our search in a form that doesn't confuse the browser:

So where:

https://soar.esac.esa.int/soar-sl-tap/tap/sync?

what we're doing and in what language:

REQUEST=doQuery&LANG=ADQL

what format to return the results (VOTable is the default, CSV and JSON are the other options):

&FORMAT=json

and then add the ADQL query in a way that the browser can understand it (URL encoding) - the spaces need to be replaced with '+', the '>' with '%3E', and the '<' with '%3C':

&QUERY=SELECT+data_item_id+FROM+v_sc_data_item+WHERE+instrument='MAG'+AND+begin_time>='2022-04-01T00:00:00'+AND+end_time<='2022-05-01T00:00:00'+AND+level='L2'+AND+is_active='True'+ORDER+BY+begin_time

This can be copied and pasted into a browser address bar (the hyperlink may not work in Chrome) to get the results:

https://soar.esac.esa.int/soar-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=json&QUERY=SELECT+data_item_id+FROM+v_sc_data_item+WHERE+instrument='MAG'+AND+begin_time%3E='2022-04-01T00:00:00'+AND+end_time%3C='2022-05-01T00:00:00'+AND+level='L2'+is_active='True'+ORDER+BY+begin_time

https://soar.esac.esa.int/soar-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=json&QUERY=SELECT+data_item_id+FROM+v_sc_data_item+WHERE+instrument=%27MAG%27+AND+begin_time%3E=%272022-04-01T00:00:00%27+AND+end_time%3C=%272022-05-01T00:00:00%27+AND+level=%27L2%27+is_active=%27True%27+ORDER+BY+begin_time

Running this request will return a JSON file (with a name similar to 1666622968487OPE-result.json) containing the data_item_id's of the files that fulfil the criteria, in this case, MAG L2 Science files for April 2022. These data_item_id's can then be put into a data request.

 

Data Request

With the data_item_id:

Knowing the data_item_id of a file we want, we can then download the data by making a data request. This consists of different parts that make up a different URL to put into the browser.

 

We are going to the same address to get the information, but asking for data ('data?' instead of 'tap/sync?'):

https://soar.esac.esa.int/soar-sl-tap/data?

then product_type=SCIENCE because we haven't specified a table (an alternative might be 'LOW_LATENCY')

product_type=SCIENCE&

then RETRIEVAL_TYPE=PRODUCT or ALL_PRODUCTS, depending if we want the latest version or all versions (which will come in a tar file)

RETRIEVAL_TYPE=PRODUCT&

then a data_item_id from our metadata search, e.g., the first one on the list we retrieved above is solo_L2_mag-srf-burst_20220401:

data_item_id=solo_L2_mag-srf-burst_20220401

 

Again, this can be copied and pasted into a browser address bar to get the results:

https://soar.esac.esa.int/soar-sl-tap/data?product_type=SCIENCE&RETRIEVAL_TYPE=PRODUCT&data_item_id=solo_L2_mag-srf-burst_20220401

https://soar.esac.esa.int/soar-sl-tap/data?product_type=SCIENCE&RETRIEVAL_TYPE=PRODUCT&data_item_id=solo_L2_mag-srf-burst_20220401

 

With the filename:

The above the way to get the data without specifying the table or making a query, but the queries can be used to get data too. For example, if we wanted to download an AUXILIARY file, the object of our query is the filepath and filename and the table in this case is v_auxiliary_file:

SELECT filepath,filename FROM soar.v_auxiliary_file WHERE filename='solo_ANC_soc-kernels-complete_V106_20200910_001.zip'

 

built up into the URL as before:

https://soar.esac.esa.int/soar-sl-tap/data?

retrieval_type=PRODUCT&

QUERY=SELECT+filepath,filename+FROM+soar.v_auxiliary_file+WHERE+filename='solo_ANC_soc-kernels-complete_V106_20200910_001.zip'

https://soar.esac.esa.int/soar-sl-tap/data?retrieval_type=PRODUCT&QUERY=SELECT+filepath,+filename+FROM+soar.v_auxiliary_file+where+filename+='solo_ANC_soc-kernels-complete_V106_20200910_001.zip'

 

Or as a query:

SELECT filepath,filename FROM v_sc_file_repository WHERE filename LIKE 'solo_L2_mag-srf-burst_20220402_%.cdf'

LIKE allows us to use a wildcard, which for strings in tables is '%' for multiple character wildcard (like the usual *, and URL encoded as '%25') and '_' for a single character wildcard (like the usual '?', and URL encoded as '%5F')

https://soar.esac.esa.int/soar-sl-tap/data?

retrieval_type=PRODUCT&

QUERY=SELECT+filepath,filename+FROM+soar.v_sc_repository_file+WHERE+filename LIKE 'solo_L2_mag-srf-burst_20220401_%25.cdf'+is_active='True'

 

https://soar.esac.esa.int/soar-sl-tap/data?retrieval_type=PRODUCT&QUERY=SELECT+filepath,filename+FROM+soar.v_sc_repository_file+WHERE+filename LIKE 'solo_L2_mag-srf-burst_20220401_%25.cdf'+is_active='True'

https://soar.esac.esa.int/soar-sl-tap/data?retrieval_type=PRODUCT&QUERY=SELECT+filepath,filename+FROM+soar.v_sc_repository_file+WHERE+filename+LIKE+'solo_L2_mag-srf-burst_20220401_%25.cdf'+is_active=%27True%27

 

This query is actually the same as the first data query example above, but with RETRIEVAL_TYPE=ALL_PRODUCTS, since we put the wildcard on the version number - at the time of writing this, there is only V01 for that file. However, it demonstrates how to use a wildcard in a data request.