Metadata - SOHO
Metadata
In the data and quicklook plots section, we assumed that the observation.oid names were known to the user. The only way to access them is to use the TAP server. The TAP system allows read access to the database concerning products. The available data and metadata are arranged into various tables which collect together useful attributes for searching for the data/metadata that you need.
3.2.1. Tables
Tables can be simply accessed for SSA at the following link
https://ssa.esac.esa.int/ssa-sl-tap/tap/tables
Clicking on the first link (SSA development machine) will display an XML file (see 3.2 - Figure 1)
3.2 - Figure 1: tables.XML file directly accessible at https://ssa.esac.esa.int/ssa-sl-tap/tap/tables
Please note that this XML file will be properly displayed only by Chrome, Firefox or Edge but not by Safari, at least on a Mac. This file is long (thousands of lines) and a bit cryptic to have an overview when using a browser or a simple text editor. One easy way to dig into this file is to use Python. A Jupyter Notebook to explain how to dig into such file is available for the Cluster Science Archive at https://www.cosmos.esa.int/web/csa-guide/metadata-request (see link at bottom of the page), just adapt it to SOHO (a dedicated Jupyter notebook for SOHO is in progress). For people not familiar with Python, there are other ways to visualize the content of this file for instance Excel or Topcat.
Let's try to dig in step by step.
- Please go to https://ssa.esac.esa.int/ssa-sl-tap/tap/tables and save the file you see to something like tables_soho.xml.
- Open it with Excel, what matters here are the last two columns filled with information, in this case the R and Z columns
- Hide all the columns from S to Y to focus on the R and Z columns, and look in the Z column to soho.observation (see 3.2 - Figure 2)
3.2 - Figure 2: Same tables_soho.XML file displayed in 3.2 - Figure 1, this time displayed in Excel, focusing on the R and Z columns (columns S to Y have been hidden)
In this Excel spreadsheet, a number of tables named soho.XXX are available and listed in the Z column (right side column in 3.2 - Figure 2). Each table is composed of various columns related to the metadata extracted from the files and available in the SOHO TAP server. The key table is the soho.mv_observation table, composed of the columns listed in 3.2 - Figure 2. In particular it contains the observation.oid of all the files in the SOHO database, their time coverage, instrument, detector filenames, processing level (L0_RawData, L1_Calibrated or L2_MissionLong). The mv in soho.mv_observation means materialized view. Not very user friendly to be honest and a bit technical but if you are curious/interested please have a look at this page https://www.postgresql.org/docs/current/rules-materializedviews.html. This materialized view contains all the information included in the soho.observation table plus key extra information on instrument, detector or postcard.
In section 3.2.2, we will learn how to extract key columns of this table. But as you can see in the tables_soho.XML file, there are many other tables. One is related to the instruments in general (soho.instrument) and 12 other tables for each of the 12 SOHO instruments (e.g. soho.cds_observation, soho.celias_observation and so on). The other tables are listed below for completeness but are of very little use. The HTTP request to download any of this table (explained in section 3.2.2 in details) are listed next to the table name. Such request can be copy pasted into your favorite browser. This will immediately download a CSV file that can be opened in Excel for you to get a detailed view of their content. This is essential to make complex request (see section 3.2.3). It is thus highly recommended to download the tables of your favorite instruments to build up your own queries to extract the observation.oid fulfilling your criteria or any other metadata information that you want to access.
Key table
Table |
HTTP request to the SOHO TAP server |
---|---|
soho.mv_observation | https://ssa.esac.esa.int/ssa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+*+FROM+soho.mv_observation |
SOHO TAP server tables instrument specific
Extra tables
soho.observation
soho.observing_mode
soho.coronal_image
soho.full_disk_solar_image
soho.science_object
soho.science_slit
soho.study
Tables to be suppressed
soho.lyra_observation
soho.swap_observation
reason: useful when SOHO and Proba-2 archive were merged, not anymore
One other way is to use the TOPCAT software available at http://www.star.bris.ac.uk/~mbt/topcat/
Once installed, click on the top left icon ('Open a new table'), this will open a Load a new table window; in this window click on the 9th icon (mouse over tip: Query remote databases using SQL-like query). This will open a third window entitled Table Access Protocol (TAP) Query. At the bottom, under Selected TAP service, insert the following text in front of TAP URL:
https://ssa.esac.esa.int/ssa-sl-tap/tap/tables
and click on Use Service. This process is illustrated in the first snapshot below; in the second snapshot, select your soho.mv_observation table and click on Columns
The full HTTP request syntax is described below in 3.2 - Table 1
Parameter |
Value |
Comments |
---|---|---|
TAP server | https://ssa.esac.esa.int/ssa-sl-tap/tap/sync? | Name of the SSA TAP server |
REQUEST | REQUEST=doQuery& | Requests to execute the provided query |
LANG | LANG=ADQL& | Query language |
FORMAT |
One of the following |
Results output format (note that 'votable' returns a binary compressed votable) |
QUERY | QUERY= | ADQL query |
PHASE | run | Mandatory for "Asynchronous". Query job initial phase |
3.2 - Table 1: HTTP request syntax to query metadata information from the SSA TAP server
3.2.2. Select column(s) from a table
Let's try to extract the following columns from the soho.observation table
Let's try to extract the following columns
begin_date
end_date
file_format
file_name
instrument_name
detector_name
observation_oid
processing_level
The ADQL query to extract multiple columns from a table is
QUERY=SELECT+<column 1>,<column 2>+FROM+<table>
hence
QUERY=SELECT+begin_date,end_date,file_format,file_name,observation_oid,processing_level+FROM+soho.observation
and the related HTTP request shall read
|
that you can copy paste in the address bar of your favourite browser or just click on it. Please note that the table can also simply be called mv_observation instead of soho.mv_observation
This will download immediately a CSV file (see 3.2 - Figure 4)
3.2 - Figure 4: CSV file extracted from the SOHO TAP server, listing of the time_range, file format, filename, observation ID and processing level. You may need to change of the date format of the column begin_date and end_date in Excel to be displayed like this (select these columns, press Ctrl+1, and format the Date field US style)
If you want all columns from a table just use the * sign as follows
QUERY=SELECT+*+FROM+soho.mv_observation
|
3.2.3. More complex ADQL queries
You can directly extract not only some of the columns of a table but also query exactly the information you need, instead of searching your CSV file. ADQL indeed allows you to a query a table using conditions as follows:
WHERE+<search condition>+AND+(<search condition>+OR+<search condition>)+AND+<search condition>
Let's start with an example, more general information on ADQL follows. For now, let's imagine that you want to extract all of the LASCO C2 observation.oids of level 0 (iRawData) or Level 1 (Calibrated) higher measured after the 3rd of January 2021
|
that you can copy paste in the address bar of your favourite browser or just click on it. An excerpt of the csv file output in Excel is presented in 3.2 - Figure 5
3.2 - Figure 5: CSV file extracted from the SOHO TAP server science data item view named soho.mv_observation, listing 7 columns corresponding LASCO observations of processing level 0 or 1 measured after the 3rd of January 2021. You may need to change of the date format of the columns begin_date and end_date in Excel to be displayed like this (select these columns, press Ctrl+1, and format the Date field, European style)
Once all the observation_oid of interest are extracted, you can request the related files (see section 3.1.1).
The views/tables are queried using Astronomical Data Query Language. The syntax is summarised in the Query Syntax Section 2.2 of (Ortiz et al., 2008)
SELECT [ ALL | DISTINCT ] [ TOP unsigned_decimal ] { * | { value_expression [ [AS] column_name ] }, ... } FROM { { table_name [ [AS] identifier ] | ( SELECT ....) [ [AS] identifier ] | table_name [NATURAL] [ INNER | { LEFT | RIGHT | FULL [OUTER] } ] JOIN table_name [ON search_condition | USING ( column_name,...) ] }, ... } [ WHERE search_condition ] [ GROUP BY group_by_term, ... ] [ HAVING search_condition ] [ ORDER BY { order_by_expression } [ ASC | DESC], ... ] [ OFFSET unsigned_decimal ]
SELECT and FROM are mandatory parts of the query; SELECT <this> FROM <tableX>. The other syntax is for refining the search.
This page on SQL pattern matching may also be very useful: https://www.postgresql.org/docs/9.3/functions-matching.html