Using wget and curl - CSA Guide
WGET and CURL - Direct command line tools
wget and curl let you make a request directly from the command line.
WGET
To make a TAP request with wget:
- take the https form of the request (data requests need no changes, but metadata requests need URL encoding - see the metadata page)
- put it in double quotes (make sure 'smart quotes' are not enabled)
- add one of the following to the beginning:
wget --content-disposition
or
wget -O myfilename.tgz (or myfilename.csv)
The --content-disposition option gives the downloaded file an appropriate name and extension. The -O option (capital letter O, as in Oscar) names the file exactly as you specify, so take care that the extension is the correct one.
For example, a data request will return a .tgz file:
wget -O myDataRequest.tgz "https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&DATASET_ID=C1_CP_FGM_SPIN&START_DATE=2003-03-03T00:00:00Z&END_DATE=2003-03-05T00:00:00Z"
While a metadata request will return the format requested:
wget -O myMetadataRequest.csv "https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV&QUERY=SELECT+dataset_id,measurement_types+FROM+csa.v_dataset+WHERE+measurement_types+like+'%25Electric_Field%25'"
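The metadata URL above is the ADQL query with spaces written as + and % written as %25. The following is a minimal sketch of that conversion, where encode_adql is a hypothetical helper name (it is not part of wget or of the CSA service). It only handles %, + and spaces, so queries containing other reserved characters would need further substitutions.

```shell
#!/bin/sh
# Hypothetical helper: convert a plain ADQL query into the form used in the
# request URLs on this page.
encode_adql() {
    # Encode '%' first so the '%' signs introduced by later substitutions
    # are not encoded a second time; then literal '+', then spaces.
    printf '%s' "$1" | sed -e 's/%/%25/g' -e 's/+/%2B/g' -e 's/ /+/g'
}

query="SELECT dataset_id,measurement_types FROM csa.v_dataset WHERE measurement_types like '%Electric_Field%'"
base="https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=CSV"

# Print the full request URL; pass it to wget or curl inside double quotes.
echo "${base}&QUERY=$(encode_adql "$query")"
```

The printed URL is the same one used in the wget metadata example above.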
Note that wget has a default timeout of 900 seconds (15 minutes), so a complex request may time out. In this case, it's safer to use an asynchronous request, as detailed below.
Logging in for asynchronous request
Asynchronous requests require a login (register if you don't have one), and for wget this means obtaining a cookie file. Use the following syntax, replacing COOKIEFILE with your preferred path and filename, and YOURUSERID and YOURPASSWORD with your credentials:
wget --keep-session-cookies --save-cookies COOKIEFILE --post-data 'username=YOURUSERID&password=YOURPASSWORD' "https://csa.esac.esa.int/csa-sl-tap/login"
Note that if your password contains special characters, you might need to replace them with their URL-encoded equivalents, e.g., replace & with %26.
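If you would rather not encode the password by hand, the substitution can be scripted. The sketch below percent-encodes every character other than letters, digits and . ~ _ - and assumes an ASCII-only password; urlencode is a hypothetical helper name and the sample password is made up.

```shell
#!/bin/sh
# Hypothetical helper: percent-encode a string for use in --post-data.
# Assumes ASCII input; the body runs in a subshell so no variables leak out.
urlencode() (
    LC_ALL=C
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}        # everything after the first character
        c=${s%"$rest"}     # the first character itself
        s=$rest
        case $c in
            [a-zA-Z0-9.~_-]) out=$out$c ;;
            # printf with a leading quote yields the character's numeric code
            *) out=$out$(printf '%%%02X' "'$c") ;;
        esac
    done
    printf '%s\n' "$out"
)

password='p&ss=w0rd'   # made-up example password
echo "username=YOURUSERID&password=$(urlencode "$password")"
# prints: username=YOURUSERID&password=p%26ss%3Dw0rd
```

The printed string is what would go inside the single quotes after --post-data in the login command.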
Once you have the cookie file, you can make your asynchronous request by loading it and adding RETRIEVAL_ACCESS=DEFERRED to the request, for example (here the cookie file is cookies.txt):
wget --load-cookies cookies.txt "https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&RETRIEVAL_ACCESS=DEFERRED&DATASET_ID=C1_CE_WBD_WAVEFORM_BM2_CDF&START_DATE=2020-09-02T04:20:00Z&END_DATE=2020-09-02T06:10:00Z&delivery_interval=TenMin&delivery_format=cdf" -O request_response.xml
Once the request has been made, an email will be sent to the registered address with a link to the data itself.
For more information on using scripted access to this data link, see the Asynchronous Data Requests page.
To check the status of the job, request the URL given in the <uws:parameter id="email_base_url"> field of the response, e.g.:
wget --load-cookies cookies.txt "https://csa.esac.esa.int/csa-sl-tap/tap/async/1651077797541OPE" -O request_response.xml
Once the phase is COMPLETED, the data can be downloaded from the URL in the <uws:result> field, e.g.:
wget -O myDataRequest.tgz --load-cookies cookies.txt "https://csa.esac.esa.int/csa-sl-tap/tap/async/1651077797541OPE/results/hmiddlet1651077797557"
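Reading the phase and the result URL out of request_response.xml can also be scripted. The sketch below makes two assumptions about the response layout that are not confirmed on this page: that the phase appears as <uws:phase>...</uws:phase>, and that the <uws:result> element carries the URL in an href attribute. The helper names and the sample fragment are illustrative only; check them against a real response before relying on them.

```shell
#!/bin/sh
# Print the job phase from a saved response file
# (assumes a <uws:phase>...</uws:phase> element).
job_phase() {
    sed -n 's|.*<uws:phase>\([A-Z_]*\)</uws:phase>.*|\1|p' "$1" | head -n 1
}

# Print the first result URL from a saved response file
# (assumes the URL is in an href attribute of <uws:result>).
job_result_url() {
    sed -n 's|.*<uws:result[^>]*href="\([^"]*\)".*|\1|p' "$1" | head -n 1
}

# Illustrative fragment in the assumed layout, not a real server reply.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<uws:job>
  <uws:phase>COMPLETED</uws:phase>
  <uws:results>
    <uws:result id="result" xlink:href="https://csa.esac.esa.int/csa-sl-tap/tap/async/1651077797541OPE/results/hmiddlet1651077797557"/>
  </uws:results>
</uws:job>
EOF

echo "phase:  $(job_phase "$sample")"
echo "result: $(job_result_url "$sample")"
rm -f "$sample"
```

With a real response, the URL printed by job_result_url would be the one passed to the final wget -O command.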
CURL
curl is an alternative to wget. It can download files from HTTP requests and can also print the results of a metadata request to the screen, which is handy for quick queries. Unlike wget, it has no default timeout.
To print metadata results to the screen, put the request in double quotes and prefix it with curl:
curl "https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=json&QUERY=SELECT+dataset_id+FROM+csa.v_dataset+WHERE+experiments='PEACE'"
To write that metadata to a file, add --output <filename> to the command, taking care to match the filename extension to the requested format (.json in this case):
curl --output metadata.json "https://csa.esac.esa.int/csa-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=json&QUERY=SELECT+dataset_id+FROM+csa.v_dataset+WHERE+experiments='PEACE'"
The same method saves the result of a data request to a .tgz file:
curl --output myRequest.tgz "https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&DATASET_ID=C1_CP_FGM_SPIN&START_DATE=2003-03-03T00:00:00Z&END_DATE=2003-03-05T00:00:00Z"
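The data requests on this page differ only in the dataset and the time range, so the URL can be assembled from variables when several requests are needed. This is a minimal sketch; csa_data_url is a hypothetical helper name, and the download commands are commented out so that running the sketch makes no network call.

```shell
#!/bin/sh
# Hypothetical helper: assemble a synchronous data-request URL from a
# dataset ID and a start/end time in the format used on this page.
csa_data_url() {
    dataset=$1
    start=$2
    end=$3
    printf '%s\n' "https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&DATASET_ID=${dataset}&START_DATE=${start}&END_DATE=${end}"
}

url=$(csa_data_url C1_CP_FGM_SPIN 2003-03-03T00:00:00Z 2003-03-05T00:00:00Z)
echo "$url"

# Download with either tool, keeping the URL in double quotes:
# curl --output myRequest.tgz "$url"
# wget -O myRequest.tgz "$url"
```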
Logging in for asynchronous request
The curl version of the authentication syntax is as follows (replace COOKIEFILE, YOURUSERID and YOURPASSWORD as described for wget):
curl -k -c COOKIEFILE -X POST -d username=YOURUSERID -d password=YOURPASSWORD -L https://csa.esac.esa.int/csa-sl-tap/login
The asynchronous request is then made as follows:
curl -b COOKIEFILE -L "https://csa.esac.esa.int/csa-sl-tap/data?RETRIEVAL_TYPE=product&RETRIEVAL_ACCESS=DEFERRED&DATASET_ID=C1_CE_WBD_WAVEFORM_BM2_CDF&START_DATE=2020-09-02T04:20:00Z&END_DATE=2020-09-02T06:10:00Z&delivery_interval=TenMin&delivery_format=cdf" -o response.xml
The returned file is the XML response discussed on the Asynchronous Data Requests page.
To refresh the XML file and see whether the job is complete, request the URL in the <uws:parameter id="email_base_url"> field, e.g.:
curl -k -b cookies.txt -L https://csa.esac.esa.int/csa-sl-tap/tap/async/1651076536218OPE -o response.xml
Once the phase is COMPLETED, use the URL in the <uws:result> field to retrieve the data, e.g.:
curl -k -b cookies.txt -L --output myRequest.tgz "https://csa.esac.esa.int/csa-sl-tap/tap/async/1651076536218OPE/results/hmiddlet1651076536234"
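Re-running the status request by hand can be replaced with a small polling loop. The sketch below is one possible approach, not an official CSA script. It assumes the status response contains a <uws:phase>...</uws:phase> element and that ERROR and ABORTED are the failure phases; wait_for_job and its default interval and check limit are hypothetical choices.

```shell
#!/bin/sh
# Poll an asynchronous job until it completes, fails, or the check limit
# is reached. Returns 0 on COMPLETED, 1 on ERROR/ABORTED, 2 on giving up.
wait_for_job() {
    job_url=$1
    cookie_file=$2
    interval=${3:-60}      # seconds between checks (arbitrary default)
    max_checks=${4:-60}    # stop after this many checks (arbitrary default)
    n=0
    while [ "$n" -lt "$max_checks" ]; do
        # Fetch the status XML and pull out the phase (assumed element name).
        phase=$(curl -s -b "$cookie_file" -L "$job_url" |
                sed -n 's|.*<uws:phase>\([A-Z_]*\)</uws:phase>.*|\1|p' |
                head -n 1)
        echo "phase: ${phase:-unknown}" >&2
        case $phase in
            COMPLETED)     return 0 ;;
            ERROR|ABORTED) return 1 ;;
        esac
        n=$((n + 1))
        sleep "$interval"
    done
    return 2
}

# Example call (commented out so the sketch makes no network request):
# wait_for_job "https://csa.esac.esa.int/csa-sl-tap/tap/async/1651076536218OPE" cookies.txt 120 &&
#     echo "Job finished: fetch the URL in the uws:result field"
```

A generous interval keeps the load on the server low; the check limit stops the loop from running forever if, for example, the cookie has expired and no phase is returned.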