ESAC DATA ANALYSIS AND STATISTICS WORKSHOP 2016
Another ESAC Data Analysis and Statistics workshop will be held at ESAC from 25 to 28 October 2016. This is the third in a continuing series of annual workshops, the first two of which were held in October 2014 and October 2015.
Registration for this workshop is open to everyone, and no prior knowledge of statistics or advanced data analysis methods is required. The tutors will be Roberto Trotta and Željko Ivezić.
The Gaia Data Release #1 workshop, organised jointly by ESA and DPAC, will take place the week immediately following the ESAC Data Analysis and Statistics workshop, and everyone is welcome to register and stay for it as well. Note, however, that the two workshops are organised independently of one another, so you must register for each one separately.
Table of contents
- Dates & Location
- Registration
- Tutors
- Agenda
- Fee
- Logistics
- Workshop materials
- Software & installation instructions
- Organizing Committee
- Get notifications
- Funding
DATES & LOCATION
The workshop will take place on 25-28 October 2016, in rooms D1/D2 at ESAC.
TUTORS
The tutors are Roberto Trotta and Željko Ivezić.
AGENDA
Tuesday 25 October 2016
09:00 - 09:30: Welcome, registration and installation troubleshooting.
09:30 - 09:35: Welcome to ESAC (Mark Kidger, ESAC Science Faculty Sentinel)
09:35 - 11:00: Introduction to statistics: Probabilities; random variables; parent distributions; samples, central limit theorem; the likelihood function and maximum likelihood principle; frequentist confidence intervals. (Roberto Trotta)
11:00 - 11:30: Coffee/tea break
11:30 - 13:30: Basic concepts in statistics: hands-on session. (Roberto Trotta)
13:30 - 14:30: Lunch @ ESAC canteen
14:30 - 15:30: Introduction to Bayesian inference: Bayes theorem; conceptual and philosophical principles, general advantages of the Bayesian approach; priors. (Roberto Trotta)
15:30 - 16:00: Coffee/tea break
16:00 - 17:30: Introduction to Bayesian inference: hands-on session. (Roberto Trotta)
Wednesday 26 October 2016
09:30 - 11:00: Bayesian parameter estimation: inferential solution; the Gaussian linear model; Markov Chain Monte Carlo (Metropolis-Hastings, Hamiltonian MC, Gibbs sampling); practical and numerical issues. (Roberto Trotta)
11:00 - 11:30: Coffee/tea break
11:30 - 13:30: Bayesian parameter estimation: hands-on session. (Roberto Trotta)
13:30 - 14:30: Lunch @ ESAC canteen
14:30 - 15:30: Bayesian model comparison: the three levels of inference; the Bayesian evidence; differences with respect to hypothesis testing; computation of the evidence (SDDR, Laplace approximation, nested sampling). (Roberto Trotta)
15:30 - 16:00: Coffee/tea break
16:00 - 17:30: Bayesian model comparison: hands-on session (Roberto Trotta)
Thursday 27 October 2016
09:30 - 11:00: Introduction to astroML. (Željko Ivezić)
11:00 - 11:30: Coffee/tea break
11:30 - 13:30: Introduction to astroML: hands-on session (Željko Ivezić)
13:30 - 14:30: Lunch @ ESAC canteen
14:30 - 15:30: Density estimation, one-dimensional introduction: Knuth's histograms; Scargle's Bayesian Blocks algorithm; Gaussian mixture models; kernel density estimates (KDE); hands-on session. (Željko Ivezić)
15:30 - 16:00: Coffee/tea break
16:00 - 17:30: Density estimation: high-D KDE; Bayesian nearest neighbor method; extreme deconvolution in high-D; hands-on session. (Željko Ivezić)
Friday 28 October 2016
09:30 - 11:00: Clustering and classification: clustering (unsupervised classification); 1D hypothesis testing; clustering with Gaussian mixture models (GMM); hierarchical clustering algorithm. (Željko Ivezić)
11:00 - 11:30: Coffee/tea break
11:30 - 13:30: Supervised classification: potpourri of supervised classification methods (naive Bayes, quadratic discriminant analysis, GMM, KNN, support vector machines); classification comparison with ROC curves; hands-on session. (Željko Ivezić)
13:30 - 14:30: Lunch @ ESAC canteen
14:30 - 15:30: Dimensionality reduction: principal component analysis; non-negative matrix factorization; independent component analysis; manifold learning (locally linear embedding); hands-on session. (Željko Ivezić)
15:30 - 16:00: Coffee/tea break
16:00 - 17:30: Regression and a few misc. points: (Gaussian) errors in both variables; regression with non-Gaussian errors and/or outliers; learning curves; fast matching using KD trees; hands-on session. (Željko Ivezić)
FEE
The workshop fee is 50 Euro for attendees from outside ESAC (that is, not from ESA or CAB @ ESAC) and is payable in cash upon arrival and registration at ESAC. It covers the coffee breaks and the daily bus transportation from downtown Madrid to ESAC during the workshop.
LOGISTICS
Venue
The workshop will take place at the European Space Astronomy Centre (ESAC), Villanueva de la Cañada, near Madrid.
Hotel
The official hotel for this workshop is:
Leonardo Hotel Madrid City Center
Alberto Aguilera 18, 28015 Madrid
Shuttle bus
A workshop shuttle bus will run between Leonardo Hotel and ESAC and will be clearly marked with an “ESA” sign at the front. It is covered by the workshop fee, and you don’t need to stay in the workshop hotel to use it. If you miss this bus, the cost of getting to ESAC cannot be covered by the fee. Departure and arrival times are as follows:
Tuesday 25 October 2016: the bus departs from Leonardo Hotel at 08:00 and arrives at ESAC at 09:00.
Tuesday 25 October 2016: the bus departs from ESAC at 17:30 and arrives at Leonardo Hotel at 18:30.
Wednesday 26, Thursday 27 and Friday 28 October 2016: the bus departs from Leonardo Hotel at 08:30 and arrives at ESAC at 09:30.
Wednesday 26, Thursday 27 and Friday 28 October 2016: the bus departs from ESAC at 17:30 and arrives at Leonardo Hotel at 18:30.
Social event
On Wednesday 26 October 2016 at 20:00, we will have the workshop dinner at the traditional Basque restaurant Segaretxe in downtown Madrid, a 13-minute walk from the hotel. There will be full meat, fish or vegetarian menus for a total cost of 30 Euro (VAT included). At registration we will ask whether you are joining the dinner and which menu you would like, and we will collect the dinner price in cash in advance.
WORKSHOP MATERIAL
Lectures for Wednesday 26 October 2016:
Lecture on Bayesian Inference (pdf, 7.0 MB)
Lecture on Model comparison (pdf, 6.6 MB)
IPython/Jupyter notebooks for Roberto Trotta's hands-on session:
Linear model two populations notebook
For Thursday and Friday:
Please copy these files to your laptop:
http://www.astro.washington.edu/users/ivezic/astroML/ESAC2016/dataAll.tar.gz (245 MB)
http://www.astro.washington.edu/users/ivezic/astroML/pythonAll.tar.gz
We will discuss what to do with them during the first lecture on Thursday morning.
SOFTWARE & INSTALLATION INSTRUCTIONS
The workshop and hands-on sessions will be based on python.
Attendees are expected to come to the workshop with a working python installation, in order to participate in the hands-on session.
More precisely, the workshop participants should install the following software before the workshop:
- python, with the following packages: numpy, matplotlib, scipy, scikit-learn, emcee
- ipython notebook
- astroML: please see http://www.astroml.org
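Once everything is installed, the list above can be checked in a single pass. The following is a minimal sketch, not part of the official instructions: the helper name check_modules and the REQUIRED list are illustrative, scikit-learn is checked under its import name sklearn, and astroML is assumed to import as astroML. It should run under both Python 2.7 and Python 3.

```python
from __future__ import print_function  # same output under Python 2.7 and 3

import importlib

# Import names of the packages listed above (scikit-learn imports as "sklearn").
# astroML is included on the assumption that it imports as "astroML".
REQUIRED = ["numpy", "matplotlib", "scipy", "sklearn", "emcee", "astroML"]


def check_modules(names):
    """Return {name: version string, or None if the import fails}."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
            continue
        # Not every module defines __version__, so fall back to a placeholder.
        found[name] = str(getattr(module, "__version__", "unknown"))
    return found


if __name__ == "__main__":
    for name, version in sorted(check_modules(REQUIRED).items()):
        print("%-12s %s" % (name, version if version else "MISSING"))
```

Any line reporting MISSING points to a package that still needs to be installed with the platform-specific steps that follow.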
MAC OS X
On Mac OS X, X11/XQuartz will have to be installed, in addition to the software mentioned above.
All instructions below assume that the bash shell is used, as it is the default shell on Mac OS X. (Adapt instructions accordingly if you changed your default shell.)
PYTHON & IPYTHON NOTEBOOK
We recommend the all-in-one scientific Python installer Anaconda.
- Download Anaconda from http://continuum.io/downloads
For Mac OS X 10.7 (Lion), 10.8 (Mountain Lion), or 10.9 (Mavericks), pick "Mac OS X — 64-Bit Python 2.7 Graphical Installer".
If you have Mac OS X 10.6 (Snow Leopard), you may use an older version of Anaconda.
- Double-click to install, and be sure to leave the default "Modify PATH" option.
Most of the necessary python modules already come by default with Anaconda: numpy, matplotlib, scipy, scikit-learn.
The only python module that needs to be added is emcee:
- Install emcee in Anaconda: conda install -c williamsmj emcee
Test the installation:
- Launch python:
python
This should start python, and the version should mention Anaconda.
Exit with Control-D.
- Launch ipython:
ipython
This should start ipython, and the version should mention Anaconda.
Exit with Control-D.
- Launch ipython notebook:
ipython notebook
This should open your default browser and show the notebook dashboard.
Exit by closing the page (in the browser) and with Control-C (in the terminal).
Note: When the OS language is not English, ipython notebook may crash with the error "ValueError: unknown locale: UTF-8".
In that case, before launching ipython notebook, type:
export LC_CTYPE=en_GB.UTF-8
- Launch python and test the different modules:
import numpy
print(numpy.__version__)
import matplotlib
print(matplotlib.__version__)
import scipy
print(scipy.__version__)
import sklearn
print(sklearn.__version__)
import emcee
print(emcee.__version__)
All the python modules should load properly, and they should all print their version.
Exit with Control-D.
LINUX
All instructions below assume that the bash shell is used; adapt instructions accordingly if you use a different shell.
PYTHON & IPYTHON NOTEBOOK
We recommend the all-in-one scientific Python installer Anaconda.
- Download Anaconda from http://continuum.io/downloads
The file will be named Anaconda-2.1.0-Linux-x86.sh (or a very similar name, adapt instructions accordingly).
- Install Anaconda with:
bash Anaconda-2.1.0-Linux-x86.sh
Note that you should type bash, regardless of whether or not you are actually using the bash shell.
Follow the text-only prompts.
When there is a colon at the bottom of the screen, press the down arrow to move down through the text.
- Type yes and press enter to approve the license.
- Press enter to approve the default location for the files.
- Type yes and press enter to prepend Anaconda to your PATH (this makes the Anaconda distribution the default Python).
Most of the necessary python modules already come by default with Anaconda: numpy, matplotlib, scipy, scikit-learn.
The only python module that needs to be added is emcee:
- Install emcee in Anaconda (on 64-bit Linux): conda install -c lrp emcee
Note: if you are on 32-bit Linux, use the following command instead: conda install -c auto emcee
Test the installation:
- Launch python:
python
This should start python, and the version should mention Anaconda.
Exit with Control-D.
- Launch ipython:
ipython
This should start ipython, and the version should mention Anaconda.
Exit with Control-D.
- Launch ipython notebook:
ipython notebook
This should open your default browser and show the notebook dashboard.
Exit by closing the page (in the browser) and with Control-C (in the terminal).
- Launch python and test the different modules:
import numpy
print(numpy.__version__)
import matplotlib
print(matplotlib.__version__)
import scipy
print(scipy.__version__)
import sklearn
print(sklearn.__version__)
import emcee
print(emcee.__version__)
All the python modules should load properly, and they should all print their version.
Exit with Control-D.
WINDOWS
The main issue in Windows is the lack of a packaged version of emcee.
PYTHON & IPYTHON NOTEBOOK
We recommend the all-in-one scientific Python installer Anaconda.
- Download Anaconda from http://continuum.io/downloads
The file will be named Anaconda-2.1.0-Windows-x86_64.exe; this package contains Python 2.7.
- Install Anaconda following the wizard and accepting all the defaults.
Most of the necessary python modules already come by default with Anaconda: numpy, matplotlib, scipy, scikit-learn.
The only python module that needs to be added is emcee, but emcee is not available packaged for Windows, so it should be downloaded from GitHub and installed:
1. Download a ZIP package with the emcee code from https://github.com/dfm/emcee/zipball/master
2. Unpack the archive in a temporary directory
3. Change to the temporary directory created in step 2 and run:
python setup.py install
This will add emcee to the package library managed by Anaconda.
Test the installation:
- Launch python:
python
This should start python, and the version should mention Anaconda.
Exit with Control-Z followed by Enter, or type exit().
- Launch ipython:
ipython
This should start ipython, and the version should mention Anaconda.
Exit with Control-D.
- Launch ipython notebook:
ipython notebook
This should open your default browser and show the notebook dashboard.
Exit by closing the page (in the browser) and with Control-C (in the terminal).
- Launch python and test the different modules:
import numpy
print(numpy.__version__)
import matplotlib
print(matplotlib.__version__)
import scipy
print(scipy.__version__)
import sklearn
print(sklearn.__version__)
import emcee
print(emcee.__version__)
All the python modules should load properly, and they should all print their version.
Exit with Control-Z followed by Enter, or type exit().
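Importing emcee only shows that Python can find the package. A slightly stronger check, on any platform, is to run a very short sampling job. The sketch below is an illustration rather than workshop material: the target density (a standard normal), the function names, and the walker and step counts are all arbitrary choices. It relies on the EnsembleSampler class and its flatchain attribute; recent emcee releases may emit a deprecation warning for flatchain.

```python
from __future__ import print_function  # same output under Python 2.7 and 3


def log_prob(theta):
    """Log-density of a standard normal, up to an additive constant."""
    return -0.5 * sum(t * t for t in theta)


def smoke_test(nwalkers=10, ndim=2, nsteps=200):
    """Run a short emcee chain; return the sample array shape, or None."""
    try:
        import numpy as np
        import emcee
    except ImportError as err:
        print("Cannot run the smoke test:", err)
        return None
    np.random.seed(0)                         # reproducible starting points
    start = np.random.randn(nwalkers, ndim)   # one row per walker
    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
    sampler.run_mcmc(start, nsteps)
    samples = sampler.flatchain               # shape (nwalkers * nsteps, ndim)
    print("emcee", emcee.__version__, "drew", samples.shape[0], "samples")
    print("sample mean:", samples.mean(axis=0))
    return samples.shape


if __name__ == "__main__":
    smoke_test()
```

If the run completes and the sample mean is close to zero in each dimension, the sampler is working. An error message instead means the emcee installation step should be repeated.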
ORGANIZING COMMITTEE
- Michele Armano
- Guillaume Belanger
- Hervé Bouy
- Uwe Lammers
- Bruno Merín
- William O'Mullane
- Pablo Riviere-Marichalar
- Celia Sánchez
- Luis Manuel Sarro
- Roland Vavrek
GET NOTIFICATIONS
You can join the mailing list about statistics at ESAC here.
In case of questions you can send an email to bruno.merin at esa.int.
FUNDING
The SOC warmly thanks the ESAC Science Faculty for funding this workshop.