6. A copilot for ESA Datalabs

 

ESA supervisor: Sandor Kruk
Collaborator(s): Pablo Gomez, Jan Reerink

Site: ESAC

Natural Language Processing (NLP) techniques have recently gained traction in astronomy with the rise of Large Language Models (LLMs). LLMs have been employed in various applications, such as tailoring models specifically for astronomy (Dung Nguyen et al., 2023), working with scientific publications (Astarita et al., 2024), and creating query-based chatbots like Pathfinder (Iyer et al., 2024). In previous work, we have been developing a Retrieval-Augmented Generation (RAG) pipeline that integrates open-source LLMs with scientific publications and internal documentation. The objective of this project is to deploy such a RAG pipeline in the ESA Datalabs science platform and to explore its use with an open-source LLM for code, such as Codestral (https://mistral.ai/news/codestral/), to provide a free coding assistant for users. This tool, integrated with the collaborative features of Jupyter Notebooks, would be a powerful feature for the users of the platform.
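
For applicants unfamiliar with the idea, the retrieval step of a RAG pipeline can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline developed in the previous work: it ranks documents with a simple bag-of-words cosine similarity instead of learned embeddings, it stops at building the prompt and does not call any LLM, and every name in it (tokenize, cosine, retrieve, build_prompt, DOCS) as well as the example documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query (retrieval step)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Prepend the retrieved context to the question (augmentation step)."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Use the context below to answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Invented placeholder text for illustration only, not real documentation.
DOCS = [
    "Notebooks on the platform can be shared with collaborators.",
    "Archive data volumes are mounted read-only inside each notebook.",
    "The cafeteria opens at noon.",
]

if __name__ == "__main__":
    print(build_prompt("How do I share a notebook with collaborators?", DOCS))
```

In a real deployment the resulting prompt would be passed to the chosen LLM (the generation step), and the word-count similarity would typically be replaced by an embedding model and a vector index.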

Project duration: 6 months.

Desirable expertise or programming language:

•    Knowledge of natural language processing.
•    Knowledge of open-source large language models.
•    Experience with Python programming and Jupyter notebooks.
•    Familiarity with software engineering concepts and version control.
•    Background coursework in computer science or data science would be a plus.

 

To apply for this project, please fill in the online application form through the following link.