Data Engineer/Scientist Trainee

Job reference: DATA-Appr-2026-01

Would you like to build your career within a public body whose mission is to effectively protect the health of the population? Join us.

Santé publique France is France’s national public health agency. A public institution under the supervision of the Minister of Health, created through the merger of several public institutions by Ordinance No. 2016-462 of April 14, 2016, the agency works to promote public health. As a scientific, expert, and public health safety agency, its missions include:

  1. Epidemiological observation and monitoring of the health status of the population; 
  2. Monitoring health risks threatening the population; 
  3. Promoting health and reducing health risks; 
  4. Developing prevention and health education; 
  5. Preparation for and response to health threats, alerts, and crises; 
  6. Issuing health alerts.
     

The agency is organized into 12 scientific, cross-functional, or support divisions.

The agency’s strategic priorities and work program, set by its Board of Directors, are organized into three areas: strengthening its capacity to anticipate and respond rapidly to health threats; measuring and assessing the extent of diseases and risk factors to guide their prevention and control; and strengthening the health impact of all public policies, as well as disease prevention and health promotion.

Data Support, Processing, and Analysis Division

Mission

The DATA Division is leading several strategic projects to modernize the processing pipelines for data from Santé publique France’s surveillance systems. These projects rely on innovative data engineering and data science approaches to address public health challenges, particularly predictive modeling, artificial intelligence, and advanced big data analysis. Three major systems illustrate this work:

  • The SurSaUD program provides real-time syndromic surveillance by leveraging data from hospital emergency departments, SOS Médecins, and death certificates.
  • The Notifiable Diseases (MSO) system tracks, in real time, diseases with a significant public health impact through the systematic collection and analysis of notifications submitted by healthcare professionals.
  • The Orchidée project implements multi-thematic epidemiological surveillance based on hospital data.

Activities

These data sources generate a large volume of time series describing how health indicators evolve across different spatial and temporal scales. Structuring and analyzing these series is a strategic priority for strengthening surveillance, modeling, and alert capabilities.
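As an illustration of what building such series can involve, the sketch below aggregates line-level records into weekly counts at two spatial levels (region and national). It is a minimal sketch under stated assumptions: the `(visit_date, region_code)` record layout, the region codes, and the function name `weekly_series` are hypothetical and do not describe the agency's actual data model; ISO 8601 weeks are assumed as the time unit.

```python
from collections import defaultdict
from datetime import date


def weekly_series(records):
    """Aggregate line-level records into weekly counts per region and nationally.

    Hypothetical helper. `records` is an iterable of (visit_date, region_code)
    tuples, an assumed layout. Returns {area: {(iso_year, iso_week): count}},
    with weeks in chronological order.
    """
    series = defaultdict(lambda: defaultdict(int))
    for visit_date, region in records:
        # ISO 8601 weeks start on Monday; week 1 contains the year's first Thursday.
        iso_year, iso_week, _ = visit_date.isocalendar()
        series[region][(iso_year, iso_week)] += 1
        # Coarser spatial scale: the same record also counts toward the national series.
        series["national"][(iso_year, iso_week)] += 1
    # Convert nested defaultdicts to plain dicts sorted by (year, week).
    return {area: dict(sorted(weeks.items())) for area, weeks in series.items()}


if __name__ == "__main__":
    # Hypothetical emergency department visits.
    visits = [
        (date(2026, 1, 5), "IDF"),
        (date(2026, 1, 6), "IDF"),
        (date(2026, 1, 12), "ARA"),
    ]
    for area, weeks in weekly_series(visits).items():
        print(area, weeks)
```

Keying on `(iso_year, iso_week)` rather than the calendar year avoids splitting the week that straddles New Year: 29 December 2025, for instance, belongs to ISO week 1 of 2026.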

In this context, the DATA Division has launched a project aimed at building an automated, reliable, and scalable data processing pipeline to leverage this data using advanced processing and analysis methods.

The apprentice will be involved in the entire data processing pipeline, from collection through to the production and dissemination of indicators. Among other things, the apprentice will participate in the following activities:

  • Designing, developing, and maintaining data management systems and automated pipelines covering the entire data processing and reporting cycle.
  • Cleaning, structuring, and preparing data, ensuring its quality, reliability, traceability, and compliance with regulatory requirements.
  • Exploring and proposing technological solutions to improve data quality and reliability.
  • Identifying opportunities to acquire and integrate new data sources.
  • Designing, developing, and optimizing statistical, machine learning, and deep learning methods to build epidemiological indicators, detect signals, and produce predictive models.
  • Developing monitoring and reporting tools (APIs, interactive dashboards, automated reports) to meet the agency’s operational and strategic needs.
  • Identifying and integrating emerging technological approaches (AI, distributed processing, etc.) to enrich analyses and improve responsiveness in crisis situations.
  • Working closely with epidemiologists, biostatisticians, and engineers to translate business needs into robust technical solutions.
  • Drafting methodological notes, contributing to the dissemination of results (bulletins, study reports, and scientific articles), and training teams to use the tools developed.
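To give a concrete sense of the signal-detection work listed above, here is a deliberately simple sketch of a threshold-based alert on a weekly count series: a week is flagged when its count exceeds the mean plus `k` standard deviations of the preceding `window` weeks. This is an illustrative assumption, not a method used by the DATA Division; the function name `flag_excess` and its default parameters are hypothetical, and operational algorithms typically also account for seasonality, trends, and overdispersion.

```python
from statistics import mean, stdev


def flag_excess(counts, window=8, k=2.0):
    """Return indices of weeks whose count exceeds mean + k * sd of the previous `window` weeks.

    Hypothetical helper: a naive moving-baseline rule with no seasonal adjustment.
    """
    if window < 2:
        # statistics.stdev needs at least two data points.
        raise ValueError("window must be at least 2")
    alerts = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t]  # the `window` weeks strictly before week t
        threshold = mean(baseline) + k * stdev(baseline)
        if counts[t] > threshold:
            alerts.append(t)
    return alerts


if __name__ == "__main__":
    # Hypothetical weekly counts with a spike at index 8.
    weekly_counts = [10, 12, 11, 9, 10, 11, 12, 10, 30, 11]
    print(flag_excess(weekly_counts))  # prints [8]
```

The design favors transparency: each alert can be traced back to a baseline window and a threshold, which is easy to explain to epidemiologists, at the cost of ignoring seasonal patterns. Note also that the spike itself enters the baseline for the following weeks and temporarily raises the threshold.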

These activities take place within a dynamic and collaborative technical environment, utilizing modern development tools, languages tailored to data science, and high-performance computing infrastructure. The apprentice will work within a multidisciplinary team, interacting closely with epidemiologists, data scientists, statisticians, engineers, and members of the IT Department as well as the Chief Information Security Officer (CISO).

The main tools and technologies used include:

  • Languages: Python, R
  • Collaborative environment: GitLab (version control, continuous integration, issue management)
  • Automation and orchestration: Apache Airflow (deployment, workflow monitoring), Docker
  • Formats and databases: PostgreSQL, DuckDB, Parquet files, CSV
  • Visualization: Quarto, Shiny (R and Python)
  • Development environments: VS Code, RStudio, Mistral AI
  • High-performance computing: Apache Spark, running on Santé publique France’s internal computing servers
