ThinkingEarth and Copernicus Foundation Models: Advancing Earth Observation

Our work uses Self-Supervised Learning (SSL) and Graph Neural Networks (GNNs) to build task-agnostic Copernicus Foundation Models and a graph representation of the Earth.
Copernicus Foundation Models and Their Role
Copernicus Foundation Models are central to the ThinkingEarth project, helping to extract useful information from EO data, particularly from Copernicus Sentinel missions.
Innovation in EO Foundation Model Research
To tackle key environmental challenges like biodiversity monitoring and climate action, we have curated large-scale datasets, including:
- SSL4EO-S12-ML – A global multi-label dataset (~5TB) combining multispectral and SAR imagery with open land-cover products.
- SSL4EO-S – A ~15TB dataset integrating Sentinel mission data and Copernicus DEM GLO-30, linking EO with climate science.
- Kuro Siwo – A manually annotated dataset (~1.33TB) for global flood mapping.
- FoMo-Bench – A benchmark for forest monitoring, spanning diverse datasets (~10TB).
Large-scale data is central to EO foundation models. Raw satellite imagery is valuable on its own, but products such as WorldCover (global land-cover maps at 10 m resolution) and Dynamic World (near-real-time land-use and land-cover mapping) add rich semantic information. Although these products are noisy at the pixel level, they can be aggregated into scene-level annotations that strengthen self-supervised learning.
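As a rough illustration of that refinement, the sketch below turns a pixel-level land-cover patch into a scene-level multi-label annotation by keeping only the classes that cover a minimum share of the pixels. The function name `scene_labels`, the 10% threshold, and the toy patch are assumptions made for illustration, not the project's actual pipeline; the class codes follow the WorldCover legend (10 tree cover, 50 built-up, 80 permanent water bodies).

```python
from collections import Counter


def scene_labels(patch, min_fraction=0.05):
    """Return the sorted class ids that cover at least min_fraction of the patch.

    patch is a 2D list of integer land-cover class ids (one per pixel).
    Classes below the threshold are treated as pixel-level noise and dropped.
    """
    pixels = [class_id for row in patch for class_id in row]
    counts = Counter(pixels)
    total = len(pixels)
    return sorted(c for c, n in counts.items() if n / total >= min_fraction)


# Toy 4x4 patch (hypothetical): mostly tree cover, some water, one built-up pixel.
patch = [
    [10, 10, 10, 50],
    [10, 10, 10, 10],
    [10, 10, 80, 80],
    [10, 10, 80, 80],
]

# With a 10% threshold the single built-up pixel (1/16) is filtered out.
labels = scene_labels(patch, min_fraction=0.1)
print(labels)
```

The threshold trades recall for robustness: a lower value keeps rare classes in the annotation, while a higher value suppresses more of the pixel-level noise.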
Expanding EO Foundation Models at Scale
Looking ahead, we aim to:
- Develop unified models across different sensors and timeframes, handling various spectral and non-spectral data.
- Bridge EO and climate science by incorporating gridded surface and atmospheric data (e.g., S5P products).
- Enhance vision-language models to make EO models more interactive through semantic alignment.
A key initiative, GeoLangBind, unifies EO data through language-driven alignment, improving model reasoning and cross-modal learning.
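To make language-driven alignment more concrete, here is a minimal sketch of a generic CLIP-style symmetric contrastive loss between image and text embeddings. It is not GeoLangBind's implementation: the function names, the temperature of 0.07, and the two-dimensional toy embeddings are all assumptions chosen for illustration.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(a * a for a in vector))
    return [a / norm for a in vector]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    loss = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        loss += log_z - row[i]
    return loss / len(logits)


def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss: pair i of images and texts should match."""
    img = [normalize(v) for v in image_emb]
    txt = [normalize(v) for v in text_emb]
    logits = [
        [sum(a * b for a, b in zip(i_vec, t_vec)) / temperature for t_vec in txt]
        for i_vec in img
    ]
    logits_t = [list(column) for column in zip(*logits)]  # text-to-image view
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


# Hypothetical embeddings: captions that match their images vs. swapped captions.
images = [[1.0, 0.0], [0.0, 1.0]]
captions = [[0.9, 0.1], [0.1, 0.9]]

aligned_loss = clip_style_loss(images, captions)
swapped_loss = clip_style_loss(images, captions[::-1])
print(aligned_loss < swapped_loss)
```

The loss is low when each image is most similar to its own caption and high when the pairing is wrong, which is the signal that pulls the two modalities into a shared embedding space.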
Strengthening Model Adaptability
To improve model performance across different datasets, we focus on:
- Transfer Learning for knowledge reuse.
- Domain Adaptation to refine models for new data.
- Domain Generalisation for robustness across diverse EO datasets.
Techniques such as SupCon and SoftCon use multi-label land-cover annotations to guide contrastive pretraining. FoMo-Net pre-trains a sensor-agnostic model on diverse EO datasets, while CLIP-based models align EO imagery with text for better cross-modal learning.
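One way multi-label annotations can guide contrastive pretraining is by supplying soft similarity targets between scenes instead of hard positive/negative pairs. The sketch below computes such targets as the cosine similarity between multi-hot label vectors. It is written in the spirit of soft contrastive learning and is an assumption about the general idea, not a reproduction of SoftCon's code; the function names and the three-class toy labels are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def soft_targets(label_vectors):
    """Pairwise label similarity, usable as soft targets for embedding similarity."""
    return [[cosine(u, v) for v in label_vectors] for u in label_vectors]


# Multi-hot labels over three hypothetical classes: [forest, water, urban].
scene_label_vectors = [
    [1, 1, 0],  # forest and water
    [1, 0, 0],  # forest only
    [0, 0, 1],  # urban only
]

targets = soft_targets(scene_label_vectors)
for row in targets:
    print([round(value, 3) for value in row])
```

Scenes that share some classes receive a target between 0 and 1, so the model is encouraged to place them closer together than unrelated scenes without forcing them to be identical.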
We will assess these methods using downstream performance metrics together with statistical distance measures.
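The text does not say which statistical distances will be used. As one common choice, the sketch below estimates the squared Maximum Mean Discrepancy (MMD) with an RBF kernel between two sets of feature vectors, for example embeddings from a source and a target dataset. The choice of MMD, the `gamma` value, and the toy features are assumptions for illustration only.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2) between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD; near zero when the samples look alike."""
    return (
        mean_kernel(xs, xs, gamma)
        + mean_kernel(ys, ys, gamma)
        - 2.0 * mean_kernel(xs, ys, gamma)
    )


# Hypothetical 2-D features from a source dataset and a shifted target dataset.
source = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]
target = [[2.0, 2.0], [2.1, 1.9], [1.9, 2.1]]

same = mmd_squared(source, source)
shifted = mmd_squared(source, target)
print(same, shifted)
```

A larger value indicates a bigger gap between the two feature distributions, which is the kind of signal that can be tracked before and after domain adaptation.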
Conclusion
By integrating AI with EO data, ThinkingEarth and Copernicus Foundation Models drive innovation in climate action, ecosystem preservation, and sustainability. Through continued research and collaboration, we aim to create robust, scalable, and flexible EO models that can provide actionable insights for global environmental challenges.