Foundation Models

Copernicus Foundation Models

Copernicus Foundation Models are a cornerstone of the ThinkingEarth project. They aim to learn rich, transferable representations from Earth observation data, especially from the Sentinel missions, that can be applied to a wide range of downstream tasks, from land cover mapping to disaster response.

In parallel, the project is advancing the development of the Earth as a Graph model, an approach that focuses on representing the Earth’s interconnected systems through graph structures. This report presents the initial progress made in designing and implementing the Earth as a Graph model, outlining its conceptual framework, key milestones, and future development directions. Positioned alongside the Copernicus Foundation Models, Earth as a Graph contributes to a broader effort to build robust, generalisable Earth representations that support diverse environmental and climate-related applications.

To unlock the full potential of satellite data, ThinkingEarth is developing foundation models tailored to Earth observation (EO). These models are designed to learn from vast amounts of unlabelled data and generalise across regions, sensors, and tasks, making them powerful tools for global-scale applications. Innovations like SupCon and SoftCon use land cover labels to guide self-supervised learning, improving how models recognise and relate satellite scenes. Meanwhile, FoMo-Net aims to unify learning across diverse data types, from radar to multispectral and elevation data, with a single, sensor-agnostic architecture.
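To give a feel for how land cover labels can guide representation learning, here is a minimal sketch. It is not the published SoftCon or SupCon implementation. It assumes each scene has a multi-hot land cover vector and an embedding, and it penalises the gap between embedding similarity and label similarity over all scene pairs. The function names, the squared-error objective, and the toy vectors are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def soft_contrastive_loss(embeddings, labels):
    """Mean squared gap between embedding and label similarity over distinct pairs."""
    n = len(embeddings)
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            target = cosine(labels[i], labels[j])             # soft target in [0, 1]
            predicted = cosine(embeddings[i], embeddings[j])  # current model similarity
            total += (predicted - target) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical multi-hot land cover labels: [forest, cropland, water]
labels = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
# Toy 2-D embeddings; a real encoder would produce these from imagery
aligned = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shuffled = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

aligned_loss = soft_contrastive_loss(aligned, labels)
shuffled_loss = soft_contrastive_loss(shuffled, labels)
print(aligned_loss < shuffled_loss)
```

Because the target is a similarity between label vectors rather than a hard positive or negative, scenes that share only some land cover classes receive an intermediate target instead of being forced fully together or apart.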

Pushing further, work on remote sensing vision-language models adapts the CLIP framework to EO, connecting satellite images with natural language and enabling zero-shot classification, in which a scene is assigned to a class described only in text, without task-specific training labels. These advancements are a step toward flexible, intelligent systems that can help us monitor and understand our planet at scale.

These models are trained on some of the largest and most diverse satellite datasets ever assembled, such as SSL4EO-S12-ML, SSL4EO-S, Kuro Siwo, and FoMo-Bench. Together, these datasets span tens of terabytes and cover a wide variety of modalities and use cases, including multispectral imagery, SAR data, climate grids, and detailed forest inventories.

The project has developed several cutting-edge model families using innovative self-supervised and cross-modal learning techniques:

SoftCon and SupCon enhance representation learning using multi-label land cover annotations.
FoMo-Net enables training across multiple remote sensing modalities with a unified architecture.
MindTheModalityGap aligns satellite imagery with natural language using vision-language models like CLIP, opening the door to zero-shot EO classification.
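Zero-shot classification with a CLIP-style model can be sketched as follows. The image is embedded once, each candidate class is embedded from a text prompt, and the class whose text embedding is most similar to the image embedding wins. The embeddings below are small made-up vectors standing in for encoder outputs, and the function names and the temperature value of 0.07 are assumptions for illustration, not the MindTheModalityGap code.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, text_embeddings, temperature=0.07):
    """Return (best_label, {label: probability}) from cosine similarities."""
    image = normalize(image_embedding)
    names = list(text_embeddings)
    logits = []
    for name in names:
        text = normalize(text_embeddings[name])
        similarity = sum(a * b for a, b in zip(image, text))
        logits.append(similarity / temperature)  # assumed temperature scaling
    probs = softmax(logits)
    best = max(range(len(names)), key=lambda k: probs[k])
    return names[best], dict(zip(names, probs))


# Hypothetical text embeddings, e.g. from prompts like "a satellite image of a forest"
text_embeddings = {
    "forest": [0.9, 0.1, 0.0],
    "water": [0.0, 0.2, 0.9],
    "urban area": [0.1, 0.9, 0.1],
}
# Hypothetical image embedding for one satellite scene
image_embedding = [0.8, 0.2, 0.1]

label, probabilities = zero_shot_classify(image_embedding, text_embeddings)
print(label)
```

No classifier is trained on the target classes here. Adding a new class only requires writing a new prompt and embedding it.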

By leveraging both these models and the datasets they are built upon, ThinkingEarth enables more accurate, scalable, and accessible AI tools for environmental monitoring, climate action, and global sustainability challenges.

Datasets and benchmarks

The development of robust foundation models for Earth observation relies on large, diverse, and high-quality datasets. Within ThinkingEarth and its related research efforts, several pioneering datasets have been introduced, ranging from global-scale, self-supervised collections to fine-grained, manually annotated benchmarks. These include multi-label land cover datasets, unified multimodal satellite archives, annotated flood mapping sets, and forest monitoring benchmarks. Together, they provide the essential building blocks for training, evaluating, and advancing general-purpose models capable of supporting a wide variety of geospatial tasks.

SSL4EO-S12-ML Dataset access

Kuro Siwo Dataset access

FoMo-Bench Dataset access

The Earth as a Graph

The Earth is a dynamic, interconnected system of regions, climates, and environments, where changes in one part can ripple across the whole. Traditional models often rely on fixed equations to capture this complexity, but they struggle with the evolving, interdependent nature of the real world. Graph-based machine learning offers a powerful alternative. By representing the Earth as a graph, where nodes can stand for locations or climate variables and edges capture their interactions, we can better reflect the true structure of our planet’s ecosystem.
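As a rough illustration of this representation, the sketch below stores nodes with arbitrary attributes and weighted, undirected edges between them. The class name, node names, attributes, and edge weights are hypothetical placeholders, not part of the Earth as a Graph implementation or real measurements.

```python
from dataclasses import dataclass, field


@dataclass
class EarthGraph:
    """Undirected graph: nodes carry attribute dicts, edges carry a weight."""
    nodes: dict = field(default_factory=dict)  # node id -> attribute dict
    edges: dict = field(default_factory=dict)  # node id -> {neighbour id: weight}

    def add_node(self, node_id, **attributes):
        self.nodes[node_id] = attributes
        self.edges.setdefault(node_id, {})

    def add_edge(self, a, b, weight=1.0):
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        # Store the interaction in both directions (undirected edge)
        self.edges[a][b] = weight
        self.edges[b][a] = weight

    def neighbours(self, node_id):
        """Return the ids of nodes directly connected to node_id, sorted."""
        return sorted(self.edges[node_id])


graph = EarthGraph()
# Nodes can be locations or climate variables (placeholder names)
graph.add_node("region_a", kind="location")
graph.add_node("region_b", kind="location")
graph.add_node("sea_surface_temperature", kind="climate_variable")
# Edges capture interactions; the weights here are made up
graph.add_edge("region_a", "region_b", weight=0.8)
graph.add_edge("region_b", "sea_surface_temperature", weight=0.5)

print(graph.neighbours("region_b"))
```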

In the ThinkingEarth project, this graph-based approach is a core component. Building on earlier developments such as the Copernicus Foundation Models, Earth as a Graph is designed to learn from data and reveal patterns and relationships within the Earth system. Through graph neural networks and self-supervised learning, we aim to create rich, interpretable representations of the planet, paving the way for more accurate and flexible climate applications.
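The core operation of a graph neural network is message passing: each node updates its representation by combining its own features with those of its neighbours. The sketch below shows one simplified form of this, a fixed blend of a node's features with the mean of its neighbours' features. A trained graph neural network would use learned weights and non-linearities instead, and the function name, `self_weight` parameter, and toy graph are assumptions for illustration.

```python
def message_passing_step(features, adjacency, self_weight=0.5):
    """One round of mean-aggregation message passing (no learned weights)."""
    updated = {}
    for node, own in features.items():
        neighbours = adjacency.get(node, [])
        if not neighbours:
            updated[node] = list(own)  # isolated node keeps its features
            continue
        # Average each feature dimension over the node's neighbours
        mean = [
            sum(features[n][d] for n in neighbours) / len(neighbours)
            for d in range(len(own))
        ]
        # Blend the node's own features with the neighbourhood average
        updated[node] = [
            self_weight * o + (1.0 - self_weight) * m for o, m in zip(own, mean)
        ]
    return updated


# Toy chain graph a - b - c with a signal present only at node "a"
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
features = {"a": [1.0], "b": [0.0], "c": [0.0]}

after_one = message_passing_step(features, adjacency)
after_two = message_passing_step(after_one, adjacency)
print(after_one["c"], after_two["c"])
```

After one round the signal has reached only the direct neighbour of "a". After two rounds it has reached "c" as well, which mirrors how a change in one part of the system can propagate to more distant parts.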

Published Research

Part of the work presented here has been published in peer-reviewed journals and conference proceedings. These scientific articles show how foundation models are advancing Earth observation research.

Multi-Label Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining

Kuro Siwo: 33 billion under the water. A global multi-temporal satellite dataset for rapid flood mapping

FoMo: Multi-Modal, Multi-Scale and Multi-Task Remote Sensing Foundation Models for Forest Monitoring

Project Insights

In addition to external publications, the ThinkingEarth team has produced in-house articles that explain our approach to foundation models and how they're being applied to better understand our planet.