How ThinkingEarth Leverages Large-Scale Datasets

News Item
27 March 2025

At ThinkingEarth, we harness the power of Self-Supervised Learning (SSL) and Graph Neural Networks (GNNs) to construct task-agnostic Copernicus Foundation Models and a graph-based representation of the Earth. By integrating diverse Earth Observation (EO) datasets, we aim to improve environmental monitoring, climate action, and sustainable development.

Copernicus Foundation Models and Their Role

Copernicus Foundation Models serve as the backbone of ThinkingEarth, enabling us to extract meaningful insights from EO data, particularly from Copernicus Sentinel missions. These models facilitate large-scale geospatial analysis by learning representations that generalise across multiple tasks, sensors, and temporal scales.

Key Datasets Powering ThinkingEarth

To develop robust EO Foundation Models, we rely on several large-scale datasets that provide diverse spectral, temporal, and geospatial information. Each dataset contributes to different aspects of our research:

SoftCon (Contrastive Learning for EO Data)

  • Purpose: SoftCon enables contrastive self-supervised learning for EO data by leveraging multi-label land cover annotations.
  • Usage in ThinkingEarth: It helps pre-train models on diverse land cover conditions, improving downstream classification and segmentation tasks.
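
As a rough illustration of how multi-label annotations can drive a contrastive objective, the sketch below derives soft pairwise targets from multi-hot land cover vectors and penalises embeddings whose similarity disagrees with them. It is a simplified stand-in written for this article, not the SoftCon implementation: the function names, the cosine-based targets, and the cross-entropy form of the loss are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def soft_targets(labels):
    """Pairwise similarity of multi-hot label vectors (values in [0, 1])."""
    return [[cosine(a, b) for b in labels] for a in labels]


def soft_contrastive_loss(embeddings, labels, eps=1e-7):
    """Mean binary cross-entropy between embedding and label similarity.

    Hypothetical simplification: embedding cosine similarity is rescaled
    from [-1, 1] to [0, 1] and compared with the soft label target for
    every off-diagonal pair. Requires at least two samples.
    """
    targets = soft_targets(labels)
    total, count = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(len(embeddings)):
            if i == j:
                continue
            p = (cosine(embeddings[i], embeddings[j]) + 1.0) / 2.0
            p = min(max(p, eps), 1.0 - eps)  # keep log() finite
            t = targets[i][j]
            total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
            count += 1
    return total / count


# Toy scenes: the first two share land cover classes, the third does not.
scene_labels = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
good_embeddings = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
bad_embeddings = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0]]
print(soft_contrastive_loss(good_embeddings, scene_labels))
print(soft_contrastive_loss(bad_embeddings, scene_labels))
```

Embeddings that place scenes with shared land cover close together receive a lower loss than embeddings that separate them, which is the behaviour a soft contrastive objective is meant to reward.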

DOFA (Domain-Oriented Feature Adaptation)

  • Purpose: DOFA is designed for weakly supervised domain adaptation in EO, ensuring models can generalise across regions with different distributions.
  • Usage in ThinkingEarth: We incorporate DOFA techniques to enhance model adaptability, reducing the need for extensive labelled data when applying models to new geographic areas.

MUDDAT (Multi-Source Domain Adaptation Dataset for EO)

  • Purpose: MUDDAT is a benchmark dataset for evaluating domain adaptation techniques in EO.
  • Usage in ThinkingEarth: It allows us to assess the robustness of our models when transferring knowledge between different EO datasets.

Kuro Siwo (Global Flood Mapping Dataset)

  • Purpose: Kuro Siwo provides manually annotated flood maps for global-scale flood detection.
  • Usage in ThinkingEarth: Our flood prediction models are trained using Kuro Siwo to improve the accuracy of flood risk assessments and early warning systems.

GAIA (Geo-Aware Image Annotations)

  • Purpose: GAIA offers high-quality semantic labels for EO images, aiding in land cover classification and change detection.
  • Usage in ThinkingEarth: GAIA enhances scene-level annotations in self-supervised learning pipelines, making our models more context-aware.

Fo-Mo Bench (Forest Monitoring Benchmark)

  • Purpose: Fo-Mo Bench is a large-scale benchmark for forest monitoring, integrating data from multiple sources to track deforestation and forest health.
  • Usage in ThinkingEarth: We utilise Fo-Mo Bench to fine-tune our models for vegetation monitoring and carbon stock estimation.

Expanding EO Foundation Models at Scale

By integrating these datasets, we are expanding the capabilities of EO Foundation Models through:

  • Unified Sensor Fusion: Developing models that process multispectral, SAR, and other sensor data seamlessly.
  • Climate Science Integration: Incorporating climate variables (e.g., atmospheric composition from Sentinel-5P) to improve model understanding.
  • Vision-Language Models for EO: Enhancing the interpretability of EO data using semantic alignment techniques such as GeoLangBind, which connects EO imagery with textual descriptions.
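
To make the vision-language item more concrete, the sketch below shows the generic CLIP-style matching step: an image embedding is compared with several caption embeddings, and the similarities are turned into probabilities. It does not use or reproduce GeoLangBind. The embeddings are made-up toy vectors, and the function names and the temperature value are assumptions chosen for illustration.

```python
import math


def normalise(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def caption_probabilities(image_embedding, text_embeddings, temperature=0.07):
    """Softmax over temperature-scaled cosine similarities."""
    image = normalise(image_embedding)
    logits = []
    for text in text_embeddings:
        unit_text = normalise(text)
        similarity = sum(a * b for a, b in zip(image, unit_text))
        logits.append(similarity / temperature)
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical captions and toy embeddings; real ones would come from
# trained image and text encoders.
captions = ["flooded cropland", "dense forest", "urban area"]
caption_embeddings = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
image_embedding = [0.8, 0.2, 0.1]

probabilities = caption_probabilities(image_embedding, caption_embeddings)
best = max(range(len(captions)), key=lambda k: probabilities[k])
print(captions[best])
```

A low temperature sharpens the distribution, so the caption whose embedding points in nearly the same direction as the image embedding receives most of the probability mass.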

Enhancing Model Adaptability

To ensure our models remain robust across diverse environments, we apply:

  • Transfer Learning: Reusing knowledge from pre-trained models to new tasks.
  • Domain Adaptation: Refining models for application to new geographies.
  • Domain Generalisation: Ensuring robustness across different datasets.
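
The transfer learning item can be illustrated with linear probing, where a pre-trained encoder stays frozen and only a small classification head is trained on the new task. The sketch below is a toy version under stated assumptions: `frozen_encoder` is a hypothetical stand-in that computes two hand-written statistics rather than a real foundation model, and the patches and labels are invented.

```python
import math


def frozen_encoder(pixel_values):
    """Hypothetical stand-in for a pre-trained encoder (never updated)."""
    mean = sum(pixel_values) / len(pixel_values)
    spread = max(pixel_values) - min(pixel_values)
    return [mean, spread]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_linear_head(features, targets, lr=0.5, epochs=2000):
    """Fit a logistic-regression head with full-batch gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, targets):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(score) - y
            for k, xi in enumerate(x):
                grad_w[k] += error * xi / n
            grad_b += error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 if the head's probability is at least 0.5, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Toy patches: class 1 is dark, class 0 is bright (invented labels).
patches = [
    [0.10, 0.20, 0.10, 0.15],
    [0.20, 0.10, 0.20, 0.10],
    [0.80, 0.90, 0.70, 0.85],
    [0.90, 0.80, 0.90, 0.70],
]
patch_labels = [1, 1, 0, 0]

# Only the head is trained; the encoder output is treated as fixed input.
features = [frozen_encoder(p) for p in patches]
head_weights, head_bias = train_linear_head(features, patch_labels)
print([predict(head_weights, head_bias, f) for f in features])
```

Because the encoder is not updated, only a handful of parameters are fitted, which is why this kind of reuse tends to need far fewer labelled samples than training a model from scratch.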

Contrastive techniques such as SupCon (supervised contrastive learning) and SoftCon play a critical role in our pretraining strategies, while FoMo-Net and CLIP-based models enhance cross-modal learning.

Conclusion

ThinkingEarth uses AI-driven Copernicus Foundation Models to advance EO analytics for climate resilience and environmental sustainability. Through continuous dataset integration, model refinement, and interdisciplinary collaboration, we are building scalable and flexible EO models that transfer across tasks, sensors, and regions.
