How ThinkingEarth Leverages Large-Scale Datasets

Copernicus Foundation Models and Their Role
Copernicus Foundation Models serve as the backbone of ThinkingEarth, enabling us to extract meaningful insights from Earth Observation (EO) data, particularly from the Copernicus Sentinel missions. These models support large-scale geospatial analysis by learning representations that generalise across multiple tasks, sensors, and temporal scales.
Key Datasets Powering ThinkingEarth
To develop robust EO Foundation Models, we rely on several large-scale datasets that provide diverse spectral, temporal, and geospatial information. Each dataset contributes to different aspects of our research:
SoftCon (Contrastive Learning for EO Data)
- Purpose: SoftCon enables contrastive self-supervised learning for EO data by leveraging multi-label land cover annotations.
- Usage in ThinkingEarth: It helps pre-train models on diverse land cover conditions, improving downstream classification and segmentation tasks.
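One way to picture how multi-label annotations can guide contrastive learning is to treat the similarity between two scenes' label vectors as a soft target for the similarity of their embeddings. The sketch below is a simplified illustration of that idea in plain Python, not the published SoftCon objective; the function names, the cosine-based target, and the toy data are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def soft_contrastive_loss(embeddings, labels):
    """Mean binary cross-entropy between embedding similarity and label similarity.

    Assumption: `labels` are multi-hot land cover vectors, so their cosine
    similarity lies in [0, 1] and can act as a soft target for each pair.
    """
    eps = 1e-7
    total, pairs = 0.0, 0
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            target = cosine(labels[i], labels[j])
            # Map embedding cosine similarity from [-1, 1] to [0, 1], then clip
            # so that the logarithms below stay finite.
            pred = (cosine(embeddings[i], embeddings[j]) + 1) / 2
            pred = min(max(pred, eps), 1 - eps)
            total += -(target * math.log(pred) + (1 - target) * math.log(1 - pred))
            pairs += 1
    return total / pairs


# Toy data (hypothetical): scenes 0 and 1 share land cover labels, scene 2 differs.
toy_labels = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
aligned = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]      # embeddings agree with labels
misaligned = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0]]   # embeddings contradict labels

print(soft_contrastive_loss(aligned, toy_labels))
print(soft_contrastive_loss(misaligned, toy_labels))
```

Embeddings that place scenes with similar labels close together yield a lower loss than embeddings that contradict the labels, which is the behaviour a pre-training objective of this kind rewards.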
DOFA (Domain-Oriented Feature Adaptation)
- Purpose: DOFA is designed for weakly supervised domain adaptation in EO, ensuring models can generalise across regions with different distributions.
- Usage in ThinkingEarth: We incorporate DOFA techniques to enhance model adaptability, reducing the need for extensive labelled data when applying models to new geographic areas.
MUDDAT (Multi-Source Domain Adaptation Dataset for EO)
- Purpose: MUDDAT is a benchmark dataset for evaluating domain adaptation techniques in EO.
- Usage in ThinkingEarth: It allows us to assess the robustness of our models when transferring knowledge between different EO datasets.
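A common way to quantify robustness under dataset transfer is to compare accuracy on the source dataset with accuracy on the target dataset. The following is a minimal sketch of that comparison; the function names and the toy predictions are hypothetical and do not reflect MUDDAT's actual evaluation protocol.

```python
def accuracy(predictions, targets):
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)


def transfer_gap(source_preds, source_labels, target_preds, target_labels):
    """Report source accuracy, target accuracy, and the drop between them."""
    source_acc = accuracy(source_preds, source_labels)
    target_acc = accuracy(target_preds, target_labels)
    return {
        "source_acc": source_acc,
        "target_acc": target_acc,
        "gap": source_acc - target_acc,  # a larger gap means weaker transfer
    }


# Hypothetical predictions from one model on a source and a target dataset.
report = transfer_gap(
    source_preds=[1, 0, 1, 1], source_labels=[1, 0, 1, 1],
    target_preds=[1, 0, 0, 0], target_labels=[1, 1, 0, 1],
)
print(report)
```

A domain adaptation technique is doing its job when it shrinks the gap without lowering source accuracy.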
Kuro Siwo (Global Flood Mapping Dataset)
- Purpose: Kuro Siwo provides manually annotated flood maps for global-scale flood detection.
- Usage in ThinkingEarth: Our flood prediction models are trained using Kuro Siwo to improve the accuracy of flood risk assessments and early warning systems.
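Predicted flood maps are typically compared against manual annotations with an overlap metric such as intersection over union (IoU). The sketch below assumes binary masks stored as nested lists where 1 marks a flooded pixel; the function name and mask encoding are illustrative assumptions, not part of Kuro Siwo itself.

```python
def flood_iou(pred_mask, true_mask):
    """Intersection over union of the flood class (value 1) for two 2D binary masks."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of rows")
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        if len(pred_row) != len(true_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, true_row):
            if p == 1 and t == 1:
                intersection += 1
            if p == 1 or t == 1:
                union += 1
    # Convention (an assumption): two masks without any flood pixels agree perfectly.
    return intersection / union if union else 1.0


# Hypothetical 2x3 masks: model prediction versus manual annotation.
predicted = [[1, 1, 0],
             [0, 1, 0]]
annotated = [[1, 0, 0],
             [0, 1, 1]]
print(flood_iou(predicted, annotated))
```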
GAIA (Geo-Aware Image Annotations)
- Purpose: GAIA offers high-quality semantic labels for EO images, aiding in land cover classification and change detection.
- Usage in ThinkingEarth: GAIA enhances scene-level annotations in self-supervised learning pipelines, making our models more context-aware.
FoMo-Bench (Forest Monitoring Benchmark)
- Purpose: FoMo-Bench is a large-scale benchmark for forest monitoring, integrating data from multiple sources to track deforestation and forest health.
- Usage in ThinkingEarth: We utilise FoMo-Bench to fine-tune our models for vegetation monitoring and carbon stock estimation.
Expanding EO Foundation Models at Scale
By integrating these datasets, we are expanding the capabilities of EO Foundation Models through:
- Unified Sensor Fusion: Developing models that process multispectral, SAR, and other sensor data seamlessly.
- Climate Science Integration: Incorporating climate variables (e.g., atmospheric composition from Sentinel-5P) to improve model understanding.
- Vision-Language Models for EO: Enhancing the interpretability of EO data using semantic alignment techniques such as GeoLangBind, which connects EO imagery with textual descriptions.
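Semantic alignment between imagery and text is often realised by embedding both into a shared vector space and matching them by cosine similarity. The sketch below illustrates that matching step only; it is not GeoLangBind's API. The embeddings are hypothetical stand-ins for the outputs of an image encoder and a text encoder, and all names are assumptions.

```python
import math


def normalise(vector):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vector]


def retrieve_captions(image_embeddings, text_embeddings, captions):
    """For each image embedding, return the caption whose text embedding is most similar."""
    texts = [normalise(t) for t in text_embeddings]
    results = []
    for image in image_embeddings:
        image = normalise(image)
        scores = [sum(a * b for a, b in zip(image, text)) for text in texts]
        best = max(range(len(scores)), key=scores.__getitem__)
        results.append(captions[best])
    return results


# Hypothetical 2-D embeddings; real encoders produce far higher dimensions.
captions = ["flooded cropland", "dense forest"]
text_vecs = [[1.0, 0.0], [0.0, 1.0]]
image_vecs = [[0.9, 0.1], [0.1, 0.8]]
print(retrieve_captions(image_vecs, text_vecs, captions))
```

Because both modalities share one space, the same similarity scores can serve text-to-image retrieval by swapping the roles of the two embedding lists.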
Enhancing Model Adaptability
To ensure our models remain robust across diverse environments, we apply:
- Transfer Learning: Reusing knowledge from pre-trained models for new tasks.
- Domain Adaptation: Refining models for application to new geographies.
- Domain Generalisation: Ensuring robustness across different datasets.
Contrastive objectives such as SupCon and SoftCon play a critical role in our pre-training strategies, while FoMo-Net and CLIP-based models enhance cross-modal learning.
Conclusion
ThinkingEarth, powered by AI-driven Copernicus Foundation Models, is advancing EO analytics for climate resilience and environmental sustainability. Through continuous dataset integration, model refinement, and interdisciplinary collaboration, we are building the next generation of scalable and flexible EO models.