DOFA and ThinkingEarth: Towards Adaptive and Scalable Earth Observation Models

News Item

29 August 2025

Earth Observation (EO) is entering a new era in which models are no longer tied to a single data type. From optical to radar, multispectral to hyperspectral, each modality provides a distinct environmental signal, but leveraging them together in a unified framework has remained a challenge. Recent advances such as DOFA (Dynamic One-For-All) and DOFA-CLIP are addressing these gaps and offer promising directions for large-scale, adaptive EO understanding, directions highly relevant to the goals of ThinkingEarth.

What is DOFA?

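DOFA was introduced as a multimodal foundation model for EO that adapts to the sensor it is given. Rather than fixing the input bands in advance, it generates its input-layer weights dynamically from the wavelengths of the bands it receives, so a single backbone can process optical, multispectral, hyperspectral, and radar imagery. Its purpose is to help models generalise effectively to new sensors, datasets, and geographic areas with limited labelled data. By reducing the need for costly ground truth, DOFA lowers barriers to applying EO models globally, a critical step for large-scale monitoring of climate, agriculture, and urban environments.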

DOFA-CLIP: A Unified Vision–Language Model for EO

Building on this foundation, DOFA-CLIP takes multimodality one step further. It introduces:

GeoLangBind-2M, a large-scale EO image–text dataset covering six heterogeneous modalities, paired with rich natural language descriptions.
VECT (Vision-models Enhanced Contrastive Text-image pretraining), which strengthens spatial awareness by combining multiple vision foundation models.
MaKA (Modality-aware Knowledge Agglomeration), which refines feature learning by accounting for the specifics of each sensor type.

The result is a single Transformer backbone capable of aligning diverse EO modalities with natural language — enabling zero-shot generalisation to unseen data types and tasks.
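To make the zero-shot idea concrete, here is a minimal Python sketch of how classification can work once images and text share an embedding space. It is an illustration under stated assumptions, not DOFA-CLIP's actual code: the three-dimensional vectors are hand-made stand-ins for encoder outputs, and the function names, the prompts, and the temperature of 0.07 are all hypothetical choices for this example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def softmax(scores, temperature=0.07):
    """Turn similarity scores into probabilities (temperature is an assumed value)."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, text_embeddings):
    """Pick the text prompt whose embedding is closest to the image embedding.

    No classifier is trained: the candidate classes are defined purely by
    the text prompts supplied at inference time.
    """
    labels = list(text_embeddings)
    sims = [cosine_similarity(image_embedding, text_embeddings[lab]) for lab in labels]
    probs = softmax(sims)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], dict(zip(labels, probs))


# Hand-made toy embeddings standing in for the outputs of a text encoder
# and an image encoder (assumption: a real model would produce these).
TEXT_EMBEDDINGS = {
    "a satellite image of cropland": [0.9, 0.1, 0.0],
    "a satellite image of forest": [0.1, 0.9, 0.1],
    "a satellite image of an urban area": [0.0, 0.2, 0.9],
}
IMAGE_EMBEDDING = [0.2, 0.8, 0.1]

label, probs = zero_shot_classify(IMAGE_EMBEDDING, TEXT_EMBEDDINGS)
print(label)
for prompt, p in probs.items():
    print(f"{prompt}: {p:.4f}")
```

Because the classes are defined only by the text prompts, adding a new class in this sketch means adding a new prompt to the dictionary rather than collecting labels and retraining.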

Why This Matters for ThinkingEarth

ThinkingEarth is developing a self-learning, adaptive AI framework for EO, designed to support real-world applications where data diversity and geographic variability are the norm. The ideas behind DOFA and DOFA-CLIP, namely adapting across sensors and regions and bridging modalities, map directly onto ThinkingEarth’s use cases:

Food security under climate change: Crops respond differently across regions, and EO data often vary in spectral quality and resolution. DOFA-like adaptation allows models trained in one agricultural zone to be applied in another with minimal re-labelling, improving early-warning systems for food insecurity.
Energy communities and smart grids: Renewable energy planning requires integrating satellite-derived data of different kinds (e.g., solar radiation, land cover, urban morphology), often from different sensors. DOFA-CLIP’s capacity to align heterogeneous modalities could support more robust modelling for decentralised energy management.
Climate resilience: Multimodal fusion helps monitor deforestation, water stress, and urban heat islands. By adapting across regions, ThinkingEarth can provide comparable indicators for diverse environments, from Europe to Sub-Saharan Africa.

Looking Ahead

The innovations of DOFA and DOFA-CLIP point towards a future where EO models are adaptive, multimodal, and interpretable. For ThinkingEarth, incorporating these advances means not only higher performance on benchmarks, but also more trustworthy and actionable insights across its thematic use cases.