Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model
Chenyang Liu, Keyan Chen, Rui Zhao, Zhengxia Zou, and Zhenwei Shi*✉
- Official repository of the paper: "Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model"
- The dataset and model will be publicly available here.
- 2025-02: The dataset and model will be made publicly available.
- 2025-01: The paper is available on arXiv (arXiv:2501.00895).
The Git-10M dataset is a global-scale remote sensing image-text dataset consisting of 10 million image-text pairs, each annotated with geographic location and spatial resolution information.
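The dataset files are not yet public, so the exact schema is unknown; the minimal sketch below shows one plausible JSON Lines record whose field names are purely illustrative of the metadata described above (caption, geographic location, spatial resolution):

```python
import json

# Hypothetical layout of one Git-10M image-text pair. The real schema
# is not yet published; the field names are illustrative only, covering
# the metadata the paper describes: a caption, the geographic location,
# and the spatial resolution.
record = {
    "image": "tiles/000001.png",
    "caption": "A dense residential area beside a river crossed by a bridge.",
    "longitude": 116.39,         # geographic location, degrees
    "latitude": 39.91,
    "resolution_m_per_px": 1.0,  # spatial resolution, meters per pixel
}

# Round-trip one record through JSON Lines, a common on-disk format for
# large image-text datasets.
with open("git10m_sample.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")

with open("git10m_sample.jsonl") as f:
    for line in f:
        pair = json.loads(line)
        print(pair["caption"], pair["resolution_m_per_px"])
```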
Building on the Git-10M dataset, we developed Text2Earth, a 1.3-billion-parameter generative foundation model. Text2Earth excels at resolution-controllable text2image generation and demonstrates robust generalization and flexibility across multiple tasks.
- Zero-Shot text2image generation: Text2Earth can generate specific image content from free-form user text input, without scene-specific fine-tuning or retraining. On the previous benchmark dataset RSICD, Text2Earth surpasses prior models with a significant improvement of 26.23 in FID and +20.95% in zero-shot OA. A hedged usage sketch is given below.
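Since the Text2Earth checkpoint is not yet released, the snippet below is only a minimal sketch of what zero-shot generation could look like, assuming a diffusers-style text-to-image pipeline; the model id and the resolution hint in the prompt are hypothetical:

```python
# Minimal sketch, NOT the released Text2Earth API. Assumes a
# diffusers-style text-to-image pipeline; the model id below is
# hypothetical, and the resolution hint in the prompt is illustrative
# of the paper's resolution-controllable generation.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Text2Earth/text2earth-1.3b",  # hypothetical model id
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="An airport with two parallel runways and a terminal, "
           "1 meter resolution",  # resolution hint (assumed prompt format)
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("airport.png")
```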
- Unbounded Remote Sensing Scene Construction: With Text2Earth, users can seamlessly and infinitely extend remote sensing imagery on a canvas, overcoming the fixed-size limitation of traditional generative models. Text2Earth's resolution controllability is the key to maintaining visual coherence across the generated scene as it expands; a conceptual sketch of the tiling loop follows below.
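The released interface for scene expansion is unknown, so the sketch below only illustrates the general sliding-window outpainting pattern such a canvas implies; `outpaint` is a stand-in for a text-conditioned model call, not a Text2Earth function:

```python
# Conceptual sketch of unbounded scene construction via sliding-window
# outpainting. `outpaint` is a placeholder, not the Text2Earth API.
from PIL import Image

TILE, OVERLAP = 512, 256  # tile size and overlap in pixels

def outpaint(tile: Image.Image, prompt: str) -> Image.Image:
    # Placeholder: a real implementation would call a text-conditioned
    # inpainting/outpainting model to fill the right part of the tile,
    # conditioned on the overlapping left part and the prompt. Here the
    # tile is returned unchanged so the stitching loop below runs.
    return tile

canvas = Image.new("RGB", (TILE, TILE))  # seed tile (assume already generated)

# Grow the canvas to the right, tile by tile. Each new tile reuses
# OVERLAP pixels of already-generated content, which (together with a
# consistent resolution setting) keeps adjacent tiles visually coherent.
for step in range(1, 4):
    x = step * (TILE - OVERLAP)
    tile = Image.new("RGB", (TILE, TILE))
    tile.paste(canvas.crop((x, 0, x + OVERLAP, TILE)), (0, 0))
    tile = outpaint(tile, "farmland with scattered houses, 1 m resolution")
    grown = Image.new("RGB", (x + TILE, TILE))
    grown.paste(canvas, (0, 0))
    grown.paste(tile, (x, 0))
    canvas = grown

canvas.save("unbounded_scene.png")
```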
- Remote Sensing Image Editing: Text2Earth can modify scenes based on user-provided text, such as replacing or removing geographic features, while ensuring that the edits blend seamlessly with the surrounding areas, maintaining continuity and coherence (see the sketch below).
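The editing interface has not been published either; the sketch below assumes a generic diffusers inpainting pipeline (with a hypothetical checkpoint name) just to show the image-plus-mask-plus-instruction pattern the feature describes:

```python
# Hedged editing sketch: text-guided inpainting over a user mask.
# This is NOT the released Text2Earth interface; it assumes a generic
# diffusers inpainting pipeline and a hypothetical model id.
import torch
from diffusers import AutoPipelineForInpainting
from PIL import Image

pipe = AutoPipelineForInpainting.from_pretrained(
    "Text2Earth/text2earth-inpaint",  # hypothetical model id
    torch_dtype=torch.float16,
).to("cuda")

image = Image.open("scene.png").convert("RGB")  # source scene
mask = Image.open("mask.png").convert("L")      # white = region to edit

edited = pipe(
    prompt="replace the buildings with dense forest",
    image=image,
    mask_image=mask,
).images[0]
edited.save("scene_edited.png")
```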
- Cross-Modal Image Generation: Text2Earth supports text-driven multi-modal image generation, covering RGB, SAR, NIR, and PAN imagery. It also shows potential for image-to-image translation, including cross-modal translation and image enhancement, such as PAN to RGB (PAN2RGB), NIR to RGB (NIR2RGB), PAN to NIR (PAN2NIR), super-resolution, and image dehazing. An illustrative translation sketch follows below.
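As with the other features, the exact translation interface is unpublished; this last sketch assumes a diffusers-style img2img pipeline where the prompt names the target modality (e.g., PAN2RGB), with all names hypothetical:

```python
# Illustrative image-to-image translation sketch (PAN to RGB). Assumes
# a diffusers-style img2img pipeline; the model id and the prompt format
# for selecting the target modality are hypothetical.
import torch
from diffusers import AutoPipelineForImage2Image
from PIL import Image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "Text2Earth/text2earth-i2i",  # hypothetical model id
    torch_dtype=torch.float16,
).to("cuda")

pan = Image.open("pan_patch.png").convert("RGB")  # panchromatic input

rgb = pipe(
    prompt="the same scene as an RGB optical image",  # target modality (assumed)
    image=pan,
    strength=0.6,  # lower = preserve more of the input structure
).images[0]
rgb.save("rgb_patch.png")
```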
If you find this paper useful in your research, please consider citing:
    @article{liu2025text2earth,
      title={Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model},
      author={Liu, Chenyang and Chen, Keyan and Zhao, Rui and Zou, Zhengxia and Shi, Zhenwei},
      journal={arXiv preprint arXiv:2501.00895},
      year={2025}
    }
This repo is distributed under the MIT License. The code may be used for academic purposes only.