The Earth Foundation Model Paper Is Interesting. Wake Me Up When It Runs on My Data.

1 May 2026

GIS analyst compares polished satellite data with messy real-world GIS imagery. The gap between theoretical geospatial elegance and operational GIS reality. Image generated by ChatGPT.

NULL ISLAND DISPATCH — ISSUE #10 | MAY 1, 2026

THE SIGNAL

GIM International Just Published Its 2026 Profession Survey. It's Worth Reading.

Every year GIM International surveys geospatial professionals worldwide to take the pulse of an industry in motion. This year's results landed today, and they're more honest than most industry surveys manage to be.

The headline finding isn't that GIS is thriving — though the data supports that. It's the tension underneath: practitioners who have never felt more relevant, and never felt less certain about what their role looks like in 18 months. A mid-level GIS manager at a small city government in the southeast USA put it plainly: his team spends its time on data maintenance because "the analysis will mean less and less as the data degrades." Technical progress, organizational stagnation. That's not a technology problem.

The shift the survey keeps returning to is the move from data acquisition to decision support. From measurement to meaning. Respondents across every region and sector describe a version of the same transition: less time in the field, more time in the office, more complexity in what clients and managers expect from the output. A geodetic engineer in Romania noted that field time has dropped while office time has grown, and that storage and compute needs have exploded. The cloud will help, he said — but bandwidth and data safety will limit it.

That tension — between what the technology promises and what the organization can absorb — is the real story of the profession in 2026. Keep it in mind as you read the rest of this issue.

Full article: gim-international.com

THE MAIN EVENT

The Earth Foundation Model Paper Is Interesting. Wake Me Up When It Runs on My Data.

I'll be direct: I am tired of paradigm shifts that live exclusively in papers.

This week, a study published in Communications Earth & Environment by Zhu et al. (nature.com/articles/s43247-025-03127-x) is making the rounds. The proposal: Earth Foundation Models — AI models trained from the ground up on unlabeled satellite and climate data, task-agnostic by design, spatially aware as a first principle rather than as a fine-tuning afterthought. The argument is that this is categorically different from taking a general-purpose language or vision model and pointing it at a raster tile. The model learns spatial representation as its primary objective, not as a downstream application layered on top of something built to predict the next word.

That's a meaningful distinction. And the concept is directionally correct.

Here's where I get off the hype train.

I've run SAM2 on reclamation imagery in Pike County, Kentucky. I've built pipelines that ingest KyFromAbove orthophotos, deal with COG tile boundaries, handle the EPSG:3089 quirks, and manage the spectral variation between spring and fall acquisition windows. I know what it looks like when a model that performs beautifully on benchmark imagery hits a real dataset. The failures are almost never where the paper predicts them. They're in the preprocessing pipeline. They're in the coordinate system handling. They're in the way the model was never trained on anything that looks like a reclaimed mine bench or an Appalachian ridgeline.

Earth Foundation Models solve a real problem: general-purpose vision models have no spatial prior. They don't know that a pixel at 37°N means something different environmentally than a pixel at 27°N. They don't know that a 30cm resolution image of a forested hillside in October carries different spectral information than the same hillside in April. Training on earth observation data at scale, with those relationships baked in, is the right approach.

But the Zhu et al. paper doesn't tell me the inference latency on a 1TB orthoimagery tile set. It doesn't tell me what hardware I need to run it. It doesn't give me a reproducible benchmark on open satellite data I can verify. It doesn't close the loop to a checkpoint I can download and run tonight. The gap between "this architecture learns better spatial representations" and "this tool changes how I do my job next quarter" is not small, and it is not discussed in the paper.

Compare that gap to the AI news from the rest of this week. GPT-5.5 dropped on Monday. You can use it tonight. DeepSeek V4 preview is live via API right now. And the move that I think matters most for the GeoAI community: Meta shipped Muse Spark this month as a closed-weight proprietary model. No public weights. No download. Available only on meta.ai.

Meta spent three years building an open-source identity. Llama 4 Scout's 10-million-token context window, released in April 2025, is legitimately useful for local GIS workflows in organizations with data governance constraints. That open-source pipeline has been one of the most practical developments for government and enterprise GIS shops in the last two years. Muse Spark closing the weights is a signal worth paying attention to, not because it changes anything today, but because it tells you something about where the economics of frontier model development are heading. If Meta can't sustain open weights at the frontier, nobody can.

So here's the honest picture: the general AI frontier is accelerating on one timeline. Spatial AI infrastructure is being built on a different timeline, by different people, with different constraints. The Earth Foundation Models paper is one data point in a slow, important build. The GPT-5.5 release is a fast, immediately usable capability step. They're not in competition, but they're also not the same story — and treating the paper as a breakthrough you should reorganize your workflow around right now is the kind of mistake that wastes months.

What would change my assessment? A public checkpoint, reproducible benchmarks on open satellite data, and inference benchmarks on hardware a government GIS shop or research lab could realistically run. If those exist in six months, we have a different conversation. Until then, I'm watching the paper and not building around it.
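
To make the inference-benchmark ask concrete, here is the back-of-envelope arithmetic I'd want a paper to let me do. Every input below is a placeholder I picked for illustration, not a figure from Zhu et al. or any published benchmark: 1 TB of uncompressed 4-band, 8-bit imagery, 256×256-pixel chips with no overlap, and an assumed throughput of 50 chips per second.

```python
def estimate_inference_hours(dataset_bytes, bands, bytes_per_sample,
                             chip_size, chips_per_second):
    """Rough wall-clock estimate for running a model over an imagery archive.

    Every argument is an assumption supplied by the caller; nothing is measured.
    """
    # Each pixel stores one sample per band.
    pixels = dataset_bytes // (bands * bytes_per_sample)
    # Ceiling division: a partial chip still costs a full forward pass.
    chips = -(-pixels // (chip_size * chip_size))
    hours = chips / chips_per_second / 3600
    return pixels, chips, hours


pixels, chips, hours = estimate_inference_hours(
    dataset_bytes=10**12, bands=4, bytes_per_sample=1,
    chip_size=256, chips_per_second=50.0)
print(f"{pixels:,} pixels -> {chips:,} chips -> {hours:.1f} hours")
```

With these placeholder inputs the estimate is roughly 3.8 million chips and about 21 hours. The only number that matters is chips per second on hardware you actually own, and that is the number the paper leaves out.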

HANDS ON

Getting Earth Foundation Models Actually Running: A Realistic Starting Point

The Zhu et al. paper is not the only game in town. There are three Earth Foundation Model checkpoints you can actually run today, and the experience of getting them working is its own kind of education.

THE THREE WORTH KNOWING

Clay (Clay Foundation, Apache 2.0) — trained on multi-spectral satellite data including Sentinel-2, Landsat, and NAIP. Weights are on HuggingFace. The most accessible entry point.

Prithvi (IBM/NASA, Apache 2.0) — trained on HLS (Harmonized Landsat Sentinel-2) data. Designed for remote sensing tasks including flood mapping, crop segmentation, and burn scar detection. Also on HuggingFace.

segment-geospatial (Dr. Qiusheng Wu, MIT license) — a Python package wrapping SAM and SAM2 for geospatial workflows. Not a foundation model in the Zhu et al. sense, but the most immediately practical tool in this space. Handles rasterio/GDAL integration in a way that spares you significant pain.

GETTING CLAY RUNNING ON A REAL TILE

Step 1 — Set up the environment. Clay requires Python 3.10+, torch, and the
clay-foundation package. Do not attempt this in arcgispro-py3. Create a
separate conda environment:

conda create -n clay python=3.10
conda activate clay
pip install torch torchvision
pip install clay-foundation

Step 2 — Prepare your tile. Clay expects multi-spectral raster input: Sentinel-2
L2A bands (B02, B03, B04, B08 minimum), reprojected to EPSG:4326, normalized to
surface reflectance. If you're working with KyFromAbove NAIP, you can use 4-band
imagery, but performance on NIR-dependent tasks degrades. You don't need to
generate synthetic bands: Clay accepts the reduced band set, though it is
working with less information.
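
Before running anything, it can save time to check a tile's metadata against the requirements in Step 2. The sketch below is plain Python over a hypothetical metadata dictionary. The keys `crs`, `bands`, and `max_value` are names invented for this example, not part of Clay or rasterio, and the 2.0 cutoff for "looks unscaled" is an arbitrary assumption. In practice you would fill the dictionary from whatever raster library you use.

```python
REQUIRED_BANDS = ("B02", "B03", "B04", "B08")  # minimum set named in Step 2
EXPECTED_CRS = "EPSG:4326"


def preflight(meta):
    """Return a list of problems; an empty list means the tile looks ready.

    `meta` is a hypothetical dict with keys:
      crs       -- CRS as an authority string, e.g. "EPSG:3089"
      bands     -- band names in file order
      max_value -- largest pixel value after any scaling was applied
    """
    problems = []
    if meta["crs"] != EXPECTED_CRS:
        problems.append(f"reproject from {meta['crs']} to {EXPECTED_CRS}")
    missing = [b for b in REQUIRED_BANDS if b not in meta["bands"]]
    if missing:
        problems.append("missing bands: " + ", ".join(missing))
    # Reflectance is nominally 0..1; values far above that suggest raw
    # digital numbers that were never divided by the scale factor.
    if meta["max_value"] > 2.0:
        problems.append("values look unscaled; divide by 10000 for reflectance")
    return problems


tile_meta = {"crs": "EPSG:3089", "bands": ["B02", "B03", "B04"], "max_value": 8421.0}
for problem in preflight(tile_meta):
    print("-", problem)
```

Returning a list instead of raising on the first failure means one pass reports everything wrong with a tile, which helps when you are triaging a whole directory.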

Step 3 — Run inference:

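from clay import ClayModel
import rasterio
import numpy as np

# Read every band and scale Sentinel-2 L2A digital numbers to reflectance.
with rasterio.open("your_tile.tif") as src:
    tile = src.read().astype(np.float32) / 10000.0  # shape: (bands, rows, cols)
    transform = src.transform  # affine georeferencing, kept for writing outputs
    crs = src.crs

# Class and checkpoint names are as written here; check them against the
# Clay documentation for the version you installed.
model = ClayModel.from_pretrained("clay-foundation/clay-v1")
embeddings = model.encode(tile)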

What comes back is a spatial embedding — a dense vector representation of your
tile encoding learned spatial and spectral relationships. Use it to train a
lightweight classification head, run similarity search across a tile library,
or initialize a segmentation pipeline.
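
As one illustration of the similarity-search use, the sketch below ranks a small tile library by cosine similarity to a query embedding. The vectors and tile names are made up, and it uses plain Python lists, not whatever array type the model actually returns. With real embeddings you would first pool or flatten them to one vector per tile.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, library, top_k=3):
    """Return up to top_k (tile_id, score) pairs, best match first."""
    scored = [(tile_id, cosine_similarity(query, vec))
              for tile_id, vec in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional stand-ins; real embeddings have far more dimensions.
library = {
    "tile_a": [1.0, 0.0, 0.0],
    "tile_b": [0.9, 0.1, 0.0],
    "tile_c": [0.0, 1.0, 0.0],
}
for tile_id, score in most_similar([1.0, 0.05, 0.0], library, top_k=2):
    print(tile_id, round(score, 3))
```

Here `tile_a` ranks first and `tile_b` second. A brute-force scan like this is fine for a few thousand tiles. Past that, an approximate nearest-neighbor index is the usual next step.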

WHERE IT BREAKS DOWN

Clay was trained on globally distributed imagery at 10–30m resolution. If your
data is 30cm KyFromAbove orthophotos, you're outside the training distribution.
The model still runs. The embeddings are still useful. But the performance
characteristics in the paper don't apply to your data, and you need to validate
against labeled samples you trust before drawing conclusions.

That last sentence is the whole tutorial condensed: these models are tools, not
oracles. Run them, validate them, find where they break on your data. The paper
tells you what they can do in ideal conditions. Your job is to find out what they
do in yours.
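
A minimal sketch of validating against labeled samples, assuming you have reference and predicted class labels as two parallel lists. The class names and labels are invented for illustration. It reports per-class precision, recall, and intersection over union, which tend to say more than overall accuracy when one class dominates the scene.

```python
def per_class_scores(reference, predicted):
    """Precision, recall, and IoU for each class in two parallel label lists."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must be the same length")
    pairs = list(zip(reference, predicted))
    scores = {}
    for cls in sorted(set(reference) | set(predicted)):
        tp = sum(1 for r, p in pairs if r == cls and p == cls)
        fp = sum(1 for r, p in pairs if r != cls and p == cls)
        fn = sum(1 for r, p in pairs if r == cls and p != cls)
        scores[cls] = {
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "iou": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
        }
    return scores


# Hypothetical labels: what you trust vs. what the pipeline produced.
reference = ["forest", "forest", "bench", "bench", "bench", "water"]
predicted = ["forest", "bench", "bench", "bench", "forest", "water"]
for cls, s in per_class_scores(reference, predicted).items():
    print(f"{cls}: precision={s['precision']:.2f} "
          f"recall={s['recall']:.2f} iou={s['iou']:.2f}")
```

Run it per acquisition season and per terrain type if you can. A model that holds up on spring imagery of valley floors may not hold up on fall imagery of a reclaimed bench.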

FROM THE COMMUNITY

KyFromAbove Funding Is Under Threat

KAMP — the Kentucky Association of Mapping Professionals (https://kamp.wildapricot.org/) — has a statement on their homepage calling for reinstatement of KyFromAbove program funding in the state budget. For those outside Kentucky: KyFromAbove is the statewide aerial imagery and LiDAR acquisition program that provides the foundational data layer for nearly every GIS workflow in the Commonwealth — flood mapping, mine permit review, infrastructure planning, environmental monitoring. KAMP's position is direct: removing funding makes Kentucky more vulnerable to natural disaster and entrenches economic inequality in regions that can't subsidize their own data acquisition. If you work with Kentucky data, this matters. kamp.wildapricot.org

GPN Deadlines Worth Knowing

ESIG Awards applications open, due June 1. Young Professional Scholarships open, due June 1. GIS-Pro 2026 is October 12–15 in Milwaukee. Vanguard Cabinet Salary and Promotion Survey still open through July 31.

→ Young Professionals survey: https://arcg.is/v4ezO0
→ Managers survey: https://arcg.is/1OTW1z1

CLOSING

The GIM International finding that's stayed with me today is a small one. A GIS manager at a small city government somewhere in the southeast said his team does maintenance instead of analysis because they don't have staff to do both — and if the data degrades, the analysis doesn't matter anyway. No foundation model fixes that. No frontier release changes it. The profession's real constraints are organizational, not technical, and they were true before LLMs and they'll be true after Earth Foundation Models become production tools.

That's not pessimism. It's a useful calibration.

The work that needs doing is still the work that needs doing.

Null Island Dispatch publishes weekly at null-island-dispatch.com.

Until next week.
— Chris Lyons
