Geospatial · Remote Sensing · 2026

Denver Urban Tree Classification: Mapping a Neighborhood's Canopy from Free Imagery

Denver's foresters manage tens of thousands of street and park trees with a small crew, and the first thing they need is a map of what is growing where. I wanted to see how much of that map I could build from free public imagery, without sending anyone into the field to identify trees one at a time. The answer came in two halves. Telling one species from another this way works for some species but not reliably enough to replace a survey, because too many species reflect light in almost the same way. But sorting trees into the few groups a forester treats differently works well enough to be useful. This project is that second map: every tree crown in Denver's Hale neighborhood, sorted into five management classes from aerial imagery, a satellite time series, and airborne laser scanning.

The project grew out of a short course on applying machine learning to environmental satellite data, run by the American Meteorological Society with NASA and CIRA. Two of its sessions set the direction, one on machine learning for land-cover classification and one tracing the field from random forests to foundation models, and that span became the shape of the work below, from a gradient-boosted classifier to the foundation models tested at the end.

Tom Shanks

Aerial view of the Hale neighborhood with every tree crown colored by its predicted management class. — Every mapped tree crown in the Hale neighborhood, colored by predicted management class and drawn over the aerial image it was classified from. More than 18,000 crowns are shown. The red ribbons along certain streets are elm plantings, a real pattern the model recovers.

Overall accuracy: 69.0%
Spatially validated: ~66%
Over pixel baseline: +30.7 pts
Crowns mapped: 18,272

Why species is the wrong target

The first instinct is to classify each tree by species, so I tried it. With a simple pixel-by-pixel approach across ten common species the accuracy sat near 17 percent, barely above the 10 percent that random guessing would score. NAIP, the US Department of Agriculture's aerial photography, carries only four color bands (red, green, blue, and near-infrared), and at the level of single pixels many species reflect those four bands almost identically. The fuller crown model built below recovers a good deal more of the species signal, as I come back to later, but for the work a city actually does the species label is still not the target worth chasing.

I changed the question to match what the data can support and what a forester actually needs. Crews are not dispatched by Latin name. They act on a few practical categories: ash at risk from the Emerald Ash Borer, elm at risk from Dutch Elm Disease, water-demanding maples, drought-tolerant conifers, and everything else. Collapsing ten species into those five management classes turns a problem the imagery cannot solve into one it can.

The classes also separate over the seasons, which a single photograph cannot capture but a satellite that revisits the same spot can. I built a five-date series from Sentinel-2, the European Space Agency's free satellite, across the 2025 growing season and measured greenness with NDVI (a standard index where higher values mean more living leaf). Each class moves through the year a little differently. Conifers hold their color into November while the deciduous classes leaf out and fade, and that motion gives the model a signal a midsummer snapshot would miss.

Line chart of average Sentinel-2 greenness for each management class across five 2025 dates. — Average Sentinel-2 greenness for each class through the 2025 season. The conifer line stays high into the autumn while the deciduous classes rise and fall, and that seasonal shape helps tell them apart.
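NDVI is a simple band ratio, (NIR − Red) / (NIR + Red), which for Sentinel-2 means bands B8 and B4. The sketch below computes it and turns one crown's five-date series into a few seasonal features, assuming the series has already been extracted per crown in date order. The feature names and the toy values are hypothetical illustrations, not the project's own feature set.

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """NDVI = (NIR - Red) / (NIR + Red); eps avoids division by zero on dark pixels."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red + eps)

def seasonal_features(series):
    """Summarize one crown's NDVI values, ordered from the first date to the last."""
    s = np.asarray(series, dtype=float)
    return {
        "ndvi_peak": s.max(),
        "ndvi_amplitude": s.max() - s.min(),   # size of the seasonal swing
        "ndvi_late_drop": s.max() - s[-1],     # fall from the peak to the final, autumn date
        "ndvi_steps": np.diff(s),              # change between consecutive dates
    }

# Toy five-date series (illustrative values, not measurements).
conifer = [0.62, 0.66, 0.68, 0.66, 0.63]
deciduous = [0.35, 0.72, 0.78, 0.60, 0.30]
print(seasonal_features(conifer)["ndvi_amplitude"],
      seasonal_features(deciduous)["ndvi_amplitude"])
```

The deciduous toy crown swings several times further than the conifer, which is the kind of shape difference the chart shows and a single midsummer image cannot.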

Building the map

To classify a tree I first have to find it. Instead of labeling individual pixels I delineated whole tree crowns, which keeps each tree as one object and follows how Cross (2019) approached tree mapping in Costa Rican forest. The crowns come from a canopy height model, a raster of how tall the vegetation stands, built from DRCOG's 2020 airborne LiDAR (laser scanning that measures height directly). A watershed segmentation, masked to vegetation, split the canopy into 18,272 individual crowns.
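A minimal sketch of the height-model and segmentation step, assuming the surface model (DSM) and bare-earth model (DTM) are already co-registered numpy arrays on one grid and a vegetation mask is given. Treetops are found with a simple 3 × 3 local-maximum rule, and crowns are grown with a priority-flood watershed, one standard way to implement marker-controlled watershed (scikit-image's `skimage.segmentation.watershed` is the usual library route). The function names and `min_height` are hypothetical, and real canopies usually need some smoothing first so that flat crown tops do not yield several markers.

```python
import heapq
import numpy as np

def canopy_height_model(dsm, dtm):
    """CHM = surface height minus bare-earth height; negative noise is clipped to 0."""
    return np.clip(np.asarray(dsm, float) - np.asarray(dtm, float), 0.0, None)

def treetop_markers(chm, mask, min_height=2.0):
    """Give a unique label to every masked cell that is the highest in its 3x3 neighborhood."""
    h, w = chm.shape
    padded = np.pad(chm, 1, mode="constant", constant_values=-np.inf)
    neigh_max = np.max([padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)], axis=0)
    peaks = (chm == neigh_max) & mask & (chm >= min_height)
    markers = np.zeros(chm.shape, dtype=int)
    markers[peaks] = np.arange(1, peaks.sum() + 1)
    return markers

def watershed_crowns(chm, markers, mask):
    """Marker-controlled watershed by priority flooding from the treetops downhill.

    Cells are claimed in order of decreasing height, each taking the label of the
    labeled neighbor that reached it first. Cells outside `mask` stay 0.
    """
    labels = markers.copy()
    h, w = chm.shape
    heap, counter = [], 0
    for y, x in zip(*np.nonzero(markers)):
        heapq.heappush(heap, (-chm[y, x], counter, y, x))
        counter += 1
    while heap:
        _, _, y, x = heapq.heappop(heap)
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and labels[ny, nx] == 0:
                labels[ny, nx] = labels[y, x]
                heapq.heappush(heap, (-chm[ny, nx], counter, ny, nx))
                counter += 1
    return labels

# Two synthetic trees on flat ground.
yy, xx = np.mgrid[0:20, 0:30]
dsm = (10 * np.exp(-((yy - 10) ** 2 + (xx - 8) ** 2) / 20)
       + 8 * np.exp(-((yy - 10) ** 2 + (xx - 22) ** 2) / 20))
chm = canopy_height_model(dsm, np.zeros_like(dsm))
veg = chm > 2.0   # stand-in for a real vegetation mask
crowns = watershed_crowns(chm, treetop_markers(chm, veg), veg)
```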

For each crown I measured 109 features: color and near-infrared statistics from NAIP, the five-date Sentinel-2 series and its seasonal changes, several vegetation indices including the red-edge and water indices Cross found most useful for separating species, and height and shape from the LiDAR. I attached a known species to a crown wherever it fell within three meters of a tree in the city's arborist inventory, which gave 3,764 labeled crowns to learn from. I compared four classifiers by cross-validation, and a gradient-boosted model came out ahead.
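The three-meter label match can be sketched as a nearest-neighbor lookup. This assumes each crown is represented by its centroid in a projected coordinate system measured in meters; the centroid choice, the function name and the handling of a tree that sits near several crowns are assumptions, since the text only states the three-meter rule.

```python
import numpy as np

def attach_labels(crown_xy, tree_xy, tree_species, max_dist=3.0):
    """Give each crown the species of its nearest inventory tree, if within max_dist meters.

    Crowns with no tree close enough get None. Brute-force distances are fine for
    small sets; for a whole city a KD-tree (e.g. scipy.spatial.cKDTree) scales better.
    """
    crown_xy = np.asarray(crown_xy, float)
    tree_xy = np.asarray(tree_xy, float)
    d = np.linalg.norm(crown_xy[:, None, :] - tree_xy[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    close = d[np.arange(len(crown_xy)), nearest] <= max_dist
    return [tree_species[j] if hit else None for j, hit in zip(nearest, close)]

labels = attach_labels([(0, 0), (10, 0), (20, 0)], [(1, 1), (20, 2.5)], ["ash", "elm"])
```

Here the first crown picks up "ash" (about 1.4 m away), the middle crown has no tree within three meters and stays unlabeled, and the last crown picks up "elm".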

Pipeline diagram from imagery and LiDAR to crowns to features to classifier to map. — The pipeline, end to end: align the rasters, build the LiDAR height model, segment the canopy into crowns, measure features per crown, then classify. It runs from a single script with free data.
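The "measure features per crown" step amounts to zonal statistics over the crown label raster. A minimal numpy sketch, assuming the label raster and a band are on the same grid; which statistics make up the project's 109 features is not specified here, so the mean and standard deviation are illustrative.

```python
import numpy as np

def zonal_stats(labels, band):
    """Mean and standard deviation of one raster band inside each labeled crown.

    labels: int array, 0 = background, 1..N = crown IDs (e.g. watershed output).
    band:   float array on the same grid. Results are indexed by crown ID.
    """
    lab = np.asarray(labels).ravel()
    val = np.asarray(band, float).ravel()
    n = lab.max() + 1
    count = np.bincount(lab, minlength=n)
    total = np.bincount(lab, weights=val, minlength=n)
    total_sq = np.bincount(lab, weights=val ** 2, minlength=n)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = total / count
        var = total_sq / count - mean ** 2   # E[x^2] - E[x]^2
    return mean, np.sqrt(np.clip(var, 0.0, None))

labels = np.array([[0, 1, 1], [2, 2, 0]])
band = np.array([[9.0, 2.0, 4.0], [5.0, 7.0, 9.0]])
mean, std = zonal_stats(labels, band)
```

Repeating this for every band, index and date yields one feature row per crown. `np.bincount` makes it a single pass over the raster regardless of how many crowns there are.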

A look without labels

Before trusting a trained model it helps to see whether the trees group on their own. Plotting every crown by greenness and height and letting a clustering algorithm sort them, with no labels at all, shows real structure in the data. It also shows why the job is hard. When I color the same plot by the true management classes, maple sits almost on top of the catch-all deciduous group, which is exactly where the trained model later makes most of its mistakes.
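The text does not name the clustering algorithm, so this sketch assumes plain k-means on greenness and height. The columns are standardized first, because NDVI runs from about 0 to 1 while height is in meters and would otherwise dominate the distances. The farthest-first initialization is a deterministic choice made for this sketch.

```python
import numpy as np

def standardize(X):
    X = np.asarray(X, float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-first start."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        new = np.array([X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign, centers

# Toy crowns: columns are NDVI and height in meters (illustrative values).
crowns = np.array([[0.30, 4.0], [0.32, 5.0], [0.28, 4.5],
                   [0.72, 15.0], [0.70, 16.0], [0.74, 14.5]])
assign, _ = kmeans(standardize(crowns), k=2)
```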

Two scatter plots of crowns by greenness and height: clusters found without labels, and the true classes. — Each crown placed by greenness and height. On the left, groups a clustering algorithm finds with no labels. On the right, the true classes, where maple and other deciduous overlap heavily.

Results

The model reaches 69.0 percent overall accuracy across the five classes, up from 38.3 percent for an earlier version that worked pixel by pixel, a gain of about 31 points. Balanced accuracy, which averages the per-class recall so the large groups do not dominate, is 62.8 percent. Conifers score best (F1 0.77), ash and elm follow at about 0.65, and maple is the weakest (F1 0.50, recall 0.39) because it shares so much spectral and structural ground with the other deciduous trees.

Management class	Precision	Recall	F1	Crowns
Ash (Emerald Ash Borer risk)	0.71	0.63	0.66	204
Conifer (drought tolerant)	0.84	0.71	0.77	120
Elm (Dutch Elm Disease risk)	0.77	0.57	0.65	118
Maple (water demanding)	0.69	0.39	0.50	180
Other deciduous	0.65	0.85	0.74	508
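The two headline figures can be checked against the table. Overall accuracy is recall weighted by each class's crown count, and balanced accuracy is the plain mean of recall. The sketch below uses only the table's own numbers.

```python
# Per-class recall and crown counts, copied from the table above.
recall = {"ash": 0.63, "conifer": 0.71, "elm": 0.57, "maple": 0.39, "other": 0.85}
crowns = {"ash": 204, "conifer": 120, "elm": 118, "maple": 180, "other": 508}

# Overall accuracy: correctly classified crowns over all crowns.
correct = sum(recall[c] * crowns[c] for c in recall)
overall = correct / sum(crowns.values())

# Balanced accuracy: unweighted mean of per-class recall.
balanced = sum(recall.values()) / len(recall)

print(f"overall = {overall:.3f}, balanced = {balanced:.3f}")
```

It prints 0.693 and 0.630, which agree with the reported 69.0 and 62.8 percent to within the table's two-decimal rounding. The gap between the two shows how much the large other-deciduous class lifts the overall figure.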

Confusion matrix for the five management classes. — Where the model confuses classes. Most errors are maple and the risk classes being read as the large other-deciduous group.

Bar chart of feature importance by source. — Which inputs the model relies on. Color and season dominate; the LiDAR height adds about a tenth.

The model leans most on color and season rather than on structure. The Sentinel-2 time series and the NAIP bands together carry close to nine tenths of the decision, and the LiDAR height about a tenth. So the result is mostly a question of reflectance, of how each canopy reflects light across the bands and across the year. That is the same premise Cross relied on with commercial satellite imagery in Costa Rica, carried here to free public data and to an urban canopy.
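A by-source breakdown like the chart above can be produced by summing per-feature importances by the data source each feature came from. Fitted XGBoost and scikit-learn tree ensembles expose per-feature scores as `feature_importances_`, so `dict(zip(feature_names, model.feature_importances_))` would feed this function. The prefix naming scheme and the toy numbers below are hypothetical, not the project's values. Different importance types (gain, split count) can shift the shares.

```python
from collections import defaultdict

def source_of(name):
    # Hypothetical naming scheme: the prefix before the first underscore is the source.
    return name.split("_", 1)[0]

def importance_by_source(importances, source_of=source_of):
    """Sum per-feature importances into per-source shares that add up to 1."""
    totals = defaultdict(float)
    for name, value in importances.items():
        totals[source_of(name)] += value
    grand = sum(totals.values())
    return {src: v / grand for src, v in totals.items()}

# Toy importances (illustrative only).
toy = {"naip_red_mean": 0.2, "naip_nir_std": 0.1, "s2_ndvi_jul": 0.3,
       "s2_ndvi_delta": 0.2, "lidar_height_max": 0.1, "lidar_area": 0.1}
shares = importance_by_source(toy)
```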

Bar chart comparing the pixel baseline and the crown model accuracy. — The earlier pixel-based version reached 38.3 percent on the same five classes. Working with whole crowns and the fuller feature set raised that to 69.0 percent.

How honest is that number

A single accuracy number from a random split tends to flatter a map like this. Neighboring crowns are photographed under the same light and often belong to the same planting, so if some land in training and their neighbors in the test set, the model can look better than it really is. To get an honest figure I held out whole spatial blocks at a time, 200 and 400 meters across, so entire stretches of the neighborhood stayed out of training together.
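Spatial blocking comes down to giving every crown a block ID from its coordinates and assigning whole blocks to folds. A minimal sketch, assuming crown coordinates in meters, a grid anchored at the coordinate origin, and blocks assigned to folds at random. The text does not say how blocks were mapped to folds, so that part is an assumption. scikit-learn's `GroupKFold` accepts the same block IDs as its `groups` argument.

```python
import numpy as np

def spatial_block_folds(x, y, block_size, n_folds=5, seed=0):
    """Assign each crown to a CV fold by the square block it falls in.

    All crowns inside one block_size x block_size square share a fold, so
    neighbors within a block never straddle the train/test split.
    """
    bx = np.floor(np.asarray(x, float) / block_size).astype(int)
    by = np.floor(np.asarray(y, float) / block_size).astype(int)
    blocks, block_index = np.unique(np.stack([bx, by], axis=1), axis=0, return_inverse=True)
    block_index = block_index.reshape(-1)          # 1-D across numpy versions
    rng = np.random.default_rng(seed)
    fold_of_block = rng.permutation(len(blocks)) % n_folds
    return fold_of_block[block_index]

def iter_folds(folds):
    """Yield (train_idx, test_idx) pairs, one per fold."""
    for f in np.unique(folds):
        yield np.nonzero(folds != f)[0], np.nonzero(folds == f)[0]

x = [10, 50, 150, 250, 260, 390]
folds = spatial_block_folds(x, np.zeros(6), block_size=200, n_folds=2)
```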

Under that stricter test the accuracy settles around 66 percent, roughly two points below the random-split figure and steady across block sizes. The small gap tells me the map generalizes across the neighborhood rather than memorizing it. The one class that slips is maple, for the same reason it was weakest to begin with. This follows a habit from NASA and NOAA's remote-sensing training: give the model an honest grade, and do not assume a model trained on one place transfers to another.

Evaluation	Overall accuracy	Balanced accuracy
Random split, same model	0.685	0.623
Spatial blocks, 200 m	0.664	0.597
Spatial blocks, 400 m	0.664	0.597

Bar chart comparing random and spatial-block cross-validation accuracy. — Random versus spatial-block validation on the same model. Holding out whole blocks lowers the score by about two points.

Crowns colored by spatial cross-validation fold. — Crowns grouped into spatial folds, so whole blocks of the neighborhood are tested together rather than split apart.

How far species can go

Having built the model, I went back to the harder question and asked it to name the species directly. Across the twenty-one species with enough labeled trees to learn from, it was right about fifty-four percent of the time. That is a long way past the seventeen percent the pixel-based approach managed on just ten species, and far above the roughly five percent that pure guessing among twenty-one species would score. The distinctive species come back well, blue spruce, American elm, honeylocust and ash among them, while the look-alikes blur, the lindens with each other and the oaks and ornamental maples into the broadleaf crowd.

So species is not hopeless from free imagery once you work crown by crown rather than pixel by pixel. It is simply not reliable enough to stand in for a field survey, and a city does not need it to be. The five management classes stay the headline because they are what a crew acts on, but the species result shows how much signal the crown approach pulls out of the same free data.

From map to action

The accuracy figure matters less than the work the map saves. The Emerald Ash Borer kills untreated ash and is moving through the Front Range, and a crew can only inspect so many trees in a season. The most useful thing the model produces is a short, ranked list of where the ash most likely are, so the first trucks go to the right blocks. About sixteen hundred crowns are flagged as likely ash for a crew to confirm and treat. The list directs the field work rather than replacing it. In a later test, adding an October image, when ash turns color on its own schedule, improved how reliably the model finds ash by about seven points, the single most useful addition for this borer-triage job.

Aerial map of the neighborhood with predicted ash crowns highlighted in orange. — Predicted ash crowns across the neighborhood, the trees most exposed to the Emerald Ash Borer. A screening layer to prioritize inspection, drawn over the aerial image.
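A ranked triage list can be built from the classifier's ash probability per crown, for example one column of `predict_proba`, which both scikit-learn and XGBoost classifiers provide. The 0.5 threshold and the optional cap are assumptions for illustration. The text reports about sixteen hundred flagged crowns but not the rule that produced them.

```python
import numpy as np

def ash_triage(crown_ids, p_ash, threshold=0.5, top_n=None):
    """Rank crowns by predicted ash probability, highest first.

    Crowns at or above `threshold` are flagged; `top_n` optionally caps the list
    to what a crew can inspect in a season (None keeps all flagged crowns).
    """
    p = np.asarray(p_ash, float)
    order = np.argsort(-p, kind="stable")
    flagged = [(crown_ids[i], float(p[i])) for i in order if p[i] >= threshold]
    return flagged[:top_n]

ranked = ash_triage(["a", "b", "c", "d"], [0.2, 0.9, 0.55, 0.7])
```

Sorting by probability rather than by hard label lets a crew start with the most confident calls and stop wherever its season's capacity runs out.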

Watching the canopy change over time

The crown map is a single snapshot. To watch the canopy change I needed a second date and a sensor that revisits often, so I turned to the Harmonized Landsat-Sentinel product, or HLS, a free NASA dataset that puts the Landsat and Sentinel-2 satellites on one 30 meter grid and delivers calibrated surface reflectance every few days. Surface reflectance is the analysis-ready standard the training I took emphasized, and the crown model above uses it only in part: its Sentinel-2 series is surface reflectance, but its NAIP inputs are raw brightness values rather than light corrected to the ground.

I built a cloud-free summer composite of Hale for 2016 and for 2025, measured greenness in each, and compared them. The neighborhood canopy turned out to be broadly stable. Average greenness barely moved, and the patches that grew greener and browner roughly cancel, with about two percent of the area losing vegetation and two and a half percent gaining it. I am careful not to over-read the rest, because two summer snapshots nine years apart differ a little just from sun angle and the exact day they were captured, so the stable picture is the trustworthy part rather than the sign of every small change.
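A minimal sketch of the compositing and change step, assuming each summer's HLS scenes are stacked as `(scenes, rows, cols)` arrays and a boolean clear-sky mask has already been decoded from the quality layer. Because NDVI is a ratio, the integer scale factor on HLS reflectance cancels, as long as fill and cloud pixels are masked. The per-pixel median and the ±0.1 change threshold are assumptions. The text does not say how the composite or the loss and gain classes were defined.

```python
import numpy as np

def ndvi_composite(nir_stack, red_stack, clear):
    """Per-pixel median NDVI over the clear scenes of one season.

    Pixels that are never clear come out as NaN.
    """
    nir = np.asarray(nir_stack, float)
    red = np.asarray(red_stack, float)
    with np.errstate(invalid="ignore", divide="ignore"):
        ndvi = (nir - red) / (nir + red)
    ndvi = np.where(clear, ndvi, np.nan)
    return np.nanmedian(ndvi, axis=0)

def change_summary(ndvi_before, ndvi_after, threshold=0.1):
    """Difference two composites; return the difference and the shares of loss and gain."""
    diff = ndvi_after - ndvi_before
    valid = ~np.isnan(diff)
    n = valid.sum()
    loss = (valid & (diff <= -threshold)).sum() / n
    gain = (valid & (diff >= threshold)).sum() / n
    return diff, loss, gain

# Two scenes, one row of two pixels; the second scene is cloudy over the first pixel.
nir = np.array([[[5000, 4000]], [[3000, 4000]]])
red = np.full((2, 1, 2), 1000)
clear = np.array([[[True, True]], [[False, True]]])
composite = ndvi_composite(nir, red, clear)
```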

The trade is resolution. At 30 meters a single HLS pixel covers several trees plus the street and yards around them, so HLS cannot pick out one crown the way the 1 meter model does. The two work together. The crown model says what each tree is, and HLS says how the whole neighborhood's greenness is holding up over time.

Four panels: NDVI greenness for Hale in 2016 and 2025, their difference, and a canopy change map. — Harmonized Landsat-Sentinel surface reflectance over Hale, 2016 and 2025: greenness each year, the change between them, and the resulting canopy classes. The neighborhood reads as broadly stable, with small areas of loss and gain that roughly balance.

Where the field is going: a foundation model

The newest approach in this field does not hand-build features at all. Geospatial foundation models are large neural networks pretrained on enormous amounts of satellite imagery, much as a language model is pretrained on text, and the same pretrained model can then be reused for many tasks. To see where my project sits next to that frontier, I ran Prithvi, a foundation model from NASA and IBM trained on Harmonized Landsat-Sentinel imagery, over central Denver. I handed it a six-band HLS image and, with no labels, let it turn the scene into its own learned features, then grouped those into land-cover types.

What comes back is honest about the promise and the limit at once. The model organizes the scene on its own, but Prithvi reads the image in roughly 480 meter patches, so it sees central Denver as a largely uniform residential canopy with only coarse distinctions. That is the same lesson from a new direction. The variation that matters for urban forestry lives at the scale of a single tree, which is exactly where my one meter crown model works, and a foundation model could only follow it there after fine-tuning on a graphics card. I include it as a marker of the road ahead, and because the project now spans the full range the field uses, from a random forest to a foundation model.

Left, an HLS true-color image of central Denver with Hale outlined. Right, Prithvi's learned features grouped into land-cover clusters without labels. — Prithvi-EO, a NASA and IBM geospatial foundation model, run over a 6.7 km HLS image of central Denver with Hale outlined. Left, the true-color image. Right, the model's own learned features grouped without any labels. At its 480 meter patch scale the neighborhood reads as a largely uniform residential canopy, a reminder that single-tree detail still needs the 1 meter crown model.
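The 480 meter and 6.7 km figures follow from patch arithmetic. This sketch assumes the 224 × 224 pixel input and 16-pixel patches of the original Prithvi release, which I treat as an assumption here. On a 30 meter grid that gives 6,720 meter scenes and 480 meter tokens. The `patchify` function is a generic ViT-style patch split written for illustration, not Prithvi's own code, which also embeds a time dimension.

```python
import numpy as np

PIXEL_M = 30    # HLS grid spacing in meters
PATCH_PX = 16   # assumed patch size in pixels
IMG_PX = 224    # assumed input size in pixels

def patchify(image, patch=PATCH_PX):
    """Split a (bands, H, W) image into non-overlapping patches, one flat vector each.

    Each row is what a ViT-style encoder embeds as a single token, so nothing
    finer than one patch is seen as a separate unit.
    """
    b, h, w = image.shape
    gh, gw = h // patch, w // patch
    x = image[:, :gh * patch, :gw * patch]
    x = x.reshape(b, gh, patch, gw, patch).transpose(1, 3, 0, 2, 4)
    return x.reshape(gh * gw, b * patch * patch)

scene = np.zeros((6, IMG_PX, IMG_PX), dtype=np.float32)   # six HLS bands
tokens = patchify(scene)
print(tokens.shape)                            # (196, 1536)
print(IMG_PX * PIXEL_M, PATCH_PX * PIXEL_M)    # 6720 480
```

A 14 × 14 grid of tokens covers the whole scene, so a single token spans hundreds of one meter crowns.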

Learned models at the crown scale

I then tried letting a model learn its own features at the scale this project works at, one tree at a time, on a graphics card here at my desk. I started with the simplest version, a general vision foundation model called DINOv2, and gave it a small aerial chip of each tree. With only the red, green and blue of a single photo to work from, it classified only about a third of the crowns correctly, far behind the engineered model.
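Cutting a chip around each tree is a small windowing step. A sketch, assuming the image is a `(bands, H, W)` array and each crown has a pixel location such as its centroid. The chip size and the zero padding at the image edge are hypothetical choices.

```python
import numpy as np

def crown_chip(image, row, col, size=32):
    """Cut a size x size chip centered on a crown's pixel location.

    image: (bands, H, W). The image is zero-padded so chips near the border
    keep their full size; the crown pixel lands at index size // 2.
    """
    half = size // 2
    padded = np.pad(image, ((0, 0), (half, half), (half, half)), mode="constant")
    # Original pixel (row, col) sits at (row + half, col + half) in the padded
    # array, so the chip starting at (row, col) is centered on it.
    return padded[:, row:row + size, col:col + size]

image = np.arange(1, 26).reshape(1, 5, 5)
chip = crown_chip(image, 0, 0, size=4)
```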

Then I built a model that sees everything the engineered features see: a small network that takes the near-infrared and the laser-measured height alongside the color image and reads the six-date Sentinel-2 series for how each tree changes through the year. That closed almost all of the gap. It matched the engineered model on balanced accuracy, the figure that weights every class equally, and it was the better of the two on the hardest classes, lifting elm recall from 0.57 to 0.82 and maple from 0.39 to 0.55. It came in lower on overall accuracy, because it gives up some of the large catch-all class to do better on the rare ones.
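A sketch of how one crown's inputs might be packaged for such a two-branch network. The image branch gets the four NAIP bands plus the height model as a fifth channel, and the sequence branch gets the per-date Sentinel-2 values. The function name, the shapes and the per-channel normalization are assumptions, not the project's code.

```python
import numpy as np

def multimodal_sample(naip_chip, chm_chip, s2_series, chan_mean, chan_std):
    """Package one crown for a two-branch network.

    naip_chip: (4, S, S) red, green, blue, near-infrared.
    chm_chip:  (S, S) LiDAR canopy height in meters.
    s2_series: (T, B) Sentinel-2 values per date for the crown.
    chan_mean, chan_std: length-5 per-channel statistics from the training set.
    """
    image = np.concatenate([naip_chip, chm_chip[None]], axis=0).astype(np.float32)
    mean = np.asarray(chan_mean, np.float32)[:, None, None]
    std = np.asarray(chan_std, np.float32)[:, None, None]
    image = (image - mean) / std          # put color, NIR and height on comparable scales
    series = np.asarray(s2_series, np.float32)
    return image, series

image, series = multimodal_sample(np.ones((4, 8, 8)), np.full((8, 8), 12.0),
                                  np.zeros((6, 10)), np.zeros(5), np.ones(5))
```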

The pattern across the three models is the honest lesson of the whole project: a model is only as good as the data it is shown. The general model with color alone trails badly. The same idea fed the full multispectral, seasonal and height data pulls level with the hand-built features on balanced accuracy. Gradient boosting still leads clearly on overall accuracy (0.690 against 0.569), because with only a few thousand labeled trees, hand-built features are hard to beat outright. That last point is what makes pretraining the real next step.

Crown-scale model	Overall accuracy	Balanced accuracy
Engineered features + gradient boosting	0.690	0.628
Multimodal deep net (full data, local GPU)	0.569	0.632
DINOv2 foundation model (RGB only)	0.329	0.360

Grouped bar chart of per-class recall for the engineered model, the multimodal deep net, and DINOv2, all run locally on an RTX 3060. — Recall by management class for the three crown-scale models, all run locally on an RTX 3060. The multimodal net (blue), which sees the full multispectral, seasonal and height data, beats the engineered model (green) on the hard elm and maple classes and ties it on balanced accuracy, while the RGB-only foundation model (gray) trails throughout.

Where this leaves it

The project now runs the full span the field uses, on a single neighborhood and on free data: an object-based crown model that maps management classes and reaches into species, an honest spatial check on what that accuracy really means, a surface-reflectance look at how the canopy changes over time, and a stack of learned models, from two foundation models to a purpose-built deep net, run over the same ground. The real frontier from here, and the step the deep net points straight at, is a foundation model pretrained on exactly the kind of multispectral and seasonal data this problem depends on, at fine resolution, which is a research undertaking rather than a missing afternoon's work.

Data and tools

Every input is free and openly licensed, and the whole pipeline runs from a single script, so the result is reproducible and the dataset is easy to share. The methodology is adapted from Cross (2019), who classified tree species in Costa Rican forest from WorldView-3 satellite imagery. The contribution here is carrying that object-based, reflectance-driven approach to free imagery and to the practical problem of managing an urban canopy.

Dataset	Source	License
NAIP 2023 aerial imagery (4 band, 30 cm)	USDA NRCS, via AWS	Public domain
Denver tree inventory (arborist survey)	Denver Open Data Portal	CC BY
Canopy height model	DRCOG 2020 3DEP airborne LiDAR	Public domain
Sentinel-2 surface reflectance, 5 dates	ESA Copernicus	Free, open
HLS surface reflectance (L30/S30, 30 m)	NASA LP DAAC, via Earthdata	Free, open

Built with Python, Rasterio, GeoPandas, scikit-image, scikit-learn and XGBoost, with QGIS for inspection. Study area: the Hale neighborhood, about 4.6 square kilometers.

Source available on request.
