🌱 SeedLearn
SeedLearn is an interdisciplinary research project developing AI approaches for plant identification using field images, botanical knowledge, and ecological data. Our current focus is tropical tree seedlings, where identification is especially difficult and where we already have a strong, expert-curated dataset.
Overview
SeedLearn is designed as a broader framework for plant identification in ecologically complex systems. We are starting with tropical tree seedlings because they are one of the most difficult life stages to identify in tropical forests, yet they are critical for understanding forest regeneration, biodiversity, and restoration outcomes.
Capparidastrum frondosum (Capparaceae)

Research Team
SeedLearn is led by Nohemi Huanca-Nunez and brings together an interdisciplinary team spanning ecology, artificial intelligence, and computer vision.
- Nohemi Huanca-Nunez — project lead; tropical forest ecology and integration of ecological knowledge into AI systems
- Liza Comita — tropical forest ecology and long-term forest datasets
- Helene Muller-Landau — forest ecology and trait data integration
- Fabian Michelangeli — tropical botany, systematics, and biodiversity research
- Arman Cohan — computer vision and AI methods
- Holly Rushmeier — Yale University; computer graphics and visual computing
- Mitch Horn — AI and data science; development of the modeling pipeline
- Kaili Liu — multimodal AI and knowledge integration
- Luke Browne — ecological data processing and data integration
Why This Matters
Tropical forests can contain over 300 species per hectare, and many seedlings look nearly identical despite belonging to different species with distinct ecological roles.
This creates a major bottleneck for:
- biodiversity monitoring
- forest restoration
- ecological research
Current AI approaches often depend on large, well-labeled image datasets, but that assumption breaks down in real ecosystems where many species are rare and visually similar. In tropical forests, image-only identification is especially difficult, while valuable botanical knowledge from field guides, species descriptions, and taxonomic expertise remains largely underused. SeedLearn aims to bridge this gap.
Current Scope and Long-Term Vision
SeedLearn begins with seedling identification because seedlings are among the hardest plant life stages to identify in the field, with many closely related species appearing remarkably similar at this stage.
Our current scope includes:
- tropical tree seedling images
- expert-validated species identifications
- initial multimodal AI model development
Our long-term vision is to expand toward more general plant identification workflows by integrating additional life stages, richer trait information, and complementary data sources such as species descriptions, ecological context, and other plant measurements.
This phased approach helps keep the project scientifically grounded while building toward a more general system over time.
The Challenge
Examples of tropical tree seedlings included in the SeedLearn image dataset (families Acanthaceae, Melastomataceae, Fabaceae, and Rubiaceae).
Dataset
The current project is built on a curated dataset of tropical seedling images collected through long-term ecological research.
- thousands of images of individual seedlings
- multiple images per individual
- broad taxonomic coverage across species, genera, and families
- expert-validated identifications
These data provide a strong foundation for developing and evaluating AI models in real-world ecological settings, and they serve as the first stage of a broader identification framework.
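Because the dataset contains multiple images per individual, one practical consideration when evaluating models on it is to split by individual rather than by image, so that photos of the same seedling never appear in both training and test sets. The sketch below illustrates this idea; the metadata fields (`image_id`, `individual_id`, `species`) and example records are hypothetical, not the actual SeedLearn schema.

```python
import random
from collections import defaultdict

# Hypothetical metadata records; the real dataset layout may differ.
records = [
    {"image_id": "img_001", "individual_id": "ind_A", "species": "Capparidastrum frondosum"},
    {"image_id": "img_002", "individual_id": "ind_A", "species": "Capparidastrum frondosum"},
    {"image_id": "img_003", "individual_id": "ind_B", "species": "Miconia sp."},
    {"image_id": "img_004", "individual_id": "ind_C", "species": "Inga sp."},
]

def split_by_individual(records, test_fraction=0.25, seed=42):
    """Assign whole individuals (not single images) to train or test,
    so repeated photos of one seedling stay on the same side of the split."""
    by_individual = defaultdict(list)
    for r in records:
        by_individual[r["individual_id"]].append(r)
    individuals = sorted(by_individual)
    rng = random.Random(seed)
    rng.shuffle(individuals)
    n_test = max(1, int(len(individuals) * test_fraction))
    test_ids = set(individuals[:n_test])
    train = [r for i in individuals if i not in test_ids for r in by_individual[i]]
    test = [r for i in test_ids for r in by_individual[i]]
    return train, test

train, test = split_by_individual(records)
```

Grouped splits like this give a more honest estimate of how a model generalizes to unseen seedlings, which matters when many species are rare and represented by only a few individuals.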
Current Progress
- curated and organized a large seedling image dataset
- developed initial AI modeling pipelines
- ongoing model development and evaluation
- defining how the framework can expand beyond seedlings as new datasets become available
Project Support
This project is supported by the 2025 Yale AI Seed Grant, which enabled the initial development of the SeedLearn pipeline, with seedlings as the first use case.
Contact
If you are interested in collaboration, datasets, or applications of this work, please feel free to reach out.
Nohemi Huanca-Nunez
nohemi.huanca@yale.edu
