🌱 SeedLearn
SeedLearn is an interdisciplinary research project developing AI approaches for plant identification using field images, botanical knowledge, and ecological data. Our current focus is tropical tree seedlings, where identification is especially difficult and where we already have a strong, expert-curated dataset.
Overview
SeedLearn is designed as a broader framework for plant identification in ecologically complex systems. We are starting with tropical tree seedlings because they are one of the most difficult life stages to identify in tropical forests, yet they are critical for understanding forest regeneration, biodiversity, and restoration outcomes.
Capparidastrum frondosum (Capparaceae)

Research Team
SeedLearn is led by Nohemi Huanca-Nunez and brings together an interdisciplinary team spanning ecology, artificial intelligence, and computer vision.
- Nohemi Huanca-Nunez — project lead; tropical forest ecology and integration of ecological knowledge into AI systems
- Liza Comita — tropical forest ecology and long-term forest datasets
- Helene Muller-Landau — forest ecology and trait data integration
- Fabian Michelangeli — tropical botany, systematics, and biodiversity research
- Arman Cohan — computer vision and AI methods
- Holly Rushmeier — Yale University; computer graphics and visual computing
- Mitch Horn — AI and data science; development of the modeling pipeline
- Kaili Liu — multimodal AI and knowledge integration
- Luke Browne — ecological data processing and data integration
Why This Matters
Tropical forests can contain over 300 species per hectare, and many seedlings look nearly identical despite belonging to different species with distinct ecological roles.
This creates a major bottleneck for:
- biodiversity monitoring
- forest restoration
- ecological research
Current AI approaches often depend on large, well-labeled image datasets, but that assumption breaks down in real ecosystems where many species are rare and visually similar. In tropical forests, image-only identification is especially difficult, while valuable botanical knowledge from field guides, species descriptions, and taxonomic expertise remains largely underused. SeedLearn aims to bridge this gap.
Current Scope and Long-Term Vision
SeedLearn begins with seedling identification because seedlings are among the hardest plant life stages to identify in the field, with many closely related species appearing remarkably similar at this stage.
Our current scope includes:
- tropical tree seedling images
- expert-validated species identifications
- initial multimodal AI model development
Our long-term vision is to expand toward more general plant identification workflows by integrating additional life stages, richer trait information, and complementary data sources such as species descriptions, ecological context, and other plant measurements.
This phased approach helps keep the project scientifically grounded while building toward a more general system over time.
The Challenge
Examples of tropical tree seedlings included in the SeedLearn image dataset (families Acanthaceae, Melastomataceae, Fabaceae, and Rubiaceae).
Dataset
The current project is built on a curated dataset of tropical seedling images collected through long-term ecological research.
- thousands of images of individual seedlings
- multiple images per individual
- broad taxonomic coverage across species, genera, and families
- expert-validated identifications
These data provide a strong foundation for developing and evaluating AI models in real-world ecological settings, and they serve as the first stage of a broader identification framework.
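Because the dataset contains multiple images per individual, one practical consideration when evaluating models on it is to split by individual rather than by image, so that photos of the same seedling never appear in both training and test sets. The sketch below illustrates this idea; the metadata fields (`image_id`, `individual_id`, `species`) and example records are hypothetical, not the actual SeedLearn schema.

```python
import random
from collections import defaultdict

# Hypothetical metadata records; the real dataset layout may differ.
records = [
    {"image_id": "img_001", "individual_id": "ind_A", "species": "Capparidastrum frondosum"},
    {"image_id": "img_002", "individual_id": "ind_A", "species": "Capparidastrum frondosum"},
    {"image_id": "img_003", "individual_id": "ind_B", "species": "Miconia sp."},
    {"image_id": "img_004", "individual_id": "ind_C", "species": "Inga sp."},
]

def split_by_individual(records, test_fraction=0.25, seed=42):
    """Assign whole individuals (not single images) to train or test,
    so repeated photos of one seedling stay on the same side of the split."""
    by_individual = defaultdict(list)
    for r in records:
        by_individual[r["individual_id"]].append(r)
    individuals = sorted(by_individual)
    rng = random.Random(seed)
    rng.shuffle(individuals)
    n_test = max(1, int(len(individuals) * test_fraction))
    test_ids = set(individuals[:n_test])
    train = [r for i in individuals if i not in test_ids for r in by_individual[i]]
    test = [r for i in test_ids for r in by_individual[i]]
    return train, test

train, test = split_by_individual(records)
```

Grouped splits like this give a more honest estimate of how a model generalizes to unseen seedlings, which matters when many species are rare and represented by only a few individuals.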
Current Progress
- curated and organized a large seedling image dataset
- developed initial AI modeling pipelines
- ongoing model development and evaluation
- defining how the framework can expand beyond seedlings as new datasets become available
Project Support
This project is supported by the 2025 Yale AI Seed Grant, which enabled the initial development of the SeedLearn pipeline, with seedlings as the first use case.
Contact
If you are interested in collaboration, datasets, or applications of this work, please feel free to reach out.
Nohemi Huanca-Nunez
nohemi.huanca@yale.edu
