In 2015, ecologists at Yale published a study in Nature that showed that there are upwards of three trillion trees in the world. That is an astounding number, especially considering that previous estimates had guessed 400 billion. However, the number is only as good as the method used to get there. In the Yale study, researchers combined satellite images with ground-based tree counts on every continent, covering some 400,000 hectares. By extrapolating the data from the known plots, they could estimate tree cover across the globe. However, with the addition of one key ingredient, I believe we can go a step further, and identify every canopy tree on earth.
Each tree carries its own unique signature, from the shape of leaves and color of bark to the form of the crown. A trained naturalist can identify a host of trees, but definitely would not have the time to attempt three trillion. Enter the neural network. It is pretty much what it sounds like, a computer program designed to learn like a brain. Neural networks learn like tabula rasa babies – by trying and failing many, many times. There is no direct teaching or programming necessary, just a huge dataset of known quantities for the computer to sample and resample. Imagine the network like a child with a huge plastic shape sorter. Given a million varied blocks, the neural network will attempt to match each block to the right hole. Each success will make a “neural” connection, each failure will erase one. Over enough time, it will be able to correctly sort blocks for the holes.
To extend the analogy, imagine images of trees are the blocks, and labels the holes. In the first stage, training, many images are presented to the neural network, each with a different correct species label. The computer receives the inputs, and makes connections to the labels, constantly checking and revising connections. If the inputs are plentiful enough to sum up the variation in the subject, the network will associate each different image with its correct label. In the second stage, inference, the network is given novel images and outputs their labels. Programmers have trained neural networks to do an outstanding variety of tasks, including recognizing images, translating speech, and predicting the next frames of videos.
To train our neural network, what we need is a huge training dataset. Fortunately, biologists have been doing just this for years. In many experimental forests and even cities like New York and LA, hundreds of thousands of trees have been identified and geotagged. These databases allowed the Yale team to count trees in the study I mentioned earlier. Using the geotags, aerial images can be created for each location that are ideal for training the network.
As good as neural networks are, they have rarely done better than humans at identifying images, and even the best biologists cannot identify most trees from aerial photos. However, networks do have the advantage that they can simultaneously incorporate several types of inputs. In addition to color satellite photos, NASA captures other ranges of wavelengths, including infrared. This would allow more information to discriminate tree species, since differences between species may only be visible in infrared.
Researchers have begun using neural networks to figure out tree species. A team at Caltech was able to teach a network to identify 18 tree species out of the 200 in Pasadena, CA. Eighteen may seem a meager number compared to all the species in the world, but keep in mind that the researchers in this study used a limited training dataset and relied on crowdsourcing to identify the trees in the first place. The results of this study form an incredible resource for tree care in Los Angeles. Working at a different scale, researchers in China used neural networks to identify species based on leaf images.
I am very far from a computer programmer with the know-how to carry out this project. And I recognize that our known datasets are biased toward temperate forests and would therefore limit tree identification in the tropics. Still, I think that the combination of neural networks with tree ID databases holds immense promise for understanding the diversity of trees on the planet. The oceans and the forests of earth store immense amounts of carbon, and knowing the state of both will become increasingly important as climate change wreaks havoc and forests continue to be lost. As flawed as this database might prove, I expect it would form a crucial starting point in having a terrific understanding of the world’s forests. I would view it as a draft human genome, not the final report, but a starting point for a whole new field of ecological research – is metaforestry taken?