Spotlight Series: Albert Y. Kim on the Importance of Knowing Your Data’s Context
Albert Y. Kim (he/him) is an Assistant Professor of Statistical & Data Sciences at Smith College in Northampton, Massachusetts (USA). He will spend his upcoming sabbatical working virtually at the ForestGEO SCBI site in Front Royal, Virginia (USA) as a visiting scholar. Albert greatly enjoys hockey, Nintendo games, and any outdoor activity, and he’s an ardent champion of knowing your data’s context.
When did you realize you wanted to be a scientist/work in forest ecology? How did you decide to go down this career path?
I realized that I wanted to work in forest ecology while I was on the faculty at Middlebury College in Middlebury, Vermont (USA). Trained as a statistician who loves answering questions with data, I was eagerly looking for people to collaborate with. Professor David Allen from the Biology Department approached me with data from the Michigan Big Woods ForestGEO site, where he is a Principal Investigator. One thing led to another, and we recently published a paper on modeling interspecies relationships of trees.
Even though I had no formal work experience in ecology or the environment prior to working with Prof. Allen, I always had a personal interest in the topic. One of the books that had the most profound impact on my childhood was 50 Simple Things Kids Can Do to Save the Earth. Furthermore, I have always enjoyed pursuing outdoor activities in all four seasons.
What led you down the path to your current job?
After my PhD in statistics, I worked as a “Decision Support Engineering Analyst” at Google (the term “data scientist” wasn’t as prevalent then). While I hold fond memories of my team and my colleagues, I realized a few years in that the job wasn’t the right fit for me at the time. It was during this time that I saw an ad for a one-year visiting professor position at Reed College, and the rest is history, so to speak. After stints at Reed, Middlebury, and Amherst Colleges, I am now at Smith College, a women’s college in western Massachusetts (USA).
When did you first get involved in the ForestGEO network?
After my work with the Michigan Big Woods data, I started investigating potential locations to spend my junior research sabbatical. I stumbled upon the Smithsonian Conservation Biology Institute’s (SCBI) ForestGEO GitHub data repository. GitHub is a website used by computer programmers and data scientists to facilitate collaboration on computer code and data (think Google Docs/Dropbox on steroids). I was absolutely blown away by the scale of the data collected, all of it gathered under strict data collection protocols. I remarked to myself: “Wow! It seems you can’t sneeze in this forest without the data being recorded somewhere!” I reached out to one of SCBI’s Principal Investigators, Dr. Kristina Anderson-Teixeira, about a potential collaboration. My timing was just right: I’ll be spending my Spring 2021 sabbatical working with SCBI.
What is the most interesting or unique aspect of your site?
Despite the COVID-19 lockdown, I obtained permission in July to safely accompany Cameron Dow as he collected field data outdoors for the annual 2020 mortality census at the SCBI site. Entering the site felt like entering Jurassic Park! From all the monitoring equipment, to the different types of scientific research being conducted, right down to various fencing to protect the integrity of the ecosystems, it was an eye-opening experience to see where all the data saved in the spreadsheets on GitHub came from.
[Media caption: Describing the 2020 annual SCBI mortality census.]
What questions are you currently addressing in your research/site?
Given that my background is in statistics and data science, I provide the SCBI team of forest ecologists with expertise on statistical/mathematical modeling. For example, one project I’m currently working on relates to the growth phenology of trees, or, in other words, the effect that climate has on the timing of growth within the year. As the earth’s climate warms, we are observing Northern Hemisphere growing seasons that start earlier and earlier in the calendar year. We’re interested in pinpointing the relationship between within-year variations in climate and different growth rates. That being said, we have to keep in mind that species differ in this relationship. For instance, oaks and tulip poplars respond differently over the course of the year to a warming climate.
Another interesting project I plan on undertaking during my sabbatical relates to ecological forecasting. The question is: can we use various sources of data to create models that forecast how much trees will grow in the future? Such models must be able to produce results quickly and make frequent near-term predictions. With such a rapid update schedule, we can hopefully build models that self-correct quickly and, in turn, produce accurate long-term forecasts. In the end, we hope to combine different sources of data to calibrate our forecasts of growth, including data from tree core-based ring increments, data collected using measuring tape, and data from devices called dendrometer bands.
What kind of capacity building opportunities does your site provide for students, early-career researchers, and the local community?
I’m very impressed by the diverse research team and network of collaborators that Dr. Anderson-Teixeira has assembled. The team represents a broad cross-section of society and its members are at various stages of their careers, which I think is a real strength of the team. After all, a “diverse science is a rich science.” I’m particularly impressed with the interns I’ve interacted with this year: Nidhi Vinod, Cameron Dow, and Bianca Gonzalez. They all do great work and are positioned to do great things in the future. Furthermore, I’m impressed by how much community outreach SCBI does, for example, hosting Conservation Discovery Day for local students.
What is your favorite part about your work?
I love being able to view forest dynamics from both top-down and bottom-up perspectives. By top-down I mean the large-scale data collection that occurs across ForestGEO sites and the fitting of models that attempt to explain and predict forest dynamics. Conversely, by bottom-up I mean actually being in the woods, whether doing field work at research plots or going for a walk in the wooded areas near my home. Being able to use all five senses and appreciate first-hand where the data comes from prevents me from getting lost in too many layers of abstraction and keeps the data’s context front and center. As I often like to say to my students: “Numbers are numbers, but data has context. If you want to truly know your data, you need to know its context.”
What do you like to do when you’re not studying forest dynamics?
More than anything else, I enjoy pursuing outdoor activities, especially with my spouse: walking, hiking, backcountry camping, downhill & cross-country skiing, stand-up paddle boarding, etc. In this day of near constant screen time and connectivity, being able to disconnect, detach, and commune with nature helps me stay sane.
Paradoxically, I’ve also recently renewed my love for Nintendo video games, having purchased a Nintendo Switch. Current games I’m into include Animal Crossing, Mario Kart 8, and The Legend of Zelda: Breath of the Wild.
Lastly, being from Montreal, Quebec (Canada), I’m a big hockey fan. Nothing brings me joy like skating outside, whether on a flooded municipal tennis court or on a pond. As for who I follow, I’m a big fan of the Montreal Canadiens. My favorite all-time players include Kirk Muller, Andrei Markov, P.K. Subban, and Nick Suzuki.
Allen, David & Kim, Albert Y. (2020). A permutation test and spatial cross-validation approach to assess models of interspecific competition between trees. PLOS ONE. https://doi.org/10.1371/journal.pone.0229930
Ismay, Chester & Kim, Albert Y. (2020). Statistical Inference via Data Science: A ModernDive into R and the Tidyverse. Chapman and Hall/CRC.