Collecting, analyzing and processing 'big data'
The world of environmental modeling and monitoring can sometimes sound like a sci-fi movie, from laser beams shooting down from the sky to a space station fitted with a device called “GEDI” (pronounced “Jedi”). There are also complex tools and systems built to handle staggering amounts of digitized information, and technologies that empower people to make sense of even the most complex systems.
Welcome to the world of forest management and modeling expert Andrew Finley, an associate professor in the Michigan State University (MSU) Department of Forestry.
Finley builds open-source tools and theoretical frameworks that help interpret “big data” — sets of information so massive that they are nearly impossible to manage, process and store. His work gives researchers and practitioners the tools they need to answer big-data questions such as how much biomass a forest holds, what pollutants are in the air we breathe and how climate change affects our lives.
Big data is collected in many ways. Data from social media and tracking information from GPS-enabled devices, for example, are gathered to analyze human-computer interaction.
Complex data collection systems include LiDAR (Light Detection and Ranging), which works like radar but uses pulses of laser light instead of radio waves. NASA’s GEDI (Global Ecosystem Dynamics Investigation) and G-LiHT (Goddard’s LiDAR, Hyperspectral and Thermal Imager) are examples of systems that use LiDAR. These laser systems are mounted on airplanes, satellites and space stations to capture location-specific data, and once collection starts, the data piles up fast.
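The ranging principle itself is simple: time how long a laser pulse takes to bounce off a target and return, then halve the round trip. A minimal sketch in R (the language Finley teaches), with a made-up return time for illustration:

    # Time-of-flight ranging, the principle behind LiDAR:
    # a pulse travels to the target and back, so range = c * t / 2.
    c_mps      <- 299792458   # speed of light in m/s
    round_trip <- 2.7e-6      # hypothetical return time: 2.7 microseconds
    range_m    <- c_mps * round_trip / 2
    range_m                   # about 405 m from sensor to target

Fire enough of these pulses each second and the ranges trace out a detailed 3-D picture of whatever the laser sweeps across.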
“We’re awash in data,” Finley said. “The challenge is how do we combine these data sources in a statistically valid way to identify true patterns and not just spurious ones. The challenge is increasing because the data volumes are increasing at such a rapid rate.”
His work advances both theory and applied science to make big data usable. For example, Finley was part of an international team that won the 2017 Outstanding Statistical Application Award from the American Statistical Association. In the award-winning article, the team developed and applied a model for processing a massive space-time dataset from air pollution monitoring stations. Their work helped show the extent and movement of airborne pollutants, information that can be used to assess impacts on human health. Beyond that application, the model itself is a new tool usable across a wide range of big data sciences.
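To make the general idea concrete, here is a minimal base-R sketch of space-time interpolation with a separable exponential covariance. All the numbers and the fake “pollutant” readings are invented for illustration; the team’s published model is far more sophisticated.

    # Toy space-time interpolation with a separable exponential covariance,
    # a simplified stand-in for the kind of model described above.
    set.seed(1)
    n  <- 60
    xy <- matrix(runif(2 * n), ncol = 2)                     # station coordinates
    tt <- sample(1:10, n, replace = TRUE)                    # observation times
    y  <- sin(3 * xy[, 1]) + 0.1 * tt + rnorm(n, sd = 0.2)   # fake readings

    # Covariance decays with both spatial and temporal distance
    K <- exp(-as.matrix(dist(xy)) / 0.5) * exp(-as.matrix(dist(tt)) / 3)

    # Kriging-style prediction at a new site and time
    xy0 <- c(0.5, 0.5); t0 <- 5
    k0  <- exp(-sqrt(colSums((t(xy) - xy0)^2)) / 0.5) * exp(-abs(tt - t0) / 3)
    w   <- solve(K + diag(0.2^2, n), k0)   # weights from the noisy system
    sum(w * y)                             # predicted reading at (xy0, t0)

The catch Finley alludes to: the solve() step scales with the cube of the number of observations, which is exactly what breaks down when datasets become massive.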
Finley’s research is about making data-processing tools accessible. He creates statistical models and applications that run quickly and efficiently and that convert datasets into maps, graphs and other visual and interactive media. Practitioners working with spatial and temporal data have used his tools on problems as varied as assessing air quality, gauging forest health and explaining fluctuations in housing prices. As these examples suggest, the tools apply broadly.
“It’s about how we bring information together to make a final, mapped product that is much more than a pretty picture,” he said. “The focus is on making a statistically valid product that articulates uncertainty, which can be fed into other decision support systems.”
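Finley’s own open-source packages (spBayes on CRAN, for example) implement full Bayesian versions of such models; the fragment below is only a base-R sketch, on invented data, of the endpoint he describes: a predicted surface paired with a map of its own uncertainty.

    # Sketch: turning scattered observations into a mapped surface that
    # also reports its uncertainty (simple kriging on invented data).
    set.seed(2)
    n  <- 40
    xy <- matrix(runif(2 * n), ncol = 2)
    y  <- cos(4 * xy[, 1]) + sin(4 * xy[, 2]) + rnorm(n, sd = 0.1)

    K_inv <- solve(exp(-as.matrix(dist(xy)) / 0.3) + diag(0.1^2, n))

    # Predict on a 50 x 50 grid, keeping the predictive variance
    g  <- as.matrix(expand.grid(seq(0, 1, length = 50), seq(0, 1, length = 50)))
    d  <- sqrt(outer(g[, 1], xy[, 1], "-")^2 + outer(g[, 2], xy[, 2], "-")^2)
    Kc <- exp(-d / 0.3)
    pred <- Kc %*% (K_inv %*% y)               # the map itself
    pvar <- 1 - rowSums((Kc %*% K_inv) * Kc)   # its uncertainty

    par(mfrow = c(1, 2))
    image(matrix(pred, 50, 50), main = "Predicted surface")
    image(matrix(sqrt(pvar), 50, 50), main = "Prediction std. error")

The second panel is what “articulates uncertainty” means in practice: the map openly shows where it is least trustworthy, typically far from any observation.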
Another example of his applied and theoretical work is an ongoing collaboration to map forests in interior Alaska. Finley is working with NASA and the U.S. Forest Service to determine forest characteristics, soil conditions, biomass and carbon density. Using data from NASA G-LiHT and an on-the-ground tree inventory from the U.S. Forest Service, Finley and collaborators are documenting forests that have never been systematically inventoried.
They are capturing data from interior Alaskan forests to determine how much carbon is stored within them. The health and breadth of Alaskan forests have a direct impact on atmospheric carbon and, in turn, on climate change. When complete, this project will provide the first inventory of 110 million acres of forestland in interior Alaska — an area a bit smaller than twice the size of Michigan.
The resulting products are seamless, interactive 3-D maps of Alaskan forests. They’re created from LiDAR (3-D mapping of foliage distribution and canopy structure), spectroscopy (species composition, age and health) and thermal data (surface temperatures and heat/moisture stress, which help estimate the future health of forests). Each second, G-LiHT fires 150,000 laser pulses and captures 75 frames from each camera. The amount of data collected is stunning.
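Some quick arithmetic shows why. The pulse and frame rates below come from the article; the per-record sizes are rough assumptions made only to put the volume in familiar units:

    # Back-of-envelope data volumes for one hour of G-LiHT collection.
    pulses_per_sec   <- 150000
    frames_per_sec   <- 75      # per camera
    bytes_per_return <- 28      # assumed: coordinates, intensity, metadata
    bytes_per_frame  <- 2e6     # assumed: roughly 2 MB per image frame

    lidar_gb <- pulses_per_sec * 3600 * bytes_per_return / 1e9
    image_gb <- frames_per_sec * 3600 * bytes_per_frame  / 1e9
    c(lidar = lidar_gb, imagery = image_gb)
    # roughly 15 GB of LiDAR returns and 540 GB of imagery per camera, hourly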
“More data doesn’t always mean more information. The other part of my job is to take this massive amount of data and extract what can be useful,” Finley said.
He is the conductor who turns thousands of terabytes of discordant data into usable, reliable products that scientists, researchers, forest managers and legislators can draw on to make policy. Most importantly, Finley’s work embraces MSU’s land-grant roots, enabling anyone to access his tools and models to improve worldwide knowledge. He builds on open-source programming languages (C, R and Fortran) and teaches an online course, FOR/STT 875 R Programming for Data Sciences, training the next generation of programmers to process, manage and utilize big data.
“Very little is new; we’re almost always building on someone else’s hard work — truly standing on the shoulders of giants,” he said. “I see my work and that of my students as following our land-grant charge. We’re developing new tools, disseminating them through free software and training people to use both. Then the tools are applied to tackle pressing environmental challenges.”