For five days last July, fourteen high school students attended a virtual coding camp sponsored by Argonne National Laboratory’s Educational Programs and Outreach and the Argonne Leadership Computing Facility (ALCF), co-taught by laboratory scientists and informal learning educators, and focused on learning techniques for probing and analyzing massive scientific datasets. And while the COVID-19 pandemic fundamentally changed how these students met, worked, and collaborated over the course of the week, they, and their instructors, found new and creative methods to explore the fascinating world of data science.
Argonne’s Big Data Camp is the most recent addition to a growing number of STEM camps for middle school and high school students, offered by the laboratory’s education and outreach department, and aimed at teaching computer science skills and computational thinking. The curriculums are developed and taught by Argonne Educational Programs staff and Argonne scientists. The Big Data Camp curriculum targets juniors and seniors who have programming experience.
“We teach students how researchers produce and process data to better understand a whole range of complex problems from the urgent, like those posed by the current pandemic, to the theoretical, such as the nature of dark matter,” said Michael E. Papka, Argonne senior scientist and camp instructor, who also helped establish the Big Data Camp three years ago. “We build on their existing programming knowledge and introduce them to some of the same techniques researchers use at the lab to interrogate data and gain new insights.”
Campers learned some of the history of data science, such as how physician John Snow first studied patterns in data to map the spread of cholera in mid-nineteenth century Britain, and identified the source of the outbreak. Today’s scientific experimental facilities generate datasets that are vastly larger, and the camp covered contemporary search and analysis techniques such as data visualization methods.
A toolkit for accessing, exploring, and sharing data
For one task, students accessed and explored massive datasets generated by the University of Chicago’s Array of Things (AoT) project, which employs a vast network of computer-embedded ‘intelligent’ sensors located throughout Chicago to monitor various activities, such as traffic hotspots, and environmental conditions, such as air quality. The AoT project, and its follow-on Software-Defined Sensor Network (SAGE) project, also funded by the National Science Foundation (NSF), will deploy sensor nodes that support machine learning frameworks in three environmental testbeds and one additional urban testbed in the U.S., and in four other countries, to provide even more data—all of which will be open and hosted in the cloud.
To bring order to the massive, amorphous datasets generated by the AoT project, the campers used the open-source web application Jupyter to create documents that contain live code, equations, narrative text, and visualizations. These documents gave the campers the essential tools not only to analyze the data, but also to share and communicate what they learned.
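As a rough illustration of the kind of analysis a single notebook cell might hold, the following Python sketch averages sensor readings per node. It is not the camp's actual material: the column names (timestamp, node_id, parameter, value), the node names, and the sample values are all made up for illustration and are not real Array of Things data.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up sample rows for illustration only; NOT real Array of Things data.
# The column layout is an assumption, not the project's published schema.
SAMPLE_CSV = """timestamp,node_id,parameter,value
2020-07-01T00:00:00,node_a,temperature,21.0
2020-07-01T00:00:00,node_b,temperature,23.0
2020-07-01T01:00:00,node_a,temperature,19.0
2020-07-01T01:00:00,node_b,temperature,25.0
2020-07-01T01:00:00,node_b,humidity,40.0
"""


def mean_by_node(rows, parameter):
    """Return {node_id: mean value} for one parameter.

    Rows are consumed one at a time, so the same function also works on a
    csv.DictReader wrapped around a large file opened from disk.
    """
    values = defaultdict(list)
    for row in rows:
        if row["parameter"] == parameter:
            values[row["node_id"]].append(float(row["value"]))
    return {node: mean(vals) for node, vals in values.items()}


# Parse the in-memory sample and summarize the temperature readings.
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
averages = mean_by_node(reader, "temperature")

for node, avg in sorted(averages.items()):
    print(f"{node}: {avg:.1f}")
```

In a notebook, a cell like this would typically sit beside narrative text explaining the question being asked and a chart of the result, which is what makes the document shareable as well as executable.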
Later in the week, the campers worked together in small groups to first design a problem and then apply their newly learned programming knowledge and analysis techniques.
“Creating their own project allowed them to see the potential challenges in solving big data problems,” said Janet Knowles, a member of ALCF’s visualization team who mentored one of the groups. “I was there to provide guidance, but the kids had to come up with an interesting project with a useful solution.”
Pivoting for the pandemic
By late March, shortly after schools were required to move instruction online because of the COVID-19 pandemic, the organizing team, which included Papka, visualization specialists Joe Insley and Silvio Rizzi, operations specialist Ti Leggett, and Argonne Learning Center Lead John Domyancich, needed a strategy to adapt the camp’s hands-on, largely group-oriented lesson plan to a fully remote experience. They saw several challenges, starting with how to provide students with resources capable of supporting the large data processing needs of the camp, and considered possible workarounds, but never once did they consider cancelling the course.
The team assembled a toolkit that everyone could use to access datasets and to collaborate in live, active sessions. First, the campers needed significant data storage and processing power: nearly all of the datasets involved were too large to e-mail, and certainly too large for campers to store or process locally. Second, the campers would be working on a variety of devices, from tablets to desktop computers. Third, the campers would need a persistent resource for the entire week. The team worked with the developers of Chameleon, a cloud infrastructure service that typically supports the NSF research community, to stand up a virtual server that could be configured to meet all of the needs of the camp. “Chameleon gave us a powerful computing backend, which the students could access and fully utilize from any device,” said Papka.
Participants of the 2020 Big Data Camp seemed to take all the changes in stride. “I learned a lot about data visualization that I hope to apply to projects in the future,” said one camper at the conclusion of the course. “Thanks for making Big Data Camp possible and for turning it into such a wonderful online learning experience,” said another.
“The feedback we received from the campers is part of a conversation we’re having now about ways to adapt the camp’s curriculum and develop learning environments to make it accessible to a larger group of students,” said Domyancich.
Big Data Camp volunteer instructors and mentors for 2020 included ALCF staff members Michael E. Papka, Joe Insley, Silvio Rizzi, Janet Knowles, Ti Leggett, and Katherine Riley; Argonne Mathematics and Computer Science Director Valerie Taylor; Argonne nuclear engineering postdoctoral student Aaron Oaks; Northern Illinois University Assistant Professor David Koop; former Deputy Associate Laboratory Director for Computing, Environment and Life Sciences Robin Graham; and Argonne Educational Programs and Outreach staff members Kelly Sturner and John Domyancich.
The ALCF is a U.S. Department of Energy (DOE) Office of Science User Facility.
# # #
Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America’s scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science.
The U.S. Department of Energy’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.