ExaFEL addresses the need for an exascale data analysis workflow for LCLS at SLAC

June 10, 2022 – The Exascale Computing Project (ECP) ExaFEL effort aims to help researchers make molecular movies using the Linac Coherent Light Source (LCLS). Its exascale data analysis workflow for serial femtosecond crystallography will aid in observing the dynamic motion of atoms.[1] The LCLS is located at the SLAC National Accelerator Laboratory and is operated by Stanford University for the US Department of Energy. The project builds on an earlier demonstration project and a collaboration between NERSC, ESnet and SLAC.[2]

The LCLS is the world’s first hard X-ray free-electron laser, making it a superb instrument for observing the dynamics of atomic interactions in a molecular system. This is due in part to the instrument’s resolving power (i.e., its ability to resolve detail at the atomic level, since X-rays have a much shorter wavelength than visible light), combined with the ultrafast pulses and the brightness (also called power) of the laser.[3]

Scientists use ultrafast pulses of powerful LCLS laser energy to illuminate a carefully prepared sample of a system of interest. The sample can be chosen to elucidate a chemical reaction, the functioning of photosynthesis, the formation of chemical bonds, the acceleration of reactions by catalysis, etc.[4] Data is captured by sensors during each laser pulse and processed by the LCLS workflow to efficiently create a stop-motion snapshot of the atoms and molecules in the system.[5] The concept is similar to that of a strobe light, which can be used to illuminate moving objects and create the visual appearance of a stop-motion image. SLAC provides a short video explaining the concept. Unlike capturing an image with a camera, the LCLS workflow must use computationally expensive X-ray diffraction algorithms to process each X-ray snapshot.

Creating a movie from these X-ray snapshots is a computational challenge because each X-ray pulse destroys the sample. This means that X-ray images cannot simply be viewed one after another, the way a strobe light reveals dancers moving on a dance floor.[6] Instead, scientists use sophisticated algorithms that examine large aggregates of X-ray snapshots, in which each snapshot presents a randomly oriented view of the sample, to organize and piece together a molecular movie that captures the dynamics of how atoms move over time.

The complexity of the algorithms, coupled with the large number of snapshots that need to be processed, makes the generation of molecular movies a very data-intensive and computationally expensive task. The scientific benefits are undeniable, as the resulting movies provide an invaluable and unique source of experimental observation (some transformative examples are shown here). Scientists study these movies to create and then test or disprove hypotheses about the dynamics of atomic behavior in their system of interest. The ability to observe and form hypotheses that are verified or refuted by data is a foundation of the scientific method.

The need for exascale computing

Accelerating the LCLS workflow is essential to help scientists by providing results while their experiment is running so that they collect the best data when using LCLS. Real-time results give experimenters the ability to make adjustments and gather better, more informative data. The result is better science and better use of the instrument.

Performance is vital for processing data from the LCLS-II upgrade, as the laser can be programmed to operate at 1 million pulses per second, compared to the 120 pulses per second of the current LCLS laser.[7][8] The faster pulse rate will generate orders of magnitude more data that must be processed quickly. Exascale supercomputing hardware provides the network and computing capacity needed to handle the massive increase in data produced by LCLS-II sensors. Amedeo Perazzo, the ExaFEL PI and director of the Control and Data Systems Division at SLAC National Accelerator Laboratory, notes, “Today and in the future, rapid turnaround time is needed for scientists to get the most out of their time at LCLS and not fly blind.”
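The pulse rates quoted above can be turned into a quick back-of-the-envelope comparison. The sketch below uses only the two figures from the article; the function names are illustrative, and it assumes one snapshot per pulse with a similar amount of data per snapshot, which the article does not state.

```python
import math

# Pulse rates quoted in the article (pulses per second).
LCLS_PULSES_PER_SECOND = 120
LCLS_II_PULSES_PER_SECOND = 1_000_000


def rate_increase(old_rate, new_rate):
    """Return (ratio, orders_of_magnitude) between two pulse rates."""
    ratio = new_rate / old_rate
    return ratio, math.log10(ratio)


def microseconds_per_pulse(rate):
    """Average time between pulses, in microseconds, at a given rate."""
    return 1_000_000 / rate


ratio, orders = rate_increase(LCLS_PULSES_PER_SECOND, LCLS_II_PULSES_PER_SECOND)
print(f"LCLS-II pulses about {ratio:,.0f}x faster (~{orders:.1f} orders of magnitude)")
print(f"Time between pulses: {microseconds_per_pulse(LCLS_PULSES_PER_SECOND):,.0f} us "
      f"(LCLS) vs {microseconds_per_pulse(LCLS_II_PULSES_PER_SECOND):,.0f} us (LCLS-II)")
```

Under those assumptions, the pulse rate rises by a factor of roughly 8,333, just under four orders of magnitude, and the average time between pulses drops from about 8.3 milliseconds to 1 microsecond.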

Rethinking the current workflow

Adapting current tools so that they can work on future exascale hardware requires innovative thinking and new approaches.

Perazzo notes that the ExaFEL team needs to consider new algorithms and computing frameworks to take advantage of GPUs and other high-performance capabilities in future US exascale supercomputers. These new approaches mean that the team must replace and/or augment existing CPU-only algorithms and computing frameworks. The expanded capability offered by GPU-accelerated machines, along with new AI technology, allows the team to explore new approaches that can increase the resolution of computed results and ultimately improve the quality of the movies viewed by scientists.

Snapshot creation

GPUs are instrumental in generating simulated diffraction patterns of multiple conformations of a protein sample that account for beam fluctuations, beamline stray scattering, and detector noise. These simulated images will be used to characterize the performance of the new algorithms under realistic conditions while the team waits for large datasets to be produced by future LCLS-II experiments.
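To make the idea of a simulated snapshot concrete, here is a deliberately tiny CPU-only toy model. It is not the ExaFEL simulation code: it assumes a 2D grid of point scatterers, takes the ideal pattern to be the squared magnitude of a discrete Fourier transform, and adds three hypothetical effects (a per-pulse beam scale factor, a flat stray-scattering background, and Gaussian detector noise). All function names and parameter values are invented for illustration.

```python
import cmath
import random


def diffraction_intensity(density):
    """Squared magnitude of the 2D discrete Fourier transform of a square grid."""
    n = len(density)
    pattern = []
    for u in range(n):
        row = []
        for v in range(n):
            amplitude = sum(
                density[x][y] * cmath.exp(-2j * cmath.pi * (u * x + v * y) / n)
                for x in range(n)
                for y in range(n)
            )
            row.append(abs(amplitude) ** 2)
        pattern.append(row)
    return pattern


def simulate_snapshot(density, rng, beam_jitter=0.2, background=1.0, noise_sigma=0.5):
    """Return one noisy snapshot; every noise parameter here is a made-up toy value."""
    beam = max(0.0, rng.gauss(1.0, beam_jitter))  # per-pulse beam intensity fluctuation
    ideal = diffraction_intensity(density)
    return [
        # scaled signal + flat stray-scattering background + detector noise, clipped at 0
        [max(0.0, beam * value + background + rng.gauss(0.0, noise_sigma)) for value in row]
        for row in ideal
    ]


def toy_conformation(n, scatterers):
    """Build an n x n density grid with unit point scatterers at the given (x, y) sites."""
    grid = [[0.0] * n for _ in range(n)]
    for x, y in scatterers:
        grid[x][y] = 1.0
    return grid


rng = random.Random(0)  # fixed seed so the toy run is repeatable
conformations = [
    toy_conformation(8, [(1, 1), (2, 5), (6, 3)]),
    toy_conformation(8, [(1, 1), (3, 5), (6, 3)]),  # one scatterer shifted
]
snapshots = [simulate_snapshot(c, rng) for c in conformations]
print(f"Simulated {len(snapshots)} snapshots of {len(snapshots[0])}x{len(snapshots[0][0])} pixels")
```

A production simulation would model the real scattering physics and run on GPUs; the only point of this sketch is that each synthetic image is an ideal pattern degraded by the kinds of effects listed above.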

Making molecular movies

Chuck Yoon, leader of the advanced analysis methods group at SLAC National Accelerator Laboratory, observes: “We want to sample a set of experiments from the initial state to their final system state. This requires sophisticated and established algorithms to reconstruct the path.” He notes that producing movies of molecular systems may require processing data collected over periods ranging from the order of a femtosecond (10⁻¹⁵ second, or 1 quadrillionth of a second) to minutes, because reaction time scales vary by orders of magnitude. Many snapshots must be taken to capture a few fleeting moments when some of the most interesting conformational changes occur. Figure 1 illustrates the order-of-magnitude variation in time scale for a spectrum of important reactions studied with LCLS. In addition to improving performance, Yoon notes, “the team is looking to use AI and GPU technology to create and establish new, higher-resolution algorithms that can run in desired timeframes.”
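The range of time scales mentioned above is easier to appreciate as a number. The sketch below is a simple illustration, not a description of how LCLS experiments choose their sampling: it assumes "minutes" means one minute and lists one time delay per decade on a logarithmic axis. The helper name `log_spaced_delays` is hypothetical.

```python
import math

FEMTOSECOND = 1e-15  # seconds; shortest time scale quoted in the article
ONE_MINUTE = 60.0    # seconds; the article says "minutes", one minute is assumed here


def log_spaced_delays(t_min, t_max, points_per_decade=1):
    """Return delays in seconds, evenly spaced on a log10 axis, from t_min up to t_max."""
    decades = math.log10(t_max / t_min)
    count = int(math.floor(decades * points_per_decade)) + 1
    return [t_min * 10 ** (i / points_per_decade) for i in range(count)]


decades = math.log10(ONE_MINUTE / FEMTOSECOND)
delays = log_spaced_delays(FEMTOSECOND, ONE_MINUTE)
print(f"Femtoseconds to one minute spans about {decades:.1f} orders of magnitude")
print(f"One delay per decade gives {len(delays)} sample points")
```

With these assumptions the span is close to 17 orders of magnitude, which is why a single fixed frame rate cannot cover both the fastest and the slowest reactions.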

To read the full version of Rob Farber’s technical highlight, visit this link.


Source: Rob Farber, Contributing Editor for ECP
