At the From Images to Knowledge (I2K) 2024 conference, Stephan Saalfeld and Tobias Pietzsch ran a workshop titled Lazy parallel processing and visualization of large data with ImgLib2, BigDataViewer, the N5-API, and Spark:
Modern microscopy and other scientific data acquisition methods generate large high-dimensional datasets exceeding the size of computer main memory and often local storage space. In this workshop, you will learn to create lazy processing workflows with ImgLib2, use the N5 API for storing and loading large n-dimensional datasets, and use Spark to parallelize such workflows on compute clusters. You will use BigDataViewer to visualize and test processing results. Participants will perform practical exercises and learn skills applicable to high-performance / cluster computing.
All exercises from the workshop can be found in the i2k2024-lazy-workshop repository on GitHub in the form of notebooks, including:
Environment setup - Creating an environment to run the Jupyter notebook server, a fast Java kernel, and a few other dependencies.
Lazy ImgLib2 basics - An introduction to the various ways in which ImgLib2 is lazy.
Image I/O with N5 - How to work with the N5 API and ImgLib2.
Lazy image processing with cell images - How to use the ImgLib2 cache library to implement lazy processing workflows at the level of cells (blocks, chunks, boxes, hyperrectangles, Intervals).
ImgLib2 blocks to optimize performance - Using the ImgLib2 “blocks” API to perform computations on blocks of (image) data more efficiently than going pixel-by-pixel using RandomAccess, Type, etc.
Distributed computation using Spark - How to use what we learned about lazy evaluation with ImgLib2 on a Spark cluster.
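To give a flavor of the cell-level lazy processing the workshop covers, here is a minimal plain-Java sketch of the idea behind cached cell images: a large image is split into fixed-size cells, and each cell's pixels are computed only on first access, then kept in a cache. All names here (LazyCellGrid, computeCell, etc.) are illustrative stand-ins, not the ImgLib2 cache API — the notebooks show the real thing.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch of lazy, cell-wise image evaluation with caching. */
class LazyCellGrid {
    static final int CELL = 64;            // cell edge length in pixels
    final int width, height;
    int cellsComputed = 0;                 // counter, for demonstration only
    final Map<Long, float[]> cache = new ConcurrentHashMap<>();

    LazyCellGrid(final int width, final int height) {
        this.width = width;
        this.height = height;
    }

    /** Pixel access: looks up (and lazily computes) the containing cell. */
    float get(final int x, final int y) {
        final long key = (long) (y / CELL) * ((width + CELL - 1) / CELL) + x / CELL;
        final float[] cell = cache.computeIfAbsent(key, k -> computeCell(x / CELL, y / CELL));
        return cell[(y % CELL) * CELL + (x % CELL)];
    }

    /** The "loader": fills one cell; a trivial gradient stands in for real processing. */
    float[] computeCell(final int cx, final int cy) {
        cellsComputed++;
        final float[] pixels = new float[CELL * CELL];
        for (int j = 0; j < CELL; ++j)
            for (int i = 0; i < CELL; ++i)
                pixels[j * CELL + i] = (cx * CELL + i) + (cy * CELL + j);
        return pixels;
    }
}
```

Reading several pixels from the same cell triggers only one cell computation; touching a pixel in a different cell lazily computes that cell as well. ImgLib2's cache library implements this pattern generically, with eviction and thread safety.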
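The "blocks" idea mentioned above can likewise be sketched in plain Java: instead of visiting pixels one at a time through a generic accessor, copy a whole block of the image into a flat primitive array and run a tight loop over it. The helper names and sizes below are illustrative, not the ImgLib2 blocks API.

```java
/** Illustrative sketch of block-wise (rather than pixel-wise) processing. */
class BlockProcessing {
    /** Copy a w-by-h block at (x0, y0) out of a row-major image into a flat buffer. */
    static float[] copyBlock(final float[] img, final int imgWidth,
            final int x0, final int y0, final int w, final int h) {
        final float[] block = new float[w * h];
        for (int j = 0; j < h; ++j)
            System.arraycopy(img, (y0 + j) * imgWidth + x0, block, j * w, w);
        return block;
    }

    /** Process the contiguous buffer with a simple tight loop (here: add a constant). */
    static void addInPlace(final float[] block, final float value) {
        for (int i = 0; i < block.length; ++i)
            block[i] += value;
    }
}
```

Bulk copies plus tight loops over primitive arrays let the JIT vectorize the inner loop and avoid per-pixel coordinate bookkeeping, which is the performance motivation behind the blocks API.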