This tutorial for Java developers covers the most basic functionality of the N5 API for storing large, chunked n-dimensional image data and structured metadata. The N5 API and documentation refer to n-dimensional images as “datasets”, terminology inherited from HDF5. We will use this terminology in this tutorial. If you are used to working with Python and NumPy, an n-dimensional image or dataset is what you know as an ndarray. We will learn about:
- creating readers and writers
- modifying and inspecting the hierarchy (“folder structure”)
- saving and loading datasets
- saving and loading metadata
Readers and writers
N5Readers and N5Writers form the basis of the N5 API and allow you to read and write data, respectively. We generally recommend using an N5Factory to create readers and writers:
// N5Factory can make N5Readers and N5Writers
var factory = new N5Factory();
// trying to open a reader for a container that does not yet exist will throw an error
// var n5Reader = factory.openReader("my-container.n5");
// creating a writer creates a container at the given location
// if it does not already exist
var n5Writer = factory.openWriter("my-container.n5");
// now we can make a reader
var n5Reader = factory.openReader("my-container.n5");
// test if the container exists
n5Reader.exists(""); // true
// "" and "/" both refer to the root of the container
n5Reader.exists("/"); // true
The N5 API gives you access to a number of different storage formats: HDF5, Zarr, and N5’s own format. N5Factory’s convenience methods try to infer the storage format from the extension of the path you provide:
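To make the idea of extension-based inference concrete, here is a minimal sketch in plain Java. It is not N5Factory's implementation: the `StorageFormat` enum, the `inferFormat` helper, and the exact set of recognized extensions are assumptions made for illustration only.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical illustration only: N5Factory's real logic may differ.
class FormatGuess {

    // Hypothetical enum naming the three storage formats mentioned above.
    enum StorageFormat { N5, ZARR, HDF5 }

    // Guess a storage format from the path's extension (assumed mapping).
    static Optional<StorageFormat> inferFormat(String path) {
        // Ignore a trailing slash so "data.zarr/" is treated like "data.zarr".
        String trimmed = path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
        String lower = trimmed.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".n5")) {
            return Optional.of(StorageFormat.N5);
        }
        if (lower.endsWith(".zarr")) {
            return Optional.of(StorageFormat.ZARR);
        }
        if (lower.endsWith(".h5") || lower.endsWith(".hdf5")) {
            return Optional.of(StorageFormat.HDF5);
        }
        // No recognizable extension: the caller has to choose a format explicitly.
        return Optional.empty();
    }

    public static void main(String[] args) {
        String[] paths = {"my-container.n5", "my-container.zarr", "my-container.h5", "my-container"};
        for (String path : paths) {
            System.out.println(path + " -> " + inferFormat(path));
        }
    }
}
```

Returning an empty `Optional` for an unknown extension keeps the guess explicit: the caller can see that nothing was inferred instead of silently getting a default format.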
In fact, it is possible to read with N5Writers since every N5Writer is also an N5Reader, so from now on we’ll just be using the n5Writer.
We use the N5 storage format for the rest of the tutorial, but everything will work just as well with an HDF5 file or a Zarr container.
Groups
N5 containers form hierarchies of groups - think “nested folders on your file system.” It’s easy to create groups and test if they exist:
The list method lists groups that are children of the given group:
and deepList recursively lists every descendant of the given group:
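The difference between listing children and listing all descendants can be sketched with the "nested folders" analogy from above. The example below uses only `java.nio.file` on a temporary directory and does not call N5 at all. The class, the helper names, and the group names `foo/bar` and `lorem/ipsum/dolor` are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Analogy only: plain directories stand in for N5 groups.
class GroupListing {

    // Direct child directories of 'group' (non-recursive).
    static List<String> children(Path group) {
        try (Stream<Path> entries = Files.list(group)) {
            return relativeDirs(group, entries);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Every descendant directory of 'group', as paths relative to it.
    static List<String> descendants(Path group) {
        try (Stream<Path> entries = Files.walk(group)) {
            // Files.walk also yields the starting directory, so drop it.
            return relativeDirs(group, entries.filter(p -> !p.equals(group)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static List<String> relativeDirs(Path group, Stream<Path> entries) {
        return entries.filter(Files::isDirectory)
                .map(p -> group.relativize(p).toString().replace('\\', '/'))
                .sorted()
                .collect(Collectors.toList());
    }

    // Build a small throwaway hierarchy: foo/bar and lorem/ipsum/dolor.
    static Path makeExample() {
        try {
            Path root = Files.createTempDirectory("groups-demo");
            Files.createDirectories(root.resolve("foo/bar"));
            Files.createDirectories(root.resolve("lorem/ipsum/dolor"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = makeExample();
        System.out.println(children(root));    // [foo, lorem]
        System.out.println(descendants(root)); // [foo, foo/bar, lorem, lorem/ipsum, lorem/ipsum/dolor]
    }
}
```

Both helpers return paths relative to the group that was queried, which is what makes the recursive result easy to compare against the non-recursive one.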
Notice that these methods only give information about what groups are present and do not provide information about metadata or datasets.
Some storage and access systems (for example, AWS S3) separate permissions for reading and listing, so it may be possible to access data without being able to list it.
Datasets
N5 stores datasets (n-dimensional arrays) in particular groups in the hierarchy.
Datasets must be terminal (leaf) nodes in the container hierarchy, i.e. a dataset cannot contain another group or dataset. (Is this strictly true? It may be confusing alongside names like multiscale “datasets”.)
We recommend using code from n5-ij or n5-imglib2 to write datasets. The examples in this post will use the latter.
The N5Utils class in n5-imglib2 has many useful methods, but in this post, we’ll cover simple methods for reading and writing. First, N5Utils.save writes a dataset and required metadata to the container at a group that you specify. The group will be created if it does not already exist. The parameters will be discussed in more detail below.
You can write in parallel by providing an ExecutorService to this variant of N5Utils.save:
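For readers who have not used an ExecutorService before, the general pattern is: create a pool, hand it independent per-block tasks, wait for them, and shut the pool down. The sketch below shows that pattern with plain `java.util.concurrent`. It does not call N5Utils.save, and the `writeAllBlocks` helper and its stand-in tasks are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Pattern sketch only: the tasks are stand-ins, nothing is written to disk.
class ParallelBlocks {

    // Runs one stand-in "write" task per block of a 2D grid and returns how many completed.
    static int writeAllBlocks(int blocksX, int blocksY, ExecutorService exec) {
        List<Future<String>> pending = new ArrayList<>();
        for (int y = 0; y < blocksY; y++) {
            for (int x = 0; x < blocksX; x++) {
                final int bx = x;
                final int by = y;
                // Stand-in for compressing and writing one block.
                pending.add(exec.submit(() -> "block " + bx + "," + by));
            }
        }
        int done = 0;
        try {
            for (Future<String> f : pending) {
                f.get(); // wait for this block; a failed task surfaces here
                done++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while writing blocks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a block failed to write", e.getCause());
        }
        return done;
    }

    public static void main(String[] args) {
        ExecutorService exec = Executors.newFixedThreadPool(4);
        try {
            System.out.println(writeAllBlocks(2, 2, exec) + " blocks written");
        } finally {
            // In this sketch the caller owns the executor and shuts it down.
            exec.shutdown();
        }
    }
}
```

Because blocks are independent units, each one can be handled by a different thread without coordination beyond waiting for all of them to finish.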
Reading the dataset from the container is also easy with N5Utils.open:
This save method DOES NOT perform any checks prior to writing data and will overwrite data that exists in the specified location. Be sure to check and take appropriate action if it is possible that data could already be at a particular location and container to avoid data loss or corruption.
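One way to follow this advice is to check for existing data before every save and refuse to continue if something is already there. A minimal sketch, with the existence check passed in as a `Predicate<String>` so the example runs without a container. The `requireAbsent` helper is hypothetical and not part of the N5 API.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical guard: call it before saving to a path that might be occupied.
class OverwriteGuard {

    // Throws if 'exists' reports something at 'path'; otherwise returns normally.
    static void requireAbsent(Predicate<String> exists, String path) {
        if (exists.test(path)) {
            throw new IllegalStateException("refusing to overwrite existing data at '" + path + "'");
        }
    }

    public static void main(String[] args) {
        // Stand-in for a container: a set of paths that already hold data.
        Set<String> occupied = new HashSet<>();
        occupied.add("data");

        requireAbsent(occupied::contains, "new-data"); // passes, nothing is there yet
        try {
            requireAbsent(occupied::contains, "data");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a real container, the predicate could be a method reference such as `n5Writer::exists`, assuming `exists` takes a path and returns a boolean as in the calls shown earlier. Throwing is only one possible "appropriate action"; writing to a fresh path or removing the old data first are others.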
This example shows that data can be overwritten:
Parameter details
groupPath is the location inside the container that will store the dataset. You can store a dataset at the root of a container by specifying "" or "/" as the groupPath. In this case, the container will only be able to store one dataset (see the warning above).
blockSize is a very important parameter. HDF5, N5, and Zarr all break up the datasets they store into equally sized blocks or “chunks”. The block size parameter specifies the size of these blocks.
For the example above, we stored an image of size 64 x 64 using blocks sized 32 x 32. As a result, N5 uses four blocks to store the entire image:
Quiz: How many blocks would there be if the block size were 64 x 8?
Answer: There would be eight blocks. One block covers the first dimension, but it takes 8 blocks to cover the second dimension (\(8 \times 8 = 64\)). This is also demonstrated by the code below:
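The block counts above follow from simple arithmetic: along each dimension the number of blocks is the image size divided by the block size, rounded up, and the total is the product over all dimensions. A minimal sketch in plain Java. The `BlockGrid` class and its method names are hypothetical, and this is a calculation, not a listing of what N5 writes to disk.

```java
// Arithmetic sketch only: computes how many blocks a given block size implies.
class BlockGrid {

    // Number of blocks along each dimension: ceil(dimension / blockSize).
    static long[] gridDimensions(long[] dimensions, int[] blockSize) {
        long[] grid = new long[dimensions.length];
        for (int d = 0; d < dimensions.length; d++) {
            // Integer form of rounding up the division.
            grid[d] = (dimensions[d] + blockSize[d] - 1) / blockSize[d];
        }
        return grid;
    }

    // Total number of blocks: the product of the per-dimension counts.
    static long countBlocks(long[] dimensions, int[] blockSize) {
        long total = 1;
        for (long g : gridDimensions(dimensions, blockSize)) {
            total *= g;
        }
        return total;
    }

    public static void main(String[] args) {
        long[] image = {64, 64};
        System.out.println(countBlocks(image, new int[] {32, 32}));   // 4
        System.out.println(countBlocks(image, new int[] {64, 8}));    // 8
        System.out.println(countBlocks(image, new int[] {128, 128})); // 1, block larger than the image
    }
}
```

Rounding up matters when the image size is not a multiple of the block size: the last block along that dimension is only partly filled but still counts as a block.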
N5 lets you store your image in a single file if you want - just provide a block size that is equal to or larger than the image size.
compression
Each block is compressed independently, using the specified compression. Use RawCompression to store blocks without compression.
Notice that blocks were previously ~1700-2000 bytes and are now ~4100 without compression.
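To get a feel for where such numbers come from, the sketch below serializes a single 32 x 32 block and compresses it on its own with `java.util.zip`. Several things here are assumptions: the pixels are 4-byte ints (consistent with a raw block of roughly 4100 bytes, but not stated above), the content is a synthetic gradient, and GZIP from the standard library stands in for whichever compression the container uses. The resulting sizes will therefore not match the ones quoted above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustration only: one block, compressed independently of all others.
class BlockCompression {

    // Serialize one 32 x 32 block of 4-byte ints holding a smooth gradient (assumed data).
    static byte[] rawBlock() {
        ByteBuffer buf = ByteBuffer.allocate(32 * 32 * Integer.BYTES);
        for (int y = 0; y < 32; y++) {
            for (int x = 0; x < 32; x++) {
                buf.putInt(x + y);
            }
        }
        return buf.array();
    }

    // Compress a byte array with GZIP.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decompress a GZIP byte array back to the original bytes.
    static byte[] gunzip(byte[] data) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = rawBlock();
        byte[] compressed = gzip(raw);
        System.out.println("raw bytes:  " + raw.length); // 4096
        System.out.println("gzip bytes: " + compressed.length);
    }
}
```

How much a block shrinks depends entirely on its content: smooth or repetitive data compresses well, while noisy data may barely compress at all.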
The available compression options at the time of this writing are:
Metadata
N5 can also store rich structured metadata in addition to array data. This tutorial will discuss basic, low-level metadata operations. Advanced operations and metadata standards may be described in a future tutorial.
Basics
N5Writers have a setAttribute method for writing metadata to the storage backend. It takes three arguments:
<T> void setAttribute(String groupPath, String attributePath, T attribute)
- groupPath: the group in which to store this metadata
- attributePath: the name of this attribute
- attribute: the metadata attribute to be stored. Can be an arbitrary type (denoted T).
There are differences between an attribute “name” and an attribute “path”, but attribute “paths” are an advanced topic and will be covered elsewhere.
Similarly, N5Readers have a getAttribute method:
<T> T getAttribute(String groupPath, String attributePath, Class<T> clazz)
The last argument (Class<T>) lets you specify the type that getAttribute should return. An N5Exception will be thrown if the requested type cannot be created from the requested attribute. If an attribute does not exist, null will be returned (see the last example of this section). Consider these examples:
Sometimes it is possible to interpret an attribute as multiple different types:
Rich metadata
It is possible to save attributes of arbitrary types, enabling you to structure your metadata into classes that are easy to save and load directly. For example, if we define a metadata class FunWithMetadata:
then make an instance and save it:
To retrieve all the metadata in a group as JSON:
Removing metadata
You can remove attributes by their name as well. To return the element that was removed, just provide the class for that element (this mirrors the remove method for Lists in Java).
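The List behaviour referred to here is that `List.remove(int)` hands back the element it removed. The sketch below shows that behaviour and a typed, remove-by-name analogue on a plain `Map`. The `removeAs` helper is hypothetical and only imitates the idea; it is not the N5 method and makes no claim about how N5 handles type mismatches.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-collections analogy: removing something and getting it back, typed.
class RemoveReturnsValue {

    // Hypothetical helper: remove 'name' from 'attrs' and return its value as 'type'.
    // Returns null if the name is absent.
    static <T> T removeAs(Map<String, Object> attrs, String name, Class<T> type) {
        // Cast first so a wrong type throws ClassCastException before anything is removed.
        T typed = type.cast(attrs.get(name));
        attrs.remove(name);
        return typed;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("a", "b"));
        String removed = list.remove(0); // List.remove(int) returns the removed element
        System.out.println(removed);     // a

        Map<String, Object> attrs = new LinkedHashMap<>();
        attrs.put("answer", 42);
        Integer answer = removeAs(attrs, "answer", Integer.class);
        System.out.println(answer);                       // 42
        System.out.println(attrs.containsKey("answer"));  // false
    }
}
```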
Working with Dataset Metadata
Metadata used to describe datasets can be get and set the same way as all other metadata. However, there are special DatasetAttributes methods to safely work with dataset metadata. N5Reader.getDatasetAttributes and N5Writer.setDatasetAttributes ensure the metadata is always a valid representation of dataset metadata. Setting DatasetAttributes, however, should only be done when the dataset is initially saved. This ensures the required metadata is tightly coupled with the data. For example, setting dataset metadata should be done through the N5Writer.createDataset methods (or indirectly through the N5Utils.save methods mentioned above).
The attributes that N5 uses to read datasets can be set with setAttribute, and modifying them could corrupt your data. Do not manually set these attributes unless you absolutely know what you’re doing!
- dimensions
- blockSize
- dataType
- compression
The attributes that describe datasets are also accessible using getAttribute. Try running:
n5Writer.getAttribute("data", "dimensions", long[].class);
though using getDatasetAttributes().getDimensions() is generally recommended.