Parallel IO NHR Workshop

Name: Parallel IO NHR Workshop
Start: 2024-05-07T11:00:00+02:00
End: 2024-05-08T17:00:00+02:00
Location: Bundesstraße 45a

7 May 2024, 11:00 → 8 May 2024, 17:00 Europe/Berlin

R034 (Bundesstraße 45a)

Anja Gerbes (TU Dresden), Anna Fuchs (Universität Hamburg), Jannek Squar, Panos Adamidis

Description

Climate science is of great societal relevance. Explicitly resolving small-scale physical processes helps reduce the uncertainties introduced by parameterisations, thus improving climate change projections. The objective is to run a coupled atmosphere-ocean setup at a global resolution of 1 km with a performance of 1 simulated year per day (SYPD). Such simulations require computational power that is available only on exascale supercomputers.
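The SYPD target translates directly into a wall-clock budget per model timestep, which has to cover both computation and I/O. The following is a minimal sketch in Python, assuming a 365-day calendar and a hypothetical 40 s model timestep. Neither value is taken from the workshop description; both are illustrative only.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365  # assumption: 365-day model calendar


def sypd(simulated_years: float, wallclock_seconds: float) -> float:
    """Simulated years per wall-clock day."""
    return simulated_years / (wallclock_seconds / SECONDS_PER_DAY)


def max_seconds_per_step(target_sypd: float, model_timestep_s: float) -> float:
    """Wall-clock seconds available per model timestep (compute + I/O).

    At 1 SYPD, one wall-clock day covers DAYS_PER_YEAR simulated days,
    so each wall-clock second must advance the model by
    target_sypd * DAYS_PER_YEAR simulated seconds.
    """
    simulated_s_per_wallclock_s = target_sypd * DAYS_PER_YEAR
    return model_timestep_s / simulated_s_per_wallclock_s


if __name__ == "__main__":
    # Hypothetical 40 s timestep: roughly 0.11 s of wall-clock time per step
    print(f"{max_seconds_per_step(1.0, 40.0):.3f} s per step")  # 0.110 s per step
```

Under these assumptions, any time spent blocking on output within a step is taken directly out of this budget of about a tenth of a second.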

The output data is on the order of petabytes, and achieving the desired performance requires efficient I/O. The goal of this workshop is to highlight current and future methods for parallel I/O on large parallel file systems, from the application level down to the system level. Possible topics of interest include:

- Lossless compression and chunking
- Selection of appropriate data formats for I/O: HDF5, NetCDF, Zarr, etc.
- Optimization of I/O for climate models: application, middleware, file system
- Post-processing of large data sets (reading large amounts of data)
- Monitoring: effects of applications on the file system versus effects of the file system on the application
- Comparison between file systems and object stores
- Key metric: time-to-solution

Contact

anja.gerbes@tu-dresden.de

anna.fuchs@uni-hamburg.de

adamidis@dkrz.de

jannek.squar@uni-hamburg.de

Tuesday, 7 May
- 11:00 → 12:00
  
  Reception coffee 1h
- 12:00 → 12:40
  
  I/O in Climate Modeling 40m
  
  Opening talk
  
  Speaker: Dr Panos Adamidis
- 12:40 → 13:20
  
  Lustre at DKRZ: Striping Strategies & Best Practices 40m
  
  TBD
  
  Speaker: Carsten Beyer (DKRZ)
- 13:20 → 14:00
  
  Field notes from explorations on a big file system 40m
  
  In this talk I will present observations on various aspects of the use of DKRZ's 120 TB Lustre file system, regarding throughput and usage patterns. They range from back-of-the-envelope calculations via aggregate statistics to the usage of an individual dataset, and may provide valuable insights into users' needs and into aspects to consider when developing I/O solutions and procuring future storage systems.
  
  Speaker: Dr Florian Ziemen (DKRZ)
- 14:00 → 14:15
  
  Coffee Break 15m
- 14:15 → 14:55
  
  IO Benchmarking in HPC Systems 40m
  
  Description: TBD
  
  Speakers: Dr Jannek Squar, Anna Fuchs (Universität Hamburg)
- 14:55 → 15:35
  
  I/O performance in CLAIX23 infrastructure 40m
  
  In this talk, I will compare the changes in I/O performance of the system, which we benchmark using IO500. Additionally, we want to examine how ICON performs in the broader context of CLAIX, since we are currently working with DKRZ in the Green HPC project.
  
  Speaker: Radita Liem (RWTH Aachen)
- 15:35 → 16:10
  
  Coffee Break And Discussion 35m
  
  Please grab a cup of coffee and some finger food and head to the conference room, where you can engage in discussions about the previous talks.
- 16:10 → 16:50
  
  Parallel HDF5 25 years later 40m
  
  25 years have passed since the first release of parallel HDF5. The software is still under active development to address constantly evolving HPC requirements. In our talk we will give an overview of the current state of the parallel HDF5 library and its new compression and sub-filing capabilities. We will also talk about the HDF5 tuning knobs for HPC applications that have been developed over the years.
  
  Speaker: Elena Pourmal
- 16:50 → 17:30
  
  hiopy - Optimizing Model Output for Analysis 40m
  
  As climate models reach the kilometer scale, horizontal model grids outgrow the size of computer screens as well as the capacity of the human eye. As a consequence, an analyst cannot view the full model output at once and will always work with subsets or coarser versions of the data. The time to an analysis result can be reduced dramatically if output datasets are optimized for this changed read workload. Hiopy is a new way of writing ICON model output that uses YAC and the Zarr format to create such optimized datasets directly from the running model for immediate consumption.
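  One generic way to serve "coarser versions of the data" is a resolution pyramid: each level is written once at output time, and readers pick the finest level that fits their display. The sketch below illustrates that idea on a 1D field using pairwise averaging. It is not hiopy's implementation or API: real ICON grids are unstructured 2D meshes, the remapping that YAC performs is not modelled here, and all function names are hypothetical.

  ```python
  from typing import List


  def coarsen(values: List[float]) -> List[float]:
      """Halve the resolution by averaging neighbouring pairs (even length only)."""
      if len(values) % 2:
          raise ValueError("length must be even")
      return [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]


  def build_pyramid(field: List[float], min_size: int = 1) -> List[List[float]]:
      """Return levels [full, 1/2, 1/4, ...], finest first."""
      levels = [field]
      while len(levels[-1]) > min_size and len(levels[-1]) % 2 == 0:
          levels.append(coarsen(levels[-1]))
      return levels


  def pick_level(levels: List[List[float]], max_points: int) -> List[float]:
      """Return the finest level with at most max_points values (else the coarsest)."""
      for level in levels:  # levels are ordered finest first
          if len(level) <= max_points:
              return level
      return levels[-1]
  ```

  For example, `build_pyramid([1.0, 2, 3, 4, 5, 6, 7, 8])` yields levels of 8, 4, 2 and 1 values, and a reader limited to 3 points would receive `[2.5, 6.5]` without touching the full-resolution data. Storing each level as its own array in a Zarr store would be one plausible layout, but that is an assumption here.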
  
  Speaker: Dr Tobias Kölling
- 17:30 → 18:00
  
  Panel Discussion 30m
- 19:00 → 22:00
  Social Event: TBD
  - 19:00
    
    Excursion 3h
    
    Description: TBD
Wednesday, 8 May
- 09:00 → 09:30
  
  Reception Coffee 30m
  
  TODO
- 09:30 → 10:10
  
  Enabling purposeful use of large-volume Earth System Modelling datasets: ideas and concepts explored at DKRZ 40m
  
  Current state-of-the-art and upcoming Earth System Model (ESM) simulations produce output on the order of single- to double-digit petabytes per individual simulation spanning climatic timescales. Creating an infrastructure that enables the purposeful analysis of such data volumes requires rethinking data-handling paradigms for ESM datasets. We present concepts, ideas and prototypes developed according to the requirements of the ESM community to enable efficient access to and analysis of ESM output across the storage hardware hierarchy at DKRZ.
  
  Speaker: Dr Karsten Peters-von Gehlen
- 10:10 → 10:50
  
  Databases for HPC and Parallel IO 40m
  
  Early explorations into using an RDBMS as a data store for parallel IO workloads led to the conclusion that the technology was ill-suited to the task. The community has accepted this “wisdom” and has been reluctant to support new efforts to investigate databases. I think it is time to revisit this question.
  
  Speaker: Dr Jay Lofstead
- 10:50 → 11:30
  
  Discussion 40m
- 11:30 → 12:30
  
  Lunch 1h
- 12:30 → 13:10
  
  Leveraging Flexible Storage System Components for HPC Research 40m
  
  Abstract: Research has become increasingly data-driven, putting additional pressure on the underlying storage systems. Gaining insight into their behavior is critical for understanding and optimizing I/O performance. However, existing storage systems often lack the necessary functionality and are difficult to modify and extend. Therefore, the Parallel Computing and I/O research group is developing several storage system components within the JULEA and Haura projects, making it possible to cover the entire storage stack from application I/O interfaces to block device access. This allows new approaches and optimizations to be prototyped rapidly.
  
  Speaker: Prof. Michael Kuhn (OVGU)
- 13:10 → 13:50
  
  Using the DAOS Storage APIs with Weather and Climate Applications 40m
  
  The Distributed Asynchronous Object Storage (DAOS) is an open source scale-out storage system that is designed from the ground up to support Storage Class Memory (SCM) and NVMe storage in user space (https://docs.daos.io/). This presentation provides an overview of the DAOS architecture and describes the various APIs available to users for exploiting the performance advantages that DAOS offers over traditional parallel file systems such as GPFS or Lustre. We will also solicit feedback from the community to guide future development efforts.
  
  Speaker: Michael Hennecke
- 13:50 → 14:30
  
  Discussion 40m
- 14:30 → 14:50
  
  Coffee Break 20m
- 14:50 → 16:00
  
  Finalisation: Working Groups and Discussion 1h 10m
  
  Description: TBD
- 16:00 → 16:30
  
  Summary 30m
  
  TBD