Parallel IO NHR Workshop

Europe/Berlin
R034 (Bundesstraße 45a)

Anja Gerbes (TU Dresden), Anna Fuchs (Universität Hamburg), Jannek Squar, Panos Adamidis
Description

Climate science is of great societal relevance. Resolving small-scale physical processes helps reduce uncertainties introduced by parameterisations, thus improving climate change projections. The objective is to compute a coupled atmosphere-ocean setup at a global resolution of 1km with a performance of 1 simulated year per day (SYPD). Such simulations require computational power available only on exascale supercomputer systems.

The output data is in the order of petabytes, and achieving the desired performance requires efficient I/O. The goal of this workshop is to highlight current and future methods for parallel I/O on large parallel file systems, spanning from the application level to the system level. Possible topics of interest include:

  • Lossless compression and chunking
  • Selection of appropriate data formats for I/O: HDF5, NetCDF, zarr, etc.
  • Optimization of I/O for climate models: application, middleware, file system
  • Post-processing of large data sets (reading large amounts of data)
  • Monitoring: effects of applications on the file system versus effects of the file system on the application
  • Comparison between file systems and object stores
  • Key metric: time-to-solution

  • Tuesday, 7 May
    • 11:00 12:00
      Reception coffee 1h
    • 12:00 12:40
      I/O in Climate Modeling 40m

      Opening talk

      Speaker: Dr Panos Adamidis
    • 12:40 13:20
      Lustre at DKRZ: Striping Strategies & Best Practices 40m

      TBD

      Speaker: Carsten Beyer (DKRZ)
    • 13:20 14:00
      Field notes from explorations on a big file system 40m

      In this talk I will present observations on various aspects of the use of DKRZ's 120 PB Lustre file system, focusing on throughput and usage patterns. They range from back-of-the-envelope calculations and aggregate statistics to the usage of an individual dataset, and may provide valuable insights into users' needs and into what to consider when developing I/O solutions and procuring future storage systems.

      Speaker: Dr Florian Ziemen (DKRZ)
    • 14:00 14:15
      Coffee Break 15m
    • 14:15 14:55
      IO Benchmarking in HPC Systems 40m

      Description: TBD

      Speakers: Dr Jannek Squar, Anna Fuchs (Universität Hamburg)
    • 14:55 15:35
      I/O performance in CLAIX23 infrastructure 40m

      In this talk, I would like to compare the changes in I/O performance of the systems that we benchmark using IO500. Additionally, we want to see how ICON performs within CLAIX in the grand scheme of things, since we are currently working with DKRZ in the Green HPC project.

      Speaker: Radita Liem (RWTH Aachen)
    • 15:35 16:10
      Coffee Break And Discussion 35m

      Please grab a cup of coffee and some finger food and head to the conference room, where you can engage in discussions about the previous talks.

    • 16:10 16:50
      Parallel HDF5 25 years after 40m

      25 years have passed since the first release of parallel HDF5. The software is still under active development to address constantly evolving HPC requirements. In our talk we will give an overview of the current state of the parallel HDF5 library and its new compression and sub-filing capabilities. We will also talk about the HDF5 tuning knobs for HPC applications that were developed over the years.

      Speaker: Elena Pourmal
    • 16:50 17:30
      hiopy - Optimizing Model Output for Analysis 40m

      As climate models reach the kilometer scale, horizontal model grids outgrow the size of computer screens as well as the capacity of the human eye.
      In consequence, a model output analyst can't observe the full output at once, but will always use subsets or coarser versions of the data.
      The time to an analysis result can be reduced dramatically if output datasets are optimized for the changed read workload. Hiopy is a new way of writing
      ICON model output, which utilizes YAC and the Zarr format to create such optimized datasets directly from the running model for immediate consumption.

      Speaker: Dr Tobias Kölling
    • 17:30 18:00
      Panel Discussion 30m
    • 19:00 22:00
      Social Event: TBD
      • 19:00
        Excursion 3h

        Description: TBD

  • Wednesday, 8 May
    • 09:00 09:30
      Reception Coffee 30m

      TBD

    • 09:30 10:10
      Enabling purposeful use of large-volume Earth System Modelling datasets: ideas and concepts explored at DKRZ 40m

      Current state-of-the-art and upcoming Earth System Model (ESM) simulations produce output on the order of single- to double-digit petabytes per individual simulation spanning climatic timescales. Creating an infrastructure environment that enables the purposeful analysis of such data volumes requires revamping data handling paradigms for ESM datasets. We present concepts, ideas and prototypes developed along the requirements of the ESM community to enable efficient ESM output access and analysis across the storage hardware hierarchy at DKRZ.

      Speaker: Dr Karsten Peters-von Gehlen
    • 10:10 10:50
      Databases for HPC and Parallel IO 40m

      Early explorations into using an RDBMS as a data store for parallel IO workloads led to the conclusion that the technology was ill-suited for the task. The community has accepted this
      “wisdom” and been reluctant to support any new efforts to investigate databases. I think it is time to revisit this conclusion.

      Speaker: Dr Jay Lofstead
    • 10:50 11:30
      Discussion 40m
    • 11:30 12:30
      Lunch 1h
    • 12:30 13:10
      Leveraging Flexible Storage System Components for HPC Research 40m

      Abstract: Research has become increasingly data-driven, putting additional pressure on the underlying storage systems. Gaining insights into their behavior is critical for understanding and optimizing I/O performance. However, existing storage systems often lack the necessary functionality and are difficult to modify and extend. Therefore, the Parallel Computing and I/O research group is developing several storage system components within the JULEA and Haura projects, making it possible to cover the entire storage stack from application I/O interfaces to block device access. This allows new approaches and optimizations to be prototyped rapidly.

      Speaker: Prof. Michael Kuhn (OVGU)
    • 13:10 13:50
      Using the DAOS Storage APIs with Weather and Climate Applications 40m

      The Distributed Asynchronous Object Storage (DAOS) is an open-source scale-out storage system that is designed from the ground up to support Storage Class Memory (SCM) and NVMe storage in user space (https://docs.daos.io/). This presentation provides an overview of the DAOS architecture and describes the various APIs that are available to the user to benefit from the performance advantages that DAOS offers in comparison to traditional parallel file systems like GPFS or Lustre. We will also solicit feedback from the community to guide future development efforts.

      Speaker: Michael Hennecke
    • 13:50 14:30
      Discussion 40m
    • 14:30 14:50
      Coffee Break 20m
    • 14:50 16:00
      Finalisation: Working Groups and Discussion 1h 10m

      Description: TBD

    • 16:00 16:30
      Summary 30m

      TBD