v3.0rc1

@adammoody released this 16 Apr 20:08
· 747 commits to develop since this release

This is a release candidate for v3.0.

  • Improved support for large datasets and shared access to files. Applications can now configure SCR to bypass the cache and access datasets on the global file system:
    • Since bypass mode is more general, it is enabled by default. To use the cache, disable bypass mode (SCR_CACHE_BYPASS=0).
    • For datasets that are too large to fit in cache or for systems that have no cache available, SCR can use the global file system. This improves portability so that applications can use SCR on any cluster.
    • For applications that write shared files, SCR can use bypass mode during the SCR Checkpoint/Output API.
    • For applications that write datasets as a file-per-process but require shared access to files during restart, one can write to cache but enable SCR_GLOBAL_RESTART. This rebuilds and flushes cached datasets during SCR_Init. It also enables bypass mode for restart, so an application can read its dataset from the global file system using the SCR Restart API.
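As an illustrative sketch, a .scrconf for the file-per-process case above (write to cache, read from the global file system on restart) might look like the following. The value 1 for SCR_GLOBAL_RESTART is an assumption, following the =1 convention that other parameters in these notes use to enable a feature.

```
SCR_CACHE_BYPASS=0
SCR_GLOBAL_RESTART=1
```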
  • Applications can now instruct SCR to load a specific checkpoint by naming it in the SCR_CURRENT parameter before calling SCR_Init.
  • Restart loop:
    • SCR now supports a loop around SCR_Have_restart, SCR_Start_restart, and SCR_Complete_restart. If an application detects a problem during its restart, it can pass valid=0 to SCR_Complete_restart. SCR will then load the next most recent checkpoint, which the application can query with another call to SCR_Have_restart.
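The control flow of the restart loop can be sketched as follows. To keep the example compilable without libscr, the three SCR functions are replaced by mock stand-ins with the same names; their argument lists are assumptions and should be checked against scr.h (only the valid argument of SCR_Complete_restart is confirmed by these notes). The checkpoint names, the NAME_LEN constant, and the app_read_checkpoint helper are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define NAME_LEN 256  /* hypothetical buffer size for this sketch */

/* ---- Mock stand-ins for libscr (assumed signatures, NOT the real library) ---- */
static const char *mock_ckpts[] = {"ckpt.30", "ckpt.20", "ckpt.10"}; /* newest first */
static const int mock_count = 3;
static int mock_next = 0; /* index of the checkpoint the mock offers next */

int SCR_Have_restart(int *flag, char *name) {
    *flag = (mock_next < mock_count);
    if (*flag && name != NULL) {
        snprintf(name, NAME_LEN, "%s", mock_ckpts[mock_next]);
    }
    return 0;
}

int SCR_Start_restart(char *name) {
    if (mock_next >= mock_count) {
        return 1; /* nothing left to load */
    }
    if (name != NULL) {
        snprintf(name, NAME_LEN, "%s", mock_ckpts[mock_next]);
    }
    return 0;
}

int SCR_Complete_restart(int valid) {
    if (valid) {
        return 0;
    }
    mock_next++; /* mimic SCR falling back to the next most recent checkpoint */
    return 1;
}

/* ---- Hypothetical application reader: pretend the newest checkpoint is corrupt ---- */
static int app_read_checkpoint(const char *name) {
    return strcmp(name, "ckpt.30") != 0;
}

/* The restart loop: returns 1 and fills `loaded` with the checkpoint name on
 * success, or returns 0 if no checkpoint could be read. */
int restart_loop(char *loaded) {
    int have = 0;
    char name[NAME_LEN];

    SCR_Have_restart(&have, name);
    while (have) {
        SCR_Start_restart(name);
        int valid = app_read_checkpoint(name);
        SCR_Complete_restart(valid); /* valid=0 asks for an older checkpoint */
        if (valid) {
            snprintf(loaded, NAME_LEN, "%s", name);
            return 1;
        }
        SCR_Have_restart(&have, name); /* query the next candidate */
    }
    return 0;
}
```

In this mock setup the loop rejects ckpt.30 and then succeeds on ckpt.20. A real application would also route its file names through SCR and read actual data between the start and complete calls.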
  • New API calls:
    • SCR_Config(const char* config) to set and query SCR configuration parameters before SCR_Init()
    • SCR_Current(const char* name) enables an application that reads its checkpoint without using the SCR Restart API to inform SCR about which checkpoint it loaded so that SCR can still track the proper ordering of checkpoints
    • SCR_Delete(const char* name) to ask SCR to delete a dataset
    • SCR_Drop(const char* name) to ask SCR to drop a dataset from the index without deleting the underlying data files
  • New flush methods
  • New redundancy scheme:
    • Reed-Solomon encoding (SCR_COPY_TYPE=RS) allows a configurable number of failures per group, from 1 to N-1 where N is the set size. Use SCR_SET_SIZE to specify the group size and SCR_SET_FAILURES to specify the number of failures per group.
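For example, a .scrconf along the following lines would request Reed-Solomon encoding over groups of 8 that tolerate up to 2 failures each. The values 8 and 2 are illustrative choices, not recommendations; any failure count from 1 to 7 would be in range for a set size of 8.

```
SCR_COPY_TYPE=RS
SCR_SET_SIZE=8
SCR_SET_FAILURES=2
```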
  • SCR configuration parameters now support interpolation of environment variables in configuration files, e.g.,
    >>: cat .scrconf
    SCR_CACHE_BASE=$BBPATH
    
  • SCR now preserves file metadata including atime, mtime, uid, gid, and mode bits
  • New logging options:
    • text file - written to the SCR prefix directory (SCR_LOG_TXT_ENABLE=1)
    • syslog - one can configure the syslog prefix, facility, and level to be used (SCR_LOG_SYSLOG_ENABLE=1)
  • Applications can now configure SCR to maintain a sliding window of checkpoints on the parallel file system with the SCR_PREFIX_SIZE parameter. After flushing a new checkpoint, SCR deletes older checkpoints that fall outside the window.
  • Default cache and control directories have been moved from /tmp to /dev/shm on Linux systems
  • Aids for application developers integrating the SCR API:
    • A new SCR_CACHE_PURGE parameter configures SCR to delete datasets from cache in new runs
    • A new SCR_PREFIX_PURGE parameter similarly deletes datasets from the prefix directory in new runs
    • Added internal checks to warn developers about incorrect API usage
  • Added Python bindings for the SCR library (beta)
    • Implemented in an scr.py module (import scr)
    • Uses the C Foreign Function Interface (CFFI) to wrap C functions in libscr
    • Supports Python 2 and 3
  • Refactored code base to use ECP-VeloC components https://github.com/ecp-veloc/
    • Improves code modularity and reuse
    • Improved testing
    • scr-top package (https://github.com/llnl/scr-top) includes source for SCR and its ECP-VeloC components to simplify direct installs, e.g.,
      tar -xzf scr-top-v3.0rc1.tgz
      cd scr-top-v3.0rc1
      mkdir build install
      cd build
      cmake -DCMAKE_INSTALL_PREFIX=../install -DSCR_RESOURCE_MANAGER=SLURM ../
      make install