v3.0rc1
This is a release candidate for v3.0.
- Improved support for large datasets and shared access to files. Applications can now configure SCR to bypass the cache and access datasets on the global file system:
  - Since bypass mode is more general, it is enabled by default. To use cache, one must disable bypass mode (`SCR_CACHE_BYPASS=0`).
  - For datasets that are too large to fit in cache or for systems that have no cache available, SCR can use the global file system. This improves portability so that applications can use SCR on any cluster.
  - For applications that write shared files, SCR can use bypass mode during the SCR Checkpoint/Output API.
  - For applications that write datasets as a file-per-process but require shared access to files during restart, one can write to cache but enable `SCR_GLOBAL_RESTART`. This rebuilds and flushes cached datasets during `SCR_Init`. It also enables bypass mode for restart, so an application can read its dataset from the global file system using the SCR Restart API.
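The cases above can be summarized as a small decision helper. This is only a sketch: `choose_scr_settings` and its arguments are hypothetical and not part of SCR, the value `1` for enabling a parameter is assumed from the `=0` form shown above, and the helper prefers cache whenever the data fits, which is one possible policy rather than a recommendation from SCR.

```python
def choose_scr_settings(fits_in_cache, writes_shared_files, needs_shared_restart):
    """Return SCR parameter settings for the cases described above.

    Hypothetical helper for illustration only; not part of SCR.
    """
    if writes_shared_files or not fits_in_cache:
        # Bypass mode (the default): datasets live on the global file system.
        return {"SCR_CACHE_BYPASS": "1"}

    # File-per-process data that fits in cache: disable bypass to use cache.
    settings = {"SCR_CACHE_BYPASS": "0"}
    if needs_shared_restart:
        # Assumed value "1": rebuild and flush during SCR_Init, then
        # read from the global file system on restart.
        settings["SCR_GLOBAL_RESTART"] = "1"
    return settings


if __name__ == "__main__":
    for name, value in choose_scr_settings(True, False, True).items():
        print(f"{name}={value}")
```

The resulting name/value pairs use the same `NAME=VALUE` form that appears throughout these notes.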
- Applications can now instruct SCR to load a specific checkpoint by naming it in the `SCR_CURRENT` parameter before calling `SCR_Init`.
- Restart loop:
  - SCR now supports a loop around `SCR_Have_restart`, `SCR_Start_restart`, and `SCR_Complete_restart`. If an application detects a problem during its restart, it can pass `valid=0` to `SCR_Complete_restart`. SCR will then load the next most recent checkpoint, which the application can query with another call to `SCR_Have_restart`.
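The control flow of the restart loop can be sketched as follows. The code does not call SCR: the four callables are hypothetical stand-ins for `SCR_Have_restart`, `SCR_Start_restart`, `SCR_Complete_restart`, and the application's own read routine, and `FakeCheckpoints` is an in-memory model used only to make the sketch runnable.

```python
def restart_from_latest_valid(have_restart, start_restart, complete_restart, load):
    """Try checkpoints from most recent to oldest until one loads cleanly.

    Hypothetical stand-ins (not the real SCR signatures):
      have_restart()          -> (flag, name)
      start_restart()         -> name
      complete_restart(valid) -> None
      load(name)              -> True if the application read the data successfully
    Returns the name of the checkpoint that loaded, or None if none did.
    """
    while True:
        flag, _ = have_restart()
        if not flag:
            return None  # no checkpoint left to try
        name = start_restart()
        ok = load(name)
        # valid=0 reports a failed read, so the next most recent
        # checkpoint is offered on the following have_restart() call.
        complete_restart(1 if ok else 0)
        if ok:
            return name


class FakeCheckpoints:
    """In-memory model of a checkpoint list, most recent first."""

    def __init__(self, names):
        self.names = list(names)

    def have_restart(self):
        return (1, self.names[0]) if self.names else (0, None)

    def start_restart(self):
        return self.names[0]

    def complete_restart(self, valid):
        if not valid:
            self.names.pop(0)  # fall back to the next most recent checkpoint


if __name__ == "__main__":
    fake = FakeCheckpoints(["ckpt.3", "ckpt.2", "ckpt.1"])
    # Pretend "ckpt.3" is corrupt and every other checkpoint reads fine.
    loaded = restart_from_latest_valid(
        fake.have_restart, fake.start_restart, fake.complete_restart,
        lambda name: name != "ckpt.3",
    )
    print(loaded)
```

In a real application each stand-in would be replaced by the corresponding SCR call, and the loop would run after `SCR_Init`.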
- New API calls:
  - `SCR_Config(const char* config)` to set and query SCR configuration parameters before `SCR_Init()`
  - `SCR_Current(const char* name)` enables an application that reads its checkpoint without using the SCR Restart API to inform SCR about which checkpoint it loaded so that SCR can still track the proper ordering of checkpoints
  - `SCR_Delete(const char* name)` to ask SCR to delete a dataset
  - `SCR_Drop(const char* name)` to ask SCR to drop a dataset from the index without deleting the underlying data files
- New flush methods:
  - Added IBM BB API (https://github.com/IBM/CAST), e.g., `SCR_FLUSH_TYPE=BBAPI`
  - Added pthreads, e.g., `SCR_FLUSH_TYPE=PTHREAD`
- New redundancy scheme:
  - Reed-Solomon encoding (`SCR_COPY_TYPE=RS`) allows a configurable number of failures per group, from 1 to N-1 where N is the set size. Use `SCR_SET_SIZE` to specify the group size and `SCR_SET_FAILURES` to specify the number of failures per group.
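The allowed range for the failure count can be checked before a job is launched. The sketch below assumes nothing about SCR beyond the three parameter names and the 1 to N-1 rule stated above; `rs_settings` is a hypothetical helper, not an SCR function.

```python
def rs_settings(set_size, set_failures):
    """Build Reed-Solomon settings after checking 1 <= failures <= N-1.

    Hypothetical helper for illustration only; not part of SCR.
    """
    if set_size < 2:
        raise ValueError("set size must be at least 2")
    if not 1 <= set_failures <= set_size - 1:
        raise ValueError(
            f"failures per group must be between 1 and {set_size - 1}"
        )
    return {
        "SCR_COPY_TYPE": "RS",
        "SCR_SET_SIZE": str(set_size),
        "SCR_SET_FAILURES": str(set_failures),
    }


if __name__ == "__main__":
    # Example values only: groups of 8 that tolerate 2 failures each.
    for name, value in rs_settings(8, 2).items():
        print(f"{name}={value}")
```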
- SCR configuration parameters now support interpolation of environment variables in configuration files, e.g.,
  ```
  >>: cat .scrconf
  SCR_CACHE_BASE=$BBPATH
  ```
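To illustrate what interpolation means here, the following sketch expands `$NAME` references in `NAME=VALUE` lines using Python's `string.Template`. It is not SCR's parser: the one-setting-per-line format, the `#` comment handling, the choice to leave unknown variables untouched, and the example path `/mnt/bb` are all assumptions made for the illustration.

```python
import os
from string import Template


def parse_scrconf(text, env=None):
    """Parse NAME=VALUE lines, expanding $VAR references from env.

    Simplified illustration; SCR's real configuration parser may differ.
    """
    env = os.environ if env is None else env
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a NAME=VALUE line
        # safe_substitute leaves unknown variables untouched instead of raising.
        params[name.strip()] = Template(value.strip()).safe_substitute(env)
    return params


if __name__ == "__main__":
    conf = "SCR_CACHE_BASE=$BBPATH\n"
    print(parse_scrconf(conf, {"BBPATH": "/mnt/bb"}))
```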
- SCR now preserves file metadata including atime, mtime, uid, gid, and mode bits
- New logging options:
  - text file - written to the SCR prefix directory (`SCR_LOG_TXT_ENABLE=1`)
  - syslog - one can configure the syslog prefix, facility, and level to be used (`SCR_LOG_SYSLOG_ENABLE=1`)
- Apps can now configure SCR to maintain a sliding window of checkpoints on the parallel file system with an `SCR_PREFIX_SIZE` parameter. After flushing a new checkpoint, SCR deletes older checkpoints that fall outside the window
- Default cache and control directories have been moved from `/tmp` to `/dev/shm` on Linux systems
- Assists for application developers when integrating the SCR API
  - A new `SCR_CACHE_PURGE` parameter configures SCR to delete datasets from cache in new runs
  - A new `SCR_PREFIX_PURGE` parameter similarly deletes datasets from the prefix directory in new runs
  - Added internal checks to warn developers about incorrect API usage
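Besides the purge parameters, the prefix directory is also trimmed by the `SCR_PREFIX_SIZE` sliding window mentioned earlier. That behavior can be modeled in a few lines. The sketch below is a plain-Python model, not SCR code: `flush_with_window` is a hypothetical function, the checkpoint names are made up, and it only handles a window size of at least 1.

```python
def flush_with_window(checkpoints, new_name, window_size):
    """Model a sliding window of checkpoints on the parallel file system.

    checkpoints is ordered oldest to newest. Returns (kept, deleted)
    after new_name has been flushed. Hypothetical model; not SCR code.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    everything = list(checkpoints) + [new_name]
    # Anything older than the newest window_size entries is deleted.
    cut = max(0, len(everything) - window_size)
    return everything[cut:], everything[:cut]


if __name__ == "__main__":
    kept, deleted = flush_with_window(["ckpt.1", "ckpt.2", "ckpt.3"], "ckpt.4", 3)
    print("kept:", kept)
    print("deleted:", deleted)
```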
- Added Python bindings for SCR library (beta)
  - Implemented in an `scr.py` module (`import scr`)
  - Uses C Foreign Function Interface (CFFI) to wrap C functions in libscr
  - Supports Python 2 and 3
- Refactored code base to use ECP-VeloC components (https://github.com/ecp-veloc/)
  - Improves code modularity and reuse
  - Improved testing
- scr-top package (https://github.com/llnl/scr-top) includes source for SCR and its ECP-VeloC components to simplify direct installs, e.g.,
  ```
  tar -xzf scr-top-v3.0rc1.tgz
  cd scr-top-v3.0rc1
  mkdir build install
  cd build
  cmake -DCMAKE_INSTALL_PREFIX=../install -DSCR_RESOURCE_MANAGER=SLURM ../
  make install
  ```