Argobots 1.1 keeps ABI and API compatibility with Argobots 1.0 while adding several new features, optimizing the core Argobots implementation, and fixing bugs. The following summarizes the changes.
Changes in Argobots 1.1
New Features
Tool Interface for Debugging and Profiling
Argobots 1.1 exposes an interface for a tool to catch internal Argobots events such as thread creation, synchronization, and yielding. This interface can be used like MPI's PMPI or OpenMP's OMPT. ABTX_prof is a header-based profiler over this interface, which measures ULT-specific performance metrics such as the average execution time of each ULT, the number of created ULTs, and the number of yield operations. See ABTX_prof for details.
Stack Unwinding for Debugging
Argobots can be compiled with libunwind to enable a stack unwinding feature. This feature is especially useful when the user dumps Argobots information with ABT_info_trigger_print_all_thread_stacks(), which can be invoked in a signal handler.
Static Initializers for ABT_mutex and ABT_cond
ABT_mutex and ABT_cond support static initializers so that developers can easily port existing applications that are multithreaded with POSIX threads. These static initializers can also speed up the creation of ABT_mutex and ABT_cond since statically initialized ABT_mutex and ABT_cond objects need neither ABT_mutex/cond_create() nor ABT_mutex/cond_free().
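The POSIX pattern that this feature mirrors can be sketched as follows. The example uses only pthreads so that it builds without Argobots; counter_lock and counter_increment() are hypothetical names chosen for illustration. The Argobots 1.1 counterpart is believed to be an ABT_mutex_memory object initialized with ABT_MUTEX_INITIALIZER and converted to a handle with ABT_MUTEX_MEMORY_GET_HANDLE(); treat those names as an assumption and verify them against abt.h before porting.

```c
#include <assert.h>
#include <pthread.h>

/* Statically initialized lock: no pthread_mutex_init() at startup and no
 * pthread_mutex_destroy() at shutdown.  In an Argobots port, this
 * declaration is the line that would change (assumed replacement:
 * ABT_mutex_memory with ABT_MUTEX_INITIALIZER; check abt.h). */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

/* Hypothetical helper: increments a shared counter under the lock and
 * returns the new value. */
long counter_increment(void)
{
    long value;
    int ret;

    ret = pthread_mutex_lock(&counter_lock);
    assert(ret == 0);
    value = ++counter;
    ret = pthread_mutex_unlock(&counter_lock);
    assert(ret == 0);
    (void)ret; /* silences an unused-variable warning under NDEBUG */
    return value;
}
```

Because the lock is usable from the first call, code that relies on file-scope or function-scope static locks needs no extra initialization hook when it is ported.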
Extended Work Unit-Specific Data
Previously, work unit-specific data (similar to thread-local storage or TLS) was accessible only from its owner work unit. Argobots 1.1 allows the user to access work unit-specific data via ABT_thread or ABT_task handles, which makes it convenient to "attach" data to a work unit. The user can also register an optional destructor that is automatically called on ABT_thread_free() or ABT_task_free() to release the attached data.
New Utility Functions
Argobots 1.1 adds several new setter/getter functions that Argobots 1.0 lacks.
Extended Affinity Interface
Affinity plays an important role in high-performance computing. Argobots 1.1 extends the affinity interface to enable complex affinity settings via the ABT_SET_AFFINITY environment variable. The grammar is similar to that of OpenMP's OMP_PLACES. See this for details.
Note that Argobots 1.1 disables the affinity setting by default. --enable-affinity is needed to turn on the affinity feature.
Performance Optimization
Argobots 1.1 improves the performance of the following components in particular.
- Work Unit-Specific Data (ABT_key)
  - Argobots 1.1 significantly (10x or more) reduced the overheads of operations that access work unit-specific data by utilizing a unit-specific data cache and a redesigned hash table. See this PR for details.
- ULT Stack Pool
  - Argobots 1.1 improves the performance and the scalability of ULT stack allocation by adopting a bucket-based lock-free LIFO pool with a per-execution stream local cache. See this PR for details.
- Synchronization Objects over Tasklets and External Threads
  - Argobots 1.1 supports and optimizes synchronization operations called on a tasklet or an external thread (i.e., POSIX thread) by implementing them with either futex (on Linux systems) or pthread_cond_t (on non-Linux systems). Specifically, an external thread that waits on an Argobots synchronization object sleeps without spinning, similarly to pthread_cond_wait(). See this PR for details.
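The idea behind the stack pool item, a local cache that exchanges whole buckets with a shared LIFO, can be illustrated with a small single-threaded model. This is only a sketch and not the Argobots implementation: all names (global_pool_t, local_cache_t, pool_alloc(), pool_free(), BUCKET_SIZE) are hypothetical, and the shared LIFO is a plain linked list here, whereas a lock-free version would need atomic operations and ABA protection on its top pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BUCKET_SIZE 4                    /* stacks moved per shared access */
#define LOCAL_CAPACITY (2 * BUCKET_SIZE) /* size of the local cache */

typedef struct bucket {
    struct bucket *next;
    void *items[BUCKET_SIZE];
} bucket_t;

typedef struct {
    bucket_t *top;         /* LIFO of full buckets (not atomic in this model) */
    size_t num_global_ops; /* how often the shared pool was touched */
} global_pool_t;

typedef struct {
    global_pool_t *global;
    void *items[LOCAL_CAPACITY];
    size_t count;
} local_cache_t;

/* Take a stack from the local cache; refill one bucket from the shared
 * pool when the cache is empty, and fall back to malloc() when both are. */
void *pool_alloc(local_cache_t *lc, size_t stack_size)
{
    assert(lc != NULL);
    if (lc->count == 0) {
        bucket_t *b = lc->global->top;
        if (b == NULL)
            return malloc(stack_size);
        lc->global->top = b->next;
        lc->global->num_global_ops++;
        for (size_t i = 0; i < BUCKET_SIZE; i++)
            lc->items[lc->count++] = b->items[i];
        free(b);
    }
    return lc->items[--lc->count];
}

/* Return a stack to the local cache; flush one bucket to the shared pool
 * first when the cache is full. */
void pool_free(local_cache_t *lc, void *stack)
{
    assert(lc != NULL);
    if (lc->count == LOCAL_CAPACITY) {
        bucket_t *b = malloc(sizeof *b);
        if (b == NULL) { /* cannot build a bucket: release the stack */
            free(stack);
            return;
        }
        for (size_t i = 0; i < BUCKET_SIZE; i++)
            b->items[i] = lc->items[--lc->count];
        b->next = lc->global->top;
        lc->global->top = b;
        lc->global->num_global_ops++;
    }
    lc->items[lc->count++] = stack;
}

/* Allocate 16 stacks, free them, and allocate them again; returns how
 * many times the shared pool was accessed. */
size_t demo_global_ops(void)
{
    global_pool_t global = { NULL, 0 };
    local_cache_t lc = { &global, { NULL }, 0 };
    void *stacks[16];
    size_t i;

    for (i = 0; i < 16; i++) stacks[i] = pool_alloc(&lc, 1024);
    for (i = 0; i < 16; i++) pool_free(&lc, stacks[i]);
    for (i = 0; i < 16; i++) stacks[i] = pool_alloc(&lc, 1024);
    for (i = 0; i < 16; i++) free(stacks[i]);
    return global.num_global_ops;
}
```

In this model the 16 frees and 16 reallocations touch the shared pool only four times, because stacks move in buckets of four; every other operation stays in the local cache. That reduction in shared accesses is what makes a per-execution stream cache attractive for scalability.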
Better API Documentation
Argobots 1.1 enriched the API documentation, which clarifies the following.
- Which parallel entity can legally call a function (i.e., a ULT, a tasklet, or an external thread).
- What input causes an error.
- What error code is returned.
- What input causes undefined behavior.
More Supported Platforms
Argobots 1.1 officially supports the following compilers.
- GNU Compiler (gcc) (>= 4.8)
- Clang/LLVM (clang)
- Intel C Compiler (icc)
- IBM XL compiler (xlc) (>= 16.1.1)
- PGI compiler (pgcc) (>= 20.9)
Bug Fixes with Thorough Testing
Argobots 1.1 employed a new testing framework called "rtrace" to check for memory leaks not only in successful paths but also in failure paths. For example, ABT_init() internally calls 10-20 resource allocation functions (e.g., malloc() and mmap()). The rtrace library tests all the possible success/failure patterns to check that ABT_init() either succeeds or returns an error after freeing all the memory allocated during the initialization. We tested major Argobots functions and fixed bugs so that every Argobots routine either succeeds or returns an error without a side effect. See this for details.
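The kind of check described above can be sketched with a small fault-injecting allocator. This is not rtrace itself: traced_malloc(), traced_free(), resources_init(), and check_all_failure_patterns() are hypothetical names, and the toy initializer with three allocations only stands in for a real routine such as ABT_init(). Because the toy initializer stops at its first failed allocation, failing one allocation at a time covers all of its failure patterns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int g_fail_at = -1;    /* index of the allocation to fail; -1 = none */
static int g_alloc_calls = 0; /* allocations attempted so far */
static int g_live_blocks = 0; /* blocks allocated and not yet freed */

/* malloc() wrapper that fails on purpose at the selected call. */
static void *traced_malloc(size_t size)
{
    void *p;
    if (g_alloc_calls++ == g_fail_at)
        return NULL; /* injected failure */
    p = malloc(size);
    if (p != NULL)
        g_live_blocks++;
    return p;
}

/* free() wrapper that keeps the live-block count in sync. */
static void traced_free(void *p)
{
    if (p != NULL) {
        g_live_blocks--;
        free(p);
    }
}

typedef struct {
    void *a, *b, *c;
} resources_t;

static void resources_finalize(resources_t *r)
{
    traced_free(r->c);
    traced_free(r->b);
    traced_free(r->a);
    r->a = r->b = r->c = NULL;
}

/* Toy initializer: returns 0 on success, or -1 after releasing
 * everything it had allocated so far. */
static int resources_init(resources_t *r)
{
    r->a = r->b = r->c = NULL;
    if ((r->a = traced_malloc(64)) == NULL) goto fail;
    if ((r->b = traced_malloc(128)) == NULL) goto fail;
    if ((r->c = traced_malloc(256)) == NULL) goto fail;
    return 0;
fail:
    resources_finalize(r);
    return -1;
}

/* Runs the initializer once per failure position plus once with no
 * failure.  Returns the number of runs that leaked memory or returned
 * the wrong status; 0 means every path was clean. */
int check_all_failure_patterns(void)
{
    const int num_allocs = 3;
    int bad = 0;

    for (int fail_at = 0; fail_at <= num_allocs; fail_at++) {
        resources_t r;
        int ret;

        g_fail_at = fail_at; /* fail_at == num_allocs never triggers */
        g_alloc_calls = 0;
        g_live_blocks = 0;
        ret = resources_init(&r);
        if (fail_at < num_allocs) {
            /* Failure path: must report an error and leave no blocks. */
            if (ret == 0 || g_live_blocks != 0)
                bad++;
        } else {
            /* Success path: must succeed and be leak-free after finalize. */
            if (ret != 0) {
                bad++;
            } else {
                resources_finalize(&r);
                if (g_live_blocks != 0)
                    bad++;
            }
        }
    }
    g_fail_at = -1;
    return bad;
}
```

Calling check_all_failure_patterns() from a test driver and requiring a return value of 0 enforces the "succeeds or fails without a side effect" property for the toy initializer. Removing one traced_free() call from the cleanup path makes the check report a leak.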
We also started to check Argobots 1.1 with Coverity to ensure its software quality. In addition to Valgrind, Argobots 1.1 supports the GCC and Clang address sanitizers so that users can apply them to their programs that use Argobots.
Miscellaneous Changes
- The producer-consumer check of ABT_pool was removed because the check itself was broken. The user should not rely on an error returned by Argobots to check if the pool access is correct. Any user who is not sure should use an MPMC pool, which does not have this access concern at all.
- The work-unit migration target check was simplified, so some users might see some errors disappear. It becomes the user's responsibility to keep migration targets alive. Note that the check mechanism in Argobots 1.0 is not fully functional.
- The ABT_task type is now the same as the ABT_thread type. This should not cause any problem even if the user mixes ULTs and tasklets. However, it might cause a compilation issue where a user program relies on type information, for example, in a C++ template.
- ABT_unit_set_associated_pool() becomes optional. In Argobots 1.1, the unit-pool association is automatically updated by other routines such as ABT_xstream_run_unit() and ABT_pool_push().
- Other minor changes (e.g., passing num_waiters = 0 to ABT_barrier_create()) are described in the code documentation.
- The Argobots logo has been renewed. Please use the new one. For a research presentation, we would truly appreciate it if you could also cite our Argobots paper.