


What's new in Charm++ 8.0.0

  • The license has changed to Apache 2.0 with LLVM exception. This change to a popular open-source license is intended to simplify use and collaboration and to encourage greater community involvement in the development of Charm++. The NOTICE file contains the pertinent disclaimers.

  • The CkIO library (previously only supporting file output) has been enhanced to support file input. The input layer enables two-phase, collective input from a single file via an array of buffer chares, which read from the file system and buffer data until the application requests it. As of this release, the number of buffer chares is not automated by CkIO, and the user is responsible for selecting the input decomposition for best performance.

  • The OFI layer has been extended to support the CXI (Cassini) extensions for Slingshot-11. This can be enabled by adding the cxi parameter to the build line, and it allows for greatly improved performance at larger node counts on machines such as Frontier, Perlmutter, and Delta. Adding cxi as a build parameter is not necessary on most Slingshot-11 installations, as it will be autodetected.

  • Added support for NVIDIA's nvc/nvfortran compilers and Intel's new icx/ifx compilers.

  • Fixed a bug in location management when doing dynamic insertion and deletion of chare array elements.

  • Deprecated the "atomic" keyword in SDAG code (in favor of "serial").

  • Added support for TLSglobals on Apple ARM systems and the ability to disable TLSglobals support at build-time.

  • Performance optimizations for node group broadcasts.

  • Added CMI-SHMEM module for optimizing small/medium-sized messages between processes on the same host.

  • Fixed demand creation via CkCallback::send and enabled passing options to CkCallback::send.

  • Improved portability and usability of AMPI's automatic global variable privatization methods.

  • Transferred organization ownership of the repository on GitHub from UIUC-PPL to charmplusplus.
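
Because CkIO leaves the choice of input decomposition to the user in this release, it can help to reason about how a file splits across a chosen number of buffer chares. The sketch below is plain C++ and does not use the CkIO API. The names Stripe and stripe_for are hypothetical, and the even contiguous split is an assumption made for illustration, not a description of how CkIO assigns data internally.

```cpp
#include <cstdint>

// Hypothetical helper, not part of the CkIO API: the byte range that
// reader `index` (0-based) would own when `file_bytes` bytes are split
// into `num_readers` contiguous stripes.
// Preconditions: num_readers > 0 and index < num_readers.
struct Stripe {
  std::uint64_t offset;  // first byte owned by this reader
  std::uint64_t length;  // number of bytes owned by this reader
};

Stripe stripe_for(std::uint64_t file_bytes, std::uint64_t num_readers,
                  std::uint64_t index) {
  // Every reader gets `base` bytes; the first `extra` readers get one more,
  // so the stripes tile the file with no gaps and no overlap.
  const std::uint64_t base = file_bytes / num_readers;
  const std::uint64_t extra = file_bytes % num_readers;
  const std::uint64_t offset = index * base + (index < extra ? index : extra);
  const std::uint64_t length = base + (index < extra ? 1 : 0);
  return {offset, length};
}
```

With a 10-byte file and 4 readers, this yields stripes of 3, 3, 2, and 2 bytes starting at offsets 0, 3, 6, and 8. Comparing stripe sizes for a few candidate reader counts is one rough way to pick a decomposition before measuring actual read performance.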

What's new in Charm++ 7.0.0

    This is a feature release, with the following major changes:

  • Highlights:
    • A significant overhaul of the load balancing infrastructure and the addition of TreeLB, a new, more flexible and extensible base LB, intended to replace the previous CentralLB and HybridBaseLB designs (load balancers using these previous designs are still supported). Some included strategies have been rewritten to use TreeLB, and unused and extraneous strategies have been removed. Please see https://charm.readthedocs.io/en/latest/charm++/manual.html#load-balancing for more information.

    • Experimental support for intra-node GPU messages, allowing data to be sent to/from GPU memory without going through host memory. Please see https://charm.readthedocs.io/en/latest/charm++/manual.html#gpu-support for more information.

  • Misc:
    • Charm++ now builds with CMake by default when using the ./build script (requires CMake version 3.4 or higher). The old build system is still available by using ./buildold. Please see https://charm.readthedocs.io/en/latest/charm++/manual.html#installation-with-cmake for more information.

    • Charm++'s development branch has been renamed from 'master' to 'main'. Please see https://github.com/UIUC-PPL/charm/pull/3303 for details.

    • We have adopted GitHub discussions (https://github.com/UIUC-PPL/charm/discussions) as the preferred venue for Charm++ questions and discussions instead of the charm@cs.illinois.edu mailing list.

    • Support for Blue Gene/Q targets has been deprecated.

    • Charm++ now runs on the new Apple M1 chips with the targets: multicore-darwin-arm8, netlrts-darwin-arm8, and mpi-darwin-arm8.

    • Charm++ now runs on the new Cray Shasta machines with the targets (in beta): ofi-crayshasta and mpi-crayshasta.

    • The BigSim emulator has been removed from Charm++.

    • The following unmaintained machine layers have been removed from Charm++:

      • uth (Machine layer that uses user-level threads for execution)
      • sim (Machine layer that simulates a simple message-passing machine with communications processors; based on the Dagger simulator)
      • shmem (Machine layer built on top of the OpenSHMEM API)
    • The following modules have been removed: ARMCI, Jade, and Charj.

    • The 'mpi-linux-mips64' target has been removed.

  • Charm++ Features & Fixes:
    • We have renamed the VERSION file to charm-version.h and made it compatible with C. Charm++ programs that need to check the Charm++ version should use the CHARM_VERSION, CHARM_VERSION_MAJOR, CHARM_VERSION_MINOR, and CHARM_VERSION_PATCH C macros defined when compiling a Charm++ application. We also provide a CHARM_VERSION_GIT macro for the exact git revision of Charm++. In shell scripts, you can determine the version information like this: $ grep "CHARM_VERSION " charm-version.h | awk '{print $3}'.

    • Support for variable sized messages in TRAM.

    • Added a pup_buffer API with zero copy functionality.

    • Array broadcasts are now node aware, avoiding unnecessary duplication of messages. Expedited nokeep array broadcasts are also allowed now.

    • Added CmiNodeReduce API for Converse node level reductions.

    • In builds with CMK_OPTIMIZE=1, CmiAbort now deliberately segfaults (to aid debugging) only when ++debug or +truecrash is provided.

    • Callbacks used in liveViz have been made ASLR safe.

    • Updated implementation of atomics, locks, and fences to use C++11/C11 versions where suitable.

    • Fixed bugs in HAPI and updated implementation to use new CUDA APIs.

    • Added whenidle attribute to indicate entry method to be called when a PE is idle.

    • Improved performance, support, and fixes for UCX. IBM Power is now supported with UCX.

    • The CmiSyncSend family now tries to avoid copies for nokeep messages.

    • Fixed bug with element IDs with CMK_GLOBAL_LOCATION_UPDATE.

    • Fixed bug in BlockMap in array creation.

    • Added CcdPROCESSOR_LONG_IDLE to run a function during long periods of idleness and CcdSCHEDLOOP to run a function during every scheduler loop.

    • Added isCheckpoint() and isMigration() methods for users to condition logic inside PUP based on its purpose.

    • Added execution metadata to Projections logs.

    • Added several new benchmarks and tests.

    • Fixed bug where out-of-order migration updates may cause a hang.

  • Adaptive MPI:
    • Isomalloc sync is now enabled by default and the implementation no longer uses the filesystem to pass data. Isomalloc sync can be disabled with +no_isomalloc_sync.

    • Added an AMPI-only build target, AMPI-only. This build optimizes AMPI by disabling features of Charm++ that AMPI doesn't use.

    • Fixed tlsglobals migration on macOS.

    • Fixed bugs in activation of migration callbacks.

    • Fixed a bug in MPI_Waitsome.

    • Accept arguments from AMPI_BUILD_FLAGS environment variable for ampiCC.

    • Improved portability of fsglobals and pipglobals.
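
The version macros described under "Charm++ Features & Fixes" above can drive compile-time checks. The following is a minimal sketch that uses only CHARM_VERSION_MAJOR, CHARM_VERSION_MINOR, and CHARM_VERSION_PATCH. The helper charm_at_least is hypothetical, and the fallback #define values are stand-ins (an assumption) so that the sketch compiles without Charm++; in a real Charm++ build the macros are already defined and the fallback is skipped.

```cpp
// Stand-in values so this sketch compiles without Charm++ installed.
// These fallback numbers are assumptions for illustration only; when
// compiling a Charm++ application the real macros are already defined.
#ifndef CHARM_VERSION_MAJOR
#define CHARM_VERSION_MAJOR 7
#define CHARM_VERSION_MINOR 0
#define CHARM_VERSION_PATCH 0
#endif

// Hypothetical helper: true when the Charm++ version being compiled
// against is at least major.minor.patch (compared field by field).
constexpr bool charm_at_least(int major, int minor, int patch) {
  return CHARM_VERSION_MAJOR != major ? CHARM_VERSION_MAJOR > major
       : CHARM_VERSION_MINOR != minor ? CHARM_VERSION_MINOR > minor
       : CHARM_VERSION_PATCH >= patch;
}

// Fail the build early if the code is compiled against an older release.
static_assert(charm_at_least(7, 0, 0),
              "this sketch expects Charm++ 7.0.0 or newer");
```

Comparing the three fields separately avoids relying on how the combined CHARM_VERSION value is encoded.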

What's new in Charm++ 6.10.2

    This is a bugfix release, with the following minor changes:

  • Fixes:
    1. Verbs layer - Fixed memory leaks in acknowledgment handling for large message transfers.

    2. GNI layer - Fixed a minor issue related to freeing short messages sent while using the Zero copy API on gni-crayxe platforms.

    3. Fixed a memory leak in the copy based implementation of the Zero copy API impacting non-RDMA enabled layers like netlrts.

What's new in Charm++ 6.10.1

    This is a bugfix release, with the following minor changes:

  • Fixes:
    1. Fix verbs layer send completion errors on recent InfiniBand hardware/drivers.

    2. Avoid aborting with a segfault when calling CmiAbort in production builds.

What's new in Charm++ 6.10.0

    This is a feature release, with the following major changes:

  • Misc:
    1. Updated the license to clarify the restriction on commercial use of the software in the academic distribution.

    2. We have moved away from .tex in favor of .rst files to make building the documentation more portable. The documentation is now available at https://charm.readthedocs.io/ .

    3. We have moved bug/issue tracking from Redmine to GitHub, and code review from Gerrit to GitHub. Our GitHub repository is at: https://github.com/UIUC-PPL/charm .

    4. As a preview feature, Charm++ can now be built with CMake (version 3.4 or higher). To try it, you can replace your ./build command with ./buildcmake, which supports most of the options of ./build. The old build system is still available. Please see https://charm.readthedocs.io/en/latest/charm++/manual.html#installation-with-cmake for more information.

    5. Upcoming deprecation notice: The next release of Charm++ will feature a significant overhaul of the load balancing infrastructure. There will be changes to the process of selecting and using load balancers, writing custom load balancers, and the internals of the load balancing infrastructure. Programs that rely on custom load balancers or the internals of the LB infrastructure will likely require some changes for compatibility.

    6. Upcoming deprecation notice: The next release of Charm++ will remove the BigSim emulation facility from the runtime system.

  • Known Issues:
    1. Recent InfiniBand machines crash in SMP builds due to problems in the verbs layer implementation. Users are recommended to use UCX for the time being if possible. (https://github.com/UIUC-PPL/charm/issues/2532)

    2. UCX sometimes hangs/crashes on Frontera. (https://github.com/UIUC-PPL/charm/issues/2635, https://github.com/UIUC-PPL/charm/issues/2636)

  • Charm++ Features & Fixes:
    1. Support for a new Unified Communication X (UCX) networking backend in LRTS, thanks to Mellanox and Charmworks staff.

    2. The Zero Copy API now supports broadcast operations, and is used internally for transmission of large readonly objects during startup.

    3. Get and put operations, used in the Zero Copy Direct API, now return CkNcpyStatus::(in)complete for users to check for immediate completion as opposed to waiting for the completion callback.

    4. Addition of a new Zero Copy Post API, for avoiding the receive-side message copy. This can be used in both point-to-point and broadcast operations.

    5. Defined a new API, CkWithinNodeBroadcast, for broadcasting a message from a Group element to all other Group elements in the same process or logical node. If the target entry method is [nokeep], this API avoids making any copies of the message.

    6. Callbacks to [inline] entry methods are now executed inline by default. Previously, this was only done when the callback was constructed with an optional parameter.

    7. Eliminated the need for mainchares in user-driven interop mode by adding a new split-phase initialization API, fixed a bug in the interop exit sequence, and added support for using CkCallback::ckExit when using interop.

    8. Allocate pinned host memory pool for GPUs dynamically on demand, instead of statically at compilation time.

    9. Memory copy operations in GPUManager WorkRequest API are reverted to be asynchronous.

    10. Added an optional parameter for freeing the CkCallback object in GPUManager WorkRequest API.

    11. Fixed a bug in MetaLB and added tests for MetaLB.

    12. Fixed a bug in SDAG's code generation for forall statements with negative steps.

    13. TRAM and [aggregate] entry methods now support multi-dimensional chare arrays.

    14. Virtual inheritance from multiple PUPable base classes is now allowed.

    15. Support for PUPing C++11 random number engines and engine adaptors, as well as for PUPing templated abstract base classes.

    16. Section reductions are now optimized for streamable operations.

    17. Core dump files are now available for --with-production builds.

    18. Defined a new XI-Builder interface, a library front-end for XLAT-I's code generation.

    19. Fixes to the perfReport and memory tracemodes as well as record/replay in SMP mode, and improvements to PAPI-enabled builds.

    20. Due to being broken since before v6.8, mlog and causalft builds have been removed.

    21. Added a charmc option "-module-names" which prints the module names in a .ci file, one module name per line in the output.

    22. charmrun implements ++no-* for flag-type parameters. For example, ++no-scalable-start. Also fixed use of ++scalable-start and ++batch together.

    23. Performance measurement programs from the tests and examples directories have been recategorized into a new "benchmarks" directory.

    24. Charm++ can now be built with -std=c++17, and all eligible C files in the Charm++ runtime have been transitioned to compile as C++.

    25. Support for mpi-win-x86_64-gcc builds.

    26. Various improvements to Charm4Py, such as a new sections implementation, are described in the charm4py repository on GitHub.

    27. The CmiAbort and CkAbort functions now support printf-style format strings. Please make sure to replace '%' with '%%' in the argument string to print a '%'.

  • Adaptive MPI:
    1. AMPI now uses Charm++'s Zero Copy API to transfer large messages efficiently using RDMA and CMA wherever possible and profitable.

    2. More efficient implementations of MPI_Bcast, all MPI_(I)(all)gather(v) routines, reductions with non-commutative operations, and user-defined datatype creation.

    3. Added support for MPI_Win_(un)lock_all and MPI_Type_match_size.

    4. Fixes to MPI_Mrecv, MPI_Info_dup, and MPI_BOTTOM error handling.

    5. Stubs for MPI functions currently unimplemented in AMPI are now provided to allow more MPI codes to build. These emit -Wdeprecated-declarations diagnostics when used.

    6. AMPI's mpif.h is now compilable in line-extended fixed format.

    7. TLSglobals now works on Mac OS.

    8. Two new global variable privatization methods have been added, Process-in-Process Globals (pipglobals) and Filesystem Globals (fsglobals).

    9. AMPI's nm_globals.sh script now works on both Linux and Mac OS and provides more useful output for identifying writable global/static variables.

    10. Fixed AMPI's CUDA support, with the AMPI+CUDA example now working as expected.
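
The note above about printf-style format strings in CmiAbort and CkAbort means that a literal '%' in the message must be written as '%%'. The sketch below shows that rule using only the C++ standard library. format_like_abort is a hypothetical stand-in that formats a message the way printf would and returns it instead of aborting, and escape_percent is a hypothetical helper for text that is not under the caller's control. Neither is part of Charm++.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a printf-style abort call: it formats the
// message as printf would, but returns the text instead of terminating.
// Messages longer than the buffer are truncated.
std::string format_like_abort(const char* fmt, ...) {
  char buf[256];
  va_list args;
  va_start(args, fmt);
  std::vsnprintf(buf, sizeof buf, fmt, args);
  va_end(args);
  return std::string(buf);
}

// Doubles every '%' so that arbitrary text can be used as a format
// string and still be printed verbatim.
std::string escape_percent(const std::string& text) {
  std::string out;
  out.reserve(text.size());
  for (char c : text) {
    out.push_back(c);
    if (c == '%') out.push_back('%');
  }
  return out;
}
```

For example, format_like_abort("step %d: %d%% done", 3, 50) produces "step 3: 50% done". As with printf, passing untrusted text as an argument to a "%s" format is the usual alternative to escaping it.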

What's new in Charm++ 6.9.0

This is a feature release, with the following major additions:

  • Highlights
    1. Charm++ now requires C++11 and better supports use of modern C++ in applications.
    2. New "Zero Copy" messaging APIs for more efficient communication of large arrays.
    3. charm4py provides a new Python interface to Charm++, without the need for .ci files.
    4. AMPI performance, standard compliance, and usability improvements.
    5. GPU Manager improvements for asynchronous offloading and a new CUDA-like API (HAPI).
  • Charm++ Features & Fixes
    1. Added new, more intuitive process launching commands based on hwloc support, such as '++processPer{Host,Socket,Core,PU} [num]' and '++oneWthPer{Host,Socket,Core,PU}'. Also added a '++autoProvision' option, which by default uses all hardware resources available to the job.
    2. Added a new 'zero copy' direct API which allows users to communicate large message buffers directly via RDMA on networks that support it, avoiding any intermediate buffering of data between the sender and the receiver. The API is also optimized for shared memory.
    3. A new Python interface to Charm++, named charm4py, is now available for Python users. More documentation on it can be found here: https://charm4py.readthedocs.io
    4. Charmxi now supports r-value references, std::array, std::tuple, the 'typename' keyword, parameter packs, variadic templates, array indices with template parameters, and attributes on explicit instantiations of templated entry methods.
    5. Projections traces of templated entry methods now display demangled template type names.
    6. [local] and [inline] entry method attributes now work for templated entry methods and now support perfect forwarding of their arguments.
    7. Added various type traits for generic programming with Charm++ entities inside charm++_type_traits.h
    8. Chare array index types are now exposed as 'array_index_t'.
    9. Support for default arguments to Group entry methods.
    10. Charm++ now throws a runtime error when a user calls an SDAG entry method containing a 'when' clause directly, without calling it via a proxy.
    11. Users can now pass std::vector's directly to contribute() rather than passing the size and buffer address separately. Cross-array section reduction contributions can now take a callback.
    12. Added a simplified STL-based interface for section creation.
    13. Added PUP support for C++ enums, for std::deque and std::forward_list, for STL containers of objects with abstract base classes, and for avoiding default construction during unpacking by defining a constructor that takes a value of type PUP::reconstruct.
    14. Improved performance for PUP of STL containers of arithmetic types and types declared as PUPbytes.
    15. Allow setting queueing type and priorities on entry methods with no parameters.
    16. Enable setting Group and Node Group dependencies on all types of entry methods and constructors, as well as multiple dependencies.
    17. Support for model-based runtime load balancing strategy selection via MetaLB. This can be enabled with +MetaLBModelDir [path-to-model] used alongside +MetaLB option. A trained model can be found in charm/src/ck-ldb/rf_model.
    18. A new lock-free producer-consumer queue implementation has been added as a build option '--enable-lockless-queue' for LRTS's within-process messaging queues in SMP mode.
    19. CkLoop now supports lambda syntax, adds a Hybrid mode that combines static scheduling with dynamic work stealing, and adds Drone mode support in which chares are mapped to rank 0 on each logical node so that other PEs can act as drones to execute tasks.
    20. Updated our integrated LLVM OpenMP runtime to support more OpenMP directives.
    21. Updated f90charm interface for more functionality and usability, and fixed all example programs.
    22. The Infiniband 'verbs' communication layer now automatically selects the fastest active Infiniband device and port at startup.
    23. Fixed '-tracemode utilization', tracing of user-level threads, and nested local/inline methods.
    24. Fixed a performance bug introduced in v6.8.0 for dynamic location management.
    25. Added support for using Boost's lightweight uFcontext user-level threads, now the default ULT implementation on most platforms.
    26. '++debug' now works using lldb on Mac (Darwin) systems.
    27. CkAbort() is now marked with the C++ attribute [[noreturn]].
    28. CkExit() now takes an optional integer argument which is returned from the program's exit.
    29. Improved error checking throughout, and fixes to race conditions during startup.
  • AMPI Changes
    1. Improved performance of point-to-point message matching and reduced per-rank memory footprint.
    2. Fixes to derived datatypes handling, MPI_Sendrecv_replace, MPI_(I)Alltoall{v,w}, MPI_(I)Scatter(v), MPI_IN_PLACE in gather collectives, MPI_Buffer_detach, MPI_Type_free, MPI_Op_free, and MPI_Comm_free.
    3. Implemented support for generalized requests, MPI_Comm_create_group, keyval attribute callbacks, the distributed graph virtual topology, large count routines, matched probe and recv, and MPI_Comm_idup(_with_info) routines.
    4. Added support for using -tlsglobals for privatization of global/static variables in shared objects. Previously -tlsglobals required static linking.
    5. '-memory os-isomalloc', which uses the system's malloc underneath, now works everywhere Isomalloc does. Both versions of Isomalloc now wrap calls to posix_memalign(), and we removed the need to link with '-Wl,--allow-multiple-definition' on some systems.
    6. Updated AMPI_Migrate() with built-in MPI_Info objects, such as AMPI_INFO_LB_SYNC.
    7. AMPI now only renames the user's MPI calls from MPI_* to AMPI_* if Charm++/AMPI is built on top of another MPI implementation for its communication substrate.
    8. Support for compiling mpif.h in both fixed form and free form.
    9. PMPI profiling interface support added.
    10. Added an ampirun script that wraps charmrun to enable easier integration with build and test scripts that take mpirun/mpiexec as an option.
  • GPU Manager Changes
    1. Enable concurrent kernel execution by removing the limit imposed by the internal implementation that used only three streams.
    2. New API (Hybrid API, or HAPI) that is more similar to the CUDA API.
    3. Added NVIDIA NVTX support for profiling host-side functions.
    4. Deprecated the workRequest API. New users are now strongly recommended to use the new API, or Hybrid API (HAPI).
  • Build System Changes
    1. Charm++ now requires C++11 support, and as such defaults to using bgclang on BGQ. Compilers GCC v4.8+, ICC v15.0+, XLC v13.1+, Cray CC v8.6+, MSVC v19.00.24+ and Clang v3.3+ are required.
    2. Building Charm++ from the git repository now requires autoconf and automake.
    3. Support for the Flang Fortran compiler added.
    4. Users can now specify compiler versions to our top-level build script when building with gcc or clang.
    5. Windows users can now build Charm++ with GCC, Clang, or MSVC.
    6. All of Charm++ and AMPI can now be built as shared objects.
    7. Added a CMake wrapper for compiling .ci files.
    8. Charm++ is now available in Spack under the name 'charmpp'.
    9. Added {pamilrts,mpi,multicore,netlrts}-linux-ppc64le build targets for new IBM POWER systems.
    10. Added {multicore,netlrts}-linux-arm8 build targets for AArch64 / ARM64 systems.
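
One practical effect of marking CkAbort() as [[noreturn]] (item 27 under "Charm++ Features & Fixes" above) is that the compiler can tell that code after an abort call is unreachable. The sketch below illustrates this in plain C++. fatal is a hypothetical stand-in for an abort routine and checked_divide is an illustrative caller; neither is a Charm++ function.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for an abort routine marked [[noreturn]].
[[noreturn]] void fatal(const char* message) {
  std::fprintf(stderr, "fatal: %s\n", message);
  std::abort();
}

// Because fatal() is [[noreturn]], the compiler knows control cannot fall
// off the end of this function after the error path, so no dummy return
// value is needed there.
int checked_divide(int numerator, int denominator) {
  if (denominator != 0) {
    return numerator / denominator;
  }
  fatal("division by zero");
}
```

Without the attribute, a function shaped like checked_divide would typically need an unreachable return statement to silence "missing return" diagnostics.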

The code can be found in our Git repository as tag 'v6.9.0' or in a tarball

What's new in Charm++ 6.8.2

This is a backwards-compatible patch/bug-fix release, containing just a few changes. The primary improvements are:

  • Fix a crash in SMP builds on the OFI network layer
  • Improve performance of the PAMI network layer on POWER8 systems by adjusting message-size thresholds for different protocols

The code can be found in our Git repository as tag 'v6.8.2' or in a tarball

What's new in Charm++ 6.8.1

This is a backwards-compatible patch/bug-fix release. Roughly 100 bug fixes, improvements, and cleanups have been applied across the entire system. Notable changes are described below:

  • General System Improvements
    1. Enable network- and node-topology-aware trees for group and chare array reductions and broadcasts

    2. Add a message receive 'fast path' for quicker array element lookup

    3. Feature #1434: Optimize degenerate CkLoop cases

    4. Fix a rare race condition in Quiescence Detection that could allow it to fire prematurely (bug #1658)

      • Thanks to Nikhil Jain (LLNL) and Karthik Senthil for isolating this in the Quicksilver proxy application
    5. Fix various LB bugs

      1. Fix RefineSwapLB to properly handle non-migratable objects
      2. GreedyRefine: improvements for concurrent=false and HybridLB integration
      3. Bug #1649: NullLB shouldn't wait for LB period
    6. Fix Projections tracing bug #1437: CkLoop work traces to the previous entry on the PE rather than to the caller

    7. Modify [aggregate] entry method (TRAM) support to only deliver PE-local messages inline for [inline]-annotated methods. This avoids the potential for excessively deep recursion that could overrun thread stacks.

    8. Fix various compilation warnings

  • Platform Support
    1. Improve experimental support for PAMI network layer on POWER8 Linux platforms

      • Thanks to Sameer Kumar of IBM for contributing these patches
    2. Add an experimental 'ofi' network layer to run on Intel Omni-Path hardware using libfabric

      • Thanks to Yohann Burette and Mikhail Shiryaev of Intel for contributing this new network layer
    3. The GNI network layer (used on Cray XC/XK/XE systems) now respects the ++quiet command line argument during startup

  • AMPI Improvements
    1. Support for in-place collectives and persistent requests

    2. Improved Alltoall(v,w) implementations

    3. AMPI now passes all MPICH-3.2 tests for groups, virtual topologies, and infos

    4. Fixed Isomalloc to not leave behind mapped memory when migrating off a PE

The complete list of issues that have been merged/resolved in 6.8.1 can be found here.

What's new in 6.8.0

Over 900 commits (bugfixes + improvements + cleanups) have been applied across the entire system. Major changes are described below:

  • Charm++ Features
    1. Calls to entry methods taking a single fixed-size parameter can now automatically be aggregated and routed through the TRAM library by marking them with the [aggregate] attribute.
    2. Calls to parameter-marshalled entry methods with large array arguments can ask for asynchronous zero-copy send behavior with a 'nocopy' tag in the parameter's declaration.
    3. The runtime system now integrates an OpenMP runtime library so that code using OpenMP parallelism will dispatch work to idle worker threads within the Charm++ process.
    4. Applications can ask the runtime system to perform automatic high-level end-of-run performance analysis by linking with the '-tracemode perfReport' option.
    5. Added a new dynamic remapping/load-balancing strategy, GreedyRefineLB, that offers high result quality and well bounded execution time.
    6. Improved and expanded topology-aware spanning tree generation strategies, including support for runs on a torus with holes, such as Blue Waters and other Cray XE/XK systems.
    7. Charm++ programs can now define their own main() function, rather than using a generated implementation from a mainmodule/mainchare combination. This extends the existing Charm++/MPI interoperation feature.
    8. Improvements to Sections:
      1. Array sections API has been simplified, with array sections being automatically delegated to CkMulticastMgr (the most efficient implementation in Charm++). Changes are reflected in Chapter 14 of the manual.
      2. Group sections can now be delegated to CkMulticastMgr (improved performance compared to default implementation). Note that they have to be manually delegated. Documentation is in Chapter 14 of Charm++ manual.
      3. Group section reductions are now supported for delegated sections via CkMulticastMgr.
      4. Improved performance of section creation in CkMulticastMgr.
      5. CkMulticastMgr uses the improved spanning tree strategies. See above.
    9. GPU manager now creates one instance per OS process and scales the pre-allocated memory pool size according to the GPU memory size and number of GPU manager instances on a physical node.
    10. Several GPU Manager API changes including:
      1. Replaced references to global variables in the GPU manager API with calls to functions.
      2. The user is no longer required to specify a bufferID in dataInfo struct.
      3. Replaced calls to kernelSelect with direct invocation of functions passed via the work request object (allows CUDA to be built with all programs).
    11. Added support for malleable jobs that can dynamically shrink and expand the set of compute nodes hosting Charm++ processes.
    12. Greatly expanded and improved reduction operations:
      1. Added built-in reductions for all logical and bitwise operations on integer and boolean input.
      2. Reductions over groups and chare arrays that apply commutative, associative operations (e.g. MIN, MAX, SUM, AND, OR, XOR) are now processed in a streaming fashion. This reduces the memory footprint of reductions. User-defined reductions can opt into this mode as well.
      3. Added a new 'Tuple' reducer that allows combining multiple reductions of different input data and operations from a common set of source objects to a single target callback.
      4. Added a new 'Summary Statistics' reducer that provides count, mean, and standard deviation using a numerically-stable streaming algorithm.
    13. Added a '++quiet' option to suppress charmrun and charm++ non-error messages at startup.
    14. Calls to chare array element entry methods with the [inline] tag now avoid copying their arguments when the called method takes its parameters by const&, offering a substantial reduction in overhead in those cases.
    15. Synchronous entry methods that block until completion (marked with the [sync] attribute) can now return any type that defines a PUP method, rather than only message types.
  • AMPI Features
    1. More efficient implementations of message matching infrastructure, multiple completion routines, and all varieties of reductions and gathers.
    2. Support for user-defined non-commutative reductions, MPI_BOTTOM, cancelling receive requests, MPI_THREAD_FUNNELED, PSCW synchronization for RMA, and more.
    3. Fixes to AMPI's extensions for load balancing and to Isomalloc on SMP builds.
    4. More robust derived datatype support, optimizations for truly contiguous types.
    5. ROMIO is now built on AMPI and linked in by ampicc by default.
    6. A version of HDF5 v1.10.1 that builds and runs on AMPI with virtualization is now available at https://charm.cs.illinois.edu/gerrit/#/admin/projects/hdf5-ampi
    7. Improved support for performance analysis and visualization with Projections.
  • Platforms and Portability
    1. The runtime system code now requires compiler support for C++11 R-value references and move constructors. This is not expected to be incompatible with any currently supported compilers.
    2. The next feature release (anticipated to be 6.9.0 or 7.0) will require full C++11 support from the compiler and standard library.
    3. Added support for IBM POWER8 systems with the PAMI communication API, such as development/test platforms for the upcoming Sierra and Summit supercomputers at LLNL and ORNL. Contributed by Sameer Kumar of IBM.
    4. Mac OS (darwin) builds now default to the modern libc++ standard library instead of the older libstdc++.
    5. Blue Gene/Q build targets have been added for the 'bgclang' compiler.
    6. Charm++ can now be built on Cray's CCE 8.5.4+.
    7. Charm++ will now build without custom configuration on Arch Linux.
    8. Charmrun can automatically detect rank and node count from Slurm/srun environment variables.
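
The 'Summary Statistics' reducer listed above is described as using a numerically stable streaming algorithm. The release note does not say which one, so the sketch below is an assumption: it uses Welford's update for single samples and the Chan et al. formula for merging partial results, which is one common way to compute count, mean, and standard deviation in a reduction. Summary and merge are hypothetical names, and the choice of population (rather than sample) standard deviation is also an assumption. This is not the Charm++ reducer's interface.

```cpp
#include <cmath>
#include <cstdint>

// Running summary of a stream of samples.
struct Summary {
  std::uint64_t count = 0;
  double mean = 0.0;
  double m2 = 0.0;  // sum of squared deviations from the current mean

  // Welford's update: fold one sample into the running summary.
  void add(double x) {
    ++count;
    const double delta = x - mean;
    mean += delta / static_cast<double>(count);
    m2 += delta * (x - mean);
  }

  // Population standard deviation (divides by count, not count - 1).
  double stddev() const {
    return count > 0 ? std::sqrt(m2 / static_cast<double>(count)) : 0.0;
  }
};

// Combine two partial summaries, as a reduction tree would when merging
// contributions that arrive from different objects.
Summary merge(const Summary& a, const Summary& b) {
  if (a.count == 0) return b;
  if (b.count == 0) return a;
  Summary out;
  out.count = a.count + b.count;
  const double na = static_cast<double>(a.count);
  const double nb = static_cast<double>(b.count);
  const double n = na + nb;
  const double delta = b.mean - a.mean;
  out.mean = a.mean + delta * nb / n;
  out.m2 = a.m2 + b.m2 + delta * delta * na * nb / n;
  return out;
}
```

Because merge only needs the three fields of each partial summary, intermediate results can be combined as they arrive without buffering the raw samples, which matches the streaming behavior described for reductions in this release.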

The complete list of issues that have been merged/resolved in 6.8.0 can be found here. The associated git commits can be viewed here.

6.7.1

Changes in this release are primarily bug fixes for 6.7.0. The major exception is AMPI. A brief list of changes follows:

  • Charm++ Bug Fixes
    1. Startup and exit sequences are more robust
    2. Error and warning messages are generally more informative
    3. CkMulticast's set and concat reducers work correctly
  • Adaptive MPI Features
    1. AMPI's extensions have been renamed to use the prefix 'AMPI_' and to follow MPI's naming conventions
    2. AMPI_Migrate(MPI_Info) is now used for both dynamic load balancing and all fault tolerance schemes
    3. AMPI now officially supports MPI-2.2, and has support for MPI-3.1's nonblocking and neighborhood collectives
  • Platforms and Portability
    1. Cray regularpages build has been fixed
    2. Clang compiler target for Blue Gene/Q systems added
    3. Communication thread tracing for SMP mode added
    4. AMPI compiler wrappers are easier to use with autoconf and cmake

The complete list of issues that have been merged/resolved in 6.7.1 can be found here. The associated git commits can be viewed here.

6.7.0

This release contains the following significant changes over version 6.6.1:

  • Features
    1. New API for efficient formula-based distributed sparse array creation.
    2. Missing MPI-2.0 API additions to AMPI.
    3. Out-of-tree build is now supported.
    4. New target: multicore-linux-arm7
    5. PXSHM auto detects the node size.
    6. Added support for ++mpiexec with poe.
    7. Add new API related to migration in AMPI.
    8. CkLoop is now built by default.
    9. Scalable startup is now the default behavior when launching a job using charmrun.

    This release also includes over 120 bug fixes spanning the entire system. The major fixes are listed below:

  • Bug Fixes
    1. Bug fix to handle CUDA threads correctly at exit.
    2. Bug fix in the recovery code on a node failure.
    3. Bug fixes in AMPI functions - MPI_Comm_create, MPI_Testall.
    4. Disable ASLR on Darwin builds to fix multi-node executions.
    5. Add flags to enable compilation of Charm++ on newer Cray compilers with C++11 support.
  • Deprecations and Deletions
    1. CommLib has been deleted.
    2. +nodesize option for PXSHM is deprecated
    3. CmiBool has been dropped in favor of C++'s bool.
    4. CBase_Foo::pup need not be called from Foo::pup.

The complete list of issues that have been merged/resolved in 6.7.0 can be found here. The associated git commits can be viewed here.

6.6.1

Changes in this release are primarily bug fixes for 6.6.0. A concise list of affected components follows:

  1. CkIO
  2. Reductions with syncFT
  3. mpicxx based MPI builds
  4. Increased support for macros in CI file
  5. GNI + RDMA related communication
  6. MPI_STATUSES_IGNORE support for AMPIF
  7. Restart on different node count with chkpt
  8. Immediate msgs on multicore builds

A complete listing of features added and bugs fixed can be seen in our issue tracker here.

6.6.0

  • Machine target files for Cray XC systems ('gni-crayxc') have been added
  • Interoperability with MPI code using native communication interfaces on Blue Gene/Q (PAMI) and Cray XE/XK/XC (uGNI) systems, in addition to the universal MPI communication interface
  • Support for partitioned jobs on all machine types, including TCP/IP and IB Verbs networks using 'netlrts' and 'verbs' machine layers
  • A substantially improved version of our asynchronous library, CkIO, for parallel output of large files
  • Narrowing the circumstances in which the runtime system will send overhead-inducing ReductionStarting messages
  • A new fully distributed load balancing strategy, DistributedLB, that produces high quality results with very low latency
  • An API for applications to feed custom per-object data to specialized load balancing strategies (e.g. physical simulation coordinates)
  • SMP builds on LRTS-based machine layers (pamilrts, gni, mpi, netlrts, verbs) support tracing messages through communication threads
  • Thread affinity mapping with +pemap now supports Intel's Hyperthreading more conveniently
  • After restarting from a checkpoint, thread affinity will use new +pemap/+commap arguments
  • Queue order randomization options were added to assist in debugging race conditions in application and runtime code
  • The full runtime code and associated libraries can now compile under the C11 and C++11/14 standards.
  • Numerous bug fixes, performance enhancements, and smaller improvements in the provided runtime facilities
  • Deprecations
    • The long-unsupported FEM library has been deprecated in favor of ParFUM
    • The CmiBool typedefs have been deleted, as C++ bool has long been universal
    • Future versions of the runtime system and libraries will require some degree of support for C++11 features from compilers

The latest development version of Charm++ can be downloaded directly from our source archive. The Git version control system is used, which is available from here.

This development version may not be as portable or robust as the released versions. Therefore, it may be prudent to keep a backup of old copies of Charm++.

  1. Check out the latest development version of Charm++ from the repository:

    $ git clone https://github.com/charmplusplus/charm

  2. This will create a directory named charm. Move to this directory:

    $ cd charm

    To obtain the current stable release, 8.0.0, check out the v8.0.0 tag:

    $ git checkout v8.0.0
  3. And now build Charm (netlrts-linux example):

    $ ./build charm++ netlrts-linux-x86_64 [ --with-production | -g ]

This creates a netlrts-linux-x86_64 directory containing bin, include, lib, and other subdirectories.
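The bracketed options in the build line are read here as alternatives: pass one of them, or neither. A minimal sketch of choosing between them follows. The function name charm_build_cmd and the mode names production and debug are hypothetical. The function only prints the command so that it can be inspected before it is run.

```shell
# Hypothetical helper: print the build line from step 3 for a chosen mode.
# Assumption: "production" maps to --with-production (optimized build) and
# "debug" maps to -g (debugging symbols); "plain" passes neither option.
charm_build_cmd() {
  case "$1" in
    production) flag=" --with-production" ;;
    debug)      flag=" -g" ;;
    plain)      flag="" ;;
    *) echo "usage: charm_build_cmd production|debug|plain" >&2; return 1 ;;
  esac
  printf './build charm++ netlrts-linux-x86_64%s\n' "$flag"
}

charm_build_cmd production
```

From inside the charm directory, the printed line can be run as is, for example with sh -c "$(charm_build_cmd debug)".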

The latest development version of Projections can be downloaded directly from our source archive. The Git version control system is used, which is available from here. To build Projections, you will also need gradle.

  1. Check out Projections from the repository:

    $ git clone https://github.com/charmplusplus/projections

  2. This will create a directory named projections. Move to this directory:

    $ cd projections

  3. And now build Projections:

    $ make
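Before running the Projections steps, it can help to confirm that the tools they rely on (git, gradle, and make) are on the PATH. A minimal sketch, where the helper name missing_tools is hypothetical:

```shell
# Hypothetical helper: print the name of each listed tool that cannot be
# found on PATH. Prints nothing when every tool is available.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Tools named in the Projections instructions.
missing_tools git gradle make
```

Empty output means all three tools were found. Any name that is printed needs to be installed before the build is attempted.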

The latest development version of Charm Debug can be downloaded directly from our source archive. The Git version control system is used, which is available from here.

  1. Check out Charm Debug from the repository:

    $ git clone https://github.com/UIUC-PPL/ccs_tools

  2. This will create a directory named ccs_tools. Move to this directory:

    $ cd ccs_tools

  3. And now build Charm Debug:

    $ ant