--- 1.2 ---
New Features:
- Distributed Data Structures: array (qarray), queue (qdqueue), memory pool
  (qpool)
- Lock-free Data Structures: queue (qlfqueue)
- CPU Affinity for shepherds/threads on most systems (not OSX 10.4)
- Machine topology information can be queried
  - qthread_num_shepherds()
  - qthread_distance()
  - qthread_sorted_sheps()/_remote()
- qthread_init(0) creates 1 shepherd per location
- Portable CAS
- future_fork_to()
- qthread_cacheline() returns the machine's cache line size in bytes
Improvements:
- Portability improvements
- Several bugs in queue management fixed
- Workarounds for bad compiler management of volatile operations
- Avoid ABA problems in lock-free memory pools
- Better memory barriers on Sparc
- Better valgrind support
- Removed dependency on external cprops library by depending on C++ hash maps

--- 1.1.20090123 ---
New Features:
- Can now force 64-bit alignment
- Can now migrate threads between shepherds
- More atomic increments (floating point)
- Aligned_t qsort()
- C++ template wrappers to standard FEB functions
- Environment variable controls stack size
- SST support
- High-resolution timers for profiling
Improvements:
- Lock-free internal memory pooling
- Lock-free thread queueing
- Better Apple PPC64/PPC/x86/x86_64 support
- Better Sparc support
- Better architecture detection
- Documentation fixes
- Detects more makecontext() irregularities
- Better C++ portability/compatibility
- Better compatibility with unusual compilers
- Better test cases
- Uses compiler-builtin atomic operations when available
- Better 64-bit support
- Real serial mode
- Faster qt_loop_balance synchronization
- Fixed atomic increment volatility declaration
- Thread counting also checks hash stripe balance

--- 1.0 ---
New Features:
- Added error handling.
Improvements:
- Reorganized osx_compat stuff
- Released under BSD OSS license with Sandia's blessing!
- Fixed some portability issues with increment functions.

--- 0.8 ---
New Features:
- Added qloop functions to provide precisely balanced loop spawning. Still
  relatively primitive, but the interface allows for bare-minimum overhead
  (in terms of context-swaps and memory footprint) with more advanced
  scheduling.
- Configurable setrlimit and atomic increment use
Improvements:
- Atomic increments are more widely used throughout library.
- Added qutil documentation
- Improved futurelib documentation
- Added workaround for GCC's broken gcse
- Bugfixes and portability improvements on architectures/environments where
  atomic increments are unsupported
- Minor improvements to qalloc (from Vitus)
- Eliminated memory imbalancing by reintroducing locks. (Basic testing shows
  it does not dramatically increase overhead.)

--- 0.7 ---
New Features:
- qthread_incr() can be used for atomic increments

--- 0.6 ---
Improvements:
- Threads now use pthread thread-local (TSD) memory instead of doing lookups
  into bottleneck data-structures
- Futurelib and qthreads are now more tightly coupled, which obviates some
  bottleneck data-structures
- Removed locks when spawning qthreads and/or futures from a qthread (locks
  are now only needed in special cases)
- qthread_lock() and qthread_unlock() are more parallel
- Better futurelib documentation (see README)

--- 0.5 ---
New Features:
- Threads can have return values, which obey FEB semantics
- qthread_stackleft() returns the number of bytes left in the stack (with
  some inaccuracy)
Improvements:
- Functions are now anonymous, and there is no major distinction between
  detached and undetached threads
- Removed qthread_join(); use qthread_readFF on a thread's return value
  instead
- FutureLib uses behavior-templates rather than type-templates
Bugfixes:
- Corrected a race condition in the FEB handling that could lead to deadlock

--- 0.4 ---
New Features:
- man pages for all major functions
- added qthread_feb_status()
Improvements:
- changed the qthread_f prototype to be easier to use

--- 0.3 ---
New Features:
- added qthread_writeF(), which has the same arguments and effect as writeEF,
  but does not block
- added qthread_prepare(), qthread_schedule(), and associated functions to
  decouple thread creation from thread scheduling
Improvements:
- added information about compiling with the PGI compiler
- added support for "make check" to test most of the major functionality in
  the library
- made qthread_fork() (and related functions) significantly faster if called
  from within a qthread by making it possible to avoid using mutexes to
  protect memory pools
- qthread_shep() may now take a NULL if you don't have a qthread_t handy
  (qthread_shep(NULL) is faster than qthread_shep(qthread_self()))
Bug fixes:
- corrected the behavior of qthread_readFF() and readFE() (they were
  dereferencing things too many times)
- qthread_unlock() will now function correctly if unlocking something that's
  already unlocked (it could get into a deadlock situation before, because it
  wasn't cleaning up after itself)
- fixed a typo in qalloc_dynmalloc() that could cause deadlock on some
  architectures
- corrected memory pooling to eliminate assertion failures