Release Notes for Intel(R) Xeon Phi(TM) processor x200 software for Linux* Release
Release Version: 1.5.4
Wed 12/27/2017

NOTE: This document refers to systems containing the following Intel(R) products:
      Intel(R) Many Integrated Core (MIC) Architecture,
      Intel(R) Xeon Phi(TM) Processor X200 Product Family

*Other names and brands may be claimed as the property of others

DISCLAIMER: Intel is making no claims of usability, efficacy or warranty. The license.txt contained herein completely defines the license and use of this software except in the cases of the GPL components. This document contains information on products in the design phase of development. The information here is subject to change without notice. Do not finalize a design with this information. The code contained in these modules may be specific to the Intel product line Intel(R) Xeon Phi(TM) Processor X200 Product Family and is not backward compatible with other Intel products. Additionally, Intel makes no commitments for support of the code or instruction set in future products.

Table Of Contents
  1. Changes
  2. Product Known Issues
  3. Software Known Issues
  4. Resolved Issues

1. Changes

2. Product Known Issues

-------------------------------------------------------------------------
Change number: 0000001
Component: OS
Description: “Cluster 0/4” in SNC4 does not get full bandwidth even for local accesses
Impact: The OS puts idle threads in mwait. In SNC4, the addresses for these OS mwaits were being assigned such that all of them (288, one per thread) were mapped into the tag directories in cluster 0. This overwhelmed the mechanism architected to handle mwait wakeup, causing significant slowdowns for memory traffic routed through the tag directories in cluster 0. This behavior is not seen in Quadrant and All-to-all modes because the mwait addresses are distributed to all tag directories, and therefore the mwait mechanisms do not get overwhelmed.
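The imbalance described in the Impact above can be pictured with a toy model. This is only a sketch: the even split of threads across clusters and the idea that each mwait line is tracked by exactly one cluster are illustrative assumptions, and the function names are hypothetical. It does not reproduce the actual address-to-tag-directory mapping of the hardware.

```python
from collections import Counter

NUM_CLUSTERS = 4     # SNC4 exposes four clusters
NUM_THREADS = 288    # one OS mwait line per thread, as stated in the text

def cluster_of_thread(thread_id):
    """Hypothetical placement: threads are split evenly across clusters."""
    return thread_id * NUM_CLUSTERS // NUM_THREADS

def mwait_lines_per_cluster(home_cluster_of):
    """Count how many mwait lines each cluster's tag directories must track."""
    counts = Counter({c: 0 for c in range(NUM_CLUSTERS)})
    for t in range(NUM_THREADS):
        counts[home_cluster_of(t)] += 1
    return [counts[c] for c in range(NUM_CLUSTERS)]

# Problem case: every mwait line lands in cluster 0.
concentrated = mwait_lines_per_cluster(lambda t: 0)
# Idealized uniform distribution (assumption: each line is homed
# in the cluster of the thread that waits on it).
distributed = mwait_lines_per_cluster(cluster_of_thread)

print(concentrated)  # [288, 0, 0, 0]
print(distributed)   # [72, 72, 72, 72]
```

In the concentrated case, one cluster handles all 288 monitored lines while the other three handle none, which is the kind of hot spot the Impact text describes.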
References: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/kernel/fork.c?id=725fc629ff2545b061407305ae51016c9f928fce
Mitigation: An OS patch has been released that distributes the OS mwait lines more uniformly over all clusters in SNC4, preventing this issue.

-------------------------------------------------------------------------
Change number: 0000002
Component: OS
Description: In SNC4, traffic targeting only remote clusters shows much lower bandwidth than traffic targeting only local clusters
Impact: The processor uses YX routing to send messages on the mesh. In SNC4 mode, when all traffic targets a remote cluster, the traffic (both address and data) flows predominantly in one direction: for example, in the same Y direction and the same X direction to reach the remote cluster that is diagonally across. This largely unidirectional flow cuts down the effective bandwidth of the mesh and causes congestion that results in further slowdowns. Mesh resources are provisioned for a uniform distribution of traffic, such as that seen in Quadrant and All-to-all modes and for local traffic in SNC4 mode. They are not provisioned for predominantly remote traffic in SNC4 mode.
Mitigation: There is no HW or SW fix for this. Usage models need to avoid drawing most of their traffic from remote clusters in SNC4 mode and should keep the traffic local. If the access pattern is such that large amounts of remote traffic cannot be avoided, consider using Quadrant mode.

-------------------------------------------------------------------------
Change number: 0000003
Component: OS
Description: Application performance can degrade over time when MCDRAM is used in cache mode
Impact: The direct-mapped nature of MCDRAM-as-cache is by design. Real applications that need more than 16GB of memory will not exhibit this time-based degradation; it is only a problem for benchmarking when running smaller datasets.
Mitigation: Workaround is to reboot the node.
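To illustrate why a direct-mapped cache can behave differently after a node has been running for a while, here is a minimal sketch. These notes do not spell out the mechanism, so the sketch rests on assumptions: a dataset that fits in the cache is conflict-free when its physical pages are contiguous, and over time the OS hands out scattered pages that can alias to the same cache slot. The sizes are scaled down and hypothetical, and the function names are invented for this example.

```python
import random

def cache_slot(page_frame, cache_pages):
    """Direct-mapped placement: a page frame can live in exactly one slot."""
    return page_frame % cache_pages

def count_conflicts(page_frames, cache_pages):
    """Count dataset pages whose slot is already claimed by another dataset page."""
    claimed = set()
    conflicts = 0
    for frame in page_frames:
        slot = cache_slot(frame, cache_pages)
        if slot in claimed:
            conflicts += 1
        claimed.add(slot)
    return conflicts

# Hypothetical, scaled-down sizes: the cache holds 1024 pages, memory holds
# 6144 pages, and the dataset (512 pages) is half the size of the cache.
CACHE_PAGES, MEMORY_PAGES, DATASET_PAGES = 1024, 6144, 512

# Assumption: soon after boot, pages are handed out contiguously.
contiguous = list(range(DATASET_PAGES))
# Assumption: after long uptime, free pages are scattered across memory.
rng = random.Random(0)
scattered = rng.sample(range(MEMORY_PAGES), DATASET_PAGES)

print(count_conflicts(contiguous, CACHE_PAGES))  # 0
print(count_conflicts(scattered, CACHE_PAGES))
```

Under these assumptions the contiguous layout has no aliasing pages, while the scattered layout has some even though the dataset is smaller than the cache. A dataset larger than the cache would alias in either layout, which is consistent with the note that larger applications do not show a time-based change.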
Refer to section 8.1 in the xppsl user guide for an additional workaround.

3. Software Known Issues

4. Resolved Issues

XPPSM-1043 [OOF] COIProxy on host cannot be destroyed when the COIProcess create fails
XPPSM-1101 The KMP addons package obsoletes itself on SLES
XPPSM-1104 Incorrect CPU version returned by CPUID