opafabricanalysis [-b|-e] [-s] [-d dir] [-c file] [-t portsfile] [-p ports] [-T topology_input]
-p ports  Specifies the list of local HFI ports used to access fabrics for analysis.
opafabricanalysis
opafabricanalysis -p '1:1 1:2 2:1 2:2'
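The ports argument is a whitespace-separated list of hfi:port pairs, as in '1:1 1:2 2:1 2:2'. The sketch below shows one way a wrapper script might validate such a list before invoking the tool. parse_ports is a hypothetical helper, not part of FastFabric, and the fallback to 0:0 for an empty list is an assumption based on the 0:0 default port selection described for generated file names.

```python
from typing import List, Tuple


def parse_ports(ports: str) -> List[Tuple[int, int]]:
    """Split a -p style ports string, e.g. '1:1 1:2 2:1 2:2', into
    (hfi, port) integer pairs. Hypothetical helper, not part of FastFabric."""
    pairs: List[Tuple[int, int]] = []
    for token in ports.split():
        hfi, sep, port = token.partition(":")
        # Each entry must look like <hfi>:<port> with non-negative integers.
        if not sep or not hfi.isdigit() or not port.isdigit():
            raise ValueError(f"expected hfi:port, got {token!r}")
        pairs.append((int(hfi), int(port)))
    # Assumption: an empty list falls back to the 0:0 default selection.
    return pairs or [(0, 0)]


print(parse_ports("1:1 1:2 2:1 2:2"))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```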
The fabric analysis tool checks the following:
The following environment variables are also used by this command:
If the specified port does not exist or is empty, the first active port on the local system is used. In more complex configurations, you must specify the exact ports to use for all fabrics to be analyzed.
You can specify the topology_input file to be used with one of the following methods:
On the command line using the -T option.
Through the environment variable FF_TOPOLOGY_FILE.
Through the ff_topology_file configuration option in opafastfabric.conf.
If the specified file does not exist, no topology_input file is used. Alternatively, specify the filename as NONE to prevent use of an input file.
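The rules for choosing a topology_input file can be sketched as a small resolver. This is an illustration under stated assumptions, not the tool's implementation: resolve_topology_input is a hypothetical name, and the precedence order (-T first, then FF_TOPOLOGY_FILE, then opafastfabric.conf) is assumed from the order in which the methods are listed. The NONE and missing-file behaviors follow the text. The exists parameter is injectable so the logic can be checked without real files.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_topology_input(
    cli_value: Optional[str],
    env: Mapping[str, str],
    conf_value: Optional[str],
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Return the topology_input path to use, or None for no input file.
    Hypothetical helper; the precedence order below is an assumption."""
    # Assumed precedence: -T option, then FF_TOPOLOGY_FILE, then opafastfabric.conf.
    chosen = next(
        (v for v in (cli_value, env.get("FF_TOPOLOGY_FILE"), conf_value) if v),
        None,
    )
    # NONE explicitly disables the input file.
    if chosen is None or chosen == "NONE":
        return None
    # A file that does not exist means no topology_input file is used.
    return chosen if exists(chosen) else None
```

A caller would typically pass os.environ as env and the value read from opafastfabric.conf as conf_value.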
For more information on topology_input, refer to opareport.
By default, the error analysis includes PMA counters and slow links (that is, links running below their enabled speeds). You can change this with the FF_FABRIC_HEALTH configuration parameter in opafastfabric.conf, which specifies the opareport options and reports used for the health analysis. It can also specify the PMA counter clearing behavior (-I seconds, -C, or none at all).
When a topology_input file is used, it can also be useful to extend FF_FABRIC_HEALTH to include fabric topology verification options such as -o verifylinks.
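The sketch below illustrates how an FF_FABRIC_HEALTH option string could be inspected for its clearing behavior and extended with -o verifylinks when a topology_input file is in use. Both functions are hypothetical helpers; they only recognize the -I seconds, -C, and -o forms named on this page, and the example string "-s -C -o errors -o slowlinks" is taken from the opareport invocation listed with the error checks.

```python
import shlex
from typing import Optional, Tuple


def clearing_behavior(health: str) -> Tuple[str, Optional[int]]:
    """Classify the PMA counter clearing option in an FF_FABRIC_HEALTH string:
    ('interval', seconds) for -I seconds, ('clear', None) for -C,
    or ('none', None) when neither appears. Hypothetical helper."""
    args = shlex.split(health)
    for i, arg in enumerate(args):
        if arg == "-I" and i + 1 < len(args):
            return ("interval", int(args[i + 1]))
        if arg == "-C":
            return ("clear", None)
    return ("none", None)


def with_verifylinks(health: str) -> str:
    """Append '-o verifylinks' unless the string already requests it."""
    args = shlex.split(health)
    # Look for the adjacent pair "-o verifylinks" anywhere in the options.
    if ("-o", "verifylinks") in zip(args, args[1:]):
        return health
    return f"{health} -o verifylinks".strip()


print(clearing_behavior("-s -C -o errors -o slowlinks"))  # ('clear', None)
print(with_verifylinks("-s -C -o errors -o slowlinks"))
```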
By default, the thresholds for PMA counter analysis are read from /etc/opa/opamon.conf. You can specify an alternate threshold configuration file with the -c option. For example, the opamon.si.conf file can be used to check for any non-zero values in the signal integrity (SI) counters.
All files generated by opafabricanalysis start with fabric in their file name, followed by the port selection identifying the port used for the analysis (default 0:0).
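The naming convention can be expressed as a one-line helper, which is handy when a script needs to locate the output for a given port. analysis_file is hypothetical, and only the snapshot.xml suffix is confirmed by the file list on this page; other suffixes would need to be checked against a real FF_ANALYSIS_DIR.

```python
def analysis_file(subdir: str, port: str = "0:0", suffix: str = "snapshot.xml") -> str:
    """Build a path of the form <subdir>/fabric.<port>.<suffix>.
    Hypothetical helper; only the snapshot.xml suffix is confirmed here."""
    return f"{subdir}/fabric.{port}.{suffix}"


print(analysis_file("baseline"))       # baseline/fabric.0:0.snapshot.xml
print(analysis_file("latest", "1:2"))  # latest/fabric.1:2.snapshot.xml
```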
The opafabricanalysis tool generates files such as the following within FF_ANALYSIS_DIR:
Baseline
During a baseline run, the following files are also created in FF_ANALYSIS_DIR/latest.
baseline/fabric.0:0.snapshot.xml
opareport snapshot of complete fabric components and SMA configuration.
opareport summary of fabric components and basic SMA configuration.
opareport summary of internal and external links.
latest/fabric.0:0.snapshot.xml
opareport snapshot of complete fabric components and SMA configuration.
stderr of opareport during snapshot.
stdout of opareport for errors encountered during fabric error analysis.
stderr of opareport during fabric error analysis.
stdout of opareport for fabric components and SMA configuration.
stderr of opareport for fabric components.
diff of baseline and latest fabric components.
stdout of opareport summary of internal and external links.
stderr of opareport summary of internal and external links.
diff of baseline and latest fabric internal and external links.
stderr of opareport comparison of links.
opareport comparison of links against baseline. This is typically easier to read than the links.diff file and contains the same information.
stderr of opareport comparison of components.
opareport comparison of components against baseline. This is typically easier to read than the comps.diff file and contains the same information.
The .diff and .changes files are only created if differences are detected.
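The rule that a .diff file exists only when differences are found can be illustrated with Python's difflib. This is a sketch of the idea, not how opafabricanalysis produces its diffs; diff_if_changed and its name parameter are hypothetical, and the baseline/ and latest/ labels only mirror the directory layout described above.

```python
import difflib
from typing import List, Optional


def diff_if_changed(baseline: List[str], latest: List[str], name: str) -> Optional[str]:
    """Return unified diff text, or None when baseline and latest match.
    Hypothetical helper: a caller would write a .diff file only for non-None results."""
    lines = list(
        difflib.unified_diff(
            baseline,
            latest,
            fromfile=f"baseline/{name}",
            tofile=f"latest/{name}",
            lineterm="",
        )
    )
    return "\n".join(lines) if lines else None
```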
If the -s option is used and failures are detected, files related to the failed checks are also copied to a time-stamped directory under FF_ANALYSIS_DIR.
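The -s behavior can be sketched as "plan the copies, then perform them". Everything here is an assumption for illustration: the function names are hypothetical, and the caller supplies the timestamp string because the tool's actual directory-name format is not given on this page. Separating the planning step keeps the path logic testable without touching the filesystem.

```python
import shutil
from pathlib import Path, PurePosixPath
from typing import Iterable, List, Tuple


def plan_failure_copies(
    analysis_dir: str, failed_files: Iterable[str], stamp: str
) -> Tuple[PurePosixPath, List[Tuple[PurePosixPath, PurePosixPath]]]:
    """Compute the time-stamped destination and (source, destination) pairs
    without touching the filesystem. Hypothetical helper."""
    dest = PurePosixPath(analysis_dir) / stamp
    return dest, [(PurePosixPath(f), dest / PurePosixPath(f).name) for f in failed_files]


def copy_failures(analysis_dir: str, failed_files: Iterable[str], stamp: str) -> Path:
    """Create the time-stamped directory and copy each failed-check file into it."""
    dest, pairs = plan_failure_copies(analysis_dir, failed_files, stamp)
    Path(dest).mkdir(parents=True, exist_ok=True)
    for src, dst in pairs:
        shutil.copy2(src, dst)  # copy2 also preserves file timestamps
    return Path(dest)
```

A caller might build stamp with time.strftime and pass the value of FF_ANALYSIS_DIR as analysis_dir.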
Based on opareport -o links:
Unconnected/down/missing cables
Added/moved cables
Changes in link width and speed
Changes to Node GUIDs in fabric (replacement of HFI or Switch hardware)
Adding/Removing Nodes [FIs, Virtual FIs, Virtual Switches, Physical Switches, Physical Switch internal switching cards (leaf/spine)]
Changes to server or switch names
Based on opareport -o comps:
Overlap with items from links report
Changes in port MTU, LMC, number of VLs
Changes in port speed/width enabled or supported
Changes in HFI or switch device IDs/revisions/VendorID (for example, ASIC hardware changes)
Changes in port Capability mask (which features/agents run on port/server)
Changes to ErrorLimits and PKey enforcement per port
Changes to IOUs/IOCs/IOC Services provided
Location (port, node) and number of SMs in fabric. Includes:
Based on opareport -s -C -o errors -o slowlinks: