Tuesday, July 12, 2016

Everything you wanted to know about Intel OmniPath Host Fabric Interface

What is Intel Omni-Path?
 
Intel Omni-Path Architecture (OPA) is the technology behind Intel's push into high-speed networking for the HPC market.


The Hardware: Host Fabric Interface (HFI)
 
Once the Omni-Path HFI is installed in a Linux system, it shows up in lspci as:

[root@sjsc-xxx ~]# lspci | grep -i Omni-Path
82:00.0 Fabric controller: Intel Corporation Omni-Path HFI Silicon 100 Series [discrete] (rev 10)


For a detailed view of the device's PCIe configuration (capabilities, negotiated link speed and width):

[root@sjsc-xxx ~]# lspci -vvv -s 82:00.0
82:00.0 Fabric controller: Intel Corporation Omni-Path HFI Silicon 100 Series [discrete] (rev 10)
    Subsystem: Intel Corporation Omni-Path HFI Silicon 100 Series [discrete]
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 11
    Region 0: Memory at ec000000 (64-bit, non-prefetchable) [size=64M]
    Expansion ROM at f0000000 [disabled] [size=128K]
    Capabilities: [40] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [70] Express (v2) Endpoint, MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <8us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
        DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 256 bytes, MaxReadReq 4096 bytes
        DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
        LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L0s <4us, L1 <64us
            ClockPM- Surprise- LLActRep- BwNot-
        LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported
        DevCtl2: Completion Timeout: 4s to 13s, TimeoutDis-, LTR-, OBFF Disabled
        LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance De-emphasis: -6dB
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
            EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
    Capabilities: [b0] MSI-X: Enable- Count=256 Masked-
        Vector table: BAR=0 offset=00100000
        PBA: BAR=0 offset=00110000
    Capabilities: [100 v2] Advanced Error Reporting
        UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
        UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt+ UnxCmplt+ RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-
        CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
        AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
    Capabilities: [148 v1] #19
    Capabilities: [178 v1] Transaction Processing Hints
        Device specific mode supported
        No steering table available

[root@sjsc-xxx ~]# 
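One detail in this output is worth checking: LnkCap says the card can run at 8GT/s x16 (PCIe Gen3), but LnkSta shows the link actually trained at 5GT/s x16 (Gen2), and LnkCtl2 shows a 5GT/s target. The sketch below, which assumes the lspci text format shown above, extracts both lines and estimates raw per-direction bandwidth after line encoding only. By that rough estimate Gen2 x16 gives about 64 Gb/s, below the 100 Gb/s port rate of a 100 Series HFI, while Gen3 x16 gives about 126 Gb/s. Real throughput is lower still because of PCIe protocol overhead.

```python
import re

# LnkCap/LnkSta lines copied from the `lspci -vvv -s 82:00.0` output above.
SAMPLE = """
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L0s <4us, L1 <64us
LnkSta: Speed 5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
"""

# "[^,]*" tolerates annotations such as "(downgraded)" printed by newer lspci.
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s[^,]*, Width x(\d+)")


def parse_link(lspci_text):
    """Map 'LnkCap' / 'LnkSta' to (speed in GT/s, number of lanes)."""
    return {m.group(1): (float(m.group(2)), int(m.group(3)))
            for m in LINK_RE.finditer(lspci_text)}


def raw_bandwidth_gbps(gt_per_s, lanes):
    """Per-direction bandwidth after line encoding only; ignores TLP/DLLP overhead.

    2.5 and 5 GT/s (Gen1/Gen2) use 8b/10b; 8 and 16 GT/s (Gen3/Gen4) use 128b/130b.
    """
    efficiency = 8 / 10 if gt_per_s <= 5.0 else 128 / 130
    return gt_per_s * lanes * efficiency


if __name__ == "__main__":
    link = parse_link(SAMPLE)
    for key in ("LnkCap", "LnkSta"):
        speed, width = link[key]
        print(f"{key}: {speed:g} GT/s x{width} ~ {raw_bandwidth_gbps(speed, width):.1f} Gb/s")
    if link["LnkSta"] != link["LnkCap"]:
        print("warning: PCIe link is running below its capability")
```

Feeding the full `lspci -vvv` text works too, since LnkSta2 and LnkCtl2 lines do not match the pattern.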



The Software Stack
Omni-Path drivers can be downloaded from the Intel Download Center website:
https://downloadcenter.intel.com/product/92007/Intel-Omni-Path-Host-Fabric-Interface-Adapter-100-Series-1-Port-PCIe-x16

The Intel Omni-Path Fabric Software package, which includes the HFI driver, is available at:
https://downloadcenter.intel.com/download/26064/Intel-Omni-Path-Fabric-Software-Including-Intel-Omni-Path-Host-Fabric-Interface-Driver-?product=92007

The driver package contains the hfi1 driver along with the firmware for the HFI.

Once the low-level driver and firmware are loaded and an Omni-Path-compatible cable is connected, the port comes up to the ACTIVE state. The kernel log shows the logical port state moving from INIT to ARMED to ACTIVE:

[root@localhost ~]#


[   12.964126] hfi1 0000:82:00.0: hfi1_0: set_link_state: current INIT, new ARMED
[   12.964131] hfi1 0000:82:00.0: hfi1_0: logical state changed to PORT_ARMED (0x3)
[   12.964134] hfi1 0000:82:00.0: hfi1_0: send_idle_message: sending idle message 0x103
[   12.964212] hfi1 0000:82:00.0: hfi1_0: read_idle_message: read idle message 0x103
[   12.964215] hfi1 0000:82:00.0: hfi1_0: handle_sma_message: SMA message 0x1
[   12.964681] hfi1 0000:82:00.0: hfi1_0: set_link_state: current ARMED, new ACTIVE
[   12.964684] hfi1 0000:82:00.0: hfi1_0: logical state changed to PORT_ACTIVE (0x4)
[   12.964697] hfi1 0000:82:00.0: hfi1_0: send_idle_message: sending idle message 0x203
[   12.965279] hfi1 0000:82:00.0: hfi1_0: read_idle_message: read idle message 0x203
[   12.965281] hfi1 0000:82:00.0: hfi1_0: handle_sma_message: SMA message 0x2
[   16.143492] hfi1 0000:82:00.0: hfi1_0: Switching to NO_DMA_RTAIL
[root@localhost ~]#
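When checking many nodes, it can help to pull these transitions out of saved dmesg text automatically. The sketch below assumes the exact message wording of this hfi1 version (`set_link_state: current X, new Y` and `logical state changed to NAME (0xN)`); other driver releases may log differently.

```python
import re

# A few hfi1 kernel messages copied from the log above.
SAMPLE = """
[   12.964126] hfi1 0000:82:00.0: hfi1_0: set_link_state: current INIT, new ARMED
[   12.964131] hfi1 0000:82:00.0: hfi1_0: logical state changed to PORT_ARMED (0x3)
[   12.964681] hfi1 0000:82:00.0: hfi1_0: set_link_state: current ARMED, new ACTIVE
[   12.964684] hfi1 0000:82:00.0: hfi1_0: logical state changed to PORT_ACTIVE (0x4)
"""

TRANSITION_RE = re.compile(r"(hfi1_\d+): set_link_state: current (\w+), new (\w+)")
LOGICAL_RE = re.compile(r"(hfi1_\d+): logical state changed to (\w+) \((0x[0-9a-fA-F]+)\)")


def link_transitions(dmesg_text):
    """Return (unit, old_state, new_state) tuples in log order."""
    return TRANSITION_RE.findall(dmesg_text)


def final_logical_states(dmesg_text):
    """Return {unit: (state_name, state_code)} from the last message per unit."""
    states = {}
    for unit, name, code in LOGICAL_RE.findall(dmesg_text):
        states[unit] = (name, int(code, 16))  # later messages overwrite earlier ones
    return states


if __name__ == "__main__":
    for unit, old, new in link_transitions(SAMPLE):
        print(f"{unit}: {old} -> {new}")
    print(final_logical_states(SAMPLE))
```

On a live system the RDMA core typically also reports the current port state in /sys/class/infiniband/hfi1_0/ports/1/state.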

For Omni-Path to work, the HFI port must be connected either to an Omni-Path switch or directly to an HFI in another system (point-to-point).

Then the IPoIB interface, ib0, shows up in the ifconfig output:

[root@sjsc-xxx ~]# ifconfig ib0
ib0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 2044
       inet 172.18.51.69  netmask 255.255.224.0  broadcast 172.18.63.255
Infiniband hardware address can be incorrect! Please read BUGS section in ifconfig(8).
       infiniband 80:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  txqueuelen 256  (InfiniBand)
       RX packets 0  bytes 0 (0.0 B)
       RX errors 0  dropped 0  overruns 0  frame 0
       TX packets 0  bytes 0 (0.0 B)
       TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[root@sjsc-xxx ~]#
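Two fields in this output can be decoded by hand. flags=4099 is the decimal interface flag bitmask: 4099 = 0x1003 = UP (0x1) | BROADCAST (0x2) | MULTICAST (0x1000). RUNNING (0x40) is not set, which suggests the lower link was not operational at the moment this output was captured, consistent with the zero RX/TX counters. The IPoIB hardware address is 20 bytes (per RFC 4391: a reserved/flags octet, a 3-octet queue pair number, and a 16-octet GID). As far as I know, the BUGS section of ifconfig(8) explains that ifconfig only displays the first 8 bytes correctly, which would explain the all-zero interface ID here; `ip link show ib0` prints the full address. The sketch below decodes both and covers only a subset of the Linux flag bits.

```python
# Standard Linux interface flag bits (from <linux/if.h>); a subset only.
IFF_FLAGS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x8: "LOOPBACK",
    0x10: "POINTOPOINT",
    0x40: "RUNNING",
    0x80: "NOARP",
    0x100: "PROMISC",
    0x1000: "MULTICAST",
}


def decode_flags(value):
    """Return the names of known flag bits set in an ifconfig 'flags=' value."""
    return [name for bit, name in sorted(IFF_FLAGS.items()) if value & bit]


def split_ipoib_hwaddr(hwaddr):
    """Split a 20-byte IPoIB link-layer address (layout per RFC 4391)."""
    raw = bytes.fromhex(hwaddr.replace(":", ""))
    if len(raw) != 20:
        raise ValueError(f"expected 20 bytes, got {len(raw)}")
    return {
        "flags": raw[0],                         # reserved / flags octet
        "qpn": int.from_bytes(raw[1:4], "big"),  # 24-bit queue pair number
        "gid_prefix": raw[4:12].hex(),           # GID subnet prefix
        "gid_guid": raw[12:20].hex(),            # GID interface ID (port GUID)
    }


if __name__ == "__main__":
    print(decode_flags(4099))
    addr = "80:00:00:02:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00"
    print(split_ipoib_hwaddr(addr))
```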

 

The driver and its dependencies:

[root@sjsc-xxx ~]# lsmod |grep hfi
hfi1                  655730  1
ib_mad                 47817  4 hfi1,ib_cm,ib_sa,ib_umad
ib_core                98787  11 hfi1,rdma_cm,ib_cm,ib_sa,iw_cm,ib_mad,ib_ucm,ib_umad,ib_uverbs,ib_ipoib
[root@sjsc-xxx ~]#
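`lsmod | grep hfi` matches the ib_mad and ib_core lines only because hfi1 appears in their "Used by" column, and that column is exactly how lsmod records that hfi1 depends on them. This agrees with the `depends: ib_core,ib_mad` field in the modinfo output further down. A small parser, assuming the column layout shown here, makes the relationship explicit:

```python
# `lsmod | grep hfi` output copied from above (columns: Module, Size, Used by).
SAMPLE = """
hfi1                  655730  1
ib_mad                 47817  4 hfi1,ib_cm,ib_sa,ib_umad
ib_core                98787  11 hfi1,rdma_cm,ib_cm,ib_sa,iw_cm,ib_mad,ib_ucm,ib_umad,ib_uverbs,ib_ipoib
"""


def parse_lsmod(text):
    """Return {module: (size_bytes, use_count, [user modules])}."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[1].isdigit():
            continue  # skip blank lines, the header row and shell prompts
        users = [u for u in fields[3].split(",") if u] if len(fields) > 3 else []
        modules[fields[0]] = (int(fields[1]), int(fields[2]), users)
    return modules


def dependencies_of(modules, name):
    """Modules listing `name` in their 'Used by' column, i.e. modules `name` needs."""
    return sorted(mod for mod, (_, _, users) in modules.items() if name in users)


if __name__ == "__main__":
    mods = parse_lsmod(SAMPLE)
    print(dependencies_of(mods, "hfi1"))
```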



[root@sjsc-xxx ~]# cat /etc/sysconfig/network-scripts/ifcfg-ib0
DEVICE=ib0
NAME="Infiniband ib0"
TYPE=InfiniBand
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=static
PREFIX=19
IPADDR=172.18.51.69
[root@sjsc-xxx ~]#
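The ifcfg file stores only IPADDR and PREFIX=19; the netmask 255.255.224.0 and broadcast 172.18.63.255 that ifconfig reports are derived from them. Python's standard ipaddress module can confirm the arithmetic. The ifcfg parser here is a simplified assumption for illustration: it handles plain KEY=value lines with optional double quotes, not full shell syntax.

```python
import ipaddress

# Contents of ifcfg-ib0 as shown above.
IFCFG = '''
DEVICE=ib0
NAME="Infiniband ib0"
TYPE=InfiniBand
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=static
PREFIX=19
IPADDR=172.18.51.69
'''


def parse_ifcfg(text):
    """Simplified KEY=value parser: strips double quotes, ignores comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        cfg[key] = value.strip('"')
    return cfg


def describe(ipaddr, prefix):
    """Derive the values ifconfig reports from IPADDR and PREFIX."""
    iface = ipaddress.ip_interface(f"{ipaddr}/{prefix}")
    net = iface.network
    return {
        "netmask": str(iface.netmask),
        "network": str(net),
        "broadcast": str(net.broadcast_address),
        "usable_hosts": net.num_addresses - 2,
    }


if __name__ == "__main__":
    cfg = parse_ifcfg(IFCFG)
    print(describe(cfg["IPADDR"], cfg["PREFIX"]))
```

A /19 leaves 13 host bits, so the subnet 172.18.32.0/19 spans 8192 addresses.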

[root@sjsc-xxx ~]# ethtool  ib0
Settings for ib0:
No data available
[root@sjsc-xxx ~]#



The HFI firmware versions reported by the driver:
[root@sjsc-xxx ~]# dmesg |grep -i firmware |grep hfi |grep version

[   21.337504] hfi1 0000:82:00.0: hfi1_0: 8051 firmware version 0.38
[   21.344356] hfi1 0000:82:00.0: hfi1_0: Lane 0 firmware: version 0x1055, prod_id 0x0041
[   21.353209] hfi1 0000:82:00.0: hfi1_0: Lane 1 firmware: version 0x1055, prod_id 0x0041
[   21.362050] hfi1 0000:82:00.0: hfi1_0: Lane 2 firmware: version 0x1055, prod_id 0x0041
[   21.370900] hfi1 0000:82:00.0: hfi1_0: Lane 3 firmware: version 0x1055, prod_id 0x0041
[root@sjsc-xxx ~]#
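When comparing hosts in a cluster, it is useful to confirm that each one reports the same firmware and that all lanes agree. The sketch below extracts the 8051 firmware version and the per-lane (version, prod_id) pairs from dmesg text; it assumes the message format of this driver version.

```python
import re

# Firmware lines copied from the dmesg output above.
SAMPLE = """
[   21.337504] hfi1 0000:82:00.0: hfi1_0: 8051 firmware version 0.38
[   21.344356] hfi1 0000:82:00.0: hfi1_0: Lane 0 firmware: version 0x1055, prod_id 0x0041
[   21.353209] hfi1 0000:82:00.0: hfi1_0: Lane 1 firmware: version 0x1055, prod_id 0x0041
[   21.362050] hfi1 0000:82:00.0: hfi1_0: Lane 2 firmware: version 0x1055, prod_id 0x0041
[   21.370900] hfi1 0000:82:00.0: hfi1_0: Lane 3 firmware: version 0x1055, prod_id 0x0041
"""

FW_8051_RE = re.compile(r"8051 firmware version (\S+)")
LANE_RE = re.compile(
    r"Lane (\d+) firmware: version (0x[0-9a-fA-F]+), prod_id (0x[0-9a-fA-F]+)")


def firmware_summary(dmesg_text):
    """Collect the 8051 firmware version and per-lane (version, prod_id) pairs."""
    m = FW_8051_RE.search(dmesg_text)
    lanes = {int(n): (int(v, 16), int(p, 16))
             for n, v, p in LANE_RE.findall(dmesg_text)}
    return {
        "8051": m.group(1) if m else None,
        "lanes": lanes,
        # True when every lane reports the same version and prod_id.
        "lanes_consistent": len(set(lanes.values())) <= 1,
    }


if __name__ == "__main__":
    print(firmware_summary(SAMPLE))
```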

The hfi1 driver information:


[root@localhost ~]# modinfo hfi1
filename:       /lib/modules/3.10.0-327.4.4.el7.x86_64/updates/hfi1.ko
version:        0.10-121
description:    Intel Omni-Path Architecture driver
license:        Dual BSD/GPL
rhelversion:    7.2
srcversion:     E2F417E21B6A8F9F673CC41
alias:          pci:v00008086d000024F1sv*sd*bc*sc*i*
alias:          pci:v00008086d000024F0sv*sd*bc*sc*i*
depends:        ib_core,ib_mad
vermagic:       3.10.0-327.4.4.el7.x86_64 SMP mod_unload modversions
parm:           lkey_table_size:LKEY table size in bits (2^n, 1 <= n <= 23) (uint)
parm:           max_pds:Maximum number of protection domains to support (uint)
parm:           max_ahs:Maximum number of address handles to support (uint)
parm:           max_cqes:Maximum number of completion queue entries to support (uint)
parm:           max_cqs:Maximum number of completion queues to support (uint)
parm:           max_qp_wrs:Maximum number of QP WRs to support (uint)
parm:           max_qps:Maximum number of QPs to support (uint)
parm:           max_sges:Maximum number of SGEs to support (uint)
parm:           max_mcast_grps:Maximum number of multicast groups to support (uint)
parm:           max_mcast_qp_attached:Maximum number of attached QPs to support (uint)
parm:           max_srqs:Maximum number of SRQs to support (uint)
parm:           max_srq_sges:Maximum number of SRQ SGEs to support (uint)
parm:           max_srq_wrs:Maximum number of SRQ WRs support (uint)
parm:           piothreshold:size used to determine sdma vs. pio (ushort)
parm:           sdma_comp_size:Size of User SDMA completion ring. Default: 128 (uint)
parm:           sdma_descq_cnt:Number of SDMA descq entries (uint)
parm:           sdma_idle_cnt:sdma interrupt idle delay (ns,default 250) (uint)
parm:           num_sdma:Set max number SDMA engines to use (uint)
parm:           desct_intr:Number of SDMA descriptor before interrupt (uint)
parm:           qp_table_size:QP table size (uint)
parm:           pcie_caps:Max PCIe tuning: Payload (0..3), ReadReq (4..7) (int)
parm:           aspm:PCIe ASPM: 0: disable, 1: enable, 2: dynamic (uint)
parm:           pcie_target:PCIe target speed (0 skip, 1-3 Gen1-3) (uint)
parm:           pcie_force:Force driver to do a PCIe firmware download even if already at target speed (uint)
parm:           pcie_retry:Driver will try this many times to reach requested speed (uint)
parm:           pcie_pset:PCIe Eq Pset value to use, range is 0-10 (uint)
parm:           num_user_contexts:Set max number of user contexts to use (uint)
parm:           krcvqs:Array of the number of non-control kernel receive queues by VL (array of uint)
parm:           rcvarr_split:Percent of context's RcvArray entries used for Eager buffers (uint)
parm:           eager_buffer_size:Size of the eager buffers, default: 2MB (uint)
parm:           rcvhdrcnt:Receive header queue count (default 2048) (uint)
parm:           hdrq_entsize:Size of header queue entries: 2 - 8B, 16 - 64B (default), 32 - 128B (uint)
parm:           user_credit_return_threshold:Credit return threshold for user send contexts, return when unreturned credits passes this many blocks (in percent of allocated blocks, 0 is off) (uint)
parm:           max_mtu:Set max MTU bytes, default is 8192 (uint)
parm:           cu:Credit return units (uint)
parm:           prescan_rxq:Used to toggle rx prescan. Set to 1 to enable prescan (uint)
parm:           cap_mask:Bit mask of enabled/disabled HW features
parm:           kdeth_qp:Set the KDETH queue pair prefix (uint)
parm:           num_vls:Set number of Virtual Lanes to use (1-8) (uint)
parm:           rcv_intr_timeout:Receive interrupt mitigation timeout in ns (uint)
parm:           rcv_intr_count:Receive interrupt mitigation count (uint)
parm:           link_crc_mask:CRCs to use on the link (ushort)
parm:           loopback:Put into loopback mode (1 = serdes, 3 = external cable (uint)
parm:           mem_affinity:Bitmask for memory affinity control: 0 - device, 1 - process (uint)
[root@localhost ~]#
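The `parm:` lines are the driver's load-time tunables, each with a description and, usually, a type in trailing parentheses. A parser like the one below can turn them into a table, for example to diff parameter lists between driver releases. Values are normally set at load time with `modprobe hfi1 <name>=<value>` or an `options hfi1 ...` line in a file under /etc/modprobe.d/, and parameters the driver exports can be read back under /sys/module/hfi1/parameters/. The regular expression is an assumption based on the format shown here.

```python
import re

# A few `parm:` lines copied from the `modinfo hfi1` output above.
SAMPLE = """
parm:           krcvqs:Array of the number of non-control kernel receive queues by VL (array of uint)
parm:           pcie_caps:Max PCIe tuning: Payload (0..3), ReadReq (4..7) (int)
parm:           pcie_target:PCIe target speed (0 skip, 1-3 Gen1-3) (uint)
parm:           cap_mask:Bit mask of enabled/disabled HW features
parm:           link_crc_mask:CRCs to use on the link (ushort)
"""

# The trailing "(type)" is optional: cap_mask above has none.
PARM_RE = re.compile(
    r"^parm:\s+(\w+):(.*?)\s*"
    r"(?:\((u?(?:int|short|long)|bool|charp|byte|array of \w+)\))?$")


def parse_parms(modinfo_text):
    """Return {name: (description, type or None)} for each parm: line."""
    params = {}
    for line in modinfo_text.splitlines():
        m = PARM_RE.match(line.strip())
        if m:
            name, desc, ptype = m.groups()
            params[name] = (desc, ptype)
    return params


if __name__ == "__main__":
    for name, (desc, ptype) in parse_parms(SAMPLE).items():
        print(f"{name:15} {ptype or '?':15} {desc}")
```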

The kernel version info:
[root@localhost ~]# uname -r
3.10.0-327.4.4.el7.x86_64
[root@localhost ~]#   
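The running kernel (3.10.0-327.4.4.el7.x86_64) matches both the first token of the module's vermagic and the /lib/modules/<release>/ path in modinfo's filename field, where hfi1.ko sits under updates/. A quick consistency check can be scripted as below. This is only a sketch: on RHEL a mismatch is not automatically an error, since kABI-compatible modules can be reused across kernel releases through the weak-updates mechanism.

```python
import platform

# Values from the `modinfo hfi1` and `uname -r` output above.
VERMAGIC = "3.10.0-327.4.4.el7.x86_64 SMP mod_unload modversions"
FILENAME = "/lib/modules/3.10.0-327.4.4.el7.x86_64/updates/hfi1.ko"


def built_for_release(vermagic):
    """The first vermagic token is the kernel release the module was built for."""
    return vermagic.split()[0]


def release_from_path(filename):
    """Extract <release> from /lib/modules/<release>/... (None for other layouts)."""
    parts = filename.split("/")
    if len(parts) > 3 and parts[1:3] == ["lib", "modules"]:
        return parts[3]
    return None


def matches_kernel(vermagic, release=None):
    """Compare vermagic with a release string (default: the running kernel)."""
    if release is None:
        release = platform.release()  # same value as `uname -r` on Linux
    return built_for_release(vermagic) == release


if __name__ == "__main__":
    print(built_for_release(VERMAGIC), release_from_path(FILENAME))
    print("matches running kernel:", matches_kernel(VERMAGIC))
```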

 Further reading: http://www.anandtech.com/show/9561/exploring-intels-omnipath-network-fabric
