Linux* Base Drivers for Intel® Ethernet Network Connection

NOTE: This release includes four Linux* Base Drivers for Intel® Ethernet Network Connection. These drivers are named igb, e1000, e1000e and igbvf.
  • The igb driver supports all 82575, 82576 and 82580-based gigabit network connections.
  • The igbvf driver supports 82576-based virtual function devices that can only be activated on kernels that support SR-IOV. SR-IOV requires the correct platform and OS support.
  • The e1000 driver supports all PCI and PCI-X gigabit network connections.
  • The e1000e driver supports all PCI Express gigabit network connections, except those that are 82580, 82575 and 82576-based.

NOTE: The Intel® PRO/1000 P Dual Port Server Adapter is supported by the e1000 driver, not the e1000e driver, because the 82546 part is used behind a PCI Express bridge.

First identify your adapter.  Then follow the appropriate steps for building, installing, and configuring the driver.

UPGRADING: If you currently have the e1000 driver installed and need to install e1000e, perform the following:

  • If your version of e1000 is 7.6.15.5 or less, upgrade to e1000 version 8.x, using the instructions in the e1000 section.
  • Install the e1000e driver using the instructions in the e1000e section.
  • Modify /etc/modprobe.conf to point your PCIe devices to the new e1000e driver using alias ethX e1000e, or use your distribution's specific method for configuring network adapters, such as Red Hat's setup/system-config-network or SuSE's yast2.

Identifying Your Adapter

First identify your adapter.  Then select the name of the specified base driver: igb, e1000 or e1000e.

For more information on how to identify your adapter, go to the Adapter & Driver ID Guide at:

http://support.intel.com/support/go/network/adapter/idguide.htm

For the latest Intel network drivers for Linux, refer to the following website. Select the link for your adapter.  

http://support.intel.com/support/go/network/adapter/home.htm


Using the igb Base Driver

Overview

Building and Installation

Command Line Parameters

Additional Configurations

Known Issues

Overview

The Linux base drivers support the 2.4.x and 2.6.x kernels. These drivers include support for Itanium® 2-based systems.

These drivers are only supported as a loadable module. Intel is not supplying patches against the kernel source to allow for static linking of the drivers. For questions related to hardware requirements, refer to the documentation supplied with your Intel Gigabit adapter. All hardware requirements listed apply to use with Linux.

The following features are now available in supported kernels:

Channel Bonding documentation can be found in the Linux kernel source: /Documentation/networking/bonding.txt

The igb driver supports IEEE 1588 time stamping for kernels 2.6.30 and above.

The driver information previously displayed in the /proc file system is not supported in this release. Alternatively, you can use ethtool (version 1.6 or later), lspci, and ifconfig to obtain the same information. Instructions on updating ethtool can be found in the section Additional Configurations later in this document.


Building and Installation

To build a binary RPM* package of this driver, run 'rpmbuild -tb igb.tar.gz'.

NOTES:
  • For the build to work properly, the currently running kernel MUST match the version and configuration of the installed kernel sources. If you have just recompiled the kernel, reboot the system now.

  • RPM functionality has only been tested in Red Hat distributions.

  1. Move the base driver tar file to the directory of your choice. For example, use '/home/username/igb' or '/usr/local/src/igb'.

  2. Untar/unzip the archive, where <x.x.x> is the version number for the driver tar file:

    tar zxf igb-<x.x.x>.tar.gz

  3. Change to the driver src directory, where <x.x.x> is the version number for the driver tar:

    cd igb-<x.x.x>/src/

  4. Compile the driver module:

    # make install

    The binary will be installed as:

    /lib/modules/<KERNEL VERSION>/kernel/drivers/net/igb/igb.[k]o

    The install location listed above is the default location. This may differ for various Linux distributions.

  5. Load the module using either the insmod or modprobe command:

    modprobe igb

    insmod igb

    Note that for 2.6 kernels the insmod command can be used if the full path to the driver module is specified. For example:

        insmod /lib/modules/<KERNEL VERSION>/kernel/drivers/net/igb/igb.ko

    With 2.6-based kernels, also make sure that older igb drivers are removed from the kernel before loading the new module:

    rmmod igb; modprobe igb

  6. Assign an IP address to the interface by entering the following, where <x> is the interface number:

    ifconfig eth<x> <IP_address>

  7. Verify that the interface works. Enter the following, where <IP_address> is the IP address for another machine on the same subnet as the interface that is being tested:

    ping <IP_address>
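
After step 5 it can be useful to confirm that the module actually loaded before assigning an address. The following is a minimal sketch that looks for the module name in /proc/modules. The function name module_loaded and its optional second argument (an alternate file to read) are illustrative and are not part of the driver package.

```shell
# Return success if the named module appears in the loaded-module list.
# $1: module name (for example igb)
# $2: file to read; defaults to /proc/modules (assumes the standard procfs layout,
#     where the module name is the first field of each line)
module_loaded() {
    awk -v mod="$1" '$1 == mod { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example usage on a live system:
#   module_loaded igb && echo "igb is loaded"
```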

TROUBLESHOOTING: Some systems have trouble supporting MSI and/or MSI-X interrupts. If you believe your system needs to disable this style of interrupt, the driver can be built and installed with the command:

# make CFLAGS_EXTRA=-DDISABLE_PCI_MSI install

Normally the driver will generate an interrupt every two seconds, so if the output of cat /proc/interrupts shows that the ethX igb device is no longer receiving interrupts, this workaround may be necessary.
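
One way to check whether interrupts are still arriving is to total the per-CPU counters for the interface twice and compare the results. The sketch below assumes the usual /proc/interrupts layout (a header row with one column per CPU, and the device name as the last field of each row). The function name irq_total is hypothetical, and rows for shared interrupt lines that list several devices are not handled.

```shell
# Print the total interrupt count for rows whose device name is the given
# interface, or the interface followed by "-" (for example eth0-TxRx-0).
# $1: interface name
# $2: file to read; defaults to /proc/interrupts
irq_total() {
    awk -v ifc="$1" '
        NR == 1 { ncpu = NF; next }                  # header row: one column per CPU
        $NF == ifc || index($NF, ifc "-") == 1 {     # device name is the last field
            for (i = 2; i <= ncpu + 1; i++) sum += $i
        }
        END { printf "%.0f\n", sum }
    ' "${2:-/proc/interrupts}"
}

# Example usage on a live system:
#   before=$(irq_total eth0); sleep 5; after=$(irq_total eth0)
#   [ "$after" -gt "$before" ] || echo "no new interrupts seen for eth0"
```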

To build igb driver with DCA:

If your kernel supports DCA, the driver will build by default with DCA enabled.


Command Line Parameters

If the driver is built as a module, the following optional parameters are used by entering them on the command line with the modprobe command using this syntax:

modprobe igb [<option>=<VAL1>,<VAL2>,...]

For example, with two Gigabit PCI adapters, entering:

modprobe igb TxDescriptors=80,128

loads the igb driver with 80 TX descriptors for the first adapter and 128 TX descriptors for the second adapter.
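
To make the per-adapter syntax concrete, the following sketch splits a comma-separated option and prints which adapter receives which value. The helper name show_per_adapter is made up for illustration; it only prints text and does not load the driver.

```shell
# Print which adapter receives which value from a comma-separated
# module parameter such as TxDescriptors=80,128.
# $1: option in the form <name>=<VAL1>,<VAL2>,...
show_per_adapter() {
    name=${1%%=*}          # text before the first "="
    vals=${1#*=}           # text after the first "="
    n=1
    old_ifs=$IFS
    IFS=,
    for v in $vals; do     # split the value list on commas
        echo "adapter $n: $name=$v"
        n=$((n + 1))
    done
    IFS=$old_ifs
}

show_per_adapter TxDescriptors=80,128
```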

The default value for each parameter is generally the recommended setting, unless otherwise noted.

NOTES:
  • For more information about the AutoNeg, Duplex, and Speed parameters, see the Speed and Duplex Configuration section in this document.

  • For more information about the InterruptThrottleRate, RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay parameters, see the application note at: http://www.intel.com/design/network/applnots/ap450.htm.

  • A descriptor describes a data buffer and attributes related to the data buffer. This information is accessed by the hardware.

Parameter Name Valid Range/Settings Default Description
InterruptThrottleRate
Valid Range: 0, 1, 3, 100-100000 (0=off, 1=dynamic, 3=dynamic conservative)
Default Value: 3

The driver can limit the amount of interrupts per second that the adapter will generate for incoming packets. It does this by writing a value to the adapter that is based on the maximum amount of interrupts that the adapter will generate per second.

Setting InterruptThrottleRate to a value greater than or equal to 100 will program the adapter to send out a maximum of that many interrupts per second, even if more packets have come in. This reduces interrupt load on the system and can lower CPU utilization under heavy load, but will increase latency as packets are not processed as quickly.

The default behaviour of the driver previously assumed a static InterruptThrottleRate value of 8000, providing a good fallback value for all traffic types, but lacking in small packet performance and latency. The hardware can handle many more small packets per second however, and for this reason an adaptive interrupt moderation algorithm was implemented.

The driver has two adaptive modes (setting 1 or 3) in which it dynamically adjusts the InterruptThrottleRate value based on the traffic that it receives. After determining the type of incoming traffic in the last timeframe, it will adjust the InterruptThrottleRate to an appropriate value for that traffic.

The algorithm classifies the incoming traffic every interval into classes. Once the class is determined, the InterruptThrottleRate value is adjusted to suit that traffic type the best. There are three classes defined: "Bulk traffic", for large amounts of packets of normal size; "Low latency", for small amounts of traffic and/or a significant percentage of small packets; and "Lowest latency", for almost completely small packets or minimal traffic.

In dynamic conservative mode, the InterruptThrottleRate value is set to 4000 for traffic that falls in class "Bulk traffic". If traffic falls in the "Low latency" or "Lowest latency" class, the InterruptThrottleRate is increased stepwise to 20000. This default mode is suitable for most applications.

For situations where low latency is vital such as cluster or grid computing, the algorithm can reduce latency even more when InterruptThrottleRate is set to mode 1. In this mode, which operates the same as mode 3, the InterruptThrottleRate will be increased stepwise to 70000 for traffic in class "Lowest latency".

Setting InterruptThrottleRate to 0 turns off any interrupt moderation and may improve small packet latency, but is generally not suitable for bulk throughput traffic.

NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and RxAbsIntDelay parameters. In other words, minimizing the receive and/or transmit absolute delays does not force the controller to generate more interrupts than what the Interrupt Throttle Rate allows.

LLIPort
Valid Range: 0-65535
Default Value: 0 (disabled)

LLI (Low Latency Interrupts): LLI allows for immediate generation of an interrupt upon processing receive packets that match certain criteria as set by the parameters described below. LLI parameters are not enabled when Legacy interrupts are used. You must be using MSI or MSI-X (see cat /proc/interrupts) to successfully use LLI.

LLI is configured with the LLIPort command-line parameter, which specifies which TCP port should generate Low Latency Interrupts.

For example, using LLIPort=80 would cause the board to generate an immediate interrupt upon receipt of any packet sent to TCP port 80 on the local machine.

CAUTION: Enabling LLI can result in an excessive number of interrupts/second that may cause problems with the system and in some cases may cause a kernel panic.

LLIPush
Valid Range: 0-1
Default Value: 0 (disabled)

LLIPush can be set to be enabled or disabled (default). It is most effective in an environment with many small transactions.

NOTE: Enabling LLIPush may allow a denial of service attack.

LLISize
Valid Range: 0-1500
Default Value: 0 (disabled)

LLISize causes an immediate interrupt if the board receives a packet smaller than the specified size.

IntMode
Valid Range: 0-2 (0 = Legacy Int, 1 = MSI and 2 = MSI-X)
Default Value: 2

IntMode allows load time control over the type of interrupt registered for by the driver. MSI-X is required for multiple queue support, and some kernels and combinations of kernel .config options will force a lower level of interrupt support. 'cat /proc/interrupts' will show different values for each type of interrupt.

RSS
Valid Range: 0-8
Default Value: 1

0 - Assign up to whichever is less, number of CPUs or number of queues
X - Assign X queues, where X is less than the maximum number of queues

NOTE: for 82575-based adapters the maximum number of queues is 4; for  82576-based and newer adapters it is 8.

This parameter is also affected by the VMDq parameter in that it will limit the queues more.

            VMDQ
Model       0     1     2     3+
82575       4     4     3     1
82576       8     2     2     2
82580       8     1     1     1

VMDQ
Valid Range: 0-4 on 82575-based adapters; 0-8 on 82576/82580-based adapters
    0 = disabled
    1 = sets the netdev as pool 0
    2+ = add additional queues but they currently are not used
Default Value: 0

Supports enabling VMDq pools as this is needed to support SR-IOV.

This parameter is forced to 1 or more if the max_vfs module parameter is used. In addition, the number of queues available for RSS is limited if this is set to 1 or greater.

max_vfs
Valid Range: 0-7
Default Value: 0

This parameter adds support for SR-IOV. It causes the driver to spawn up to max_vfs virtual functions. If the value is greater than 0, it will also force the VMDq parameter to be 1 or more.

QueuePairs
Valid Range: 0-1
Default Value: 1 (TX and RX will be paired onto one interrupt vector)

If set to 0, when MSI-X is enabled, the TX and RX will attempt to occupy separate vectors.

This option can be overridden to 1 if there are not sufficient interrupts available. This can occur if any combination of RSS, VMDQ, and max_vfs results in more than 4 queues being used.

Node
Valid Range: 0-n
    0 - n: where n is the number of the NUMA node that should be used to allocate memory for this adapter port.
    -1: uses the driver default of allocating memory on whichever processor is running insmod/modprobe.
Default Value: -1 (off)

The Node parameter allows you to pick which NUMA node the adapter allocates memory from. All driver structures, in-memory queues, and receive buffers will be allocated on the node specified. This parameter is only useful when interrupt affinity is specified; otherwise, some portion of the time the interrupt could run on a different core than the one the memory is allocated on, causing slower memory access and impacting throughput, CPU, or both.

EEE
Valid Range: 0-1
Default Value: 1 (enabled)

A link between two EEE-compliant devices will result in periodic bursts of data followed by periods where the link is in an idle state. This Low Power Idle (LPI) state is supported in both 1Gbps and 100Mbps link speeds.

NOTE: EEE support requires autonegotiation.

DMAC
Valid Range: 0, 250, 500, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000
Default Value: 0 (disabled)

Enables or disables the DMA Coalescing feature. Values are in microseconds and increase the internal timer of the DMA Coalescing feature. DMA (Direct Memory Access) allows the network device to move packet data directly to the system's memory, reducing CPU utilization. However, the frequency and random intervals at which packets arrive do not allow the system to enter a lower power state. DMA Coalescing allows the adapter to collect packets before it initiates a DMA event. This may increase network latency but also increases the chances that the system will enter a lower power state.

Turning on DMA Coalescing may save energy with kernel 2.6.32 and later. This gives your system the greatest chance of consuming less power. DMA Coalescing helps save platform power only when it is enabled across all active ports.

InterruptThrottleRate (ITR) should be set to dynamic. When ITR=0, DMA Coalescing is automatically disabled.

A whitepaper containing information on how to best configure your platform is available on the Intel website.

MDD (Malicious Driver Detection)
Valid Range: 0, 1 (0 = Disable, 1 = Enable)
Default Value: 1

This parameter is only relevant for I350 devices operating in SR-IOV mode. When this parameter is set, the driver detects a malicious VF driver and disables its TX/RX queues until a VF driver reset occurs.
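
As a worked illustration of the InterruptThrottleRate values discussed earlier in this section: a fixed rate of N interrupts per second means interrupts are spaced at least 1,000,000 / N microseconds apart, which is the latency cost that the description refers to. The sketch below does that arithmetic for the rates mentioned in the text. The function name itr_min_interval_us is illustrative, and the calculation applies only to fixed rates (100 and above), not to the mode values 0, 1 and 3.

```shell
# Minimum spacing between interrupts, in microseconds, for a fixed
# InterruptThrottleRate value (interrupts per second).
# Uses integer division, so the result is rounded down.
itr_min_interval_us() {
    if [ "$1" -lt 100 ]; then
        echo "not a fixed rate: $1" >&2
        return 1
    fi
    echo $((1000000 / $1))
}

for rate in 4000 8000 20000 70000; do
    echo "ITR=$rate -> at least $(itr_min_interval_us "$rate") us between interrupts"
done
```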


Additional Configurations

Configuring the Driver on Different Distributions

Configuring a network driver to load properly when the system is started is distribution dependent. Typically, the configuration process involves adding an alias line to /etc/modules.conf or /etc/modprobe.conf as well as editing other system startup scripts and/or configuration files. Many popular Linux distributions ship with tools to make these changes for you. To learn the proper way to configure a network device for your system, refer to your distribution documentation. If during this process you are asked for the driver or module name, the name for the Linux Base Driver described in this section is igb.

As an example, if you install the igb driver for two Gigabit adapters (eth0 and eth1) and want to set the interrupt mode to MSI-X and MSI respectively, add the following to modules.conf or /etc/modprobe.conf:

alias eth0 igb
alias eth1 igb
options igb IntMode=2,1

Viewing Link Messages

Link messages will not be displayed to the console if the distribution is restricting system messages. In order to see network driver link messages on your console, set the console log level to eight by entering the following:

dmesg -n 8
NOTE: This setting is not saved across reboots.

Jumbo Frames

Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) to a value larger than the default value of 1500. Use the ifconfig command to increase the MTU size. For example:

ifconfig eth<x> mtu 9000 up

This setting is not saved across reboots. The setting change can be made permanent by adding MTU=9000 to the file: /etc/sysconfig/network-scripts/ifcfg-eth<x> (Red Hat distributions). Other distributions may store this setting in a different location.

NOTES:
  • To enable Jumbo Frames, increase the MTU size on the interface beyond 1500.

  • The maximum MTU setting for Jumbo Frames is 9216. This value coincides with the maximum Jumbo Frames size of 9234.

  • Using Jumbo frames at 10 or 100 Mbps is not supported and may result in poor performance or loss of link.
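
The relationship between the two numbers in the notes above is simple arithmetic: the frame size is the MTU plus the 14-byte Ethernet header and the 4-byte frame check sequence. A minimal sketch, using a hypothetical helper named frame_size:

```shell
# Ethernet frame size for a given MTU: MTU plus 14 bytes of Ethernet
# header plus 4 bytes of frame check sequence (CRC).
frame_size() {
    echo $(($1 + 14 + 4))
}

frame_size 9216   # prints 9234
frame_size 1500   # prints 1518
```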

ethtool

The driver utilizes the ethtool interface for driver configuration and diagnostics, as well as displaying statistical information. ethtool version 3 or later is required for this functionality, although we strongly recommend downloading the latest version at:

http://ftp.kernel.org/pub/software/network/ethtool/.

Speed and Duplex Configuration

Speed and Duplex are configured through the ethtool* utility. ethtool is included with all versions of Red Hat after Red Hat 7.2. For other Linux distributions, download and install ethtool from the following website: http://ftp.kernel.org/pub/software/network/ethtool/.
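
As an illustration, the sketch below builds an ethtool command line that forces speed and duplex with autonegotiation turned off. The wrapper name force_link and the RUN variable are assumptions made for this example; by default the command is only printed so that it can be reviewed first. Forcing is shown for 10 and 100 Mbps only, on the assumption that gigabit copper links normally require autonegotiation.

```shell
# Build (and optionally run) an ethtool command that forces speed and duplex.
# $1: interface, $2: speed in Mbps (10 or 100), $3: duplex (half or full)
# Set RUN=1 in the environment to execute the command; otherwise it is only printed.
force_link() {
    cmd="ethtool -s $1 speed $2 duplex $3 autoneg off"
    echo "$cmd"
    if [ "${RUN:-0}" = "1" ]; then
        $cmd
    fi
}

force_link eth0 100 full
```

To return to autonegotiation, ethtool -s eth<x> autoneg on can be used.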

Enabling Wake on LAN* (WoL)

WoL is configured through the ethtool* utility. ethtool is included with all versions of Red Hat after Red Hat 7.2. For other Linux distributions, download and install ethtool from the following website: http://ftp.kernel.org/pub/software/network/ethtool/.

For instructions on enabling WoL with ethtool, refer to the website listed above.

WoL will be enabled on the system during the next shut down or reboot. For this driver version, in order to enable WoL, the igb driver must be loaded prior to shutting down or suspending the system.

NOTES: Wake On LAN is only supported on port A of multi-port devices.

Wake On LAN is not supported for the Intel® Gigabit VT Quad Port Server Adapter.
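
A short sketch of how the current WoL state might be checked: ethtool reports it on a "Wake-on:" line, where g means wake on magic packet and d means disabled. The helper name wol_setting is hypothetical, and it reads ethtool output from standard input so that it can be tried without an adapter.

```shell
# Read `ethtool <iface>` output on stdin and print the current Wake-on
# setting (for example "g" for magic packet, "d" for disabled).
# The "Supports Wake-on:" line is skipped because its first field differs.
wol_setting() {
    awk '$1 == "Wake-on:" { print $2 }'
}

# Example usage on a live system (requires root):
#   ethtool -s eth0 wol g          # enable wake on magic packet
#   ethtool eth0 | wol_setting     # should now print g
```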

Multiqueue

In this mode, a separate MSI-X vector is allocated for each queue and one for "other" interrupts such as link status change and errors. All interrupts are throttled via interrupt moderation. Interrupt moderation must be used to avoid interrupt storms while the driver is processing one interrupt. The moderation value should be at least as large as the expected time for the driver to process an interrupt. Multiqueue is off by default.

Requirements: MSI-X support is required for Multiqueue. If MSI-X is not found, the system will fall back to MSI or to Legacy interrupts. This driver supports multiqueue in kernel versions 2.6.24 and greater. This driver supports receive multiqueue on all kernels that support MSI-X.

NOTES: Do not use MSI-X with the 2.6.19 or 2.6.20 kernels.

On some kernels a reboot is required to switch between a single queue mode and multiqueue modes, or vice-versa.

LRO

Large Receive Offload (LRO) is a technique for increasing inbound throughput of high-bandwidth network connections by reducing CPU overhead. It works by aggregating multiple incoming packets from a single stream into a larger buffer before they are passed higher up the networking stack, thus reducing the number of packets that have to be processed. LRO combines multiple Ethernet frames into a single receive in the stack, thereby potentially decreasing CPU utilization for receives.

NOTE: LRO requires 2.4.22 or later kernel version.

IGB_LRO is a compile time flag. The user can enable it at compile time to add support for LRO from the driver. The flag is used by adding CFLAGS_EXTRA="-DIGB_LRO" to the make command line when the driver is being compiled.

    # make CFLAGS_EXTRA="-DIGB_LRO" install

You can verify that the driver is using LRO by looking at these counters in ethtool:

lro_aggregated - count of total packets that were combined
lro_flushed - counts the number of packets flushed out of LRO
lro_recycled - reflects the number of buffers returned to the ring from recycling
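
The counters above appear in the statistics listing produced by ethtool -S. The following sketch filters that listing down to the LRO entries. The helper name lro_counters is made up for this example, and it reads from standard input so the filter can be tried on saved output.

```shell
# Filter `ethtool -S <iface>` output on stdin down to the LRO counters,
# printing each one as name=value.
lro_counters() {
    sed -n 's/^[[:space:]]*\(lro_[a-z_]*\):[[:space:]]*/\1=/p'
}

# Example usage on a live system:
#   ethtool -S eth0 | lro_counters
```

If no lines are printed, the driver was most likely built without the IGB_LRO flag.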

NOTE: IPv6 and UDP are not supported by LRO.

MAC and VLAN anti-spoofing feature

When a malicious driver attempts to send a spoofed packet, it is dropped by the hardware and not transmitted. An interrupt is sent to the PF driver notifying it of the spoof attempt.
When a spoofed packet is detected the PF driver will send the following message to the system log (displayed by the "dmesg" command):

Spoof event(s) detected on VF(n)

Where n=the VF that attempted to do the spoofing.

Setting MAC Address, VLAN and Rate Limit Using IProute2 Tool

You can set a MAC address of a Virtual Function (VF), a default VLAN and the rate limit using the IProute2 tool. Download the latest version of the iproute2 tool from Sourceforge if your version does not have all the features you require.
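
For illustration, the sketch below prints the ip link commands that would set these three properties on one VF. The function name vf_config_cmds, the interface name, the MAC address and the numeric values are all made up for the example; nothing is executed. The rate value is in Mbps, and newer iproute2 releases also accept max_tx_rate in place of rate.

```shell
# Print the iproute2 commands that set a VF's MAC address, default VLAN
# and transmit rate limit. Nothing is executed here; review the output
# and run the commands as root on the host that owns the PF.
# $1: PF interface, $2: VF number, $3: MAC address, $4: VLAN ID, $5: rate in Mbps
vf_config_cmds() {
    echo "ip link set $1 vf $2 mac $3"
    echo "ip link set $1 vf $2 vlan $4"
    echo "ip link set $1 vf $2 rate $5"
}

# Example values only (hypothetical interface, MAC, VLAN and rate).
vf_config_cmds eth0 0 02:00:00:00:00:01 10 500
```

The resulting settings can be inspected with ip link show eth<x>, which lists each VF along with its MAC address and VLAN.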

Known Issues

NOTE: After installing the driver, if your Intel Ethernet Network Connection is not working, verify that you have installed the correct driver.

Using the igb driver on 2.4 or older 2.6 based kernels

Due to limited support for PCI-Express in 2.4 kernels and older 2.6 kernels, the igb driver may run into interrupt related problems on some systems, such as no link or hang when bringing up the device.

We recommend the newer 2.6 based kernels, as these kernels correctly configure the PCI-Express configuration space of the adapter and all intervening bridges. If you are required to use a 2.4 kernel, use a 2.4 kernel newer than 2.4.30. For 2.6 kernels we recommend using the 2.6.21 kernel or newer.

Alternatively, on 2.6 kernels you may disable MSI support in the kernel by booting with the "pci=nomsi" option or permanently disable MSI support in your kernel by configuring your kernel with CONFIG_PCI_MSI unset.

Intel® Active Management Technology 2.0, 2.1, 2.5 not supported in conjunction with Linux driver

Detected Tx Unit Hang in Quad Port Adapters

In some cases ports 3 and 4 don't pass traffic and report 'Detected Tx Unit Hang' followed by 'NETDEV WATCHDOG: ethX: transmit timed out' errors. Ports 1 and 2 don't show any errors and will pass traffic.

This issue MAY be resolved by updating to the latest kernel and BIOS. The user is encouraged to run an OS that fully supports MSI interrupts. You can check your system's BIOS by downloading the Linux Firmware Developer Kit that can be obtained at http://www.linuxfirmwarekit.org/

Compiling the Driver

When trying to compile the driver by running make install, the following error may occur:  "Linux kernel source not configured - missing version.h"

To solve this issue, create the version.h file by going to the Linux source tree and entering:

# make include/linux/version.h

Performance Degradation with Jumbo Frames

Degradation in throughput performance may be observed in some Jumbo frames environments. If this is observed, increasing the application's socket buffer size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help. See the specific application manual and /usr/src/linux*/Documentation/networking/ip-sysctl.txt for more details.

Jumbo frames on Foundry BigIron 8000 switch

There is a known issue using Jumbo frames when connected to a Foundry BigIron 8000 switch. This is a 3rd party limitation. If you experience loss of packets, lower the MTU size.

Multiple Interfaces on Same Ethernet Broadcast Network

Due to the default ARP behavior on Linux, it is not possible to have one system on two IP networks in the same Ethernet broadcast domain (non-partitioned switch) behave as expected. All Ethernet interfaces will respond to IP traffic for any IP address assigned to the system. This results in unbalanced receive traffic.

If you have multiple interfaces in a server, either turn on ARP filtering by entering:

        echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter

(this only works if your kernel's version is higher than 2.4.5)

NOTE: This setting is not saved across reboots. The configuration change can be made permanent by adding the line:

net.ipv4.conf.all.arp_filter = 1

to the file /etc/sysctl.conf

   or,

install the interfaces in separate broadcast domains (either in different switches or in a switch partitioned to VLANs).

Disable rx flow control with ethtool

In order to disable receive flow control using ethtool, you must turn off auto-negotiation on the same command line.

For example:

     ethtool -A eth? autoneg off rx off

Unplugging network cable while ethtool -p is running

In kernel versions 2.5.50 and later (including 2.6 kernel), unplugging the network cable while ethtool -p is running will cause the system to become unresponsive to keyboard commands, except for control-alt-delete. Restarting the system appears to be the only remedy.

Trouble passing traffic on ports 1 and 2 using RHEL3

There is a known hardware compatibility issue on some systems with RHEL3 kernels. Traffic on ports 1 and 2 may be slower than expected and ping times higher than expected.

This issue MAY be resolved by updating to the latest kernel and BIOS. You can check your system's BIOS by downloading the Linux Firmware Developer Kit that can be obtained at http://www.linuxfirmwarekit.org/

Do Not Use LRO When Routing Packets

Due to a known general compatibility issue with LRO and routing, do not use LRO when routing packets.

Build error with Asianux 3.0 - redefinition of typedef 'irq_handler_t'

Some systems may experience build issues due to redefinition of irq_handler_t. To resolve this issue build the driver (step 4 above) using the command:

# make CFLAGS_EXTRA=-DAX_RELEASE_CODE=1 install

MSI-X Issues with Kernels between 2.6.19 - 2.6.21 (inclusive)

Kernel panics and instability may be observed on any MSI-X hardware if you use irqbalance with kernels between 2.6.19 and 2.6.21. If such problems are encountered, you may disable the irqbalance daemon or upgrade to a newer kernel.
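
A quick way to tell whether a system falls into the affected range is to compare the running kernel version against the two bounds. The sketch below is one possible approach; the function name version_in_range is hypothetical, and it relies on sort -V (version sort), which is an assumption that holds for GNU coreutils.

```shell
# Succeed if version $1 lies between $2 and $3, inclusive.
# Relies on `sort -V` for version-aware ordering (so 2.6.9 sorts before 2.6.19).
version_in_range() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    highest=$(printf '%s\n%s\n' "$1" "$3" | sort -V | tail -n 1)
    [ "$lowest" = "$2" ] && [ "$highest" = "$3" ]
}

# Example usage on a live system (strips any "-<release>" suffix from uname -r):
#   if version_in_range "$(uname -r | cut -d- -f1)" 2.6.19 2.6.21; then
#       echo "kernel is in the affected range; consider disabling irqbalance"
#   fi
```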

Rx Page Allocation Errors

Page allocation failure. order:0 errors may occur under stress with kernels 2.6.25 and above. This is caused by the way the Linux kernel reports this stressed condition.

Under Redhat 5.4-GA - System May Crash when Closing Guest OS Window after Loading/Unloading Physical Function (PF) Driver

Do not remove the igb driver from Dom0 while Virtual Functions (VFs) are assigned to guests. You must first either use the xm "pci-detach" command to hot-unplug each VF device from the VM it is assigned to, or shut down the VM.

SLES10 SP3 random system panic when reloading driver

This is a known SLES-10 SP3 issue. After requesting interrupts for MSI-X vectors, system may panic.

Currently the only known workaround is to build the drivers with CFLAGS_EXTRA=-DDISABLE_PCI_MSI if the driver needs to be loaded/unloaded. Otherwise the driver can be loaded once and will be safe, but unloading it will lead to the issue.

Enabling SR-IOV in a 32-bit Microsoft* Windows* Server 2008 Guest OS using Intel® 82576-based GbE or Intel® 82599-based 10GbE controller under KVM

KVM Hypervisor/VMM supports direct assignment of a PCIe device to a VM. This includes traditional PCIe devices, as well as SR-IOV-capable devices using Intel 82576-based and 82599-based controllers.

While direct assignment of a PCIe device or an SR-IOV Virtual Function (VF) to a Linux-based VM running 2.6.32 or later kernel works fine, there is a known issue with Microsoft Windows Server 2008 VM that results in a "yellow bang" error. The problem lies not in the Intel driver or in the SR-IOV logic of the VMM, but in the KVM VMM itself: KVM emulates an older CPU model for the guests, and this older CPU model does not support MSI-X interrupts, which are a requirement for Intel SR-IOV.

If you wish to use the Intel 82576 or 82599-based controllers in SR-IOV mode with KVM and a Microsoft Windows Server 2008 guest try the following workaround. The workaround is to tell KVM to emulate a different model of CPU when using qemu to create the KVM guest:

"-cpu qemu64,model=13"

Host May Reboot after Removing PF when VF is Active in Guest

Using kernel versions earlier than 3.2, do not unload the PF driver with active VFs. Doing this will cause your VFs to stop working until you reload the PF driver and may cause a spontaneous reboot of your system.


Using the e1000 Base Driver

Overview

Building and Installation

Command Line Parameters

Speed and Duplex Configuration

Additional Configurations

Known Issues

Overview

The Linux Base Drivers support the 2.4.x and 2.6.x kernels. These drivers include support for Itanium® 2-based systems.

These drivers are only supported as a loadable module. Intel is not supplying patches against the kernel source to allow for static linking of the drivers. For questions related to hardware requirements, refer to the documentation supplied with your Intel Gigabit adapter. All hardware requirements listed apply to use with Linux.

The following features are now available in supported kernels:

Channel Bonding documentation can be found in the Linux kernel source: /Documentation/networking/bonding.txt

The driver information previously displayed in the /proc file system is not supported in this release. Alternatively, you can use ethtool (version 1.6 or later), lspci, and ifconfig to obtain the same information. Instructions on updating ethtool can be found in the section Additional Configurations later in this document.

NOTE: The Intel® 82562v 10/100 Network Connection only provides 10/100 support.

Building and Installation

To build a binary RPM* package of this driver, run 'rpmbuild -tb e1000.tar.gz'.

NOTES:
  • For the build to work properly, the currently running kernel MUST match the version and configuration of the installed kernel sources. If you have just recompiled the kernel, reboot the system now.

  • RPM functionality has only been tested in Red Hat distributions.

  1. Move the base driver tar file to the directory of your choice. For example, use '/home/username/e1000' or '/usr/local/src/e1000'.

  2. Untar/unzip the archive, where <x.x.x> is the version number for the driver tar file:

    tar zxf e1000-<x.x.x>.tar.gz

  3. Change to the driver src directory, where <x.x.x> is the version number for the driver tar:

    cd e1000-<x.x.x>/src/

  4. Compile the driver module:

    # make install

    The binary will be installed as:

    /lib/modules/<KERNEL VERSION>/kernel/drivers/net/e1000/e1000.[k]o

    The install location listed above is the default location. This may differ for various Linux distributions.

  5. Load the module using either the insmod or modprobe command:

    modprobe e1000

    insmod e1000

    Note that for 2.6 kernels the insmod command can be used if the full path to the driver module is specified. For example:

        insmod /lib/modules/<KERNEL VERSION>/kernel/drivers/net/e1000/e1000.ko

    With 2.6-based kernels, also make sure that older e1000 drivers are removed from the kernel before loading the new module:

    rmmod e1000; modprobe e1000

  6. Assign an IP address to the interface by entering the following, where <x> is the interface number:

    ifconfig eth<x> <IP_address>

  7. Verify that the interface works. Enter the following, where <IP_address> is the IP address for another machine on the same subnet as the interface that is being tested:

    ping <IP_address>


Command Line Parameters

If the driver is built as a module, the following optional parameters are used by entering them on the command line with the modprobe command using this syntax:

modprobe e1000 [<option>=<VAL1>,<VAL2>,...]

For example, with two Gigabit PCI adapters, entering:

modprobe e1000 TxDescriptors=80,128

loads the e1000 driver with 80 TX descriptors for the first adapter and 128 TX descriptors for the second adapter.

The default value for each parameter is generally the recommended setting, unless otherwise noted.

NOTES:
  • For more information about the AutoNeg, Duplex, and Speed parameters, see the Speed and Duplex Configuration section in this document.

  • For more information about the InterruptThrottleRate, RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay parameters, see the application note at: http://www.intel.com/design/network/applnots/ap450.htm.

  • A descriptor describes a data buffer and attributes related to the data buffer. This information is accessed by the hardware.

Parameter Name Valid Range/Settings Default Description
AutoNeg
Valid Range: 0x01-0x0F, 0x20-0x2F
Default: 0x2F
This parameter is a bit mask that specifies which speed and duplex settings the board advertises. When this parameter is used, the Speed and Duplex parameters must not be specified.

This parameter is supported only on adapters using copper connections.

NOTE: Refer to the Speed and Duplex section of this readme for more information on the AutoNeg parameter.

Duplex
Valid Range: 0-2 (0=auto-negotiate, 1=half, 2=full)
Default: 0
Defines the direction in which data is allowed to flow: either one-directional or two-directional. If both Duplex and the link partner are set to auto-negotiate, the board auto-detects the correct duplex. If the link partner is forced (either full or half), Duplex defaults to half-duplex.

This parameter is supported only on adapters using copper connections.

FlowControl
Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx)
Default: Read flow control settings from the EEPROM
This parameter controls the automatic generation (Tx) of and response (Rx) to Ethernet PAUSE frames.

InterruptThrottleRate (not supported on Intel(R) 82542, 82543 or 82544-based adapters)
Valid Range: 0, 1, 3, 4, 100-100000 (0=off, 1=dynamic, 3=dynamic conservative, 4=simplified balancing)
Default: 3
The driver can limit the number of interrupts per second that the adapter will generate for incoming packets. It does this by writing a value to the adapter that is based on the maximum number of interrupts that the adapter will generate per second.

Setting InterruptThrottleRate to a value greater than or equal to 100 will program the adapter to send out a maximum of that many interrupts per second, even if more packets have come in. This reduces interrupt load on the system and can lower CPU utilization under heavy load, but will increase latency as packets are not processed as quickly.

The default behaviour of the driver previously assumed a static InterruptThrottleRate value of 8000, providing a good fallback value for all traffic types, but lacking in small packet performance and latency. The hardware can, however, handle many more small packets per second, and for this reason an adaptive interrupt moderation algorithm was implemented.

Since 7.3.x, the driver has two adaptive modes (setting 1 or 3) in which it dynamically adjusts the InterruptThrottleRate value based on the traffic that it receives. After determining the type of incoming traffic in the last timeframe, it will adjust the InterruptThrottleRate to an appropriate value for that traffic.

The algorithm classifies the incoming traffic every interval into classes. Once the class is determined, the InterruptThrottleRate value is adjusted to suit that traffic type the best. There are three classes defined: "Bulk traffic", for large amounts of packets of normal size; "Low latency", for small amounts of traffic and/or a significant percentage of small packets; and "Lowest latency", for almost completely small packets or minimal traffic.

In dynamic conservative mode, the InterruptThrottleRate value is set to 4000 for traffic that falls in class "Bulk traffic". If traffic falls in the "Low latency" or "Lowest latency" class, the InterruptThrottleRate is increased stepwise to 20000. This default mode is suitable for most applications.

For situations where low latency is vital such as cluster or grid computing, the algorithm can reduce latency even more when InterruptThrottleRate is set to mode 1. In this mode, which operates the same as mode 3, the InterruptThrottleRate will be increased stepwise to 70000 for traffic in class "Lowest latency".

In simplified mode the interrupt rate is based on the ratio of tx and rx traffic. If the bytes per second rate is approximately equal, the interrupt rate will drop as low as 2000 interrupts per second. If the traffic is mostly transmit or mostly receive, the interrupt rate could be as high as 8000.

Setting InterruptThrottleRate to 0 turns off any interrupt moderation and may improve small packet latency, but is generally not suitable for bulk throughput traffic.

NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and RxAbsIntDelay parameters. In other words, minimizing the receive and/or transmit absolute delays does not force the controller to generate more interrupts than what the Interrupt Throttle Rate allows.

CAUTION: If you are using the Intel(R) PRO/1000 CT Network Connection (controller 82547), setting InterruptThrottleRate to a value greater than 75,000 may hang (stop transmitting) adapters under certain network conditions. If this occurs, a NETDEV WATCHDOG message is logged in the system event log. In addition, the controller is automatically reset, restoring the network connection. To eliminate the potential for the hang, ensure that InterruptThrottleRate is set no greater than 75,000 and is not set to 0.

NOTE: When e1000 is loaded with default settings and multiple adapters are in use simultaneously, the CPU utilization may increase non-linearly. In order to limit the CPU utilization without impacting the overall throughput, we recommend that you load the driver as follows:

modprobe e1000 InterruptThrottleRate=3000,3000,3000

This sets the InterruptThrottleRate to 3000 interrupts/sec for the first, second, and third instances of the driver. The range of 2000 to 3000 interrupts per second works on a majority of systems and is a good starting point, but the optimal value will be platform-specific. If CPU utilization is not a concern, use RX_POLLING (NAPI) and default driver settings.
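
As a quick reference for the settings described above, the following Python sketch maps an InterruptThrottleRate value to its meaning. It is an illustration only: the function name describe_itr is hypothetical, and the "average spacing" figure is simply 1/rate derived from the stated interrupts-per-second cap, not a value reported by the driver.

```python
ITR_MODES = {
    0: "off",
    1: "dynamic",
    3: "dynamic conservative",
    4: "simplified balancing",
}


def describe_itr(value):
    """Return a short description of an InterruptThrottleRate setting."""
    if value in ITR_MODES:
        return ITR_MODES[value]
    if 100 <= value <= 100000:
        # A static rate caps the adapter at `value` interrupts per second,
        # so on average interrupts are at least 1/value seconds apart.
        spacing_us = 1_000_000 / value
        return "static: at most %d interrupts/s (average spacing >= %.1f us)" % (value, spacing_us)
    raise ValueError("InterruptThrottleRate must be 0, 1, 3, 4 or 100-100000")


for setting in (3, 3000, 8000):
    print(setting, "->", describe_itr(setting))
```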

RxDescriptors
Valid Range: 80-4096
Default: 256
This value specifies the number of receive buffer descriptors allocated by the driver. Increasing this value allows the driver to buffer more incoming packets, at the expense of increased system memory utilization.

Each descriptor is 16 bytes. A receive buffer is also allocated for each descriptor and can be either 2048, 4096, 8192, or 16384 bytes, depending on the MTU setting. The maximum MTU size is 16110.

NOTE: MTU designates the frame size. It only needs to be set for Jumbo Frames. Depending on the available system resources, the request for a higher number of receive descriptors may be denied. In this case, use a lower number.
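
To get a rough feel for the memory cost of a larger receive ring, the figures above can be combined as in the Python sketch below. It relies on two assumptions that are not stated in this document: that the frame size is the MTU plus 18 bytes (Ethernet header and CRC), and that the driver picks the smallest listed buffer size that holds one frame. The driver's actual selection logic may differ, so treat the result as an estimate.

```python
RX_BUFFER_SIZES = (2048, 4096, 8192, 16384)  # buffer sizes listed above
DESCRIPTOR_BYTES = 16                        # each descriptor is 16 bytes
FRAME_OVERHEAD = 18                          # assumption: 14-byte header + 4-byte CRC


def rx_buffer_size(mtu):
    """Assumed rule: smallest listed buffer that holds one full frame."""
    frame = mtu + FRAME_OVERHEAD
    for size in RX_BUFFER_SIZES:
        if frame <= size:
            return size
    raise ValueError("MTU %d does not fit in any receive buffer size" % mtu)


def rx_ring_bytes(descriptors, mtu=1500):
    """Estimated memory for descriptors plus one buffer per descriptor."""
    if not 80 <= descriptors <= 4096:
        raise ValueError("RxDescriptors must be in the range 80-4096")
    return descriptors * (DESCRIPTOR_BYTES + rx_buffer_size(mtu))


print(rx_ring_bytes(256))         # default ring, standard MTU
print(rx_ring_bytes(4096, 9000))  # largest ring with Jumbo Frames
```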

RxIntDelay
Valid Range: 0-65535 (0=off)
Default: 0
This value delays the generation of receive interrupts in units of 1.024 microseconds. Receive interrupt reduction can improve CPU efficiency if properly tuned for specific network traffic. Increasing this value adds extra latency to frame reception and can end up decreasing the throughput of TCP traffic. If the system is reporting dropped receives, this value may be set too high, causing the driver to run out of available receive descriptors.

CAUTION: When setting RxIntDelay to a value other than 0, adapters may hang (stop transmitting) under certain network conditions. If this occurs, a NETDEV WATCHDOG message is logged in the system event log. In addition, the controller is automatically reset, restoring the network connection. To eliminate the potential for the hang, ensure that RxIntDelay is set to zero.

RxAbsIntDelay
Valid Range: 0-65535 (0=off)
Default: 8
This value, in units of 1.024 microseconds, limits the delay before a receive interrupt is generated. It is useful only if RxIntDelay is non-zero, and ensures that an interrupt is generated within the set amount of time after the initial packet is received. Proper tuning, along with RxIntDelay, may improve traffic throughput in specific network conditions.

This parameter is supported only on 82540, 82545 and later adapters.

Speed
Valid Range: 0, 10, 100, 1000
Default: 0
Speed forces the line speed to the specified value in megabits per second (Mbps). If this parameter is not specified or is set to 0 and the link partner is set to auto-negotiate, the board will auto-detect the correct speed. Duplex must also be set when Speed is set to either 10 or 100.

This parameter is supported only on adapters using copper connections.

TxDescriptors
Valid Range: 80-4096
Default: 256
This value is the number of transmit descriptors allocated by the driver. Increasing this value allows the driver to queue more transmits. Each descriptor is 16 bytes.

TxDescriptorStep
Valid Range: 1 (use every Tx Descriptor), 4 (use every 4th Tx Descriptor)
Default: 1 (use every Tx Descriptor)
On certain non-Intel architectures, it has been observed that intense TX traffic bursts of short packets may result in an improper descriptor writeback. If this occurs, the driver will report a "TX Timeout" and reset the adapter, after which the transmit flow will restart, though data may have stalled for as much as 10 seconds before it resumes.

The improper writeback does not occur on the first descriptor in a system memory cache-line, which is typically 32 bytes, or 4 descriptors long.

Setting TxDescriptorStep to a value of 4 will ensure that all TX descriptors are aligned to the start of a system memory cache line, and so this problem will not occur.

NOTES: Setting TxDescriptorStep to 4 effectively reduces the number of TxDescriptors available for transmits to 1/4 of the normal allocation. This has a possible negative performance impact, which may be compensated for by allocating more descriptors using the TxDescriptors module parameter.

There are other conditions which may result in "TX Timeout", which will not be resolved by the use of the TxDescriptorStep parameter. As the issue addressed by this parameter has never been observed on Intel Architecture platforms, it should not be used on Intel platforms.
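
The trade-off described in the notes above is simple arithmetic, sketched here in Python. The function names are hypothetical. The sketch assumes that compensating means multiplying the desired count by the step, capped at the TxDescriptors maximum of 4096.

```python
def effective_tx_descriptors(tx_descriptors, step=1):
    """Descriptors actually usable for transmits with a given TxDescriptorStep."""
    if step not in (1, 4):
        raise ValueError("TxDescriptorStep must be 1 or 4")
    return tx_descriptors // step


def compensated_tx_descriptors(wanted, step, maximum=4096):
    """TxDescriptors value to request so that `wanted` descriptors remain usable."""
    return min(wanted * step, maximum)


print(effective_tx_descriptors(256, 4))    # 64 usable out of the default 256
print(compensated_tx_descriptors(256, 4))  # 1024 keeps 256 usable
```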

TxIntDelay
Valid Range: 0-65535 (0=off)
Default: 8
This value delays the generation of transmit interrupts in units of 1.024 microseconds. Transmit interrupt reduction can improve CPU efficiency if properly tuned for specific network traffic. If the system is reporting dropped transmits, this value may be set too high, causing the driver to run out of available transmit descriptors.

TxAbsIntDelay
Valid Range: 0-65535 (0=off)
Default: 32
This value, in units of 1.024 microseconds, limits the delay before a transmit interrupt is generated. It is useful only if TxIntDelay is non-zero, and ensures that an interrupt is generated within the set amount of time after the initial packet is sent on the wire. Proper tuning, along with TxIntDelay, may improve traffic throughput in specific network conditions.

This parameter is supported only on 82540, 82545 and later adapters.
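
The four interrupt delay parameters (RxIntDelay, RxAbsIntDelay, TxIntDelay, TxAbsIntDelay) share the same unit of 1.024 microseconds. The Python sketch below converts between parameter values and microseconds. The helper names are hypothetical and exist only for illustration.

```python
UNIT_US = 1.024  # one delay unit, in microseconds


def delay_to_microseconds(value):
    """Convert a delay parameter value (0-65535) to microseconds."""
    if not 0 <= value <= 65535:
        raise ValueError("delay value must be in the range 0-65535")
    return value * UNIT_US


def microseconds_to_delay(microseconds):
    """Nearest delay parameter value for a target delay in microseconds."""
    value = round(microseconds / UNIT_US)
    if not 0 <= value <= 65535:
        raise ValueError("target delay is outside the supported range")
    return value


print(delay_to_microseconds(32))   # default TxAbsIntDelay, in microseconds
print(microseconds_to_delay(100))  # parameter value closest to 100 microseconds
```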

XsumRX
Valid Range: 0-1
Default: 1
A value of '1' indicates that the driver should enable IP checksum offload to the adapter hardware for received packets (both UDP and TCP).

This parameter is not supported on the 82542-based adapter.

Copybreak
Valid Range: 0-xxxxxxx (0=off)
Default: 256
Usage: insmod e1000.ko copybreak=128

The driver copies all packets at or below this size to a fresh rx buffer before handing them up the stack.

This parameter differs from the other parameters in that it is a single value (not 1,1,1 etc.) applied to all driver instances. It is also available at runtime at /sys/module/e1000/parameters/copybreak

SmartPowerDownEnable
Valid Range: 0-1
Default: 0 (disabled)
Allows Phy to turn off in lower power states. The user can turn off this parameter in supported chipsets.

KumeranLockLoss
Valid Range: 0-1
Default: 1 (enabled)
This workaround skips resetting the Phy at shutdown for the initial silicon releases of ICH8 systems.

TxDescPower
Valid Range: 6-12
Default: 12
This value represents the size-order of each transmit descriptor. The valid size for descriptors would be 2^6 (64) to 2^12 (4096) bytes each. As this value decreases one may want to consider increasing the TxDescriptors value to maintain the same amount of frame memory.


Speed and Duplex Configuration

Three keywords are used to control the speed and duplex configuration. These keywords are Speed, Duplex, and AutoNeg.

If the board uses a fiber interface, these keywords are ignored, and the fiber interface board only links at 1000 Mbps full-duplex.

For copper-based boards, the keywords interact as follows:

The default operation is auto-negotiate. The board advertises all supported speed and duplex combinations, and it links at the highest common speed and duplex mode IF the link partner is set to auto-negotiate.

If Speed = 1000, limited auto-negotiation is enabled and only 1000 Mbps is advertised. (The 1000BaseT spec requires auto-negotiation.)

If Speed = 10 or 100, then both Speed and Duplex should be set. Auto-negotiation is disabled, and the AutoNeg parameter is ignored. The link partner SHOULD also be forced.

The AutoNeg parameter is used when more control is required over the auto-negotiation process. It should be used when you wish to control which speed and duplex combinations are advertised during the auto-negotiation process. The parameter may be specified as either a decimal or hexadecimal value as determined by the bitmap below.

Bit Position    7     6     5     4     3     2     1     0
Decimal Value   128   64    32    16    8     4     2     1
Hex Value       80    40    20    10    8     4     2     1
Speed (Mbps)    N/A   N/A   1000  N/A   100   100   10    10
Duplex                      Full        Full  Half  Full  Half

Some examples of using AutoNeg:

     modprobe e1000 AutoNeg=0x01 (Restricts autonegotiation to 10 Half)
     modprobe e1000 AutoNeg=1 (Same as above)
     modprobe e1000 AutoNeg=0x02 (Restricts autonegotiation to 10 Full)
     modprobe e1000 AutoNeg=0x03 (Restricts autonegotiation to 10 Half or 10 Full)
     modprobe e1000 AutoNeg=0x04 (Restricts autonegotiation to 100 Half)
     modprobe e1000 AutoNeg=0x05 (Restricts autonegotiation to 10 Half or 100 Half)
     modprobe e1000 AutoNeg=0x020 (Restricts autonegotiation to 1000 Full)
     modprobe e1000 AutoNeg=32 (Same as above)

Note that when this parameter is used, Speed and Duplex must not be specified.

If the link partner is forced to a specific speed and duplex, then this parameter should not be used. Instead, use the Speed and Duplex parameters previously mentioned to force the adapter to the same speed and duplex.
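
The AutoNeg value is the bitwise OR of the bits for each speed and duplex combination to be advertised. The Python sketch below builds and decodes such a mask using the bit assignments from the bitmap above. The names AUTONEG_BITS, autoneg_mask and decode_autoneg are hypothetical and are not part of the driver.

```python
# Bit assignments taken from the AutoNeg bitmap in this section.
AUTONEG_BITS = {
    (10, "half"): 0x01,
    (10, "full"): 0x02,
    (100, "half"): 0x04,
    (100, "full"): 0x08,
    (1000, "full"): 0x20,
}


def autoneg_mask(modes):
    """OR together the bits for the (speed, duplex) pairs to advertise."""
    mask = 0
    for mode in modes:
        mask |= AUTONEG_BITS[mode]
    return mask


def decode_autoneg(mask):
    """List the (speed, duplex) pairs advertised by an AutoNeg value."""
    return [mode for mode, bit in AUTONEG_BITS.items() if mask & bit]


print(hex(autoneg_mask([(10, "half"), (100, "half")])))  # 0x5
print(decode_autoneg(0x2F))  # the default advertises all five combinations
```

The resulting value can then be passed on the command line, for example modprobe e1000 AutoNeg=0x05.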


Additional Configurations

Configuring the Driver on Different Distributions

Configuring a network driver to load properly when the system is started is distribution dependent. Typically, the configuration process involves adding an alias line to /etc/modules.conf or /etc/modprobe.conf as well as editing other system startup scripts and/or configuration files. Many popular Linux distributions ship with tools to make these changes for you. To learn the proper way to configure a network device for your system, refer to your distribution documentation. If during this process you are asked for the driver or module name, the name for the Linux Base Driver for the Gigabit family of adapters is e1000.

As an example, if you install the e1000 driver for two Gigabit adapters (eth0 and eth1) and set the speed and duplex to 10 Mbps full-duplex and 100 Mbps half-duplex respectively, add the following to /etc/modules.conf or /etc/modprobe.conf:

alias eth0 e1000
alias eth1 e1000
options e1000 Speed=10,100 Duplex=2,1
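
When several adapters and parameters are involved, the lines above can be generated rather than typed by hand. The Python sketch below produces the alias and options lines for a list of interfaces. The function name modprobe_conf_lines is hypothetical, and the sketch only prints text; it does not write to any configuration file.

```python
def modprobe_conf_lines(driver, interfaces, options):
    """Build alias/options lines for modules.conf or modprobe.conf.

    `options` maps a parameter name to a list with one value per adapter,
    in the same order as `interfaces`.
    """
    lines = ["alias %s %s" % (interface, driver) for interface in interfaces]
    settings = " ".join(
        "%s=%s" % (name, ",".join(str(value) for value in values))
        for name, values in options.items()
    )
    if settings:
        lines.append("options %s %s" % (driver, settings))
    return lines


for line in modprobe_conf_lines("e1000", ["eth0", "eth1"],
                                {"Speed": [10, 100], "Duplex": [2, 1]}):
    print(line)
```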

Viewing Link Messages

Link messages will not be displayed to the console if the distribution is restricting system messages. In order to see network driver link messages on your console, set the console log level to eight by entering the following:

dmesg -n 8

NOTE: This setting is not saved across reboots.

Jumbo Frames

Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) to a value larger than the default value of 1500. Use the ifconfig command to increase the MTU size. For example:

ifconfig eth<x> mtu 9000 up

This setting is not saved across reboots. The setting change can be made permanent by adding MTU=9000 to the file: /etc/sysconfig/network-scripts/ifcfg-eth<x> (Red Hat distributions). Other distributions may store this setting in a different location.

NOTES:
  • To enable Jumbo Frames, increase the MTU size on the interface beyond 1500.

  • The maximum MTU setting for Jumbo Frames is 16110. This value coincides with the maximum Jumbo Frames size of 16128.

  • Some Intel gigabit adapters that support Jumbo Frames have a frame size limit of 9238 bytes, with a corresponding MTU size limit of 9216 bytes. The adapters with this limitation are based on the Intel® 82571EB, 82572EI, 82573L, 82566, 82562, 82575, and 80003ES2LAN controllers. These correspond to the following product names:
    Intel® PRO/1000 PT Server Adapter
    Intel® PRO/1000 PT Desktop Adapter
    Intel® PRO/1000 PT Network Connection
    Intel® PRO/1000 PT Dual Port Server Adapter
    Intel® PRO/1000 PT Dual Port Network Connection
    Intel® PRO/1000 PF Server Adapter
    Intel® PRO/1000 PF Network Connection
    Intel® PRO/1000 PF Dual Port Server Adapter
    Intel® PRO/1000 PB Server Connection
    Intel® PRO/1000 PL Network Connection
    Intel® PRO/1000 EB Network Connection with I/O Acceleration
    Intel® PRO/1000 EB Backplane Connection with I/O Acceleration
    Intel® PRO/1000 PT Quad Port Server Adapter
    Intel® PRO/1000 PF Quad Port Server Adapter
    Intel® 82566DM-2 Gigabit Network Connection
    Intel® Gigabit PT Quad Port Server ExpressModule

  • Using Jumbo frames at 10 or 100 Mbps is not supported and may result in poor performance or loss of link.

  • The following adapters do not support Jumbo Frames:
    Intel® PRO/1000 Gigabit Server Adapter
    Intel® PRO/1000 PM Network Connection
    Intel® 82562V 10/100 Network Connection
    Intel® 82566DM Gigabit Network Connection
    Intel® 82566DC Gigabit Network Connection
    Intel® 82566MM Gigabit Network Connection
    Intel® 82566MC Gigabit Network Connection
    Intel® 82562GT 10/100 Network Connection
    Intel® 82562G 10/100 Network Connection
    Intel® 82566DC-2 Gigabit Network Connection
    Intel® 82562V-2 10/100 Network Connection
    Intel® 82562G-2 10/100 Network Connection
    Intel® 82562GT-2 10/100 Network Connection
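
The MTU and frame size figures in the notes above differ by the Ethernet framing overhead. The Python sketch below reproduces them, assuming a 14-byte Ethernet header and a 4-byte CRC, plus an optional 4-byte VLAN tag. Reading the 9238-byte limit as including a VLAN tag is an inference from the arithmetic, not something this document states.

```python
ETH_HEADER = 14  # destination MAC, source MAC, EtherType
ETH_CRC = 4      # frame check sequence
VLAN_TAG = 4     # optional 802.1Q tag


def frame_size(mtu, vlan=False):
    """Frame size in bytes for a given MTU."""
    return mtu + ETH_HEADER + ETH_CRC + (VLAN_TAG if vlan else 0)


print(frame_size(1500))             # 1518, standard Ethernet frame
print(frame_size(16110))            # 16128, maximum Jumbo Frames size
print(frame_size(9216, vlan=True))  # 9238, limit on the adapters listed above
```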

ethtool

The driver utilizes the ethtool interface for driver configuration and diagnostics, as well as displaying statistical information. ethtool version 3 or later is required for this functionality, although we strongly recommend downloading the latest version at:

http://ftp.kernel.org/pub/software/network/ethtool/

Enabling Wake on LAN* (WoL)

WoL is configured through the ethtool* utility. ethtool is included with all versions of Red Hat after Red Hat 7.2. For other Linux distributions, download and install ethtool from the following website: http://ftp.kernel.org/pub/software/network/ethtool/.

For instructions on enabling WoL with ethtool, refer to the website listed above.

WoL will be enabled on the system during the next shut down or reboot. For this driver version, in order to enable WoL, the e1000 driver must be loaded prior to shutting down or suspending the system.

NOTES: Wake On LAN is only supported on port A for the following devices:
  • Intel® PRO/1000 PT Dual Port Network Connection
  • Intel® PRO/1000 PT Dual Port Server Connection
  • Intel® PRO/1000 PT Dual Port Server Adapter
  • Intel® PRO/1000 PF Dual Port Server Adapter
  • Intel® PRO/1000 PT Quad Port Server Adapter
  • Intel® Gigabit PT Quad Port Server ExpressModule

NAPI

NAPI (Rx polling mode) is supported in the e1000 driver. NAPI is enabled or disabled based on the configuration of the kernel. To override the default, use the following compile-time flags.

To enable NAPI, compile the driver module, passing in a configuration option:

# make CFLAGS_EXTRA=-DE1000_NAPI install

To disable NAPI, compile the driver module, passing in a configuration option:

# make CFLAGS_EXTRA=-DE1000_NO_NAPI install

See ftp://robur.slu.se/pub/Linux/net-development/NAPI/usenix-paper.tgz for more information on NAPI.


Known Issues

NOTE: After installing the driver, if your Intel Ethernet Network Connection is not working, verify that you have installed the correct driver.

Intel® Active Management Technology 2.0, 2.1, 2.5 not supported in conjunction with Linux driver

Detected Tx Unit Hang in Quad Port Adapters

In some cases ports 3 and 4 don't pass traffic and report 'Detected Tx Unit Hang' followed by 'NETDEV WATCHDOG: ethX: transmit timed out' errors. Ports 1 and 2 don't show any errors and will pass traffic.

This issue MAY be resolved by updating to the latest kernel and BIOS. The user is encouraged to run an OS that fully supports MSI interrupts. You can check your system's BIOS by downloading the Linux Firmware Developer Kit that can be obtained at http://www.linuxfirmwarekit.org/

82573(V/L/E) TX Unit Hang Messages

Several adapters with the 82573 chipset display "TX unit hang" messages during normal operation with the e1000 driver. The issue appears both with TSO enabled and disabled, and is caused by a power management function that is enabled in the EEPROM. Early releases of the chipsets to vendors shipped with the feature enabled in the EEPROM. After the issue was discovered, newer adapters were released with the feature disabled in the EEPROM.

If you encounter this problem on an adapter with an 82573-based chipset, you can verify whether the adapter needs the fix by using ethtool:

 # ethtool -e eth0
 Offset          Values
 ------          ------
 0x0000          00 12 34 56 fe dc 30 0d 46 f7 f4 00 ff ff ff ff
 0x0010          ff ff ff ff 6b 02 8c 10 d9 15 8c 10 86 80 de 83
                                                           ^^

The value at offset 0x001e (de) has bit 0 unset. This enables the problematic power saving feature. In this case, the EEPROM needs to read "df" at offset 0x001e.
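
The check described above can be automated by parsing the text printed by ethtool -e. The Python sketch below works on a saved copy of that output, assuming the layout shown in the example (an offset column followed by space-separated hex bytes). The function names are hypothetical, and the sketch only reads the dump; it does not modify the EEPROM.

```python
SAMPLE_DUMP = """\
Offset          Values
------          ------
0x0000          00 12 34 56 fe dc 30 0d 46 f7 f4 00 ff ff ff ff
0x0010          ff ff ff ff 6b 02 8c 10 d9 15 8c 10 86 80 de 83
"""


def eeprom_byte(dump, offset):
    """Return the byte at `offset` from `ethtool -e` style output."""
    for line in dump.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].lower().startswith("0x"):
            continue  # skip the header, separator and blank lines
        base = int(fields[0], 16)
        data = fields[1:]
        if base <= offset < base + len(data):
            return int(data[offset - base], 16)
    raise KeyError("offset 0x%04x not found in dump" % offset)


def needs_82573_fix(dump):
    """True if bit 0 at offset 0x1e is unset, as described above."""
    return eeprom_byte(dump, 0x1E) & 0x01 == 0


print(hex(eeprom_byte(SAMPLE_DUMP, 0x1E)))  # 0xde
print(needs_82573_fix(SAMPLE_DUMP))         # True
```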

A one-time EEPROM fix is available as a shell script. This script verifies that the fix is applicable to the adapter and whether the fix is needed. If the fix is required, it applies the change to the EEPROM and updates the checksum. The user must reboot the system after applying the fix if changes were made to the EEPROM.

Example output of the script:

 # bash fixeep-82573-dspd.sh eth0
 eth0: is a "82573E Gigabit Ethernet Controller"
 This fixup is applicable to your hardware
 executing command: ethtool -E eth0 magic 0x109a8086 offset 0x1e value 0xdf
 Change made. You *MUST* reboot your machine before changes take effect!

The script can be downloaded at http://e1000.sourceforge.net/files/fixeep-82573-dspd.sh

Dropped Receive Packets on Half-duplex 10/100 Networks

If you have an Intel PCI Express adapter running at 10 Mbps or 100 Mbps, half-duplex, you may observe occasional dropped receive packets. There are no workarounds for this problem in this network configuration. The network must be updated to operate in full-duplex and/or 1000 Mbps only.

Compiling the Driver

When trying to compile the driver by running make install, the following error may occur:  "Linux kernel source not configured - missing version.h"

To solve this issue, create the version.h file by going to the Linux source tree and entering:

# make include/linux/version.h

Performance Degradation with Jumbo Frames

Degradation in throughput performance may be observed in some Jumbo frames environments. If this is observed, increasing the application's socket buffer size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help. See the specific application manual and /usr/src/linux*/Documentation/networking/ip-sysctl.txt for more details.

Jumbo frames on Foundry BigIron 8000 switch

There is a known issue using Jumbo frames when connected to a Foundry BigIron 8000 switch. This is a 3rd party limitation. If you experience loss of packets, lower the MTU size.

Allocating Rx Buffers when Using Jumbo Frames

Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if the available memory is heavily fragmented. This issue may be seen with PCI-X adapters or with packet split disabled. This can be reduced or eliminated by changing the amount of available memory for receive buffer allocation, by increasing /proc/sys/vm/min_free_kbytes.

Multiple Interfaces on Same Ethernet Broadcast Network

Due to the default ARP behavior on Linux, it is not possible to have one system on two IP networks in the same Ethernet broadcast domain (non-partitioned switch) behave as expected. All Ethernet interfaces will respond to IP traffic for any IP address assigned to the system. This results in unbalanced receive traffic.

If you have multiple interfaces in a server, either turn on ARP filtering by entering:

        echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter

(this only works if your kernel's version is higher than 2.4.5)

NOTE: This setting is not saved across reboots. The configuration change can be made permanent by adding the line:

net.ipv4.conf.all.arp_filter = 1

to the file /etc/sysctl.conf

   or,

install the interfaces in separate broadcast domains (either in different switches or in a switch partitioned to VLANs).
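
To confirm that the ARP filtering setting has been made permanent, the contents of /etc/sysctl.conf can be checked for the line shown above. The Python sketch below does this on a string, assuming the usual "key = value" layout with comment lines starting with # or ;. The function name sysctl_value and the sample text are hypothetical.

```python
SAMPLE_SYSCTL_CONF = """\
# Kernel sysctl configuration (hypothetical sample)
net.ipv4.ip_forward = 0
net.ipv4.conf.all.arp_filter = 1
"""


def sysctl_value(conf_text, key):
    """Return the last value assigned to `key`, or None if it is not set."""
    value = None
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        name, sep, setting = line.partition("=")
        if sep and name.strip() == key:
            value = setting.strip()
    return value


print(sysctl_value(SAMPLE_SYSCTL_CONF, "net.ipv4.conf.all.arp_filter"))  # 1
```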

82541/82547 can't link or is slow to link with some link partners

There is a known compatibility issue with 82541/82547 and some low-end switches where the link will not be established, or will be slow to establish. In particular, these switches are known to be incompatible with 82541/82547:

     Planex FXG-08TE
     I-O Data ETG-SH8

To work around this issue, the driver can be compiled with an override of the PHY's master/slave setting. Forcing master or forcing slave mode will improve time-to-link.

     # make CFLAGS_EXTRA=-DE1000_MASTER_SLAVE=<n>

Where <n> is:

        0 = Hardware default
        1 = Master mode
        2 = Slave mode
        3 = Auto master/slave

Disable rx flow control with ethtool

In order to disable receive flow control using ethtool, you must turn off auto-negotiation on the same command line.

For example:

     ethtool -A eth? autoneg off rx off

Unplugging network cable while ethtool -p is running

In kernel versions 2.5.50 and later (including 2.6 kernel), unplugging the network cable while ethtool -p is running will cause the system to become unresponsive to keyboard commands, except for control-alt-delete. Restarting the system appears to be the only remedy.


Using the e1000e Base Driver

Overview

Building and Installation

Command Line Parameters

Speed and Duplex Configuration

Additional Configurations

Known Issues

Overview

The Linux Base Drivers support the 2.4.x and 2.6.x kernels. These drivers include support for Itanium® 2-based systems.

These drivers are only supported as a loadable module. Intel is not supplying patches against the kernel source to allow for static linking of the drivers. For questions related to hardware requirements, refer to the documentation supplied with your Intel Gigabit adapter. All hardware requirements listed apply to use with Linux.

The following features are now available in supported kernels:

Channel Bonding documentation can be found in the Linux kernel source: /documentation/networking/bonding.txt

The driver information previously displayed in the /proc file system is not supported in this release. Alternatively, you can use ethtool (version 1.6 or later), lspci, and ifconfig to obtain the same information. Instructions on updating ethtool can be found in the section Additional Configurations later in this document.

NOTE: The Intel® 82562v 10/100 Network Connection only provides 10/100 support.

Building and Installation

To build a binary RPM* package of this driver, run 'rpmbuild -tb e1000e.tar.gz'.

NOTES:
  • For the build to work properly, the currently running kernel MUST match the version and configuration of the installed kernel sources. If you have just recompiled the kernel, reboot the system now.

  • RPM functionality has only been tested in Red Hat distributions.

  1. Move the base driver tar file to the directory of your choice. For example, use '/home/username/e1000e' or '/usr/local/src/e1000e'.

  2. Untar/unzip the archive, where <x.x.x> is the version number for the driver tar file:

    tar zxf e1000e-<x.x.x>.tar.gz

  3. Change to the driver src directory, where <x.x.x> is the version number for the driver tar:

    cd e1000e-<x.x.x>/src/

  4. Compile and install the driver module:

    # make install

    The binary will be installed as:

    /lib/modules/<KERNEL VERSION>/kernel/drivers/net/e1000e/e1000e.[k]o

    The install location listed above is the default location. This may differ for various Linux distributions.

  5. Load the module using either the insmod or modprobe command:

    modprobe e1000e

    insmod e1000e

    Note that for 2.6 kernels the insmod command can be used if the full path to the driver module is specified. For example:

        insmod /lib/modules/<KERNEL VERSION>/kernel/drivers/net/e1000e/e1000e.ko

    With 2.6-based kernels, also make sure that older e1000e drivers are removed from the kernel before loading the new module:

    rmmod e1000e; modprobe e1000e

  6. Assign an IP address to the interface by entering the following, where <x> is the interface number:

    ifconfig eth<x> <IP_address>

  7. Verify that the interface works. Enter the following, where <IP_address> is the IP address for another machine on the same subnet as the interface that is being tested:

    ping <IP_address>

TROUBLESHOOTING: Some systems have trouble supporting MSI and/or MSI-X interrupts. If you believe your system needs to disable this style of interrupt, the driver can be built and installed with the command:

# make CFLAGS_EXTRA=-DDISABLE_PCI_MSI install

Normally the driver will generate an interrupt every two seconds, so if you can see that you're no longer getting interrupts in cat /proc/interrupts for the ethX e1000e device, then this workaround may be necessary.
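
One way to apply this check is to compare two snapshots of /proc/interrupts taken more than two seconds apart. The Python sketch below sums the per-CPU counters on the line for a given device and reports whether the count increased. The function names and the sample snapshots are hypothetical, and the exact layout of /proc/interrupts varies between kernels.

```python
BEFORE = """\
           CPU0       CPU1
 16:       1200        300   PCI-MSI-edge      eth0
"""

AFTER = """\
           CPU0       CPU1
 16:       1260        300   PCI-MSI-edge      eth0
"""


def interrupt_count(proc_interrupts, device):
    """Sum the per-CPU interrupt counters on the line(s) naming `device`."""
    total = 0
    for line in proc_interrupts.splitlines():
        fields = [field.strip(",") for field in line.split()]
        if len(fields) < 2 or not fields[0].endswith(":") or device not in fields[1:]:
            continue  # header line or a line for another device
        for field in fields[1:]:
            if not field.isdigit():
                break  # counters end where the controller name starts
            total += int(field)
    return total


def still_interrupting(before, after, device):
    """True if the device's interrupt count grew between two snapshots."""
    return interrupt_count(after, device) > interrupt_count(before, device)


print(still_interrupting(BEFORE, AFTER, "eth0"))   # True
print(still_interrupting(BEFORE, BEFORE, "eth0"))  # False
```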


Command Line Parameters

If the driver is built as a module, the following optional parameters can be set by entering them on the command line with the modprobe command, using this syntax:

modprobe e1000e [<option>=<VAL1>,<VAL2>,...]

The default value for each parameter is generally the recommended setting, unless otherwise noted.

NOTES:
  • For more information about the InterruptThrottleRate, RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay parameters, see the application note at: http://www.intel.com/design/network/applnots/ap450.htm.

  • A descriptor describes a data buffer and attributes related to the data buffer. This information is accessed by the hardware.

Parameter Name Valid Range/Settings Default Description
InterruptThrottleRate
Valid Range: 0, 1, 3, 4, 100-100000 (0=off, 1=dynamic, 3=dynamic conservative, 4=simplified balancing)
Default: 3
The driver can limit the number of interrupts per second that the adapter will generate for incoming packets. It does this by writing a value to the adapter that is based on the maximum number of interrupts that the adapter will generate per second.

Setting InterruptThrottleRate to a value greater than or equal to 100 will program the adapter to send out a maximum of that many interrupts per second, even if more packets have come in. This reduces interrupt load on the system and can lower CPU utilization under heavy load, but will increase latency as packets are not processed as quickly.

The default behavior of the driver previously assumed a static InterruptThrottleRate value of 8000, providing a good fallback value for all traffic types, but lacking in small packet performance and latency.

The driver has two adaptive modes (setting 1 or 3) in which it dynamically adjusts the InterruptThrottleRate value based on the traffic that it receives. After determining the type of incoming traffic in the last timeframe, it will adjust the InterruptThrottleRate to an appropriate value for that traffic.

The algorithm classifies the incoming traffic every interval into classes. Once the class is determined, the InterruptThrottleRate value is adjusted to suit that traffic type the best. There are three classes defined: "Bulk traffic", for large amounts of packets of normal size; "Low latency", for small amounts of traffic and/or a significant percentage of small packets; and "Lowest latency", for almost completely small packets or minimal traffic.

In dynamic conservative mode, the InterruptThrottleRate value is set to 4000 for traffic that falls in class "Bulk traffic". If traffic falls in the "Low latency" or "Lowest latency" class, the InterruptThrottleRate is increased stepwise to 20000. This default mode is suitable for most applications.

For situations where low latency is vital such as cluster or grid computing, the algorithm can reduce latency even more when InterruptThrottleRate is set to mode 1. In this mode, which operates the same as mode 3, the InterruptThrottleRate will be increased stepwise to 70000 for traffic in class "Lowest latency".

In simplified mode the interrupt rate is based on the ratio of tx and rx traffic. If the bytes per second rate is approximately equal, the interrupt rate will drop as low as 2000 interrupts per second. If the traffic is mostly transmit or mostly receive, the interrupt rate could be as high as 8000.

Setting InterruptThrottleRate to 0 turns off any interrupt moderation and may improve small packet latency, but is generally not suitable for bulk throughput traffic.

NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and RxAbsIntDelay parameters. In other words, minimizing the receive and/or transmit absolute delays does not force the controller to generate more interrupts than what the Interrupt Throttle Rate allows.

NOTE: When e1000e is loaded with default settings and multiple adapters are in use simultaneously, the CPU utilization may increase non-linearly. In order to limit the CPU utilization without impacting the overall throughput, we recommend that you load the driver as follows:

modprobe e1000e InterruptThrottleRate=3000,3000,3000

This sets the InterruptThrottleRate to 3000 interrupts/sec for the first, second, and third instances of the driver. The range of 2000 to 3000 interrupts per second works on a majority of systems and is a good starting point, but the optimal value will be platform-specific. If CPU utilization is not a concern, use RX_POLLING (NAPI) and default driver settings.

RxIntDelay 0-65535 (0=off) 0 This value delays the generation of receive interrupts in units of 1.024 microseconds. Receive interrupt reduction can improve CPU efficiency if properly tuned for specific network traffic. Increasing this value adds extra latency to frame reception and can end up decreasing the throughput of TCP traffic. If the system is reporting dropped receives, this value may be set too high, causing the driver to run out of available receive descriptors.

CAUTION: When setting RxIntDelay to a value other than 0, adapters may hang (stop transmitting) under certain network conditions. If this occurs a NETDEV WATCHDOG message is logged in the system event log. In addition, the controller is automatically reset, restoring the network connection. To eliminate the potential for the hang ensure that RxIntDelay is set to zero.

RxAbsIntDelay 0-65535 (0=off) 8 This value, in units of 1.024 microseconds, limits the delay in which a receive interrupt is generated. Useful only if RxIntDelay is non-zero, this value ensures that an interrupt is generated after the initial packet is received within the set amount of time. Proper tuning, along with RxIntDelay, may improve traffic throughput in specific network conditions.
TxIntDelay 0-65535 (0=off) 8 This value delays the generation of transmit interrupts in units of 1.024 microseconds. Transmit interrupt reduction can improve CPU efficiency if properly tuned for specific network traffic. If the system is reporting dropped transmits, this value may be set too high causing the driver to run out of available transmit descriptors.
TxAbsIntDelay 0-65535 (0=off) 32 This value, in units of 1.024 microseconds, limits the delay in which a transmit interrupt is generated. Useful only if TxIntDelay is non-zero, this value ensures that an interrupt is generated after the initial packet is sent on the wire within the set amount of time. Proper tuning, along with TxIntDelay, may improve traffic throughput in specific network conditions.
copybreak 0-xxxxxxx (0=off) 256 Usage: insmod e1000e.ko copybreak=128

Driver copies all packets below or equaling this size to a fresh rx buffer before handing it up the stack.

This parameter is different than other parameters, in that it is a single (not 1,1,1 etc.) parameter applied to all driver instances and it is also available during runtime at /sys/module/e1000e/parameters/copybreak
SmartPowerDownEnable 0-1 0 (disabled) Allows Phy to turn off in lower power states. The user can turn off this parameter in supported chipsets.
KumeranLockLoss 0-1 1 (enabled) This workaround skips resetting the Phy at shutdown for the initial silicon releases of ICH8 systems.
IntMode 0-2 (0=legacy, 1=MSI, 2=MSI-X) 2 (MSI-X) Allows changing the interrupt mode at module load time, without requiring a recompile. If the driver load fails to enable a specific interrupt mode, the driver will try other interrupt modes, from least to most compatible. The interrupt order is MSI-X, MSI, Legacy. If specifying MSI (IntMode=1) interrupts, only MSI and Legacy will be attempted.
CrcStripping 0-1 1 (enabled) Strip the CRC from received packets before sending up the network stack. If you have a machine with a BMC enabled but cannot receive IPMI traffic after loading or enabling the driver, try disabling this feature.
EEE 0-1 1 (enabled for parts supporting EEE) This option allows for the ability of IEEE802.3az (a.k.a. Energy Efficient Ethernet or EEE) to be advertised to the link partner on parts supporting EEE.  EEE saves energy by putting the device into a low-power state when the link is idle, but only when the link partner also supports EEE and after the feature has been enabled during link negotiation.  It is not necessary to disable the advertisement of EEE when connected with a link partner that does not support EEE.
Node 0-n

0 - n: where n is the number of the NUMA node that should be used to allocate memory for this adapter port.

-1: uses the driver default of allocating memory on whichever processor is running insmod/modprobe.

-1 (off) The Node parameter allows you to pick the NUMA node from which the adapter allocates memory. All driver structures, in-memory queues, and receive buffers will be allocated on the node specified. This parameter is only useful when interrupt affinity is specified; otherwise, some portion of the time the interrupt could run on a different core than the one the memory is allocated on, causing slower memory access and impacting throughput, CPU utilization, or both.
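
The fixed InterruptThrottleRate values and the interrupt delay units described above can be turned into time intervals with simple integer arithmetic. The following is a minimal sketch for illustration only; the helper names itr_interval_us and int_delay_ns are hypothetical and are not part of the driver or of any Intel tool.

```shell
#!/bin/sh
# Illustrative helpers only; these function names are hypothetical.

# Minimum spacing between interrupts, in whole microseconds, for a fixed
# InterruptThrottleRate (values 100-100000 are interrupts per second).
itr_interval_us() {
    echo $(( 1000000 / $1 ))
}

# Convert an Rx/Tx interrupt delay setting (units of 1.024 microseconds)
# to nanoseconds, which keeps the arithmetic in integers.
int_delay_ns() {
    echo $(( $1 * 1024 ))
}

itr_interval_us 3000    # multi-adapter starting point from the NOTE above
itr_interval_us 20000   # upper bound of dynamic conservative mode
int_delay_ns 8          # default RxAbsIntDelay and TxIntDelay
int_delay_ns 32         # default TxAbsIntDelay
```

At 3000 interrupts per second the adapter waits roughly a third of a millisecond between interrupts, which gives a feel for the latency cost of a low fixed rate.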


Additional Configurations

Configuring the Driver on Different Distributions

Configuring a network driver to load properly when the system is started is distribution dependent. Typically, the configuration process involves adding an alias line to /etc/modules.conf or /etc/modprobe.conf as well as editing other system startup scripts and/or configuration files. Many popular Linux distributions ship with tools to make these changes for you. To learn the proper way to configure a network device for your system, refer to your distribution documentation. If during this process you are asked for the driver or module name, the name for the Linux Base Driver for the Gigabit family of adapters is e1000e.

As an example, if you install the e1000e driver for two Gigabit adapters (eth0 and eth1) and want to set the interrupt mode to MSI-X and MSI respectively, add the following to modules.conf or /etc/modprobe.conf:
alias eth0 e1000e
alias eth1 e1000e
options e1000e IntMode=2,1
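
When several ports need the same value, the comma-separated list in the options line can be generated rather than typed by hand. This is only a sketch; module_options_line is a hypothetical helper, and the output still has to be placed in the configuration file that your distribution uses.

```shell
#!/bin/sh
# Hypothetical helper: print an "options" line that repeats one value
# for each adapter port handled by the driver.
module_options_line() {
    driver=$1
    param=$2
    value=$3
    ports=$4
    list=$value
    i=1
    while [ "$i" -lt "$ports" ]; do
        list="$list,$value"
        i=$(( i + 1 ))
    done
    printf 'options %s %s=%s\n' "$driver" "$param" "$list"
}

# Three ports, each limited to 3000 interrupts per second.
module_options_line e1000e InterruptThrottleRate 3000 3
```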

Viewing Link Messages

Link messages will not be displayed to the console if the distribution is restricting system messages. In order to see network driver link messages on your console, set dmesg to eight by entering the following:

dmesg -n 8
NOTE: This setting is not saved across reboots.

Jumbo Frames

Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU) to a value larger than the default value of 1500. Use the ifconfig command to increase the MTU size. For example:

ifconfig eth<x> mtu 9000 up

This setting is not saved across reboots. The setting change can be made permanent by adding MTU=9000 to the file: /etc/sysconfig/network-scripts/ifcfg-eth<x> (Red Hat distributions). Other distributions may store this setting in a different location.

NOTES:
  • To enable Jumbo Frames, increase the MTU size on the interface beyond 1500.

  • The maximum MTU setting for Jumbo Frames is 9216. This value coincides with the maximum Jumbo Frames size of 9234 bytes.

  • Using Jumbo frames at 10 or 100 Mbps is not supported and may result in poor performance or loss of link.

  • The following adapters limit Jumbo Frames sized packets to a maximum of 4088 bytes:
    Intel® 82578DM Gigabit Network Connection
    Intel® 82577LM Gigabit Network Connection

  • The following adapters do not support Jumbo Frames:
    Intel® PRO/1000 Gigabit Server Adapter
    Intel® PRO/1000 PM Network Connection
    Intel® 82562V 10/100 Network Connection
    Intel® 82566DM Gigabit Network Connection
    Intel® 82566DC Gigabit Network Connection
    Intel® 82566MM Gigabit Network Connection
    Intel® 82566MC Gigabit Network Connection
    Intel® 82562GT 10/100 Network Connection
    Intel® 82562G 10/100 Network Connection
    Intel® 82566DC-2 Gigabit Network Connection
    Intel® 82562V-2 10/100 Network Connection
    Intel® 82562G-2 10/100 Network Connection
    Intel® 82562GT-2 10/100 Network Connection
    Intel® 82578DC Gigabit Network Connection
    Intel® 82577LC Gigabit Network Connection
    Intel® 82567V-3 Gigabit Network Connection

  • Jumbo Frames cannot be configured on an 82579-based Network device, if MACSec is enabled on the system.
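
The MTU limits in the notes above can be checked before calling ifconfig. The sketch below assumes that the 18-byte difference between the maximum MTU (9216) and the maximum frame size (9234) is the 14-byte Ethernet header plus the 4-byte CRC of an untagged frame. The helper names are hypothetical, and the per-adapter limits listed above are not modeled.

```shell
#!/bin/sh
# Assumption: frame size = MTU + 14-byte Ethernet header + 4-byte CRC.
ETH_OVERHEAD=18
MAX_MTU=9216

# Frame size in bytes that corresponds to a given MTU.
frame_size() {
    echo $(( $1 + ETH_OVERHEAD ))
}

# Classify an MTU value against the generic limits in this section.
check_jumbo_mtu() {
    if [ "$1" -le 1500 ]; then
        echo "not jumbo"
    elif [ "$1" -gt "$MAX_MTU" ]; then
        echo "too large"
    else
        echo "ok"
    fi
}

frame_size 9216
check_jumbo_mtu 9000
```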

ethtool

The driver utilizes the ethtool interface for driver configuration and diagnostics, as well as displaying statistical information. ethtool version 3 or later is required for this functionality, although we strongly recommend downloading the latest version at:

http://ftp.kernel.org/pub/software/network/ethtool/.

NOTE: When validating enable/disable tests on some parts (82578, for example) you need to add a few seconds between tests when working with ethtool.

Speed and Duplex Configuration

Speed and Duplex are configured through the ethtool* utility. ethtool is included with all versions of Red Hat after Red Hat 7.2. For other Linux distributions, download and install ethtool from the following website: http://ftp.kernel.org/pub/software/network/ethtool/.

Enabling Wake on LAN* (WoL)

WoL is configured through the ethtool* utility. ethtool is included with all versions of Red Hat after Red Hat 7.2. For other Linux distributions, download and install ethtool from the following website: http://ftp.kernel.org/pub/software/network/ethtool/.

For instructions on enabling WoL with ethtool, refer to the website listed above.

WoL will be enabled on the system during the next shut down or reboot. For this driver version, in order to enable WoL, the e1000e driver must be loaded prior to shutting down or suspending the system.
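
As a rough illustration, the current WoL setting can be read back from the output of ethtool. The sketch below assumes that ethtool prints a "Wake-on:" line for the interface; wol_flags is a hypothetical helper and eth0 is a placeholder interface name.

```shell
#!/bin/sh
# Read "ethtool <interface>" output on stdin and print the active
# Wake-on flags (for example "g" for magic packet, "d" for disabled).
# The "Supports Wake-on:" line is skipped on purpose.
wol_flags() {
    awk -F': *' '/^[ \t]*Wake-on:/ { print $2 }'
}

# Typical use on a live system (requires root and a WoL-capable port):
#   ethtool -s eth0 wol g
#   ethtool eth0 | wol_flags
```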

NOTES: Wake On LAN is only supported on port A for the following devices:
  • Intel® PRO/1000 PT Dual Port Network Connection
  • Intel® PRO/1000 PT Dual Port Server Connection
  • Intel® PRO/1000 PT Dual Port Server Adapter
  • Intel® PRO/1000 PF Dual Port Server Adapter
  • Intel® PRO/1000 PT Quad Port Server Adapter
  • Intel® Gigabit PT Quad Port Server ExpressModule

NAPI

NAPI (Rx polling mode) is supported in the e1000e driver. NAPI is enabled by default.

To disable NAPI, compile the driver module, passing in a configuration option:

# make CFLAGS_EXTRA=-DE1000E_NO_NAPI install

See ftp://robur.slu.se/pub/Linux/net-development/NAPI/usenix-paper.tgz for more information on NAPI.


Known Issues

NOTE: After installing the driver, if your Intel Network Connection is not working, verify that you have installed the correct driver.

Intel® Active Management Technology 2.0, 2.1, 2.5 not supported in conjunction with Linux driver

Detected Tx Unit Hang in Quad Port Adapters

In some cases ports 3 and 4 don't pass traffic and report 'Detected Tx Unit Hang' followed by 'NETDEV WATCHDOG: ethX: transmit timed out' errors. Ports 1 and 2 don't show any errors and will pass traffic.

This issue MAY be resolved by updating to the latest kernel and BIOS. The user is encouraged to run an OS that fully supports MSI interrupts. You can check your system's BIOS by downloading the Linux Firmware Developer Kit that can be obtained at http://www.linuxfirmwarekit.org/

Adapters with 4 ports behind a PCIe bridge

Adapters that have 4 ports behind a PCIe bridge may be incompatible with some systems. If you have interrupt or "missing interface" problems, especially with older kernels, run the Linux firmware kit from http://www.linuxfirmwarekit.org/ to test your BIOS.

82573(V/L/E) TX Unit Hang Messages

Several adapters with the 82573 chipset display "TX unit hang" messages during normal operation with the e1000e driver. The issue appears both with TSO enabled and disabled, and is caused by a power management function that is enabled in the EEPROM. Early releases of the chipsets to vendors had the EEPROM bit that enabled the feature. After the issue was discovered newer adapters were released with the feature disabled in the EEPROM.

If you encounter the problem in an adapter, and the chipset is an 82573-based one, you can verify that your adapter needs the fix by using ethtool:

 # ethtool -e eth0
 Offset          Values
 ------          ------
 0x0000          00 12 34 56 fe dc 30 0d 46 f7 f4 00 ff ff ff ff
 0x0010          ff ff ff ff 6b 02 8c 10 d9 15 8c 10 86 80 de 83
                                                           ^^

The value at offset 0x001e (de) has bit 0 unset. This enables the problematic power saving feature. In this case, the EEPROM needs to read "df" at offset 0x001e.
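
The bit test described above can be reproduced with shell arithmetic. This is a sketch only: eeprom_fix_value is a hypothetical helper that takes the hex byte read at offset 0x1e and does not touch the EEPROM.

```shell
#!/bin/sh
# Hypothetical helper: report whether the byte at offset 0x1e has bit 0
# unset and, if so, print the value with bit 0 set.
eeprom_fix_value() {
    byte=$(( 0x$1 ))
    if [ $(( byte & 1 )) -eq 0 ]; then
        printf 'fix needed: write 0x%02x\n' $(( byte | 1 ))
    else
        printf 'no fix needed: 0x%02x\n' "$byte"
    fi
}

eeprom_fix_value de   # value from the dump above
eeprom_fix_value df   # value after the fix
```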

A one-time EEPROM fix is available as a shell script. This script will verify that the adapter is applicable to the fix and if the fix is needed or not. If the fix is required, it applies the change to the EEPROM and updates the checksum. The user must reboot the system after applying the fix if changes were made to the EEPROM.

Example output of the script:

 # bash fixeep-82573-dspd.sh eth0
 eth0: is a "82573E Gigabit Ethernet Controller"
 This fixup is applicable to your hardware
 executing command: ethtool -E eth0 magic 0x109a8086 offset 0x1e value 0xdf
 Change made. You *MUST* reboot your machine before changes take effect!

The script can be downloaded at http://e1000.sourceforge.net/files/fixeep-82573-dspd.sh

Dropped Receive Packets on Half-duplex 10/100 Networks

If you have an Intel PCI Express adapter running at 10 Mbps or 100 Mbps, half-duplex, you may observe occasional dropped receive packets. There are no workarounds for this problem in this network configuration. The network must be updated to operate in full-duplex, and/or 1000 Mbps only.

Compiling the Driver

When trying to compile the driver by running make install, the following error may occur:  "Linux kernel source not configured - missing version.h"

To solve this issue, create the version.h file by going to the Linux source tree and entering:

# make include/linux/version.h

Performance Degradation with Jumbo Frames

Degradation in throughput performance may be observed in some Jumbo frames environments. If this is observed, increasing the application's socket buffer size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help. See the specific application manual and /usr/src/linux*/Documentation/networking/ip-sysctl.txt for more details.

Jumbo frames on Foundry BigIron 8000 switch

There is a known issue using Jumbo frames when connected to a Foundry BigIron 8000 switch. This is a 3rd party limitation. If you experience loss of packets, lower the MTU size.

Allocating Rx Buffers when Using Jumbo Frames

Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if the available memory is heavily fragmented. This issue may be seen with PCI-X adapters or with packet split disabled. This can be reduced or eliminated by changing the amount of available memory for receive buffer allocation, by increasing /proc/sys/vm/min_free_kbytes.

Multiple Interfaces on Same Ethernet Broadcast Network

Due to the default ARP behavior on Linux, it is not possible to have one system on two IP networks in the same Ethernet broadcast domain (non-partitioned switch) behave as expected. All Ethernet interfaces will respond to IP traffic for any IP address assigned to the system. This results in unbalanced receive traffic.

If you have multiple interfaces in a server, either turn on ARP filtering by entering:

        echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter

(this only works if your kernel's version is higher than 2.4.5)

NOTE: This setting is not saved across reboots. The configuration change can be made permanent by adding the line:

net.ipv4.conf.all.arp_filter = 1

to the file /etc/sysctl.conf

   or,

install the interfaces in separate broadcast domains (either in different switches or in a switch partitioned to VLANs).
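
The persistent setting can be added without creating duplicate entries. The sketch below uses a hypothetical helper, persist_arp_filter, that appends the line only if it is not already present; the demonstration writes to a temporary file rather than to /etc/sysctl.conf.

```shell
#!/bin/sh
# Hypothetical helper: append the arp_filter setting to a sysctl
# configuration file unless that exact line is already there.
persist_arp_filter() {
    conf=${1:-/etc/sysctl.conf}
    line='net.ipv4.conf.all.arp_filter = 1'
    grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}

# Demonstration against a scratch file; calling twice adds one line.
demo=$(mktemp)
persist_arp_filter "$demo"
persist_arp_filter "$demo"
cat "$demo"
rm -f "$demo"
```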

Disable rx flow control with ethtool

In order to disable receive flow control using ethtool, you must turn off auto-negotiation on the same command line.

For example:

     ethtool -A eth? autoneg off rx off

Unplugging network cable while ethtool -p is running

In kernel versions 2.5.50 and later (including 2.6 kernel), unplugging the network cable while ethtool -p is running will cause the system to become unresponsive to keyboard commands, except for control-alt-delete. Restarting the system appears to be the only remedy.

MSI-X Issues with Kernels between 2.6.19 - 2.6.21 (inclusive)

Kernel panics and instability may be observed on any MSI-X hardware if you use irqbalance with kernels between 2.6.19 and 2.6.21. If such problems are encountered, you may disable the irqbalance daemon or upgrade to a newer kernel.
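
To check whether a system falls in the affected range, the kernel version string can be compared against 2.6.19 through 2.6.21. This is a minimal sketch; kernel_in_msix_problem_range is a hypothetical helper, and it assumes a version string of the form major.minor.patch with an optional suffix after a hyphen.

```shell
#!/bin/sh
# Return success when a kernel version string falls in 2.6.19 - 2.6.21.
kernel_in_msix_problem_range() {
    ver=${1%%-*}            # drop a distribution suffix such as "-128.el5"
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    patch=${rest#*.}
    patch=${patch%%.*}      # ignore a fourth component such as ".7"
    [ "$major" = "2" ] && [ "$minor" = "6" ] &&
        [ "$patch" -ge 19 ] && [ "$patch" -le 21 ]
}

if kernel_in_msix_problem_range "$(uname -r)"; then
    echo "Kernel is in the affected range; consider disabling irqbalance."
fi
```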

Rx Page Allocation Errors

"Page allocation failure. order:0" errors may occur under stress with kernels 2.6.25 and above. This is caused by the way the Linux kernel reports this stressed condition.

Network throughput degradation observed with Onboard video versus add-in Video Card on 82579LM Gigabit Network Connection when used with some older kernels.

This issue can be worked around by specifying "pci=nommconf" in the kernel boot parameters, or by using another kernel boot parameter, "memmap=128M$0x100000000", which marks a 128 MB region at 4 GB as reserved so that the OS will not use these RAM pages.

This issue is fixed in kernel version 2.6.21, where the kernel tries to dynamically find out the mmconfig size by looking at the number of buses that the mmconfig segment maps to.

This issue won't be seen on 32bit version of EL5, as in that case, the kernel sees that RAM is located around the 256MB window and avoids using the mmconfig space.

Activity LED blinks unexpectedly

If a system based on the 82577, 82578, or 82579 controller is connected to a hub, the Activity LED will blink for all network traffic present on the hub. Connecting the system to a switch or router will filter out most traffic not addressed to the local port.

Link may take longer than expected

With some Phy and switch combinations, link can take longer than expected. This can be an issue on Linux distributions that time out when checking for link prior to acquiring a DHCP address; however, there is usually a way to work around this (e.g. set LINKDELAY in the interface configuration on RHEL).

Tx flow control is disabled by default on 82577 and 82578-based adapters

Possible performance degradation on certain 82566 and 82567 devices

Internal stress testing with jumbo frames shows the reliability on some 82566 and 82567 devices is improved in certain corner cases by disabling the Early Receive feature. Doing so can impact Tx performance. To reduce the impact, the packet buffer sizes and relevant flow control settings are modified accordingly.


Using the igbvf Base Driver

Overview

Building and Installation

Command Line Parameters

Additional Configurations

Known Issues

Overview

This driver supports upstream kernel versions 2.6.30 (or higher) x86_64.

Supported Operating Systems: SLES 11 SP1 x86_64, RHEL 5.3/5.4 x86_64.

The igbvf driver supports 82576-based virtual function devices that can only be activated on kernels that support SR-IOV. SR-IOV requires the correct platform and OS support.

The igbvf driver requires the igb driver, version 2.0 or later. The igbvf driver supports virtual functions generated by the igb driver with a max_vfs value of 1 or greater. For more information on the max_vfs parameter, refer to the section on the igb driver.

The guest OS loading the igbvf driver must support MSI-X interrupts.

This driver is only supported as a loadable module at this time. Intel is not supplying patches against the kernel source to allow for static linking of the driver. For questions related to hardware requirements, refer to the documentation supplied with your Intel Gigabit adapter. All hardware requirements listed apply to use with Linux.

Instructions on updating ethtool can be found in the section Additional Configurations later in this document.

VLANs: A total of 32 VLANs can be shared among one or more VFs.

Building and Installation

To build a binary RPM* package of this driver, run 'rpmbuild -tb <filename.tar.gz>'. Replace <filename.tar.gz> with the specific filename of the driver.

NOTE: For the build to work properly, the currently running kernel MUST match the version and configuration of the installed kernel sources. If you have just recompiled the kernel, reboot the system now.

RPM functionality has only been tested in Red Hat distributions.

  1. Move the base driver tar file to the directory of your choice. For example, use /home/username/igbvf or /usr/local/src/igbvf.

  2. Untar/unzip the archive:

    tar zxf igbvf-x.x.x.tar.gz

  3. Change to the driver src directory:

    cd igbvf-<x.x.x>/src/

  4. Compile the driver module:

    # make install

    The binary will be installed as:

    /lib/modules/<KERNEL VERSION>/kernel/drivers/net/igbvf/igbvf.[k]o

    The install location listed above is the default location. This may differ for various Linux distributions.

  5. Load the module using either the insmod or modprobe command:

    modprobe igbvf

    insmod igbvf

    Note that for 2.6 kernels the insmod command can be used if the full path to the driver module is specified. For example:

        insmod /lib/modules/<KERNEL VERSION>/kernel/drivers/net/igbvf/igbvf.ko

    With 2.6 based kernels also make sure that older igbvf drivers are removed from the kernel, before loading the new module:

    rmmod igbvf; modprobe igbvf

  6. Assign an IP address to the interface by entering the following, where <x> is the interface number:

    ifconfig eth<x> <IP_address>

  7. Verify that the interface works. Enter the following, where <IP_address> is the IP address for another machine on the same subnet as the interface that is being tested:

    ping <IP_address>

Troubleshooting: Some systems have trouble supporting MSI and/or MSI-X interrupts. If you believe your system needs to disable this style of interrupt, the driver can be built and installed with the command:

make CFLAGS_EXTRA=-DDISABLE_PCI_MSI install

Normally the driver will generate an interrupt every two seconds, so if you can see that you're no longer getting interrupts in cat /proc/interrupts for the ethX igbvf device, then this workaround may be necessary.
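
One way to see whether interrupts are still arriving is to total the per-CPU counters for the device and compare two samples. The sketch below is an assumption-laden illustration: irq_total is a hypothetical helper, eth0 is a placeholder, and it assumes the /proc/interrupts layout in which the last column is the device name (or the device name followed by a "-" suffix for per-vector entries).

```shell
#!/bin/sh
# Hypothetical helper: sum the per-CPU interrupt counters of every line
# in /proc/interrupts (or a copy of it) whose last column names the device.
irq_total() {
    awk -v dev="$1" '
        $NF == dev || index($NF, dev "-") == 1 {
            # Counters start at field 2 and end at the first non-numeric field.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
        }
        END { print sum + 0 }
    ' "${2:-/proc/interrupts}"
}

# Typical use on a live system:
#   before=$(irq_total eth0); sleep 5; after=$(irq_total eth0)
#   [ "$after" -gt "$before" ] || echo "no new interrupts on eth0"
```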

Command Line Parameters

If the driver is built as a module, the following optional parameters are used by entering them on the command line with the modprobe command using this syntax:

modprobe igbvf [<option>=<VAL1>,<VAL2>,...]

For example:

modprobe igbvf InterruptThrottleRate=16000,16000

The default value for each parameter is generally the recommended setting, unless otherwise noted.

NOTES:
  • For more information about the InterruptThrottleRate parameter, see the application note at: http://www.intel.com/design/network/applnots/ap450.htm.

  • A descriptor describes a data buffer and attributes related to the data buffer. This information is accessed by the hardware.

Parameter Name Valid Range/Settings Default Description
InterruptThrottleRate
0,1,3,100-100000 (0=off, 1=dynamic, 3=dynamic conservative)
 
3 The driver can limit the amount of interrupts per second that the adapter will generate for incoming packets. It does this by writing a value to the adapter that is based on the maximum amount of interrupts that the adapter will generate per second.

Setting InterruptThrottleRate to a value greater or equal to 100 will program the adapter to send out a maximum of that many interrupts per second, even if more packets have come in. This reduces interrupt load on the system and can lower CPU utilization under heavy load, but will increase latency as packets are not processed as quickly.

The default behavior of the driver previously assumed a static InterruptThrottleRate value of 8000, providing a good fallback value for all traffic types, but lacking in small packet performance and latency. The hardware can handle many more small packets per second, however, and for this reason an adaptive interrupt moderation algorithm was implemented.

The driver has two adaptive modes (setting 1 or 3) in which it dynamically adjusts the InterruptThrottleRate value based on the traffic that it receives. After determining the type of incoming traffic in the last timeframe, it will adjust the InterruptThrottleRate to an appropriate value for that traffic.

The algorithm classifies the incoming traffic every interval into classes. Once the class is determined, the InterruptThrottleRate value is adjusted to suit that traffic type the best. There are three classes defined: "Bulk traffic", for large amounts of packets of normal size; "Low latency", for small amounts of traffic and/or a significant percentage of small packets; and "Lowest latency", for almost completely small packets or minimal traffic.

In dynamic conservative mode, the InterruptThrottleRate value is set to 4000 for traffic that falls in class "Bulk traffic". If traffic falls in the "Low latency" or "Lowest latency" class, the InterruptThrottleRate is increased stepwise to 20000. This default mode is suitable for most applications.

For situations where low latency is vital such as cluster or grid computing, the algorithm can reduce latency even more when InterruptThrottleRate is set to mode 1. In this mode, which operates the same as mode 3, the InterruptThrottleRate will be increased stepwise to 70000 for traffic in class "Lowest latency".

Setting InterruptThrottleRate to 0 turns off any interrupt moderation and may improve small packet latency, but is generally not suitable for bulk throughput traffic.

NOTE: Dynamic interrupt throttling is only applicable to adapters operating in MSI or Legacy interrupt mode, using a single receive queue.

NOTE: When igbvf is loaded with default settings and multiple adapters are in use simultaneously, the CPU utilization may increase non-linearly. In order to limit the CPU utilization without impacting the overall throughput, we recommend that you load the driver as follows:

modprobe igbvf InterruptThrottleRate=3000,3000,3000

This sets the InterruptThrottleRate to 3000 interrupts/sec for the first, second, and third instances of the driver. The range of 2000 to 3000 interrupts per second works on a majority of systems and is a good starting point, but the optimal value will be platform-specific. If CPU utilization is not a concern, use default driver settings.

Additional Configurations

Configuring the Driver on Different Distributions

Configuring a network driver to load properly when the system is started is distribution dependent. Typically, the configuration process involves adding an alias line to /etc/modules.conf or /etc/modprobe.conf as well as editing other system startup scripts and/or configuration files. Many popular Linux distributions ship with tools to make these changes for you. To learn the proper way to configure a network device for your system, refer to your distribution documentation. If during this process you are asked for the driver or module name, the name for the Linux Base Driver for the Gigabit Family of Adapters is igbvf.

As an example, if you install the igbvf driver for two Gigabit adapters (eth0 and eth1) and want to set InterruptThrottleRate to dynamic conservative (3) and dynamic (1) respectively, add the following to modules.conf or /etc/modprobe.conf:

alias eth0 igbvf
alias eth1 igbvf
options igbvf InterruptThrottleRate=3,1

Viewing Link Messages

Link messages will not be displayed to the console if the distribution is restricting system messages. In order to see network driver link messages on your console, set dmesg to eight by entering the following:

dmesg -n 8

NOTE: This setting is not saved across reboots.

Jumbo Frames

Jumbo Frames support is enabled by changing the MTU to a value larger than the default of 1500. Use the ifconfig command to increase the MTU size.

For example:

ifconfig eth<x> mtu 9000 up

This setting is not saved across reboots. It can be made permanent if you add:

MTU=9000

to the file /etc/sysconfig/network-scripts/ifcfg-eth<x>. This example applies to the Red Hat distributions; other distributions may store this setting in a different location.

NOTES:
  • To enable Jumbo Frames, increase the MTU size on the interface beyond 1500.

  • The maximum MTU setting for Jumbo Frames is 9216. This value coincides with the maximum Jumbo Frames size of 9234 bytes.

  • Using Jumbo frames at 10 or 100 Mbps is not supported and may result in poor performance or loss of link.

ethtool

The driver utilizes the ethtool interface for driver configuration and diagnostics, as well as displaying statistical information. ethtool version 3.0 or later is required for this functionality, although we strongly recommend downloading the latest version at:

http://ftp.kernel.org/pub/software/network/ethtool/.

Known Issues/Troubleshooting

NOTE: After installing the driver, if your Intel Network Connection is not working, verify that you have installed the correct driver.

Driver Compilation

When trying to compile the driver by running make install, the following error may occur:

"Linux kernel source not configured - missing version.h"

To solve this issue, create the version.h file by going to the Linux source tree and entering:

make include/linux/version.h

Multiple Interfaces on Same Ethernet Broadcast Network

Due to the default ARP behavior on Linux, it is not possible to have one system on two IP networks in the same Ethernet broadcast domain (non-partitioned switch) behave as expected. All Ethernet interfaces will respond to IP traffic for any IP address assigned to the system. This results in unbalanced receive traffic.

If you have multiple interfaces in a server, either turn on ARP filtering by entering:

echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
(this only works if your kernel's version is higher than 2.4.5)

NOTE: This setting is not saved across reboots. The configuration change can be made permanent by adding the line:

net.ipv4.conf.all.arp_filter = 1

to the file /etc/sysctl.conf

or,

install the interfaces in separate broadcast domains (either in different switches or in a switch partitioned to VLANs).

Do Not Use LRO When Routing Packets

Due to a known general compatibility issue with LRO and routing, do not use LRO when routing packets.

Build error with Asianux 3.0 - redefinition of typedef 'irq_handler_t'

Some systems may experience build issues due to redefinition of irq_handler_t. To resolve this issue build the driver (step 4 above) using the command:

make CFLAGS_EXTRA=-DAX_RELEASE_CODE=1 install

MSI-X Issues with Kernels between 2.6.19 - 2.6.21 (inclusive)

Kernel panics and instability may be observed on any MSI-X hardware if you use irqbalance with kernels between 2.6.19 and 2.6.21. If such problems are encountered, you may disable the irqbalance daemon or upgrade to a newer kernel.

Rx Page Allocation Errors

"Page allocation failure. order:0" errors may occur under stress with kernels 2.6.25 and above. This is caused by the way the Linux kernel reports this stressed condition.

Under Redhat 5.4 - System May Crash when Closing Guest OS Window after Loading/Unloading Physical Function (PF) Driver

Do not remove the igbvf driver from Dom0 while Virtual Functions (VFs) are assigned to guests. First use the xm "pci-detach" command to hot-unplug each VF device from the VM it is assigned to, or else shut down the VM.

Unloading Physical Function (PF) Driver Causes System Reboots When VM is Running and VF is Loaded on the VM

Do not unload the PF driver (igb) while VFs are assigned to guests.

Host May Reboot after Removing PF when VF is Active in Guest

Using kernel versions earlier than 3.2, do not unload the PF driver with active VFs. Doing this will cause your VFs to stop working until you reload the PF driver and may cause a spontaneous reboot of your system.

 


Last modified on 11/03/11 4:12p Revision