A Look at Cabling Best Practices for AI Data Centers

September 20, 2024

By Andrew Jimenez, Senior Director – Technical Sales, Wesco Data Center Solutions

The dramatic growth of AI is driving unprecedented changes in data center infrastructure. Much attention has been focused on greater compute density and the need for innovative cooling systems to manage heat effectively. Cabling infrastructure also requires a new approach.

AI servers use graphics processing units (GPUs) to deliver the raw computational power needed for AI workloads. GPUs are designed for parallel processing, enabling them to perform multiple calculations simultaneously. Still, one AI-enabled server is not enough to train an AI model or to run some AI workloads. Multiple AI servers must be harnessed together in a high-performance computing (HPC) cluster.

This is a fundamentally different approach from that of the typical data center built on traditional, CPU-based servers. By comparison, AI servers require much greater cabling density (up to four to five times more fiber connections) in a design that maximizes performance and minimizes latency.

Why AI Workloads Require a Different Cabling Architecture

In most data centers, servers connect to top-of-rack (TOR) switches, which connect to end-of-row (EOR) or middle-of-row (MOR) switches. EOR/MOR switches connect to the network core. This “leaf-and-spine” model works well for traditional workloads, even in hyperscale environments. However, AI workloads require a different cabling design.

Because AI servers are deployed in a cluster architecture, they require much more inter-server connectivity. At the same time, AI environments generally have fewer servers per rack due to the heat generated by GPUs. As a result, AI data centers require much more inter-rack cabling. They also require high-speed data transfer rates to support the computational intensity of AI workloads.
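To make the shift toward inter-rack cabling concrete, the following is a rough sketch that counts links for two hypothetical racks. Every figure in it is an illustrative assumption rather than a number from a real deployment: a traditional rack with 20 servers, 2 links each to an in-rack TOR switch, and 4 TOR uplinks, compared with an AI rack holding 4 GPU servers with 10 links each, all of which are assumed to leave the rack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackProfile:
    """Hypothetical rack description used only for this comparison."""
    name: str
    servers: int
    links_per_server: int
    links_leaving_rack: int

    @property
    def server_links(self) -> int:
        # Total links that terminate on servers in this rack.
        return self.servers * self.links_per_server


# Assumed: 20 CPU servers, 2 links each to an in-rack TOR switch, 4 TOR uplinks.
traditional = RackProfile("traditional", servers=20, links_per_server=2,
                          links_leaving_rack=4)

# Assumed: 4 GPU servers, 10 links each, and every link leaves the rack.
ai = RackProfile("ai", servers=4, links_per_server=10,
                 links_leaving_rack=4 * 10)

per_server_ratio = ai.links_per_server / traditional.links_per_server
inter_rack_ratio = ai.links_leaving_rack / traditional.links_leaving_rack

if __name__ == "__main__":
    for rack in (traditional, ai):
        print(f"{rack.name}: {rack.server_links} server links, "
              f"{rack.links_leaving_rack} leaving the rack")
    print(f"links per server ratio: {per_server_ratio:.1f}")
    print(f"inter-rack link ratio: {inter_rack_ratio:.1f}")
```

Under these assumptions, each AI server needs five times as many links as a traditional server, and ten times as many links leave the rack, even though the AI rack holds far fewer servers. Real ratios depend on the server model and the network design.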

Latency is another key consideration when designing network infrastructure for AI. Training an AI model takes a lot of time, and network latency increases the amount of time required and therefore the cost. According to one recent report, up to 30 percent of the time needed to train a large AI model is spent on network latency. Therefore, the servers in an AI cluster are deployed in close proximity to minimize the length of the cable runs.
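Cable length contributes to latency through propagation delay, which can be estimated from the speed of light in glass. The sketch below assumes a group index of about 1.47, a typical figure for silica fiber and not a value given in this article. The function name and constants are hypothetical. Propagation delay is only one component of network latency, so this illustrates the effect of run length and nothing more.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # speed of light in vacuum
FIBER_GROUP_INDEX = 1.47              # assumed typical value for silica fiber


def propagation_delay_ns(length_m: float,
                         group_index: float = FIBER_GROUP_INDEX) -> float:
    """Return the one-way propagation delay in nanoseconds for a fiber run."""
    if length_m < 0:
        raise ValueError("length_m must be non-negative")
    # Light travels at c / group_index inside the fiber.
    return length_m * group_index / SPEED_OF_LIGHT_M_PER_S * 1e9


if __name__ == "__main__":
    for length in (3, 30, 100):
        print(f"{length:>4} m -> {propagation_delay_ns(length):.1f} ns one way")
```

With this group index, each meter of fiber adds roughly 5 nanoseconds of one-way delay, which is why shorter runs between clustered servers help.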

Choosing the Right Fiber-Optic Cabling

Given the number of high-speed connections that must be packed in a very small space, fiber-optic cabling is a necessity. However, there is a wide variety of fiber-optic cabling options to choose from. The best options strike the right balance between cost, reliability, and agility.

AI and other HPC workloads typically use active optical cables (AOCs). These cables have transceivers permanently attached to each end, eliminating the need to buy and deploy transceivers separately. However, AOCs are somewhat less reliable than other types of cables, and their all-in-one design limits flexibility. They are more difficult to upgrade as needs change.

For intra- and inter-rack server-to-leaf cabling, multimode fiber deserves consideration over single-mode fiber for several reasons. Multimode fiber uses VCSEL (vertical-cavity surface-emitting laser) transceivers, which support data rates up to 400 Gbps at distances of 100 meters or less, making it ideal for short-reach cabling within the data center. Additionally, VCSEL-based optics are typically less expensive than single-mode fiber transceivers.
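The rule of thumb above can be written as a small helper. This is a minimal sketch: the two thresholds come straight from the paragraph above, while the function name and return values are hypothetical. An actual selection would also depend on the transceiver specification, the fiber grade, and cost.

```python
MULTIMODE_MAX_REACH_M = 100     # reach stated in the text above
MULTIMODE_MAX_RATE_GBPS = 400   # data rate stated in the text above


def suggest_fiber(length_m: float, rate_gbps: float) -> str:
    """Suggest a fiber type for a link using only reach and data rate."""
    if length_m <= 0 or rate_gbps <= 0:
        raise ValueError("length_m and rate_gbps must be positive")
    if length_m <= MULTIMODE_MAX_REACH_M and rate_gbps <= MULTIMODE_MAX_RATE_GBPS:
        return "multimode"
    # Anything beyond the stated multimode envelope falls back to single-mode.
    return "single-mode"


if __name__ == "__main__":
    print(suggest_fiber(30, 400))   # short server-to-leaf run
    print(suggest_fiber(300, 100))  # longer run beyond the stated reach
```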

Exploring the Benefits of Parallel Optic Technology

Parallel optic technology is a strong option for AI data center cabling. It transmits and receives data simultaneously over multiple optical fibers by dividing the high-data-rate signal spatially across the fiber lanes, which makes it well suited to high-data-rate multimode fiber connections of less than 100 meters. It can use OM3 and OM4 multimode fiber to provide aggregate speeds up to 28G cost-effectively.
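The lane-splitting idea can be illustrated with simple arithmetic: the aggregate rate is the number of lanes multiplied by the rate per lane, and each lane needs one transmit fiber and one receive fiber. In the sketch below, the lane counts and nominal per-lane rates are taken from common IEEE 802.3 short-reach multimode variants and are not figures from this article. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLink:
    """Hypothetical description of a parallel-optic link."""
    name: str
    lanes: int
    lane_rate_gbps: float  # nominal data rate per lane

    @property
    def aggregate_gbps(self) -> float:
        # The signal is divided across lanes, so the rates add up.
        return self.lanes * self.lane_rate_gbps

    @property
    def fibers(self) -> int:
        # One transmit fiber and one receive fiber per lane.
        return 2 * self.lanes


EXAMPLES = [
    ParallelLink("40GBASE-SR4", lanes=4, lane_rate_gbps=10),
    ParallelLink("100GBASE-SR4", lanes=4, lane_rate_gbps=25),
    ParallelLink("400GBASE-SR8", lanes=8, lane_rate_gbps=50),
]

if __name__ == "__main__":
    for link in EXAMPLES:
        print(f"{link.name}: {link.lanes} lanes x {link.lane_rate_gbps:g}G "
              f"= {link.aggregate_gbps:g}G over {link.fibers} fibers")
```

The fiber counts show why parallel optics drive up cabling density: every additional lane adds two fibers to the link.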

Additionally, parallel optics can support both Ethernet and InfiniBand protocols. By using Ethernet, AI data centers gain the advantages of a well-established, open-standard protocol that can be implemented quickly at one-half to one-third the cost of InfiniBand. Industry organizations such as the Ultra Ethernet Consortium are developing best-practice design guidelines that would capitalize on Ethernet’s strengths while minimizing the packet loss that can cause latency.

Wesco’s network infrastructure specialists can help you select the right fiber-optic cabling products from best-in-class suppliers. We also offer a comprehensive suite of cabling services through our network of trusted partners. Let us help you optimize your data center network infrastructure to meet the unique demands of AI workloads.

https://www.wesco.com/us/en/knowledge-hub/articles/a-look-at-cabling-best-practices-for-ai-data-centers.html
