AI Data Centers – How to Build an Efficient + Sustainable Data Center

By Virta Ventures on August 20, 2024

A few weeks ago, we published our POV on opportunities for software innovations to increase energy efficiency and clean energy usage in data center siting / planning. Once a site is past the planning phase, more opportunities for software innovations to unlock positive impact in data center construction emerge. 

We believe that demand-side improvements in data center power efficiency are more likely to yield efficiency and sustainability gains than supply-side improvements to power generation, and that climate-aligned decision-making during data center construction is one of the biggest unlocks available for sustainable data center operations. Today, we’re sharing the opportunities in facility design / construction that enable climate-friendly data center operations. 

Cluster Design

In optimizing cluster designs for energy efficiency, much of the leverage lies in processor selection. Data center developers should identify the primary workload for their compute clusters and select a chip that delivers high performance per watt for that workload. Here, developers might turn to hardware innovations built around performance per watt: for instance, next-gen chip companies like Etched (chips that run AI models an order of magnitude faster and cheaper than GPUs) can deliver unprecedented performance via novel hardware design and execution. 

Developers of AI data centers can also turn to AI benchmark tests from groups like MLCommons, whose metrics on chip performance can inform this decision. Through this informed selection process, developers can unlock efficient data center operations and significant energy savings. 
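The selection logic above can be sketched as a simple performance-per-watt ranking. The chip names, throughput figures, and power draws below are illustrative placeholders, not real benchmark results; in practice, a developer would substitute published MLPerf throughput numbers and board power specifications.

```python
# Sketch: ranking candidate accelerators by performance per watt.
# All figures below are hypothetical placeholders, not real benchmarks.

def perf_per_watt(throughput_inf_s: float, power_w: float) -> float:
    """Inferences per second delivered per watt of board power."""
    return throughput_inf_s / power_w

candidates = {
    # name: (benchmark throughput in inferences/sec, board power in W)
    "chip_a": (30_000, 700),
    "chip_b": (18_000, 350),
    "chip_c": (9_000, 150),
}

ranked = sorted(
    candidates.items(),
    key=lambda kv: perf_per_watt(*kv[1]),
    reverse=True,
)

for name, (tps, watts) in ranked:
    print(f"{name}: {perf_per_watt(tps, watts):.1f} inf/s per watt")
```

Note that the raw-throughput leader is not necessarily the efficiency leader: in this made-up example, the smallest chip wins on performance per watt.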

In terms of emissions, every hardware component that goes into a computing cluster carries a manufacturing carbon footprint. For developers looking to minimize their environmental impact, tracking emissions data for computing cluster components becomes important to building emissions reduction strategies. This is where software can speed decision-making: carbon intelligence company Sluicebox, for instance, provides instant, scientifically vetted carbon intelligence for millions of electronic components by Manufacturing Part Number (MPN). Through this, Sluicebox brings extensive data on hardware carbon footprints to the table that can help developers build more sustainably. 
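Conceptually, this kind of tracking is a roll-up of per-component embodied carbon across a bill of materials. A minimal sketch, assuming hypothetical MPNs and made-up kgCO2e values (a service like Sluicebox would supply vetted figures keyed by MPN):

```python
# Sketch: rolling up embodied carbon for a server bill of materials.
# MPNs and kgCO2e values are hypothetical, for illustration only.

EMBODIED_KGCO2E = {  # hypothetical lookup: MPN -> kgCO2e per unit
    "GPU-X100": 1500.0,
    "DIMM-64G": 45.0,
    "SSD-8T": 90.0,
}

# (MPN, quantity) pairs for one server configuration
bom = [("GPU-X100", 8), ("DIMM-64G", 32), ("SSD-8T", 4)]

def server_embodied_kgco2e(bom):
    """Sum embodied carbon across every component in the BOM."""
    return sum(EMBODIED_KGCO2E[mpn] * qty for mpn, qty in bom)

total = server_embodied_kgco2e(bom)
print(f"Embodied carbon per server: {total:.0f} kgCO2e")
```

Multiplied across thousands of servers, even small per-component differences surfaced by this kind of lookup can steer meaningful emissions reductions.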

Chips need to be placed within server racks in ways that maximize efficiency given space constraints, and compute clusters must likewise be placed within the data center to maximize overall efficiency despite limited space. DCIM software like Hyperview, which helps developers visualize available space on racks and floors, makes cluster placement decisions much faster, easier, and more aligned with data center energy efficiency.

Heat load is a central consideration in server rack placement within a data center. Developers work to minimize heat load and maximize airflow to prevent overheating of crucial hardware. Computational fluid dynamics (CFD) software products like Autodesk CFD and Simscale can help companies simulate heat load within data center designs and run scenario analyses on different server rack placements to optimize for airflow. 
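Before any CFD run, a useful first-pass heuristic is that essentially all electrical power drawn by IT gear becomes heat, so summing per-server draw gives a rack's approximate heat load to compare against cooling capacity. A back-of-the-envelope sketch (not a CFD model; all values are illustrative assumptions):

```python
# Back-of-the-envelope rack heat check, not a CFD simulation:
# electrical load ~= heat load for IT equipment. Values are made up.

RACK_COOLING_LIMIT_KW = 40.0  # assumed per-rack cooling capacity

racks = {
    "rack_1": [10.2, 10.2, 10.2],         # per-server draw in kW
    "rack_2": [10.2, 10.2, 10.2, 10.2],
}

heat_load_kw = {name: sum(loads) for name, loads in racks.items()}

for name, heat_kw in heat_load_kw.items():
    status = "OK" if heat_kw <= RACK_COOLING_LIMIT_KW else "OVER"
    print(f"{name}: {heat_kw:.1f} kW heat load [{status}]")
```

Racks flagged "OVER" in a check like this are exactly the placements worth feeding into a full CFD scenario analysis.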

To supplement server rack placement, cooling systems are set up within data centers to manage heat load. Though either air cooling or liquid cooling might be leveraged for this purpose, most AI companies will likely want to employ liquid cooling systems. That said, this isn't the case for every AI data center: cooling systems should be tailored to each company's specific AI use case and designed to respond to the data center's heat load. CFD tools can also model the effects of cooling system deployment on heat load, playing a major role in helping companies determine what type of cooling system to deploy and how to deploy it.

Selecting sustainable materials in data center setup and construction is also a challenge that developers must face. Here, software solutions like Kaya AI that can automate the selection of more-sustainable materials during design can help developers reduce waste and emissions in facility construction while saving developers time and money in the process.

Power Distribution + Energy Storage

Power distribution units (PDUs) enable operational efficiency plays like capacity planning / management, remote monitoring, and uptime assurance – and as such are key to energy-efficient data center operations. Since PDUs sit within the data center's power systems and optimize the flow of energy from supply sources to each cluster based on its usage needs, PDUs have a unique role to play in a data center's energy efficiency. 

Energy storage systems integrated into data centers come into play here, unlocking further energy efficiency on top of power distribution units by offering easier load management. Energy storage can help data centers smooth out power consumption spikes and shave peak demand. Together with PDUs, energy storage systems can optimize data center energy usage and improve efficiency. 
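The peak-shaving behavior described above can be sketched with a simple battery model: discharge when facility demand exceeds a target grid cap, recharge when there is headroom below it. The load profile, battery capacity, and power ratings here are made up for illustration.

```python
# Sketch of peak shaving with on-site storage. All values hypothetical.

def peak_shave(load_kw, grid_cap_kw, battery_kwh, max_rate_kw, dt_h=1.0):
    """Return grid draw per interval after battery smoothing."""
    soc = battery_kwh  # state of charge; start fully charged
    grid = []
    for load in load_kw:
        if load > grid_cap_kw:
            # discharge to hold grid draw at the cap
            discharge = min(load - grid_cap_kw, max_rate_kw, soc / dt_h)
            soc -= discharge * dt_h
            grid.append(load - discharge)
        else:
            # recharge using headroom below the cap
            charge = min(grid_cap_kw - load, max_rate_kw,
                         (battery_kwh - soc) / dt_h)
            soc += charge * dt_h
            grid.append(load + charge)
    return grid

profile = [800, 900, 1200, 1500, 1100, 700]  # hourly facility demand, kW
shaved = peak_shave(profile, grid_cap_kw=1000,
                    battery_kwh=1000, max_rate_kw=500)
print(shaved)
```

In this toy run, the 1,500 kW spike never reaches the grid: the battery caps draw at 1,000 kW and then refills during the low-demand hour.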

Deploying PDUs and energy storage in tandem can help developers yield energy savings and enable sustainable data center operations. Setup of these hardware components can be aided by predictive analytics within DCIM tools that forecast projected energy usage based on data center design / architecture, allowing for informed and efficient PDU and energy storage setup. 

If a data center opts for intelligent PDUs – like those offered by brands such as Raritan – that can integrate with software systems for smart monitoring, then solutions like Sunbird can connect to PDUs as they are set up and provide PDU-by-PDU insights on energy usage, helping developers wrestle with load management, monitor energy storage needs, and ensure efficient operations. 
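At its core, PDU-by-PDU insight is an aggregation of outlet-level telemetry. A minimal sketch, assuming hypothetical readings (real intelligent PDUs expose similar per-outlet data over SNMP or vendor APIs, but the identifiers, values, and rated capacity below are all invented):

```python
# Sketch: aggregating hypothetical per-outlet PDU telemetry into
# per-PDU load and headroom. All data here is made up.
from collections import defaultdict

PDU_RATED_W = 5000.0  # assumed rated capacity per PDU

readings = [
    # (pdu_id, outlet_number, active_power_w)
    ("pdu_1", 1, 412.0), ("pdu_1", 2, 398.5), ("pdu_1", 3, 0.0),
    ("pdu_2", 1, 655.0), ("pdu_2", 2, 640.2),
]

usage = defaultdict(float)
for pdu, outlet, watts in readings:
    usage[pdu] += watts

for pdu, watts in sorted(usage.items()):
    headroom = PDU_RATED_W - watts
    print(f"{pdu}: {watts:.0f} W drawn, {headroom:.0f} W headroom")
```

Even this trivial roll-up surfaces useful signals – an idle outlet drawing 0 W, or a PDU approaching its rating – which is the kind of insight monitoring platforms deliver at fleet scale.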

Networking + Interconnectivity

The last piece of the data center infrastructure puzzle: networking and connectivity to enable data flow out of a data center. 

When setting up network infrastructure, hardware plays an important role in ensuring energy efficiency and sustainability over time. Developers balance latency requirements with energy efficiency when selecting hardware, and hardware companies like Arista have responded to developers' needs by offering energy-efficient network hardware that also delivers low latency. 

In planning out connectivity, developers must design a network infrastructure of cables, routers, and high-capacity switches that allows for efficient data flows and minimal lag. While many network planners still design systems on pen and paper, digital platforms can facilitate and automate portions of the network design process. For instance, open-source whiteboarding tool Excalidraw offers AI features on top of its core digital diagramming tool that can make mapping out network infrastructure easier for planners. Other software that can meaningfully automate best paths for network connectivity can also prove invaluable for developers building networking plans that minimize latency and maximize energy efficiency. 
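Best-path automation of the kind mentioned above often reduces to shortest-path search over a weighted topology graph. A sketch using Dijkstra's algorithm with link latency as the edge weight (the topology and latency figures are hypothetical; a real planner would likely blend latency with per-hop energy cost in the weights):

```python
# Sketch: best path through a network topology by link latency (ms).
# Topology and latencies are hypothetical.
import heapq

def best_path(graph, src, dst):
    """Dijkstra shortest path; graph maps node -> {neighbor: latency_ms}."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    # walk predecessors back from the destination
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path)), dist[dst]

topology = {
    "spine":  {"leaf_a": 0.4, "leaf_b": 0.6},
    "leaf_a": {"spine": 0.4, "leaf_b": 1.5, "edge": 0.9},
    "leaf_b": {"spine": 0.6, "leaf_a": 1.5, "edge": 0.3},
    "edge":   {"leaf_a": 0.9, "leaf_b": 0.3},
}

path, latency = best_path(topology, "spine", "edge")
print(path, f"{latency:.1f} ms")
```

Swapping the edge weights for a weighted sum of latency and watts-per-hop turns the same search into the latency / energy trade-off developers actually face.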

Post-Construction

Once a data center is constructed and set up, steady-state operations and maintenance become top-of-mind for developers. A new set of tools can help monitor installed infrastructure to ensure efficient and sustainable operations – in our next piece on innovations for the AI data center, we'll cover where startups play in data center operations over time to drive energy efficiency and strong performance. 

Subscribe to our Substack to keep up with future insights. To get in touch, connect with us here.
