AI Data Centers – Operating an Energy-Efficient Data Center

By Virta Ventures

October 15, 2024

At Virta, we’ve written over the past few weeks about optimizing a data center for energy efficiency and sustainability during the siting / planning and construction phases of project development. That said, the bulk of a data center’s energy consumption and footprint lies in its steady-state operations. Given the ballooning energy needs of AI data centers, operators must pay closer attention to energy usage than ever before. At the same time, operators have other priorities to balance: reducing environmental footprint and maintaining reliability / uptime, for instance, are also top-of-mind. Across energy efficiency, sustainability, and reliability, software and software-enabled innovations can plug into operators’ workflows and enable marked improvements in all three areas.

Energy Efficiency

When thinking about where energy is consumed in a data center, servers and compute clusters might come to mind first. However, energy is expended across the facility not only to run the hardware that processes data but also to power cooling systems and other mechanisms that maximize hardware performance. Cooling systems alone represent around half of a data center’s energy usage (see below).

[Figure: breakdown of data center energy usage by system. Source: IEEE & Data Center Knowledge]
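As a back-of-the-envelope illustration of what a breakdown like this implies, here is a minimal sketch with assumed (not measured) loads; the industry’s standard efficiency metric, PUE, falls out of the same arithmetic:

```python
# Back-of-the-envelope energy breakdown for a hypothetical data center.
# Shares are illustrative, loosely matching the breakdown above; actual
# figures vary widely by facility, hardware, and climate.
it_load_kw = 1_000   # servers, storage, networking (assumed)
cooling_kw = 900     # chillers, CRAC units, pumps (assumed ~half of total)
other_kw = 100       # lighting, power conversion losses, etc. (assumed)

total_kw = it_load_kw + cooling_kw + other_kw
pue = total_kw / it_load_kw  # Power Usage Effectiveness: 1.0 is the ideal

print(f"Cooling share of total load: {cooling_kw / total_kw:.0%}")  # 45%
print(f"PUE: {pue:.2f}")  # 2.00 -> each watt of compute costs another watt of overhead
```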

Thus, data center operators must turn to holistic solutions that can manage energy usage across the board, and this is where AI comes in: it has demonstrated potential to improve data center energy efficiency. In 2016, Google DeepMind’s Jim Gao developed AI technology that cut the energy Google used for data center cooling by 40%. Building on that experience, Gao co-founded Phaidra, an autonomous AI control platform that lets teams deploy AI-driven dynamic optimization controls across the full data center. Operations automation platforms like Phaidra are easy to deploy and can learn from and respond to inputs faster than human operators, simplifying energy management and minimizing both energy consumption and climate impact.

Water Efficiency

Water is crucial to data center energy efficiency through its role in running cooling systems. For this cooling use case alone, a mid-sized data center consumes around 300,000 gallons of water a day, about as much as 1,000 U.S. households, and data centers’ on-site water consumption places them among the top 10 water users in America’s industrial and commercial sectors (Lawrence Berkeley National Laboratory).
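A quick sanity check of that comparison, assuming the EPA’s estimate of roughly 300 gallons per day for an average U.S. household:

```python
# Sanity-check the comparison above: daily water use of a mid-sized
# data center vs. U.S. households.
data_center_gal_per_day = 300_000  # figure cited above for a mid-sized facility
household_gal_per_day = 300        # EPA estimate for an average U.S. household

equivalent_households = data_center_gal_per_day / household_gal_per_day
print(f"Equivalent households: {equivalent_households:,.0f}")  # ~1,000
```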

State-of-the-art hardware can solve for water efficiency: NetworkOcean, for instance, is building underwater data centers that eliminate freshwater consumption while increasing cooling efficiency. Software solutions with scalable built-world layers can help too: LAIIER – a leak monitoring solution whose “smart tape” sensors let its software detect and flag issues in physical infrastructure – can catch leaks and other causes of data center water overuse early, minimizing their impact.

Sustainability + Climate Impact

Renewable energy usage is crucial for sustainable data center operations. However, without on-site generation, the energy mix is ultimately controlled by the utilities from which data centers receive power. Thus, solutions on the data center side that push utilities to supply more renewable energy rather than brown power can increase data center sustainability and minimize climate impact. Mercury, a software startup tackling demand response for AI, turns data centers into flexible energy resources and manages load so that operators and utilities can align more easily. When data centers can reduce demand during periods of low renewable generation and shift usage to times of high renewables availability, utilities are empowered to increase the share of renewables in their overall supply mix.
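A minimal sketch of that scheduling idea, with hypothetical forecast data and job names (this illustrates the general technique, not Mercury’s actual implementation): deferrable workloads are pushed toward the hours with the greenest forecast supply.

```python
# Minimal sketch of renewables-aware load shifting (hypothetical data).
# Deferrable jobs are scheduled into the hours with the highest
# forecast renewable share of grid supply.
renewable_share_forecast = {  # hour of day -> forecast renewable fraction
    0: 0.25, 6: 0.30, 12: 0.65, 18: 0.40,
}
deferrable_jobs = ["batch-training-run", "checkpoint-backup"]

# Rank hours by renewable share, greenest first
green_hours = sorted(renewable_share_forecast,
                     key=renewable_share_forecast.get, reverse=True)

schedule = dict(zip(deferrable_jobs, green_hours))
for job, hour in schedule.items():
    share = renewable_share_forecast[hour]
    print(f"{job}: run at hour {hour:02d} ({share:.0%} renewable)")
```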

Additionally, as companies face increasing mandates for emissions disclosures, it becomes ever more important for data center operators to track and maintain robust datasets on data center emissions. Greater data granularity helps operators actively track emissions and respond more quickly to changes in order to keep emissions at desirable levels. Mercury’s secondary strength is in exemplifying how software solutions can bring powerful data to the table for both reporting and decision-making. With easier access to disclosure-ready data, operators can clear their emissions-reporting bottlenecks and prioritize initiatives for emissions mitigation.

Reliability & Maintenance

Uptime expectations for data centers are high, even for those serving small-scale use cases. Take Tier 1 data centers, which serve the most basic of data center use cases: even they are expected to maintain 99.671% uptime annually. Data centers serving the highest-level use cases are expected to hold downtime to mere minutes a year.
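The arithmetic behind those targets, using the availability figures commonly associated with the Uptime Institute’s tier classifications:

```python
# Convert annual uptime targets into allowed downtime per year
# (availability figures per the Uptime Institute tier classifications).
HOURS_PER_YEAR = 365 * 24

tier_uptime = {1: 0.99671, 2: 0.99741, 3: 0.99982, 4: 0.99995}

for tier, uptime in tier_uptime.items():
    downtime_hours = (1 - uptime) * HOURS_PER_YEAR
    print(f"Tier {tier}: {uptime:.3%} uptime -> "
          f"{downtime_hours:.1f} hours ({downtime_hours * 60:.0f} min) of downtime/year")
# Tier 1 allows ~28.8 hours a year; Tier 4 allows only ~26 minutes.
```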

A key piece of reliability lies in ensuring that power inflow is sufficient for a data center’s energy needs. Data center capacity planning software like Verdigris allows operators to proactively manage capacity and reduce downtime by ensuring that critical hardware systems always have sufficient energy to operate.

Additionally, ensuring the reliability of hardware infrastructure within the data center is paramount. Observability platforms like NetBrain that monitor data center performance unlock proactive rather than reactive maintenance, increasing reliability and reducing downtime. By flagging potential issues before they cause failures, data center monitoring technologies can help operators reduce the costs – operational and environmental – of hardware breakdowns.
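As a toy illustration of the proactive idea (readings and thresholds here are assumptions, and real observability platforms are far more sophisticated), trending telemetry can be flagged well before it crosses a failure limit:

```python
# Toy illustration of proactive monitoring: flag drifting telemetry
# before it reaches a failure threshold (hypothetical readings).
inlet_temps_c = [22.1, 22.4, 23.0, 23.9, 25.1]  # recent sensor readings

WARN_AT_C = 25.0  # alert well before hardware limits (assumed)
FAIL_AT_C = 32.0  # vendor-specified inlet limit (assumed)

latest = inlet_temps_c[-1]
trend = latest - inlet_temps_c[0]

if latest >= WARN_AT_C:
    print(f"WARN: inlet at {latest}C (trending +{trend:.1f}C); "
          f"schedule maintenance before it reaches {FAIL_AT_C}C")
```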

Software systems can flag issues to human operators and help workers respond more effectively. That said, even with software, labor needs for hardware maintenance remain substantial, and rapid response is challenging. Given how sensitive AI data center use cases are to uptime, more cost- and time-efficient solutions can improve data center operational efficiency.

Software-enabled hardware plays can bridge the gap by improving data center maintenance processes while decreasing labor needs and costs. Naver, for instance, deploys SeRo, its asset management automation robots, to handle data center server maintenance. A data center run with software-enabled robotics support like SeRo can catch and act on issues before they escalate, avoiding the costs and emissions of infrastructure breakdowns. With robotics, this response happens faster and without dialing in on-call technicians, cutting both response time and cost.

Subscribe to our Substack to keep up with future insights. To get in touch, connect with us here.
