Uptime Institute Blog
Enterprises will deploy inference in-house — if they can

June 3, 2026 / in Design, Executive, Operations / by Max Smolaks, Research Analyst, msmolaks@uptimeinstitute.com

The development of large language models (LLMs) is a complex process, requiring specialized infrastructure and skills, as well as the ability to differentiate the result — since there is little value in replicating the work done by others.

It is becoming clear that few organizations will choose to train their own LLMs, opting instead to rely on the growing number of commercial and open-weight models. This is a sign of market maturity: when organizations need enterprise software, they don’t build it from scratch; they purchase it from established vendors or deploy open-source alternatives.

After choosing a model, the next step is to determine where it will be hosted to perform inference — using a copy of the model to generate outputs in response to user inputs. Here too, the choices are becoming clearer, with distinct benefits and drawbacks to delivering AI on-premises, in colocation, cloud, or as a service.

An Uptime Intelligence report explored the economics of inference across data center venues (see Where to deploy AI inference: a guide to the economics). Based on its findings, we can make three further observations:

1: Running generative AI in your own data center can be the lowest-cost option — but few organizations will be able to achieve these savings.

The report demonstrates that consistently high hardware utilization is the key to cost-efficient AI compute. Hyperscale cloud providers can deliver lower-cost services primarily because of economies of scale and their ability to maximize the use of their virtualized IT infrastructure. Enterprises can match them only by leveraging existing infrastructure and in-house skills (essentially sunk costs) while maintaining high hardware utilization.

In practice, however, few organizations track server utilization, and even fewer manage it. The Uptime Institute IT and Power Efficiency Survey 2024 found that 53% of respondents had no utilization objective for their overall server fleet. Among those that did, only 29% reported average utilization above 65% — the threshold at which AI infrastructure becomes more cost-effective on-premises.
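
The link between utilization and cost can be made concrete with a rough calculation. The sketch below is illustrative only: the function names and both dollar figures are hypothetical, are not taken from the Uptime Intelligence report, and were chosen simply so that the break-even lands on the 65% figure quoted above. It assumes that owned hardware costs a fixed amount per hour whether busy or idle, while cloud capacity is billed only for the hours actually used.

```python
def effective_cost_per_busy_hour(fixed_cost_per_hour: float, utilization: float) -> float:
    """Cost of one productive GPU-hour on owned hardware.

    Idle hours still cost money, so the fixed hourly cost is spread
    over only the fraction of time the hardware does useful work.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return fixed_cost_per_hour / utilization


def break_even_utilization(fixed_cost_per_hour: float, cloud_price_per_hour: float) -> float:
    """Utilization at which owned hardware costs the same as pay-per-use cloud."""
    if fixed_cost_per_hour <= 0 or cloud_price_per_hour <= 0:
        raise ValueError("costs must be positive")
    return fixed_cost_per_hour / cloud_price_per_hour


# Hypothetical figures, for illustration only (not from the report).
ON_PREM_COST_PER_GPU_HOUR = 2.60  # fully loaded cost of an owned GPU, busy or idle
CLOUD_PRICE_PER_GPU_HOUR = 4.00   # price paid only for hours actually consumed

threshold = break_even_utilization(ON_PREM_COST_PER_GPU_HOUR, CLOUD_PRICE_PER_GPU_HOUR)
print(f"Break-even utilization: {threshold:.0%}")

for utilization in (0.30, 0.50, 0.80):
    cost = effective_cost_per_busy_hour(ON_PREM_COST_PER_GPU_HOUR, utilization)
    cheaper = "on-premises" if cost < CLOUD_PRICE_PER_GPU_HOUR else "cloud"
    print(f"{utilization:.0%} utilized: ${cost:.2f} per busy hour, {cheaper} is cheaper")
```

Under these assumptions, an organization running below the break-even utilization pays more per productive hour than it would in the cloud, which is why an unmanaged server fleet rarely realizes the on-premises savings.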

In addition, hardware utilization for LLM-based services is notoriously difficult to predict: it is shaped by the hidden system prompts provided by developers, the complexity of end-user inputs, and the number of tokens generated in response. This inherent unpredictability makes the public cloud more attractive, as it allows customers to pay only for the capacity they use.
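
To see why the load is hard to forecast, consider a minimal sketch that adds up the three token sources named above for a few requests. The `Request` class and every token count here are hypothetical, chosen only to show how widely per-request work can vary even when the system prompt is fixed.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Token counts for one inference request (all values hypothetical)."""
    system_prompt_tokens: int  # hidden prompt supplied by the developer
    user_input_tokens: int     # what the end user typed or pasted in
    output_tokens: int         # what the model generated in response

    @property
    def total_tokens(self) -> int:
        return self.system_prompt_tokens + self.user_input_tokens + self.output_tokens


# Three illustrative requests that share the same system prompt.
requests = [
    Request(system_prompt_tokens=1200, user_input_tokens=40, output_tokens=150),    # short question
    Request(system_prompt_tokens=1200, user_input_tokens=6000, output_tokens=900),  # long document pasted in
    Request(system_prompt_tokens=1200, user_input_tokens=300, output_tokens=4000),  # long generated report
]

totals = [r.total_tokens for r in requests]
spread = max(totals) / min(totals)
print(f"Tokens per request: {totals}")
print(f"Heaviest request is {spread:.1f}x the lightest")
```

Because the mix of such requests shifts from hour to hour, the capacity needed to serve them is difficult to size in advance.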

2: The choice of the model will sometimes dictate the choice of infrastructure — and vice versa.

While latency, data locality, governance, and operational control are important considerations, enterprises also need to account for limited model portability.

Organizations planning to use flagship models from providers such as OpenAI or Anthropic are restricted to consuming them as a service or via a cloud platform (which is more expensive). These models cannot be deployed on-premises or in a colocation environment. Furthermore, opting to use efficient inference hardware developed by a cloud vendor (e.g., Google TPUs, AWS Trainium or Microsoft Maia) locks an organization into purchasing the vendor’s cloud services and using models that have been adapted to run on that hardware. Therefore, decisions about hardware and data center venue should not be taken separately from decisions about the choice of LLMs.

Freely distributed open-weight models paired with GPU-based servers from vendors such as Dell, HPE, Lenovo and Supermicro offer the greatest flexibility in deployment. This path is likely to emerge as the preferred option for enterprises requiring full control over their data during inference.

3: Smaller models are a little less capable and a lot cheaper to run.

The gap between the facility requirements of typical corporate IT (averaging around 7 kW per rack in 2025) and the demands of the dense AI compute needed to deploy so-called “frontier” models is widening. If an AI cluster cannot be accommodated within existing data halls and requires additional space, any cost savings over public cloud evaporate.

Not all models require dense, liquid-cooled infrastructure. Smaller, less complex LLMs are capable of delivering functionality such as transcription, translation and summarization, and can be deployed on-premises within existing facilities with minimal changes to cooling and power distribution (see Why bigger is not better: gen AI models are shrinking).
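
The rack-density gap described above can be illustrated with a simple power-budget check. In this sketch, the 7 kW rack budget comes from the article, while the two server power draws and the function name are hypothetical placeholders rather than vendor specifications.

```python
import math


def servers_per_rack(rack_budget_kw: float, server_draw_kw: float) -> int:
    """Number of servers that fit within a rack's power budget."""
    if rack_budget_kw <= 0 or server_draw_kw <= 0:
        raise ValueError("power values must be positive")
    return math.floor(rack_budget_kw / server_draw_kw)


RACK_BUDGET_KW = 7.0  # typical corporate IT rack, per the article

# Hypothetical power draws, for illustration only.
HYPOTHETICAL_SERVERS_KW = {
    "dense multi-GPU server": 10.0,
    "small inference server": 2.0,
}

for name, draw_kw in HYPOTHETICAL_SERVERS_KW.items():
    count = servers_per_rack(RACK_BUDGET_KW, draw_kw)
    if count == 0:
        print(f"{name} ({draw_kw} kW): exceeds the {RACK_BUDGET_KW} kW rack budget")
    else:
        print(f"{name} ({draw_kw} kW): {count} per rack")
```

A server that exceeds the rack budget on its own would force changes to power distribution and cooling, whereas lower-power systems of the kind suited to smaller models can share an existing rack.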

For organizations starting small, leveraging in-house facilities or existing colocation space can be the more attractive option; the cost per token remains low even without achieving high hardware utilization. This is likely why, despite the commercial attraction of public cloud, Uptime Intelligence surveys show on-premises data centers as the most popular venue for AI workloads.

Tags: AI, artificial intelligence, Cloud, Colocation, Data Center, digital Infrastructure