Azure OpenAI Reference Architecture: 3 Deployment Topologies
Not every Azure OpenAI deployment looks the same.
A startup building an internal chatbot has completely different requirements than an enterprise running AI-assisted workflows across 50 business units. The topology you choose determines your security posture, cost visibility, operational complexity, and how well you can scale.
Here are the three architectures I see deployed most often, when each one makes sense, and the tradeoffs nobody tells you about.
Topology 1: Direct API
The simplest possible deployment. Your application calls Azure OpenAI directly.
App Service / AKS
|
| HTTPS (API Key in header)
↓
Azure OpenAI Endpoint
(eastus region)
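Here is what a direct call looks like on the wire, as a minimal standard-library Python sketch. It builds a request for the chat completions REST route and authenticates with the api-key header. The resource name, deployment name, and api-version value are assumptions for illustration, so check which API versions your resource supports.

```python
import json
import urllib.request

# Hypothetical values for illustration; substitute your own resource and deployment.
RESOURCE = "yourresource"
DEPLOYMENT = "gpt-4o-chat"
API_VERSION = "2024-06-01"  # assumed data-plane API version


def build_chat_request(api_key: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) a direct Azure OpenAI chat completions request."""
    url = (
        f"https://{RESOURCE}.openai.azure.com/openai/deployments/"
        f"{DEPLOYMENT}/chat/completions?api-version={API_VERSION}"
    )
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Topology 1: the resource key travels in the api-key header
        headers={"api-key": api_key, "Content-Type": "application/json"},
    )
```

Passing the result to urllib.request.urlopen would send it. Nothing sits between the app and the endpoint, so there is nothing to log, throttle, or fail over.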
When to use this:
- Single application or team
- Early-stage product, moving fast
- Internal tools where security requirements are basic
- Less than $2,000/month in estimated spend (low enough that a single quota is fine)
What you get:
- Zero infrastructure overhead
- Single endpoint, single API key
- No latency added by intermediate layers
- Easy to debug (no middleman)
What you lose:
- No rate limiting per team or per user
- No request logging beyond what Azure provides
- No failover if the region has an outage
- No cost visibility per application or cost center
The most common mistake: Using a single API key shared across multiple applications. When you hit a rate limit, you can't tell which app caused it. Rotate one key and you break all apps.
Fix: Give each major application its own Azure OpenAI resource, and use managed identities instead of API keys where possible.
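Switching from API keys to a managed identity mostly changes one header. The sketch below is minimal and assumes two things: the azure-identity package is installed, and the app's managed identity has the Cognitive Services OpenAI User role on the resource. The app gets a Microsoft Entra ID token for the Cognitive Services scope and sends it as a bearer token instead of api-key.

```python
# Token scope accepted by Azure OpenAI for Microsoft Entra ID auth
SCOPE = "https://cognitiveservices.azure.com/.default"


def auth_headers(token: str) -> dict:
    """Keyless auth: a bearer token replaces the shared api-key header."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}


def fetch_token() -> str:
    """Get a token for this app's own managed identity.

    Requires `pip install azure-identity`. DefaultAzureCredential picks up the
    managed identity on App Service, or workload identity on AKS. In a real app,
    create the credential once and reuse it rather than per call.
    """
    from azure.identity import DefaultAzureCredential  # imported lazily on purpose

    return DefaultAzureCredential().get_token(SCOPE).token
```

Each app authenticates as itself, so a throttled or misbehaving caller can be traced to one identity. There is also no shared secret to rotate.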
Topology 2: Azure API Management Gateway
APIM sits in front of one or more Azure OpenAI endpoints and handles everything you'd otherwise build yourself.
App A ──┐
App B ──┼──→ Azure API Management ──→ Azure OpenAI (eastus)
App C ──┘              │                        │
                       │                 Fallback if 429
                       │                        ↓
                 Log Analytics    Azure OpenAI (swedencentral)
                 Cost Tracking
                 Rate Limits per App
When to use this:
- Multiple teams or applications sharing AI capacity
- Need per-consumer rate limiting
- Compliance requires request logging
- Cost allocation across business units
- Want load balancing or regional failover
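Per-team cost allocation can be built from the usage block that Azure OpenAI returns with each non-streaming response (prompt_tokens and completion_tokens). The sketch below is minimal and assumes APIM logs that usage next to the calling subscription ID. The UsageRecord shape and the per-1K-token prices are hypothetical inputs, not real rates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    subscription_id: str    # APIM subscription: one per app or business unit
    prompt_tokens: int      # from the response's usage.prompt_tokens
    completion_tokens: int  # from the response's usage.completion_tokens


def allocate_cost(records, price_in_per_1k: float, price_out_per_1k: float) -> dict:
    """Sum token usage per subscription and convert it to cost."""
    totals = defaultdict(float)
    for r in records:
        totals[r.subscription_id] += (
            r.prompt_tokens / 1000 * price_in_per_1k
            + r.completion_tokens / 1000 * price_out_per_1k
        )
    return dict(totals)
```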
APIM backend policy for OpenAI failover:
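<backend>
    <!-- Retry on throttling (429) or server errors (5xx), up to 2 retries -->
    <retry condition="@(context.Response.StatusCode == 429 || context.Response.StatusCode >= 500)"
           count="2" interval="2" delta="2" first-fast-retry="false">
        <choose>
            <!-- First attempt (variable not set yet) or last attempt went to swedencentral: use eastus -->
            <when condition="@(context.Variables.GetValueOrDefault<int>("backendIndex", 1) == 1)">
                <set-variable name="backendIndex" value="@(0)" />
                <set-backend-service base-url="https://your-eastus.openai.azure.com" />
            </when>
            <!-- Last attempt went to eastus and failed: fail over to swedencentral -->
            <otherwise>
                <set-variable name="backendIndex" value="@(1)" />
                <set-backend-service base-url="https://your-swedencentral.openai.azure.com" />
            </otherwise>
        </choose>
        <!-- Send the request; buffer the body so it can be replayed on each retry.
             Assumes APIM authenticates to both backends with its managed identity,
             since one resource's api-key is not valid on the other. -->
        <forward-request buffer-request-body="true" />
    </retry>
</backend>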
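The intended behavior is easy to lose in the XML. This small Python model of it (not APIM itself) sends the first attempt to eastus and flips to the other region on each 429 or 5xx. With count="2", at most two retries follow the first attempt. The URLs are the same placeholders used in the policy.

```python
PRIMARY = "https://your-eastus.openai.azure.com"
SECONDARY = "https://your-swedencentral.openai.azure.com"


def should_retry(status: int) -> bool:
    # Mirrors the policy's retry condition: throttled (429) or server error (5xx)
    return status == 429 or status >= 500


def simulate_failover(statuses: list, retry_count: int = 2) -> list:
    """Return (backend, status) for each attempt.

    `statuses` lists the responses the backends would return, in order.
    The first attempt goes to the primary; each retry flips to the other
    region; at most `retry_count` retries follow the first attempt.
    """
    attempts = []
    backend = PRIMARY
    for i, status in enumerate(statuses):
        attempts.append((backend, status))
        if not should_retry(status) or i == retry_count:
            break
        backend = SECONDARY if backend == PRIMARY else PRIMARY
    return attempts
```

For example, simulate_failover([429, 200]) shows eastus throttling and swedencentral answering. A run of failures stops after three attempts, and the last error is returned to the caller.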
Per-subscription rate limiting:
<inbound>
<rate-limit-by-key calls="100" renewal-period="60"
counter-key="@(context.Subscription.Id)"
increment-condition="@(context.Response.StatusCode == 200)" />
</inbound>
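rate-limit-by-key keeps one counter per key, here the APIM subscription ID. With the increment-condition above, only successful calls count toward the limit. Below is a simplified Python model of those semantics using a sliding window. It is not how APIM implements the policy internally, and like the policy it counts calls, not tokens.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Simplified per-key limiter: at most `calls` counted calls per `period` seconds."""

    def __init__(self, calls: int, period: float, clock=time.monotonic):
        self.calls = calls
        self.period = period
        self.clock = clock
        self.hits = defaultdict(deque)  # key -> timestamps of counted calls

    def allow(self, key: str) -> bool:
        """Return True if `key` is under its limit right now."""
        now = self.clock()
        window = self.hits[key]
        # Drop timestamps that have slid out of the window
        while window and now - window[0] >= self.period:
            window.popleft()
        return len(window) < self.calls

    def record(self, key: str, status: int) -> None:
        # Like increment-condition: only successful (200) responses are counted
        if status == 200:
            self.hits[key].append(self.clock())
```

Usage with the policy's numbers would be SlidingWindowLimiter(calls=100, period=60): check allow(subscription_id) before forwarding and call record(...) once the response arrives.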
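APIM cost tradeoff (approximate US list prices for the classic tiers; prices vary by region and change over time):
- Developer tier: ~$50/month, no SLA, not for production use
- Basic tier: ~$150/month
- Standard tier: ~$700/month (classic Standard cannot be deployed into a VNet; the Standard v2 tier adds outbound VNet integration)
- Premium: ~$2,800+/month per unit (needed for multi-region deployment; also supports full VNet injection)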
For most teams with 3–10 apps, APIM Standard tier is the right call.
Topology 3: Private Endpoint + VNet
Everything stays off the public internet. Azure OpenAI is accessible only from within your VNet (or connected networks via ExpressRoute/VPN).
On-Premises / Corp Network
        |
ExpressRoute / VPN Gateway
        |
Azure Hub VNet
        |   ├── Spoke VNet A (App Teams)
        |   │     └── App Service (VNet Integration)
        |   │
        |   └── Spoke VNet B (Platform)
        |         └── Private DNS Zone
        |               privatelink.openai.azure.com:
        |               yourresource → 10.x.x.x
        ↓
Private Endpoint
(Azure OpenAI)
10.x.x.x:443
When you need this:
- Regulated industry (healthcare, finance, government)
- Data residency or sovereignty requirements
- Policy prohibits traffic over public internet
- ExpressRoute-connected Azure environment
The DNS gotcha that breaks 80% of first deployments:
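When you create a Private Endpoint for Azure OpenAI, public DNS answers yourresource.openai.azure.com with a CNAME to yourresource.privatelink.openai.azure.com. From inside your VNet, that privatelink name must resolve to the private endpoint's IP. From anywhere else, it keeps resolving to the public IP.
If DNS inside the VNet still returns the public IP, traffic bypasses the private endpoint entirely. If public network access is still enabled on the resource, the request works but goes over the public internet, which defeats the entire point. If public access is disabled, the request is rejected.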
Fix: Create a Private DNS Zone and link it to all VNets that need access:
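# Create the private DNS zone (the zone name must be exactly this)
az network private-dns zone create \
  --resource-group rg-networking \
  --name "privatelink.openai.azure.com"
# Link the zone to your VNet; repeat for every VNet that must resolve the endpoint
az network private-dns link vnet create \
  --resource-group rg-networking \
  --zone-name "privatelink.openai.azure.com" \
  --name link-hub-vnet \
  --virtual-network hub-vnet \
  --registration-enabled false
# Look up the zone's resource ID: the zone lives in a different resource group
# than the private endpoint, so a bare zone name would not be found
ZONE_ID=$(az network private-dns zone show \
  --resource-group rg-networking \
  --name "privatelink.openai.azure.com" \
  --query id --output tsv)
# Attach the zone to the private endpoint through a DNS zone group;
# Azure then creates and maintains the A record automatically
az network private-endpoint dns-zone-group create \
  --resource-group rg-ai \
  --endpoint-name pe-openai \
  --name openai-dns-group \
  --private-dns-zone "$ZONE_ID" \
  --zone-name openai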
After this, nslookup yourresource.openai.azure.com from inside the VNet should return a 10.x.x.x address, not a public IP.
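The same check can be scripted to run from the app's own environment, for example a pod on AKS or a host in the integrated subnet. This is a minimal Python sketch: it resolves the hostname through the system resolver and confirms that every address is private. The hostname you pass in is your own resource's FQDN.

```python
import ipaddress
import socket


def resolve_ips(hostname: str, port: int = 443) -> list:
    """Resolve `hostname` through the system resolver, as the app itself would."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def uses_private_endpoint(ips: list) -> bool:
    """True only if there is at least one address and every address is private."""
    return bool(ips) and all(ipaddress.ip_address(ip).is_private for ip in ips)
```

Run uses_private_endpoint(resolve_ips("yourresource.openai.azure.com")) from inside the VNet. A False result means one of two things: the privatelink zone is not linked to the VNet the code runs in, or the VNet uses custom DNS servers that don't forward to Azure DNS.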
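Outbound rule requirement: If your App Service or AKS uses VNet integration, confirm that the NSG on the integration subnet allows outbound HTTPS (port 443) to your private endpoint subnet. The default AllowVnetOutBound rule permits this traffic, so the problem usually appears after someone adds a broad outbound deny rule. When it does, failures are silent: the TCP handshake never completes, the app sees connection timeouts, and no useful error is logged.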
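A timeout and a refusal point to different problems, so it helps to tell them apart. This minimal Python probe, run from a host in the integrated subnet, classifies a TCP connection attempt to the private endpoint's IP on port 443. A timeout suggests packets are being dropped, for example by an NSG deny rule. A refusal means the host answered but nothing accepted the connection.

```python
import socket


def probe_tcp(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt.

    "open"    -> handshake completed
    "timeout" -> no answer at all (typical of an NSG silently dropping packets)
    "refused" -> the host answered with a reset (reachable, but port closed)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:  # e.g. DNS failure or network unreachable
        return f"error: {exc}"
```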
Choosing the Right Topology
| Requirement | Direct API | APIM Gateway | Private VNet |
|---|---|---|---|
| Fastest to deploy | ✓ | | |
| Per-app rate limiting | | ✓ | ✓ |
| Cost allocation per team | | ✓ | ✓ |
| Multi-region failover | | ✓ | ✓ |
| No public internet | | | ✓ |
| Compliance / regulated | | | ✓ |
| Shared capacity management | | ✓ | ✓ |
| Monthly cost overhead | None | Basic to Standard APIM tier | VNet + PE costs |
Decision tree:
- Are you in a regulated industry or have public internet restrictions? → Private VNet
- Do you have 3+ teams sharing OpenAI capacity? → APIM Gateway
- Are you early-stage, single team, moving fast? → Direct API (migrate to APIM when you have 2+ apps)
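The decision tree translates directly into code. This minimal sketch assumes the questions are asked in order and the first "yes" wins. The parameter names are hypothetical.

```python
def choose_topology(regulated_or_no_public_internet: bool, teams_sharing: int, apps: int) -> str:
    """Encode the decision tree: rules are checked in order, first match wins."""
    if regulated_or_no_public_internet:
        return "Private VNet"
    if teams_sharing >= 3:
        return "APIM Gateway"
    if apps >= 2:
        # Single team, but the tree says to migrate to APIM at 2+ apps
        return "APIM Gateway"
    return "Direct API"
```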
Most enterprise deployments end up combining Topology 3 with Topology 2: APIM deployed inside the VNet (which requires an APIM tier with VNet support), sitting in front of private endpoints. You get rate limiting, logging, and failover, all without any public internet exposure.
What to Do Next
If you're on Direct API today and want to migrate to APIM without breaking existing apps:
- Deploy APIM with the same backend (your existing OpenAI endpoint)
- Update one app to point at APIM instead of OpenAI directly
- Validate behavior and logs
- Migrate remaining apps one at a time
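- Regenerate the Azure OpenAI keys once all apps route through APIM, so any copies the apps still hold stop working (update APIM's stored key at the same time, or have APIM authenticate with its managed identity)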
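Moving apps one at a time is easier if each app reads its endpoint from configuration rather than code. Moving an app to APIM then becomes a settings change you can roll back. The sketch below is minimal. AZURE_OPENAI_ENDPOINT and AZURE_OPENAI_API_KEY are the variable names the openai Python SDK reads for Azure. APIM_BASE_URL and APIM_SUBSCRIPTION_KEY are hypothetical names. Ocp-Apim-Subscription-Key is APIM's default subscription key header.

```python
import os
from typing import Mapping, Optional


def endpoint_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Choose the endpoint and auth header from configuration, not code."""
    env = os.environ if env is None else env
    apim_url = env.get("APIM_BASE_URL")  # hypothetical setting name
    if apim_url:
        # Migrated: go through APIM with this app's own subscription key
        return {
            "base_url": apim_url,
            "headers": {"Ocp-Apim-Subscription-Key": env["APIM_SUBSCRIPTION_KEY"]},
        }
    # Not migrated yet: call Azure OpenAI directly with the resource key
    return {
        "base_url": env["AZURE_OPENAI_ENDPOINT"],
        "headers": {"api-key": env["AZURE_OPENAI_API_KEY"]},
    }
```

Rolling back is a matter of removing APIM_BASE_URL from that app's settings, as long as the direct key has not been regenerated yet.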
The migration takes a day. The operational improvements last years.