Independent pricing guide. Not affiliated with Databricks, Inc. Always verify at databricks.com/pricing

Updated April 2026

Databricks AI & Model Serving Pricing:
GPUs, Foundation Models, and Vector Search

Databricks AI pricing has three billing dimensions: Model Serving DBUs for endpoint compute, GPU instance time for training and serving, and per-token pricing for Foundation Model APIs. This is the area where Databricks pricing is evolving fastest and where third-party documentation is most sparse.

Databricks AI Pricing Overview

Databricks has expanded significantly into the AI platform space through Mosaic AI (built on its acquisition of MosaicML), Foundation Model APIs, and integrated MLOps tooling. The pricing model for AI workloads differs from traditional data engineering compute in important ways.

Model Serving DBUs

Both CPU and GPU model serving endpoints charge $0.07/DBU (AWS). The key cost driver is the instance type: a T4 GPU instance consumes about 0.75 DBU/hr, while an A100 consumes 65+ DBU/hr. The low per-DBU rate is offset by the high DBU consumption of GPU instances.
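The platform-fee math can be sketched in a few lines. The DBU/hr figures below are the ones quoted in this guide; verify current rates at databricks.com/pricing.

```python
# Platform (DBU) fee for a model serving endpoint, using this guide's rates.
DBU_RATE_SERVING = 0.07  # $/DBU, Model Serving on AWS

DBU_PER_HR = {"T4": 0.75, "A10G": 2.0, "A100-40GB": 65.0}

def serving_platform_fee(gpu: str, hours: float) -> float:
    """DBU-side cost only; the cloud infrastructure bill is separate."""
    return DBU_PER_HR[gpu] * DBU_RATE_SERVING * hours

# A single T4 endpoint running 24/7 for a 30-day month:
print(round(serving_platform_fee("T4", 720), 2))  # 37.8
```

The same A100 endpoint would incur $4.55/hr in platform fees alone, before the much larger infrastructure charge.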

Foundation Model APIs

Pay-per-token pricing for hosted models including Llama 3.3 and embedding models. Alternatively, provision dedicated GPU throughput for high-volume serving. Unity Catalog manages model governance at no additional charge on Premium tier.

GPU Compute Time

ML training workloads use standard Jobs Compute pricing ($0.15/DBU on AWS) but run on GPU instances that consume more DBUs per hour. Spot instance support provides significant savings for training jobs that can handle interruptions.

Foundation Model API Pricing

Per-token pricing for Databricks-hosted foundation models. These rates apply to the pay-per-token billing mode where you are charged only for tokens processed.

Model                  | Type        | Input (per 1M tokens) | Output (per 1M tokens)
Llama 3.3 70B          | Open Source | $0.50                 | $1.50
Llama 3.1 8B           | Open Source | $0.15                 | $0.45
BGE Large (Embeddings) | Embeddings  | $0.10                 | N/A
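A quick sketch of how a pay-per-token bill is computed from the table above. The model keys are illustrative labels, not official endpoint names.

```python
# Pay-per-token cost from the rates in the table above
# (assumed current; verify at databricks.com/pricing).
RATES = {  # model: (input $/1M tokens, output $/1M tokens)
    "llama-3.3-70b": (0.50, 1.50),
    "llama-3.1-8b": (0.15, 0.45),
}

def token_cost(model: str, input_tokens: float, output_tokens: float) -> float:
    inp, out = RATES[model]
    return (input_tokens / 1e6) * inp + (output_tokens / 1e6) * out

# 10M input + 2M output tokens on Llama 3.3 70B:
print(token_cost("llama-3.3-70b", 10e6, 2e6))  # 8.0 -> $8.00
```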

When to Use Pay-Per-Token vs Provisioned Throughput

Pay-Per-Token

  • Variable or low-volume workloads
  • Under ~50M tokens/month
  • Development and experimentation
  • No minimum commitment
  • Latency is acceptable (shared infrastructure)

Provisioned Throughput

  • High-volume production serving
  • Over ~100M tokens/month
  • Predictable throughput requirements
  • Billed per GPU-hour, not per token
  • Dedicated capacity for low latency
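A rough way to locate the crossover between the two modes. Both rates below are placeholders, not published Databricks prices: substitute your model's blended token rate and the GPU-hour rate you are quoted for provisioned throughput.

```python
# Break-even sketch: monthly token volume above which dedicated capacity
# beats pay-per-token, at the given (placeholder) rates.
def break_even_million_tokens(gpu_hour_rate: float,
                              blended_rate_per_1m: float,
                              hours: float = 720) -> float:
    """Returns the break-even volume in millions of tokens per month."""
    return gpu_hour_rate * hours / blended_rate_per_1m

# e.g. a $1.15/hr all-in GPU vs a $1.00-per-1M blended token rate:
print(round(break_even_million_tokens(1.15, 1.00)))  # 828
```

Real break-evens also depend on how many tokens per second a provisioned GPU actually sustains for your model, so treat this as an upper-level screen rather than a sizing tool.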

GPU Instance DBU Rates

GPU instances for model serving and training. The per-DBU rate is low ($0.07/DBU), but GPU instances consume many more DBUs per hour than CPU instances, making the actual cost significantly higher.

GPU         | Cloud | Instance                | DBU/hr | Platform/hr | Infra/hr
T4          | AWS   | g4dn.xlarge             | 0.75   | $0.05       | $0.53
T4          | Azure | Standard_NC4as_T4_v3    | 0.75   | $0.05       | $0.53
A10G        | AWS   | g5.xlarge               | 2.0    | $0.14       | $1.01
V100        | AWS   | p3.2xlarge              | 6.5    | $0.46       | $3.06
V100        | Azure | Standard_NC6s_v3        | 5.0    | $0.35       | $3.06
A100 (40GB) | AWS   | p4d.24xlarge (per GPU)  | 65.0   | $4.55       | $32.77
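Combining the two columns gives the effective all-in hourly cost. This uses the AWS rows from the table above; the DBU rate is the $0.07/DBU serving rate.

```python
# All-in hourly cost (platform DBUs + infrastructure) for the AWS rows above.
GPU_RATES = {  # gpu: (DBU/hr, infra $/hr)
    "T4": (0.75, 0.53),
    "A10G": (2.0, 1.01),
    "V100": (6.5, 3.06),
    "A100-40GB": (65.0, 32.77),
}

def all_in_hourly(gpu: str, dbu_rate: float = 0.07) -> float:
    dbu_hr, infra = GPU_RATES[gpu]
    return dbu_hr * dbu_rate + infra

for gpu in GPU_RATES:
    print(f"{gpu}: ${all_in_hourly(gpu):.2f}/hr")
# T4 lands around $0.58/hr; A100 around $37.32/hr
```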

Monthly Cost Examples

Light Serving (T4 GPU)

1 endpoint, 8 hrs/day, 30 days

Platform: $12.60/mo

Infrastructure: $126/mo

Total: ~$139/mo

Production Serving (A10G)

2 endpoints, 24/7

Platform: $201/mo

Infrastructure: $1,449/mo

Total: ~$1,650/mo

Heavy Training (A100)

1 instance, 100 hrs/month (spot)

Platform: $455/mo

Infrastructure (spot): ~$983/mo

Total: ~$1,438/mo
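One small estimator reproduces all three scenarios above; note that the ~70% spot discount in the heavy-training case applies only to the infrastructure portion.

```python
# Monthly cost estimator for the scenarios above (this guide's rates).
def monthly_cost(dbu_hr: float, infra_hr: float, hours: float,
                 dbu_rate: float = 0.07, spot_discount: float = 0.0) -> float:
    platform = dbu_hr * dbu_rate * hours
    infra = infra_hr * hours * (1 - spot_discount)
    return platform + infra

# Light serving: T4 endpoint, 8 hrs/day x 30 days = 240 hrs
print(round(monthly_cost(0.75, 0.53, 240)))  # ~140 (the guide rounds to ~$139)

# Heavy training: A100, 100 hrs on spot (~70% infrastructure discount)
print(round(monthly_cost(65.0, 32.77, 100, spot_discount=0.7)))  # 1438
```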

Vector Search Pricing

Databricks Vector Search provides a managed vector database for similarity search, integrated with Unity Catalog. Pricing has two components: storage for the vector index and compute for the serving endpoints.

Endpoint Pricing

Endpoint Type     | Rate
Standard          | $0.28/hr
Storage Optimized | $1.28/hr

Storage: $0.023/GB/month
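Putting the two components together for a rough monthly estimate, using the rates above:

```python
# Monthly Vector Search estimate: endpoint hours + index storage.
def vector_search_monthly(endpoint_rate_hr: float, hours: float,
                          index_gb: float,
                          storage_rate: float = 0.023) -> float:
    return endpoint_rate_hr * hours + index_gb * storage_rate

# Standard endpoint running 24/7 with a 50 GB index:
print(round(vector_search_monthly(0.28, 720, 50), 2))  # 202.75
```

Storage is a rounding error next to the always-on endpoint, so the main optimization lever is endpoint sizing and uptime.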

vs Standalone Vector Databases

Service                  | Notes                         | Pricing
Databricks Vector Search | Integrated with Unity Catalog | $0.28-$1.28/hr
Pinecone Serverless      | Serverless, pay-per-query     | From $0.33/M read units
Weaviate Cloud           | Managed cloud service         | From $25/mo
Qdrant Cloud             | Managed, also self-hostable   | From $0.023/hr

ML Training Cost Estimation

Training ML models on Databricks uses Jobs Compute pricing ($0.15/DBU on AWS Premium) with GPU-enabled instance types. The primary cost driver is the GPU type and training duration. Spot instances are strongly recommended for training workloads because most modern training frameworks support checkpointing, allowing jobs to resume after spot interruptions.

Training Scenario                            | Duration | On-Demand Cost | Spot Cost
Fine-tune small model (T4, 1 GPU)            | 4 hrs    | $3             | $1
Fine-tune medium model (A10G, 1 GPU)         | 8 hrs    | $10            | $3
Train custom model (V100, 4 GPUs)            | 24 hrs   | $92            | $28
Large model training (A100, 8 GPUs)          | 72 hrs   | $2,950         | $885
Foundation model pre-training (A100 cluster) | 720 hrs  | $29,500        | $8,850

Estimates include both Databricks platform (DBU) and cloud infrastructure costs on AWS. Spot pricing assumes 70% discount. Actual costs vary by instance availability, region, and training efficiency.
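The spot-savings math can be sketched as below, with the caveat from the FAQ that the DBU charge is not discounted on spot. The DBU/hr figure is an assumption carried over from the serving table and real Jobs Compute consumption varies by instance and workload, so treat the outputs as rough.

```python
# Rough training-run estimator: spot discount hits only the infra bill.
def training_run(infra_hr: float, dbu_hr: float, hours: float,
                 dbu_rate: float = 0.15, spot_discount: float = 0.0) -> float:
    infra = infra_hr * hours * (1 - spot_discount)
    platform = dbu_hr * dbu_rate * hours  # Jobs Compute DBUs, never discounted
    return platform + infra

# p4d.24xlarge (8x A100) for a 72-hour run:
on_demand = training_run(32.77, 65.0, 72)
spot = training_run(32.77, 65.0, 72, spot_discount=0.7)
print(round(on_demand), round(spot))  # 3061 1410
```

Because the DBU charge is fixed, the effective discount on the total bill is always smaller than the headline spot discount on the instances.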

Frequently Asked Questions

How much does Databricks model serving cost?

Databricks model serving uses a DBU-based pricing model at $0.07/DBU for both CPU and GPU serving. However, GPU instances consume far more DBUs per hour than CPU instances. A T4 GPU instance consumes about 0.75 DBU/hr ($0.053/hr in platform fees), while an A100 instance consumes 65+ DBU/hr ($4.55/hr in platform fees). Total serving cost includes both the DBU charge and the cloud infrastructure charge for the underlying GPU instances.

What is the difference between pay-per-token and provisioned throughput?

Pay-per-token charges you per million input and output tokens with no commitment. This is ideal for variable or low-volume workloads. Provisioned throughput gives you dedicated GPU capacity billed by the hour, which is cheaper per token at high volumes but requires a minimum commitment. The break-even depends on your token volume, but provisioned throughput typically becomes cost-effective above roughly 50-100M tokens per month.

How does Databricks AI pricing compare to AWS SageMaker?

For model serving, Databricks and SageMaker are roughly comparable on a per-hour basis for similar GPU instances. Databricks has an advantage for teams already running data engineering on the platform because there is no data movement cost. SageMaker has a broader selection of built-in algorithms and managed training jobs. For foundation model APIs, pricing depends on the specific model and throughput requirements.

What does Databricks Vector Search cost?

Vector Search has two cost components: storage at approximately $0.023 per GB per month (same as underlying cloud storage), and endpoint compute at $0.28 to $1.28 per hour depending on endpoint size. For comparison, Pinecone Serverless starts at $0.33 per million read units, Weaviate Cloud starts at $25/month, and self-hosted options like Qdrant have only infrastructure costs.

Can I use spot instances for ML training on Databricks?

Yes, and this is one of the biggest cost optimization levers for ML workloads. Training jobs are typically fault-tolerant (checkpointing allows resumption after interruption), making them ideal for spot instances. Using spot instances for GPU training can save 60-80% on the cloud infrastructure portion of the bill. The Databricks DBU rate remains the same regardless of spot vs on-demand instances.