# Vertex AI - Self Deployed Models
Deploy and use your own models on Vertex AI through Model Garden or custom endpoints.
## Model Garden

:::tip
All OpenAI compatible models from Vertex Model Garden are supported.
:::
### Using Model Garden

Almost all Vertex Model Garden models are OpenAI compatible.

#### OpenAI Compatible Models
| Property | Details |
|---|---|
| Provider Route | `vertex_ai/openai/{MODEL_ID}` |
| Vertex Documentation | Model Garden LiteLLM Inference, Vertex Model Garden |
| Supported Operations | `/chat/completions`, `/embeddings` |
**SDK**
```python
from litellm import completion
import os

## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = completion(
    model="vertex_ai/openai/<your-endpoint-id>",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
)
```
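Since the route also lists `/embeddings` as a supported operation, the same pattern should work with `litellm.embedding` — a minimal sketch, assuming `<your-embedding-endpoint-id>` is a placeholder for an endpoint that actually serves an embedding model:

```python
from litellm import embedding
import os

## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

# <your-embedding-endpoint-id> is a hypothetical placeholder for an
# OpenAI-compatible endpoint that serves an embedding model
response = embedding(
    model="vertex_ai/openai/<your-embedding-endpoint-id>",
    input=["Hello, how are you?"],
)
```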
**Proxy**

1. Add to config
```yaml
model_list:
  - model_name: llama3-1-8b-instruct
    litellm_params:
      model: vertex_ai/openai/5464397967697903616
      vertex_ai_project: "my-test-project"
      vertex_ai_location: "us-east1"
```
2. Start proxy

```bash
litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000
```
3. Test it! Use the `model_name` from your config as the `model`.

```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data '{
    "model": "llama3-1-8b-instruct",
    "messages": [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ]
}'
```
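Because the proxy exposes an OpenAI-compatible `/chat/completions` endpoint, you can also call it with the official `openai` Python client instead of raw curl — a minimal sketch, assuming the proxy from step 2 is running at `http://0.0.0.0:4000` with the `sk-1234` key used in the curl example:

```python
from openai import OpenAI

# point the client at the LiteLLM proxy; api_key is the proxy key
# from the curl example above
client = OpenAI(base_url="http://0.0.0.0:4000", api_key="sk-1234")

response = client.chat.completions.create(
    model="llama3-1-8b-instruct",  # the `model_name` from the config
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)
```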
#### Non-OpenAI Compatible Models

For models that are not OpenAI compatible, drop the `openai/` segment and call the endpoint with the `vertex_ai/{MODEL_ID}` route directly:

```python
from litellm import completion
import os

## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = completion(
    model="vertex_ai/<your-endpoint-id>",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
)
```
## Gemma Models (Custom Endpoints)

Deploy Gemma models on custom Vertex AI prediction endpoints with OpenAI-compatible format.
| Property | Details |
|---|---|
| Provider Route | `vertex_ai/gemma/{MODEL_NAME}` |
| Vertex Documentation | Vertex AI Prediction |
| Required Parameter | `api_base` - Full prediction endpoint URL |
**Proxy Usage:**

1. Add to config.yaml
```yaml
model_list:
  - model_name: gemma-model
    litellm_params:
      model: vertex_ai/gemma/gemma-3-12b-it-1222199011122
      api_base: https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict
      vertex_project: "my-project-id"
      vertex_location: "us-central1"
```
2. Start proxy

```bash
litellm --config /path/to/config.yaml
```
3. Test it

```bash
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gemma-model",
    "messages": [{"role": "user", "content": "What is machine learning?"}],
    "max_tokens": 100
  }'
```
**SDK Usage:**

```python
from litellm import completion

response = completion(
    model="vertex_ai/gemma/gemma-3-12b-it-1222199011122",
    messages=[{"role": "user", "content": "What is machine learning?"}],
    api_base="https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict",
    vertex_project="my-project-id",
    vertex_location="us-central1",
)
```
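LiteLLM's standard `stream=True` flag should apply here as well — a minimal sketch of streaming the same request, assuming the deployed endpoint supports streamed predictions:

```python
from litellm import completion

# same endpoint and placeholder values as the example above
response = completion(
    model="vertex_ai/gemma/gemma-3-12b-it-1222199011122",
    messages=[{"role": "user", "content": "What is machine learning?"}],
    api_base="https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict",
    vertex_project="my-project-id",
    vertex_location="us-central1",
    stream=True,
)

# chunks follow the OpenAI streaming shape: delta.content holds new text
for chunk in response:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="")
```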
## MedGemma Models (Custom Endpoints)

Deploy MedGemma models on custom Vertex AI prediction endpoints with OpenAI-compatible format. MedGemma models use the same `vertex_ai/gemma/` route.
| Property | Details |
|---|---|
| Provider Route | `vertex_ai/gemma/{MODEL_NAME}` |
| Vertex Documentation | Vertex AI Prediction |
| Required Parameter | `api_base` - Full prediction endpoint URL |
**Proxy Usage:**

1. Add to config.yaml
```yaml
model_list:
  - model_name: medgemma-model
    litellm_params:
      model: vertex_ai/gemma/medgemma-2b-v1
      api_base: https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict
      vertex_project: "my-project-id"
      vertex_location: "us-central1"
```
2. Start proxy

```bash
litellm --config /path/to/config.yaml
```
3. Test it

```bash
curl http://0.0.0.0:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "medgemma-model",
    "messages": [{"role": "user", "content": "What are the symptoms of hypertension?"}],
    "max_tokens": 100
  }'
```
**SDK Usage:**

```python
from litellm import completion

response = completion(
    model="vertex_ai/gemma/medgemma-2b-v1",
    messages=[{"role": "user", "content": "What are the symptoms of hypertension?"}],
    api_base="https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict",
    vertex_project="my-project-id",
    vertex_location="us-central1",
)
```
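For non-blocking callers, LiteLLM also ships an async variant, `litellm.acompletion` — a minimal sketch with the same parameters as above:

```python
import asyncio

from litellm import acompletion


async def main():
    # same endpoint and placeholder values as the sync example above
    response = await acompletion(
        model="vertex_ai/gemma/medgemma-2b-v1",
        messages=[{"role": "user", "content": "What are the symptoms of hypertension?"}],
        api_base="https://ENDPOINT.us-central1-PROJECT.prediction.vertexai.goog/v1/projects/PROJECT_ID/locations/us-central1/endpoints/ENDPOINT_ID:predict",
        vertex_project="my-project-id",
        vertex_location="us-central1",
    )
    print(response.choices[0].message.content)


asyncio.run(main())
```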