
Method: projects.locations.endpoints.serverStreamingPredict

Performs a server-side streaming online prediction request for Vertex AI LLM streaming.

Endpoint

POST https://{service-endpoint}/v1beta1/{endpoint}:serverStreamingPredict

Where {service-endpoint} is one of the supported service endpoints.

Path parameters

endpoint string

Required. The name of the Endpoint requested to serve the prediction. Format: projects/{project}/locations/{location}/endpoints/{endpoint}
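As a minimal sketch, the helpers below assemble the documented resource name and the full request URL. The default regional host {location}-aiplatform.googleapis.com is an assumption about which supported service endpoint applies; the function names and example IDs are hypothetical.

```python
from typing import Optional


def endpoint_name(project: str, location: str, endpoint_id: str) -> str:
    """Build projects/{project}/locations/{location}/endpoints/{endpoint}."""
    for part in (project, location, endpoint_id):
        # Each segment must be non-empty and must not contain a path separator.
        if not part or "/" in part:
            raise ValueError(f"invalid resource ID segment: {part!r}")
    return f"projects/{project}/locations/{location}/endpoints/{endpoint_id}"


def streaming_predict_url(location: str, name: str,
                          service_endpoint: Optional[str] = None) -> str:
    """Build the POST URL for the serverStreamingPredict method."""
    # Assumption: the regional host pattern below; pass service_endpoint to override it.
    host = service_endpoint or f"{location}-aiplatform.googleapis.com"
    return f"https://{host}/v1beta1/{name}:serverStreamingPredict"
```

For example, streaming_predict_url("us-central1", endpoint_name("my-project", "us-central1", "123")) returns https://us-central1-aiplatform.googleapis.com/v1beta1/projects/my-project/locations/us-central1/endpoints/123:serverStreamingPredict.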

Request body

The request body contains data with the following structure:

Fields

inputs[] object (Tensor)

The prediction input.

parameters object (Tensor)

The parameters that govern the prediction.
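Both fields are Tensor objects. The sketch below encodes ordinary Python values as Tensor JSON, assuming the Tensor field names structVal, listVal, stringVal, boolVal, intVal, int64Val and floatVal. The input key "prompt" and the parameter names temperature and maxOutputTokens are hypothetical; the keys a deployed model actually expects depend on that model.

```python
import json
from typing import Any, Dict


def to_tensor(value: Any) -> Dict[str, Any]:
    """Encode a Python value as a Tensor JSON object (assumed field names)."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return {"boolVal": [value]}
    if isinstance(value, int):
        # intVal is 32-bit; larger values go to int64Val, sent as a JSON string.
        if -2**31 <= value < 2**31:
            return {"intVal": [value]}
        return {"int64Val": [str(value)]}
    if isinstance(value, float):
        # Assumption: floatVal is accepted; doubleVal is the 64-bit alternative.
        return {"floatVal": [value]}
    if isinstance(value, str):
        return {"stringVal": [value]}
    if isinstance(value, list):
        return {"listVal": [to_tensor(v) for v in value]}
    if isinstance(value, dict):
        return {"structVal": {k: to_tensor(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def build_request(prompt: str, **params: Any) -> str:
    """Serialize a request body with one input instance and a parameters Tensor."""
    body = {
        "inputs": [to_tensor({"prompt": prompt})],  # hypothetical input key
        "parameters": to_tensor(params),
    }
    return json.dumps(body)
```

With build_request("Hello", temperature=0.2, maxOutputTokens=64), inputs becomes [{"structVal": {"prompt": {"stringVal": ["Hello"]}}}] and each parameter is wrapped in its own typed Tensor inside parameters.structVal.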

Response body

If successful, the response body contains a stream of StreamingPredictResponse instances.
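Each StreamingPredictResponse is assumed here to carry its results as Tensor objects in an outputs list. The sketch below decodes those tensors back into Python values, assuming the responses have already been read off the stream and parsed into dicts; the wire framing of the stream is not shown. The "content" key in the usage note is hypothetical.

```python
from typing import Any, Dict, Iterable, List

# Repeated scalar fields of a Tensor (assumed names), checked in this order.
_SCALAR_FIELDS = ("boolVal", "stringVal", "floatVal", "doubleVal",
                  "intVal", "int64Val", "uintVal", "uint64Val")


def from_tensor(t: Dict[str, Any]) -> Any:
    """Decode a Tensor JSON object into plain Python values."""
    if "structVal" in t:
        return {k: from_tensor(v) for k, v in t["structVal"].items()}
    if "listVal" in t:
        return [from_tensor(v) for v in t["listVal"]]
    for field in _SCALAR_FIELDS:
        if field in t:
            vals = list(t[field])
            # 64-bit integers are encoded as JSON strings; convert them back.
            if field in ("int64Val", "uint64Val"):
                vals = [int(v) for v in vals]
            # Unwrap single-element lists to a bare scalar.
            return vals[0] if len(vals) == 1 else vals
    return None


def collect_outputs(responses: Iterable[Dict[str, Any]]) -> List[Any]:
    """Decode the outputs of every StreamingPredictResponse, in stream order."""
    decoded: List[Any] = []
    for resp in responses:
        for tensor in resp.get("outputs", []):
            decoded.append(from_tensor(tensor))
    return decoded
```

For example, two chunks whose outputs are {"structVal": {"content": {"stringVal": ["Hel"]}}} and {"structVal": {"content": {"stringVal": ["lo"]}}} decode to [{"content": "Hel"}, {"content": "lo"}], and joining the content values yields "Hello".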