Prometheus Grafana setup - Temporal Cloud feature guide
How to set up Grafana with Temporal Cloud observability to view metrics.
Temporal Cloud and SDKs generate metrics for monitoring performance and troubleshooting errors.
Temporal Cloud emits metrics through a Prometheus HTTP API endpoint, which can be directly used as a Prometheus data source in Grafana or to query and export Cloud metrics to any observability platform.
The open-source SDKs require you to set up a Prometheus scrape endpoint for Prometheus to collect and aggregate the Worker and Client metrics.
This section describes how to set up your Temporal Cloud and SDK metrics and use them as data sources in Grafana.
The process for setting up observability includes the following steps:
- Create or get your Prometheus endpoint for Temporal Cloud metrics and enable SDK metrics.
  - For Temporal Cloud, generate a Prometheus HTTP API endpoint on Temporal Cloud using valid certificates.
  - For SDKs, expose a metrics endpoint where Prometheus can scrape SDK metrics and run Prometheus on your host. The examples in this article describe running Prometheus on your local machine where you run your application code.
- Run Grafana and set up data sources for Temporal Cloud and SDK metrics in Grafana. The examples in this article describe running Grafana on your local host where you run your application code.
- Create dashboards in Grafana to view Temporal Cloud metrics and SDK metrics. Temporal provides sample community-driven Grafana dashboards for Cloud and SDK metrics that you can use and customize according to your requirements.
If you're following through with the examples provided here, ensure that you have the following:
- Root CA certificates and end-entity certificates. See Certificate requirements for details.
- Connections to Temporal Cloud set up using an SDK of your choice, with some Workflows running on Temporal Cloud. See Connect to a Temporal Service for details.
- Prometheus and Grafana installed.
Temporal Cloud metrics setup
Before you set up your Temporal Cloud metrics, ensure that you have the following:
- Global Admin privileges to the Temporal Cloud account.
- CA certificate and key for the Observability integration. You will need the certificate to set up the Observability endpoint in Temporal Cloud.
The following steps describe how to set up Observability on Temporal Cloud to generate an endpoint:
- Log in to Temporal Cloud UI as a Global Admin.
- Go to Settings and select Integrations.
- Select Configure Observability (if you're setting it up for the first time) or click Edit in the Observability section (if it was already configured before).
- Add your root CA certificate (.pem) and save it. Note that if an observability endpoint is already set up, you can append your root CA certificate here to use the generated observability endpoint with your instance of Grafana.
- To test your endpoint, run the following command on your host:
  curl -v --cert <path to your client-cert.pem> --key <path to your client-cert.key> "<your generated Temporal Cloud prometheus_endpoint>/api/v1/query?query=temporal_cloud_v0_state_transition_count"
  If you have Workflows running on a Namespace in your Temporal Cloud instance, you should see some data as a result of running this command.
- Copy the HTTP API endpoint that is generated (it is shown in the UI).
This endpoint should be configured as a data source for Temporal Cloud metrics in Grafana. See Data sources configuration for Temporal Cloud and SDK metrics in Grafana for details.
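The same check as the curl command can be run from application code. The following is a minimal Java sketch under stated assumptions, not an official Temporal client: it builds the `/api/v1/query` URL and sends it over mutual TLS using only the JDK. It assumes the client certificate and key have been bundled into a PKCS#12 file (the curl example uses separate PEM files instead), and the class name, method names, and command-line arguments are hypothetical.

```java
import java.io.InputStream;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;

// Hypothetical helper for illustration; not part of any Temporal SDK.
class CloudMetricsQuery {

  // Builds <endpoint>/api/v1/query?query=<URL-encoded PromQL>.
  static String queryUrl(String endpoint, String promQl) {
    String base = endpoint.endsWith("/")
        ? endpoint.substring(0, endpoint.length() - 1)
        : endpoint;
    return base + "/api/v1/query?query=" + URLEncoder.encode(promQl, StandardCharsets.UTF_8);
  }

  // Creates an HTTP client that presents the client certificate stored in a PKCS#12 file.
  // Assumption: the PEM certificate and key were converted to PKCS#12 beforehand.
  static HttpClient mtlsClient(Path pkcs12File, char[] password) throws Exception {
    KeyStore keyStore = KeyStore.getInstance("PKCS12");
    try (InputStream in = Files.newInputStream(pkcs12File)) {
      keyStore.load(in, password);
    }
    KeyManagerFactory kmf =
        KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
    kmf.init(keyStore, password);
    SSLContext sslContext = SSLContext.getInstance("TLS");
    // Null trust managers and randomness select the JDK defaults.
    sslContext.init(kmf.getKeyManagers(), null, null);
    return HttpClient.newBuilder().sslContext(sslContext).build();
  }

  // Arguments (hypothetical): <prometheus_endpoint> <client.p12> <password>
  public static void main(String[] args) throws Exception {
    if (args.length < 3) {
      System.out.println("usage: <prometheus_endpoint> <client.p12> <password>");
      return;
    }
    HttpClient client = mtlsClient(Path.of(args[1]), args[2].toCharArray());
    String url = queryUrl(args[0], "temporal_cloud_v0_state_transition_count");
    HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
    HttpResponse<String> response =
        client.send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode() + " " + response.body());
  }
}
```

A 200 status with a non-empty result suggests that the certificate chain and endpoint are set up correctly; a TLS handshake failure usually points back to the certificates.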
SDK metrics setup
SDK metrics are emitted by SDK Clients used to start your Workers and to start, signal, or query your Workflow Executions. You must configure a Prometheus scrape endpoint for Prometheus to collect and aggregate your SDK metrics. Each language development guide has details on how to set this up.
The following example uses the Java SDK to set the Prometheus registry and Micrometer stats reporter, set the scope, and expose an endpoint from which Prometheus can scrape the SDK metrics.
//You need the following packages to set up metrics in Java.
//See the Developer's guide for packages required for other SDKs.
//…
import com.sun.net.httpserver.HttpServer;
import com.uber.m3.tally.RootScopeBuilder;
import com.uber.m3.tally.Scope;
import com.uber.m3.util.Duration;
import com.uber.m3.util.ImmutableMap;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;
import io.temporal.common.reporter.MicrometerClientStatsReporter;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import io.temporal.serviceclient.SimpleSslContextBuilder;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;
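import java.io.FileInputStream;
import java.io.InputStream;
import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowClientOptions;
import static java.nio.charset.StandardCharsets.UTF_8;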
//…
{
  // See the Micrometer documentation for configuration details on other supported monitoring systems.
  // Set up the Prometheus registry. It is static so that the static methods below can use it.
  static PrometheusMeterRegistry yourRegistry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

  public static Scope yourScope() {
    // Set up a scope that reports every 10 seconds.
    Scope yourScope = new RootScopeBuilder()
        .tags(ImmutableMap.of(
            "customtag1",
            "customvalue1",
            "customtag2",
            "customvalue2"))
        .reporter(new MicrometerClientStatsReporter(yourRegistry))
        .reportEvery(Duration.ofSeconds(10));
    // Start the Prometheus scrape endpoint at port 8077 on your local host.
    HttpServer scrapeEndpoint = startPrometheusScrapeEndpoint(yourRegistry, 8077);
    return yourScope;
  }

  /**
   * Starts HttpServer to expose a scrape endpoint. See
   * https://micrometer.io/docs/registry/prometheus for more info.
   */
  public static HttpServer startPrometheusScrapeEndpoint(
      PrometheusMeterRegistry yourRegistry, int port) {
    try {
      HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
      server.createContext(
          "/metrics",
          httpExchange -> {
            // Render the current metrics in the Prometheus text format.
            String response = yourRegistry.scrape();
            httpExchange.sendResponseHeaders(200, response.getBytes(UTF_8).length);
            try (OutputStream os = httpExchange.getResponseBody()) {
              os.write(response.getBytes(UTF_8));
            }
          });
      server.start();
      return server;
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }
}
//…
// With your scrape endpoint configured, set the metrics scope in your Workflow service stub and
// use it to create a Client to start your Workers and Workflow Executions.
//…
{
  // Create Workflow service stubs to connect to the Frontend Service.
  WorkflowServiceStubs service = WorkflowServiceStubs.newServiceStubs(
      WorkflowServiceStubsOptions.newBuilder()
          .setMetricsScope(yourScope()) // Set the metrics scope for the WorkflowServiceStubs.
          .build());
  // Create a Workflow service client, which can be used to start, signal, and query Workflow Executions.
  WorkflowClient yourClient = WorkflowClient.newInstance(service,
      WorkflowClientOptions.newBuilder().build());
}
//…
To check whether your scrape endpoints are emitting metrics, run your code and go to http://localhost:8077/metrics to verify that you see the SDK metrics.
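If you prefer to check the scrape endpoint from code rather than a browser, the following Java sketch lists the metric names found in a Prometheus text-format response. It is an illustration only: the class and method names are hypothetical, the `temporal_` prefix filter is an assumption about how your SDK metrics are named, and the sample body in `main` is made up for demonstration rather than captured from a real Worker.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for inspecting a scrape endpoint; not part of any Temporal SDK.
class ScrapeCheck {

  // Returns the distinct metric names in a Prometheus text-format body
  // that start with the given prefix.
  static Set<String> metricNames(String body, String prefix) {
    Set<String> names = new TreeSet<>();
    for (String line : body.split("\n")) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue; // Skip blank lines and HELP/TYPE comments.
      }
      // A sample line looks like: name{label="value"} 1.0  or  name 1.0
      int end = 0;
      while (end < trimmed.length()
          && trimmed.charAt(end) != '{'
          && trimmed.charAt(end) != ' ') {
        end++;
      }
      String name = trimmed.substring(0, end);
      if (name.startsWith(prefix)) {
        names.add(name);
      }
    }
    return names;
  }

  // Fetches the scrape endpoint body, for example http://localhost:8077/metrics.
  static String fetch(String url) throws Exception {
    HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
    return HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body();
  }

  public static void main(String[] args) throws Exception {
    // Illustrative sample only; pass a URL argument to check a live endpoint instead.
    String sample = "# TYPE temporal_request_total counter\n"
        + "temporal_request_total{namespace=\"default\"} 3.0\n"
        + "temporal_worker_task_slots_available 100.0\n";
    String body = args.length > 0 ? fetch(args[0]) : sample;
    System.out.println(metricNames(body, "temporal_"));
  }
}
```

An empty result against a live endpoint typically means that no Client or Worker using the metrics scope has started yet.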
You can set up separate scrape endpoints in your Clients that you use to start your Workers and Workflow Executions.
For more examples on setting metrics endpoints in other SDKs, see the metrics samples:
SDK metrics Prometheus Configuration
How to configure Prometheus to ingest Temporal SDK metrics.
For Temporal SDKs, you must have Prometheus running and configured to listen on the scrape endpoints exposed in your application code.
For this example, you can run Prometheus locally or as a Docker container. In either case, ensure that you set the listen targets to the ports where you expose your scrape endpoints. When you run Prometheus locally, set your target address to port 8077 in your Prometheus configuration YAML file. (We set the scrape endpoint to port 8077 in the SDK metrics setup example.)
Example:
global:
  scrape_interval: 10s # Set the scrape interval to every 10 seconds. Default is every 1 minute.
#...
# Set your scrape configuration targets to the ports exposed on your endpoints in the SDK.
scrape_configs:
  - job_name: 'temporalsdkmetrics'
    metrics_path: /metrics
    scheme: http
    static_configs:
      - targets:
          # This is the scrape endpoint where Prometheus listens for SDK metrics.
          - localhost:8077
          # You can have multiple targets here, provided they are set up in your application code.
See the Prometheus documentation for more details on how you can run Prometheus locally or using Docker.
Note that Temporal Cloud exposes metrics through a Prometheus HTTP API endpoint (not a scrape endpoint) that can be configured as a data source in Grafana. The Prometheus configuration described here is for scraping metrics data on endpoints for SDK metrics only.
To check whether Prometheus is receiving metrics from your SDK target, go to http://localhost:9090 and navigate to Status > Targets. The status of your target endpoint defined in your configuration appears here.
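The same target status is also available from the Prometheus HTTP API at `/api/v1/targets`. The Java sketch below counts targets by their `health` field using a simple regular expression. It is a rough check under stated assumptions rather than a proper JSON parser: the class and method names are hypothetical, and the sample response in `main` is a trimmed, made-up illustration of the response shape.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; a crude regex scan, not a full JSON parser.
class TargetHealth {

  private static final Pattern HEALTH =
      Pattern.compile("\"health\"\\s*:\\s*\"([a-z]+)\"");

  // Counts targets whose "health" field equals the given state, such as "up" or "down".
  static int countWithHealth(String targetsJson, String state) {
    Matcher matcher = HEALTH.matcher(targetsJson);
    int count = 0;
    while (matcher.find()) {
      if (matcher.group(1).equals(state)) {
        count++;
      }
    }
    return count;
  }

  // Fetches <prometheusBaseUrl>/api/v1/targets, for example with http://localhost:9090.
  static String fetchTargets(String prometheusBaseUrl) throws Exception {
    HttpRequest request = HttpRequest.newBuilder(
        URI.create(prometheusBaseUrl + "/api/v1/targets")).GET().build();
    return HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body();
  }

  public static void main(String[] args) throws Exception {
    // Illustrative, trimmed sample; pass a base URL argument to query a live Prometheus.
    String sample = "{\"status\":\"success\",\"data\":{\"activeTargets\":["
        + "{\"scrapeUrl\":\"http://localhost:8077/metrics\",\"health\":\"up\"}]}}";
    String json = args.length > 0 ? fetchTargets(args[0]) : sample;
    System.out.println("up: " + countWithHealth(json, "up")
        + ", down: " + countWithHealth(json, "down"));
  }
}
```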
Grafana data sources configuration
How to configure data sources for Temporal Cloud and SDK metrics in Grafana.
Depending on how you use Grafana, you can either install and run it locally, run it as a Docker container, or log in to Grafana Cloud to set up your data sources.
If you have installed and are running Grafana locally, go to http://localhost:3000 and sign in.
You must configure your Temporal Cloud and SDK metrics data sources separately in Grafana.
To add the Temporal Cloud Prometheus HTTP API endpoint that we generated in the Temporal Cloud metrics setup section, do the following:
- Go to Configuration > Data sources.
- Select Add data source > Prometheus.
- Enter a name for your Temporal Cloud metrics data source, such as Temporal Cloud metrics.
- In the HTTP section, paste the URL that was generated in the Observability section on the Temporal Cloud UI.
- In the Auth section, enable TLS Client Auth.
- In the TLS/SSL Auth Details section, paste the end-entity certificate and key. Note that the end-entity certificate used here must be part of the certificate chain with the root CA certificates used in your Temporal Cloud observability setup.
- Click Save and test to verify that the data source is working.
If you see issues in setting this data source, verify your CA certificate chain and ensure that you are setting the correct certificates in your Temporal Cloud observability setup and in the TLS authentication in Grafana.
To add the SDK metrics Prometheus endpoint that we configured in the SDK metrics setup and Prometheus configuration for SDK metrics sections, do the following:
- Go to Configuration > Data sources.
- Select Add data source > Prometheus.
- Enter a name for your Temporal SDK metrics data source, such as Temporal SDK metrics.
- In the HTTP section, enter your Prometheus endpoint in the URL field.
  If running Prometheus locally as described in the examples in this article, enter http://localhost:9090.
- For this example, enable Skip TLS Verify in the Auth section.
- Click Save and test to verify that the data source is working.
If you see issues in setting this data source, check whether the endpoints set in your SDKs are showing metrics. If you don't see your SDK metrics at the scrape endpoints defined, check whether your Workers and Workflow Executions are running. If you see metrics on the scrape endpoints, but Prometheus shows your targets are down, then there is an issue with connecting to the targets set in your SDKs. Verify your Prometheus configuration and restart Prometheus.
If you're running Grafana as a container, you can set your SDK metrics Prometheus data source in your Grafana configuration. See the example Grafana configuration described in the Prometheus and Grafana setup for open-source Temporal Service article.
Grafana dashboards setup
To set up your dashboards in Grafana, either use the UI or configure them in your Grafana deployment.
In this section, we will configure our dashboards using the UI.
- Go to Create > Dashboard and add an empty panel.
- On the Panel configuration page, in the Query tab, select the "Temporal Cloud metrics" or "Temporal SDK metrics" data source that we configured in the previous section. If you want to add multiple queries that involve both data sources, select -- Mixed --.
- Add your metrics queries:
  - For Temporal Cloud metrics, expand the Metrics browser and select the metrics you want to see. You can also select associated labels and values to sort the data on the query. The documentation on Cloud metrics lists metrics emitted from Temporal Cloud.
  - For Temporal SDK metrics, expand the Metrics browser and select the metrics you want to see. Metrics on Worker performance are described in Developer's Guide - Worker performance. All metrics related to SDKs are described in the SDK metrics reference.
- You should see the graph show data based on the queries you have selected. Note that for SDK metrics to show, you must have some Workflow Execution data and running Workers. If you do not see any metrics data from the SDK, run your Worker and Workflow Executions and monitor your dashboard.
Temporal has a repository with some community-driven example dashboards for Temporal Cloud and Temporal SDKs that you can use and customize for your own requirements.
To import a dashboard in Grafana, do the following.
- Go to Create > Import.
- You can either copy and paste the JSON from Temporal Cloud and Temporal SDKs sample dashboards, or import the JSON files into Grafana.
  If you import a dashboard from the repositories, ensure that you update the dashboard data sources ("uid": "${datasource}") in the JSON to the names you configured in the Data sources configuration section.
- Save the dashboard and review the metrics data in the graphs.
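Updating the data source references by hand can be tedious for large dashboards. The following Java sketch shows one way to replace every `"uid": "${datasource}"` entry in a dashboard JSON string before importing it. The class and method names are hypothetical, the sample panel in `main` is made up, and the sketch assumes the data source name contains no characters that need JSON escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for preparing a sample dashboard for import.
class DashboardDatasource {

  // Matches "uid": "${datasource}" with optional whitespace around the colon.
  private static final Pattern PLACEHOLDER =
      Pattern.compile("\"uid\"\\s*:\\s*\"\\$\\{datasource\\}\"");

  // Replaces every ${datasource} placeholder with the given data source name.
  // Other "uid" fields, such as the dashboard's own uid, are left untouched.
  static String withDatasource(String dashboardJson, String datasource) {
    String replacement = "\"uid\": \"" + datasource + "\"";
    return PLACEHOLDER.matcher(dashboardJson)
        .replaceAll(Matcher.quoteReplacement(replacement));
  }

  public static void main(String[] args) {
    // Made-up panel fragment for demonstration.
    String panel =
        "{\"datasource\": {\"type\": \"prometheus\", \"uid\": \"${datasource}\"}}";
    System.out.println(withDatasource(panel, "Temporal Cloud metrics"));
  }
}
```

For a real dashboard file, the JSON could be read with `Files.readString` and written back with `Files.writeString` around the call to `withDatasource`.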