Guidelines for implementing containers in the Index Cloud
The following document outlines guidelines and best practices for implementing containers used for real-time enrichment in the Index Cloud.
Endpoint requirements
Containers used for real-time enrichment must expose the endpoints described below and be available behind HTTP or HTTPS. These endpoints should be exposed on a port configured using the SERVER_PORT environment variable.
Orchestration endpoints
Containers must expose a liveness endpoint and a readiness endpoint.
Endpoint | Description | Recommended implementation |
---|---|---|
Liveness endpoint | Containers must provide a liveness endpoint, which the orchestration layers will use to check if the container is healthy or needs to be restarted. This endpoint should follow the specifications outlined in the Kubernetes documentation for liveness endpoints. | Index Exchange (Index) recommends exposing this endpoint at /heartbeat . |
Readiness endpoint | Containers must provide a readiness endpoint, which the orchestration layers will use to check if the container is ready to receive traffic. This endpoint should follow the specifications outlined in the Kubernetes documentation for readiness endpoints. | Index recommends exposing this endpoint at /healthcheck . |
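A minimal sketch of the two orchestration endpoints, using Python's standard-library HTTP server and the recommended paths from the table above. The port is read from the SERVER_PORT environment variable per the endpoint requirements; response bodies and the 8080 fallback are illustrative assumptions, not part of the specification:

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    """Serves the recommended /heartbeat (liveness) and /healthcheck (readiness) paths."""

    def do_GET(self):
        if self.path in ("/heartbeat", "/healthcheck"):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"OK")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # suppress per-request logging; metrics should capture performance instead

def serve():
    # Port comes from the SERVER_PORT environment variable, per the requirements above.
    # The 8080 default is an assumption for local testing only.
    port = int(os.environ.get("SERVER_PORT", "8080"))
    HTTPServer(("", port), HealthHandler).serve_forever()
```

In a real container, readiness would typically also verify that any models or seed data have finished loading before returning 200.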
Metrics endpoints
Containers must expose a metrics endpoint, which provides runtime metrics in Prometheus format.
Metric | Description |
---|---|
http.server.request.duration | A histogram measurement as described in the OpenTelemetry specification, where histogram boundaries should be [0.0025, 0.005, 0.01, 0.025, 0.05] measured in seconds. Note that these boundaries differ from the OpenTelemetry defaults, as more detail was needed for low-latency ranges. All other attributes used to tag this metric should follow the OpenTelemetry specification. |
You can expose other metrics as required. Index will collect these metrics and make them available to you. Total metric cardinality should not exceed 10,000.
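To illustrate the expected bucket layout, the following sketch renders a duration histogram in Prometheus exposition format using the boundaries above. The metric name spelling and the absence of labels other than `le` are simplifications; a production container would normally use a Prometheus or OpenTelemetry client library rather than hand-rolling this:

```python
BOUNDARIES = [0.0025, 0.005, 0.01, 0.025, 0.05]  # seconds, per the table above

def render_histogram(samples, boundaries=BOUNDARIES,
                     name="http_server_request_duration"):
    """Render observed durations as a Prometheus histogram.
    Prometheus histograms use cumulative bucket counts with an "le" label,
    plus a final +Inf bucket and companion _sum and _count series."""
    lines = []
    for b in boundaries:
        count = sum(1 for s in samples if s <= b)
        lines.append(f'{name}_bucket{{le="{b}"}} {count}')
    lines.append(f'{name}_bucket{{le="+Inf"}} {len(samples)}')
    lines.append(f'{name}_sum {sum(samples)}')
    lines.append(f'{name}_count {len(samples)}')
    return "\n".join(lines)
```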
Data Partner-specific classification endpoints
Endpoint | Description | Recommended implementation |
---|---|---|
Classification endpoint | Partners must expose an endpoint accessible through an HTTP POST request following Index's Real-Time Data integration specification. Note: This specification will be continually updated to support new requirements. Let us know if you have new fields that you would like Index to support. The SLA for Index to use your response in our auction processing is 5 milliseconds. HTTP requests will be made with a timeout of 30 milliseconds, with no more than 5% of requests allowed to exceed this timeout value. If more than 5% of requests time out, traffic to your container will be automatically throttled, and Index may lower the QPS accordingly. Calls to this endpoint will use keep-alive connections to minimize network overhead. | Index recommends exposing this endpoint at |
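The timeout budget above can be expressed as a simple check. This is an illustrative sketch of the throttling condition only, not Index's actual implementation:

```python
def should_throttle(latencies_ms, timeout_ms=30.0, max_timeout_fraction=0.05):
    """True when more than 5% of requests exceed the 30 ms HTTP timeout,
    the condition under which traffic to the container is throttled."""
    timed_out = sum(1 for t in latencies_ms if t > timeout_ms)
    return timed_out / len(latencies_ms) > max_timeout_fraction
```

Note that the 5 ms SLA is tighter than the 30 ms timeout: responses between 5 ms and 30 ms are received but may arrive too late to be used in auction processing.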
Logging
Logs from your container should be written to STDOUT and/or STDERR as appropriate. Index will collect these logs and make them available to you. Containers should not produce an excessive number of log messages; instead, use metrics to capture container performance.
Data Partner-specific resource usage requirements
Currently, the main task for containers on the Index Cloud is to classify ad requests in real time. As a result, resource usage will increase proportionally with the load on the containers.
Containers on the Index Cloud should be designed to run on x86 architecture and handle 100,000 queries per second (QPS) to their classification endpoint using 10 CPU cores and 10 GiB of memory. Resource usage should increase proportionally with the load. If resource usage exceeds these limits, discuss it with Index. Images should not exceed 1 GiB.
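The stated baseline works out to roughly 10,000 QPS and 1 GiB of memory per core. A small capacity-planning sketch under that assumption of proportional scaling (the per-core figures are derived from the guideline, not an official sizing formula):

```python
import math

def required_resources(target_qps, qps_per_core=10_000, gib_per_core=1.0):
    """Estimate cores and memory for a target load, assuming the guideline
    baseline of 100,000 QPS on 10 cores and 10 GiB scales linearly."""
    cores = math.ceil(target_qps / qps_per_core)
    return cores, cores * gib_per_core
```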
Testing your container
All containers must be tested before deployment to the Index Cloud. Partners should test their APIs locally for correctness and performance before requesting deployment to the Index Cloud.
Index will also conduct functional testing on your API before it is deployed to our production cloud. If your container doesn't meet the specifications outlined in this document, it may not be deployed.
Using the Index testing tool
Index provides a testing tool that simulates the load your containers will receive in Index’s production Cloud. The Docker image for this testing tool is hosted in Index’s own Docker Hub instance. To gain access to the testing tool, contact your Index representative and provide the email address associated with the Docker Hub account that will need access to the tool.
The container can be initiated using the following command:
docker run --network="host" -t --rm --name send-rtd-requests -e COMMAND_LINE_FLAGS="-e TEST_ENDPOINT" indexexchangehub/ext-data-provider-test-tools:2024.11.20
Replace TEST_ENDPOINT in the command above with the URL you'd like to test, for example "http://myapi.com/test". To further customize the test tool, you can pass additional flags into the container using the COMMAND_LINE_FLAGS environment variable during startup. The available flags are outlined below:
- -c : Specifies the number of HTTP connections that the container will establish. Default is 100, which is realistic for Index’s production Cloud.
- -e : Specifies the HTTP endpoint for testing.
- -l : Specifies the log level for the testing tool.
- -q : Specifies the maximum QPS that the container should generate. The QPS your container is sent in Index’s production Cloud will vary depending on how your container scales (horizontally vs vertically) and the bid stream coverage you wish to achieve.
- -t : Specifies the HTTP timeout in milliseconds. Default is 30, which aligns with Index's production Cloud.
Installation
Networking requirements
Containers deployed on the Index Cloud are not granted egress outside of the Index Cloud. Ingress to these containers is only available within the Index Cloud. Consequently, any information that the container needs to start up, such as a configuration file or seed data, should either be included in the container or mounted through a configuration file.
Security
All containers deployed to the Index Cloud are scanned by Sonatype Vulnerability Scanner on initial download, and on an ongoing basis. Containers identified as vulnerable to threats with a severity level of 9 or 10 will not be deployed to the Index Cloud.
Containers are deployed with a read-only file system and are not permitted to run as root.
Configuration files
Configuration files, in TOML format, can be customized for the Index Cloud and mounted into your container at your preferred location when it starts up.
Sharing your container with Index
To share your container with Index, create a Docker Hub repository to host your images. Grant Index's nexus@indexexchange.onmicrosoft.com email address access to your Docker Hub repository and push the images that you want to host in the Index Cloud into your Docker Hub repository. Images are automatically detected and sent through Index’s security scanning pipelines. After an image has been registered by Index, any updates to this image in the partner's registry will be ignored.
Image names should follow the CalVer versioning system so each image can be uniquely identified, e.g. “v1.20250525.0” to indicate the first version built on May 25, 2025.
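A small sketch of generating and validating tags in that shape (major version, build date as YYYYMMDD, then a per-day build counter). The helper names and the validation pattern are illustrative assumptions inferred from the example tag:

```python
import datetime
import re

def calver_tag(major, date, build):
    """Produce a tag like "v1.20250525.0": major version, build date, build counter."""
    return f"v{major}.{date:%Y%m%d}.{build}"

def is_valid_tag(tag):
    """Check a tag against the v<major>.<YYYYMMDD>.<build> pattern."""
    return re.fullmatch(r"v\d+\.\d{8}\.\d+", tag) is not None
```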
Updating your container
Any updates to your container are applied by Index. Index requires at least one business day of notice to schedule a deployment. Updates should be requested no more than twice a week.
Kubernetes policies
Containers are deployed into a dedicated namespace. Egress from the pods' namespace is denied by default. Exceptions to the egress policy can be requested. Containers will be run with a security context defined as non-root access, user 1000, group 1000, on a read-only file system.
Installation using Helm and Argo CD
Index will deploy your container into Index’s Kubernetes environment using a Helm chart, which can be provided on demand. The Helm chart will expose your container's API endpoints using host networking to reduce latency.
Roll out strategy
A newer version of the container will be deployed in stages. Initially, a small set of instances will be launched. If these instances perform well, the remaining instances will be brought up. If they under-perform, the newer version of the container will be automatically rolled back.
Analysis expectations
Two metrics determine the application's performance:
- Latency on the classification endpoint must not exceed the SLA (5 milliseconds).
- Error rate on the classification endpoint must be less than 5%.
If you want any additional checks, please communicate them to Index, and they can be integrated into the roll out strategy.
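The two analysis criteria can be sketched as a single pass/fail check. This is a simplified illustration: it treats "must not exceed the SLA" as a check on the worst observed latency, whereas a real analysis would likely use a percentile:

```python
def canary_passes(latencies_ms, error_count, request_count,
                  sla_ms=5.0, max_error_rate=0.05):
    """Illustrative version of the two analysis checks above:
    classification latency within the 5 ms SLA, error rate under 5%."""
    latency_ok = max(latencies_ms) <= sla_ms
    error_ok = (error_count / request_count) < max_error_rate
    return latency_ok and error_ok
```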
Expected behavior
During a roll out, instances of the newer version of the container (canary set) will immediately start receiving real classification requests. Existing instances of the older version (stable set) will continue to receive real classification requests.
The total number of instances in both sets will remain constant. As the canary set scales up, the stable set will scale down proportionally. Once the canary set reaches the expected number of instances, the old stable set will be completely removed, and the canary set will become the new stable set.
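The constant-total scaling described above can be sketched as follows; the fraction-based split and rounding are assumptions about how the orchestrator might divide traffic, not Index's documented algorithm:

```python
def instance_split(total_instances, canary_fraction):
    """Split a fixed instance count between canary and stable sets.
    As the canary set scales up, the stable set shrinks by the same amount,
    so the total never changes during the roll out."""
    canary = round(total_instances * canary_fraction)
    return canary, total_instances - canary
```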
Monitoring your container in production
Currently, Index monitors your container's performance based on the following criteria:
- Memory and CPU usage meets the resource usage guidelines outlined above
- The percentage of requests to the classification endpoint that exceed the SLA (5 milliseconds)
- The percentage of requests that result in errors
Upon request, these metrics are shared with you. Soon, Index will provide Grafana dashboards so you can monitor these metrics yourself.
Updating your container
You can update your container without affecting the code currently running in production, and refresh containerized solutions without rebuilding the entire image. For details on how to update your container, see Updating models and manifests in containers.