How Index uses stratified sampling in reporting

Media Owners
Last Updated: March 19, 2026

As data volumes grow, processing 100% of reporting data for every report can impact performance. When storing reporting data, Index Exchange (Index) uses stratified sampling to optimize performance while ensuring accuracy and precision for Reports in the Index UI and for the Reporting API.

How stratified sampling works

Rather than storing 100% of reporting data, Index uses stratified sampling to create an accurate estimate of the dataset by sampling population groups.

  1. When reporting data is being processed, the dataset is divided into groups called strata based on attributes (for example, the dataset may be divided by site_id).
  2. Each stratum is sampled independently.
  3. The sampled data is extrapolated (that is, unknown values are inferred from trends in the sampled data) and combined to form an estimate for the whole dataset.
  4. Index then stores the extrapolation in our reporting data.
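The steps above can be illustrated with a minimal Python sketch. It is not Index's implementation: the row shape, the 10% sampling rate, the minimum of one sampled row per stratum, and the function name stratified_estimate are all assumptions made for illustration. Extrapolation is modeled here in the simplest way, by scaling each stratum's sampled total by the inverse of its sampling fraction.

```python
import random
from collections import defaultdict


def stratified_estimate(rows, key, metric, rate, seed=0):
    """Estimate the per-stratum total of `metric` from a sample of each stratum.

    Hypothetical helper for illustration; assumes 0 < rate <= 1.
    """
    rng = random.Random(seed)

    # Step 1: divide the dataset into strata based on an attribute.
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)

    estimates = {}
    for stratum, members in strata.items():
        # Step 2: sample each stratum independently (at least one row each).
        sample_size = max(1, round(len(members) * rate))
        sample = rng.sample(members, sample_size)

        # Step 3: extrapolate by scaling the sampled total up to the stratum size.
        weight = len(members) / len(sample)
        estimates[stratum] = sum(r[metric] for r in sample) * weight

    # Step 4 (storing the estimates) is left to the caller.
    return estimates


rows = (
    [{"site_id": "site_a", "slot_request": 1} for _ in range(1000)]
    + [{"site_id": "site_b", "slot_request": 1} for _ in range(10)]
)
estimates = stratified_estimate(rows, key="site_id", metric="slot_request", rate=0.1)
total_estimate = sum(estimates.values())
```

Because every row in this toy dataset carries the same value, the estimates match the true totals exactly. With real data, where values differ from row to row, each estimate would be close to the true total rather than identical to it.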

Why Index uses stratified sampling

Stratified sampling allows us to:

  • Reduce infrastructure and storage overhead
  • Maintain high levels of reporting accuracy
  • Ensure scalable reporting as data volumes grow

Traffic varies significantly across partners: large partners generate much more traffic than smaller ones. Because each group is sampled independently, stratified sampling ensures that each group is adequately represented in reporting.
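To see why independent sampling protects smaller groups, consider the following Python sketch. The partner names, the row counts, the 1% rate, and the rule of keeping at least one row per group are hypothetical and chosen only to make the effect visible. They do not describe Index's actual configuration.

```python
import random


def stratified_sample(rows, key, rate, rng):
    """Sample each group separately, keeping at least one row per group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)

    picked = []
    for members in groups.values():
        sample_size = max(1, round(len(members) * rate))
        picked.extend(rng.sample(members, sample_size))
    return picked


rng = random.Random(7)

# One large partner and one small partner (hypothetical volumes).
rows = [{"partner": "large"} for _ in range(10_000)]
rows += [{"partner": "small"} for _ in range(20)]

sample = stratified_sample(rows, key="partner", rate=0.01, rng=rng)
partners_in_sample = {row["partner"] for row in sample}
small_rows_in_sample = sum(1 for row in sample if row["partner"] == "small")
```

In this sketch the small partner always appears in the sample. A simple random sample of 1% of all rows offers no such guarantee: the 20 small-partner rows could be missed entirely.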

How stratified sampling impacts reporting

Index conducts extensive data analysis, testing, and monitoring to ensure that our reporting products remain accurate and reliable. Stratified sampling reduces the variability within data, resulting in greater accuracy and precision in the results.

Sampling is also performed on very large volumes of data, which ensures accuracy in the results. In diverse datasets where population groups have varying characteristics or behaviors, stratified sampling reduces variability in the results and increases their precision.

Impacted fields

The following reporting metrics are sampled, or impacted by sampling:

  • slot_request
  • slot_pass
  • pod_request
  • bid_request
  • bid_pass

When variability may be noticeable

In the following cases, greater variability may be present:

  • When reports include a small amount of data, for example, when multiple filters are applied to a report.
  • When grouping by high-cardinality dimensions (where there are many unique values within each field), for example, specific combinations of Marketplace deals and geo targeting. The more dimensions you group by and the more granular those dimensions are, the more variability there will be in the data.

When the quantity of data is small, normal statistical variance has a greater impact. To optimize or troubleshoot your report data:

  • If data slices are too small, variability may be more noticeable. If a report has many filters, try removing some of them, or widen the range of data (for example, by increasing the date range of the report).
  • Since high-cardinality dimensions may increase variability, try using fewer high-cardinality fields in a report (for example, fewer ID values).
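The effect of slice size can be illustrated with a standard statistical result. Assume, purely for illustration, that each event is kept independently with a fixed probability (rate) and that the stored count is the sampled count divided by that rate. The relative standard error of the estimate is then sqrt((1 - rate) / (true_count * rate)). The 1% rate and the function name below are assumptions, not Index's actual sampling parameters.

```python
import math


def relative_standard_error(true_count, rate):
    """Relative standard error of a count estimated as sampled_count / rate,
    assuming each event is kept independently with probability `rate`."""
    return math.sqrt((1 - rate) / (true_count * rate))


# A large slice (for example, a whole site over a month) versus a small slice
# (for example, one deal in one region on one day). Volumes are hypothetical.
large_slice_error = relative_standard_error(1_000_000, rate=0.01)
small_slice_error = relative_standard_error(100, rate=0.01)
```

Under these assumptions the large slice has a relative error of roughly 1%, while the small slice has a relative error close to 100%. Every additional filter or grouping dimension divides the data into smaller slices, which is why removing filters or widening the date range reduces variability.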

In summary, for most reporting use cases, aggregated trends and relative comparisons remain stable, and high-volume reporting will remain consistent. If you have further questions, please reach out to your Index Representative.