How Index uses stratified sampling in reporting

As data volumes grow, processing 100% of reporting data for every report can impact performance. When storing reporting data, Index Exchange (Index) uses stratified sampling to optimize performance while maintaining accuracy and precision for Reports in the Index UI and for the Reporting API. The Index UI, at app.indexexchange.com, is where you manage integration settings such as inventory, campaign, and deal settings. The Reporting API is an Index product that provides aggregated reporting data that can be delivered through email or an Amazon S3 bucket, or pulled through the API. It allows customers to create their own customized reports and schedule their automated delivery.

How stratified sampling works

Rather than storing 100% of reporting data, Index uses stratified sampling to create an accurate estimate of the dataset by sampling population groups.

When reporting data is being processed, the dataset is divided into groups called strata based on attributes (for example, the dataset may be divided by site_id).
Each stratum is independently sampled.
The sampled data is extrapolated (which is when unknown values are inferred using trends in the data) and combined to form an estimate for the whole dataset.
Index then stores the extrapolation in our reporting data.
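
The steps above can be sketched in Python as follows. This is a minimal illustration, not Index's implementation: the event records, the per-stratum sampling rates, and the use of inverse-rate scaling as the extrapolation step are all assumptions chosen for the example.

```python
import random
from collections import defaultdict

def stratified_estimate(events, rates, key="site_id", seed=0):
    """Estimate per-stratum and total event counts from a stratified sample.

    events: list of dicts, each carrying the stratum key (hypothetical schema).
    rates:  dict mapping stratum -> sampling rate in (0, 1] (hypothetical values).
    """
    rng = random.Random(seed)

    # Step 1: divide the dataset into strata.
    strata = defaultdict(list)
    for event in events:
        strata[event[key]].append(event)

    estimates = {}
    for stratum, rows in strata.items():
        rate = rates[stratum]
        # Step 2: sample each stratum independently.
        kept = [row for row in rows if rng.random() < rate]
        # Step 3: extrapolate by scaling the sampled count by 1 / rate
        # (an assumed, simple form of extrapolation).
        estimates[stratum] = len(kept) / rate

    # Step 4: combine the per-stratum estimates into one for the whole dataset.
    return estimates, sum(estimates.values())


# Hypothetical data: one high-traffic site and one low-traffic site.
events = [{"site_id": "A"}] * 100_000 + [{"site_id": "B"}] * 500
rates = {"A": 0.01, "B": 1.0}  # sample the large site, keep the small one whole
per_site, total = stratified_estimate(events, rates)
```

In this sketch the low-traffic site is kept in full, so its count is exact, while the high-traffic site is estimated from roughly 1% of its rows.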

Why Index uses stratified sampling

Stratified sampling allows us to:

Reduce infrastructure and storage overhead
Maintain high levels of reporting accuracy
Ensure scalable reporting as data volumes grow

Traffic varies significantly across partners, with large partners generating much more traffic than smaller partners, so stratified sampling ensures that each group is adequately represented in reporting.
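
To illustrate the point, the sketch below compares the expected number of sampled rows per partner under a single uniform sampling rate with a per-stratum allocation. The partner names, traffic volumes, 0.1% rate, and 1,000-row minimum are hypothetical values chosen for illustration, and the per-stratum minimum is only one possible allocation rule, not necessarily the one Index uses.

```python
def uniform_allocation(volumes, rate):
    """Expected sampled rows per partner when one rate applies to all traffic."""
    return {partner: volume * rate for partner, volume in volumes.items()}

def stratified_allocation(volumes, rate, min_rows):
    """Expected sampled rows when each partner (stratum) keeps at least
    min_rows rows, capped at the partner's actual volume."""
    return {
        partner: min(volume, max(volume * rate, min_rows))
        for partner, volume in volumes.items()
    }

# Hypothetical traffic volumes for a large and a small partner.
volumes = {"large_partner": 50_000_000, "small_partner": 20_000}
uniform = uniform_allocation(volumes, rate=0.001)
stratified = stratified_allocation(volumes, rate=0.001, min_rows=1_000)
```

With these numbers, the uniform rate leaves the small partner with only about 20 sampled rows, while the per-stratum minimum keeps 1,000 rows for it without changing the large partner's sample.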

How stratified sampling impacts reporting

Index conducts extensive data analysis, testing, and monitoring to ensure that our reporting products remain accurate and reliable. Stratified sampling reduces the variability within data, resulting in greater accuracy and precision in the results.

Sampling is also applied to very high volumes of data, which helps keep the results accurate. In diverse datasets where population groups have varying characteristics or behaviors, stratified sampling reduces variability in the results and increases the precision of the information.

Impacted fields

The following reporting metrics are sampled, or impacted by sampling:

slot_request
slot_pass
pod_request
bid_request
bid_pass

When variability may be noticeable

In the following cases, greater variability may be present:

When reports include a small amount of data, for example, when multiple filters are applied to a report.
When grouping by high-cardinality dimensions (where there are many unique values within each field), for example, specific combinations of Marketplace deals and geo targeting. The more dimensions you group by and the more granular those dimensions are, the more variability there will be in the data.
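
A rough way to see why small slices are noisier is to assume, purely for illustration, that each event in a slice is kept independently with probability p and that the count is estimated as the sampled count divided by p. The relative standard error of that estimate is then sqrt((1 - p) / (n * p)) for a slice with n true events. The 1% rate and the slice sizes below are hypothetical, and this simple model is not a description of Index's actual sampling design.

```python
import math

def relative_standard_error(true_count, rate):
    """Relative standard error of an inverse-rate count estimate when each of
    true_count events is sampled independently with probability rate.
    (Assumed Bernoulli sampling model, for illustration only.)"""
    return math.sqrt((1 - rate) / (true_count * rate))

rate = 0.01  # hypothetical sampling rate
for true_count in (1_000, 100_000, 10_000_000):
    rse = relative_standard_error(true_count, rate)
    print(f"{true_count:>10,} events -> relative error of about {rse:.1%}")
```

Under these assumptions, a slice of 1,000 events has a relative error of roughly 31%, a slice of 100,000 events roughly 3%, and a slice of 10,000,000 events roughly 0.3%. Each added filter or high-cardinality dimension shrinks the slices and moves a report toward the noisy end of that range.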

When the quantity of data is small, normal statistical variance has a greater impact. To optimize or troubleshoot your report data:

If data slices are too small, variability may be more noticeable. If a report has many filters, try removing some of them or widening the range of data (for example, by increasing the report's date range).
Since high-cardinality dimensions may increase variability, try using fewer high-cardinality fields in a report (for example, fewer ID values).