Threshold Segmentation
Threshold-Based Segmentation for Flexible Business Classification.
Business Context
Many retail segmentation needs don't fit standard models like RFM. Businesses often need custom segments built from specific metrics and thresholds, such as customers by spend percentile, stores by performance quintile, or products by sales velocity. This module provides flexible threshold-based segmentation for any business dimension.
The Business Problem
Retailers need custom segmentation rules for different business scenarios:
- Create spend-based customer tiers (Bronze, Silver, Gold, Platinum)
- Classify stores into performance bands (A, B, C, D stores)
- Segment products by velocity (Fast, Medium, Slow movers)
- Define custom categories based on business-specific thresholds
Standard segmentation approaches are too rigid, while manual classification is inconsistent and doesn't scale across large datasets.
Real-World Applications
Customer Classification
- Create VIP tiers based on total spend percentiles
- Segment by transaction frequency for different service levels
- Classify customers by recency for retention campaigns
Store Performance Tiers
- Classify stores by sales per square foot into performance bands
- Segment locations by customer conversion rates
- Create store tiers for investment prioritization
Product Categorization
- Segment SKUs by sales velocity for inventory management
- Classify products by margin contribution for pricing strategies
- Create ABC analysis categories for supply chain optimization
Technical Features
- Flexible percentile-based thresholds for consistent segment sizing
- Custom aggregation functions for different business metrics
- Configurable handling of zero-value entities
- Efficient execution using Ibis for large datasets
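The percentile idea behind these features can be sketched in plain Python. This is a minimal illustration, not the module's implementation: the function name `assign_percentile_segments`, the percent-rank formula, and the sample data are all assumptions made for the example, and ties are not given a shared rank as a database-style percent rank would give them.

```python
from collections import defaultdict


def assign_percentile_segments(transactions, thresholds, segments):
    """Label each entity by the percentile rank of its aggregated value.

    transactions: iterable of (entity_id, value) pairs (hypothetical layout).
    thresholds:   ascending upper percentile bounds, ending in 1.0.
    segments:     one segment name per threshold.
    """
    if len(thresholds) != len(segments):
        raise ValueError("thresholds and segments must have the same length")

    # Aggregate per entity. A sum is used here; another aggregation
    # (mean, max, count) would slot into the same place.
    totals = defaultdict(float)
    for entity_id, value in transactions:
        totals[entity_id] += value

    # Percent rank: position in ascending order, scaled to [0, 1].
    ordered = sorted(totals, key=totals.get)
    denom = max(len(ordered) - 1, 1)

    result = {}
    for position, entity_id in enumerate(ordered):
        pct_rank = position / denom
        # The first threshold whose upper bound covers this rank wins.
        for upper, name in zip(thresholds, segments):
            if pct_rank <= upper:
                result[entity_id] = name
                break
    return result


transactions = [
    ("c1", 10.0), ("c1", 15.0),
    ("c2", 40.0),
    ("c3", 5.0),
    ("c4", 120.0),
    ("c5", 60.0),
]
tiers = assign_percentile_segments(
    transactions,
    thresholds=[0.5, 0.8, 1.0],
    segments=["Light", "Medium", "Heavy"],
)
```

Because segments are defined by rank rather than by absolute value, the share of entities in each segment stays roughly stable as the underlying values shift, which is what makes the segment sizing consistent.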
ThresholdSegmentation
Segments customers based on user-defined thresholds and segments.
Source code in openretailscience/segmentation/threshold.py
df (cached property)
Returns the dataframe with the segment names.
__init__(df, thresholds, segments, value_col=None, agg_func='sum', zero_segment_name='Zero', zero_value_customers='separate_segment', group_col=None)
Segments customers based on user-defined thresholds and segments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame \| Table` | A dataframe with the transaction data. The dataframe must contain a customer_id column. | required |
| `thresholds` | `List[float]` | The percentile thresholds for segmentation. | required |
| `segments` | `List[str]` | A list of segment names for each threshold. | required |
| `value_col` | `str` | The column to use for the segmentation. Defaults to ColumnHelper().unit_spend. | `None` |
| `agg_func` | `str` | The aggregation function to use when grouping by customer_id. Defaults to "sum". | `'sum'` |
| `zero_segment_name` | `str` | The name of the segment for customers with zero spend. Defaults to "Zero". | `'Zero'` |
| `zero_value_customers` | `Literal['separate_segment', 'exclude', 'include_with_light']` | How to handle customers with zero spend. Defaults to "separate_segment". | `'separate_segment'` |
| `group_col` | `str \| list[str] \| None` | Column(s) to group by when calculating segments. When specified, segments are calculated within each group independently. For example, setting group_col="store_id" calculates Heavy/Medium/Light segments within each store. Defaults to None. | `None` |
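The `zero_value_customers` and `group_col` options can be illustrated with a small plain-Python sketch. It does not call the library. The behaviour of each mode is an assumption inferred from the option names (zero-value customers get their own segment, are dropped, or are ranked with everyone else), and the helper names `rank_to_segment`, `segment_with_zeros`, and `segment_by_group` are hypothetical.

```python
from collections import defaultdict

ZERO_MODES = ("separate_segment", "exclude", "include_with_light")


def rank_to_segment(values, thresholds, segments):
    """Map {entity: value} to {entity: segment} by ascending percent rank.

    Assumes thresholds are ascending and end in 1.0, so every rank matches.
    """
    ordered = sorted(values, key=values.get)
    denom = max(len(ordered) - 1, 1)
    labels = {}
    for position, entity in enumerate(ordered):
        pct_rank = position / denom
        labels[entity] = next(
            name for upper, name in zip(thresholds, segments) if pct_rank <= upper
        )
    return labels


def segment_with_zeros(values, thresholds, segments,
                       zero_value_customers="separate_segment",
                       zero_segment_name="Zero"):
    """Apply one of the three assumed zero-handling modes before ranking."""
    if zero_value_customers not in ZERO_MODES:
        raise ValueError(f"unknown mode: {zero_value_customers!r}")
    if zero_value_customers == "include_with_light":
        # Zeros are ranked with everyone else and fall into the lowest segment.
        return rank_to_segment(values, thresholds, segments)
    non_zero = {k: v for k, v in values.items() if v != 0}
    labels = rank_to_segment(non_zero, thresholds, segments)
    if zero_value_customers == "separate_segment":
        for entity, value in values.items():
            if value == 0:
                labels[entity] = zero_segment_name
    return labels  # under "exclude", zero-value entities are simply absent


def segment_by_group(rows, thresholds, segments, **kwargs):
    """rows: iterable of (group, entity, value); each group is ranked alone."""
    per_group = defaultdict(lambda: defaultdict(float))
    for group, entity, value in rows:
        per_group[group][entity] += value
    return {
        (group, entity): label
        for group, values in per_group.items()
        for entity, label in segment_with_zeros(
            values, thresholds, segments, **kwargs
        ).items()
    }


rows = [
    ("store_a", "c1", 0.0), ("store_a", "c2", 20.0),
    ("store_a", "c3", 50.0), ("store_a", "c4", 90.0),
    ("store_b", "c1", 300.0), ("store_b", "c5", 10.0), ("store_b", "c6", 100.0),
]
args = dict(thresholds=[0.5, 1.0], segments=["Light", "Heavy"])
separate = segment_by_group(rows, **args)
excluded = segment_by_group(rows, zero_value_customers="exclude", **args)
included = segment_by_group(rows, zero_value_customers="include_with_light", **args)
```

In this sketch customer `c1` ends up in a different segment in each store, because each store's customers are ranked only against one another. Changing the zero-handling mode also shifts where the remaining customers fall, since it changes how many entities take part in the ranking.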
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the dataframe is missing the columns option column.customer_id or |
Source code in openretailscience/segmentation/threshold.py