What follows is a practitioner's account of how the methodology is constructed for different predicted scoring methods, why each design decision was made, what the evidence shows about accuracy and consistency, and where the limits are.
Relevance Scoring
The most basic application of predictive scores is filtering signal from noise. It preempts any other measure by asking a primary question: *is this record worth analyzing at all?* Or rather, is the content of this record relevant to the universe of data I am studying?
Applied in this way the concept of predictive scoring is analogous to a survey screener. A political poll of likely voters will first confirm that a respondent is registered to vote before asking them who they intend to vote for in the upcoming election.
While the potential application is wide, a particularly strong use case for relevance scoring lies in social media analytics. A persistent and frustrating problem with this data remains positive keyword matches where a mention of a product or brand is irrelevant to any aspect of the customer experience. For example, one user on Reddit may post that they ran into an old friend at a Taco Bell, while another comments on their excitement about a new menu item. The former record contains no information that is relevant or actionable for the brand, but it is extremely difficult to isolate using keywords alone.
Relevance scoring uses an LLM to assign a score on a defined scale (e.g., 1-3, 0-10, 1-100) based on whether a record contains a meaningful signal for your analysis. It represents a simple, straightforward application of predictive scoring: not measuring experience, but separating the useful and actionable from the irrelevant.
The prompt behind the score
CUSTOMER EXPERIENCE RELEVANCE (0 - 10)
Score this post on a scale of 0 to 10 based on whether it describes a customer experience relevant to a market researcher seeking actionable insights, where 0 is not at all relevant and 10 is extremely relevant.
Applied examples:
PATIENT EXPERIENCE RELEVANCE (0 - 10)
Score 0–10 for relevance to the author's personal lived experience with diabetes.
10 = clearly firsthand and highly relevant
0 = not relevant at all
Override: if the post contains "RT," "QT," or @Poshmarkspp → assign 0.
DISNEY EXPERIENCE RELEVANCE (0 - 10)
Score 0–10 for relevance to a personal experience visiting a Disney resort location.
10 = clearly firsthand and highly relevant
0 = not relevant at all
High scores: firsthand, recent experience only.
Lower scores: links, booking posts, generic recommendations.
Exclude: visit-count commentary.
Both applications share a similar structure: define high scores, anchor the endpoints, specify noise patterns to exclude. The diabetes prompt adds hard overrides for known spam. The Disney prompt names the domain-specific cues the model needs (booking posts, visit-count commentary). Use relevance scoring on any source (social listening, review corpora, survey open-ends) where a significant fraction of records may not contain experiential signal.
Relevance scoring stands out as particularly valuable in instances where some measure of reasoning is required. For example, a prompt tasked with segmenting consumer conversations about Disney World vs. Disneyland can parse the meaning of a comment that mentions both:
“Disneyland is actually more affordable than Disney World.”
An LLM will correctly identify the above comment as being about Disneyland. In contrast, regardless of the complexity, boolean logic has no mechanism to distinguish between the subject and object of a sentence.
Finally, an advantage of relevance scoring is the ability to use the range as a proxy for confidence. A simple boolean operation (i.e., labeling each record true or false) can determine relevance: does it match my criteria or not? But textual data is messy, and different sources are messy in different ways. Deploying a scalar range instead bakes a de facto confidence measure into the process. We favor this approach because it allows for more nuance and more granular control over precision and recall.
BINARY ANNOTATION
True if this post contains customer feedback relevant to a customer experience. Otherwise False.
In the above execution there is a greater margin for error because the model is forced to make a binary decision when the underlying calculation is a probability distribution. Our experimentation shows that a scalar range (i.e., 1-10, 1-100, etc.) provides a simple method to capture the complexity of the data.
When to use it: Social listening, review analysis, LLM-based chatbots, or any data source where a significant portion of records may not contain analyzable signals. Relevance scoring reduces noise before other dimensions run, improving the quality of everything downstream.
How to customize: Define what "relevant" means for your analysis. List the noise patterns in your data source. Add hard overrides for known spam or repost patterns. Anchor the endpoints explicitly.
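In code, the scalar range becomes a precision/recall dial. A minimal sketch follows; `llm_relevance_score` is a hypothetical wrapper around whatever LLM client you use, and only the thresholding logic is shown in full.

```python
# Minimal sketch: using the 0-10 relevance range as a precision/recall dial.
# `llm_relevance_score` is a hypothetical placeholder for whatever LLM client
# you use; it would send the relevance prompt plus the record and parse the
# integer reply.

RELEVANCE_PROMPT = (
    "Score this post on a scale of 0 to 10 based on whether it describes "
    "a customer experience relevant to a market researcher seeking "
    "actionable insights, where 0 is not at all relevant and 10 is "
    "extremely relevant."
)

def llm_relevance_score(record: str) -> int:
    """Placeholder: call your LLM with RELEVANCE_PROMPT and `record`,
    then parse the integer score from the reply."""
    raise NotImplementedError

def filter_relevant(records: list[str], threshold: int = 6) -> list[str]:
    """Keep records scoring at or above the threshold.

    Raising the threshold favors precision (fewer false positives);
    lowering it favors recall (fewer discarded signals). A binary
    true/false prompt offers no such dial.
    """
    return [r for r in records if llm_relevance_score(r) >= threshold]
```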
Predicted Rating
It may be counterintuitive, but in prompt engineering sometimes less is more. For a universal score that unifies different sources of unstructured data, Predicted Rating provides the simplest answer.
The prompt behind the score
A generic 'Predicted Rating' prompt is the simplest execution of an experience score: a 1–10 rating, no configuration required.
PREDICTED RATING
An estimation of the success in handling the user's requests, rated on a scale of 1 to 10 (with 10 being the best).
The approach is deliberately general. No construct specified, no behavioral anchors, no assumptions about the data source. This allows it to work across support conversations, bot interactions, survey responses, sales calls, and reviews without tuning.
It measures a blended read of "how did this go?": resolution, tone, effort, and communication quality mixed into one number. Cross-tab it by topic/category, agent, channel, or time period and patterns emerge fast. For example, low ratings (< 4) compared against engagement categories show which topics produce the worst experiences. Average rating by agent shows employee performance variation. Median rating by channel shows where the experience breaks down. The score itself is the filter; the other dimensions provide the explanation.
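As a sketch of those cross-tabs, assuming scores have already been collected into a table (the column names `rating`, `topic`, `agent`, and `channel` are illustrative):

```python
import pandas as pd

# Illustrative data: one row per scored record. Column names are assumptions.
df = pd.DataFrame({
    "rating":  [2, 9, 3, 8, 7, 1, 10, 4],
    "topic":   ["billing", "onboarding", "billing", "onboarding",
                "returns", "billing", "returns", "returns"],
    "agent":   ["ana", "ben", "ana", "cy", "ben", "cy", "ana", "ben"],
    "channel": ["chat", "email", "chat", "phone",
                "chat", "email", "phone", "chat"],
})

# Which topics produce the worst experiences? Share of low ratings (< 4).
low_share = df.assign(low=df["rating"] < 4).groupby("topic")["low"].mean()

# Employee performance variation: average rating by agent.
by_agent = df.groupby("agent")["rating"].mean()

# Where does the experience break down? Median rating by channel.
by_channel = df.groupby("channel")["rating"].median()

print(low_share, by_agent, by_channel, sep="\n\n")
```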
Validation: Research by Dimension Labs, OpenAI, and academic institutions shows that predicted ratings are strongly reproducible. Running the same prompt against the same dataset multiple times yields scores that match exactly on more than 85% of records and fall within ±1 on 99.9%.
Limitation: Predicted Rating does not map to an established survey construct and cannot be easily compared with existing survey methodologies.
When to use it: Any dataset where you need a quick, universal quality signal before investing in construct-specific scoring. Particularly useful for initial exploration of a new data source — run Predicted Rating first to understand the distribution, then decide which records warrant deeper analysis.
How to customize: Add context relevant to your project. For example: "Override: if the record does not contain enough information to assign a predicted rating with high confidence, output null." Adjust the scale endpoints if your use case demands it, though 1–10 is the default for its granularity without requiring behavioral anchors.
Predicted Satisfaction
Predicted Satisfaction narrows the analytical task to a specific, validated construct: one that maps to the instruments the enterprise already uses to measure customer sentiment. Where Predicted Rating is deliberately general, Predicted Satisfaction is deliberately specific.
Predicted Satisfaction measures whether a customer walked away from an interaction satisfied or dissatisfied. It assigns a score from -2 to +2 (Very Dissatisfied through Very Satisfied) based on the language and tone of the conversation itself, no survey required.¹
This is a more intuitive metric than Transactional NPS (next section). Where Transactional NPS asks: "did this experience create or erode brand advocacy?", Predicted Satisfaction asks a simpler question: "was this person happy with how this interaction went?" A customer can be satisfied with a support interaction (their issue was resolved) without that experience moving them any closer to recommending you. Both signals matter, but they measure different things.
A few things to keep in mind:
• Scores every interaction, not just surveyed ones. Because the score is inferred from the textual data rather than collected via survey, it provides coverage that extends across all data sources.
• Calculated at the session level. The model analyzes the full text of a conversation and predicts how the customer would rate their satisfaction. The overall Predicted Satisfaction score is then averaged across all sessions.
• A neutral score is a signal, not the absence of one. A score of 0 (neither satisfied nor dissatisfied) indicates a functional interaction with no meaningful signal—the customer got what they needed but nothing about the experience registered positively or negatively.
The prompt behind the score
Predicted Satisfaction maps to the standard CSAT instrument, which means it can be validated against actual survey data.
PREDICTED SATISFACTION (-2 to +2, 0)
An estimation of the satisfaction of the Incoming user rated on a scale of -2 to 2 where
-2 = Very Dissatisfied
-1 = Somewhat Dissatisfied
0 = Neither satisfied nor dissatisfied
+1 = Somewhat Satisfied
+2 = Very Satisfied
If there is not enough information to confidently assign a satisfaction rating, output 0.
PREDICTED SATISFACTION (1 to 5, null)
An estimation of the satisfaction of the Incoming user rated on a scale of 1 to 5 where
1 = Very Dissatisfied
2 = Somewhat Dissatisfied
3 = Neither satisfied nor dissatisfied
4 = Somewhat Satisfied
5 = Very Satisfied
If there is not enough information to confidently assign a rating, output null value.
The integer scale of -2 to +2 matches the construct's ordinal nature: the model classifies discrete levels reliably, and downstream analytics require discrete buckets. Note that null conditions first appear at this level: a rule for when the model should refuse to score.
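Downstream, the null rule and the session-level averaging described above might look like the following sketch, assuming the raw model replies arrive as strings such as "+1" or "null":

```python
from statistics import mean
from typing import Optional

def parse_satisfaction(reply: str) -> Optional[int]:
    """Parse a model reply on the -2..+2 scale. 'null' means the model
    refused to score; the session is excluded rather than forced to 0."""
    reply = reply.strip().lower()
    if reply == "null":
        return None
    value = int(reply)  # int() accepts "+1" and "-2"
    if not -2 <= value <= 2:
        raise ValueError(f"out-of-range satisfaction score: {value}")
    return value

# Illustrative raw replies, one per session.
replies = ["+2", "0", "null", "-1", "+1"]
scores = [s for s in map(parse_satisfaction, replies) if s is not None]

# Overall Predicted Satisfaction: the average across scored sessions.
overall = mean(scores)
print(f"{overall:+.2f} across {len(scores)} of {len(replies)} sessions")
```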
When to use it: Any dataset where you need a score that maps to an established survey construct: integration with reporting that already tracks CSAT or filtering data according to what topics are driving negative customer experience. This is the level where predictive scoring becomes more specifically aligned with widely used and well understood applications of existing concepts.
How to customize: Adapt the behavioral anchors to your domain or established business practices. A hospitality company's definition of "Very Satisfied" differs from a telecom provider's. Add organization-specific null conditions: for example, "Output null if the interaction is bot-only and does not involve a human agent." Pair with confidence scoring on any dataset where signal strength varies across records (more on confidence below).
The Case Against Sentiment Analysis
Traditional sentiment analysis answers a fundamentally different question than the one businesses are actually asking. Decision-makers want to know what customers are praising or criticizing about specific products, features, and touchpoints. What sentiment delivers is tone classification—a coarse judgment about whether a document reads positive, negative, or neutral overall. Vendors can legitimately claim high accuracy against that narrow task, but the resulting metric cannot tell a product team which feature is failing or a CX leader which journey step is generating friction.
Tone also cannot carry the context required for action. Consider:
• “I love the stadium, but the concessions are awful” — one sentence, two opposing signals, flattened into a single label.
• In healthcare, clinically "negative" language ("this drug eliminated my tumor") often marks a successful outcome.
• In gaming, slang like "sick" reverses polarity entirely*.
Sentiment is the standard because it was the only approach that worked at scale. It was never designed to account for nuance, point of view (negative according to whom?), or the root causes operations teams need.
Large language models eliminate this tradeoff. Rather than collapsing feedback into three tonal buckets, LLMs attribute sentiment at the entity level, detect satisfaction even when implied, and apply the specific taxonomy and business rules that define how an organization thinks about its customers. For predictive scoring, this distinction is critical: models built on tone inherit tone's ambiguity, while models built on context-aware signals inherit the precision needed to tie textual feedback to the outcomes it is meant to predict.
* Sentiment analysis cannot generally account for sarcasm; it can only be trained to recognize specific instances of its use.
Advanced Satisfaction Scoring
A telecommunications company sees CSAT decline 12% over a quarter on its fiber internet product line. The standard response: pull low-scoring transcripts, build coaching plans, retrain agents on empathy and de-escalation.
Advanced scoring that isolates different dimensions of satisfaction tells a different, more detailed story. Agent satisfaction across fiber interactions has not moved—it remains above the company average. Process satisfaction has dipped slightly, driven by longer hold times as call volume on the product line has increased. Outcome satisfaction has dropped sharply—agents are unable to resolve the underlying issue because the problem is intermittent connectivity that requires an engineering fix, not a support fix. Product satisfaction has collapsed.
The root cause is a firmware update that introduced instability in a specific router model. The contact center has been absorbing the damage for weeks, with agents performing well under conditions they cannot control. The CSAT decline is real, but the standard response (coaching and retraining agents) would have targeted the wrong problem.
Advanced satisfaction patterns:
A single predicted satisfaction score tells you *whether* a customer was satisfied. For conversational data especially—support transcripts, sales calls, multi-turn email threads—the text contains enough signal to answer the more important question: *satisfied with what?*
The answer requires a “decomposed” version of satisfaction: one that splits measurement into granular dimensions, each capturing a different type of satisfaction.
WHAT ONE CONVERSATION PRODUCES
A single twelve-turn support conversation produces six structured fields: overall satisfaction (+1), agent satisfaction (+2), outcome satisfaction (+1), customer effort (4), product satisfaction (−1), and a churn signal (False). A composite CSAT would have returned +1. The decomposition reveals three separate signals for three separate teams.
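As a sketch, those six fields could land in a record like the one below (the field names are assumptions; the values mirror the example above):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DecomposedScores:
    """One conversation, six structured fields. A composite CSAT would
    collapse all of this into the single `overall` value."""
    overall: Optional[int]        # satisfaction, -2..+2
    agent: Optional[int]          # satisfaction, -2..+2
    outcome: Optional[int]        # satisfaction, -2..+2
    effort: Optional[int]         # customer effort, 1..5 (5 = most effort)
    product: Optional[int]        # satisfaction, -2..+2
    churn_signal: Optional[bool]  # null-able, like the numeric fields

# The twelve-turn conversation from the example above.
session = DecomposedScores(overall=1, agent=2, outcome=1,
                           effort=4, product=-1, churn_signal=False)
```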
Decomposed satisfaction
A single CSAT score collapses an interaction into one number. Decomposed satisfaction uses predictive scoring to separate that interaction into distinct dimensions.
Consider a customer who contacts you about a defective product, is escalated to an agent who is empathetic and knowledgeable, but is ultimately told no replacement is possible because of a policy limitation. An aggregate CSAT captures none of what happened. The product failed. The agent performed well. The outcome was blocked. A single number cannot tell those three stories at once.
Predictive scoring can. Four independent prompts produce four independent signals.
Agent Satisfaction:
How skillfully did the agent handle the interaction? An agent who treats a customer with care and competence scores high here even when the underlying issue cannot be resolved.
AGENT SATISFACTION (1 to 5, null)
An estimation of the customer's satisfaction with the agent handling the interaction — the skill, empathy, and competence of the person (or chatbot) who helped them. Score the agent specifically, independent of whether the outcome was achieved or the process was smooth.
1 = Very Dissatisfied (agent was rude, unskilled, or caused distress)
2 = Somewhat Dissatisfied (agent was unhelpful or lacked appropriate care)
3 = Neither satisfied nor dissatisfied (agent handled adequately but unremarkably)
4 = Somewhat Satisfied (agent was competent and courteous)
5 = Very Satisfied (agent was notably skilled, empathetic, or caring)
A customer whose issue was not resolved can still rate the agent high if the agent performed well.
If there is not enough information to confidently assign a rating, output null value.
Process Satisfaction:
How smooth was the path to the outcome—wait times, transfers, channel switching, repeated information, policy obstacles? The distinction from agent satisfaction matters. A skilled agent can deliver high interpersonal quality while the customer endures a terrible process: transferred three times, placed on hold twice, asked to repeat their account number at each handoff. Aggregate CSAT buries this. Decomposed scoring surfaces it.
PROCESS SATISFACTION (1 to 5, null)
An estimation of the customer's satisfaction with the path to the outcome — wait times, transfers, channel switching, repetition of information, and other procedural friction. Score the smoothness of the journey, not the agent's behavior or whether the issue was ultimately resolved.
1 = Very Dissatisfied (long waits, repeated transfers, multiple rounds of re-explaining)
2 = Somewhat Dissatisfied (noticeable friction — a long wait, a transfer, having to repeat information)
3 = Neither satisfied nor dissatisfied (process was unremarkable, neither smooth nor frustrating)
4 = Somewhat Satisfied (process was efficient with only minor friction)
5 = Very Satisfied (process was notably smooth and effortless end-to-end)
A skilled agent can deliver high interpersonal quality while the customer endures a poor process. Score them independently.
If there is not enough information to confidently assign a rating, output null value.
Outcome Satisfaction:
Was the customer's issue resolved, and to what degree? This dimension isolates what aggregate CSAT most often confuses with the interaction as a whole. A flat refusal for policy reasons scores low on outcome and, separated from process and agent, points directly at the policy itself rather than the people executing it.
OUTCOME SATISFACTION (1 to 5, null)
An estimation of the customer's satisfaction with the result of the interaction — whether the issue was resolved or the goal achieved, and to what degree. Score the result, independent of how skillfully it was delivered or how smooth the process was.
1 = Very Dissatisfied (issue not resolved; request denied, blocked, or deferred indefinitely)
2 = Somewhat Dissatisfied (partial resolution or workaround that did not address the core need)
3 = Neither satisfied nor dissatisfied (outcome was neutral — informational answer, no resolution required)
4 = Somewhat Satisfied (issue resolved with minor caveats)
5 = Very Satisfied (issue fully resolved or goal exceeded)
A customer can be dissatisfied with the outcome even when the agent performed well and the process was smooth.
If there is not enough information to confidently assign a rating, output null value.
Product or Service Satisfaction:
How did the customer feel about the thing underneath the interaction? A device malfunctions three weeks after purchase. The agent is excellent, the resolution is prompt, the process is smooth. But the product failed, and the customer's confidence in the brand is shaken. Everything else: high. Product satisfaction: low. This is the dimension product and engineering teams almost never see, because it arrives at the contact center buried inside an aggregate score that cannot separate a product failure from a service failure.
PRODUCT SATISFACTION (1 to 5, null)
An estimation of the customer's satisfaction with the underlying product or service — the thing they bought or use, which sits upstream of the interaction. Score the product, not the interaction that surrounded it (the agent, the process, or the resolution).
1 = Very Dissatisfied (product failed, was defective, caused harm, or fell substantially short of expectations)
2 = Somewhat Dissatisfied (product underperformed in noticeable ways)
3 = Neither satisfied nor dissatisfied (product is adequate; no strong signal either direction)
4 = Somewhat Satisfied (product meets or modestly exceeds expectations)
5 = Very Satisfied (product is exceptional; customer expresses clear confidence in or affection for it)
A customer can rate the interaction high while rating the underlying product low, and vice versa.
If there is not enough information to confidently assign a rating, output null value.
Each dimension has a different owner. Agent satisfaction lives with training and coaching. Process satisfaction lives with operations. Outcome satisfaction lives with policy. Product satisfaction lives with product and engineering. A single CSAT tells all four the same ambiguous thing. Decomposed scoring tells each of them what they actually need to know.
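A sketch of that routing, assuming the four dimension scores are already parsed onto the 1-5 scale; the owner mapping mirrors the paragraph above:

```python
from typing import Optional

# Owner mapping, mirroring the paragraph above.
OWNERS = {
    "agent": "training and coaching",
    "process": "operations",
    "outcome": "policy",
    "product": "product and engineering",
}

def route_alerts(scores: dict[str, Optional[int]],
                 threshold: int = 2) -> list[str]:
    """Flag any dimension at or below the threshold (1-5 scale) and name
    its owner. Null (None) dimensions are skipped, not treated as failures."""
    return [
        f"{dim} satisfaction = {val}: route to {OWNERS[dim]}"
        for dim, val in scores.items()
        if val is not None and val <= threshold
    ]

# The defective-product example: great agent, blocked outcome, failed product.
print(route_alerts({"agent": 5, "process": 4, "outcome": 1, "product": 1}))
```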
Transactional NPS (tNPS)
Predicted Satisfaction and its decomposed variants measure how a customer felt about an interaction. Transactional NPS asks a different question entirely: did this interaction move the customer closer to (or further from) recommending you?
NPS is traditionally a relationship metric: it measures how a customer feels about your brand overall. Transactional NPS narrows that lens to the individual interaction level: how did *this specific experience* move the needle on that customer's likelihood to recommend you?
This is an important distinction. Predicted Satisfaction tells you whether a customer walked away from an interaction happy or unhappy. Transactional NPS is asking a different question—did this interaction create or erode brand advocacy? A customer can have a satisfactory interaction that still wouldn't move them to recommend you, and a single bad experience can turn a long-time promoter into a detractor.
A few things to keep in mind:
• A directional indicator, not a true NPS score. It's inferred from the interaction rather than collected via survey, so treat it as a signal for where to focus rather than a reportable number.
• Most useful for identifying risk. Interactions that score in the detractor range (0–6) are the ones most likely to damage overall brand perception; those are your priority for deeper investigation.
• Null values are intentional. Some interactions just don't carry enough signal to indicate brand impact one way or the other. Those are filtered out rather than forced into a score.
The prompt behind the score
TRANSACTIONAL NPS (0 - 10, null)
Predict the Net Promoter Score the customer would give based on this customer interaction. Focus on whether this experience would make the customer more or less likely to recommend the brand — not just whether they were satisfied. Score 0-10 where:
0-6 = Detractor (frustrated, unresolved issues, broken commitments, would not recommend),
7-8 = Passive (satisfied but indifferent, no strong emotion in either direction, functional tone),
9-10 = Promoter (delighted, exceeded expectations, expressed enthusiasm, would actively recommend).
If there is insufficient signal to make a prediction with confidence, output null value.
The key difference from the satisfaction prompts above is the explicit framing around brand advocacy rather than satisfaction, and the use of more specific behavioral anchors, particularly for passives, where research shows their language patterns actually align more closely with detractors than promoters despite the moderate score.¹⁶ ¹⁷
How to think about the scoring
The model isn't just picking a number: it's reading against specific behavioral anchors for each NPS bucket.
• Detractor-range scores (0–6) are driven by signals like complaint language, unresolved issues, and expressed frustration.
• Passive scores (7–8) reflect a neutral tone: the customer is satisfied but not enthusiastic, with no strong emotional signal in either direction.
• Promoter-range scores (9–10) indicate delight, exceeded expectations, and language that suggests the customer would actively recommend you.
This calibration is what separates it from a general sentiment score. It's specifically tuned to the question "would this person recommend us after this experience?" rather than just "was this person happy?"
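The roll-up itself is a few lines. A sketch follows, using the classic promoters-minus-detractors formula and excluding null sessions as described above; remember the result is directional, not a reportable NPS.

```python
from typing import Optional

def nps_bucket(score: Optional[int]) -> Optional[str]:
    """Map a 0-10 transactional NPS prediction to its bucket; null stays null."""
    if score is None:
        return None
    if 0 <= score <= 6:
        return "detractor"
    if score in (7, 8):
        return "passive"
    if score in (9, 10):
        return "promoter"
    raise ValueError(f"out-of-range NPS score: {score}")

def tnps(scores: list[Optional[int]]) -> float:
    """Promoters minus detractors, as a percentage of scored sessions."""
    buckets = [b for b in map(nps_bucket, scores) if b is not None]
    promoters = buckets.count("promoter") / len(buckets)
    detractors = buckets.count("detractor") / len(buckets)
    return 100 * (promoters - detractors)

print(tnps([9, 3, None, 7, 10, 5]))  # 0.0: two promoters offset two detractors
```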
Where it gets actionable
On its own, transactional NPS tells you the *what*—this interaction was likely brand-damaging or brand-building. The real value comes when you cross-tabulate it against other dimensions in the dataset. Layering in issue type, product, channel, agent, or resolution status lets you move from "we have a detractor problem" to "we have a detractor problem driven by billing disputes in chat that go unresolved after multiple contacts." That's the level of specificity that lets you take targeted operational action rather than reacting to an aggregate score.
It also gives you coverage that traditional survey-based NPS can't. Average survey response rates for CX measurement can run as low as 7%, and research shows that promoters are significantly more likely to respond than detractors: meaning the customers you most need to hear from are the ones least likely to tell you.
When to use it: Any interaction dataset where you need to measure brand advocacy impact at the touchpoint level — support conversations, sales interactions, onboarding sessions, renewal calls. Most valuable when paired with Predicted Satisfaction to distinguish between interactions that were satisfactory and interactions that built loyalty. Not appropriate for non-experiential text (social mentions, news articles) where the author has no direct interaction with the brand.
How to customize: Choose Version 1 (research-anchored) when precision matters and you need the model to distinguish subtle behavioral signals — particularly in the passive range. Choose Version 2 (streamlined) for speed and when the primary use case is detractor identification. Add domain-specific anchors: for hospitality, promoter signals might include references to specific staff members or intent to rebook; for SaaS, they might include references to specific features or willingness to refer colleagues.
Predicted Effort
Satisfaction tells you whether a customer was happy with the outcome. NPS tells you whether the experience built or eroded advocacy. Predicted Effort tells you what neither captures: how much friction the customer had to push through to get there.
It scores each conversation 1 to 5 (5 being highest effort), reading the language, dynamics, and context of the customer interaction itself.
A customer can walk away satisfied (their issue was resolved) and even rate the experience positively, yet still have endured real friction to get there: repeat contacts, transfers, re-explaining the issue, navigating confusing processes. That friction is one of the strongest predictors of future disloyalty. The foundational research by the Corporate Executive Board (CEB, now part of Gartner) covered over 75,000 customer interactions and found that 96% of high-effort customers become more disloyal, compared to just 6% of low-effort customers. Fred Reichheld, who created NPS, found that 60–80% of customers who eventually defect had reported being "satisfied" on their most recent survey.¹⁸ Effort helps explain why: it measures the cost of the experience, not just the outcome.
Three things worth knowing:
• Available out of the box. Like Predicted Rating, the base Predicted Effort score needs no configuration and runs against any conversational dataset immediately. Unlike Predicted Rating, it's fully customizable: you can rewrite the prompt to target whatever kind of effort matters most to your business.
• Inversely related to satisfaction, but not redundant. A high satisfaction score and a high effort score on the same interaction is a warning sign. The customer got what they needed, but the process hurt to get there, and that pain is what predicts whether they come back.
• Scored at the session level. The model reads the full conversation and predicts how much effort the customer expended, on a 1 (minimal) to 5 (significant) scale. The overall Predicted Effort score is the average across sessions.
The prompt behind the score
PREDICTED EFFORT (1 - 5, null)
Prediction of score estimating difficulty level the Incoming user is having in accomplishing what they want on a scale of 1 - 5, with 5 being the most effort.
If there is not enough information to confidently assign a score, output null value.
The prompt can be customized to decompose effort into distinct, targeted dimensions. Each runs independently against the same interaction data.
COGNITIVE EFFORT
Estimate the level of cognitive effort the customer had to exert, rated on a scale of 1 to 5 where,
1 = Very low (clear, simple exchange),
5 = Very high (confusing instructions, complex multi-step processes, unclear next steps, customer expressed confusion or uncertainty).
Important: Focus specifically on whether the customer had to process complex information, navigate ambiguous options, or interpret unclear guidance — separate from wait times or emotional frustration.
If there is not enough information to confidently assign a rating, output null value.
The base prompt gives a general read on friction. The customized versions let you diagnose what kind of friction it is, and each kind maps to a different fix. A high repeat-contact score is a first-contact resolution problem. A high cognitive score is a communication or process design problem. A high emotional score might be an empathy training opportunity, or it might point to a policy that agents can't do anything about, which is a different fix entirely.
How to think about the scoring
"Effort" is broader than it first appears. When someone says an interaction was high effort, that can mean very different things, and the kind of effort changes what you do about it.
Research from the Henley Centre for Customer Management formalized this by naming three kinds of customer energy: cognitive effort (processing complex information, understanding instructions), emotional effort (stress, anxiety, frustration from uncertainty or policy friction), and time effort (waiting, elapsed time to resolution). The three are interrelated but each calls for a different fix:
Cognitive effort calls for clearer communication and simpler processes.
Emotional effort calls for empathy and proactive reassurance.
Time effort calls for operational improvements.
Use effort as a general signal, or decompose it into these parts as a diagnostic tool (a sketch of the decomposition follows below).
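The sketch runs one scorer per energy type against the same session; the abbreviated prompt texts and the `score_effort` wrapper are assumptions standing in for your LLM client.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

# Abbreviated, illustrative prompt texts; see the cognitive-effort prompt
# above for the full pattern.
EFFORT_PROMPTS = {
    "cognitive": "Estimate the cognitive effort the customer exerted (1-5)...",
    "emotional": "Estimate the emotional effort the customer exerted (1-5)...",
    "time":      "Estimate the time effort the customer expended (1-5)...",
}

# Each energy type maps to a different fix.
FIXES = {
    "cognitive": "clearer communication and simpler processes",
    "emotional": "empathy and proactive reassurance",
    "time":      "operational improvements",
}

def score_effort(prompt: str, transcript: str) -> Optional[int]:
    """Placeholder: send `prompt` plus `transcript` to your LLM and parse
    the 1-5 integer, or None if the model outputs null."""
    raise NotImplementedError

def decompose_effort(transcript: str) -> dict[str, Optional[int]]:
    """Run the three effort dimensions independently against one session."""
    with ThreadPoolExecutor() as pool:
        futures = {kind: pool.submit(score_effort, prompt, transcript)
                   for kind, prompt in EFFORT_PROMPTS.items()}
    # The highest-scoring dimension points at the fix, e.g. FIXES["time"].
    return {kind: f.result() for kind, f in futures.items()}
```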
Where it gets actionable
The original CEB research identified seven primary drivers of disloyalty in service interactions: repeat contacts, channel switching, transfers, having to repeat information, robotic service, burdensome policies, and general friction. Each leaves detectable signals in conversation data, and each becomes visible when you cross-tabulate Predicted Effort against other variables.
Layering effort against issue type, channel, agent, or product takes you from "effort is high" to "effort is high in billing-related interactions where customers are being transferred at least once." That specificity turns a metric into an action plan.
Pairing effort with satisfaction surfaces straightforward service failures when both score poorly on the same interaction (low satisfaction, high effort). When satisfaction is high but effort is also high, you've found a hidden risk: the customer is happy for now, but the friction they endured makes them vulnerable to defection.
Pairing effort with Transactional NPS adds another layer: high satisfaction + high effort is a hidden risk; low effort + passive NPS reveals interactions that went smoothly but weren't memorable enough to build advocacy. The quadrant logic is sketched below.
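A sketch of that quadrant logic, with illustrative thresholds on the scales defined earlier (-2 to +2 satisfaction, 1 to 5 effort):

```python
def experience_quadrant(satisfaction: int, effort: int) -> str:
    """Pair Predicted Satisfaction (-2..+2) with Predicted Effort (1-5,
    5 = most effort). The thresholds are illustrative assumptions."""
    satisfied = satisfaction >= 1
    high_effort = effort >= 4
    if satisfied and high_effort:
        return "hidden risk: happy now, but friction predicts defection"
    if not satisfied and high_effort:
        return "service failure: unhappy and worked hard to get there"
    if satisfied:
        return "healthy: satisfied with low friction"
    return "outcome problem: low effort but still dissatisfied"

print(experience_quadrant(satisfaction=2, effort=5))  # hidden risk
```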
When to use it: Any conversational dataset where customers might be doing work to get what they need: support interactions, onboarding calls, claims processing, technical troubleshooting. Start with the base score, then layer in custom prompts (cognitive, emotional, repeat contact) when you want to diagnose root causes.
How to customize: The base prompt needs no configuration. To decompose effort, run multiple targeted prompts in parallel against the same dataset, each isolating a different kind of friction. Add domain-specific signals: in healthcare, cognitive effort might include navigating insurance terminology; in financial services, it might include multi-step verification. Adjust the null conditions for your data source. Bot-only interactions and single-exchange informational queries should return null rather than a forced score.
