Mechanics of LLM Classification
The methodology behind using AI for data enrichment is premised on processing records individually, *not* on putting as much data as possible into the context window of an LLM. Even as context windows continue to grow, we believe this approach will remain the standard: it preserves traceability back to individual customer interactions, and the model's singular focus on one record at a time yields higher-quality results.
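The per-record pattern can be sketched as a simple loop: one LLM call per record, with each output row keyed back to its source. This is a minimal sketch; `classify_record` is a hypothetical stand-in for a real LLM API call.

```python
# Per-record processing sketch: each record is scored in its own LLM call
# rather than batched into one large context window.

def classify_record(record: dict) -> dict:
    """Hypothetical placeholder for an LLM call that scores one record."""
    # A real implementation would send record["text"] plus the prompt
    # function to an LLM API and parse the structured response.
    return {"predicted_csat": 0}

def enrich(records: list[dict]) -> list[dict]:
    """Score records one at a time, preserving traceability via session_id."""
    enriched = []
    for record in records:
        scores = classify_record(record)
        # Each output row carries its source session_id, so every score
        # traces back to a single customer interaction.
        enriched.append({"session_id": record["session_id"], **scores})
    return enriched

rows = enrich([{"session_id": 1819288663, "text": "Thanks, that fixed it!"}])
print(rows)
```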
Conceptualizing AI Classification:
Imagine a very common scenario in universities across the country: a graduate student is tasked with annotating the transcripts of 1-on-1 interviews as part of a research project. To ensure consistency across the project they are provided with a code book: a set of clear definitions and illustrative examples for labeling sentiment, topics, and other relevant aspects of the conversation.
An LLM uses prompts in the same way. A prompt function is used to execute a series of analytical tasks (dimensions of analysis) for each individual record. In the resulting table, each task or dimension exists as a column (e.g., sentiment, topic) and the rows contain the values assigned to each record (e.g., positive, account management).
The value of this approach is that newly added dimensions may be joined with existing metadata related to a record or customer, providing a critical foundation for executing causal analysis (e.g., what topics are driving positive sentiment and why).
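The join itself is ordinary record linkage on a shared key. A minimal sketch, using illustrative field names that mirror the example output table later in this section:

```python
# Join newly predicted dimensions onto existing record metadata by
# session_id, so downstream analysis can cut scores by any metadata field.

predicted = [
    {"session_id": 1819288663, "predicted_csat": 1},
    {"session_id": 1646621075, "predicted_csat": -2},
]
metadata = [
    {"session_id": 1819288663, "customer_segment": "Super Fan"},
    {"session_id": 1646621075, "customer_segment": "Season Ticket Holder"},
]

# Index metadata by key, then merge each predicted row with its metadata.
meta_by_id = {m["session_id"]: m for m in metadata}
joined = [{**meta_by_id[p["session_id"]], **p} for p in predicted]
print(joined[0])
# {'session_id': 1819288663, 'customer_segment': 'Super Fan', 'predicted_csat': 1}
```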
Understanding Prompt Functions:
A key component of predictive scoring is the prompt function — a structured schema that defines the analytical tasks the model will execute on each record. A prompt function contains the scoring instructions, scale definitions, null conditions, and output format in a single reproducible object.
The following example illustrates how a single predictive scoring dimension (Predicted Customer Satisfaction) is defined, structured as a schema, and output alongside existing metadata. The sections that follow will walk through each scoring type in detail.
Prompt:
An estimation of the satisfaction of the incoming user with their support experience rated on a scale of -2 to +2 where:
-2 = Very Dissatisfied, -1 = Somewhat Dissatisfied,
0 = Neither satisfied nor dissatisfied,
+1 = Somewhat Satisfied, +2 = Very Satisfied.
Prompt Function:
{
  "name": "extract_session_info",
  "description": "An analysis of support conversations between a support agent and a customer.",
  "context": "Incoming messages are from the customer, outgoing messages are from the support agent.",
  "parameters": {
    "type": "object",
    "properties": {
      "predicted_csat": {
        "description": "An estimation of the satisfaction of the incoming user with their support experience rated on a scale of -2 to +2 where -2=Very Dissatisfied, -1=Somewhat Dissatisfied, 0=Neither satisfied nor dissatisfied, +1=Somewhat Satisfied, +2=Very Satisfied. If there is not enough information to assign a customer satisfaction score with confidence, output null value.",
        "type": "integer"
      }
    }
  }
}
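Because the schema declares an integer type but the instructions permit a null when the model lacks confidence, it can be useful to validate each returned score before joining it with metadata. The validator below is our own assumption, not part of the prompt function itself:

```python
# Validate a model's predicted_csat output against the prompt function:
# an integer on the -2..+2 scale, or None for the null (low-confidence) case.

def validate_predicted_csat(value) -> bool:
    """Accept an integer in [-2, 2], or None (the null condition)."""
    if value is None:
        return True
    # Exclude bool explicitly: in Python, isinstance(True, int) is True.
    return isinstance(value, int) and not isinstance(value, bool) and -2 <= value <= 2

print(validate_predicted_csat(1))     # True
print(validate_predicted_csat(None))  # True
print(validate_predicted_csat(5))     # False
```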
Prompt Function Output:
session id | predicted csat | session duration | customer segment | seat location |
*metadata* | *predicted* | *metadata* | *metadata* | *metadata* |
1819288663 | +1 | 43 | Super Fan | S247 |
1646621075 | -2 | 37 | Season Ticket Holder | S312 |
1654871980 | -1 | 71 | Season Ticket Holder | S318 |
1748886903 | +2 | 49 | Season Ticket Holder | S319 |
1356884637 | -1 | 67 | Season Ticket Holder | S184 |
1552158441 | +1 | 79 | Family | S315 |
1632695651 | 0 | 46 | Season Ticket Holder | S312 |
1345919541 | -1 | 27 | Casual Fan | S392 |
1920588783 | +1 | 61 | Casual Fan | S192 |
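Once scores sit alongside metadata, the table supports simple aggregate cuts, for example average predicted CSAT per customer segment. A minimal sketch over a subset of the rows above:

```python
# Aggregate predicted CSAT by customer segment, mirroring a subset of the
# example output table.

from collections import defaultdict

rows = [
    {"predicted_csat": 1,  "customer_segment": "Super Fan"},
    {"predicted_csat": -2, "customer_segment": "Season Ticket Holder"},
    {"predicted_csat": -1, "customer_segment": "Season Ticket Holder"},
    {"predicted_csat": 2,  "customer_segment": "Season Ticket Holder"},
]

# Collect scores per segment, then compute the mean for each.
scores_by_segment = defaultdict(list)
for row in rows:
    scores_by_segment[row["customer_segment"]].append(row["predicted_csat"])

averages = {seg: sum(v) / len(v) for seg, v in scores_by_segment.items()}
print(averages)
```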
Predictive scoring is not a single technique: it is a family of scoring patterns, each designed for a different analytical purpose. The sections that follow move from the simplest application (filtering noise) through progressively more specific constructs, ending with decomposed scoring for conversation-rich data. Each builds on the one before it.
