Interpreting Results
Understanding moderation results is crucial for optimizing your policies, making data-driven improvements, and building effective content workflows. This section explains how to read, analyze, and act on the detailed feedback provided by Deep Mod.
Understanding Result Structure
Every moderation result provides a comprehensive breakdown of how your content was evaluated, giving you complete visibility into the decision-making process.
Overall Result Components
Primary Outcome: Every result includes one of four possible decisions:
- success: Content passed all rules within confidence thresholds
- failure: Content violated one or more rules with sufficient confidence
- ambiguous: Potential violations detected but confidence below threshold
- abandoned: Content was not processed (used in batch operations when a prior policy in the chain failed)
Run Status: Separate from the result, each moderation run has a status:
- pendingReview: Result requires human review before finalization
- completed: Result is final (either reviewed or no review required)
Average Confidence: The overall confidence score for the entire policy evaluation, calculated by averaging confidence scores from all rule groups.
Hierarchical Breakdown
Results follow a hierarchical structure that mirrors your policy organization.
→ Policy Level: The top-level result with overall outcome and summary
→ Rule Group Level: Results for each rule group (Safety, Legal, Brand, etc.)
→ Individual Rule Level: Detailed results for each rule within the group
This structure allows you to understand not just what happened, but exactly where in your policy the decision was made.

Detailed Result Example
Here's what a complete moderation result looks like:
{
  "policy": "community-guidelines-policy",
  "result": "failure",
  "averageConfidence": 0.87,
  "ruleGroupResults": [
    {
      "name": "Safety",
      "result": "failure",
      "averageConfidence": 0.91,
      "ruleResults": [
        {
          "ruleId": 42,
          "condition": "must not include threatening language",
          "result": "failure",
          "averageConfidence": 0.91,
          "matchedContent": [
            {
              "content": "threatening phrase detected",
              "confidence": 0.91
            }
          ]
        }
      ]
    },
    {
      "name": "Brand",
      "result": "success",
      "averageConfidence": 0.83,
      "ruleResults": [...]
    }
  ]
}
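To pinpoint where a decision was made, you can walk this hierarchy programmatically. Here is a minimal Python sketch, assuming a result dictionary shaped like the example above; the helper name find_failing_rules is ours, not part of the Deep Mod client:

def find_failing_rules(result):
    """Collect group, rule, and confidence details for every failed rule."""
    failures = []
    for group in result.get("ruleGroupResults", []):
        for rule in group.get("ruleResults", []):
            if rule.get("result") == "failure":
                failures.append({
                    "group": group.get("name"),
                    "ruleId": rule.get("ruleId"),
                    "condition": rule.get("condition"),
                    "confidence": rule.get("averageConfidence"),
                })
    return failures

Applied to the example above, this returns the single Safety failure (rule 42, confidence 0.91).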
Result Fields After Human Review
When a moderation result has been reviewed, additional fields are included:
{
  "policy": "community-guidelines-policy",
  "result": "failure",
  "averageConfidence": 0.87,
  "reviewed": true,
  "reviewNote": "Reviewer determined content contains implicit threat",
  "reviewItems": [
    {
      "ruleId": 42,
      "ruleName": "must not include threatening language",
      "decision": "failure"
    }
  ],
  "ruleGroupResults": [...]
}
Confidence Scores Explained
Confidence scores are fundamental to understanding and tuning your moderation system. They represent how certain the AI is about detecting the presence or absence of rule violations.
What Confidence Represents
Range: Confidence scores are normalized to a range between 0.01 and 0.99 (1% to 99%)
Meaning: Higher scores indicate greater certainty about whether the rule's topic is present or absent in the content.
- 0.90+ (90%+): Very high confidence - AI is very certain about its detection
- 0.70-0.89 (70-89%): High confidence - AI is confident about its detection
- 0.50-0.69 (50-69%): Medium confidence - AI has moderate certainty
- 0.30-0.49 (30-49%): Low confidence - AI has low certainty
- Below 0.30 (30%): Very low confidence - AI is uncertain about its detection
Important: Confidence measures certainty about detection, not about whether content is "good" or "bad." It applies equally to all outcomes.
Confidence Hierarchy
Rule-Level Confidence: Each individual rule evaluation receives a confidence score representing how certain the AI is about whether the rule's topic is present in the content.
Group-Level Confidence: Calculated by averaging the confidence scores of all rules within that group.
Overall Confidence: The final confidence score representing the certainty of the entire policy decision, calculated by averaging all group-level confidences.
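The averaging described here is simple to reproduce when you want to recompute or audit scores yourself. A minimal Python sketch of the stated calculation, assuming the result structure shown earlier (this mirrors the documented averaging, not an official client routine):

def group_confidence(group):
    """Average the confidence scores of all rules within a rule group."""
    scores = [rule["averageConfidence"] for rule in group["ruleResults"]]
    return sum(scores) / len(scores) if scores else 0.0

def overall_confidence(result):
    """Average the group-level confidences to get the policy-level score."""
    scores = [group["averageConfidence"] for group in result["ruleGroupResults"]]
    return sum(scores) / len(scores) if scores else 0.0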
How Results Are Classified
Your policy's confidence threshold determines how detections are classified into results. The classification logic considers both whether a topic is detected (present) and the confidence of that detection:
| Topic Detected | Confidence vs Threshold | Result |
|---|---|---|
| Yes | At or above threshold | failure |
| Yes | Below threshold | ambiguous |
| No | Below threshold | ambiguous |
| No | At or above threshold | success |
Key insight: Ambiguous results occur in two scenarios:
- A violation was detected but with low confidence
- No violation was detected but the AI wasn't confident in that assessment
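This classification can be expressed as a small helper if you want to reason about it locally. A minimal Python sketch; the detected flag is an assumption about how you represent detections on your side, not a field guaranteed by the result payload:

def classify(detected, confidence, threshold):
    """Map a detection flag and its confidence onto a result, per the table above."""
    if confidence >= threshold:
        # Confident detection is a failure; confident non-detection is a success
        return "failure" if detected else "success"
    # Below the threshold the AI is unsure either way, so the result is ambiguous
    return "ambiguous"

For example, classify(True, 0.91, 0.75) returns "failure", while classify(False, 0.55, 0.75) returns "ambiguous".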
Using Confidence Thresholds
Threshold Strategy:
- Conservative (80-90%): Fewer false positives, more ambiguous results requiring review
- Balanced (70-80%): Good balance of automation and accuracy
- Aggressive (60-70%): More automation, higher false positive risk
Review Mode Behavior:
Your policy's review mode determines how ambiguous results are handled: in one mode, the run status is set to pendingReview so a human must review the result before it is finalized; in the other, the run status is set to completed and the ambiguous result stands without human review.
Analyzing Patterns and Trends
Effective interpretation goes beyond individual results to identify patterns across your content moderation.
Common Analysis Approaches
Volume Analysis: Track the distribution of success, failure, and ambiguous results over time to understand overall content quality trends.
Confidence Distribution: Monitor how confidence scores cluster to identify opportunities for threshold adjustment.
Rule Performance: Analyze which rules are most frequently triggered to identify common content issues or overly broad rules.
Source Segmentation: Use metadata and tags to compare moderation patterns across different content sources, user types, or geographic regions.
Key Metrics to Monitor
False Positive Rate: Content incorrectly flagged as violations
- Track by monitoring failure results that users appeal or complain about
- Indicates rules may be too broad or thresholds too low
False Negative Rate: Violations missed by automated moderation
- Monitor through human review overturn rates
- Suggests rules may be too narrow or thresholds too high
Ambiguous Rate: Percentage of content requiring human review
- High rates may indicate poorly tuned thresholds
- Very low rates might suggest missed edge cases
Confidence Distribution: How confidence scores cluster around your threshold
- Many scores just below threshold suggest threshold adjustment opportunities
- Bimodal distributions indicate clear separation between good and bad content
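To track these rates, compare automated outcomes against human judgments you have collected. A minimal Python sketch, assuming you keep a list of samples pairing the automated result with a human is_violation verdict (both field names are illustrative, not part of the API):

def moderation_metrics(samples):
    """Compute false positive, false negative, and ambiguous rates from labeled samples."""
    total = len(samples)
    if total == 0:
        return {"false_positive_rate": 0.0, "false_negative_rate": 0.0, "ambiguous_rate": 0.0}
    false_positives = sum(1 for s in samples if s["result"] == "failure" and not s["is_violation"])
    false_negatives = sum(1 for s in samples if s["result"] == "success" and s["is_violation"])
    ambiguous = sum(1 for s in samples if s["result"] == "ambiguous")
    return {
        "false_positive_rate": false_positives / total,
        "false_negative_rate": false_negatives / total,
        "ambiguous_rate": ambiguous / total,
    }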
Practical Optimization Strategies
Threshold Tuning
Identify Threshold Opportunities:
- Review content with confidence scores within 10% of your threshold
- Look for patterns in false positives and false negatives
- Consider separate thresholds for different rule groups
Tuning Approach:
- Collect 50-100 recent results across different content types
- Categorize outcomes as correct or incorrect
- Plot confidence scores against correctness
- Identify optimal threshold that minimizes total errors
Example Analysis:
Current threshold: 75%
Content scoring 70-74%: 80% are actually violations (consider lowering threshold)
Content scoring 76-80%: 95% are correctly identified (threshold working well)
Content scoring 81-85%: 98% accuracy (high confidence zone)
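One way to carry out the tuning approach above is a simple threshold sweep over your labeled sample. A minimal Python sketch, assuming each sample records a detected flag, a confidence score, and a human is_violation verdict (all illustrative field names):

def best_threshold(samples, candidates=None):
    """Pick the candidate threshold that minimizes total classification errors."""
    # Sweep 0.50 through 0.95 in 0.05 steps unless candidates are supplied
    candidates = candidates or [round(0.50 + 0.05 * i, 2) for i in range(10)]

    def total_errors(threshold):
        # Below-threshold detections are treated as passes here; adapt this if you
        # route them to human review instead
        return sum(
            1 for s in samples
            if (s["detected"] and s["confidence"] >= threshold) != s["is_violation"]
        )

    return min(candidates, key=total_errors)

In practice you may also want to weight false negatives more heavily than false positives (or vice versa) before choosing a threshold.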
Rule Refinement
Overactive Rules: Rules that frequently trigger with low accuracy
- Symptoms: High trigger rate, many false positives, user complaints
- Solutions: Add exceptions, increase specificity, adjust rule wording
- Example: "No negative language" → "No personal attacks or threats"
Underactive Rules: Rules that rarely trigger but miss obvious violations
- Symptoms: Low trigger rate, violations slip through, user reports
- Solutions: Broaden scope, add variations, decrease specificity
- Example: "No spam" → "No repetitive promotional content or excessive links"
Conflicting Rules: Rules that create ambiguous or contradictory results
- Symptoms: High ambiguous rate for specific content types
- Solutions: Clarify rule boundaries, add priority ordering, merge related rules
Content-Specific Adjustments
Content Type Patterns: Different content types may need different handling
- Product reviews vs. forum discussions vs. customer support
- Adjust thresholds or rules based on content context
- Use metadata to track performance by content type
User Behavior Analysis: User patterns can inform policy adjustments
- New users vs. established community members
- Content creator vs. consumer patterns
- Trust level or reputation score integration
Quality Tuning Playbook
Follow this systematic approach to optimize your moderation performance:
Step 1: Data Collection
- Sample Recent Results: Collect 50-100 recent moderation runs across different content types and sources
- Categorize Outcomes: Mark each result as correct/incorrect based on manual review
- Document Context: Note content type, source, user context, and any special circumstances
Step 2: Pattern Analysis
- Calculate Error Rates: Determine false positive and false negative rates overall and by rule group
- Confidence Analysis: Plot confidence scores against correctness to identify threshold issues
- Rule Performance: Identify rules with high error rates or unusual triggering patterns
Step 3: Targeted Improvements
- Rule Adjustments: Modify or replace rules with high error rates
- Threshold Tuning: Adjust confidence thresholds based on confidence distribution analysis
- Policy Structure: Consider splitting or merging rule groups based on performance patterns
Step 4: Validation Testing
- Re-test Samples: Run the same content samples through updated policies
- Measure Improvement: Compare error rates before and after changes
- Monitor Production: Watch for improvement in live moderation metrics
Step 5: Ongoing Monitoring
- Trend Analysis: Look for changes in content patterns or policy performance
- Seasonal Adjustments: Account for seasonal content variations or platform changes
- Policy Evolution: Update policies as community standards and business needs evolve
Advanced Result Analysis
Matched Content Analysis
The matchedContent array in rule results shows exactly what text triggered each rule:
"matchedContent": [
{
"content": "specific triggering phrase",
"confidence": 0.87
}
]
Note: Matched content snippets are limited to 5 words or fewer.
Use Cases:
- False Positive Investigation: Understand why benign content was flagged
- Rule Improvement: Identify common patterns that need exception handling
- Training Data: Collect examples for policy refinement discussions
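For false positive investigation and rule improvement, it helps to pull every matched snippet out of a batch of results. A minimal Python sketch over the result structure shown earlier; matched_content_by_rule is our own helper name:

from collections import defaultdict

def matched_content_by_rule(results):
    """Group matched snippets and their confidences by rule condition across many results."""
    snippets = defaultdict(list)
    for result in results:
        for group in result.get("ruleGroupResults", []):
            for rule in group.get("ruleResults", []):
                for match in rule.get("matchedContent", []):
                    snippets[rule.get("condition")].append(
                        (match["content"], match["confidence"])
                    )
    return dict(snippets)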
Multi-Rule Interactions
Analyze how multiple rules interact within rule groups:
- Reinforcing Rules: Multiple rules flagging the same content increases overall confidence
- Conflicting Signals: Some rules triggering while others don't can indicate edge cases
Confidence Clustering
Look for patterns in confidence score distributions:
- High Confidence Clusters (85%+): Clear-cut cases that can guide rule optimization
- Medium Confidence Clusters (60-75%): Potential threshold adjustment opportunities
- Low Confidence Clusters (Below 50%): May indicate irrelevant rules or content
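A quick way to surface these clusters is to bucket rule-level confidence scores from recent results. A minimal Python sketch using only the standard library:

from collections import Counter

def confidence_histogram(results, bucket_size=0.05):
    """Count rule-level confidence scores in buckets of the given width."""
    counts = Counter()
    for result in results:
        for group in result.get("ruleGroupResults", []):
            for rule in group.get("ruleResults", []):
                bucket = round(rule["averageConfidence"] // bucket_size * bucket_size, 2)
                counts[bucket] += 1
    return dict(sorted(counts.items()))

Plotting or eyeballing the returned counts shows whether scores pile up just below your threshold or separate cleanly into high and low clusters.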
Integration with Business Logic
Risk-Based Processing
JavaScript:
function processBasedOnRisk(result) {
  const { result: outcome, averageConfidence, ruleGroupResults } = result;

  // High-risk violations (safety, legal)
  const highRiskViolations = ruleGroupResults.filter(
    (group) => ['Safety', 'Legal'].includes(group.name) && group.result === 'failure'
  );
  if (highRiskViolations.length > 0) {
    return 'immediate_block';
  }

  // Medium-risk with high confidence
  if (outcome === 'failure' && averageConfidence > 0.8) {
    return 'block_with_appeal';
  }

  // Low confidence failures
  if (outcome === 'failure' && averageConfidence < 0.6) {
    return 'flag_for_review';
  }

  // Ambiguous results
  if (outcome === 'ambiguous') {
    return 'flag_for_review';
  }

  return 'approve';
}
Python:
def process_based_on_risk(result):
    outcome = result['result']
    average_confidence = result['averageConfidence']
    rule_group_results = result['ruleGroupResults']

    # High-risk violations (safety, legal)
    high_risk_violations = [
        group for group in rule_group_results
        if group['name'] in ['Safety', 'Legal'] and group['result'] == 'failure'
    ]
    if high_risk_violations:
        return 'immediate_block'

    # Medium-risk with high confidence
    if outcome == 'failure' and average_confidence > 0.8:
        return 'block_with_appeal'

    # Low confidence failures
    if outcome == 'failure' and average_confidence < 0.6:
        return 'flag_for_review'

    # Ambiguous results
    if outcome == 'ambiguous':
        return 'flag_for_review'

    return 'approve'
PHP:
<?php
function processBasedOnRisk($result) {
    $outcome = $result['result'];
    $averageConfidence = $result['averageConfidence'];
    $ruleGroupResults = $result['ruleGroupResults'];

    // High-risk violations (safety, legal)
    $highRiskViolations = array_filter($ruleGroupResults, function($group) {
        return in_array($group['name'], ['Safety', 'Legal']) &&
            $group['result'] === 'failure';
    });
    if (count($highRiskViolations) > 0) {
        return 'immediate_block';
    }

    // Medium-risk with high confidence
    if ($outcome === 'failure' && $averageConfidence > 0.8) {
        return 'block_with_appeal';
    }

    // Low confidence failures
    if ($outcome === 'failure' && $averageConfidence < 0.6) {
        return 'flag_for_review';
    }

    // Ambiguous results
    if ($outcome === 'ambiguous') {
        return 'flag_for_review';
    }

    return 'approve';
}
?>
Ruby:
def process_based_on_risk(result)
  outcome = result['result']
  average_confidence = result['averageConfidence']
  rule_group_results = result['ruleGroupResults']

  # High-risk violations (safety, legal)
  high_risk_violations = rule_group_results.select do |group|
    ['Safety', 'Legal'].include?(group['name']) && group['result'] == 'failure'
  end

  if high_risk_violations.any?
    return 'immediate_block'
  end

  # Medium-risk with high confidence
  if outcome == 'failure' && average_confidence > 0.8
    return 'block_with_appeal'
  end

  # Low confidence failures
  if outcome == 'failure' && average_confidence < 0.6
    return 'flag_for_review'
  end

  # Ambiguous results
  if outcome == 'ambiguous'
    return 'flag_for_review'
  end

  'approve'
end
Dynamic Threshold Adjustment
JavaScript:
function getAdjustedThreshold(contentMetadata, baseThreshold) {
  let threshold = baseThreshold;

  // Adjust for content sensitivity
  if (contentMetadata.isPublic) threshold += 0.1;
  if (contentMetadata.hasMinors) threshold += 0.15;

  // Adjust for user reputation
  if (contentMetadata.userTrustScore > 0.8) threshold -= 0.05;
  if (contentMetadata.userTrustScore < 0.3) threshold += 0.1;

  return Math.min(0.95, Math.max(0.5, threshold));
}
Python:
def get_adjusted_threshold(content_metadata, base_threshold):
    threshold = base_threshold

    # Adjust for content sensitivity
    if content_metadata.get('isPublic'):
        threshold += 0.1
    if content_metadata.get('hasMinors'):
        threshold += 0.15

    # Adjust for user reputation
    user_trust_score = content_metadata.get('userTrustScore', 0.5)
    if user_trust_score > 0.8:
        threshold -= 0.05
    if user_trust_score < 0.3:
        threshold += 0.1

    return min(0.95, max(0.5, threshold))
PHP:
<?php
function getAdjustedThreshold($contentMetadata, $baseThreshold) {
    $threshold = $baseThreshold;

    // Adjust for content sensitivity
    if (!empty($contentMetadata['isPublic'])) {
        $threshold += 0.1;
    }
    if (!empty($contentMetadata['hasMinors'])) {
        $threshold += 0.15;
    }

    // Adjust for user reputation
    $userTrustScore = $contentMetadata['userTrustScore'] ?? 0.5;
    if ($userTrustScore > 0.8) {
        $threshold -= 0.05;
    }
    if ($userTrustScore < 0.3) {
        $threshold += 0.1;
    }

    return min(0.95, max(0.5, $threshold));
}
?>
Ruby:
def get_adjusted_threshold(content_metadata, base_threshold)
  threshold = base_threshold

  # Adjust for content sensitivity
  threshold += 0.1 if content_metadata['isPublic']
  threshold += 0.15 if content_metadata['hasMinors']

  # Adjust for user reputation
  user_trust_score = content_metadata['userTrustScore'] || 0.5
  threshold -= 0.05 if user_trust_score > 0.8
  threshold += 0.1 if user_trust_score < 0.3

  [0.95, [0.5, threshold].max].min
end
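Once an adjusted threshold has been computed, you can apply it when deciding how to act on a result. A short Python sketch building on get_adjusted_threshold above; re-checking averageConfidence against the adjusted threshold is an illustrative convention, not platform behavior:

def act_on_result(result, content_metadata, base_threshold=0.75):
    """Gate the final action on a context-adjusted threshold."""
    threshold = get_adjusted_threshold(content_metadata, base_threshold)
    if result["result"] == "failure" and result["averageConfidence"] >= threshold:
        return "block"
    if result["result"] in ("failure", "ambiguous"):
        return "flag_for_review"
    return "approve"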
What's Next
With solid result interpretation skills, explore these advanced topics:
- Human Review Workflows - Advanced strategies for managing human oversight efficiently
- Policy Authoring Best Practices - Expert techniques for creating robust, maintainable policies
- Troubleshooting - Common issues and solutions for result interpretation challenges
Effective result interpretation transforms raw moderation data into actionable insights that continuously improve your content governance and user experience.
Need help interpreting specific results? Contact our support team with examples of confusing or unexpected outcomes for personalized guidance on optimization strategies.