Interpreting Results

Understanding moderation results is crucial for optimizing your policies, making data-driven improvements, and building effective content workflows. This section explains how to read, analyze, and act on the detailed feedback provided by DeepMod.

Understanding Result Structure

Every moderation result provides a comprehensive breakdown of how your content was evaluated, giving you complete visibility into the decision-making process.

Overall Result Components

Primary Outcome: Every result includes one of three possible decisions:

  • Success: Content passed all rules within confidence thresholds

  • Failure: Content violated one or more rules with sufficient confidence

  • Ambiguous: Potential violations detected but confidence below threshold

Average Confidence: The overall confidence score for the entire policy evaluation, calculated by aggregating confidence scores from all rule groups.

Hierarchical Breakdown

Results follow a hierarchical structure that mirrors your policy organization. This structure allows you to understand not just what happened, but exactly where in your policy the decision was made.

→ Policy Level: The top-level result with overall outcome and summary

→ Rule Group Level: Results for each rule group (Safety, Legal, Brand, etc.)

→ Individual Rule Level: Detailed results for each rule within the group


Detailed Result Example

Here's what a complete moderation result looks like:

{
  "policy": "community-guidelines",
  "result": "failure",
  "averageConfidence": 0.87,
  "ruleGroupResults": [
    {
      "name": "Safety",
      "result": "failure",
      "averageConfidence": 0.91,
      "ruleResults": [
        {
          "condition": "must not include threatening language",
          "result": "failure",
          "averageConfidence": 0.91,
          "matchedContent": [
            {
              "content": "threatening phrase",
              "confidence": 0.91
            }
          ]
        }
      ]
    },
    {
      "name": "Brand",
      "result": "success",
      "averageConfidence": 0.83,
      "ruleResults": [...]
    }
  ]
}
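
A small helper can walk this hierarchy and surface every failing rule along with the text that triggered it. The sketch below is Python and assumes the response has already been parsed into a dictionary shaped like the example above; it is not an official client library.

def collect_failures(result):
    """List every failing rule in a parsed moderation result."""
    failures = []
    for group in result.get("ruleGroupResults", []):
        if group.get("result") != "failure":
            continue
        for rule in group.get("ruleResults", []):
            if rule.get("result") != "failure":
                continue
            failures.append({
                "group": group.get("name"),
                "condition": rule.get("condition"),
                "confidence": rule.get("averageConfidence"),
                "matched": [m["content"] for m in rule.get("matchedContent", [])],
            })
    return failures

# For the example above this yields one entry: the Safety rule
# "must not include threatening language" at confidence 0.91.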

Confidence Scores Explained

Confidence scores are fundamental to understanding and tuning your moderation system. They represent how certain the AI is about its decisions, regardless of whether the outcome is Success, Failure, or Ambiguous.

What Confidence Represents

Range: Confidence scores are normalized to a range between 0.01 and 0.99 (1% to 99%)

Meaning: Higher scores indicate greater certainty about the decision being made

  • 0.90+ (90%+): Very high confidence - AI is very certain about its decision

  • 0.70-0.89 (70-89%): High confidence - AI is confident about its decision

  • 0.50-0.69 (50-69%): Medium confidence - AI has moderate certainty

  • 0.30-0.49 (30-49%): Low confidence - AI has low certainty

  • Below 0.30 (under 30%): Very low confidence - AI is uncertain about its decision
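
If you surface these bands in dashboards or logs, a small mapping keeps the labels consistent. This Python sketch uses the ranges above; the band names are descriptive labels from this guide, not values returned by the API.

def confidence_band(score):
    """Map a 0.01-0.99 confidence score to the descriptive bands above."""
    if score >= 0.90:
        return "very high"
    if score >= 0.70:
        return "high"
    if score >= 0.50:
        return "medium"
    if score >= 0.30:
        return "low"
    return "very low"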

Confidence applies equally to all outcomes:

  • High confidence Success → AI is very certain the content passes all rules

  • High confidence Failure → AI is very certain the content violates rules

  • Low confidence (any result) → AI is uncertain, leading to Ambiguous if below threshold

Confidence Hierarchy

Rule-Level Confidence: Each individual rule evaluation receives a confidence score representing how certain the AI is about whether the content violates or complies with that specific rule.

Group-Level Confidence: Calculated by averaging the confidence scores of all rules within that group, weighted by their individual results.

Overall Confidence: The final confidence score representing the certainty of the entire policy decision, calculated by aggregating all group-level confidences.
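
As a rough illustration of the roll-up, averaging the group-level scores reproduces the overall confidence in the earlier example ((0.91 + 0.83) / 2 = 0.87). The Python sketch below uses a plain unweighted mean; DeepMod's actual aggregation weights scores by their individual results, so treat this as an approximation for intuition only.

def approximate_overall_confidence(result):
    """Approximate overall confidence as the mean of group-level scores."""
    scores = [g["averageConfidence"] for g in result.get("ruleGroupResults", [])]
    return round(sum(scores) / len(scores), 2) if scores else None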

Using Confidence Thresholds

Your policy's confidence threshold acts as the decision boundary:

Above Threshold: Clear decision (Success or Failure) - content proceeds with automated handling

Below Threshold: Ambiguous result - requires human review (if enabled) or defaults to Success (if human review disabled)
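
Because the threshold has already been applied by the time you receive a result, your integration only needs to branch on the outcome. A minimal dispatcher might look like the following Python sketch; the return values are placeholder labels for your own workflow, not part of DeepMod.

def route_result(result, human_review_enabled=True):
    """Route content based on the outcome returned after threshold evaluation."""
    outcome = result["result"]
    if outcome == "failure":
        return "reject"  # violation above threshold
    if outcome == "ambiguous":
        # Below threshold: human review if enabled, otherwise treated as success
        return "human_review" if human_review_enabled else "approve"
    return "approve"  # success above threshold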

Threshold Strategy:

  • Conservative (80-90%): Fewer false positives, more human review

  • Balanced (70-80%): Good balance of automation and accuracy

  • Aggressive (60-70%): More automation, higher false positive risk

Pattern Analysis

Effective interpretation goes beyond individual results to identify patterns across your moderation traffic as a whole.

Volume Analysis: Track the distribution of Success, Failure, and Ambiguous results over time to understand overall content quality trends.

Confidence Distribution: Monitor how confidence scores cluster to identify opportunities for threshold adjustment.

Rule Performance: Analyze which rules are most frequently triggered to identify common content issues or overly broad rules.

Source Segmentation: Use metadata and tags to compare moderation patterns across different content sources, user types, or geographic regions.
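
Segmentation becomes straightforward once you store results alongside the metadata you submitted. The Python sketch below assumes each stored record pairs a parsed result with a metadata dictionary; the field names are placeholders for whatever your own storage uses.

from collections import defaultdict

def failure_rate_by_tag(records, tag_key="source"):
    """Compute the failure rate for each value of a metadata tag."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for record in records:
        segment = record["metadata"].get(tag_key, "unknown")
        totals[segment] += 1
        if record["result"]["result"] == "failure":
            failures[segment] += 1
    return {segment: failures[segment] / totals[segment] for segment in totals}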

Key Metrics to Track

False Positive Rate: Content incorrectly flagged as violations

  • Track by monitoring Failure results that users appeal or complain about

  • Indicates rules may be too broad or thresholds too low

False Negative Rate: Violations missed by automated moderation

  • Monitor through human review overturn rates

  • Suggests rules may be too narrow or thresholds too high

Ambiguous Rate: Percentage of content requiring human review

  • High rates may indicate poorly tuned thresholds

  • Very low rates might suggest missed edge cases

Confidence Distribution: How confidence scores cluster around your threshold

  • Many scores just below threshold suggest threshold adjustment opportunities

  • Bimodal distributions indicate clear separation between good and bad content
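
These metrics are easy to compute once you pair a sample of results with human judgments. The Python sketch below assumes each sample is a (result, is_violation) tuple, where is_violation is your own ground-truth label; none of this is a DeepMod API.

def moderation_metrics(samples):
    """Compute false positive, false negative, and ambiguous rates
    from (result, is_violation) pairs."""
    if not samples:
        return {}
    total = len(samples)
    false_positives = sum(1 for r, bad in samples if r["result"] == "failure" and not bad)
    false_negatives = sum(1 for r, bad in samples if r["result"] == "success" and bad)
    ambiguous = sum(1 for r, _ in samples if r["result"] == "ambiguous")
    return {
        "false_positive_rate": false_positives / total,
        "false_negative_rate": false_negatives / total,
        "ambiguous_rate": ambiguous / total,
    }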

Practical Optimization Strategies

Threshold Tuning

Identify Threshold Opportunities:

  • Review content with confidence scores within 10% of your threshold

  • Look for patterns in false positives and false negatives

  • Consider separate thresholds for different rule groups

Tuning Approach:

  1. Collect 50-100 recent results across different content types

  2. Categorize outcomes as correct or incorrect

  3. Plot confidence scores against correctness

  4. Identify optimal threshold that minimizes total errors
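
The last step can be automated by sweeping candidate thresholds over the labeled sample and keeping the one with the fewest automated errors. This Python sketch reuses the (result, is_violation) pairs from the metrics example above; results whose confidence falls below a candidate threshold are treated as deferred to human review rather than as errors.

def best_threshold(samples, candidates=None):
    """Return the candidate threshold that minimizes automated errors."""
    candidates = candidates or [x / 100 for x in range(50, 96, 5)]

    def errors(threshold):
        count = 0
        for result, is_violation in samples:
            if result["averageConfidence"] < threshold:
                continue  # ambiguous at this threshold: deferred, not an error
            if result["result"] == "failure" and not is_violation:
                count += 1  # false positive
            elif result["result"] == "success" and is_violation:
                count += 1  # false negative
        return count

    return min(candidates, key=errors)

# In practice, also watch the deferral (ambiguous) rate: a very high threshold
# can "win" simply by deferring most content to human review.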

Rule Refinement

Overactive Rules: Rules that frequently trigger with low accuracy

  • Symptoms: High trigger rate, many false positives, user complaints

  • Solutions: Add exceptions, increase specificity, adjust rule wording

  • Example: "No negative language" → "No personal attacks or threats"

Inactive Rules: Rules that rarely trigger but miss obvious violations

  • Symptoms: Low trigger rate, violations slip through, user reports

  • Solutions: Broaden scope, add variations, decrease specificity

  • Example: "No spam" → "No repetitive promotional content or excessive links"

Conflicting Rules: Rules that create ambiguous or contradictory results

  • Symptoms: High ambiguous rate for specific content types

  • Solutions: Clarify rule boundaries, add priority ordering, merge related rules

Content-Specific Adjustments

Content Type Patterns: Different content types may need different handling

  • Product reviews vs. forum discussions vs. customer support

  • Adjust thresholds or rules based on content context

  • Use metadata to track performance by content type

User Behavior Analysis: User patterns can inform policy adjustments

  • New users vs. established community members

  • Content creator vs. consumer patterns

  • Trust level or reputation score integration

Quality Tuning Playbook

Follow this guide to optimize your moderation performance:

Step 1: Data Collection

  1. Sample Recent Results: Collect 50-100 recent moderation runs across different content types and sources

  2. Categorize Outcomes: Mark each result as correct/incorrect based on manual review

  3. Document Context: Note content type, source, user context, and any special circumstances

Step 2: Pattern Analysis

  1. Calculate Error Rates: Determine false positive and false negative rates overall and by rule group

  2. Confidence Analysis: Plot confidence scores against correctness to identify threshold issues

  3. Rule Performance: Identify rules with high error rates or unusual triggering patterns

Step 3: Targeted Improvements

  1. Rule Adjustments: Modify or replace rules with high error rates

  2. Threshold Tuning: Adjust confidence thresholds based on confidence distribution analysis

  3. Policy Structure: Consider splitting or merging rule groups based on performance patterns

Step 4: Validation Testing

  1. Re-test Samples: Run the same content samples through updated policies

  2. Measure Improvement: Compare error rates before and after changes

  3. Monitor Production: Watch for improvement in live moderation metrics

Step 5: Ongoing Monitoring

  1. Trend Analysis: Look for changes in content patterns or policy performance

  2. Seasonal Adjustments: Account for seasonal content variations or platform changes

  3. Policy Evolution: Update policies as community standards and business needs evolve

Advanced Result Analysis

Matched Content Analysis

The matchedContent array in rule results shows exactly what text triggered each rule:

"matchedContent": [
  {
    "content": "specific triggering phrase",
    "confidence": 0.87
  }
]

Use Cases:

  • False Positive Investigation: Understand why benign content was flagged

  • Rule Improvement: Identify common patterns that need exception handling

  • Training Data: Collect examples for policy refinement discussions
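
For false positive investigation and rule improvement, it helps to aggregate matched content across many results and see which phrases trigger a rule most often. A Python sketch, assuming a list of results parsed into the structure shown earlier:

from collections import Counter

def top_triggers(results, rule_condition, limit=10):
    """Count the most common matched phrases for a given rule across results."""
    counts = Counter()
    for result in results:
        for group in result.get("ruleGroupResults", []):
            for rule in group.get("ruleResults", []):
                if rule.get("condition") != rule_condition:
                    continue
                for match in rule.get("matchedContent", []):
                    counts[match["content"]] += 1
    return counts.most_common(limit)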

Multi-Rule Interactions

Analyze how multiple rules interact within rule groups:

  • Reinforcing Rules: Multiple rules flagging the same content increases overall confidence

  • Conflicting Signals: Some rules triggering while others don't can indicate edge cases
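
One way to spot these interactions is to summarize, per rule group, how the individual rules voted. A Python sketch over the same parsed-result structure:

def group_agreement(group):
    """Summarize how rules within a group evaluated the content."""
    outcomes = [r["result"] for r in group.get("ruleResults", [])]
    return {
        "failing": outcomes.count("failure"),
        "passing": outcomes.count("success"),
        "ambiguous": outcomes.count("ambiguous"),
    }

Several failing rules in one group reinforce its decision, while a lone failure among many passes often marks an edge case worth a closer look.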

Confidence Clustering

Look for patterns in confidence score distributions:

  • High Confidence Clusters (85%+): Clear-cut cases that can guide rule optimization

  • Medium Confidence Clusters (60-75%): Potential threshold adjustment opportunities

  • Low Confidence Clusters (Below 50%): May indicate irrelevant rules or content
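
A quick way to see these clusters is to bucket overall confidence scores from a batch of results into a coarse histogram. A Python sketch (the bucket width is arbitrary):

from collections import Counter

def confidence_histogram(results, bucket_width=0.05):
    """Bucket overall confidence scores to reveal clusters near the threshold."""
    buckets = Counter()
    for result in results:
        bucket = int(result["averageConfidence"] / bucket_width) * bucket_width
        buckets[round(bucket, 2)] += 1
    return dict(sorted(buckets.items()))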

Integration with Business Logic

Risk-Based Processing

JavaScript

function processBasedOnRisk(result) {
  const { result: outcome, averageConfidence, ruleGroupResults } = result;

  // High-risk violations (safety, legal)
  const highRiskViolations = ruleGroupResults.filter(
    (group) => ['Safety', 'Legal'].includes(group.name) && group.result === 'failure'
  );

  if (highRiskViolations.length > 0) {
    return 'immediate_block';
  }

  // Medium-risk with high confidence
  if (outcome === 'failure' && averageConfidence > 0.8) {
    return 'block_with_appeal';
  }

  // Low confidence failures
  if (outcome === 'failure' && averageConfidence < 0.6) {
    return 'flag_for_review';
  }

  return 'approve';
}

Python

def process_based_on_risk(result):
    outcome = result['result']
    average_confidence = result['averageConfidence']
    rule_group_results = result['ruleGroupResults']

    # High-risk violations (safety, legal)
    high_risk_violations = [
        group for group in rule_group_results
        if group['name'] in ['Safety', 'Legal'] and group['result'] == 'failure'
    ]

    if high_risk_violations:
        return 'immediate_block'

    # Medium-risk with high confidence
    if outcome == 'failure' and average_confidence > 0.8:
        return 'block_with_appeal'

    # Low confidence failures
    if outcome == 'failure' and average_confidence < 0.6:
        return 'flag_for_review'

    return 'approve'

PHP

<?php
function processBasedOnRisk($result) {
    $outcome = $result['result'];
    $averageConfidence = $result['averageConfidence'];
    $ruleGroupResults = $result['ruleGroupResults'];

    // High-risk violations (safety, legal)
    $highRiskViolations = array_filter($ruleGroupResults, function($group) {
        return in_array($group['name'], ['Safety', 'Legal']) &&
               $group['result'] === 'failure';
    });

    if (count($highRiskViolations) > 0) {
        return 'immediate_block';
    }

    // Medium-risk with high confidence
    if ($outcome === 'failure' && $averageConfidence > 0.8) {
        return 'block_with_appeal';
    }

    // Low confidence failures
    if ($outcome === 'failure' && $averageConfidence < 0.6) {
        return 'flag_for_review';
    }

    return 'approve';
}
?>

Ruby

def process_based_on_risk(result)
  outcome = result['result']
  average_confidence = result['averageConfidence']
  rule_group_results = result['ruleGroupResults']

  # High-risk violations (safety, legal)
  high_risk_violations = rule_group_results.select do |group|
    ['Safety', 'Legal'].include?(group['name']) && group['result'] == 'failure'
  end

  if high_risk_violations.any?
    return 'immediate_block'
  end

  # Medium-risk with high confidence
  if outcome == 'failure' && average_confidence > 0.8
    return 'block_with_appeal'
  end

  # Low confidence failures
  if outcome == 'failure' && average_confidence < 0.6
    return 'flag_for_review'
  end

  'approve'
end

Dynamic Threshold Adjustment

JavaScript

function getAdjustedThreshold(contentMetadata, baseThreshold) {
  let threshold = baseThreshold;

  // Adjust for content sensitivity
  if (contentMetadata.isPublic) threshold += 0.1;
  if (contentMetadata.hasMinors) threshold += 0.15;

  // Adjust for user reputation
  if (contentMetadata.userTrustScore > 0.8) threshold -= 0.05;
  if (contentMetadata.userTrustScore < 0.3) threshold += 0.1;

  return Math.min(0.95, Math.max(0.5, threshold));
}

Python

def get_adjusted_threshold(content_metadata, base_threshold):
    threshold = base_threshold

    # Adjust for content sensitivity
    if content_metadata.get('isPublic'):
        threshold += 0.1
    if content_metadata.get('hasMinors'):
        threshold += 0.15

    # Adjust for user reputation
    user_trust_score = content_metadata.get('userTrustScore', 0.5)
    if user_trust_score > 0.8:
        threshold -= 0.05
    if user_trust_score < 0.3:
        threshold += 0.1

    return min(0.95, max(0.5, threshold))

PHP

<?php
function getAdjustedThreshold($contentMetadata, $baseThreshold) {
    $threshold = $baseThreshold;

    // Adjust for content sensitivity
    if (!empty($contentMetadata['isPublic'])) {
        $threshold += 0.1;
    }
    if (!empty($contentMetadata['hasMinors'])) {
        $threshold += 0.15;
    }

    // Adjust for user reputation
    $userTrustScore = $contentMetadata['userTrustScore'] ?? 0.5;
    if ($userTrustScore > 0.8) {
        $threshold -= 0.05;
    }
    if ($userTrustScore < 0.3) {
        $threshold += 0.1;
    }

    return min(0.95, max(0.5, $threshold));
}
?>

Ruby

def get_adjusted_threshold(content_metadata, base_threshold)
  threshold = base_threshold

  # Adjust for content sensitivity
  threshold += 0.1 if content_metadata['isPublic']
  threshold += 0.15 if content_metadata['hasMinors']

  # Adjust for user reputation
  user_trust_score = content_metadata['userTrustScore'] || 0.5
  threshold -= 0.05 if user_trust_score > 0.8
  threshold += 0.1 if user_trust_score < 0.3

  [0.95, [0.5, threshold].max].min
end

What's Next

With solid result interpretation skills in place, you are ready to move on to more advanced policy tuning and workflow automation.

Effective result interpretation transforms raw moderation data into actionable insights that continuously improve your content governance and user experience.

Need help interpreting specific results? Contact our support team with examples of confusing or unexpected outcomes for personalized guidance on optimization strategies.