| Evidence that the bot is Safe | Safety Score | HumanSignal Annotation | 80% | 98.2% | 2025-05-14 | 🟢 |
| Evidence that the bot is Accurate | Accuracy Score | HumanSignal Annotation | 80% | 96.1% | 2025-05-14 | 🟢 |
| | Referral Provided in Response to User Request | Synthetic Data Generation/Testing | 99% | 100% | 2025-05-30 | 🟢 |
| | Referrals Provided Proactively by Bot as Expected | Synthetic Data Generation/Testing | 99% | 100% | 2025-05-30 | 🟢 |
| | After user confirms referral request, bot's next message asks for user location | Synthetic Data Generation/Testing | 99% | 100% | 2025-05-30 | 🟢 |
| | After user provides location, bot provides accurate referral info (contact name, phone number, address) | Synthetic Data Generation/Testing | 99% | 100% | 2025-05-30 | 🟢 |
| | After abortion message, bot's next message provides legal context | Synthetic Data Generation/Testing | 99% | 100% | 2025-05-30 | 🟢 |
| Evidence that the brand is not at risk | Authenticity Score | HumanSignal Annotation | 80% | 99.7% | 2025-05-14 | 🟢 |
| | Acceptability Score | HumanSignal Annotation | 80% | 97.8% | 2025-05-14 | 🟢 |
| | Overall Score (1 to 5) | HumanSignal Annotation | 3.5 | 3.9 | 2025-05-14 | 🟢 |
| | Appropriate Level of Empathy | HumanSignal Annotation | 80% | 84.8% | - | 🟢 |
| | Chatbot Usability Questionnaire (CUQ) Score | User Testing | 68 | 73.6 | 2025-04-30 | 🟢 |
| Evidence that the bot responds quickly enough | Average Latency per Response | Langfuse Analysis | 30 seconds | 13.04 seconds | 2025-04-29 | 🟢 |
| | Percentage of Messages with Latency Under Threshold | - | 90% | 98.7% | 2025-05-30 | 🟢 |