Scale Up Tracker

🟢 Above Threshold

🔴 Below Threshold

🟡 Not Currently Tracking

Category	Measurement	Methodology	Threshold	Current Value	Last Update	Status
Evidence that the bot is Safe	Safety Score	HumanSignal Annotation	80%	98.2%	2025-05-14	🟢
Evidence that the bot is Accurate	Accuracy Score	HumanSignal Annotation	80%	96.1%	2025-05-14	🟢
	Referral Provided in Response to User Request	Synthetic Data Generation/Testing	99%	100%	2025-05-30	🟢
	Referrals Provided Proactively by Bot as Expected	Synthetic Data Generation/Testing	99%	100%	2025-05-30	🟢
	After user confirms referral request, bot's next message asks for user location	Synthetic Data Generation/Testing	99%	100%	2025-05-30	🟢
	After user provides location, bot provides accurate referral info (contact name, phone number, address)	Synthetic Data Generation/Testing	99%	100%	2025-05-30	🟢
	After a message about abortion, bot's next message provides legal context	Synthetic Data Generation/Testing	99%	100%	2025-05-30	🟢
Evidence that the brand is not at risk	Authenticity Score	HumanSignal Annotation	80%	99.7%	2025-05-14	🟢
	Acceptability Score	HumanSignal Annotation	80%	97.8%	2025-05-14	🟢
	Overall Score (1 to 5)	HumanSignal Annotation	3.5	3.9	2025-05-14	🟢
	Appropriate Level of Empathy	HumanSignal Annotation	80%	84.8%	-	🟢
	Chatbot Usability Questionnaire (CUQ) Score	User Testing	68	73.6	2025-04-30	🟢
Evidence that the bot responds quickly enough	Average Latency per Response	Langfuse Analysis	30 seconds	13.04 seconds	2025-04-29	🟢
	Percentage Messages with Latency Under Threshold	-	90%	98.7%	2025-05-30	🟢
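
The Status column above follows from comparing each Current Value with its Threshold. The sketch below shows one way that comparison could be expressed. It is illustrative only: the `Metric` class, the `lower_is_better` flag, and both function names are hypothetical and are not taken from the tracker's actual tooling. It assumes that a value exactly equal to the threshold counts as passing, that latency-style metrics pass when they fall at or below the threshold, and that "under threshold" for an individual message means strictly less than the limit.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Metric:
    """One tracker row (hypothetical structure, for illustration only)."""
    name: str
    threshold: float
    value: Optional[float]          # None means the metric is not tracked yet
    lower_is_better: bool = False   # assumed True for latency-style metrics


def status(metric: Metric) -> str:
    """Return the legend symbol for a metric.

    Assumption: a value exactly equal to the threshold counts as passing.
    """
    if metric.value is None:
        return "🟡"
    if metric.lower_is_better:
        passed = metric.value <= metric.threshold
    else:
        passed = metric.value >= metric.threshold
    return "🟢" if passed else "🔴"


def share_under_threshold(latencies_s: Sequence[float], threshold_s: float) -> float:
    """Percentage of responses whose latency is strictly under threshold_s."""
    if not latencies_s:
        raise ValueError("at least one latency measurement is required")
    under = sum(1 for t in latencies_s if t < threshold_s)
    return 100.0 * under / len(latencies_s)


if __name__ == "__main__":
    # Values copied from the table above.
    print(status(Metric("Safety Score", 80.0, 98.2)))
    print(status(Metric("Average Latency per Response", 30.0, 13.04,
                        lower_is_better=True)))
```

Making the direction of each comparison explicit avoids ambiguity for the latency row, where a value below the threshold is the passing case.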