Helen Satchwell
I've spent my career making complicated, high-stakes information actually usable by the people who need it.
At AWS, I helped build the quality systems behind AI tools where getting it wrong affects real people, not just metrics. That meant writing SOPs, designing training programs, building audit frameworks, and guiding teams through fast-moving, ambiguous work where clarity genuinely mattered.
My background is in linguistics, so I think about language carefully: how it works, who it serves, and where it quietly breaks down. That lens shows up in everything, from model safety to untangling why a process isn't working the way it should.
AI Integrity & Intent Evaluation 🤖
I help to align human intent and machine logic.
I specialize in the "Human-in-the-loop" side of AI. From adversarial testing to PII filtering, I evaluate how models handle complex, multi-dimensional prompts. I don't just look for errors; I look for the patterns of failure to help engineers build safer, smarter systems.
Quality Architecture & Ops 📈
Turning messy data into structured systems.
I thrive in "day zero" environments where there is no manual. I have a track record of walking into high-volume workflows, identifying the gaps in logic, and building the audit systems and Python-driven pipelines needed to scale quality from the ground up.
Technical Enablement & Governance 🤝
Translating the "Technical" into the "Actionable."
I remove the ambiguity that causes confusion or inconsistent results, creating the documentation and training tools that allow teams to move fast without breaking things.
Research & Publications
TofuEval: Evaluating Hallucinations
The Routledge Companion to International Children’s Literature
Building a Standard from Scratch
The Challenge: At AWS Bedrock, Task Leads and I were managing a 250,000-response AI dataset. Vendors performed 3-pass annotations across 9 independent dimensions, but there was no guidance on how to provide quality feedback. Without a framework, feedback was inconsistent, and our quality baseline scored 47%.
The "Whiteboard" Solution: I built the infrastructure that didn't exist.
- The Audit Framework: I designed a mechanism to audit the feedback MLDLs were providing. I wrote a QA feedback guide with examples and optional phrasing templates, and defined the scoring rubrics and expectations for QA feedback. I also wrote Python scripts to automate the performance metrics and flag issues as soon as QA batches were processed.
- Defining "Good" Feedback: I moved the team away from generic reviews. I established a requirement that every piece of feedback had to be a specific critique or structured praise, backed by a direct guideline citation.
- Calibration & Coaching: I led the calibration sessions to ensure that 350+ people across global teams were finally speaking the same "quality language."
The Result: By creating a clear standard where there was none, we moved the quality needle from 47% to 96.7% in a single quarter. We didn't just hit a target; we built a scalable, high-precision loop for GenAI training.
"Helen walks into a problem with a whiteboard in her head."
"A rare ability to build processes from the ground up."
"Exceptionally reliable, meticulous, and deeply committed."
Real-World Systems: HOA Governance & Fiduciary Strategy
Managing a 448-unit residential community is a lot like managing a global data team: it requires transparency, structured processes, and the ability to translate complex rules into something everyone understands.
As Board Secretary, I oversee a multimillion-dollar budget and manage the community's official records. I recently spearheaded a complete rewrite of our community regulations to remove legal ambiguity and ensure our rules were fair, enforceable, and aligned with state law. I treat community governance as a system that needs to be both robust and human-centric.
Languages & Technical Skills
Languages
Linguistic Data Science
Analysis & Logic
Tools & Platforms
Python & NLP
Technical Enablement
Let’s Build Something Clear 🚀
I'm looking for work I believe in. If you're building something in mental health, healthcare, education, or community, something that tries to help people rather than just scale, that's where I want to put my energy.
Outside of work: rock climber, community organizer, devoted dog/cat mom, and the human behind a small cat business in Boulder, CO.
Find Me Here: