Helen Satchwell

Linguist. Systems Architect. Problem Solver.

I've spent my career making complicated, high-stakes information actually usable by the people who need it.

At AWS, I helped build the quality systems behind AI tools where getting it wrong affects real people, not just metrics. That meant writing SOPs, designing training programs, building audit frameworks, and guiding teams through fast-moving, ambiguous work where clarity genuinely mattered.

My background is in linguistics, so I think about language carefully: how it works, who it serves, and where it quietly breaks down. That lens shows up in everything, from model safety to untangling why a process isn't working the way it should.

AI Integrity & Intent Evaluation 🤖

I help align human intent with machine logic.

I specialize in the "Human-in-the-loop" side of AI. From adversarial testing to PII filtering, I evaluate how models handle complex, multi-dimensional prompts. I don't just look for errors; I look for the patterns of failure to help engineers build safer, smarter systems.

Quality Architecture & Ops 📈

Turning messy data into structured systems.

I thrive in "day zero" environments where there is no manual. I have a track record of walking into high-volume workflows, identifying the gaps in logic, and building the audit systems and Python-driven pipelines needed to scale quality from the ground up.

Technical Enablement & Governance 🤝

Translating the "Technical" into the "Actionable."

I remove the ambiguity that causes confusion or inconsistent results, creating the documentation and training tools that allow teams to move fast without breaking things.

Research & Publications

TofuEval: Evaluating Hallucinations

The Goal: Measuring how often LLMs hallucinate when summarizing long-form conversations.
The Work: Provided the expert human annotations used to flag summaries that deviated from the actual facts of the dialogue.
The Impact: This work helped build a new benchmark for testing model consistency and accuracy, providing a standard for identifying factual inconsistencies.

View Paper on arXiv

The Routledge Companion to International Children’s Literature

The Goal: Making the historical and cultural development of Chilean children’s and young adult literature accessible to an English-speaking academic audience.
The Work: Translated Chapter 43 ("Development of literature for children and young people in Chile") by Manuel Peña Muñoz from Spanish to English.
The Impact: Preserved cultural nuances and historical context for a major international reference work.

View Publication

Building a Standard from Scratch

The Challenge: At AWS Bedrock, Task Leads and I were managing a 250,000-response AI dataset. Vendors performed 3-pass annotations across 9 independent dimensions, but there was no guidance on how to provide quality feedback. Without a framework, feedback was inconsistent, and our quality baseline scored 47%.

The "Whiteboard" Solution: I built the infrastructure that didn't exist.

The Audit Framework: I designed a mechanism to audit the feedback MLDLs were providing, along with a QA feedback guide that included examples, optional phrasing templates, and scoring rubrics and expectations for QA feedback. I also wrote Python scripts to automate the performance metrics and flag issues as soon as QA batches were processed.
Defining "Good" Feedback: I moved the team away from generic reviews. I established a requirement that every piece of feedback had to be a specific critique or structured praise, backed by a direct guideline citation.
Calibration & Coaching: I led the calibration sessions to ensure that 350+ people across global teams were finally speaking the same "quality language."
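
For illustration only, a check of the kind described above (specific feedback backed by a guideline citation, scored automatically per batch) could be sketched in Python as follows. This is a minimal sketch, not the scripts used at AWS: the citation format, the word-count threshold, the field names, and the sample records are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical citation format such as "Guideline 4.2"; the real format is not given here.
CITATION_PATTERN = re.compile(r"\bGuideline\s+\d+(\.\d+)*\b", re.IGNORECASE)
# Assumed minimum length used as a rough proxy for "specific" feedback.
MIN_WORDS = 8


@dataclass
class FeedbackItem:
    reviewer: str
    text: str


def meets_standard(item: FeedbackItem) -> bool:
    """True when the feedback cites a guideline and is longer than a generic one-liner."""
    has_citation = CITATION_PATTERN.search(item.text) is not None
    is_specific = len(item.text.split()) >= MIN_WORDS
    return has_citation and is_specific


def pass_rate(items: list[FeedbackItem]) -> float:
    """Percentage of feedback items in a batch that meet the standard."""
    if not items:
        return 0.0
    passed = sum(meets_standard(item) for item in items)
    return round(100 * passed / len(items), 1)


# Made-up sample batch, for demonstration only.
sample = [
    FeedbackItem("a", "Good job."),
    FeedbackItem("b", "The response omits the refusal required by Guideline 4.2 for medical dosage requests."),
    FeedbackItem("c", "Per Guideline 2.1, the tone rating should be neutral because the reply hedges twice."),
    FeedbackItem("d", "Needs work, see Guideline 3."),
]
print(pass_rate(sample))
```

In this sketch, generic praise and short notes fail the check even when they mention a guideline, which mirrors the requirement that feedback be both specific and cited. A real audit would still need human review, since a word count is only a crude stand-in for specificity.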

The Result: By creating a clear standard where there was none, we moved the quality needle from 47% to 96.7% in a single quarter. We didn't just hit a target; we built a scalable, high-precision loop for GenAI training.

"Helen walks into a problem with a whiteboard in her head."

"She never just flagged issues; she’d come back with two or three solutions already mapped out, risks included. She's the person on your team who makes everyone around her better without making it a big deal."

— Daisy Carlesso, Senior ML & GenAI Operations Manager, AWS

"A rare ability to build processes from the ground up."

"Helen’s most impactful contribution was designing a feedback mechanism from scratch that enabled highly precise quality feedback. She has a rare ability to create detailed procedures, identify gaps early, and iterate quickly until the process is robust enough to scale."

— Mickael Dahlen, Data Operations Manager, AWS

"Exceptionally reliable, meticulous, and deeply committed."

"Helen has a remarkable ability to translate complex processes into clear, well-structured documentation and operational guidelines. She led training sessions, mentored teammates, and helped colleagues build new capabilities."

— Elvira Magomedova, ML Data Operations Manager

Real-World Systems: HOA Governance & Fiduciary Strategy

Managing a 448-unit residential community is a lot like managing a global data team: it requires transparency, structured processes, and the ability to translate complex rules into something everyone understands.
As Board Secretary, I oversee a multimillion-dollar budget and manage the community's official records. I recently spearheaded a complete rewrite of our community regulations to remove legal ambiguity and ensure our rules were fair, enforceable, and aligned with state law. I treat community governance as a system that needs to be both robust and human-centric.

Languages & Technical Skills

Languages

English (Native)
Spanish (Fluent/Native)
Portuguese (Intermediate)

Linguistic Data Science

LLM & RAG Evaluation
Adversarial Testing & Red-Teaming
Guardrails Analysis & Framework Design
High-Acuity & Sensitive Data Governance
Intent Recognition & Vendor Calibration

Analysis & Logic

Multi-dimension Evaluation Models
IAA Metrics (Cohen's Kappa, F1)
Statistical Significance Testing (p < 0.05)
Sampling Methodologies

Tools & Platforms

AWS (S3, SageMaker) & SQL
JIRA & Wiki/Confluence
Appen & Advanced Excel

Python & NLP

Data Pipelines (JSON, CSV, TXT)
Pandas & NumPy (Beginner)
NLP Analysis (NLTK, spaCy) (Beginner)
statsmodels & sklearn (Beginner)

Technical Enablement

Technical Writing & SOP Development
Annotation Workflow Design
Data Transformation Scripting
Cross-functional Communication

Let’s Build Something Clear 🚀

I'm looking for work I believe in. If you're building something in mental health, healthcare, education, or community, something that tries to help people rather than just scale, that's where I want to put my energy.

Outside of work: rock climber, community organizer, devoted dog/cat mom, and the human behind a small cat business in Boulder, CO.

Find Me Here:

LinkedIn Profile
Email: helensatchwell@gmail.com
