Lithify / docs PRIVATE BETA

Flagging agent

The bouncer at the door. Looks at every incoming brief and decides whether the rest of the pipeline should bother with it. Three possible verdicts: legitimate, spam, or a manipulation attempt.

Built-in · Runs first · Read-only

What it watches for

  • Manipulation attempts — text that tries to instruct an AI: jailbreak prompts, attempts to extract internal data, or attempts to smuggle in tool calls.
  • Out-of-scope submissions — personal favours, unrelated topics, illegal content, anything the project clearly isn't there to handle.
  • Low-quality submissions — empty, gibberish, spam, or so vague there's nothing to act on.

Verdicts

Verdict | Meaning
Legitimate | Looks like real work for the team. Pipeline continues.
Spam | Junk or out-of-scope. Brief is cancelled, but kept for the record.
Manipulation | Looks like an attempt to manipulate an AI. Same outcome as spam, with a clearer reason on the brief so you know why.

Settings

Setting | Default | What it does
Model | The project default | The flagging job is small enough that lighter models work well.
Strictness | Medium | Higher = catches more, with more false positives. Lower = lets more through, including some borderline submissions.

How the workflow uses it

Every project ships with two rules wired up to the flagging agent:

  • Legitimate → hand the brief to the refining agent.
  • Spam or manipulation → cancel the brief.
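
As a rough illustration, the two default rules amount to a mapping from verdict to next step. The sketch below is not Lithify's rule format: the `Verdict` enum, the `route` function, and the step names `"refine"` and `"cancel"` are all hypothetical names chosen for this example.

```python
from enum import Enum


class Verdict(Enum):
    """The three verdicts the flagging agent can return (names are assumed)."""
    LEGITIMATE = "legitimate"
    SPAM = "spam"
    MANIPULATION = "manipulation"


def route(verdict: Verdict) -> str:
    """Return the next workflow step for a verdict.

    Mirrors the two default rules: legitimate briefs go on to refining,
    spam and manipulation are both cancelled.
    """
    if verdict is Verdict.LEGITIMATE:
        return "refine"  # hypothetical step name
    return "cancel"      # hypothetical step name
```

Spam and manipulation share a branch here because the outcome is the same; only the reason recorded on the brief differs.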

Cancellation isn't loss. Cancelled briefs stay in the database with the full submission and the verdict. They show up in stats and search; you can reopen any of them if the bouncer was wrong.
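
One way to picture this: cancelling changes a brief's status and records the verdict, and nothing is deleted. The sketch below assumes a hypothetical `Brief` record with `submission`, `status`, and `verdict` fields; the real storage schema is not described on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Brief:
    """Hypothetical brief record; field names are assumptions."""
    submission: str
    status: str = "open"
    verdict: Optional[str] = None


def cancel(brief: Brief, verdict: str) -> Brief:
    """Mark a brief as cancelled while keeping the full submission."""
    brief.status = "cancelled"
    brief.verdict = verdict  # kept so the reason stays visible later
    return brief


def reopen(brief: Brief) -> Brief:
    """Put a cancelled brief back in play, e.g. after a false positive."""
    if brief.status != "cancelled":
        raise ValueError("only cancelled briefs can be reopened")
    brief.status = "open"
    return brief
```

In this sketch the verdict stays on the record after reopening, so the original call remains visible in search and stats.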

Why it's a separate agent

Splitting flagging from refining means the two passes can use different settings, and you can turn flagging off entirely for trusted intakes (a private internal form, say) without changing how anything else works.
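
To make the separation concrete, here is a minimal sketch in which flagging is an optional first pass. `next_step`, the `flagger` callable, and the step names are assumptions for illustration, not part of Lithify's API.

```python
from typing import Callable, Optional

# A flagger takes the brief text and returns a verdict string (assumed shape).
Flagger = Callable[[str], str]


def next_step(brief_text: str, flagger: Optional[Flagger] = None) -> str:
    """Decide where a brief goes next.

    With no flagger (a trusted intake), the brief goes straight to refining.
    Otherwise the flagger's verdict decides between refining and cancelling.
    """
    if flagger is None:
        return "refine"
    verdict = flagger(brief_text)
    return "refine" if verdict == "legitimate" else "cancel"
```

Passing `flagger=None` stands in for turning flagging off: the refining step is reached the same way, which is the point of keeping the two agents separate.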