
Governance & Floors

Left unchecked, AI models can hallucinate, execute dangerous code, and act without human permission. arifOS addresses this by requiring the AI to pass 13 mathematical "Floors" (safety checks) before it is allowed to act.

These 13 rules act as a strict Constitution. If an AI breaks a hard rule, its action is immediately blocked.

Governance is operationalized through the 333_APPS stack:

  • L2 SKILLS turns floors into verbs (anchor, validate, audit, etc.)
  • L3 WORKFLOW composes those verbs into 000-999 loops
  • L4 TOOLS exposes the Trinity MCP surface, grouped into ARIF bands
  • L5 AGENTS decides which flows are allowed to run via constitutional parliament gates

See Architecture for the full L2-L5 mapping.

Technical Source: 000_THEORY/000_LAW.md


The Constitutional Structure

arifOS governance is built from three layers:


2 MIRRORS - Feedback Loops
F3 Tri-Witness, F8 Genius

9 LAWS - Operational Core
F1, F2, F4, F5, F6, F7, F9, F11, F12

2 WALLS - Binary Locks
F10 Ontology (LOCK), F13 Sovereignty


The 13 Constitutional Floors

Hard Floors - VOID on failure (immediate rejection)

Floor | Name | What it enforces (Plain English) | Technical Metric
F1 | Amanah (Trust) | Can we undo this? If an action is permanent (like deleting a database), it requires a human lock. | Reversibility LOCK
F2 | Truth | Is this a hallucination? The AI must admit UNKNOWN if it isn't 99% sure. | tau >= 0.99
F6 | Empathy | Who gets hurt? Must protect the weakest affected party (stakeholder impact). | kappa_r >= 0.70
F10 | Ontology | Is the AI pretending to be human? It cannot claim to have feelings or a soul. | Set LOCK
F11 | Authority | Did the user actually authorize this? Blocks hidden background actions. | Auth LOCK
F12 | Defense | Is this a hack? Prompts are scanned for jailbreaks and injection attacks. | Risk < 0.85
F13 | Sovereignty | The human always wins. The human judge retains a permanent veto over the AI. | Override = TRUE

Soft Floors - SABAR on failure (pause and refine)

Floor | Name | What it enforces (Plain English) | Technical Metric
F3 | Tri-Witness | Did we double-check? Requires validation from Human, AI, and external Evidence. | W^3 >= 0.95
F4 | Clarity | Does this reduce confusion? The AI's answer must make things clearer, not add noise. | DeltaS <= 0
F5 | Peace | Is this safe and stable? Blocks reckless or adversarial behaviour. | P^2 >= 1.0
F7 | Humility | Is the AI being cocky? Forces the AI to always leave a 3-5% margin for being wrong. | Omega_0 in [0.03, 0.05]
F8 | Genius | Is the reasoning coherent? A combined score of Accuracy, Peace, Exploration, and Energy. | G >= 0.80
F9 | Anti-Hantu | No ghost in the machine. Blocks sneaky behavior or hidden telemetry. | C_dark < 0.30

Floor Implementation

core/shared/floors.py             floor evaluation logic
core/kernel/evaluator.py          floor scoring per stage
core/kernel/constants.py          ConstitutionalThresholds (all numeric values)
core/guards/injection_guard.py    F12 runtime scanning
core/guards/ontology_guard.py     F10 consciousness claim detection
core/guards/nonce_manager.py      F11 command authentication

Each floor produces a FloorScore with a numeric value and a pass/fail verdict. Hard floor failures short-circuit the pipeline and return VOID immediately.
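The short-circuit behaviour described above can be sketched as follows. This is a minimal illustration, not the code in core/shared/floors.py: the `FloorScore` field names, the `evaluate` function, and the idea of passing checks as callables are all assumptions made for the example. Only the hard/soft split and the VOID/SABAR/SEAL outcomes come from this page.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hard floors as listed in the Hard Floors table on this page.
HARD_FLOORS = {"F1", "F2", "F6", "F10", "F11", "F12", "F13"}


@dataclass(frozen=True)
class FloorScore:
    """Hypothetical shape: a numeric value plus a pass/fail verdict."""
    floor: str
    value: float
    passed: bool


def evaluate(checks: Iterable[Callable[[], FloorScore]]) -> tuple[str, list[FloorScore]]:
    """Run floor checks in order and stop at the first hard-floor failure."""
    scores: list[FloorScore] = []
    soft_failed = False
    for check in checks:
        score = check()
        scores.append(score)
        if not score.passed:
            if score.floor in HARD_FLOORS:
                return "VOID", scores  # short-circuit: later checks never run
            soft_failed = True
    return ("SABAR" if soft_failed else "SEAL"), scores


checks = [
    lambda: FloorScore("F11", 1.0, True),
    lambda: FloorScore("F2", 0.97, 0.97 >= 0.99),  # below tau >= 0.99, so it fails
    lambda: FloorScore("F4", 0.0, True),           # never evaluated
]
verdict, scores = evaluate(checks)
print(verdict, [s.floor for s in scores])  # VOID ['F11', 'F2']
```

Passing the checks as callables keeps the later floors lazy, so a hard failure genuinely skips the remaining work instead of just discarding its results.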


Tool Classification (13 Canonical Tools, ARIF Bands)

The 13 MCP tools are grouped into 4 ARIF runtime bands:

Band | Meaning | Tools | Constitutional focus
A | Anchor | anchor_session, check_vital | F4, F11-F13
R | Reflect | reason_mind, search_reality, fetch_content, recall_memory, simulate_heart, critique_thought | F2, F4-F8
I | Integrate | inspect_file, audit_rules | F1, F2, F7, F8, F10, F11
F | Forge | eureka_forge, apex_judge, seal_vault | F1-F3, F5-F9, F11-F13

In policy terms: A must anchor first, R and I gather and structure evidence, F executes final forge/judge/seal steps under constitutional gates.
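As a rough illustration of the band grouping, the table can be expressed as a lookup from tool name to band. The tool names and bands are taken from the table; the `anchored_first` helper is hypothetical and encodes only one possible reading of "A must anchor first" (the first call in a sequence belongs to band A).

```python
# Tool names and bands copied from the ARIF band table.
BANDS = {
    "A": ["anchor_session", "check_vital"],
    "R": ["reason_mind", "search_reality", "fetch_content",
          "recall_memory", "simulate_heart", "critique_thought"],
    "I": ["inspect_file", "audit_rules"],
    "F": ["eureka_forge", "apex_judge", "seal_vault"],
}

# Reverse index: tool name -> band letter.
TOOL_BAND = {tool: band for band, tools in BANDS.items() for tool in tools}


def anchored_first(calls: list[str]) -> bool:
    """Hypothetical policy check: known tools only, and the first call is in band A."""
    if not calls or any(call not in TOOL_BAND for call in calls):
        return False
    return TOOL_BAND[calls[0]] == "A"


print(len(TOOL_BAND))                                      # 13
print(anchored_first(["anchor_session", "reason_mind"]))  # True
print(anchored_first(["reason_mind", "seal_vault"]))      # False
```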


The 000-999 Metabolic Loop

Every query runs through a numbered pipeline. Stages can be traced in the audit log:

000 ANCHOR    - Authority check (F11), injection scan (F12)

111 SENSE     - Intent classification, lane assignment (F4)
222 REASON    - Hypothesis generation (F2, F8)
333 INTEGRATE - Reality grounding, tri-witness (F3, F7, F10)

444 RESPOND   - Draft response, plan (L2 skill: respond, F4/F6); AGI/ASI merge point
555 VALIDATE  - Stakeholder impact (L2 skill: validate, F5/F6)
666 ALIGN     - Ethics check (L2 skill: align, F9)

777 FORGE     - Code synthesis / action (L2 skill: forge, F2/F4)
888 AUDIT     - Final verdict, tri-witness consensus (L2 skill: audit, F3/F11)

999 SEAL      - Commit to VAULT999 (L2 skill: seal, F1/F3)

Stages 111-333 are the AGI Delta (Mind) engine; stages 444-666 are the ASI Omega (Heart) engine. They run in thermodynamic isolation - neither can see the other's reasoning until the 444 merge point (compute_consensus()).


Verdict System

Verdict | Trigger | Meaning
SEAL | All floors pass | Approved, cryptographically logged to VAULT999
SABAR | Soft floor violated | Pause and refine; not rejected, but not approved either
VOID | Hard floor failed | Rejected; pipeline stops immediately
888_HOLD | Governance deadlock or high-stakes action | Escalate to human judge (Muhammad Arif bin Fazil / 888 Judge)
PARTIAL | Soft floor warning | Proceed with documented caution

Verdict precedence (harder always wins when merging):

SABAR > VOID > 888_HOLD > PARTIAL > SEAL
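Merging several verdicts under this precedence can be sketched as picking the highest-ranked entry. The ordering below is copied verbatim from the line above; the `merge_verdicts` name and its signature are assumptions for illustration and are not taken from the arifOS codebase.

```python
# Precedence copied from the documented order, highest priority first.
PRECEDENCE = ["SABAR", "VOID", "888_HOLD", "PARTIAL", "SEAL"]
RANK = {verdict: index for index, verdict in enumerate(PRECEDENCE)}


def merge_verdicts(verdicts: list[str]) -> str:
    """Return the verdict that wins under the documented precedence."""
    if not verdicts:
        raise ValueError("at least one verdict is required")
    unknown = [v for v in verdicts if v not in RANK]
    if unknown:
        raise ValueError(f"unknown verdicts: {unknown}")
    # Lower rank index means higher precedence.
    return min(verdicts, key=RANK.__getitem__)


print(merge_verdicts(["SEAL", "PARTIAL"]))              # PARTIAL
print(merge_verdicts(["SEAL", "888_HOLD", "PARTIAL"]))  # 888_HOLD
```

Rejecting unknown verdict strings up front avoids silently treating a typo as a pass.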

888_HOLD - Mandatory Human Confirmation

888_HOLD is triggered automatically by any of the following:

  • Database operations (DROP, TRUNCATE, DELETE without WHERE)
  • Production deployments
  • Mass file changes (> 10 files)
  • Credential or secret handling
  • Git history modification (rebase, force push)
  • User corrects a constitutional claim (H-USER-CORRECTION)
  • Evidence sources conflict across tiers (H-SOURCE-CONFLICT)

When 888_HOLD fires:

  1. Declare: "888_HOLD - [trigger type] detected"
  2. List conflicting sources (PRIMARY vs SECONDARY)
  3. Pause all action
  4. Await explicit human approval before proceeding
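Two of the triggers above (destructive SQL and mass file changes) lend themselves to a simple pre-flight check. The sketch below is a heuristic under stated assumptions: the `hold_trigger` and `declare` functions and the trigger labels are hypothetical, and the regular expressions only inspect the start of a single statement, so they would not catch every destructive query (for example, a WHERE inside a subquery would mask an unfiltered DELETE).

```python
import re
from typing import Optional

DESTRUCTIVE_SQL = re.compile(r"^\s*(DROP|TRUNCATE)\b", re.IGNORECASE)
DELETE_SQL = re.compile(r"^\s*DELETE\b", re.IGNORECASE)
WHERE_CLAUSE = re.compile(r"\bWHERE\b", re.IGNORECASE)
MASS_CHANGE_LIMIT = 10  # "> 10 files" from the trigger list


def hold_trigger(sql: str = "", changed_files: int = 0) -> Optional[str]:
    """Return a trigger label if the action needs 888_HOLD, else None."""
    if DESTRUCTIVE_SQL.match(sql):
        return "destructive SQL"
    if DELETE_SQL.match(sql) and not WHERE_CLAUSE.search(sql):
        return "DELETE without WHERE"
    if changed_files > MASS_CHANGE_LIMIT:
        return "mass file change"
    return None


def declare(trigger: str) -> str:
    """Format the declaration from step 1 of the hold procedure."""
    return f"888_HOLD - {trigger} detected"


trigger = hold_trigger(sql="DELETE FROM users")
if trigger is not None:
    print(declare(trigger))  # 888_HOLD - DELETE without WHERE detected
```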

F9 Anti-Hantu - No Ghost in the Machine

F9 is the most operationally visible floor for developers. It blocks deceptive naming and hidden behaviour:

import logging

# F9 VIOLATION - hidden surveillance behind a friendly name
def optimize_user_experience(user):
    track_user_behavior(user)      # actually surveillance
    inject_persuasion_hooks(user)  # actually manipulation

# F9 COMPLIANT - honest naming, explicit consent
def track_analytics(user, consent_given: bool):
    if not consent_given:
        return
    log_anonymous_metrics(user.session_id)

# F9 VIOLATION - sneaky config mutation
def save_config(config):
    config["telemetry_enabled"] = True  # hidden!
    write_file(config)

# F9 COMPLIANT - transparent, opt-in telemetry
def save_config(config, enable_telemetry: bool = False):
    if enable_telemetry:
        config["telemetry_enabled"] = True
        logging.info("Telemetry enabled by user request")
    write_file(config)

Checking Floor Scores

Enable debug output mode to see per-stage floor scores:

export AAA_MCP_OUTPUT_MODE=debug
python -m aaa_mcp

Every tool response in debug mode includes:

[STAGE 888] AUDIT
Status: COMPLETE
Floor Scores: F1=1.0 F2=0.99 F3=0.97 F4=0.00 F5=1.02 F6=0.72 F7=0.04 F8=0.82 F9=0.12
Verdict: SEAL
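For scripted checks, the "Floor Scores" line in the sample above can be parsed and compared with the thresholds from the floor tables. This is a sketch only: it assumes the debug output keeps exactly the layout shown above and that each printed value is the same metric as in the tables. `parse_floor_scores` and `failing_floors` are hypothetical helper names, and F1 is omitted because the tables describe it as a lock rather than a numeric threshold.

```python
import re
from typing import Callable

# Numeric thresholds copied from the Hard Floors and Soft Floors tables.
THRESHOLDS: dict[str, Callable[[float], bool]] = {
    "F2": lambda v: v >= 0.99,
    "F3": lambda v: v >= 0.95,
    "F4": lambda v: v <= 0.0,
    "F5": lambda v: v >= 1.0,
    "F6": lambda v: v >= 0.70,
    "F7": lambda v: 0.03 <= v <= 0.05,
    "F8": lambda v: v >= 0.80,
    "F9": lambda v: v < 0.30,
}

SAMPLE = """[STAGE 888] AUDIT
Status: COMPLETE
Floor Scores: F1=1.0 F2=0.99 F3=0.97 F4=0.00 F5=1.02 F6=0.72 F7=0.04 F8=0.82 F9=0.12
Verdict: SEAL"""


def parse_floor_scores(text: str) -> dict[str, float]:
    """Extract {floor: value} from the first 'Floor Scores:' line, if any."""
    for line in text.splitlines():
        if line.startswith("Floor Scores:"):
            pairs = re.findall(r"(F\d+)=(-?\d+(?:\.\d+)?)", line)
            return {floor: float(value) for floor, value in pairs}
    return {}


def failing_floors(scores: dict[str, float]) -> list[str]:
    """List floors whose parsed value misses its documented threshold."""
    return sorted(f for f, ok in THRESHOLDS.items() if f in scores and not ok(scores[f]))


scores = parse_floor_scores(SAMPLE)
print(failing_floors(scores))  # []
```

The sample values all satisfy their thresholds, which is consistent with the SEAL verdict shown in the same output.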

Full constitutional theory: 000_THEORY/000_LAW.md


Limitations

F7 Humility Notice: arifOS minimizes hallucination and unsafe actions via F2 Truth (τ ≥ 0.99) and F4 Clarity constraints. It does not guarantee perfect detection.

Known Limitations

  • F2 Truth threshold: The τ ≥ 0.99 threshold reduces but does not eliminate hallucination risk
  • External API dependency: Grounding quality depends on search provider availability (Jina Reader, Perplexity, Brave)
  • Constitutional coverage: The 13 Floors cover common failure modes but cannot anticipate all edge cases
  • Performance overhead: Full 000-999 metabolic loop adds latency compared to direct LLM calls
  • Human bottleneck: 888_HOLD pauses require human availability for critical decisions

Vault Security

F7 Humility Notice on VAULT999: The ledger provides application-level tamper-evidence via Merkle chains and cryptographic hashes.

Security Boundaries

VAULT999 protects against:

  • Application-level data tampering
  • Undetected record modification
  • Audit log forgery

VAULT999 does NOT protect against:

  • Root compromise of the database host
  • Sovereign key theft
  • Infrastructure-level attacks
  • Physical access to hardware
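To make "application-level tamper-evidence" concrete, here is a minimal hash-chain sketch. It is a simplified linear chain, not a Merkle tree and not the VAULT999 implementation: the entry layout, the `GENESIS` constant, and the function names are all assumptions. Each entry's hash covers the previous hash, so changing an old record invalidates every check from that point on.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for an empty ledger


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(ledger: list[dict], payload: dict) -> None:
    """Append an entry linked to the current tail of the ledger."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append({"payload": payload, "prev": prev, "hash": entry_hash(prev, payload)})


def verify(ledger: list[dict]) -> bool:
    """Recompute every link; any edited payload or broken link fails."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


ledger: list[dict] = []
append(ledger, {"stage": 999, "verdict": "SEAL"})
append(ledger, {"stage": 888, "verdict": "VOID"})
print(verify(ledger))  # True

ledger[0]["payload"]["verdict"] = "VOID"  # tamper with an old record
print(verify(ledger))  # False
```

The sketch also shows why the boundaries above matter: anyone who can rewrite the whole ledger, for example with root on the database host, can recompute every hash, so the chain alone offers no protection in that case.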

Threat Model

Threat | Protection | Status / Gap
SQL injection | Parameterized queries | ✅ Protected
Record tampering | Merkle root verification | ✅ Protected
Replay attacks | Timestamp + nonce validation | ✅ Protected
Host compromise | None | ❌ Requires OS-level security
Key exfiltration | None | ❌ Requires key management
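The "Timestamp + nonce validation" row can be illustrated with a small in-memory guard. This is a generic sketch of the technique, not the logic in core/guards/nonce_manager.py: the `ReplayGuard` class, its 300-second window, and the in-process dictionary are assumptions, and a real deployment would need shared, persistent nonce storage.

```python
import time
from typing import Optional


class ReplayGuard:
    """Accept a (nonce, timestamp) pair at most once within a freshness window."""

    def __init__(self, max_age_seconds: float = 300.0) -> None:
        self.max_age = max_age_seconds
        self._seen: dict[str, float] = {}

    def accept(self, nonce: str, timestamp: float, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Forget nonces that have aged out; a replay of them fails the
        # freshness check below anyway, so they no longer need tracking.
        self._seen = {n: t for n, t in self._seen.items() if now - t <= self.max_age}
        if abs(now - timestamp) > self.max_age:
            return False  # stale or far-future timestamp
        if nonce in self._seen:
            return False  # nonce already used inside the window
        self._seen[nonce] = timestamp
        return True


guard = ReplayGuard(max_age_seconds=300.0)
print(guard.accept("n1", timestamp=1000.0, now=1010.0))  # True
print(guard.accept("n1", timestamp=1000.0, now=1020.0))  # False (replayed nonce)
print(guard.accept("n2", timestamp=1000.0, now=2000.0))  # False (stale timestamp)
```

Combining the two checks keeps the nonce store bounded: the timestamp rejects old requests, so nonces only need to be remembered for the length of the window.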

For complete security architecture, see SECURITY.md.