Learn by Directing AI
Unit 6

Implement RBAC and governance

Step 1: Identify the PII

Factory 2's quality testing data includes a quality_inspector column with full names. Ahmed Al-Sabah, Sara Al-Mutairi, Khalid Al-Enezi -- these are employee names. Personal data.

Not everyone at Al-Bina'a needs to see these names. The quality team needs them for their work. The CFO's reporting doesn't. The analysts running cost attribution queries don't. But right now, if you give someone access to the data, they see everything.

In the old on-premise database, everyone could see everything. Fatimah hasn't thought about who should see what because the question never came up. In a cloud system with controlled access, it's a decision you need to make.
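A first pass at the inventory can be as simple as scanning column names for fragments that tend to indicate personal data. The sketch below is only a starting point: the name fragments are assumptions, and every column other than quality_inspector is made up for illustration. A name-based scan will miss personal data hiding in free-text columns, so treat its output as a list of candidates to review by hand.

```python
import re

# Hypothetical name fragments that often indicate personal data; tune for your schema.
PII_NAME_PATTERN = re.compile(r"(inspector|employee|name|email|phone)", re.IGNORECASE)

def flag_pii_columns(columns_by_table):
    """Return {table: [column, ...]} for columns whose names look like personal data."""
    flagged = {}
    for table, columns in columns_by_table.items():
        hits = [c for c in columns if PII_NAME_PATTERN.search(c)]
        if hits:
            flagged[table] = hits
    return flagged

# Only quality_inspector comes from the lesson; the other columns are invented.
schema = {
    "stg_factory2_deliveries": ["delivery_id", "batch_id", "quality_inspector", "test_result"],
    "fct_cost_attribution": ["factory_id", "cost_center", "total_cost"],
}
candidates = flag_pii_columns(schema)
print(candidates)  # {'stg_factory2_deliveries': ['quality_inspector']}
```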

Step 2: Design the RBAC model

Design roles based on who needs what:

Role          | Purpose                   | Access scope
data_engineer | Development and debugging | Full access to all layers including raw/staging
analyst       | Reporting and analysis    | Mart-layer only. Inspector names masked or hidden.
quality_team  | Quality control work      | Quality data with inspector names visible
cfo_reporting | Cost attribution reports  | Read-only access to cost attribution views only

Direct AI to create these roles and their grants. AI commonly generates RBAC where a broad dataset-level grant is applied first and a column restriction is attempted afterward. The problem: in many warehouses permissions are additive, so the broad grant wins. The analyst can still see unmasked PII because the dataset-level permission covers the very column the restriction was meant to hide.
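The precedence problem is easier to see in a toy model. The sketch below assumes a purely additive permission system, where a REVOKE only removes an identical earlier GRANT; your warehouse may behave differently, so check its documentation. The dataset, table, and view names ("marts", "quality_results", "analyst_deliveries") are hypothetical.

```python
def effective_allows(statements):
    """Replay GRANT/REVOKE in order. A REVOKE only removes an identical earlier GRANT."""
    allows = set()
    for action, role, scope in statements:
        if action == "GRANT":
            allows.add((role, scope))
        elif action == "REVOKE":
            allows.discard((role, scope))
    return allows

def can_read(role, column_path, allows):
    """True if any grant for the role covers the column (exact match or dotted prefix)."""
    return any(
        r == role and (column_path == scope or column_path.startswith(scope + "."))
        for r, scope in allows
    )

# Hypothetical dataset/table names; only quality_inspector comes from the lesson.
PII = "marts.quality_results.quality_inspector"

broad_first = effective_allows([
    ("GRANT", "analyst", "marts"),   # broad dataset-level grant
    ("REVOKE", "analyst", PII),      # attempted column restriction: removes nothing
])
view_only = effective_allows([
    ("GRANT", "analyst", "marts.analyst_deliveries"),  # a view without the name column
])

print(can_read("analyst", PII, broad_first))  # True: the restriction had no effect
print(can_read("analyst", PII, view_only))    # False: nothing grants the raw column
```

In this model the second pattern is safe because the analyst never receives a grant that covers the raw column in the first place.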

Step 3: Create restricted views

Instead of trying to make one table serve every role, create views designed for each consumer:

  • A cost attribution view for the CFO that includes only the aggregated numbers
  • An analyst view of delivery data with inspector names hashed (SHA-256) or excluded
  • A quality team view with full inspector visibility

Restricted views serve two purposes simultaneously. They limit data exposure (governance) and they limit query scope (cost). An analyst who can only access a pre-aggregated mart view can't accidentally run a SELECT * on the raw staging table -- which protects PII and prevents expensive full-table scans.
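Here is a minimal sketch of the three views, using an in-memory SQLite database as a stand-in for the warehouse. The view names and the test_result and delivery_cost columns are assumptions made for the demo. SQLite has no built-in SHA-256, so the sketch registers a Python function; a real warehouse would use its own hash function.

```python
import hashlib
import sqlite3

def sha256_hex(value):
    """Hash a name so rows can be grouped by inspector without showing who it is."""
    if value is None:
        return None
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
# Let views call the Python function even on SQLite builds that default
# trusted_schema to OFF (older SQLite versions ignore unknown pragmas).
conn.execute("PRAGMA trusted_schema = ON")
conn.create_function("sha256_hex", 1, sha256_hex)

conn.executescript("""
CREATE TABLE stg_factory2_deliveries (
    delivery_id INTEGER PRIMARY KEY,
    quality_inspector TEXT,
    test_result TEXT,     -- hypothetical column
    delivery_cost REAL    -- hypothetical column
);
INSERT INTO stg_factory2_deliveries VALUES
    (1, 'Ahmed Al-Sabah', 'pass', 1200.0),
    (2, 'Sara Al-Mutairi', 'fail', 950.0),
    (3, 'Ahmed Al-Sabah', 'pass', 400.0);

-- Analyst view: inspector identity replaced by a hash.
CREATE VIEW analyst_deliveries AS
SELECT delivery_id, sha256_hex(quality_inspector) AS inspector_hash,
       test_result, delivery_cost
FROM stg_factory2_deliveries;

-- Quality team view: full inspector visibility.
CREATE VIEW quality_deliveries AS
SELECT delivery_id, quality_inspector, test_result
FROM stg_factory2_deliveries;

-- CFO view: aggregated numbers only.
CREATE VIEW cfo_cost_summary AS
SELECT COUNT(*) AS deliveries, SUM(delivery_cost) AS total_cost
FROM stg_factory2_deliveries;
""")

analyst_rows = conn.execute(
    "SELECT * FROM analyst_deliveries ORDER BY delivery_id"
).fetchall()
cfo_row = conn.execute("SELECT * FROM cfo_cost_summary").fetchone()
print(cfo_row)  # (3, 2550.0)
```

The hash lets an analyst count deliveries per inspector without reading a name. With only a handful of inspectors, though, an unsalted hash can be reversed by hashing each known name and comparing, so excluding the column is the stronger option when analysts don't need per-inspector grouping.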

Step 4: Test effective permissions

This is where most RBAC implementations fail. The GRANT statements look correct. The role definitions make sense on paper. But the only way to know what a role can actually see is to query as that role.

Query as the analyst role:

SELECT * FROM fct_cost_attribution LIMIT 5;

Can they see cost data? They should.

SELECT * FROM stg_factory2_deliveries LIMIT 5;

Can they see staging data with inspector names? They should not.

Test every role against every data layer. Verify what they can actually see, not what the GRANT statements say they should see.
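One way to make "every role against every layer" repeatable is to write the intended outcomes down as a matrix and probe each cell. The sketch below assumes a can_query(role, obj) probe that you would implement yourself by running a small SELECT with that role's credentials. Here it is replaced by a stub that simulates one leak, so the example runs on its own. The expected values are read off the role table and are themselves assumptions about your final design.

```python
# Intended access per (role, object), taken from the role design.
EXPECTED = {
    ("analyst", "fct_cost_attribution"): True,
    ("analyst", "stg_factory2_deliveries"): False,
    ("cfo_reporting", "stg_factory2_deliveries"): False,
    ("data_engineer", "stg_factory2_deliveries"): True,
}

def check_matrix(expected, can_query):
    """Probe each (role, object) pair and return the pairs that differ from the design."""
    mismatches = []
    for (role, obj), should_read in sorted(expected.items()):
        actual = can_query(role, obj)
        if actual != should_read:
            mismatches.append((role, obj, should_read, actual))
    return mismatches

# Stand-in for a real probe. In practice this would run
# "SELECT * FROM <object> LIMIT 5" with the role's credentials and
# return False when the warehouse rejects the query.
def fake_can_query(role, obj):
    misconfigured = {("analyst", "stg_factory2_deliveries")}  # simulated leak
    return EXPECTED[(role, obj)] or (role, obj) in misconfigured

for role, obj, wanted, got in check_matrix(EXPECTED, fake_can_query):
    print(f"{role} on {obj}: expected {wanted}, got {got}")
# analyst on stg_factory2_deliveries: expected False, got True
```

Save the output of each run. It doubles as the testing evidence for your RBAC documentation.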

Step 5: Verify grant precedence

Specifically test: does a broad dataset grant override a column restriction? Direct AI to check whether any role has broader access than intended.

If you gave the analyst role SELECT on the dataset (to access mart tables) and then tried to restrict the quality_inspector column, the dataset-level grant may take precedence. The GRANT statements read correctly, but the analyst can see inspector names.

The fix is usually restructuring the grants -- using view-level permissions instead of column-level restrictions, or ensuring narrow grants don't get overridden by broad ones.
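Alongside querying as each role, you can audit the grant list itself for grants that reach further than intended. This is a rough sketch, assuming grants are available as (role, scope) pairs with dotted scopes such as dataset.table.column. The dataset and table names and the set of roles allowed to see the column are hypothetical.

```python
# Sensitive columns mapped to the roles that are supposed to see them.
# "quality" and "inspections" are hypothetical names.
SENSITIVE = {
    "quality.inspections.quality_inspector": {"quality_team", "data_engineer"},
}

def covers(scope, column_path):
    """A grant scope covers a column if it equals the path or is a dotted prefix of it."""
    return column_path == scope or column_path.startswith(scope + ".")

def overbroad_grants(grants, sensitive):
    """Return (role, scope, column) for every grant that exposes a column to the wrong role."""
    findings = []
    for role, scope in grants:
        for column_path, allowed_roles in sensitive.items():
            if role not in allowed_roles and covers(scope, column_path):
                findings.append((role, scope, column_path))
    return findings

grants = [
    ("analyst", "quality"),                    # dataset-level: too broad
    ("quality_team", "quality.inspections"),   # intended
    ("cfo_reporting", "finance.cost_summary"), # unrelated dataset
]
print(overbroad_grants(grants, SENSITIVE))
# [('analyst', 'quality', 'quality.inspections.quality_inspector')]
```

A static audit like this only catches what the grant list shows. It complements querying as the role; it does not replace it.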

Step 6: Document the RBAC design

Every RBAC decision that isn't documented becomes governance debt. Document:

  • Each role's purpose and who holds it
  • Each permission grant and its rationale
  • The testing evidence (query results showing what each role can and cannot see)
  • Any "temporary" access that should be reviewed later

Missing documentation doesn't produce error messages. The gap surfaces months later, when a compliance audit asks "why does this role have access to this dataset?" and no one can answer.
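If you keep the documentation as structured records instead of free text, gaps become something you can check for. The sketch below is one possible shape, not a standard: the field names, the evidence path, and the dates are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class GrantRecord:
    role: str
    target: str
    rationale: str
    evidence: str = ""                # e.g. path to saved query results
    review_by: Optional[date] = None  # set only for "temporary" access

def documentation_gaps(records, today):
    """Return readable problems: missing rationale or evidence, or reviews that are due."""
    problems = []
    for r in records:
        if not r.rationale.strip():
            problems.append(f"{r.role} -> {r.target}: no rationale")
        if not r.evidence.strip():
            problems.append(f"{r.role} -> {r.target}: no testing evidence")
        if r.review_by is not None and r.review_by <= today:
            problems.append(f"{r.role} -> {r.target}: temporary access due for review")
    return problems

# Hypothetical records; paths and dates are placeholders.
records = [
    GrantRecord("cfo_reporting", "cost attribution views",
                "Board reporting needs aggregated cost only",
                evidence="evidence/cfo_reporting.txt"),
    GrantRecord("analyst", "stg_factory2_deliveries", "",
                review_by=date(2024, 1, 31)),
]
for problem in documentation_gaps(records, today=date(2024, 2, 1)):
    print(problem)
```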

Step 7: Check dbt docs for PII

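Run dbt docs generate. The generated documentation is built from the whole project, staging models included: column names and descriptions, compiled SQL, and test definitions all end up in the output. If unmasked inspector names appear anywhere in that material (for example in a test's list of accepted values or a hard-coded filter), those names appear in the dbt docs.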

AI often sets up masking in mart models without checking whether PII leaks through generated documentation, debug logs, or staging tables visible in the docs UI. This is a governance surface that's easy to miss.

Verify: does the dbt docs output expose any inspector names? If so, configure dbt to exclude staging columns with PII from the documentation.
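The verification can be scripted as a plain text search over the generated files. By default dbt writes its docs artifacts (manifest.json, catalog.json, index.html) to the project's target/ directory. The sketch below searches files with those extensions for a list of known names. The demo files it creates are fabricated stand-ins, not real dbt output; in a real project you would point the function at "target".

```python
import tempfile
from pathlib import Path

def find_pii_in_docs(docs_dir, known_names, suffixes=(".json", ".html")):
    """Return sorted (file name, name) pairs for every known name found in the docs files."""
    hits = set()
    for path in Path(docs_dir).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for name in known_names:
                if name in text:
                    hits.add((path.name, name))
    return sorted(hits)

INSPECTOR_NAMES = ["Ahmed Al-Sabah", "Sara Al-Mutairi", "Khalid Al-Enezi"]

# Demo against a throwaway directory with made-up file contents.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "manifest.json").write_text(
        '{"note": "reviewed by Sara Al-Mutairi"}', encoding="utf-8"
    )
    (Path(tmp) / "catalog.json").write_text(
        '{"columns": ["quality_inspector"]}', encoding="utf-8"
    )
    demo_hits = find_pii_in_docs(tmp, INSPECTOR_NAMES)

print(demo_hits)  # [('manifest.json', 'Sara Al-Mutairi')]
```

An exact-match search only finds the names you already know about, so pair it with a manual look through the docs UI for the staging models.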

Step 8: Update Fatimah on RBAC

Explain the access control design to Fatimah. She hasn't thought about this, but she'll understand it when framed in terms she cares about: the board's financial data is accessible only to appropriate roles, inspector names are restricted to the quality team, and no one can accidentally run an expensive query on raw data.

Check: Query as the analyst role and confirm: (1) they can read mart-layer cost attribution, (2) they cannot see unmasked inspector names from Factory 2's quality data.