AgentSkillsCN

naming-tagging-standards

强制执行企业级命名规范(蛇形命名法、表前缀、经批准的缩写)、表/列/TVF/指标视图的双重用途注释格式,以及基于配置的标签标准。扫描上下文/,以识别客户标签标准,无论采用何种格式(YAML、CSV、Markdown、JSON、TXT);在未提供明确标准时,自动推导出有意义的智能默认值。利用Databricks数据分类类.*系统管控的标签,为PII(始终根据列名+客户声明推断)贴上标签。确保项目所有资产的标签一致性。可通过“命名”“注释”“COMMENT”“标签”“PII”“成本中心”“蛇形命名法”“dim_”“fact_”“受管标签”“预算政策”“class.*”“数据分类”等触发器来启动该操作。

SKILL.md
--- frontmatter
name: naming-tagging-standards
description: Enforces enterprise naming conventions (snake_case, table prefixes, approved abbreviations), dual-purpose COMMENT formats for tables/columns/TVFs/metric views, and config-aware tagging standards. Scans context/ for customer tagging standards in any format (YAML, CSV, Markdown, JSON, TXT); derives meaningful smart defaults when none supplied. Uses Databricks Data Classification class.* system governed tags for PII (always inferred from column names + customer declarations). Ensures tag consistency across all project assets. Triggers on "naming", "comment", "COMMENT", "tag", "PII", "cost_center", "snake_case", "dim_", "fact_", "governed tag", "budget policy", "class.*", "data classification".
metadata:
  author: prashanth subrahmanyam
  version: "3.0"
  domain: common
  role: shared
  used_by_stages: [1, 2, 3, 4, 5, 6, 7, 8, 9]
  called_by:
    - gold/00-gold-layer-design
    - bronze/00-bronze-layer-setup
    - silver/00-silver-layer-setup
    - gold/01-gold-layer-setup
    - planning/00-project-planning
    - semantic-layer/00-semantic-layer-setup
    - monitoring/00-observability-setup
    - ml/00-ml-pipeline-setup
    - genai-agents/00-genai-agents-setup
  last_verified: "2026-02-07"
  volatility: low
  upstream_sources: []  # Internal naming convention

Naming, Comment & Tagging Standards

Enterprise-wide standards for Databricks asset naming, SQL comments, and cost/governance tagging.

Golden Rules

IDRuleSeverity
NC-01All object names use snake_caseCritical
NC-02Tables prefixed by layer (bronze_, silver_) or entity type (dim_, fact_)Critical
NC-03No abbreviations except approved listRequired
CM-01SQL block comments (/* */) for all DDL operationsRequired
CM-02Table COMMENT follows dual-purpose formatCritical
CM-03Column COMMENT required for all columnsCritical
CM-04TVF COMMENT follows v3.0 structured formatCritical
TG-00Scan context/ for customer tagging standards (any format); derive smart defaults if absentCritical
TG-01All workflows must have required tags (team, cost_center, environment)Critical
TG-02Use Databricks Data Classification class.* governed tags for PII columnsCritical
TG-03Use UC governed tags for catalogs, schemas, and tablesCritical
TG-04Serverless resources must use approved budget policiesCritical

Tagging Configuration Resolution

Before applying ANY tags, the agent MUST follow this resolution order:

Step 1: Scan context/ for Customer-Supplied Tagging Standards

Search the context/ directory for any file containing tagging standards. The customer supplies their standards in whatever format they already have — the agent adapts to them, not the other way around.

FormatDetection PatternHow to Parse
YAML/JSON*tag*, *tagging*, *governance*, *standards*Parse key-value structure directly
CSVFiles with columns like tag_name, tag_value, scope, object_typeParse rows as tag definitions
MarkdownFiles with tagging tables or bullet listsExtract tag names and values from tables/lists
Plain text*tag*, *tagging* with .txtParse key: value or key = value lines
Free-formAny file mentioning tags, cost center, PII, classificationExtract intent — the agent understands natural language

Parsing rules:

  • Look for recognizable tag keys: team, cost_center, business_unit, data_owner, data_classification, project, pii, class.*, or any custom keys
  • Normalize all keys to snake_case
  • Accept any reasonable structure — the agent interprets intent, not rigid schema
  • If multiple files match, merge them (later file wins on conflicts)

Step 1b: Normalize to Canonical Internal Schema

After parsing the customer file, the agent normalizes all extracted values into the canonical schema defined in references/canonical-tagging-schema.yaml. This schema is never shown to the user — it is the agent's internal working format.

Normalization examples (any input → canonical schema):

Customer Input (any format)Canonical Field
team = platform-engineeringworkflow_tags.team
CSV row: workflow,all,cost_center,CC-9901workflow_tags.cost_center AND governed_tags.catalog.cost_center
MD bullet: - dim_guest.email → class.email_addresspii_columns.dim_guest[0].class_tag
Prose: "Our data owner is data-platform@acme.com"governed_tags.schema.data_owner
Table: `data_classification

The agent fills any fields not provided by the customer using smart defaults (Step 2 below).

Step 2: Derive Smart Defaults (No Customer Config Found, or Gaps in Config)

If no tagging file is found in context/, OR the customer file has gaps (e.g., they specify team and cost_center but not data_owner), the agent fills missing fields with meaningful values derived from project context:

TagDerivation StrategyExample (Wanderbricks)
team"data-engineering" (universal default — most common team name)data-engineering
cost_centerCannot infer — use "REVIEW-REQUIRED" + # TODO commentREVIEW-REQUIRED
environmentAlways ${bundle.target} (non-negotiable)dev, staging, prod
projectExtract from schema CSV schema name, catalog name, or repo folderwanderbricks
business_unitInfer domain from schema/table names (e.g., hospitality, retail, sales)hospitality
data_owner"{project}-data-team" patternwanderbricks-data-team
data_classification"internal" (safe default for all tables)internal
PII columnsInfer from column names using pattern matching (see PII Inference below)class.email_address

Key difference from v1: Only cost_center gets a placeholder — all other tags get real derived values.

Step 3: PII Column Inference (Always Active)

Whether or not customer config is supplied, the agent always scans column names from the schema CSV (or YAML designs) and applies Databricks Data Classification class.* system governed tags using column name pattern matching:

Column Name PatternDatabricks Classification TagExamples
*email*class.email_addressemail, customer_email, contact_email
*phone*, *mobile*, *cell*class.phone_numberphone_number, mobile_phone, work_phone
*_name (person context: first_name, last_name, guest_name, host_name, customer_name)class.namefirst_name, last_name, guest_name
*address*, *street* (physical, not email)class.locationstreet_address, mailing_address
*ssn*, *social_security*, *tax_id*class.us_ssnssn, social_security_number
*passport*class.us_passportpassport_number
*driver_license*, *license_number*class.us_driver_licensedriver_license_number
*credit_card*, *card_number*, *cc_number*class.credit_cardcredit_card_number
*ip_address*, *ip_addr*class.ip_addressclient_ip_address
*iban*class.iban_codeiban_code, bank_iban

Rules for PII inference:

  • Always apply inferred PII tags regardless of whether customer config exists
  • If customer config explicitly declares PII columns, those take precedence over inference (customer may override or suppress)
  • Use class.* system governed tags (not custom pii/pii_type tags) for compatibility with Databricks Data Classification
  • Customer can suppress a false positive by adding exclude_pii: [column_name] in their config
  • Also recommend enabling Databricks Data Classification on the catalog for automated AI-powered scanning beyond column name patterns

Step 4: Environment Tag (Always Automatic)

The environment tag is always ${bundle.target} — never hardcoded. This is non-negotiable regardless of config.

Step 5: Consistency Enforcement

All tags must be consistent across all assets in the project:

RuleEnforcement
Same team on every job, pipeline, and dashboardAgent verifies before generating
Same cost_center on workflows AND catalog governed tagsMust match
Same project across all Asset Bundle resourcesDerived once, used everywhere
data_classification must be set on every Gold tableDefault "internal" if not specified
PII class.* tags must be consistent across all layersSame column tagged in Bronze, Silver, and Gold

Internal Reference

The canonical schema the agent normalizes to is defined in references/canonical-tagging-schema.yaml. This file is an agent-internal reference — the user never sees or fills it in. The user simply drops their tagging standards into context/ in whatever format they prefer.


Part 1: Naming Conventions

Object Naming Patterns

ObjectFormatExample
Catalog{env}_{domain}_catalogprod_sales_catalog
Schema{layer} or {domain}_{layer}bronze, sales_gold
Table{prefix}_{entity}dim_customer, fact_orders
Column{descriptive_name}customer_id, order_date
Constraint{type}_{table}[_{column}]pk_dim_customer, fk_orders_customer
Functionget_{entity}_{action}get_daily_sales
Job[${bundle.target}] {Domain} - {Action} {Entity}[dev] Sales - Merge Orders
Pipeline[${bundle.target}] {Layer} {Domain} Pipeline[dev] Silver Sales Pipeline

Never use: camelCase, PascalCase, SCREAMING_CASE, kebab-case, or spaces.

Table Prefixes

Layer/TypePrefixExample
Bronzebronze_bronze_raw_orders
Silversilver_silver_orders
Gold Dimensiondim_dim_customer
Gold Factfact_fact_sales
Bridgebridge_bridge_customer_product
Aggregateagg_agg_daily_sales
Stagingstg_stg_customer_updates
Quarantine_quarantine suffixsilver_orders_quarantine
History_history suffixdim_customer_history

Approved Abbreviations (Use ONLY These)

id, ts, dt, amt, qty, pct, num, cnt, avg, min, max, pk, fk, dim, fact, agg, stg

Forbidden: cust, prod, inv, trans, ord, emp -- spell these out.

Column Patterns

PatternFormatExample
Primary Key{entity}_id or {entity}_keycustomer_id, store_key
Foreign KeySame as referenced PKcustomer_id
Business Key{entity}_{identifier}customer_number
Surrogate Key{entity}_keycustomer_key (MD5)
Date{event}_dateorder_date
Timestamp{event}_timestamp or {event}_tscreated_ts
Booleanis_{condition} or has_{thing}is_active, has_discount
Amount{type}_amount or {type}_amttotal_amount
Count{thing}_count or {thing}_cntorder_count
SCD2effective_from, effective_to, is_currentStandard names

For detailed naming examples, see references/naming-conventions.md.


Part 2: Comment Conventions

CM-01: SQL Block Comments

All DDL must include /* */ block comments with purpose, grain, and source:

sql
/*
 * Table: dim_customer
 * Layer: Gold | Domain: Sales
 * Grain: One row per customer per effective period.
 * Source: Silver layer silver_customers table.
 */

CM-02: Table COMMENT (Dual-Purpose Format)

code
[One-line description]. Business: [use cases, consumers]. Technical: [grain, source, update frequency].

Examples:

sql
-- Gold Dimension
COMMENT ON TABLE gold.dim_customer IS
'Customer dimension with SCD Type 2 history tracking. Business: Primary customer reference for segmentation and cohort analysis. Technical: MD5 surrogate key, is_current flag, daily merge from Silver.';

-- Gold Fact
COMMENT ON TABLE gold.fact_orders IS
'Daily order facts at customer-product-day grain. Business: Primary source for revenue reporting and sales dashboards. Technical: Composite PK, incremental merge, CDF enabled.';

CM-03: Column COMMENT (Dual-Purpose Format)

code
[Brief definition]. Business: [how it's used]. Technical: [data type notes, source, calculation].
sql
customer_key STRING NOT NULL
COMMENT 'Surrogate key for SCD Type 2 versioning. Business: Used for joining fact tables. Technical: MD5 hash of customer_id + effective_from.';

total_amount DECIMAL(18,2)
COMMENT 'Total order amount after discounts in USD. Business: Primary revenue metric. Technical: SUM(quantity * unit_price * (1 - discount_pct)).';

CM-04: TVF COMMENT (v3.0 Structured Format)

sql
COMMENT '
• PURPOSE: [One-line description of what the function returns].
• BEST FOR: [Query type 1] | [Query type 2] | [Query type 3]
• NOT FOR: [What to use instead] (use [alternative] instead)
• RETURNS: [PRE-AGGREGATED or DETAIL rows]
• PARAMS: [param1 (format)], [param2 (format)]
• SYNTAX: SELECT * FROM function_name(''param1'', ''param2'')
• NOTE: [Critical usage notes]
'

Metric View COMMENT Format (YAML)

yaml
comment: >
  PURPOSE: Cost analytics for billing analysis.
  BEST FOR: Total spend by workspace | Cost by SKU | Daily trends
  NOT FOR: Commit tracking (use commit_tracking) | Real-time alerts (use TVF)
  DIMENSIONS: usage_date, workspace_name, sku_name
  MEASURES: total_cost, total_dbus, tag_coverage_pct
  SOURCE: fact_usage (billing domain)
  JOINS: dim_workspace, dim_sku
  NOTE: Costs are list prices. Actual billed amounts may differ.

For complete comment examples, see references/comment-templates.md.


Part 3: Tagging Standards

Config-aware: Tag values come from customer-supplied config (any format) when available. When absent, the agent derives meaningful defaults from project context. See Tagging Configuration Resolution above.

TG-01: Workflow Tags (Asset Bundles)

All jobs and pipelines must include:

TagRequiredWith Customer ConfigWithout Config (Smart Default)
teamYesFrom customer file"data-engineering"
cost_centerYesFrom customer file"REVIEW-REQUIRED" + # TODO
environmentYesAlways ${bundle.target}Always ${bundle.target}
projectRecommendedFrom customer fileInferred from schema/catalog/repo name
layerRecommendedAuto-derivedAuto-derived from pipeline layer
job_typeRecommendedAuto-derivedAuto-derived from job purpose

With customer config (values sourced from customer file in context/):

yaml
resources:
  jobs:
    gold_merge_job:
      tags:
        team: platform-engineering         # From customer tagging standards
        cost_center: CC-9901               # From customer tagging standards
        environment: ${bundle.target}      # Always automatic
        project: wanderbricks              # From customer tagging standards
        layer: gold                        # Auto-derived
        job_type: merge                    # Auto-derived

Without customer config (meaningful defaults):

yaml
resources:
  jobs:
    gold_merge_job:
      tags:
        team: data-engineering             # Default — add tagging config to context/ to customize
        cost_center: REVIEW-REQUIRED       # TODO: Add tagging config to context/ with cost_center
        environment: ${bundle.target}      # Always automatic
        project: wanderbricks              # Inferred from schema CSV (samples.wanderbricks)
        layer: gold                        # Auto-derived
        job_type: merge                    # Auto-derived

TG-02: Databricks Data Classification Tags (PII Columns)

Use Databricks Data Classification class.* system governed tags for PII — not custom pii/pii_type tags.

Supported class.* tags (global):

TagDescriptionColumn Pattern Match
class.email_addressEmail addresses*email*
class.phone_numberPhone numbers*phone*, *mobile*, *cell*
class.namePerson namesfirst_name, last_name, *_name (person context)
class.locationPhysical locations*address*, *street* (not email)
class.credit_cardCredit card numbers*credit_card*, *card_number*
class.us_ssnUS Social Security*ssn*, *social_security*
class.us_passportUS Passport*passport*
class.us_driver_licenseUS Driver License*driver_license*
class.ip_addressIP addresses*ip_address*, *ip_addr*
class.iban_codeBank account (IBAN)*iban*
class.urlURLs*_url, *website*

PII tags are always inferred from column names AND from customer config (if supplied). Customer config takes precedence — they can add, override, or suppress inferred tags.

sql
-- Applied to columns based on name pattern inference + customer config
ALTER TABLE gold.dim_guest ALTER COLUMN email SET TAGS ('class.email_address' = '');
ALTER TABLE gold.dim_guest ALTER COLUMN phone_number SET TAGS ('class.phone_number' = '');
ALTER TABLE gold.dim_guest ALTER COLUMN guest_name SET TAGS ('class.name' = '');

-- Also recommend enabling automated Data Classification on the catalog:
-- Navigate to catalog > Details > Data Classification toggle > Enable

TG-03: Unity Catalog Governed Tags (Catalogs, Schemas, Tables)

TagApply ToWith Customer ConfigWithout Config (Smart Default)
cost_centerCatalogsFrom customer file"REVIEW-REQUIRED" + # TODO
business_unitCatalogsFrom customer fileInferred from domain (e.g., "hospitality")
data_ownerSchemasFrom customer file"{project}-data-team" pattern
data_classificationTablesFrom customer file"internal" (safe default for all tables)

With customer config:

sql
ALTER CATALOG wanderbricks SET TAGS ('cost_center' = 'CC-9901', 'business_unit' = 'Hospitality');
ALTER SCHEMA wanderbricks.gold SET TAGS ('data_owner' = 'data-platform@wanderbricks.com');
ALTER TABLE gold.dim_guest SET TAGS ('data_classification' = 'confidential');
ALTER TABLE gold.fact_bookings SET TAGS ('data_classification' = 'internal');

Without customer config (meaningful defaults):

sql
ALTER CATALOG wanderbricks SET TAGS ('cost_center' = 'REVIEW-REQUIRED');  -- TODO: Add tagging config to context/
ALTER CATALOG wanderbricks SET TAGS ('business_unit' = 'hospitality');     -- Inferred from domain
ALTER SCHEMA wanderbricks.gold SET TAGS ('data_owner' = 'wanderbricks-data-team');  -- Derived from project
ALTER TABLE gold.dim_guest SET TAGS ('data_classification' = 'internal');  -- Safe default

TG-04: Serverless Budget Policies

Serverless compute resources require budget policies with team, cost_center, and environment tags. If a named policy is specified in the customer config, reference it in Asset Bundles.

For tag query patterns and serverless cost attribution, see references/tagging-patterns.md.


Validation Checklist

Tagging Config Resolution

  • Scanned context/ for customer tagging files in any format (YAML, CSV, MD, JSON, TXT)
  • If config found: parsed and applied customer values (all tag keys normalized to snake_case)
  • If config absent: derived meaningful defaults from project context (not lazy placeholders)
  • Only cost_center uses REVIEW-REQUIRED placeholder (everything else gets a real derived value)
  • environment tag is ${bundle.target} (never hardcoded)

PII / Data Classification

  • Column names scanned for PII patterns and class.* system governed tags applied
  • Used class.* tags (NOT custom pii/pii_type tags)
  • Customer PII declarations (if any) take precedence over inference
  • Recommended enabling Databricks Data Classification on the catalog

Naming

  • All objects use snake_case
  • Tables have correct prefix (dim_, fact_, bronze_, silver_)
  • Only approved abbreviations used
  • Columns follow standard patterns (PK, FK, boolean, SCD2)
  • Jobs/pipelines follow naming format

Comments

  • DDL includes /* */ block comments
  • All tables have dual-purpose COMMENT
  • All columns have dual-purpose COMMENT
  • TVFs have v3.0 structured COMMENT
  • Metric views have structured comment in YAML

Tags

  • All jobs have team, cost_center, environment tags
  • All pipelines have required tags
  • Catalogs tagged with cost_center and business_unit
  • Schemas tagged with data_owner
  • Confidential tables tagged with data_classification
  • PII columns tagged with class.* governed tags (inferred + customer-declared)
  • Tag values consistent across all project assets (same team, cost_center, project everywhere)

Additional Resources