AgentSkillsCN

recipe-patterns

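Use when creating, configuring, or running any Dataiku recipe (such as prepare, join, group, sync, or Python), including data cleaning, formula calculations, and GREL expressions.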

SKILL.md
---
name: recipe-patterns
description: "Use when creating, configuring, or running any Dataiku recipe (prepare, join, group, sync, python) including data cleaning, formulas, and GREL"
---

Dataiku Recipe Patterns

Reference patterns for creating different recipe types via the Python API.

Recipe Type Decision Table

| Recipe Type | Use When | Key Method |
|---|---|---|
| Prepare | Column transforms, filtering, formula columns, renaming, data cleaning | project.new_recipe("prepare", ...) |
| Join | Combining datasets on key columns (LEFT, INNER, RIGHT, OUTER) | project.new_recipe("join", ...) |
| Group | Aggregations: sum, count, avg, min, max, stddev, etc. | project.new_recipe("grouping", ...) |
| Sync | Copying data between connections (e.g., to a data warehouse) | project.new_recipe("sync", ...) |
| Python | Custom transformations not possible with visual recipes | project.new_recipe("python", ...) |

Universal Builder Pattern

Every recipe follows the same create-configure-run lifecycle:

```python
import dataiku

# Inside DSS (notebook or code recipe). From outside DSS, build a
# dataikuapi.DSSClient(host, api_key) and call get_project(...) instead.
client = dataiku.api_client()
project = client.get_default_project()

# 1. Create via builder
builder = project.new_recipe("<type>", "<recipe_name>")
builder.with_input("<input_dataset>")
builder.with_output("<output_dataset>")
recipe = builder.create()

# 2. Configure settings
settings = recipe.get_settings()
# ... recipe-specific configuration ...
settings.save()

# 3. Apply schema updates (visual recipes only)
schema_updates = recipe.compute_schema_updates()
if schema_updates.any_action_required():
    schema_updates.apply()

# 4. Run and check (run() returns once the job has finished)
job = recipe.run(no_fail=True)
state = job.get_status()["baseStatus"]["state"]  # "DONE" or "FAILED"
```
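The four lifecycle steps can be wrapped in one reusable function. The sketch below is a hypothetical helper (create_configure_run is not part of the Dataiku API). It assumes project behaves like a dataikuapi DSSProject and only calls the methods used in the lifecycle above, which also makes it easy to exercise with stub objects outside DSS.

```python
from typing import Callable, Optional


def create_configure_run(
    project,
    recipe_type: str,
    recipe_name: str,
    inputs: list,
    outputs: list,
    configure: Optional[Callable] = None,
    apply_schema_updates: bool = True,
) -> str:
    """Hypothetical helper: create, configure, and run a recipe; return the job state."""
    # 1. Create via builder, wiring every input and output dataset
    builder = project.new_recipe(recipe_type, recipe_name)
    for name in inputs:
        builder.with_input(name)
    for name in outputs:
        builder.with_output(name)
    recipe = builder.create()

    # 2. Let the caller mutate the settings object, then persist it
    settings = recipe.get_settings()
    if configure is not None:
        configure(settings)
    settings.save()

    # 3. Propagate the output schema (intended for visual recipes)
    if apply_schema_updates:
        updates = recipe.compute_schema_updates()
        if updates.any_action_required():
            updates.apply()

    # 4. run() waits for the job; no_fail=True returns instead of raising on failure
    job = recipe.run(no_fail=True)
    return job.get_status()["baseStatus"]["state"]
```

For example, create_configure_run(project, "sync", "sync_orders", ["orders"], ["orders_dwh"]) would build and run a sync recipe, while a Python recipe would pass apply_schema_updates=False since schema propagation applies to visual recipes.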

Prepare Recipe Quick Reference

Prepare recipes use raw_steps to add processors:

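```python
settings = recipe.get_settings()
# raw_steps is the live list of steps; appended steps are persisted by save()
settings.raw_steps.append({
    "metaType": "PROCESSOR",
    "type": "CreateColumnWithGREL",
    "params": {"column": "revenue", "expression": "price * quantity"}
})
settings.save()
```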
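When adding several processors, a small builder keeps the step dicts consistent. This is a sketch under assumptions: processor_step is a hypothetical helper, and the "metaType": "PROCESSOR" key is assumed to mirror what the prepare settings' add_processor_step(type, params) method generates. If in doubt, compare with a step created in the UI.

```python
def processor_step(processor_type: str, params: dict) -> dict:
    """Hypothetical helper: build one prepare-recipe step for raw_steps."""
    return {
        "metaType": "PROCESSOR",   # assumed marker for a processor step
        "type": processor_type,    # processor name, e.g. "CreateColumnWithGREL"
        "params": dict(params),    # shallow copy so callers can reuse their dict
    }


steps = [
    processor_step("CreateColumnWithGREL",
                   {"column": "revenue", "expression": "price * quantity"}),
    processor_step("CreateColumnWithGREL",
                   {"column": "revenue_band",
                    "expression": "if(revenue > 1000, 'large', 'small')"}),
]
# Inside DSS: settings.raw_steps.extend(steps), then settings.save()
```

Steps run in list order, so the second formula can reference the revenue column created by the first.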

Common Processors

| Processor | Purpose |
|---|---|
| CreateColumnWithGREL | Add calculated / derived columns |
| ColumnTrimmer | Strip whitespace from text columns |
| ColumnLowercaser | Lowercase text for consistency |
| FillEmptyWithValue | Replace nulls with a default |
| FilterOnValue | Keep or remove rows by column value |
| FilterOnFormula | Keep or remove rows by GREL expression |
| ColumnRenamer | Rename columns |
| ColumnsSelector | Keep or remove a set of columns |
| ColumnSplitter | Split a column by delimiter |
| DateParser | Parse string to date |
| DateFormatter | Format date to string |
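The table lists processor names but not their parameters, which differ per processor. One practical approach, assuming you have a reference prepare recipe configured in the UI, is to read its settings.raw_steps and summarize each step's type and parameter keys. The helper is hypothetical, and the ColumnRenamer parameters in the sample are an assumption for illustration.

```python
from typing import Iterable


def summarize_steps(raw_steps: Iterable[dict]) -> list:
    """Return (processor type, sorted parameter names) for each step."""
    summary = []
    for step in raw_steps:
        params = step.get("params", {})
        summary.append((step.get("type"), sorted(params)))
    return summary


# Illustrative input; in DSS you would pass recipe.get_settings().raw_steps
sample = [
    {"metaType": "PROCESSOR", "type": "CreateColumnWithGREL",
     "params": {"column": "revenue", "expression": "price * quantity"}},
    {"metaType": "PROCESSOR", "type": "ColumnRenamer",
     "params": {"renamings": [{"from": "cust_id", "to": "customer_id"}]}},
]
for processor, keys in summarize_steps(sample):
    print(processor, keys)
```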

Top 5 GREL Patterns

| Pattern | Example | Notes |
|---|---|---|
| Math | price * quantity | Standard operators +, -, *, / |
| Conditional | if(amount > 1000, 'large', 'small') | Nestable: if(..., ..., if(...)) |
| String ops | upper(name), trim(val), length(s) | Also lower(), toString() |
| Date extraction | datePart(order_date, 'month') | Parts: year, month, day, hour |
| Coalesce | coalesce(val, 'default') | Returns the first non-null argument |
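Nested conditionals become hard to read by hand once there are more than two bands. The hypothetical helper below generates the nested if() string from ascending thresholds, testing the highest threshold first. It assumes labels contain no single quotes, since they are emitted as single-quoted GREL string literals.

```python
def grel_bands(column: str, thresholds: list, labels: list) -> str:
    """Build a nested GREL if() mapping a numeric column to band labels.

    thresholds must be ascending and labels must have one more entry.
    Values at or below thresholds[0] get labels[0]; a value above
    thresholds[i] (and not above any higher threshold) gets labels[i + 1].
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    # Start from the lowest band and wrap outward, so the outermost if()
    # tests the highest threshold first
    expr = f"'{labels[0]}'"
    for threshold, label in zip(thresholds, labels[1:]):
        expr = f"if({column} > {threshold}, '{label}', {expr})"
    return expr


print(grel_bands("amount", [1000, 10000], ["small", "large", "xl"]))
# if(amount > 10000, 'xl', if(amount > 1000, 'large', 'small'))
```

The resulting string can be used directly as the "expression" parameter of a CreateColumnWithGREL step.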

Always Remember

  1. Call settings.save() after configuration changes
  2. Call compute_schema_updates().apply() for visual recipes (join, grouping, etc.)
  3. Call recipe.run(no_fail=True) to execute (already waits for completion)
  4. Check job.get_status()["baseStatus"]["state"] for success ("DONE") or failure ("FAILED")
  5. Verify output dataset has expected data and schema
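For the schema part of step 5, a small comparison against expected column names and types makes the check explicit. This is a hedged sketch: check_output_schema is a hypothetical helper that takes the dict returned by the dataset's get_schema() (a "columns" list of name/type entries). Checking the data itself, such as row counts, is a separate step.

```python
def check_output_schema(schema: dict, expected: dict) -> list:
    """Compare a schema dict against expected {column_name: type} pairs.

    Returns a list of readable problems; an empty list means the check passed.
    Use None as the expected type to check only that the column exists.
    """
    actual = {col["name"]: col.get("type") for col in schema.get("columns", [])}
    problems = []
    for name, expected_type in expected.items():
        if name not in actual:
            problems.append(f"missing column {name}")
        elif expected_type is not None and actual[name] != expected_type:
            problems.append(f"{name}: expected {expected_type}, got {actual[name]}")
    return problems


# In DSS: schema = project.get_dataset("<output_dataset>").get_schema()
schema = {"columns": [{"name": "customer_id", "type": "string"},
                      {"name": "revenue_sum", "type": "double"}]}
print(check_output_schema(schema, {"customer_id": "string",
                                   "revenue_sum": "double",
                                   "order_count": "bigint"}))
# ['missing column order_count']
```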

Common Pitfalls

Schema Propagation

Visual recipes (join, grouping) need schema updates applied before running:

```python
schema_updates = recipe.compute_schema_updates()
if schema_updates.any_action_required():
    schema_updates.apply()
```

Column Case for SQL Databases

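On databases that fold unquoted identifiers to uppercase (such as Snowflake or Oracle), use UPPERCASE column names in dataset schemas to avoid "invalid identifier" errors:

```python
dataset = project.get_dataset("<dataset_name>")
ds_settings = dataset.get_settings()
raw = ds_settings.get_raw()  # raw dataset definition, including its schema
for col in raw["schema"]["columns"]:
    col["name"] = col["name"].upper()
ds_settings.save()
```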
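Uppercasing can silently merge two columns whose names differ only by case (for example id and ID). A hedged variant of the loop above, written as a hypothetical pure function over the schema's columns list, fails loudly instead and leaves the input untouched.

```python
def uppercase_columns(columns: list) -> list:
    """Return copies of schema column dicts with UPPERCASE names.

    Raises ValueError if two names collide once uppercased.
    """
    seen = {}
    result = []
    for col in columns:
        upper = col["name"].upper()
        if upper in seen:
            raise ValueError(
                f"{col['name']!r} and {seen[upper]!r} both map to {upper!r}")
        seen[upper] = col["name"]
        # Copy every other key (type, meaning, ...) unchanged
        result.append({**col, "name": upper})
    return result
```

In DSS it would replace the loop as raw["schema"]["columns"] = uppercase_columns(raw["schema"]["columns"]) before saving the settings.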

Job Completion

recipe.run() already waits for the job to finish, so do not look for a separate wait_for_completion() call:

```python
job = recipe.run(no_fail=True)  # Returns after the job completes
state = job.get_status()["baseStatus"]["state"]  # "DONE" or "FAILED"
```


Working Examples