Dataiku Recipe Patterns
Reference patterns for creating different recipe types via the Python API.
Recipe Type Decision Table
| Recipe Type | Use When | Key Method |
|---|---|---|
| Prepare | Column transforms, filtering, formula columns, renaming, data cleaning | project.new_recipe("prepare", ...) |
| Join | Combining datasets on key columns (LEFT, INNER, RIGHT, OUTER) | project.new_recipe("join", ...) |
| Group | Aggregations: sum, count, avg, min, max, stddev, etc. | project.new_recipe("grouping", ...) |
| Sync | Copying data between connections (e.g., to a data warehouse) | project.new_recipe("sync", ...) |
| Python | Custom transformations not possible with visual recipes | project.new_recipe("python", ...) |
Universal Builder Pattern
Every recipe follows the same create-configure-run lifecycle:
python
# 1. Create via builder
builder = project.new_recipe("<type>", "<recipe_name>")
builder.with_input("<input_dataset>")
builder.with_output("<output_dataset>")
recipe = builder.create()
# 2. Configure settings
settings = recipe.get_settings()
# ... recipe-specific configuration ...
settings.save()
# 3. Apply schema updates (visual recipes only)
schema_updates = recipe.compute_schema_updates()
if schema_updates.any_action_required():
    schema_updates.apply()
# 4. Run and check
job = recipe.run(no_fail=True)
state = job.get_status()["baseStatus"]["state"] # "DONE" or "FAILED"
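Steps 1 to 3 above can be wrapped in one helper. This is a minimal sketch, not part of the Dataiku API: build_recipe, its configure callback, and the visual flag are hypothetical names, and project is assumed to be a project handle that exposes the builder calls shown above.

```python
def build_recipe(project, recipe_type, name, input_name, output_name,
                 configure=None, visual=True):
    """Create a recipe, optionally configure it, and apply schema updates.

    Hypothetical helper: it only chains the calls shown in the lifecycle
    above and assumes `project` is a Dataiku project handle.
    """
    # 1. Create via builder
    builder = project.new_recipe(recipe_type, name)
    builder.with_input(input_name)
    builder.with_output(output_name)
    recipe = builder.create()

    # 2. Configure settings; `configure` receives the settings object
    if configure is not None:
        settings = recipe.get_settings()
        configure(settings)
        settings.save()

    # 3. Apply schema updates (visual recipes only)
    if visual:
        schema_updates = recipe.compute_schema_updates()
        if schema_updates.any_action_required():
            schema_updates.apply()

    return recipe
```

Passing the recipe-specific configuration as a callback keeps the save and schema steps in one place, so they are harder to forget.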
Prepare Recipe Quick Reference
Prepare recipes use raw_steps to add processors:
python
settings = recipe.get_settings()
settings.raw_steps.append({
    "type": "CreateColumnWithGREL",
    "params": {"column": "revenue", "expression": "price * quantity"}
})
settings.save()
Common Processors
| Processor | Purpose |
|---|---|
| CreateColumnWithGREL | Add calculated / derived columns |
| ColumnTrimmer | Strip whitespace from text columns |
| ColumnLowercaser | Lowercase text for consistency |
| FillEmptyWithValue | Replace nulls with a default |
| FilterOnValue | Keep or remove rows by column value |
| FilterOnFormula | Keep or remove rows by GREL expression |
| ColumnRenamer | Rename columns |
| ColumnsSelector | Keep or remove a set of columns |
| ColumnSplitter | Split a column by delimiter |
| DateParser | Parse string to date |
| DateFormatter | Format date to string |
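When several derived columns are needed, the step dictionaries can be generated instead of written by hand. The sketch below uses only the CreateColumnWithGREL parameters shown in the quick reference; grel_column_steps is a hypothetical helper name, and other processors take different parameters that are not covered here.

```python
def grel_column_steps(formulas):
    """Build one CreateColumnWithGREL step per (column, expression) pair."""
    return [
        {
            "type": "CreateColumnWithGREL",
            "params": {"column": column, "expression": expression},
        }
        for column, expression in formulas.items()
    ]


steps = grel_column_steps({
    "revenue": "price * quantity",
    "size": "if(amount > 1000, 'large', 'small')",
})

# With a live Prepare recipe, and assuming raw_steps is the plain list
# that the append() call above implies, the steps would be added with:
#   settings = recipe.get_settings()
#   settings.raw_steps.extend(steps)
#   settings.save()
```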
Top 5 GREL Patterns
| Pattern | Example | Notes |
|---|---|---|
| Math | price * quantity | Standard operators +, -, *, / |
| Conditional | if(amount > 1000, 'large', 'small') | Nestable: if(..., ..., if(...)) |
| String ops | upper(name), trim(val), length(s) | Also lower(), toString() |
| Date extraction | datePart(order_date, 'month') | Parts: year, month, day, hour |
| Coalesce | coalesce(val, 'default') | Returns first non-null argument |
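Nested conditionals become hard to read once there are more than two branches, so it can help to generate the expression string. The following is a small sketch that only builds text in the if(..., ..., if(...)) shape from the table; nested_if is a hypothetical helper, and values are passed already quoted.

```python
def nested_if(branches, default):
    """Return a GREL if() chain in which the first matching condition wins."""
    expression = default
    # Build from the innermost else-branch outwards
    for condition, value in reversed(branches):
        expression = f"if({condition}, {value}, {expression})"
    return expression


expr = nested_if(
    [("amount > 1000", "'large'"), ("amount > 100", "'medium'")],
    "'small'",
)
# expr == "if(amount > 1000, 'large', if(amount > 100, 'medium', 'small'))"
```

The resulting string can be used as the expression of a CreateColumnWithGREL step.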
Always Remember
- Call settings.save() after configuration changes
- Call compute_schema_updates().apply() for visual recipes (join, grouping, etc.)
- Call recipe.run(no_fail=True) to execute (already waits for completion)
- Check job.get_status()["baseStatus"]["state"] for success ("DONE") or failure ("FAILED")
- Verify output dataset has expected data and schema
Common Pitfalls
Schema Propagation
Visual recipes (join, grouping) need schema updates applied before running:
python
schema_updates = recipe.compute_schema_updates()
if schema_updates.any_action_required():
    schema_updates.apply()
Column Case for SQL Databases
SQL databases such as Snowflake and Oracle fold unquoted identifiers to uppercase. Use UPPERCASE column names in dataset schemas there to avoid "invalid identifier" errors:
python
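# raw is the dataset's raw settings dict, e.g. from dataset.get_settings().get_raw()
for col in raw["schema"]["columns"]:
    col["name"] = col["name"].upper()
# Save the dataset settings afterwards so the renamed columns persist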
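Uppercasing can silently merge two columns whose names differ only by case (for example id and ID). The sketch below is a hypothetical helper, uppercase_column_names, that applies the same loop to a list of column dictionaries but refuses to proceed when a collision would result. The sample columns are illustrative only.

```python
def uppercase_column_names(columns):
    """Uppercase each column's "name" in place; raise if names would collide."""
    upper_names = [col["name"].upper() for col in columns]
    duplicates = sorted({n for n in upper_names if upper_names.count(n) > 1})
    if duplicates:
        raise ValueError(f"Uppercasing would create duplicate columns: {duplicates}")
    for col, name in zip(columns, upper_names):
        col["name"] = name
    return columns


# Illustrative schema columns, in the same shape as raw["schema"]["columns"]
columns = [
    {"name": "order_id", "type": "bigint"},
    {"name": "Amount", "type": "double"},
]
uppercase_column_names(columns)
```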
Job Completion
recipe.run() already waits -- do not look for wait_for_completion():
python
job = recipe.run(no_fail=True)  # Returns after job completes
state = job.get_status()["baseStatus"]["state"]  # "DONE" or "FAILED"
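Because no_fail=True suppresses the exception on failure, the state check is easy to skip by accident. One option is to fold both lines into a helper that raises when the job does not finish in the DONE state. This is a sketch: run_and_check is a hypothetical name, and recipe is assumed to be a recipe handle like the one used above.

```python
def run_and_check(recipe):
    """Run a recipe and raise if the job does not end in the DONE state."""
    job = recipe.run(no_fail=True)  # Returns after the job completes
    state = job.get_status()["baseStatus"]["state"]
    if state != "DONE":
        raise RuntimeError(f"Recipe job ended in state {state!r}")
    return job
```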
Detailed References
Recipe types:
- references/prepare-recipe.md: Prepare recipe builder pattern, raw_steps API
- references/join-recipe.md: Join configuration, multi-table joins, column selection, prefix behavior
- references/group-recipe.md: Aggregation flags, output naming, type compatibility
- references/sync-recipe.md: Sync recipe pattern
- references/python-recipe.md: Python recipe with set_code
Data preparation:
- references/processors.md: All processor types with parameters and complete example
- references/grel-functions.md: Full GREL function table and formula syntax
- references/date-operations.md: DateParser, DateFormatter, datePart examples
Working Examples
- scripts/run_recipe.py: Run any recipe by name and check job status