File Modification Rules (CRITICAL)
General Rules (All Projects)
Never Overwrite These Files
- Outputs from someone else: files from collaborators, students, or external sources
- Original/source data: anything you did not generate in this session
- Configuration files: unless the user explicitly requests changes
Instead
- Create a new file with an updated name (e.g., add _updated or _with_new_column)
- Ask the user before modifying if you are uncertain about the file's origin
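The renaming rule above can be sketched as a small base-R helper. This is only an illustration: the function name suffixed_path and the numeric counter fallback are assumptions, not part of these rules.

```r
# Hypothetical helper: build a sibling path with a suffix inserted before the
# extension, so the original file is never overwritten.
suffixed_path <- function(path, suffix = "_updated") {
  ext  <- tools::file_ext(path)
  stem <- tools::file_path_sans_ext(path)

  # Reassemble "stem + extra + .ext", leaving off the dot when there is no extension.
  build <- function(extra) {
    if (nzchar(ext)) paste0(stem, extra, ".", ext) else paste0(stem, extra)
  }

  candidate <- build(suffix)
  n <- 1
  # If the suffixed name is already taken, append a counter instead of clobbering it.
  while (file.exists(candidate)) {
    n <- n + 1
    candidate <- build(paste0(suffix, "_", n))
  }
  candidate
}
```

For instance, suffixed_path("results.csv", "_with_new_column") would give a path ending in results_with_new_column.csv, provided no file of that name exists yet.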
Files You CAN Overwrite
- Files you created in the current session
- Configuration files when the user requests changes
Script Output Integrity
Never create standalone scripts solely to modify outputs from other scripts.
If modifications are needed:
- Modify the original script: if the change belongs upstream, update the source (ask the user first)
- Handle it in downstream scripts: do the transformation as part of the consuming script's workflow
Do NOT create intermediate "fixer" scripts that modify upstream outputs after the fact. Doing so breaks the reproducibility chain between scripts and their outputs.
When Uncertain
If you're not sure whether a file is safe to overwrite, ask:
- "This file already exists. Was it generated by another script or provided externally?"
- "Should I create a new file with a modified name, or is it safe to overwrite?"
Data Science Projects (with data/ and outs/ directories)
These additional rules apply to projects that declare project-type: data-science or that use the script-organization conventions (numbered scripts, data/ for inputs, outs/ for outputs).
Output Ownership
Each script writes only to its own output folder under outs/. Never write to another script's output folder.
library(here)  # here() resolves paths relative to the project root

# Script 03_plots.qmd writes ONLY to:
out_dir <- here("outs/03_plots")
# NEVER write to another script's folder:
# here("outs/01_analysis/modified_data.rds") # WRONG
data/ is Read-Only
Scripts never write to data/. This directory contains only external/immutable inputs (raw data, collaborator files, annotations). If your code produced it, it goes in outs/.
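One way to make both rules hard to break by accident is to build every output path through a guard. The sketch below uses base R only; own_out_path is a hypothetical name, and the check is a simple assumption (reject absolute paths and ".." components) rather than a complete path-safety solution.

```r
# Hypothetical guard: build a path inside this script's own output folder and
# stop if the file name tries to escape it (for example into data/ or into
# another script's folder under outs/).
own_out_path <- function(out_dir, file_name) {
  parts <- strsplit(file_name, "[/\\\\]")[[1]]
  is_absolute <- grepl("^(/|[A-Za-z]:)", file_name)
  if (is_absolute || any(parts == "..")) {
    stop("Refusing to write outside ", out_dir, ": ", file_name)
  }
  file.path(out_dir, file_name)
}

out_dir <- "outs/03_plots"  # in a project using here(), this would be here("outs/03_plots")
plot_file <- own_out_path(out_dir, "volcano.pdf")
```

Because every path is derived from out_dir, a call such as own_out_path(out_dir, "../01_analysis/modified_data.rds") stops with an error instead of silently writing into another script's folder.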
Example
library(dplyr)  # provides %>% and left_join()
library(here)

# BAD: Creating a separate script to add Gene.short to another script's output.
# This disconnects the file from the script that produced it.

# GOOD: The downstream plotting script joins against the lookup at runtime.
# gene_name_lookup is assumed to be loaded earlier in this script.
limma_res <- readRDS(here("outs/01_analysis/limma_results_annotated.rds")) %>%
  left_join(gene_name_lookup, by = "trinity_gene_id")