education-data-source-crdc

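In-depth reference for the Civil Rights Data Collection (CRDC), the U.S. Department of Education's biennial OCR survey of all public schools. Use when analyzing school discipline disparities, course access equity, harassment data, restraint/seclusion, chronic absenteeism, or any civil rights education indicator. Covers the legal framework, data elements, quality issues, and historical changes.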

SKILL.md
--- frontmatter
name: education-data-source-crdc
description: >-
  Deep reference for Civil Rights Data Collection (CRDC) - biennial OCR survey
  of all public schools. Use when analyzing school discipline disparities,
  course access equity, harassment data, restraint/seclusion, chronic
  absenteeism, or any civil rights education indicators. Covers legal
  framework, data elements, quality issues, and historical changes.
metadata:
  audience: data-analysts
  domain: education-data

CRDC Data Source Reference

The Civil Rights Data Collection is a mandatory biennial survey of all U.S. public schools measuring educational opportunity and civil rights compliance. It is the only national source for school-level discipline disparities, course access equity, harassment, and restraint/seclusion data disaggregated by race, sex, disability, and English learner status.

CRITICAL: Value Encoding

The Education Data Portal uses integer codes, not the string codes shown in OCR documentation. Always filter using integers.

| Variable | String Code (Raw) | Portal Integer |
| --- | --- | --- |
| Race: White | WH | 1 |
| Race: Black | BL | 2 |
| Race: Hispanic | HI | 3 |
| Sex: Male | M | 1 |
| Sex: Female | F | 2 |

See ./references/variable-definitions.md for complete encoding tables.
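When a filter value arrives as a raw OCR string code (for example, copied from OCR documentation), it has to be translated before it can match Portal data. The following is a minimal sketch covering only the five pairs listed in the table above; the helper name `to_portal_code` is hypothetical, and the complete mapping should be taken from variable-definitions.md.

```python
# Raw OCR string codes -> Portal integer codes.
# Only the pairs shown in the table above are included here.
RACE_CODES = {"WH": 1, "BL": 2, "HI": 3}
SEX_CODES = {"M": 1, "F": 2}


def to_portal_code(variable: str, raw_code: str) -> int:
    """Translate a raw OCR string code into its Portal integer code.

    Hypothetical helper for illustration; raises ValueError for any
    variable or code that is not in the small tables above.
    """
    tables = {"race": RACE_CODES, "sex": SEX_CODES}
    try:
        return tables[variable][raw_code.upper()]
    except KeyError:
        raise ValueError(
            f"No known Portal code for {variable}={raw_code!r}"
        ) from None


black = to_portal_code("race", "BL")   # 2
female = to_portal_code("sex", "f")    # 2 (lookup is case-insensitive)
```

Failing loudly on an unknown code is deliberate: a silent fallback would produce a filter that matches nothing.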

What is CRDC?

The Civil Rights Data Collection is a mandatory biennial survey of all public schools and districts that measures educational opportunity and civil rights compliance:

  • Collector: U.S. Department of Education, Office for Civil Rights (OCR)
  • Purpose: Enforce civil rights laws, identify discrimination, monitor equity
  • Coverage: All public LEAs and schools receiving federal financial assistance
  • Frequency: Biennial (every 2 school years)
  • Disaggregation: Race/ethnicity, sex, disability status, English learner status
  • History: Collected since 1968 (as Elementary and Secondary School Civil Rights Survey)
  • Available years: 2011, 2013, 2015, 2017, 2020, 2021 (roughly biennial; no data for the intervening years 2012, 2014, 2016, 2018, 2019)
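Because the available years are irregular (the 2019 collection slot is empty and 2020 and 2021 are consecutive), a range-based year check will go wrong. A small sketch of an explicit check follows; the function names are hypothetical, and the year list is copied from the bullet above.

```python
# Years listed as available in this reference.
AVAILABLE_YEARS = (2011, 2013, 2015, 2017, 2020, 2021)


def has_crdc_data(year: int) -> bool:
    """True if the given year is one of the listed CRDC years."""
    return year in AVAILABLE_YEARS


def nearest_crdc_year(year: int) -> int:
    """Closest listed year; a tie resolves to the earlier year."""
    return min(AVAILABLE_YEARS, key=lambda y: (abs(y - year), y))


has_crdc_data(2018)       # False
nearest_crdc_year(2018)   # 2017
nearest_crdc_year(2019)   # 2020
```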

Reference File Structure

| File | Purpose | When to Read |
| --- | --- | --- |
| civil-rights-context.md | Legal framework (Title VI, IX, Section 504, IDEA) | Understanding why data is collected |
| data-elements.md | All data categories and what's collected | Planning analysis, identifying variables |
| collection-methodology.md | Sampling, universe, timeline, reporting | Understanding coverage limitations |
| variable-definitions.md | Key variables, codes, disaggregation categories | Coding data, interpreting values |
| data-quality.md | Known issues, suppression, state variations | Addressing limitations in analysis |
| historical-changes.md | Evolution across collection years | Time series analysis, year comparison |

Decision Trees

What CRDC data do I need?

code
Research topic?
├─ School discipline
│   ├─ Suspensions (ISS/OSS) → ./references/data-elements.md#discipline
│   ├─ Expulsions → ./references/data-elements.md#discipline
│   ├─ Referrals to law enforcement → ./references/data-elements.md#discipline
│   ├─ School-related arrests → ./references/data-elements.md#discipline
│   └─ Preschool suspensions → ./references/data-elements.md#discipline
├─ Restraint and seclusion
│   └─ Physical restraint, mechanical, seclusion → ./references/data-elements.md#restraint-seclusion
├─ Harassment and bullying
│   ├─ Allegations by type → ./references/data-elements.md#harassment
│   └─ Disciplined for harassment → ./references/data-elements.md#harassment
├─ Course access and enrollment
│   ├─ AP/IB courses → ./references/data-elements.md#advanced-courses
│   ├─ Gifted/talented → ./references/data-elements.md#gifted-talented
│   ├─ Math/science courses → ./references/data-elements.md#course-access
│   └─ Computer science → ./references/data-elements.md#course-access
├─ Chronic absenteeism
│   └─ Students missing 15+ days → ./references/data-elements.md#chronic-absenteeism
├─ Special populations
│   ├─ Students with disabilities (IDEA) → ./references/data-elements.md#students-with-disabilities
│   ├─ English learners → ./references/data-elements.md#english-learners
│   └─ Preschool enrollment → ./references/data-elements.md#preschool
├─ School staffing
│   ├─ Teacher experience/certification → ./references/data-elements.md#staffing
│   └─ Counselors, nurses, etc. → ./references/data-elements.md#staffing
└─ School safety
    └─ Offenses, violence, weapons → ./references/data-elements.md#school-offenses

Understanding the legal context?

code
Civil rights law question?
├─ Race/ethnicity discrimination → ./references/civil-rights-context.md#title-vi
├─ Sex/gender discrimination → ./references/civil-rights-context.md#title-ix
├─ Disability discrimination → ./references/civil-rights-context.md#section-504
├─ Special education services → ./references/civil-rights-context.md#idea
├─ Age discrimination → ./references/civil-rights-context.md#age-discrimination-act
└─ OCR enforcement process → ./references/civil-rights-context.md#ocr-enforcement

Data quality concerns?

code
Data quality issue?
├─ Missing or suppressed data → ./references/data-quality.md#suppression
├─ Definition inconsistencies → ./references/data-quality.md#definition-variation
├─ Year-to-year comparability → ./references/historical-changes.md
├─ COVID-19 impact (2020-21) → ./references/data-quality.md#covid-impact
├─ Underreporting concerns → ./references/data-quality.md#underreporting
└─ State-level variations → ./references/data-quality.md#state-variations

Quick Reference: CRDC Data Categories

Collection Years

| School Year | Collection | Coverage | Key Notes |
| --- | --- | --- | --- |
| 2011-12 | Sample | ~7,000 districts | First modern CRDC; sampled |
| 2013-14 | Expanded | ~16,000 districts | Larger sample |
| 2015-16 | Near-universe | ~96,000 schools | First near-complete |
| 2017-18 | Universe | ~96,000 schools | Full universe collection |
| 2020-21 | Universe | ~97,500 schools | COVID-impacted year |
| 2021-22 | Universe | ~98,000 schools | Post-pandemic baseline |
| 2023-24 | Universe | In progress | Current collection |

Critical: CRDC is collected roughly every two years, so no data exists for the intervening years (2012, 2014, 2016, 2018, 2019).

Data Categories

| Category | Description | Disaggregation |
| --- | --- | --- |
| Enrollment | Student counts by grade level | Race, sex, disability, LEP |
| Discipline | Suspensions, expulsions, arrests | Race, sex, disability, LEP |
| Restraint/Seclusion | Physical/mechanical restraint, seclusion | Race, sex, disability |
| Harassment | Allegations and discipline by type | Race, sex, disability |
| Course Access | AP, IB, math, science, CS offerings | School-level, enrollment by race/sex |
| Chronic Absenteeism | 15+ days missed | Race, sex, disability, LEP |
| Staffing | Teachers, counselors, nurses, etc. | FTE counts, qualifications |
| Offenses | Violence, weapons, drugs at school | Type of offense |
| Retention | Students retained in grade | Race, sex, disability |

Key Identifiers

| ID | Format | Level | Example | Notes |
| --- | --- | --- | --- | --- |
| crdc_id | 12-digit string | School | 010000201705 | Primary CRDC identifier; always present |
| ncessch | 12-digit string | School | 010000201705 | NCES school ID, joins to CCD; may be null for some entries |
| leaid | 7-digit string | District | 0100002 | NCES district ID, joins to CCD; always present |

Note: The OCR-internal combokey (e.g., AL-0010-00002) does NOT appear as a column in Portal data. Use crdc_id or ncessch for school-level identification.

WARNING: String Type Override Required. When reading CRDC data from CSV, ncessch, leaid, and crdc_id must be read as String (pl.Utf8) via schema_overrides. Polars infers these as Int64, silently destroying leading zeros for ~19% of rows (FIPS 01-09 states: AL, AK, AZ, AR, CA, CO, CT). Parquet files preserve types automatically.
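Reading the identifiers as strings in the first place is the safer route. If a frame has already been loaded with integer IDs, the lost zeros can be restored because each identifier has a fixed width. The sketch below is plain Python and assumes the widths from the Key Identifiers table; `restore_id` is a hypothetical helper name.

```python
# Fixed widths taken from the Key Identifiers table.
ID_WIDTHS = {"crdc_id": 12, "ncessch": 12, "leaid": 7}


def restore_id(value, column):
    """Re-pad an identifier that lost its leading zeros.

    Accepts an int or str; None is passed through unchanged because
    ncessch can legitimately be null.
    """
    if value is None:
        return None
    return str(value).strip().zfill(ID_WIDTHS[column])


restore_id(100002, "leaid")          # '0100002'
restore_id(10000201705, "ncessch")   # '010000201705'
restore_id(None, "ncessch")          # None
```

Padding is only a repair step: it cannot tell a truncated ID from a genuinely malformed one, so a length check on the source file is still worthwhile.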

Race/Ethnicity (Portal Integer Codes)

| Code | Category |
| --- | --- |
| 1 | White |
| 2 | Black or African American |
| 3 | Hispanic/Latino of any race |
| 4 | Asian |
| 5 | American Indian or Alaska Native |
| 6 | Native Hawaiian or Other Pacific Islander |
| 7 | Two or more races |
| 99 | Total |

Empirically observed values: Codes 1-7 and 99 appear in CRDC data. Additional codes (8 Nonresident alien, 9 Unknown, 20 Other) are defined in the codebook but are not observed in practice for K-12 CRDC datasets. See variable-definitions.md for the full codebook listing.

Sex (Portal Integer Codes)

| Code | Category |
| --- | --- |
| 1 | Male |
| 2 | Female |
| 3 | Non-binary/other (newer collections; rows exist but mostly contain -1 or -2 values) |
| 99 | Total |

Disability Status (Portal Integer Codes)

| Code | Category |
| --- | --- |
| 0 | Students without disabilities |
| 1 | Students with disabilities (served under IDEA) |
| 2 | Students with Section 504 only |
| 3 | Students not served under IDEA (includes 504-only and non-disabled) |
| 4 | Students with disabilities (combined: IDEA + Section 504) |
| 99 | Total |

Note: Not all disability codes appear in every dataset. Enrollment data typically has [1, 2, 99]; discipline data has [0, 1, 2, 4, 99]. Verify codes against the live codebook for your specific dataset.

English Learner Status (Portal Integer Codes)

| Code | Category |
| --- | --- |
| 1 | English learner (EL/LEP) |
| 99 | All students |

Missing Data Codes

| Code | Meaning | When Used |
| --- | --- | --- |
| -1 | Missing | Data not reported by school/district |
| -2 | Not applicable | Item doesn't apply to this entity |
| -3 | Suppressed | Data suppressed for privacy (small cell sizes) |
| -9 | Skip pattern | Question not asked in this collection year (rare; check codebook) |
| null | Not available | Value absent from dataset (e.g., ncessch is null for some schools) |

Verify these codes against the live codebook for your specific dataset. Use get_codebook_url() from fetch-patterns.md.
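Because the missing-data codes are negative integers stored in the same columns as real counts, summing a column without screening them out understates totals. The following sketch, in plain Python with hypothetical helper names, separates valid counts from special codes and tallies why values were set aside.

```python
# Special codes from the Missing Data Codes table.
MISSING_CODES = {
    -1: "missing",
    -2: "not applicable",
    -3: "suppressed",
    -9: "skip pattern",
}


def clean_count(value):
    """Return a usable count, or None for null and special codes."""
    if value is None or value in MISSING_CODES:
        return None
    return value


def summarize(values):
    """Sum valid counts and tally the values set aside, by reason."""
    total = 0
    set_aside = {}
    for v in values:
        if v is None:
            set_aside["null"] = set_aside.get("null", 0) + 1
        elif v in MISSING_CODES:
            reason = MISSING_CODES[v]
            set_aside[reason] = set_aside.get(reason, 0) + 1
        else:
            total += v
    return total, set_aside


total, set_aside = summarize([12, -3, 0, -1, 5, None, -3])
# total == 17
# set_aside == {"suppressed": 2, "missing": 1, "null": 1}
```

Reporting the tally alongside the total makes it visible when suppression removes a large share of the rows behind an estimate.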

Data Access

Datasets for CRDC are available via the mirror system. See datasets-reference.md for canonical paths, mirrors.yaml for mirror configuration, and fetch-patterns.md for fetch code patterns including fetch_from_mirrors() and fetch_yearly_from_mirrors().

Key datasets (6 of 22 total):

| Dataset | Path | Type | Codebook |
| --- | --- | --- | --- |
| Discipline | crdc/schools_crdc_discipline_k12_{year} | Yearly | crdc/codebook_schools_crdc_discipline |
| AP/IB Enrollment | crdc/schools_crdc_apib_enroll | Single | crdc/codebook_schools_crdc_ap-ib-enrollment |
| Enrollment | crdc/schools_crdc_enrollment_k12_{year} | Yearly | crdc/codebook_schools_crdc_enrollment |
| Chronic Absenteeism | crdc/schools_crdc_chronic_absenteeism_{year} | Yearly | crdc/codebook_schools_crdc_chronic-absenteeism |
| Harassment/Bullying | crdc/schools_crdc_harass_bully_students_{year} | Yearly | crdc/codebook_schools_crdc_harrassment-bullying-students |
| Restraint/Seclusion | crdc/schools_crdc_restraint_seclusion_students_{year} | Yearly | crdc/codebook_schools_crdc_restraint-seclusion-students |

22 CRDC datasets exist total (6 yearly, 16 single-file). See datasets-reference.md for the complete list with all paths and codebook references.
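For yearly datasets, the `{year}` placeholder in the path has to be filled once per collection year. A minimal sketch of that expansion follows. The templates are copied from the table above, `yearly_paths` is a hypothetical helper, and whether a given dataset actually exists for every year is an assumption to check against datasets-reference.md.

```python
# Path templates copied verbatim from the key datasets table.
YEARLY_TEMPLATES = {
    "discipline": "crdc/schools_crdc_discipline_k12_{year}",
    "enrollment": "crdc/schools_crdc_enrollment_k12_{year}",
    "chronic_absenteeism": "crdc/schools_crdc_chronic_absenteeism_{year}",
}


def yearly_paths(dataset, years):
    """Map each requested year to its dataset path (no I/O performed)."""
    template = YEARLY_TEMPLATES[dataset]
    return {year: template.format(year=year) for year in years}


paths = yearly_paths("discipline", [2017, 2020])
# {2017: 'crdc/schools_crdc_discipline_k12_2017',
#  2020: 'crdc/schools_crdc_discipline_k12_2020'}
```

The resulting paths are intended as inputs to the fetch helpers described in fetch-patterns.md; their signatures are not reproduced here.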

CRDC naming note: Some data file paths use concatenated names (e.g., disciplineinstances, mathandscience) while their codebook counterparts use underscored names (e.g., discipline_instances, math_and_science). Always use the exact paths from datasets-reference.md.

Codebooks are .xls files co-located with data in all mirrors. Use get_codebook_url() from fetch-patterns.md to construct download URLs:

python
from fetch_patterns import get_codebook_url
url = get_codebook_url("crdc/codebook_schools_crdc_discipline")

Truth Hierarchy: When interpreting variable values, apply this priority:

  1. Actual data file (what you observe in the parquet/CSV) -- this IS the truth
  2. Live codebook (.xls in mirror) -- authoritative documentation, may lag
  3. This skill documentation -- convenient summary, may drift from codebook

If this documentation contradicts the codebook, trust the codebook. If the codebook contradicts observed data, trust the data and investigate.
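One way to apply the hierarchy in practice is to compare the codes actually observed in a column against the codes the documentation lists, and investigate any mismatch. The sketch below uses plain Python sets. `audit_codes` is a hypothetical helper, and the example values come from the race codes described earlier in this reference.

```python
def audit_codes(observed, documented):
    """Compare observed column values against documented codes.

    'undocumented' values appear in the data but not the codebook
    (trust the data and investigate); 'unobserved' values are
    documented but never appear (usually harmless).
    """
    observed, documented = set(observed), set(documented)
    return {
        "undocumented": sorted(observed - documented),
        "unobserved": sorted(documented - observed),
    }


documented_race = {1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 99}
observed_race = [1, 2, 3, 4, 5, 6, 7, 99]

report = audit_codes(observed_race, documented_race)
# {'undocumented': [], 'unobserved': [8, 9, 20]}
```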

Filtering

python
import polars as pl

# Filter to a single state (California) and disaggregated race groups
df = df.filter(
    (pl.col("fips") == 6) &       # California
    (pl.col("race") < 99)          # Exclude totals row
)

# Filter to specific demographic intersection
df = df.filter(
    (pl.col("race") == 2) &        # Black students
    (pl.col("sex") == 99) &         # Both sexes (total)
    (pl.col("disability") == 99)    # All disability statuses
)

Common Pitfalls

| Pitfall | Issue | Solution |
| --- | --- | --- |
| Using string codes | Portal uses integers, not strings | race == 2 not race == "BL" |
| Raw counts | Different enrollment sizes | Use rates per 100/1000 students |
| Missing years | Assuming annual data | Remember biennial schedule |
| COVID year | 2020-21 not comparable | Flag or exclude from trends |
| Suppression | Small cell suppression | Check suppression rates first |
| Sample years | Early years sampled | Use 2015+ for national estimates |
| Definition drift | Variables change over time | Check codebooks for each year |
| Forgetting code 99 | Including totals in calculations | Filter race < 99 for disaggregated analysis |
| CSV type inference | Polars infers ncessch/leaid/crdc_id as Int64 | Use schema_overrides={"ncessch": pl.Utf8, "leaid": pl.Utf8, "crdc_id": pl.Utf8} |
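The "Raw counts" pitfall comes down to normalizing by enrollment before comparing schools or groups. A minimal sketch, assuming counts and enrollment have already been pulled out as plain numbers; `rate_per` is a hypothetical helper that returns None rather than a misleading number when either input is a missing-data code or enrollment is zero.

```python
def rate_per(count, enrollment, per=1000):
    """Events per `per` enrolled students, or None if not computable."""
    if count is None or enrollment is None:
        return None
    if count < 0 or enrollment <= 0:
        # Negative values are missing-data codes, not real counts.
        return None
    return count * per / enrollment


rate_per(25, 500)            # 50.0 suspensions per 1,000 students
rate_per(30, 1200, per=100)  # 2.5 per 100 students
rate_per(-3, 500)            # None (suppressed count)
```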

Equity Analysis Framework

CRDC data is designed for civil rights analysis. Key analytical approaches:

Disparity Ratios

python
import polars as pl

# Calculate discipline disparity using Portal integer codes
def discipline_disparity(df, discipline_var, group_a, group_b):
    """
    Calculate risk ratio between two groups.
    Value > 1 indicates group_a has higher rate.

    Args:
        df: DataFrame with CRDC data, already filtered to one level of
            sex and disability (e.g., 99) so students are not double counted
        discipline_var: Column with discipline counts
        group_a: Integer race code (e.g., 2 for Black)
        group_b: Integer race code (e.g., 1 for White)

    Example:
        # Black vs White OSS disparity
        disparity = discipline_disparity(df, 'students_susp_out_sch_single', 2, 1)
    """
    def group_rate(race_code):
        # Keep one race group (integer code) and drop rows where either
        # column is null or a missing-data code (-1, -2, -3, -9), so those
        # negative codes are never summed as if they were counts
        rows = df.filter(
            (pl.col('race') == race_code) &
            (pl.col(discipline_var) >= 0) &
            (pl.col('enrollment_crdc') > 0)
        )
        # Rate = total disciplined students / total enrollment
        return rows.select(pl.col(discipline_var).sum()).item() / \
               rows.select(pl.col('enrollment_crdc').sum()).item()

    return group_rate(group_a) / group_rate(group_b)

# Example: Black (race=2) vs White (race=1) disparity
# disparity = discipline_disparity(df, 'students_susp_out_sch_single', 2, 1)

Composition vs. Representation

  • Composition: What share of suspended students are Black?
  • Representation: Is Black students' share of suspensions larger than their share of enrollment?
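The two questions use the same four counts but divide them differently. The sketch below makes the distinction concrete. The function name and the numbers are hypothetical, chosen only to illustrate the arithmetic.

```python
def composition_and_representation(
    group_suspended, all_suspended, group_enrolled, all_enrolled
):
    """Return (composition, enrollment_share, representation_index).

    composition: group's share of all suspended students
    enrollment_share: group's share of all enrolled students
    representation_index: composition / enrollment_share;
        above 1 means the group is over-represented among suspensions
    """
    composition = group_suspended / all_suspended
    enrollment_share = group_enrolled / all_enrolled
    return composition, enrollment_share, composition / enrollment_share


# Hypothetical school: 25 of 100 suspended students and
# 125 of 1,000 enrolled students belong to the group.
comp, share, index = composition_and_representation(25, 100, 125, 1000)
# comp == 0.25, share == 0.125, index == 2.0
```

Here the group makes up 25% of suspensions but 12.5% of enrollment, so it is represented among suspended students at twice its enrollment share.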

Risk Ratios

  • Compare discipline/outcome rates across groups
  • Adjust for school-level factors when appropriate

Related Data Sources

| Source | Relationship | When to Use |
| --- | --- | --- |
| education-data-source-ccd | School/district characteristics | Linking CRDC to school demographics, locale, Title I status (join on ncessch or leaid) |
| education-data-source-edfacts | Assessment outcomes | Comparing discipline patterns to academic outcomes |
| education-data-explorer | Parent discovery skill | Finding available CRDC endpoints and variables |
| education-data-query | Data fetching | Downloading CRDC parquet/CSV files from mirrors |
| education-data-context | General interpretation | Education data interpretation and citation generation |

Topic Index

| Topic | Reference File |
| --- | --- |
| Title VI (race) | ./references/civil-rights-context.md |
| Title IX (sex) | ./references/civil-rights-context.md |
| Section 504 (disability) | ./references/civil-rights-context.md |
| IDEA | ./references/civil-rights-context.md |
| OCR enforcement | ./references/civil-rights-context.md |
| Discipline data | ./references/data-elements.md |
| Restraint/seclusion | ./references/data-elements.md |
| Harassment | ./references/data-elements.md |
| Course access | ./references/data-elements.md |
| AP/IB/Gifted | ./references/data-elements.md |
| Chronic absenteeism | ./references/data-elements.md |
| Staffing | ./references/data-elements.md |
| Preschool | ./references/data-elements.md |
| Sampling approach | ./references/collection-methodology.md |
| Collection timeline | ./references/collection-methodology.md |
| Variable codes | ./references/variable-definitions.md |
| Suppression rules | ./references/data-quality.md |
| COVID impact | ./references/data-quality.md |
| Year changes | ./references/historical-changes.md |