Raw Data to Clean Dataset
Two-step chain: assess data quality issues, then execute a cleaning and transformation plan.
Category: data
Difficulty: beginner
Platforms: chatgpt claude
Tags: data-cleaning data-quality assessment preprocessing chain
Prompt Template
You are a data quality analyst. Assess the quality of this dataset.
Dataset description: {{dataset}}
Columns/fields: {{columns}}
Sample data or issues noticed: {{sample}}
Intended use: {{intended_use}}
## Data Quality Assessment
### Completeness
| Column | Missing Count | Missing % | Impact on Analysis |
|---|---|---|---|
### Accuracy
- Data type mismatches found:
- Invalid values found:
- Range violations:
### Consistency
- Duplicate rows:
- Conflicting records:
- Format inconsistencies (dates, names, codes):
### Timeliness
- Data freshness:
- Stale records:
## Quality Score
| Dimension | Score (1-10) | Critical Issues |
|---|---|---|
| Completeness | | |
| Accuracy | | |
| Consistency | | |
| Timeliness | | |
| Overall | | |
## Cleaning Priority List
| Priority | Issue | Affected Rows | Recommended Action | Complexity |
|---|---|---|---|---|
Tips
- Assess before you clean; you might find the data is already fit for your purpose
- Focus on columns critical to your analysis first
- Keep a log of every transformation for reproducibility
- The 'intended use' determines how strict your quality standards need to be
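
If you want concrete numbers to paste into the {{sample}} field, a quick local profile can help. The following is a minimal sketch, not part of the prompt itself. It assumes the data is already loaded as a list of dictionaries, and the names `MISSING_MARKERS`, `is_missing`, `profile_rows`, and `to_markdown` are hypothetical helpers made up for this illustration. The set of strings treated as missing is also an assumption, so adjust it to your data.

```python
from collections import Counter

# Strings treated as missing. This set is an assumption; adjust it to your data.
MISSING_MARKERS = {"", "na", "n/a", "null", "none"}


def is_missing(value):
    """Return True for None or a string that matches a missing marker."""
    if value is None:
        return True
    return isinstance(value, str) and value.strip().lower() in MISSING_MARKERS


def profile_rows(rows, columns):
    """Summarize completeness per column and count exact duplicate rows."""
    total = len(rows)
    completeness = {}
    for col in columns:
        missing = sum(1 for row in rows if is_missing(row.get(col)))
        pct = round(100 * missing / total, 1) if total else 0.0
        completeness[col] = {"missing_count": missing, "missing_pct": pct}

    # A row counts as a duplicate when every listed column matches an earlier row.
    seen = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicate_rows = sum(count - 1 for count in seen.values())
    return {
        "rows": total,
        "completeness": completeness,
        "duplicate_rows": duplicate_rows,
    }


def to_markdown(profile):
    """Render the completeness summary as markdown table rows."""
    lines = ["| Column | Missing Count | Missing % |", "|---|---|---|"]
    for col, stats in profile["completeness"].items():
        lines.append(
            f"| {col} | {stats['missing_count']} | {stats['missing_pct']} |"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    sample = [
        {"id": "1", "email": "a@example.com", "signup_date": "2024-01-05"},
        {"id": "2", "email": "", "signup_date": "05/01/2024"},
        {"id": "2", "email": "", "signup_date": "05/01/2024"},
        {"id": "3", "email": None, "signup_date": "N/A"},
    ]
    profile = profile_rows(sample, ["id", "email", "signup_date"])
    print(to_markdown(profile))
    print("Duplicate rows:", profile["duplicate_rows"])
```

Pasting the printed table and duplicate count into the sample field gives the model measured figures to work from instead of impressions. The sketch only detects missing values and exact duplicates; type mismatches, range violations, and format inconsistencies still need the assessment the prompt asks for.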