CSV Data Cleaner
Creates a step-by-step data cleaning plan for messy CSV or spreadsheet data.
Category: data
Difficulty: beginner
Platforms: chatgpt claude
Tags: data-cleaning data-quality csv etl preprocessing
Prompt Template
You are a data quality specialist. Create a cleaning plan for my messy data.
Data description: {{data_description}}
Known issues: {{issues: duplicates/missing values/inconsistent formats/outliers/typos}}
Tool: {{tool: Excel/Python pandas/Google Sheets/R}}
Number of rows (approximate): {{rows}}
## Data Quality Assessment
Check for these common issues:
| Issue Type | How to Detect | Expected Findings |
| --- | --- | --- |
| Duplicates | | |
| Missing values | | |
| Inconsistent formats | | |
| Outliers | | |
| Invalid data types | | |
| Whitespace issues | | |
## Cleaning Steps (in order)
### Step 1: Back up original data
### Step 2: Remove exact duplicates
### Step 3: Standardize formats
- Dates: [target format]
- Phone numbers: [target format]
- Names: [capitalization rule]
### Step 4: Handle missing values
- Strategy per column (delete/fill/interpolate)
### Step 5: Fix data types
### Step 6: Remove outliers (if appropriate)
### Step 7: Validate cleaned data
## Code/Formulas (in {{tool}})
Provide specific commands for each cleaning step.
## Quality Check
After cleaning:
- Row count before vs after:
- Completeness percentage (non-missing cells / total cells):
- Issues remaining:
Tips
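- Always back up the original data before cleaning: operations such as deleting rows or overwriting values can't be undone
- Handle duplicates first, because duplicated rows inflate counts and distort the summary statistics that later steps rely on
- Document every cleaning decision; future you will want to know why rows were removed
- Missing values aren't always a problem: deleting every row with a gap can bias the results if the values aren't missing at random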
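If you pick Python pandas as the tool, the tips above could be sketched as a small helper like the one below. This is a minimal sketch, not part of the prompt itself: the function name `clean_with_log` and the `_original` backup suffix are hypothetical choices, and it assumes a CSV file that `pandas.read_csv` can parse with its default settings. It backs up the raw file, drops exact duplicates, and reports (rather than deletes) missing values, recording each decision in a log.

```python
from __future__ import annotations

import shutil
from pathlib import Path

import pandas as pd


def clean_with_log(csv_path: str) -> tuple[pd.DataFrame, list[str]]:
    """Back up the raw file, drop exact duplicates, and log each decision.

    Hypothetical helper for illustration; adapt the steps to your data.
    """
    src = Path(csv_path)
    log: list[str] = []

    # Copy the untouched file before any destructive step.
    backup = src.with_name(src.stem + "_original" + src.suffix)
    shutil.copy2(src, backup)
    log.append(f"Backed up {src.name} to {backup.name}")

    df = pd.read_csv(src)
    rows_before = len(df)

    # Remove exact duplicate rows first so later counts aren't inflated.
    df = df.drop_duplicates()
    log.append(f"Dropped {rows_before - len(df)} exact duplicate rows")

    # Report missing values per column instead of deleting rows blindly.
    missing = df.isna().sum()
    for col, n in missing[missing > 0].items():
        log.append(f"Column {col!r}: {n} missing values (left in place for review)")

    return df, log
```

The returned `log` list is one way to keep the record the third tip asks for; writing it to a text file next to the backup would make it survive between sessions.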