Data: Clean

Fundamentals of Data Cleaning

Sometimes data needs fixing before it can be used. Data Cleaning is the preparation of data that has been corrupted, is incorrect, or needs reformatting before loading into analytical software.

The methods and tools used depend on the type of cleaning required. One of the most common tools is spreadsheet software such as Excel.

When cleaning your data, common things to look for include:

  • Outliers
  • Missing Data
  • Malicious Data
  • Erroneous Data
  • Irrelevant Data
  • Formatting
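As an illustration, two of the checks above (missing data and outliers) can be sketched in a few lines of Python using only the standard library. The sample records, the field names `id` and `age`, and the helper names `find_missing` and `find_outliers` are hypothetical. The outlier rule shown (values more than 1.5 interquartile ranges beyond the quartiles) is one common convention, not the only one.

```python
import statistics


def find_missing(rows, fields):
    """Return (row_index, field) pairs where a value is absent or blank."""
    missing = []
    for i, row in enumerate(rows):
        for field in fields:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                missing.append((i, field))
    return missing


def find_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample records, as they might arrive from a spreadsheet export.
rows = [
    {"id": "1", "age": "34"},
    {"id": "2", "age": ""},      # missing value
    {"id": "3", "age": "29"},
    {"id": "4", "age": "31"},
    {"id": "5", "age": "290"},   # likely a typing error
    {"id": "6", "age": "36"},
]

missing = find_missing(rows, ["id", "age"])
ages = [float(r["age"]) for r in rows if r["age"].strip()]
outliers = find_outliers(ages)

print("Missing:", missing)
print("Outliers:", outliers)
```

Flagged values still need a human decision: an outlier may be a data entry error or a genuine extreme observation, so review each one before deleting or correcting it.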

Formatting

Reformatting is one of the most common cleaning needs for a new data set. The most widely accepted format is plain text encoded in the American Standard Code for Information Interchange (ASCII), commonly saved with the file extensions *.csv, *.txt, and *.dat.
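A minimal sketch of this kind of reformatting, assuming the data is already in comma-separated form: the Python below trims stray spaces, tidies the column names, and confirms that the result is plain ASCII. The sample text and the helper name `normalize_csv` are hypothetical, and real files may need further steps such as fixing dates or units.

```python
import csv
import io


def normalize_csv(raw_text):
    """Trim whitespace, tidy header names, and return clean CSV text."""
    reader = csv.reader(io.StringIO(raw_text), skipinitialspace=True)
    # Strip stray spaces from every cell and skip completely empty lines.
    rows = [[cell.strip() for cell in row] for row in reader if row]
    # Lowercase the header and replace spaces so names are easy to reference.
    header = [name.lower().replace(" ", "_") for name in rows[0]]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[1:])
    return out.getvalue()


# Hypothetical messy input, as it might be typed or exported by hand.
raw = "Participant ID , Test Score\n 101, 88 \n102 ,91\n"
clean = normalize_csv(raw)

# Raises UnicodeEncodeError if any non-ASCII characters remain.
clean.encode("ascii")
print(clean)
```

Writing the cleaned text to a file with a *.csv extension gives a file that most statistical packages can import directly.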

Introductory Resource: Data Preparation & Descriptive Statistics

Examples:

Data Cleaning Cycle

Cleaning data is often cyclical, as data is used, reused, and added to. The figure below represents a common view of the Data Cleaning Cycle.

[Figure: data cleaning cycle. Source: iteratorshq.com]

Data Cleaning Software

Excel is not statistical software, but it is often the software of choice for cleaning data and entering it into statistical software. The library has many resources available for learning what Excel can do for data cleaning.

Many data cleaning software packages are available for purchase. Before buying, consult reviews or a trusted source with experience using the software to make sure it suits your needs.

Open-source software is also available:

Library Books