Sometimes data needs fixing before it can be used. Data Cleaning is the preparation of data that has been corrupted, is incorrect, or needs reformatting before loading into analytical software.
The methods and tools used for data cleaning depend on the type of cleaning that needs to be performed. One of the most common tools is spreadsheet software such as Excel.
Common things to look for when cleaning your data include:
Reformatting your data is one of the most common cleaning needs for a new data set. The most widely accepted format is plain text encoded in ASCII (American Standard Code for Information Interchange), typically saved with file extensions such as *.csv, *.txt, and *.dat.
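As an illustration of reformatting, the following is a minimal sketch in Python that converts tab-delimited text into comma-separated, ASCII-only text. The function name to_ascii_csv and the sample data are hypothetical, and the sketch assumes the input is tab-delimited; a real data set may need a different delimiter or additional cleaning steps.

```python
import csv
import io


def to_ascii_csv(raw_text, delimiter="\t"):
    """Convert delimited text into comma-separated, ASCII-only text.

    Hypothetical helper for illustration; assumes a single-character delimiter.
    """
    reader = csv.reader(io.StringIO(raw_text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Skip rows that are empty or contain only whitespace
        if not any(cell.strip() for cell in row):
            continue
        # Trim stray spaces around each value before writing
        writer.writerow([cell.strip() for cell in row])
    result = out.getvalue()
    # Raises UnicodeEncodeError if any non-ASCII character is present
    result.encode("ascii")
    return result


# Made-up sample: tab-delimited, with stray spaces and a blank line
sample = "id\tname\tscore\n1\t Ada \t90\n\n2\tGrace\t85\n"
print(to_ascii_csv(sample))
```

The ASCII check at the end fails loudly on purpose, so that characters outside ASCII are noticed and handled deliberately instead of being silently dropped.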
Introductory Resource: Data Preparation & Descriptive Statistics
Examples:
Cleaning data is often a cyclical process, as data is used, reused, and added to. The diagram below represents a common view of the Data Cleaning Cycle.
Excel is not statistical software, but it is often the software of choice for cleaning data and preparing it for input into statistical software. The library has many resources available for learning what capabilities Excel offers for data cleaning.
There are myriad data cleaning software choices available for purchase. It is always wise to consult reviews, or someone you trust who has experience with the software, to make sure it suits your needs.
Open-source software is also available: