What is Valid Data?
Valid data is data whose results can be trusted and reused in the future. Validity works hand in hand with accuracy and reliability, which are two essential pillars of data integrity: if results are unreliable or inaccurate, they cannot be valid. Many people assume that valid results are ones that more or less match the expected results, but this is not necessarily the case. In many cases, valid results have shown the exact opposite of what was originally thought.
What makes data invalid?
1. Having a preconception of what you think should happen can be very dangerous and often damaging. If you start changing results to match what you think should be happening, the results become invalid.
2. Using unreliable data sources makes the results invalid, as the data could easily have been modified or deleted.
3. The accuracy of the metrics being taken matters, and this includes the accuracy of the machines used to collect data at the level of detail required.
4. Not recording data with a sufficient degree of precision (e.g. using 1 instead of 1.2334) can easily result in a loss of information and understanding, as there could be patterns and areas of interest hidden in the detail.
5. Working with people who don’t record information properly. This is similar to the point above, but relates to the people you are working with. If you are collaborating with other people, it is important that everyone works to the same granularity (e.g. everyone using years rather than quarters, or kilometres rather than metres). The same applies when using analytical tools such as Spotfire or QlikView: every data table has to be at the same granular level.
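To make the last two points concrete, here is a minimal Python sketch. All the numbers, names and the helper to_yearly are hypothetical and only there for illustration. The first part shows how rounding readings to whole numbers can hide a trend. The second part brings a quarterly table measured in metres to the same granularity and unit as a yearly table in kilometres before the two are compared.

```python
from collections import defaultdict

# Hypothetical readings: recorded in full, they show a steady rise.
readings = [1.2334, 1.2871, 1.3410, 1.4102]

# Recorded as whole numbers, every reading looks identical and the trend is lost.
rounded = [round(r) for r in readings]
print(rounded)  # [1, 1, 1, 1]


def to_yearly(quarterly_rows):
    """Sum (year, quarter, value) rows into one total per year."""
    totals = defaultdict(float)
    for year, _quarter, value in quarterly_rows:
        totals[year] += value
    return dict(totals)


# Hypothetical tables: one is quarterly in metres, the other yearly in kilometres.
distance_by_quarter_m = [
    (2023, 1, 1500.0),
    (2023, 2, 2500.0),
    (2023, 3, 1000.0),
    (2023, 4, 3000.0),
]
distance_by_year_km = {2023: 8.0}

# Bring the quarterly table to yearly granularity, then convert metres to kilometres.
yearly_m = to_yearly(distance_by_quarter_m)
yearly_km = {year: metres / 1000 for year, metres in yearly_m.items()}

# Only now are the two tables directly comparable.
print(yearly_km == distance_by_year_km)  # True
```

Going from fine to coarse (quarters to years) is always possible by summing. The reverse is not, which is one reason to record at the finer level when in doubt.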
Making invalid data valid
Once data has been labelled “invalid”, there is a general belief that it cannot be used and has to be disposed of. This is not true, as there are different ways to make the best use of “invalid” data. These are:
1. If the areas of invalidity are known, repeat the data collection for that part using a better method.
2. Use the invalid data as a starting point for collecting more information.
3. Break up the data into chunks and then pick and choose which ones you want.
4. Give some, or part of, the data to someone else. Just because it is not useful to you doesn’t mean it can’t be useful to someone else.
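As a rough illustration of points 1 and 3, the Python sketch below splits a set of records into a chunk that can be kept and a chunk that should be collected again. The record layout and the is_valid rule are assumptions made up for this example; in practice the rule depends on why your data was labelled invalid.

```python
def is_valid(record):
    """Hypothetical rule: usable if a value was recorded and the source is trusted."""
    return record["value"] is not None and record["source"] == "trusted"


def split_by_validity(records):
    """Return (keep, recollect): records to use now and records to collect again."""
    keep, recollect = [], []
    for record in records:
        if is_valid(record):
            keep.append(record)
        else:
            recollect.append(record)
    return keep, recollect


# Hypothetical records: one has a missing value, one comes from an unknown source.
records = [
    {"id": 1, "value": 1.2334, "source": "trusted"},
    {"id": 2, "value": None, "source": "trusted"},
    {"id": 3, "value": 0.9876, "source": "unknown"},
    {"id": 4, "value": 1.4102, "source": "trusted"},
]

keep, recollect = split_by_validity(records)
print([r["id"] for r in keep])       # [1, 4]
print([r["id"] for r in recollect])  # [2, 3]
```

The recollect list doubles as a to-do list for the repeat collection, so nothing is thrown away without a reason.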
Check out more about data and analytics on my website at www.analystrising.com
More information on Data accuracy can be found below.