Importance of Data Validity

What is Valid Data?
Valid data is data whose results can be trusted and used again in the future. Validity works hand-in-hand with accuracy and reliability, which make up two essential pillars of data integrity: if results are unreliable and inaccurate, they cannot be valid. Many people assume that valid results are ones that are more or less similar to what was expected, but this is not necessarily the case. In many cases, valid results have shown the exact opposite of what was originally thought.

What makes data invalid?
1. Having a preconception of what you think should happen. This can be very dangerous and often damaging: if you start to change results to match what you expect at the time, the results become invalid.
2. Using unreliable data sources. The data could easily have been modified or deleted, which makes any results built on it invalid.
3. Inaccurate measurements. The metrics you take are only as accurate as the instruments used to collect them, so the equipment must be able to deliver the level of accuracy the analysis requires.
4. Not recording data with a sufficient degree of precision (e.g. using 1 instead of 1.2334). This can easily result in a loss of information and understanding, as patterns and areas of interest can be hidden in the detail.
5. Working with people who don’t record information properly. This is similar to the point above but relates to the people you work with. If you are collaborating with others, it is important that everyone works to the same granularity (e.g. everyone using years rather than quarters, or kilometres rather than metres). The same applies when using analytical tools such as Spotfire or Qlikview: every data table has to be at the same granular level.
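To make the granularity point concrete, here is a minimal Python sketch (not Spotfire or Qlikview functionality) that brings two hypothetical datasets to the same level before comparing them: one team recorded distance per quarter in metres, the other per year in kilometres. The data values and the names `quarterly_metres`, `yearly_kilometres`, `to_yearly_km` and `compare_by_year` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical data: team A recorded distance per quarter in metres,
# team B recorded distance per year in kilometres.
quarterly_metres = [
    (2022, 1, 1200.0),
    (2022, 2, 800.0),
    (2022, 3, 1500.0),
    (2022, 4, 500.0),
]
yearly_kilometres = {2022: 3.9}


def to_yearly_km(rows):
    """Roll quarterly readings in metres up to yearly totals in kilometres."""
    totals_m = defaultdict(float)
    for year, _quarter, metres in rows:
        totals_m[year] += metres
    # Convert units once, after aggregating, so both tables use years and km.
    return {year: total / 1000.0 for year, total in totals_m.items()}


def compare_by_year(table_a, table_b):
    """Pair up values for the years present in both tables."""
    common_years = sorted(table_a.keys() & table_b.keys())
    return {year: (table_a[year], table_b[year]) for year in common_years}


team_a = to_yearly_km(quarterly_metres)
print(compare_by_year(team_a, yearly_kilometres))
```

Only once both tables are at the same level (years, kilometres) does the 4.0 km versus 3.9 km difference become a meaningful comparison; matching the raw quarterly metres against yearly kilometres would mix incompatible units and time periods.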

Making invalid data valid
Once data has been labelled “invalid”, there is a common assumption that it cannot be used and has to be disposed of. This is not true, as there are several ways to make the best use of “invalid” data:

1. If the areas of invalidity are known, repeat the data collection for those areas using a better method.
2. Use the invalid data as a starting point for collecting more information.
3. Break up the data into chunks and then pick and choose the ones you want to use.
4. Give the data, or part of it, to someone else. Just because it is not useful to you, it doesn’t mean it can’t be useful to someone else.
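Options 1 and 3 can be combined in practice: split the data into chunks, keep the chunks that are valid, and set aside the known-invalid chunks for re-collection. The Python sketch below assumes daily readings and a hypothetical set of days on which the measuring equipment is known to have been faulty; `KNOWN_FAULTY_DAYS`, `chunk_by_period`, `chunk_is_valid` and `split_valid_invalid` are illustrative names, and the validity rule is an assumption you would replace with your own.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    day: int
    value: float


# Hypothetical: days when the equipment is known to have been faulty.
KNOWN_FAULTY_DAYS = {4, 5}


def chunk_by_period(readings, period):
    """Group readings into consecutive chunks of `period` days."""
    chunks = {}
    for reading in readings:
        chunks.setdefault(reading.day // period, []).append(reading)
    return chunks


def chunk_is_valid(rows):
    """Assumed rule: a chunk is invalid if any reading falls on a faulty day."""
    return all(row.day not in KNOWN_FAULTY_DAYS for row in rows)


def split_valid_invalid(chunks):
    """Separate chunks worth keeping from chunks to re-collect or hand on."""
    valid, invalid = {}, {}
    for key, rows in chunks.items():
        target = valid if chunk_is_valid(rows) else invalid
        target[key] = rows
    return valid, invalid


readings = [Reading(day, 10.0 + day) for day in range(10)]
valid, invalid = split_valid_invalid(chunk_by_period(readings, period=3))
print("keep chunks:", sorted(valid))
print("re-collect chunks:", sorted(invalid))
```

Note that the whole chunk covering days 3 to 5 is set aside, including the good reading on day 3; smaller chunks discard less good data at the cost of more bookkeeping.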

Reliable and Unreliable Data Sources

This is one of the most important aspects of data analysis, which is why it is the subject of my first blog post.

Using an unreliable data source means that the accuracy of your data, and ultimately of your results, will be compromised. You need to be very careful when choosing data sources, as even a small amount of inaccurate data can be magnified when you perform calculations with it.
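A quick way to see how a small input error can grow is to compound it. The Python sketch below assumes a hypothetical growth factor of 1.05 per period that has been measured 1% too high, and applies it repeatedly; the factor, the starting value and the function names are illustrative only. Errors grow like this in multiplicative or compounding calculations; in a simple sum, the relative error of the total does not grow in the same way.

```python
def relative_error(true_value, measured_value):
    """Size of the error as a fraction of the true value."""
    return abs(measured_value - true_value) / abs(true_value)


def compound(start, factor, periods):
    """Apply a growth factor repeatedly, e.g. period-on-period growth."""
    value = start
    for _ in range(periods):
        value *= factor
    return value


def compounded_error(input_error, periods, factor=1.05, start=100.0):
    """Relative error in the result when `factor` is overstated by `input_error`."""
    true_result = compound(start, factor, periods)
    measured_result = compound(start, factor * (1 + input_error), periods)
    return relative_error(true_result, measured_result)


for periods in (1, 5, 10):
    print(f"{periods:>2} periods: {compounded_error(0.01, periods):.2%} error")
```

Because the error compounds, the relative error after n periods is (1.01)^n - 1, so a 1% error in the input becomes roughly a 10.5% error in the result after ten periods.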

Sites like Wikipedia can be edited by almost anyone and therefore should not be used as a trusted source. The most reliable data comes from primary sources you have collected yourself, or from accredited, peer-reviewed journals where methods and sources have been scrutinised. If you have to use other sources of information, the best ones come from higher education institutions, which draw their data from a wide variety of sources. These sources are also the most likely to state areas of uncertainty and inaccuracy.

In conclusion, it is very important to use accurate sources of data, especially if the work is for a client or a business. Try to use primary sources of data, and remember that it is your neck on the line: making your work accurate will help you gain a reputation as a good analyst.
