Importance of Data Validity

What is Valid Data?
Valid data is data whose results can be trusted and reused in the future. Validity works hand in hand with accuracy and reliability, which form two essential pillars of data integrity: if results are unreliable and inaccurate, they cannot be valid. Many people assume that validity means the results are more or less what was expected, but this is not necessarily the case. In many cases, valid results have shown the exact opposite of what was originally thought.

What makes data invalid?
Having a preconception of what you think should happen can be very dangerous and often damaging: if you start to change results to match what you think should be happening, the results become invalid. Other common causes include:

1. Using unreliable data sources, as the data could easily have been modified or deleted.
2. Using measuring equipment that cannot deliver the level of accuracy the analysis requires.
3. Not recording data to a sufficient degree of precision (i.e. using 1 instead of 1.2334), which can easily lose information and understanding, as there could be patterns and areas of interest hidden in the detail.
4. Working with people who don't record information properly. This is similar to the point above, but relates to the people you are working with. If you are collaborating with others, it is important that everyone works to the same granularity (i.e. everyone using years rather than quarters, or kilometres rather than metres). Exactly the same applies when using analytical tools such as Spotfire or QlikView: every data table has to be at the same granular level.
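The granularity point above can be sketched in code. This is a minimal illustration, not from the original post; the function names and figures are invented for the example, and the idea is simply that every source is converted to one agreed unit and time grain before the tables are combined.

```python
# Sketch: normalising two sources to the same granularity before combining
# them. All names and numbers here are made up for illustration.

def metres_to_km(distances_m):
    """Convert a list of distances recorded in metres to kilometres."""
    return [d / 1000 for d in distances_m]

def quarters_to_years(quarterly):
    """Aggregate {(year, quarter): value} figures into yearly totals."""
    yearly = {}
    for (year, _quarter), value in quarterly.items():
        yearly[year] = yearly.get(year, 0) + value
    return yearly

# One team recorded distances in kilometres, another in metres.
team_a_km = [1.2, 3.4]
team_b_m = [1200, 3400]
combined_km = team_a_km + metres_to_km(team_b_m)  # now all in kilometres

# One source reports quarterly figures, another yearly ones.
quarterly_sales = {(2023, 1): 10, (2023, 2): 12, (2024, 1): 9}
yearly_sales = quarters_to_years(quarterly_sales)  # {2023: 22, 2024: 9}
```

Only once everything is in the same units and at the same grain can the tables be joined safely, which is exactly the constraint tools like Spotfire and QlikView enforce.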

Making invalid data valid
Once data has been labelled "invalid", the general assumption is that it cannot be used and has to be disposed of. This is not true, as there are several ways to make the best use of "invalid" data. These are:

1. If the areas of invalidity are known, repeat the data collection for those parts using a better method.
2. Use the invalid data as a starting point for collecting more information.
3. Break the data up into chunks, then pick and choose the ones you want.
4. Give some, or all, of the data to someone else. Just because it is not useful to you doesn't mean it can't be useful to someone else.
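Point 3 above can be sketched as a small script. This is an invented example, not the author's method: the validity rule (non-negative readings) and the data are placeholders for whatever rule marked your own data invalid.

```python
# Sketch: splitting a data set into chunks and keeping only the usable ones.
# The validity check (non-negative readings) is a made-up example rule.

def split_into_chunks(records, size):
    """Break a list of records into consecutive chunks of the given size."""
    return [records[i:i + size] for i in range(0, len(records), size)]

def is_valid_chunk(chunk):
    """Example rule: a chunk is valid if every reading is non-negative."""
    return all(reading >= 0 for reading in chunk)

readings = [1.2, 3.4, -99.0, 2.1, 5.6, 4.3]  # -99.0 is a bad sensor value
usable = [c for c in split_into_chunks(readings, 2) if is_valid_chunk(c)]
print(usable)  # [[1.2, 3.4], [5.6, 4.3]]
```

The invalid middle chunk is discarded while the trustworthy parts of the data set are kept for analysis.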

Check out more about data and analytics on my website.



What’s the problem with Excel?

Most, if not every, company uses Microsoft Excel to manage data, as it is an excellent tool for performing calculations, creating tables, drawing graphs and comparing data sets. However, more and more companies are actively moving away from Excel towards specialist analytical software.

Why is this and what is the benefit?

The first reason is that the newer software can perform more complex calculations more easily and at a faster rate, which saves time on calculations involving large data sets.

The second reason is that better and more interactive visualizations can be produced. In many analytical packages you can break down graphs and charts by a metric and change the breakdown in a matter of seconds. You can also click into the graphs and charts to see what a piece of data consists of, and easily place multiple visualizations on a single page to build a very helpful dashboard. This ease of visualization means that patterns and trends can be spotted quickly and then used to maximize efficiency and reduce costs.

The third reason is probably the most important: Excel files start to run slowly and keep crashing when they get too big, and a single worksheet cannot hold more than 1,048,576 rows. With specialist software, files can be much bigger without slowing down, which means huge amounts of data can be stored and manipulated with the risk of data loss considerably reduced.


These are the main reasons why people are moving from Excel to more specialist software, though there are others.

Check my home page for some more information on data.


Reliable and Unreliable Data Sources

This is one of the most important aspects of data analysis, which is why it is my first blog post.

Using an unreliable data source means that the accuracy of your data, and ultimately your results, will be compromised. You need to be very careful when choosing data sources, as even the smallest amount of inaccurate data will have its inaccuracy magnified by the calculations performed on it.
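To illustrate how a small input error gets magnified, here is a quick sketch with invented figures (the growth rates and time span are assumptions, not from the post): a roughly 1% error in a recorded growth rate compounds into an error of around 10% after ten years of calculation.

```python
# Sketch: a small error in a source figure is magnified by calculation.
# All numbers are invented purely to illustrate the point.

true_rate = 1.05      # the true yearly growth rate
recorded_rate = 1.06  # figure taken from an unreliable source (~1% off)

years = 10
true_total = 100 * true_rate ** years          # what really happens
recorded_total = 100 * recorded_rate ** years  # what the bad data predicts

error_pct = (recorded_total - true_total) / true_total * 100
print(f"{error_pct:.1f}% error after {years} years")
```

A ~1% error in the input becomes roughly a 10% error in the result, which is why one bad source can compromise an entire analysis.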

Sites like Wikipedia can be edited by almost anyone and therefore cannot be used as a trusted source.  The most reliable data comes from primary sources you have obtained yourself, or from accredited journals where everything is taken into account.  If you have to use other sources of information, the best ones are those from higher-education establishments, which draw their data from a wide variety of sources.  These sources are also most likely to state their areas of uncertainty and inaccuracy.

In conclusion, it is very important to use accurate sources of data, especially if the work is for someone else or a business. Try to use primary sources of data, and remember that it is your neck on the line: making your work accurate should help you gain a reputation as a good analyst.

Click here to access my homepage

