Descriptive Statistics vs Inferential Statistics: Part I
The difference between descriptive statistics and inferential statistics
Imagine you’re working on a farm that has thousands of sheep. Your boss asks you: what is the average weight of all his sheep?
What’s your answer to his question?
Well, your answer depends on the situation you’re in, and that situation decides whether you use descriptive statistics or inferential statistics.
Alright, let’s start with the most foundational concepts. The first one you need to know is the population. A population is the entire group of things we’re interested in. So what are you interested in when answering your boss’s question? All the sheep, the thousands of them, on that farm. That’s your population. Okay? Cool…
Now let’s look at the second foundational concept: the sample. A sample is not necessarily what we’re interested in; it’s what we actually know (our data), a subset of the population that we observe. In the case of your boss’s question, a sample could be the sheep lying down on the grass, all the sheep that are grazing, or some other group. You get the idea, right?
Sometimes, though, the sample is exactly what we’re interested in, because it contains the whole population. That’s where we use descriptive statistics: our sample is equal to our population. But will you go to every sheep on your farm and weigh it? Do you have time for that? Of course not. The same thing happens in real-world cases: most of the time you won’t have enough time or resources to collect data on your whole population. That’s why you need to take a sample (what we know) from your population (what we wish we knew). And when you work with a sample, you have to deal with uncertainty. This is where inferential statistics comes in.
To explain descriptive statistics further, suppose you have painstakingly and correctly collected the weight of every sheep on the farm. All that’s left to answer your boss’s question is to calculate the average of the data you collected. Yes, that’s descriptive statistics. But this approach is costly and takes a lot of time, while your boss needs the answer as soon as possible.
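To make the descriptive case concrete, here is a minimal Python sketch. The sheep weights are simulated, hypothetical data (3,000 sheep, normally distributed around 63 kg with a spread of 8 kg, all assumptions for illustration only). Because the list holds every sheep, the mean it computes is the population mean itself.

```python
import random
import statistics

# Hypothetical data: simulated weights (kg) for every sheep on the farm.
# The farm size (3,000) and the distribution are illustrative assumptions,
# not real measurements.
rng = random.Random(42)
population_weights = [round(rng.gauss(63.0, 8.0), 1) for _ in range(3000)]

# Descriptive statistics: we have the whole population, so the mean we
# compute IS the population mean. There is no sampling uncertainty.
population_mean = statistics.mean(population_weights)
population_sd = statistics.pstdev(population_weights)  # population standard deviation

print(f"Sheep weighed: {len(population_weights)}")
print(f"Average weight of all sheep: {population_mean:.1f} kg")
print(f"Spread (population SD): {population_sd:.1f} kg")
```

Since no sheep is left out, there is nothing to infer: the printed average is the answer to the boss’s question (setting aside any error in the scale itself).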
Then you remember that yesterday you saw one of the farm’s sheep standing on a digital scale, and the display showed 65.8 kg. Aha, maybe the average weight of all the sheep on your farm is 65.8 kg? Then you put more sheep on the scale, chosen at random (random sampling), and record the weight of each one. The average weight of your sample turns out to be 63.3 kg.
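The random-sampling step can be sketched the same way. This reuses the same hypothetical simulated farm and assumes a sample of 30 sheep (the text does not say how many were weighed), so the printed average will not be exactly 63.3 kg.

```python
import random
import statistics

# Hypothetical farm: 3,000 simulated sheep weights in kg (illustrative assumption).
rng = random.Random(42)
population_weights = [round(rng.gauss(63.0, 8.0), 1) for _ in range(3000)]

# Random sampling: every sheep has the same chance of being put on the scale.
# Random.sample draws without replacement, so no sheep is weighed twice.
sample_size = 30  # assumed; the article does not give a number
sample = rng.sample(population_weights, k=sample_size)

# The sample mean is our estimate of the (unknown) population mean.
sample_mean = statistics.mean(sample)
print(f"Average weight of {sample_size} randomly chosen sheep: {sample_mean:.1f} kg")
```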
Still, you’re not really sure that 63.3 kg is the true average weight of all the sheep on your farm. But it’s your best guess so far, and any other guess might be worse than this one.
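Why can’t you be sure? One way to see it is to repeat the weighing with several independent random samples from the same hypothetical simulated farm (again an illustrative assumption, with 30 sheep per sample). Each sample gives a somewhat different average, even though the population itself never changes.

```python
import random
import statistics

# Hypothetical farm: 3,000 simulated sheep weights in kg (illustrative assumption).
rng = random.Random(42)
population_weights = [round(rng.gauss(63.0, 8.0), 1) for _ in range(3000)]

# Weigh a fresh random sample of 30 sheep, five separate times.
sample_size = 30
sample_means = [
    statistics.mean(rng.sample(population_weights, k=sample_size))
    for _ in range(5)
]

for i, m in enumerate(sample_means, start=1):
    print(f"Sample {i}: average = {m:.1f} kg")

# The gap between the smallest and largest average shows how much a single
# sample's answer moves purely because of which sheep happened to be picked.
print(f"Range of sample averages: {max(sample_means) - min(sample_means):.1f} kg")
```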
So why not take 63.3 kg and tell your boss that the average weight of all the sheep on the farm is 63.3 kg?
Yes, 63.3 kg is your best guess. But is it ‘good enough’? Maybe not. That’s why you may need to do something more, such as collecting more evidence (more data). Inferential statistics makes this idea precise with two competing claims: the null hypothesis, the default you keep believing unless the evidence says otherwise, and the alternative hypothesis, the claim you switch to when the evidence is strong enough to change your mind.
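Collecting more evidence helps because larger random samples produce averages that cluster more tightly around the true population mean. The sketch below uses the same hypothetical simulated farm and draws 200 random samples at each of a few assumed sample sizes to estimate how much the sample average varies. The function name spread_of_sample_means is made up for this example.

```python
import random
import statistics

# Hypothetical farm: 3,000 simulated sheep weights in kg (illustrative assumption).
rng = random.Random(42)
population_weights = [round(rng.gauss(63.0, 8.0), 1) for _ in range(3000)]
true_mean = statistics.mean(population_weights)  # unknown in real life


def spread_of_sample_means(sample_size: int, repeats: int = 200) -> float:
    """Standard deviation of the sample average across many random samples."""
    means = [
        statistics.mean(rng.sample(population_weights, k=sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


print(f"True average (normally unknown): {true_mean:.1f} kg")
# More data: sample averages vary less from one sample to the next.
for n in (10, 30, 100, 300):
    print(f"n = {n:>3}: sample averages typically vary by about "
          f"{spread_of_sample_means(n):.2f} kg")
```

Roughly, this spread shrinks in proportion to one over the square root of the sample size, so weighing four times as many sheep about halves the uncertainty.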
There are three kinds of lies: lies, damned lies, and statistics.
(Attributed to Benjamin Disraeli)