Abstract
Let’s start with an example. A random sample of 400 persons included 240 smokers and 160 non-smokers. Of the smokers, 192 had Coronary Heart Disease (CHD), while only 32 of the non-smokers had CHD. Could a health insurance company claim that the proportion of smokers having CHD differs from the proportion of non-smokers having CHD? This is a typical hypothesis testing problem. In general, there are 6 steps for performing hypothesis testing. Step 1: define the null hypothesis (H0). Step 2: define the alternative hypothesis (Ha). Step 3: specify the type I error rate (significance level α) and the sample size (n). Step 4: define a test statistic and its rejection region. Step 5: calculate the statistic using the sample data. Step 6: state the conclusion (reject H0 or not). For the above example, let P1 represent the true proportion of smokers having CHD and P2 the true proportion of non-smokers having CHD. Then, Step 1: form the null hypothesis H0: P1 = P2. Step 2: form the alternative hypothesis Ha: P1 ≠ P2. Step 3: we select α = .05, and we know n = 400 (n1 = 240 smokers and n2 = 160 non-smokers). Step 4: to compare two proportions we choose the statistic z = (p1 - p2)/sqrt(p(1 - p)*(1/n1 + 1/n2)), where “sqrt” denotes the square root, p1 = sample proportion of smokers having CHD = x1/n1 = 192/240 = .80, p2 = sample proportion of non-smokers having CHD = x2/n2 = 32/160 = .20, and p = pooled sample proportion of all subjects (both smokers and non-smokers) having CHD = (x1 + x2)/(n1 + n2) = (192 + 32)/(240 + 160) = 224/400 = .56. For a two-sided test at α = .05, the rejection region is |z| > 1.96. Step 5: we calculate z = (.80 - .20)/sqrt((.56)(1 - .56)*(1/240 + 1/160)) = .60/.05066 = 11.84. Step 6: since 11.84 exceeds the critical value of 1.96, we reject H0 and conclude that the proportion having CHD differs significantly between the two groups, with smokers showing the higher proportion (P-value < .0000001).
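The calculation in Steps 4 to 6 can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name two_proportion_z_test is our own, chosen for illustration, and the two-sided P-value is obtained from the standard normal distribution via the complementary error function, which is one common way to compute it rather than a method stated in the abstract.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test (hypothetical helper for illustration).

    x1, x2: number of subjects with the outcome in each group
    n1, n2: group sizes
    Returns the z statistic and the two-sided P-value.
    """
    p1 = x1 / n1                      # sample proportion, group 1
    p2 = x2 / n2                      # sample proportion, group 2
    p = (x1 + x2) / (n1 + n2)         # pooled proportion under H0: P1 = P2
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided P-value under the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Smokers: 192 of 240 with CHD; non-smokers: 32 of 160 with CHD
z, p_value = two_proportion_z_test(192, 240, 32, 160)
reject_h0 = abs(z) > 1.96             # two-sided rejection region at alpha = .05

print(f"z = {z:.2f}")
print(f"reject H0: {reject_h0}")
print(f"P-value below .0000001: {p_value < 1e-7}")
```

Comparing |z| with 1.96 and comparing the P-value with α = .05 lead to the same decision; the P-value additionally indicates how far the statistic lies inside the rejection region.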
Citation
Chyou PH. What is “P-value” and How to get it? SM J Biometrics Biostat. 2017; 2(3): 1016.