- Title
- On the quantification of statistical significance of the extent of association projected on the margins of 2x2 tables when only the aggregate data is available: a pseudo p-value approach applied to leukaemia relapse data
- Creator
- Cheema, S. A.; Beh, E. J.; Hudson, I. L.
- Relation
- 21st International Congress on Modelling and Simulation (MODSIM2015). Proceedings of the 21st International Congress on Modelling and Simulation (Broadbeach, Qld 29 November - 4 December, 2015) p. 1682-1688
- Relation
- http://www.mssanz.org.au/modsim2015/
- Publisher
- Modelling and Simulation Society of Australia and New Zealand (MODSIM)
- Resource Type
- conference paper
- Date
- 2015
- Description
- Aggregate data arises in situations where survey research or other means of collecting individual-level data are either infeasible or inefficient. The recent increase in the use of aggregate data in statistics and allied fields, including epidemiology, education and the social sciences, has arisen for a number of reasons. These include:

  - the questionable reliability of estimates when sensitive information is required;
  - the imposition of strict confidentiality policies on data by government and other organisational bodies; and
  - the impossibility, in some contexts, of collecting the information that is needed.

  In this paper we present a novel approach to quantifying the statistical significance of the extent of association between two dichotomous variables when only aggregate data is available. This is achieved by examining a newly developed index, the aggregate association index (AAI), developed by Beh (2008, 2010). From the aggregate data alone, the AAI quantifies the overall extent of individual-level association that may exist when individual-level data is not available. The applicability of the technique is demonstrated using the leukaemia relapse data of Cave et al. (1998). These data are presented as a contingency table that cross-classifies the follow-up status of leukaemia relapse by whether or not cancer traces were found using the polymerase chain reaction (PCR). PCR is a modern method of detecting cancerous cells in the body. At the time it was assumed to be superior to microscopic identification, which was then the conventional method. We assume that the joint cell frequencies of this table are not available and that the only available information is contained in the aggregate data. Under that assumption, we first quantify the extent of association between the two variables by calculating the AAI. This index shows that the likelihood of association is high.
As the AAI is built on Pearson’s chi-squared statistic, it inherits the statistic's well-known sensitivity to large sample sizes. This sensitivity can overshadow the true nature of the association shown in the aggregate data of a given table. In this paper we show that the impact of sample size can be isolated by generating a pseudo population of 2x2 tables under the given sample size. The focus of this paper is therefore an approach that helps answer the question “is this high AAI value statistically significant or not?” using aggregate data only. The answer to this question lies, we believe, in calculating the p-value of the nominated index. We present a new method of numerically quantifying the p-value of the AAI. This method gives new insights into the statistical significance of the association between two dichotomous variables when only aggregate-level information is available. The pseudo p-value approach suggested in this paper enhances the applicability of the AAI and can be considered a valuable addition to the literature on aggregate data analysis.
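The Description does not state the AAI formula or how the pseudo population is generated. The Python sketch below is therefore illustrative only. It assumes the AAI definition commonly attributed to Beh (2008), which is restated here rather than taken from this record.

Let n1_ and n_1 be the row and column margins of a 2x2 table with total n, and let P1 = n11/n1_ be the unknown conditional proportion. Pearson's statistic can then be written exactly as X^2(P1) = n (P1 - q1)^2 p1 / (p2 q1 q2), where p1 = n1_/n and q1 = n_1/n. The AAI is taken to be the percentage of the area under this curve, over the range of P1 permitted by the margins, that lies above the 1-df chi-squared critical value.

The functions `aai` and `pseudo_p_value` are hypothetical names. The pseudo-population scheme is also an assumption and may differ from the paper's: every 2x2 table with grand total n and non-degenerate margins counts once. The margins used at the bottom are made up and are not the Cave et al. (1998) data.

```python
import math
from statistics import NormalDist


def _area(k, crit, q1, a, b):
    """Exact integral of k*(P - q1)**2 - crit over P in [a, b]."""
    return k / 3 * ((b - q1) ** 3 - (a - q1) ** 3) - crit * (b - a)


def aai(n1_, n_1, n, alpha=0.05):
    """Aggregate association index (percent) from the margins of a 2x2 table.

    Assumed definition (after Beh, 2008): the share of the area under
    X^2(P1) = n (P1 - q1)^2 p1 / (p2 q1 q2), over the feasible range of
    P1 = n11 / n1_, lying above the 1-df chi-squared critical value.
    Requires 0 < n1_ < n and 0 < n_1 < n.
    """
    p1 = n1_ / n
    p2 = 1 - p1
    q1 = n_1 / n
    q2 = 1 - q1
    # Range of P1 implied by the margins (Frechet bounds on n11, divided by n1_).
    lower = max(0.0, (n_1 - (n - n1_)) / n1_)
    upper = min(1.0, n_1 / n1_)
    k = n * p1 / (p2 * q1 * q2)
    # 1-df chi-squared critical value equals the squared two-sided normal quantile.
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    total = _area(k, 0.0, q1, lower, upper)  # whole area under X^2(P1)
    # X^2(P1) exceeds crit exactly when |P1 - q1| > d.
    d = math.sqrt(crit / k)
    above = 0.0
    left_end = min(upper, q1 - d)
    if left_end > lower:
        above += _area(k, crit, q1, lower, left_end)
    right_start = max(lower, q1 + d)
    if right_start < upper:
        above += _area(k, crit, q1, right_start, upper)
    return 100 * above / total


def pseudo_p_value(n1_, n_1, n, alpha=0.05):
    """Hypothetical pseudo p-value: the share of all 2x2 tables with grand
    total n (non-degenerate margins only) whose AAI is at least the observed AAI."""
    observed = aai(n1_, n_1, n, alpha)
    hits = total = 0
    for r in range(1, n):
        for c in range(1, n):
            # Number of distinct 2x2 tables with row total r and column total c.
            count = min(r, c) - max(0, r + c - n) + 1
            total += count
            if aai(r, c, n, alpha) >= observed:
                hits += count
    return hits / total


if __name__ == "__main__":
    # Made-up margins for illustration only (not the Cave et al. data).
    print(aai(60, 40, 120), pseudo_p_value(60, 40, 120))
```

X^2(P1) is quadratic in P1, so both areas are computed in closed form rather than by numerical integration. The AAI depends only on the margins. The sketch therefore weights each margin pair by the number of tables that share it, instead of generating every table individually.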
- Subject
- aggregate data; aggregate association index; pseudo p values; ecological inference; sample size
- Identifier
- http://hdl.handle.net/1959.13/1315636
- Identifier
- uon:22979
- Identifier
- ISBN:9780987214355
- Language
- eng
- Reviewed