- Title
- Power transformations for reciprocal averaging and related methods
- Creator
- Wang, Ting-Wu
- Relation
- University of Newcastle Research Higher Degree Thesis
- Resource Type
- thesis
- Date
- 2024
- Description
- Research Doctorate - Doctor of Philosophy (PhD)
- Description
- Contingency tables have been widely used in the statistics literature, with a plethora of robust tools now available for exploring the association between two or more categorical variables. A prime example is Pearson’s chi-squared test of independence, a very popular technique for determining whether there is a statistically significant association between the variables under investigation. When a statistically significant association is found, the relative distribution of the cell counts in a contingency table can be used to further explore the nature of this association through a weighted average of these relative cell counts. This is a fundamental principle of the method of reciprocal averaging (RA) which, classically, yields a one-dimensional set of row and column scores and their maximum correlation. In this thesis, RA is shown to be linked to canonical correlation analysis (CCA), demonstrating that RA and CCA explore the associations that exist between the categorical variables by maximising the association between them. The exploration of these techniques in this thesis is not limited to the one-dimensional solution but, by using singular value decomposition, also delves into the multi-dimensional solution that captures all of the association that exists between the variables. This thesis contributes to the RA and, more generally, contingency table literature by incorporating a power transformation, δ, into RA and exploring its links to CCA. Building on the historical development of RA and CCA, this research highlights the benefits and roles of power transformations in these methods. For example, for nearly 100 years, power transformations have been used to address the following three interrelated issues: (1) stabilising the variance of, say, a Poisson random variable, which historically is achieved by transforming the data in the contingency table so that the variance is made constant; (2) minimising the presence of over-dispersion that is known to exist in all contingency tables; and (3) normalising the cell counts in the contingency table, which are typically treated as Poisson random variables. While power transformations have a long history of technical development and application in statistics and its allied literature, their impact on the numerical features obtained using RA has not been investigated. This thesis addresses this gap by exploring two distinct yet straightforward types of power transformations. The first type of power transformation considered in this thesis is known as the “power family 2” transformation and involves a chosen δ applied to the relative distribution of the cell counts for each row and column category. Examining these relative counts standardises the data, revealing additional details about the structure of the association that exists between the variables. The second, and novel, type of power transformation involves directly applying the power, δ, to the cell counts of a contingency table. Unlike the “power family 2” transformation, this second power transformation ensures that the hypothesis of complete independence is correctly defined for all values of δ. Once these transformations are applied, RA can be used to determine the row and column scores for each category and, in doing so, to obtain the maximum correlation between these sets of scores. New mathematical derivations and theoretical foundations of RA are presented when incorporating the two power transformations. This thesis also examines the impact of incorporating these transformations into RA on the scores and their correlation by exploring these features through their application to real datasets. Practitioners seeking to select an optimal value of δ will benefit from the three criteria discussed in this research. These criteria ensure that the chosen δ either retains the global maximum correlation between the row and column scores for a given range of δ values, maintains statistical significance where one exists, or minimises the presence of over-dispersion in the contingency table. These criteria are validated through simulations and applications to real-world datasets, serving as a guide for analysts in determining the optimal value of δ. Furthermore, this thesis includes a range of computational functions written in the R programming environment to support these analyses and outlines potential future research directions based on the new contributions presented throughout.
- Subject
- reciprocal averaging; power transformation; canonical correlation analysis; maximising association
- Identifier
- http://hdl.handle.net/1959.13/1512501
- Identifier
- uon:56625
- Rights
- Copyright 2024 Ting-Wu Wang
- Language
- eng
- Full Text
File | Description | Size | Format
---|---|---|---
ATTACHMENT01 | Thesis | 2 MB | Adobe Acrobat PDF
ATTACHMENT02 | Abstract | 281 KB | Adobe Acrobat PDF
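
The abstract describes applying a power δ directly to the cell counts of a contingency table and then using reciprocal averaging to obtain row and column scores and their maximum correlation. The following is a minimal illustrative sketch of that idea for the one-dimensional solution only. It is not the thesis's R code: the function name `reciprocal_averaging`, its parameters, and the element-wise form of the transformation (each count raised to δ) are assumptions made for illustration, and the sketch is written in plain Python rather than R.

```python
def reciprocal_averaging(table, delta=1.0, max_iter=1000, tol=1e-12):
    """Return (row_scores, column_scores, correlation) for a two-way table.

    Hypothetical helper: 'table' is a list of rows of non-negative counts.
    """
    # Assumed form of the transformation: raise every cell count to the power delta.
    powered = [[count ** delta for count in row] for row in table]
    total = sum(sum(row) for row in powered)
    p = [[value / total for value in row] for row in powered]  # relative counts
    r = [sum(row) for row in p]        # row masses
    c = [sum(col) for col in zip(*p)]  # column masses

    def standardise(scores, masses):
        # Weighted mean 0 and weighted variance 1.
        # Fails with a division by zero if all scores are equal,
        # e.g. for a table that is exactly independent.
        mean = sum(m * s for m, s in zip(masses, scores))
        centred = [s - mean for s in scores]
        sd = sum(m * s * s for m, s in zip(masses, centred)) ** 0.5
        return [s / sd for s in centred]

    # Arbitrary non-constant starting column scores.
    y = standardise([float(j) for j in range(len(c))], c)
    x, rho = [0.0] * len(r), 0.0
    for _ in range(max_iter):
        # Row scores: weighted averages of the column scores, then standardised.
        x = standardise(
            [sum(pij * yj for pij, yj in zip(row, y)) / ri for row, ri in zip(p, r)], r
        )
        # Column scores: weighted averages of the row scores, then standardised.
        y = standardise(
            [sum(pij * xi for pij, xi in zip(col, x)) / cj
             for col, cj in zip(zip(*p), c)], c
        )
        # Correlation between the two sets of scores.
        new_rho = sum(
            p[i][j] * x[i] * y[j] for i in range(len(r)) for j in range(len(c))
        )
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return x, y, rho


row_scores, column_scores, rho = reciprocal_averaging([[16, 4], [4, 16]], delta=0.5)
```

With `delta=1.0` the sketch reduces to classical reciprocal averaging on the untransformed counts. Choosing δ by the criteria mentioned in the abstract is not attempted here.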