Posts

Showing posts from August, 2023

Analyzing Categorical Data with Contingency Tables

The previous post introduced missing categorical data and explained why it matters. It described two types of missingness, Missing Completely at Random (MCAR) and Missing at Random (MAR), and presented methods for handling each: mode imputation, creating an "Unknown" category, conditional imputation, hot deck imputation, multiple imputation, logistic regression imputation, propensity score matching, cluster analysis, and using specialized software. It emphasized validating imputation methods and drawing on domain knowledge. It also covered using chi-squared analyses to assess whether missingness is Missing at Random (MAR), and explained how to perform a chi-squared test and interpret its results, including comparing observed and expected frequencies in a contingency table and considering the p-value. The application of the chi-squared t...
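To make the comparison of observed and expected frequencies concrete, here is a minimal sketch of a Pearson chi-squared test on a 2x2 contingency table, written with the standard library only. The counts and the variable names (gender versus whether an income answer is missing) are hypothetical and chosen for illustration; they do not come from the post. The closed-form p-value used here holds only for one degree of freedom, and no continuity correction is applied, so results may differ slightly from statistical packages that apply one by default.

```python
import math


def chi_squared_test(observed):
    """Pearson chi-squared test of independence for a table of counts.

    Returns (statistic, degrees_of_freedom, p_value, expected).
    The p-value is computed only when degrees_of_freedom == 1;
    otherwise it is returned as None.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over every cell
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)

    # With 1 degree of freedom the chi-squared tail probability
    # equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2)) if dof == 1 else None
    return statistic, dof, p_value, expected


# Hypothetical counts (assumption, not data from the post):
# rows = gender (female, male), columns = income answer (observed, missing)
observed = [[80, 20], [60, 40]]
statistic, dof, p_value, expected = chi_squared_test(observed)

print(f"chi-squared = {statistic:.3f}, dof = {dof}, p-value = {p_value:.4f}")
print("expected counts:", expected)
```

A small p-value here would suggest that missingness in the income answer depends on gender, which is consistent with MAR rather than MCAR under these assumed counts.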

Categorical Data Analysis

Module 2: Categorical Data Analysis

Our first series provided an overview of categorical variables and distinguished them from numerical data. It emphasized that categorical variables call for their own analytical and statistical treatment because they differ in measurement scale, analysis techniques, and data encoding. It highlighted two types of categorical variable: nominal (no inherent order) and ordinal (with inherent order). It explored nominal subtypes such as binary, multi-category, hierarchical, and label-set variables, along with ordinal categories that are equally or unequally spaced. Ordinal categories with unequal intervals call for qualitative comparison and non-parametric tests. In today's episode, we will look at how to deal with missingness in categorical data.

Missing Categorical Data

Missing data refers to the absence or lack of values for one or more variables in a dataset. It can occur for reasons such as non-response in a survey, data entry errors, or system iss...

Statistics for data science - categorical data

INTRODUCTION TO CATEGORICAL VARIABLES

Categorical variables, also known as qualitative variables, are a type of data that represents distinct categories or groups. Unlike numerical variables with measurable quantities, categorical variables consist of labels or attributes that describe characteristics or qualities of the data. Below are a few examples:

Understanding the nature of categorical data

Understanding the nature of categorical data is important because it impacts the types of analysis and statistical techniques that can be applied to the data. Categorical variables require different approaches compared to numerical variables. The reasons for this are:

Analysis Techniques: Categorical variables cannot be treated the same way as numerical variables regarding mathematical operations. Numerical variables allow for mathematical computations such as addition, subtraction, and averaging. In contrast, categorical ...
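As a small illustration of the point about analysis techniques, the sketch below contrasts a numerical variable, which can be averaged, with a categorical one, which is summarized by counts and its most frequent category. The sample values and variable names are made up for this example.

```python
from collections import Counter
from statistics import mean, mode

# Hypothetical sample data (assumption, not from the post)
ages = [23, 35, 31, 35, 46]                       # numerical variable
colors = ["red", "blue", "red", "green", "red"]   # nominal categorical variable

# Arithmetic is meaningful for numerical data
average_age = mean(ages)

# For categorical data, count occurrences and report the mode instead
color_counts = Counter(colors)
most_common_color = mode(colors)

print("average age:", average_age)
print("color counts:", dict(color_counts))
print("most common color:", most_common_color)
```

Averaging the color labels would have no meaning, which is why frequency tables and the mode are the usual starting summaries for nominal data.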