Categorical data analysis in biostatistics

Challenge

In this project, we wanted to study the probability of an individual having cancer given characteristics such as BMI and smoking status, among others. We also estimated the odds ratio for each covariate.

Deliverable

We provided material explaining the theory behind the logistic regression model, odds ratios, and significance analysis. We observed, for example, that people who smoke have higher odds of developing the disease.

A bit more about the case

The dataset consisted of 10 variables and 57 observations (individuals). In addition to fitting the logistic regression, we evaluated significance measures for each variable.
With the chosen fit, we observed that individuals who smoke and individuals in remission have higher odds of developing the disease early. For example, the odds of early development of the disease for a person who smokes are 12 times those of a non-smoker. It would be interesting to administer questionnaires to new individuals to increase the sample size and allow more accurate models to be fitted.
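
To make the interpretation of the odds ratio concrete, here is a minimal Python sketch. It is not the code used in the project. The function names and the 10% baseline probability are assumptions chosen purely for illustration; only the odds ratio of 12 comes from the case. The sketch shows that, in a logistic regression, the odds ratio of a covariate is the exponential of its coefficient, and that an odds ratio of 12 multiplies the odds, not the probability, by 12.

```python
import math


def odds_ratio_from_coefficient(beta: float) -> float:
    """Odds ratio implied by a logistic regression coefficient: exp(beta)."""
    return math.exp(beta)


def probability_from_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Probability for the exposed group, given the baseline (unexposed)
    probability and the odds ratio between the two groups."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    exposed_odds = odds_ratio * baseline_odds
    return exposed_odds / (1.0 + exposed_odds)


# A coefficient of ln(12) corresponds to the odds ratio of 12 reported for smokers.
smoking_odds_ratio = odds_ratio_from_coefficient(math.log(12))

# Hypothetical baseline: suppose 10% of non-smokers develop the disease early.
# This value is an assumption for illustration, not a result from the study.
p_smoker = probability_from_odds_ratio(0.10, smoking_odds_ratio)
print(f"odds ratio: {smoking_odds_ratio:.1f}, smoker probability: {p_smoker:.3f}")
```

Under this assumed baseline, the probability for smokers is about 0.57 rather than 12 times 0.10, which is why the result is stated in terms of odds.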

Ana Andrade