Linear regression for micro-entrepreneurs

Challenge

In this project, we used a linear regression model to study the relationship between the revenue of Individual Microentrepreneurs (MEIs) over a specific period and their characteristics, such as age, main activity, and occupation.

Deliverable

We delivered material that explained the theory behind the linear regression model and examined the relevance of each explanatory variable. We also conducted an analysis of variance (ANOVA). All of the work was done in R.

A bit more about the case

The database consisted of 964 Individual Microentrepreneurs (MEIs) and 76 variables. After fitting the model, the t-tests on the coefficients showed that most variables were not statistically significant: of the 76 variables, only 5 were. From the ANOVA table, we concluded that only municipality and experience influence annual revenue. We also assessed the quality of the fit with the coefficient of determination, R^2. This measure ranges from 0 to 1; the closer it is to 1, the more of the variation in Y is explained linearly by the independent variables Xi. In this fit, R^2 was 0.09437, meaning the model explained only about 9.4% of the variation in annual revenue. We therefore concluded that these variables are not very effective in explaining a microentrepreneur's revenue.
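
The workflow described above (fit the model, read the t-tests, read the ANOVA table, check R^2) can be sketched in R roughly as follows. This is only an illustration on simulated data, not the original analysis: the data frame `mei`, the columns `revenue`, `age`, `experience`, and `municipality`, and every number used to generate them are assumptions made up for the example.

```r
# Minimal sketch on SIMULATED data; names and values are hypothetical,
# not taken from the original MEI database.
set.seed(42)
n <- 964  # same number of MEIs as in the case

mei <- data.frame(
  age          = round(runif(n, 18, 70)),
  experience   = round(runif(n, 0, 30)),
  municipality = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)

# Hypothetical revenue: weak effect of experience and municipality, mostly noise.
mei$revenue <- 30000 +
  400 * mei$experience +
  3000 * (mei$municipality == "B") +
  rnorm(n, sd = 15000)

# Fit the linear regression model.
fit <- lm(revenue ~ age + experience + municipality, data = mei)

# t-tests: one row per coefficient (estimate, std. error, t value, p-value).
coef_table <- summary(fit)$coefficients

# ANOVA table: sequential F-tests, one row per term plus the residuals.
anova_table <- anova(fit)

# Coefficient of determination, as reported by R and computed by hand
# as 1 - (residual sum of squares) / (total sum of squares).
r_squared <- summary(fit)$r.squared
r_squared_manual <- 1 - sum(residuals(fit)^2) /
  sum((mei$revenue - mean(mei$revenue))^2)

print(coef_table)
print(anova_table)
print(r_squared)
```

In the coefficient table, the last column holds the p-values of the t-tests; in the ANOVA table, each term gets its own F-test. Note that `anova()` on a single `lm` fit tests terms sequentially, so the order of the variables in the formula can affect the result.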

Ana Andrade