In the previous section we learned how to process Bearable’s date field and how to visualize some data points.

Today we will learn to look for correlations between factors.

Why are we looking for correlations?

Correlation between logical (TRUE/FALSE) values helps us understand which factors often happen together: for example, is it often cold when it’s raining? Notice that we don’t know whether it’s raining because it’s cold, or cold because it’s raining. That is the point of the often-parroted “Correlation does not imply causation”: concluding that one factor causes the other just because they occur together is a logical fallacy.
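For two logical factors, cor() computes the Pearson correlation of the 0/1 values, also known as the phi coefficient, which can be read straight off the 2×2 table of how often the two factors are TRUE together, apart, or neither. A minimal sketch: phi() is a hypothetical helper written for this example, and the vectors are made up (they happen to match the Snowing/Sedentary counts implied by the model output later in this post).

```r
# Toy vectors (made up): 9 days, 2 snowy days that were both sedentary,
# plus 1 sedentary day among the 7 without snow
snowing   <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
sedentary <- c(TRUE, TRUE, TRUE,  FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)

# Hypothetical helper: phi coefficient from the 2x2 table of counts
phi <- function(x, y) {
  tab <- table(factor(x, levels = c(FALSE, TRUE)),
               factor(y, levels = c(FALSE, TRUE)))
  n11 <- as.numeric(tab["TRUE", "TRUE"])    # x TRUE,  y TRUE
  n10 <- as.numeric(tab["TRUE", "FALSE"])   # x TRUE,  y FALSE
  n01 <- as.numeric(tab["FALSE", "TRUE"])   # x FALSE, y TRUE
  n00 <- as.numeric(tab["FALSE", "FALSE"])  # x FALSE, y FALSE
  (n11 * n00 - n10 * n01) /
    sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
}

phi(snowing, sedentary)  # 0.7559289
cor(snowing, sedentary)  # 0.7559289, the same value
```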

Why are we looking for correlations in Bearable data then?

Nevertheless, it’s still useful to search for correlations among your Bearable factors: you learn more about your life, and you can interpret the data as you see fit. We can find associations such as “Do I order junk food on the days I watch TV?” or “Is it always raining when I take the dog for a walk?”

Correlations in Bearable

We start by filtering for factors.

df <- data %>% filter(category == "Factors")

Now let’s split each row’s factors into separate TRUE/FALSE columns of the dataframe:

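library(stringr)

# Give every factor its own TRUE/FALSE column
for (v in seq_len(nrow(df))) {
  # Bearable separates factors with " | "
  factors <- str_trim(str_split(df$detail[v], fixed(" | "), simplify = TRUE))
  for (f in factors) {
    # Skip missing details, empty pieces and stray separators
    if (is.na(f) || f == "" || f == "|") {
      next
    }

    # First time we see this factor: add a column that is FALSE everywhere
    if (!(f %in% colnames(df))) {
      df[[f]] <- rep(FALSE, nrow(df))
    }

    df[v, f] <- TRUE
  }
}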
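If you prefer to avoid the explicit loop, the same reshaping can be sketched with tidyr: split detail into one row per factor, then pivot back to one TRUE/FALSE column per factor. This is a minimal sketch, assuming detail separates factors with " | " as above; the toy data frame and its dates are made up. Unlike the loop, it keeps only the date and factor columns, and because it keeps one row per date and factor before pivoting, factors logged in separate rows on the same day end up merged into one row.

```r
library(dplyr)
library(tidyr)

# Toy input (made up) in the shape this post assumes: factors joined by " | "
data <- data.frame(
  date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-03", "2021-01-02")),
  category = "Factors",
  detail = c("Snowing | Sedentary", "Walked the dog", "Snowing | Sedentary | TV", "TV"),
  stringsAsFactors = FALSE
)

wide <- data %>%
  filter(category == "Factors") %>%
  separate_rows(detail, sep = " \\| ") %>%        # one row per (entry, factor)
  mutate(detail = trimws(detail), value = TRUE) %>%
  filter(detail != "") %>%                        # drop empty pieces
  distinct(date, detail, value) %>%               # one row per (date, factor)
  pivot_wider(names_from = detail, values_from = value,
              values_fill = FALSE) %>%            # factor not logged that day -> FALSE
  arrange(date)

print(wide)
```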

Now every factor has its own column. However, there are probably several rows for the same date, so let’s keep only one row per date (distinct() keeps the first one it finds):

df <- df %>% distinct(date, .keep_all = TRUE)
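One caveat: distinct(date, .keep_all = TRUE) keeps only the first row of each date. If your export ever contains more than one Factors row for the same day (an assumption worth checking, for example with count(df, date)), the TRUE flags in the later rows are silently dropped. A minimal sketch of merging them instead, treating a factor as present on a day if any row for that day has it; the toy data frame is made up.

```r
library(dplyr)

# Toy stand-in (made up) for the loop's output: two rows logged on 2021-01-01
df <- data.frame(
  date = as.Date(c("2021-01-01", "2021-01-01", "2021-01-02")),
  Snowing = c(TRUE, FALSE, FALSE),
  Sedentary = c(FALSE, TRUE, FALSE),
  `Walked the dog` = c(FALSE, FALSE, TRUE),
  check.names = FALSE
)

# distinct() keeps only the first row per date: Sedentary on 2021-01-01 is lost
first_only <- df %>% distinct(date, .keep_all = TRUE)

# Merge instead: a factor is TRUE for a day if any of that day's rows has it
merged <- df %>%
  group_by(date) %>%
  summarise(across(where(is.logical), any)) %>%
  ungroup()
```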

Drop the columns we will not use:

drop <- c("detail", "notes", "rating.amount", "time.of.day", "category", "date", "weekday", "day")
df <- df[, !(names(df) %in% drop)]

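Let’s drop the factors that have only one value across all days. These could be factors you logged every single day, so they are not interesting for us. A constant column also has zero standard deviation, so cor() would return NA for it (the Stack Overflow thread linked in the code deals with exactly this), and dropping it makes the visualization cleaner.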

# Remove non-sensible values from cor table.
# Based on https://stackoverflow.com/questions/19113181/removing-na-in-correlation-matrix
zv <- apply(df, 2, function(x) length(unique(x)) == 1)
dfr <- df[, !zv]
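To see why the constant columns have to go, here is a minimal sketch on a made-up data frame: cor() cannot compute a correlation for a column whose standard deviation is zero, so it warns and returns NA, while the same zero-variance filter as above removes the problem.

```r
# Made-up logical columns; "Logged" is TRUE on every day
toy <- data.frame(
  Snowing   = c(TRUE, FALSE, TRUE, FALSE),
  Sedentary = c(TRUE, FALSE, TRUE, TRUE),
  Logged    = c(TRUE, TRUE, TRUE, TRUE)
)

# A constant column has zero standard deviation: cor() warns and returns NA for it
m_raw <- suppressWarnings(cor(toy))

# The same zero-variance filter removes the column before correlating
zv <- apply(toy, 2, function(x) length(unique(x)) == 1)
m_clean <- cor(toy[, !zv])
```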

Reading the correlations table

Let’s ask some questions: “Do I stay in when it’s snowing?”

> cor(df$Sedentary, df$Snowing)
[1] 0.7559289

It’s fairly close to 1.0, which is promising, but is it statistically significant? Let’s fit a linear model:

> summary(glm(df$Sedentary ~ df$Snowing))
Call:
glm(formula = df$Sedentary ~ df$Snowing)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.1429  -0.1429  -0.1429   0.0000   0.8571  

Coefficients:
               Estimate Std. Error t value Pr(>|t|)  
(Intercept)      0.1429     0.1323   1.080   0.3159  
df$SnowingTRUE   0.8571     0.2806   3.055   0.0185 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.122449)

    Null deviance: 2.00000  on 8  degrees of freedom
Residual deviance: 0.85714  on 7  degrees of freedom
AIC: 10.379

Number of Fisher Scoring iterations: 2

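The p-value for SnowingTRUE is 0.0185, below the usual 0.05 threshold, so the association is statistically significant at the 5% level. With only nine days of data, though, treat it as a hint rather than proof. Let’s assume I’m not a wizard and staying inside doesn’t change the weather; then the sensible reading is that I tend to stay inside when it’s snowing. You get the idea.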
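As a cross-check: with a single predictor, the t-test on the slope of this linear model is the same test as the Pearson correlation test, so cor.test() should report the same t value (3.055 on 7 degrees of freedom) and p-value. A minimal sketch using toy vectors made up to match the counts implied by the output above; cor.test() only accepts numeric vectors, hence as.numeric().

```r
# Toy vectors (made up) matching the counts implied by the glm output above
snowing   <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
sedentary <- c(TRUE, TRUE, TRUE,  FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)

# Pearson correlation test on the 0/1 values
ct <- cor.test(as.numeric(sedentary), as.numeric(snowing))

ct$estimate   # the correlation, 0.7559289
ct$statistic  # t, about 3.055 (df = 7)
ct$p.value    # about 0.0185

# The slope's p-value from an ordinary linear model is the same number
p_lm <- summary(lm(sedentary ~ snowing))$coefficients["snowingTRUE", "Pr(>|t|)"]
```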

Visualization

You can run cor(dfr) to get the full correlation matrix of the factors, but chances are it’s huge and hard to read.
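One way to make the matrix easier to digest, before plotting anything, is to list only the strongest pairs. top_correlations() is a hypothetical helper written for this post, not part of any package: it takes each pair once from the upper triangle of the matrix and sorts by absolute correlation, so strong negative correlations show up too. The toy columns are made up; with your own data you would call top_correlations(cor(dfr)).

```r
# Hypothetical helper: the n strongest pairwise correlations in a correlation matrix
top_correlations <- function(m, n = 10) {
  idx <- which(upper.tri(m), arr.ind = TRUE)  # each pair once, diagonal excluded
  pairs <- data.frame(
    factor1 = rownames(m)[idx[, "row"]],
    factor2 = colnames(m)[idx[, "col"]],
    r = m[idx],
    stringsAsFactors = FALSE
  )
  pairs <- pairs[order(-abs(pairs$r)), ]      # strongest (positive or negative) first
  rownames(pairs) <- NULL
  head(pairs, n)
}

# Toy data (made up)
toy <- data.frame(
  TV       = c(TRUE, TRUE, FALSE, FALSE),
  JunkFood = c(TRUE, TRUE, FALSE, FALSE),
  Raining  = c(TRUE, FALSE, TRUE, FALSE)
)
top <- top_correlations(cor(toy), n = 3)
print(top)
```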

Let’s visualize the correlations instead. There are many packages for this; I like corrplot. You can install it with install.packages("corrplot").

library(corrplot)

corrplot(cor(dfr),      # Correlation matrix
         method = "square", # Correlation plot method
         type = "full",    # Correlation plot style (also "upper" and "lower")
         diag = FALSE,      # If TRUE (default), adds the diagonal
         tl.col = "black", # Labels color
         bg = "white",     # Background color
         title = "",       # Main title
         col = NULL)

Factors correlation plot (the data is generated, for privacy reasons).

Adding some more data

Factors are not the only data we can plot this way. You can also change the filter at the start to include supplements:

df <- data %>% filter(category %in% c("Factors", "Meds/Supplements"))

In the next part we will learn how to visualize factors based on your mood.