Video lecture notes

Videos 5 & 6

Confounds: Features of the sample and how we use it that mislead us.

Four types of causal confounds

  1. The fork
    • X ← Z → Y
    • Z is a common cause shared by X and Y.
    • Because Z causes both X and Y, X and Y are correlated, and knowing X tells us more about Y (because Z is common to them). But X and Y are not causally linked.
    • When stratified by Z, X and Y will be independent (simulated in the sketch after this list).
  2. The pipe
    • X → Z → Y
    • Z is the mediator. X and Y are associated; X is not directly linked to Y but is indirectly linked through Z. When stratified by Z, X and Y become independent.
  3. The collider
    • X → Z ← Y
    • X and Y are independent, but they both influence Z.
    • Weirdly, when stratified by Z, X and Y look dependent.
    • Colliders are cool! Selection bias is basically a collider effect: the stratification here is the act of selection, and it creates weird correlations (like people getting grants or jobs, the location of good restaurants, the attractiveness/skill of actors, etc.).
  4. The descendant
    • Example: X → Z → Y & Z → A (a combination of a pipe and a descendant).
    • Other combinations include collider + descendant and fork + descendant.
    • How a descendant behaves depends on what it is attached to.
    • In the pipe example, stratifying by A can partly make X independent of Y. How much depends on how strongly A is linked to Z.
    • Descendants are everywhere. Many measurements are proxies of what we want to measure. They are useful depending on how they are used, as they contain information about unobserved “latent” variables.
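
A minimal simulation sketch of the four elemental confounds above (my own illustration, not from the lecture; numpy, the binary strata, and all effect sizes are assumptions):

  import numpy as np

  rng = np.random.default_rng(1)
  n = 100_000

  def corr(a, b):
      return np.corrcoef(a, b)[0, 1]

  # Fork: X <- Z -> Y. Marginally correlated; independent within a Z stratum.
  Z = rng.binomial(1, 0.5, n)
  X = rng.normal(Z, 1.0)
  Y = rng.normal(Z, 1.0)
  print("fork marginal:", corr(X, Y))                      # clearly nonzero
  print("fork | Z=0:   ", corr(X[Z == 0], Y[Z == 0]))      # ~0

  # Pipe: X -> Z -> Y. Stratifying by the mediator removes the association.
  X = rng.normal(0.0, 1.0, n)
  Z = rng.binomial(1, 1.0 / (1.0 + np.exp(-2.0 * X)))
  Y = rng.normal(1.0 * Z, 1.0)
  print("pipe marginal:", corr(X, Y))                      # nonzero
  print("pipe | Z=1:   ", corr(X[Z == 1], Y[Z == 1]))      # ~0

  # Collider: X -> Z <- Y. Independent, but stratifying by Z (the act of
  # selection) creates an association.
  X = rng.normal(0.0, 1.0, n)
  Y = rng.normal(0.0, 1.0, n)
  Z = (X + Y > 0).astype(int)
  print("collider marginal:", corr(X, Y))                  # ~0
  print("collider | Z=1:   ", corr(X[Z == 1], Y[Z == 1]))  # clearly negative

  # Descendant: pipe X -> Z -> Y plus Z -> A. Stratifying by the proxy A
  # only partially stratifies by Z, so the X-Y link weakens but survives.
  X = rng.normal(0.0, 1.0, n)
  Z = rng.binomial(1, 1.0 / (1.0 + np.exp(-2.0 * X)))
  Y = rng.normal(1.0 * Z, 1.0)
  A = rng.binomial(1, np.where(Z == 1, 0.9, 0.1))          # noisy proxy of Z
  print("descendant marginal:", corr(X, Y))
  print("descendant | A=1:   ", corr(X[A == 1], Y[A == 1]))  # shrunk toward 0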

Framework for causal inference between treatment and outcomes when there are confounds (U):

  • Randomization of the treatment variable is the best way of removing confounds, but often we cannot randomize. What do we do then?
  • In those cases, we need to do some statistical treatment that mimics experimental randomization.
  • If the DAG is known, stratifying by the confound U allows looking at the direct relationship between the treatment and the outcome.
  • Essentially, we are marginalizing a joint distribution along one axis: P(Y|do(X)) = Σ_U P(Y|X,U) P(U). We are finding the distribution of Y when we change X, averaged over the distribution of the control variable U (see the adjustment sketch after this list).
  • Do-calculus:
    • the framework that takes a DAG and gives rules for finding P(Y|do(X)).
    • says what is possible before picking any functions, and justifies a graphical form of analysis.
    • (conservative) it is less powerful when there are no extra assumptions. (silver lining) it helps show when making special assumptions makes things better.
  • Backdoor criterion
    • A theorem that comes from do-calculus.
    • Can be applied by identifying all paths, figuring out which are backdoor paths into the treatment (X), and finding a minimal adjustment set that closes the backdoor paths.
    • The backdoor criterion gives a minimal adjustment set, which is not always the best adjustment set.
    • <www.dagitty.net> → lets you draw DAGs and it will tell you the adjustment set, etc.
    • Stratifying by confounds is essentially just adding them to the regression equation. By stratifying, we are essentially trying to find the slope of the variable of interest (X) for different values of the confounds. So in effect, it is stratification for continuous values.
      Example: the slope of X is stratified by the confound U for all values of U, giving a distribution.
  • good and bad controls
    • bad controls - the generally taught conventions for choosing controls are unreliable:
      • anything on the spreadsheet
      • variables that are collinear
      • pre-treatment variables
    • bad controls open up backdoor paths and produce statistically significant differences (correlations) when there are none.
    • Examples of bad controls:
      • stratifying by colliders or unobserved forks with colliders in the middle.
      • stratifying by the mediator (in pipes), which removes all of the link between X and Y.
      • stratifying by descendants (adds or weakens association between the variables of interest).
  • Table 2 fallacy
    • doing a multiple linear regression on the data and removing confounds does not mean that all coefficients have a causal meaning.
    • removing backdoors from the input to the output variable selectively and asymmetrically removes arrows so that we can find the total effect of the input (X) on the output (Y).
    • However, this means that the coefficients of the other variables capture only their partial (direct) effect on the output (Y), as their effect through the input (X) is removed by the backdoor criterion.
    • This is why table 2 does not make sense. Only the coefficient of X is meaningful as a total effect; everything else is only a partial effect on Y (demonstrated in the second sketch below).
    • Summary: not all coefficients are created equal. Some are not interpretable.
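
A minimal sketch of the stratify-and-average (marginalization) idea above, using a toy simulation of my own (the DAG, the effect sizes, and numpy are assumptions, not the lecture's example). The true effect of X on Y is 1.0; the binary confound U opens a backdoor:

  import numpy as np

  rng = np.random.default_rng(2)
  n = 100_000

  U = rng.binomial(1, 0.5, n)            # confound: U -> X and U -> Y
  X = rng.normal(2.0 * U, 1.0)
  Y = rng.normal(1.0 * X + 2.0 * U, 1.0)

  def slope(x, y):
      return np.polyfit(x, y, 1)[0]      # OLS slope of y on x

  # Naive estimate: the backdoor X <- U -> Y is open, so the slope is biased.
  print("naive slope:   ", slope(X, Y))                     # clearly > 1

  # Stratify by U, then average over P(U):
  #   P(Y|do(X)) = sum_U P(Y|X,U) P(U)
  strata = [slope(X[U == u], Y[U == u]) for u in (0, 1)]
  weights = [np.mean(U == u) for u in (0, 1)]
  print("adjusted slope:", float(np.dot(strata, weights)))  # ~1.0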
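
And a second sketch, for the Table 2 fallacy (again a made-up simulation; the coefficients 0.8, 1.0, and 0.5 are assumptions chosen so the total effects can be checked by hand):

  import numpy as np

  rng = np.random.default_rng(3)
  n = 100_000

  U = rng.normal(0.0, 1.0, n)             # confound: U -> X and U -> Y
  X = rng.normal(0.8 * U, 1.0)
  Y = rng.normal(1.0 * X + 0.5 * U, 1.0)
  # Total effect of X on Y: 1.0. Total effect of U on Y: 0.5 + 0.8*1.0 = 1.3.

  # One regression of Y on both X and U (closes the backdoor through U).
  A = np.column_stack([X, U, np.ones(n)])
  b_X, b_U, _ = np.linalg.lstsq(A, Y, rcond=None)[0]
  print("coef of X:", b_X)   # ~1.0: interpretable as X's total causal effect
  print("coef of U:", b_U)   # ~0.5: only U's direct effect, not its total 1.3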

Linked chapters: 5 & 6

Notes on linear regression:

  • Standardize the data (zero mean and standard deviation of 1). Gaussian priors with a mean of 0 and an std of 0.5 are good for most cases, as they are centered but do not reach the extremes. Exponential(1) priors are good for stds (see the sketch below).
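
A small sketch of standardizing data and prior-predictively checking those priors (numpy only; the raw data and the grid point are made up):

  import numpy as np

  rng = np.random.default_rng(4)

  x_raw = rng.lognormal(3.0, 0.5, 500)          # some made-up raw predictor
  x = (x_raw - x_raw.mean()) / x_raw.std()      # standardized: mean 0, sd 1

  # Priors from the note: intercept and slope ~ Normal(0, 0.5),
  # sigma ~ Exponential(1).
  alpha = rng.normal(0.0, 0.5, 1000)
  beta = rng.normal(0.0, 0.5, 1000)
  sigma = rng.exponential(1.0, 1000)

  # Prior predictive mean of the outcome at x = +2 sd: centered near 0 and
  # rarely extreme, i.e. the priors almost never imply huge slopes.
  mu_at_2sd = alpha + beta * 2.0
  print(np.quantile(mu_at_2sd, [0.05, 0.5, 0.95]))
  print("typical residual sd:", sigma.mean())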

General opinions:

  • Avoid being clever because it is unreliable and opaque. Better to be systematic and boring.
  • Use logical deduction about the confounds (and the causal links between variables).