Importance of weak priors when updating beliefs

Related resource notes: Statistical Rethinking

Acknowledgements: Tanay Biradar and Shivansh Dave for feedback and discussion!

Disclaimer: This is an idea that I have heard many times before, especially from a Bayesian inference point of view. I am just re-writing this idea in my own words to better incorporate it.

Summary

Our prior beliefs strongly affect how well new data/evidence is incorporated into our beliefs. If we have strong prior beliefs, it is really hard to actually change our beliefs based on evidence. In this perspective, having a weak belief is not just important to get a better sense of the world based on evidence/facts; it also makes us much more adaptable to changes in the world.

Experiments

How to test ideas about “prior beliefs” and how it influences our learning based on evidence? I used ideas from the book Statistical Rethinking - in the book, the author describes really nice experiment involving a globe and uses it to illustrate bayesian concepts.¹

The experiment is simple - the goal is to find the proportion of water on earth (assuming we don’t know the answer). We start with a prior belief of what we think the proportion is. Then, we toss a globe into the air and catch it. We then look at the top most point of the globe and see if is land or water, and mark the toss as “L” or “W”. We repeat this multiple times and from it, look at what is predicted from our prior belief of the proportion and update it based on the new evidence (refer to this note if you want to jump in deeper).

Jargon used:

prior or prior probability: prior belief.
likelihood: probability/likelihood of the evidence based on prior belief.
posterior or posterior probability: updated prior belief based on evidence.

Experiment 1: Effect of weak vs strong prior

The experiment is simple - I toss the globe 1000 times and get a dataset. Then, using a weak and a strong prior, I see how much our estimate of proportion (posterior distribution) changes based on the dataset.

The weak prior I am using is a gaussian distribution centered around 0.4, but has a large standard deviation. That is, I start by saying that I think the proportion of water on earth is around 40% but I am not really certain/confident about it. Based on the dataset, I update my prior beliefs.

Fig 1: Left: weak (gaussian) prior ; Right: Updated beliefs in 4 steps (250 tosses each)

Fig 2: Left: strong (gaussian) prior centered around 0.4 ; Right: Updated beliefs in 4 steps (250 tosses each)

Code

## simulate globe toss
simulate_globe_toss <- function (p=0.7, N=10){
  sample( c("W","L"), size=N, prob = c(p, 1-p), replace=TRUE )
}
 
## (Fig. 1) Simulate bayesian update using a weak (uniform/gaussian) prior
data = replicate(simulate_globe_toss(N=4),n=250)
 
# prior updating using grid approximation
grid_size <- 110
p_grid <- seq( from=0 , to=1 , length.out=grid_size )
#unstd.prior <- rep( 1 , grid_size )                 # uniform prior
#prior <- unstd.prior / sum(unstd.prior)
unstd.prior <- dnorm(p_grid, mean=0.4, sd=0.05)      # gaussian prior
prior <- unstd.prior / sum(unstd.prior)
 
# plot prior
par(mfrow = c(1, 2))
plot( p_grid , prior , type="l" , col = 1, ylim = range(c(0,1)),
      xlab="proportion of water" , ylab="prior probability")
 
# updates
for (i in 1:4){
  likelihood <- dbinom( sum(data[i,] == "W") , size=length(data[i,]) , prob=p_grid )
  unstd.posterior <- likelihood * prior
  posterior <- unstd.posterior / sum(unstd.posterior)
  prior <- posterior
  if (i==1){
    plot( p_grid , prior , type="l" , col = 1, ylim = range(c(0,0.25)),
          xlab="proportion of water" , ylab="posterior probability")
  } else{
    lines( p_grid , posterior , col = i)
  }
 
}
 
## (Fig. 2) Simulate bayesian update using a strong prior
unstd.prior <- dnorm(p_grid, mean=0.4, sd=0.01)  
prior <- unstd.prior / sum(unstd.prior)
 
# plot prior
par(mfrow = c(1, 2))
plot( p_grid , prior , type="l" , col = 1, ylim = range(c(0,1)),
      xlab="proportion of water" , ylab="prior probability")
 
# updates
for (i in 1:4){
  likelihood <- dbinom( sum(data[i,] == "W") , size=length(data[i,]) , prob=p_grid )
  unstd.posterior <- likelihood * prior
  posterior <- unstd.posterior / sum(unstd.posterior)
  prior <- posterior
  if (i==1){
    plot( p_grid , prior , type="l" , col = 1, ylim = range(c(0,0.5)),
          xlab="proportion of water" , ylab="posterior probability")
  } else{
    lines( p_grid , posterior , col = i)
  }
  
}

With a weak prior, on every update (250 tosses), our beliefs (posterior distribution) get stronger around a proportion based on data.

Now, if I start with a strong prior belief that the proportion of water is 0.4, then even 1000 tosses are not sufficient to move my beliefs by a lot. Our belief (posterior probability) remains close to 0.4 and is still strong (hasn’t gotten broader, indicating uncertainty).

So, with strong priors, one needs substantially more data to change prior beliefs. How much data do strong priors require, and how does it scale?

Experiment 2: Updating a weak and a strong prior

Before looking more into the data requirements for strong vs weak prior, I first come up with some definitions for strong and weak prior.

I define the confidence of the prior as the area under the curve for $m e an \pm 0.05$ .² If I set my initial mean to be 0.4 (as experiment 1), my confidence will be area under the curve between 0.35 and 0.45. The sharper the peak around my mean, the higher the confidence.

Fig 3: Defining confidence of prior beliefs using area under the curve around the mean.

Code

## (Fig 3.) plot prior distributions
# roughly 25, 50 , 75, 90, 95, 99
ind = c(9, 12, 15, 16, 17, 21)
par(mfrow=c(2,3))
 
for (j in ind){
  plot( p_grid , prior_database[j,] , type="l" , col = 1, ylim = range(c(0,0.5)),
        xlab="proportion of water" , ylab="posterior probability")
  mtext(paste("Confidence % =", round(prior_effect$confidence_percent[j],2)*100) )
}

This is a made up measure but since area under the curve is maximally 1 for probability density, this should also reach 100% as the gaussian gets narrower and narrower, so it sort of makes sense (as seen in the figure above).

Fig 4: Updating a weak prior belief with data.

Code

## (Fig. 4) weak prior update with increased data
weak_prior_database = matrix(nrow=12, ncol=length(p_grid))
 
# setup prior
unstd.prior <- dnorm(p_grid, mean=0.4, sd=0.05)  
prior <- unstd.prior / sum(unstd.prior)
weak_prior_database[1,] = prior
 
# setup measurement
data_count = 0
condition = FALSE
 
for(i in 1:11){
  
  # new data
  toss = simulate_globe_toss(N=1000)
  
  # update beliefs
  likelihood <- dbinom( sum(toss == "W") , size=length(toss) , prob=p_grid )
  unstd.posterior <- likelihood * prior
  posterior <- unstd.posterior / sum(unstd.posterior)
  prior <- posterior
  
  weak_prior_database[i+1,] = prior
}
 
par(mfrow=c(3,4))
 
for (j in 1:12){
  plot( p_grid , weak_prior_database[j,] , type="l" , col = 1, ylim = range(c(0,0.75)),
        xlab="proportion of water" , ylab="posterior probability")
  mtext(paste("Tosses used =", (j-1)*1000) )
}

Next, I looked at how the prior beliefs are updated with fresh data. Turns out, the weak priors and strong priors are updated in starkly different manners. With weak priors, a new peak emerges at the correct location and the old peak disappears (Fig 4, above). With strong priors, the peak never disappears - it just gets pushed towards the correct location (Fig 5, below).

Fig 5: Updating a strong prior belief with data.

Code

## (Fig. 5) Strong prior update
strong_prior_database = matrix(nrow=12, ncol=length(p_grid))
 
# setup prior
unstd.prior <- dnorm(p_grid, mean=0.4, sd=0.01)  
prior <- unstd.prior / sum(unstd.prior)
strong_prior_database[1,] = prior
 
# setup measurement
data_count = 0
condition = FALSE
 
for(i in 1:11){
  
  # new data
  toss = simulate_globe_toss(N=1000)
  
  # update beliefs
  likelihood <- dbinom( sum(toss == "W") , size=length(toss) , prob=p_grid )
  unstd.posterior <- likelihood * prior
  posterior <- unstd.posterior / sum(unstd.posterior)
  prior <- posterior
  
  strong_prior_database[i+1,] = prior
}
 
par(mfrow=c(3,4))
 
for (j in 1:12){
  plot( p_grid , strong_prior_database[j,] , type="l" , col = 1, ylim = range(c(0,0.75)),
        xlab="proportion of water" , ylab="posterior probability")
  mtext(paste("Tosses used =", (j-1)*1000) )
}

The reason for this difference³ is that updating beliefs involves multiplying the evidence with prior belief. If the prior belief is too strong and is almost zero everywhere (like in the previous figure), then multiplication with the evidence does not allow for a new peak to be formed. The only route for updating is slowly moving this narrow peak towards a location that is pointed by evidence. And this is a really slow process.

So how slow is the process?

Experiment 3: Evidence requirement for strong priors

To show this, I use the same definition as above to decide that the belief is in the “correct” location. I count the data required to update the belief (posterior probability) to have an area of 99% around $0.7 \pm 0.05$ . This is a crude definition as before,⁴ but sufficient enough to give some insights.

Plotting the number of tosses required given the confidence of the prior gives us this steep looking curve:

Fig 6: Scaling of data requirement with strength of priors to obtain “correct” beliefs.

Code

## (Fig. 6) Scaling the strength of prior
# setting p_grid again
grid_size <- 110
p_grid <- seq( from=0 , to=1 , length.out=grid_size )
 
# get range of sd for gaussian
confidence_scale = 10^(seq(0, -2, by= -0.1))
prior_effect = as.data.frame(confidence_scale)
prior_effect$confidence_percent = rep(NA,length(confidence_scale))
prior_effect$data_required = rep(NA,length(confidence_scale))
 
# setup databases for prior and posterior
prior_database = matrix(nrow=length(confidence_scale), ncol=length(p_grid))
posterior_database = matrix(nrow=length(confidence_scale), ncol=length(p_grid))
 
for (i in 1:length(confidence_scale)){
  
  # setup prior
  unstd.prior <- dnorm(p_grid, mean=0.4, sd=confidence_scale[i])  
  prior <- unstd.prior / sum(unstd.prior)
  prior_database[i,] = prior
  
  # get confidence percentage
  prior_effect$confidence_percent[i] = sum(prior[p_grid>0.35 & p_grid<0.45])
  
  # setup measurement
  data_count = 0
  condition = FALSE
  
  while(!condition){
    
    # new data
    toss = simulate_globe_toss(N=10)
    
    # update beliefs
    likelihood <- dbinom( sum(toss == "W") , size=length(toss) , prob=p_grid )
    unstd.posterior <- likelihood * prior
    posterior <- unstd.posterior / sum(unstd.posterior)
    prior <- posterior
    
    # add data_count
    data_count = data_count + 10
    
    # check condition
    condition = sum(prior[p_grid>0.65 & p_grid<0.75]) > 0.89
  }
  
  posterior_database[i,] = posterior
  prior_effect$data_required[i] = data_count
 
}
 
plot(prior_effect$confidence_percent*100, prior_effect$data_required,
     ylab="Tosses required" , xlab="Confidence %")

This clearly shows that the stronger the prior belief, the more data we need to update. The growth is quite fast indicating that very strong priors, like the one in Fig 5., require huge amounts of data before the final belief accurately captures the one represented by the evidence.

Real-world implications

So what do the above experiments mean for us in the real world?

In the above experiments, I tested the effect of having a prior belief about how much water covers earth, and how that is updated based on new evidence. If one assumes that our brain works in a Bayesian manner,⁵ we can assume that we have prior beliefs that are updated based on new evidence. So we can see the effect of prior beliefs on our ability to learn from new evidence.

Here are my takeaways:

Having a weak prior belief, that is, thinking I don’t know too much about something, is actually good as it allows us to form accurate impressions based on a small amount of data. We can get from not knowing anything to confidently saying that the earth has about 70% water in ~1000 tosses (Fig 1).
Having a strong prior belief makes us less amenable to change and requires a lot more data to change our mind. Being very confident that the earth only has ~40% water resulted in very little change after ~1000 tosses (Fig 2).
This difference is because of the way learning happens when there are prior beliefs - weak prior beliefs are easily swayed by data but strong prior beliefs only grudgingly move with huge amounts of data (Fig 3).
Finally, the amount of evidence required to change prior beliefs grows steeply with confidence in the belief.

Overall, my key realization from this exercise was that I need to be more cognizant on where my prior beliefs come from. If my prior beliefs are based on evidence I collected, it is okay to have strong priors. This is because the confidence in these beliefs reflects directly on the evidence I have encountered. Examples include things which I have experienced, researched and thought about myself.

Even here, though, it is good to be aware that I have a strong prior belief, so that when required I can relinquish it - like when things change drastically and previous priors are no longer applicable (like the current ML/AI revolution which is changing so fast that my previous priors no longer apply).

However, if my prior beliefs are from other sources, then I need to be very careful about my strength of my beliefs. If it is stuff I heard from social media or acquaintances, I should make sure that my prior remains weak. Examples include stuff like gossip, cultural stereotypes, etc. which should be quickly unlearned based on evidence (ideally they should not affect prior beliefs, but it is unreasonable to assume perfect rejection).

It is fine to accept prior beliefs from credible sources but I need to be cognizant that I am accepting someone’s beliefs under the assumption that they have done the hard work of updating it based on evidence. If not, I will fall in the same trap of believing conspiracy theories, which rely on accepting false beliefs with high confidence. These beliefs will cause idea stagnation by impeding learning from new evidences.

Another important realization for me is that in order to learn and change our minds, we need to weaken our priors. This is especially important when I need to listen to other perspectives and have a meaningful conversation. For the duration of the conversation, I think we need to weaken our priors so that we do not reject incoming evidence as this would impede a meaningful conversation.

Finally, I think we need to be cognizant of using social media to update our beliefs as it influences our perception of evidence by providing posts that matches beliefs. I feel that this will artificially boosts the confidence of our beliefs. I explore this more here.

Most of the code and plots here are from 2020_McElreath - Statistical Rethinking Statistical Rethinking]]. ↩
Standard deviation would work too (and so would 1/std), and that is what I use in the code, but I feel this measure is more intuitive. ↩
I initially thought this might be because of the grid approximation method I am using here, but I double checked by using quadratic estimation and it still persists. ↩
Disclaimer: I am only on the 4th chapter of the Statistical Rethinking|Statistical Rethinking]] book so there might be better ways of doing this. ↩
Reasonable assumption according to some neuroscientists and unreasonable according to others. ↩

💭 DN's Umwelt

Table of Contents

Graph View

Explorer

Importance of weak priors when updating beliefs

Table of Contents

Summary

Experiments

Experiment 1: Effect of weak vs strong prior

Experiment 2: Updating a weak and a strong prior

Experiment 3: Evidence requirement for strong priors

Real-world implications

Graph View

Backlinks

Recent Notes

Agriculture needs another revolution

Why we remember (and how to remember better)

Wellbeing is a skill: Perspectives from contemplative neuroscience

Sage seminar series

Recent Notes

Agriculture needs another revolution

Why we remember (and how to remember better)

Wellbeing is a skill: Perspectives from contemplative neuroscience

Sage seminar series

Pest detection using plant vibrations

Table of Contents

Graph View

Explorer

Importance of weak priors when updating beliefs

Table of Contents

Summary

Experiments

Experiment 1: Effect of weak vs strong prior

Experiment 2: Updating a weak and a strong prior

Experiment 3: Evidence requirement for strong priors

Real-world implications

Footnotes

Graph View

Backlinks

Recent Notes

Recent Notes