I think I understand and I can see the utility of this form of analysis.
Great! I'm very excited for you, and I see you've already gotten pretty far. That's pretty impressive :)
The main advantage comes from the fact that we are often in a situation where we can derive an easy expression for p(A|B) but not for p(B|A). Or the conditional probabilities are simpler (p(A,B) is a hard expression to write out, but p(A|B) and p(B) are easy). Or we have access to some additional information we usually wouldn't have.
It's all about learning how to re-write the conditional probabilities in a way that simplifies the problem, basically.
For example, what is the probability that a person A will buy a stock at a price x?
p(buy a stock at price x)
Without some appropriate information, it's really difficult to evaluate that probability. However, let's say that we break the problem into smaller parts: The person has to be both willing to buy the stock at price x, and the stock also needs to be available at price x. Let's also say that the person thinks that the fair price of the stock is f, which means he'll buy it in 100% of the cases if x<f.
Then the above is
p(buy at x|f)
=p(willing to buy the stock at price x and the stock is available at price x|f) (Eq. 1)
However, this is still very tricky to evaluate. We know that *if* the stock is available, he'll buy it if the price is right (x<f), but the expression above also requires that the stock is available. So what we do know is:
p(willing to buy the stock at price x|the stock is available at price x, f)=Heaviside(x<f) (i.e., 1 if x<f and 0 otherwise)
But we don't know an expression for (Eq. 1). However, we can expand (Eq. 1) into a more convenient form using the product rule, p(X,Y|f) = p(X|Y,f) p(Y|f):
p(buy at x|f)
=p(willing to buy the stock at price x|the stock is available at price x, f) p(the stock is available at price x|f)
=Heaviside(x<f) * p(the stock is available at price x|f)
=Heaviside(x<f) * p(the stock is available at price x)
Where the stock's availability obviously depends only on what is offered on the stock market, and not on person A's subjective notion of what a fair price is (which is why I dropped the f). To evaluate that, you can look at the stock market. So now we have an expression we can evaluate easily.
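To make that concrete, here's a minimal Python sketch of the final expression. The availability model `p_available` and its numbers are made-up placeholders; in practice you'd estimate it from market data:

```python
# Sketch of p(buy at x | f) = Heaviside(x < f) * p(stock available at x).
# The availability model is a made-up placeholder, not real market data.

def heaviside(condition: bool) -> float:
    """Indicator: 1.0 if the condition holds, 0.0 otherwise."""
    return 1.0 if condition else 0.0

def p_available(x: float) -> float:
    """Hypothetical chance the stock is offered at price x:
    a triangular bump around an assumed market price of 100."""
    return max(0.0, 1.0 - abs(x - 100.0) / 20.0)

def p_buy_given_f(x: float, f: float) -> float:
    """p(buy at x | f): willing (x < f) AND available at x."""
    return heaviside(x < f) * p_available(x)

print(p_buy_given_f(x=95.0, f=105.0))   # 0.75: willing, and likely available
print(p_buy_given_f(x=110.0, f=105.0))  # 0.0: price above the fair value f
```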
Then you can inspect a population of people (instead of one person) by basically marginalizing over f. So assume some fair-price distribution p(f), which will, however, be largely unknown. Then you can let go of the assumption that you magically know the person's view on a fair price. Or you can use Bayes' theorem to get the fair-price distribution, given that you have observed people's buying habits or something:
p(f|buy at x) = p(buy at x|f) p(f) / p(buy at x).
Edit: Actually, we'd need to condition on the market data in that expression to really derive it properly, but you get the idea.
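Here's a sketch of both steps on a grid: marginalizing over an assumed fair-price distribution p(f), and the posterior p(f|buy at x). The Gaussian p(f), its parameters, and the availability model are all illustrative assumptions:

```python
import numpy as np

# Grid over possible fair prices f.
f_grid = np.linspace(50.0, 150.0, 2001)
df = f_grid[1] - f_grid[0]

# Assumed fair-price distribution p(f): Normal(100, 10), purely illustrative.
p_f = np.exp(-0.5 * ((f_grid - 100.0) / 10.0) ** 2)
p_f /= p_f.sum() * df  # normalize on the grid

def p_available(x: float) -> float:
    """Same made-up availability model as before."""
    return max(0.0, 1.0 - abs(x - 100.0) / 20.0)

def p_buy(x: float) -> float:
    """p(buy at x) = [integral of Heaviside(x < f) p(f) df] * p(available at x).
    The integral is just P(f > x) under p(f)."""
    willing = p_f[f_grid > x].sum() * df
    return willing * p_available(x)

print(p_buy(95.0), p_buy(105.0))  # more buyers below the mean fair price

# Bayes: p(f | buy at x) is proportional to p(buy at x | f) p(f).
# The availability factor is constant in f, so it cancels on renormalizing.
x_obs = 95.0
posterior = np.where(f_grid > x_obs, 1.0, 0.0) * p_f
posterior /= posterior.sum() * df  # supported only on f > 95
```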
Let there be events A and B and conditional probabilities P(A|B) and P(B|A).
P(A|B) = P(A∩B)/P(B) → P(A∩B) = P(A|B)*P(B)
P(B|A) = P(A∩B)/P(A) → P(A∩B) = P(B|A)*P(A)
Hence, P(A|B)*P(B) = P(B|A)*P(A)
→ P(A|B) = (P(B|A)*P(A))/P(B) and P(B|A) = (P(A|B)*P(B))/P(A)
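As a quick numeric sanity check of both identities (the joint-probability table below is arbitrary):

```python
# Arbitrary joint table; the four cells sum to 1.
p_AB   = 0.12  # P(A ∩ B)
p_AnB  = 0.18  # P(A ∩ ~B)
p_nAB  = 0.28  # P(~A ∩ B)
p_nAnB = 0.42  # P(~A ∩ ~B)

p_A = p_AB + p_AnB  # marginal P(A) = 0.30
p_B = p_AB + p_nAB  # marginal P(B) = 0.40

p_A_given_B = p_AB / p_B  # definition of conditional probability
p_B_given_A = p_AB / p_A

# Both routes recover the same joint probability P(A ∩ B) ...
assert abs(p_A_given_B * p_B - p_B_given_A * p_A) < 1e-12
# ... and Bayes' theorem reproduces P(A|B) from the inverse conditional:
assert abs(p_A_given_B - p_B_given_A * p_A / p_B) < 1e-12
```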
Bayes' theorem is then a statement of the relationship between two inverse conditional probabilities. My assumption is that this is what Bayesian analysis is used for: exploring this relationship and using it to make inferences.
Yes. It's Bayes' theorem. I use Bayesian analysis as a catch-all term for dealing with conditional probabilities, where one of the probabilities involves data and we make use of some prior knowledge (so I don't only use it in the context of Bayes' theorem).
That analysis would begin with replacing the marginal probabilities P(A) and P(B) with their joint equivalents.
That is,
P(A) = P(A∩B) + P(A∩~B) and P(B) = P(A∩B) + P(~A∩B)
So,
P(A|B) = (P(B|A)*P(A))/(P(A∩B) + P(~A∩B))
P(B|A) = (P(A|B)*P(B))/(P(A∩B) + P(A∩~B))
Now we substitute the remaining joint probabilities with their conditional forms (the two P(A∩B) identities were already derived above),
P(A∩~B) = P(A|~B) * P(~B)
P(~A∩B) = P(B|~A) * P(~A)
So,
P(A|B) = (P(B|A)*P(A))/(P(B|A)*P(A) + P(B|~A)*P(~A))
P(B|A) = (P(A|B)*P(B))/(P(A|B)*P(B) + P(A|~B)*P(~B))
These expressions convey that P(A|B) and P(B|A) are each a ratio: the product of a conditional probability and its prior, divided by the sum of that product and the corresponding product for the negated event.
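Plugging arbitrary numbers into the expanded form confirms it agrees with the direct definition:

```python
# Check the expanded form of Bayes' theorem; all numbers are arbitrary.
p_A = 0.30
p_not_A = 1.0 - p_A
p_B_given_A = 0.40      # P(B|A)
p_B_given_not_A = 0.20  # P(B|~A)

# The denominator is P(B) by the law of total probability.
p_B = p_B_given_A * p_A + p_B_given_not_A * p_not_A

p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)  # ~0.4615

# Same answer via the joint table: P(A|B) = P(A∩B)/P(B).
p_AB = p_B_given_A * p_A
assert abs(p_A_given_B - p_AB / p_B) < 1e-12
```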
In statistical terms, specifically the use case of analyzing hypotheses as you've stated, where A and ~A are hypotheses and B is an observation,
P(B|A) is the probability of observation B given hypothesis A
P(A) is the probability of hypothesis A before observation B
P(B|~A) is the probability of observation B given hypothesis ~A
P(~A) is the probability of hypothesis ~A before observation B
Finally,
P(A|B) is the probability of A given observation B
As such we can compare hypotheses A and ~A given some set of observations B (for observations that are independent given a hypothesis, the per-observation likelihoods multiply in both the numerator and the denominator) and assess the validity of a given hypothesis through the ratio of these hypotheses' posteriors when observations B are given.
Yes, we can compare two hypotheses (it can be A and (not A), or it can be A and B), and determine which of them is more likely to be correct. It's amazing once you get used to it. Sivia's book is great at illustrating practical applications that are both easy to get started with and interesting.
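To make that comparison concrete, here's a minimal sketch assuming the observations are independent given each hypothesis; all priors and likelihoods below are made-up numbers:

```python
import math

# Priors for hypothesis A and its negation (made-up values).
prior_A, prior_not_A = 0.5, 0.5

# Per-observation likelihoods P(b_i|A) and P(b_i|~A) for three observations.
lik_A     = [0.8, 0.7, 0.9]
lik_not_A = [0.4, 0.5, 0.3]

# Independent observations: likelihoods multiply
# (equivalently, log-likelihoods add).
L_A     = math.prod(lik_A)
L_not_A = math.prod(lik_not_A)

posterior_odds = (L_A * prior_A) / (L_not_A * prior_not_A)
p_A_given_obs = posterior_odds / (1.0 + posterior_odds)

print(f"posterior odds A vs ~A: {posterior_odds:.2f}")  # 8.40
print(f"P(A | observations):    {p_A_given_obs:.3f}")   # 0.894
```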
You've already gotten pretty far. The fundamentals are quite simple, but extremely powerful once you get comfortable with them.
Is this correct and do you have anything to add?
Only notation-wise. I use P(A,B) to state "probability of A and B", and I_Alice to mark prior knowledge accessible to Alice.
For example, you probably now understand the issue with one property of the Inquirer evidence, which used to state (before it was re-defined), among other things, that everyone has access to the same prior knowledge, such that
p(A|I_Alice)=p(A|I_Legga),
which is generally not true.
What can be said is that if we're willing to share data, and I trust you, then the probability evaluations from our two perspectives will be more similar after sharing the data than before (Aumann's agreement theorem).
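A toy illustration of that effect with Beta-Bernoulli updating (priors and data are made up): two agents start from different prior knowledge I_Alice and I_Legga, observe the same shared coin-flip data, and their probability evaluations move closer together.

```python
def posterior_mean(a: float, b: float, heads: int, tails: int) -> float:
    """Posterior mean of a Beta(a, b) prior on a coin's heads-probability
    after observing the given counts."""
    return (a + heads) / (a + b + heads + tails)

# Different prior knowledge: p(heads|I_Alice) vs p(heads|I_Legga).
alice = (2.0, 8.0)  # Alice leans toward tails
legga = (8.0, 2.0)  # Legga leans toward heads

# Before sharing any data, the evaluations are far apart:
print(abs(posterior_mean(*alice, 0, 0) - posterior_mean(*legga, 0, 0)))  # 0.6

# After both update on the same shared, trusted data, they are much closer:
heads, tails = 60, 40
print(abs(posterior_mean(*alice, heads, tails) -
          posterior_mean(*legga, heads, tails)))  # ~0.055
```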