this is all just mental masturbation, i doubt it would be admissible in court
So what I'm gathering is that Benford's distribution is actually a very widespread phenomenon and many processes fall comfortably within it. It seems like the fundamental requirements are the magnitude or scale of the data set, which must be large, and the randomness of the data set, which must be random but not too random.
The authors of the papers I'm reading seem pretty convinced that voting processes in your common democratic society obey Benford's law but can violate it naturally through strategy. It just so happens that those strategies are extremely unlikely to occur given behavioral constraints.
However, given that the law can be violated naturally, it had me thinking (ofc under the assumption that you accept the current hypothesis connecting the law and voting) that it technically could have been violated via the mail-in process, given that Democrats advocated it en masse (hence it becomes part of their voting strategy) and Trump advocated against it (it's not a part of his strategy, and he even rejects it, which may cause his voters to reject it). I guess one could check the validity of this theory by looking at the distribution of Trump voters who voted by mail and when they were counted, plus you must consider that mail-in votes were chosen to be counted after in-person votes. Would that violate the law naturally? I'm really not sure.
Not sure.
There's also the possibility that people are cherry-picking data. That is, if they're looking at small populations and doing the Benford "test" on each individual population, the actual question they're asking is "how likely is it that there exist a few statistical Biden-vote anomalies within a larger population?" If they then find one or two such examples, they can showcase those. Suppose that were true; then it'd be perfectly expected that the Trump votes do not show anomalies in those few example cases (if most individual populations follow Benford's law).
However, it is possible that Benford's law could be violated just due to random chance if your sample size is large enough (that is, if there are enough individual populations).
Can you explain this part?
From what I read it's evidently the case that the number of subpopulations or distributions you pull data from doesn't matter.
The relationship is derived from the randomness of the whole data set and its magnitude.
I'd be interested if they can show that there are no statistical anomalies within *any* individual population when inspecting Trump votes, not just that the Trump votes follow Benford's law within one or two cherry-picked cases.
Yes, good point.
In such a case we would know either that Trump is cheating as well or that Benford's law doesn't universally apply.
Another thing I was wondering is which digits they are focusing on to create the frequency distribution. The papers I read about applying the law to voter fraud detection made it clear you can't just pick any digit: some are better than others and more resistant to these natural fluctuations of frequency in the data set.
TPG said: this is all just mental masturbation, i doubt it would be admissible in court
Perhaps, but the idea is fascinating.
However, it is possible that Benford's law could be violated just due to random chance if your sample size is large enough (that is, if there are enough individual populations).
Can you explain this part?
From what I read it's evidently the case that the number of subpopulations or distributions you pull data from doesn't matter.
The relationship is derived from the randomness of the whole data set and its magnitude.
For example, if you look at these stock figures here:
You will see that indeed they follow, approximately, Benford's law.
However, you can also see that there is variance in the results. That is, the datasets clearly do not perfectly follow Benford's law, but only approximately. You can see similar variance in the electoral vote vs Benford figures.
So when I say random chance, what I really mean is that different realizations of the data will roughly follow Benford's law, but there's an underlying stochastic element to them as well. Whether that stochastic element arises from counting error (something akin to a Poissonian process, which could be accounted for in the derivation of Benford's law) or from processes that are not accounted for within Benford's law (changes in the weather, ...), the fact remains that there is a stochastic (or at least unknown) element that will influence the test. So, effectively, there is some element of random chance that we can't quantify. If there weren't, we would expect the data to follow Benford's law perfectly.
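To make the fluctuation point concrete, here is a minimal sketch, not anyone's actual analysis. It draws made-up samples that follow Benford's first-digit law by construction (log-uniform values over four decades, which is my assumption, not real vote or stock data) and checks how often a standard chi-square test would still flag them. The sample size of 200 and the 1000 repetitions are arbitrary choices.

```python
import math
import random

# Benford's first-digit law: P(d) = log10(1 + 1/d) for d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(n):
    """Leading decimal digit of a positive integer."""
    return int(str(n)[0])

def chi_square_vs_benford(values):
    """Pearson chi-square statistic of observed first digits against Benford."""
    counts = {d: 0 for d in range(1, 10)}
    for v in values:
        counts[first_digit(v)] += 1
    n = len(values)
    return sum((counts[d] - n * BENFORD[d]) ** 2 / (n * BENFORD[d]) for d in BENFORD)

def simulated_sample(size, rng):
    """Hypothetical data: log-uniform over four decades, Benford by construction."""
    return [int(10 ** rng.uniform(1, 5)) for _ in range(size)]

# Chi-square critical value at the 5% level with 8 degrees of freedom.
CRITICAL_5PCT_DF8 = 15.507

rng = random.Random(0)
stats = [chi_square_vs_benford(simulated_sample(200, rng)) for _ in range(1000)]
flagged = sum(s > CRITICAL_5PCT_DF8 for s in stats) / len(stats)
print(f"share of clean samples flagged as violations: {flagged:.3f}")
```

Even though every sample here is "honest", some fraction still fails the test at the 5% level, which is all I mean by random chance.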
So the question then becomes: how likely is it that those fluctuations just naturally produce something similar to the Biden vote result that seemingly violated Benford's law?
To answer that, we can look at Trump votes and see if there are any examples where Benford's law is violated. If, for example, 1/10000 individual populations violate Benford's law, then we can derive how likely it is that N Biden vote anomalies are within the realm of expected normal statistical fluctuation (assuming that the mail-in votes do not alter the results).
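As a rough sketch of that last calculation: if each population independently shows a chance violation with probability p, the number of anomalies is binomial, and we want the tail probability of seeing at least N of them. The independence assumption and the figure of 3000 populations are mine, purely for illustration. Only the 1/10000 rate comes from the example above.

```python
import math

def binom_pmf(i, m, p):
    """P(X = i) for X ~ Binomial(m, p), computed in log space; needs 0 < p < 1."""
    log_pmf = (
        math.lgamma(m + 1) - math.lgamma(i + 1) - math.lgamma(m - i + 1)
        + i * math.log(p) + (m - i) * math.log1p(-p)
    )
    return math.exp(log_pmf)

def prob_at_least(k, m, p):
    """P(X >= k): chance of k or more anomalies among m independent populations."""
    return 1.0 - sum(binom_pmf(i, m, p) for i in range(k))

m = 3000      # hypothetical number of populations tested
p = 1 / 10000  # per-population chance violation rate from the example above
for k in (1, 2, 3):
    print(f"P(at least {k} anomalies) = {prob_at_least(k, m, p):.4f}")
```

If the showcased anomalies number about as many as this would predict from chance alone, they aren't evidence of anything by themselves.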
I'd be interested if they can show that there are no statistical anomalies within *any* individual population when inspecting Trump votes, not just that the Trump votes follow Benford's law within one or two cherry-picked cases.
Yes, good point.
In such a case we would know either that Trump is cheating as well or that Benford's law doesn't universally apply.
I was thinking more in terms of how frequently Benford's law applies, but you're absolutely right.
Another thing I was wondering is which digits they are focusing on to create the frequency distribution. The papers I read about applying the law to voter fraud detection made it clear you can't just pick any digit: some are better than others and more resistant to these natural fluctuations of frequency in the data set.
Sounds interesting. Maybe they'll post a more formal analysis.
Yes, I see.
This is in line with what I've read, especially as it relates to digit selection.
You select the digit that is least likely to be affected by the randomness you are describing, so that a black swan event can then be said to be as unlikely as possible.
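For reference, here is a small sketch of what the expected frequencies look like for the second digit, which is one digit that comes up in discussions of vote counts. Whether it is the digit those papers actually recommend is an assumption on my part. The formula is the standard generalization of Benford's law: sum the two-digit prefix probabilities over every possible first digit.

```python
import math

def benford_second_digit_probs():
    """P(second digit = d) = sum over first digits k of log10(1 + 1/(10k + d))."""
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }

second = benford_second_digit_probs()
for d, prob in second.items():
    print(f"second digit {d}: {prob:.4f}")
```

The second-digit distribution is much flatter than the first-digit one (every digit sits somewhere near 10%), so the size of a deviation that counts as "surprising" depends on which digit you test.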
this is all just mental masturbation
You say that like it's a bad thing.