Wikipedia:Reducing consensus to an algorithm

The consensus algorithm is much simpler than this diagram.

One might think at first that a general model of Wikipedia consensus formation would look something like:

A = N * R

where A is argument strength, N is the number of sources, and R is the reliability of those sources.

It's not really this simple. Each of the sources (call them S1, etc.) has its own R value (R1, etc., best expressed as a decimal value, where 0 is garbage and 1 is the most reliable source imaginable), so the function would really need to work through the sources one by one to determine an adjusted source value for each. But we also need to give a bit of extra weight for providing more sources.

This is more like it:

A = ((S1 * R1) + (S2 * R2) ...) / (N - (N / 10))

Key:

• A = argument strength/credibility
• Sx = an individual source's relevance (how well it supports argument A)
• Rx = reliability of that source
• N = number of relevant sources supporting the argument

In plain English: Each source is assigned a contextual value, a combination of its relevance to the argument and the reliability (reputability) of the source. These values are added together, then divided by slightly less than the number of sources presented, to produce a lightly boosted average. The reduced divisor is meant to account for an increasing number of sources in support of the argument actually making the argument stronger (in this model, the divisor shrinks by 1 for every ten sources).
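As a rough sketch of the arithmetic, here is the formula in Python. The function name `argument_strength`, the representation of each source as an `(S, R)` pair, the assumption that S runs from 0 to 1 like R, and the example scores are all invented for illustration; none of this makes the values any less subjective.

```python
def argument_strength(sources):
    """Return A for a list of (S, R) pairs.

    S is a source's relevance and R its reliability. Treating both as
    decimals between 0 and 1 is an assumption made for this sketch.
    """
    n = len(sources)
    if n == 0:
        raise ValueError("at least one source is required")
    # Sum of each source's relevance-times-reliability value.
    total = sum(s * r for s, r in sources)
    # Divide by N - (N / 10), as in the formula above.
    return total / (n - n / 10)


# Hypothetical scores for three sources, for illustration only.
example = [(0.9, 0.8), (0.7, 0.9), (1.0, 0.6)]
strength = argument_strength(example)
print(round(strength, 3))
```

Because N - (N / 10) is just 0.9 × N, the result is the plain average of the S × R values divided by 0.9, so a set of perfect sources scores about 1.11 rather than 1.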

This is just a Gedankenexperiment, since we have no objective way to assign numeric S and R values. Still, this does seem to model fairly accurately how we settle content disputes generally, if you reduce the process to statistical outcomes.

The more and better your sources are, the more your view will be accepted by consensus, all other things being equal. (Once in a while they are not equal, e.g. when a political or other faction has seized control of an article for the nonce and simply rejects ideas they don't like regardless of the evidence, until a noticeboard steps in and undoes the would-be ownership of the page.)

Given a modest number of sources, the model even accurately captures the negative effect on A (credibility) of relying on any obviously terrible sources, even if the other sources are high-end. Every source that does not have an R value close to 1 will drag down the average (much like how a failing grade on one exam or paper out of 10 in a class will significantly lower your overall grade even if you got straight As on all the rest of them).
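The drag-down effect can be sketched with ten hypothetical sources, comparing a set that is excellent throughout with the same set after one source is swapped for a terrible one. The helper name `argument_strength` and all of the scores below are made up for illustration.

```python
def argument_strength(sources):
    """A = sum(S * R) / (N - N / 10) for a list of (S, R) pairs."""
    n = len(sources)
    if n == 0:
        raise ValueError("at least one source is required")
    return sum(s * r for s, r in sources) / (n - n / 10)


# Ten highly relevant, highly reliable sources (hypothetical values).
all_good = [(1.0, 0.95)] * 10

# The same set, but one source replaced by an obviously terrible one.
one_bad = [(1.0, 0.95)] * 9 + [(1.0, 0.05)]

strength_good = argument_strength(all_good)
strength_mixed = argument_strength(one_bad)

print(round(strength_good, 3), round(strength_mixed, 3))
```

With these invented numbers, the single bad source lowers A by about 0.1, even though the other nine sources are unchanged.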

It's not quite a perfect model, since it doesn't account for the fact that citing 100+ really terrible sources for a nonsense position ("Bigfoot is real", etc.) just makes you look crazy; the factoring of the effect of the number of sources is too simplistic.
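One way to see the limitation is to feed the formula a pile of identical, terrible sources and watch what happens as the pile grows. This is only a sketch: the helper name `argument_strength` and the scores are hypothetical.

```python
import math


def argument_strength(sources):
    """A = sum(S * R) / (N - N / 10) for a list of (S, R) pairs."""
    n = len(sources)
    if n == 0:
        raise ValueError("at least one source is required")
    return sum(s * r for s, r in sources) / (n - n / 10)


# A terrible source: on topic, but barely reliable (made-up values).
terrible = (0.9, 0.05)

ten_terrible = argument_strength([terrible] * 10)
hundred_terrible = argument_strength([terrible] * 100)

# Both the total and the divisor grow in step with N, so the score
# comes out the same (up to floating-point rounding).
same_score = math.isclose(ten_terrible, hundred_terrible)
print(same_score)
```

In this sketch, piling on ninety more terrible sources neither helps nor hurts the score, whereas in practice it makes the argument look worse.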