Continuing my exploration of ADMIXTURE, I turned to the HGDP data, which has 660,918 SNPs for a wide assortment of worldwide populations. After pruning 12,086 SNPs with more than 1% missing genotypes, I was still left with ~650k SNPs.
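The post does not say which tool did the pruning (PLINK's `--geno 0.01` option is one common way to do it), so the following is only a minimal sketch of the idea. It assumes a toy genotype matrix with individuals in rows, SNPs in columns, and a hypothetical missing-call code of -1; the function names are illustrative, not part of any library.

```python
import numpy as np

def snp_missing_rate(genotypes, missing=-1):
    """Fraction of missing calls per SNP (rows: individuals, columns: SNPs)."""
    return (genotypes == missing).mean(axis=0)

def prune_snps(genotypes, max_missing=0.01, missing=-1):
    """Drop SNPs whose missing rate exceeds max_missing; return data and keep-mask."""
    keep = snp_missing_rate(genotypes, missing) <= max_missing
    return genotypes[:, keep], keep

# Toy data only (4 individuals x 4 SNPs), not HGDP genotypes.
toy = np.array([
    [0, 1, -1, 2],
    [1, 1, 2, 2],
    [0, -1, -1, 0],
    [2, 0, 1, 1],
])

# A loose 25% threshold is used here because the toy sample is tiny;
# the post's threshold corresponds to max_missing=0.01.
filtered, keep = prune_snps(toy, max_missing=0.25)
print(keep, filtered.shape)
```

With a real dataset the threshold would stay at 1%, and the count of dropped SNPs is simply the number of False entries in the mask.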
Here are some experiments on this dataset. First, a clustering with K=2 of Han Chinese, Russians, and Orcadians (left to right):
The emergence of 2 clusters (red=Mongoloid, blue=Caucasoid) is as expected, with Russians showing a small participation in the red cluster (7.2%). These northern Russians are believed to have a substantial Finno-Ugric genetic origin, so this is in line with a recent estimate of less than 10% for the eastern component in the westernmost Finno-Ugric speakers (but see below).
Notice a couple of Chinese individuals with a small Caucasoid component: as I've mentioned before, Mongolians, and presumably northern Han, have a small Caucasoid component from early movements of Iranian speakers from the west. That's an advantage of doing your own admixture analysis: you can look at the data in fine detail and not rely on the published figures.
Next, a clustering of Orcadians, Uygur, and Han Chinese:
The variable admixture in Uygurs is evident (47.2-63.7%, mean: 54.2%).
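Ranges and means like the one above can be read off the per-individual proportions that ADMIXTURE reports (one row per individual, one column per cluster). Here is a minimal sketch of that summary, assuming the proportions for one cluster have already been loaded into a list alongside population labels; the numbers are made-up toy values, not the HGDP results, and `summarize` is a hypothetical helper.

```python
from collections import defaultdict

def summarize(labels, proportions):
    """Return {population: (min, max, mean)} for one cluster's proportions."""
    groups = defaultdict(list)
    for label, p in zip(labels, proportions):
        groups[label].append(p)
    return {
        label: (min(vals), max(vals), sum(vals) / len(vals))
        for label, vals in groups.items()
    }

# Toy values for illustration only.
labels = ["Uygur", "Uygur", "Uygur", "Han", "Han"]
east = [0.50, 0.60, 0.55, 1.00, 0.98]

summary = summarize(labels, east)
for pop, (lo, hi, mean) in summary.items():
    print(f"{pop}: {lo:.1%}-{hi:.1%}, mean {mean:.1%}")
```

Reporting the range next to the mean is what makes the individual-level variability visible; the mean alone would hide it.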
Next, a clustering of Druze, Bedouin, and Bantu from Kenya.
Druze appear completely Caucasoid (red), Bantu completely Negroid (save for a couple of individuals), while Bedouins show a quite variable minor Negroid component. This variable African contribution (0-17.6%) makes an elongated cluster out of Bedouins in a recent analysis, pulling them away from other Middle Eastern populations in a Sub-Saharan direction.
Finally, I clustered European populations together with Mandenka and Han Chinese:
The populations are in the following order: Han, Mandenka, Orcadian, French Basque, French, North Italian, Tuscan, Sardinian, Russian.
Here are the admixture proportions:
Notice how the eastern component in Russians is now estimated as 10.9%. This probably reflects the inclusion of French Basque and Sardinians, i.e., populations which historically had no opportunity for eastern Eurasian admixture, rather than only Orcadians. This underscores the importance of having appropriate poles in inter-continental admixture estimates (see Appendix I).
Note also that the 100% value for the Han Chinese is not incompatible with the presence of the two aforementioned Caucasoid-admixed individuals, who are present here with an estimated 1.9% and 0.5% such admixture. However, this contributes little to the sample average of 40+ individuals.
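To see how little two such individuals move the sample average, here is a small sketch of the arithmetic. It assumes exactly 40 individuals (the post only says "40+") and assumes everyone else carries none of the minor component; both are simplifications for illustration.

```python
def sample_mean_minor(minor_fractions, n_individuals):
    """Mean minor-component fraction, assuming all unlisted individuals carry none."""
    return sum(minor_fractions) / n_individuals

# The two admixed individuals from the text: 1.9% and 0.5%.
mean_minor = sample_mean_minor([0.019, 0.005], n_individuals=40)
mean_major = 1.0 - mean_minor

print(f"minor: {mean_minor:.2%}, major: {mean_major:.2%}")
```

Under these assumptions the minor component averages about 0.06%, leaving about 99.94% for the major component, which would round to 100% at whole-percent precision. A larger sample dilutes it further.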
The minor (0.1%) Sub-Saharan admixture in Tuscans and Sardinians is also interesting. As you can guess from the figure, this stems from a handful of individuals (green specks) with less than 1% admixture, which is, however, more than the numerical low of 0.001% inferred for most Europeans by the software.
UPDATE I: Eurasian Cline
Below is a run for the following populations (left to right): French Basque, Russians, Uygur, Mongolians, Daur, Han Chinese. Notice that the Mongolic speakers (Mongolian and Daur from HGDP) have a small Caucasoid admixture, as I have mentioned before.
APPENDIX I: The importance of choosing poles
The choice of appropriate poles in the estimation of inter-continental admixture is extremely important.
If there is a racial admixture continuum between two major races, such as we observe in Eurasia, then we can express each intermediate population as a weighted sum of populations that live to the east and west of it.
For example, I will use a variable in the interval [0, 1] to represent the position in the continuum, with 0: pure western, and 1: pure eastern.
A population at 0.4 can be expressed as the following weighted sum:
0.4 = 0.6*0 + 0.4*1
i.e., as an admixture of 60% western, and 40% eastern.
But it can also be expressed as, e.g.,
0.4 ≈ 0.612*0.02 + 0.388*1
Notice that the choice of a slightly eastward-tilted "western pole" (at position 0.02 in the continuum) has resulted in a reduction of the inferred eastern component (from 40% to 38.8%).
This is exactly what happened in our example: the estimated Russian eastern admixture was lower when we used Orcadians, rather than French Basque, as the western pole.
Note also that this is all done automatically: no one told ADMIXTURE to identify these two poles. It was the presence of unlabeled individuals from different ends of the spectrum that influenced the admixture estimates for the rest.
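The two weighted sums above are instances of one formula: for a population at position x between a western pole w and an eastern pole e, the eastern weight is (x - w) / (e - w). A minimal sketch of that calculation follows; `eastern_fraction` is a hypothetical helper for this one-dimensional toy model, not something ADMIXTURE itself computes.

```python
def eastern_fraction(x, west_pole=0.0, east_pole=1.0):
    """Eastern weight for a population at x, expressed as a mix of the two poles."""
    if west_pole >= east_pole:
        raise ValueError("west_pole must lie strictly west of east_pole")
    if x < west_pole or x > east_pole:
        raise ValueError("x must lie between the two poles")
    return (x - west_pole) / (east_pole - west_pole)

ideal = eastern_fraction(0.4)                    # poles at 0 and 1
tilted = eastern_fraction(0.4, west_pole=0.02)   # slightly eastward-tilted western pole

print(f"ideal poles: {ideal:.1%}, tilted western pole: {tilted:.1%}")
```

The same population at 0.4 gets an eastern weight of 40% with ideal poles and about 38.8% with the tilted western pole, so the inferred admixture depends on the reference populations as much as on the population being measured.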
APPENDIX II: Latent populations
Another important point that needs to be remembered has to do with the possible existence of latent ancestral populations.
For example, it is true that Eurasia (minus South Asia) is economically described as a continuum from the Caucasoids of the Atlantic coast to the Mongoloids of the Pacific, with a transition zone in Central Asia and Siberia, and spillovers on either side. But, we cannot exclude the prehistoric existence of other races in the Eurasian landmass that do not exist today in a relatively unadmixed form.
In Eurasia, the Proto-Uralic race was postulated as such a "third race" with features of its own and not reducible to simple Caucasoid-Mongoloid admixture. It is difficult to see whether these features are ancestral peculiarities (prior to admixture with Caucasoids and Mongoloids), or if they have arisen in a mixed Caucasoid-Mongoloid population.
It is also important to understand how such latent populations affect genetic continua:
If the latent population is equidistant from the two major races, then its admixture has no effect on an individual's position in the continuum between the two races. However, it is possible that the latent population was more related to one of the two major races. In that case, admixture with it will move a population towards that race.
So while the jury is still out about the existence of a Proto-Uralic race in Eurasia, its effects on admixed populations indicate that, if it existed, it was genetically closer to Mongoloids than to Caucasoids.