Showing posts with label Caucasoid. Show all posts

August 22, 2012

East Eurasian-like ancestry in Northern Europe (part 3)

(This is the third part of the series. See part 1 and part 2.)

In the first two parts of the series, I showed that northern European populations show hints of East Eurasian ancestry when compared against Sardinians. I used Dai, Han, and Karitiana as reference populations for East Eurasia. In the current post, I extend this analysis by using HGDP Papuans and the Onge (Reich et al. 2009) from the Andaman Islands.

The f4 statistics using Karitiana, Papuan, and Onge populations can be found in this spreadsheet.

Below, you can see that they are all nearly perfectly correlated with each other.

The visual appraisal is confirmed when we calculate the correlation coefficients:


The fact that all three populations track the same signal is strong evidence for the direction of gene flow: from Asia into northern Europe. If the signal were present in only one of the three populations, then it could conceivably be an artefact of gene flow in the opposite direction (from northern Europeans to the affected population). But the fact that all three populations show the same pattern would require northern European-like admixture in the Andaman Islands, Papua New Guinea, and South America, which does not appear very parsimonious.

While the signals from the three populations are correlated, their intensity varies. The Z-scores provide a measure of this intensity. The mean Z-scores using a Karitiana, Papuan, and Onge reference across all populations are respectively -17.7, -8.0, and -6.0.
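For readers unfamiliar with the statistic, here is a minimal sketch of how an f4 value and its Z-score can be computed from per-SNP allele frequencies. It is not the fourpop program itself: the function names, the equal-sized contiguous blocks, and the toy frequencies are all assumptions made for illustration. The Z-score is the estimate divided by a delete-one-block jackknife standard error.

```python
from math import sqrt


def f4(pa, pb, pc, pd):
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD)."""
    products = [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]
    return sum(products) / len(products)


def f4_zscore(pa, pb, pc, pd, n_blocks=3):
    """Return (estimate, Z) using a delete-one-block jackknife.

    Assumption: SNPs are ordered along the genome and are split into
    n_blocks contiguous blocks of equal size (leftover SNPs are dropped).
    """
    products = [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]
    size = len(products) // n_blocks
    used = size * n_blocks
    total = sum(products[:used])
    estimate = total / used

    # Recompute the statistic with each block left out in turn.
    leave_one_out = []
    for k in range(n_blocks):
        block_sum = sum(products[k * size:(k + 1) * size])
        leave_one_out.append((total - block_sum) / (used - size))

    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    se = sqrt(variance)
    if se == 0:
        raise ValueError("zero jackknife standard error; use more varied data")
    return estimate, estimate / se


# Toy allele frequencies for six SNPs (purely illustrative, not real data).
toy_a = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
toy_b = [0.4, 0.3, 0.4, 0.3, 0.4, 0.2]
toy_c = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
toy_d = [0.3, 0.3, 0.2, 0.4, 0.3, 0.3]
toy_estimate, toy_z = f4_zscore(toy_a, toy_b, toy_c, toy_d, n_blocks=3)
```

A consistently signed product across SNPs gives a Z-score far from zero, which is what "more significant" means when comparing the different reference populations.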

While I did not include the Han reference of part 1 in this analysis, inspection of the f4 statistics (which can be obtained at the bottom of that part) suggests that the Z-scores become more significant when using an Onge, Papuan, Han, and Karitiana reference, in that order. For example, for the Finnish_D population, they are: -10.037, -13.2949, -23.9305, and -27.764 respectively.

It thus appears that the element contributing East Eurasian-like ancestry in northern Europeans was derived from the northern spectrum of East Eurasians; the Karitiana may live in South America today, but they trace their ancestors to northern Eurasia, having entered the Americas c. 15ka.

In my opinion, the signal has been formed by a superposition of a few factors:

  1. The fact that Y-haplogroup R, the main lineage in modern northern Europeans, has a common origin (Y-haplogroup P) with haplogroup Q, the main lineage in modern Amerindians and many Siberians. We can hypothesize that the population that brought R into Europe was intermediate genetically across the Caucasoid-Mongoloid spectrum. In West Eurasia, this population admixed with the Palaeo-West Eurasians (Y-haplogroups IJ, G, and possibly LT), and contributed their DNA primarily to the northern Europeoids.
  2. Other population movements of more regional impact, such as Y-haplogroup N, which affected mainly Uralic, Baltic, and East Slavic populations, as well as elements from the mixed West/East Eurasian mtDNA contact zone that ancient DNA analysis has revealed in Eastern Europe and Siberia.
The raw dumps of fourpop output for Papuan and Onge reference can be found here.

East Eurasian-like admixture in Northern Europe (part 2)

This is a continuation of my earlier post. Please refer to it for the methodology. A new part 3 can be found here.

I have repeated the experiment with a much larger set of populations:
English_D, British_D, Ukranians_Y,  Karitiana, Spaniards, Sardinian,  Serb_D, Mordovians_Y, Irish_D,  French, Finnish_D, Chuvashs_16,  Romanian_D, N_Italian_D, French_Basque,  Austrian_D, Russian_D, Hungarians_19,  Kent_1KG, German_D, Belorussian,  Tuscan, Lithuanian_D, Orkney_1KG,  Dutch_D, TSI30, Ukrainian_D,  Bulgarians_Y, Bulgarian_D, Russian,  Swedish_D, Pais_Vasco_1KG, French_D,  Castilla_Y_Leon_1KG, Lithuanians, San,  Polish_D, Romanians_14, Orcadian,  Cornwall_1KG, Valencia_1KG, North_Italian,  FIN30, Norwegian_D, CEU30
I used Sardinians as the Caucasoid reference population, Karitiana for Mongoloids, and San for Africans. The latter two were chosen because they live at maximally opposite corners of the Earth (South America vs. South Africa).

A first plot of the f4 statistics used for f4 regression ancestry estimation is seen below:

Clearly, some evidence of a cline is present, but several populations appear to deviate from it. In order to get the cleanest possible cline, I carried out the following greedy procedure: I calculated the correlation coefficient of this set, and iteratively removed the one population whose removal led to the maximum improvement of the correlation, until no further improvement took place. The following populations were removed with this procedure:

Spaniards, Serb_D, Romanian_D, N_Italian_D, Tuscan, TSI30, Bulgarians_Y, Bulgarian_D, Castilla_Y_Leon_1KG, Romanians_14, Valencia_1KG
This seems to make sense, as all these are southern European populations. Note that their removal does not mean that they do not partake in the same phenomenon as northern Europeans: they also exhibit Karitiana-shift relative to the Sardinians, but there are probably other confounding factors that make them fall "off-cline". Including them would diminish the clarity of the cline for Northern European populations. The regression of the remaining populations can be seen on the right:



f4 regression ancestry estimation results are shown on the left. These appear to be much higher than was the case with the Han and Dai in the previous experiment.
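The greedy pruning step described earlier in this post can be sketched roughly as follows. This is an illustration under stated assumptions, not the code actually used: the function names, the `min_points` floor (needed because any two points are perfectly correlated), and the `tol` threshold for what counts as an improvement are all hypothetical additions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def greedy_prune(points, min_points=3, tol=1e-9):
    """Iteratively drop the population whose removal most improves |r|.

    points: dict mapping population name -> (x, y) pair of f4 statistics.
    Stops when no removal improves |r| by more than tol (assumed threshold)
    or when only min_points populations remain (assumed floor).
    Returns (kept_dict, removed_names_in_order).
    """
    def score(d):
        xs = [p[0] for p in d.values()]
        ys = [p[1] for p in d.values()]
        return abs(pearson(xs, ys))

    kept = dict(points)
    removed = []
    best = score(kept)
    while len(kept) > min_points:
        candidate, candidate_score = None, best
        for name in kept:
            trial = {k: v for k, v in kept.items() if k != name}
            s = score(trial)
            if s > best + tol and s > candidate_score:
                candidate, candidate_score = name, s
        if candidate is None:
            break  # no single removal improves the correlation
        del kept[candidate]
        removed.append(candidate)
        best = candidate_score
    return kept, removed


# Toy example: four collinear points plus one off-cline point "O".
toy_points = {"A": (1, 2), "B": (2, 4), "C": (3, 6), "D": (4, 8), "O": (2.5, 9)}
toy_kept, toy_removed = greedy_prune(toy_points)
```

A greedy search like this is not guaranteed to find the subset with the highest possible correlation, and without a floor it would keep discarding populations, so the stopping rule matters.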

I can't say that I've made any obvious mistakes, but these admixture proportions are substantial, and call for an explanation. Whatever their true levels, I am fairly confident on at least a few points:

First, it is evident that northern Europeans have higher levels of this element than southern Europeans; the latter are not altogether deficient in it, but they fall "off-cline", making estimation of their admixture proportions more difficult.

Second, within northern Europe, there is a fairly clear east-west cline of diminishing Amerasian-like admixture. The minimum occurs in Sardinians and secondarily in Southwest Europe. Romance, Celtic, and Germanic populations all have less of it than Balto-Slavic and Uralic ones. And, some populations of northeastern Europe seem to have a noticeable excess of it.

The groups with the most Amerasian-like admixture possess Y-haplogroup N, a clear trace of eastern ancestry that is not shared by most Europeans. The arrival of this haplogroup, either with the Comb Ceramic culture of the Baltic Neolithic or later with the Seima-Turbino Bronze Age expansions, is probably responsible for the local excess in Northeastern Europe. The Chuvash are, of course, a Turkic population but of Finno-Ugrian genetic origin.

But, the presence of this element even in Western Europe cannot be explained on the basis of typically Mongoloid elements which are almost completely lacking there. If Mesolithic Europeans were themselves Asian-shifted, then this would account for the presence of the element, but not necessarily for its clinal manifestation. The double (north-south and east-west) cline indicates every sign of an intrusive element. So, for the time being, I will propose that this is associated with late (e.g., Copper and Bronze Age) phenomena, such as the northern stream of the Bronze Age Indo-European invasion of Europe.

This may be due to either:

  • (i) northern Indo-European groups picking up some native east European or Siberian elements as they made their way into Europe, or
  • (ii) more likely, in my opinion, the Y-haplogroup R1 group of people, whose closest relatives are in Central/South Asia (R2), and whose more distant relatives (Q) are in Siberia and the Americas, having been from the beginning an "intermediate population" between West and East Eurasia. The R1 group of people, in its R1b and R1a varieties, first appears in Europe during the Copper Age, and is lacking in early Neolithic sites.


Eight years ago, and in a totally different context, I wrote:

Similarly, 9 out of 10 Basques are descended from a man who has also fathered 9 out of 10 Kets from Siberia and 9 out of 10 Maya Indians from America. That man, founder of haplogroup P thus has descendants who belong to two of the major human races (or three, if Amerindians are considered as separate from Asian Mongoloids)   
... 
In conclusion, human continental populations form groups of genetic and phenotypic similarity, and these groups can be considered races in the phenetic sense. However, these groups are not monophyletic, hence in the cladistic sense they should not be considered as valid taxa. Since the principle of common descent is generally applied in modern systematics (or at least it should!), I think it's best not to recognize human subspecies. 

If these data pan out, it may be revealed that the European branch of the Caucasoids is actually a product of admixture too, with at least two of its constituent elements being the "Palaeo-West Eurasians" (Y-haplogroups G, IJ, possibly LT) and the "Neo-NW Eurasians" (Y-haplogroups N1 and R1), with the "Neo-Afrasians" (Y-haplogroup E1b1b) forming a third element.

(A raw dump of fourpop output can be found here).

July 14, 2012

Population strata in the West Siberian plain (Baraba forest steppe)

Also from the Population Dynamics in Prehistory and Early History (2012) volume, this is an awesome ancient DNA study which dissects a succession of archaeological cultures stretching from the beginning of the metal ages to the beginning of the Iron Age in a small region of West Siberia. As the authors write:
Our work is devoted to the analysis of human migration processes that occurred during the Bronze Age (4th–early 1st millennium BC) in the forest steppe zone between the Ob and Irtysh rivers (about 800 km from west to east). This area, known as Baraba forest steppe, stretches over 200 km from the taiga zone in the north to the steppes in the south.
The careful examination of the sequence of cultures, combining ancient mtDNA and physical anthropology, paints a very compelling picture of the changes that occurred in the span of a few millennia in the Baraba forest steppe. The authors give the map on the left, with the caption: "Fig. 5 | Location of ancient human groups with a high frequency of mtDNA haplogroups U5, U4 and U2e lineages. The area of Northern Eurasian anthropological formation is marked by yellow region on the map (References: 1 Bramanti et al., 2009; 2 Malmstrom et al., 2009; 3 Krause et al., 2010; 4 this study)".


The northern Eurasian anthropological formation actually combines eastern and western Eurasian features and may correspond to the Proto-Uralic type. Researchers have clashed about the origins of this population element, with some considering it a third Eurasian race that evolved independently of Caucasoids and Mongoloids, others assigning it to a much diverged branch of one of the two major Eurasian races, and still others considering it the product of admixture between east and west.


All indications are that the type, unlike the Caucasoid-Mongoloid mixtures that took place in Central Asia in the last 2 millennia, is of more ancient vintage, and represents an anthropological element that was indeed of Caucasoid-Mongoloid origins, but in the rather remote past. The authors write with respect to the most ancient periods:

In contrast to the occupation of the southern region of West Siberia, modern humans arrived in the Ob-Irtysh interfluve relatively late, at the end of the Pleistocene, about 13–14 thousand years ago (Okladnikov, Molodin, 1983; Petrin, 1986). The absence of burials dating back to this period in the region does not allow us to conduct a biological investigation of this earliest population. The most ancient anthropological material available is from the Neolithic period (4th–5th millennium BC). 
And, what of the earliest available material?
The anthropological analysis of the material allowed us to detect a specific craniological type in the Baraba population, which was assigned to one of the anthropological formations discovered by V.V. Bunak in 1956 through the analysis of Neolithic materials from the northern forest zone of the East European Plain. Bunak called it the “northern Eurasian anthropological formation” (Bunak, 1956).

This anthropological type developed in a zone that is intermediate to the geographic areas occupied by the classic Caucasoids and the Mongoloids. There exists substantial anthropological evidence showing a wide geographic distribution of this anthropological formation: from the Trans-Urals forest and the Barabian province of Western Siberia in the east to Karelia and the Baltic in the west (Chikisheva, 2010).
The mtDNA evidence seems to support the anthropological assessment:
We have analyzed 18 mtDNA samples from the Ust-Tartas population to date (Fig. 3). The results obtained thus far allow us to draw several preliminary conclusions about the genetic background in the region in the beginning of the Bronze Age. By the Early Metal Period the mtDNA pool structure was already mixed and consisted of both Western and Eastern Eurasian haplogroups in nearly equal proportions. The eastern Eurasian mtDNA cluster was represented by Haplogroups A, C, Z, D, which are most typical of modern and perhaps ancient populations located in the east of the region studied. Haplogroups C and D were predominantly represented by widely distributed root haplotypes. A lineage of Haplogroup A that was detected in two Ust-Tartas samples represents a subcluster that is apparently characteristic of West Siberia and the Volga-Ural Region. The observed presence of Haplogroup Z lineages with a high frequency in the Ust-Tartas group was unexpected, since these lineages are nearly absent in the gene pool of modern indigenous West Siberian populations.

It is worth noting that the Western Eurasian mtDNA haplogroups in the Ust-Tartas series were represented only by Haplogroup U lineages, and specifically by the three subgroups – U2e, U4, U5a1. These results are in agreement with previous data indicating that Haplogroup U lineages (particularly Subgroups U5 and U4) predominated in Eastern, Central and Northern European hunter-gatherer groups from 14000 to 4000 years ago (Bramanti et al., 2009; Malmstrom et al., 2009), and possibly in earlier periods (Krause et al., 2010). The geographic area within which this genetic feature is observed appears to be broad (Fig. 5). Apparently, Baraba was near the eastern periphery of this area.
We now have evidence of the zone of U dominance extending from Iberia in the west and all the way to Lake Baikal in the east. But, this zone is not homogeneous: its western, European, end appears to have lacked the East Eurasian lineages, while starting from Ukraine and to the East the U types were supplemented by the Mongoloid lineages.

But, there was structure within the U zone itself: according to Lillie et al. (same volume) in Ukraine during the 6th millennium BC, the West Eurasian types were represented by U1 and U3, a different mix than in the Baraba forest steppe, and haplogroup T was also present, while of the Mongoloid haplogroups only C was present.

As we head into the Bronze Age, the population of the region displayed signs of continuity:

The genetic analysis of the Odinovo and Krotovo groups (10 and 6 samples, respectively) (Fig. 3) did not reveal any differences between them and the previous Ust-Tartas group, such as the presence of new mtDNA haplogroups. The mtDNA pool structure was still mixed. The East Eurasian haplogroups were represented by the D, C, Z (in both the Odinovo and Krotovo groups) and A (in the Krotovo group) haplogroups. The East Eurasian lineages identified were phylogenetically close (lineages of haplogroups A, C, Z) or even identical (D haplogroup, 16223–16362 lineages) to the samples from the Ust-Tartas group. The West Eurasian part of the samples were represented by the U5a1 (Odinovo group) and U2e (Krotovo group) haplogroup lineages.  
Although only a small series of samples have been investigated thus far, the data obtained reveal continuity between the Odinovo and Krotovo populations and the earlier Ust-Tartas group. These findings are consistent with the autochthonous development of the Baraba populations during the Early and the beginning of the Middle Bronze Age, as well as with the anthropological evidence.  
It is during the Middle and Late Bronze Ages that we begin to see the first intrusive lineage in the native population mix:
The anthropological analysis of the West Siberian Andronovo population shows at least four craniological types. Three types are related to the Palaeocaucasian race and are represented by proto-European anthropological type variants. The fourth, Mongoloid, component is autochthonous. The most intensive interactions between the Andronovo migrants and the indigenous populations apparently occurred in the Baraba forest steppe and the right bank of the upper Ob River (Chikisheva and Pozdnyakov, 2003).  
To investigate the putative impact of Andronovo migrants on the mtDNA pool structure of the indigenous populations in Baraba, mtDNA samples from the Late Krotovo (n=20) and Andronovo (n=20) groups in this region were analyzed (Fig. 3) and compared to recently published data (n=10) (Keyser et al., 2009) and our own unpublished data (n=6) on mtDNA lineages from West Siberian Andronovo populations located outside the Baraba forest steppe.  
The genetic influence of migrants can be detected by the appearance of a new mtDNA haplogroup that was absent in the populations preceding the migration wave. This new mtDNA haplogroup, a West Eurasian T haplogroup, was detected in the Late Krotovo population. The T haplogroup appears simultaneously (with a 15 % frequency) in the Krotovo and Andronovo groups, but was completely absent in all preceding Baraba populations. We therefore consider the appearance of the Haplogroup T-lineage as the most likely genetic marker of the Andronovo migration wave to the region.  
This assumption is confirmed by mtDNA studies of Andronovo groups from other West Siberian areas. Haplogroup T lineages were found, with a frequency of 25 %, in the samples (n=16) taken from two Andronovo groups from the Krasnoyarsk and upper Ob River areas.  
We also detected another remarkable feature in the mtDNA pool of the Andronovo group from Baraba. Most mtDNA samples belonged to haplogroups, such as the East Eurasian A and C haplogroups, that are typical of preceding Baraba indigenous populations. Still, these haplogroups were not found in the other West Siberian Andronovo groups. Apparently, the Andronovo group from Baraba assimilated the aboriginal Krotovo population, from which it obtained these East-Eurasian mtDNA haplogroups. Obviously, there was reciprocal genetic contact between the migrant and indigenous groups in the region. 

...

A small but informative series of mtDNA samples from the Baraba Late Bronze Age culture population (n=5) was analyzed (Fig. 3), revealing the presence of MtDNA lineages (East Eurasian A and C lineages) that mark the genetic continuity with aboriginal Baraba groups. At the same time, the series includes the Haplogroup-T lineage, which we believe marks the Andronovo migration wave to West Siberia. Our data is therefore consistent with the putative origin of the West Siberian Late Bronze Culture population as the result of interaction between the Baraba indigenous genetic substrate and the newly arrived group.
It is now clear that the Andronovo groups moving into the area possessed mtDNA haplogroup T and assimilated the locals with their U+East Eurasian mix. It is of course interesting that haplogroup T is the only non-U lineage found in the aforementioned study of Mariupol-type cemeteries from Neolithic Ukraine.

The earliest occurrence of haplogroup T is in the Pre-Pottery Neolithic B of the Near East (Tell Hallula), and this haplogroup appears all over the place in Neolithic Europe. While a recent article has suggested a pre-Neolithic dispersal of T subclades into Europe, on the basis of modern populations, this hypothesis is difficult to reconcile with the ancient DNA data.


Pending new discoveries, it appears likely that mtDNA haplogroup T represents a Neolithic entrant into the boreal zone of U dominance. This has, of course, substantial implications in the context of J.P. Mallory's concept of fault lines, as it demonstrates that the steppe populations did not evolve in isolation, but the dominant lineage in the Andronovo groups was a late entrant into the indigenous U-zone of the eastern European plain.


But, the story doesn't end here:

The analysis of mtDNA samples from the Chicha-1 population revealed some interesting patterns. Crucial changes in the composition of mtDNA haplogroups in the gene pool were observed as compared to the earlier Baraba groups studied (Fig. 3). Dominance of Western Eurasian haplogroups and the near absence of East Eurasian were observed. Additionally, several new West Eurasian haplogroups appeared in the region, including Haplogroups U1a, U3, U5b, K, H, J and W.  
The phylogeographic analysis suggests that the distribution and diversification centres of several of these mtDNA haplogroups and specific lineages are located on the west and south west of the Baraba forest steppe region, on the territory corresponding to modern-day Kazakhstan and Western Central Asia (Fig. 10). Apparently, the migration wave from the south strongly influenced the gene pool of the Baraba population in the transitional period from the Bronze to the Early Iron Age. The impact of the northern human groups was probably less evident in the south of the Baraba forest steppe, at least at the mtDNA level. 
The drastic appearance of a purely Caucasoid population at the Iron Age from a southern, east-Caspian origin perhaps corresponds to the arrival of the first steppe Iranians. The vector of proposed migration is reasonable, if we consider both the likely Indo-Iranian homeland east of the Caspian, as well as the literary evidence for Scythian mobility during this period.

All in all, this is commendable research which allows us to intuit a sequence of events:
  • An early mixture zone between Caucasoids and Mongoloids
  • The Bronze Age arrival of mtDNA-T bearing Andronovo groups, the first pastoralists entering the zone of U+East Eurasian boreal hunter-gatherers; these Caucasoid peoples admixed with the natives of the mixture zone.
  • The early Iron Age arrival of a full-blown set of Caucasoid mtDNA lineages from the south paving the way for the Iranian Scytho-Sarmatian period

Human migrations in the southern region of the West Siberian Plain during the Bronze Age: Archaeological, palaeogenetic and anthropological data


Molodin, Vyacheslav I. et al.


In this paper we present archaeological and anthropological data on human migrations in the Western Siberian forest-steppe region during the Bronze Age (4th–beginning of 1st millennium BC). These data, accumulated over forty years of intensive research in the region, are compared to new results showing the diversity of mitochondrial DNA (mtDNA) lineages in this region during that period (92 mtDNA samples from seven ancient human groups). Preliminary analyses have demonstrated the usefulness of ancient DNA in tracing and unravelling patterns of past human migrations.


Link


Prehistoric populations of Ukraine: Migration at the later Mesolithic to Neolithic transition


Lillie, Malcolm C. et al.


This paper focuses on the identification of population movements during the Mesolithic and Neolithic periods in the Dnieper Basin region of Ukraine. We assess the evidence for migration from the perspective of individual life histories using a combination of palaeoanthropology/pathology, radiocarbon dating, stable isotopic studies of diet, and mtDNA. 


Link

December 14, 2011

Clusters Galore analysis of West Eurasians

It's been a while since the last Clusters Galore analysis, so I've decided to use my recently assembled dataset and run such an analysis over the individuals who belonged to the Six main West Eurasian components.

Hence, at the beginning, I identified 945 individuals in my set who had more than 95% combined admixture proportions in the Six. Subsequently, I ran MDS on this set, keeping 50 dimensions.

One of the open issues in Clusters Galore analysis is how to choose how many MDS dimensions to retain. So far, I've applied a heuristic by choosing the number of MDS dimensions that maximizes the number of clusters inferred by MCLUST. However, when I actually inspect the MDS plots, it often turns out that meaningful information seems to be present at even higher numbers of MDS dimensions. As a result, I've decided to pick the number of dimensions in the following manner.

The main idea is that data points in uninformative MDS dimensions will appear as largely Gaussian noise. So, we can use a test of normality (I've chosen the Shapiro-Wilk test) to detect dimensions that appear not to be noise. Below is the p-value of this test for different MDS dimensions:
Up to 22 dimensions, there is a strong non-Gaussian signal (all p-values less than 0.001). Hence, I would use the first 22 dimensions in MCLUST analysis. With these dimensions, the number of inferred clusters was estimated as 35. So, this is something like a 6-fold increase in resolution over the Six components inferred by ADMIXTURE.
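The selection rule above can be sketched as follows, assuming the rule is "keep the leading run of dimensions whose normality-test p-value is below a cutoff". The function names and the example p-values are hypothetical. The helper that computes the p-values relies on the third-party SciPy function `scipy.stats.shapiro`, and is only imported if that helper is called.

```python
def shapiro_p_values(coords_by_dimension):
    """Shapiro-Wilk p-value for each MDS dimension.

    coords_by_dimension: list of lists, one list of individual
    coordinates per MDS dimension. Requires SciPy (third-party).
    """
    from scipy.stats import shapiro  # imported lazily; not needed below

    return [shapiro(column)[1] for column in coords_by_dimension]


def count_leading_informative(p_values, alpha=0.001):
    """Number of leading dimensions with p < alpha (clearly non-Gaussian).

    Counting stops at the first dimension that looks like Gaussian noise,
    even if a later dimension happens to fall below alpha again.
    """
    count = 0
    for p in p_values:
        if p >= alpha:
            break
        count += 1
    return count


# Hypothetical p-values for five dimensions (illustration only).
example_p = [1e-8, 1e-5, 0.0004, 0.2, 1e-6]
n_dims = count_leading_informative(example_p)
```

Stopping at the first noisy dimension is a design choice in this sketch; one could instead keep every dimension below the cutoff, which would retain the fifth dimension in the example.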

The cluster totals for the different populations can be seen in the spreadsheet.

Important Caveat: Some populations (e.g., Finnish_D, or Turkish_D) have a great number of individuals who do not meet the "95% in the Six" inclusion threshold. Hence, results are not representative for them, and simply indicate the cluster assignment of their subsets that do meet the threshold. You can check whether individuals have been removed from the original dataset by comparing sample sizes in the Clusters Galore spreadsheet with the K12a one.

Here are some observations on the 35 clusters. I will mention the modal population (or region) for each one:
  1. Ashkenazi
  2. Scandinavian
  3. French
  4. British Isles
  5. Armenian
  6. S Italian/Sicilian
  7. Kurd
  8. Greek
  9. Cypriot
  10. Balto-Slavic
  11. Hungarian
  12. Balkan
  13. Sephardic
  14. Spanish
  15. Iberian
  16. North Italian/Tuscan
  17. Morocco Jews (main)
  18. Saudis
  19. Georgian/Abkhazian
  20. Basque
  21. Bedouin
  22. Druze #1
  23. Druze #2
  24. Druze (main)
  25. Mozabite (main)
  26. Mozabite #1
  27. Orkney
  28. Sardinian
  29. Azerbaijan Jews
  30. Iran/Iraq Jews
  31. Lezgins
  32. Morocco Jews #1
  33. Samaritan
  34. Yemen Jews
  35. Abkhazian

May 14, 2011

East Asian- and African-shift of West Asian/European populations

In my critique of Moorjani et al. (2011), I noticed how the authors projected West Eurasian samples on an African/East Eurasian axis and assumed that African-shift along that axis was due to the presence of African admixture.

I showed that, while some populations are shifted towards Africans, others are shifted towards Asians, so a projection along an African-East Asian axis is in reality a palimpsest of the two phenomena.

I argued that assessment of African admixture using a simple 2-population model that does not account for the "East Asian factor" leads to erroneous results. I then compared five methods of admixture estimation, showing that the Sub-Saharan admixture results of Moorjani et al. (2011) were higher than all the other methods, consistent with my hypothesis.

In the present post, I show how many different European/West Asian populations are shifted towards Africans/East Asians.

Principal Components Analysis

First, a PCA plot of just the West Eurasian populations; labels are mapped onto each population's average position:
Second, a PCA plot together with 25 Chinese and 25 Yoruba; Eurasians are separated from Africans along eigenvector 1, and East Eurasians from Eurafricans along eigenvector 2.
Third, a blowup of the West Eurasian portion of the above plot:
Fourth, a further blowup of the plot, excluding Chuvash to make it even clearer:
Finally, here is a spreadsheet with the average PC co-ordinates of the studied populations. The first two eigenvalues are 25.92 and 11.91.

With respect to the Asian- and African- shift of West Eurasian populations, I note that northern Europeans (and Basques) are less African-shifted than southern Europeans, and, at the same time they are more Asian-shifted: the 16 least Asian-shifted populations have a coastline in the Mediterranean (excluding the Portuguese), while the 16 least African-shifted populations do not (excluding the French).

Discussion

This analysis suggests the importance of choosing appropriate populations to represent Caucasoids in the global context. One often sees CEU used for that purpose. I see no major problem with that in general, as it is good for different studies to have a similar reference point, and CEU have been used for years for that purpose.

However, when dealing with the problem of admixture, this becomes an issue. CEU emerges as one of several populations with minimal African-shift, but is intermediate in terms of its Asian-shift. Sardinians, on the other hand, have minimal Asian-shift (by far), and are intermediate in terms of their African-shift.



If one were to choose a single population to serve as a Caucasoid pole according to a criterion of maximal differentiation, then Basques are the obvious candidate, as they are tied for 1st place in having the least African-shift, and are 2nd in terms of least Asian-shift. Indeed, a K=3 ADMIXTURE analysis of this dataset demonstrates that they are in fact the population showing the maximal contribution of the Caucasoid-specific component.

The analysis presented here also demonstrates the relative value of different population isolates in ancestry analysis. The Chuvashs, for example, are clearly not part of the genetic continuum of Europe, and neither are French Basques and Sardinians: all of these populations form very distinct clusters within the West Eurasian-specific context (first plot of this post). Nonetheless, their analysis within a global context demonstrates that they are distinct in different ways: Chuvash because of their substantial Asian-shift, Sardinians because of the substantial lack thereof.

In conclusion, it is a good idea not to employ a simple 2-way population mixture model to assess either African or Asian admixture in West Eurasians. Such a model may lead to erroneous results if it employs West Eurasians' shift on the African-East Asian axis.

May 08, 2011

On the northern/southern Caucasoid contributions to Asia

I project a great number of Siberian, Central Asian, and South Asian populations on the first two principal components created by Han, West Asians, and Northern Europeans.

PC1 captures east-west variation across Eurasia, although the Han are also related to Ancestral South Asians, a major component in the ancestry of South Asians. PC2 captures West Asian-North European variation, so it is quite useful to extract the relative northern vs. southern Caucasoid elements in the populations examined.

Here are the first two PCs with the populations used to create them. Northeastern European (N=49) includes Lithuanians, Belorussians, Russians, Poles, and various non-Balkan Slavs. Northwestern European (N=46) includes Germans, Irish, Norwegians, and various continental Germanics. West Asian (N=93) includes Armenians, Iranians, Adygei, Lezgins, and Georgians.

Population labels are always placed on population averages. Notice that the Han form a tight cluster, halfway (along PC2) between West Asians and Northeast Europeans; this is expected as they are an outgroup that has not been significantly affected by Caucasoids.

We will now project various populations onto the previous 2-D map: their horizontal position (along PC1) depends on the extent of Caucasoid admixture, while their vertical position (along PC2) depends on whether this admixture is more northern or southern Caucasoid.
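The projection step can be sketched as follows. This is a minimal illustration and not the pipeline used for the plots: it assumes genotypes are coded as 0/1/2 allele counts in a samples-by-SNPs matrix, it uses random toy data, and the function names `fit_pcs` and `project` are hypothetical. The key point is that the axes are fitted on the reference populations only, and the other populations are placed on those axes without influencing them.

```python
import numpy as np

def fit_pcs(reference, n_components=2):
    """Fit principal axes on reference genotypes (samples x SNPs)."""
    mean = reference.mean(axis=0)
    centered = reference - mean
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(samples, mean, axes):
    """Place samples on previously fitted axes without refitting them."""
    return (samples - mean) @ axes.T

# Toy data only: random allele counts, not real genotypes.
rng = np.random.default_rng(0)
reference = rng.integers(0, 3, size=(30, 200)).astype(float)
test_samples = rng.integers(0, 3, size=(5, 200)).astype(float)

mean, axes = fit_pcs(reference)
ref_coords = project(reference, mean, axes)      # populations that define the map
test_coords = project(test_samples, mean, axes)  # populations projected onto it
```

Because the projected samples are centered with the reference mean, their coordinates are directly comparable with those of the populations that define the map.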

UPDATE (May 9):

I have also carried out supervised ADMIXTURE analysis, using the dataset of this post, adding Onge from the Indian Ocean as a fourth ancestral group together with Han, Northern Europeans, and West Asians.
The results seem consistent with the PCA projection, while the distinctiveness of the East Asian (dark blue) and Ancestral South Indian (light blue) components emerges.

April 23, 2011

Genetic structure of West Eurasians

I have decided to generate a new major data dump of ADMIXTURE results. In comparison to previous such experiments:
  1. The focus is entirely on West Eurasians (Caucasoids).
  2. I have excluded all potential relatives from the source datasets, as well as several populations that tend to create uninformative clusters of their own (e.g., Druze or Ashkenazi Jews); exceptions are populations of great anthropological interest (e.g., Basques).
  3. I have included all relevant Dodecad Ancestry Project populations with 5+ participants.
  4. I have developed a new way of "framing" the region of interest by choosing appropriate sets of individuals from outside of it.

"Framing" populations

I have, since the beginning of my ADMIXTURE experiments, emphasized the importance of including appropriate population controls designed to squeeze out minor distant admixture in populations of interest, so that it does not confound the inference of region-specific components.

This leads to a problem: there are many possible sources of admixture. For example, we do not know a priori which set of African populations may have contributed to Caucasoid populations, or which set of East Asian ones. We could choose, e.g., the Yoruba and the Chinese to represent Sub-Saharans and East Asians, but that might exclude possible sources of variation, and lead to Yoruba- and Chinese-specific clusters rather than more general Sub-Saharan and East Asian ones. If we included more population controls, we would cover more possible sources of variation, but ADMIXTURE would infer components of little interest (e.g., between Pygmies vs. Bushmen or Mongols vs. Chinese).

To avoid this, I propose to create meta-populations consisting of a single individual from many populations, i.e., a Yoruba, a Mandenka, a San, a Mbuti Pygmy, etc. for Sub-Saharan Africa, or a Miaozu, a Han, a Mongol, a She, a Hezhen, etc. for East Asia. That way we are both helping ADMIXTURE infer general components, while at the same time preventing it from inferring non-region specific ones.
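As a rough sketch of how such a meta-population could be assembled, assuming a simple mapping from population name to a list of sample IDs: the sample IDs, the function name `build_meta_population`, and the choice of picking the individual at random are all illustrative assumptions, since the post does not say how the single individual from each population was selected.

```python
import random

def build_meta_population(individuals_by_population, seed=0):
    """Pick exactly one individual from each source population."""
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return {
        population: rng.choice(sample_ids)
        for population, sample_ids in sorted(individuals_by_population.items())
        if sample_ids  # skip populations with no available samples
    }

# Hypothetical sample IDs, for illustration only.
east_asia = {
    "Han": ["han_01", "han_02", "han_03"],
    "Miaozu": ["miaozu_01", "miaozu_02"],
    "Mongol": ["mongol_01", "mongol_02"],
    "She": ["she_01"],
    "Hezhen": ["hezhen_01", "hezhen_02"],
}

meta_population = build_meta_population(east_asia)
print(meta_population)
```

With only one individual per population, no single population contributes enough samples to form a cluster of its own, which is the intent of the framing approach.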

Results

The entirety of the results presented here can be downloaded. They include:
  1. Population sources
  2. ADMIXTURE proportions for populations
  3. Fst divergences between components
  4. Population portraits showing individual level variation
See spreadsheet and associated bundle (or here).

At K=3, we observe the emergence of West Eurasian, Sub-Saharan, and East/South Asian components.

The impact of the Sub-Saharan component is felt most distinctly in North Africa and the Near East, especially among Arabs; the impact of the East/South Asian one in West Asia and Northeastern Europe, especially among Finnic and Turkic speakers.

It is interesting to note that 39.8% of the Indian_D sample is assigned to the E/S Asian component. I had previously estimated, in a roundabout way and in a slightly smaller sample, that the Ancestral South Indian component in Project participants was 33.3%, so ADMIXTURE has roughly managed to infer correctly that about 1/3 of this Indian sample's ancestry is more closely related to East Asians than to West Eurasians.

At K=4, the first split within the Caucasoid group appears: a component centered on Europe, and one on West/South Asia.

Many populations possess both these components in clinal proportions.

The European component shrinks to insignificance in Arabians, such as Saudis and Yemenese.

The West/South Asian component shrinks to insignificance in Northeast Europeans, such as Finns, Lithuanians, north Russians, and Chuvash.


At K=5, a new Mediterranean component emerges. This is highly represented in populations to the North, South, and East of the Mediterranean sea.

This component is noteworthy for its absence in India and Northeastern Europe.

In Northeastern Europe, the Mediterranean component is hardly represented at all, whereas the West/South Asian component, freed of its K=4 Mediterranean associations, now makes its appearance.

Conversely, in the West Mediterranean, among Basques, Sardinians, Moroccans, and Mozabites the West/South Asian component vanishes to non-existence.


At K=6, a North African component emerges.

Notice its presence in the Near East and parts of Southern Europe.

The two regions can be contrasted in terms of their African components, with very high North/Sub-Saharan African ratio in Europe vs. much lower in the Near East.

The explanation for this seems straightforward, as Europe was affected by North Africa in prehistoric and historic times, whereas the Near East also shares a border with more southern parts of the African continent, as well as the potential influence of the medieval slave trade that seems to have affected Muslim Near Eastern populations disproportionately.


At K=7, a Southwest Asian component emerges which is highest in Arabia and East Africa. I could've called this Red Sea, but I've reserved this name for a similar component that emerges at higher K.

It is clear that this is the main Caucasoid component present in East Africa.

It vanishes to non-existence in the Northern fringe of Europe, in the British Isles, Scandinavia, and among the Finns and Lithuanians.

Another interesting aspect of its distribution is its presence in Pakistan but not India. Perhaps, in this case, it reflects historical contacts between the Islamic Near East and parts of South Asia.


At K=8, we observe most of the familiar components from the K=10 analysis of the Dodecad Project. However, the use of the framing populations has meant that these components emerge before either Africans or East Eurasians split.

Now, the South Asian component appears, which swallows up most of the E/S Asian component that previously linked South with East Asians. This component extends a great way to the Near East and eastern parts of the Caucasus.

Quite interestingly, the remainder of the Caucasoid component in South Asia that is not absorbed by the new South Asian component seems to be split between the West Asian and North/Central European components, with an absence of the South European component.

It is among the Lezgins of the Caucasus that such a combination occurs, on the western shore of the Caspian Sea. The same combination of Caucasoid components also occurs in Uzbeks and Chuvash.

I conclude from this that the Caucasoids who entered South and Central Asia were probably derived from the eastern fringes of the Caucasoid world, where only the West Asian (in the south) and North/Central European (in the north) components are present. The area around the Caspian Sea seems like an excellent candidate for their origin, as I have speculated before, as that region has two important properties:
  1. It is transitional between predominantly N/C European populations to the north and predominantly W Asian populations to the south
  2. It is the border of the influence of the S European element, with Georgians possessing some of it, while Lezgins do not.

At K=9, we see the emergence of specific Sardinian and Basque components. Normally this is undesirable, but, I believe this breakup serves to divide the previously inferred South European component meaningfully.

What was South European in lower K seems to have an Atlantic vs. Mediterranean dimension, with the Basque/Sardinian ratio being particularly high in the Atlantic facade of Europe. Conversely, this ratio is low in the Mediterranean as we move eastwards: it is already low in Italy and the Balkans and becomes virtually zero in Cypriots, Armenians, and Levantine Arabs.

North Africa is also particularly interesting in having a low Basque/Sardinian ratio, even in Morocco. It appears that Sardinians are a much better proxy of European influences in the region than Basques are.

K=10 is particularly exciting because, for the first time, there is clear evidence of structure in the North/Central European component, which can now be split into Northwestern and Northeastern ones.

The NW European component is maximized in Orcadians, and people from the British Isles in general, as well as in Scandinavia. These populations have a low NE/NW ratio, as do the French, Iberians, and Italians.

Conversely, Balto-Slavs have a high NE/NW ratio.

Interestingly, Greeks have a balanced NE/NW ratio (1.2), intermediate between Italians and Balto-Slavs. Similar balanced ratios are also found among Lezgins (1.08), Turks, and Iranians. I conclude that Slavic or other Eastern European admixture cannot account for the entire presence of this component in Greeks.

Indians have a 1.8 NE/NW ratio. In Pakistan this is 6.5, in Uzbeks it is 2.9, and in the North Eurasian_Ra it is 14.2. My conclusion is that a single migration of steppe people from eastern Europe cannot account for the presence of North European-like genes in Asia.

I propose that a palimpsest of population movements has brought such elements into the interior of Asia: the migration of the early Indo-Iranians from West Asia or the Balkans, with a balanced NE/NW ratio, and the migration of steppe people from Eastern Europe, with a high NE/NW ratio. The latter did affect much of Asia, but it is in India, where Iranian groups did not penetrate in great numbers, that the lower ratio of the Indo-Aryans has been best preserved.

The case of the Finns is also interesting, as there is a surplus of NE over NW European elements. Their position is intermediate between Scandinavians and Lithuanians/Russians but toward the latter. So, Finns appear to (i) have a substratum similar to Balto-Slavs, (ii) have been influenced by Scandinavians, and (iii) carry a balance of East Eurasian elements (5.8% in this analysis) preserving the legacy of their linguistic ancestors from the east. At present it is difficult to determine how much of the NE European component in Finns is due to their eastern ancestors, who were presumably mixed Caucasoid/Mongoloid long before they arrived in the Baltic, and how much was absorbed in situ.


At K=11 the Ethiopian/East African component emerges, absorbing some of the Red Sea and Sub-Saharan components from the previous K=10 run.

In comparison to the East African component of the Dodecad Project analysis, this component is closer to West Eurasians than to Sub-Saharan Africans, and a residual Sub-Saharan element remains in the two East African (Ethiopian and East_African_D) population samples. Presumably this is due to the more complete sampling of Sub-Saharan genetic diversity using the Sub_Saharan_H "framing" population.

Outside Africa, both E African/Sub-Saharan components are present in the Near East and North Africa with higher E African/Sub-Saharan ratios in the Near East and lower ones in North Africa.

In Europe, such ratios are low in the few populations where African admixture is present, together with some N African. We can probably conclude that African admixture is mostly due to North Africans and African-influenced Near Eastern populations, rather than directly from Sub-Saharan Africa.

At K=12 the first uninformative cluster emerged, centered on Iraqi Jews, hence I decided to stop the analysis at this point.

Population Portraits

There is a plethora of population portraits in the download bundle, showing how admixture proportions vary in individuals within populations, and how they vary between successive K.

Here is, for example, the K=11 portrait of Cypriots. A picture of overall homogeneity of this sample emerges, but notice how the NW European and NE European have disjoint presence in the Cypriot individuals, with 5 having some of the former, 6 having some of the latter, and only 1 of these having both.

Compare with Lezgins (right) where these two components occur in all individuals. Whatever this admixture represents, it must be old enough if it is so uniformly distributed in the population.



Here are the Georgians at K=10. Notice that their NE European component is unevenly distributed, and in every case where it occurs it is accompanied by a thin slice of East Asian. This may well indicate partial Russian or other Eastern European ancestry in these individuals.



Side-by-side comparisons are also quite useful. Consider Armenians vs. Lezgins vs. Iranians at K=7:







Notice how Lezgins, who live north of the Caucasus mountains, possess some of the N/C European component, which the Armenians, who live to the south of them, lack. This should come as no surprise, as the Lezgins inhabit parts of the ancient Sarmatia Asiatica. Compare with Iranians, who are differentiated from their Indo-European Armenian neighbors by the presence of a "S Asian" component, which, in turn, ties them to their Indo-Aryan linguistic relatives.

Much more can be said, but I'll let readers explore the data on their own, and draw their own conclusions from them.

January 02, 2011

A genetic map of West Eurasians

In the following, I have included all populations from Human genetic variation: the first ? components that had "West Eurasian" admixture of at least 75% at K=3.

I have also added some additional populations from Xing et al. and a number of Dodecad Ancestry Project populations, including some making their debut; sample sizes in several old populations have increased due to participation in the current submission opportunity.

The number of markers is ~37k in order to include the greatest variety of populations, but as will be seen, the ability to detect structure is not greatly diminished.

Some South Asian populations were above the 75% cutoff and form their own cluster at the bottom, with the isolated Kalash at some way off. The island of Sardinia is its own island in genetic space as well.

The most distinctive feature of this plot is the separation of Europeans from West Asians. The big hole framed by Chuvash (bottom), Greeks, Italians, and European Jews (top), Europeans (left), and West Asians (right) probably reflects barriers to gene flow posed by the Black Sea and Aegean.

A fairly linear cluster to the right of this hole contrasts people from the Caucasus (Urkarah, bottom) with those from Arabia (Yemenese Jews, Saudis, Bedouins).

Using the Galore approach on just the first two MDS dimensions resulted in 13 clusters:

The distinctiveness of several populations is discovered by MCLUST using just the first two dimensions, confirming our visual impression, e.g., #13: Chuvash, #12: Kalash, #3: Sardinian.

Other clusters correspond to multiple populations, e.g., #9: Caucasus, and #10: Arabians.

In the latter case, as I have mentioned several times before, we should not conclude that these populations are identical, but see whether they can be divided using additional MDS dimensions.

Indeed, using just 4 dimensions, MCLUST infers 46 clusters.

Even more clusters can be inferred with the usual set of 177k markers and more MDS dimensions, but, for now, I just wanted to make the point that even the smaller number of SNPs suffices to uncover population variation.

This allows us to amortize genotyping efforts using different chips with relatively few markers in common with most of the populations included in the Dodecad Project.

October 06, 2010

Eurasian ADMIXTURE (a precursor to Eurasian-DNA-Calc?)

(Last Update: Oct 8; K=7 added)

I took the 540,814 markers from the HGDP dataset that are also included in the 23andMe personal genomics test, and that have less than 1% no-call rate.

I ran ADMIXTURE on all the West- (and some mainly Caucasoid Central-) Eurasian populations, including Yoruba and Han Chinese to account for non-Caucasoid admixture in parts of Eurasia.

The populations are (left-to-right): Tuscan, North Italian, Sardinian, French, French Basque, Orcadian, Russian, Adygei, Palestinian, Bedouin, Druze, Mozabite, Pathan, Sindhi, Balochi, Brahui, Burusho, Yoruba, Han Chinese.

Here are the admixture proportions corresponding to this experiment:

This seems like a good starting point for the new EURASIAN-DNA-CALC I have in the works.

Relative to the existing EURO-DNA-CALC, doubling the number of ancestral populations (from 3 to 6), and increasing the number of SNPs (by 3 orders of magnitude) introduces some obvious computational problems. I have some ideas on how to resolve them, so stay tuned.

APPENDIX

For the sake of completeness here are the ADMIXTURE runs for K=3 to K=5.

At K=3, the three major races (Caucasoid: green, Mongoloid: red, Negroid: blue) emerge.
At K=4, the Caucasoids are split into West Eurasians (red) and Central Asians (purple).
At K=5, the West Eurasians are split into Europeans (yellow) and West Asians (blue).
PS: I will probably do some ADMIXTURE runs for K=7 and higher in the next few days; the results will be posted in this blog post as an update.

UPDATE (K=7)
The Druze get their own cluster (pink) with an average membership of 65.4% of Druze individuals

September 28, 2010

Some ADMIXTURE estimates in Eurasia

(Last Update: Sep 29)

Continuing my exploration of ADMIXTURE, I turned to the HGDP data, which has 660,918 SNPs for a wide assortment of worldwide populations. After pruning 12,086 SNPs with more than 1% missing genotypes, I was still left with ~650k SNPs.
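For reference, 660,918 minus 12,086 leaves 648,832 SNPs. The pruning step itself can be sketched as below. This is an illustration under stated assumptions and not the tool actually used: it assumes genotypes sit in a samples-by-SNPs NumPy array with missing calls stored as NaN, and the function name `prune_by_missing_rate` is hypothetical.

```python
import numpy as np

def prune_by_missing_rate(genotypes, max_missing=0.01):
    """Keep SNP columns whose fraction of missing calls is at most max_missing.

    genotypes: samples x SNPs array, with missing calls stored as NaN.
    Returns the pruned array and the boolean mask of kept SNPs.
    """
    missing_rate = np.isnan(genotypes).mean(axis=0)
    keep = missing_rate <= max_missing
    return genotypes[:, keep], keep

# Toy data only: 200 individuals, 4 SNPs.
genotypes = np.ones((200, 4))
genotypes[:3, 1] = np.nan  # 1.5% missing, above the 1% threshold
genotypes[:1, 2] = np.nan  # 0.5% missing, below the threshold

pruned, keep = prune_by_missing_rate(genotypes)
print(keep)          # which SNPs survive
print(pruned.shape)  # samples x surviving SNPs
```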

Here are some experiments on this dataset. First, a clustering with K=2 of Han Chinese, Russians, and Orcadians (left to right)

The emergence of 2 clusters (red=Mongoloid, blue=Caucasoid) is as expected, with Russians showing a small participation in the red cluster (7.2%). These northern Russians are believed to have a substantial Finno-Ugric genetic origin, so this is in line with a recent estimate for the eastern component in the westernmost Finno-Ugric speakers being less than 10% (but see below).

Notice a couple of Chinese individuals with a small Caucasoid component: as I've mentioned before, Mongolians, and presumably northern Han, have a small Caucasoid component from early movements of Iranian speakers from the west. That's an advantage of doing your own admixture analysis: you can look at the data in fine detail, and not rely on the published figures.


Next, a clustering of Orcadians, Uygur, and Han Chinese:
The variable admixture in Uygurs is evident (47.2-63.7%, mean: 54.2%)

Next, a clustering of Druze, Bedouin, and Bantu from Kenya.

Druze appear completely Caucasoid (red), Bantu completely Negroid (save for a couple of individuals), while Bedouins show a quite variable minor Negroid component. This variable African contribution (0-17.6%) makes an elongated cluster out of Bedouins in a recent analysis, pulling them away from other Middle Eastern populations in a Sub-Saharan direction.

Finally, I clustered European populations together with Mandenka and Han Chinese:

The populations are in the following order: Han, Mandenka, Orcadian, French Basque, French, North Italian, Tuscan, Sardinian, Russian.

Here are the admixture proportions:


Notice how the eastern component in Russians is now estimated as 10.9%. This probably reflects the inclusion of French Basque and Sardinians, i.e., populations which have historically no opportunity for eastern Eurasian admixture, rather than only Orcadians. This underscores the importance of having appropriate poles in inter-continental admixture estimates (see Appendix I).

Note also that the 100% value for the Han Chinese is not incompatible with the presence of the two aforementioned Caucasoid-admixed individuals, who are present here with an estimated 1.9% and 0.5% such admixture. However, this contributes little to the sample average of 40+ individuals.

The minor (0.1%) Sub-Saharan admixture in Tuscans and Sardinians is also interesting. As you can guess from the figure, this stems from a handful of individuals (green specks) with less than 1% admixture, which is, however, more than the numerical low of 0.001% inferred for most Europeans by the software.


UPDATE I: Eurasian Cline

Below is a run for the following populations (left-to-right): French Basque, Russians, Uygur, Mongolians, Daur, Han Chinese. Notice that the Mongolic speakers (Mongolian and Daur from HGDP) have a small Caucasoid admixture, as I have mentioned before.

APPENDIX I: The importance of choosing poles

The choice of appropriate poles in the estimation of inter-continental admixture is extremely important.

If there is a racial admixture continuum between two major races, such as we observe in Eurasia, then we can express each intermediate population as a weighted sum of populations that live to the east and west of it.

For example, I will use a variable in interval [0, 1] to represent the position in the continuum, with 0: pure western, and 1: pure eastern.

A population at 0.4 can be expressed as the following weighted sum:

0.4 = 0.6*0 + 0.4*1

i.e., as an admixture of 60% western, and 40% eastern.

But, it can also be expressed as e.g.,

0.4 ≈ 0.612*0.02 + 0.388*1

Notice that the choice of a slightly eastward-tilted "western pole" (at position 0.02 in the continuum) has resulted in a reduction of the inferred eastern component (from 40% to 38.8%).

This is exactly what happened in our example: the estimate of Russian eastern admixture was lower when we used Orcadians, rather than French Basque, as the western pole.

Note also, that this is all done automatically: no one told ADMIXTURE to identify these two poles: it was the presence of unlabeled individuals from different ends of the spectrum that influenced the admixture estimates for the rest.
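The weighted-sum arithmetic of this appendix can be checked with a few lines of code. This is only a sketch of the one-dimensional toy model described above, not of what ADMIXTURE computes internally; the function name `eastern_fraction` is hypothetical.

```python
def eastern_fraction(position, west_pole=0.0, east_pole=1.0):
    """Solve position = (1 - e) * west_pole + e * east_pole for e."""
    if west_pole >= east_pole:
        raise ValueError("west_pole must lie west of east_pole")
    if not west_pole <= position <= east_pole:
        raise ValueError("position must lie between the two poles")
    return (position - west_pole) / (east_pole - west_pole)

# A population at 0.4, measured against a pure western pole at 0.
print(round(eastern_fraction(0.4), 3))                  # 0.4

# The same population, measured against a western pole at 0.02.
print(round(eastern_fraction(0.4, west_pole=0.02), 3))  # 0.388
```

The population has not moved; only the reference point has, and the inferred eastern fraction drops from 40% to about 38.8%.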

APPENDIX II: Latent populations

Another important point that needs to be remembered has to do with the possible existence of latent ancestral populations.

For example, it is true that Eurasia (minus South Asia) is economically described as a continuum from the Caucasoids of the Atlantic coast to the Mongoloids of the Pacific, with a transition zone in Central Asia and Siberia, and spillovers on either side. But, we cannot exclude the prehistoric existence of other races in the Eurasian landmass that do not exist today in a relatively unadmixed form.

In Eurasia, the Proto-Uralic race was postulated as such a "third race" with features of its own and not reducible to simple Caucasoid-Mongoloid admixture. It is difficult to see whether these features are ancestral peculiarities (prior to admixture with Caucasoids and Mongoloids), or if they have arisen in a mixed Caucasoid-Mongoloid population.

It is also important to understand how such latent populations affect genetic continua:

First, if the latent population is equidistant from the two major races, then its admixture has no effect on an individual's position in the continuum between the two races. However, it is possible that the latent population was more related to one of the two major races. In that case, admixture with it will move a population towards that race.

So while the jury is still out about the existence of a Proto-Uralic race in Eurasia, its effects on admixed populations indicate that, if it had existed, it was genetically closer to Mongoloids than to Caucasoids.