Showing posts with label Finland. Show all posts

August 17, 2014

Indo-Europeans preceded Finno-Ugrians in Finland and Estonia

This is according to the abstract of a Ph.D. thesis (below). It would appear to fit well with the dating of the signature Y-chromosome haplogroup of Finno-Ugrians.

Bidrag till Fennoskandiens språkliga förhistoria i tid och rum (Heikkilä, Mikko)
My academic dissertation "Bidrag till Fennoskandiens språkliga förhistoria i tid och rum" ("Spatiotemporal Contributions to the Linguistic Prehistory of Fennoscandia") is an interdisciplinary study of the linguistic prehistory of Northern Europe chiefly in the Iron Age (ca. 700 BC―AD 1200), but also to some extent in the Bronze Age (ca. 1700―700 BC) and the Early Finnish Middle Ages (ca. AD 1200―1323). The disciplines represented in this study are Germanistics, Nordistics, Finnougristics, history and archaeology. The language-forms studied are Proto-Germanic, Proto-Scandinavian, Proto-Finnic and Proto-Sami. This dissertation uses historical-comparative linguistics and especially loanword study to examine the relative and absolute chronology of the sound changes that have taken place in the proto-forms of the Germanic, Finnic and Samic languages. Phonetic history is the basis of historical linguistics studying the diachronic development of languages. To my knowledge, this study is the first in the history of the disciplines mentioned above to examine the systematic dating of the phonetic development of these proto-languages in relation to each other. In addition to the dating and relating of the phonetic development of the proto-languages, I study Fennoscandian toponyms. The oldest datable and etymologizable place-names throw new light on the ethnic history and history of settlement of Fennoscandia. For instance, I deal with the etymology of the following place-names: Ahvenanmaa/Åland, Eura(joki), Inari(järvi), Kemi(joki), Kvenland, Kymi(joki), Sarsa, Satakunta, Vanaja, Vantaa and Ähtäri. 
My dissertation shows that Proto-Germanic, Proto-Scandinavian, Proto-Finnic and Proto-Sami all date to different periods of the Iron Age. I argue that the present study along with my earlier published research also proves that a (West-)Uralic language – the pre-form of the Finnic and Samic languages – was spoken in the region of the present-day Finland in the Bronze Age, but not earlier than that. In the centuries before the Common Era, Proto-Sami was spoken in the whole region of what is now called Finland, excluding Lapland. At the beginning of the Common Era, Proto-Sami was spoken in the whole region of Finland, including Southern Finland, from where the Sami idiom first began to recede. An archaic (Northwest-)Indo-European language and a subsequently extinct Paleo-European language were likely spoken in what is now called Finland and Estonia, when the linguistic ancestors of the Finns and the Sami arrived in the eastern and northern Baltic Sea region from the Volga-Kama region probably at the beginning of the Bronze Age. For example, the names Suomi ʻFinlandʼ and Viro ʻEstoniaʼ are likely to have been borrowed from the Indo-European idiom in question. (Proto-)Germanic waves of influence have come from Scandinavia to Finland since the Bronze Age. A considerable part of the Finnic and Samic vocabulary is indeed Germanic loanwords of different ages which form strata in these languages. Besides mere etymological research, these numerous Germanic loanwords make it possible to relate to each other the temporal development of the language-forms that have been in contact with each other. That is what I have done in my extensive dissertation, which attempts to be both a detailed and a holistic treatise.

August 06, 2014

Dairy farming transition ~2,500 years BC in the far north of Europe

Proceedings of the Royal Society B doi: 10.1098/rspb.2014.0819

Neolithic dairy farming at the extreme of agriculture in northern Europe

Lucy J. E. Cramp et al.

The conventional ‘Neolithic package’ comprised animals and plants originally domesticated in the Near East. As farming spread on a generally northwest trajectory across Europe, early pastoralists would have been faced with the challenge of making farming viable in regions in which the organisms were poorly adapted to providing optimal yields or even surviving. Hence, it has long been debated whether Neolithic economies were ever established at the modern limits of agriculture. Here, we examine food residues in pottery, testing a hypothesis that Neolithic farming was practiced beyond the 60th parallel north. Our findings, based on diagnostic biomarker lipids and δ13C values of preserved fatty acids, reveal a transition at ca 2500 BC from the exploitation of aquatic organisms to processing of ruminant products, specifically milk, confirming farming was practiced at high latitudes. Combining this with genetic, environmental and archaeological information, we demonstrate the origins of dairying probably accompanied an incoming, genetically distinct, population successfully establishing this new subsistence ‘package’.


November 26, 2012

Medieval signal of Swedish (?) admixture in Finland

I took the FIN (Finnish), GBR (British), and CDX (Chinese Dai) samples of the 1000 Genomes Project, each of which has a sample size of 100, in order to investigate the signal of East-West Eurasian admixture in Finns. While neither Britons nor Dai can be imagined to have contributed to Finns directly, they ought to make useful proxies for a NW European population lacking recent East Eurasian ancestry and an East Eurasian population lacking recent West Eurasian ancestry, respectively.

In the following, I will assume a generation length of 29 years and a sample birthyear of 1980 as in previous experiments.

First, the 1-reference analysis of FIN using GBR produced an admixture proportion lower bound of 37.4 +/- 5.1 percent.

The corresponding analysis of FIN using CDX produced an admixture proportion lower bound of 4.4 +/- 1.0 percent.

The 2-ref admixture test with {GBR,CDX} reported success:

Test SUCCEEDS (z=2.76, p=0.0057) for FIN with {GBR, CDX} weights
But, the decay rates were inconsistent, a situation which might occur when major admixture from different sources took place at different times. In particular, the one using CDX corresponded to 65.57 +/- 8.36 generations, and the one using GBR to 25.48 +/- 4.93 generations.

In calendar dates, Finns are estimated to have mixed with an East Eurasian CDX-like population between 170BC-320AD and with a NW European GBR-like population between 1100-1380AD.
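The conversion from generations to calendar dates can be sketched as follows. This is a minimal illustration that uses only the generation length (29 years) and sample birth year (1980) stated above; the function name is hypothetical, and the result should land within a decade or so of the quoted ranges, which are rounded.

```python
GENERATION_YEARS = 29      # generation length assumed in the post
SAMPLE_BIRTH_YEAR = 1980   # sample birth year assumed in the post


def admixture_date_range(generations, std_err):
    """Return (earliest, central, latest) calendar years for an admixture event.

    Negative years are BC and positive years are AD. The missing year 0 is
    ignored, which is immaterial at this precision.
    """
    central = SAMPLE_BIRTH_YEAR - generations * GENERATION_YEARS
    half_width = std_err * GENERATION_YEARS
    return central - half_width, central, central + half_width


estimates = {
    "CDX": (65.57, 8.36),   # East Eurasian (CDX-like) signal
    "GBR": (25.48, 4.93),   # NW European (GBR-like) signal
}

for name, (gens, err) in estimates.items():
    earliest, central, latest = admixture_date_range(gens, err)
    print(f"{name}: {earliest:.0f} to {latest:.0f} (central {central:.0f})")
```

The interval is simply the central date plus or minus one standard error, expressed in years.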

The central date of the latter estimate is 1240 AD, in the middle of the 13th century. This corresponds quite closely to the beginning of Swedish rule, and falls between the time when Finland was initially claimed for western Christendom (12th c.) and the time when the conflict between Sweden and Russia was settled (14th c.).

October 18, 2012

ADMIXTURE tracks Amerindian-like admixture in northern Europe

I have recently assembled a new "world" dataset of 4,280 individuals that I am currently incrementally analyzing with ADMIXTURE. But, I noticed an interesting pattern at K=4 that I wanted to share right away.

Four ancestral populations emerge at this level of resolution, which I have named: European, Asian, African, Amerindian. The names aren't important, and you can replace them with whatever you prefer.

The interesting thing about this K=4 analysis is that European populations show evidence of Amerindian admixture, consistent with the pattern inferred using f-statistics, where European populations show admixture between Sardinians and a Karitiana-like population.

This pattern may have emerged at previous ADMIXTURE analyses at this level of resolution, but thanks to the f3 evidence presented in previous posts, it is now clear that it is no quirk of ADMIXTURE, but indicative of a real (albeit still rather mysterious) pattern of gene flow that differentially affected European populations.

For example, the Irish_D population has 7.6% of the Amerindian component, and so do HGDP Orcadians. HGDP Sardinians have only 1.7% of it, which appears to be the minimum in Europe, with French_Basque having more at 4.6%.

Another interesting observation is that West Eurasian populations that show an excess of East Eurasian-like admixture appear to be doing so for two separate reasons. For example, HGDP Russians have 11.7% of Amerindian component, but also 4.5% of "Asian", and 1000 Genomes Finns have 3.3% Asian and 12% Amerindian. Behar et al. (2010) Turks, on the other hand, have 9.9% Asian and 2.2% Amerindian. All these populations are East Eurasian-shifted relative to Sardinians, a pattern which can also be observed by looking at the K=3 analysis, but for apparently different reasons.

The pattern for Near Eastern populations is also interesting. For example, Yunusbayev et al. (2011) Armenians have 0% of the Amerindian component, and 5.7% of the Asian, and all three HGDP Arab populations (Druze, Palestinian, Bedouin) also have 0% of the Amerindian component, with variable levels of the Asian.

It would appear that whatever process contributed Amerindian-like admixture to Europeans minimally affected Near Eastern populations, with Sardinians, who are demonstrably related to Neolithic Europeans (thanks to ancient DNA evidence), tilting towards the Near Eastern pattern. On the other hand, Near Eastern populations show evidence of Asian admixture, which probably involves unresolved East Asian/ASI ancestry, and will be resolved at higher K. Sardinians appear to be at the end of three clines: (i) Amerindian-like cline of Europe-Siberia-Americas, (ii) East Asian-like cline of Europe-Central Asia/Siberia-East Asia, (iii) ASI-like cline of Europe-Near East-South Asia. These are separate, but not independent phenomena.

To confirm that the signal picked up by ADMIXTURE tracks the signal picked up by ADMIXTOOLS formal tests, I calculated the following D-statistic:

D(Sardinian, European, Karitiana, San)

where European is any population with a sample size of at least 10, and which belonged at 99% in the European+Amerindian components:


And, here is a scatterplot:
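The correlation is clear, and the Pearson coefficient is -0.96. Since this D-statistic becomes more negative as a population shares more alleles with Karitiana than Sardinians do, the strong negative correlation means that populations with a higher % Amerindian, as estimated by ADMIXTURE, also show stronger D-statistic evidence for admixture.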
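To make the sign of the statistic concrete, here is a toy sketch of D(Sardinian, European, Karitiana, San) on simulated allele frequencies. It uses the ratio-of-sums form of the D-statistic; the tree shape, the drift values and every population vector are hypothetical stand-ins, not the data used in this post.

```python
import random


def d_statistic(w, x, y, z):
    """D(W, X; Y, Z) from per-SNP allele frequencies (lists of equal length).

    Ratio-of-sums form: sum((w-x)(y-z)) / sum((w+x-2wx)(y+z-2yz)).
    """
    num = sum((wi - xi) * (yi - zi) for wi, xi, yi, zi in zip(w, x, y, z))
    den = sum((wi + xi - 2 * wi * xi) * (yi + zi - 2 * yi * zi)
              for wi, xi, yi, zi in zip(w, x, y, z))
    return num / den


def drift(freqs, sd, rng):
    """Add independent Gaussian noise to mimic drift, clipped to [0.01, 0.99]."""
    return [min(0.99, max(0.01, f + rng.gauss(0.0, sd))) for f in freqs]


def simulate_d(alpha, n_snps=20000, seed=1):
    """Toy tree: an outgroup and a non-African ancestor that splits in two.

    The 'european' population is the 'sardinian' one plus a fraction alpha of
    gene flow from the 'karitiana' branch (all simulated, purely illustrative).
    """
    rng = random.Random(seed)
    root = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    san = drift(root, 0.05, rng)
    non_african = drift(root, 0.05, rng)
    sardinian = drift(non_african, 0.05, rng)
    karitiana = drift(non_african, 0.05, rng)
    european = [(1 - alpha) * s + alpha * k
                for s, k in zip(sardinian, karitiana)]
    return d_statistic(sardinian, european, karitiana, san)


for alpha in (0.0, 0.05, 0.10, 0.20):
    print(f"alpha={alpha:.2f}  D={simulate_d(alpha):+.4f}")
```

In this toy model D is zero without gene flow and grows more negative as the Karitiana-like fraction increases, which is the direction a negative Pearson coefficient against % Amerindian implies.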

What of the actual estimates of admixture produced by ADMIXTURE? Using the F4 ratio test, I recently showed that African admixture in Sardinians confounds estimates of Amerindian-like admixture in northern Europeans and vice versa (Amerindian-like admixture in northern Europeans confounds African admixture in Sardinians).

In that experiment, I "scrubbed" Sardinians to remove segments of African ancestry, and showed that estimates of Amerindian-like admixture in the CEU population diminished from 13.9% to 8.8%. The latter seems reasonably close to the 7.1% inferred by ADMIXTURE.

On balance, I would say that ADMIXTURE at K=4 provides a good proxy for the effect described in Patterson et al. (2012). Its results are more difficult to interpret, because its underlying model does not take into account evolutionary relationships between populations. On the other hand, it has the advantage of being able to handle multiple ancestral populations, and has consistently proven able to generate useful data that correlate well with those from other techniques of population genetics.

October 13, 2012

An estimate of the admixture time for Finns

Using a procedure similar to that in my recent post on the Baltic (Update II), I used 15 FIN individuals from the 1000 Genomes Project together with 12 Nganasans from Rasmussen et al. (2010) as reference populations, and 15 other FIN individuals to estimate admixture LD in a rolloff analysis. Three outlier Nganasan individuals (GSM558800, GSM558802, GSM558807) were removed.
The estimated time of admixture is 86.095 +/- 10.187 generations, or 2500 +/- 300 years. It corresponds rather well to the beginning of the Iron Age in northern Europe.

As I mention in my previous post, there is evidence for intrusive cultures (Battle Axe and Seima-Turbino) converging on the area from different directions during the preceding Bronze Age. If the above date is accurate, it would suggest a rather late admixture event between the Europeoid and Siberian elements of Finns. The former may have included both the descendants of Mesolithic European hunter-gatherers and intruders from Central Europe (Corded Ware/Battle Axe); the latter may have included both Comb Ceramic and the descendants of the Seima-Turbino metallurgists.

October 10, 2012

The Indo-European invasion of the Baltic

In some recent posts, I showed that South Asian populations (North Indian Brahmins, South Indian Brahmins) can be seen as mixtures of West Eurasian and South Indian populations, but also that West Eurasians (Bulgarians, Greeks, Armenians, and French) can be seen as mixtures of South Asian and Sardinian populations.

This may seem strange, but can be explained if we understand how f3-statistics and rolloff actually work. These methods do not require pure or unadmixed ancestral populations, but exploit allele frequency differences in the reference populations together with either (i) allele frequencies in the mixed population, in the case of f3-statistics, or (ii) admixture linkage disequilibrium in the mixed population, in the case of rolloff.

If a and b are allele frequencies in two ancestral populations A and B that mix, then:

  • The frequency of a will shift towards b if A experiences gene flow from B
  • The frequency of a will randomly shift if A experiences gene flow from an "outgroup" population
  • The frequency of a will shift towards b if A experiences gene flow from a third population that is geographically and genetically intermediate between A and B
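The first and third cases can be illustrated with a small sketch. It uses a naive f3 estimator on simulated population allele frequencies, without the finite-sample correction that real software applies; the populations A, B and M and the 20% gene flow are hypothetical.

```python
import random


def f3(target, ref1, ref2):
    """Naive f3(target; ref1, ref2): mean over SNPs of (c - a) * (c - b).

    Works on population allele frequencies; the finite-sample bias correction
    used by real software is deliberately left out of this sketch.
    """
    return sum((c - a) * (c - b)
               for c, a, b in zip(target, ref1, ref2)) / len(target)


rng = random.Random(0)
n_snps = 5000
a = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]  # hypothetical population A
b = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]  # hypothetical population B

# First case: A receives 20% direct gene flow from B.
a_from_b = [0.8 * ai + 0.2 * bi for ai, bi in zip(a, b)]

# Third case: A receives 20% gene flow from M, a population halfway between A and B.
m = [0.5 * ai + 0.5 * bi for ai, bi in zip(a, b)]
a_from_m = [0.8 * ai + 0.2 * mi for ai, mi in zip(a, m)]

print("f3(A+B; A, B) =", f3(a_from_b, a, b))   # negative: direct admixture
print("f3(A+M; A, B) =", f3(a_from_m, a, b))   # also negative: intermediate source
print("f3(A; A+B, B) =", f3(a, a_from_b, b))   # positive: A is not a mixture here
```

Gene flow from the intermediate population M leaves the same kind of negative f3 signal as direct gene flow from B, only weaker, so the statistic alone cannot tell the two scenarios apart.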

An application to the Europe-South Asia cline

I took the following set of populations, and calculated all 1,365 possible f3-statistics:
"FIN30"         "Lithuanians"   "Russian"       "Pathan"        "Balochi"       "North_Kannadi" "Polish_D"      "Russian_D"     "Mixed_Slav_D"  "Bulgarian_D"   "Serb_D"        "Ukrainian_D"   "Belorussian"   "Bulgarians_Y"  "Ukranians_Y"
In the following table, I report the lowest Z-score for each target population (third column). So, for example, Polish_D can be seen as a mixture of Lithuanians and Balochi. Only negative scores are indicative of admixture, and I treat those with Z less than -3 as significant.


Source1        Source2        Target                f3   Std.err        Z    SNPs
Lithuanians    North_Kannadi  FIN30           0.001606  0.000259    6.193  280043
Ukrainian_D    Belorussian    Lithuanians     0.000780  0.000299    2.614  268493
Lithuanians    North_Kannadi  Russian        -0.002738  0.000248  -11.045  279965
North_Kannadi  Polish_D       Pathan         -0.006959  0.000229  -30.344  280220
North_Kannadi  Bulgarians_Y   Balochi        -0.003636  0.000246  -14.781  281604
Pathan         Ukrainian_D    North_Kannadi   0.033802  0.000623   54.237  271858
Lithuanians    Balochi        Polish_D       -0.001171  0.000178   -6.581  279519
Lithuanians    Pathan         Russian_D      -0.001829  0.000166  -11.026  280658
Lithuanians    Pathan         Mixed_Slav_D   -0.001715  0.000200   -8.594  277635
Lithuanians    Balochi        Bulgarian_D    -0.001247  0.000313   -3.979  272342
Lithuanians    Balochi        Serb_D         -0.000910  0.000377   -2.416  270807
Lithuanians    Balochi        Ukrainian_D    -0.002222  0.000358   -6.211  270399
Lithuanians    Balochi        Belorussian    -0.000897  0.000270   -3.325  273076
Balochi        Polish_D       Bulgarians_Y   -0.001198  0.000185   -6.481  279632
Lithuanians    Balochi        Ukranians_Y    -0.001727  0.000187   -9.236  278677
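As a quick check on the bookkeeping, the sketch below counts the f3 tests for 15 populations and flags which targets pass the Z less than -3 cut-off. The Z-scores are copied from the table, on the assumption that the sixth value in each row is the Z-score; nothing else is taken from the underlying data.

```python
from math import comb

populations = [
    "FIN30", "Lithuanians", "Russian", "Pathan", "Balochi", "North_Kannadi",
    "Polish_D", "Russian_D", "Mixed_Slav_D", "Bulgarian_D", "Serb_D",
    "Ukrainian_D", "Belorussian", "Bulgarians_Y", "Ukranians_Y",
]

# Each f3 test picks one target and an unordered pair of references from the rest.
n = len(populations)
n_tests = n * comb(n - 1, 2)
print("number of f3 statistics:", n_tests)

# Lowest Z-score per target, copied from the table (target -> Z).
lowest_z = {
    "FIN30": 6.193, "Lithuanians": 2.614, "Russian": -11.045, "Pathan": -30.344,
    "Balochi": -14.781, "North_Kannadi": 54.237, "Polish_D": -6.581,
    "Russian_D": -11.026, "Mixed_Slav_D": -8.594, "Bulgarian_D": -3.979,
    "Serb_D": -2.416, "Ukrainian_D": -6.211, "Belorussian": -3.325,
    "Bulgarians_Y": -6.481, "Ukranians_Y": -9.236,
}

Z_THRESHOLD = -3.0  # significance cut-off used in the post
significant = sorted(p for p, z in lowest_z.items() if z < Z_THRESHOLD)
not_significant = sorted(p for p, z in lowest_z.items() if z >= Z_THRESHOLD)
print("significant admixture signal:", significant)
print("no significant signal:", not_significant)
```

The count comes out to 15 x 91 = 1,365, and four targets (FIN30, Lithuanians, North_Kannadi and Serb_D) fall short of the cut-off.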

It is clear that what I have described holds here: European populations appear like mixtures of Lithuanians and South Asians; conversely, South Asian populations appear like mixtures of Europeans and North Kannadi.

This does not mean that the populations that appear unadmixed (FIN30, Lithuanians, North_Kannadi, and Serbs) are in fact so, for at least two reasons:
  1. A negative f3 statistic confirms the presence of admixture, but a non-negative one does not reject it; in particular, the test fails to find real admixture in highly drifted populations
  2. The f3 statistic exploits allele frequency correlations between populations, but the North Kannadi and Lithuanians/Finns occupy opposite ends of the studied cline, so their lack of an admixture signal may be due to the non-existence of populations that are even more unadmixed than themselves.
In the case of South Indians, we are completely sure that this is the case. Reich et al. (2009) managed to show this not because there are any unadmixed Ancestral South Indians (ASI) left, but because they exploited the existence of the Onge, an isolated group from the Andaman Islands that was a sister group to the ASI. So, we can be fairly sure that southern Indians themselves have West Eurasian-like admixture, even the ones that are at the end of the West Eurasia-South India cline on its southern end.

The problem is: there is no isolated group of unadmixed Europeans left in existence that might serve a similar proxy function as the Onge did for South Asians.

Enter Pickrell et al. (2012) to the rescue. In that paper, the authors studied admixture in the Khoe-San of southern Africa. Now, many of the Khoe-San sub-groups appeared to be admixed, but the "Ju|'hoan North" population appeared to be at the "end of the cline": it's impossible to detect admixture in them using allele frequency differences, because, quite simply, there are no populations that are less admixed than them: they're as pure descendants of "Ancestral Bushman" as exist on the earth today.

But the clever thing is that we don't have to detect admixture using allele frequency differences alone; we can also use admixture LD, i.e., exploit the correlation between linkage disequilibrium (the co-inheritance of physically separated markers on a chromosome) and allele frequency differences between populations. Pickrell et al. were able to do this not by conjuring up a more unadmixed population than the "Ju|'hoan North" one available to them, but by splitting up that population, and using one half to find allele frequency differences, and the other half to detect admixture LD.

Admixture LD signal in Lithuanians

Using the aforementioned idea, I set out to see whether Lithuanians, who occupy the European end of the Europe-South Asia cline, present such a signal of admixture LD. I used the Lithuanian_D sample from the Dodecad Project and the Balochi HGDP sample as reference populations (to calculate allele frequency differences), and the Behar et al. (2010) Lithuanians for admixture LD. There were only ~300k SNPs usable in this set, but that was sufficient to detect the signal of admixture LD:
The admixture time estimate is 200.350 +/- 61.608 generations, or 5,810 +/- 1,790 years. This is not very precise, probably because of the small number of SNPs and individuals used, but it certainly points to the Neolithic-to-Bronze Age for the occurrence of this admixture. The date is certainly reminiscent of the expansion of the Kurgan culture out of eastern Europe, or, the later Corded Ware culture of northern Europe.
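For intuition about where such a date comes from: rolloff fits an exponential decay to weighted admixture LD as a function of genetic distance, and the decay rate is the number of generations since admixture. The sketch below is a simplified, noise-free illustration on synthetic data, assuming a pure exponential and a log-linear least-squares fit; the real software does more (it also estimates a standard error, for instance), and the 29-year generation length is the one assumed elsewhere on this blog.

```python
import math

GENERATION_YEARS = 29  # generation length assumed elsewhere on this blog


def fit_decay_rate(distances_morgans, weighted_ld):
    """Least-squares slope of log(LD) against distance; returns n in generations.

    Assumes the pure exponential model LD(d) = A * exp(-n * d), with every
    LD value positive so that the logarithm is defined.
    """
    xs = distances_morgans
    ys = [math.log(v) for v in weighted_ld]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return -sxy / sxx


# Synthetic, noise-free curve generated with n = 200.35 generations (hypothetical data).
true_n = 200.35
distances = [0.001 * i for i in range(1, 51)]            # 0.1 cM to 5 cM, in Morgans
curve = [0.02 * math.exp(-true_n * d) for d in distances]

n_hat = fit_decay_rate(distances, curve)
print(f"estimated generations: {n_hat:.2f}")
print(f"estimated years: {n_hat * GENERATION_YEARS:.0f}")
```

With real data the curve is noisy, and with few SNPs and individuals the fitted rate is poorly constrained, which is why the standard error above is so wide.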

So, it may well appear that at least some of the people participating in these groups of cultures were indeed influenced by the Indo-Europeans as they expanded from their West Asian homeland. These intruders mixed with eastern Europeans who vacillated during the late Neolithic between a northern Europeoid pole akin to Mesolithic hunter-gatherers from Gotland and Iberia, and a widely dispersed Sardinian-like population that is in evidence at least in the Sweden-Italian Alps-Bulgaria triangle. The gradual appearance of non-mtDNA U related lineages in Siberia and Ukraine is most likely related to this phenomenon.

It would seem that the Proto-Indo-Europeans mixed with different substrata in the four directions of their expansion: Sardinian-like people in southern Europe, Lithuanian-like people in northern Europe, South Indian-like people in South Asia, and East Eurasians in Siberia and east central Asia. Extant groups are descendants of divergent Neolithic population groups, brought closer together (genetically) because of variable admixture with the PIE population and its early offshoots.

Conclusion

There are mutual signals of admixture across a Europe-South Asia cline: Europeans appear to be mixed with South Asians, and South Asians appear to be mixed with Europeans. The simplest explanation for this pattern involves expansion of a third, geographically and genetically intermediate population that affected both Europe and South Asia. We can use the signal of admixture LD to prove that this expansion affected some of the most unadmixed populations in Europe (e.g., Lithuanians), just as it did the most unadmixed populations of India (e.g., Dravidians).

It will be interesting to use these techniques to study signals of admixture in other "end of the line" populations such as Sardinians, South Indians, etc.

UPDATE I (rolloff analysis of Poles):

I have carried out rolloff analysis of my 25-strong Polish_D sample using Lithuanians and Pathans as references:
The signal is fairly distinct, and corresponds to 149.296 +/- 38.783 generations or 4,330 +/- 1,120 years. I am guessing that either the different reference population (Pathans vs. Balochi) or, more likely, the increased number of target individuals (25 vs. 10) has contributed to the narrowing down of the uncertainty. It will be interesting to explore this signal further with more population pairs.

UPDATE II (rolloff analysis of Finns):

I have also used the 1000 Genomes Finnish sample (FIN) in a similar manner as Lithuanians, using 15 individuals to estimate allele frequency differences and 15 others for admixture LD, with the Pathans as a South Asian reference population. There is a clear signal of admixture:
This dates to 104.967 +/- 14.797 generations, or 3,040 +/- 430 years. Finland came under the influence both of Europeans (and likely Indo-Europeans) during the Bronze Age period (a mixture of Battle Axe with local Comb Ceramic seems to have occurred) and of likely non-European (and likely Uralic) intrusions during the same time frame, as part of the Seima-Turbino phenomenon. It will be interesting to repeat this analysis with an East Eurasian reference population to isolate potential signals of admixture dating to either the Comb Ceramic or Seima-Turbino episodes of migration.

(Note, added Oct 14): I carried out a rolloff analysis using Nganasans, as suggested in the above paragraph, in a separate post.

UPDATE III (rolloff analysis of Ukrainians):

I have used the Yunusbayev et al. sample of Ukrainians, and estimated its admixture time using Lithuanians and Balochi as reference populations:
The admixture time estimate is 191.078 +/- 35.079 generations, or 5,540 +/- 1,020 years. It seems very similar to that in Lithuanians, with a smaller standard error, perhaps on account of either the larger number of SNPs or larger number of individuals.

It is tempting to associate this admixture signal with the Maikop culture which appeared at around this time. Assuming that North_European/West_Asian (or Lithuanian-like and Balochi-like) gene pools existed north and south of the Pontic-Caspian-Caucasus set of geographical barriers, then the Maikop culture which shows links to both the early Transcaucasian culture and those of Eastern Europe would have been an ideal candidate region for the admixture picked up by rolloff to have taken place. There are, of course, other possibilities.

UPDATE IV (rolloff analysis of Lithuanians with Pathan reference):

I repeated the first analysis of this post, but this time, I used Pathans, rather than Balochi as a reference population:
The admixture time estimate of 217.501 +/- 51.170 generations, or 6,310 +/- 1,480 years, appears to be similar to the original estimate of 5,810 +/- 1,790 years, so it does not appear that the choice of Balochi or Pathan as a reference population much affects this result.
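One rough way to quantify "similar" is a z-score for the difference between the two estimates. The sketch below treats the two standard errors as independent, which is an assumption: both runs use the same Lithuanian target sample, so the errors are probably correlated and the figure is only indicative.

```python
import math


def z_difference(est1, se1, est2, se2):
    """Z-score for the difference of two estimates, treating errors as independent."""
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Lithuanians dated with Pathan vs. Balochi as the South Asian reference (generations).
z = z_difference(217.501, 51.170, 200.350, 61.608)
print(f"z = {z:.2f}")
```

The difference of about 17 generations is small next to the combined uncertainty of about 80, so z comes out well under 1.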

August 20, 2012

Visualizing admixture differences with ACD tool

Vaêdhya has created a new ACD tool that allows one to visualize differences between sets of populations in terms of admixture components. He also posts two examples of the application of his tool on data generated by myself in the Dodecad Project, as well as by the Harappa Project.

 I have speculated about the origins of Indo-Iranians before, noting that the evidence links even the Kurds with a "South Asian" component; in subsequent higher-resolution analysis, such as the K12b, it appeared that this component was related to the Gedrosia component. In any case, the evidence is clear about the links of different Iranian and Indo-Aryan groups, so it is nice that this can be made evident with the ACD tool and data from the Harappa Project. Notice the excess of the Baloch (~Gedrosia) component in Kurds and Iranians in contradistinction to the Indo-European Armenians and Semitic Assyrians. It is fairly clear to me that the Iranian ancestral homeland is to be sought to the east, with the Bactria-Margiana Archaeological Complex (BMAC) being a good candidate for its location.

In a second plot, Vaêdhya uses Dodecad data to contrast patterns of differences in Northeastern Europe. Here, too, the patterns are clear, with Finns, and secondarily Russians showing an excess of Siberian ancestry relative to Poles. This is, no doubt, due to the Finnic element, which links Finns, and the Uralic substratum in Russians with Siberia. A second contrast is between Finns and Russians/Poles. The latter have more of the Caucasus component, a probable legacy of the Bronze Age Indo-European invasion of Europe. A final contrast is the higher Atlantic_Med element in Poles, which suggests an excess of early Neolithic farmer ancestry, or, admixture with West European populations such as Germans and others who possess more of this component than Slavs.

August 31, 2011

ICHG 2011 abstracts are online

You can search here. I will update this entry with any interesting abstracts I've identified and my early comments on them, if any.

UPDATE:

I will add abstracts to this entry one by one, with the newer ones added to the top of the post.

Demographic histories of African hunting-gathering populations inferred from genome-wide SNP variation.
S. Soi et al.

Africa is the geographic origin of anatomically modern humans; it is also home to a third of all modern languages, including four major language families: Niger-Kordofanian, Afro-Asiatic, Nilo-Saharan, and Khoesan. Despite the importance of African populations for studying human origins and the complexity of demographic and linguistic relationships among African populations, genome-wide analyses of sub-Saharan variation have been sparse. To address this deficiency, we used Illumina 1M-Duo SNP arrays to genotype samples (N=697) from 44 sub-Saharan populations, which we supplemented with published data sets. Principal components analysis (PCA) and linear regression were used to assess the statistical effect of geography and linguistics on the partitioning of genetic variation. As ascertainment bias can distort the allele frequency spectrum, we examined patterns of linkage disequilibrium (LD), haplotype sharing, and identity by descent (IBD) to understand the demographic relationship among populations. To affirm that LD-based analyses were robust to ascertainment bias, we assessed the rank correlation of estimates of effective population size from the rate of LD decay within populations and estimates of population size based on the variance of microsatellite repeat lengths from previously published data (Spearman’s ρ=0.782, p=0.011). Additionally, the presence of long IBD tracts between individuals indicates recent common ancestry. Thus, we used the GERMLINE algorithm to infer IBD tracts between individuals in hunting-gathering populations and neighboring agriculturalist and pastoralist populations. To infer the time to most recent common ancestor and test demographic models while accounting for the confounding effects of migration and changes in population sizes, we employed Approximate Bayesian Computation (ABC) using summaries of haplotype frequency, diversity and sharing within and between populations. 
We report, for the first time, evidence for recent common ancestry of Ethiopian hunter-gatherers and the Kenyan Sanye/Dahalo, who speak a language with remnant clicks, with click-speaking eastern African Khoesan populations. This work supports archaeological and linguistic studies that indicate that the distribution of Khoesan speaking populations may have extended as far north as Ethiopia.
Not very surprising to me, as I detected a contribution of the "Palaeo_African" component (which has one of its peaks in San) in East Africans.

Comparative study of the Y chromosome diversity in some ethnic groups living in Iran and populations of the Middle East.
L. Andonian et al.

Background: The main goal of this study is to conduct a population genetic study of: a) Armenians living in Iran, in the context of general Armenian population; and b) Iranian Azeris, one of the biggest ethno-linguistic communities, in comparison with other Turkic-speaking populations of the Middle East (from eastern Turkey, Azerbaijan Republic and Turkmenistan). Methods: Buccal cells of 89 Armenian males from central Iran, the descendants of Armenians forcibly moved to Iran in the beginning of 17th century CE, and 105 Turkic-speaking Azeri males from north-west Iran (Tabriz) were collected by mouth swabs. The samples were screened for 12 Single Nucleotide (SNP) and 6 microsatellite markers on the non-recombining portion of the Y chromosome. The results of genetic typing were statistically analyzed using Arlequin software. Results: Iranian Armenians display a moderate level of genetic variation and are genetically closer to Western Armenians which is in agreement with historical records. Iranian Azeris demonstrate much weaker genetic resemblance with Turkmens (as putative source population) than with their geographic neighbors. Conclusion: Political, religious and geographic isolation had moderate influence on the genetic structure of modern Iranian Armenians during the last four centuries, which is expressed in lower diversity of their patrilineal genetic legacy. The imposition of Turkic language to the populations of north-west Iran was realized predominantly by the process of elite dominance, i.e. by the limited number of invaders who left weak traces in the patrilineal genetic history of Iranian Azeris.

A direct characterization of human mutation.
J. X. Sun et al.

Mutation and recombination provide the raw material of evolution. This study reports the largest study of new mutations to date: 2,058 germline mutations discovered by analyzing 85,289 Icelanders at 2,477 microsatellites. We find that the paternal-to-maternal mutation rate ratio is 3.3, and that the mutation rate in fathers doubles between the ages of 15 to 45 whereas there is no association to age in mothers. Strong length constraints apply for microsatellites, with longer alleles tending to mutate more often and decrease in length, whereas shorter alleles tending to mutate less often and increase in length. Based on these direct observations of the microsatellite mutation process, we build a model to estimate key parameters of evolution without calibration to the fossil record. The sequence substitution rate per base pair is estimated to be 1.84-2.21×10⁻⁸ per generation (95% credible interval). Human-chimpanzee speciation is estimated to be 3.92-5.91 Mya, challenging views of the Toumaï fossil as dating to >6.8 Mya and being on the hominin lineage since the final separation of humans and chimpanzees.
This microsatellite based estimate of human-chimp speciation contrasts with a recent SNP-based estimate of 7 million years.

Genetic structure of Jewish populations on the basis of genome-wide single nucleotide polymorphisms.
N. M. Kopelman

The Jewish population forms a genetically structured population, due to historical migrations and diverse histories of the various Jewish communities. Discerning the ancestry and population structure of different Jewish populations is important for understanding the complex history of the Jewish communities as well as for research on the genetic basis of disease. Using >500,000 genome-wide single-nucleotide polymorphisms, we investigated patterns of population structure in 438 samples from 30 Jewish populations in the context of additional samples from non-Jewish populations. The collection of Jewish populations studied incorporates a variety of populations not previously included in other genomic population structure studies of Jewish groups (e.g. NM Kopelman et al. 2009 BMC Genet 10:80; G Atzmon et al. 2010 AJHG 86:850-859; DM Behar et al. 2010 Nature 466:238-242; SM Bray et al. 2010 PNAS 107:16222-16227; JB Listman et al. 2010 BMC Genet 11:48). We identify fine-scale population structure within the Jewish samples, including notable distinctions separating Ashkenazi, Mizrahi, Sephardi, and North African populations. Additionally, we identify distinctions within major regional groups, including a separation among the North African populations of Libyan, Moroccan, and Tunisian Jewish samples and a separation among the Mizrahi populations of Bukharan, Georgian, Iranian, and Iraqi Jewish samples. These results supply enhanced information regarding Jewish population structure, providing a basis for further detailed analysis of the genetic history of Jewish populations.
Hopefully the wealth of this new Jewish and non-Jewish data will be made publicly available.

LD patterns in dense variation data reveal information about the history of human populations worldwide.
S. Myers et al.

A detailed understanding of population structure in genetic data is vital in many applications, including population genetic analyses and disease gene mapping, and relates directly to human history. However, there are still few methods that directly utilize information contained in the haplotypic structure of modern dense, genome-wide variation datasets. We have developed a set of new approaches, founded on a model first introduced by Li and Stephens, which fully use this powerful information, and are able to identify the underlying structure in large datasets sampling 50 or more populations. Our methods utilize both Bayesian model-based clustering and principal component analyses, and by using LD information effectively, consistently outperform existing approaches in both simulated and real data. This allows us to infer ancestry with unprecedented geographical precision, in turn enabling us to characterize the populations involved in ancient admixture events and, critically, to precisely date such events. We applied our new techniques to combined data for 30 European populations sampled by us, or publicly available, and the worldwide HGDP data. We find almost all human populations have been influenced by mixture with other groups, with the Bantu expansion, the Mongol empire and the Arab slave trade leaving particularly widespread genetic signatures, and many more local events, for example North African (Moroccan) admixture into the Spanish that we date to 834-1394AD. Dates of admixture events between European groups and groups from North Africa and the Middle East, seen in multiple Mediterranean countries, vary between 800 and 1700 years ago, while Greece, Croatia and other Balkan states show signals of admixture consistent with Slavic migration from the north, which we date to 600-1000AD. At the finest scale, we are able to study admixture patterns in data gathered by a project (POBI) examining people within the British Isles. 
Our approaches reveal genetic differences between individuals from different UK counties, and show that the current UK genetic landscape was formed by a series of events in the millennium following the fall of the Roman Empire.
Existing methods (see comments below) for dating historical admixture events differ from each other by a factor of two, and they all assume a 2-population model. Hopefully the research described here will be an improvement, especially if it is encapsulated in an easy-to-use piece of software. It will definitely be interesting to see the evidence for Slavic admixture in the Balkans, which probably corresponds somewhat to the "East European" component discovered in the Dodecad Project which differentiates Balkan populations from their Italian and West Asian neighbors.


Evidence for extensive ancient admixture in different human populations.
J. Wall et al.

We generated whole-genome sequences from four Biaka pygmies and analyzed them along with the publicly available genomes of 69 individuals from a range of different ethnicities. We scanned each of the 73 genomes for regions with unusual patterns of genetic variation that might have arisen due to ancient admixture with an ‘archaic’ human group. While a majority of the most extreme regions were really misalignment errors, we did find hundreds of regions that likely introgressed in from archaic human ancestors, and we estimate the amount and the timing of these ancient admixture events. These regions were found in the genomes of both sub-Saharan African and non-African populations. While Neandertals are a natural source population for ancient admixture into non-Africans, the source for ancient admixture into sub-Saharan African populations is less obvious.
Wall and Hammer have been arguing for archaic admixture for years, and there's a good chance they finally found the "smoking gun" here. I've argued before that Homo sapiens was not the only species in Africa at the time of its emergence, due to the great ecological diversity of the continent, and the long adaptation of humans there. We are unlikely to ever be able to find and sequence Paleolithic non-sapiens Homo from tropical Africa, but the signal is there to be discovered in modern African hunter-gatherers.

Validating the authenticity of the pedigrees of Chinese Emperor CAO Cao of 1,800 years ago.
H. Li

Deep pedigrees are of great value for studying the Y chromosome evolution. However, the authenticity of the pedigree information requires careful validation. Here, we validated some deep pedigrees in China with full records of 70-100 generations spanning over 1,800 years by comparing their Y chromosomes. The present clans of these pedigrees claim to be descendants of Emperor CAO Cao (155AD-220AD). Haplogroup O2-M268 is the only one that is enriched significantly in the claimed clans (P=9.323×10⁻⁵, OR=12.72), and therefore, is most likely to be that of the Emperor. Moreover, our analysis showed that the Y chromosome haplogroup of the Emperor is different from that of his claimed ancestry of the earlier CAO aristocrats (Haplogroup O3-002611). This study offers a successful showcase of the utility of genetics in studying the ancient history.
This is probably the oldest attested Y-chromosome lineage currently available. Confucius next? It will be interesting to know how many likely Cao descendants there are today, as a control on the rate with which a socially-selected lineage can grow.

Exceptions to the "One Drop Rule"? DNA evidence of African ancestry in European Americans.
J. L. Mountain et al.

Genetic studies have revealed that most African Americans trace the majority (75-80%, on average) of their ancestry to western Africa. Most of the remaining ancestry traces to Europe, and paternal lines trace to Europe more often than maternal lines. This genetic pattern is consistent with the "One Drop Rule,” a social history wherein children born with at least one ancestor of African descent were considered Black in the United States. The question of how many European Americans have DNA evidence of African ancestry has been studied far less. We examined genetic ancestry for over 77,000 customers of 23andMe who had consented to participate in research. Most live in the United States. A subset of about 60,000 shows genetic evidence of fewer than one in 16 great-great-grandparents tracing ancestry to a continental region other than Europe. They are likely to consider themselves to be entirely of European descent. We conducted two analyses to understand what fraction of this group has genetic evidence of some ancestry tracing recently to Africa. We first identified individuals whose autosomal DNA indicates that they are predominantly of European ancestry, but who carry either a mitochondrial (mt) DNA or Y chromosome haplogroup that is highly likely to have originated in sub-Saharan Africa. Of the 60,000 individuals with 95% or greater European ancestry, close to 1% carry an mtDNA haplogroup indicating African ancestry. Of approximately 33,000 males, about one in 300 trace their paternal line to Africa. We then identified the subset of these European Americans who have estimates of between 0.5% and 5.0% of ancestry tracing to Africa. This subset constitutes about 2% of this set of individuals likely to be aware only of their European ancestry. The majority (75%) of that group has a very small estimated fraction of African ancestry (about 0.5%), likely to reflect African ancestry over seven generations (about 200 years) ago. 
We estimate that, overall, at least 2-3% of individuals with predominantly European ancestry have genetic patterns suggesting relatively deep ancestry tracing to Africa. This fraction is far lower than the genetic estimates of European ancestry of African Americans, consistent with the social history of the United States, but reveals that a small percentage of “mixed race” individuals were integrating into the European American community (passing for White) over 200 years ago, during the era of slavery in the United States.
Hopefully this was not done with 23andMe's "Ancestry Painting" that grossly overestimates European ancestry with even East Africans and South Asians often getting >90% "European". The search for non-white ancestry seems to be a favorite pastime of many people who test at 23andMe, so this could potentially bias the results; on the other hand, I've encountered many, many more people who are seeking that elusive Amerindian ancestor of family lore, so perhaps this is not as big a problem for the detection of African ancestry.
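The generation arithmetic in the abstract can be checked with a short sketch. It assumes the textbook expectation that a single ancestor g generations back contributes, on average, (1/2)^g of one's autosomal genome; the function name is mine, and the share actually inherited from any one ancestor varies around this expectation because of recombination.

```python
def expected_fraction(generations_back):
    """Expected autosomal share from ONE ancestor g generations back.

    This is an expectation only; the share actually inherited
    varies around it because of recombination.
    """
    return 0.5 ** generations_back


# g=4 corresponds to great-great-grandparents (16 ancestors).
for g in range(4, 10):
    print(f"{g} generations back: {expected_fraction(g):.2%}")
```

Under this assumption, one great-great-grandparent corresponds to 6.25% (the "one in 16" threshold of the abstract), while 0.5% falls between seven generations (about 0.78%) and eight (about 0.39%), in line with the abstract's "over seven generations".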

Estimating a date of mixture of ancestral South Asian populations.
P. Moorjani

Linguistic and genetic studies have shown that most Indian groups have ancestry from two genetically divergent populations, Ancestral North Indians (ANI) and Ancestral South Indians (ASI). However, the date of mixture still remains unknown. We analyze genome-wide data from about 60 South Asian groups using a newly developed method that utilizes information related to admixture linkage disequilibrium to estimate mixture dates. Our analyses suggest that major ANI-ASI mixture occurred in the ancestors of both northern and southern Indians 1,200-3,500 years ago, overlapping the time when Indo-European languages first began to be spoken in the subcontinent. These results suggest that this formative period of Indian history was accompanied by mixtures between two highly diverged populations, although our results do not rule out other, older ANI-ASI admixture events. A cultural shift subsequently led to widespread endogamy, which decreased the rate of additional population mixtures.
I have previously highlighted that ROLLOFF, the method used by these authors, produces age estimates that are about half those of HAPMIX and StepPCO. As of this writing, ROLLOFF does not seem to be available for independent evaluation, so it is not entirely clear to me whether it, or the older methods, are right. It would be great if this issue is dealt with in the publication arising from this research.

Another issue that must be dealt with is the spurious inference that Ancestral North Indians are more closely related to Europeans than to West Asians in the previous publication on the ANI/ASI division, an inference that was an artifact of unequal sample sizes between Adygei and CEU.

Synthesis of autosomal and gender-specific genetic structures of the Uralic-speaking populations.
K. Tambets et al.
The variation of uniparentally inherited genetic markers - mitochondrial DNA (mtDNA) and non-recombining part of Y chromosome (NRY) - has suggested somewhat different demographic scenarios for the spread of maternal and paternal lineages of North Eurasians, in particular those speaking Uralic languages. The west-east-directed geographical component has evidently been the most important factor that has influenced the proportion of western and eastern Eurasian mtDNA types among Uralic-speakers. The palette of maternal lineages of Uralic-speakers resemble that of geographically close to them European or Western Siberian Indo-European and Altaic-speaking neighbours. However, the most frequent in North Eurasia NRY type N1c, that is a common patrilineal link between almost all Uralic-speakers of eastern and western side of the Ural Mountains, is rare among Indo-European-speakers, with a notable exception of Latvians, Lithuanians and North Russians. In this study the information of genetic variation of uniparentally inherited markers in Uralic-speaking populations from 13 Finno-Ugric and 3 Samoyedic speakers is combined with the results of their genome-wide analysis of 650 000 SNPs (Illumina Inc.) to assign their place in a landscape of autosomal variation of North Eurasian populations and globally. The genome-wide analysis of the genetic profiles of studied populations showed that the proportion between western and eastern ancestry components of Uralic-speakers is concordant with their mtDNA data and is determined mostly by geographical factors. Interestingly, among the Saami - the population which is often considered as a genetic outlier in Europe - the dominant western component is accompanied by about one third of the eastern component, making the Saami genetically more similar to Volga-Finnic populations than to their closest Fennoscandian-East Baltic neighbors. 
The high frequency of pan-northern-Eurasian paternal lineage N1c among Saami cannot explain this phenomenon alone - genetic ancestry profiles of autosomes of other Finnic- and Baltic-speaking populations, who share the high N1c with the Saami, do not show a considerable eastern Asian contribution to their genetic makeup.
This study seems to include more Northern Eurasian references, but we will have to wait and see how its components are defined. Notice the slight discrepancy between its eastern Saami estimate (1/3) and that of the following study (22%), which is probably an artefact of the different range of samples used.

Population genetics of Finland revisited - looking Eastwards.
K. Rehnström et al.

We have previously reported that the genetic structure within Finland correlates well both with geography and known population history. While these studies have quantified the genetic distances between Finland and European neighbours to the south and the west, the influence of the Eastern and the Northern populations have not been described using genome-wide tools. Here we investigated the degree of Asian ancestry in Northern Europe. We also studied the genetic ancestry of geographic and linguistic neighbours of Finns, using genome-wide SNP data in a dataset comprising over 2200 individuals. First we quantified the proportions of European (represented by HapMap CEU) and Asian (HapMap CHB/JPT) genetic ancestry. Within Finland, the average Asian ancestry proportion varied from 2.5% in the Swedish speaking Finns to 5.1% in Northern Finland. The Saami population, being the indigenous inhabitants of Northern Finland, showed a surprisingly high proportion of Asian genetic ancestry (17.5%). We therefore hypothesize that, as genetic sharing between individuals in Northern Finland and Saami are higher than in other parts of the country, the Asian genetic ancestry in Finland could partly be through admixture with the Saami. Using a model-based estimation of individual ancestry, three ancestral populations provided a best fit for the combined Finnish and Saami dataset. Particularly, one of these ancestral populations was predominant in the Saami (average 78%), and higher in Northern Finland (average 14%) compared to the rest of the country (average 4%). Despite the fact that Finns are the closest relatives of the Saami of all populations included in this study, in general, our results show that language and genetics are only weakly related. The Finns are more closely related to most Indo-European speaking populations than to linguistically related populations such as the Saami. 
These analyses are currently being extended to sequence level variation using genome-wide sequence data for 100 Finns as part of the 1000 Genomes project, and 200 further individuals from the North-Eastern Finnish subisolate of Kuusamo. These 200 individuals provide good power to identify founder haplotypes within this isolate. Next, we aim to investigate the power to extend the imputation of haplotypes to the rest of Northern Finland as well as to the rest of the country.
It is unfortunate that these researchers used HapMap populations to study admixture in Finns; the Chinese, especially, are not a very good proxy for the East Eurasian element in the Finnish population. There is much data available on North Eurasian populations at this point, so I find the continued use of HapMap populations puzzling; hopefully this will be remedied when this research finds its way into the journals.

The current Dodecad estimate of East Eurasian admixture in the 1000 Genomes FIN population is 5.9%, the bulk of which is "Northeast Asian", a component which peaks in Nganasan, Chukchi, and Koryak, and is also well-represented in Central Siberia among Selkups. I don't have 5 Swedish-speaking Finns to report an average yet, but the ones I have are in the ~2-4% "Northeast Asian" range.

I also ran a quick test of FIN together with CEU and CHB, using the ~186k SNPs I am currently considering for Dodecad v4, the next version of my ancestry analysis. At K=2, FIN is 3.7% Asian, which seems consistent with the authors reporting the highest Asian ancestry of 5.1% in northern Finland, and also shows how the use of CHB as an Asian reference underestimates the degree of Eastern Eurasian admixture.

April 23, 2011

Genetic structure of West Eurasians

I have decided to generate a new major data dump of ADMIXTURE results. In comparison to previous such experiments:
  1. The focus is entirely on West Eurasians (Caucasoids).
  2. I have excluded all potential relatives from the source datasets, as well as several populations that tend to create uninformative clusters of their own (e.g., Druze or Ashkenazi Jews); exceptions are populations of great anthropological interest (e.g., Basques).
  3. I have included all relevant Dodecad Ancestry Project populations with 5+ participants.
  4. I have developed a new way of "framing" the region of interest by choosing appropriate sets of individuals from outside of it.
"Framing" populations

I have, since the beginning of my ADMIXTURE experiments, emphasized the importance of including appropriate population controls designed to squeeze out minor distant admixture in populations of interest, so that it does not confound the inference of region-specific components.

This leads to a problem: there are many possible sources of admixture. For example, we do not know a priori which set of African populations may have contributed to Caucasoid populations, or which set of East Asian ones. We could choose, e.g., the Yoruba and the Chinese to represent Sub-Saharans and East Asians, but that might exclude possible sources of variation, and lead to Yoruba- and Chinese-specific clusters rather than more general Sub-Saharan and East Asian ones. If we included more population controls, we would cover more possible sources of variation, but ADMIXTURE would infer components of little interest (e.g., Pygmies vs. Bushmen or Mongols vs. Chinese).

To avoid this, I propose to create meta-populations consisting of a single individual from many populations, i.e., a Yoruba, a Mandenka, a San, a Mbuti Pygmy, etc. for Sub-Saharan Africa, or a Miaozu, a Han, a Mongol, a She, a Hezhen, etc. for East Asia. That way we are both helping ADMIXTURE infer general components, while at the same time preventing it from inferring non-region specific ones.
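As a rough illustration of the idea, here is a minimal sketch of how such a meta-population could be assembled. The function name, the (individual, population) input format, and the sample IDs are all hypothetical and chosen for the example; this is not the actual pipeline used for the runs described here.

```python
import random
from collections import defaultdict


def build_framing_population(samples, populations, seed=0):
    """Pick one individual at random from each listed population.

    samples: iterable of (individual_id, population_label) pairs.
    populations: labels that together "frame" one region.
    """
    by_pop = defaultdict(list)
    for individual_id, label in samples:
        by_pop[label].append(individual_id)

    rng = random.Random(seed)  # fixed seed so the choice is repeatable
    chosen = []
    for label in populations:
        if not by_pop[label]:
            raise ValueError(f"no samples available for {label!r}")
        chosen.append(rng.choice(sorted(by_pop[label])))
    return chosen


# Hypothetical sample IDs, for illustration only.
samples = [
    ("yor_01", "Yoruba"), ("yor_02", "Yoruba"),
    ("man_01", "Mandenka"),
    ("san_01", "San"), ("san_02", "San"),
    ("mbu_01", "Mbuti Pygmy"),
]
sub_saharan_frame = build_framing_population(
    samples, ["Yoruba", "Mandenka", "San", "Mbuti Pygmy"]
)
print(sub_saharan_frame)
```

Because each population contributes exactly one individual, no single population can dominate the framing set, which is the point of the design.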

Results

The entirety of the results presented here can be downloaded. They include:
  1. Population sources
  2. ADMIXTURE proportions for populations
  3. Fst divergences between components
  4. Population portraits showing individual level variation
See spreadsheet and associated bundle (or here).

At K=3, we observe the emergence of West Eurasian, Sub-Saharan, and East/South Asian components.

The impact of the Sub-Saharan component is felt most distinctly in North Africa and the Near East, especially among Arabs; the impact of the East/South Asian one in West Asia and Northeastern Europe, especially among Finnic and Turkic speakers.

It is interesting to note that 39.8% of the Indian_D sample is assigned to the E/S Asian component. I had previously estimated in a roundabout way, and in a slightly smaller sample that the Ancestral South Indian component in Project participants was 33.3%, so ADMIXTURE has roughly managed to infer correctly that about 1/3 of this Indian sample's ancestry is more closely related to East Asians than to West Eurasians.

At K=4, the first split within the Caucasoid group appears: a component centered on Europe, and one on West/South Asia.

Many populations possess both these components in clinal proportions.

The European component shrinks to insignificance in Arabians, such as Saudis and Yemenis.

The West/South Asian component shrinks to insignificance in Northeast Europeans, such as Finns, Lithuanians, north Russians, and Chuvash.


At K=5, a new Mediterranean component emerges. This is highly represented in populations to the North, South, and East of the Mediterranean sea.

This component is noteworthy for its absence in India and Northeastern Europe.

In Northeastern Europe, the Mediterranean component is hardly represented at all, whereas the West/South Asian component, freed of its K=4 Mediterranean associations, now makes its appearance.

Conversely, in the West Mediterranean, among Basques, Sardinians, Moroccans, and Mozabites the West/South Asian component vanishes to non-existence.


At K=6, a North African component emerges.

Notice its presence in the Near East and parts of Southern Europe.

The two regions can be contrasted in terms of their African components, with very high North/Sub-Saharan African ratio in Europe vs. much lower in the Near East.

The explanation for this seems straightforward, as Europe was affected by North Africa in prehistoric and historic times, whereas the Near East also shares a border with more southern parts of the African continent, as well as the potential influence of the medieval slave trade that seems to have affected Muslim Near Eastern populations disproportionately.


At K=7, a Southwest Asian component emerges which is highest in Arabia and East Africa. I could've called this Red Sea, but I've reserved this name for a similar component that emerges at higher K.

It is clear that this is the main Caucasoid component present in East Africa.

It vanishes to non-existence in the Northern fringe of Europe, in the British Isles, Scandinavia, and among the Finns and Lithuanians.

Another interesting aspect of its distribution is its presence in Pakistan but not India. Perhaps, in this case, it reflects historical contacts between the Islamic Near East and parts of South Asia.


At K=8, we observe most of the familiar components from the K=10 analysis of the Dodecad Project. However, the use of the framing populations has meant that these components emerge before either Africans or East Eurasians split.

Now, the South Asian component appears, which swallows up most of the E/S Asian component that previously linked South with East Asians. This component extends a great way to the Near East and eastern parts of the Caucasus.

Quite interestingly, the remainder of the Caucasoid component in South Asia that is not absorbed by the new South Asian component seems to be split between the West Asian and North/Central European components, with an absence of the South European component.

It is among the Lezgins of the Caucasus that such a combination occurs, on the western shore of the Caspian Sea. The same combination of Caucasoid components also occurs in Uzbeks and Chuvash.

I conclude from this that the Caucasoids who entered South and Central Asia were probably derived from the eastern fringes of the Caucasoid world, where only the West Asian (in the south) and North/Central European (in the north) components are present. The area around the Caspian Sea seems like an excellent candidate for their origin, as I have speculated before, as that region has two important properties:
  1. It is transitional between predominantly N/C European populations to the north and predominantly W Asian populations to the south
  2. It is the border of the influence of the S European element, with Georgians possessing some of it, while Lezgins do not.

At K=9, we see the emergence of specific Sardinian and Basque components. Normally this is undesirable, but, I believe this breakup serves to divide the previously inferred South European component meaningfully.

What was South European in lower K seems to have an Atlantic vs. Mediterranean dimension, with the Basque/Sardinian ratio being particularly high in the Atlantic facade of Europe. Conversely, this ratio is low in the Mediterranean as we move eastwards: it is already low in Italy and the Balkans and becomes virtually zero in Cypriots, Armenians, and Levantine Arabs.

North Africa is also particularly interesting in having a low Basque/Sardinian ratio, even in Morocco. It appears that Sardinians are a much better proxy of European influences in the region than Basques are.

K=10 is particularly exciting because, for the first time, there is clear evidence of structure in the North/Central European component, which can now be split into Northwestern and Northeastern ones.

The NW European component is maximized in Orcadians, and people from the British Isles in general, as well as in Scandinavia. These populations have a low NE/NW ratio, as do the French, Iberians, and Italians.

Conversely, Balto-Slavs have a high NE/NW ratio.

Interestingly, Greeks have a balanced NE/NW ratio (1.2), intermediate between Italians and Balto-Slavs. Similar balanced ratios are also found among Lezgins (1.08), Turks, and Iranians. I conclude that Slavic or other Eastern European admixture cannot account for the totality of presence of this component in Greeks.

Indians have a 1.8 NE/NW ratio. In Pakistan this is 6.5, in Uzbeks it is 2.9, and in the North Eurasian_Ra it is 14.2. My conclusion is that a single migration of steppe people from eastern Europe cannot account for the presence of North European-like genes in Asia.
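For readers who want to compute such ratios from the spreadsheet themselves, here is a minimal sketch. The component names and the proportions below are hypothetical placeholders, not values from the download bundle; the function simply divides one population-average component proportion by another.

```python
import math


def component_ratio(proportions, numerator, denominator):
    """Ratio of two ADMIXTURE component proportions for one population."""
    num = proportions.get(numerator, 0.0)
    den = proportions.get(denominator, 0.0)
    if den == 0:
        # Denominator component absent: the ratio is unbounded, or
        # undefined if the numerator component is absent as well.
        return math.inf if num > 0 else math.nan
    return num / den


# Hypothetical proportions, NOT taken from the spreadsheet.
populations = {
    "Population_A": {"NE_European": 0.12, "NW_European": 0.10},
    "Population_B": {"NE_European": 0.45, "NW_European": 0.05},
}
for name, props in populations.items():
    ratio = component_ratio(props, "NE_European", "NW_European")
    print(f"{name}: NE/NW = {ratio:.2f}")
```

Note that a ratio becomes unstable when the denominator component is very small, so it is worth reading such ratios alongside the absolute proportions.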

I propose that a palimpsest of population movements has brought such elements into the interior of Asia: the migration of the early Indo-Iranians from West Asia or the Balkans, with a balanced NE/NW ratio, and the migration of steppe people from Eastern Europe, with a high NE/NW ratio. The latter did affect much of Asia, but it is in India, where Iranian groups did not penetrate in great numbers, that the lower ratio of the Indo-Aryans has been best preserved.

The case of the Finns is also interesting, as there is a surplus of NE over NW European elements. Their position is intermediate between Scandinavians and Lithuanians/Russians but toward the latter. So, Finns appear to (i) have a substratum similar to Balto-Slavs, (ii) be influenced by Scandinavians, and (iii) retain a balance of East Eurasian elements (5.8% in this analysis), preserving the legacy of their linguistic ancestors from the east. At present it is difficult to determine how much of the NE European component in Finns is due to their eastern ancestors, who were presumably mixed Caucasoid/Mongoloid long before they arrived in the Baltic, and how much was absorbed in situ.


At K=11 the Ethiopian/East African component emerges, absorbing some of the Red Sea and Sub-Saharan components from the previous K=10 run.

In comparison to the East African component of the Dodecad Project analysis, this component is closer to West Eurasians than to Sub-Saharan Africans, and a residual Sub-Saharan element remains in the two East African (Ethiopian and East_African_D) population samples. Presumably this is due to the more complete sampling of Sub-Saharan genetic diversity using the Sub_Saharan_H "framing" population.

Outside Africa, both E African/Sub-Saharan components are present in the Near East and North Africa with higher E African/Sub-Saharan ratios in the Near East and lower ones in North Africa.

In Europe, such ratios are low in the few populations where African admixture is present, together with some N African. We can probably conclude that African admixture there is mostly due to North Africans and African-influenced Near Eastern populations, rather than coming directly from Sub-Saharan Africa.

At K=12 the first uninformative cluster emerged, centered on Iraqi Jews, hence I decided to stop the analysis at this point.

Population Portraits

There is a plethora of population portraits in the download bundle, showing how admixture proportions vary in individuals within populations, and how they vary between successive K.

Here is, for example, the K=11 portrait of Cypriots. A picture of overall homogeneity of this sample emerges, but notice how the NW European and NE European have disjoint presence in the Cypriot individuals, with 5 having some of the former, 6 having some of the latter, and only 1 of these having both.

Compare with Lezgins (right) where these two components occur in all individuals. Whatever this admixture represents, it must be old enough if it is so uniformly distributed in the population.



Here are the Georgians at K=10. Notice that their NE European component is unevenly distributed, and in every case where it occurs it is accompanied by a thin slice of East Asian. This may well indicate partial Russian or other Eastern European ancestry in these individuals.



Side-by-side comparisons are also quite useful. Consider Armenians vs. Lezgins vs. Iranians at K=7.

Notice how Lezgins, who live north of the Caucasus mountains, possess some of the N/C European component, which the Armenians, who live to the south of them, lack. This should come as no surprise, as the Lezgins inhabit parts of the ancient Sarmatia Asiatica. Compare with Iranians, who are differentiated from their Indo-European Armenian neighbors by the presence of a "S Asian" component, which, in turn, ties them to their Indo-Aryan linguistic relatives.

Much more can be said, but I'll let readers explore the data on their own, and draw their own conclusions from them.

December 08, 2010

Genome-wide analysis of population structure in the Finnish Saami

The K=6 ADMIXTURE results from the supplementary material can be seen below:

This is based on ~38k SNPs.

It is unfortunate that they included Native American HGDP populations, but did not include the most relevant published data on Siberians that I first used to study population structure across north Eurasia here and here and here.

Hence, they discover a "Native American"-like component in Saami, which in all likelihood can be further resolved into Siberian-specific components utilizing the Rasmussen et al. dataset.

The "closest approximation" to the East Eurasian component in Saami in the HGDP panel are the Yakuts, but finer-scale analysis (see my previous posts) reveals that the Yakuts are made up almost entirely of an Altaic-specific component tying them to Turkic, Mongol, and Tungusic populations, while the eastern component in European Finns, Vologda Russians and Chuvash has relationships with Central Siberians such as Kets, Selkups, and Nganasans, all of which are missing in this paper.

Hopefully this data will become publicly available online for re-analysis with the relevant populations included.

European Journal of Human Genetics advance online publication 8 December 2010; doi: 10.1038/ejhg.2010.179

A genome-wide analysis of population structure in the Finnish Saami with implications for genetic association studies

Jeroen R Huyghe et al.

The understanding of patterns of genetic variation within and among human populations is a prerequisite for successful genetic association mapping studies of complex diseases and traits. Some populations are more favorable for association mapping studies than others. The Saami from northern Scandinavia and the Kola Peninsula represent a population isolate that, among European populations, has been less extensively sampled, despite some early interest for association mapping studies. In this paper, we report the results of a first genome-wide SNP-based study of genetic population structure in the Finnish Saami. Using data from the HapMap and the human genome diversity project (HGDP-CEPH) and recently developed statistical methods, we studied individual genetic ancestry. We quantified genetic differentiation between the Saami population and the HGDP-CEPH populations by calculating pair-wise FST statistics and by characterizing identity-by-state sharing for pair-wise population comparisons. This study affirms an east Asian contribution to the predominantly European-derived Saami gene pool. Using model-based individual ancestry analysis, the median estimated percentage of the genome with east Asian ancestry was 6% (first and third quartiles: 5 and 8%, respectively). We found that genetic similarity between population pairs roughly correlated with geographic distance. Among the European HGDP-CEPH populations, FST was smallest for the comparison with the Russians (FST=0.0098), and estimates for the other population comparisons ranged from 0.0129 to 0.0263. Our analysis also revealed fine-scale substructure within the Finnish Saami and warns against the confounding effects of both hidden population structure and undocumented relatedness in genetic association studies of isolated populations.

Link

July 29, 2010

NordicDB paper on Finns, Danes, and Swedes

On the left is an MDS plot using ~45k SNPs. Some explanation on the datasets: CAPS are Swedish; SGENE and MS are Finnish (Helsinki region); Aneurysm is Finnish (Helsinki and Kuopio).

A striking feature of the plot is the distinctiveness of the different Finnish samples (light vs. dark brown points). This is not so difficult to explain if one considers that the light brown squares (DGI-FIN) are from Botnia. This parallels the results of Salmela et al. (2008) and Jakkula et al. (2008) in underscoring the internal structure of the population of Finland.

The familiar V shape was also observed in the PCA produced by McEvoy et al. (2009) or Nelis et al. (2009). In my opinion, it is produced by the differential representation of the two main population elements of the Nordic countries, namely the Germanic and Finnic elements.

Here is the website of NordicDB.

European Journal of Human Genetics doi: 10.1038/ejhg.2010.112

NordicDB: a Nordic pool and portal for genome-wide control data

Monica Leu et al.

A cost-efficient way to increase power in a genetic association study is to pool controls from different sources. The genotyping effort can then be directed to large case series. The Nordic Control database, NordicDB, has been set up as a unique resource in the Nordic area and the data are available for authorized users through the web portal (http://www.nordicdb.org). The current version of NordicDB pools together high-density genome-wide SNP information from ~5000 controls originating from Finnish, Swedish and Danish studies and shows country-specific allele frequencies for SNP markers. The genetic homogeneity of the samples was investigated using multidimensional scaling (MDS) analysis and pairwise allele frequency differences between the studies. The plot of the first two MDS components showed excellent resemblance to the geographical placement of the samples, with a clear NW–SE gradient. We advise researchers to assess the impact of population structure when incorporating NordicDB controls in association studies. This harmonized Nordic database presents a unique genome-wide resource for future genetic association studies in the Nordic countries.

Link

June 28, 2010

Half of hidden heritability found (for height, at least)

This is quite an interesting paper, as it shows, by sampling a large number of individuals, that the heritability of height is not missing after all. The large sample allowed the authors to discover statistically significant associations between height and more SNPs than before.

This holds great promise, as it may hint that genome-wide association studies, which have come under substantial criticism lately, may be failing not because of an inherent flaw, but rather because they are not sampling enough individuals.

Considered all together, the genotyped SNPs account for 45% of the variance in height. Where is the rest of the heritability? The authors argue for two additional sources:

First, SNPs in current microarray chips sample the genome incompletely. Locations between the genotyped SNPs are in incomplete linkage disequilibrium with them. So, there is undetected polymorphism, in the gaps between the hundreds of thousands of SNPs in current chips, that may explain a portion of the missing heritability.

Second, SNPs have different minor allele frequencies. For example, for one SNP the minor allele may occur in 10% of individuals, while for others it occurs in 30%. This is important, because it is more difficult to arrive at a statistically significant result in the former case.

Consider a SNP whose minor allele is carried by 2% of individuals. If you sample 1,000 individuals, only about 20 of them are expected to carry it. You cannot estimate the average height of carriers from a sample of 20 people as reliably as you can from a sample of 500. Thus, if the SNP influences height only slightly, you will not be able to detect it.
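To put rough numbers on the 20-versus-500 comparison: the uncertainty of an estimated average height shrinks with the square root of the group size. The sketch below is only an illustration; the 7 cm standard deviation of height and the helper name `standard_error` are my own assumptions, not figures or methods from the paper.

```python
import math

# Assumed, illustrative value: spread of adult height within a group, in cm.
HEIGHT_SD_CM = 7.0


def standard_error(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


# Group sizes taken from the example in the text.
se_20 = standard_error(HEIGHT_SD_CM, 20)
se_500 = standard_error(HEIGHT_SD_CM, 500)

print(f"standard error with  20 people: {se_20:.2f} cm")
print(f"standard error with 500 people: {se_500:.2f} cm")
print(f"ratio: {se_20 / se_500:.1f}x")
```

Under these assumptions the average height of the 20-person group is pinned down five times less precisely than that of the 500-person group, so an effect of a few millimetres is lost in the noise.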

A further complication, which I've written about before, is that some variation in the human genome is family-related, or at least occurs in fewer individuals than the allele frequency cutoff. If 99.9% of people have C at a given location and 0.1% of people have T, this variant is unlikely to be included in a microarray chip, because it is too rare to matter economically: you would only get a handful of individuals (if you're lucky) in a sample of 1,000 for such a variant. However, rarity does not mean that the variant is functionally unimportant, and the rare allele may play a substantial role in the height of the people who possess it.
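The arithmetic behind the rare-variant example can be sketched as follows. This assumes individuals are sampled independently and that 0.1% of people carry T, as in the example above; the function names are hypothetical.

```python
def expected_carriers(carrier_fraction, sample_size):
    """Expected number of carriers in a random sample."""
    return carrier_fraction * sample_size


def prob_no_carriers(carrier_fraction, sample_size):
    """Chance that a random sample contains no carrier at all,
    assuming individuals are sampled independently."""
    return (1.0 - carrier_fraction) ** sample_size


# Figures from the example in the text: 0.1% carriers, 1,000 people sampled.
CARRIER_FRACTION = 0.001
SAMPLE_SIZE = 1000

mean_carriers = expected_carriers(CARRIER_FRACTION, SAMPLE_SIZE)
p_none = prob_no_carriers(CARRIER_FRACTION, SAMPLE_SIZE)

print(f"expected carriers in the sample: {mean_carriers:.1f}")
print(f"probability of sampling no carrier: {p_none:.2f}")
```

So on average a sample of 1,000 contains a single carrier, and somewhat more than a third of such samples would contain none.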

The publication of this paper is a cause for optimism, as it shows that progress can be made by brute force: fuller genome coverage and more individuals. We'll have to wait and see whether or not the same approach will work for other complex traits, such as IQ or schizophrenia, which have hitherto been difficult to crack.

Obviously, the cost of sampling more individuals will become an issue in future studies, but the cost-per-individual is expected to drop. So, I'm guessing that more discoveries are in store for us in the next few years.

UPDATE (Jun 28):

Not the main point of the paper, but also included in the supplementary material (pdf) are some nice PCA results.

In the European-only PCA we see the familiar north-south gradient (anchored by Tuscans TSI and Netherlands NET on either side), and the orthogonal deviation of the Finns. Swedes (SWE) occupy a northern European end of the spectrum like the Dutch, but are spread towards Finns, reflecting low-level Finnish admixture in that population. Conversely, Finns are variable along the same axis, reflecting variable levels of admixture. Australians (AUS) and UK, on the other hand, are on the northern European edge of the main European gradient, with a number of individuals spread toward the Tuscan side.


The PCA with all populations is also quite interesting. East Eurasians (Chinese and Japanese) form a tight pole at the bottom right. Gujarati Indians (GIH) form a different pole, spread towards Europeans, reflecting variable levels of West Eurasian admixture in that population, probably corresponding to the ANI element recently discovered in Indian populations. Mexicans (MEX) are spread towards East Asians, reflecting their Amerindian admixture, but notice how they are not positioned exactly on the European-East Asian axis, probably reflecting the third, minority, Sub-Saharan element in their ancestry, as well as the fact that Amerindians are not perfectly represented by East Asians. Finns are tilted towards East Asians, as expected, reflecting the fact that their genetic specificity vis a vis Northern Europeans is due to low-level East Eurasian ancestry.

An interesting aspect of the first two PCs is the fact that the Maasai (MKK) and Luhya (LWK) from Kenya are not separated from Caucasoids, and neither are Yoruba from Nigeria (YRI). This is a good reminder of the fact that identity in the first two principal components may mask differences revealed in higher-order components. This difference (at least for Maasai) is seen in the next two PCs.


Nature Genetics doi:10.1038/ng.608

Common SNPs explain a large proportion of the heritability for human height

Jian Yang et al.

Abstract

SNPs discovered by genome-wide association studies (GWASs) account for only a small fraction of the genetic variation of complex traits in human populations. Where is the remaining heritability? We estimated the proportion of variance for human height explained by 294,831 SNPs genotyped on 3,925 unrelated individuals using a linear model analysis, and validated the estimation method with simulations based on the observed genotype data. We show that 45% of variance can be explained by considering all SNPs simultaneously. Thus, most of the heritability is not missing but has not previously been detected because the individual effects are too small to pass stringent significance tests. We provide evidence that the remaining heritability is due to incomplete linkage disequilibrium between causal variants and genotyped SNPs, exacerbated by causal variants having lower minor allele frequency than the SNPs explored to date.

Link