Willy's PhD Defense

Date: Monday, June 20, 2016.
Place: INSA-Toulouse.

Programme
08:30 - 09:00	WELCOME COFFEE
09:00 - 09:45	TALK 1 - Rasmus Heller. Should we do genotyping by sequencing? Genotyping by sequencing is becoming increasingly popular as a means of generating population- and genome-scale data for non-model organisms. We used a small RADseq data set to examine potential problems or biases in common bioinformatics pipelines, using the site frequency spectrum (SFS) as an informative summary of the data. We found that a standard RADseq genotyping tool severely overcalls singletons. This can be improved by eschewing genotype calls and using the genotype likelihoods directly to infer the SFS. However, even after this correction, the SFS from RADseq data is not identical to the one obtained using whole-genome shotgun sequencing on the same individuals. Part of the explanation appears to be that the RADseq protocol produces a non-random subset of the genome, which may have a different SFS from the genome-wide one. I discuss some implications of this and ongoing work on the topic.
09:50 - 10:35	TALK 2 - Olivier François. Fast inference of individual ancestry coefficients using geographic data. Geography and landscape are important determinants of genetic variation in natural populations, and several ancestry estimation methods have been proposed to investigate population structure using genetic and geographic data simultaneously. Those approaches are often based on computer-intensive stochastic simulations and do not scale with the dimensions of the data sets generated by high-throughput sequencing technologies. There is a growing demand for faster algorithms able to analyse genome-wide patterns of population genetic variation in their geographic context. Here, we present TESS3, a major update of the spatial ancestry estimation program TESS. By combining matrix factorization and spatial statistical methods, TESS3 provides estimates of ancestry coefficients with accuracy comparable to TESS and with run-times much faster than the Bayesian version. In addition, the TESS3 program can be used to perform genome scans for selection, and separate adaptive from nonadaptive genetic variation using ancestral allele frequency differentiation tests. The main features of TESS3 are illustrated using simulated data and analysing genomic data from European lines of the plant species Arabidopsis thaliana.
10:40 - 11:00	COFFEE BREAK
11:00 - 11:45	TALK 3 - Mark Beaumont. Expectation Propagation for demographic inference using genome data.
11:50 - 13:30	LUNCH TIME
14:00	PhD DEFENSE - Willy Rodriguez. Reconstructing the demographic history of populations from genomic data. The rapid development of DNA sequencing technologies is expanding the horizons of population genetic studies. It is expected that genomic data will increase our ability to reconstruct the history of populations. While this increase in genetic information will likely help biologists and anthropologists to reconstruct the demographic history of populations, it also poses big challenges. In some cases, an overly simple model may lead to erroneous conclusions about the population under study. Recent works have shown that DNA patterns expected in individuals coming from structured populations correspond to those of unstructured populations with changes in size through time. As a consequence, it is often difficult to determine whether demographic events such as expansions or contractions (bottlenecks) inferred from genetic data are real or due to the fact that populations are structured in nature. Moreover, almost no inference method for reconstructing past changes in population size takes the effects of structure into account. In this thesis, some recent results in population genetics are presented: (i) a model choice procedure is proposed to distinguish a simple scenario of population size change from one of a structured population, based on the coalescence times of two genes, showing that for these simple cases it is possible to distinguish the two models using genetic information from a single individual; (ii) by using the notion of instantaneous coalescence rate, it is demonstrated that for any scenario of structured population, or any other scenario, regardless of how complex it is, there always exists a panmictic scenario with a precise function of population size changes having exactly the same distribution of coalescence times for two genes. This not only explains why spurious signals of bottlenecks can be found in structured populations but also predicts the demographic history that current inference methods are likely to reconstruct when applied to non-panmictic populations. Finally, (iii) a method based on a Markov process is developed for inferring past demographic events while taking structure into account. This method uses the distribution of coalescence times of two genes to detect past demographic changes in structured populations from the DNA of a single individual. Some applications of the model to genomic data are discussed.
	BUFFET AND DRINKS (POT DE THESE)

Venue
To reach INSA, take the subway (line B) towards "Ramonville" and get off at the "Faculté de Pharmacie" station. The morning talks will take place in the GMM ("Génie Mathématique et Modélisation") building, room 139 (first floor). The PhD defense in the afternoon will take place in the "Amphithéâtre Joseph Fourier". See the map here.