Recombination rate variation and linked selection in C. reinhardtii v2.0 - a summary

Nov 5, 2018 in PUBLICATIONS
research writing r

In June of this year, I uploaded my first first-author (there has to be a cleaner way of saying that) manuscript to the preprint server bioRxiv. This was a huge and important milestone for me, and I was incredibly excited to have it out for the world to see virtually as soon as it was done – such is the power of preprint servers!

As with most endeavours in science, however, there’s always room for improvement, and I’ve been very fortunate to receive thorough feedback from members of my committee as well as other researchers in the community at large. Here, I’m going to briefly summarise the paper before covering what’s been changed so far in this iteration of the manuscript, and why.

Overview

In this paper, we looked at the landscape of recombination rate variation across the genome of the haploid unicellular alga Chlamydomonas reinhardtii, asking the questions: 1) What genomic features predict recombination rate variation? 2) How does recombination rate correlate with nucleotide diversity? 3) What is the rate of sex in natural populations of C. reinhardtii?

To obtain genome-wide estimates of recombination rate, we used a dataset of whole-genome sequences from 19 C. reinhardtii individuals sampled in Quebec. We then ran the recombination rate estimator LDhelmet, which infers historical recombination rate from statistical associations between alleles (i.e. linkage disequilibrium).

(Initial) Results

Recombination rate is highest immediately surrounding genes

In examining which genome annotations had the highest recombination rates, we subdivided intergenic sequence into regions 2 kb upstream or downstream of a gene, and genic sequence into UTRs, introns, and protein-coding sequence. Here, recombination was indeed highest in the 2 kb of sequence immediately flanking genes, in accordance with observations in other non-mammalian species (Fig. 2 in the paper).

Recombination rate is correlated with nucleotide diversity

In the absence of recombination, selection at a given site can also remove the neutral variation linked to it. In the initial iteration of this manuscript, we reported a positive association between recombination rate and nucleotide diversity (Fig. 3 in the paper). However, this analysis carries a risk of autocorrelation; I explain further below.

Recombination is correlated with GC content

Despite mutation being AT-biased in C. reinhardtii (Ness 2015, Genome Res.), the genome is incredibly GC-rich (~64%). One possible explanation for this high GC content is the action of GC-biased gene conversion (gBGC), wherein mismatch repair following a recombination event preferentially fixes GC alleles, effectively mimicking positive selection. We did find a weakly positive correlation between GC content and recombination in C. reinhardtii – however, since LDhelmet does not distinguish between crossover events and gene conversions, more precise inferences regarding gBGC will require further work.

The frequency of sex in C. reinhardtii

C. reinhardtii is a facultatively sexual organism, in that it primarily reproduces asexually and only switches to a sexual mode of reproduction when nutrient-stressed. Thus, effective recombination is modulated by the frequency of meiosis relative to mitosis. By combining our genome-wide estimate of recombination rate with prior estimates of neutral diversity, mutation rate, and physical recombination rate, we estimated that one meiosis occurs for every ~770 mitotic generations.

Manuscript 2.0

In this iteration of the manuscript, the following things have been changed/added:

Recombination rate and diversity - autocorrelation?

LDhelmet estimates the population recombination rate ρ, which is equivalent to 2Ner, where Ne is the effective population size (see Charlesworth 2009, Nat Rev Genet, for a useful review) and r is the physical recombination rate. Meanwhile, diversity (θ) is equivalent to 2Neμ, where μ is the mutation rate.
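Written out, using only the definitions given here (with the subscript on Ne made explicit), the two quantities and their ratio are:

```latex
\begin{align}
\rho &= 2 N_e r, &
\theta &= 2 N_e \mu, &
\frac{\rho}{\theta} &= \frac{r}{\mu}.
\end{align}
```

Only the ratio is free of the effective population size.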

What this means is that, because Ne appears in both ρ and θ, it could in fact be variance in Ne across the genome that is driving the observed correlation, particularly given a sufficiently large effective population size (indeed the case in C. reinhardtii - see Ness 2015), and not the effects of selection at linked sites. This was pointed out to me by both my committee members and Tyler Kent, and effectively meant that one of the central premises of the paper was flawed.
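To make the concern concrete, here is a toy simulation in R (not from the paper). It assumes made-up lognormal distributions for Ne, r and μ across 5,000 hypothetical genomic windows, with all three drawn independently, so any correlation between ρ and θ can only come from the shared Ne term.

```r
# Toy illustration only: all distributions and parameter values are assumptions,
# not estimates from C. reinhardtii.
set.seed(42)
n_windows <- 5000

# Per-window quantities, drawn independently of one another
Ne <- rlnorm(n_windows, meanlog = log(1e6),  sdlog = 0.5)  # local effective population size
r  <- rlnorm(n_windows, meanlog = log(1e-8), sdlog = 0.5)  # physical recombination rate
mu <- rlnorm(n_windows, meanlog = log(1e-9), sdlog = 0.2)  # mutation rate

rho   <- 2 * Ne * r   # population recombination rate
theta <- 2 * Ne * mu  # diversity

# rho and theta are correlated even though r and mu are independent ...
cor_rho_theta <- cor(rho, theta, method = "spearman")

# ... whereas the physical rate r has essentially no relationship with theta
cor_r_theta <- cor(r, theta, method = "spearman")

cat(sprintf("rho vs theta: %.2f\nr vs theta:   %.2f\n", cor_rho_theta, cor_r_theta))
```

In this toy setup a positive correlation between ρ and θ appears without any linked selection at all, which is why a correlation with physical recombination rate is the more convincing test.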

To circumvent this, I initially attempted to correlate diversity with estimates of physical recombination rate from the linkage map of C. reinhardtii, and found no significant correlation. This was a bit of a blow; the original correlation was arguably the main finding of the paper in my eyes, and the basis for claiming that linked selection was affecting levels of diversity in the genome – an effect pronounced further still by relatively infrequent sex.

My first instinct was to look to variance in functional density, which has previously explained the lack of a correlation between physical recombination rate and diversity in a number of plant species (Wright et al 2006, Genetics; Flowers et al 2012, Mol Biol Evol). An inverse correlation between diversity and functional density indicates that regions with more ‘targets for selection’ display reduced variation, thus still suggesting the action of linked selection. Instead, however, I found a positive correlation between the two. This has previously been observed in chicken (Mugal 2015), but explained as a statistical artifact. Ultimately, the issue remained unresolved. The only other possibility was that the linkage map data were too sparse to support inferences about linked selection – but that remained an unsatisfying answer.

(Bonus: This saga has since been immortalized in my Evolution 2018 poster, presented in Montpellier, France – have a gander!)

While I was busy struggling to rewrite the manuscript to reflect these new developments, my PI, Rob, obtained crossover data from a recent paper that had directly identified recombination events through whole-genome sequencing of recombinants (Liu et al 2017, Nature Ecol Evol). Using crossover density as a proxy for physical recombination rate, he found that it correlated with both the LDhelmet estimates of recombination rate and nucleotide diversity. This meant that the LDhelmet estimates did reflect variation in recombination rate and not just Ne, and that the effect of linked selection on diversity was real.

The updated version of the manuscript now contains both an explanation of the above and these newer results. I found this to be a very important lesson in the value of getting good feedback, even if it isn’t necessarily what one might want to hear in the moment.

A technical note - some cosmetic changes to figures

Anyone who knows me knows that I’m a bit of a data visualization buff, and spend far too much time finding ways to make plots look both prettier and more intuitive in their presentation. With this in mind, I was not particularly satisfied with the initial iteration of Figure 1, which displayed a) the distribution of recombination rates across the 17 chromosomes of C. reinhardtii and b) the decay of linkage disequilibrium across the genome, once again split by chromosome. Both plots were shaded by chromosome length, which was on a purple to blue scale – arguably not the most intuitive! Furthermore, the font sizes of the axis titles and labels were inconsistent, with some of the labels being harder to read.

The new version of the plot features more intuitive colours (yellow to orange/red, via the wesanderson package in R) and larger axis tick labels, which I think make for a more easily understandable plot. However, the main reason I’m excited about this is that this was my first use of the patchwork package (obtainable here) for stitching together plots in R, which works beautifully with ggplot2 3.0’s new labs(tag) feature – both of which I haven’t seen nearly enough fanfare about for the utility they provide.

Namely, where in the past stitching together plots has required either some gridExtra wizardry or simply exporting to Photoshop/Illustrator/what have you, these packages allow for the entire process to be done completely in R. To create a new plot featuring two plot objects p1 and p2 side by side, patchwork allows us to simply write

p1 + p2

and we’re good to go. It seems so intuitive and in line with the ggplot2 philosophy that I’m surprised it hasn’t received even more attention.

The second component of this is that the newest version of ggplot2 features a new tag argument in the labs function (i.e. labs(tag = 'A')) which allows for the easy addition of tags to the top left corner of plots. Together, these constitute a very powerful toolkit for making plots of this sort all within R itself.
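As a minimal sketch of the two features together (not the code used for the paper's figures), the example below builds two placeholder plots from R's built-in mtcars dataset, tags them with labs(tag = ...), and joins them with patchwork. It assumes ggplot2 (3.0 or later) and patchwork are installed; the plot contents, object names and output file name are purely illustrative.

```r
library(ggplot2)
library(patchwork)

# Two placeholder panels built from a dataset that ships with R
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight", y = "Miles per gallon", tag = "A")

p2 <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(x = "Horsepower", y = "Miles per gallon", tag = "B")

# patchwork overloads `+` so that two ggplot objects are laid out together
fig <- p1 + p2

# Write the combined figure to disk; the file name is arbitrary
ggsave("figure_sketch.pdf", fig, width = 8, height = 4)
```

patchwork also provides `|` and `/` for explicitly placing plots side by side or stacking them, which helps once a figure has more than two panels.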

Finally, Figure 3 has also been updated to reflect the analyses described above regarding crossover density from the Liu dataset. Code for all figures in the paper can be found here.

Estimating the frequency of sex with population genetic methods - a better explanation

Previously, our calculation of the rate of sex was explained in terms of ‘effective mitoses’ and ‘effective meioses’, with the effects of mutation and recombination manifesting as the former and the latter, respectively. This explanation was somewhat confusing, and so it has been rephrased entirely in terms of the formulae underlying the calculation, where the effective recombination rate is taken to be rf, with r the physical recombination rate and f the frequency of sex in this system. While the result remains unchanged, we hope this makes the calculation clearer for readers.
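A sketch of that reformulation, using only the definitions of ρ and θ given earlier in this post (the exact presentation in the manuscript may differ): if sex occurs with frequency f, the recombination rate reflected in the population estimate is rf rather than r, so

```latex
\begin{align}
\rho &= 2 N_e r f, \qquad \theta = 2 N_e \mu \\
\Rightarrow\quad f &= \frac{\rho}{\theta} \cdot \frac{\mu}{r}, \qquad \frac{1}{f} \approx 770 .
\end{align}
```

Here ρ is the genome-wide LDhelmet estimate, θ, μ and r are the prior estimates of neutral diversity, mutation rate and physical recombination rate, and the value of roughly 770 is the result reported above (one meiosis per ~770 mitotic generations).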

To sum up

This manuscript has been in prep since about three months into my graduate school career, and has undergone various changes and makeovers as further analyses were added. Beyond that, there is still room for improvement, and I’ll be looking for ways to update the manuscript further. As someone with little to no population genetics background upon starting graduate school, it’s certainly been challenging putting this work together thus far, but equally rewarding.

Of course, I’d love to hear any further thoughts and comments on the manuscript if you have any – please feel more than welcome to reach out via email or on Twitter!