The race for cheap epigenomics: epiGBS versus bsRADseq

The world of science is rife with tiny dramas that can completely envelop the worlds of a few people and have huge effects on science in years to come. Follow the publication trail of any good (or even bad) question in science and you will discover conflict over the best answers to it. Usually the conflict is constructive, but it can sometimes turn nasty. The main focus of this post is, as far as I know, a dramatic but amiable conflict over the best way to do a particular sequencing technique. I’ll give some of the background for laypeople and dig into the science for anyone who is interested in the difference between epiGBS and bsRADseq.

Background

One of the most rapidly advancing fields in biology is DNA sequencing technology. It is becoming so fast and cheap that it seems we will be more limited by the computing power needed to analyze the sequences than by the capacity to get the sequences themselves. In an earlier post, I discussed the development of Genotyping By Sequencing, or GBS. In short, it is a method that randomly samples DNA to get a vast amount of information about an organism without the cost and data problems of sequencing the whole genome. In 2016, this method was adapted to include information about epigenetics, the information outside the base pair sequence that can be passed between generations. DNA can have methyl groups attached to it, which change the way organisms produce proteins. The summary of these methylation patterns is the methylome. The methylome is more easily changed than the genome, and can therefore provide additional information that might help us understand natural populations. Scientists know little about how large-scale methylation patterns impact natural populations, so this is an exciting advance. The conflict comes from the fact that two teams, one in Austria and one in the Netherlands, came up with different solutions within months of each other. Both were published in prestigious journals, but there are important differences between the techniques.

My Research

Being a grad student waiting for a new method to be published is an uncomfortable position to be in. I spoke to Dr. Christina Richards of the University of South Florida about epigenetic GBS in 2014, and was hooked. For a year and a half, I collected my samples and waited for the methods to be published. Finally, last December, I couldn’t wait any longer. I had to defend my dissertation proposal without knowing the method. My committee was not so keen on committing to an unpublished method, so I had to scrap the epigenetics portion of my proposal. Of course, the papers were published less than a month after my defense. As Kurt Vonnegut would say, so it goes.

The Conflict

The paper I had been waiting for, from Dr. Thomas van Gurp of the Netherlands team, was not actually published first. They were beaten by the Austrians at the last moment. Dr. Emiliano Trucchi and the Austrians published their treatise in a special epigenetics issue of Molecular Ecology in late January 2016. The Ecological Epigenetics Facebook group (a most exclusive club) was abuzz with the news, and Dr. van Gurp even commented with his congratulations. But just days later, he gleefully posted his own publication in Nature Methods, a journal with a decidedly higher impact factor. The Austrians did not return the congratulations.

Trucchi and his team named their method bsRADseq, for bisulfite Restriction-site-Associated DNA sequencing. Van Gurp’s team dubbed theirs epiGBS, for epigenetic Genotyping-By-Sequencing. Starkly different titles, but it turns out the methods are pretty similar. Essentially, restriction enzymes cut up your sample’s DNA first. Next, the bisulfite treatment chemically converts nonmethylated cytosines to uracils, while methylated cytosines are left alone. It is a very effective method, but it makes the original sequence hard to recover: in the following PCR step, the uracils are copied as thymines, so a converted cytosine looks exactly like a thymine that was there all along. The DNA is then sequenced in a high-throughput sequencer and analyzed using a bioinformatics platform.
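For readers who think better in code, here is a minimal Python sketch of what bisulfite treatment followed by PCR does to a single read. It is only an illustration, not part of either published pipeline: the function name `bisulfite_convert` and the toy sequence are made up for this post, and the sketch assumes perfect conversion with no sequencing errors.

```python
def bisulfite_convert(seq, methylated_positions):
    """Return the read you would see after bisulfite treatment and PCR.

    Unmethylated C -> U (bisulfite) -> T (PCR); methylated C stays C.
    Hypothetical helper for illustration; assumes 100% conversion.
    """
    methylated = set(methylated_positions)
    out = []
    for i, base in enumerate(seq):
        if base == "C" and i not in methylated:
            out.append("T")  # unmethylated cytosine ends up read as thymine
        else:
            out.append(base)  # everything else is unchanged
    return "".join(out)


original = "ACGTCCTA"
# Only the cytosine at index 1 carries a methyl group in this toy example.
converted = bisulfite_convert(original, methylated_positions=[1])
print(original)
print(converted)
```

In the converted read, the T at index 3 (a real thymine) and the Ts at indices 4 and 5 (converted cytosines) look identical, which is exactly the problem both methods have to solve.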

Each of the two methods solves the problem of the wiped out original sequence in a different way. It took me a few careful readings of the two papers to figure out the differences, so I figured that I would summarize them so that others don’t need to go through the same slog.

                    | epiGBS                       | bsRADseq
First author        | van Gurp                     | Trucchi
Reference genome    | Not needed                   | Sequence 2x
Restriction enzyme  | PstI, but others possible    | SbfI, but any possible
Paired-end seq.     | Paired                       | Paired or single
Multiplex?          | Yes                          | Yes
Clustering software | Written for epiGBS in Python | STACKS
Validation?         | Yes (Arabidopsis)            | No
Number of loci      | 1626                         | 1710-3180

This is only a small subset of the possible differences, but I think these are some of the most important aspects to highlight. In practice, the most important difference will be that epiGBS does not require a reference genome, but does require paired-end sequencing and a specialized bioinformatics pipeline. Thanks to some tricks with the PCR primers, the authors can work out which strand each read came from. From there, it is a computing problem to derive the original sequence from the two strands. The bsRADseq approach instead sequences the same samples twice, once with bisulfite conversion and once without. This avoids the computing problem, but costs twice as much. However, bsRADseq data can be processed in the popular STACKS platform, whereas epiGBS data requires Dr. van Gurp’s scripts.
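To make the two strategies a bit more concrete, here is a rough Python sketch of the logic as I understand it. Neither function is taken from the epiGBS scripts or from STACKS; the names `reconstruct_from_strands` and `call_methylation` are my own, and the sketch assumes the reads are already aligned, the same length, written in top-strand orientation, and free of sequencing errors and SNPs. The first function mimics the two-strand idea: conversion shows up as C-to-T in reads from one strand and as G-to-A in reads from the other, so the two together reveal the original base. The second mimics the sequence-twice idea: compare the untreated read with the treated one.

```python
def reconstruct_from_strands(watson, crick):
    """Recover the original top-strand sequence from both converted strands.

    Both reads are given in top-strand orientation. Conversion appears as
    C->T in the Watson read and as G->A in the Crick read.
    """
    original = []
    for w, c in zip(watson, crick):
        if w == c:
            original.append(w)    # strands agree: base is unambiguous
        elif (w, c) == ("T", "C"):
            original.append("C")  # unmethylated C on the top strand
        elif (w, c) == ("G", "A"):
            original.append("G")  # unmethylated C on the bottom strand
        else:
            original.append("N")  # unexpected mismatch, left unresolved
    return "".join(original)


def call_methylation(unconverted, converted):
    """Compare an untreated read with its bisulfite-treated counterpart.

    Returns {position: bool} for every C in the untreated read, where True
    means the C survived conversion and was therefore methylated.
    """
    calls = {}
    for i, (u, b) in enumerate(zip(unconverted, converted)):
        if u == "C" and b in ("C", "T"):
            calls[i] = (b == "C")
    return calls


# Toy example: true sequence ACGTCG, methylated C at index 1 only.
watson_read = "ACGTTG"  # top strand after conversion (C at index 4 -> T)
crick_read = "ACGTCA"   # bottom strand after conversion (G at index 5 -> A)

recovered = reconstruct_from_strands(watson_read, crick_read)
print(recovered)
print(call_methylation(recovered, watson_read))
```

The real pipelines have to deal with everything this sketch ignores, such as sequencing errors, incomplete conversion, and genuine C/T polymorphisms, which is where most of the bioinformatics effort goes.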

I am not sure which method biologists will embrace. The simplicity of not needing to sequence twice seems to favor epiGBS, but the method is unlikely to become popular until the bioinformatics pipeline has been in use for a while and has had the kinks worked out.