The race for cheap epigenomics: epiGBS versus bsRADseq

The world of science is rife with tiny dramas that can completely envelop the worlds of a few people and have huge effects on science in years to come. Follow the publication trail of any good (or even bad) question in science and you will discover conflict over the best answer. Usually the conflict is constructive, but it can sometimes turn nasty. The focus of this post is, as far as I know, a dramatic but amiable conflict over the best way to do a particular sequencing technique. I’ll give some background for laypeople, then dig into the science for anyone interested in the difference between epiGBS and bsRADseq.


One of the most rapidly advancing fields in biology is DNA sequencing technology. Sequencing is becoming so fast and cheap that we will likely be limited more by the computing power needed to analyze the sequences than by the capacity to get the sequences themselves. In an earlier post, I discussed the development of Genotyping By Sequencing, or GBS. In short, it is a method that randomly samples DNA to get a vast amount of information about an organism without the cost and data problems of sequencing the whole genome. In 2016, this method was adapted to include information about epigenetics, the information outside the base pair sequence that can be passed between generations. DNA can have methyl groups attached to it, which can change how an organism’s genes are expressed and therefore which proteins it produces. The summary of these methylation patterns is the methylome. The methylome is more easily changed than the genome, and can therefore provide additional information that might help us understand natural populations. Scientists know little about how large-scale methylation patterns impact natural populations, so this is an exciting advance. The conflict comes from the fact that two teams, one in Austria and one in the Netherlands, came up with different solutions within months of each other. Both were published in prestigious journals, but there are important differences between the techniques.

My Research

Being a grad student waiting for a new method to be published is an uncomfortable position. I spoke to Dr. Christina Richards of the University of South Florida about epigenetic GBS in 2014, and was hooked. For a year and a half, I collected my samples and waited for the methods to be published. Finally, last December, I couldn’t wait any longer. I had to defend my dissertation proposal without knowing the method. My committee was not keen on committing to an unpublished method, so I had to scrap the epigenetics portion of my proposal. Of course, the papers were published less than a month after my defense. As Kurt Vonnegut would say, so it goes.

The Conflict

The paper I had been waiting for, from Dr. Thomas van Gurp of the Netherlands team, was not actually published first. They were beaten by the Austrians at the last moment. Dr. Emiliano Trucchi and the Austrians published their treatise in a special epigenetics issue of Molecular Ecology in late January 2016. The Ecological Epigenetics Facebook group (a most exclusive club) was abuzz with the news, and Dr. van Gurp even commented with his congratulations. But just days later, he gleefully posted his own publication in Nature Methods, a journal with a decidedly higher impact factor. The Austrians did not return the congratulations.

Trucchi and his team named their method bsRADseq, for bisulfite Restriction site Associated DNA sequencing. Van Gurp’s team dubbed theirs epiGBS, for epigenetic Genotyping-By-Sequencing. Starkly different titles, but it turns out the methods are pretty similar. Essentially, restriction enzymes cut up your sample’s DNA first. Next, the bisulfite treatment chemically converts nonmethylated cytosines to uracils, while methylated cytosines are left alone. It is a very effective method, but it makes the original sequence hard to derive. In the next PCR step, the uracils are copied as thymines, so the original sequence is effectively wiped out: a converted cytosine can no longer be told apart from a genuine thymine. The DNA is sequenced in a high-throughput sequencer, then analyzed using a bioinformatics platform.
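To make the bisulfite problem concrete, here is a minimal Python sketch of what conversion plus PCR does to a single strand. The function name `bisulfite_convert` and the toy sequences are my own illustration, not code from either paper, and the sketch assumes perfect conversion with no sequencing errors.

```python
def bisulfite_convert(seq, methylated=()):
    """Simulate bisulfite treatment followed by PCR on one strand.

    Unmethylated C becomes U (bisulfite) and is then copied as T (PCR).
    Methylated C is protected and stays C.
    `methylated` holds the 0-based positions of methylated cytosines.
    This is a toy model: conversion is assumed to be 100% efficient.
    """
    methylated = set(methylated)
    read = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            read.append("T")  # unmethylated cytosine reads as thymine
        else:
            read.append(base)  # everything else is unchanged
    return "".join(read)


# Two different original sequences, both methylated at position 1 only.
read_a = bisulfite_convert("ACGTCCGA", methylated={1})
read_b = bisulfite_convert("ACGTTTGA", methylated={1})

# Both give the same read, so the read alone cannot tell us
# whether the Ts at positions 4 and 5 were originally Cs.
print(read_a, read_b, read_a == read_b)
```

The two toy sequences differ at positions 4 and 5, yet they produce identical reads. That ambiguity is the problem both methods have to solve.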

Each of the two methods solves the problem of the wiped out original sequence in a different way. It took me a few careful readings of the two papers to figure out the differences, so I figured that I would summarize them so that others don’t need to go through the same slog.

                    | epiGBS                       | bsRADseq
--------------------+------------------------------+------------------------
First author        | van Gurp                     | Trucchi
Reference genome    | Not needed                   | Seq. 2x
Restriction enzyme  | PstI, but others possible    | SbfI, but any possible
Paired-end seq.     | Paired                       | Paired or single
Multiplex?          | Yes                          | Yes
Clustering software | Written for epiGBS in Python | STACKS
Validation?         | Yes (Arabidopsis)            | No
# loci              | 1626                         | 1710-3180

This is only a small subset of the possible differences, but I think these are some of the most important to highlight. In practice, the most important difference is that epiGBS does not require a reference genome, but does require paired-end sequencing and a specialized bioinformatics pipeline. Using tricks in the PCR primers, the authors can work out which strand is which. From there, it is a computing problem to derive the original sequence from the two strands. In contrast, bsRADseq sequences the same samples twice, once with bisulfite conversion and once without. This avoids the computing problem, but requires twice the cost. However, bsRADseq data can be processed in the popular STACKS platform, whereas epiGBS data requires Dr. van Gurp’s scripts.
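The two strategies can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not the epiGBS scripts or the STACKS pipeline: the function names are hypothetical, reads are assumed to be already aligned and written in top-strand coordinates, and conversion is assumed to be perfect. On the top strand an unmethylated C reads as T; an unmethylated C on the bottom strand shows up as a G-to-A change when viewed from the top strand.

```python
def call_methylation(reference, converted_read):
    """bsRADseq-style idea: compare an unconverted sequence of a strand
    with a bisulfite-converted read of the same strand.

    Returns {position: True/False} for each reference cytosine, where
    True means the cytosine survived conversion (methylated).
    """
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference, converted_read)):
        if ref != "C":
            continue
        if obs == "C":
            calls[i] = True   # protected from conversion
        elif obs == "T":
            calls[i] = False  # converted, so unmethylated
        # any other base is a mismatch (SNP or error) and is skipped
    return calls


def reconstruct_from_strands(watson_read, crick_read):
    """epiGBS-style idea: recover the original sequence from converted
    reads of both strands, both given in top-strand coordinates.
    """
    original = []
    for w, c in zip(watson_read, crick_read):
        if w == c:
            original.append(w)    # strands agree, base is as read
        elif (w, c) == ("T", "C"):
            original.append("C")  # top-strand C was converted
        elif (w, c) == ("G", "A"):
            original.append("G")  # bottom-strand C was converted
        else:
            original.append("N")  # unexplained disagreement
    return "".join(original)


# Toy locus ACGTCCGA, with only the top-strand C at position 1 methylated.
watson = "ACGTTTGA"  # top strand after conversion
crick = "ACATCCAA"   # bottom strand after conversion, top-strand view

print(reconstruct_from_strands(watson, crick))
print(call_methylation("ACGTCCGA", watson))
```

In the first function the unconverted sequence does the work of a reference, which is why the samples are sequenced a second time. In the second, each strand covers for the other, because bisulfite only ever hides cytosines on the strand being read.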

I am not sure which method biologists will embrace. The simplicity of not needing to sequence twice seems to favor epiGBS, but the method is unlikely to be popular until the bioinformatics pipeline has been in use for a while and had the kinks worked out.


4 thoughts on “The race for cheap epigenomics: epiGBS versus bsRADseq”

  1. TrucchiE

    So funny to see there was so much drama around these methods/papers that I was not aware of (found this post just today – 28/09/17). Truly sorry for not having congratulated the Nature Methods authors in return (whom I know quite well and congratulated in person!) on the Ecological Epigenetics Facebook group, but I didn’t know about it (not on FB myself). On the other hand, we were fully aware of each other’s work and had been discussing pros and cons of the two alternative approaches since February 2015.
    Just a clarification, in bsRADseq you sequence one strand only (see Fig.1 of the paper) and use the standard (not converted) sequence of that strand to reference it (seq 2x) whereas in epiGBS you have to sequence both strands to reference each other (seq 2x as well). The overall coverage (i.e. sequencing cost) you need is the same in both methods.
    The real big difference, as you also noticed, is that bsRADseq relies on broadly-used software for loci assembly and SNP calling (STACKS) and bisulfite-converted reads mapping and methylation calling (BISMARK) whereas you need custom-made scripts for epiGBS.
    Anyway, thanks a lot for the nice summary of both methods.
    I really hope more people will try them out.


    1. Thanks for reading, Dr. Trucchi! I hope you forgive my artistic license in dramatizing the story. I confess I never expected both you and Dr. van Gurp to read this. Thanks for clarifying the specifics of the techniques; that makes a lot of sense. I hope to see more epigenomics as well. In a world of virtually unlimited genomics projects and limited funding, I guess we may have to wait, sadly.


  2. Gustavo

    Thanks for the great post. I just started reading about the methods for epigenetic studies in non-model species, and I think both methods are great.

    My opinion is that the main limitation of both approaches is the low number of loci. If RAD-seq targets only around 10% of the genome, I think the chances of capturing the methylated regions that correspond to a certain phenotype in one organ will be very low. The ideal situation would be to find those methylated regions that are somehow up- or down-regulating the specific genes that produce the observed phenotype. Am I right?

    Nevertheless, these methods represent a very nice way to address epigenetic questions at low cost compared with the genome-wide methylation approach, especially in non-model organisms.


    1. Thanks for the feedback, Gustavo. I think your fear about not finding methylated regions that correspond to any one phenotype is probably well-founded. However, it’s important to note that methylation is more prevalent in coding regions, so you have a bit of an advantage over RAD that doesn’t use methylation-sensitive enzymes. Mariano Alvarez has done a lot more work on this than I have, and would be a good resource if you end up pursuing these methods. You can find him on Twitter @MFAlvarez7.

