RAD-tags (Restriction-site Associated DNA tags)

The ability to efficiently and accurately determine genotypes is a keystone technology in modern genetics, crucial to studies ranging from clinical diagnostics, to genotype-phenotype association, to reconstruction of ancestry and the detection of selection. Next-generation sequencing technology provides novel opportunities for gathering genome-scale sequence data in natural populations. RAD-tags is a novel and efficient genotyping approach based on Illumina sequencing of libraries of Restriction-site Associated DNA (RAD) tags. Using short sequence reads, this technique provides genotype information on a large number of SNP markers.

RAD-tags is a strategy that targets DNA regions near restriction enzyme cutting sites. The ancestor of this approach was developed as a cost-effective method to identify SNPs and was later adapted to next-generation sequencing, allowing large numbers of SNPs to be identified and scored simultaneously in population genomic projects. By focusing sequencing effort only on the tags that flank restriction sites, in multiplexed samples, the approach substantially reduces the complexity of the data and increases throughput.
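A back-of-the-envelope calculation gives a feel for the size of this reduction. Assume, as an idealization, a random genome of length G with equal base frequencies (real genomes deviate from this, so the numbers are only indicative). A recognition site of n bp is then expected about once every 4^n bp, and if a tag of length L is sequenced on each side of every site, the fraction f of the genome that is sequenced is roughly:

```latex
\begin{align*}
N_{\text{sites}} &\approx \frac{G}{4^{n}} \\
f &\approx \frac{2L \, N_{\text{sites}}}{G} = \frac{2L}{4^{n}} \\
n = 6,\ L = 100:\quad f &\approx \frac{200}{4096} \approx 0.049
\end{align*}
```

Under these assumptions, a 6-bp cutter with 100-bp tags concentrates sequencing effort on roughly 5% of the genome.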

In 2008, Baird et al. described a method for sequencing Restriction-site Associated DNA (RAD) tags, which allowed the detection of more than 13,000 SNPs. RAD-tags are short fragments of DNA adjacent to each instance of a particular restriction enzyme (RE) recognition site. The RADseq method combines a single restriction enzyme digest with secondary random fragmentation. A broad size selection is then performed to generate reduced-representation libraries consisting of all genomic regions adjacent to the RE cut sites. The RADseq method was rapid, simple and inexpensive for genetic mapping, and it was also applicable to a wide variety of organisms, including those lacking a reference genome. Despite this success, the computational tools then available for analyzing data from organisms without a reference genome performed poorly, because they discarded a significant amount of the genetic data (Emerson et al. 2010; Hohenlohe et al. 2011; Pfender et al. 2011).
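The definition of a RAD tag can be made concrete with a short in-silico sketch: locate every occurrence of a recognition site in a sequence and extract a fixed-length tag on each side of the cut. This is an illustration under stated assumptions, not part of the RADseq protocol itself. The function names are hypothetical, the sticky-end overhang is ignored (each cut is treated as a single position), and the EcoRI site GAATTC (cut after the G) is used purely as an example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list[int]:
    """Start positions of every occurrence of the recognition site."""
    hits = []
    start = seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


def rad_tags(seq: str, site: str, cut_offset: int, tag_len: int) -> list[tuple[int, str, str]]:
    """Return (cut position, direction, tag) for every full-length tag beside a cut.

    Each cut yields up to two tags: '-' is the sequence left of the cut,
    reverse-complemented so it reads outward from the cut as a read would;
    '+' is the sequence right of the cut. Tags that would run off either end
    of `seq` are skipped. (Simplification: the sticky-end overhang is ignored.)
    """
    tags = []
    for start in find_sites(seq, site):
        cut = start + cut_offset
        if cut - tag_len >= 0:
            tags.append((cut, "-", revcomp(seq[cut - tag_len:cut])))
        if cut + tag_len <= len(seq):
            tags.append((cut, "+", seq[cut:cut + tag_len]))
    return tags


if __name__ == "__main__":
    # Toy sequence with one EcoRI site (GAATTC); EcoRI cuts after the G.
    for tag in rad_tags("AAAAGAATTCCCCC", "GAATTC", cut_offset=1, tag_len=3):
        print(tag)
    # (5, '-', 'CTT')
    # (5, '+', 'AAT')
```

Each recognition site contributes two tags, one in each direction, which is why RAD libraries sample the genome on both sides of every cut.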

Building on the RADseq method, Peterson et al. (2012) described double digest RAD (ddRAD) sequencing. The ddRAD tags method broadens the range of RADseq applications, bridging the gap between phylogenetic analyses, which require only a small fraction of the genome, and applications such as genetic mapping, which require a large fraction of it. ddRAD is therefore a suitable platform for a wide range of applications (e.g. phylogenetic relationships, population structure, genotype-phenotype mapping) in both model and non-model organisms.

Briefly, the ddRAD-tag process includes three main steps: library preparation and DNA pooling, standard multiplexed sequencing on an Illumina platform, and bioinformatic analysis. The DNA is double digested with two restriction enzymes, a common cutter and a rare cutter. The digestion is followed by precise size selection that excludes regions flanked by either very close or very distant RE recognition sites, resulting in a library consisting only of fragments close to the target size. The protocol described by Peterson et al. (2012) has several advantages. First, digesting the DNA simultaneously with two restriction enzymes reduces the cost of library construction. Second, eliminating DNA-loss steps from the process allows libraries to be constructed even when only a small quantity of starting material is available (100 ng of DNA). Finally, the authors suggest a very cost-effective way to barcode several dozen individuals and pool them in a single sequencing lane.
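The double digest and size-selection steps can be mimicked in silico to see how strongly they shrink the set of sequenced fragments. The sketch below is a simplified model, not the authors' pipeline. It uses EcoRI (GAATTC, cut after the first base) as the rare cutter and MspI (CCGG, cut after the first base) as the common cutter purely as an example pair, treats each cut as a single position, and assumes that only fragments with one end from each enzyme are carried through to sequencing. The function names, window bounds and random test genome are hypothetical.

```python
import random


def cut_positions(seq: str, site: str, cut_offset: int) -> list[int]:
    """Positions at which an enzyme with this recognition site cuts `seq`."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    return cuts


def ddrad_fragments(seq, rare, common, min_len, max_len):
    """Simulate a double digest followed by size selection.

    `rare` and `common` are (recognition site, cut offset) tuples. Returns the
    (start, end) of each retained fragment and the total number of fragments
    between adjacent cuts produced by the digest.
    """
    cuts = [(pos, "rare") for pos in cut_positions(seq, *rare)]
    cuts += [(pos, "common") for pos in cut_positions(seq, *common)]
    cuts.sort()
    kept = []
    for (left, left_enzyme), (right, right_enzyme) in zip(cuts, cuts[1:]):
        mixed_ends = left_enzyme != right_enzyme        # one rare end, one common end
        in_window = min_len <= right - left <= max_len  # precise size selection
        if mixed_ends and in_window:
            kept.append((left, right))
    return kept, max(len(cuts) - 1, 0)


if __name__ == "__main__":
    random.seed(1)
    genome = "".join(random.choice("ACGT") for _ in range(500_000))
    # Hypothetical 300-400 bp window; real windows depend on the protocol used.
    kept, total = ddrad_fragments(
        genome, rare=("GAATTC", 1), common=("CCGG", 1), min_len=300, max_len=400
    )
    print(f"{len(kept)} of {total} fragments retained")
```

Because the common cutter produces many short fragments while the rare cutter marks few sites, only a small, repeatable subset of fragments survives both the mixed-end requirement and the size window.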

Suppose that we have a 96-well microplate and want to pool all 96 individuals for sequencing in one Illumina lane, so all 96 individuals must be barcoded. The cost-effective approach suggested by Peterson et al. (2012) is as follows. Within each of the 12 columns of the microplate, the 8 individual samples are ligated separately to 8 different barcoded adapters. Following ligation, but before size selection, the samples of each column are pooled, giving 12 pools in total. Size selection is performed on each pool, and the resulting libraries are amplified with a primer that introduces an index, which is read out in the standard Illumina multiplexed sequencing protocol. Because each pool is amplified with a uniquely indexed primer, the pools can then be combined and all individuals sequenced in a single Illumina lane. Each individual is identified by the unique combination of its adapter barcode and its index sequence, so 8 barcodes and 12 indices are enough to distinguish 8 × 12 = 96 individuals.
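The combinatorial logic of this scheme can be sketched in a few lines: crossing 8 adapter barcodes with 12 indices gives 96 unique pairs, so each pair points back to exactly one well. In the sketch, the labels barcode_1 to barcode_8 and index_1 to index_12 are hypothetical placeholders rather than real oligonucleotide sequences, and each read is assumed to have already had its barcode and index identified.

```python
from itertools import product

ROWS = "ABCDEFGH"       # 8 wells per column, one barcoded adapter per row
COLUMNS = range(1, 13)  # 12 columns, each pooled and amplified with its own index

# Hypothetical placeholder labels, not real adapter or index sequences.
BARCODES = {row: f"barcode_{i + 1}" for i, row in enumerate(ROWS)}
INDICES = {col: f"index_{col}" for col in COLUMNS}


def plate_layout() -> dict[tuple[str, str], str]:
    """Map each (barcode, index) pair to the well it identifies, e.g. 'C7'."""
    return {
        (BARCODES[row], INDICES[col]): f"{row}{col}"
        for row, col in product(ROWS, COLUMNS)
    }


def assign_reads(reads, layout):
    """Group read IDs by well, given (read_id, barcode, index) triples.

    Reads whose barcode/index pair is not on the plate go to 'unassigned'.
    """
    by_well: dict[str, list[str]] = {}
    for read_id, barcode, index in reads:
        well = layout.get((barcode, index), "unassigned")
        by_well.setdefault(well, []).append(read_id)
    return by_well


if __name__ == "__main__":
    layout = plate_layout()
    print(len(layout))  # 96
    reads = [("r1", "barcode_3", "index_7"), ("r2", "barcode_9", "index_1")]
    print(assign_reads(reads, layout))  # {'C7': ['r1'], 'unassigned': ['r2']}
```

The same barcode set is reused in every column; it is only the pairing with a column-specific index that makes each individual unique, which is why so few barcoded adapters are needed.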

Precise and repeatable size selection offers further advantages. For example, each restriction site can yield fragments extending in both directions from it, but because only a small fraction of restriction fragments fall within the target size-selection window, the probability of sampling both directions from the same restriction site is low. In addition, the process described by the authors is biased towards selecting fragments from different individuals, which is desirable.
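The argument about sampling both directions can be made quantitative with a simple model. Suppose, as a simplifying assumption not taken from the original protocol, that the two fragments flanking a given restriction site fall inside the size-selection window independently, each with probability p. Then:

```latex
\begin{align*}
P(\text{both}) &= p^{2} \\
P(\text{at least one}) &= 1 - (1 - p)^{2} = 2p - p^{2} \\
P(\text{both} \mid \text{at least one}) &= \frac{p^{2}}{2p - p^{2}} = \frac{p}{2 - p} \\
p = 0.05:\quad \frac{0.05}{1.95} &\approx 0.026
\end{align*}
```

With an illustrative window that captures 5% of fragments, only about 2.6% of the sampled sites would be represented in both directions, and for small p this fraction shrinks roughly in proportion to p.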

RAD-tags is a novel and efficient genotyping approach based on next-generation sequencing that can yield up to hundreds of thousands of DNA sequences randomly distributed across the genome. The ddRAD tags method enables the transition from the genetic to the genomic level in a cost- and time-effective way.


Selected references

  • Emerson KJ MC, Catchen JM, Hohenlohe PA, Cresko WA, et al. (2010) Resolving postglacial phylogeography using high-throughput sequencing. Proceedings of the National Academy of Sciences of the United States of America 107.
  • Hohenlohe PA AS, Catchen JM, Allendorf FW, Luikart G (2011) Next generation RAD sequencing identifies thousands of SNPs for assessing hybridization between rainbow and westslope cutthroat trout. Molecular Ecology Resources 11, 117-122.
  • Peterson BK, Weber JN, Kay EH, Fisher HS (2012) Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS ONE 7, e37135.
  • Pfender WF SM, Johnson EA, Slabaugh MB (2011) Mapping with RAD (restriction-site associated DNA) markers to rapidly identify QTL for stem rust resistance in Lolium perenne. Theoretical and Applied Genetics 122, 1467-1480.