DNA sequencing

DNA sequencing by the chemical method of Maxam and Gilbert (PNAS, 1977)
• Chemical reagents (e.g. formic acid) have been characterized which alter one or two bases in DNA.
• An altered base can then be removed from the sugar-phosphate backbone of DNA.
• The strand is cleaved with piperidine at the sugar residue lacking the base.

Reading the DNA sequence: denaturing gel, PAGE + urea (6 M).

Sequencing by the chain-terminator or dideoxy procedure (Sanger, 1977; Nobel winner 1980)
- Enzymatic method.
- Random incorporation of a dideoxynucleoside triphosphate into a growing strand of DNA. This method is an in vitro DNA synthesis using ‘terminators’: incorporation of a dideoxynucleotide into the growing strand terminates synthesis.
- Requires DNA polymerase I.
- Requires a cloning vector with an initial primer (M13, a high-yield bacteriophage).
- Uses 32P-deoxynucleoside triphosphates.
- The sizes of the synthesized strands are determined for each dideoxynucleotide by gel or capillary electrophoresis.

Principle of the method
[Figure: primer annealed to a template strand containing several T residues; the extension products each end in ddA.]
With ddATP in the reaction: anywhere there’s a T in the template strand, occasionally a ddA will be added to the growing strand, which terminates it at that position.

The dideoxy chain termination (or enzymatic) method of DNA sequencing involves the in vitro synthesis of a DNA strand by a DNA polymerase, such as:
- Klenow fragment of E. coli DNA polymerase I (used in combination with cloning the DNA to be sequenced in the M13 series of single-stranded vectors);
- Sequenase, a modified form of phage T7 DNA polymerase. This enzyme, developed by Tabor and Richardson (P.N.A.S., 1987, vol. 84:4767-4772), is a site-directed mutant (His123Glu) of the bacteriophage T7 gene 5 protein.
  Features of Sequenase:
  1. unlike Klenow fragment, it can be used with double-stranded vectors;
  2. reduced exonuclease activity;
  3. highly processive: it catalyzes the polymerization of thousands of nucleotides without dissociating from the template.
- Taq DNA polymerase (used in cycle sequencing, PCR).

Primer walking
The sequencing reaction establishes with good confidence the order of the first 250-350 nucleotides. Cloned DNA inserts are usually much longer (5000 bp). Once the sequence of the first stretch has been determined, a second primer is synthesized, designed to hybridize to the region about 300 bases downstream of the priming site of the first primer. In the same way, a third priming site is chosen, another oligonucleotide is synthesized, and the sequence of the next 250-350 bases is determined. The “primer walking” strategy continues until the whole insert has been sequenced.

[Figure: enzymatic method and chemical method. An apparatus for sequencing on a thin (~0.2-0.4 mm) polyacrylamide gel.]

MAXAM & GILBERT METHOD vs SANGER METHOD
• Maxam & Gilbert: by-passes all the problems associated with polymerases.
• Sanger: rapid; a large number of samples can be processed simultaneously.
• Maxam & Gilbert: does not require subcloning into sequencing vectors (restriction fragments can be used directly).
• Sanger: the composition or secondary (2-D) structure of the DNA template can cause premature termination by DNA polymerase.
• Maxam & Gilbert: time-consuming (labeling of a single end, purification steps).
• Maxam & Gilbert: the only method for sequencing small oligonucleotides.
• Background due to degradation.

[Figure: sequencing gel, short run and long run.]

Automated DNA Sequencing
These systems employ fluorescent dyes attached to either the primer (first generation of these techniques) or the ddNTP (second generation of these techniques). The DNA fragments produced by the sequencing reactions are separated on polyacrylamide gels or by capillary electrophoresis. The detection system relies on laser-induced fluorescence (helium-neon laser; 633 nm). Detecting the bands within the gel is not trivial, as there are only about 10⁻¹⁵ to 10⁻¹⁶ moles (a femtomole or less) of DNA in each band.

Proc. Natl. Acad. Sci. USA (1995) vol. 92, pp. 4347-4351
Four-dye primer sequencing is one of the most commonly used methods for high-throughput DNA sequencing.
As in other sequencing methodologies, the detection sensitivity is limited by the spectroscopic properties of the available dyes (based on the structure of fluorescein) used to label the sequencing fragments.

[Figure: structure of fluorescein and FAM (5-carboxyfluorescein).]

To optimize the absorption and emission properties of the label, primers have been developed that exploit fluorescence energy transfer (ET). Fluorescence ET (FRET) is mediated by a dipole-dipole coupling between two chromophores that results in resonance transfer of excitation energy from an excited donor molecule (D) to an acceptor (A), giving an amplified signal.
- ET primers have two fluorescent dyes attached.
- The effective fluorescence intensity is 2 to 10 times greater than with single-dye primers.
- FAM is selected as the common donor; FAM, JOE, TAMRA and ROX are selected as acceptors.

Dyes:
- FAM: 5-carboxyfluorescein (SE = succinimidyl ester)
- JOE: 2’,7’-dimethoxy-4’,5’-dichloro-6-carboxyfluorescein (R = -COOH)
- TAMRA: tetramethyl-6-carboxyrhodamine (R1 = H, R2 = -COOH)
- ROX: 6-carboxy-X-rhodamine (R1 = H, R2 = -COOH)

A standard procedure (second-generation sequencing)
1) The DNA is prepared as a single strand
2) A mixture of the four normal (deoxy) nucleotides (dGTP, dATP, dTTP, dCTP)
3) A mixture of the four dideoxynucleotides (ddGTP, ddATP, ddTTP, ddCTP), each present in limiting amounts and each labeled with a tag that fluoresces a different colour
4) DNA polymerase
5) Adequate buffer

Results can be monitored in real time on the interfaced screen and subsequently subjected to graphically interactive analysis.

READ LENGTHS:
- Home-made PAGE apparatus (17 x 36 cm, 0.3 mm thick gel): up to 150-180 bp.
- Macrophor Electrophoresis Unit (patented design of the EMBL), LKB-Pharmacia: 20 x 50 cm, 0.1 mm thick gel.
  The electrophoresis unit is equipped with a thermostatic plate that provides uniform temperature control (eliminates ‘smiling effects’ and resolves G-C compressions): up to 300-400 bp.
- ALF DNA Sequencer (Pharmacia): equipped with a fixed-laser detection system scanning a polyacrylamide gel; up to 500 bp/hour/lane.
- ABI Prism 3700 DNA Analyzer (Applied Biosystems): automated capillary gel electrophoresis system. All four sequencing reactions are run in a single capillary (dye-labeled terminator chemistry). Detects over 500 bases at 98.5% accuracy, at 100 bases/hour/capillary.
- MegaBACE (Amersham-Pharmacia Biotech): DNA fragments are separated by capillary electrophoresis (16, 48 or 96 capillaries). It uses a confocal scanning laser and is capable of up to 12 DNA sequencing runs per 24 hours (read length >650 bp), producing up to 500,000 bases/day.

Next Generation Sequencing

Pyrosequencing
Ronaghi M, Uhlén M and Nyrén P (1998) A sequencing method based on real-time pyrophosphate. Science, 281, 363-365.

The method is based on detecting the pyrophosphate released when a nucleotide is incorporated during DNA synthesis.

[Figure: adenosine 5’-phosphosulfate (APS).]

Apyrase is an ATP diphosphohydrolase. It catalyses the removal of the gamma phosphate from ATP and the beta phosphate from ADP. The phosphate of AMP is not removed, and PPi is not produced.

• The primer is hybridized to the single-stranded, PCR-amplified template and incubated with the enzymes DNA polymerase, ATP sulfurylase, luciferase and apyrase, together with adenosine 5’ phosphosulfate (APS) and luciferin.
• The first of the four dNTPs is added to the reaction. DNA polymerase catalyzes the incorporation of the dNTP into the DNA strand if it is complementary to the base in the template strand. Each incorporation event is accompanied by the release of pyrophosphate (PPi) in a quantity equimolar to that of the incorporated nucleotide.
• In the presence of adenosine 5’ phosphosulfate (APS), ATP sulfurylase quantitatively converts PPi to ATP, which in turn drives the luciferase-catalyzed conversion of luciferin to oxyluciferin, producing light with an intensity proportional to the amount of ATP. The light is detected by a CCD camera and displayed as a peak in a pyrogram.
• Apyrase is a nucleotide-degrading enzyme. It continuously degrades all unincorporated dNTPs and the excess ATP. Apyrase does not produce PPi. As soon as degradation is complete, another dNTP is added.
• The dNTPs are added sequentially, one at a time. Because dATP is a natural substrate of luciferase (like ATP), deoxyadenosine α-thio-triphosphate (dATPαS) is used in its place: it is used efficiently by DNA polymerase but is not recognized by luciferase.
• As the process continues, the complementary DNA strand is synthesized and the nucleotide sequence is determined from the peaks of the pyrogram (~300 bases).

[Figure: pyrogram; the sequence read is TTTGGGGTTGCAGTT. Reaction components: DNA polymerase, apyrase, ATP sulfurylase, luciferase.]

454 Technology (Roche)
• To start, the DNA is sheared into 300-800 bp fragments, and the ends are “polished” by removing any unpaired bases.
• Adapters are added to each end. The DNA is made single-stranded at this point.
• One adapter contains biotin, which binds to a streptavidin-coated bead. The ratio of beads to DNA molecules is controlled so that most beads get only a single DNA molecule attached to them.
• Oil is added to the beads and an emulsion is created. PCR is then performed, with each aqueous droplet forming its own micro-reactor. Each bead ends up coated with about a million identical copies of the original DNA. [Figure label: biotinylated primers.]
• After the emulsion PCR has been performed, the oil is removed and the beads are put into a “picotiter” plate. Each well is just big enough to hold a single bead.
• The pyrosequencing enzymes are attached to much smaller beads, which are then added to each well.
• The plate is then repeatedly washed with each of the four dNTPs in turn, plus other necessary reagents, in a repeating cycle.
• The plate is coupled to a fiber-optic chip. A CCD camera records the light flashes from each well.

Illumina/Solexa Sequencing
- This method uses the basic Sanger idea of “sequencing by synthesis” of the second strand of a DNA molecule. Starting with a primer, new bases are added one at a time, with fluorescent tags used to determine which base was added.
- The fluorescent tags block the 3’-OH of the new nucleotide, so the next base can only be added once the tag has been removed.
- So, unlike pyrosequencing, you never have to worry about how many adjacent bases of the same type are present.
- The cycle is repeated 50-100 times.

The idea is to put two different adapters on the DNA, one at each end, then bind it to a slide coated with sequences complementary to each adapter. This allows “bridge PCR”, producing a small spot of amplified DNA on the slide. The slide contains millions of individual DNA spots. The spots are visualized during the sequencing run, using the fluorescence of the nucleotide being added.

[Figure: bridge PCR; attached termini, template DNA, PCR product.]

Third generation sequencing
Back in 2003, the Human Genome cost approximately $500 million and took years of work and a huge international effort to produce. Today, the cost of a genome has fallen to about $10,000, and may go as low as $1,000.

[Figure: Genome Sequencer FLX (Roche analyzer) and Illumina Genome Analyzer.]

                                    Roche 454 FLX          Illumina Genome Analyzer
Amount of starting material needed  DNA: 3 to 5 μg         DNA: 1 to 5 μg
                                    Total RNA: 20 μg       Total RNA: 1 to 2 μg
Sequencing technology               Pyrosequencing         Bridge amplification
Read length                         200-300 bases          25-35 bases
Sequence yield                      100 Mb (Mb = 10⁶)      800 Mb-2 Gb (Gb = 10⁹)
Data file                           12 to 15 Gb/run        1 Tbyte (Tb = 10¹²)
Time/run                            8 hrs                  3 to 5 days
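
The yield, read-length and run-time figures in the comparison above can be turned into rough per-run numbers. The sketch below is only a back-of-the-envelope calculation: it assumes that sequence yield is simply the number of reads times the read length, and that the run time listed is the only time cost. The names `PLATFORMS`, `reads_per_run` and `bases_per_hour` are illustrative and do not come from any vendor software.

```python
# Back-of-the-envelope figures derived from the platform comparison.
# (low, high) ranges are copied from the comparison; a single value is repeated.
# All names here are illustrative assumptions, not part of any vendor tool.
PLATFORMS = {
    "Roche 454 FLX": {
        "yield_bases": (100_000_000, 100_000_000),    # 100 Mb per run
        "read_length": (200, 300),                    # bases per read
        "run_hours": (8, 8),                          # 8 hrs per run
    },
    "Illumina Genome Analyzer": {
        "yield_bases": (800_000_000, 2_000_000_000),  # 800 Mb to 2 Gb per run
        "read_length": (25, 35),                      # bases per read
        "run_hours": (3 * 24, 5 * 24),                # 3 to 5 days per run
    },
}


def reads_per_run(platform):
    """Return (fewest, most) whole reads per run implied by yield / read length."""
    yield_lo, yield_hi = platform["yield_bases"]
    length_lo, length_hi = platform["read_length"]
    # Fewest reads: lowest yield split into the longest reads, and vice versa.
    return yield_lo // length_hi, yield_hi // length_lo


def bases_per_hour(platform):
    """Return (slowest, fastest) throughput in bases per hour of run time."""
    yield_lo, yield_hi = platform["yield_bases"]
    hours_lo, hours_hi = platform["run_hours"]
    # Slowest: lowest yield over the longest run; fastest: the opposite.
    return yield_lo / hours_hi, yield_hi / hours_lo


if __name__ == "__main__":
    for name, spec in PLATFORMS.items():
        n_lo, n_hi = reads_per_run(spec)
        t_lo, t_hi = bases_per_hour(spec)
        print(f"{name}: {n_lo:,} to {n_hi:,} reads/run, "
              f"{t_lo / 1e6:.1f} to {t_hi / 1e6:.1f} Mb/hour")
```

Under these assumptions the 454 run corresponds to a few hundred thousand long reads, while the Illumina run corresponds to tens of millions of short reads, which is the trade-off the comparison is meant to show.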