GENOMICS

Genomics is a branch of molecular biology that studies the genomes of living organisms.

WHAT DOES GENOMICS DEAL WITH?

Specifically, it addresses the structure, content, function and evolution of the genome. It relies on bioinformatics to process and visualize the enormous quantity of data it produces.

Genomics took shape in the 1980s, when the first initiatives to sequence entire genomes were launched. Its origins can arguably be traced back to the first complete sequencing of a genome, that of a virus, the phage ΦX174, published in 1977. The first genome of a free-living organism was completed in 1995: that of a bacterium, Haemophilus influenzae, with a sizeable genome (1.8 million base pairs). Since then the number of "completed" genomes has grown exponentially (more than 700 as of January 2008: 50 archaea, 575 other prokaryotes, 77 eukaryotes). The first plant whose genome sequence was completely determined was Arabidopsis thaliana.

Goals of genomics
• Building genetic and physical maps of the DNA of living organisms, through its complete sequencing.
• Annotating the DNA sequence, that is, identifying and marking all genes and other significant portions of the sequence, together with all the information known about those genes.
• Depositing the information in dedicated databases, freely accessible via the Internet.
• Comparative genomics, which compares the genomes of different organisms in their organization and sequence.

DNA SEQUENCING

[Figure: part of a radioactively labelled sequencing gel]

Chain-termination methods

The classical chain-termination or Sanger method requires a single-stranded DNA template, a DNA primer, a DNA polymerase, radioactively or fluorescently labeled nucleotides, and modified nucleotides that terminate DNA strand elongation.
The DNA sample is divided into four separate sequencing reactions, each containing all four standard deoxynucleotides (dATP, dGTP, dCTP and dTTP) and the DNA polymerase. To each reaction is added only one of the four dideoxynucleotides (ddATP, ddGTP, ddCTP, or ddTTP). These dideoxynucleotides are the chain-terminating nucleotides: they lack the 3'-OH group required to form a phosphodiester bond between two nucleotides during DNA strand elongation. Incorporation of a dideoxynucleotide into the nascent (elongating) DNA strand therefore terminates strand extension, producing DNA fragments of varying length. The dideoxynucleotides are added at a lower concentration than the standard deoxynucleotides, so that strands elongate far enough for sequence analysis.

Capillary electrophoresis

[Figure: example output]

Current methods can directly sequence only relatively short (300-1000 nucleotides long) DNA fragments in a single reaction [2]. The main obstacle to sequencing DNA fragments above this size limit is insufficient separation power for resolving large DNA fragments that differ in length by only one nucleotide.

Shotgun strategy

High-throughput sequencing

The high demand for low-cost sequencing has given rise to a number of high-throughput sequencing technologies [15][16]. These efforts have been funded by public and private institutions, and also researched and commercialized privately by biotechnology companies. High-throughput sequencing technologies are intended to lower the cost of sequencing DNA libraries beyond what is possible with the current dye-terminator method based on DNA separation by capillary electrophoresis. Many of the new high-throughput methods parallelize the sequencing process, producing thousands or millions of sequences at once.
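The logic of the four chain-termination reactions described above can be sketched in a few lines of Python. This is only an idealized illustration, not a model of real chemistry: it assumes the template is written 3'→5' (so the newly synthesized strand is simply its complement read left to right), ignores the primer, and assumes that every possible terminated fragment is produced exactly once. The function names and the example template are hypothetical.

```python
# Idealized sketch of Sanger chain-termination sequencing (assumptions in the text above).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def synthesized_strand(template: str) -> str:
    """Strand built by the polymerase (5'->3') on a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template)


def chain_termination_fragments(template: str) -> dict[str, list[int]]:
    """Return, for each ddNTP reaction, the lengths of the terminated fragments.

    In the ddATP reaction, elongation can stop wherever an A is incorporated,
    so that lane holds one fragment per A in the new strand, and likewise
    for the other three reactions.
    """
    reactions: dict[str, list[int]] = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(synthesized_strand(template), start=1):
        reactions[base].append(length)
    return reactions


def read_gel(reactions: dict[str, list[int]]) -> str:
    """Read the sequence by ordering all bands from shortest to longest."""
    bands = sorted(
        (length, base)
        for base, lengths in reactions.items()
        for length in lengths
    )
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = chain_termination_fragments("TACGGCTA")  # hypothetical template
    print(lanes)
    print(read_gel(lanes))
```

Reading the bands from shortest to longest works only because fragments differing by a single nucleotide can be separated, which is exactly the resolution limit that restricts read length in practice.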
Genomics has more recently been joined by new branches of biology that share its research approach:
• Transcriptomics studies gene expression at the level of messenger RNAs in a whole organism or in a particular organ, tissue or cell, at a particular point in the organism's development or under particular environmental conditions, relying mainly on microarrays.
• Proteomics (2D electrophoresis, etc.)
• Metabolomics (gas chromatography, etc.)
• Metagenomics

MICROARRAY

A DNA microarray (also commonly known as gene or genome chip, DNA chip, or gene array) is a collection of microscopic DNA spots, commonly representing single genes, arrayed on a solid surface by covalent attachment to a chemical matrix. DNA arrays differ from other types of microarray only in that they either measure DNA or use DNA as part of their detection system. Qualitative or quantitative measurements with DNA microarrays rely on the selective nature of DNA-DNA or DNA-RNA hybridization under high-stringency conditions and on fluorophore-based detection. DNA arrays are commonly used for expression profiling, i.e., monitoring the expression levels of thousands of genes simultaneously, or for comparative genomic hybridization.

Video of microarray printing: http://en.wikipedia.org/wiki/Image:Microarray_printing.ogg

Public databases of microarray data

Database                          Microarray Experiment Sets   Sample Profiles   As of Date
Gene Expression Omnibus - NCBI    5366                         134669            April 1, 2007
Stanford Microarray database      12742                        ?                 April 1, 2007
UPenn RAD database                ~100                         ~2500             Sept. 1, 2007
UNC Microarray database           ~31                          2093              April 1, 2007
MUSC database                     ~45                          555               April 1, 2007
ArrayExpress at EBI               1643                         136               April 1, 2007
caArray at NCI                    41                           1741              November 15, 2006
UPSC-BASE                         ~100                         ?                 November 15

Online microarray data-analysis programs and tools

Several Open Directory Project categories list online microarray data-analysis programs and tools:
• Bioinformatics : Online Services : Gene Expression and Regulation at the Open Directory Project
• Gene Expression : Databases at the Open Directory Project
• Gene Expression : Software at the Open Directory Project
• Data Mining : Tool Vendors at the Open Directory Project
• Bioconductor: open-source and open-development software project for the analysis and comprehension of genomic data.
• Genevestigator: Web-based database and analysis tool to study gene expression across large sets of tissues, developmental stages, drugs, stimuli, and genetic modifications.
• GeneCAT (Gene Co-expression Analysis Toolbox): Web-based database of gene expression data and expression analysis tools for Arabidopsis thaliana and barley.

Phylogenetic profiles

The rationale behind phylogenetic profiles is that genes involved in a given biological process tend to be either all present or all absent, depending on whether that process is active in each of the organisms considered. Therefore, genes that are functionally associated will tend to have very similar phylogenetic profiles.
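
As a rough illustration of the idea, a phylogenetic profile can be represented as a presence/absence vector with one entry per organism, and candidate functional associations can be found by looking for genes whose vectors nearly coincide. The sketch below is a minimal example under stated assumptions: the gene names, the five-organism profiles and the distance threshold are all invented for illustration, and Hamming distance is just one of several possible similarity measures.

```python
from itertools import combinations

# Hypothetical profiles: 1 = gene present, 0 = gene absent,
# one position per organism (five made-up organisms).
PROFILES = {
    "geneA": (1, 1, 0, 1, 0),
    "geneB": (1, 1, 0, 1, 0),
    "geneC": (0, 1, 1, 0, 1),
    "geneD": (1, 1, 0, 0, 0),
}


def hamming_distance(p: tuple[int, ...], q: tuple[int, ...]) -> int:
    """Number of organisms in which two genes differ in presence/absence."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same set of organisms")
    return sum(a != b for a, b in zip(p, q))


def similar_pairs(
    profiles: dict[str, tuple[int, ...]], max_distance: int = 1
) -> list[tuple[str, str, int]]:
    """Return gene pairs whose profiles differ in at most max_distance organisms."""
    pairs = []
    for (g1, p1), (g2, p2) in combinations(sorted(profiles.items()), 2):
        distance = hamming_distance(p1, p2)
        if distance <= max_distance:
            pairs.append((g1, g2, distance))
    return pairs


if __name__ == "__main__":
    for g1, g2, distance in similar_pairs(PROFILES):
        print(f"{g1} - {g2}: differ in {distance} organism(s)")
```

With these made-up profiles, geneA and geneB are identical and geneD differs from each of them in a single organism, so all three would be flagged as candidates for a shared function, while geneC would not.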