Specialized Databases of Protein Patterns

Specialized Databases
Specialized databases collect data sets that are homogeneous from a taxonomic and/or functional point of view, drawn from the primary databases and/or the literature, then reviewed and annotated with value-added information.
Specialized Databases of Protein Patterns
• Given an uncharacterized sequence:
Which family does it belong to?
What is its function?
“The protein signature approach”
• We compare sequences belonging to the same family, looking for common 'patterns'
• We build a database of conserved profiles (sequence elements conserved at specific positions)
• We use these profiles (patterns) to classify an unknown sequence
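The three steps above can be sketched in a few lines of Python. This is only an illustrative toy, not the method used by any specific database: it assumes an ungapped alignment, a uniform background frequency, and a simple pseudocount, and the function names `build_profile` and `score` are hypothetical. The two aligned sequences are used purely as sample data.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(alignment, pseudocount=1.0):
    """Build a per-column log-odds profile from an ungapped alignment.

    Assumption: uniform background frequency (1/20) for every residue.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        # log2 of (smoothed column frequency / background frequency)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def score(profile, sequence):
    """Sum the log-odds of each residue of an equal-length query sequence."""
    if len(sequence) != len(profile):
        raise ValueError("query must have the same length as the profile")
    return sum(column[aa] for column, aa in zip(profile, sequence))

# Sample data: two aligned family members (22 residues each).
family = ["ITWKGPVCGLDGKTYRNECALL", "AVPRSPVCGSDDVTYANECELK"]
profile = build_profile(family)

member_score = score(profile, family[0])
unrelated_score = score(profile, "W" * 22)
print(member_score > unrelated_score)
```

A sequence that shares the conserved positions scores higher than an unrelated one; real profile methods add gap handling, sequence weighting, and residue-specific background frequencies.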
What are protein signatures?
[Figure: workflow. Protein family/domain → multiple sequence alignment → build model → search UniProt → significant match → mature model, used for protein analysis. Example aligned sequences:]
ITWKGPVCGLDGKTYRNECALL
AVPRSPVCGSDDVTYANECELK
Diagnostic approaches (sequence-based)
• Single motif methods: regex patterns (PROSITE)
• Multiple motif methods: identity matrices (PRINTS)
• Full domain alignment methods: profiles (Profile Library), HMMs (Pfam)
Patterns
[Figure: sequence alignment → motif → define pattern → extract pattern sequences → build regular expression]
C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C
Pattern signature: PS00000
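To make the link between a PROSITE-style pattern and a regular expression concrete, the following is a minimal sketch that converts the pattern shown above into Python `re` syntax. It assumes only the basic elements that appear in that pattern (single residues, `[...]` allowed sets, `{...}` excluded sets, `x` wildcards, and `(n)` or `(n,m)` repeats); anchors such as `<` and `>` are not handled, and the helper name `prosite_to_regex` is hypothetical.

```python
import re

# One pattern element: a residue, an allowed set, an excluded set, or 'x',
# optionally followed by a repeat count such as (2) or (2,4).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            core = "."                          # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"   # any residue except these
        else:
            core = residue                      # literal residue or [set]
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

regex = prosite_to_regex("C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C")
print(regex)

# Made-up test sequences, for illustration only.
print(bool(re.search(regex, "CCAAACSAAALAAAC")))
print(bool(re.search(regex, "CCPAACSAAALAAAC")))
```

The second test sequence differs from the first only by a proline at the position where `{P}` forbids it.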
Specialized Databases of Protein Patterns
Protein families
Pfam (short for Protein Families) is a database of protein domains described by hidden Markov models (HMMs). It is divided into two sections: Pfam-A contains expert-curated alignments; Pfam-B contains sequences that are clustered automatically.

Pfam
InterPro Entry
• Groups similar signatures together
• Adds extensive annotation
• Links to other databases
• Structural information and viewers
• Hierarchical classification
InterPro hierarchies: Families
FAMILIES can have parent/child relationships with other families.
Parent/child relationships are based on:
• Comparison of protein hits
  - child should be a subset of parent
  - siblings should not have matches in common
• Existing hierarchies in member databases
• Biological knowledge of curators
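The two rules on protein hits can be expressed as plain set operations. The sketch below is illustrative only: the function names and the protein accessions are hypothetical, and it covers just the hit-comparison criterion, not the existing member-database hierarchies or curator knowledge.

```python
def is_valid_child(parent_hits, child_hits):
    """A child entry should match a subset of the proteins its parent matches."""
    return set(child_hits) <= set(parent_hits)

def are_valid_siblings(hits_a, hits_b):
    """Sibling entries should have no protein matches in common."""
    return set(hits_a).isdisjoint(hits_b)

# Hypothetical protein accessions matched by three entries.
parent = {"PROT1", "PROT2", "PROT3", "PROT4"}
child_a = {"PROT1", "PROT2"}
child_b = {"PROT3"}
outsider = {"PROT2", "PROT9"}

print(is_valid_child(parent, child_a))       # child_a is a subset of parent
print(are_valid_siblings(child_a, child_b))  # no shared hits
print(is_valid_child(parent, outsider))      # PROT9 is not a parent hit
```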
InterPro hierarchies: Domains
DOMAINS can have parent/child relationships with other domains.
Domains and Families may be linked through the Domain Organisation Hierarchy.
InterPro Entry: adds extensive annotation
The Gene Ontology project provides a controlled vocabulary of terms for describing gene product characteristics.
InterPro Entry: links to other databases
• UniProt
• KEGG ... Reactome ... IntAct ...
• UniProt taxonomy
• PANDIT ... MEROPS ... Pfam clans ...
• PubMed
InterPro Entry: structural information and viewers
• PDB: 3-D structures
• SCOP: structural domains
• CATH: structural domain classification
Searching InterPro
• Protein family membership
• Domain organisation
• Domains, repeats & sites
• GO terms
Specialized Databases of Nucleotide Patterns
• Eukaryotic Promoter Database (http://www.epd.isb-sib.ch/)
• Transcription factors: TRANSFAC
• Translation terminations: TransTERM
• Vector database: VectorDB
• Repeats database: Repbase
Structural Profiles
• CATH (http://www.cathdb.info/)
• SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/)
CATH
SCOP
Specialized Databases of
• Genes
• Genomes
• Transcripts and Expression Profiles
• Metabolic Pathways
• Mutations
Specialized Gene Databases
• COGs
• Entrez Gene
• RefSeq
Entrez Gene
Genome Sites
• NCBI Genomes
• EBI Genomes
• TIGR (Craig Venter)
The Human Genome
• The Human Genome at NCBI
• The Human Genome at Celera
• Ensembl
• UCSC Genome Bioinformatics
Transcriptome Databases
• dbEST
• UniGene
• UTRdb/UTRsite
Expression Databases
• GEO
• ArrayExpress
• EPDex
Metabolic Pathway Databases
Kyoto Encyclopedia of Genes and Genomes (KEGG)
http://www.genome.jp/kegg/
Metabolic Pathway Databases
REACT_945.4
Mutation Databases
• dbSNP
• HGVBase