Introduction - Dipartimento di Informatica e Automazione

Foundations of Database Theory and
Application to
Schema Management and Integration
Paolo Atzeni
PhD Program in Computer Science and Automation (Dottorato di Ricerca in Informatica e Automazione)
Università Roma Tre
June-July 2005
Database
(generic, methodological sense)
• An organized collection of data used to support the
activities of an organization (company, office, person)
(specific, methodological and technological sense)
• A collection of data managed by a DBMS
DataBase Management System (DBMS)
A system (software product) able to manage collections of data
that are (also):
– large (of a size (much) greater than the main memory
of the computer systems in use)
– persistent (with a lifetime independent of the
individual executions of the programs that use them)
– shared (used by different applications)
while guaranteeing reliability (resilience to hardware and
software failures) and privacy (through access discipline and
access control). Like any software product, a
DBMS must be efficient (making the best use of the system's
space and time resources) and effective (making the activities
of its users productive).
General information
• Goals
– Discuss research topics of interest both to those working in
the database area and to others
– Focus on the most interesting methodological issues of
today (see the keynotes by Stonebraker and Salinger at ICDE 2005):
• Heterogeneity
• Integration
• Approach:
– Seminar style, with individual contributions
• Many ideas shared with a course taught at UBC by Rachel
Pottinger:
– http://www.cs.ubc.ca/~rap/teaching/534a/
Database integration, requirements
• Three “vignettes” (Stonebraker, 2001):
– Integration for republishing: a large distributor (or dept store)
integrates very many catalogs
– Integration of availability and pricing: planning a trip with
hotel and ferry, each offered by several providers
– Integration for supply chain management: a manufacturer
needs information coming from its subcontractors
Extraction
• How many sources do we need to integrate?
– 2-3 or tens or hundreds?
• Sources change frequently in structure (and content, obviously)
• Are sources “cooperative” or not?
Development of wrappers
• Effective tools: drag and drop, example based, flexible
• Learning and automation (or semiautomation)
• Maintenance support
Is XML (or the “semantic Web”)
the solution?
• It could be, but:
– Sites are not migrating to XML (nor to any “semantic”
technology):
• The Web is a giant “legacy” system
– XML is used as an internal tool, but not “exposed”
– In general, site owners offer the “bare minimum”
Integration issues
• Traditional problems, and more
• Many sources, with syntactic and semantic heterogeneities:
– Different models (more or less structured)
– Different meaning:
• Currency
• Other conventions (e.g. number of students in a
university)
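As an illustration of the "different meaning" problem for currencies, here is a minimal sketch in which two sources report a price for the same item under different conventions. The exchange rates, the `Price` type and the `to_eur` helper are all hypothetical and chosen only for the example; real rates change over time and would come from an external service.

```python
from dataclasses import dataclass

# Hypothetical, fixed exchange rates to EUR (illustrative values only).
RATES_TO_EUR = {"EUR": 1.0, "USD": 0.8, "GBP": 1.5}


@dataclass
class Price:
    amount: float
    currency: str


def to_eur(price: Price) -> float:
    """Convert a price to EUR using the assumed rate table."""
    return round(price.amount * RATES_TO_EUR[price.currency], 2)


# Two sources describe the "same" attribute with different conventions.
source_a = {"hotel": Price(100.0, "EUR")}
source_b = {"hotel": Price(110.0, "USD")}

# Comparing raw amounts (100 < 110) would pick source A;
# after normalization source B turns out to be cheaper.
cheapest = min(
    (to_eur(source_a["hotel"]), "A"),
    (to_eur(source_b["hotel"]), "B"),
)
```

The point of the sketch is only that values must be brought to a common convention before they can be compared or merged; the same reasoning applies to other conventions, such as how a university counts its students.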
Timeliness of data
• Do we need
– up-to-date data (on-line queries over the net), or
– “static” data (DW)? What are the quality requirements?
• For example
– Stock quotes or sport results
– Census data or publications
Integration issues, more
• Integration is quite “vertical” (in a domain)
– Ex: all CS departments in Italy
• Domain specificity could be useful, but how to use it?
– Ontologies or taxonomies (or just schemas) could be useful,
but: how?
• Who “annotates” the sources?
– Metamodeling: everything mapped to a reference
Semantics or “smart” data
• A long standing issue (since 1974, in the database field)
• What does “semantics” mean?
• All the semantics you can handle is syntax
• Some say that semantics is now even more important since
data is public
Integrating databases or applications?
• DB talk to DB or applications talk to applications?
• Is communication synchronous or asynchronous?
• You need flexibility!
Another aspect of flexibility:
XML Databases
• Relational data seen as XML
• Storage/retrieval of whole XML documents, with simple search
facility
• Transaction requests expressed in XML (Web services?)
• XML as a rich data type, R/W at the element level
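The first scenario above, relational data seen as XML, can be sketched with the Python standard library alone. This is only an illustration under assumptions: the `student` table, the `row` element name and the `table_to_xml` helper are hypothetical, and real DBMSs offer their own (richer) XML publishing facilities.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn: sqlite3.Connection, table: str) -> ET.Element:
    """Render every row of `table` as XML, one child element per column."""
    # The table name is interpolated directly: acceptable in this sketch
    # because it is a trusted constant, not user input.
    cur = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cur.description]
    root = ET.Element(table)
    for row in cur:
        row_el = ET.SubElement(root, "row")
        for name, value in zip(columns, row):
            ET.SubElement(row_el, name).text = str(value)
    return root


# Hypothetical in-memory table used only for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Ada')")

xml_text = ET.tostring(table_to_xml(conn, "student"), encoding="unicode")
```

Here the data stays relational and XML is only a view produced on request, in contrast with the other scenarios in the list, where XML documents or elements are themselves stored.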
XML support in DBMS, the trend
• Two front-ends:
– Relational (SQL) and XML
• Two back-ends:
– Relational and XML
Where data resides
• Not only databases, but also “dataspaces”:
– Data is sometimes consumed “on-the-fly”: data streams
The same data and views for all?
• Adaptivity:
– Personalization: content adapted to the user
• upon system's decision
• upon user's request
– Customization: structure adapted to the user
• according to the user's role
• upon user's request
– Context dependence
Topics, to be refined and confirmed
• Model Management
• Data integration and merging (schema and data)
• Schema matching and mapping
• Schema translation from one model to another
• Foundations:
– Information capacity and schema equivalence
– Query languages and their expressive power
Metadata – data about data
• Relational Schema
• XML Schema or DTD
• UML document
• ER Diagrams
Example: in a database about grades, the grades are the data. How
they are stored (the schema) is the metadata.
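The grades example can be made concrete with a small sketch that reads both the data and the metadata from the same database. The `grades` table and its columns are hypothetical; SQLite is used only because it ships with Python, and `PRAGMA table_info` is SQLite's own way of exposing the catalog.

```python
import sqlite3

# Hypothetical grades table, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (student TEXT, course TEXT, grade INTEGER)")
conn.execute("INSERT INTO grades VALUES ('Ada', 'Databases', 30)")

# Data: the rows themselves.
data = conn.execute("SELECT * FROM grades").fetchall()

# Metadata: how the data is structured, read from SQLite's catalog.
# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
metadata = [
    (name, col_type)
    for _, name, col_type, *_ in conn.execute("PRAGMA table_info(grades)")
]
```

After running it, `data` holds the single grade row, while `metadata` holds the column names and types, that is, a relational schema in the sense of the first bullet above.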