Presentation - Nucleo di Ricerca in Didattica dell'Informatica

The algorithmic warp:
some algorithmic problems that have fostered
scientific progress.
Alberto Policriti
Department of Mathematics and Computer Science,
University of Udine.
www.dimi.uniud.it/~policrit
What we will talk about
• Classes of problems (specific problems
require a technical treatment)
• “Significant” problems (those that link
algorithmics to other disciplines)
• Complexity (because, in the end, it is the real
problem of algorithmics)
Which problems
• The decision problem
(Entscheidungsproblem)
• Algorithmic problems in computational
biology
• A reflection on the notion of
complexity
Past
Present
Future
The main sources
• M. Davis
“The Universal Computer: the road from Leibniz to
Turing”
• S. Feferman
“In the Light of Logic”
• E. Green
“Strategies for the systematic sequencing of complex
genomes”
• D. Knuth
“Papers on the foundation of Computer Science”
The decision problem
Find an algorithm to decide whether
a formula of first-order logic is
satisfiable.
In a sense it [the decision problem] is the most general
problem of mathematics.
J. Herbrand
First-order logic
Example:
x‚y‚uxyx‚u  vy ‚vv‚u
If x and y are women and x is happy with u, then there exists v such that y is
happy with v, and u and v are friends
If x and y are points and x lies on line u, then there exists v such that y
lies on line v, and u and v are parallel
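To make the two readings concrete: checking the formula in one given finite interpretation is a purely mechanical task, whereas the decision problem asks whether some interpretation (possibly infinite) makes a formula true. A minimal Python sketch, assuming the formula reads ∀x∀y∀u((P(x) ∧ P(y) ∧ R(x,u)) → ∃v(R(y,v) ∧ S(u,v))); P, R and S are hypothetical names for “is a point”, “lies on” and “is parallel to” in the geometric reading.

```python
from itertools import product


def holds(domain, P, R, S):
    """Evaluate, in one finite interpretation, the (assumed) formula
    forall x, y, u: (P(x) and P(y) and R(x, u)) -> exists v: (R(y, v) and S(u, v)).
    P is a set of elements; R and S are sets of ordered pairs (hypothetical names)."""
    for x, y, u in product(domain, repeat=3):
        if x in P and y in P and (x, u) in R:
            # The existential quantifier becomes a search over the finite domain.
            if not any((y, v) in R and (u, v) in S for v in domain):
                return False
    return True


# Geometric reading: 0 and 1 are points, 2 and 3 are lines.
domain = {0, 1, 2, 3}
points = {0, 1}
on_line = {(0, 2), (1, 3)}
parallel = {(2, 2), (3, 3), (2, 3), (3, 2)}
print(holds(domain, points, on_line, parallel))             # True
print(holds(domain, points, on_line, parallel - {(2, 3)}))  # False
```

The exhaustive loop only works because the domain is finite; no such check over all interpretations is available in general, which is exactly what the negative answer to the Entscheidungsproblem establishes.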
Example:
An algorithm solving the decision problem could tell us
whether the Riemann hypothesis (Hilbert's eighth problem) is true or false!
David Hilbert
Born: 23 Jan 1862 in Königsberg, Prussia (now
Kaliningrad, Russia)
Died: 14 Feb 1943 in Göttingen, Germany
D. Hilbert in 1937
The Entscheidungsproblem is solved when we know a procedure
that allows for any given logical expression to decide by finitely
many operations its validity or satisfiability. [...] The
Entscheidungsproblem must be considered the main problem of
mathematical logic.
Principles of Mathematical Logic
D. Hilbert and W. Ackermann 1928
Hilbert knew how to pose problems!
The mathematicians present at an international conference in
Paris in August 1900 inevitably wondered what the new
century would bring to their subject. [...] he presented, as a
challenge to the mathematicians of the twentieth century, 23
problems that seemed utterly inaccessible by the methods
available at the time.
The Universal Computer
M. Davis
In his work, Hilbert demonstrated an unusual combination of
direct intuition and concern for absolute rigor. With exceptional
technical power at his command, he would tackle outstanding
problems, usually with a great originality of approach.
The title of Hilbert’s lecture in Paris was simply, “Mathematical
problems”.
Deciding the undecidable: Wrestling with Hilbert’s problems
S. Feferman
The great importance of definite problems for the
progress of mathematical science in general ... is
undeniable. ... [for] as long as a branch of
knowledge supplies a surplus of such problems,
it maintains its vitality. ... every mathematician
certainly shares ..the conviction that every
mathematical problem is necessarily capable of
strict resolution ... we hear within ourselves the
constant cry: There is the problem, seek the
solution. You can find it through pure thought...
D. Hilbert
The solution of three of Hilbert’s problems were to involve
mathematical logic and the foundation of mathematics in an essential
way; they are the ones numbered 1,2, and 10 in his list
Deciding the undecidable: Wrestling with Hilbert’s problems
S. Feferman
1. The continuum hypothesis
2. The consistency of arithmetic
10. The existence of an algorithm for solving Diophantine
equations
We will not discuss 1. and 2., except for their link with the decision problem
Hilbert's tenth problem
Diophantine equations:
P(x1, ..., xk) = 0, with P a polynomial with integer coefficients
Example:
It is possible to write a Diophantine equation that has integer solutions if
and only if the Riemann hypothesis is false.
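The negative answer to the tenth problem does not forbid searching for solutions: enumerating integer tuples by increasing size halts whenever a solution exists, but may run forever when none does. A minimal Python sketch of this semi-decision procedure; the `max_bound` cutoff is our own addition so that the example always terminates, and passing the polynomial as an ordinary Python function is an assumption of the sketch.

```python
from itertools import product


def search_diophantine(P, k, max_bound):
    """Look for integers x1..xk with P(x1, ..., xk) == 0, trying tuples in shells
    of increasing max |xi|. Without max_bound this would halt only if a solution exists."""
    for bound in range(max_bound + 1):
        for xs in product(range(-bound, bound + 1), repeat=k):
            if max((abs(x) for x in xs), default=0) != bound:
                continue  # this tuple was already tried with a smaller bound
            if P(*xs) == 0:
                return xs
    return None  # no solution with all |xi| <= max_bound (says nothing beyond it)


print(search_diophantine(lambda x, y: x * x + y * y - 25, 2, 10))
print(search_diophantine(lambda x: x * x - 2, 1, 50))  # None: sqrt(2) is not an integer
```

A `None` answer is never a proof that no solution exists; by Matiyasevich's theorem no algorithm can turn this search into a decision procedure for all equations.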
Contrary to Hilbert’s expectations, Problem 10 was eventually solved in the
negative. This was accomplished in 1970 by a young russian mathematician, Yuri
Matiyasevich, who built on earlier work in 1950’s and 1960’s by the American
logicians Martin Davis, Hilary Putnam, and Julia Robinson. [...]
Deciding the undecidable: Wrestling with Hilbert’s problems
S. Feferman
As early as 1920 it was suspected that problems like the previous one were
undecidable. But how can one prove that no algorithm exists?
The solution of the second problem:
the 1930 Königsberg symposium
During the days immediately preceding Hilbert’s address, a
symposium on the foundations of mathematics took place in
Königsberg. [...] At the round table discussion that concluded the
event, a shy young man named Kurt Gödel [...] made a quiet
announcement that, to those who grasped its import, signalled a new
era in foundational studies. Von Neumann got the point at once, and
concluded that the jig was up, that Hilbert’s program could not
succeed.
The Universal Computer
M. Davis
Hilbert's program
1. The consistency of arithmetic (Hilbert's second
problem)
2. The completeness of logic and of arithmetic (Gödel
1928)
3. The decision problem (Entscheidungsproblem)
Kurt Gödel
Born: 28 April 1906 in Brünn, Austria-Hungary (now Brno, Czech Republic)
Died: 14 Jan 1978 in Princeton, New Jersey, USA
The crucial step in Gödel’s proof was his demonstration that the property
of a natural number of being the code of a proposition provable in PM is
itself expressible in PM.
[...]
- U says that some particular proposition is not provable in PM.
- That particular proposition is none other than U itself.
- Therefore, U says: “U is not provable in PM.”
The Universal Computer
M. Davis
Gödel had written the first compiler and ...
sealed the end of Hilbert's program!
What remains of Hilbert's program?
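The “compiler” is Gödel's arithmetization: every formula becomes a natural number, so statements about formulas become statements about numbers. A minimal Python sketch of the classical prime-power coding (the i-th symbol becomes the exponent of the i-th prime); the symbol alphabet and its codes are hypothetical, chosen only for illustration.

```python
from itertools import count

# Hypothetical symbol codes; every code is >= 1 so each position leaves a trace.
SYMBOLS = {"0": 1, "S": 2, "=": 3, "+": 4, "(": 5, ")": 6}
DECODE = {code: sym for sym, code in SYMBOLS.items()}


def primes():
    """Yield 2, 3, 5, 7, ... by trial division against the primes found so far."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def godel_number(formula):
    """Encode the i-th symbol as the exponent of the i-th prime."""
    g = 1
    for p, sym in zip(primes(), formula):
        g *= p ** SYMBOLS[sym]
    return g


def decode(g):
    """Recover the formula by factoring out 2, 3, 5, ... in turn."""
    symbols = []
    for p in primes():
        if g == 1:
            break
        e = 0
        while g % p == 0:
            g //= p
            e += 1
        symbols.append(DECODE[e])
    return "".join(symbols)


n = godel_number("S0=S0")  # 2**2 * 3**1 * 5**3 * 7**2 * 11**1
print(n, decode(n))        # 808500 S0=S0
```

Unique factorization guarantees that decoding is unambiguous; efficiency is irrelevant here, only the fact that the coding and decoding are mechanical.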
Hilbert had also sought explicit calculational procedures by
means of which it would always be possible to determine, given some
premises and a proposed conclusion, written in the notation of what
has come to be called “first-order logic”, whether Frege’s rules
would enable that conclusion to be derived from those premises. The
task of finding such procedures came to be known as Hilbert’s
Entscheidungsproblem (literally: decision problem),
The Universal Computer
M. Davis
There were partial results, and the great young mathematicians were all active:
F. P. Ramsey, W. Ackermann, P. Bernays, M. Schönfinkel and Gödel himself
Apparently intrigued by these developments, Newman gave a
lecture course in the spring term of 1935 on the foundations
of mathematics featuring Gödel’s incompleteness theorem as
its climax. Attending this course, Turing learned about
Hilbert’s Entscheidungsproblem. Quite apart from the
incredulity of such as Hardy, after Gödel’s work it was hard
to believe that there could be an algorithm such as Hilbert
had wanted. Alan Turing began to think about how it could be
possible to prove that no such algorithm exists.
The Universal Computer
M. Davis
Now, if someone comes along with a proposed algorithm to
settle a given decision problem in a positive way, one can check
to see that it does the required work (or at least try to do so),
without inquiring into the general nature of what constitutes an
algorithm. But if it is to be shown that the problem is
undecidable, one has to have a precise explanation of what
algorithms can compute in general.
Deciding the undecidable: Wrestling with Hilbert’s problems
S. Feferman
Alan Turing
http://www.turing.org.uk/turing/
His high pitched voice already stood out above the general murmur of well-behaved
junior executives grooming themselves for promotion within the Bell corporation. Then
he was suddenly heard to say: "No, I'm not interested in developing a powerful brain.
All I'm after is just a mediocre brain, something like the President of the American
Telephone and Telegraph Company."
Quoted in A Hodges, Alan Turing the Enigma of Intelligence, (London 1983) 251.
[...] on the basis of Turing’s analysis of the notion of
computation, it is possible to conclude that anything
computable by any algorithmic process can be computed by a
Turing machine. So if we can prove that some particular task
can not be accomplished by a Turing machine, we can conclude
that no algorithmic process can accomplish that task. That is
how Turing proved that there is no algorithm for the
Entscheidungsproblem. In addition, Turing showed how to
produce one individual Turing machine that, all by itself, can
do anything that could be done by any Turing machine
whatever – a mathematical model of an all-purpose computer.
The Universal Computer
M. Davis
The diagonal method in
Turing's work
Now, if we think of the halting set of a Turing machine as constituting
a “package” and of the code number of that machine as labeling that
package, then we have exactly the typical setup for applying the
diagonal method: labeled packages in which the labels are exactly
the kind of thing in the packages – in this case, natural numbers.
The Universal Computer
M. Davis
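The diagonal step can be seen on a finite toy version: given any list of labelled packages (here, sets of natural numbers), the set of labels that are not in their own package differs from every package in the list. A minimal Python sketch; in Turing's setting the packages are the halting sets of machines, and the resulting set (codes of machines that do not halt on their own code) cannot be the halting set of any machine.

```python
def diagonal(packages):
    """Return D = {i : i not in packages[i]}.
    D disagrees with packages[i] on the element i, for every label i."""
    return {i for i, package in enumerate(packages) if i not in package}


packages = [{0, 1}, {2}, {1, 2, 3}, set()]
D = diagonal(packages)
print(D)                              # {1, 3}
print(all(D != p for p in packages))  # True
```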
Turing's universal machine
The universal machine also provides a model of a “stored
program” computer [...] in which the machine makes no
fundamental distinction between “program” and “data.”
Finally, the universal machine shows how “hardware” [...]
thought of as a description of the functioning of a mechanism,
can be replaced by equivalent “software” [...] “stored” on the
tape of a universal machine.
The Universal Computer
M. Davis
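The idea that a machine description is just data can be sketched with a small simulator: one Python function executes any Turing machine whose transition table it receives as input. This is a sketch in the spirit of the universal machine, not Turing's own encoding; the table format, the `max_steps` guard and the example machine (binary increment) are our assumptions.

```python
def run(table, tape_input, start="q0", halt="halt", blank="_", max_steps=10_000):
    """Simulate a one-tape Turing machine described by `table`, which maps
    (state, symbol) -> (new_state, symbol_to_write, move) with move in {-1, +1}."""
    tape = dict(enumerate(tape_input))  # sparse tape: missing cells are blank
    state, head, steps = start, 0, 0
    while state != halt:
        if steps == max_steps:
            raise RuntimeError("step limit reached: the machine may never halt")
        symbol = tape.get(head, blank)
        state, tape[head], move = table[(state, symbol)]
        head += move
        steps += 1
    if not tape:
        return ""
    cells = range(min(tape), max(tape) + 1)
    return "".join(tape.get(i, blank) for i in cells).strip(blank)


# Example "program" (data, not code): add 1 to a binary number.
INCREMENT = {
    ("q0", "0"): ("q0", "0", +1),        # scan right to the end of the number
    ("q0", "1"): ("q0", "1", +1),
    ("q0", "_"): ("carry", "_", -1),     # step back onto the last digit
    ("carry", "1"): ("carry", "0", -1),  # 1 + carry = 0, the carry moves left
    ("carry", "0"): ("halt", "1", -1),   # 0 + carry = 1, done
    ("carry", "_"): ("halt", "1", -1),   # ran off the left end: new leading 1
}

print(run(INCREMENT, "1011"))  # 1100
```

Changing the machine means changing the dictionary, never the `run` function: the “hardware” stays fixed while the “software” is stored as data.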
“On Computable Numbers, with an Application to the Entscheidungsproblem”
A. Turing, Proc. of the London Mathematical Society, 1937
Turing’s universal computer was a marvelous conceptual device that
all by-itself could execute any algorithmic task. But could one actually
build such a thing? And aside from what such a machine could
accomplish “in principle,” could it be designed and constructed so as
to be able to solve real world problems in an acceptable time frame,
and using reasonable available resources?
By the end of 1945, Turing had produced his remarkable ACE
(Automatic Computing Engine) Report. One detailed comparison
of the ACE Report with von Neumann's EDVAC Report, notes that
whereas the latter ``is a draft and is unfinished … more important
… is incomplete …'' the ACE Report ``is a complete description of
a computer, right down to the logical circuit diagrams'' and even
including ``a cost estimate of £11,200.''
The Universal Computer
M. Davis
ACE: Turing's (English) answer
to EDVAC
[It] is … very contrary to the line of development
here, and much more in the American tradition of
solving one's difficulties by means of much
equipment rather than by thought.
… Furthermore certain operations which we regard
as more fundamental than addition and
multiplication have been omitted.
Alan Turing
Algorithmic problems in computational
biology
Astronomy began when the Babylonians mapped
the heavens. Our descendants will certainly not
say that biology began with today’s genome
projects, but they may well recognize that a great
acceleration in the accumulation of biological
knowledge began in our era. To make sense of this
knowledge is a challenge, and will require
increased understanding of the biology of cells
and organisms. But part of the challenge is simply
to organise, classify and parse the immense
richness of sequence data.
Biological sequence analysis
R. Durbin, S. Eddy, A. Krogh and G. Mitchison
A bit of history
• 1953: F. Crick and J. Watson discover the double-helix
structure of DNA
• 1970s: techniques are developed for
sequencing fragments of DNA (F. Sanger)
• 1980s: the genome project is launched and
the first pilot experiments begin (together
with the first companies for the commercial
exploitation of this research)
• 1990s: the first organisms are sequenced
(a few million base pairs)
• 1990: BLAST is published
• 1998: C. Venter announces the founding of the
private company Celera and challenges the public
consortium for the sequencing of the human
genome: Celera would obtain the result in 3 years (and
300 million dollars)
http://www.accessexcellence.org/AB/
Human Genome Working Draft Sequence
published February 15 & 16, 2001
Science and Nature
Behind the challenge:
Two main shotgun-sequencing strategies.
Clone-by-clone shotgun sequencing
Whole-genome shotgun sequencing
Programs and algorithms in bioinformatics
[...] Yet other programs provide user-friendly viewers for inspection
and editing of the resulting sequence assemblies. A particularly popular
suite of programs for these various steps is Phred, Phrap and
Consed, which are designed for base calling, sequence assembly and
the viewing of sequence assemblies, respectively. [...]
Strategies for the systematic sequencing of complex genomes
Eric D. Green
(21 occurrences of the word “programs”, 2 of the word “algorithms”)
Programs and algorithms in the challenge
Finally, perhaps the most essential element of any whole-genome
shotgun-sequencing strategy is the availability of a robust assembly
program that can accommodate the inevitably large collection of
sequence reads. [...] include algorithms that account for the anticipated
spatial relationship of read pairs emanating from individual subclones,
which help to avoid misassemblies due to repetitive sequences.
Strategies for the systematic sequencing of complex genomes
Eric D. Green
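The core task of an assembly program can be illustrated with a toy greedy strategy: repeatedly merge the two reads with the longest suffix/prefix overlap. A minimal Python sketch under strong simplifying assumptions (error-free reads, a single strand, no read-pair information); it is not the algorithm used by Phrap or by any production assembler, and repeats longer than the overlaps are exactly where such a greedy scheme misassembles, which is why the read-pair constraints mentioned in the quote matter.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a equal to a prefix of b,
    or 0 if that length is below min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Toy greedy assembler: returns a list of contigs (one if all reads chain up)."""
    reads = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Reads contained in longer reads add no information to the layout.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no sufficient overlap left: several contigs remain
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```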
How did the challenge end?
Sequence alignment
Among the most useful computer-based tools in
modern biology are those that involve sequence
alignments of proteins, since these alignments often
provide insights into gene and protein function. There
are several types of alignments: global alignments of
pairs of proteins, multiple alignments of members of
protein families, and alignments made during data
base searches to detect homologies.
S. Henikoff and J. G. Henikoff PNAS 1992
What is an alignment?
Input:
GTTGATTAGCTTATCCCAAAGCAAGGCACTGAAAATGCTAGAT
GTGATGTAGCTTAACCCAAGCAAGGCACTAAAAATGCCTAGAT
Output:
GTTGAT_TAGCTTATCCCAAAGCAAGGCACTGAAAATG_CTAGAT
GT_GATGTAGCTTAACCCAA_GCAAGGCACTAAAAATGCCTAGAT
Algorithms
• Needleman-Wunsch 1970
• Smith-Waterman 1981
• Landau-Vishkin 1986
• Wu-Manber 1992
• Myers 1994
• Chang-Lawler 1994
• ...
[Figure: the dynamic-programming table of edit distances between prefixes of the two input sequences; following its optimal path back from the last cell yields the output alignment shown above]
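Alignments like the one above are computed by dynamic programming. A minimal Python sketch of global alignment with unit costs (every mismatch, insertion or deletion costs 1), i.e. edit distance computed in the Needleman-Wunsch style, using `_` for gaps as on the slide; real tools use substitution matrices and affine gap penalties instead, and the traceback returns just one of possibly several optimal alignments.

```python
def align(a: str, b: str, gap: str = "_") -> tuple[int, str, str]:
    """Unit-cost global alignment: returns (cost, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # d[i][j] = minimum cost of aligning a[:i] with b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,  # match or mismatch
                          d[i - 1][j] + 1,        # gap in b
                          d[i][j - 1] + 1)        # gap in a
    # Trace back from the bottom-right cell to rebuild one optimal alignment.
    row_a, row_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            row_a.append(a[i - 1]); row_b.append(gap); i -= 1
        else:
            row_a.append(gap); row_b.append(b[j - 1]); j -= 1
    return d[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


s1 = "GTTGATTAGCTTATCCCAAAGCAAGGCACTGAAAATGCTAGAT"
s2 = "GTGATGTAGCTTAACCCAAGCAAGGCACTAAAAATGCCTAGAT"
cost, r1, r2 = align(s1, s2)
print(cost)
print(r1)
print(r2)
```

Filling the table takes time proportional to the product of the two lengths; several of the later algorithms in the list above were designed to beat this bound when the sequences are similar.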
Other related algorithmic problems
• exact matching (an “older” and perhaps less
“applied” problem, whose solution algorithms
have turned out to be
fundamental)
• data structures (it does not pay to represent
sequences in memory as strings, but rather
as systems of indices for all the possible
suffixes of the sequence)
• protein folding (a nice NP-complete problem that biologists gave us)
• ...
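The “system of indices for all the possible suffixes” can be sketched with a suffix array: the starting positions of all suffixes, sorted lexicographically, so that every exact occurrence of a pattern lies in one contiguous block found by binary search. A minimal Python sketch; the construction here is the naive sort (roughly O(n² log n) time and quadratic memory for the slices), whereas practical tools rely on specialised, much faster constructions and compressed indices.

```python
def suffix_array(s):
    """Start positions of all suffixes of s, in lexicographic order (naive sort)."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def occurrences(s, sa, p):
    """All start positions of pattern p in s, via two binary searches on sa."""
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix >= p
        mid = (lo + hi) // 2
        if s[sa[mid]:] < p:
            lo = mid + 1
        else:
            hi = mid
    first, hi = lo, len(sa)
    while lo < hi:  # first suffix whose first len(p) characters exceed p
        mid = (lo + hi) // 2
        if s[sa[mid]:sa[mid] + len(p)] <= p:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])


genome = "GTTGATTAGCTTATCCCAAAGCAAGGCACTGAAAATGCTAGAT"
sa = suffix_array(genome)
print(occurrences(genome, sa, "GAT"))  # [3, 40]
```

Once the index is built, each query costs only O(|p| log n) character comparisons, independently of how many times the sequence is searched.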
Concluding reflections
• The decision problem may have been hard, but it was
stated clearly and precisely. Mathematically
“clean”.
• Algorithmic problems in computational biology are not
always equally “clean” (perhaps the more interesting they are,
the “dirtier” they are).
• What does the complexity of an algorithmic problem
really consist of?
Complexity: the resources we
have are finite
Mathematics and Computer Science: Coping with Finiteness
Advances in our ability to compute are bringing us substantially
closer to ultimate limitations
D. Knuth
What (computational) resources do we have?
[Figure: the universe, about 40 billion light years across, compared with a proton, about 10^-13 cm across]
10^125 ≥ the number of protons in the universe
If we take as the unit of time the time light needs to travel 10^-13 cm, and
assume that the universe was born 10 billion years ago, the number of time
units elapsed so far is at most
10^42
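Both bounds can be checked with a few lines of arithmetic. A sketch assuming a cube-shaped universe 40 billion light years on a side filled with proton-sized cells of 10^-13 cm, and an age of 10 billion years; the physical constants are rounded standard values.

```python
import math

CM_PER_LIGHT_YEAR = 9.461e17  # centimetres in one light year
SECONDS_PER_YEAR = 3.156e7
LIGHT_SPEED_CM_S = 2.998e10   # speed of light in cm/s
PROTON_CM = 1e-13             # size of a proton, as on the slide

# Upper bound on protons: how many proton-sized cubes fit in the universe.
universe_cm = 40e9 * CM_PER_LIGHT_YEAR
cells = (universe_cm / PROTON_CM) ** 3

# Time unit: the time light needs to cross one proton.
tick_s = PROTON_CM / LIGHT_SPEED_CM_S
ticks = 10e9 * SECONDS_PER_YEAR / tick_s

print(f"proton-sized cells: about 10^{math.log10(cells):.1f}")  # below 10^125
print(f"elapsed time units: about 10^{math.log10(ticks):.1f}")  # below 10^42
```

Any computation whose number of steps exceeds roughly 10^42, or whose memory exceeds roughly 10^125 cells, is therefore out of reach no matter how the hardware improves.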
What “hopes” do we have?
• snail 0.0006 miles/h
• man 4 miles/h
• US auto 55 miles/h
• Jet 600 miles/h
• Supersonic jet 1200 miles/h

• man (pencil) 0.2/sec
• man (abacus) 1/sec
• calculator 4/sec
• computer 200,000/sec
• fast computer 2M/sec
Grid problem: compute the number of paths from start to finish that never pass
through the same point twice
[Figure: a grid with start and finish at opposite corners]
The problem is hard
• no methods are known to compute the number of
paths (in a reasonable amount of time)
• we can still generate random paths and
use a theorem from statistics which tells us that the
mean of the reciprocals of the observed path probabilities
is an unbiased estimate of the number of paths
• we obtain a huge estimate: (1.6 ± 0.3) × 10^24
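The estimation method above can be sketched as follows: walk at random from start, choosing uniformly among the unvisited neighbouring points, and return the product of the numbers of choices made (the reciprocal of the probability of that walk) if finish is reached, or 0 at a dead end; the average of these values is an unbiased estimate of the number of paths. A minimal Python sketch, assuming the paths are self-avoiding paths along the grid lines between opposite corners of a small n × n grid of points, where an exhaustive count is still feasible for comparison; grid sizes and sample counts are our choices.

```python
import random


def neighbours(point, n):
    """Grid points adjacent (up/down/left/right) to point on an n x n grid."""
    x, y = point
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < n and 0 <= y + dy < n:
            yield (x + dx, y + dy)


def count_paths(n):
    """Exact number of self-avoiding paths from (0, 0) to (n-1, n-1), by exhaustive search."""
    finish, visited = (n - 1, n - 1), {(0, 0)}

    def extend(point):
        if point == finish:
            return 1
        total = 0
        for q in neighbours(point, n):
            if q not in visited:
                visited.add(q)
                total += extend(q)
                visited.remove(q)
        return total

    return extend((0, 0))


def estimate_paths(n, samples, rng):
    """Average of 1/p over random self-avoiding walks (0 for walks that get stuck)."""
    finish, total = (n - 1, n - 1), 0
    for _ in range(samples):
        point, visited, weight = (0, 0), {(0, 0)}, 1
        while point != finish:
            free = [q for q in neighbours(point, n) if q not in visited]
            if not free:         # dead end: this walk contributes 0
                weight = 0
                break
            weight *= len(free)  # the chosen step had probability 1/len(free)
            point = rng.choice(free)
            visited.add(point)
        total += weight
    return total / samples


print(count_paths(4))                               # 184
print(estimate_paths(4, 20_000, random.Random(0)))  # a random estimate of the same count
```

The exhaustive `count_paths` quickly becomes infeasible as n grows, while the cost of each random walk grows only with the length of the walk.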
A problem that is simple (to state) and “clean”, but ...
we cannot even rely on an exhaustive procedure
to enumerate the paths!
Is the problem of establishing any property of
paths on the grid algorithmically tractable?
Perhaps we need a theory of algorithmic
complexity that lets us
classify this as a hard problem
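A back-of-the-envelope calculation shows why exhaustive enumeration is hopeless: listing about 1.6 × 10^24 paths at the 2 million per second of the “fast computer” from the speed comparison. A sketch under the optimistic assumption that producing one path costs a single operation.

```python
PATHS = 1.6e24            # estimated number of paths (central value)
RATE = 2e6                # paths per second, assuming one path per operation
SECONDS_PER_YEAR = 3.156e7

years = PATHS / RATE / SECONDS_PER_YEAR
print(f"{years:.1e} years")  # 2.5e+10 years
```

That is more than twice the 10 billion years assumed for the age of the universe, before accounting for the real cost of generating each path.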
Conclusions
Algorithmic problems form the backbone
of computer science, and their solutions require a
genuine and distinctive (mathematical) effort
Algorithmic problems have turned out to be “behind
the scenes” at crucial moments of scientific
progress
Complexity, and an adequate theory for
studying it, is probably the most interesting of the
current algorithmic challenges
My favorite way to describe computer science
is to say that it is the study of algorithms.
D. Knuth