Statistical inference

Hypothesis testing
Part VI
Definition (System of hypotheses)
Within a parametric statistical model, a system of statistical
hypotheses consists of two mutually exclusive conjectures about the
unknown parameter θ ∈ Θ:
Null hypothesis
H0 : θ ∈ Θ0
Alternative hypothesis
H1 : θ ∈ Θ1
where Θ0, Θ1 ⊂ Θ and Θ0 ∩ Θ1 = ∅. H0 is called the null
hypothesis; H1 is called the alternative hypothesis.
Some systems of hypotheses (Θ ⊆ ℝ)
Simple null hypothesis against a simple alternative:
H0 : θ = θ0
H1 : θ = θ1
Simple null hypothesis against a composite (two-sided) alternative:
H0 : θ = θ0
H1 : θ ≠ θ0
Composite (one-sided) null hypothesis against a composite (one-sided)
alternative:
H0 : θ ≤ θ0 (or θ ≥ θ0)
H1 : θ > θ0 (or θ < θ0)
Purpose
Given a system of hypotheses, the purpose of hypothesis testing is to
decide, on the basis of the sample information, whether to accept the
null hypothesis or to reject it.
In essence, we must establish whether the observed sample leads us to
regard H0 as true or as false.
Definition (Test)
A test is any rule that lets us establish whether the observed sample
leads us to accept or to reject the null hypothesis H0.
Definition (Type I and type II errors)
A type I error is the error made when H0 is rejected although it is
true.
A type II error is the error made when H0 is accepted although it is
false.
A test is generally based on a statistic, called the test statistic:
T : 𝒴 → ℝ
T defines a partition of the sample space 𝒴 such that:
If T ∈ A ⊂ ℝ, y appears consistent with H0 and the null hypothesis is
accepted.
If T ∈ R ⊂ ℝ, y does not appear consistent with H0 and the null
hypothesis is rejected.
Here A (acceptance region) and R (rejection region) are disjoint
subsets of ℝ: A ∩ R = ∅.
T ∈ A does not mean that H0 is true.
T ∈ R does not mean that H0 is false.
[Figure: the sample space 𝒴 split into the rejection region, where T(y) ∈ R, and the acceptance region, where T(y) ∈ A]
Definition (Error probabilities)
Probability of a type I error:
α(θ) = P(T(Y) ∈ R; H0)
Probability of a type II error:
β(θ) = P(T(Y) ∈ A; H1)
Definition (Power function of a test)
Let T be a test statistic for a system of hypotheses. The function
γ(θ) = P(T(Y) ∈ R; θ)
is called the power function of the test based on the statistic T.
Definition (Significance level of a test)
Let T be a test statistic for a system of hypotheses. The probability
$$\alpha = \sup_{\theta \in \Theta_0} \gamma(\theta),$$
which is the maximum probability of a type I error, is called the
significance level of the test based on the statistic T.
Example
Statistical model: Y ∼ N(θ, 1)
System of hypotheses:
H0 : θ ≤ 0
H1 : θ > 0
Sample size: n = 1
First test: accept H0 if y < 0.5.
Second test: accept H0 if y < z_{0.95} ≈ 1.645.
Figure 16: Comparison between the two tests. [Plot: power functions γ(θ) of test 1 and test 2 for θ between −4 and 4, with the significance levels α1 and α2 marked on the vertical axis.]
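The power functions of the two tests in the example can be evaluated directly, since each test rejects H0 when y is at least its cutoff and Y ∼ N(θ, 1). The sketch below is only an illustration: the names `power`, `cutoff_1` and `cutoff_2` are hypothetical, and it uses only Python's standard library.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: gives Phi and its inverse


def power(theta, cutoff):
    """gamma(theta) = P(Y >= cutoff; theta) for a single Y ~ N(theta, 1)."""
    return 1.0 - STD_NORMAL.cdf(cutoff - theta)


cutoff_1 = 0.5                       # test 1: accept H0 if y < 0.5
cutoff_2 = STD_NORMAL.inv_cdf(0.95)  # test 2: accept H0 if y < z_0.95

# gamma is increasing in theta, so its sup over Theta0 = (-inf, 0]
# is attained at theta = 0: that value is the significance level.
alpha_1 = power(0.0, cutoff_1)
alpha_2 = power(0.0, cutoff_2)

for theta in (-2.0, 0.0, 1.0, 2.0, 4.0):
    print(f"theta={theta:+.1f}  test 1: {power(theta, cutoff_1):.4f}"
          f"  test 2: {power(theta, cutoff_2):.4f}")
print(f"alpha_1={alpha_1:.4f}  alpha_2={alpha_2:.4f}")
```

Test 1 is more powerful at every θ > 0, but only because it tolerates a much larger probability of a type I error; the two tests are not comparable at the same level.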
A very simple system of hypotheses
H0 : θ = θ0
H1 : θ = θ1        (3)
An interesting statistic:
$$\lambda^*(y) = \frac{L(\theta_0; y)}{L(\theta_1; y)}$$
is called the likelihood ratio.
We accept H0 when λ* is large; otherwise we reject the null
hypothesis.
In 𝒴, the rejection region of the test is defined as
$$\mathcal{Y}_R = \{ y \in \mathcal{Y} : \lambda^*(y) \le \lambda_\alpha \} \qquad (4)$$
λα is called the critical value and is chosen so that
$$P(\lambda^*(Y) \le \lambda_\alpha; \theta_0) = \alpha \qquad (5)$$
Lemma (Neyman-Pearson lemma)
Given a parametric model G and a system of hypotheses of type (3),
the test (4) has the highest power among all tests with level less
than or equal to α, with α defined in (5).
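As a concrete illustration of (4) and (5), consider a single observation from N(θ, 1) with θ1 > θ0. The sketch below assumes this normal model and the values θ0 = 0, θ1 = 1, α = 0.05, none of which come from the slides; the function name `likelihood_ratio` is hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def likelihood_ratio(y, theta0, theta1):
    """lambda*(y) = L(theta0; y) / L(theta1; y) for one draw from N(theta, 1)."""
    return math.exp(-0.5 * (y - theta0) ** 2 + 0.5 * (y - theta1) ** 2)


theta0, theta1, alpha = 0.0, 1.0, 0.05  # illustrative values (assumption)

# With theta1 > theta0 the ratio is strictly decreasing in y, so
# {lambda*(y) <= lambda_alpha} = {y >= c}, with P(Y >= c; theta0) = alpha.
c = theta0 + STD_NORMAL.inv_cdf(1 - alpha)
lambda_alpha = likelihood_ratio(c, theta0, theta1)

level = 1 - STD_NORMAL.cdf(c - theta0)  # P(Y >= c; theta0)
power = 1 - STD_NORMAL.cdf(c - theta1)  # P(Y >= c; theta1)
print(f"c={c:.4f}  lambda_alpha={lambda_alpha:.4f}"
      f"  level={level:.4f}  power={power:.4f}")
```

Because the likelihood ratio is monotone in y here, the critical value λα can be found through the equivalent cutoff c on y rather than from the distribution of λ* itself.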
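Neyman-Pearson lemma: proof (discrete Y)
Suppose there is another rejection region 𝒴* ⊆ 𝒴 whose level does
not exceed α:
$$\alpha = \sum_{y \in \mathcal{Y}_R} f(y; \theta_0) \ge \sum_{y \in \mathcal{Y}^*} f(y; \theta_0)$$
Hence, dropping the terms common to both sums,
$$\sum_{y \in \mathcal{Y}_R \setminus \mathcal{Y}^*} f(y; \theta_0) \ge \sum_{y \in \mathcal{Y}^* \setminus \mathcal{Y}_R} f(y; \theta_0)$$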
If $y \in \mathcal{Y}_R \setminus \mathcal{Y}^* \subseteq \mathcal{Y}_R$,
$$f(y; \theta_0) \le \lambda_\alpha f(y; \theta_1)$$
If $y \in \mathcal{Y}^* \setminus \mathcal{Y}_R \subseteq \mathcal{Y}_R^c$,
$$f(y; \theta_0) > \lambda_\alpha f(y; \theta_1)$$
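Then,
$$\lambda_\alpha \sum_{y \in \mathcal{Y}_R \setminus \mathcal{Y}^*} f(y; \theta_1) \ge \sum_{y \in \mathcal{Y}^* \setminus \mathcal{Y}_R} f(y; \theta_0) \ge \lambda_\alpha \sum_{y \in \mathcal{Y}^* \setminus \mathcal{Y}_R} f(y; \theta_1)$$
Dividing by λα and adding the term
$\sum_{y \in \mathcal{Y}^* \cap \mathcal{Y}_R} f(y; \theta_1)$ to both sides gives
$$\sum_{y \in \mathcal{Y}_R} f(y; \theta_1) \ge \sum_{y \in \mathcal{Y}^*} f(y; \theta_1)$$
that is, the test with rejection region 𝒴_R is at least as powerful as
the one with rejection region 𝒴*.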
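The statement of the lemma can be checked by brute force in a small discrete case. The sketch below assumes a Binomial(5, θ) model with θ0 = 0.5 and θ1 = 0.8 (illustrative values, not from the slides): it builds the likelihood-ratio region, then scans every subset of the sample space with level not exceeding α and records the best power found.

```python
from itertools import combinations
from math import comb


def pmf(y, n, theta):
    """Binomial(n, theta) probability mass function."""
    return comb(n, y) * theta**y * (1 - theta) ** (n - y)


n, theta0, theta1 = 5, 0.5, 0.8  # illustrative values (assumption)
support = range(n + 1)

# Likelihood-ratio region: points where L(theta0)/L(theta1) is smallest.
ratio = {y: pmf(y, n, theta0) / pmf(y, n, theta1) for y in support}
lambda_alpha = ratio[4]  # critical value chosen so that the region is {4, 5}
np_region = {y for y in support if ratio[y] <= lambda_alpha}
alpha = sum(pmf(y, n, theta0) for y in np_region)
np_power = sum(pmf(y, n, theta1) for y in np_region)

# Brute force over every candidate rejection region with level <= alpha.
best_power = 0.0
for size in range(n + 2):
    for region in combinations(support, size):
        level = sum(pmf(y, n, theta0) for y in region)
        if level <= alpha + 1e-12:  # small slack for rounding error
            best_power = max(best_power,
                             sum(pmf(y, n, theta1) for y in region))

print(sorted(np_region), alpha, np_power, best_power)
```

No competing region beats the likelihood-ratio region, as the lemma predicts; the enumeration is feasible only because the sample space here has six points.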
A two-sided system of hypotheses
H0 : θ = θ0
H1 : θ ≠ θ0        (6)
Likelihood ratio
$$\lambda(y) = \lambda = \frac{L(\theta_0; y)}{\sup_{\theta \ne \theta_0} L(\theta; y)}$$
and, if L(θ; y) is continuous in θ,
$$\lambda = \frac{L(\theta_0; y)}{\sup_{\theta \in \Theta} L(\theta; y)} = \frac{L(\theta_0; y)}{L(\hat\theta; y)}$$
At significance level α we reject H0 for small values of λ:
R = {y : λ(y) ≤ λα}
A statistic equivalent to λ(y):
W(y) = −2 log λ(y) = −2(l(θ0) − l(θ̂))
where l(θ) = log L(θ; y) is the log-likelihood.
W is a strictly decreasing function of λ.
At significance level α we reject H0 for large values of W:
R = {y : W(y) ≥ wα}, with wα = −2 log λα
In a regular estimation problem, with a system of hypotheses of type
(6), when H0 is true W is asymptotically distributed as
$$W(Y) \overset{a}{\sim} \chi^2_1,$$
so that, at significance level α,
$$R = \{ y : W(y) \ge \chi^2_{1, 1-\alpha} \}$$
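A minimal sketch of this asymptotic test, assuming a Poisson model with mean θ and made-up counts (neither is in the slides). The χ²₁ quantile is obtained from the standard normal, since the square of a N(0, 1) variable is χ²₁; the helper names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def chi2_1_quantile(p):
    """Quantile of chi-square(1): P(Z^2 <= q) = 2*Phi(sqrt(q)) - 1."""
    return NormalDist().inv_cdf((1 + p) / 2) ** 2


def poisson_loglik(theta, y):
    """Poisson log-likelihood, up to a constant that does not depend on theta."""
    return sum(y) * math.log(theta) - len(y) * theta


y = [3, 5, 2, 4, 6, 3, 4, 5]  # made-up counts
theta0, alpha = 3.0, 0.05     # H0: theta = 3 (illustrative)
theta_hat = fmean(y)          # maximum likelihood estimate of the mean

w = -2 * (poisson_loglik(theta0, y) - poisson_loglik(theta_hat, y))
critical = chi2_1_quantile(1 - alpha)
print(f"theta_hat={theta_hat:.3f}  W={w:.4f}  critical={critical:.4f}"
      f"  reject H0: {w >= critical}")
```

With only eight observations the χ²₁ approximation is rough; the example is meant to show the mechanics, not to endorse the approximation at this sample size.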
Two test statistics asymptotically equivalent to W
Wald test statistic
$$W_e(y) = (\hat\theta - \theta_0)^2 \, I(\hat\theta)$$
Score test statistic
$$W_u(y) = \frac{l'(\theta_0)^2}{I(\theta_0)}$$
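The three statistics can be compared on a Bernoulli sample with s successes in n trials. This is a sketch under stated assumptions: the data (n = 50, s = 32) are made up, and I(θ) is taken to be the expected Fisher information of the whole sample, n / (θ(1 − θ)), which the slides do not spell out.

```python
import math


def loglik(theta, s, n):
    """Bernoulli log-likelihood for s successes in n trials."""
    return s * math.log(theta) + (n - s) * math.log(1 - theta)


def score(theta, s, n):
    """First derivative l'(theta) of the log-likelihood."""
    return s / theta - (n - s) / (1 - theta)


def fisher_info(theta, n):
    """Expected Fisher information of the sample (assumed meaning of I)."""
    return n / (theta * (1 - theta))


n, s, theta0 = 50, 32, 0.5  # made-up data and null value
theta_hat = s / n           # maximum likelihood estimate

w_lr = -2 * (loglik(theta0, s, n) - loglik(theta_hat, s, n))
w_wald = (theta_hat - theta0) ** 2 * fisher_info(theta_hat, n)
w_score = score(theta0, s, n) ** 2 / fisher_info(theta0, n)
print(f"W={w_lr:.4f}  Wald={w_wald:.4f}  score={w_score:.4f}")
```

The Wald statistic evaluates the information at θ̂ and the score statistic at θ0, which is why the latter needs no estimate under the alternative.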
General formulation
H0 : θ ∈ Θ0
H1 : θ ∈ Θ1
$$\lambda(y) = \lambda = \frac{\sup_{\theta \in \Theta_0} L(\theta; y)}{\sup_{\theta \in \Theta} L(\theta; y)} = \frac{L(\hat\theta_0; y)}{L(\hat\theta; y)} \le 1$$
At significance level α we reject H0 for small values of λ:
R = {y : λ(y) ≤ λα}
where λα is such that
$$\sup_{\theta \in \Theta_0} P(\lambda(Y) \le \lambda_\alpha; \theta) \le \alpha$$
As in the two-sided case, a statistic equivalent to λ(y) is
W(y) = −2 log λ(y) = −2(l(θ̂0) − l(θ̂))
W is a strictly decreasing function of λ.
At significance level α we reject H0 for large values of W:
R = {y : W(y) ≥ wα}, with wα = −2 log λα
Observed significance level (or p-value)
$$\hat\alpha = \sup_{\theta \in \Theta_0} P(W(Y) \ge W(y); \theta)$$
It is the minimum significance level at which H0 would be rejected.
It acts as a rough measure of the distance between the empirical
evidence and the alternative hypothesis: the larger α̂, the less the
data point towards H1.
If α̂ ≥ α, H0 is accepted.
If α̂ < α, H0 is rejected.
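The decision rule can be sketched for the earlier normal example, Y ∼ N(θ, 1) with H0 : θ ≤ 0 and a single observation. As an assumption made for simplicity, the sketch uses T(y) = y (rejecting for large values) in place of W; the function names are hypothetical.

```python
from statistics import NormalDist


def p_value(y_obs):
    """Observed significance level for H0: theta <= 0, rejecting for large y.

    P(Y >= y_obs; theta) increases with theta, so the sup over
    theta <= 0 is attained at theta = 0.
    """
    return 1.0 - NormalDist(mu=0.0, sigma=1.0).cdf(y_obs)


def decide(y_obs, alpha):
    """Apply the rule: reject H0 when the p-value is below alpha."""
    return "reject H0" if p_value(y_obs) < alpha else "accept H0"


for y_obs in (0.0, 1.0, 2.0):
    print(f"y={y_obs:.1f}  p-value={p_value(y_obs):.4f}  {decide(y_obs, 0.05)}")
```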
Y ∼ N(µ, σ²), mean and variance unknown. Test on the mean
System of hypotheses:
H0 : µ = µ0
H1 : µ ≠ µ0
Under H0 the maximum likelihood estimate of σ² is:
$$\hat\sigma_0^2 = \frac{\sum_{i=1}^n (y_i - \mu_0)^2}{n}$$
L(θ) is maximized over the whole of Θ by:
$$\hat\mu = \bar y \quad \text{and} \quad \hat\sigma^2 = s^2 = \frac{\sum_{i=1}^n (y_i - \bar y)^2}{n}$$
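Hence,
$$\begin{aligned}
\lambda(y) = \frac{L(\mu_0, \hat\sigma_0^2)}{L(\hat\mu, \hat\sigma^2)}
&= \left( \frac{\sum_{i=1}^n (y_i - \mu_0)^2}{\sum_{i=1}^n (y_i - \bar y)^2} \right)^{-n/2} \\
&= \left( \frac{\sum_{i=1}^n (y_i - \bar y)^2 + n(\bar y - \mu_0)^2}{\sum_{i=1}^n (y_i - \bar y)^2} \right)^{-n/2} \\
&= \left( 1 + \frac{t^2}{n-1} \right)^{-n/2}
\end{aligned}$$
a monotone decreasing function of |t|, with
$$t = \frac{\sqrt{n}\,(\bar y - \mu_0)}{s^*}$$
where s*² = Σ(y_i − ȳ)² / (n − 1) is the corrected sample variance.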
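The identity between the likelihood ratio and the t statistic can be checked numerically. The sketch below uses made-up data and a hypothetical helper `normal_loglik`; it computes λ once from the two maximized likelihoods and once from t.

```python
import math
from statistics import fmean, stdev


def normal_loglik(y, mu, sigma2):
    """Log-likelihood of an i.i.d. N(mu, sigma2) sample."""
    n = len(y)
    rss = sum((v - mu) ** 2 for v in y)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)


y = [4.8, 5.6, 5.1, 6.3, 4.9, 5.8, 5.4]  # made-up observations
mu0 = 5.0                                # null value (illustrative)
n = len(y)
ybar = fmean(y)
sigma2_0 = sum((v - mu0) ** 2 for v in y) / n     # MLE of sigma^2 under H0
sigma2_hat = sum((v - ybar) ** 2 for v in y) / n  # unrestricted MLE

lam_direct = math.exp(normal_loglik(y, mu0, sigma2_0)
                      - normal_loglik(y, ybar, sigma2_hat))

t = math.sqrt(n) * (ybar - mu0) / stdev(y)  # stdev divides by n - 1, i.e. s*
lam_from_t = (1 + t**2 / (n - 1)) ** (-n / 2)
print(f"t={t:.4f}  lambda (direct)={lam_direct:.6f}"
      f"  lambda (from t)={lam_from_t:.6f}")
```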
But, if H0 is true,
$$T(Y) = \frac{\sqrt{n}\,(\bar Y - \mu_0)}{S^*} \sim t_{n-1}$$
and
$$\alpha = P(|T| > t_{n-1, 1-\alpha/2}; H_0)$$
so that, at significance level α,
$$R = \{ y : |t(y)| > t_{n-1, 1-\alpha/2} \}$$
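A Monte Carlo check of this rejection region, offered as a rough sketch: it assumes n = 10 and α = 0.05, takes the critical value t_{9, 0.975} ≈ 2.262 from standard tables, and simulates normal samples with illustrative parameter values. All function names are hypothetical.

```python
import math
import random

T_CRIT_9 = 2.262  # t_{9, 0.975} from standard tables (n = 10, alpha = 0.05)


def t_statistic(y, mu0):
    """t = sqrt(n) * (ybar - mu0) / s*, with s* using the n - 1 divisor."""
    n = len(y)
    ybar = sum(y) / n
    s_star = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))
    return math.sqrt(n) * (ybar - mu0) / s_star


def rejection_rate(mu, sigma, mu0, n=10, reps=20_000, seed=1):
    """Fraction of simulated N(mu, sigma^2) samples with |t| > critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(t_statistic(sample, mu0)) > T_CRIT_9:
            rejections += 1
    return rejections / reps


level_estimate = rejection_rate(mu=3.0, sigma=2.0, mu0=3.0)  # H0 true
power_estimate = rejection_rate(mu=4.5, sigma=2.0, mu0=3.0)  # H0 false
print(level_estimate, power_estimate)
```

When H0 is true the estimated rejection rate should sit near α whatever the value of σ, because the distribution of T does not depend on the unknown variance.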
It is not difficult to show that if
H0 : µ ≤ µ0
H1 : µ > µ0
then, at significance level α,
$$R = \{ y : t(y) > t_{n-1, 1-\alpha} \}$$
X ∼ N(µ1, σ²), Y ∼ N(µ2, σ²), means and common variance unknown.
Comparison of means
Two samples of sizes n1 and n2, from X and from Y respectively.
System of hypotheses:
H0 : µ1 = µ2, equivalent to H0 : µ1 − µ2 = 0
H1 : µ1 ≠ µ2, equivalent to H1 : µ1 − µ2 ≠ 0
Maximum likelihood estimates under H0:
$$\hat\mu_0 = \frac{\sum_{i=1}^{n_1} x_i + \sum_{j=1}^{n_2} y_j}{n_1 + n_2}, \qquad \hat\sigma_0^2 = \frac{\sum_{i=1}^{n_1} (x_i - \hat\mu_0)^2 + \sum_{j=1}^{n_2} (y_j - \hat\mu_0)^2}{n_1 + n_2}$$
L(θ) is maximized over the whole of Θ by:
$$\hat\mu_1 = \bar x, \qquad \hat\mu_2 = \bar y \qquad \text{and} \qquad \hat\sigma^2 = \frac{n_1 s_X^2 + n_2 s_Y^2}{n_1 + n_2}$$
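$$\begin{aligned}
\lambda(y) &= \frac{L(\hat\mu_0, \hat\sigma_0^2)}{L(\hat\mu_1, \hat\mu_2, \hat\sigma^2)} = \left( \frac{\hat\sigma_0^2}{\hat\sigma^2} \right)^{-(n_1+n_2)/2} \\
&= \left( \frac{\sum_{i=1}^{n_1} (x_i - \hat\mu_0)^2 + \sum_{j=1}^{n_2} (y_j - \hat\mu_0)^2}{\sum_{i=1}^{n_1} (x_i - \hat\mu_1)^2 + \sum_{j=1}^{n_2} (y_j - \hat\mu_2)^2} \right)^{-(n_1+n_2)/2} \\
&= \left( \frac{\sum_{i=1}^{n_1} (x_i - \hat\mu_1 + \hat\mu_1 - \hat\mu_0)^2 + \sum_{j=1}^{n_2} (y_j - \hat\mu_2 + \hat\mu_2 - \hat\mu_0)^2}{\sum_{i=1}^{n_1} (x_i - \hat\mu_1)^2 + \sum_{j=1}^{n_2} (y_j - \hat\mu_2)^2} \right)^{-(n_1+n_2)/2} \\
&= \left( \frac{\sum_{i=1}^{n_1} (x_i - \hat\mu_1)^2 + n_1 (\hat\mu_0 - \hat\mu_1)^2 + \sum_{j=1}^{n_2} (y_j - \hat\mu_2)^2 + n_2 (\hat\mu_0 - \hat\mu_2)^2}{\sum_{i=1}^{n_1} (x_i - \hat\mu_1)^2 + \sum_{j=1}^{n_2} (y_j - \hat\mu_2)^2} \right)^{-(n_1+n_2)/2} \\
&= \left( 1 + \frac{n_1 (\hat\mu_0 - \hat\mu_1)^2 + n_2 (\hat\mu_0 - \hat\mu_2)^2}{\sum_{i=1}^{n_1} (x_i - \hat\mu_1)^2 + \sum_{j=1}^{n_2} (y_j - \hat\mu_2)^2} \right)^{-(n_1+n_2)/2}
\end{aligned}$$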
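It is easy to verify that
$$\begin{aligned}
n_1 (\hat\mu_0 - \hat\mu_1)^2 + n_2 (\hat\mu_0 - \hat\mu_2)^2
&= n_1 \left( \bar x - \frac{n_1 \bar x + n_2 \bar y}{n_1 + n_2} \right)^2 + n_2 \left( \bar y - \frac{n_1 \bar x + n_2 \bar y}{n_1 + n_2} \right)^2 \\
&= \frac{n_1 n_2}{n_1 + n_2} (\bar x - \bar y)^2 \\
&= \frac{(\bar x - \bar y)^2}{\dfrac{1}{n_1} + \dfrac{1}{n_2}}
\end{aligned}$$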
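Hence
$$\lambda(y) = \left( 1 + \frac{t^2}{n_1 + n_2 - 2} \right)^{-(n_1+n_2)/2}$$
is a monotone decreasing function of |t|, where
$$t = \frac{(\bar x - \bar y) \Big/ \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}{\sqrt{\dfrac{\sum_{i=1}^{n_1} (x_i - \hat\mu_1)^2 + \sum_{j=1}^{n_2} (y_j - \hat\mu_2)^2}{n_1 + n_2 - 2}}} = \frac{\bar x - \bar y}{s^* \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
and s*² is the pooled variance estimate with divisor n1 + n2 − 2.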
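Both the decomposition of the between-group term and the link between λ and t can be verified numerically. The sketch below uses made-up samples; variable names such as `ss_within` and `ss_null` are hypothetical.

```python
import math
from statistics import fmean

x = [5.1, 4.7, 6.0, 5.5, 5.8]       # made-up sample from X
y = [4.2, 5.0, 4.4, 4.9, 4.1, 4.6]  # made-up sample from Y
n1, n2 = len(x), len(y)
xbar, ybar = fmean(x), fmean(y)
mu0_hat = (sum(x) + sum(y)) / (n1 + n2)  # pooled mean, the MLE under H0

# Sums of squares around the group means and around the pooled mean.
ss_within = sum((v - xbar) ** 2 for v in x) + sum((v - ybar) ** 2 for v in y)
ss_null = (sum((v - mu0_hat) ** 2 for v in x)
           + sum((v - mu0_hat) ** 2 for v in y))

lam_direct = (ss_null / ss_within) ** (-(n1 + n2) / 2)

s_star = math.sqrt(ss_within / (n1 + n2 - 2))  # pooled standard deviation
t = (xbar - ybar) / (s_star * math.sqrt(1 / n1 + 1 / n2))
lam_from_t = (1 + t**2 / (n1 + n2 - 2)) ** (-(n1 + n2) / 2)

between = n1 * (mu0_hat - xbar) ** 2 + n2 * (mu0_hat - ybar) ** 2
between_closed_form = (xbar - ybar) ** 2 / (1 / n1 + 1 / n2)
print(f"t={t:.4f}  lambda (direct)={lam_direct:.6f}"
      f"  lambda (from t)={lam_from_t:.6f}")
```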
But, if H0 is true,
$$T(Y) = \frac{\bar X - \bar Y}{S^* \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1 + n_2 - 2}$$
and
$$\alpha = P(|T| > t_{n_1 + n_2 - 2, 1-\alpha/2}; H_0)$$
so that, at significance level α,
$$R = \{ y : |t(y)| > t_{n_1 + n_2 - 2, 1-\alpha/2} \}$$
It is not difficult to show that if
H0 : µ1 ≤ µ2
H1 : µ1 > µ2
then, at significance level α,
$$R = \{ y : t(y) > t_{n_1 + n_2 - 2, 1-\alpha} \}$$
Y ∼ N(µ, σ²), mean and variance unknown. Test on the variance
System of hypotheses:
H0 : σ² = σ0²
H1 : σ² ≠ σ0²
$$W(y) = -2 \log \lambda(y) = -n \log \frac{\hat\sigma^2}{\sigma_0^2} + n \left( \frac{\hat\sigma^2}{\sigma_0^2} - 1 \right)$$
W depends on the data only through
$$T(y) = \frac{n \hat\sigma^2}{\sigma_0^2}$$
and, under H0,
$$T(Y) = \frac{n \hat\sigma^2}{\sigma_0^2} \sim \chi^2_{n-1}$$
One would need to determine two values, t1 and t2, such that
W(t1) = W(t2) = −2 log λα
and, under H0,
P(t1 < T < t2) = 1 − α
In practice the first condition is dropped, and the acceptance region
for T is defined as:
$$A = [\, t_1 = \chi^2_{n-1, \alpha/2}, \; t_2 = \chi^2_{n-1, 1-\alpha/2} \,]$$
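To see why the equal-tail region is only an approximation of the likelihood-ratio region, W can be written as a function of T, W(t) = −n log(t/n) + t − n, and evaluated at the two endpoints. The sketch below assumes n = 11 and α = 0.05, with χ²₁₀ quantiles 3.247 and 20.483 taken from standard tables; the helper names are hypothetical.

```python
import math


def w_of_t(t, n):
    """W as a function of T = n * sigma_hat^2 / sigma0^2."""
    return -n * math.log(t / n) + t - n


def matching_upper_point(t1, n, tol=1e-10):
    """Find t2 > n with W(t2) = W(t1) by bisection (W increases for t > n)."""
    target = w_of_t(t1, n)
    lo, hi = float(n), 2.0 * n
    while w_of_t(hi, n) < target:  # expand until the target is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if w_of_t(mid, n) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n = 11
t1, t2 = 3.247, 20.483  # chi-square(10) quantiles at 0.025 and 0.975 (tables)

w1, w2 = w_of_t(t1, n), w_of_t(t2, n)
t2_match = matching_upper_point(t1, n)
print(f"W(t1)={w1:.4f}  W(t2)={w2:.4f}  t2 with W(t2)=W(t1): {t2_match:.4f}")
```

The two endpoints of the equal-tail interval give different values of W, so that interval is not a level set of the likelihood ratio; `t2_match` only shows where the upper endpoint would have to be for the given t1, without enforcing the coverage condition.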
Some exercises
Matteo Grigoletto and Laura Ventura (1998)
Statistica per le Scienze economiche. Esercizi con richiami di teoria
G. Giappichelli Editore, Torino
Ch. 6. Exercises: 6.2.1; 6.2.2; 6.2.3; 6.2.4