Lab 1 - Yashar Deldjoo

Yashar Deldjoo
Massimo Quadrana
Lab 1: “Linear Algebra Using Python NumPy Library”
Recommender Systems Course
Course Lecturer: Paolo Cremonesi
Recommender Systems - Lab 1, Politecnico di Milano 2016
POLITECNICO DI MILANO
Table of Contents
● References
● Languages of Data Science
● Python Main Libraries
● Numpy Library
  o Introduction
  o Matrices
  o Standard Functions for Linear Algebra
References
● Excellent book on ML. Libraries: NumPy, SciPy, Scikit-learn. Other libraries and topics: Gensim (topic modelling), text classification, sentiment analysis.
● Short book with a focus on ML using Scikit-learn. Some knowledge of SciPy and Pandas is assumed.
● Nice reference with details on three main areas: supervised classification, supervised regression and unsupervised methods. Considers "big data" by introducing Hadoop.
Languages of Data Science
Languages of Data Science: Comparison
R
● Key libraries: gbm, RTextTools, dplyr, zaa, ggplot2, caret
● Standouts: open source; good for statistical analysis and data processing; huge collection of algorithms (as packages); visualization support
● Setbacks: steep learning curve; obscure commands
Python
● Key libraries: numpy, matplotlib, scipy, Scikit-Learn, Pandas, nltk, theano
● Standouts: open source; easy to learn; all the benefits of a general-purpose programming language; big data ready
● Setbacks: speed of execution; need to handle library dependencies for versions 2.x and 3.x
MATLAB
● Key libraries (toolboxes): Statistics and ML, Image Processing, Signal Processing, Optimization and Wavelet
● Standouts: good with mathematical processes; complex matrix operations; broad range of ML, signal processing and image processing libraries (as toolboxes)
● Setbacks: not open source; difficulty when the data is not in matrix format
Julia
● Key libraries: libsvm, shogun, liblinear, ltfat, vlfeat
● Standouts: open source; designed to handle numerical and scientific computing; good performance; ability to call C and Python functions
● Setbacks: relatively new language
Python Main Libraries
● NumPy: the fundamental package for scientific computing with Python.
● matplotlib: a mature and popular plotting package that provides publication-quality 2D plotting as well as rudimentary 3D plotting.
● SciPy: a collection of numerical algorithms and toolboxes, including optimization, statistics, signal processing and much more.
● Scikit-learn: a strong machine learning library for the Python programming language. It features various classification, regression and clustering algorithms and works well with NumPy and SciPy.
Numpy
NumPy is the fundamental package for scientific computing with Python. It provides:
● a powerful N-dimensional array object
● sophisticated (broadcasting) functions
● tools for integrating C/C++ and Fortran code
● useful linear algebra, Fourier transform, and random number capabilities
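To make the "broadcasting" bullet concrete, here is a minimal sketch: when two arrays have compatible shapes, NumPy stretches the smaller one across the larger one, with no Python loop. The names `A`, `row` and `col` are illustrative only.

```python
import numpy as np

# A 3x3 array: [[0 1 2] [3 4 5] [6 7 8]]
A = np.arange(9).reshape(3, 3)

# A row vector of shape (3,) is conceptually repeated for every row of A
row = np.array([10, 20, 30])
B = A + row
print(B)

# A column vector of shape (3, 1) is conceptually repeated across every column of A
col = np.array([[100], [200], [300]])
C = A + col
print(C)
```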
Numpy
Why is it important to use NumPy arrays?
Example: Comparison
The same sum, computed 10000 times in three different ways (timings are machine-dependent):

import time
import numpy as np

● numpy data storage, but summed with Python's built-in sum():
tstart = time.time()
s = 0
for num in range(1, 10001):
    a = np.arange(1001)
    b = a * a
    s = s + sum(b)
tend = time.time()
print('Sum = {}'.format(s))
print(tend - tstart, 'sec')

Sum = 3338335000000
2.08597731590271 sec

● numpy data storage + using an optimized function written in C (a.dot):
tstart = time.time()
s = 0
for num in range(1, 10001):
    a = np.arange(1001)
    b = a.dot(a)
    s = s + b
tend = time.time()
print('Sum = {}'.format(s))
print(tend - tstart, 'sec')

Sum = 3338335000000
0.02157735824584961 sec

● normal python list:
tstart = time.time()
s = 0
for num in range(1, 10001):
    a = [x ** 2 for x in range(0, 1001)]
    s = s + sum(a)
tend = time.time()
print('Sum = {}'.format(s))
print(tend - tstart, 'sec')

Sum = 3338335000000
0.6752722263336182 sec
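As a quick sanity check on the comparison, the sketch below (assuming only NumPy) confirms that the three strategies compute the same quantity, the sum of x² for x = 0..1000, and that a single vectorized call reproduces it with no Python-level loop. The closed form n(n+1)(2n+1)/6 is used only as an independent check.

```python
import numpy as np

# Explicit 64-bit ints avoid overflow on platforms where the default int is 32-bit
a = np.arange(1001, dtype=np.int64)

via_builtin = sum(x ** 2 for x in range(1001))   # pure Python loop
via_np_sum = int(np.sum(a * a))                 # element-wise product + C-level sum
via_dot = int(a.dot(a))                         # dot product of a with itself

# Closed form n(n+1)(2n+1)/6 with n = 1000
closed_form = 1000 * 1001 * 2001 // 6

print(via_builtin, via_np_sum, via_dot, closed_form)
print(10000 * via_dot)   # the total printed by each loop above
```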
Numpy
● Import NumPy
>>> from numpy import *      # pollutes the namespace
>>> import numpy as np       # recommended ✔, e.g. np.array([1,2,4,5])
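Here is a small sketch of why the star import is discouraged. NumPy defines functions whose names clash with Python built-ins, such as `sum`. After `from numpy import *`, the bare name `sum` refers to NumPy's version, and its second positional argument is an axis rather than a start value. The comparison below calls both versions explicitly, so it does not itself rely on a star import.

```python
import builtins
import numpy as np

values = range(5)   # 0, 1, 2, 3, 4

# Built-in sum: the second argument is the start value -> 0+1+2+3+4 + (-1)
py_result = builtins.sum(values, -1)

# NumPy sum: the second argument is the axis -> sums along the last axis
np_result = np.sum(values, -1)

print(py_result, np_result)   # 9 10
```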
MATRICES
Matrices
● Creating Matrices
● Indexing Matrices
● Combining Matrices
● Operations on Matrices
Matrices
Python uses 0-based indexing (like C and C++, NOT like MATLAB):
$$A = \begin{pmatrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12} \\ a_{20} & a_{21} & a_{22} \end{pmatrix}$$
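A minimal sketch of 0-based indexing. The values are chosen so that the entry in row i, column j equals 10*i + j, mirroring the a_ij labels. The first element is `A[0, 0]` (MATLAB `A(1,1)`), and index 3 is already out of range for a 3x3 array.

```python
import numpy as np

# Entry (i, j) holds 10*i + j, mirroring a_ij
A = np.array([[0, 1, 2],
              [10, 11, 12],
              [20, 21, 22]])

first = A[0, 0]   # a00, MATLAB A(1,1)
last = A[2, 2]    # a22, MATLAB A(3,3)

try:
    A[3, 0]       # there is no row 3 in a 3x3 array
    out_of_range = False
except IndexError:
    out_of_range = True

print(first, last, out_of_range)   # 0 22 True
```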
Matrices
● Creating Matrices
>>> import numpy as np
>>> a = np.array([0,1,2,3,4,5])
>>> print(a)
[0 1 2 3 4 5]
>>> a.ndim
1
>>> a.shape
(6,)
1D Array
Matrices
● Creating Matrices
>>> import numpy as np
>>> a = np.array([[0,1],[2,3],[4,5]])
>>> print(a)
[[0 1]
 [2 3]
 [4 5]]
2D Array
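The same attributes used for the 1D array also describe the 2D one. The sketch below also shows that reshaping the 1D array from the earlier slide yields the same 2D array.

```python
import numpy as np

a = np.array([[0, 1], [2, 3], [4, 5]])
print(a.ndim, a.shape, a.size)   # 2 (3, 2) 6

# Reshape the 1D array into 3 rows and 2 columns
flat = np.array([0, 1, 2, 3, 4, 5])
b = flat.reshape(3, 2)
print(np.array_equal(a, b))      # True
```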
Matrices
● Creating Matrices
>>> import numpy as np
>>> a = np.matrix('0,1;2,3;4,5')
>>> print(a)
[[0 1]
 [2 3]
 [4 5]]
2D Array (np.matrix objects are always 2D; current NumPy documentation recommends regular arrays instead)
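One practical difference between `np.matrix` and `np.array` explains why the linear algebra tables below distinguish `a*b` from `a.dot(b)`. For a matrix, `*` is the matrix product. For an ndarray, `*` is element-wise, and `@` (or `.dot`) gives the matrix product. The 2x2 values below are only illustrative.

```python
import numpy as np

arr = np.array([[1, 2], [3, 4]])
mat = np.matrix('1,2;3,4')   # recent NumPy versions may emit a PendingDeprecationWarning here

mat_prod = mat * mat   # np.matrix: * is the matrix product
elem = arr * arr       # np.ndarray: * is element-wise
arr_prod = arr @ arr   # np.ndarray: @ is the matrix product

print(mat_prod)
print(elem)
print(arr_prod)
```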
Indexing
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}$$
The questions count rows and columns from 1; the Python answers use 0-based indices.
● [all rows, col 1] = ?  →  A[:,0]
● [all rows, col end] = ?  →  A[:,-1]
● [rows 2 and 3, cols 1 and 2] = ?  →  A[1:3,0:2]
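The three answers can be checked directly. Note that in a slice `start:stop`, the stop index is excluded.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

first_col = A[:, 0]   # all rows, first column -> [1 4 7]
last_col = A[:, -1]   # all rows, last column  -> [3 6 9]
block = A[1:3, 0:2]   # rows 1-2, columns 0-1 (0-based)

print(first_col)
print(last_col)
print(block)
```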
Indexing
$$A = \begin{pmatrix} 1 & 2 & 3 \\ \vdots & \vdots & \vdots \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{pmatrix}$$
● [last three rows] = ?  →  A[-3:,:]
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ \vdots & \vdots & \vdots \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{pmatrix}$$
● [first two rows] = ?  →  A[0:2,:]
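The slides elide the middle rows of A. To make the slices runnable, the sketch below assumes a concrete 4x3 array holding 1 to 12 row by row. The same expressions work for any number of rows.

```python
import numpy as np

# Assumed concrete example: a 4x3 matrix holding 1..12 row by row
A = np.arange(1, 13).reshape(4, 3)

last_three = A[-3:, :]   # a negative start counts from the end
first_two = A[0:2, :]    # equivalently A[:2] or A[:2, :]

print(last_three)
print(first_two)
```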
Combining Matrices
>>> a = np.array([[1,2,3],[4,5,6]])
>>> b = np.array([[7,8,9],[10,11,12]])
Combining Matrices
>>> c = np.hstack((a,b))
>>> print(c)
[[ 1  2  3  7  8  9]
 [ 4  5  6 10 11 12]]
>>> d = np.vstack((a,b))
>>> print(d)
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
Combining Matrices
>>> c = np.c_[a,b]     # same result as np.hstack((a,b)): a and b side by side
>>> d = np.r_[a,b]     # same result as np.vstack((a,b)): a on top of b
Combining Matrices
>>> d = np.transpose(a)     # or: d = a.transpose()
>>> print(d)                # aT
[[1 4]
 [2 5]
 [3 6]]
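All the combining operations above can be checked in one place. As a design note, `np.concatenate` with an explicit `axis` is the general form behind both `hstack` and `vstack` for 2D inputs.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8, 9], [10, 11, 12]])

c = np.hstack((a, b))   # side by side -> shape (2, 6)
d = np.vstack((a, b))   # stacked      -> shape (4, 3)

# np.c_ / np.r_ give the same results for these 2D inputs
assert np.array_equal(c, np.c_[a, b])
assert np.array_equal(d, np.r_[a, b])

# np.concatenate with an explicit axis covers both cases
assert np.array_equal(c, np.concatenate((a, b), axis=1))
assert np.array_equal(d, np.concatenate((a, b), axis=0))

at = a.T                # same as np.transpose(a) and a.transpose()
print(c.shape, d.shape, at.shape)   # (2, 6) (4, 3) (3, 2)
```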
Linear Algebra in Numpy
Numpy | MATLAB | Usage
np.ndim(a) or a.ndim | ndims(a) | get the number of dimensions of an array
np.size(a) or a.size | numel(a) | get the number of elements of an array
np.shape(a) or a.shape | size(a) | get the "size" (shape) of the matrix
np.array([[1.,2.,3.], [4.,5.,6.]]) | [ 1 2 3; 4 5 6 ] | 2x3 matrix literal
np.vstack([np.hstack([a,b]), np.hstack([c,d])]) | [ a b; c d ] | construct a matrix from blocks a, b, c, and d
a[-1] | a(end) | access the last element of a 1D array a (in MATLAB, a 1xn vector)
a[1,4] | a(2,5) | access the element in the second row, fifth column
a[0:5,:] | a(1:5,:) | the first five rows of a
a[-5:] | a(end-4:end,:) | the last five rows of a
a[np.ix_([1,3,4],[0,2])] | a([2,4,5],[1,3]) | rows 2, 4 and 5 and columns 1 and 3 (counted from 1). This allows the matrix to be modified, and doesn't require a regular slice.
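Several rows of the table can be tried on one array. The sketch below assumes a 6x5 test matrix in which entry (i, j) equals 5*i + j, so that each result is easy to verify by hand.

```python
import numpy as np

a = np.arange(30).reshape(6, 5)   # assumed test matrix: entry (i, j) is 5*i + j

print(a.ndim, a.size, a.shape)    # 2 30 (6, 5)
print(a[1, 4])                    # second row, fifth column -> 9
print(a[0:5, :].shape)            # first five rows -> (5, 5)
print(a[-5:].shape)               # last five rows  -> (5, 5)

# Rows 2, 4, 5 and columns 1, 3 (counted from 1) via np.ix_
sub = a[np.ix_([1, 3, 4], [0, 2])]
print(sub)

# The same np.ix_ index can be assigned to, modifying a in place
a[np.ix_([1, 3, 4], [0, 2])] = 0
```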
Linear Algebra with Numpy
Numpy | MATLAB | Usage
a.transpose() or a.T | a' | transpose of a
a.dot(b) or a @ b | a*b | matrix multiplication
a*b | a .* b | element-wise multiplication
a/b | a./b | element-wise division
np.nonzero(a>0.5) | find(a>0.5) | find the indices where (a > 0.5)
y = x.copy() | y=x | MATLAB's y=x makes a copy; in NumPy, y = x only binds a second name to the same array (assignment by reference), so use x.copy() to get an independent copy
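The difference between the matrix product and the element-wise operators is easy to miss when coming from MATLAB. Here is a small sketch on two illustrative 2x2 arrays:

```python
import numpy as np

a = np.array([[1., 2.], [3., 4.]])
b = np.array([[0.5, 1.], [2., 4.]])

matmul = a.dot(b)     # matrix product (a @ b is equivalent)
elemwise = a * b      # element-wise product
ratio = a / b         # element-wise division
rows, cols = np.nonzero(a > 2.5)   # row and column indices where the condition holds

print(matmul)
print(elemwise)
print(ratio)
print(rows, cols)
```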
Compare
Assignment (b = a) shares the data:
import numpy as np
a = np.array([[1, 2, 3],[4, 5, 6]])
print(a)
b = a
b[1,1] = 77
print(b)
print(a)

[[1 2 3]
 [4 5 6]]
[[ 1  2  3]
 [ 4 77  6]]
[[ 1  2  3]
 [ 4 77  6]]

A copy (b = a.copy()) is independent:
import numpy as np
a = np.array([[1, 2, 3],[4, 5, 6]])
print(a)
b = a.copy()
b[1,1] = 77
print(b)
print(a)

[[1 2 3]
 [4 5 6]]
[[ 1  2  3]
 [ 4 77  6]]
[[1 2 3]
 [4 5 6]]
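Plain assignment is not the only way to end up sharing data: basic slicing also returns a view of the original array. The sketch below uses `np.shares_memory` to check whether two arrays overlap in memory.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

alias = a                # the same object, no copy
row_view = a[1]          # basic indexing/slicing returns a view into a's data
row_copy = a[1].copy()   # an independent copy of the second row

row_view[1] = 77         # writes through to a
row_copy[0] = -1         # does not touch a

print(alias is a)                      # True
print(np.shares_memory(a, row_view))   # True
print(np.shares_memory(a, row_copy))   # False
print(a)
```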
Linear Algebra with Numpy
Numpy | MATLAB | Usage
np.arange(1.,11.) or np.r_[1.:11.] | 1:10 | create an increasing vector (1 to 10)
np.arange(10.) or np.r_[:10.] | 0:9 | create an increasing vector (0 to 9)
np.zeros((3,4)) | zeros(3,4) | 3x4 two-dimensional array full of 64-bit floating point zeros
np.ones((3,4)) | ones(3,4) | 3x4 two-dimensional array full of 64-bit floating point ones
np.eye(3) | eye(3) | 3x3 identity matrix
np.diag(a) | diag(a) | vector of diagonal elements of a
np.random.rand(3,4) | rand(3,4) | random 3x4 matrix
np.linspace(1,3,4) | linspace(1,3,4) | 4 equally spaced samples between 1 and 3, inclusive
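A few details from this table are easy to trip over: `np.arange` excludes its stop value, `np.zeros`/`np.ones` take the shape as a tuple, and `np.diag` extracts a diagonal from a 2D input but builds a diagonal matrix from a 1D input. A short sketch:

```python
import numpy as np

v1 = np.arange(1., 11.)   # 1.0, 2.0, ..., 10.0 (stop value excluded)
v0 = np.arange(10.)       # 0.0, 1.0, ..., 9.0
z = np.zeros((3, 4))      # the shape is passed as a tuple
o = np.ones((3, 4))
I = np.eye(3)
d = np.diag(np.array([[1, 2], [3, 4]]))   # 2D input: its diagonal -> [1 4]
D = np.diag([1, 4])       # 1D input: builds a diagonal matrix instead
r = np.random.rand(3, 4)  # uniform samples in [0, 1)
s = np.linspace(1, 3, 4)  # both endpoints included: 1, 5/3, 7/3, 3
```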
Linear Algebra with Numpy
Numpy | MATLAB | Usage
a.max(0) | max(a) | maximum element of each column of matrix a (along axis=0)
a.max(1) | max(a,[],2) | maximum element of each row of matrix a (along axis=1)
a.max() | max(max(a)) | maximum element of a
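The axis convention is the part that usually confuses people. Reducing along axis 0 collapses the rows, leaving one value per column. Reducing along axis 1 collapses the columns, leaving one value per row. Here is a sketch on an illustrative 2x3 array:

```python
import numpy as np

a = np.array([[3, 9, 1],
              [7, 2, 8]])

col_max = a.max(0)   # along axis 0 -> one value per column: [7 9 8]
row_max = a.max(1)   # along axis 1 -> one value per row:    [9 8]
overall = a.max()    # whole array: 9

print(col_max, row_max, overall)
```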
Linear Algebra with Numpy
Numpy | MATLAB | Usage
np.linalg.inv(a) | inv(a) | inverse of square matrix a
np.linalg.pinv(a) | pinv(a) | pseudo-inverse of matrix a
np.linalg.matrix_rank(a) | rank(a) | matrix rank of a 2D array / matrix a
np.linalg.solve(a,b) if a is square and non-singular; np.linalg.lstsq(a,b) otherwise | a\b | solution of a x = b for x
Example:
import numpy as np
a = np.array([[4, -1, 3],[5, -2, 9],[-4, 7, 4]])
b = np.array([[12],[25],[-3]])
c = np.linalg.solve(a,b)
print(c)

[[ 1.4829932 ]
 [-0.61904762]
 [ 1.81632653]]
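For the non-square case in the table, here is a minimal sketch that fits a straight line y ≈ m·x + q to four points with `np.linalg.lstsq`. The data points are made up for illustration and lie exactly on y = 2x + 1, so the recovered coefficients should be (2, 1). The last lines check that, for the square system from the example above, `solve` agrees with multiplying by the inverse.

```python
import numpy as np

x = np.array([0., 1., 2., 3.])
y = np.array([1., 3., 5., 7.])   # illustrative data on the line y = 2x + 1

# Design matrix with a column of ones for the intercept: 4 equations, 2 unknowns
X = np.column_stack([x, np.ones_like(x)])

coef, residuals, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)
m, q = coef
print(m, q)

# For a square, non-singular system, solve() and inv() agree
a = np.array([[4., -1., 3.], [5., -2., 9.], [-4., 7., 4.]])
b = np.array([12., 25., -3.])
assert np.allclose(np.linalg.solve(a, b), np.linalg.inv(a) @ b)
```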
Linear Algebra - NumPy/SciPy
Numpy | MATLAB | Usage
U, S, Vh = np.linalg.svd(a) | [U,S,V]=svd(a) | singular value decomposition of a (NumPy returns S as a 1D vector of singular values and Vh, the conjugate transpose of V)
D,V = np.linalg.eig(a) | [V,D]=eig(a) | eigenvalues and eigenvectors of a (NumPy returns the eigenvalues first, as a 1D vector)
Q,R = scipy.linalg.qr(a) (after import scipy.linalg) | [Q,R,P]=qr(a,0) | QR decomposition
I = np.argsort(a[:,i]), b=a[I,:] | [b,I] = sortrows(a,i) | sort the rows of the matrix by column i
np.linalg.lstsq(X,y) | regress(y,X) | multilinear regression
np.unique(a) | unique(a) | sorted unique elements of a
Example:
import numpy as np
w, v = np.linalg.eig(np.array([[1, 1], [1, -1]]))
print(w)
print(v)

[ 1.41421356 -1.41421356]
[[ 0.92387953 -0.38268343]
 [ 0.38268343  0.92387953]]
(eigenvectors are only defined up to sign, so signs may differ on other platforms)
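Truncated SVD is a common building block in recommender systems, so it is worth seeing how NumPy's return values are used. S comes back as a 1D vector and must be turned into a diagonal matrix before rebuilding the input, and the third output is already Vᴴ. The small rating-like matrix R below is invented for illustration. The final lines check the eigen-decomposition from the example above (a v = w v for each pair).

```python
import numpy as np

R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.]])   # illustrative 3x4 matrix

U, S, Vh = np.linalg.svd(R, full_matrices=False)   # economy-size SVD
print(U.shape, S.shape, Vh.shape)   # (3, 3) (3,) (3, 4)

# Rebuild R: S has to become a diagonal matrix first
R_rebuilt = U @ np.diag(S) @ Vh
assert np.allclose(R, R_rebuilt)

# Rank-k approximation keeps only the k largest singular values (S is sorted descending)
k = 2
R_k = U[:, :k] @ np.diag(S[:k]) @ Vh[:k, :]

# Eigen-decomposition check: a @ v[:, i] == w[i] * v[:, i] for every pair
a = np.array([[1., 1.], [1., -1.]])
w, v = np.linalg.eig(a)
assert np.allclose(a @ v, v * w)
```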
Questions
Thanks for your Attention!