Yashar Deldjoo, Massimo Quadrana
Lab 1: "Linear Algebra Using Python NumPy Library"
Recommender Systems Course
Course Lecturer: Paolo Cremonesi
Recommender Systems - Lab 1, Politecnico di Milano 2016
POLITECNICO DI MILANO

Table of Contents
● References
● Languages of Data Science
● Python Main Libraries
● Numpy Library
  o Introduction
  o Matrices
  o Standard Functions for Linear Algebra

References
● Excellent book on ML. Libraries: numpy, Scipy, Scikit-learn. Other libraries: Gensim (topic modelling), text classification, sentiment analysis.
● Short book with a focus on ML using Scikit-learn. Some knowledge of Scipy and Pandas is assumed.
● Nice reference with details on 3 main areas: supervised classification, supervised regression and unsupervised methods. Considers "big data" by introducing Hadoop.

Languages of Data Science

Languages of Data Science: Comparison
● R
  o Key libraries: gbm, RTextTools, dplyr, zaa, ggplot2, caret
  o Standouts: open source; good for statistical analysis and data processing; huge collection of algorithms (as packages); visualization support
  o Setbacks: steep learning curve; obscure commands
● Python
  o Key libraries: numpy, matplotlib, scipy, Scikit-Learn, Pandas, nltk, theano
  o Standouts: open source; easy to learn; all the benefits of general programming languages; big data ready
  o Setbacks: speed of execution; need to handle library dependencies for versions 2.x and 3.x
● MATLAB
  o Key libraries: Statistics and ML, Image Processing, Signal Processing, Optimization and Wavelet
  o Standouts: good with mathematical processes; complex matrix operations; broad range of ML, Signal Processing and Image Processing libraries (as toolboxes)
  o Setbacks: not open source; difficulty when the data is not in matrix format
● Julia
  o Key libraries: libsvm, shogun, liblinear, ltfat, vlfeat
  o Standouts: open source; designed to handle numerical and scientific computing; good performance; ability to call C and Python functions
  o Setbacks: relatively new language

Python Main Libraries
● NumPy: the fundamental package for scientific computing with Python.
● Matplotlib: a mature and popular plotting package that provides publication-quality 2D plotting as well as rudimentary 3D plotting.
● SciPy: a collection of numerical algorithms and toolboxes, including optimization, statistics, signal processing and much more.
● scikit-learn: a strong machine learning library for the Python programming language. It features various classification, regression and clustering algorithms, and works well with NumPy and SciPy.

Numpy
NumPy is the fundamental package for scientific computing with Python:
● a powerful N-dimensional array object
● sophisticated (broadcasting) functions
● tools for integrating C/C++ and Fortran code
● useful linear algebra, Fourier transform, and random number capabilities

Numpy
Why is it important to use NumPy arrays?
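One part of the answer is that arithmetic on NumPy arrays is applied to whole arrays at once, and broadcasting extends this to arrays of different but compatible shapes. The following is a minimal sketch of that idea; the array values and variable names are illustrative assumptions and do not come from the slides.

```python
import numpy as np

# Element-wise arithmetic on a whole array, with no explicit Python loop.
a = np.arange(5)                     # the integers 0..4
squares = a * a                      # element-wise product

# Broadcasting: a (3, 1) column combined with a (4,) row yields a (3, 4) grid.
col = np.array([[0], [10], [20]])    # shape (3, 1)
row = np.array([1, 2, 3, 4])         # shape (4,)
grid = col + row                     # every column entry is added to every row entry

print(squares)
print(grid.shape)
print(grid)
```

The same results could be produced with nested Python loops, but the array expressions are shorter and the loops run inside NumPy's compiled code.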
Example: Comparison
The three variants below compute the same sum of squares 10000 times. Each one needs "import time", and the NumPy variants also need "import numpy as np".

Normal Python list:

    tstart = time.time()
    s = 0
    for num in range(1, 10001):
        a = [x ** 2 for x in range(0, 1001)]
        s = s + sum(a)
    print('Sum = {}'.format(s))
    tend = time.time()
    print(tend - tstart, 'sec')

    Sum = 3338335000000
    2.08597731590271 sec

NumPy data storage:

    tstart = time.time()
    s = 0
    for num in range(1, 10001):
        a = np.arange(1001)
        b = a * a
        s = s + sum(b)
    print('Sum = {}'.format(s))
    tend = time.time()
    print(tend - tstart, 'sec')

    Sum = 3338335000000
    0.6752722263336182 sec

NumPy data storage + using an optimized function written in C:

    tstart = time.time()
    s = 0
    for num in range(1, 10001):
        a = np.arange(1001)
        b = a.dot(a)
        s = s + b
    print('Sum = {}'.format(s))
    tend = time.time()
    print(tend - tstart, 'sec')

    Sum = 3338335000000
    0.02157735824584961 sec

Numpy
● Import NumPy

    >>> from numpy import *      # pollutes the namespace
    >>> import numpy as np       # recommended ✔, e.g. np.array([1,2,4,5])

MATRICES

Matrices
● Creating Matrices
● Indexing Matrices
● Combining Matrices
● Operations on Matrices

Matrices
Python uses 0-based indexing (like C and C++, unlike MATLAB):

    A = a00 a01 a02
        a10 a11 a12
        a20 a21 a22

Matrices
● Creating Matrices

1D array:

    >>> import numpy as np
    >>> a = np.array([0,1,2,3,4,5])
    >>> print(a)
    [0 1 2 3 4 5]
    >>> a.ndim
    1
    >>> a.shape
    (6,)

2D array:

    >>> import numpy as np
    >>> a = np.array([[0,1],[2,3],[4,5]])
    >>> print(a)
    [[0 1]
     [2 3]
     [4 5]]

2D array from a MATLAB-style string (np.matrix returns a matrix object, which is always 2D):

    >>> import numpy as np
    >>> a = np.matrix('0,1;2,3;4,5')
    >>> print(a)
    [[0 1]
     [2 3]
     [4 5]]

Indexing

    A = 1 2 3
        4 5 6
        7 8 9

● [all rows, first column] = ?    A[:,0]        selects 1, 4, 7
● [all rows, last column] = ?     A[:,-1]       selects 3, 6, 9
● [rows 1 to 2, columns 0 to 1]   A[1:3,0:2]    selects the block 4 5 / 7 8 (the end of a slice is exclusive)

    A = 1  2  3
        .  .  .
        .  .  .
        .  .  .
        4  5  6
        7  8  9
        10 11 12

● [last three rows] = ?           A[-3:,:]

    A = 1  2  3
        4  5  6
        .  .  .
        .  .  .
        .  .  .
        7  8  9
        10 11 12

● [first two rows] = ?            A[0:2,:]

Combining Matrices

    >>> a = np.array([[1,2,3],[4,5,6]])
    >>> b = np.array([[7,8,9],[10,11,12]])

Side by side (a on the left, b on the right):

    >>> c = np.hstack((a,b))
    [[ 1  2  3  7  8  9]
     [ 4  5  6 10 11 12]]

Stacked (a on top, b below):

    >>> d = np.vstack((a,b))
    [[ 1  2  3]
     [ 4  5  6]
     [ 7  8  9]
     [10 11 12]]

Combining Matrices
The same results with the short-hand index objects:

    >>> c = np.c_[a,b]      # same as np.hstack((a,b)) for these 2D arrays
    >>> d = np.r_[a,b]      # same as np.vstack((a,b)) for these 2D arrays

Combining Matrices
Transpose:

    >>> a
    [[1 2 3]
     [4 5 6]]
    >>> d = np.transpose(a)     # or d = a.transpose()
    [[1 4]
     [2 5]
     [3 6]]

Linear Algebra in Numpy

Numpy | MATLAB | Usage
np.ndim(a) or a.ndim | ndims(a) | get the number of dimensions of an array
np.size(a) or a.size | numel(a) | get the number of elements of an array
np.shape(a) or a.shape | size(a) | get the "size" of the matrix
np.array([[1.,2.,3.], [4.,5.,6.]]) | [ 1 2 3; 4 5 6 ] | 2x3 matrix literal
np.vstack([np.hstack([a,b]), np.hstack([c,d])]) | [ a b; c d ] | construct a matrix from blocks a, b, c, and d
a[-1] | a(end) | access the last element in the 1xn matrix a
a[1,4] | a(2,5) | access the element in the second row, fifth column
a[0:5,:] | a(1:5,:) | the first five rows of a
a[-5:] | a(end-4:end,:) | the last five rows of a
a[np.ix_([1,3,4],[0,2])] | a([2,4,5],[1,3]) | rows 2, 4 and 5 and columns 1 and 3. This allows the matrix to be modified, and doesn't require a regular slice.
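The indexing and stacking rules tabulated above can be tried on a small array. This is a minimal sketch; the 4x3 array and the variable names are illustrative assumptions rather than part of the slides.

```python
import numpy as np

# Illustrative 4x3 array with rows [1 2 3], [4 5 6], [7 8 9], [10 11 12].
a = np.arange(1, 13).reshape(4, 3)

first_col = a[:, 0]           # all rows, first column
last_col = a[:, -1]           # all rows, last column
block = a[1:3, 0:2]           # rows 1..2, columns 0..1 (slice ends are exclusive)
last_three = a[-3:, :]        # last three rows
first_two = a[0:2, :]         # first two rows

# np.ix_ selects arbitrary rows and columns, here rows 0, 2, 3 and columns 0, 2.
picked = a[np.ix_([0, 2, 3], [0, 2])]

# Stacking the top and bottom halves side by side, then on top of each other.
side_by_side = np.hstack((a[:2], a[2:]))   # shape (2, 6)
stacked = np.vstack((a[:2], a[2:]))        # shape (4, 3), equal to a again

print(block)
print(picked)
print(side_by_side)
```

In MATLAB notation the `block` selection would be written `a(2:3,1:2)`, since MATLAB indices start at 1 and include the end of the range.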
Linear Algebra with Numpy

Numpy | MATLAB | Usage
a.transpose() or a.T | a' | transpose of a
a.dot(b) | a*b | matrix multiplication
a*b | a .* b | element-wise multiplication
a/b | a./b | element-wise division
np.nonzero(a>0.5) | find(a>0.5) | find the indices where (a > 0.5)
y = x.copy() | y=x | numpy assigns by reference

Compare
Plain assignment makes b another name for the same array, so changing b also changes a:

    import numpy as np
    a = np.array([[1, 2, 3],[4, 5, 6]])
    print(a)
    b = a
    b[1,1] = 77
    print(b)
    print(a)

    [[1 2 3]
     [4 5 6]]
    [[ 1  2  3]
     [ 4 77  6]]
    [[ 1  2  3]
     [ 4 77  6]]

With a.copy(), b is an independent array and a is left unchanged:

    import numpy as np
    a = np.array([[1, 2, 3],[4, 5, 6]])
    print(a)
    b = a.copy()
    b[1,1] = 77
    print(b)
    print(a)

    [[1 2 3]
     [4 5 6]]
    [[ 1  2  3]
     [ 4 77  6]]
    [[1 2 3]
     [4 5 6]]

Linear Algebra with Numpy

Numpy | MATLAB | Usage
np.arange(1.,11.) or np.r_[1.:11.] | 1:10 | create an increasing vector
np.arange(10.) or np.r_[:10.] | 0:9 | create an increasing vector
np.zeros((3,4)) | zeros(3,4) | 3x4 two-dimensional array full of 64-bit floating point zeros
np.ones((3,4)) | ones(3,4) | 3x4 two-dimensional array full of 64-bit floating point ones
np.eye(3) | eye(3) | 3x3 identity matrix
np.diag(a) | diag(a) | vector of diagonal elements of a
np.random.rand(3,4) | rand(3,4) | random 3x4 matrix
np.linspace(1,3,4) | linspace(1,3,4) | 4 equally spaced samples between 1 and 3, inclusive

Linear Algebra with Numpy

Numpy | MATLAB | Usage
a.max(0) | max(a) | maximum element of each column of matrix a (along axis=0)
a.max(1) | max(a,[],2) | maximum element of each row of matrix a (along axis=1)
a.max() | max(max(a)) | maximum element of a

Linear Algebra with Numpy

Numpy | MATLAB | Usage
np.linalg.inv(a) | inv(a) | inverse of square matrix a
np.linalg.pinv(a) | pinv(a) | pseudo-inverse of matrix a
np.linalg.matrix_rank(a) | rank(a) | matrix rank of a 2D array / matrix a
np.linalg.solve(a,b) if a is square; np.linalg.lstsq(a,b) otherwise | a\b | solution of a x = b for x

Example:

    import numpy as np
    a = np.array([[4, -1, 3],[5, -2, 9],[-4,7,4]])
    b = np.array([[12],[25],[-3]])
    c = np.linalg.solve(a,b)
    print(c)

    [[ 1.4829932 ]
     [-0.61904762]
     [ 1.81632653]]

Linear Algebra - NumPy/SciPy

Numpy | MATLAB | Usage
U, S, Vh = np.linalg.svd(a) | [U,S,V]=svd(a) | singular value decomposition of a
D,V = np.linalg.eig(a) | [V,D]=eig(a) | eigenvalues and eigenvectors of a
Q,R = sp.linalg.qr(a) | [Q,R,P]=qr(a,0) | QR decomposition (sp is scipy; requires import scipy.linalg)
I = np.argsort(a[:,i]), b=a[I,:] | [b,I] = sortrows(a,i) | sort the rows of the matrix by column i
np.linalg.lstsq(X,y) | regress(y,X) | multilinear regression
np.unique(a) | unique(a) | unique elements of a

Example:

    import numpy as np
    w, v = np.linalg.eig(np.array([[1, 1], [1, -1]]))
    print(w)
    print(v)

    [ 1.41421356 -1.41421356]
    [[ 0.92387953 -0.38268343]
     [ 0.38268343  0.92387953]]

Questions
Thanks for your attention!
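As a recap of the linear algebra routines tabulated in the slides, the sketch below uses np.linalg.solve for a square system, np.linalg.lstsq for an overdetermined one, and np.linalg.svd to rebuild a matrix from its factors. The square system is the one from the slides. The regression data X and y are made-up values for illustration, and passing rcond=None to lstsq assumes a reasonably recent NumPy release.

```python
import numpy as np

# Square system from the slides: solve a x = b.
a = np.array([[4, -1, 3], [5, -2, 9], [-4, 7, 4]], dtype=float)
b = np.array([12, 25, -3], dtype=float)
x = np.linalg.solve(a, b)
print(np.allclose(a.dot(x), b))       # the solution reproduces b

# Overdetermined system (illustrative data lying exactly on y = 1 + 2 t).
X = np.array([[1, 0], [1, 1], [1, 2], [1, 3]], dtype=float)
y = np.array([1, 3, 5, 7], dtype=float)
coef, residuals, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)
print(coef)                           # approximately intercept 1 and slope 2

# Reduced SVD: X equals U * diag(S) * Vh up to rounding error.
U, S, Vh = np.linalg.svd(X, full_matrices=False)
X_rebuilt = U.dot(np.diag(S)).dot(Vh)
print(np.allclose(X_rebuilt, X))
```

Note that NumPy returns Vh, the conjugate transpose of the V that MATLAB's svd returns, so the reconstruction multiplies by Vh directly.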