Optimisation and Regularisation in Machine Learning

12 hours: six weekly 2-hour sessions.

Deudon, M., Kalaitzis, A., Goytom, I., Arefin, M. R., Lin, Z., Sankaran, K., ... & Bengio, Y. (2020). [Highres-net: Recursive fusion for multi-frame super-resolution of satellite imagery](https://arxiv.org/pdf/2002.06460.pdf). arXiv preprint arXiv:2002.06460.

Lectures

Optimisation and regularisation of models.

1 Definitions and notations

1.1 Reminder on functions
1.2 Derivatives
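
As a quick companion to 1.2, a central finite difference is a standard way to sanity-check an analytic gradient. A minimal sketch; the test function and step size below are illustrative choices, not part of the course material.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        # (f(x + h e_i) - f(x - h e_i)) / (2h) approximates df/dx_i
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

# f(x) = ||x||^2 has gradient 2x, so this should print roughly [2, -4, 6].
x = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(lambda v: float(v @ v), x))
```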

2 Maximum Likelihood Learning

3 Optimization in ML

3.1 The i.i.d. assumption
3.2 Gradient Descent Algorithm
3.3 Newton’s algorithm
3.4 Case of linear regression
3.5 Stochastic Gradient Descent (SGD)
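
A minimal NumPy sketch of the updates from 3.2 and 3.5, applied to the least-squares problem of 3.4. The learning rates, iteration counts, and synthetic data are illustrative assumptions, not course defaults.

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, n_iters=500):
    """Full-batch gradient descent on the loss (1/2n) * ||Xw - y||^2."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        w -= lr * X.T @ (X @ w - y) / n      # step against the exact gradient
    return w

def sgd(X, y, lr=0.05, n_epochs=20, rng=np.random.default_rng(0)):
    """SGD on the same objective: each step uses a single random sample."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_epochs):
        for i in rng.permutation(n):
            w -= lr * (X[i] @ w - y[i]) * X[i]   # unbiased one-sample gradient
    return w

# Both should recover w* = (1, 2) on noiseless synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.0, 2.0])
print(gradient_descent(X, y), sgd(X, y))
```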

4 Logistic Regression

4.1 Iteratively Reweighted Least Squares (IRLS)
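
A compact sketch of the IRLS/Newton update from 4.1: w ← w + (XᵀWX)⁻¹ Xᵀ(y − p), with p = σ(Xw) and W = diag(p(1 − p)). The toy data and iteration count are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def irls(X, y, n_iters=15):
    """Iteratively Reweighted Least Squares for logistic regression, y in {0, 1}."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        p = sigmoid(X @ w)                 # current predicted probabilities
        W = p * (1.0 - p)                  # diagonal of the reweighting matrix
        H = X.T @ (X * W[:, None])         # Hessian: X^T W X
        g = X.T @ (y - p)                  # gradient of the log-likelihood
        w += np.linalg.solve(H, g)         # Newton step
    return w

# Toy usage: noisy labels generated from a known linear score.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = (X @ np.array([2.0, -1.0]) + rng.normal(scale=1.0, size=300) > 0).astype(float)
print(irls(X, y))
```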

5 Regularization in ML

5.1 Why regularize?
5.2 Digression on evaluation in ML
5.3 Reformulation of the optimization problem (see the code sketch below)
5.4 Another form of regularization: Ensembling
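
To make 5.1-5.3 concrete, here is one possible experiment with scikit-learn (cited in the references below), sweeping the inverse regularization strength C = 1/λ of an L2-penalized logistic regression. The dataset and grid of C values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# A small synthetic problem with many uninformative features,
# where some shrinkage typically helps generalization.
X, y = make_classification(n_samples=300, n_features=50, n_informative=5,
                           random_state=0)

# In scikit-learn, C is the inverse regularization strength (C = 1/lambda):
# smaller C means a stronger L2 penalty on the weights.
for C in (0.01, 0.1, 1.0, 10.0, 100.0):
    clf = LogisticRegression(penalty="l2", C=C, max_iter=2000)
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"C = {C:6.2f}  mean CV accuracy = {acc:.3f}")
```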

6 Optimization under constraints
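
As a small taste of section 6, projected gradient descent alternates a gradient step with a Euclidean projection onto the feasible set. A minimal sketch; the objective and constraint below are illustrative assumptions.

```python
import numpy as np

def project_l2_ball(w, radius=1.0):
    """Euclidean projection onto the ball {v : ||v||_2 <= radius}."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def projected_gd(grad, w0, project, lr=0.1, n_iters=200):
    """Projected gradient descent: take a gradient step, then project back."""
    w = w0
    for _ in range(n_iters):
        w = project(w - lr * grad(w))
    return w

# Minimize ||w - c||^2 subject to ||w|| <= 1; the solution is c / ||c||.
c = np.array([3.0, 4.0])
w_star = projected_gd(lambda w: 2.0 * (w - c), np.zeros(2), project_l2_ball)
print(w_star)  # close to [0.6, 0.8]
```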

7 Bayesian methods
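
Section 7 connects back to regularization: with a Gaussian prior w ~ N(0, τ²I) and Gaussian noise of variance σ², the MAP estimate for linear regression is exactly ridge regression with λ = σ²/τ². A minimal sketch, with illustrative variances and synthetic data.

```python
import numpy as np

def map_linear_regression(X, y, sigma2=1.0, tau2=1.0):
    """MAP estimate for linear regression with noise N(0, sigma2) and
    prior w ~ N(0, tau2 * I): ridge regression with lambda = sigma2 / tau2."""
    lam = sigma2 / tau2
    d = X.shape[1]
    # Solve the ridge normal equations (X^T X + lambda I) w = X^T y.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.0, -1.0]) + rng.normal(scale=0.3, size=100)
print(map_linear_regression(X, y))  # shrunk towards zero vs. least squares
```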

TP / Practical work

tw233mi-regularisation-optimisation

TP1. Logistic regression, A to Z

TP2. First-order methods for logistic regression

TP3. Regularized logistic regression, A to Z

TD / Exercises

Coming soon.

References

    1. J. Wallis. A treatise of algebra, both historical and practical. London. 1685.
    2. W. S. McCulloch & W. Pitts. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics. 1943.
    3. J. Berkson. Application of the logistic function to bio-assay. Journal of the American Statistical Association. 1944.
    4. C. Lanczos. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. Journal of Research of the National Bureau of Standards. 1950.
    5. H. Robbins & S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics. 1951.
    6. C. Lanczos. Solution of systems of linear equations by minimized iterations. Journal of Research of the National Bureau of Standards. 1952.
    7. M. R. Hestenes & E. L. Stiefel. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards. 1952.
    8. A. E. Hoerl & R. W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics. 1970.
    9. R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. 1996.
    10. W. J. Fu. Penalized regressions: the bridge versus the lasso. Journal of Computational and Graphical Statistics. 1998.
    11. F. Pedregosa, G. Varoquaux, A. Gramfort et al. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research. 2011.
    12. N. de Freitas. CPSC540 lecture notes. University of British Columbia. 2012.
    13. R. Johnson & T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. Advances in Neural Information Processing Systems. 2013.
    14. A. Defazio, F. Bach & S. Lacoste-Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. Advances in Neural Information Processing Systems. 2014.
    15. D. P. Kingma & J. Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations. 2015.
    16. M. Vazirgiannis. INF554 lecture notes. Machine Learning 1. Polytechnique. 2016.
    17. S. Gaiffas. MAP569 lecture notes. Machine Learning 2. Polytechnique. 2017.