Learning paradigms

Compare learning paradigms in NLP.

1h20 per week, for 4 weeks

Learning paradigms

Learn powerful representations

  • Theory: linear algebra (NMF, SVD, spectral decomposition).
  • Supervised learning: linear models, LDA/QDA, naive Bayes, logistic regression, random forests, MLPs, SVMs, kernel methods.
  • Unsupervised learning, e.g., clustering (see the ML1 and ML2 courses), PCA, ICA, t-SNE… plus bag-of-words, TF-IDF, and pLSI (document embeddings).
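
The bag-of-words/TF-IDF representation above can be sketched in a few lines of plain Python; the toy corpus and whitespace tokenization are illustrative assumptions, not part of the course material:

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "optimal transport compares distributions",
]

# Term frequency: raw counts per document.
tf = [Counter(d.split()) for d in docs]

# Inverse document frequency: log(N / df) over the corpus.
vocab = set().union(*tf)
N = len(docs)
idf = {w: math.log(N / sum(1 for c in tf if w in c)) for w in vocab}

def tfidf(counts):
    # Weight each term count by how rare the term is in the corpus.
    return {w: c * idf[w] for w, c in counts.items()}

def cosine(u, v):
    dot = sum(u.get(w, 0.0) * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv)

vecs = [tfidf(c) for c in tf]
print(round(cosine(vecs[0], vecs[1]), 3))  # small but positive: shared context words
print(round(cosine(vecs[0], vecs[2]), 3))  # 0.0: no shared vocabulary
```

Note that a word appearing in every document gets IDF log(1) = 0, so purely grammatical words are down-weighted automatically.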

Other learning paradigms

  • Semi-supervised learning, contrastive learning (cPCA, RBMs), reinforcement learning, self-supervised learning, curiosity-driven learning, few-shot learning, active learning, federated learning, online learning… Each paradigm shifts the effort: toward the model design, the problem to solve, or the representation/task to learn.
  • Generative vs. discriminative models.
  • Parametric vs. non-parametric models.
  • Other tools: optimal transport (OT), ordinary differential equations (ODEs)…
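
The generative vs. discriminative distinction can be made concrete on a toy 1-D problem. A minimal sketch, assuming two Gaussian classes: a generative classifier fits class-conditional densities and applies Bayes' rule, while a discriminative classifier (logistic regression, trained here by plain gradient ascent) models the decision boundary directly:

```python
import math, random

random.seed(0)
# Toy 1-D data: class 0 ~ N(-1, 1), class 1 ~ N(+1, 1).
X0 = [random.gauss(-1, 1) for _ in range(200)]
X1 = [random.gauss(+1, 1) for _ in range(200)]
data = [(x, 0) for x in X0] + [(x, 1) for x in X1]

# Generative: model p(x|y) as a Gaussian per class, classify via Bayes' rule.
def fit_gauss(xs):
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, v

def logpdf(x, m, v):
    return -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)

(m0, v0), (m1, v1) = fit_gauss(X0), fit_gauss(X1)

def gen_predict(x):  # equal class priors, so compare likelihoods directly
    return int(logpdf(x, m1, v1) > logpdf(x, m0, v0))

# Discriminative: logistic regression p(y=1|x), gradient ascent on log-likelihood.
w, b = 0.0, 0.0
for _ in range(500):
    gw = gb = 0.0
    for x, y in data:
        p = 1 / (1 + math.exp(-(w * x + b)))
        gw += (y - p) * x
        gb += (y - p)
    w += 0.01 * gw / len(data)
    b += 0.01 * gb / len(data)

def disc_predict(x):
    return int(w * x + b > 0)

acc = lambda f: sum(f(x) == y for x, y in data) / len(data)
print(acc(gen_predict), acc(disc_predict))  # both close to the Bayes rate here
```

On this well-specified problem both approaches land near the optimal error rate; the practical difference shows up when the generative model's density assumptions are wrong.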

Why/when deep learning?

  • CNNs (logarithmic path length), RNNs (linear), attention models, BERT (quadratic cost in sequence length)
    • Limits of current models (lack of intrinsic uncertainty, interpolation in latent spaces)
    • Learning to repeat, reformulate, or predict a word from its context… the training task influences the representations learned
    • Semantic similarity: cosine, Manhattan, Kullback-Leibler, Wasserstein-1 (OT, combinatorial complexity in general). Information theory: Shannon (encoding) vs. Fisher (parametric)
  • Can simple preprocessing + ranking solve your problem?
  • Is it the solution or the problem that is wrong? Quote Einstein + Feynman.
  • Use cases:
    • Deduplicate a database, build a search/recommendation API… (FAQ)
    • Regulatory, media & political feedback
    • Summary (models, hypotheses, limits)
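
Among the similarity measures listed above, Wasserstein-1 is special in one dimension: between two equal-size samples it reduces to sorting both and averaging the pointwise gaps, whereas general OT requires solving a transport problem. A minimal sketch with made-up samples:

```python
# Wasserstein-1 between equal-size 1-D samples: sort both and
# average |x_i - y_i| (closed form in 1-D; general OT needs an
# assignment/transport solver, hence the combinatorial cost).
def w1(xs, ys):
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

a = [0.0, 1.0, 2.0]
b = [0.5, 1.5, 2.5]   # same shape as a, shifted by 0.5
c = [0.0, 0.0, 3.0]   # same mean as a, different shape

print(w1(a, b))  # 0.5: a pure shift moves every point by 0.5
print(round(w1(a, c), 3))  # nonzero despite equal means: W1 sees the shape
```

Unlike cosine on histograms, W1 reflects how far mass must move, which is why it underlies document distances such as the Word Mover's Distance.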

From language to socio dynamics

  • Behavioral psychology.
    • Use case: diversity & inclusion; online harassment on Twitter (Amnesty)
    • Use case: speech therapists (orthophonistes)

The general form of the normal probability density function is:

$$ f(x) = \frac{1}{\sigma \sqrt{2\pi} } e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} $$

The parameter $\mu$ is the mean or expectation of the distribution, $\sigma$ is its standard deviation, and the variance of the distribution is $\sigma^{2}$.
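
A quick numeric sanity check of the density above, as a minimal Python sketch: the peak value at $x = \mu$ should equal $1/(\sigma\sqrt{2\pi})$, and the total mass should be close to 1.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # f(x) = 1/(sigma*sqrt(2*pi)) * exp(-((x - mu)/sigma)**2 / 2)
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Peak at x = mu is 1/(sigma*sqrt(2*pi)) ~= 0.3989 for the standard normal.
print(round(normal_pdf(0.0), 4))  # 0.3989

# Midpoint Riemann sum over [-8, 8]; mass outside is negligible.
n, lo, hi = 10_000, -8.0, 8.0
h = (hi - lo) / n
total = sum(normal_pdf(lo + (i + 0.5) * h) for i in range(n)) * h
print(round(total, 4))  # 1.0
```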


What is the parameter $\mu$?

The parameter $\mu$ is the mean or expectation of the distribution.


Michel Deudon. Learning semantic similarity in a continuous space. Advances in Neural Information Processing Systems, vol. 31, 2018.

Gabriel Peyré and Marco Cuturi. Computational Optimal Transport. arXiv:1803.00567, 2018.

Chloe Clavel. Traitement automatique du langage naturel et fouille d'opinions (natural language processing and opinion mining).

Michalis Vazirgiannis. INF554 - Machine Learning I. 2016.

Matt Kusner et al. From word embeddings to document distances. International Conference on Machine Learning, PMLR, 2015.

Christopher Manning and Anna Goldie. CS224n. Stanford. 2000.