Learning paradigms

Compare learning paradigms in NLP.

1h20 per week, for 4 weeks

Learning paradigms

Learn powerful representations

  • Theory: linear algebra, non-negative matrix factorization (NMF), singular value decomposition (SVD), spectral decomposition.
  • Supervised learning: linear models, LDA/QDA, naive Bayes, logistic regression, random forests, MLPs, SVMs, kernel methods.
  • Unsupervised learning, e.g., clustering (see the ML1 and ML2 courses), PCA, ICA, t-SNE… plus bag-of-words, TF-IDF, and pLSI for document embeddings (see the sketch below).
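
As a concrete instance of the tools above, here is a minimal sketch (assuming scikit-learn is installed; the corpus and the number of components are illustrative) that goes from raw text to bag-of-words counts, re-weights them with TF-IDF, and applies a truncated SVD to obtain low-dimensional document embeddings in the spirit of LSA/pLSI:

```python
# Bag-of-words -> TF-IDF -> truncated SVD: LSA-style document embeddings.
# The corpus and n_components are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

corpus = [
    "machine learning for natural language processing",
    "deep learning learns powerful representations",
    "optimal transport compares probability distributions",
]

# TF-IDF re-weights raw term counts by inverse document frequency.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(corpus)        # sparse matrix, shape (n_docs, n_terms)

# Truncated SVD of the TF-IDF matrix yields dense document embeddings.
svd = TruncatedSVD(n_components=2, random_state=0)
doc_embeddings = svd.fit_transform(X)  # shape (n_docs, 2)
print(doc_embeddings)
```

Nearest neighbors in this embedding space already give a cheap first notion of document similarity.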

Other learning paradigms

  • Semi-supervised learning, contrastive learning (cPCA, RBMs), reinforcement learning, self-supervised learning, curiosity-driven learning, few-shot learning, active learning, federated learning, online learning… Where should the effort go: model design, the problem to solve, or the representation/task to learn?
  • Generative vs. discriminative models (see the sketch after this list).
  • Parametric vs. non-parametric models.
  • Other tools: optimal transport (OT), ordinary differential equations (ODEs), …
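
To make the generative vs. discriminative distinction concrete, the sketch below (toy data; scikit-learn assumed) fits a Gaussian naive Bayes classifier, which models the joint $p(x, y)$ through class priors and class-conditional densities, next to a logistic regression, which models the conditional $p(y \mid x)$ directly:

```python
# Generative (Gaussian naive Bayes) vs. discriminative (logistic regression)
# on a toy binary classification problem. All settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Generative: estimates p(y) and p(x | y), then classifies via Bayes' rule.
gnb = GaussianNB().fit(X_tr, y_tr)

# Discriminative: estimates p(y | x) directly, without modeling p(x).
logreg = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print("naive Bayes accuracy:        ", gnb.score(X_te, y_te))
print("logistic regression accuracy:", logreg.score(X_te, y_te))
```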

Why/when deep learning?

  • CNNs (logarithmic), RNNs (linear), attention models and BERT (quadratic in sequence length)
    • Limits of current models (lack of intrinsic uncertainty, interpolation in latent spaces)
    • Learning to repeat, reformulate, or predict a word from its context… the task influences the learned representations
    • Semantic similarity: cosine, Manhattan, Kullback-Leibler, Wasserstein-1 (optimal transport, combinatorial complexity); see the sketch after this list. Information theory: Shannon (coding) vs. Fisher (parameter estimation)
  • Can simple preprocessing + ranking solve your problem?
  • Is it the solution or the problem that is wrong? (Quotes from Einstein and Feynman.)
  • Use cases:
    • Deduplicate a database, build a search/recommendation API… (FAQ)
    • Regulatory, media & political feedback
    • Summary (models, hypotheses, limits)
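
The sketch below illustrates two of the similarity measures mentioned above on toy inputs (the vectors and samples are illustrative): cosine similarity between embedding vectors, and the Wasserstein-1 distance between one-dimensional empirical distributions, which SciPy computes cheaply; general optimal transport between point clouds is far more expensive (see the Peyré & Cuturi and Kusner et al. references below):

```python
# Cosine similarity between embeddings, and Wasserstein-1 between 1-D samples.
import numpy as np
from scipy.spatial.distance import cosine
from scipy.stats import wasserstein_distance

# Cosine similarity between two illustrative embedding vectors.
u = np.array([1.0, 0.5, 0.0])
v = np.array([0.9, 0.4, 0.1])
print("cosine similarity:", 1.0 - cosine(u, v))  # cosine() is a distance

# Wasserstein-1 between two 1-D empirical distributions. In one dimension
# this reduces to comparing sorted samples; general OT is combinatorially
# harder, which is why entropic and other relaxations exist.
a = np.random.default_rng(0).normal(loc=0.0, scale=1.0, size=1000)
b = np.random.default_rng(1).normal(loc=0.5, scale=1.0, size=1000)
print("W1 distance:", wasserstein_distance(a, b))
```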

From language to social dynamics

  • Behavioral psychology.
    • Use case: diversity & inclusion; online harassment (Twitter, Amnesty International).
    • Use case: speech therapists (orthophonistes).

The general form of the normal probability density function is:

$$ f(x) = \frac{1}{\sigma \sqrt{2\pi} } e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} $$

The parameter $\mu$ is the mean or expectation of the distribution. $\sigma$ is its standard deviation. The variance of the distribution is $\sigma^{2}$.
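
As a quick numerical check of this formula (SciPy assumed; the parameter values are illustrative), the density can be evaluated both directly and with scipy.stats.norm:

```python
# Evaluate the normal pdf from the formula above and via scipy.stats.norm.
import numpy as np
from scipy.stats import norm

mu, sigma, x = 0.0, 2.0, 1.5  # illustrative parameter values

manual = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
library = norm.pdf(x, loc=mu, scale=sigma)

print(manual, library)  # the two values agree
```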

Quiz

What is the parameter $\mu$?

The parameter $\mu$ is the mean or expectation of the distribution.

References

Michel Deudon. Learning Semantic Similarity in a Continuous Space. Advances in Neural Information Processing Systems, vol. 31, 2018.

Gabriel Peyré and Marco Cuturi. Computational Optimal Transport. arXiv:1803.00567, 2018.

Chloe Clavel. Traitement automatique du langage naturel et fouille d'opinions (natural language processing and opinion mining).

Michalis Vazirgiannis. INF554: Machine Learning I. 2016.

Matt Kusner et al. From Word Embeddings to Document Distances. International Conference on Machine Learning, PMLR, 2015.

Christopher Manning and Anna Goldie. CS224n: Natural Language Processing with Deep Learning. Stanford University.
