13  Model Training and Fine-Tuning

Now that you have your data, the next step is selecting an appropriate model. When working with predefined categories (e.g., sentiment), starting with an out-of-the-box (OOTB) model that has already been fine-tuned by others to perform the task at hand might be sufficient. However, given the nature of both social media and our specific research, these models may not fully capture certain nuances (e.g., sarcasm, platform-specific language) without additional fine-tuning.

For this section, let’s assume we need to perform sentiment classification, and that we cannot use an OOTB sentiment classification model because we are classifying data from a novel social media platform with a very distinctive language style (for example, a gaming forum where the phrase “that is sick” is actually positive, or a thread on a new song that is described as “savage”). Whilst these are overly simplified examples, the sentiment (🥁) still stands.

We are going to explore three distinct approaches to text classification, each offering a different balance of complexity, resource requirements, and suitability for the size of your dataset and the task at hand.

  1. Logistic Regression: A simple, interpretable classification model that is trained from scratch using basic feature-extraction techniques such as TF-IDF or Bag-of-Words. Logistic regression is well-suited to straightforward classification tasks and small datasets where interpretability is important (a minimal sketch follows this list).
  2. SetFit: A framework designed for few-shot learning, where only limited labelled data is available. SetFit leverages pre-trained sentence transformers to generate embeddings and fine-tunes a lightweight classifier on top of them, making it ideal when you need quick results with minimal data (see the second sketch below).
  3. Vanilla Fine-Tuning with Hugging Face Trainer: The most powerful of the three, this approach fine-tunes large pre-trained language models such as BERT on task-specific datasets. It’s best used when you have access to larger datasets and need high accuracy and deep contextual understanding; a sketch appears under “Fine-Tuning a Model” below.
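
To make approach 1 concrete, here is a minimal sketch using scikit-learn. The toy texts and the 1 = positive / 0 = negative label encoding are placeholders for your own labelled dataset, not real data:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Toy stand-ins for your own labelled platform data.
    texts = [
        "that is sick",                # positive in gaming slang
        "this new track is savage",    # positive
        "the servers are down again",  # negative
        "worst update ever",           # negative
    ]
    labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

    # Pipeline: raw text -> sparse TF-IDF vectors -> linear classifier.
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)

    print(model.predict(["that new skin is sick"]))

Because the model is linear, its learned coefficients can be inspected to see which n-grams push a prediction towards each class, which is exactly what makes this approach interpretable.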

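Approach 2 can be sketched just as briefly. The snippet below follows SetFit’s documented quickstart (SetFitModel and Trainer are the names used in setfit 1.x; check your installed version), and the backbone checkpoint and toy examples are again illustrative choices:

    from datasets import Dataset
    from setfit import SetFitModel, Trainer

    # Few-shot: a handful of labelled examples per class is often enough.
    train_ds = Dataset.from_dict({
        "text": [
            "that is sick",
            "this new track is savage",
            "the servers are down again",
            "worst update ever",
        ],
        "label": [1, 1, 0, 0],
    })

    # Any sentence-transformers checkpoint can serve as the backbone.
    model = SetFitModel.from_pretrained(
        "sentence-transformers/paraphrase-mpnet-base-v2"
    )

    # Contrastively fine-tunes the embeddings, then fits a lightweight
    # classification head on top of them.
    trainer = Trainer(model=model, train_dataset=train_ds)
    trainer.train()

    print(model.predict(["that new map is sick"]))
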
Fine-Tuning a Model

Fine-tuning involves taking a pre-trained model and continuing to train it on your specific dataset. How to go about this depends on the amount of labelled data available and the complexity of the problem.

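As a minimal sketch of the vanilla route (approach 3), the snippet below tokenises a toy dataset and fine-tunes a BERT checkpoint with the Hugging Face Trainer. The checkpoint name and hyperparameters are illustrative defaults rather than recommendations; the next subsection discusses how to choose a checkpoint:

    from datasets import Dataset
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    # Toy stand-in for your labelled platform data.
    ds = Dataset.from_dict({
        "text": [
            "that is sick",
            "this new track is savage",
            "the servers are down again",
            "worst update ever",
        ],
        "label": [1, 1, 0, 0],
    })

    checkpoint = "bert-base-uncased"  # illustrative default
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=64)

    ds = ds.map(tokenize, batched=True)

    # Loads the pre-trained encoder with a fresh classification head.
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=2
    )

    args = TrainingArguments(
        output_dir="sentiment-ft",
        num_train_epochs=3,
        per_device_train_batch_size=8,
    )

    trainer = Trainer(model=model, args=args, train_dataset=ds)
    trainer.train()

In practice you would also pass an eval_dataset and a compute_metrics function so that validation performance is monitored during training.
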
Choosing the Right Pre-Trained Model

Selecting an appropriate starting point is crucial.

  • Model Selection: Choose models known to perform well in NLP tasks (e.g., BERT, RoBERTa).
  • Domain Relevance: Prefer models pre-trained on data similar to your domain, if available (the sketch below shows how either choice is loaded).
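
Both considerations ultimately come down to which checkpoint you load. A minimal sketch, assuming a Twitter-like platform; the domain checkpoint named here is one public example from the Hugging Face Hub, so substitute whatever matches your data:

    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # A strong general-purpose starting point:
    general = "roberta-base"
    # A domain-relevant alternative, pre-trained on Twitter data:
    domain = "cardiffnlp/twitter-roberta-base-sentiment-latest"

    # Prefer the domain model when one exists for your platform.
    checkpoint = domain
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint)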