Fine Tuning Machine Learning Models
Transformers have revolutionized the way data practitioners build models for natural language processing. In a similar vein, the advent of transfer learning has changed the game. Rather than training a model from scratch, data scientists can leverage a pre-trained model. Oftentimes, these models have been trained using an amount of computing resources available only to large technology companies like Google and Microsoft, making them more powerful than anything we could come up with ourselves.
Transfer learning involves the following:
- Loading the trained model into memory
- Freezing the parameters so as to avoid losing any information they contain during future training rounds
- Adding some new trainable layers on top of the frozen layers
- Training the new layers on another dataset
Transfer learning is often paired with fine-tuning, which can potentially achieve meaningful improvements by adapting the pre-trained features to the new data. Fine-tuning is not the same as transfer learning: it consists of unfreezing the entire model and re-training it on new data.
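The difference between the two stages can be sketched with a framework-free toy. Everything here is an assumption made for illustration: the "pre-trained" weights, the dataset, and the train helper are made up, and the network is a single ReLU layer with a linear head. In a framework such as PyTorch, freezing would instead mean setting requires_grad to False on the base parameters.

```python
import random

# Hypothetical "pre-trained" base: three ReLU units over two inputs.
# In a real project these weights would be loaded from a checkpoint.
PRETRAINED = [[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]]

# Small made-up dataset for the new task: (input, target) pairs.
DATA = [((1.0, 0.0), 1.0), ((0.0, 1.0), -1.0),
        ((1.0, 1.0), 0.5), ((-1.0, 0.5), 2.0)]


def train(base, head, data, lr, steps, train_base, seed=0):
    """Run SGD on squared error; the base is updated only when train_base is True."""
    rng = random.Random(seed)
    for _ in range(steps):
        x, y = rng.choice(data)
        pre = [sum(w * xi for w, xi in zip(row, x)) for row in base]
        feats = [max(0.0, p) for p in pre]  # ReLU activations
        err = sum(h * f for h, f in zip(head, feats)) - y
        if train_base:
            for j, row in enumerate(base):
                if pre[j] > 0:  # gradient flows only through active ReLU units
                    for i in range(len(row)):
                        row[i] -= lr * err * head[j] * x[i]
        for j in range(len(head)):
            head[j] -= lr * err * feats[j]


# Transfer learning: copy the pre-trained weights, freeze them, train a new head.
base = [row[:] for row in PRETRAINED]
head = [0.0, 0.0, 0.0]
train(base, head, DATA, lr=0.05, steps=200, train_base=False)
base_after_transfer = [row[:] for row in base]

# Fine-tuning: unfreeze everything and keep training with a smaller learning rate.
train(base, head, DATA, lr=0.01, steps=200, train_base=True)
```

After the first call only the head has moved, so base_after_transfer still equals PRETRAINED. After the second call the base weights have drifted away from their pre-trained values as well.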
For those who don’t already know, OpenAI is a for-profit (initially non-profit) artificial intelligence company founded in San Francisco in late 2015 by Elon Musk, Sam Altman, and others, who collectively pledged US$1 billion.
Generative Pre-trained Transformer
The first version of the Generative Pre-trained Transformer, or GPT for short, was released by OpenAI in June 2018. The architecture was composed of 12 layers and 117 million parameters. A second version, GPT-2, was announced in February 2019, and the full model was released in November 2019. It was trained to predict the next word in 40GB of Internet text and contains approximately 1.5 billion parameters. GPT-2 was followed by the 175-billion-parameter GPT-3, introduced in a paper in May 2020. OpenAI decided not to release the pre-trained weights for GPT-3, citing societal concerns (e.g. malicious use, harmful biases). They provide access to the model through their API only. The community has since attempted to replicate a GPT-3 sized model and open source it to the public for free (e.g. GPT-Neo).
Recall that there are 3 types of Transformers:
- Encoders
- Decoders
- Encoders + Decoders
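GPT is a decoder-only model, meaning that it generates its output one token at a time, with each token conditioned on the tokens that came before it. In other words, when performing inference, it does not require an input: it can generate text from scratch, or it can continue an optional prompt.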
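To make the decoder-only idea concrete, here is a toy sketch of autoregressive generation, either from scratch or as a continuation of a prompt. The lookup table and the generate function are hypothetical stand-ins. A real GPT model computes the next-token probabilities from all previous tokens using its learned weights, whereas this table only looks at the last token.

```python
import random

# Hypothetical next-token table standing in for a trained decoder.
NEXT_TOKEN_PROBS = {
    "<bos>": {"why": 0.6, "knock": 0.4},
    "why": {"did": 1.0},
    "did": {"the": 1.0},
    "the": {"chicken": 0.7, "road": 0.3},
    "chicken": {"cross": 1.0},
    "cross": {"the": 1.0},
    "road": {"<eos>": 1.0},
    "knock": {"knock": 0.5, "<eos>": 0.5},
}


def generate(prompt=None, max_new_tokens=12, seed=0):
    """Sample tokens one at a time until <eos> or the length limit is reached."""
    rng = random.Random(seed)
    # Start from the beginning-of-sequence marker, plus the prompt if one is given.
    tokens = ["<bos>"] + (prompt.split() if prompt else [])
    for _ in range(max_new_tokens):
        # Unknown tokens simply end the sequence in this toy.
        dist = NEXT_TOKEN_PROBS.get(tokens[-1], {"<eos>": 1.0})
        next_token = rng.choices(list(dist), weights=list(dist.values()))[0]
        if next_token == "<eos>":
            break
        tokens.append(next_token)  # the new token becomes part of the context
    return " ".join(tokens[1:])


print(generate())                  # unconditional generation
print(generate(prompt="why did"))  # continuation of a prompt
```

The loop is the important part: each sampled token is appended to the sequence and fed back in as context for the next step.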
In the following section, we will download the GPT-2 model and train it on a new dataset in an attempt to get it to produce jokes. Since we are generating text, we do not need to add an additional layer. We can simply fine-tune the model. That is, we train the model on a different kind of data in order to change the parameters slightly and, consequently, the output.
Fine Tune GPT-2 using Python
aitextgen is a Python package that leverages PyTorch, Hugging Face Transformers and pytorch-lightning with specific optimizations for text generation using GPT-2. 
We begin by installing and importing the package.
!pip install -q aitextgen
from aitextgen import aitextgen
The library provides an easy-to-use interface for downloading the GPT-2 model. The GPT-2 model comes in 4 sizes: 124M, 355M, 774M and 1.5B parameters, respectively. We select the smallest one.
ai = aitextgen(tf_gpt2="124M", to_gpu=True)
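As a rough sanity check on the four sizes listed above, the parameter counts can be reproduced with a little arithmetic. The layer counts, hidden sizes, vocabulary size (50,257) and context length (1,024) below are assumptions taken from the published GPT-2 configurations rather than from this article, and gpt2_param_count is a hypothetical helper. The output projection is not counted separately because it shares weights with the token embedding.

```python
# Assumed GPT-2 configurations: size name -> (number of layers, hidden size)
CONFIGS = {"124M": (12, 768), "355M": (24, 1024), "774M": (36, 1280), "1.5B": (48, 1600)}
VOCAB_SIZE = 50257      # assumed BPE vocabulary size
CONTEXT_LENGTH = 1024   # assumed maximum sequence length


def gpt2_param_count(n_layer, d_model, vocab=VOCAB_SIZE, n_ctx=CONTEXT_LENGTH):
    """Count weights and biases of a GPT-2 style decoder stack."""
    embeddings = vocab * d_model + n_ctx * d_model   # token + position embeddings
    attention = 4 * d_model * d_model + 4 * d_model  # QKV and output projections with biases
    mlp = 8 * d_model * d_model + 5 * d_model        # two linear layers (4x expansion) with biases
    layer_norms = 4 * d_model                        # two LayerNorms per block (scale and bias)
    final_norm = 2 * d_model                         # LayerNorm after the last block
    return embeddings + n_layer * (attention + mlp + layer_norms) + final_norm


for name, (n_layer, d_model) in CONFIGS.items():
    print(f"{name}: {gpt2_param_count(n_layer, d_model):,}")
# 124M: 124,439,808
# 355M: 354,823,168
# 774M: 774,030,080
# 1.5B: 1,557,611,200
```

Most of the growth comes from the per-layer term, which scales with the square of the hidden size, while the embeddings grow only linearly.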
We can try generating a document of text by calling ai.generate(), which produces output like the following:
In the early 1990s, a group of local business owners organized a group called the New York Taxi Coalition. The coalition ran a campaign against New York's taxi companies, and even led a campaign against New York's state legislature. The movement was called Taxi Law, and they were called to task on how they would enforce the law. The coalition, which was led in part by a former New York City mayor, was eventually led by state Senator Bob Evans. They also had their own legal defense fund. The anti-cab movement began in the mid-'90s. The New York Taxi Coalition was involved in a number of various legal actions, including New York City's anti-cab law that would eventually have led to a citywide ban on the use of T-shirts for drivers. At the same time, a number of other local and national organizations and activists were engaged in the same legal battles to stop the taxi movement. In 1995, the New York City Taxi Coalition sued New York City's Taxi Corporation Board, which had been trying to prohibit the use of T-shirts in public. In response, a state court overturned the ruling. The New York Taxi Coalition was not the first to challenge taxi companies'
As mentioned previously, we will try to fine-tune the model by training it on a dataset of short jokes from Kaggle. After uploading the CSV file to Google Drive, we use Pandas to extract the contents of the Joke column and write it to a text file.
import pandas as pd
df = pd.read_csv("/content/drive/MyDrive/Data/shortjokes.csv")
df["Joke"].to_csv(r"/content/drive/MyDrive/Data/shortjokes.txt", header=None, index=None, sep=",", mode="a")
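Two details of the call above are worth knowing. With its default settings, to_csv wraps any joke that contains a comma or a double quote in double quotes, and mode="a" appends to the file, so re-running the cell duplicates the data. If neither is wanted, a minimal alternative using only the standard library might look like the sketch below. The jokes_to_text helper and its parameters are hypothetical names, and it assumes the CSV has a header row with a Joke column.

```python
import csv


def jokes_to_text(csv_path, txt_path, column="Joke"):
    """Write one joke per line to txt_path, overwriting any existing file."""
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
         open(txt_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            # Collapse embedded newlines and repeated whitespace into single spaces.
            joke = " ".join(row[column].split())
            if joke:
                dst.write(joke + "\n")
                count += 1
    return count
```

It would be called with the same paths as before, for example jokes_to_text("/content/drive/MyDrive/Data/shortjokes.csv", "/content/drive/MyDrive/Data/shortjokes.txt"), and returns the number of jokes written.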
We train the model.
ai.train("/content/drive/MyDrive/Data/shortjokes.txt", line_by_line=False, from_cache=False, num_steps=3000, generate_every=1000, save_every=1000, save_gdrive=False, learning_rate=1e-3, fp16=False, batch_size=1, )
We generate text every 1000 steps to see how the model is improving.
1,000 steps reached: saving model to /trained_model
1,000 steps reached: generating sample texts.
==========
My girlfriend said she was a bad cook who tried to eat her? She didn't eat.
...
==========
2,000 steps reached: saving model to /trained_model
2,000 steps reached: generating sample texts.
==========
"What do you call a fat girl? Pregnant, but I'll see myself in the future."
...
==========
3,000 steps reached: saving model to /trained_model
3,000 steps reached: generating sample texts.
INFO:pytorch_lightning.utilities.rank_zero:`Trainer.fit` stopped: `max_steps=3000` reached.
==========
"I'm such a fun dad, I'm always right before I go places like a dead horse!"
...
==========
The library will automatically write the model to the trained_model directory. In case something happens to the runtime, we can load it into memory again as follows:
ai = aitextgen(model_folder="trained_model", to_gpu=True)
If we call generate again, it will produce jokes (albeit bad ones), unlike the last time.
... I took the shell off a racing snail shell I'm sorry. Too shellfish. ...