ML 120: Bloom LLM (20 pts extra)

What You Need

A computer with a web browser, and a Google account (to use Google Colab).

Purpose

To practice working with a Large Language Model (LLM), using BLOOM. This project is based on this tutorial:

Hello Large Language Models

Using Google Colab

In a browser, go to
https://colab.research.google.com/
If you see a blue "Sign In" button at the top right, click it and log into a Google account.

From the menu, click File, "New notebook".

Installing Libraries

Execute these commands to install and import the required libraries:
!pip install transformers
import torch                                  # PyTorch, the backend for BLOOM (preinstalled in Colab)
import transformers
from transformers import BloomForCausalLM     # the BLOOM language model
from transformers import BloomTokenizerFast   # the matching tokenizer
As shown below, the software installs.

Downloading BLOOM

Execute these commands to load an LLM using the smallest version of BLOOM, with 560 million parameters:
model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m")
tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-560m")
As shown below, the model downloads.
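If you want to confirm the model's size, this optional check (not part of the original tutorial) counts its parameters:
print(sum(p.numel() for p in model.parameters()))  # prints roughly 560 million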

Preparing a Prompt

Execute these commands to start a sentence for the model to complete:
prompt = "Why is the sky blue?"
result_length = len(prompt.split()) + 50  # target output length: the prompt's words plus about 50 more tokens
inputs = tokenizer(prompt, return_tensors="pt")  # tokenize the prompt into PyTorch tensors
As shown below, the code runs without producing any output.
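If you're curious what the tokenizer did, these optional commands (not part of the original tutorial) show the token IDs and the text pieces they stand for:
print(inputs["input_ids"])  # tensor of integer token IDs
print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()))  # the matching text pieces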

Greedy Search

Execute this command to perform a greedy search, which always picks the single most likely next token:
print(tokenizer.decode(model.generate(inputs["input_ids"],
                       max_length=result_length + 50  # a longer limit than the later examples
                      )[0]))
As shown below, the model gets stuck, repeating the same sentences over and over. Your output may be somewhat different from the image below.

Try repeating this calculation--you'll get the same result, because a greedy search has no random variation.
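You can verify this determinism with a quick comparison; this optional check is not part of the original tutorial:
first = tokenizer.decode(model.generate(inputs["input_ids"], max_length=result_length + 50)[0])
second = tokenizer.decode(model.generate(inputs["input_ids"], max_length=result_length + 50)[0])
print(first == second)  # True: greedy search always makes the same choices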

Sampling with Top-k + Top-p

This uses a combination of three methods: random sampling (do_sample=True), Top-k filtering, which keeps only the 50 most likely next tokens (top_k=50), and Top-p "nucleus" filtering, which keeps only tokens whose combined probability reaches 0.9 (top_p=0.9). Execute this command:
print(tokenizer.decode(model.generate(inputs["input_ids"],
                       max_length=result_length, 
                       do_sample=True, 
                       top_k=50, 
                       top_p=0.9
                      )[0]))
As shown below, the model creates random nonsense without repeating sentences.

Repeat the calculation. The output changes.
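If you ever need a sampled output to be repeatable, Transformers provides a set_seed() helper; seeding before each run (an optional step, not in the original tutorial) makes the randomness reproducible:
from transformers import set_seed

set_seed(42)  # fix the random number generators so the next sample is repeatable
Run the sampling command again after setting the seed, and the "random" output will be the same on every seeded run.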

Beam Search

Execute this command to perform a beam search, which chooses the most likely word sequence by looking ahead several words:
print(tokenizer.decode(model.generate(inputs["input_ids"],
                       max_length=result_length,
                       num_beams=2,             # track the 2 most likely sequences at each step
                       no_repeat_ngram_size=2,  # never repeat the same 2-token sequence
                       early_stopping=True      # stop once enough complete candidates are found
                      )[0]))
As shown below, the model adds different text after the prompt.

Try repeating this calculation--you'll get the same result, because a beam search also has no random variation, as explained here.

Flag ML 120.5: Tokenizer (10 pts)

Execute this command:
print(tokenizer.is_fast)
The flag is covered by a green rectangle in the image below.

Flag ML 120.6: 1 Billion Parameters (10 pts)

Replace the "bloom-560m" model with the larger "bloom-1b1" model.
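The model name appears in the two from_pretrained() calls, so the download commands become:
model = BloomForCausalLM.from_pretrained("bigscience/bloom-1b1")
tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-1b1")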

Execute this command:

print(tokenizer.model_input_names)
The flag is covered by a green rectangle in the image below.

Posted 5-10-23
Third flag updated 10-15-23
Flags 1-4 removed, replaced by flags 5 and 6 12-11-23
Flag 6 image fixed 12-13-23