ML 120: Bloom LLM (20 pts extra)

What You Need

Purpose

To practice running a Large Language Model, using BLOOM. This project is based on this tutorial:

Hello Large Language Models

Using Google Colab

In a browser, go to
https://colab.research.google.com/
If you see a blue "Sign In" button at the top right, click it and log into a Google account.

From the menu, click File, "New notebook".

Installing Libraries

Execute these commands to install and import the required libraries:
!pip install transformers
import torch
import transformers
from transformers import BloomForCausalLM
from transformers import BloomTokenizerFast
As shown below, the software installs.

Downloading BLOOM

Execute these commands to load an LLM, using the smallest version of BLOOM, with 560 million parameters:
model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m")
tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-560m")
As shown below, the model downloads.

Preparing a Prompt

Execute these commands to start a sentence for the model to complete:
prompt = "Why is the sky blue?"
result_length = len(prompt.split()) + 50  # Length limit: prompt word count plus 50
inputs = tokenizer(prompt, return_tensors="pt")
As shown below, the code runs without producing any output.
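
The arithmetic behind result_length can be checked on its own, without loading the model. The sketch below only repeats the calculation from the commands above. Note that max_length in model.generate() counts tokens (including the prompt's tokens), not words, so the word count is only a rough estimate of how long the prompt is.

```python
# Reproduce the length calculation from the prompt-preparation step.
prompt = "Why is the sky blue?"

# split() breaks the prompt on whitespace, so punctuation stays attached
# to its word ("blue?" counts as one word).
word_count = len(prompt.split())

# The limit is the prompt's word count plus 50.
result_length = word_count + 50

print("Words in prompt:", word_count)
print("result_length:", result_length)
```

Because the tokenizer may split a word into several tokens, the number of tokens actually added can be somewhat smaller than 50.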

Greedy Search

A greedy search tries various possible words to add, and always chooses the word with the highest likelihood. There is no randomness.

Execute this command to perform a greedy search.

print(tokenizer.decode(model.generate(inputs["input_ids"], 
                       max_length=result_length + 50
                      )[0]))
As shown below, the model gets stuck, repeating the same sentences over and over. Your output may be somewhat different from the image below.

Try repeating this calculation--you'll get the same result, because a greedy search has no random variation.
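
To see why a greedy search tends to loop, here is a minimal sketch using a tiny hand-made table of next-word probabilities. The table and the greedy_decode function are hypothetical illustrations and are not taken from BLOOM or the transformers library. They only show the selection rule: always take the most likely next word.

```python
# Toy next-word probabilities (hypothetical numbers, not taken from BLOOM).
NEXT_WORD_PROBS = {
    "the": {"sky": 0.6, "sea": 0.4},
    "sky": {"is": 0.9, "was": 0.1},
    "is": {"blue": 0.5, "the": 0.3, "clear": 0.2},
    "blue": {"because": 0.7, "and": 0.3},
    "because": {"the": 0.8, "it": 0.2},
}


def greedy_decode(start, max_words):
    """Repeatedly append the single most likely next word."""
    words = [start]
    while len(words) < max_words:
        candidates = NEXT_WORD_PROBS.get(words[-1])
        if not candidates:
            break  # No known continuation for this word
        # Pick the candidate with the highest probability; no randomness.
        words.append(max(candidates, key=candidates.get))
    return words


first = greedy_decode("the", 12)
second = greedy_decode("the", 12)

print(" ".join(first))
print("Same result both times:", first == second)
```

Once the most likely path leads back to a word already used, the same choices repeat forever, which is the same kind of loop seen in the BLOOM output.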

Sampling with Top-k + Top-p

This uses a combination of three methods (random sampling, top-k filtering, and top-p filtering), allowing an adjustable amount of randomness in word selection. Execute this command:
print(tokenizer.decode(model.generate(inputs["input_ids"],
                       max_length=result_length, 
                       do_sample=True, 
                       top_k=50, 
                       top_p=0.9
                      )[0]))
As shown below, the model creates random nonsense without repeating sentences.

Repeat the calculation. The output changes.
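
The filtering steps can be sketched on a toy probability table. This is a simplified illustration under stated assumptions: the function names and the probabilities are made up, and the transformers library works on the model's raw scores, so its implementation may differ in details. The idea is the same: keep the top_k most likely words, then keep only enough of them to reach a cumulative probability of top_p, then pick randomly among the survivors.

```python
import random


def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the top_k most likely words, then the smallest leading group
    whose cumulative probability reaches top_p, and renormalize."""
    # Step 1: top-k. Sort by probability and keep the k best.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(word, p / total) for word, p in ranked]

    # Step 2: top-p. Keep words until their cumulative probability reaches top_p.
    kept = []
    cumulative = 0.0
    for word, p in ranked:
        kept.append((word, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {word: p / total for word, p in kept}


def sample_word(probs, rng):
    """Pick one word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


# Hypothetical probabilities for the word after "The sky is".
toy_probs = {"blue": 0.5, "clear": 0.2, "dark": 0.15, "gray": 0.1, "purple": 0.05}

filtered = top_k_top_p_filter(toy_probs, top_k=4, top_p=0.8)
rng = random.Random()  # Unseeded, so each run can differ
samples = [sample_word(filtered, rng) for _ in range(5)]

print("Words that survive filtering:", sorted(filtered))
print("Five random picks:", samples)
```

Lowering top_k or top_p shrinks the set of candidate words and makes the output less random. Raising them does the opposite.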

Beam Search

Execute this command to perform a beam search, which chooses the most likely word sequence, looking ahead several words.
print(tokenizer.decode(model.generate(inputs["input_ids"],
                       max_length=result_length, 
                       num_beams=2, 
                       no_repeat_ngram_size=2,
                       early_stopping=True
                      )[0]))
As shown below, the model adds different text after the prompt.

Try repeating this calculation--you'll get the same result, because a beam search also has no random variation, as explained here.
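
The look-ahead idea can be sketched with a toy table in which the most likely first word does not lead to the most likely sequence. The table and the beam_search function below are hypothetical illustrations, not the transformers implementation, and they leave out the no_repeat_ngram_size and early_stopping options. Setting num_beams=1 in this sketch behaves like a greedy search.

```python
import math

# Hypothetical next-word probabilities, chosen so that the best first word
# does not lead to the best overall sequence.
NEXT_WORD_PROBS = {
    "<start>": {"the": 0.6, "a": 0.4},
    "the": {"sky": 0.4, "sea": 0.3, "sun": 0.3},
    "a": {"cloud": 0.9, "bird": 0.1},
}


def beam_search(start, steps, num_beams):
    """Keep the num_beams highest-scoring partial sequences at each step."""
    beams = [([start], 0.0)]  # Each beam is (words, log-probability)
    for _ in range(steps):
        candidates = []
        for words, score in beams:
            options = NEXT_WORD_PROBS.get(words[-1], {})
            if not options:
                candidates.append((words, score))  # Finished; carry forward
                continue
            for word, p in options.items():
                # Adding log-probabilities multiplies the probabilities.
                candidates.append((words + [word], score + math.log(p)))
        # Keep only the best num_beams sequences; no randomness is involved.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0]


greedy_words, greedy_score = beam_search("<start>", steps=2, num_beams=1)
beam_words, beam_score = beam_search("<start>", steps=2, num_beams=2)

print("1 beam: ", " ".join(greedy_words[1:]), round(math.exp(greedy_score), 2))
print("2 beams:", " ".join(beam_words[1:]), round(math.exp(beam_score), 2))
```

With one beam, the search commits to "the" and ends with a less likely sequence. With two beams, it keeps "a" as a second option and finds the more likely "a cloud".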

Flag ML 120.5: Tokenizer (10 pts)

Execute this command:
print(tokenizer.is_fast)
The flag is covered by a green rectangle in the image below.

Flag ML 120.6: 1 Billion Parameters (10 pts)

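Replace the "bloom-560m" model with the larger "bloom-1b1" model: change "bigscience/bloom-560m" to "bigscience/bloom-1b1" in both from_pretrained commands and execute them again.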

Execute this command:

print(tokenizer.model_input_names)
The flag is covered by a green rectangle in the image below.

Posted 5-10-23
Third flag updated 10-15-23
Flags 1-4 removed, replaced by flags 5 and 6 12-11-23
Flag 6 image fixed 12-13-23
Descriptions of methods improved 6-29-24