ML 120: Bloom LLM (20 pts extra)

What You Need


To practice using a pretrained Large Language Model, BLOOM. This project is based on this tutorial:

Hello Large Language Models

Using Google Colab

In a browser, go to
If you see a blue "Sign In" button at the top right, click it and log into a Google account.

From the menu, click File, "New notebook".

Installing Libraries

Execute these commands to install and import the required libraries:
!pip install transformers
import torch
import transformers
from transformers import BloomForCausalLM
from transformers import BloomTokenizerFast
As shown below, the software installs.

Downloading BLOOM

Execute these commands to create an LLM using the smallest version of BLOOM, with 560 million parameters:
model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m")
tokenizer = BloomTokenizerFast.from_pretrained("bigscience/bloom-560m")
As shown below, the model downloads.
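To check the "560 million" figure, one option is to add up the sizes of the model's parameter tensors. This is a minimal sketch: the helper name count_parameters is made up for illustration, and it assumes model is the object loaded above.

```python
def count_parameters(model):
    """Return the total number of elements across all parameter tensors."""
    # model.parameters() and tensor.numel() are standard PyTorch calls.
    return sum(p.numel() for p in model.parameters())
```

In the notebook, print(f"{count_parameters(model):,}") should show a number close to 560 million for "bloom-560m".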

Preparing a Prompt

Execute these commands to start a sentence for the model to complete:
prompt = "Why is the sky blue?"
result_length = len(prompt.split()) + 50  # Prompt word count plus 50; used later as max_length, which counts tokens
inputs = tokenizer(prompt, return_tensors="pt")
As shown below, the code runs without producing any output.

Greedy Search

A greedy search tries various possible words to add, and always chooses the word with the highest likelihood. There is no randomness.

Execute this command to perform a greedy search.

output = model.generate(inputs["input_ids"],
                        max_length=result_length + 50)
print(tokenizer.decode(output[0]))
As shown below, the model gets stuck, repeating the same sentences over and over. Your output may be somewhat different from the image below.

Try repeating this calculation--you'll get the same result, because a greedy search has no random variation.
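The repetition and the lack of randomness can both be seen in a toy model. The sketch below does not use BLOOM: the word table and its probabilities are made up for illustration, and greedy_decode is a hypothetical helper that always appends the most likely next word.

```python
# Toy next-word probabilities (made-up numbers, not taken from BLOOM).
NEXT_WORD_PROBS = {
    "the": {"sky": 0.6, "sun": 0.4},
    "sky": {"is": 0.9, "blue": 0.1},
    "is": {"blue": 0.7, "the": 0.3},
    "blue": {"because": 0.55, "is": 0.45},
    "because": {"the": 0.8, "sky": 0.2},
}

def greedy_decode(start, steps):
    """Always append the single most likely next word."""
    words = [start]
    for _ in range(steps):
        candidates = NEXT_WORD_PROBS[words[-1]]
        words.append(max(candidates, key=candidates.get))
    return words

first = greedy_decode("the", 10)
second = greedy_decode("the", 10)
print(" ".join(first))
print(first == second)  # identical on every run: no randomness involved
```

Because the most likely path through the table leads back to "the", the same five words repeat, which is the same kind of loop the real model falls into.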

Sampling with Top-k + Top-p

This combines three methods (random sampling, top-k filtering, and top-p filtering), allowing an adjustable amount of randomness in word selection.
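One way to run this kind of search is sketched here. It assumes the model, tokenizer, inputs, and result_length objects from the earlier steps; the wrapper name generate_with_sampling and the values top_k=50 and top_p=0.9 are assumptions chosen for illustration, not necessarily the settings used in the original project. The do_sample, top_k, and top_p arguments are standard options of the transformers generate() method.

```python
def generate_with_sampling(model, tokenizer, inputs, max_length,
                           top_k=50, top_p=0.9):
    """Sample a continuation, restricted by top-k and top-p filtering."""
    output = model.generate(
        inputs["input_ids"],
        max_length=max_length,  # total length in tokens, prompt included
        do_sample=True,         # draw randomly instead of taking the top word
        top_k=top_k,            # consider only the k most likely next tokens
        top_p=top_p,            # then only the most likely tokens covering top_p of the probability
    )
    return tokenizer.decode(output[0])
```

In the notebook this would be called as print(generate_with_sampling(model, tokenizer, inputs, result_length)).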
As shown below, the model creates random nonsense without repeating sentences.

Repeat the calculation. The output changes.
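To see what the top-k and top-p limits do to a single word choice, here is a toy illustration in plain Python. It is a simplified sketch, not the code the transformers library runs internally, and the word list and probabilities are made up.

```python
import random

def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the top_k most likely words, then the shortest leading run whose
    total probability reaches top_p, and rescale so the result sums to 1."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    kept, total = [], 0.0
    for word, p in ranked:
        kept.append((word, p))
        total += p
        if total >= top_p:
            break
    return {word: p / total for word, p in kept}

def sample_word(probs, rng):
    """Pick one word at random, weighted by its probability."""
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

# Made-up probabilities for the word after "The sky is".
probs = {"blue": 0.5, "clear": 0.25, "dark": 0.125,
         "falling": 0.0625, "cheese": 0.0625}
filtered = top_k_top_p_filter(probs, top_k=4, top_p=0.8)
rng = random.Random()
samples = [sample_word(filtered, rng) for _ in range(5)]
print(filtered)
print(samples)  # changes from run to run
```

With these settings top-k drops "cheese" and top-p then drops "falling", so only "blue", "clear", and "dark" can ever be chosen. Raising top_k or top_p lets less likely words through and makes the output more random.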

Beam Search

Execute this command to perform a beam search, which tracks several candidate word sequences at once and chooses the most likely sequence overall, looking ahead several words.
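A beam search can be requested as sketched below. It assumes the model, tokenizer, inputs, and result_length objects from the earlier steps; the wrapper name generate_with_beam_search and the values num_beams=2 and no_repeat_ngram_size=2 are assumptions for illustration, not necessarily the original project's settings. The num_beams, no_repeat_ngram_size, and early_stopping arguments are standard options of the transformers generate() method.

```python
def generate_with_beam_search(model, tokenizer, inputs, max_length, num_beams=2):
    """Generate a continuation with beam search (no random sampling)."""
    output = model.generate(
        inputs["input_ids"],
        max_length=max_length,   # total length in tokens, prompt included
        num_beams=num_beams,     # number of candidate sequences tracked at once
        no_repeat_ngram_size=2,  # forbid repeating any two-token sequence
        early_stopping=True,     # stop once enough finished candidates exist
    )
    return tokenizer.decode(output[0])
```

In the notebook this would be called as print(generate_with_beam_search(model, tokenizer, inputs, result_length)).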
As shown below, the model adds different text after the prompt.

Try repeating this calculation--you'll get the same result, because a beam search also has no random variation, as explained here.
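The difference between greedy and beam search shows up even in a tiny made-up example. The sketch below does not use BLOOM: the word table is invented, and beam_search is a hypothetical helper. With one beam it behaves like a greedy search; with two beams it finds a sequence that is more likely overall.

```python
import math

# Toy next-word probabilities (made-up numbers, not taken from BLOOM).
NEXT_WORD_PROBS = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.55, "y": 0.45},
    "b": {"z": 0.9, "w": 0.1},
}

def beam_search(start, steps, num_beams):
    """Track the num_beams most likely sequences; return the best one."""
    beams = [([start], 0.0)]  # each entry: (words so far, log-probability)
    for _ in range(steps):
        candidates = []
        for words, score in beams:
            for word, p in NEXT_WORD_PROBS[words[-1]].items():
                candidates.append((words + [word], score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0][0]

greedy = beam_search("<s>", steps=2, num_beams=1)
beam = beam_search("<s>", steps=2, num_beams=2)
print(greedy)  # ['<s>', 'a', 'x'], probability 0.6 * 0.55 = 0.33
print(beam)    # ['<s>', 'b', 'z'], probability 0.4 * 0.9 = 0.36
```

The greedy search commits to "a" because it is the most likely first word, while the beam search keeps "b" alive long enough to discover that "b z" is the better sequence. Nothing here is random, so repeated runs give the same answer.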

Flag ML 120.5: Tokenizer (10 pts)

Execute this command:
The flag is covered by a green rectangle in the image below.

Flag ML 120.6: 1 Billion Parameters (10 pts)

Replace the "bloom-560m" model with the larger "bloom-1b1" model.
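The swap can be sketched as follows. The names MODEL_ID and load_bloom are made up for illustration; the from_pretrained calls are the same ones used earlier in this project, with only the model name changed. The import sits inside the function so the block stands on its own.

```python
MODEL_ID = "bigscience/bloom-1b1"  # replaces "bigscience/bloom-560m"

def load_bloom(model_id=MODEL_ID):
    """Download (or load from cache) the model and its tokenizer."""
    from transformers import BloomForCausalLM, BloomTokenizerFast
    model = BloomForCausalLM.from_pretrained(model_id)
    tokenizer = BloomTokenizerFast.from_pretrained(model_id)
    return model, tokenizer
```

In the notebook, model, tokenizer = load_bloom() replaces the two earlier download commands. Expect a larger download than for the 560-million-parameter model.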

Execute this command:

The flag is covered by a green rectangle in the image below.

Posted 5-10-23
Third flag updated 10-15-23
Flags 1-4 removed, replaced by flags 5 and 6 12-11-23
Flag 6 image fixed 12-13-23
Descriptions of methods improved 6-29-24