VP 100: Strings on Google Colab (50 pts)

What You Need

Any computer with a Web browser.

Purpose

Learn basic string operations in Python.

Using Google Colab

In a browser, go to
https://colab.research.google.com/
If you see a blue "Sign In" button at the top right, click it and log into a Google account.

From the menu, click File, "New notebook".

This project was written assuming you are using Google Colab. If you want to see the old version, which assumed you were running a local installation of Python, go here.

Simple String Processing

Execute this code, as shown below:
greet = "Hello, World!"
print(greet)

print("Start: ", greet[0:3])
print("Middle: ", greet[3:6])
print("End: ", greet[-3:])

a = greet.find(",")

print("Portion before comma", greet[:a])
Examine the output to make sure you understand how to cut out portions of a string, and how to find substrings within a string.
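The same slicing and find() rules can be tried on a second string. This is a minimal sketch; the sample text and variable names are illustrative and not part of the project.

```python
# Illustrative string; any text works the same way.
line = "user=alice;role=admin"

# find() returns the index of the first match, or -1 if there is none.
eq = line.find("=")
semi = line.find(";")
missing = line.find("#")

# A slice [start:stop] includes start and excludes stop.
key = line[:eq]            # everything before the first "="
value = line[eq + 1:semi]  # between the first "=" and the ";"
tail = line[-5:]           # the last five characters

print(key, value, tail, missing)
```

Checking the result of find() against -1 before slicing avoids silently cutting the wrong portion when the substring is absent.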

Here is a handy reference:

Python 3 - Strings

Sending and Receiving Data from a Server

This program sends text to my server, which echoes it back.

Execute this code, as shown below:

import socket
s = socket.socket()
s.connect(("ad.samsclass.info", 10201))
r = s.recv(1024).decode()
print(r)
s.send("Hello from Python!\n".encode())
r = s.recv(1024).decode()
print(r)
print("First five letters of response:", r[0:5])
s.close()

Explanation

 import socket                               Import the "socket" library, which contains networking functions
 s = socket.socket()                         Create a socket object named "s"
 s.connect(("ad.samsclass.info", 10201))     Connect to the server "ad.samsclass.info" on port 10201
 s.recv(1024)                                Receive data from the server, up to a maximum of 1024 bytes
 s.send("Hello from Python!\n".encode())     Convert the string to bytes and send it to the server
 s.close()                                   Close the connection
Run the program, as shown below. It sends the string "Hello from Python!" to the server, which echoes it back.
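The encode-before-send and decode-after-receive steps can be tried without any server. This sketch assumes socket.socketpair(), which is not used in the project itself; it returns two sockets that are already connected to each other.

```python
import socket

# socketpair() returns two already-connected sockets, so no server is needed.
left, right = socket.socketpair()

message = "Hello from Python!\n"
left.sendall(message.encode())   # str -> bytes before sending

raw = right.recv(1024)           # recv() returns bytes, at most 1024 of them
text = raw.decode()              # bytes -> str before slicing or printing

print(type(raw).__name__, type(text).__name__)
print("First five letters:", text[0:5])

left.close()
right.close()
```

Sockets carry bytes, not strings, which is why every send in the program above is paired with encode() and every recv with decode().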

Flag VP 100.1: Goodbye (5 pts)

Connect to the ad.samsclass.info server on port 10202.

Send it the string "Goodbye".

When you do, you will receive a flag, covered by a green box in the image below.

Flag VP 100.2: Increment (10 pts)

Connect to the ad.samsclass.info server on port 10203.

It sends you a number. Add one to that number and send it to the server.

When you do, you will receive a flag, covered by a green box in the image below.

Hint: Python String to Int() Tutorial
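Converting between received bytes, strings, and integers might look like the sketch below. The sample message is hypothetical; the real server's wording and format may differ and may need extra parsing.

```python
# Hypothetical message; the real server's wording and format may differ.
received = b"41\n"

text = received.decode().strip()   # bytes -> str, then drop the newline
number = int(text)                 # str -> int; raises ValueError on non-digits
reply = str(number + 1) + "\n"     # int -> str, ready to encode and send

print(reply.encode())
```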

Loops in Python

Execute this code, as shown below:
for i in range(3):
  print(i)
print()
for c in "CAT":
  print(c)
Observe how simple loops work in Python, as shown below.

Notice that the colon at the end of the "for" line starts the loop body, and that every statement in the body must be indented by the same amount.

You can read more about loops in this tutorial.
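A loop body usually does some work on each pass. The sketch below, with illustrative variable names, keeps a running total and uses enumerate() to get each letter together with its position.

```python
# Sum the numbers 0 through 4 with a counting loop.
total = 0
for i in range(5):
    total += i          # indented: runs on every pass
print("Total:", total)  # not indented: runs once, after the loop

# Collect the position of each letter with enumerate().
positions = []
for index, letter in enumerate("CAT"):
    positions.append((index, letter))
print(positions)
```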

Flag VP 100.3: Add and Subtract (25 pts)

Connect to the ad.samsclass.info server on port 10204.

Combine two numbers as required and send the result.

You have to get all five answers correct within five seconds to get the flag, covered by a green box in the image below.

Hint: Only connect once. If you connect five times, you'll always be solving the first challenge, and never see the flag.
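The "connect once, then loop" pattern from the hint can be sketched against a stand-in server. Everything about the stand-in is an assumption: toy_server, the question list, and the "a op b" message format are made up for illustration, and the real server's messages will need their own parsing.

```python
import socket
import threading

# Hypothetical questions in a made-up "a op b" format.
QUESTIONS = ["3 + 4", "10 - 6", "8 + 1", "7 - 7", "2 + 9"]
ANSWERS = [7, 4, 9, 0, 11]

def toy_server(conn):
    """Stand-in for a server that asks every question on ONE connection."""
    with conn:
        for question, expected in zip(QUESTIONS, ANSWERS):
            conn.sendall((question + "\n").encode())
            reply = conn.recv(1024).decode().strip()
            if reply != str(expected):
                conn.sendall(b"Wrong\n")
                return
        conn.sendall(b"All correct\n")

# One connection for the whole exchange (a socket pair instead of connect()).
client, server_side = socket.socketpair()
threading.Thread(target=toy_server, args=(server_side,), daemon=True).start()

# Loop on the same socket: receive, compute, send.
for _ in range(len(QUESTIONS)):
    a, op, b = client.recv(1024).decode().split()
    result = int(a) + int(b) if op == "+" else int(a) - int(b)
    client.sendall(f"{result}\n".encode())

verdict = client.recv(1024).decode().strip()
client.close()
print(verdict)
```

The socket is created before the loop and closed after it, so the server sees one continuous session rather than five first attempts.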

ASCII Encoding

In Python 2 and C, strings are just a series of bytes, using ASCII encoding by default: one byte per character, a system dating from the 1960s. That system only used byte values from 0 to 127 (0x7f) for characters. The higher values, 128 to 255, had no ASCII meaning, but could be used to store arbitrary binary data in a string object.

So in Python 2, you can store data like this:

Let's see how that works in Python 3.

Execute this code:

a = '\x41\x42\xff'
print(a)
print(a[0])
print(a[1])
print(a[2])
The third character now appears as a printable character, but a non-English one, as shown below.
To see the bytes used to store the string, execute these commands:
b = a.encode()
print(hex(b[0]))
print(hex(b[1]))
print(hex(b[2]))
print(hex(b[3]))
There are 4 bytes in the encoded object, not three, and none of them are ff, as shown below.

What's going on?

Unicode

In Python 3, strings hold Unicode code points, and encode() uses UTF-8 by default instead of ASCII. This is good because it supports many languages, but it confused me at first, because I was accustomed to C and Python 2.

So the character '\xff' in the string is interpreted as a Unicode code point, and UTF-8 stores it in two bytes, as shown below:

UTF-8 characters have variable length, from one to four bytes. This means they can be used to print a lot of useful characters.
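The variable length can be seen by encoding one character of each size. The four sample characters are illustrative choices, not taken from the project.

```python
# One sample character for each UTF-8 length:
# "A", y with diaeresis, the euro sign, and a grinning face emoji.
samples = ["A", "\u00ff", "\u20ac", "\U0001f600"]

lengths = []
for ch in samples:
    encoded = ch.encode()   # UTF-8 is the default for str.encode()
    lengths.append(len(encoded))
    print(f"U+{ord(ch):04X} -> {encoded.hex()} ({len(encoded)} bytes)")
```

Each string here has a length of one character, yet the encoded forms take one, two, three, and four bytes.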

Execute these commands:

print('\x41')
print('\U00000041')
print('\x41'.encode())
print('\U00000041'.encode())
print('\x41'.encode().hex())
print('\U00000041'.encode().hex())
'\x41' and '\U00000041' are valid representations of the same character, and they are equivalent. The \U escape requires all eight hexadecimal digits (four bytes) to be specified, even when some of them are leading zeroes.
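Python also accepts a four-digit \u escape and the chr() function, neither of which is used above. This short sketch compares all four spellings of the same code point.

```python
# \x takes two hex digits, \u takes four, \U takes eight.
a = "\x41"
b = "\u0041"
c = "\U00000041"
d = chr(0x41)   # same code point, built from an integer

print(a, b, c, d)
print(a == b == c == d)
```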
Execute these commands:
print('\xff')
print('\U000000ff')
print('\xff'.encode())
print('\U000000ff'.encode())
'\xff' and '\U000000ff' are both valid ways to refer to code point FF, which UTF-8 encodes in two bytes.

Unicode code point values are not always identical to the bytes used to encode them.
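The difference between a code point and its encoded bytes can be checked with ord(). This sketch also shows two ways to get a single 0xff byte, the Latin-1 encoding and a bytes literal; both are additions for illustration and are not covered in the project.

```python
ch = "\xff"

code_point = ord(ch)            # the Unicode code point, as an integer
utf8 = ch.encode()              # UTF-8: two bytes for this character
latin1 = ch.encode("latin-1")   # Latin-1: one byte, equal to the code point

raw = b"\x41\x42\xff"           # a bytes literal keeps the byte 0xff as-is

print(hex(code_point), utf8.hex(), latin1.hex(), raw.hex())
```

A bytes object is the Python 3 counterpart of the Python 2 string described earlier: each element is one byte from 0 to 255, with no encoding applied.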

Flag VP 100.4: Greek (5 pts)

Find the eight-character Unicode code point for a capital Sigma, covered by a green box in the image below.

Flag VP 100.5: Burgertime (5 pts)

Find the eight-character Unicode code point for a hamburger, covered by a green box in the image below.

References

PEP 223 -- Change the Meaning of \x Escapes
UTF-8 from Wikipedia
UTF-8 encoding table and Unicode characters


Upgraded to Python 3 6-29-2020
First program and first 2 images upgraded 7-2-2020
Unicode section added 7-4-2020
Hint for 100.3 added 7-15-2020
Troubleshooting text updated 8-3-2020
Nano instructions added 8-7-2020
Multithreaded servers implemented 11-2-20
Updated for Google Colab 6-28-23