Programming Fundamentals

Strings

**Puzzle:** How do you get the letter "P" from the string "Python"? The answer: `"Python"[0]`. But why 0 and not 1? And why does `"2" + "2"` give `"22"` instead of 4?

Strings in programming are sequences of characters. Each character has its own position (index), and we can manipulate them with precise operations.

Lesson objectives

  • Understand that strings are sequences of characters
  • Master indexing and slicing
  • Learn the essential string methods
  • Learn about string formatting (f-strings)

Prerequisites

  • Variables and data types (lesson 3)
  • Operators (lesson 4)

Much of the data on the internet is text. Posts, comments, search queries, passwords - it's all strings. Working with strings is a core skill.

  • **Google**: parsing search queries
  • **Twitter/X**: 280-character limit, hashtags
  • **Passwords**: checking length and complexity
  • **Chatbots**: processing user messages

From Hollerith punch cards to UTF-8: how text learned to live in memory

Before the 1960s every machine had its own character set: IBM used BCD and later EBCDIC, DEC went with Radix-50, teletypes spoke Baudot. Moving text between computers often turned words into garbage. In 1963 the ASA (now ANSI) shipped X3.4, better known as ASCII: 128 characters, 7 bits, one shared mapping for Latin letters, digits and control codes. Bob Bemer at IBM was a driving force behind the work. ASCII fixed English but left everyone else stuck: Japanese, Cyrillic, Chinese and Arabic had no place in 7 bits.

The 1980s gave us a swarm of regional code pages: KOI8-R and CP1251 for Russian, Shift JIS for Japanese, Big5 for Chinese. Opening a file from another country without knowing the code page produced mojibake. In 1991 the Unicode Consortium published version 1.0: one code point per character across every writing system. The open question was how to pack such a large code space into bytes.

In September 1992 Ken Thompson and Rob Pike sketched UTF-8 on a placemat at a New Jersey diner: variable length (1 to 4 bytes per character in the current standard), backward compatible with ASCII, and self-synchronizing, so a decoder can find the next character boundary starting from any byte. By 2010 UTF-8 was on roughly half the web; by 2024 it carried over 98 percent of pages.

In Python 2, str was a byte string and unicode was a separate type, which is part of why the Python 3 migration hurt: in Python 3, str is Unicode text by default and bytes is its own type.
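The difference between str and bytes in Python 3 can be seen directly. The following is a minimal sketch; the sample characters and the word are chosen purely for illustration.

```python
# str holds Unicode code points; bytes holds their encoded form.
samples = ["A", "Я", "€", "😀"]

# Number of UTF-8 bytes needed for each single-character string.
byte_counts = {ch: len(ch.encode("utf-8")) for ch in samples}
print(byte_counts)

word = "Привет"
encoded = word.encode("utf-8")     # str -> bytes
decoded = encoded.decode("utf-8")  # bytes -> str

print(len(word))     # counts characters (code points)
print(len(encoded))  # counts bytes
```

ASCII characters still take one byte each, which is what backward compatibility with ASCII means in practice.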

String Basics

**A string** is a sequence of characters. In Python, strings are enclosed in quotes - single or double.

Creating strings

Different ways

```python
# Single quotes
name = 'Alice'

# Double quotes (same thing)
greeting = "Hello, World!"

# Multi-line string
poem = '''Roses are red,
Violets are blue,
Python is awesome,
And so are you!'''
print(poem)
```

Single and double quotes work the same way. Either style is fine. If the string contains an apostrophe - use double quotes: `"it's"`. If quotes are needed inside - use single: `'He said "Hello"'`.
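The quoting rule above can be sketched as follows. The variable names are illustrative only, and the backslash escape is an extra option the lesson text does not cover.

```python
# Pick the quote style that does not clash with the text.
contraction = "it's"
quotation = 'He said "Hello"'

# When a string needs both kinds, a backslash escapes the quote.
both = 'It\'s what he said: "Hello"'

# Single- and double-quoted literals produce the same str value.
same = 'Alice' == "Alice"

print(contraction)
print(quotation)
print(both)
print(same)
```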

String length

The len() function

```python
password = "secret123"
print(len(password))  # 9

# Check password length
if len(password) >= 8:
    print("Password is long enough")
```

What will print(len("Hello")) output?

Indexing and Slicing

Each character in a string has an **index** - a position number. **Important:** indexing starts at 0!

Indexing

Accessing individual characters

```python
word = "Python"
#       012345 - indices

print(word[0])   # P - first character
print(word[1])   # y - second character
print(word[5])   # n - sixth character

print(word[-1])  # n - last character!
print(word[-2])  # o - second to last
```

**Negative indices** count from the end: -1 is the last, -2 is second to last, and so on.

Slices

Extracting a substring

```python
text = "Hello, World!"

# string[start:end] - from start up to (not including) end
print(text[0:5])   # "Hello"
print(text[7:12])  # "World"

# Omitting boundaries
print(text[:5])    # "Hello" - from the beginning
print(text[7:])    # "World!" - to the end
print(text[:])     # Whole string (a copy)

# Step
print(text[::2])   # "Hlo ol!" - every other character
print(text[::-1])  # "!dlroW ,olleH" - reversed!
```

**Common mistake:** The first character of a string has index 1.

**Correct:** The first character has index 0.

Zero-based indexing is the standard in most programming languages. It's related to how data is stored in memory: index = offset from the start.
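Treating an index as an offset also explains where valid indices end. A short sketch, reusing the word "Python" from the earlier example:

```python
word = "Python"

# Each index is the character's offset from the start of the string.
offsets = list(enumerate(word))
print(offsets)

# The last valid index is len(word) - 1, so word[len(word)] fails.
try:
    word[len(word)]
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)

# Slices are more forgiving: bounds past the end are clipped.
clipped = word[2:100]
print(clipped)
```

Indexing past the end raises IndexError, while a slice with an oversized bound simply stops at the end of the string.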

What will print("abcdef"[1:4]) output?

String Operations

Strings can be added and multiplied! But not the same way as numbers.

Concatenation (+)

Joining strings together

```python
first = "Hello"
second = "World"
result = first + " " + second
print(result)  # "Hello World"

# Warning: you can't add a string to a number!
age = 25
# print("Age: " + age)     # Error!
print("Age: " + str(age))  # OK: "Age: 25"
```
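The error from mixing a string and a number is a TypeError. The sketch below catches it and also revisits the `"2" + "2"` puzzle from the introduction; the variable names are illustrative.

```python
age = 25

# Adding str and int raises TypeError; catch it to see the message.
try:
    label = "Age: " + age
except TypeError as error:
    label = None
    print(f"TypeError: {error}")

# Convert explicitly, in whichever direction matches the intent.
as_text = "Age: " + str(age)     # number -> text, then concatenate
as_number = int("2") + int("2")  # text -> numbers, then add
as_concat = "2" + "2"            # both are text, so they are joined

print(as_text)
print(as_number)
print(as_concat)
```

With strings on both sides, `+` joins them; with numbers on both sides, it adds them. Python does not guess when the types are mixed.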

Repetition (*)

Multiplying a string by a number

```python
print("Ha" * 3)   # "HaHaHa"
print("-" * 20)   # "--------------------"

# Practical: formatted output
title = "= Menu ="
print("=" * 20)
print(title.center(20))
print("=" * 20)
```

Membership test (in)

Is a substring present in a string?

```python
email = "user@example.com"

print("@" in email)        # True
print("gmail" in email)    # False
print("example" in email)  # True

# Practical: email validation
if "@" in email and "." in email:
    print("Looks like a valid email")
```
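One detail worth keeping in mind is that `in` compares characters exactly, so it is case-sensitive. A small sketch, using a made-up address with mixed case:

```python
email = "User@Example.com"

# The substring must match character for character, including case.
exact = "example" in email

# Lowercasing first gives a case-insensitive check.
normalized = "example" in email.lower()

# "not in" is the negated form of the same test.
missing = "gmail" not in email

print(exact, normalized, missing)
```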

What will print("ab" * 2 + "c") output?

String Methods

Python strings have many built-in methods - functions called with dot notation.

| Method | Description | Example |
|---|---|---|
| `.upper()` | All uppercase | `"hello".upper()` → `"HELLO"` |
| `.lower()` | All lowercase | `"HELLO".lower()` → `"hello"` |
| `.strip()` | Remove surrounding whitespace | `" hi ".strip()` → `"hi"` |
| `.replace(a, b)` | Replace a with b | `"cat".replace("c", "b")` → `"bat"` |
| `.split()` | Split into a list | `"a,b,c".split(",")` → `["a","b","c"]` |
| `.join()` | Join from a list | `"-".join(["a","b"])` → `"a-b"` |
| `.find(x)` | Find position of x | `"hello".find("l")` → `2` |
| `.startswith(x)` | Starts with x? | `"hello".startswith("he")` → `True` |
| `.endswith(x)` | Ends with x? | `"hello".endswith("lo")` → `True` |
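A few behaviors of these methods are easy to miss in a one-line summary. The sketch below uses made-up sample strings to show them.

```python
# find returns the index of the first match, or -1 if there is none.
first_l = "hello".find("l")
no_match = "hello".find("z")

# split() with no argument splits on runs of whitespace.
words = "  red   green blue ".split()

# join is called on the separator and takes the list as its argument.
rejoined = "-".join(words)

# replace changes every occurrence, not just the first one.
replaced = "banana".replace("a", "o")

print(first_l, no_match)
print(words)
print(rejoined)
print(replaced)
```

Because find signals "not found" with -1 rather than an error, the result should be checked before it is used as an index.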

Methods in action

Practical examples

```python
# Normalizing user input
user_input = " HELLO World "
clean = user_input.strip().lower()
print(clean)  # "hello world"

# Replacing words
message = "I love JavaScript"
fixed = message.replace("JavaScript", "Python")
print(fixed)  # "I love Python"

# Parsing data
data = "apple,banana,cherry"
fruits = data.split(",")
print(fruits)  # ['apple', 'banana', 'cherry']

# Checking file extension
filename = "photo.jpg"
if filename.endswith((".jpg", ".png", ".gif")):
    print("This is an image")
```

**Strings are immutable!** Methods return a NEW string - they don't modify the original. `s.upper()` won't change s - the result must be assigned: `s = s.upper()`.
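Immutability can be checked with a short experiment. This is a minimal sketch with illustrative variable names:

```python
s = "hello"

s.upper()            # the result is discarded; s is unchanged
unchanged = s

shouted = s.upper()  # keep the result under a new name

# Item assignment is not allowed on str.
try:
    s[0] = "J"
    assigned = True
except TypeError:
    assigned = False

# Build a new string from a slice instead.
jello = "J" + s[1:]

print(unchanged, shouted, assigned, jello)
```

Changing "one character" always means building a new string from pieces of the old one.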

What will print("a,b,c".split(",")) output?

String Formatting (f-strings)

**f-strings** (formatted string literals) are the most convenient way to embed variables into a string.

f-strings in action

The modern approach to formatting

```python
name = "Alice"
age = 25

# Old way (concatenation)
print("Hello, " + name + "! You are " + str(age) + " years old.")

# New way (f-string) - just add f before the quotes
print(f"Hello, {name}! You are {age} years old.")

# You can evaluate expressions inline!
print(f"In 10 years you'll be {age + 10}")
print(f"Name in uppercase: {name.upper()}")
```

Number formatting

Clean output

```python
price = 1234.5678

print(f"Price: {price}")       # 1234.5678
print(f"Price: {price:.2f}")   # 1234.57 (2 decimal places)
print(f"Price: {price:,.2f}")  # 1,234.57 (thousands separator)

# Percentage formatting
ratio = 0.756
print(f"Result: {ratio:.1%}")  # "Result: 75.6%"

# Alignment
for item in ["Apple", "Banana", "Cherry"]:
    print(f"{item:>10}")  # Right-aligned
```

f-strings were introduced in Python 3.6 and have become the standard. They're faster and more readable than `.format()` or `%`-formatting.
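For comparison, here is the same sentence built with all three formatting styles. It is a sketch for readability only and does not measure speed; the `row` example and its column widths are arbitrary choices for illustration.

```python
name = "Alice"
age = 25

with_fstring = f"Hello, {name}! You are {age} years old."
with_format = "Hello, {}! You are {} years old.".format(name, age)
with_percent = "Hello, %s! You are %d years old." % (name, age)

# All three produce the same text.
print(with_fstring == with_format == with_percent)

# Format specs after the colon combine alignment, width and precision.
row = f"{'Total':<10}{1234.5:>12,.2f}"
print(row)
```

In the f-string the values sit where they appear in the output, which is the main readability gain over the other two styles.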

What will print(f"x = {x * 2}") output when x = 5?

Connection to other topics

Strings touch almost every later lesson:

  • Conditionals — Comparing strings, membership checks, input validation
  • Loops — Iterating characters, searching, parsing text data
  • Arrays — Lists and strings share an API: indices, slices, len
  • Functions — Helpers for normalizing, parsing and formatting text

Summary

  • A string is a sequence of characters with zero-based indexing; negative indices count from the end
  • Slices [start:end:step] return a new substring, the end index is not included
  • Concatenation (+) and repetition (*) build new strings; adding a string to a number requires conversion
  • Methods upper, lower, strip, split, join, replace, find, startswith, endswith cover most everyday string tasks
  • Strings are immutable: every method returns a new string, the original stays intact
  • f-strings (Python 3.6+) are the fastest and most readable formatting option; expressions inside {} are evaluated

Related lessons

  • prog-04-operators — Operators like + power string concatenation
  • prog-11-arrays — A string is essentially an array of characters
  • prog-07-loops — Iterating characters needs loops over the string
  • nlp-01 — Text processing builds directly on string operations
  • crypto-08-substitution-ciphers — Classic ciphers transform strings character by character
  • alg-01-big-o