Data Structures

Arrays: The Foundation of Data Structures

Netflix has **200 million** subscribers. Opening the app, a profile is located among those millions in **microseconds** - not by scanning through everyone. How? The answer lies in the simplest data structure.

Цели урока

  • Understand how an array is stored in memory (contiguous)
  • Explain why access is O(1) but insertion is O(n)
  • Distinguish static and dynamic arrays
  • Know when an array is the right choice

Предварительные знания

  • Basic work with lists in Python
  • Arrays and Lists

Why is element access in an array instant, but inserting in the middle is slow? The answer isn't in algorithms - it's in how memory works.

  • **FAANG interviews**: half of all problems involve arrays (two pointers, sliding window, prefix sum)
  • **90% of data structures** use arrays internally: hash tables, heaps, graphs
  • **Cache locality**: arrays are CPU-cache-friendly → real-world speed is many times faster

The array is the oldest data structure

The array predates almost every other data structure because it maps straight onto how memory works: a contiguous block of cells with sequential addresses. The first high-level language with real arrays was FORTRAN, shipped by John Backus and his team at IBM in 1957. Indexing and multidimensional arrays sat at the heart of the language, since it was built for scientific and engineering computation. In 1962 Kenneth Iverson, in his book A Programming Language (APL), made the array the primary unit of computation: whole operations ran over entire arrays with no explicit loops, an idea that later shaped NumPy and MATLAB. In 1972 Dennis Ritchie baked the array-pointer duality into C, where arr[i] is exactly *(arr + i). That address arithmetic is what delivers the O(1) access this lesson is about.

Memory as a Street of Houses

Consider computer memory as an **infinite street with numbered houses**. Each house has an address: 0, 1, 2, 3...

An array is a **reservation of consecutive houses**. Reserving 5 elements means houses 100, 101, 102, 103, 104 are claimed. They are **contiguous** - sitting next to each other in memory.

**Key insight:** Knowing the starting address (100) and the index (i), the computer **instantly** computes where the element lives: `100 + i`. No searching required!

An array starts at address 200. Where in memory is arr[7]?

O(1) Access - The Superpower of Arrays

Thanks to contiguous layout, arrays give **random access** in **O(1)** - constant time.

**O(1)** means: whether the array has a million or a billion elements - accessing any one takes **the same amount of time**. Just arithmetic: `base + index`.

Netflix and O(1)

Netflix has 200+ million subscribers. Each has an ID (a number). Data is stored in array-like structures. ``` user_data[user_id] → user profile ``` Accessing the profile of user #150,000,000 takes the same time as accessing user #1. That's O(1).

An array has 1 billion elements. How many operations does it take to get arr[999999999]?

Insertion: Why O(n)?

Here's the **other side of the coin**. Inserting an element in the middle requires **shifting all the neighbors**.

**Worst case:** inserting at the beginning. **All n elements** must be shifted. That's **O(n)**.

**Best case:** inserting at the end. No shifting needed. That's **O(1)** (if there's room).

An array has 10,000 elements. How many elements need to be shifted to insert at position 3?

Static vs Dynamic Arrays

In classic languages (C, C++) an array has a **fixed size**. Declare `int arr[10]` - that's exactly 10 slots, no more, no less.

**Dynamic arrays** (Python list, Java ArrayList, C++ vector) **grow automatically**. How?

**Amortized O(1):** Although copying all elements (O(n)) happens occasionally, on average appending to the end is O(1). Why? Because capacity doubles, so copying happens less and less frequently.

Amortization Math

We add 1000 elements. Copies happen when capacity = 1, 2, 4, 8, 16, 32, 64, 128, 256, 512. Total copy operations: 1 + 2 + 4 + ... + 512 = 1023. Total insertions: 1000. Average: 1023/1000 ≈ 1 operation per insertion = O(1) amortized.

A dynamic array has capacity = 16 and size = 16. We add an element. What happens?

Complexity Summary Table

At this point the reasoning behind these complexities is clear - no need to just memorize a table.

**When to use an array:**

  • ✅ Fast index access is needed
  • ✅ Data is rarely inserted/deleted in the middle
  • ✅ Approximate size is known in advance

**When NOT to use:**

  • ❌ Frequent inserts/deletes at the beginning
  • ❌ Size fluctuates significantly

**Remember the question from the start?** Why can Netflix store millions of movies and instantly find any by ID? Because ID is an index, and index access is O(1). Arithmetic instead of searching.

Arrays are outdated - always use something more complex

Arrays are the foundation of most data structures. O(1) access + cache locality makes them indispensable for many tasks. Even a HashMap uses an array internally!

A chat app is being built. New messages are added to the end, old ones are deleted from the beginning. Is an array a good fit?

Complexity Analysis Determine the complexity of each operation: ```python arr = [1, 2, 3, 4, 5] # Operation 1 x = arr[2] # Operation 2 arr.append(6) # Operation 3 arr.insert(0, 0) # Operation 4 for item in arr: print(item) ```

# Operation 1: O(1) - index access # Operation 2: O(1) amortized - insert at end # Operation 3: O(n) - insert at beginning, shift everything # Operation 4: O(n) - iterate over all elements

Algorithm Optimization This code is inefficient. Why? How to fix it? ```python def build_list_badly(n): result = [] for i in range(n): result.insert(0, i) # Insert at beginning return result # At n = 100,000 this runs very slowly! ```

def build_list_well(n): result = [] for i in range(n): result.append(i) # O(1) instead of O(n) result.reverse() # One pass O(n) return result # Or even simpler: def build_list_pythonic(n): return list(range(n-1, -1, -1))

Choosing a Data Structure For each scenario, decide: is an array a good fit? Why? 1. **Instagram feed**: showing posts top-to-bottom, new posts load at the top 2. **Player leaderboard**: top-100, access by rank (1st, 2nd, 3rd...) 3. **Undo/Redo in an editor**: the last action is undone first 4. **Browser cache**: fast access to a page by URL

# 1. Instagram feed # ❌ Array is not ideal # New posts insert at beginning - O(n) # Better: deque or linked list # 2. Player leaderboard # ✅ Array is ideal # Access by index (rank) - O(1) # Updates are infrequent # 3. Undo/Redo # ✅ Array (as a stack) # Add/remove only from end - O(1) # A stack is an array with a restricted interface # 4. Browser cache # ❌ Array doesn't fit # Need key-based lookup (URL), not index # Better: hash table (dict)

What's Next?

The array is the foundation. Here's what's built on top of it:

  • Linked Lists — Solve the O(n) insertion problem at the cost of O(n) access
  • Hash Tables — Array + hash function = O(1) key lookup
  • Sorting Algorithms — Most sorting algorithms operate on arrays

Key Ideas

  • **Array** - a contiguous block of memory with O(1) index access
  • **Insert/delete** from the middle - O(n) due to element shifting
  • **Dynamic arrays** grow ×2, giving amortized O(1) append
  • **Choosing a structure** depends on frequent operations, not on trends
  • **Netflix finds a profile** in microseconds because ID = index = O(1)

Вопросы для размышления

  • Why is a Python list a dynamic array rather than a linked list?
  • Which operations would be faster/slower with a linked list?
  • Remember the question at the start about Netflix? The answer is now clear - how does it work?

Связанные уроки

  • prog-11-arrays — Basic arrays in programming are a prerequisite for understanding their internal mechanics
  • alg-01-big-o — Big O notation is needed to understand O(1) vs O(n) array operations
  • ds-02-linked-lists — Linked lists are the direct alternative to arrays with different trade-offs
  • arch-01-binary — Computer memory addressing is the physical basis of the array concept
  • arch-08-memory-hierarchy
Arrays: The Foundation of Data Structures

0

1

Sign In