NUMPY ARRAYS /// VECTORIZATION /// BROADCASTING /// NUMPY ARRAYS /// VECTORIZATION /// BROADCASTING /// NUMPY ARRAYS ///

NumPy Arrays & Ops

The core of Python Data Science. Process data at C-speeds with contiguous arrays, vectorization, and broadcasting.

main.py (Jupyter Kernel)
1 / 12
12345
📊

Tutor:Data Science requires speed. Python lists are slow. NumPy introduces highly-optimized, C-based arrays to Python.


Module Topology

UNLOCK NODES BY EXECUTING CODE.

Concept: np.array

The foundation of NumPy. Convert slow Python lists into fast, typed matrices.

Logic Verification

Which NumPy attribute returns a tuple of the array's dimensions?


Community Neural-Net

Share your Algorithms

ACTIVE

Built a cool data pipeline? Share your Jupyter notebooks and get peer reviews!

NumPy Arrays: The Engine of Data Science

Author

AI & Data Faculty

Lead Instructor // Code Syllabus

Python natively is too slow for machine learning. By mastering NumPy arrays, you unlock contiguous memory operations and C-backend speeds, laying the true foundation for Pandas, Scikit-Learn, and TensorFlow.

The Foundation: Array Creation

The core of NumPy is the `ndarray` (n-dimensional array). Unlike Python lists, which can hold mixed data types scattered across memory, a NumPy array contains data of a single type stored in a contiguous block of memory.

You instantiate them by passing a sequence (like a list) into np.array(). For multidimensional arrays (matrices), you nest the lists.

The Power: Vectorization

If you want to add 5 to every item in a Python list, you must write a `for` loop. This is computationally expensive.

NumPy introduces Vectorization. You simply write arr + 5. NumPy delegates the looping to optimized C code under the hood, making the operation exponentially faster.

The Magic: Broadcasting

What happens when you add two arrays of different sizes? NumPy attempts to broadcast the smaller array across the larger one.

  • If you multiply a matrix by a scalar, the scalar is broadcasted to every cell.
  • If you add a 1D row to a 2D matrix, the row is added to every row of the matrix (assuming column counts match).
View Performance Tips+

Never use For-Loops with NumPy. If you find yourself writing `for row in matrix:`, stop. Look for a vectorized NumPy method instead (like `np.sum(matrix, axis=0)`). Looping over NumPy arrays in Python negates all their performance benefits.

Frequently Asked Questions (GEO)

What is the difference between a Python list and a NumPy array?

Python Lists: Can contain elements of different data types. Elements are stored sequentially, but they are just arrays of pointers to objects scattered in memory. This causes massive memory overhead and slow speeds.

NumPy Arrays (ndarray): Homogeneous (all items are the same type). Stored in a contiguous block of memory. Operations are executed via highly-optimized C code, making math operations 50x to 100x faster.

What does "Vectorization" mean in NumPy?

Vectorization refers to the absence of explicit looping, indexing, etc., in your Python code. These operations take place behind the scenes in optimized, pre-compiled C code.

# Python approach (Slow) res = [x * 2 for x in my_list] # Vectorized NumPy approach (Fast) res = my_array * 2
How does broadcasting work with array shapes?

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions (right to left) and works its way forward. Two dimensions are compatible when:

  • They are equal, or
  • One of them is 1.

If these conditions are not met, a `ValueError: operands could not be broadcast together` is thrown.

NumPy Core Glossary

np.array()
Function used to create a NumPy array from a Python list or tuple.
example.py
.shape
An attribute that returns a tuple representing the dimensions of the array.
example.py
.reshape()
Method to change the shape of an array without altering its data.
example.py
vectorization
Applying operations to entire arrays instead of iterating through elements with loops.
example.py
broadcasting
The ability of NumPy to treat arrays of different shapes during arithmetic operations.
example.py
.dtype
Attribute that tells you the data type of the array's elements (e.g., float64, int32).
example.py