NumPy Basics: The Foundation of Data Science
Data Science Instructor // Core Architecture
Python lists are great for general programming, but for numerical computations they are comparatively slow and memory-hungry. Enter NumPy (Numerical Python), the library that much of the Python data science stack, including pandas, SciPy, and scikit-learn, is built on.
The Core: ndarrays
At the heart of NumPy is the ndarray (n-dimensional array). Unlike a Python list, which stores references to separately allocated objects, an ndarray stores its elements as raw values of a single type in one block of memory (contiguous by default). That layout is cache-friendly and lets NumPy hand the heavy lifting to optimized C and Fortran libraries under the hood.
You create one by importing the library (conventionally as np) and converting a standard list with np.array([1, 2, 3]).
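As a minimal sketch of array creation, the snippet below converts a flat list and a nested list into ndarrays. The variable names arr and matrix are illustrative only.

```python
import numpy as np

# Convert a plain Python list into a 1-D ndarray.
arr = np.array([1, 2, 3])

# A nested list becomes a 2-D array (2 rows, 3 columns).
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(type(arr))     # <class 'numpy.ndarray'>
print(arr.ndim)      # 1
print(matrix.ndim)   # 2
```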
Built-in Generators
Typing out arrays manually is rare. Data scientists usually generate them programmatically. Here are the staples:
- np.zeros(shape): Creates a new array of the given shape filled with zeros. Perfect for initializing a matrix before running a loop or algorithm.
- np.ones(shape): Creates a new array of the given shape filled with ones. Useful for matrix math offsets.
- np.arange(start, stop, step): The array equivalent of Python's built-in `range()`. Generates evenly spaced values from start up to, but not including, stop.
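The three generators above can be sketched as follows. The shapes and ranges are arbitrary values chosen for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))      # 2x3 array of 0.0 (shape passed as a tuple)
ones = np.ones(4)             # 1-D array holding four 1.0 values
evens = np.arange(0, 10, 2)   # 0, 2, 4, 6, 8 (stop is excluded)

print(zeros.shape)   # (2, 3)
print(ones)          # [1. 1. 1. 1.]
print(evens)         # [0 2 4 6 8]
```

Note that a multi-dimensional shape is passed as a single tuple, while a 1-D length can be a bare integer.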
Performance Tips
Vectorization is key. Avoid using a `for` loop to add two NumPy arrays element by element. NumPy runs array-to-array math in compiled C code, so arr1 + arr2 adds the arrays element-wise and is typically one to two orders of magnitude faster than an equivalent Python loop on large arrays.
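One way to see the difference on your own machine is a rough timing comparison like the sketch below. The helper names add_with_loop and add_vectorized and the array size are assumptions made for illustration, and the measured ratio will vary with hardware and array length.

```python
import timeit

import numpy as np

n = 100_000
arr1 = np.arange(n, dtype=np.float64)
arr2 = np.ones(n)

def add_with_loop(a, b):
    """Element-wise addition using an explicit Python loop."""
    out = np.empty_like(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out

def add_vectorized(a, b):
    """Element-wise addition delegated to NumPy's compiled code."""
    return a + b

# Both approaches must agree before the timings mean anything.
assert np.array_equal(add_with_loop(arr1, arr2), add_vectorized(arr1, arr2))

# Time three runs of each approach.
loop_time = timeit.timeit(lambda: add_with_loop(arr1, arr2), number=3)
vec_time = timeit.timeit(lambda: add_vectorized(arr1, arr2), number=3)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```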
Frequently Asked Questions
What is the difference between a Python List and a NumPy Array?
Python Lists: They are dynamic and can hold mixed data types (e.g., integers, strings, objects). Because of this flexibility, they store references to objects scattered in memory, so numerical work is slow: every element access follows a reference and checks the object's type at run time.
NumPy Arrays (ndarrays): They require homogeneous data types (e.g., all floats or all integers). They store data in contiguous memory blocks, allowing for C-level execution speeds and optimized vectorized operations.
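A short sketch of the contrast, using made-up values: the list keeps each element's own type, while the array settles on one dtype for all elements and applies arithmetic element-wise.

```python
import numpy as np

mixed_list = [1, "two", 3.0]       # a list can mix types freely
numbers = np.array([1, 2, 3.5])    # ints are upcast so all elements share one dtype

print(numbers.dtype)     # float64
print(numbers * 2)       # [2. 4. 7.]
print([1, 2, 3] * 2)     # [1, 2, 3, 1, 2, 3]  (list repetition, not math)
```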
What is the shape attribute in NumPy?
The .shape attribute is a tuple giving the size of the array along each dimension. For a 2D matrix with 3 rows and 4 columns, my_array.shape is (3, 4). Understanding shape is critical for matrix mathematics and for reshaping data for machine learning models.
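For illustration, the sketch below reads the shape of a 3x4 array and then reshapes it. The target shapes are arbitrary; any shape works as long as the total number of elements (12 here) stays the same.

```python
import numpy as np

my_array = np.zeros((3, 4))
print(my_array.shape)    # (3, 4)

# reshape returns an array with the same 12 elements in a new layout.
reshaped = my_array.reshape(2, 6)
print(reshaped.shape)    # (2, 6)

# -1 asks NumPy to infer that dimension from the total size.
flat = my_array.reshape(-1)
print(flat.shape)        # (12,)
```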
Why does np.zeros return floats by default?
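Creation functions such as np.zeros and np.ones default to a dtype (data type) of float64 when none is given. (np.array, by contrast, infers the dtype from the values you pass in.) The float default suits data science, where you're mostly dealing with continuous variables, weights, or probabilities, all of which need decimals. You can change it by passing the dtype argument: np.zeros(5, dtype=int).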
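A minimal check of the default and the override, as a sketch. The exact integer width behind dtype=int depends on the platform and NumPy version, so the example inspects the dtype kind rather than assuming int64.

```python
import numpy as np

default_zeros = np.zeros(5)
int_zeros = np.zeros(5, dtype=int)

print(default_zeros.dtype)                             # float64
print(int_zeros)                                       # [0 0 0 0 0]
print(np.issubdtype(int_zeros.dtype, np.integer))      # True

# np.array infers an integer dtype from integer input.
print(np.array([1, 2, 3]).dtype.kind)                  # i
```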