1. The Axis Concept
When manipulating multi-dimensional arrays, you must tell NumPy which direction to apply the operation. This is done via the axis parameter.
- axis=0: Operates downwards (down each column). If you sum along axis=0, you get the sum of each column.
- axis=1: Operates horizontally (across each row). If you sum along axis=1, you get the sum of each row.
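In 3D arrays, axis=2 is the third (last) axis, often thought of as the depth, for example the channel axis of a height × width × channels image.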
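A minimal sketch of the axis behaviour described above. The array contents and the names `matrix` and `cube` are illustrative assumptions, not taken from the original text.

```python
import numpy as np

# A small 2D array: 2 rows, 3 columns.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

col_sums = matrix.sum(axis=0)  # collapse the rows: one sum per column
row_sums = matrix.sum(axis=1)  # collapse the columns: one sum per row

print(col_sums.tolist())  # [5, 7, 9]
print(row_sums.tolist())  # [6, 15]

# A 3D array with shape (2, 3, 4); axis=2 is the last axis (length 4).
cube = np.arange(24).reshape(2, 3, 4)
depth_sums = cube.sum(axis=2)  # the axis being summed disappears
print(depth_sums.shape)        # (2, 3)
```

A useful rule of thumb: the axis you name is the one that is removed from the result's shape.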
2. The Preprocessing Pipeline
Before a neural network sees any data, the data must be scrubbed.
- Joining: Combining user data from multiple database tables into a single master matrix.
- Filtering: Using boolean masks to drop rows containing corrupted or NaN (Not a Number) values.
- Splitting: Dividing the clean master matrix into an 80% Training Set and a 20% Testing Set.
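The three steps above can be sketched with plain NumPy. This is an illustration under stated assumptions, not a definitive pipeline: the two tables (`profile`, `activity`) are hypothetical random data, they are assumed to be already aligned row by row, and the shuffle before splitting is an added assumption that the text does not mention.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical tables: 12 users, assumed to be in the same row order.
profile = rng.normal(size=(12, 2))    # two made-up profile features
activity = rng.normal(size=(12, 3))   # three made-up activity features
activity[[2, 7], 1] = np.nan          # simulate two corrupted rows

# Joining: stack the feature columns side by side into one master matrix.
master = np.hstack([profile, activity])        # shape (12, 5)

# Filtering: keep only the rows that contain no NaN values.
valid_rows = ~np.isnan(master).any(axis=1)     # boolean mask, one entry per row
clean = master[valid_rows]                     # shape (10, 5)

# Splitting: shuffle the rows, then cut at the 80% mark.
shuffled = rng.permutation(clean)              # shuffles along axis 0
split_at = int(0.8 * len(shuffled))
train, test = shuffled[:split_at], shuffled[split_at:]

print(train.shape, test.shape)  # (8, 5) (2, 5)
```

Real database joins usually match rows on a key column; `np.hstack` only works here because the rows are assumed to line up already.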
3. The Overloading Trap
In standard Python, [1, 2] + [3, 4] results in [1, 2, 3, 4]. It concatenates the lists.
In NumPy, the + operator is overloaded to perform element-wise (vectorized) arithmetic. np.array([1, 2]) + np.array([3, 4]) results in array([4, 6]): it adds the corresponding elements. To structurally combine NumPy arrays, you must explicitly use a function such as np.concatenate().
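A short side-by-side sketch of the difference, using the same values as the text. The variable names are illustrative.

```python
import numpy as np

# Python lists: + concatenates.
list_result = [1, 2] + [3, 4]
print(list_result)            # [1, 2, 3, 4]

# NumPy arrays: + adds element by element.
a = np.array([1, 2])
b = np.array([3, 4])
added = a + b
print(added.tolist())         # [4, 6]

# To join arrays end to end, ask for it explicitly.
joined = np.concatenate([a, b])
print(joined.tolist())        # [1, 2, 3, 4]
```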
Frequently Asked Questions
What is Boolean Masking?
Boolean masking is the process of using an array of True/False values to filter another array. For example, `arr[arr > 5]` will return a new array containing only the values greater than 5. It is fast because both the comparison and the selection run in NumPy's compiled C code instead of a Python loop.
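A minimal sketch of the two steps hidden inside `arr[arr > 5]`. The sample values are made up for illustration.

```python
import numpy as np

arr = np.array([3, 8, 1, 9, 5, 7])

# Step 1: the comparison produces a boolean array of the same shape.
mask = arr > 5
print(mask.tolist())      # [False, True, False, True, False, True]

# Step 2: indexing with the mask keeps only the positions marked True.
filtered = arr[mask]
print(filtered.tolist())  # [8, 9, 7]
```

The result is a copy, so modifying `filtered` leaves `arr` unchanged.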
Why do we split data into Train and Test sets?
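If you train and evaluate an AI model on the same data, you cannot tell whether it has learned general patterns or simply memorized the answers (overfitting), and a model that has memorized will fail in the real world. You must hold out a portion of your data to test the model on examples it has never seen before.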
How does axis=0 differ between 1D and 2D arrays?
In a 1D array (vector), `axis=0` is the only axis, so reducing along it collapses the whole vector to a single value. In a 2D array (matrix), `axis=0` specifically targets the vertical flow (down the columns), so reducing along it produces one value per column.
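A brief sketch of the contrast, using `max` as the reduction. The arrays are illustrative assumptions.

```python
import numpy as np

vector = np.array([4, 9, 2])            # shape (3,)
matrix = np.array([[4, 9, 2],
                   [7, 1, 8]])          # shape (2, 3)

# 1D: axis=0 is the only axis, so the result is a single value.
vector_max = vector.max(axis=0)
print(vector_max)                       # 9

# 2D: axis=0 runs down the rows, giving one maximum per column.
column_max = matrix.max(axis=0)
print(column_max.tolist())              # [7, 9, 8]
```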
