Intro To Machine Learning

Pascual Vila
AI & Software Instructor // Code Syllabus
"Machine learning is the science of getting computers to act without being explicitly programmed." — Andrew Ng. It marks the shift from hardcoding rules to teaching systems to deduce rules from vast amounts of data.
1. The Paradigm Shift
Historically, software engineering was about writing explicit logic. If A happens, execute B. Machine Learning flips this. We feed the computer the input data (features) and the desired outputs (labels), and it calculates the mathematical rules to map inputs to outputs.
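A minimal sketch of that contrast, using made-up housing numbers: in the first function a programmer hardcodes the rule, while in the second a least-squares fit recovers an equivalent rule from example data alone.

```python
# Classic programming: a human writes the mapping explicitly.
def price_rule(sqft):
    return sqft * 150  # a programmer hardcodes $150 per square foot

# Machine learning: the mapping is estimated from examples.
sizes  = [1000, 1500, 2000, 2500]          # inputs (features)
prices = [148000, 225000, 301000, 374000]  # known outputs (labels)

# One-parameter least-squares fit: the per-square-foot rate is deduced from data.
learned_rate = sum(s * p for s, p in zip(sizes, prices)) / sum(s * s for s in sizes)

def learned_rule(sqft):
    return sqft * learned_rate

print(round(learned_rate, 2))  # close to 150, learned rather than hardcoded
```

The two functions end up nearly identical, but only the second would adapt automatically if the market (the data) changed.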
2. Supervised vs. Unsupervised
Supervised Learning is like studying with an answer key. You train the model on data where the outcome is known. For example, predicting house prices based on historical sales data.
Unsupervised Learning is like exploring without a map. The algorithm is given unlabeled data and must find structure within it. This is heavily used in customer segmentation and clustering algorithms.
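To make "finding structure without labels" concrete, here is a toy 1D k-means sketch over hypothetical customer-spend values. No group labels are ever supplied; the algorithm discovers the two natural clusters itself.

```python
# Minimal 1D k-means (k=2) for illustration; real work would use a library.
def kmeans_1d(points, k=2, iters=20):
    centers = [min(points), max(points)]  # naive initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # move each center to the mean of its assigned points
        centers = [sum(c) / len(c) for c in clusters if c]
    return centers

spend = [12, 15, 14, 90, 95, 88]  # two natural groups, never labeled
centers = kmeans_1d(spend)
print(sorted(round(c) for c in centers))  # roughly [14, 91]
```

This is exactly the mechanism behind customer segmentation: low spenders and high spenders fall out of the data without anyone tagging them in advance.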
3. The Core ML Pipeline
Building an ML app isn't just about the algorithm. It is a systematic pipeline:
- 1. Data Collection: Gathering raw data from APIs, databases, or sensors.
- 2. Preprocessing: Cleaning missing values, normalizing numbers, and encoding text to make it machine-readable.
- 3. Model Training: Utilizing .fit() to let the algorithm find patterns.
- 4. Evaluation: Testing the model on unseen data to calculate its accuracy.
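The four steps above can be sketched end to end with scikit-learn (the library whose .fit() convention is referenced). The toy data stands in for a real collection step; everything else uses the library's actual API.

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# 1. Data collection (a toy stand-in for APIs, databases, or sensors)
X = [[1000], [1500], [2000], [2500], [3000], [3500]]  # square footage
y = [150000, 225000, 300000, 375000, 450000, 525000]  # sale prices

# 2. Preprocessing is trivial here; real data would need cleaning and encoding.

# 3. Model training: .fit() finds the pattern in the training portion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# 4. Evaluation: score the model on data it never saw during training.
r2 = model.score(X_test, y_test)
print(round(r2, 3))  # R^2 near 1.0 on this clean, perfectly linear data
```

On messy real-world data the evaluation score is where most of the work shows up, which is why steps 1 and 2 typically dominate an ML project's timeline.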
🤖 Artificial Intelligence FAQ
What is the difference between AI and Machine Learning?
Artificial Intelligence (AI) is the broader concept of machines being able to carry out tasks in a way that we would consider "smart". Machine Learning (ML) is a specific subset of AI based on the idea that we should give machines access to data and let them learn for themselves.
What are Features and Labels in a dataset?
Features (X): These are the independent variables or the input attributes you use to make a prediction (e.g., square footage, number of bedrooms).
Labels (y): This is the dependent variable or the final answer you are trying to predict (e.g., the final price of the house).
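In code, the X/y convention looks like this (illustrative values): each row of X holds one example's features, and the entry of y at the same position holds that example's label.

```python
# Features (X): one row per house, columns = [square footage, bedrooms]
X = [
    [1400, 3],
    [2000, 4],
]

# Labels (y): the sale price we want to predict, one per row of X
y = [240000, 330000]

# The rows of X and entries of y must line up one-to-one.
assert len(X) == len(y)
print(len(X[0]))  # 2 features per example
```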
Why do we split data into Training and Testing sets?
If a model is evaluated on the same data it was trained on, it might simply memorize the answers (overfitting) rather than learning underlying patterns. Splitting the data ensures we evaluate the model's true performance on unseen, real-world data.
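A deliberately extreme sketch makes the point: a "model" that simply memorizes its training pairs scores perfectly on the data it has seen and fails completely on held-out data, which only the split can reveal.

```python
# Toy data: memorized training examples vs. one held-out test example.
train = {1000: 150000, 2000: 300000}  # seen during "training"
test  = {1500: 225000}                # unseen

def memorizer(x):
    return train.get(x)  # returns None for anything it never saw

train_acc = sum(memorizer(x) == y for x, y in train.items()) / len(train)
test_acc  = sum(memorizer(x) == y for x, y in test.items()) / len(test)
print(train_acc, test_acc)  # 1.0 on training data, 0.0 on unseen data
```

Evaluated only on its training set, the memorizer looks flawless; the test set exposes that it learned nothing generalizable. That gap is overfitting.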