Intro To Machine Learning

Pascual Vila
AI & Software Instructor // Code Syllabus
"Machine learning is the science of getting computers to act without being explicitly programmed." — Andrew Ng. It marks the shift from hardcoding rules to teaching systems to deduce rules from vast amounts of data.
1. The Paradigm Shift
Historically, software engineering was about writing explicit logic. If A happens, execute B. Machine Learning flips this. We feed the computer the input data (features) and the desired outputs (labels), and it calculates the mathematical rules to map inputs to outputs.
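A minimal sketch of that contrast, using made-up housing numbers: in the first function a programmer hardcodes the rule, while in the second a least-squares fit recovers an equivalent rule from example data alone.

```python
# Classic programming: a human writes the mapping explicitly.
def price_rule(sqft):
    return sqft * 150  # a programmer hardcodes $150 per square foot

# Machine learning: the mapping is estimated from examples.
sizes  = [1000, 1500, 2000, 2500]          # inputs (features)
prices = [148000, 225000, 301000, 374000]  # known outputs (labels)

# One-parameter least-squares fit: the per-square-foot rate is deduced from data.
learned_rate = sum(s * p for s, p in zip(sizes, prices)) / sum(s * s for s in sizes)

def learned_rule(sqft):
    return sqft * learned_rate

print(round(learned_rate, 2))  # close to 150, learned rather than hardcoded
```

The two functions end up nearly identical, but only the second would adapt automatically if the market (the data) changed.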
2. Supervised vs. Unsupervised
Supervised Learning is like studying with an answer key. You train the model on data where the outcome is known. For example, predicting house prices based on historical sales data.
Unsupervised Learning is like exploring without a map. The algorithm is given unlabeled data and must find structure within it. This is heavily used in customer segmentation and clustering algorithms.
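To make "finding structure without labels" concrete, here is a toy 1D k-means sketch over hypothetical customer-spend values. No group labels are ever supplied; the algorithm discovers the two natural clusters itself.

```python
# Minimal 1D k-means (k=2) for illustration; real work would use a library.
def kmeans_1d(points, k=2, iters=20):
    centers = [min(points), max(points)]  # naive initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # move each center to the mean of its assigned points
        centers = [sum(c) / len(c) for c in clusters if c]
    return centers

spend = [12, 15, 14, 90, 95, 88]  # two natural groups, never labeled
centers = kmeans_1d(spend)
print(sorted(round(c) for c in centers))  # roughly [14, 91]
```

This is exactly the mechanism behind customer segmentation: low spenders and high spenders fall out of the data without anyone tagging them in advance.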
3. The Core ML Pipeline
Building an ML app isn't just about the algorithm. It is a systematic pipeline:
- 1. Data Collection: Gathering raw data from APIs, databases, or sensors.
- 2. Preprocessing: Cleaning missing values, normalizing numbers, and encoding text to make it machine-readable.
- 3. Model Training: Utilizing .fit() to let the algorithm find patterns.
- 4. Evaluation: Testing the model on unseen data to calculate its accuracy.
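The four steps above can be sketched end to end with scikit-learn (the library whose .fit() convention is referenced). The toy data stands in for a real collection step; everything else uses the library's actual API.

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# 1. Data collection (a toy stand-in for APIs, databases, or sensors)
X = [[1000], [1500], [2000], [2500], [3000], [3500]]  # square footage
y = [150000, 225000, 300000, 375000, 450000, 525000]  # sale prices

# 2. Preprocessing is trivial here; real data would need cleaning and encoding.

# 3. Model training: .fit() finds the pattern in the training portion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# 4. Evaluation: score the model on data it never saw during training.
r2 = model.score(X_test, y_test)
print(round(r2, 3))  # R^2 near 1.0 on this clean, perfectly linear data
```

On messy real-world data the evaluation score is where most of the work shows up, which is why steps 1 and 2 typically dominate an ML project's timeline.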
🤖 Artificial Intelligence FAQ
What is the difference between AI and Machine Learning?
Artificial Intelligence (AI) is the broader concept of machines being able to carry out tasks in a way that we would consider "smart". Machine Learning (ML) is a specific subset of AI based on the idea that we should give machines access to data and let them learn for themselves.
What are Features and Labels in a dataset?
Features (X): These are the independent variables or the input attributes you use to make a prediction (e.g., square footage, number of bedrooms).
Labels (y): This is the dependent variable or the final answer you are trying to predict (e.g., the final price of the house).
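In code, the X/y convention looks like this (illustrative values): each row of X holds one example's features, and the entry of y at the same position holds that example's label.

```python
# Features (X): one row per house, columns = [square footage, bedrooms]
X = [
    [1400, 3],
    [2000, 4],
]

# Labels (y): the sale price we want to predict, one per row of X
y = [240000, 330000]

# The rows of X and entries of y must line up one-to-one.
assert len(X) == len(y)
print(len(X[0]))  # 2 features per example
```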
Why do we split data into Training and Testing sets?
If a model is evaluated on the same data it was trained on, it might simply memorize the answers (overfitting) rather than learning underlying patterns. Splitting the data ensures we evaluate the model's true performance on unseen, real-world data.
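A deliberately extreme sketch makes the point: a "model" that simply memorizes its training pairs scores perfectly on the data it has seen and fails completely on held-out data, which only the split can reveal.

```python
# Toy data: memorized training examples vs. one held-out test example.
train = {1000: 150000, 2000: 300000}  # seen during "training"
test  = {1500: 225000}                # unseen

def memorizer(x):
    return train.get(x)  # returns None for anything it never saw

train_acc = sum(memorizer(x) == y for x, y in train.items()) / len(train)
test_acc  = sum(memorizer(x) == y for x, y in test.items()) / len(test)
print(train_acc, test_acc)  # 1.0 on training data, 0.0 on unseen data
```

Evaluated only on its training set, the memorizer looks flawless; the test set exposes that it learned nothing generalizable. That gap is overfitting.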