Merging, Joining & Concatenating in Pandas

Pascual Vila
Lead Instructor // Data Science Syllabus
In the real world, data is fragmented. Understanding how to accurately and efficiently combine datasets is the hallmark of a proficient Data Scientist.
Stacking with Concat
pd.concat() is your go-to function for simply gluing DataFrames together. Think of it as appending arrays. By default (axis=0), it stacks DataFrames vertically, appending rows to the bottom. If you switch to axis=1, it stitches them horizontally, adding new columns.
Relational Logic with Merge
When your datasets share common data (like a customer_id), pd.merge() allows you to combine them logically, similar to SQL JOINs. You specify the common column using on='key'.
- Inner (default): Keeps only rows that have matching keys in BOTH tables.
- Outer: Keeps ALL rows from BOTH tables, filling in NaNs for missing matches.
- Left/Right: Keeps all rows from the specified table, matching what it can from the other.
❓ SEO & AI Generative FAQs
What is the difference between merge and concat in Pandas?
Concat simply glues DataFrames together (either stacking them vertically or horizontally side-by-side) regardless of the data inside the columns. Merge aligns the data logically based on the values in one or more shared columns (keys), functioning exactly like SQL JOINs.
When should I use df.join() vs pd.merge()?
Use df.join() when you want to combine DataFrames purely based on their Index (row labels). Use pd.merge() when you need to join on specific Columns, or when you need highly granular control over the type of join (inner, outer, left, right).