Using Pandas (10) - Combining DataFrames (concat, merge)

pandas' concat() and merge() both combine DataFrames, but they do different jobs.

concat() simply stacks several DataFrames together,

while merge() joins two DataFrames on a key that you specify.

The examples below show how to use each.

import pandas as pd

raw_data = {
        'Employee ID': ['1', '2', '3', '4', '5'],
        'first name': ['Diana', 'Cynthia', 'Shep', 'Ryan', 'Allen'], 
        'last name': ['Bouchard', 'Ali', 'Rob', 'Mitch', 'Steve']}
        
df_Engineering_dept = pd.DataFrame(raw_data, columns = ['Employee ID', 'first name', 'last name'])
df_Engineering_dept

raw_data = {
        'Employee ID': ['6', '7', '8', '9', '10'],
        'first name': ['Bill', 'Dina', 'Sarah', 'Heather', 'Holly'], 
        'last name': ['Christian', 'Mo', 'Steve', 'Bob', 'Michelle']}
        
df_Finance_dept = pd.DataFrame(raw_data, columns = ['Employee ID', 'first name', 'last name'])
df_Finance_dept

raw_data = {
        'Employee ID': ['1', '2', '3', '4', '5', '7', '8', '9', '10'],
        'Salary [$/hour]': [25, 35, 45, 48, 49, 32, 33, 34, 23]}
df_salary = pd.DataFrame(raw_data, columns = ['Employee ID','Salary [$/hour]'])
df_salary

pandas.concat()

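concat can stack any number of DataFrames at once. It works most cleanly when the column names match; a column that is missing from one of the frames is filled with NaN for that frame's rows.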

df_all = pd.concat( [df_Engineering_dept, df_Finance_dept] )
df_all
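
By default, concat keeps each input's own row labels, so stacking two five-row frames gives an index that runs 0 to 4 twice. The following is a minimal sketch of two options that are often useful here, using small stand-in frames (the names `df_a`, `df_b`, and `df_c` are illustrative and not part of the example data above): `ignore_index=True` renumbers the rows, and frames with different columns are aligned by column name.

```python
import pandas as pd

# Small stand-in frames (hypothetical names) for illustration
df_a = pd.DataFrame({'Employee ID': ['1', '2'], 'first name': ['Diana', 'Cynthia']})
df_b = pd.DataFrame({'Employee ID': ['6', '7'], 'first name': ['Bill', 'Dina']})
df_c = pd.DataFrame({'Employee ID': ['8'], 'last name': ['Steve']})

# Default: the original row labels are kept, so 0 and 1 each appear twice
stacked = pd.concat([df_a, df_b])

# ignore_index=True discards the old labels and renumbers the rows from 0
renumbered = pd.concat([df_a, df_b], ignore_index=True)

# Different columns: the result has the union of the columns,
# and values a frame does not have become NaN
mixed = pd.concat([df_a, df_c], ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
print(mixed)
```

Renumbering matters mostly when later code selects rows by label, since duplicated labels make `.loc` return more than one row.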

pandas.merge()

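merge joins two DataFrames into one when they share a common column (a key), using that column to match rows.

A single merge call combines exactly two DataFrames; to combine more, call merge repeatedly.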
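
Since one call takes only two frames, joining three or more usually means chaining calls. A minimal sketch, assuming a third frame `office` that is purely hypothetical and not part of this post's data:

```python
import pandas as pd

names = pd.DataFrame({'Employee ID': ['1', '2'], 'first name': ['Diana', 'Cynthia']})
salary = pd.DataFrame({'Employee ID': ['1', '2'], 'Salary [$/hour]': [25, 35]})
# Hypothetical third frame, added only to illustrate chaining
office = pd.DataFrame({'Employee ID': ['1', '2'], 'office': ['A', 'B']})

# Each merge call joins two frames; the result feeds the next call
combined = names.merge(salary, on='Employee ID').merge(office, on='Employee ID')

print(combined.columns.tolist())
```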

pd.merge( df_all, df_salary, on= 'Employee ID' )
# Only rows whose key appears in both DataFrames are kept (the default, how = 'inner')

Employee ID 6 is missing from df_salary, so that employee does not appear in the merged result.
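
The row is dropped because merge performs an inner join unless told otherwise. A small sketch with stand-in data (the frames `employees` and `salary` are illustrative, not the ones defined earlier) that checks this:

```python
import pandas as pd

# Stand-in data (hypothetical): ID '6' has no salary record
employees = pd.DataFrame({'Employee ID': ['5', '6', '7'],
                          'first name': ['Allen', 'Bill', 'Dina']})
salary = pd.DataFrame({'Employee ID': ['5', '7'],
                       'Salary [$/hour]': [49, 32]})

# how='inner' is the default, so these two calls give the same result
default_join = pd.merge(employees, salary, on='Employee ID')
inner_join = pd.merge(employees, salary, on='Employee ID', how='inner')

print(default_join.equals(inner_join))
print(default_join['Employee ID'].tolist())
```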

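# Keep every employee, even those without a salary record
pd.merge( df_all, df_salary, on= 'Employee ID', how = 'left' )
# how = 'left' : keep every row of the left DataFrame (missing values become NaN)
# how = 'right' : keep every row of the right DataFrame (missing values become NaN)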
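
To compare the join types side by side, here is a hedged sketch on stand-in data (the frames and the ID '11' are made up for illustration). It also uses `how='outer'`, which keeps the rows of both sides, and `indicator=True`, which adds a `_merge` column saying where each row came from.

```python
import pandas as pd

# Stand-in data (hypothetical): '6' has no salary, '11' has no employee record
employees = pd.DataFrame({'Employee ID': ['5', '6'], 'first name': ['Allen', 'Bill']})
salary = pd.DataFrame({'Employee ID': ['5', '11'], 'Salary [$/hour]': [49, 40]})

left = pd.merge(employees, salary, on='Employee ID', how='left')    # all employees
right = pd.merge(employees, salary, on='Employee ID', how='right')  # all salary rows

# how='outer' keeps rows from both sides; indicator=True adds a '_merge'
# column with the values 'both', 'left_only', or 'right_only'
outer = pd.merge(employees, salary, on='Employee ID', how='outer', indicator=True)

print(left)
print(right)
print(outer)
```

Filtering on `_merge == 'left_only'` is a quick way to list the keys that have no match in the other frame.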
