Comparing DataFrames

유키공 2025. 3. 27. 16:47

import pandas as pd
import numpy as np

# Sample data (includes NaN)
df1 = pd.DataFrame({'A': [1, 2, np.nan], 'B': ['a', 'b', 'c']}, index=[0, 1, 2])
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': ['a', 'x', np.nan]}, index=[1, 2, 3])

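# 1. Outer-merge on every data column, carrying each row's original position
left = df1.reset_index(drop=True).rename_axis('_pos').reset_index()
right = df2.reset_index(drop=True).rename_axis('_pos').reset_index()
merged = pd.merge(
    left,
    right,
    how='outer',
    indicator='_source',
    on=list(df1.columns),
    suffixes=('', '_y')  # '_pos' comes from df1, '_pos_y' from df2
)

# 2. Keep only the rows that appear in one frame but not the other
diff_rows = merged[merged['_source'] != 'both'].copy()
diff_rows['_source'] = diff_rows['_source'].astype(str).map({
    'left_only': 'df1',
    'right_only': 'df2'
})

# 3. NaN-safe comparison + highlighting
# Cast to object first so string labels can be written into numeric columns
diff_rows = diff_rows.astype({col: object for col in df1.columns})
for idx, row in diff_rows.iterrows():
    source = row['_source']
    other_df = df2 if source == 'df1' else df1
    # Position of this row in the frame it came from (not the merged index)
    pos = int(row['_pos'] if source == 'df1' else row['_pos_y'])

    try:
        other_row = other_df.iloc[pos]
        for col in df1.columns:
            val = row[col]
            other_val = other_row[col]

            # NaN never equals itself, so test missingness with pd.isna() first
            if pd.isna(val) != pd.isna(other_val) or \
               (not pd.isna(val) and val != other_val):
                diff_rows.at[idx, col] = f"{val} ({source})" if not pd.isna(val) else f"NaN ({source})"
    except IndexError:
        # The other frame has no row at this position; leave the values unmarked
        pass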

# 4. Final result
diff_rows = diff_rows[df1.columns.tolist() + ['_source']]
print(diff_rows)
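
The first two steps above (outer merge with an indicator column, then dropping the rows found in both frames) can be pulled out into a small reusable function. The following is only a minimal sketch: the function name `rows_in_one_frame_only` is hypothetical, the 'df1'/'df2' labels simply mirror the snippet above, and it assumes both frames have exactly the same columns.

```python
import numpy as np
import pandas as pd


def rows_in_one_frame_only(left, right):
    """Return rows present in only one of the two frames, labelled by origin.

    Hypothetical helper for illustration; assumes identical column sets.
    """
    merged = pd.merge(
        left.reset_index(drop=True),
        right.reset_index(drop=True),
        how='outer',
        on=list(left.columns),   # compare on every column
        indicator='_source',     # adds left_only / right_only / both
    )
    only = merged[merged['_source'] != 'both'].copy()
    # Convert the categorical indicator to plain strings before relabelling
    only['_source'] = only['_source'].astype(str).map(
        {'left_only': 'df1', 'right_only': 'df2'}
    )
    return only.reset_index(drop=True)


df1 = pd.DataFrame({'A': [1, 2, np.nan], 'B': ['a', 'b', 'c']})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': ['a', 'x', np.nan]})
result = rows_in_one_frame_only(df1, df2)
print(result)
```

With the sample data, only the row (1, 'a') exists in both frames, so the result should hold the remaining two rows from each side. Row order may vary between pandas versions, because an outer merge can sort on the key columns.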
