Goal: concatenate two similar DataFrames and sort the result by the first column.

The problem is that some rows of new_df are "shifted" to the right, starting, I think, with a tab or \t.

This gives the DataFrame an inconsistent shape.

Code:

import pandas as pd

df_1 = ...
df_2 = ...

new_df = pd.concat([df_1, df_2])
new_df = new_df.sort_values(new_df.columns[0], ascending=True)  # sort_values returns a new DataFrame, so assign it back

df_1:

           1                                                  2                                                  3
0  Emissions  305-1~GHG emissions in metric tons of CO2e~Gro...  Emissions for Gross direct (Scope 1) GHG emiss...
1  Emissions  305-1~GHG emissions in metric tons of CO2e~Bio...  Emissions for Biogenic CO2 emissions was 14681...
2  Emissions    305-1~Direct (Scope 1) GHG emissions by gas~CO2  Emissions for CO2 was 107973 tons in year 2014...
3  Emissions    305-1~Direct (Scope 1) GHG emissions by gas~N20  Emissions for N20 was 91661 tons in year 2014;...
4  Emissions   305-1~Direct (Scope 1) GHG emissions by gas~HFCs  Emissions for HFCs was 31744 tons in year 2014...

df_2:

                            0                                                  1                                                  2
0                   Emissions  103-1~Explanation of the material topic and it...  consumption rate fossil fuels coal oil emissio...
1                   Emissions   103-2~The management approach and its components  how evaluate companys environmental management...
2                   Emissions        103-3~Evaluation of the management approach  evaluation effectiveness companys environmenta...
3  Customer Health and Safety  103-1~Explanation of the material topic and it...  health safety corporate policy needsthe americ...
4  Customer Health and Safety   103-2~The management approach and its components  management approach employee customer wellbein...

new_df:

     0          1                                                  2                                                  3
0  NaN  Emissions  305-1~GHG emissions in metric tons of CO2e~Gro...  Emissions for Gross direct (Scope 1) GHG emiss...
1  NaN  Emissions  305-1~GHG emissions in metric tons of CO2e~Bio...  Emissions for Biogenic CO2 emissions was 14681...
2  NaN  Emissions    305-1~Direct (Scope 1) GHG emissions by gas~CO2  Emissions for CO2 was 107973 tons in year 2014...
3  NaN  Emissions    305-1~Direct (Scope 1) GHG emissions by gas~N20  Emissions for N20 was 91661 tons in year 2014;...
4  NaN  Emissions   305-1~Direct (Scope 1) GHG emissions by gas~HFCs  Emissions for HFCs was 31744 tons in year 2014...
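A minimal reproduction, using hypothetical one-row stand-ins for the real (elided) frames, suggests the "shift" is not caused by tabs: pd.concat aligns columns by label, not by position. df_1's labels start at 1 while df_2's start at 0, so the result gets the union of labels and df_1's rows have NaN in column 0.

```python
import pandas as pd

# Hypothetical toy frames: same layout as the question, but df_1's column
# labels are 1..3 while df_2's are 0..2.
df_1 = pd.DataFrame({1: ["Emissions"], 2: ["305-1~..."], 3: ["text A"]})
df_2 = pd.DataFrame({0: ["Emissions"], 1: ["103-1~..."], 2: ["text B"]})

# concat matches columns by label, so the result has labels {0, 1, 2, 3};
# the row coming from df_1 has no value for label 0 and gets NaN there.
new_df = pd.concat([df_1, df_2])
print(new_df.shape)  # (2, 4)
```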

Please let me know if there is anything else I should add to the post.

StressedBoi69420, Nov 23, 2021 at 17:44

Best answer

You need df_1's column labels to start at 0 (a RangeIndex), as in df_2.columns:

df_1.columns = range(len(df_1.columns))

Or:

df_1.columns -= 1
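A small sketch comparing the two relabelling options on a hypothetical frame whose integer labels are 1..3. The second option relies on Index arithmetic, so it only gives 0-based labels when the existing labels are the consecutive integers 1..n.

```python
import pandas as pd

# Hypothetical frame with column labels 1, 2, 3 (as in the question's df_1).
df_1 = pd.DataFrame({1: ["a"], 2: ["b"], 3: ["c"]})

# Option 1: replace the labels with a fresh 0-based range.
relabelled = df_1.copy()
relabelled.columns = range(len(relabelled.columns))

# Option 2: subtract 1 from every existing integer label.
shifted = df_1.copy()
shifted.columns -= 1

print(list(relabelled.columns), list(shifted.columns))  # [0, 1, 2] [0, 1, 2]
```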

Another idea is to reset the column labels of both DataFrames:

df_1.columns = range(len(df_1.columns))
df_2.columns = range(len(df_2.columns))

And then concatenate:

new_df = pd.concat([df_1, df_2])
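Putting it together, here is a minimal end-to-end sketch with hypothetical stand-in data (the real frames are elided in the question). Two choices here are assumptions rather than part of the answer: ignore_index=True, which renumbers rows instead of repeating 0, 1, ..., and kind="stable", which keeps rows with equal first-column values in their original order. Note that the sorted result is assigned back, because sort_values returns a new DataFrame.

```python
import pandas as pd

# Hypothetical stand-ins for the question's frames.
df_1 = pd.DataFrame({
    1: ["Emissions", "Emissions"],
    2: ["305-1~CO2", "305-1~N20"],
    3: ["text A", "text B"],
})
df_2 = pd.DataFrame({
    0: ["Emissions", "Customer Health and Safety"],
    1: ["103-1~Explanation", "103-1~Explanation"],
    2: ["text C", "text D"],
})

# Give both frames the same 0-based column labels so concat lines them up.
for df in (df_1, df_2):
    df.columns = range(len(df.columns))

# ignore_index=True produces a fresh 0..n-1 row index.
new_df = pd.concat([df_1, df_2], ignore_index=True)

# Sort by the first column and assign the result back.
new_df = new_df.sort_values(new_df.columns[0], kind="stable").reset_index(drop=True)
print(new_df)
```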
jezrael, Nov 23, 2021 at 17:47
Running df.columns = range(len(df.columns)) on both worked. Cheers @jezrael – StressedBoi69420, Nov 23, 2021 at 18:13