I would like to know how to split a Location column into several new columns, such as city, state code, and country, in pandas. From this:

 'Location': {0: 'Warszawa, Poland',
  1: 'San Francisco, CA, United States',
  2: 'Los Angeles, CA, United States',
  3: 'Sunnyvale, CA, United States',
  4: 'Sunnyvale, CA, United States',
  5: 'San Francisco, CA, United States',
  6: 'Sunnyvale, CA, United States',
  7: 'Kraków, Poland',
  8: 'Shanghai, China',
  9: 'Mountain View, CA, United States',
  10: 'Boulder, CO, United States',
  11: 'Boulder, CO, United States',
  12: 'Xinyi District, Taiwan',
  13: 'Tel Aviv-Yafo, Israel',
  14: 'Wrocław, Poland',
  15: 'Singapore'}

To this:

 'Country': {0: 'Poland',
  1: 'United States',
  2: 'United States',
  3: 'United States',
  4: 'United States',
  5: 'United States',
  6: 'United States',
  7: 'Poland',
  8: 'China',
  9: 'United States',
  10: 'United States',
  11: 'United States',
  12: 'Taiwan',
  13: 'Israel',
  14: 'Poland',
  15: 'Singapore'}

Thanks.

0
Steven Wong Jul 3, 2019 at 21:24

3 answers

Best answer
$ ipython
Python 3.6.8 |Anaconda custom (64-bit)| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: d = {'Location': {0: 'Warszawa, Poland',
   ...:   1: 'San Francisco, CA, United States',
   ...:   2: 'Los Angeles, CA, United States',
   ...:   3: 'Sunnyvale, CA, United States',
   ...:   4: 'Sunnyvale, CA, United States',
   ...:   5: 'San Francisco, CA, United States',
   ...:   6: 'Sunnyvale, CA, United States',
   ...:   7: 'Kraków, Poland',
   ...:   8: 'Shanghai, China',
   ...:   9: 'Mountain View, CA, United States',
   ...:   10: 'Boulder, CO, United States',
   ...:   11: 'Boulder, CO, United States',
   ...:   12: 'Xinyi District, Taiwan',
   ...:   13: 'Tel Aviv-Yafo, Israel',
   ...:   14: 'Wrocław, Poland',
   ...:   15: 'Singapore'}}

In [2]: import pandas as pd
   ...: df = pd.DataFrame.from_dict(d)
   ...: df
Out[2]:
                            Location
0                   Warszawa, Poland
1   San Francisco, CA, United States
2     Los Angeles, CA, United States
3       Sunnyvale, CA, United States
4       Sunnyvale, CA, United States
5   San Francisco, CA, United States
6       Sunnyvale, CA, United States
7                     Kraków, Poland
8                    Shanghai, China
9   Mountain View, CA, United States
10        Boulder, CO, United States
11        Boulder, CO, United States
12            Xinyi District, Taiwan
13             Tel Aviv-Yafo, Israel
14                   Wrocław, Poland
15                         Singapore

In [3]: df['Country'] = df['Location'].str.split(',').apply(lambda x: x[-1])
   ...: df
Out[3]:
                            Location         Country
0                   Warszawa, Poland          Poland
1   San Francisco, CA, United States   United States
2     Los Angeles, CA, United States   United States
3       Sunnyvale, CA, United States   United States
4       Sunnyvale, CA, United States   United States
5   San Francisco, CA, United States   United States
6       Sunnyvale, CA, United States   United States
7                     Kraków, Poland          Poland
8                    Shanghai, China           China
9   Mountain View, CA, United States   United States
10        Boulder, CO, United States   United States
11        Boulder, CO, United States   United States
12            Xinyi District, Taiwan          Taiwan
13             Tel Aviv-Yafo, Israel          Israel
14                   Wrocław, Poland          Poland
15                         Singapore       Singapore

In [4]: df['Country'].to_dict()
Out[4]:
{0: ' Poland',
 1: ' United States',
 2: ' United States',
 3: ' United States',
 4: ' United States',
 5: ' United States',
 6: ' United States',
 7: ' Poland',
 8: ' China',
 9: ' United States',
 10: ' United States',
 11: ' United States',
 12: ' Taiwan',
 13: ' Israel',
 14: ' Poland',
 15: 'Singapore'}
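
Note that Out[4] shows a leading space in most values (' Poland', ' United States'): the split is on ',' alone, so the space that follows each comma stays attached to the next piece. A minimal sketch of one way to remove it, assuming the extra whitespace is unwanted, is to strip the last piece after splitting. The three-row frame below is a shortened stand-in for the full data.

```python
import pandas as pd

# Shortened sample of the Location column from the question.
df = pd.DataFrame({'Location': ['Warszawa, Poland',
                                'San Francisco, CA, United States',
                                'Singapore']})

# Take the last comma-separated piece and strip surrounding whitespace.
df['Country'] = df['Location'].str.split(',').str[-1].str.strip()

print(df['Country'].to_dict())
# {0: 'Poland', 1: 'United States', 2: 'Singapore'}
```

Here `.str[-1]` does the same job as `apply(lambda x: x[-1])`, just through the string accessor.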
0
Kamaraju Kusumanchi Jul 3, 2019 at 20:15

This is a bit more sophisticated, does the same job, and can fit into a single line of code.

b['City'] = b['Location'].str.split(',').apply(lambda x: x[0])
b['Country'] = b['Location'].str.split(',').apply(lambda x: x[-1])
b

Output:

    Location                            City             Country
0   Warszawa, Poland                    Warszawa          Poland
1   San Francisco, CA, United States    San Francisco     United States
2   Los Angeles, CA, United States      Los Angeles       United States
3   Sunnyvale, CA, United States        Sunnyvale         United States
4   Sunnyvale, CA, United States        Sunnyvale         United States
5   San Francisco, CA, United States    San Francisco     United States
6   Sunnyvale, CA, United States        Sunnyvale         United States
7   Kraków, Poland                      Kraków            Poland
8   Shanghai, China                     Shanghai          China

This is the one-line version, but I am having trouble getting the values into two separate columns. Something is wrong here and I cannot figure out what.

b['City', 'Country']= pd.DataFrame (b['Location'].str.split(',').apply(lambda x:( x[0], x[-1]))) 


    (City,  Country)
0   (Warszawa, Poland)
1   (San Francisco, United States)
2   (Los Angeles, United States)
3   (Sunnyvale, United States)
4   (Sunnyvale, United States)
5   (San Francisco, United States)
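
A likely cause, judging by the output above: `b['City', 'Country']` with a single pair of brackets addresses one column whose label is the tuple ('City', 'Country'), and the lambda returns one tuple per row, so everything lands in a single column. A sketch of one way to get two real columns in one assignment is below; it assumes a list of column names on the left (double brackets) and a two-column frame on the right. The sample frame is a shortened stand-in for the full data.

```python
import pandas as pd

# Shortened sample of the Location column from the question.
b = pd.DataFrame({'Location': ['Warszawa, Poland',
                               'San Francisco, CA, United States',
                               'Singapore']})

# One list of pieces per row.
parts = b['Location'].str.split(',')

# Build a two-column frame (first piece, last piece) aligned on b's index,
# then assign it to a *list* of column names.
b[['City', 'Country']] = pd.DataFrame(
    [(p[0].strip(), p[-1].strip()) for p in parts],
    index=b.index,
)

print(b)
```

For a value with no comma, such as 'Singapore', the first and last piece are the same, so both City and Country end up as 'Singapore'.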
0
Vishwas Jul 4, 2019 at 04:45

I am not sure this is the best method; others, please comment or suggest a better one. I tried to split the data, but the problem is that non-US entries contain only a city and a country, while US entries contain a city, a state, and a country. As a result, I could not split everything with a single method. Below are the two methods I used to split the data; you would then have to work out how to combine the results into one DataFrame.

import pandas as pd

b = pd.DataFrame({'Location': {0: 'Warszawa, Poland',
  1: 'San Francisco, CA, United States',
  2: 'Los Angeles, CA, United States',
  3: 'Sunnyvale, CA, United States',
  4: 'Sunnyvale, CA, United States',
  5: 'San Francisco, CA, United States',
  6: 'Sunnyvale, CA, United States',
  7: 'Kraków, Poland',
  8: 'Shanghai, China',
  9: 'Mountain View, CA, United States',
  10: 'Boulder, CO, United States',
  11: 'Boulder, CO, United States',
  12: 'Xinyi District, Taiwan',
  13: 'Tel Aviv-Yafo, Israel',
  14: 'Wrocław, Poland',
  15: 'Singapore'}})

# Split on the first comma only: the part before it becomes City, the rest becomes Country.
# This works well for foreign addresses, i.e. data with just a city and a country.
c = b['Location'].str.split(',', n=1, expand=True)
c.columns = ['City', 'Country']

Output:

    City            Country
0   Warszawa        Poland
1   San Francisco   CA, United States
2   Los Angeles     CA, United States
3   Sunnyvale       CA, United States
4   Sunnyvale       CA, United States
5   San Francisco   CA, United States
6   Sunnyvale       CA, United States
7   Kraków          Poland
8   Shanghai        China
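
As the output shows, splitting on the first comma leaves the state code inside the Country column for US rows. If only the country is needed, one possible variation (a sketch, not part of the method above) is to split once from the right with `str.rsplit`, so the last piece is always the country. The sample frame and the name `parts` are illustrative.

```python
import pandas as pd

# Shortened sample of the Location column from the question.
b = pd.DataFrame({'Location': ['Warszawa, Poland',
                               'San Francisco, CA, United States',
                               'Singapore']})

# Split at the last comma only: column 0 is everything before it,
# column 1 is the country (still carrying the space after the comma).
parts = b['Location'].str.rsplit(',', n=1, expand=True)
parts[1] = parts[1].str.strip()

print(parts)
```

Rows without any comma, such as 'Singapore', get a missing value in the second column, so they still need separate handling.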

Second method:

# Three comma-separated parts: city, state code, country.
# Matching each part with [^,]+ keeps the comma out of the State value.
regex = r'(?P<City>[^,]+),\s*(?P<State>[^,]+),\s*(?P<Country>[^,]+)'
df = b['Location'].str.extract(regex)
df  # Splits the data into City, State and Country, so it works well for US addresses.

Output:

    City            State   Country
0   NaN             NaN     NaN
1   San Francisco   CA      United States
2   Los Angeles     CA      United States
3   Sunnyvale       CA      United States
4   Sunnyvale       CA      United States
5   San Francisco   CA      United States
6   Sunnyvale       CA      United States
7   NaN             NaN     NaN
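
To combine the two cases into one DataFrame, one option is to decide per row based on how many comma-separated parts there are. The sketch below is only one way to do it and rests on assumptions about this particular data: three parts mean city, state, country; two parts mean city, country; a single part is taken to be a country. The helper name `split_location` is made up for illustration.

```python
import pandas as pd

# Shortened sample of the Location column from the question.
b = pd.DataFrame({'Location': ['Warszawa, Poland',
                               'San Francisco, CA, United States',
                               'Singapore']})

def split_location(location):
    """Return (city, state, country); missing parts are None."""
    parts = [p.strip() for p in location.split(',')]
    country = parts[-1]                           # last part is the country
    city = parts[0] if len(parts) >= 2 else None  # single part: no city
    state = parts[1] if len(parts) == 3 else None # only 3-part rows have a state
    return city, state, country

# Build the result row by row, keeping the original index.
result = pd.DataFrame(
    [split_location(loc) for loc in b['Location']],
    columns=['City', 'State', 'Country'],
    index=b.index,
)

print(result)
```

The result can be attached to the original frame with `b.join(result)`. Locations with more than three parts would need an extra rule.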
1
Vishwas Jul 3, 2019 at 20:25