import pandas as pd
Sortowanie list¶
Jako wprowadzenie zobaczmy, jak w Pythonie można sortować zwykłe listy.
lista = ['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']
lista
['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']
Funkcja sorted zwraca nową listę, która zawiera te same dane, co oryginał, ale jest posortowana.
Nie modyfikuje oryginału
sorted(lista)
['Ada', 'Ala', 'Ewelina', 'Marek', 'Ola', 'Ola', 'Ącki', 'Łukasz', 'Żaneta']
Nadal poprzednia kolejność:
lista
['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']
Metoda .sort() sortuje elementy wewnątrz listy - modyfikuje sąmą listę.
lista.sort()
lista
['Ada', 'Ala', 'Ewelina', 'Marek', 'Ola', 'Ola', 'Ącki', 'Łukasz', 'Żaneta']
Wróćmy do początkowej kolejności...
lista = ['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']
lista
['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']
Do obu operacji można przekazać takie same parametry reverse oraz key, które wpływają na kolejność.
reverse=True powoduje sortowanie w odwrotnej kolejności.
sorted(lista, reverse=True)
['Żaneta', 'Łukasz', 'Ącki', 'Ola', 'Ola', 'Marek', 'Ewelina', 'Ala', 'Ada']
W parametrze key podaje się funkcję, która będzie wywołana dla każdego elementu listy i dopiero wyniki tej funkcji będą decydować o końcowej kolejności.
Przykład: przekazujemy funkcję len, która oblicza długość napisu - dzięki temu uzyskujemy sortowanie wg długości napisów.
sorted(lista, key=len)
['Ala', 'Ola', 'Ada', 'Ola', 'Ącki', 'Marek', 'Łukasz', 'Żaneta', 'Ewelina']
len('Marek')
5
lista
['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']
list(map(len, lista))
[5, 3, 3, 7, 3, 6, 3, 6, 4]
Sortowanie wg alfabetu polskiego (lub innego) → omówione w dalszej części.
Sortowanie serii¶
kraje = pd.Series({
'Polska': 38_000_000,
'Chiny': 1_400_000_000,
'Niemcy': 83_000_000,
'Czechy': 11_000_000,
})
kraje
Polska 38000000 Chiny 1400000000 Niemcy 83000000 Czechy 11000000 dtype: int64
Sortowanie według wartości indeksu. Warto zauważyć, że dane wczytane z pliku nie muszą być posortowane wg indeksu.
kraje.sort_index()
Chiny 1400000000 Czechy 11000000 Niemcy 83000000 Polska 38000000 dtype: int64
Sortowanie wg wartości.
kraje.sort_values()
Czechy 11000000 Polska 38000000 Niemcy 83000000 Chiny 1400000000 dtype: int64
Domyślnie metody sortujące zwracają w wyniku nowy obiekt, a nie modyfikują oryginału.
Aby dane zmieniły kolejność wewnątrz obiektu serii (a za chwilę DataFrame), należy dodać parametr inplace=True. Wtedy operacja nie zwraca wyniku, tylko zmienia obiekt "w środku".
kraje
Polska 38000000 Chiny 1400000000 Niemcy 83000000 Czechy 11000000 dtype: int64
kraje.sort_values(ascending=False, inplace=True)
kraje
Chiny 1400000000 Niemcy 83000000 Polska 38000000 Czechy 11000000 dtype: int64
Sortowanie DataFrame¶
emps = pd.read_csv('emps.csv', sep=';', index_col='employee_id', parse_dates=['hire_date'])
W przypadku DataFrame operacja sort_index działa w zasadzie tak samo, jak dla serii, natomiast sort_values wymaga podania kolumny, zgodnie z którą sortujemy (można podać jedną nazwę lub listę nazw).
emps.sort_values('salary')
| first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
|---|---|---|---|---|---|---|---|---|---|---|
| employee_id | ||||||||||
| 132 | TJ | Olson | Stock Clerk | 2100 | 2009-04-10 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 136 | Hazel | Philtanker | Stock Clerk | 2200 | 2011-02-06 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 128 | Steven | Markle | Stock Clerk | 2200 | 2010-03-08 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 135 | Ki | Gee | Stock Clerk | 2400 | 2009-12-12 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 127 | James | Landry | Stock Clerk | 2400 | 2009-01-14 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 146 | Karen | Partners | Sales Manager | 13500 | 2007-01-05 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 145 | John | Russell | Sales Manager | 14000 | 2006-10-01 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 102 | Lex | De Haan | Administration Vice President | 17000 | 2003-01-13 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 101 | Neena | Kochhar | Administration Vice President | 17000 | 1999-09-21 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 100 | Steven | King | President | 24000 | 1997-06-17 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
107 rows × 10 columns
emps.sort_values(['last_name', 'first_name'])
| first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
|---|---|---|---|---|---|---|---|---|---|---|
| employee_id | ||||||||||
| 174 | Ellen | Abel | Sales Representative | 11000 | 2006-05-11 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 166 | Sundar | Ande | Sales Representative | 6400 | 2000-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 130 | Mozhe | Atkinson | Stock Clerk | 2800 | 2007-10-30 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 105 | David | Austin | Programmer | 4800 | 2007-06-25 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
| 204 | Hermann | Baer | Public Relations Representative | 10000 | 2004-06-07 | Public Relations | Schwanthalerstr. 7031 | 80925 | Munich | Germany |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 123 | Shanta | Vollman | Stock Manager | 6500 | 2007-10-10 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 196 | Alana | Walsh | Shipping Clerk | 3100 | 2008-04-24 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 120 | Matthew | Weiss | Stock Manager | 8000 | 2006-07-18 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 200 | Jennifer | Whalen | Administration Assistant | 4400 | 1987-09-17 | Administration | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 149 | Eleni | Zlotkey | Sales Manager | 10500 | 2000-01-29 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
107 rows × 10 columns
Sortowanie malejące ze zmianą danych wewnątrz oryginalnej tabeli.
emps.sort_values('salary', ascending=False, inplace=True)
emps
| first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
|---|---|---|---|---|---|---|---|---|---|---|
| employee_id | ||||||||||
| 100 | Steven | King | President | 24000 | 1997-06-17 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 102 | Lex | De Haan | Administration Vice President | 17000 | 2003-01-13 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 101 | Neena | Kochhar | Administration Vice President | 17000 | 1999-09-21 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 145 | John | Russell | Sales Manager | 14000 | 2006-10-01 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 146 | Karen | Partners | Sales Manager | 13500 | 2007-01-05 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 127 | James | Landry | Stock Clerk | 2400 | 2009-01-14 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 135 | Ki | Gee | Stock Clerk | 2400 | 2009-12-12 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 128 | Steven | Markle | Stock Clerk | 2200 | 2010-03-08 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 136 | Hazel | Philtanker | Stock Clerk | 2200 | 2011-02-06 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 132 | TJ | Olson | Stock Clerk | 2100 | 2009-04-10 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
107 rows × 10 columns
emps.sort_index(inplace=True)
emps
| first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
|---|---|---|---|---|---|---|---|---|---|---|
| employee_id | ||||||||||
| 100 | Steven | King | President | 24000 | 1997-06-17 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 101 | Neena | Kochhar | Administration Vice President | 17000 | 1999-09-21 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 102 | Lex | De Haan | Administration Vice President | 17000 | 2003-01-13 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 103 | Alexander | Hunold | Programmer | 9000 | 2000-01-03 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
| 104 | Bruce | Ernst | Programmer | 6000 | 2001-05-21 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 202 | Pat | Fay | Marketing Representative | 6000 | 2007-08-17 | Marketing | 147 Spadina Ave | M5V 2L7 | Toronto | Canada |
| 203 | Susan | Mavris | Human Resources Representative | 6500 | 2004-06-07 | Human Resources | 8204 Arthur St | NaN | London | United Kingdom |
| 204 | Hermann | Baer | Public Relations Representative | 10000 | 2004-06-07 | Public Relations | Schwanthalerstr. 7031 | 80925 | Munich | Germany |
| 205 | Shelley | Higgins | Accounting Manager | 12000 | 2004-06-07 | Accounting | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 206 | William | Gietz | Public Accountant | 8300 | 2004-06-07 | Accounting | 2004 Charade Rd | 98199 | Seattle | United States of America |
107 rows × 10 columns
Sortowanie wg kilku kryteriów - podajemy listę kolumn.
emps.sort_values(['city', 'job_title', 'salary']).head(30)
| first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
|---|---|---|---|---|---|---|---|---|---|---|
| employee_id | ||||||||||
| 203 | Susan | Mavris | Human Resources Representative | 6500 | 2004-06-07 | Human Resources | 8204 Arthur St | NaN | London | United Kingdom |
| 204 | Hermann | Baer | Public Relations Representative | 10000 | 2004-06-07 | Public Relations | Schwanthalerstr. 7031 | 80925 | Munich | Germany |
| 149 | Eleni | Zlotkey | Sales Manager | 10500 | 2000-01-29 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 148 | Gerald | Cambrault | Sales Manager | 11000 | 2009-10-15 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 147 | Alberto | Errazuriz | Sales Manager | 12000 | 2007-03-10 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 146 | Karen | Partners | Sales Manager | 13500 | 2007-01-05 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 145 | John | Russell | Sales Manager | 14000 | 2006-10-01 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 173 | Sundita | Kumar | Sales Representative | 6100 | 2010-04-21 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 167 | Amit | Banda | Sales Representative | 6200 | 2000-04-21 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 179 | Charles | Johnson | Sales Representative | 6200 | 2011-01-04 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 166 | Sundar | Ande | Sales Representative | 6400 | 2000-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 165 | David | Lee | Sales Representative | 6800 | 2000-02-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 155 | Oliver | Tuvault | Sales Representative | 7000 | 2009-11-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 161 | Sarath | Sewall | Sales Representative | 7000 | 2008-11-03 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 164 | Mattea | Marvins | Sales Representative | 7200 | 2010-01-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 172 | Elizabeth | Bates | Sales Representative | 7300 | 2009-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 171 | William | Smith | Sales Representative | 7400 | 2009-02-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 154 | Nanette | Cambrault | Sales Representative | 7500 | 2008-12-09 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 160 | Louise | Doran | Sales Representative | 7500 | 2007-12-15 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 153 | Christopher | Olsen | Sales Representative | 8000 | 2008-03-30 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 159 | Lindsey | Smith | Sales Representative | 8000 | 2007-03-10 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 177 | Jack | Livingston | Sales Representative | 8400 | 2008-04-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 176 | Jonathon | Taylor | Sales Representative | 8600 | 2008-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 175 | Alyssa | Hutton | Sales Representative | 8800 | 2007-03-19 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 152 | Peter | Hall | Sales Representative | 9000 | 2007-08-20 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 158 | Allan | McEwen | Sales Representative | 9000 | 2006-08-01 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 151 | David | Bernstein | Sales Representative | 9500 | 2007-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 157 | Patrick | Sully | Sales Representative | 9500 | 2006-03-04 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 163 | Danielle | Greene | Sales Representative | 9500 | 2009-03-19 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 170 | Tayler | Fox | Sales Representative | 9600 | 2008-01-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
Aby w tej sytuacji określić kolejność, należy podać listę wartości True/False
emps.sort_values(['city', 'job_title', 'salary'], ascending=[True, True, False]).head(30)
| first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
|---|---|---|---|---|---|---|---|---|---|---|
| employee_id | ||||||||||
| 203 | Susan | Mavris | Human Resources Representative | 6500 | 2004-06-07 | Human Resources | 8204 Arthur St | NaN | London | United Kingdom |
| 204 | Hermann | Baer | Public Relations Representative | 10000 | 2004-06-07 | Public Relations | Schwanthalerstr. 7031 | 80925 | Munich | Germany |
| 145 | John | Russell | Sales Manager | 14000 | 2006-10-01 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 146 | Karen | Partners | Sales Manager | 13500 | 2007-01-05 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 147 | Alberto | Errazuriz | Sales Manager | 12000 | 2007-03-10 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 148 | Gerald | Cambrault | Sales Manager | 11000 | 2009-10-15 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 149 | Eleni | Zlotkey | Sales Manager | 10500 | 2000-01-29 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 168 | Lisa | Ozer | Sales Representative | 11500 | 2007-03-11 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 174 | Ellen | Abel | Sales Representative | 11000 | 2006-05-11 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 162 | Clara | Vishney | Sales Representative | 10500 | 2007-11-11 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 150 | Peter | Tucker | Sales Representative | 10000 | 2007-01-30 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 156 | Janette | King | Sales Representative | 10000 | 2006-01-30 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 169 | Harrison | Bloom | Sales Representative | 10000 | 2008-03-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 170 | Tayler | Fox | Sales Representative | 9600 | 2008-01-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 151 | David | Bernstein | Sales Representative | 9500 | 2007-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 157 | Patrick | Sully | Sales Representative | 9500 | 2006-03-04 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 163 | Danielle | Greene | Sales Representative | 9500 | 2009-03-19 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 152 | Peter | Hall | Sales Representative | 9000 | 2007-08-20 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 158 | Allan | McEwen | Sales Representative | 9000 | 2006-08-01 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 175 | Alyssa | Hutton | Sales Representative | 8800 | 2007-03-19 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 176 | Jonathon | Taylor | Sales Representative | 8600 | 2008-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 177 | Jack | Livingston | Sales Representative | 8400 | 2008-04-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 153 | Christopher | Olsen | Sales Representative | 8000 | 2008-03-30 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 159 | Lindsey | Smith | Sales Representative | 8000 | 2007-03-10 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 154 | Nanette | Cambrault | Sales Representative | 7500 | 2008-12-09 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 160 | Louise | Doran | Sales Representative | 7500 | 2007-12-15 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 171 | William | Smith | Sales Representative | 7400 | 2009-02-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 172 | Elizabeth | Bates | Sales Representative | 7300 | 2009-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 164 | Mattea | Marvins | Sales Representative | 7200 | 2010-01-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 155 | Oliver | Tuvault | Sales Representative | 7000 | 2009-11-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
Sortowanie zgodne z alfabetem narodowym¶
Domyślnie operacje sorotwania w "normalnym Pythonie" oraz w Pandas porównują napisy na bazie kodów Unicode poszczególnych znaków. Dopóki teksty zawierają tylko litery łacińskie, nie ma problemu.
Ale gdy pojawią się polskie ogonki itd...
emps.iloc[1,1] = 'Ćmochar'
emps.iloc[3,1] = 'Łunold'
emps.iloc[4,1] = 'Ęrnst'
emps.iloc[5,1] = 'Ąustin'
Dane przed sortowaniem.
emps.head(7)
| first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
|---|---|---|---|---|---|---|---|---|---|---|
| employee_id | ||||||||||
| 100 | Steven | King | President | 24000 | 1997-06-17 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 101 | Neena | Ćmochar | Administration Vice President | 17000 | 1999-09-21 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 102 | Lex | De Haan | Administration Vice President | 17000 | 2003-01-13 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 103 | Alexander | Łunold | Programmer | 9000 | 2000-01-03 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
| 104 | Bruce | Ęrnst | Programmer | 6000 | 2001-05-21 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
| 105 | David | Ąustin | Programmer | 4800 | 2007-06-25 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
| 106 | Valli | Pataballa | Programmer | 4800 | 2008-02-05 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
emps.sort_values('last_name')
| first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
|---|---|---|---|---|---|---|---|---|---|---|
| employee_id | ||||||||||
| 174 | Ellen | Abel | Sales Representative | 11000 | 2006-05-11 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 166 | Sundar | Ande | Sales Representative | 6400 | 2000-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 130 | Mozhe | Atkinson | Stock Clerk | 2800 | 2007-10-30 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 204 | Hermann | Baer | Public Relations Representative | 10000 | 2004-06-07 | Public Relations | Schwanthalerstr. 7031 | 80925 | Munich | Germany |
| 116 | Shelli | Baida | Purchasing Clerk | 2900 | 2007-12-24 | Purchasing | 2004 Charade Rd | 98199 | Seattle | United States of America |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 149 | Eleni | Zlotkey | Sales Manager | 10500 | 2000-01-29 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 105 | David | Ąustin | Programmer | 4800 | 2007-06-25 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
| 101 | Neena | Ćmochar | Administration Vice President | 17000 | 1999-09-21 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 104 | Bruce | Ęrnst | Programmer | 6000 | 2001-05-21 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
| 103 | Alexander | Łunold | Programmer | 9000 | 2000-01-03 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
107 rows × 10 columns
Zwykłe listy¶
Aby zacząć od prostszych instrukcji, wyciągam dziesięć nazwisk do zwykłej listy.
lista = ['Marek', 'Ola', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Żaneta', 'Ącki']
lista
['Marek', 'Ola', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Żaneta', 'Ącki']
sorted(lista)
['Ada', 'Ala', 'Ewelina', 'Marek', 'Ola', 'Ola', 'Ącki', 'Łukasz', 'Żaneta']
W tym momencie potrzebujemy modułu locale i odp. ustawień.
import locale
locale.getdefaultlocale()
# może przestać działać w przyszłych wersjach Pythona
C:\Users\patcz\AppData\Local\Temp\ipykernel_6716\1056971489.py:1: DeprecationWarning: Use setlocale(), getencoding() and getlocale() instead locale.getdefaultlocale()
('pl_PL', 'cp1250')
locale.getlocale()
('Polish_Poland', '1250')
Wydaje się, że mamy ustawiony język polski, ale to nie dotyczy kolejności sortowania - jest to specjalna "kategoria" ustawień regionalnych COLLATE - porównywanie i sortowanie tekstów.
locale.getlocale(locale.LC_COLLATE)
(None, None)
Teraz nawet operacja strxfrm da domyślną kolejność.
sorted(lista, key=locale.strxfrm)
['Ada', 'Ala', 'Ewelina', 'Marek', 'Ola', 'Ola', 'Ącki', 'Łukasz', 'Żaneta']
Zmieniam ustawienia regionalne w zakresie sortowania tekstu.
locale.setlocale(locale.LC_COLLATE, 'pl_PL')
# dla brytyjskiego byłoby en_GB
#locale.setlocale(locale.LC_COLLATE, 'Polish')
'pl_PL'
locale.getlocale(locale.LC_COLLATE)
('pl_PL', 'ISO8859-2')
Gdy podamy napis '' , to przyjmowane są bieżące ustawienia systemowe.
locale.setlocale(locale.LC_COLLATE, '')
locale.getlocale(locale.LC_COLLATE)
('Polish_Poland', '1250')
sorted(lista, key=locale.strxfrm)
['Ada', 'Ala', 'Ącki', 'Ewelina', 'Łukasz', 'Marek', 'Ola', 'Ola', 'Żaneta']
Wymagane jest podanie funkcji strxfrm, która przekształca teksty ze znakami narodowymi na takie ich wersje, które będą się dobrze sortować.
{s: locale.strxfrm(s) for s in lista}
{'Marek': '\x0eQ\x0e\x02\x0e\x8a\x0e!\x0e6\x01\x01\x12\x01\x01',
'Ola': '\x0e|\x0eH\x0e\x02\x01\x01\x12\x01\x01',
'Ala': '\x0e\x02\x0eH\x0e\x02\x01\x01\x12\x01\x01',
'Ewelina': '\x0e!\x0e¤\x0e!\x0eH\x0e2\x0ep\x0e\x02\x01\x01\x12\x01\x01',
'Ada': '\x0e\x02\x0e\x1a\x0e\x02\x01\x01\x12\x01\x01',
'Łukasz': '\x0eL\x0e\x9f\x0e6\x0e\x02\x0e\x91\x0e©\x01\x01\x12\x01\x01',
'Żaneta': '\x0e¬\x0e\x02\x0ep\x0e!\x0e\x99\x0e\x02\x01\x01\x12\x01\x01',
'Ącki': '\x0e\x04\x0e\n\x0e6\x0e2\x01\x01\x12\x01\x01'}
Cały czas zwykłe sorted oraz sort_values, bez opcji key, sortują po staremu.
sorted(lista)
['Ada', 'Ala', 'Ewelina', 'Marek', 'Ola', 'Ola', 'Ącki', 'Łukasz', 'Żaneta']
Minimalna konfiguracja¶
Jeśli chcemy po prostu posortować dane zgodnie z polskim alfabetem, to minimalny kod, który tego dokonuje wygląda tak:
import locale
locale.setlocale(locale.LC_COLLATE, 'pl_PL')
lista.sort(key=locale.strxfrm)
lista
['Ada', 'Ala', 'Ącki', 'Ewelina', 'Łukasz', 'Marek', 'Ola', 'Ola', 'Żaneta']
Czas na powrót do Pandas¶
emps.sort_values('last_name')
| first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
|---|---|---|---|---|---|---|---|---|---|---|
| employee_id | ||||||||||
| 174 | Ellen | Abel | Sales Representative | 11000 | 2006-05-11 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 166 | Sundar | Ande | Sales Representative | 6400 | 2000-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 130 | Mozhe | Atkinson | Stock Clerk | 2800 | 2007-10-30 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 204 | Hermann | Baer | Public Relations Representative | 10000 | 2004-06-07 | Public Relations | Schwanthalerstr. 7031 | 80925 | Munich | Germany |
| 116 | Shelli | Baida | Purchasing Clerk | 2900 | 2007-12-24 | Purchasing | 2004 Charade Rd | 98199 | Seattle | United States of America |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 149 | Eleni | Zlotkey | Sales Manager | 10500 | 2000-01-29 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 105 | David | Ąustin | Programmer | 4800 | 2007-06-25 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
| 101 | Neena | Ćmochar | Administration Vice President | 17000 | 1999-09-21 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 104 | Bruce | Ęrnst | Programmer | 6000 | 2001-05-21 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
| 103 | Alexander | Łunold | Programmer | 9000 | 2000-01-03 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
107 rows × 10 columns
Niestety nie wystarczy przekazać gotowej funkcji strxfrm, jak to było dla list.
# emps.sort_values('last_name', key=locale.strxfrm)
Trzeba stworzyć własną funkcję działającą na całych seriach, a nie na pojedynczych wartościach. Wtedy zadziała sortowanie alfabetyczne zgodne z językiem polskim (lub innym).
emps.sort_values('last_name', key=lambda seria: [locale.strxfrm(x) for x in seria]).head(30)
| first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
|---|---|---|---|---|---|---|---|---|---|---|
| employee_id | ||||||||||
| 174 | Ellen | Abel | Sales Representative | 11000 | 2006-05-11 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 166 | Sundar | Ande | Sales Representative | 6400 | 2000-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 130 | Mozhe | Atkinson | Stock Clerk | 2800 | 2007-10-30 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 105 | David | Ąustin | Programmer | 4800 | 2007-06-25 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
| 204 | Hermann | Baer | Public Relations Representative | 10000 | 2004-06-07 | Public Relations | Schwanthalerstr. 7031 | 80925 | Munich | Germany |
| 116 | Shelli | Baida | Purchasing Clerk | 2900 | 2007-12-24 | Purchasing | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 167 | Amit | Banda | Sales Representative | 6200 | 2000-04-21 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 172 | Elizabeth | Bates | Sales Representative | 7300 | 2009-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 192 | Sarah | Bell | Shipping Clerk | 4000 | 2006-02-04 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 151 | David | Bernstein | Sales Representative | 9500 | 2007-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 129 | Laura | Bissot | Stock Clerk | 3300 | 2007-08-20 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 169 | Harrison | Bloom | Sales Representative | 10000 | 2008-03-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 185 | Alexis | Bull | Shipping Clerk | 4100 | 2007-02-20 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 187 | Anthony | Cabrio | Shipping Clerk | 3000 | 2009-02-07 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 148 | Gerald | Cambrault | Sales Manager | 11000 | 2009-10-15 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 154 | Nanette | Cambrault | Sales Representative | 7500 | 2008-12-09 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 110 | John | Chen | Accountant | 8200 | 2007-09-28 | Finance | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 188 | Kelly | Chung | Shipping Clerk | 3800 | 2007-06-14 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 119 | Karen | Colmenares | Purchasing Clerk | 2500 | 2009-08-10 | Purchasing | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 101 | Neena | Ćmochar | Administration Vice President | 17000 | 1999-09-21 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 142 | Curtis | Davies | Stock Clerk | 3100 | 2007-01-29 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 102 | Lex | De Haan | Administration Vice President | 17000 | 2003-01-13 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 186 | Julia | Dellinger | Shipping Clerk | 3400 | 2008-06-24 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 189 | Jennifer | Dilly | Shipping Clerk | 3600 | 2007-08-13 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 160 | Louise | Doran | Sales Representative | 7500 | 2007-12-15 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 147 | Alberto | Errazuriz | Sales Manager | 12000 | 2007-03-10 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 193 | Britney | Everett | Shipping Clerk | 3900 | 2007-03-03 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 104 | Bruce | Ęrnst | Programmer | 6000 | 2001-05-21 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
| 109 | Daniel | Faviet | Accountant | 9000 | 2004-08-16 | Finance | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 202 | Pat | Fay | Marketing Representative | 6000 | 2007-08-17 | Marketing | 147 Spadina Ave | M5V 2L7 | Toronto | Canada |
Alternatywny poprawny zapis:
emps.sort_values('last_name', key=lambda seria: seria.apply(locale.strxfrm))
| first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
|---|---|---|---|---|---|---|---|---|---|---|
| employee_id | ||||||||||
| 174 | Ellen | Abel | Sales Representative | 11000 | 2006-05-11 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 166 | Sundar | Ande | Sales Representative | 6400 | 2000-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
| 130 | Mozhe | Atkinson | Stock Clerk | 2800 | 2007-10-30 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 105 | David | Ąustin | Programmer | 4800 | 2007-06-25 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
| 204 | Hermann | Baer | Public Relations Representative | 10000 | 2004-06-07 | Public Relations | Schwanthalerstr. 7031 | 80925 | Munich | Germany |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 123 | Shanta | Vollman | Stock Manager | 6500 | 2007-10-10 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 196 | Alana | Walsh | Shipping Clerk | 3100 | 2008-04-24 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 120 | Matthew | Weiss | Stock Manager | 8000 | 2006-07-18 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
| 200 | Jennifer | Whalen | Administration Assistant | 4400 | 1987-09-17 | Administration | 2004 Charade Rd | 98199 | Seattle | United States of America |
| 149 | Eleni | Zlotkey | Sales Manager | 10500 | 2000-01-29 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
107 rows × 10 columns