import pandas as pd
Sortowanie list¶
Jako wprowadzenie zobaczmy, jak w Pythonie można sortować zwykłe listy.
lista = ['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']
lista
['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']
Funkcja sorted
zwraca nową listę, która zawiera te same dane, co oryginał, ale jest posortowana.
Nie modyfikuje oryginału
sorted(lista)
['Ada', 'Ala', 'Ewelina', 'Marek', 'Ola', 'Ola', 'Ącki', 'Łukasz', 'Żaneta']
Nadal poprzednia kolejność:
lista
['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']
Metoda .sort()
sortuje elementy wewnątrz listy - modyfikuje sąmą listę.
lista.sort()
lista
['Ada', 'Ala', 'Ewelina', 'Marek', 'Ola', 'Ola', 'Ącki', 'Łukasz', 'Żaneta']
Wróćmy do początkowej kolejności...
lista = ['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']
lista
['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']
Do obu operacji można przekazać takie same parametry reverse
oraz key
, które wpływają na kolejność.
reverse=True
powoduje sortowanie w odwrotnej kolejności.
sorted(lista, reverse=True)
['Żaneta', 'Łukasz', 'Ącki', 'Ola', 'Ola', 'Marek', 'Ewelina', 'Ala', 'Ada']
W parametrze key
podaje się funkcję, która będzie wywołana dla każdego elementu listy i dopiero wyniki tej funkcji będą decydować o końcowej kolejności.
Przykład: przekazujemy funkcję len
, która oblicza długość napisu - dzięki temu uzyskujemy sortowanie wg długości napisów.
sorted(lista, key=len)
['Ala', 'Ola', 'Ada', 'Ola', 'Ącki', 'Marek', 'Łukasz', 'Żaneta', 'Ewelina']
len('Marek')
5
lista
['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']
list(map(len, lista))
[5, 3, 3, 7, 3, 6, 3, 6, 4]
Sortowanie wg alfabetu polskiego (lub innego) → omówione w dalszej części.
Sortowanie serii¶
kraje = pd.Series({
'Polska': 38_000_000,
'Chiny': 1_400_000_000,
'Niemcy': 83_000_000,
'Czechy': 11_000_000,
})
kraje
Polska 38000000 Chiny 1400000000 Niemcy 83000000 Czechy 11000000 dtype: int64
Sortowanie według wartości indeksu. Warto zauważyć, że dane wczytane z pliku nie muszą być posortowane wg indeksu.
kraje.sort_index()
Chiny 1400000000 Czechy 11000000 Niemcy 83000000 Polska 38000000 dtype: int64
Sortowanie wg wartości.
kraje.sort_values()
Czechy 11000000 Polska 38000000 Niemcy 83000000 Chiny 1400000000 dtype: int64
Domyślnie metody sortujące zwracają w wyniku nowy obiekt, a nie modyfikują oryginału.
Aby dane zmieniły kolejność wewnątrz obiektu serii (a za chwilę DataFrame), należy dodać parametr inplace=True
. Wtedy operacja nie zwraca wyniku, tylko zmienia obiekt "w środku".
kraje
Polska 38000000 Chiny 1400000000 Niemcy 83000000 Czechy 11000000 dtype: int64
kraje.sort_values(ascending=False, inplace=True)
kraje
Chiny 1400000000 Niemcy 83000000 Polska 38000000 Czechy 11000000 dtype: int64
Sortowanie DataFrame¶
emps = pd.read_csv('emps.csv', sep=';', index_col='employee_id', parse_dates=['hire_date'])
W przypadku DataFrame operacja sort_index
działa w zasadzie tak samo, jak dla serii, natomiast sort_values
wymaga podania kolumny, zgodnie z którą sortujemy (można podać jedną nazwę lub listę nazw).
emps.sort_values('salary')
first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
---|---|---|---|---|---|---|---|---|---|---|
employee_id | ||||||||||
132 | TJ | Olson | Stock Clerk | 2100 | 2009-04-10 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
136 | Hazel | Philtanker | Stock Clerk | 2200 | 2011-02-06 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
128 | Steven | Markle | Stock Clerk | 2200 | 2010-03-08 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
135 | Ki | Gee | Stock Clerk | 2400 | 2009-12-12 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
127 | James | Landry | Stock Clerk | 2400 | 2009-01-14 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
146 | Karen | Partners | Sales Manager | 13500 | 2007-01-05 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
145 | John | Russell | Sales Manager | 14000 | 2006-10-01 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
102 | Lex | De Haan | Administration Vice President | 17000 | 2003-01-13 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
101 | Neena | Kochhar | Administration Vice President | 17000 | 1999-09-21 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
100 | Steven | King | President | 24000 | 1997-06-17 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
107 rows × 10 columns
emps.sort_values(['last_name', 'first_name'])
first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
---|---|---|---|---|---|---|---|---|---|---|
employee_id | ||||||||||
174 | Ellen | Abel | Sales Representative | 11000 | 2006-05-11 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
166 | Sundar | Ande | Sales Representative | 6400 | 2000-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
130 | Mozhe | Atkinson | Stock Clerk | 2800 | 2007-10-30 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
105 | David | Austin | Programmer | 4800 | 2007-06-25 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
204 | Hermann | Baer | Public Relations Representative | 10000 | 2004-06-07 | Public Relations | Schwanthalerstr. 7031 | 80925 | Munich | Germany |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
123 | Shanta | Vollman | Stock Manager | 6500 | 2007-10-10 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
196 | Alana | Walsh | Shipping Clerk | 3100 | 2008-04-24 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
120 | Matthew | Weiss | Stock Manager | 8000 | 2006-07-18 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
200 | Jennifer | Whalen | Administration Assistant | 4400 | 1987-09-17 | Administration | 2004 Charade Rd | 98199 | Seattle | United States of America |
149 | Eleni | Zlotkey | Sales Manager | 10500 | 2000-01-29 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
107 rows × 10 columns
Sortowanie malejące ze zmianą danych wewnątrz oryginalnej tabeli.
emps.sort_values('salary', ascending=False, inplace=True)
emps
first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
---|---|---|---|---|---|---|---|---|---|---|
employee_id | ||||||||||
100 | Steven | King | President | 24000 | 1997-06-17 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
102 | Lex | De Haan | Administration Vice President | 17000 | 2003-01-13 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
101 | Neena | Kochhar | Administration Vice President | 17000 | 1999-09-21 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
145 | John | Russell | Sales Manager | 14000 | 2006-10-01 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
146 | Karen | Partners | Sales Manager | 13500 | 2007-01-05 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
127 | James | Landry | Stock Clerk | 2400 | 2009-01-14 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
135 | Ki | Gee | Stock Clerk | 2400 | 2009-12-12 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
128 | Steven | Markle | Stock Clerk | 2200 | 2010-03-08 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
136 | Hazel | Philtanker | Stock Clerk | 2200 | 2011-02-06 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
132 | TJ | Olson | Stock Clerk | 2100 | 2009-04-10 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
107 rows × 10 columns
emps.sort_index(inplace=True)
emps
first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
---|---|---|---|---|---|---|---|---|---|---|
employee_id | ||||||||||
100 | Steven | King | President | 24000 | 1997-06-17 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
101 | Neena | Kochhar | Administration Vice President | 17000 | 1999-09-21 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
102 | Lex | De Haan | Administration Vice President | 17000 | 2003-01-13 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
103 | Alexander | Hunold | Programmer | 9000 | 2000-01-03 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
104 | Bruce | Ernst | Programmer | 6000 | 2001-05-21 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
202 | Pat | Fay | Marketing Representative | 6000 | 2007-08-17 | Marketing | 147 Spadina Ave | M5V 2L7 | Toronto | Canada |
203 | Susan | Mavris | Human Resources Representative | 6500 | 2004-06-07 | Human Resources | 8204 Arthur St | NaN | London | United Kingdom |
204 | Hermann | Baer | Public Relations Representative | 10000 | 2004-06-07 | Public Relations | Schwanthalerstr. 7031 | 80925 | Munich | Germany |
205 | Shelley | Higgins | Accounting Manager | 12000 | 2004-06-07 | Accounting | 2004 Charade Rd | 98199 | Seattle | United States of America |
206 | William | Gietz | Public Accountant | 8300 | 2004-06-07 | Accounting | 2004 Charade Rd | 98199 | Seattle | United States of America |
107 rows × 10 columns
Sortowanie wg kilku kryteriów - podajemy listę kolumn.
emps.sort_values(['city', 'job_title', 'salary']).head(30)
first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
---|---|---|---|---|---|---|---|---|---|---|
employee_id | ||||||||||
203 | Susan | Mavris | Human Resources Representative | 6500 | 2004-06-07 | Human Resources | 8204 Arthur St | NaN | London | United Kingdom |
204 | Hermann | Baer | Public Relations Representative | 10000 | 2004-06-07 | Public Relations | Schwanthalerstr. 7031 | 80925 | Munich | Germany |
149 | Eleni | Zlotkey | Sales Manager | 10500 | 2000-01-29 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
148 | Gerald | Cambrault | Sales Manager | 11000 | 2009-10-15 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
147 | Alberto | Errazuriz | Sales Manager | 12000 | 2007-03-10 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
146 | Karen | Partners | Sales Manager | 13500 | 2007-01-05 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
145 | John | Russell | Sales Manager | 14000 | 2006-10-01 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
173 | Sundita | Kumar | Sales Representative | 6100 | 2010-04-21 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
167 | Amit | Banda | Sales Representative | 6200 | 2000-04-21 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
179 | Charles | Johnson | Sales Representative | 6200 | 2011-01-04 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
166 | Sundar | Ande | Sales Representative | 6400 | 2000-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
165 | David | Lee | Sales Representative | 6800 | 2000-02-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
155 | Oliver | Tuvault | Sales Representative | 7000 | 2009-11-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
161 | Sarath | Sewall | Sales Representative | 7000 | 2008-11-03 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
164 | Mattea | Marvins | Sales Representative | 7200 | 2010-01-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
172 | Elizabeth | Bates | Sales Representative | 7300 | 2009-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
171 | William | Smith | Sales Representative | 7400 | 2009-02-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
154 | Nanette | Cambrault | Sales Representative | 7500 | 2008-12-09 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
160 | Louise | Doran | Sales Representative | 7500 | 2007-12-15 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
153 | Christopher | Olsen | Sales Representative | 8000 | 2008-03-30 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
159 | Lindsey | Smith | Sales Representative | 8000 | 2007-03-10 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
177 | Jack | Livingston | Sales Representative | 8400 | 2008-04-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
176 | Jonathon | Taylor | Sales Representative | 8600 | 2008-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
175 | Alyssa | Hutton | Sales Representative | 8800 | 2007-03-19 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
152 | Peter | Hall | Sales Representative | 9000 | 2007-08-20 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
158 | Allan | McEwen | Sales Representative | 9000 | 2006-08-01 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
151 | David | Bernstein | Sales Representative | 9500 | 2007-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
157 | Patrick | Sully | Sales Representative | 9500 | 2006-03-04 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
163 | Danielle | Greene | Sales Representative | 9500 | 2009-03-19 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
170 | Tayler | Fox | Sales Representative | 9600 | 2008-01-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
Aby w tej sytuacji określić kolejność, należy podać listę wartości True/False
emps.sort_values(['city', 'job_title', 'salary'], ascending=[True, True, False]).head(30)
first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
---|---|---|---|---|---|---|---|---|---|---|
employee_id | ||||||||||
203 | Susan | Mavris | Human Resources Representative | 6500 | 2004-06-07 | Human Resources | 8204 Arthur St | NaN | London | United Kingdom |
204 | Hermann | Baer | Public Relations Representative | 10000 | 2004-06-07 | Public Relations | Schwanthalerstr. 7031 | 80925 | Munich | Germany |
145 | John | Russell | Sales Manager | 14000 | 2006-10-01 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
146 | Karen | Partners | Sales Manager | 13500 | 2007-01-05 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
147 | Alberto | Errazuriz | Sales Manager | 12000 | 2007-03-10 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
148 | Gerald | Cambrault | Sales Manager | 11000 | 2009-10-15 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
149 | Eleni | Zlotkey | Sales Manager | 10500 | 2000-01-29 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
168 | Lisa | Ozer | Sales Representative | 11500 | 2007-03-11 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
174 | Ellen | Abel | Sales Representative | 11000 | 2006-05-11 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
162 | Clara | Vishney | Sales Representative | 10500 | 2007-11-11 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
150 | Peter | Tucker | Sales Representative | 10000 | 2007-01-30 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
156 | Janette | King | Sales Representative | 10000 | 2006-01-30 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
169 | Harrison | Bloom | Sales Representative | 10000 | 2008-03-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
170 | Tayler | Fox | Sales Representative | 9600 | 2008-01-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
151 | David | Bernstein | Sales Representative | 9500 | 2007-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
157 | Patrick | Sully | Sales Representative | 9500 | 2006-03-04 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
163 | Danielle | Greene | Sales Representative | 9500 | 2009-03-19 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
152 | Peter | Hall | Sales Representative | 9000 | 2007-08-20 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
158 | Allan | McEwen | Sales Representative | 9000 | 2006-08-01 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
175 | Alyssa | Hutton | Sales Representative | 8800 | 2007-03-19 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
176 | Jonathon | Taylor | Sales Representative | 8600 | 2008-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
177 | Jack | Livingston | Sales Representative | 8400 | 2008-04-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
153 | Christopher | Olsen | Sales Representative | 8000 | 2008-03-30 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
159 | Lindsey | Smith | Sales Representative | 8000 | 2007-03-10 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
154 | Nanette | Cambrault | Sales Representative | 7500 | 2008-12-09 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
160 | Louise | Doran | Sales Representative | 7500 | 2007-12-15 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
171 | William | Smith | Sales Representative | 7400 | 2009-02-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
172 | Elizabeth | Bates | Sales Representative | 7300 | 2009-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
164 | Mattea | Marvins | Sales Representative | 7200 | 2010-01-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
155 | Oliver | Tuvault | Sales Representative | 7000 | 2009-11-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
Sortowanie zgodne z alfabetem narodowym¶
Domyślnie operacje sorotwania w "normalnym Pythonie" oraz w Pandas porównują napisy na bazie kodów Unicode poszczególnych znaków. Dopóki teksty zawierają tylko litery łacińskie, nie ma problemu.
Ale gdy pojawią się polskie ogonki itd...
emps.iloc[1,1] = 'Ćmochar'
emps.iloc[3,1] = 'Łunold'
emps.iloc[4,1] = 'Ęrnst'
emps.iloc[5,1] = 'Ąustin'
Dane przed sortowaniem.
emps.head(7)
first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
---|---|---|---|---|---|---|---|---|---|---|
employee_id | ||||||||||
100 | Steven | King | President | 24000 | 1997-06-17 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
101 | Neena | Ćmochar | Administration Vice President | 17000 | 1999-09-21 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
102 | Lex | De Haan | Administration Vice President | 17000 | 2003-01-13 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
103 | Alexander | Łunold | Programmer | 9000 | 2000-01-03 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
104 | Bruce | Ęrnst | Programmer | 6000 | 2001-05-21 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
105 | David | Ąustin | Programmer | 4800 | 2007-06-25 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
106 | Valli | Pataballa | Programmer | 4800 | 2008-02-05 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
emps.sort_values('last_name')
first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
---|---|---|---|---|---|---|---|---|---|---|
employee_id | ||||||||||
174 | Ellen | Abel | Sales Representative | 11000 | 2006-05-11 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
166 | Sundar | Ande | Sales Representative | 6400 | 2000-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
130 | Mozhe | Atkinson | Stock Clerk | 2800 | 2007-10-30 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
204 | Hermann | Baer | Public Relations Representative | 10000 | 2004-06-07 | Public Relations | Schwanthalerstr. 7031 | 80925 | Munich | Germany |
116 | Shelli | Baida | Purchasing Clerk | 2900 | 2007-12-24 | Purchasing | 2004 Charade Rd | 98199 | Seattle | United States of America |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
149 | Eleni | Zlotkey | Sales Manager | 10500 | 2000-01-29 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
105 | David | Ąustin | Programmer | 4800 | 2007-06-25 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
101 | Neena | Ćmochar | Administration Vice President | 17000 | 1999-09-21 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
104 | Bruce | Ęrnst | Programmer | 6000 | 2001-05-21 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
103 | Alexander | Łunold | Programmer | 9000 | 2000-01-03 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
107 rows × 10 columns
Zwykłe listy¶
Aby zacząć od prostszych instrukcji, wyciągam dziesięć nazwisk do zwykłej listy.
lista = ['Marek', 'Ola', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Żaneta', 'Ącki']
lista
['Marek', 'Ola', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Żaneta', 'Ącki']
sorted(lista)
['Ada', 'Ala', 'Ewelina', 'Marek', 'Ola', 'Ola', 'Ącki', 'Łukasz', 'Żaneta']
W tym momencie potrzebujemy modułu locale
i odp. ustawień.
import locale
locale.getdefaultlocale()
# może przestać działać w przyszłych wersjach Pythona
C:\Users\patcz\AppData\Local\Temp\ipykernel_6716\1056971489.py:1: DeprecationWarning: Use setlocale(), getencoding() and getlocale() instead locale.getdefaultlocale()
('pl_PL', 'cp1250')
locale.getlocale()
('Polish_Poland', '1250')
Wydaje się, że mamy ustawiony język polski, ale to nie dotyczy kolejności sortowania - jest to specjalna "kategoria" ustawień regionalnych COLLATE
- porównywanie i sortowanie tekstów.
locale.getlocale(locale.LC_COLLATE)
(None, None)
Teraz nawet operacja strxfrm
da domyślną kolejność.
sorted(lista, key=locale.strxfrm)
['Ada', 'Ala', 'Ewelina', 'Marek', 'Ola', 'Ola', 'Ącki', 'Łukasz', 'Żaneta']
Zmieniam ustawienia regionalne w zakresie sortowania tekstu.
locale.setlocale(locale.LC_COLLATE, 'pl_PL')
# dla brytyjskiego byłoby en_GB
#locale.setlocale(locale.LC_COLLATE, 'Polish')
'pl_PL'
locale.getlocale(locale.LC_COLLATE)
('pl_PL', 'ISO8859-2')
Gdy podamy napis ''
, to przyjmowane są bieżące ustawienia systemowe.
locale.setlocale(locale.LC_COLLATE, '')
locale.getlocale(locale.LC_COLLATE)
('Polish_Poland', '1250')
sorted(lista, key=locale.strxfrm)
['Ada', 'Ala', 'Ącki', 'Ewelina', 'Łukasz', 'Marek', 'Ola', 'Ola', 'Żaneta']
Wymagane jest podanie funkcji strxfrm
, która przekształca teksty ze znakami narodowymi na takie ich wersje, które będą się dobrze sortować.
{s: locale.strxfrm(s) for s in lista}
{'Marek': '\x0eQ\x0e\x02\x0e\x8a\x0e!\x0e6\x01\x01\x12\x01\x01', 'Ola': '\x0e|\x0eH\x0e\x02\x01\x01\x12\x01\x01', 'Ala': '\x0e\x02\x0eH\x0e\x02\x01\x01\x12\x01\x01', 'Ewelina': '\x0e!\x0e¤\x0e!\x0eH\x0e2\x0ep\x0e\x02\x01\x01\x12\x01\x01', 'Ada': '\x0e\x02\x0e\x1a\x0e\x02\x01\x01\x12\x01\x01', 'Łukasz': '\x0eL\x0e\x9f\x0e6\x0e\x02\x0e\x91\x0e©\x01\x01\x12\x01\x01', 'Żaneta': '\x0e¬\x0e\x02\x0ep\x0e!\x0e\x99\x0e\x02\x01\x01\x12\x01\x01', 'Ącki': '\x0e\x04\x0e\n\x0e6\x0e2\x01\x01\x12\x01\x01'}
Cały czas zwykłe sorted
oraz sort_values
, bez opcji key
, sortują po staremu.
sorted(lista)
['Ada', 'Ala', 'Ewelina', 'Marek', 'Ola', 'Ola', 'Ącki', 'Łukasz', 'Żaneta']
Minimalna konfiguracja¶
Jeśli chcemy po prostu posortować dane zgodnie z polskim alfabetem, to minimalny kod, który tego dokonuje wygląda tak:
import locale
locale.setlocale(locale.LC_COLLATE, 'pl_PL')
lista.sort(key=locale.strxfrm)
lista
['Ada', 'Ala', 'Ącki', 'Ewelina', 'Łukasz', 'Marek', 'Ola', 'Ola', 'Żaneta']
Czas na powrót do Pandas¶
emps.sort_values('last_name')
first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
---|---|---|---|---|---|---|---|---|---|---|
employee_id | ||||||||||
174 | Ellen | Abel | Sales Representative | 11000 | 2006-05-11 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
166 | Sundar | Ande | Sales Representative | 6400 | 2000-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
130 | Mozhe | Atkinson | Stock Clerk | 2800 | 2007-10-30 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
204 | Hermann | Baer | Public Relations Representative | 10000 | 2004-06-07 | Public Relations | Schwanthalerstr. 7031 | 80925 | Munich | Germany |
116 | Shelli | Baida | Purchasing Clerk | 2900 | 2007-12-24 | Purchasing | 2004 Charade Rd | 98199 | Seattle | United States of America |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
149 | Eleni | Zlotkey | Sales Manager | 10500 | 2000-01-29 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
105 | David | Ąustin | Programmer | 4800 | 2007-06-25 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
101 | Neena | Ćmochar | Administration Vice President | 17000 | 1999-09-21 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
104 | Bruce | Ęrnst | Programmer | 6000 | 2001-05-21 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
103 | Alexander | Łunold | Programmer | 9000 | 2000-01-03 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
107 rows × 10 columns
Niestety nie wystarczy przekazać gotowej funkcji strxfrm
, jak to było dla list.
# emps.sort_values('last_name', key=locale.strxfrm)
Trzeba stworzyć własną funkcję działającą na całych seriach, a nie na pojedynczych wartościach. Wtedy zadziała sortowanie alfabetyczne zgodne z językiem polskim (lub innym).
emps.sort_values('last_name', key=lambda seria: [locale.strxfrm(x) for x in seria]).head(30)
first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
---|---|---|---|---|---|---|---|---|---|---|
employee_id | ||||||||||
174 | Ellen | Abel | Sales Representative | 11000 | 2006-05-11 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
166 | Sundar | Ande | Sales Representative | 6400 | 2000-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
130 | Mozhe | Atkinson | Stock Clerk | 2800 | 2007-10-30 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
105 | David | Ąustin | Programmer | 4800 | 2007-06-25 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
204 | Hermann | Baer | Public Relations Representative | 10000 | 2004-06-07 | Public Relations | Schwanthalerstr. 7031 | 80925 | Munich | Germany |
116 | Shelli | Baida | Purchasing Clerk | 2900 | 2007-12-24 | Purchasing | 2004 Charade Rd | 98199 | Seattle | United States of America |
167 | Amit | Banda | Sales Representative | 6200 | 2000-04-21 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
172 | Elizabeth | Bates | Sales Representative | 7300 | 2009-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
192 | Sarah | Bell | Shipping Clerk | 4000 | 2006-02-04 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
151 | David | Bernstein | Sales Representative | 9500 | 2007-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
129 | Laura | Bissot | Stock Clerk | 3300 | 2007-08-20 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
169 | Harrison | Bloom | Sales Representative | 10000 | 2008-03-23 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
185 | Alexis | Bull | Shipping Clerk | 4100 | 2007-02-20 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
187 | Anthony | Cabrio | Shipping Clerk | 3000 | 2009-02-07 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
148 | Gerald | Cambrault | Sales Manager | 11000 | 2009-10-15 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
154 | Nanette | Cambrault | Sales Representative | 7500 | 2008-12-09 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
110 | John | Chen | Accountant | 8200 | 2007-09-28 | Finance | 2004 Charade Rd | 98199 | Seattle | United States of America |
188 | Kelly | Chung | Shipping Clerk | 3800 | 2007-06-14 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
119 | Karen | Colmenares | Purchasing Clerk | 2500 | 2009-08-10 | Purchasing | 2004 Charade Rd | 98199 | Seattle | United States of America |
101 | Neena | Ćmochar | Administration Vice President | 17000 | 1999-09-21 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
142 | Curtis | Davies | Stock Clerk | 3100 | 2007-01-29 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
102 | Lex | De Haan | Administration Vice President | 17000 | 2003-01-13 | Executive | 2004 Charade Rd | 98199 | Seattle | United States of America |
186 | Julia | Dellinger | Shipping Clerk | 3400 | 2008-06-24 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
189 | Jennifer | Dilly | Shipping Clerk | 3600 | 2007-08-13 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
160 | Louise | Doran | Sales Representative | 7500 | 2007-12-15 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
147 | Alberto | Errazuriz | Sales Manager | 12000 | 2007-03-10 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
193 | Britney | Everett | Shipping Clerk | 3900 | 2007-03-03 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
104 | Bruce | Ęrnst | Programmer | 6000 | 2001-05-21 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
109 | Daniel | Faviet | Accountant | 9000 | 2004-08-16 | Finance | 2004 Charade Rd | 98199 | Seattle | United States of America |
202 | Pat | Fay | Marketing Representative | 6000 | 2007-08-17 | Marketing | 147 Spadina Ave | M5V 2L7 | Toronto | Canada |
Alternatywny poprawny zapis:
emps.sort_values('last_name', key=lambda seria: seria.apply(locale.strxfrm))
first_name | last_name | job_title | salary | hire_date | department_name | address | postal_code | city | country | |
---|---|---|---|---|---|---|---|---|---|---|
employee_id | ||||||||||
174 | Ellen | Abel | Sales Representative | 11000 | 2006-05-11 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
166 | Sundar | Ande | Sales Representative | 6400 | 2000-03-24 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
130 | Mozhe | Atkinson | Stock Clerk | 2800 | 2007-10-30 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
105 | David | Ąustin | Programmer | 4800 | 2007-06-25 | IT | 2014 Jabberwocky Rd | 26192 | Southlake | United States of America |
204 | Hermann | Baer | Public Relations Representative | 10000 | 2004-06-07 | Public Relations | Schwanthalerstr. 7031 | 80925 | Munich | Germany |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
123 | Shanta | Vollman | Stock Manager | 6500 | 2007-10-10 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
196 | Alana | Walsh | Shipping Clerk | 3100 | 2008-04-24 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
120 | Matthew | Weiss | Stock Manager | 8000 | 2006-07-18 | Shipping | 2011 Interiors Blvd | 99236 | South San Francisco | United States of America |
200 | Jennifer | Whalen | Administration Assistant | 4400 | 1987-09-17 | Administration | 2004 Charade Rd | 98199 | Seattle | United States of America |
149 | Eleni | Zlotkey | Sales Manager | 10500 | 2000-01-29 | Sales | Magdalen Centre, The Oxford Science Park | OX9 9ZB | Oxford | United Kingdom |
107 rows × 10 columns