In [1]:
import pandas as pd

Sortowanie list¶

Jako wprowadzenie zobaczmy, jak w Pythonie można sortować zwykłe listy.

In [2]:
lista = ['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']
In [3]:
lista
Out[3]:
['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']

Funkcja sorted zwraca nową listę, która zawiera te same dane, co oryginał, ale jest posortowana. Nie modyfikuje oryginału

In [4]:
sorted(lista)
Out[4]:
['Ada', 'Ala', 'Ewelina', 'Marek', 'Ola', 'Ola', 'Ącki', 'Łukasz', 'Żaneta']

Nadal poprzednia kolejność:

In [5]:
lista
Out[5]:
['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']

Metoda .sort() sortuje elementy wewnątrz listy - modyfikuje sąmą listę.

In [6]:
lista.sort()
In [7]:
lista
Out[7]:
['Ada', 'Ala', 'Ewelina', 'Marek', 'Ola', 'Ola', 'Ącki', 'Łukasz', 'Żaneta']

Wróćmy do początkowej kolejności...

In [8]:
lista = ['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']
lista
Out[8]:
['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']

Do obu operacji można przekazać takie same parametry reverse oraz key, które wpływają na kolejność.

reverse=True powoduje sortowanie w odwrotnej kolejności.

In [9]:
sorted(lista, reverse=True)
Out[9]:
['Żaneta', 'Łukasz', 'Ącki', 'Ola', 'Ola', 'Marek', 'Ewelina', 'Ala', 'Ada']

W parametrze key podaje się funkcję, która będzie wywołana dla każdego elementu listy i dopiero wyniki tej funkcji będą decydować o końcowej kolejności.

Przykład: przekazujemy funkcję len, która oblicza długość napisu - dzięki temu uzyskujemy sortowanie wg długości napisów.

In [10]:
sorted(lista, key=len)
Out[10]:
['Ala', 'Ola', 'Ada', 'Ola', 'Ącki', 'Marek', 'Łukasz', 'Żaneta', 'Ewelina']
In [11]:
len('Marek')
Out[11]:
5
In [12]:
lista
Out[12]:
['Marek', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Ola', 'Żaneta', 'Ącki']
In [13]:
list(map(len, lista))
Out[13]:
[5, 3, 3, 7, 3, 6, 3, 6, 4]

Sortowanie wg alfabetu polskiego (lub innego) → omówione w dalszej części.

Sortowanie serii¶

In [14]:
kraje = pd.Series({
    'Polska': 38_000_000,
    'Chiny': 1_400_000_000,
    'Niemcy': 83_000_000,
    'Czechy': 11_000_000,
})
In [15]:
kraje
Out[15]:
Polska      38000000
Chiny     1400000000
Niemcy      83000000
Czechy      11000000
dtype: int64

Sortowanie według wartości indeksu. Warto zauważyć, że dane wczytane z pliku nie muszą być posortowane wg indeksu.

In [16]:
kraje.sort_index()
Out[16]:
Chiny     1400000000
Czechy      11000000
Niemcy      83000000
Polska      38000000
dtype: int64

Sortowanie wg wartości.

In [17]:
kraje.sort_values()
Out[17]:
Czechy      11000000
Polska      38000000
Niemcy      83000000
Chiny     1400000000
dtype: int64

Domyślnie metody sortujące zwracają w wyniku nowy obiekt, a nie modyfikują oryginału.

Aby dane zmieniły kolejność wewnątrz obiektu serii (a za chwilę DataFrame), należy dodać parametr inplace=True. Wtedy operacja nie zwraca wyniku, tylko zmienia obiekt "w środku".

In [18]:
kraje
Out[18]:
Polska      38000000
Chiny     1400000000
Niemcy      83000000
Czechy      11000000
dtype: int64
In [19]:
kraje.sort_values(ascending=False, inplace=True)
In [20]:
kraje
Out[20]:
Chiny     1400000000
Niemcy      83000000
Polska      38000000
Czechy      11000000
dtype: int64

Sortowanie DataFrame¶

In [21]:
emps = pd.read_csv('emps.csv', sep=';', index_col='employee_id', parse_dates=['hire_date'])

W przypadku DataFrame operacja sort_index działa w zasadzie tak samo, jak dla serii, natomiast sort_values wymaga podania kolumny, zgodnie z którą sortujemy (można podać jedną nazwę lub listę nazw).

In [22]:
emps.sort_values('salary')
Out[22]:
first_name last_name job_title salary hire_date department_name address postal_code city country
employee_id
132 TJ Olson Stock Clerk 2100 2009-04-10 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
136 Hazel Philtanker Stock Clerk 2200 2011-02-06 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
128 Steven Markle Stock Clerk 2200 2010-03-08 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
135 Ki Gee Stock Clerk 2400 2009-12-12 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
127 James Landry Stock Clerk 2400 2009-01-14 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
... ... ... ... ... ... ... ... ... ... ...
146 Karen Partners Sales Manager 13500 2007-01-05 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
145 John Russell Sales Manager 14000 2006-10-01 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
102 Lex De Haan Administration Vice President 17000 2003-01-13 Executive 2004 Charade Rd 98199 Seattle United States of America
101 Neena Kochhar Administration Vice President 17000 1999-09-21 Executive 2004 Charade Rd 98199 Seattle United States of America
100 Steven King President 24000 1997-06-17 Executive 2004 Charade Rd 98199 Seattle United States of America

107 rows × 10 columns

In [23]:
emps.sort_values(['last_name', 'first_name'])
Out[23]:
first_name last_name job_title salary hire_date department_name address postal_code city country
employee_id
174 Ellen Abel Sales Representative 11000 2006-05-11 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
166 Sundar Ande Sales Representative 6400 2000-03-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
130 Mozhe Atkinson Stock Clerk 2800 2007-10-30 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
105 David Austin Programmer 4800 2007-06-25 IT 2014 Jabberwocky Rd 26192 Southlake United States of America
204 Hermann Baer Public Relations Representative 10000 2004-06-07 Public Relations Schwanthalerstr. 7031 80925 Munich Germany
... ... ... ... ... ... ... ... ... ... ...
123 Shanta Vollman Stock Manager 6500 2007-10-10 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
196 Alana Walsh Shipping Clerk 3100 2008-04-24 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
120 Matthew Weiss Stock Manager 8000 2006-07-18 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
200 Jennifer Whalen Administration Assistant 4400 1987-09-17 Administration 2004 Charade Rd 98199 Seattle United States of America
149 Eleni Zlotkey Sales Manager 10500 2000-01-29 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom

107 rows × 10 columns

Sortowanie malejące ze zmianą danych wewnątrz oryginalnej tabeli.

In [24]:
emps.sort_values('salary', ascending=False, inplace=True)
In [25]:
emps
Out[25]:
first_name last_name job_title salary hire_date department_name address postal_code city country
employee_id
100 Steven King President 24000 1997-06-17 Executive 2004 Charade Rd 98199 Seattle United States of America
102 Lex De Haan Administration Vice President 17000 2003-01-13 Executive 2004 Charade Rd 98199 Seattle United States of America
101 Neena Kochhar Administration Vice President 17000 1999-09-21 Executive 2004 Charade Rd 98199 Seattle United States of America
145 John Russell Sales Manager 14000 2006-10-01 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
146 Karen Partners Sales Manager 13500 2007-01-05 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
... ... ... ... ... ... ... ... ... ... ...
127 James Landry Stock Clerk 2400 2009-01-14 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
135 Ki Gee Stock Clerk 2400 2009-12-12 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
128 Steven Markle Stock Clerk 2200 2010-03-08 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
136 Hazel Philtanker Stock Clerk 2200 2011-02-06 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
132 TJ Olson Stock Clerk 2100 2009-04-10 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America

107 rows × 10 columns

In [26]:
emps.sort_index(inplace=True)
In [27]:
emps
Out[27]:
first_name last_name job_title salary hire_date department_name address postal_code city country
employee_id
100 Steven King President 24000 1997-06-17 Executive 2004 Charade Rd 98199 Seattle United States of America
101 Neena Kochhar Administration Vice President 17000 1999-09-21 Executive 2004 Charade Rd 98199 Seattle United States of America
102 Lex De Haan Administration Vice President 17000 2003-01-13 Executive 2004 Charade Rd 98199 Seattle United States of America
103 Alexander Hunold Programmer 9000 2000-01-03 IT 2014 Jabberwocky Rd 26192 Southlake United States of America
104 Bruce Ernst Programmer 6000 2001-05-21 IT 2014 Jabberwocky Rd 26192 Southlake United States of America
... ... ... ... ... ... ... ... ... ... ...
202 Pat Fay Marketing Representative 6000 2007-08-17 Marketing 147 Spadina Ave M5V 2L7 Toronto Canada
203 Susan Mavris Human Resources Representative 6500 2004-06-07 Human Resources 8204 Arthur St NaN London United Kingdom
204 Hermann Baer Public Relations Representative 10000 2004-06-07 Public Relations Schwanthalerstr. 7031 80925 Munich Germany
205 Shelley Higgins Accounting Manager 12000 2004-06-07 Accounting 2004 Charade Rd 98199 Seattle United States of America
206 William Gietz Public Accountant 8300 2004-06-07 Accounting 2004 Charade Rd 98199 Seattle United States of America

107 rows × 10 columns

Sortowanie wg kilku kryteriów - podajemy listę kolumn.

In [28]:
emps.sort_values(['city', 'job_title', 'salary']).head(30)
Out[28]:
first_name last_name job_title salary hire_date department_name address postal_code city country
employee_id
203 Susan Mavris Human Resources Representative 6500 2004-06-07 Human Resources 8204 Arthur St NaN London United Kingdom
204 Hermann Baer Public Relations Representative 10000 2004-06-07 Public Relations Schwanthalerstr. 7031 80925 Munich Germany
149 Eleni Zlotkey Sales Manager 10500 2000-01-29 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
148 Gerald Cambrault Sales Manager 11000 2009-10-15 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
147 Alberto Errazuriz Sales Manager 12000 2007-03-10 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
146 Karen Partners Sales Manager 13500 2007-01-05 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
145 John Russell Sales Manager 14000 2006-10-01 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
173 Sundita Kumar Sales Representative 6100 2010-04-21 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
167 Amit Banda Sales Representative 6200 2000-04-21 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
179 Charles Johnson Sales Representative 6200 2011-01-04 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
166 Sundar Ande Sales Representative 6400 2000-03-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
165 David Lee Sales Representative 6800 2000-02-23 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
155 Oliver Tuvault Sales Representative 7000 2009-11-23 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
161 Sarath Sewall Sales Representative 7000 2008-11-03 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
164 Mattea Marvins Sales Representative 7200 2010-01-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
172 Elizabeth Bates Sales Representative 7300 2009-03-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
171 William Smith Sales Representative 7400 2009-02-23 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
154 Nanette Cambrault Sales Representative 7500 2008-12-09 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
160 Louise Doran Sales Representative 7500 2007-12-15 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
153 Christopher Olsen Sales Representative 8000 2008-03-30 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
159 Lindsey Smith Sales Representative 8000 2007-03-10 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
177 Jack Livingston Sales Representative 8400 2008-04-23 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
176 Jonathon Taylor Sales Representative 8600 2008-03-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
175 Alyssa Hutton Sales Representative 8800 2007-03-19 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
152 Peter Hall Sales Representative 9000 2007-08-20 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
158 Allan McEwen Sales Representative 9000 2006-08-01 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
151 David Bernstein Sales Representative 9500 2007-03-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
157 Patrick Sully Sales Representative 9500 2006-03-04 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
163 Danielle Greene Sales Representative 9500 2009-03-19 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
170 Tayler Fox Sales Representative 9600 2008-01-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom

Aby w tej sytuacji określić kolejność, należy podać listę wartości True/False

In [29]:
emps.sort_values(['city', 'job_title', 'salary'], ascending=[True, True, False]).head(30)
Out[29]:
first_name last_name job_title salary hire_date department_name address postal_code city country
employee_id
203 Susan Mavris Human Resources Representative 6500 2004-06-07 Human Resources 8204 Arthur St NaN London United Kingdom
204 Hermann Baer Public Relations Representative 10000 2004-06-07 Public Relations Schwanthalerstr. 7031 80925 Munich Germany
145 John Russell Sales Manager 14000 2006-10-01 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
146 Karen Partners Sales Manager 13500 2007-01-05 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
147 Alberto Errazuriz Sales Manager 12000 2007-03-10 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
148 Gerald Cambrault Sales Manager 11000 2009-10-15 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
149 Eleni Zlotkey Sales Manager 10500 2000-01-29 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
168 Lisa Ozer Sales Representative 11500 2007-03-11 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
174 Ellen Abel Sales Representative 11000 2006-05-11 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
162 Clara Vishney Sales Representative 10500 2007-11-11 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
150 Peter Tucker Sales Representative 10000 2007-01-30 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
156 Janette King Sales Representative 10000 2006-01-30 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
169 Harrison Bloom Sales Representative 10000 2008-03-23 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
170 Tayler Fox Sales Representative 9600 2008-01-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
151 David Bernstein Sales Representative 9500 2007-03-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
157 Patrick Sully Sales Representative 9500 2006-03-04 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
163 Danielle Greene Sales Representative 9500 2009-03-19 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
152 Peter Hall Sales Representative 9000 2007-08-20 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
158 Allan McEwen Sales Representative 9000 2006-08-01 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
175 Alyssa Hutton Sales Representative 8800 2007-03-19 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
176 Jonathon Taylor Sales Representative 8600 2008-03-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
177 Jack Livingston Sales Representative 8400 2008-04-23 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
153 Christopher Olsen Sales Representative 8000 2008-03-30 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
159 Lindsey Smith Sales Representative 8000 2007-03-10 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
154 Nanette Cambrault Sales Representative 7500 2008-12-09 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
160 Louise Doran Sales Representative 7500 2007-12-15 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
171 William Smith Sales Representative 7400 2009-02-23 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
172 Elizabeth Bates Sales Representative 7300 2009-03-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
164 Mattea Marvins Sales Representative 7200 2010-01-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
155 Oliver Tuvault Sales Representative 7000 2009-11-23 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom

Sortowanie zgodne z alfabetem narodowym¶

Domyślnie operacje sorotwania w "normalnym Pythonie" oraz w Pandas porównują napisy na bazie kodów Unicode poszczególnych znaków. Dopóki teksty zawierają tylko litery łacińskie, nie ma problemu.

Ale gdy pojawią się polskie ogonki itd...

In [30]:
emps.iloc[1,1] = 'Ćmochar'
emps.iloc[3,1] = 'Łunold'
emps.iloc[4,1] = 'Ęrnst'
emps.iloc[5,1] = 'Ąustin'

Dane przed sortowaniem.

In [31]:
emps.head(7)
Out[31]:
first_name last_name job_title salary hire_date department_name address postal_code city country
employee_id
100 Steven King President 24000 1997-06-17 Executive 2004 Charade Rd 98199 Seattle United States of America
101 Neena Ćmochar Administration Vice President 17000 1999-09-21 Executive 2004 Charade Rd 98199 Seattle United States of America
102 Lex De Haan Administration Vice President 17000 2003-01-13 Executive 2004 Charade Rd 98199 Seattle United States of America
103 Alexander Łunold Programmer 9000 2000-01-03 IT 2014 Jabberwocky Rd 26192 Southlake United States of America
104 Bruce Ęrnst Programmer 6000 2001-05-21 IT 2014 Jabberwocky Rd 26192 Southlake United States of America
105 David Ąustin Programmer 4800 2007-06-25 IT 2014 Jabberwocky Rd 26192 Southlake United States of America
106 Valli Pataballa Programmer 4800 2008-02-05 IT 2014 Jabberwocky Rd 26192 Southlake United States of America
In [32]:
emps.sort_values('last_name')
Out[32]:
first_name last_name job_title salary hire_date department_name address postal_code city country
employee_id
174 Ellen Abel Sales Representative 11000 2006-05-11 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
166 Sundar Ande Sales Representative 6400 2000-03-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
130 Mozhe Atkinson Stock Clerk 2800 2007-10-30 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
204 Hermann Baer Public Relations Representative 10000 2004-06-07 Public Relations Schwanthalerstr. 7031 80925 Munich Germany
116 Shelli Baida Purchasing Clerk 2900 2007-12-24 Purchasing 2004 Charade Rd 98199 Seattle United States of America
... ... ... ... ... ... ... ... ... ... ...
149 Eleni Zlotkey Sales Manager 10500 2000-01-29 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
105 David Ąustin Programmer 4800 2007-06-25 IT 2014 Jabberwocky Rd 26192 Southlake United States of America
101 Neena Ćmochar Administration Vice President 17000 1999-09-21 Executive 2004 Charade Rd 98199 Seattle United States of America
104 Bruce Ęrnst Programmer 6000 2001-05-21 IT 2014 Jabberwocky Rd 26192 Southlake United States of America
103 Alexander Łunold Programmer 9000 2000-01-03 IT 2014 Jabberwocky Rd 26192 Southlake United States of America

107 rows × 10 columns

Zwykłe listy¶

Aby zacząć od prostszych instrukcji, wyciągam dziesięć nazwisk do zwykłej listy.

In [33]:
lista = ['Marek', 'Ola', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Żaneta', 'Ącki']
In [34]:
lista
Out[34]:
['Marek', 'Ola', 'Ala', 'Ola', 'Ewelina', 'Ada', 'Łukasz', 'Żaneta', 'Ącki']
In [35]:
sorted(lista)
Out[35]:
['Ada', 'Ala', 'Ewelina', 'Marek', 'Ola', 'Ola', 'Ącki', 'Łukasz', 'Żaneta']

W tym momencie potrzebujemy modułu locale i odp. ustawień.

In [36]:
import locale
In [37]:
locale.getdefaultlocale()
# może przestać działać w przyszłych wersjach Pythona
C:\Users\patcz\AppData\Local\Temp\ipykernel_6716\1056971489.py:1: DeprecationWarning: Use setlocale(), getencoding() and getlocale() instead
  locale.getdefaultlocale()
Out[37]:
('pl_PL', 'cp1250')
In [38]:
locale.getlocale()
Out[38]:
('Polish_Poland', '1250')

Wydaje się, że mamy ustawiony język polski, ale to nie dotyczy kolejności sortowania - jest to specjalna "kategoria" ustawień regionalnych COLLATE - porównywanie i sortowanie tekstów.

In [39]:
locale.getlocale(locale.LC_COLLATE)
Out[39]:
(None, None)

Teraz nawet operacja strxfrm da domyślną kolejność.

In [40]:
sorted(lista, key=locale.strxfrm)
Out[40]:
['Ada', 'Ala', 'Ewelina', 'Marek', 'Ola', 'Ola', 'Ącki', 'Łukasz', 'Żaneta']

Zmieniam ustawienia regionalne w zakresie sortowania tekstu.

In [41]:
locale.setlocale(locale.LC_COLLATE, 'pl_PL')
# dla brytyjskiego byłoby en_GB
#locale.setlocale(locale.LC_COLLATE, 'Polish')
Out[41]:
'pl_PL'
In [42]:
locale.getlocale(locale.LC_COLLATE)
Out[42]:
('pl_PL', 'ISO8859-2')

Gdy podamy napis '' , to przyjmowane są bieżące ustawienia systemowe.

In [43]:
locale.setlocale(locale.LC_COLLATE, '')
locale.getlocale(locale.LC_COLLATE)
Out[43]:
('Polish_Poland', '1250')
In [44]:
sorted(lista, key=locale.strxfrm)
Out[44]:
['Ada', 'Ala', 'Ącki', 'Ewelina', 'Łukasz', 'Marek', 'Ola', 'Ola', 'Żaneta']

Wymagane jest podanie funkcji strxfrm, która przekształca teksty ze znakami narodowymi na takie ich wersje, które będą się dobrze sortować.

In [45]:
{s: locale.strxfrm(s) for s in lista}
Out[45]:
{'Marek': '\x0eQ\x0e\x02\x0e\x8a\x0e!\x0e6\x01\x01\x12\x01\x01',
 'Ola': '\x0e|\x0eH\x0e\x02\x01\x01\x12\x01\x01',
 'Ala': '\x0e\x02\x0eH\x0e\x02\x01\x01\x12\x01\x01',
 'Ewelina': '\x0e!\x0e¤\x0e!\x0eH\x0e2\x0ep\x0e\x02\x01\x01\x12\x01\x01',
 'Ada': '\x0e\x02\x0e\x1a\x0e\x02\x01\x01\x12\x01\x01',
 'Łukasz': '\x0eL\x0e\x9f\x0e6\x0e\x02\x0e\x91\x0e©\x01\x01\x12\x01\x01',
 'Żaneta': '\x0e¬\x0e\x02\x0ep\x0e!\x0e\x99\x0e\x02\x01\x01\x12\x01\x01',
 'Ącki': '\x0e\x04\x0e\n\x0e6\x0e2\x01\x01\x12\x01\x01'}

Cały czas zwykłe sorted oraz sort_values, bez opcji key, sortują po staremu.

In [46]:
sorted(lista)
Out[46]:
['Ada', 'Ala', 'Ewelina', 'Marek', 'Ola', 'Ola', 'Ącki', 'Łukasz', 'Żaneta']

Minimalna konfiguracja¶

Jeśli chcemy po prostu posortować dane zgodnie z polskim alfabetem, to minimalny kod, który tego dokonuje wygląda tak:

In [47]:
import locale
locale.setlocale(locale.LC_COLLATE, 'pl_PL')
lista.sort(key=locale.strxfrm)
In [48]:
lista
Out[48]:
['Ada', 'Ala', 'Ącki', 'Ewelina', 'Łukasz', 'Marek', 'Ola', 'Ola', 'Żaneta']

Czas na powrót do Pandas¶

In [49]:
emps.sort_values('last_name')
Out[49]:
first_name last_name job_title salary hire_date department_name address postal_code city country
employee_id
174 Ellen Abel Sales Representative 11000 2006-05-11 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
166 Sundar Ande Sales Representative 6400 2000-03-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
130 Mozhe Atkinson Stock Clerk 2800 2007-10-30 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
204 Hermann Baer Public Relations Representative 10000 2004-06-07 Public Relations Schwanthalerstr. 7031 80925 Munich Germany
116 Shelli Baida Purchasing Clerk 2900 2007-12-24 Purchasing 2004 Charade Rd 98199 Seattle United States of America
... ... ... ... ... ... ... ... ... ... ...
149 Eleni Zlotkey Sales Manager 10500 2000-01-29 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
105 David Ąustin Programmer 4800 2007-06-25 IT 2014 Jabberwocky Rd 26192 Southlake United States of America
101 Neena Ćmochar Administration Vice President 17000 1999-09-21 Executive 2004 Charade Rd 98199 Seattle United States of America
104 Bruce Ęrnst Programmer 6000 2001-05-21 IT 2014 Jabberwocky Rd 26192 Southlake United States of America
103 Alexander Łunold Programmer 9000 2000-01-03 IT 2014 Jabberwocky Rd 26192 Southlake United States of America

107 rows × 10 columns

Niestety nie wystarczy przekazać gotowej funkcji strxfrm, jak to było dla list.

In [50]:
# emps.sort_values('last_name', key=locale.strxfrm)

Trzeba stworzyć własną funkcję działającą na całych seriach, a nie na pojedynczych wartościach. Wtedy zadziała sortowanie alfabetyczne zgodne z językiem polskim (lub innym).

In [51]:
emps.sort_values('last_name', key=lambda seria: [locale.strxfrm(x) for x in seria]).head(30)
Out[51]:
first_name last_name job_title salary hire_date department_name address postal_code city country
employee_id
174 Ellen Abel Sales Representative 11000 2006-05-11 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
166 Sundar Ande Sales Representative 6400 2000-03-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
130 Mozhe Atkinson Stock Clerk 2800 2007-10-30 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
105 David Ąustin Programmer 4800 2007-06-25 IT 2014 Jabberwocky Rd 26192 Southlake United States of America
204 Hermann Baer Public Relations Representative 10000 2004-06-07 Public Relations Schwanthalerstr. 7031 80925 Munich Germany
116 Shelli Baida Purchasing Clerk 2900 2007-12-24 Purchasing 2004 Charade Rd 98199 Seattle United States of America
167 Amit Banda Sales Representative 6200 2000-04-21 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
172 Elizabeth Bates Sales Representative 7300 2009-03-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
192 Sarah Bell Shipping Clerk 4000 2006-02-04 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
151 David Bernstein Sales Representative 9500 2007-03-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
129 Laura Bissot Stock Clerk 3300 2007-08-20 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
169 Harrison Bloom Sales Representative 10000 2008-03-23 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
185 Alexis Bull Shipping Clerk 4100 2007-02-20 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
187 Anthony Cabrio Shipping Clerk 3000 2009-02-07 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
148 Gerald Cambrault Sales Manager 11000 2009-10-15 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
154 Nanette Cambrault Sales Representative 7500 2008-12-09 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
110 John Chen Accountant 8200 2007-09-28 Finance 2004 Charade Rd 98199 Seattle United States of America
188 Kelly Chung Shipping Clerk 3800 2007-06-14 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
119 Karen Colmenares Purchasing Clerk 2500 2009-08-10 Purchasing 2004 Charade Rd 98199 Seattle United States of America
101 Neena Ćmochar Administration Vice President 17000 1999-09-21 Executive 2004 Charade Rd 98199 Seattle United States of America
142 Curtis Davies Stock Clerk 3100 2007-01-29 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
102 Lex De Haan Administration Vice President 17000 2003-01-13 Executive 2004 Charade Rd 98199 Seattle United States of America
186 Julia Dellinger Shipping Clerk 3400 2008-06-24 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
189 Jennifer Dilly Shipping Clerk 3600 2007-08-13 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
160 Louise Doran Sales Representative 7500 2007-12-15 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
147 Alberto Errazuriz Sales Manager 12000 2007-03-10 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
193 Britney Everett Shipping Clerk 3900 2007-03-03 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
104 Bruce Ęrnst Programmer 6000 2001-05-21 IT 2014 Jabberwocky Rd 26192 Southlake United States of America
109 Daniel Faviet Accountant 9000 2004-08-16 Finance 2004 Charade Rd 98199 Seattle United States of America
202 Pat Fay Marketing Representative 6000 2007-08-17 Marketing 147 Spadina Ave M5V 2L7 Toronto Canada

Alternatywny poprawny zapis:

In [52]:
emps.sort_values('last_name', key=lambda seria: seria.apply(locale.strxfrm))
Out[52]:
first_name last_name job_title salary hire_date department_name address postal_code city country
employee_id
174 Ellen Abel Sales Representative 11000 2006-05-11 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
166 Sundar Ande Sales Representative 6400 2000-03-24 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom
130 Mozhe Atkinson Stock Clerk 2800 2007-10-30 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
105 David Ąustin Programmer 4800 2007-06-25 IT 2014 Jabberwocky Rd 26192 Southlake United States of America
204 Hermann Baer Public Relations Representative 10000 2004-06-07 Public Relations Schwanthalerstr. 7031 80925 Munich Germany
... ... ... ... ... ... ... ... ... ... ...
123 Shanta Vollman Stock Manager 6500 2007-10-10 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
196 Alana Walsh Shipping Clerk 3100 2008-04-24 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
120 Matthew Weiss Stock Manager 8000 2006-07-18 Shipping 2011 Interiors Blvd 99236 South San Francisco United States of America
200 Jennifer Whalen Administration Assistant 4400 1987-09-17 Administration 2004 Charade Rd 98199 Seattle United States of America
149 Eleni Zlotkey Sales Manager 10500 2000-01-29 Sales Magdalen Centre, The Oxford Science Park OX9 9ZB Oxford United Kingdom

107 rows × 10 columns

In [ ]: