In [1]:
import pandas as pd
In [2]:
df = pd.read_csv('Mcdonalds.csv')
In [3]:
df
Out[3]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
255 Smoothies & Shakes McFlurry with Oreo Cookies (Small) 10.1 oz (285 g) 510 150 17.0 26 9.0 44 0.5 ... 80 27 1 4 64 12 15 0 40 8
256 Smoothies & Shakes McFlurry with Oreo Cookies (Medium) 13.4 oz (381 g) 690 200 23.0 35 12.0 58 1.0 ... 106 35 1 5 85 15 20 0 50 10
257 Smoothies & Shakes McFlurry with Oreo Cookies (Snack) 6.7 oz (190 g) 340 100 11.0 17 6.0 29 0.0 ... 53 18 1 2 43 8 10 0 25 6
258 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Medium) 14.2 oz (403 g) 810 290 32.0 50 15.0 76 1.0 ... 114 38 2 9 103 21 20 0 60 6
259 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Snack) 7.1 oz (202 g) 410 150 16.0 25 8.0 38 0.0 ... 57 19 1 5 51 10 10 0 30 4

260 rows × 24 columns

In [4]:
df.columns
Out[4]:
Index(['Category', 'Item', 'Serving Size', 'Calories', 'Calories from Fat',
       'TotalFat', 'Total Fat (% Daily Value)', 'Saturated Fat',
       'Saturated Fat (% Daily Value)', 'Trans Fat', 'Cholesterol',
       'Cholesterol (% Daily Value)', 'Sodium', 'Sodium (% Daily Value)',
       'Carbohydrates', 'Carbohydrates (% Daily Value)', 'Dietary Fiber',
       'Dietary Fiber (% Daily Value)', 'Sugars', 'Protein',
       'Vitamin A (% Daily Value)', 'Vitamin C (% Daily Value)',
       'Calcium (% Daily Value)', 'Iron (% Daily Value)'],
      dtype='object')

Most columns have names (containing spaces, parentheses, etc.) that rule out dot notation, so they can only be selected with [].

In [5]:
df['Cholesterol (% Daily Value)']
Out[5]:
0      87
1       8
2      15
3      95
4      16
       ..
255    14
256    19
257     9
258    20
259    10
Name: Cholesterol (% Daily Value), Length: 260, dtype: int64
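
A minimal sketch of the two selection styles on a toy frame (the values are made up for illustration and do not come from Mcdonalds.csv). Dot notation only works when the column name is a valid Python identifier; [] works for any name.

```python
import pandas as pd

# Toy frame; the numbers are illustrative only.
menu = pd.DataFrame({
    "Calories": [300, 250],
    "Total Fat (% Daily Value)": [20, 12],
})

# A name that is a valid Python identifier can be read both ways.
by_attribute = menu.Calories
by_brackets = menu["Calories"]
print(by_attribute.equals(by_brackets))  # True

# A name with spaces or punctuation cannot be written after a dot,
# so [] is the only practical option.
fat_share = menu["Total Fat (% Daily Value)"]
print(fat_share.tolist())  # [20, 12]
```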

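Series and DataFrame objects are generally "mutable", i.e. the data they hold can be changed in place.

Assigning such an object to another variable does not create a copy; the new name is just another reference to the same object. A column selected with df['Calories'] is a separate Series object, but in the pandas version used in this notebook it still shares its data with df, which the cells below demonstrate.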

In [6]:
df2 = df
seria2 = df['Calories']
In [7]:
df2.head(3)
Out[7]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10

3 rows × 24 columns

In [8]:
seria2.head(3)
Out[8]:
0    300
1    250
2    370
Name: Calories, dtype: int64

Changing the value of a single cell directly. A cell can be addressed in several ways: [column][row], .iloc[row_number, column_number], .loc[row_label, column_name]

In [9]:
df.iloc[0,3] += 1
In [10]:
df.head(3)
Out[10]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 301 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10

3 rows × 24 columns
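
The addressing styles can be sketched side by side on a toy frame (made-up values). The sketch also uses .at, the scalar-only counterpart of .loc, which the notebook itself does not mention. For writes, a single .loc / .iloc / .at call is generally safer than the chained [column][row] form, which the pandas documentation discourages for assignment.

```python
import pandas as pd

menu = pd.DataFrame(
    {"Item": ["Egg McMuffin", "Egg White Delight"], "Calories": [300, 250]}
)

# Position-based: row 0, column 1.
menu.iloc[0, 1] += 1
# Label-based: row label 0, column name "Calories".
menu.loc[0, "Calories"] += 1
# .at addresses exactly one cell by labels.
menu.at[1, "Calories"] = 260

print(menu["Calories"].tolist())  # [302, 260]
```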

In [11]:
df2.head(3)
Out[11]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 301 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10

3 rows × 24 columns

In [12]:
seria2.head(3)
Out[12]:
0    301
1    250
2    370
Name: Calories, dtype: int64
In [13]:
seria2.iloc[0] += 1
C:\Users\patcz\AppData\Local\Temp\ipykernel_7256\254415301.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  seria2.iloc[0] += 1
In [14]:
df.head(3)
Out[14]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 302 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10

3 rows × 24 columns

In [15]:
df2.iloc[0,3] -= 2
In [16]:
df.head(3)
Out[16]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10

3 rows × 24 columns
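
A compact sketch of the aliasing shown above, on a toy frame (made-up values). Plain assignment only adds a second name for the same object, which the `is` operator confirms. Whether a Series taken from a column passes writes back to the frame depends on the pandas version and settings (for example, it does not when Copy-on-Write is enabled), so the sketch writes through the frame itself.

```python
import pandas as pd

menu = pd.DataFrame({"Calories": [300, 250, 370]})
alias = menu  # a second name for the same object, not a copy

print(alias is menu)  # True

# A write through one name is visible through the other.
menu.loc[0, "Calories"] += 1
print(alias.loc[0, "Calories"])  # 301
```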

Deliberate copying¶

When the goal is an independent copy of the data in memory, use the copy method.

In [17]:
df3 = df.copy()
In [18]:
df3.head(3)
Out[18]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10

3 rows × 24 columns

In [19]:
df.iloc[0,3] += 1
In [20]:
df.head(3)
Out[20]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 301 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10

3 rows × 24 columns

In [21]:
df3.head(3)
Out[21]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10

3 rows × 24 columns
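
The same behaviour in a self-contained sketch with made-up values: after copy(), a change to the original is not reflected in the copy.

```python
import pandas as pd

menu = pd.DataFrame({"Calories": [300, 250, 370]})
snapshot = menu.copy()  # deep copy by default (deep=True)

menu.loc[0, "Calories"] = 999

print(menu.loc[0, "Calories"])      # 999
print(snapshot.loc[0, "Calories"])  # 300
```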

In [22]:
df.iloc[0,3] = 300

As in NumPy, pandas makes it easy to modify the data of an entire Series at once.

In [23]:
df3.Calories *= 2
In [24]:
df3.head(5)
Out[24]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 600 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 500 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 740 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 900 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 800 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10

5 rows × 24 columns

In [25]:
df3['Calories'] = df3['Calories'] - df3['Calories from Fat']
# df3['Calories'] -= df3['Calories from Fat']
In [26]:
df3.head(5)
Out[26]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 480 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 430 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 540 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 650 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 590 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10

5 rows × 24 columns
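
A small sketch of the two kinds of whole-Series arithmetic used above, on toy data (the numbers mirror the first two menu rows but the frame is built by hand). A scalar operand is applied to every element; a Series operand is paired element by element according to the index labels.

```python
import pandas as pd

menu = pd.DataFrame({"Calories": [300, 250], "Calories from Fat": [120, 70]})

# Scalar operand: applied to every element.
doubled = menu["Calories"] * 2
# Series operand: elements are paired by index label.
non_fat = menu["Calories"] - menu["Calories from Fat"]

print(doubled.tolist())  # [600, 500]
print(non_fat.tolist())  # [180, 180]
```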

Restoring the original data...

In [27]:
df3['Calories'] = df['Calories']

A new column can be created as the result of an operation on Series.

In [28]:
df3['Kalorie nie z tłuszczu'] = df3['Calories'] - df3['Calories from Fat']
In [29]:
df3.iloc[:5, [0,1,2,3,4,-1]]
Out[29]:
Category Item Serving Size Calories Calories from Fat Kalorie nie z tłuszczu
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 180
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 180
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 170
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 200
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 190
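
As an alternative to assigning with [], the assign method builds the derived column on a new frame and leaves the original untouched. This is a sketch on toy data; the column name non_fat_calories is a hypothetical choice, not one used in the notebook.

```python
import pandas as pd

menu = pd.DataFrame({"Calories": [300, 250], "Calories from Fat": [120, 70]})

# assign returns a new frame; `menu` itself keeps its two columns.
extended = menu.assign(
    non_fat_calories=lambda d: d["Calories"] - d["Calories from Fat"]
)

print(list(menu.columns))      # ['Calories', 'Calories from Fat']
print(list(extended.columns))  # ['Calories', 'Calories from Fat', 'non_fat_calories']
print(extended["non_fat_calories"].tolist())  # [180, 180]
```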

Removing columns¶

The drop operation, like many other pandas operations, works as follows:

  • by default it returns a new DataFrame object (stored separately in memory, in a sense a copy of the data) and leaves the original object unchanged,
  • with the parameter inplace=True it modifies the original object and returns nothing.
In [30]:
df3.drop(columns=['Saturated Fat', 'Saturated Fat (% Daily Value)']).head(3)
Out[30]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Trans Fat Cholesterol Cholesterol (% Daily Value) ... Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value) Kalorie nie z tłuszczu
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 0.0 260 87 ... 10 4 17 3 17 10 0 25 15 180
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 0.0 25 8 ... 10 4 17 3 18 6 0 25 8 180
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 0.0 45 15 ... 10 4 17 2 14 8 0 25 10 170

3 rows × 23 columns

In [31]:
df3.head(3)
# no changes visible
Out[31]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value) Kalorie nie z tłuszczu
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 10 4 17 3 17 10 0 25 15 180
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 10 4 17 3 18 6 0 25 8 180
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 10 4 17 2 14 8 0 25 10 170

3 rows × 25 columns

In [32]:
df3.drop(columns='Total Fat (% Daily Value)', inplace=True)
In [33]:
df3.head(3)
Out[33]:
Category Item Serving Size Calories Calories from Fat TotalFat Saturated Fat Saturated Fat (% Daily Value) Trans Fat Cholesterol ... Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value) Kalorie nie z tłuszczu
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 5.0 25 0.0 260 ... 10 4 17 3 17 10 0 25 15 180
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 3.0 15 0.0 25 ... 10 4 17 3 18 6 0 25 8 180
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 8.0 42 0.0 45 ... 10 4 17 2 14 8 0 25 10 170

3 rows × 24 columns
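
Both modes of drop in one self-contained sketch on a toy frame (made-up data). A common alternative to inplace=True is to reassign the result, e.g. menu = menu.drop(columns=...), which has the same visible effect.

```python
import pandas as pd

menu = pd.DataFrame({"Item": ["A", "B"], "Calories": [300, 250], "Sugars": [3, 3]})

# Default: the result is a new frame, the original keeps all columns.
slim = menu.drop(columns=["Sugars"])
print(list(menu.columns))  # ['Item', 'Calories', 'Sugars']
print(list(slim.columns))  # ['Item', 'Calories']

# inplace=True: the original frame itself loses the column.
menu.drop(columns="Calories", inplace=True)
print(list(menu.columns))  # ['Item', 'Sugars']
```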

Removing rows¶

By index value (row label).

In [34]:
df3.drop(index=259, inplace=True)
df3
Out[34]:
Category Item Serving Size Calories Calories from Fat TotalFat Saturated Fat Saturated Fat (% Daily Value) Trans Fat Cholesterol ... Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value) Kalorie nie z tłuszczu
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 5.0 25 0.0 260 ... 10 4 17 3 17 10 0 25 15 180
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 3.0 15 0.0 25 ... 10 4 17 3 18 6 0 25 8 180
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 8.0 42 0.0 45 ... 10 4 17 2 14 8 0 25 10 170
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 10.0 52 0.0 285 ... 10 4 17 2 21 15 0 30 15 200
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 8.0 42 0.0 50 ... 10 4 17 2 21 6 0 25 10 190
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
254 Smoothies & Shakes McFlurry with M&M’s Candies (Snack) 7.3 oz (207 g) 430 140 15.0 10.0 48 0.0 35 ... 21 1 4 59 9 10 0 30 4 290
255 Smoothies & Shakes McFlurry with Oreo Cookies (Small) 10.1 oz (285 g) 510 150 17.0 9.0 44 0.5 45 ... 27 1 4 64 12 15 0 40 8 360
256 Smoothies & Shakes McFlurry with Oreo Cookies (Medium) 13.4 oz (381 g) 690 200 23.0 12.0 58 1.0 55 ... 35 1 5 85 15 20 0 50 10 490
257 Smoothies & Shakes McFlurry with Oreo Cookies (Snack) 6.7 oz (190 g) 340 100 11.0 6.0 29 0.0 30 ... 18 1 2 43 8 10 0 25 6 240
258 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Medium) 14.2 oz (403 g) 810 290 32.0 15.0 76 1.0 60 ... 38 2 9 103 21 20 0 60 6 520

259 rows × 24 columns
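
A sketch that makes the "label, not position" point explicit, using a toy frame with hypothetical text labels. With the default numeric index a label happens to equal the row position, and a gap stays in the index after a drop unless it is rebuilt with reset_index.

```python
import pandas as pd

menu = pd.DataFrame(
    {"Calories": [300, 250, 370]},
    index=["muffin", "delight", "sausage"],  # hypothetical row labels
)

# drop takes labels: a single label or a list of labels.
fewer = menu.drop(index=["delight"])
print(list(fewer.index))  # ['muffin', 'sausage']

# Default numeric index: dropping label 1 leaves a gap.
numbered = pd.DataFrame({"Calories": [300, 250, 370]}).drop(index=1)
print(list(numbered.index))                         # [0, 2]
print(list(numbered.reset_index(drop=True).index))  # [0, 1]
```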

Renaming columns¶

To change one or several names, use the rename method and pass a "replacement dictionary" that maps old names to new ones.

This operation supports the inplace parameter: by default it returns a new table, and with inplace=True it renames the columns of the original table.

In [35]:
df3.rename(columns={'Category': 'Kategoria', 'Calories': 'Kalorie', 'Item': 'Produkt'}, inplace=True)
In [36]:
df3.head(1)
Out[36]:
Kategoria Produkt Serving Size Kalorie Calories from Fat TotalFat Saturated Fat Saturated Fat (% Daily Value) Trans Fat Cholesterol ... Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value) Kalorie nie z tłuszczu
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 5.0 25 0.0 260 ... 10 4 17 3 17 10 0 25 15 180

1 rows × 24 columns
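
A short sketch of rename on a toy frame. Besides the dictionary form used above, rename also accepts a function that is applied to every column name; the key "Missing" below is a deliberately nonexistent name, included to show that unknown keys are ignored by default.

```python
import pandas as pd

menu = pd.DataFrame({"Category": ["Breakfast"], "Item": ["Egg McMuffin"]})

# Dict form: only the listed names change; unknown keys are ignored by default.
renamed = menu.rename(columns={"Category": "Kategoria", "Missing": "X"})
print(list(renamed.columns))  # ['Kategoria', 'Item']

# Callable form: the function is applied to every column name.
lowered = menu.rename(columns=str.lower)
print(list(lowered.columns))  # ['category', 'item']

# The original frame is unchanged because inplace was not requested.
print(list(menu.columns))     # ['Category', 'Item']
```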

You can also supply new names for all columns at once, as a list or a Series.

To demonstrate, I take a small slice of df:

In [37]:
df4 = df[['Category', 'Item', 'Serving Size']]
In [38]:
df4.head(1)
Out[38]:
Category Item Serving Size
0 Breakfast Egg McMuffin 4.8 oz (136 g)
In [39]:
df4.columns
Out[39]:
Index(['Category', 'Item', 'Serving Size'], dtype='object')
In [40]:
df4.columns = ['Kategoria', 'Produkt', 'Rozmiar']
In [41]:
df4.head(1)
Out[41]:
Kategoria Produkt Rozmiar
0 Breakfast Egg McMuffin 4.8 oz (136 g)

When we want to rename columns "globally" across the whole DataFrame according to some rule, e.g. replacing spaces with _ characters, we can:

  1. Generate the new names, e.g. with a list comprehension, and assign them to columns as shown above to replace all names at once.
In [42]:
[nazwa.lower().replace(' ', '_') for nazwa in df3.columns]
Out[42]:
['kategoria',
 'produkt',
 'serving_size',
 'kalorie',
 'calories_from_fat',
 'totalfat',
 'saturated_fat',
 'saturated_fat_(%_daily_value)',
 'trans_fat',
 'cholesterol',
 'cholesterol_(%_daily_value)',
 'sodium',
 'sodium_(%_daily_value)',
 'carbohydrates',
 'carbohydrates_(%_daily_value)',
 'dietary_fiber',
 'dietary_fiber_(%_daily_value)',
 'sugars',
 'protein',
 'vitamin_a_(%_daily_value)',
 'vitamin_c_(%_daily_value)',
 'calcium_(%_daily_value)',
 'iron_(%_daily_value)',
 'kalorie_nie_z_tłuszczu']
In [43]:
[nazwa.lower()
    .replace(' ', '_')
    .replace('%', 'percent')
    .replace('(', '')
    .replace(')', '')
 for nazwa in df3.columns]
Out[43]:
['kategoria',
 'produkt',
 'serving_size',
 'kalorie',
 'calories_from_fat',
 'totalfat',
 'saturated_fat',
 'saturated_fat_percent_daily_value',
 'trans_fat',
 'cholesterol',
 'cholesterol_percent_daily_value',
 'sodium',
 'sodium_percent_daily_value',
 'carbohydrates',
 'carbohydrates_percent_daily_value',
 'dietary_fiber',
 'dietary_fiber_percent_daily_value',
 'sugars',
 'protein',
 'vitamin_a_percent_daily_value',
 'vitamin_c_percent_daily_value',
 'calcium_percent_daily_value',
 'iron_percent_daily_value',
 'kalorie_nie_z_tłuszczu']
In [44]:
df3.columns = [nazwa.lower()
    .replace(' ', '_')
    .replace('%', 'percent')
    .replace('(', '')
    .replace(')', '')
 for nazwa in df3.columns]
In [45]:
df3.head(1)
Out[45]:
kategoria produkt serving_size kalorie calories_from_fat totalfat saturated_fat saturated_fat_percent_daily_value trans_fat cholesterol ... carbohydrates_percent_daily_value dietary_fiber dietary_fiber_percent_daily_value sugars protein vitamin_a_percent_daily_value vitamin_c_percent_daily_value calcium_percent_daily_value iron_percent_daily_value kalorie_nie_z_tłuszczu
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 5.0 25 0.0 260 ... 10 4 17 3 17 10 0 25 15 180

1 rows × 24 columns

In [46]:
df3.saturated_fat_percent_daily_value.mean()
Out[46]:
29.934362934362934
  2. (If the operation is simple enough) use the str accessor to apply string operations to every element of a Series or Index, without writing a loop.
In [47]:
df5 = df.copy()
In [48]:
df5.columns
Out[48]:
Index(['Category', 'Item', 'Serving Size', 'Calories', 'Calories from Fat',
       'TotalFat', 'Total Fat (% Daily Value)', 'Saturated Fat',
       'Saturated Fat (% Daily Value)', 'Trans Fat', 'Cholesterol',
       'Cholesterol (% Daily Value)', 'Sodium', 'Sodium (% Daily Value)',
       'Carbohydrates', 'Carbohydrates (% Daily Value)', 'Dietary Fiber',
       'Dietary Fiber (% Daily Value)', 'Sugars', 'Protein',
       'Vitamin A (% Daily Value)', 'Vitamin C (% Daily Value)',
       'Calcium (% Daily Value)', 'Iron (% Daily Value)'],
      dtype='object')
In [49]:
df5.columns.str.upper()
Out[49]:
Index(['CATEGORY', 'ITEM', 'SERVING SIZE', 'CALORIES', 'CALORIES FROM FAT',
       'TOTALFAT', 'TOTAL FAT (% DAILY VALUE)', 'SATURATED FAT',
       'SATURATED FAT (% DAILY VALUE)', 'TRANS FAT', 'CHOLESTEROL',
       'CHOLESTEROL (% DAILY VALUE)', 'SODIUM', 'SODIUM (% DAILY VALUE)',
       'CARBOHYDRATES', 'CARBOHYDRATES (% DAILY VALUE)', 'DIETARY FIBER',
       'DIETARY FIBER (% DAILY VALUE)', 'SUGARS', 'PROTEIN',
       'VITAMIN A (% DAILY VALUE)', 'VITAMIN C (% DAILY VALUE)',
       'CALCIUM (% DAILY VALUE)', 'IRON (% DAILY VALUE)'],
      dtype='object')
In [50]:
(df5.columns.str.replace(' ', '_')
    .str.replace('%', 'percent')
    .str.replace('(', '')
    .str.replace(')', ''))
Out[50]:
Index(['Category', 'Item', 'Serving_Size', 'Calories', 'Calories_from_Fat',
       'TotalFat', 'Total_Fat_percent_Daily_Value', 'Saturated_Fat',
       'Saturated_Fat_percent_Daily_Value', 'Trans_Fat', 'Cholesterol',
       'Cholesterol_percent_Daily_Value', 'Sodium',
       'Sodium_percent_Daily_Value', 'Carbohydrates',
       'Carbohydrates_percent_Daily_Value', 'Dietary_Fiber',
       'Dietary_Fiber_percent_Daily_Value', 'Sugars', 'Protein',
       'Vitamin_A_percent_Daily_Value', 'Vitamin_C_percent_Daily_Value',
       'Calcium_percent_Daily_Value', 'Iron_percent_Daily_Value'],
      dtype='object')
In [51]:
df5.columns = df5.columns.str.replace(' ', '_').str.replace('%', 'Percent').str.replace('(', '').str.replace(')', '')
In [52]:
df5.head(2)
Out[52]:
Category Item Serving_Size Calories Calories_from_Fat TotalFat Total_Fat_Percent_Daily_Value Saturated_Fat Saturated_Fat_Percent_Daily_Value Trans_Fat ... Carbohydrates Carbohydrates_Percent_Daily_Value Dietary_Fiber Dietary_Fiber_Percent_Daily_Value Sugars Protein Vitamin_A_Percent_Daily_Value Vitamin_C_Percent_Daily_Value Calcium_Percent_Daily_Value Iron_Percent_Daily_Value
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8

2 rows × 24 columns
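
The two approaches can be compared directly on a toy frame. This is a sketch: clean is a hypothetical helper name, and regex=False is passed explicitly so that "(" and ")" are treated as literal characters regardless of the pandas version's default.

```python
import pandas as pd

def clean(name: str) -> str:
    """Hypothetical helper: normalise a single column name."""
    return (name.lower()
                .replace(" ", "_")
                .replace("%", "percent")
                .replace("(", "")
                .replace(")", ""))

# Empty toy frame; only the column names matter here.
menu = pd.DataFrame(columns=["Serving Size", "Total Fat (% Daily Value)"])

# Way 1: plain Python over the column names.
via_comprehension = [clean(c) for c in menu.columns]

# Way 2: vectorised string methods on the Index.
via_accessor = (menu.columns.str.lower()
                    .str.replace(" ", "_", regex=False)
                    .str.replace("%", "percent", regex=False)
                    .str.replace("(", "", regex=False)
                    .str.replace(")", "", regex=False))

print(via_comprehension)
# ['serving_size', 'total_fat_percent_daily_value']
print(list(via_accessor) == via_comprehension)  # True
```

The list comprehension scales better to rules that need arbitrary Python logic; the str accessor is shorter when the rule is a chain of standard string methods.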

Modifying data with dedicated operations¶

The dedicated operations are:

  • replace - swaps specific values for others, like Search & Replace in editors
  • fillna - a dedicated variant for filling in missing values, covered in another notebook
  • apply - applies an arbitrary function written in Python; a very general, "programmatic" approach
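
A minimal side-by-side sketch of the three operations on a toy Series (made-up numbers; the "high"/"low" labelling rule is purely illustrative):

```python
import pandas as pd

values = pd.Series([35.0, float("nan"), 20.0])

# replace: exact-value substitution.
replaced = values.replace(35, 999)
# fillna: fills only the missing entries.
filled = values.fillna(0)
# apply: runs an arbitrary Python function on each element.
labelled = filled.apply(lambda v: "high" if v >= 30 else "low")

print(replaced.iloc[0])   # 999.0
print(filled.tolist())    # [35.0, 0.0, 20.0]
print(labelled.tolist())  # ['high', 'low', 'low']
```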

replace¶

In its basic form, replace swaps cells holding a given value for another value. Called this way, it applies to the whole DataFrame, but only to cells whose entire content equals that value exactly (it does not touch fragments of a longer text).

This is another operation with an inplace option: calls like the ones below return new tables and do not modify the original.

In [53]:
df
Out[53]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
255 Smoothies & Shakes McFlurry with Oreo Cookies (Small) 10.1 oz (285 g) 510 150 17.0 26 9.0 44 0.5 ... 80 27 1 4 64 12 15 0 40 8
256 Smoothies & Shakes McFlurry with Oreo Cookies (Medium) 13.4 oz (381 g) 690 200 23.0 35 12.0 58 1.0 ... 106 35 1 5 85 15 20 0 50 10
257 Smoothies & Shakes McFlurry with Oreo Cookies (Snack) 6.7 oz (190 g) 340 100 11.0 17 6.0 29 0.0 ... 53 18 1 2 43 8 10 0 25 6
258 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Medium) 14.2 oz (403 g) 810 290 32.0 50 15.0 76 1.0 ... 114 38 2 9 103 21 20 0 60 6
259 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Snack) 7.1 oz (202 g) 410 150 16.0 25 8.0 38 0.0 ... 57 19 1 5 51 10 10 0 30 4

260 rows × 24 columns

In [54]:
df.replace('Breakfast', 'Śniadanie')
Out[54]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Śniadanie Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Śniadanie Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Śniadanie Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Śniadanie Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Śniadanie Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
255 Smoothies & Shakes McFlurry with Oreo Cookies (Small) 10.1 oz (285 g) 510 150 17.0 26 9.0 44 0.5 ... 80 27 1 4 64 12 15 0 40 8
256 Smoothies & Shakes McFlurry with Oreo Cookies (Medium) 13.4 oz (381 g) 690 200 23.0 35 12.0 58 1.0 ... 106 35 1 5 85 15 20 0 50 10
257 Smoothies & Shakes McFlurry with Oreo Cookies (Snack) 6.7 oz (190 g) 340 100 11.0 17 6.0 29 0.0 ... 53 18 1 2 43 8 10 0 25 6
258 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Medium) 14.2 oz (403 g) 810 290 32.0 50 15.0 76 1.0 ... 114 38 2 9 103 21 20 0 60 6
259 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Snack) 7.1 oz (202 g) 410 150 16.0 25 8.0 38 0.0 ... 57 19 1 5 51 10 10 0 30 4

260 rows × 24 columns

In [55]:
df.replace(35, 999)
Out[55]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 999 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 999 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
255 Smoothies & Shakes McFlurry with Oreo Cookies (Small) 10.1 oz (285 g) 510 150 17.0 26 9.0 44 0.5 ... 80 27 1 4 64 12 15 0 40 8
256 Smoothies & Shakes McFlurry with Oreo Cookies (Medium) 13.4 oz (381 g) 690 200 23.0 999 12.0 58 1.0 ... 106 999 1 5 85 15 20 0 50 10
257 Smoothies & Shakes McFlurry with Oreo Cookies (Snack) 6.7 oz (190 g) 340 100 11.0 17 6.0 29 0.0 ... 53 18 1 2 43 8 10 0 25 6
258 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Medium) 14.2 oz (403 g) 810 290 32.0 50 15.0 76 1.0 ... 114 38 2 9 103 21 20 0 60 6
259 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Snack) 7.1 oz (202 g) 410 150 16.0 25 8.0 38 0.0 ... 57 19 1 5 51 10 10 0 30 4

260 rows × 24 columns
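
The "whole cell only" rule can be checked on a toy frame. The sketch also shows the regex=True option of replace, which is not used in this notebook: with it, the first argument is treated as a regular expression and only the matching part of each string is rewritten.

```python
import pandas as pd

menu = pd.DataFrame({"Category": ["Breakfast", "Desserts"], "Sugars": [3, 9]})

# Exact match only: "Break" is not a whole cell value, so nothing changes.
unchanged = menu.replace("Break", "Brunch")
print(unchanged["Category"].tolist())  # ['Breakfast', 'Desserts']

# regex=True: the matching fragment inside each string is replaced.
partial = menu.replace("Break", "Brunch", regex=True)
print(partial["Category"].tolist())    # ['Brunchfast', 'Desserts']
```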

Calling replace on a Series returns a Series.

In [56]:
df['Total Fat (% Daily Value)'].replace(35, 999)
Out[56]:
0       20
1       12
2      999
3       43
4      999
      ... 
255     26
256    999
257     17
258     50
259     25
Name: Total Fat (% Daily Value), Length: 260, dtype: int64

To replace values in a single column of a DataFrame, you can do, for example:

In [57]:
df5['Total_Fat_Percent_Daily_Value'] = df5['Total_Fat_Percent_Daily_Value'].replace(35, 999)
In [58]:
df5
Out[58]:
Category Item Serving_Size Calories Calories_from_Fat TotalFat Total_Fat_Percent_Daily_Value Saturated_Fat Saturated_Fat_Percent_Daily_Value Trans_Fat ... Carbohydrates Carbohydrates_Percent_Daily_Value Dietary_Fiber Dietary_Fiber_Percent_Daily_Value Sugars Protein Vitamin_A_Percent_Daily_Value Vitamin_C_Percent_Daily_Value Calcium_Percent_Daily_Value Iron_Percent_Daily_Value
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 999 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 999 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
255 Smoothies & Shakes McFlurry with Oreo Cookies (Small) 10.1 oz (285 g) 510 150 17.0 26 9.0 44 0.5 ... 80 27 1 4 64 12 15 0 40 8
256 Smoothies & Shakes McFlurry with Oreo Cookies (Medium) 13.4 oz (381 g) 690 200 23.0 999 12.0 58 1.0 ... 106 35 1 5 85 15 20 0 50 10
257 Smoothies & Shakes McFlurry with Oreo Cookies (Snack) 6.7 oz (190 g) 340 100 11.0 17 6.0 29 0.0 ... 53 18 1 2 43 8 10 0 25 6
258 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Medium) 14.2 oz (403 g) 810 290 32.0 50 15.0 76 1.0 ... 114 38 2 9 103 21 20 0 60 6
259 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Snack) 7.1 oz (202 g) 410 150 16.0 25 8.0 38 0.0 ... 57 19 1 5 51 10 10 0 30 4

260 rows × 24 columns
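
Another way to restrict the replacement to one column is the nested dictionary form {column: {old: new}} of DataFrame.replace. The sketch below uses a toy frame with hypothetical column names and contrasts it with the whole-frame call.

```python
import pandas as pd

# Hypothetical column names and made-up values.
menu = pd.DataFrame({"fat_pct": [20, 35], "carb_pct": [35, 10]})

# Whole-frame replace hits the value in both columns.
everywhere = menu.replace(35, 999)
print(everywhere.values.tolist())  # [[20, 999], [999, 10]]

# Nested dict: look only in column "fat_pct" for 35.
one_column = menu.replace({"fat_pct": {35: 999}})
print(one_column.values.tolist())  # [[20, 35], [999, 10]]
```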

replace supports several calling modes; there are different ways to specify which values should be replaced in which columns. The feature is extensive, so we will only look at selected options.

Several different input values are replaced with the same output value:

In [59]:
df.replace(['Desserts', 'Smoothies & Shakes'], 'Desery')
Out[59]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
255 Desery McFlurry with Oreo Cookies (Small) 10.1 oz (285 g) 510 150 17.0 26 9.0 44 0.5 ... 80 27 1 4 64 12 15 0 40 8
256 Desery McFlurry with Oreo Cookies (Medium) 13.4 oz (381 g) 690 200 23.0 35 12.0 58 1.0 ... 106 35 1 5 85 15 20 0 50 10
257 Desery McFlurry with Oreo Cookies (Snack) 6.7 oz (190 g) 340 100 11.0 17 6.0 29 0.0 ... 53 18 1 2 43 8 10 0 25 6
258 Desery McFlurry with Reese's Peanut Butter Cups (Medium) 14.2 oz (403 g) 810 290 32.0 50 15.0 76 1.0 ... 114 38 2 9 103 21 20 0 60 6
259 Desery McFlurry with Reese's Peanut Butter Cups (Snack) 7.1 oz (202 g) 410 150 16.0 25 8.0 38 0.0 ... 57 19 1 5 51 10 10 0 30 4

260 rows × 24 columns

You can also pass two lists of equal length; the first element of the first list is then replaced with the first element of the second list, the second with the second, and so on: a one-to-one mapping.

In [60]:
df.replace(['Breakfast', 'Desserts', 'Smoothies & Shakes'], ['Śniadanie', 'Deser', 'Szejk'])
Out[60]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Śniadanie Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Śniadanie Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Śniadanie Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Śniadanie Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Śniadanie Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
255 Szejk McFlurry with Oreo Cookies (Small) 10.1 oz (285 g) 510 150 17.0 26 9.0 44 0.5 ... 80 27 1 4 64 12 15 0 40 8
256 Szejk McFlurry with Oreo Cookies (Medium) 13.4 oz (381 g) 690 200 23.0 35 12.0 58 1.0 ... 106 35 1 5 85 15 20 0 50 10
257 Szejk McFlurry with Oreo Cookies (Snack) 6.7 oz (190 g) 340 100 11.0 17 6.0 29 0.0 ... 53 18 1 2 43 8 10 0 25 6
258 Szejk McFlurry with Reese's Peanut Butter Cups (Medium) 14.2 oz (403 g) 810 290 32.0 50 15.0 76 1.0 ... 114 38 2 9 103 21 20 0 60 6
259 Szejk McFlurry with Reese's Peanut Butter Cups (Snack) 7.1 oz (202 g) 410 150 16.0 25 8.0 38 0.0 ... 57 19 1 5 51 10 10 0 30 4

260 rows × 24 columns

When we pass a dictionary, each key is treated as the name of a column in which the replacements are made.

In [61]:
df.replace({
    'Category': ['Desserts', 'Smoothies & Shakes'],
    'Item': ['McFlurry with Oreo Cookies (Small)'],
}, 'Deser')
Out[61]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
255 Deser Deser 10.1 oz (285 g) 510 150 17.0 26 9.0 44 0.5 ... 80 27 1 4 64 12 15 0 40 8
256 Deser McFlurry with Oreo Cookies (Medium) 13.4 oz (381 g) 690 200 23.0 35 12.0 58 1.0 ... 106 35 1 5 85 15 20 0 50 10
257 Deser McFlurry with Oreo Cookies (Snack) 6.7 oz (190 g) 340 100 11.0 17 6.0 29 0.0 ... 53 18 1 2 43 8 10 0 25 6
258 Deser McFlurry with Reese's Peanut Butter Cups (Medium) 14.2 oz (403 g) 810 290 32.0 50 15.0 76 1.0 ... 114 38 2 9 103 21 20 0 60 6
259 Deser McFlurry with Reese's Peanut Butter Cups (Snack) 7.1 oz (202 g) 410 150 16.0 25 8.0 38 0.0 ... 57 19 1 5 51 10 10 0 30 4

260 rows × 24 columns

In [62]:
df.replace({
    'Category': ['Breakfast', 'Desserts', 'Smoothies & Shakes'],
    'Item': ['McFlurry with Oreo Cookies (Small)'],
},
{
    'Category': ['Śniadanie', 'Desery', 'Smufi'],
    'Item': ['Oreo'],
})
Out[62]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Śniadanie Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Śniadanie Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Śniadanie Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Śniadanie Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Śniadanie Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
255 Smufi Oreo 10.1 oz (285 g) 510 150 17.0 26 9.0 44 0.5 ... 80 27 1 4 64 12 15 0 40 8
256 Smufi McFlurry with Oreo Cookies (Medium) 13.4 oz (381 g) 690 200 23.0 35 12.0 58 1.0 ... 106 35 1 5 85 15 20 0 50 10
257 Smufi McFlurry with Oreo Cookies (Snack) 6.7 oz (190 g) 340 100 11.0 17 6.0 29 0.0 ... 53 18 1 2 43 8 10 0 25 6
258 Smufi McFlurry with Reese's Peanut Butter Cups (Medium) 14.2 oz (403 g) 810 290 32.0 50 15.0 76 1.0 ... 114 38 2 9 103 21 20 0 60 6
259 Smufi McFlurry with Reese's Peanut Butter Cups (Snack) 7.1 oz (202 g) 410 150 16.0 25 8.0 38 0.0 ... 57 19 1 5 51 10 10 0 30 4

260 rows × 24 columns
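
To make the role of the dictionary key concrete, here is a minimal sketch on a made-up two-row table (the `menu` frame and its contents are assumptions for illustration, not part of the dataset). The same text appears in both columns, but only the column named by the key is changed.

```python
import pandas as pd

# Hypothetical table; 'Desserts' appears in both columns on purpose.
menu = pd.DataFrame({
    'Category': ['Desserts', 'Breakfast'],
    'Item': ['Desserts', 'Egg McMuffin'],
})

# The dictionary key names the column to search; other columns are left alone.
result = menu.replace({'Category': 'Desserts'}, 'Deser')
print(result)
```

Only the Category cell changes; the identical text in the Item column stays as it was.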

Replacing text

using replace

By default, replace substitutes whole cells, not fragments of text.

In [63]:
df.head(5)
Out[63]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10

5 rows × 24 columns

In [64]:
df.replace('Egg', 'Jajo').head(5)
Out[64]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10

5 rows × 24 columns

The regex=True option means that:

  • fragments of text are replaced with the given substitutes,
  • the given parameter is treated as a regular expression against which the text in the table is matched.
In [65]:
df.replace('Egg', 'Jajo', regex=True).head(5)
Out[65]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Jajo McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Jajo White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Jajo 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Jajo Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10

5 rows × 24 columns
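
The difference between the two modes is easiest to see side by side. A small sketch on a made-up series (the `items` data is an assumption for illustration):

```python
import pandas as pd

# Hypothetical series: one cell equals 'Egg' exactly, one only contains it.
items = pd.Series(['Egg', 'Egg McMuffin'])

# Default mode: only a cell whose entire content is 'Egg' is replaced.
whole_cell = items.replace('Egg', 'Jajo')

# regex=True: every occurrence of the pattern inside a cell is replaced.
fragment = items.replace('Egg', 'Jajo', regex=True)

print(whole_cell.tolist())
print(fragment.tolist())
```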

In [66]:
df.replace('g', 'gram', regex=True)
Out[66]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egramgram McMuffin 4.8 oz (136 gram) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egramgram White Deligramht 4.8 oz (135 gram) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausagrame McMuffin 3.9 oz (111 gram) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausagrame McMuffin with Egramgram 5.7 oz (161 gram) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausagrame McMuffin with Egramgram Whites 5.7 oz (161 gram) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
255 Smoothies & Shakes McFlurry with Oreo Cookies (Small) 10.1 oz (285 gram) 510 150 17.0 26 9.0 44 0.5 ... 80 27 1 4 64 12 15 0 40 8
256 Smoothies & Shakes McFlurry with Oreo Cookies (Medium) 13.4 oz (381 gram) 690 200 23.0 35 12.0 58 1.0 ... 106 35 1 5 85 15 20 0 50 10
257 Smoothies & Shakes McFlurry with Oreo Cookies (Snack) 6.7 oz (190 gram) 340 100 11.0 17 6.0 29 0.0 ... 53 18 1 2 43 8 10 0 25 6
258 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Medium) 14.2 oz (403 gram) 810 290 32.0 50 15.0 76 1.0 ... 114 38 2 9 103 21 20 0 60 6
259 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Snack) 7.1 oz (202 gram) 410 150 16.0 25 8.0 38 0.0 ... 57 19 1 5 51 10 10 0 30 4

260 rows × 24 columns
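
In the output above every letter g was replaced, including those inside words such as Egg. If only a standalone g should change, one option is the word-boundary marker \b of Python regular expressions. This is a sketch on made-up data (the `sizes` series is an assumption), not something the original notebook does:

```python
import pandas as pd

# Hypothetical series mixing a serving size and an item name.
sizes = pd.Series(['4.8 oz (136 g)', 'Egg McMuffin'])

# Plain 'g' matches every letter g, also inside words.
everywhere = sizes.replace('g', 'gram', regex=True)

# \bg\b matches g only when it is not adjacent to another word character.
standalone = sizes.replace(r'\bg\b', 'gram', regex=True)

print(everywhere.tolist())
print(standalone.tolist())
```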

Examples of simple regular expressions

^ denotes the beginning of the text. Only those occurrences of Egg that are at the very beginning of a cell are replaced.

In [67]:
df.replace('^Egg', 'Jajo', regex=True).head(5)
Out[67]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Jajo McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Jajo White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10

5 rows × 24 columns

$ denotes the end of the text.

In [68]:
df.replace('Egg$', 'Jajo', regex=True).head(5)
Out[68]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Jajo 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10

5 rows × 24 columns

\w denotes a "word character", which covers letters, digits, and the _ character.

\w+ denotes a non-empty sequence of such characters. Here we are after words that begin with Mc.

The letter r before a string in Python marks it as a raw string, meaning that \ characters inside it have no special meaning to Python. They do have a special meaning in the regular expression language.

In [69]:
df.replace(r'Mc\w+', 'Makcoś', regex=True)
Out[69]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg Makcoś 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage Makcoś 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage Makcoś with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage Makcoś with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
255 Smoothies & Shakes Makcoś with Oreo Cookies (Small) 10.1 oz (285 g) 510 150 17.0 26 9.0 44 0.5 ... 80 27 1 4 64 12 15 0 40 8
256 Smoothies & Shakes Makcoś with Oreo Cookies (Medium) 13.4 oz (381 g) 690 200 23.0 35 12.0 58 1.0 ... 106 35 1 5 85 15 20 0 50 10
257 Smoothies & Shakes Makcoś with Oreo Cookies (Snack) 6.7 oz (190 g) 340 100 11.0 17 6.0 29 0.0 ... 53 18 1 2 43 8 10 0 25 6
258 Smoothies & Shakes Makcoś with Reese's Peanut Butter Cups (Medium) 14.2 oz (403 g) 810 290 32.0 50 15.0 76 1.0 ... 114 38 2 9 103 21 20 0 60 6
259 Smoothies & Shakes Makcoś with Reese's Peanut Butter Cups (Snack) 7.1 oz (202 g) 410 150 16.0 25 8.0 38 0.0 ... 57 19 1 5 51 10 10 0 30 4

260 rows × 24 columns
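
The remark about raw strings can be checked directly in plain Python. A short sketch using only the standard re module (the variable names are chosen here for illustration):

```python
import re

raw = r'Mc\w+'        # raw string: Python keeps the backslash as it is
escaped = 'Mc\\w+'    # the same pattern written with an escaped backslash
same_pattern = raw == escaped

newline_len = len('\n')    # Python turns \n into a single newline character
raw_len = len(r'\n')       # raw string: a backslash followed by the letter n

# The regular expression engine, not Python, gives \w its meaning.
replaced = re.sub(raw, 'Makcoś', 'Sausage McMuffin with Egg')
print(same_pattern, newline_len, raw_len, replaced)
```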

With "groups" in regular expressions you can:

  • use parentheses to mark a fragment of the pattern (the "left-hand side" of replace),
  • on the "right-hand side", insert the text that matched those parentheses, using \1, \2, ...
In [70]:
df.replace(r'Mc(\w+)', r'Mak\1', regex=True)
Out[70]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg MakMuffin 4.8 oz (136 g) 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage MakMuffin 3.9 oz (111 g) 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage MakMuffin with Egg 5.7 oz (161 g) 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage MakMuffin with Egg Whites 5.7 oz (161 g) 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
255 Smoothies & Shakes MakFlurry with Oreo Cookies (Small) 10.1 oz (285 g) 510 150 17.0 26 9.0 44 0.5 ... 80 27 1 4 64 12 15 0 40 8
256 Smoothies & Shakes MakFlurry with Oreo Cookies (Medium) 13.4 oz (381 g) 690 200 23.0 35 12.0 58 1.0 ... 106 35 1 5 85 15 20 0 50 10
257 Smoothies & Shakes MakFlurry with Oreo Cookies (Snack) 6.7 oz (190 g) 340 100 11.0 17 6.0 29 0.0 ... 53 18 1 2 43 8 10 0 25 6
258 Smoothies & Shakes MakFlurry with Reese's Peanut Butter Cups (Med... 14.2 oz (403 g) 810 290 32.0 50 15.0 76 1.0 ... 114 38 2 9 103 21 20 0 60 6
259 Smoothies & Shakes MakFlurry with Reese's Peanut Butter Cups (Snack) 7.1 oz (202 g) 410 150 16.0 25 8.0 38 0.0 ... 57 19 1 5 51 10 10 0 30 4

260 rows × 24 columns

In [71]:
seria = pd.Series(['Ala ma kota', 'Ola ma psa', 'Adam ma rybki'])
In [72]:
seria
Out[72]:
0      Ala ma kota
1       Ola ma psa
2    Adam ma rybki
dtype: object
In [73]:
seria.replace(r'(\w+) ma (\w+)', r'Osoba \1 posiada zwierzę \2', regex=True)
Out[73]:
0      Osoba Ala posiada zwierzę kota
1       Osoba Ola posiada zwierzę psa
2    Osoba Adam posiada zwierzę rybki
dtype: object
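
Group references do not have to appear in their original order. As a small sketch (the output format `\2: \1` is an arbitrary choice made here for illustration), the two captured words can be swapped:

```python
import pandas as pd

sentences = pd.Series(['Ala ma kota', 'Ola ma psa'])

# \1 is the word before ' ma ', \2 the word after it; here they are swapped.
swapped = sentences.replace(r'(\w+) ma (\w+)', r'\2: \1', regex=True)
print(swapped.tolist())
```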

Example

Let's extract the product's weight in grams from the Serving Size column.

For now, let's see which pattern matches a fragment such as (333 g).

In [74]:
df.replace(r'\((\d+) g\)', r'\1 gramów', regex=True)
Out[74]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 4.8 oz 136 gramów 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 4.8 oz 135 gramów 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 3.9 oz 111 gramów 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 5.7 oz 161 gramów 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz 161 gramów 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
255 Smoothies & Shakes McFlurry with Oreo Cookies (Small) 10.1 oz 285 gramów 510 150 17.0 26 9.0 44 0.5 ... 80 27 1 4 64 12 15 0 40 8
256 Smoothies & Shakes McFlurry with Oreo Cookies (Medium) 13.4 oz 381 gramów 690 200 23.0 35 12.0 58 1.0 ... 106 35 1 5 85 15 20 0 50 10
257 Smoothies & Shakes McFlurry with Oreo Cookies (Snack) 6.7 oz 190 gramów 340 100 11.0 17 6.0 29 0.0 ... 53 18 1 2 43 8 10 0 25 6
258 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Medium) 14.2 oz 403 gramów 810 290 32.0 50 15.0 76 1.0 ... 114 38 2 9 103 21 20 0 60 6
259 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Snack) 7.1 oz 202 gramów 410 150 16.0 25 8.0 38 0.0 ... 57 19 1 5 51 10 10 0 30 4

260 rows × 24 columns

By also matching, and thereby removing, all characters that precede this fragment, we can get:

In [75]:
df.replace(r'^.*\((\d+) g\)', r'\1', regex=True)
Out[75]:
Category Item Serving Size Calories Calories from Fat TotalFat Total Fat (% Daily Value) Saturated Fat Saturated Fat (% Daily Value) Trans Fat ... Carbohydrates Carbohydrates (% Daily Value) Dietary Fiber Dietary Fiber (% Daily Value) Sugars Protein Vitamin A (% Daily Value) Vitamin C (% Daily Value) Calcium (% Daily Value) Iron (% Daily Value)
0 Breakfast Egg McMuffin 136 300 120 13.0 20 5.0 25 0.0 ... 31 10 4 17 3 17 10 0 25 15
1 Breakfast Egg White Delight 135 250 70 8.0 12 3.0 15 0.0 ... 30 10 4 17 3 18 6 0 25 8
2 Breakfast Sausage McMuffin 111 370 200 23.0 35 8.0 42 0.0 ... 29 10 4 17 2 14 8 0 25 10
3 Breakfast Sausage McMuffin with Egg 161 450 250 28.0 43 10.0 52 0.0 ... 30 10 4 17 2 21 15 0 30 15
4 Breakfast Sausage McMuffin with Egg Whites 161 400 210 23.0 35 8.0 42 0.0 ... 30 10 4 17 2 21 6 0 25 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
255 Smoothies & Shakes McFlurry with Oreo Cookies (Small) 285 510 150 17.0 26 9.0 44 0.5 ... 80 27 1 4 64 12 15 0 40 8
256 Smoothies & Shakes McFlurry with Oreo Cookies (Medium) 381 690 200 23.0 35 12.0 58 1.0 ... 106 35 1 5 85 15 20 0 50 10
257 Smoothies & Shakes McFlurry with Oreo Cookies (Snack) 190 340 100 11.0 17 6.0 29 0.0 ... 53 18 1 2 43 8 10 0 25 6
258 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Medium) 403 810 290 32.0 50 15.0 76 1.0 ... 114 38 2 9 103 21 20 0 60 6
259 Smoothies & Shakes McFlurry with Reese's Peanut Butter Cups (Snack) 202 410 150 16.0 25 8.0 38 0.0 ... 57 19 1 5 51 10 10 0 30 4

260 rows × 24 columns

Going further, we want a numeric value.

The result of a replace like the one above can be cast to float. However, since the data contains cells that do not match the pattern, they have to be neutralized first, e.g. replaced with 0. The second expression says: replace any text that contains something other than digits with 0.

In [76]:
df['Serving Size']\
    .replace(r'^.*\((\d+) g\)', r'\1', regex=True)\
    .replace(r'^.*[^\d].*$', '0', regex=True) \
    .astype('float32')
Out[76]:
0      136.0
1      135.0
2      111.0
3      161.0
4      161.0
       ...  
255    285.0
256    381.0
257    190.0
258    403.0
259    202.0
Name: Serving Size, Length: 260, dtype: float32
In [77]:
%%timeit
df['Serving Size']\
    .replace(r'^.*\((\d+) g\)', r'\1', regex=True)\
    .replace(r'^.*[^\d].*$', '0', regex=True) \
    .astype('float32')
2.83 ms ± 128 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
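
To see what each step of this chain contributes, here is a sketch that keeps the intermediate results, run on two made-up serving sizes (the `sizes` series is an assumption for illustration):

```python
import pandas as pd

# Hypothetical data: one value with grams in parentheses, one without.
sizes = pd.Series(['4.8 oz (136 g)', '20 fl oz cup'])

# Step 1: cells that match the pattern are reduced to the captured digits.
step1 = sizes.replace(r'^.*\((\d+) g\)', r'\1', regex=True)

# Step 2: any cell that still contains a non-digit character becomes '0'.
step2 = step1.replace(r'^.*[^\d].*$', '0', regex=True)

# Step 3: now every cell holds digits only, so the cast succeeds.
grams = step2.astype('float32')

print(step1.tolist())
print(step2.tolist())
print(grams.tolist())
```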

The apply operation

A more general way to replace the data in tables or series with the results of an arbitrary function is the apply operation.

With def you can define a function that takes a parameter and returns a result.

In [78]:
def funkcja(tekst):
    fragment = tekst[:7]
    return fragment.upper()

In plain Python, such functions are simply called:

In [79]:
funkcja('Abrakadabra')
Out[79]:
'ABRAKAD'

In pandas, apply lets you call such a function on every element of a series:

In [80]:
seria
Out[80]:
0      Ala ma kota
1       Ola ma psa
2    Adam ma rybki
dtype: object
In [81]:
seria.apply(funkcja)
Out[81]:
0    ALA MA 
1    OLA MA 
2    ADAM MA
dtype: object
In [82]:
df['Item'].apply(funkcja)
Out[82]:
0      EGG MCM
1      EGG WHI
2      SAUSAGE
3      SAUSAGE
4      SAUSAGE
        ...   
255    MCFLURR
256    MCFLURR
257    MCFLURR
258    MCFLURR
259    MCFLURR
Name: Item, Length: 260, dtype: object

When a function is simple (it can be written briefly), you can define it with a lambda expression instead of def.

In [83]:
df['Item'].apply(lambda x: x[2:10].lower())
Out[83]:
0      g mcmuff
1      g white 
2      usage mc
3      usage mc
4      usage mc
         ...   
255    flurry w
256    flurry w
257    flurry w
258    flurry w
259    flurry w
Name: Item, Length: 260, dtype: object

apply can also be used on numeric columns.

In [84]:
df.Calories.head(5)
Out[84]:
0    300
1    250
2    370
3    450
4    400
Name: Calories, dtype: int64
In [85]:
df.Calories.apply(lambda cal: cal * 2 if cal < 400 else cal*10).head(5)
Out[85]:
0     600
1     500
2     740
3    4500
4    4000
Name: Calories, dtype: int64
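
Conceptually, apply calls the function once for each element and collects the results. The sketch below uses the five calorie values shown above and a named function `scale` (a name introduced here for illustration) to compare apply with an explicit loop:

```python
import pandas as pd

calories = pd.Series([300, 250, 370, 450, 400])

def scale(cal):
    # Same rule as the lambda above: double small values, multiply the rest by 10.
    return cal * 2 if cal < 400 else cal * 10

via_apply = calories.apply(scale)
via_loop = pd.Series([scale(cal) for cal in calories])

print(via_apply.tolist())
print(via_apply.equals(via_loop))
```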

We return to extracting the weight in grams. This time we do it by defining suitable functions and using apply.

In this part we use the regular expressions supported directly by Python (the re module).

In [86]:
import re
In [87]:
def gramy1(tekst):
    m = re.search(r'\((\d+) g\)', tekst)
    if m:
        return float(m[1]) # or m.group(1)
    else:
        return 0.0
In [88]:
df['Serving Size'].apply(gramy1)
Out[88]:
0      136.0
1      135.0
2      111.0
3      161.0
4      161.0
       ...  
255    285.0
256    381.0
257    190.0
258    403.0
259    202.0
Name: Serving Size, Length: 260, dtype: float64
In [89]:
%%timeit
df['Serving Size'].apply(gramy1)
808 µs ± 34.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
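
For readers new to the re module, this sketch shows what re.search returns for one matching and one non-matching text (both sample strings are taken from the Serving Size column; the variable names are chosen here):

```python
import re

m = re.search(r'\((\d+) g\)', '4.8 oz (136 g)')

whole = m[0]            # the entire matched fragment, parentheses included
digits = m[1]           # the first group: only the digits
same = m.group(1)       # equivalent spelling of m[1]

# When nothing matches, re.search returns None, hence the `if m:` check.
no_match = re.search(r'\((\d+) g\)', '20 fl oz cup')

print(whole, digits, same, no_match)
```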

To improve the performance of regular expressions, we should "compile" the expression once and use the resulting pattern object inside the function.

In [90]:
pattern = re.compile(r'\((\d+) g\)')
In [91]:
def gramy2(tekst):
    m = pattern.search(tekst)
    return float(m[1]) if m else 0.0
In [92]:
%%timeit
df['Serving Size'].apply(gramy2)
528 µs ± 24.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

You can also try to manage without regular expressions, using plain string operations such as split, index, find, ...

In [93]:
def gramy3(napis):
    if 'g' not in napis:
        return 0.0
    p = napis.index('(')
    k = napis.index(' g')
    return float(napis[p+1:k])
In [94]:
%%timeit
df['Serving Size'].apply(gramy3)
512 µs ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
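
As a variation on the same idea, str.find returns -1 instead of raising an exception when the text is missing, which allows both markers to be checked explicitly. This is only a sketch; the function name `grams_find` and the sample strings are assumptions, and it was not timed against the versions above:

```python
def grams_find(text):
    # Position of the opening parenthesis and of ' g)' after it; -1 if absent.
    start = text.find('(')
    end = text.find(' g)', start)
    if start == -1 or end == -1:
        return 0.0
    # Assumes the text between the two markers consists of digits.
    return float(text[start + 1:end])

print(grams_find('4.8 oz (136 g)'))
print(grams_find('20 fl oz cup'))
print(grams_find('1 carton (236 ml)'))
```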

Finally, we add the gram values found this way to the table (as the Gramatura column) and use them in a calculation.

In [95]:
df['Gramatura'] = df['Serving Size'].apply(gramy2)
In [96]:
df.iloc[:5, [0,1,2,3,-1]]
Out[96]:
Category Item Serving Size Calories Gramatura
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 136.0
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 135.0
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 111.0
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 161.0
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 161.0

We compute "calories per gram". For a weight of 0 the result is infinity :)

In [97]:
df['Nasycenie kaloriami'] = df.Calories / df.Gramatura
In [98]:
df.iloc[:5, [0,1,2,3,-2, -1]]
Out[98]:
Category Item Serving Size Calories Gramatura Nasycenie kaloriami
0 Breakfast Egg McMuffin 4.8 oz (136 g) 300 136.0 2.205882
1 Breakfast Egg White Delight 4.8 oz (135 g) 250 135.0 1.851852
2 Breakfast Sausage McMuffin 3.9 oz (111 g) 370 111.0 3.333333
3 Breakfast Sausage McMuffin with Egg 5.7 oz (161 g) 450 161.0 2.795031
4 Breakfast Sausage McMuffin with Egg Whites 5.7 oz (161 g) 400 161.0 2.484472
In [99]:
df.iloc[150:153, [0,1,2,3,-2, -1]]
Out[99]:
Category Item Serving Size Calories Gramatura Nasycenie kaloriami
150 Coffee & Tea Latte (Large) 20 fl oz cup 280 0.0 inf
151 Coffee & Tea Caramel Latte (Small) 12 fl oz cup 270 0.0 inf
152 Coffee & Tea Caramel Latte (Medium) 16 fl oz cup 340 0.0 inf
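
The inf values come from floating-point division by zero, which pandas performs without raising an error. A minimal sketch with two values taken from the tables above (the variable names are chosen here for illustration):

```python
import pandas as pd

calories = pd.Series([300, 280])
grams = pd.Series([136.0, 0.0])

# Element-wise division; a positive number divided by 0.0 gives inf.
density = calories / grams

# Keeping only rows with a positive weight removes the infinite values.
usable = density[grams > 0]

print(density.tolist())
print(usable.tolist())
```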
In [100]:
df[df.Gramatura > 0].iloc[:, [0,1,2,3,-2, -1]].sort_values('Nasycenie kaloriami', ascending=False)
Out[100]:
Category Item Serving Size Calories Gramatura Nasycenie kaloriami
104 Desserts Chocolate Chip Cookie 1 cookie (33 g) 160 33.0 4.848485
105 Desserts Oatmeal Raisin Cookie 1 cookie (33 g) 150 33.0 4.545455
39 Breakfast Cinnamon Melts 4 oz (114 g) 460 114.0 4.035088
10 Breakfast Sausage Biscuit (Regular Biscuit) 4.1 oz (117 g) 430 117.0 3.675214
11 Breakfast Sausage Biscuit (Large Biscuit) 4.6 oz (131 g) 480 131.0 3.664122
... ... ... ... ... ... ...
89 Salads Premium Southwest Salad with Grilled Chicken 11.8 oz (335 g) 290 335.0 0.865672
84 Salads Premium Bacon Ranch Salad (without Chicken) 7.9 oz (223 g) 140 223.0 0.627803
87 Salads Premium Southwest Salad (without Chicken) 8.1 oz (230 g) 140 230.0 0.608696
101 Snacks & Sides Apple Slices 1.2 oz (34 g) 15 34.0 0.441176
100 Snacks & Sides Side Salad 3.1 oz (87 g) 20 87.0 0.229885

118 rows × 6 columns
