import pandas as pd
df = pd.read_csv('Mcdonalds.csv')
df
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
255 | Smoothies & Shakes | McFlurry with Oreo Cookies (Small) | 10.1 oz (285 g) | 510 | 150 | 17.0 | 26 | 9.0 | 44 | 0.5 | ... | 80 | 27 | 1 | 4 | 64 | 12 | 15 | 0 | 40 | 8 |
256 | Smoothies & Shakes | McFlurry with Oreo Cookies (Medium) | 13.4 oz (381 g) | 690 | 200 | 23.0 | 35 | 12.0 | 58 | 1.0 | ... | 106 | 35 | 1 | 5 | 85 | 15 | 20 | 0 | 50 | 10 |
257 | Smoothies & Shakes | McFlurry with Oreo Cookies (Snack) | 6.7 oz (190 g) | 340 | 100 | 11.0 | 17 | 6.0 | 29 | 0.0 | ... | 53 | 18 | 1 | 2 | 43 | 8 | 10 | 0 | 25 | 6 |
258 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Medium) | 14.2 oz (403 g) | 810 | 290 | 32.0 | 50 | 15.0 | 76 | 1.0 | ... | 114 | 38 | 2 | 9 | 103 | 21 | 20 | 0 | 60 | 6 |
259 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Snack) | 7.1 oz (202 g) | 410 | 150 | 16.0 | 25 | 8.0 | 38 | 0.0 | ... | 57 | 19 | 1 | 5 | 51 | 10 | 10 | 0 | 30 | 4 |
260 rows × 24 columns
df.columns
Index(['Category', 'Item', 'Serving Size', 'Calories', 'Calories from Fat', 'TotalFat', 'Total Fat (% Daily Value)', 'Saturated Fat', 'Saturated Fat (% Daily Value)', 'Trans Fat', 'Cholesterol', 'Cholesterol (% Daily Value)', 'Sodium', 'Sodium (% Daily Value)', 'Carbohydrates', 'Carbohydrates (% Daily Value)', 'Dietary Fiber', 'Dietary Fiber (% Daily Value)', 'Sugars', 'Protein', 'Vitamin A (% Daily Value)', 'Vitamin C (% Daily Value)', 'Calcium (% Daily Value)', 'Iron (% Daily Value)'], dtype='object')
Most columns have names (containing spaces etc.) that rule out attribute access, so they can only be selected with []
df['Cholesterol (% Daily Value)']
0      87
1       8
2      15
3      95
4      16
       ..
255    14
256    19
257     9
258    20
259    10
Name: Cholesterol (% Daily Value), Length: 260, dtype: int64
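To make the difference between the two notations concrete, here is a minimal sketch on a small made-up frame (the name menu and its values are illustrative assumptions; only the column names mirror the dataset):

```python
import pandas as pd

# Small made-up frame; only the column names mirror the dataset above.
menu = pd.DataFrame({
    "Calories": [300, 250],
    "Cholesterol (% Daily Value)": [87, 8],
})

# A name that is a valid Python identifier works with both notations.
same = menu.Calories.equals(menu["Calories"])

# A name with spaces or punctuation is only reachable through [].
cholesterol = menu["Cholesterol (% Daily Value)"]
print(same, cholesterol.tolist())
```

Bracket notation works for every column, so it is the safer default when names come straight from a file.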
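Series and DataFrame objects are mutable, i.e. the data they hold can be changed in place.
Assigning such an object to another variable does not create a copy; the new name refers to the same object.
Selecting a single column, as with seria2 below, is a slightly different case: in the pandas version used here the resulting Series still shares data with df, which is why writing to it later raises a SettingWithCopyWarning and also changes df.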
df2 = df
seria2 = df['Calories']
df2.head(3)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 rows × 24 columns
seria2.head(3)
0    300
1    250
2    370
Name: Calories, dtype: int64
Changing a cell value directly. A cell can be addressed in several ways: [column][row], .iloc[row_number, column_number], .loc[row_label, column_name]
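The three addressing styles can be compared side by side. This is a minimal sketch on a made-up two-row frame (menu and its values are assumptions for illustration); it reads the same cell three ways and then writes through .loc:

```python
import pandas as pd

menu = pd.DataFrame(
    {"Item": ["Egg McMuffin", "Egg White Delight"], "Calories": [300, 250]}
)

by_brackets = menu["Calories"][0]   # column first, then row label
by_iloc = menu.iloc[0, 1]           # integer positions: row 0, column 1
by_loc = menu.loc[0, "Calories"]    # row label 0, column name

# For writing, a single .loc or .iloc call is the safer choice, because
# chained indexing (menu["Calories"][0] = ...) may act on a temporary object.
menu.loc[0, "Calories"] = 301
print(by_brackets, by_iloc, by_loc, menu.loc[0, "Calories"])
```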
df.iloc[0,3] += 1
df.head(3)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 301 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 rows × 24 columns
df2.head(3)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 301 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 rows × 24 columns
seria2.head(3)
0    301
1    250
2    370
Name: Calories, dtype: int64
seria2.iloc[0] += 1
C:\Users\patcz\AppData\Local\Temp\ipykernel_7256\254415301.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  seria2.iloc[0] += 1
df.head(3)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 302 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 rows × 24 columns
df2.iloc[0,3] -= 2
df.head(3)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 rows × 24 columns
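The aliasing seen above can also be checked directly with Python's identity operator (is). The sketch below uses a made-up one-column frame (menu and alias are illustrative names). Whether a Series pulled out of a column still shares data with its DataFrame depends on the pandas version and its Copy-on-Write setting, so the example sticks to plain assignment of the whole object:

```python
import pandas as pd

menu = pd.DataFrame({"Calories": [300, 250, 370]})
alias = menu                      # a second name, not a second table

alias.iloc[0, 0] += 1             # write through the alias...
same_object = alias is menu       # both names point to one object
seen_via_menu = menu.iloc[0, 0]   # ...and the change is visible via menu
print(same_object, seen_via_menu)
```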
Deliberate copying
When the goal is to create a separate copy of the data in memory, use the copy method.
df3 = df.copy()
df3.head(3)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 rows × 24 columns
df.iloc[0,3] += 1
df.head(3)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 301 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 rows × 24 columns
df3.head(3)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 rows × 24 columns
df.iloc[0,3] = 300
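As a compact check of the same idea, here is a minimal sketch on a made-up frame (menu and snapshot are illustrative names). It relies on copy making a deep copy by default, so later writes to the original do not reach the copy:

```python
import pandas as pd

menu = pd.DataFrame({"Calories": [300, 250, 370]})
snapshot = menu.copy()            # deep copy by default

menu.iloc[0, 0] += 1              # modify the original only
print(menu.iloc[0, 0], snapshot.iloc[0, 0])
```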
As in NumPy, pandas makes it easy to modify the data of an entire Series at once.
df3.Calories *= 2
df3.head(5)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 600 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 500 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 740 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 900 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 800 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
5 rows × 24 columns
df3['Calories'] = df3['Calories'] - df3['Calories from Fat']
# df3['Calories'] -= df3['Calories from Fat']
df3.head(5)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 480 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 430 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 540 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 650 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 590 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
5 rows × 24 columns
Restoring the original data...
df3['Calories'] = df['Calories']
A new column can be created as the result of an operation on Series.
df3['Kalorie nie z tłuszczu'] = df3['Calories'] - df3['Calories from Fat']
df3.iloc[:5, [0,1,2,3,4,-1]]
Category | Item | Serving Size | Calories | Calories from Fat | Kalorie nie z tłuszczu | |
---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 180 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 180 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 170 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 200 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 190 |
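An alternative to assigning into the frame is the assign method, which returns a new DataFrame and leaves the original untouched. A minimal sketch on made-up data (menu, extended and the column name non_fat_calories are assumptions for illustration):

```python
import pandas as pd

menu = pd.DataFrame({"Calories": [300, 250], "Calories from Fat": [120, 70]})

# assign returns a new DataFrame with the extra column; menu itself is untouched.
extended = menu.assign(
    non_fat_calories=menu["Calories"] - menu["Calories from Fat"]
)
print(extended["non_fat_calories"].tolist())
print("non_fat_calories" in menu.columns)
```

This style is convenient when the original table should stay unchanged or when several steps are chained.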
Dropping columns
The drop operation, like many other pandas operations, works as follows:
- by default it returns a new DataFrame object (stored separately in memory, in a sense a copy of the data) and leaves the original object unchanged,
- with the parameter inplace=True it modifies the original object and returns nothing.
df3.drop(columns=['Saturated Fat', 'Saturated Fat (% Daily Value)']).head(3)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Trans Fat | Cholesterol | Cholesterol (% Daily Value) | ... | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | Kalorie nie z tłuszczu | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 0.0 | 260 | 87 | ... | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 | 180 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 0.0 | 25 | 8 | ... | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 | 180 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 0.0 | 45 | 15 | ... | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 | 170 |
3 rows × 23 columns
df3.head(3)
# no change visible
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | Kalorie nie z tłuszczu | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 | 180 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 | 180 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 | 170 |
3 rows × 25 columns
df3.drop(columns='Total Fat (% Daily Value)', inplace=True)
df3.head(3)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | Cholesterol | ... | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | Kalorie nie z tłuszczu | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 5.0 | 25 | 0.0 | 260 | ... | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 | 180 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 3.0 | 15 | 0.0 | 25 | ... | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 | 180 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 8.0 | 42 | 0.0 | 45 | ... | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 | 170 |
3 rows × 24 columns
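The two modes of drop can be contrasted in a few lines. This is a minimal sketch on a made-up one-row frame (all names and values are illustrative); note that the in-place call evaluates to None, so its result should not be assigned back:

```python
import pandas as pd

menu = pd.DataFrame({"Calories": [300], "Sugars": [3], "Protein": [17]})

without_sugars = menu.drop(columns=["Sugars"])        # new object, menu unchanged
result = menu.drop(columns="Protein", inplace=True)   # modifies menu, returns None

print(list(without_sugars.columns))
print(list(menu.columns), result)
```

A common mistake is writing menu = menu.drop(..., inplace=True), which replaces the table with None.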
Dropping rows
By index value.
df3.drop(index=259, inplace=True)
df3
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | Cholesterol | ... | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | Kalorie nie z tłuszczu | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 5.0 | 25 | 0.0 | 260 | ... | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 | 180 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 3.0 | 15 | 0.0 | 25 | ... | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 | 180 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 8.0 | 42 | 0.0 | 45 | ... | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 | 170 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 10.0 | 52 | 0.0 | 285 | ... | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 | 200 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 8.0 | 42 | 0.0 | 50 | ... | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 | 190 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
254 | Smoothies & Shakes | McFlurry with M&M’s Candies (Snack) | 7.3 oz (207 g) | 430 | 140 | 15.0 | 10.0 | 48 | 0.0 | 35 | ... | 21 | 1 | 4 | 59 | 9 | 10 | 0 | 30 | 4 | 290 |
255 | Smoothies & Shakes | McFlurry with Oreo Cookies (Small) | 10.1 oz (285 g) | 510 | 150 | 17.0 | 9.0 | 44 | 0.5 | 45 | ... | 27 | 1 | 4 | 64 | 12 | 15 | 0 | 40 | 8 | 360 |
256 | Smoothies & Shakes | McFlurry with Oreo Cookies (Medium) | 13.4 oz (381 g) | 690 | 200 | 23.0 | 12.0 | 58 | 1.0 | 55 | ... | 35 | 1 | 5 | 85 | 15 | 20 | 0 | 50 | 10 | 490 |
257 | Smoothies & Shakes | McFlurry with Oreo Cookies (Snack) | 6.7 oz (190 g) | 340 | 100 | 11.0 | 6.0 | 29 | 0.0 | 30 | ... | 18 | 1 | 2 | 43 | 8 | 10 | 0 | 25 | 6 | 240 |
258 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Medium) | 14.2 oz (403 g) | 810 | 290 | 32.0 | 15.0 | 76 | 1.0 | 60 | ... | 38 | 2 | 9 | 103 | 21 | 20 | 0 | 60 | 6 | 520 |
259 rows × 24 columns
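Because drop works on index labels rather than positions, the remaining rows keep their original labels. A minimal sketch on made-up data (menu, trimmed and renumbered are illustrative names) that also shows reset_index as one way to renumber afterwards:

```python
import pandas as pd

menu = pd.DataFrame({"Item": ["A", "B", "C", "D"], "Calories": [300, 250, 370, 450]})

trimmed = menu.drop(index=[1, 3])            # labels 1 and 3, not positions
print(trimmed.index.tolist())                # remaining labels keep their values

renumbered = trimmed.reset_index(drop=True)  # fresh 0..n-1 labels
print(renumbered.index.tolist())
```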
Renaming columns
To change one or several names, use the rename method and pass a dictionary mapping old names to new ones.
This operation supports the inplace parameter: by default it returns a new table, and with inplace=True it renames the columns of the original table.
df3.rename(columns={'Category': 'Kategoria', 'Calories': 'Kalorie', 'Item': 'Produkt'}, inplace=True)
df3.head(1)
Kategoria | Produkt | Serving Size | Kalorie | Calories from Fat | TotalFat | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | Cholesterol | ... | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | Kalorie nie z tłuszczu | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 5.0 | 25 | 0.0 | 260 | ... | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 | 180 |
1 rows × 24 columns
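Two further details of rename are worth a short sketch, here on a made-up frame (menu, renamed and lowered are illustrative names): keys that match no column are ignored by default, and a function can be passed instead of a dictionary:

```python
import pandas as pd

menu = pd.DataFrame({"Category": ["Breakfast"], "Item": ["Egg McMuffin"]})

# "Missing" matches no column and is silently skipped with default settings.
renamed = menu.rename(columns={"Category": "Kategoria", "Missing": "X"})

# A callable is applied to every column name.
lowered = menu.rename(columns=str.lower)

print(list(renamed.columns), list(lowered.columns), list(menu.columns))
```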
New names for all columns can also be supplied as a list or a Series.
To demonstrate, I take a small slice of df:
df4 = df[['Category', 'Item', 'Serving Size']]
df4.head(1)
Category | Item | Serving Size | |
---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) |
df4.columns
Index(['Category', 'Item', 'Serving Size'], dtype='object')
df4.columns = ['Kategoria', 'Produkt', 'Rozmiar']
df4.head(1)
Kategoria | Produkt | Rozmiar | |
---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) |
To rename columns across the whole DataFrame according to some rule, e.g. replacing spaces with _ characters, we can:
- Generate the new names, e.g. with a list comprehension, and assign them to columns as shown above to replace all names at once.
[nazwa.lower().replace(' ', '_') for nazwa in df3.columns]
['kategoria', 'produkt', 'serving_size', 'kalorie', 'calories_from_fat', 'totalfat', 'saturated_fat', 'saturated_fat_(%_daily_value)', 'trans_fat', 'cholesterol', 'cholesterol_(%_daily_value)', 'sodium', 'sodium_(%_daily_value)', 'carbohydrates', 'carbohydrates_(%_daily_value)', 'dietary_fiber', 'dietary_fiber_(%_daily_value)', 'sugars', 'protein', 'vitamin_a_(%_daily_value)', 'vitamin_c_(%_daily_value)', 'calcium_(%_daily_value)', 'iron_(%_daily_value)', 'kalorie_nie_z_tłuszczu']
[nazwa.lower()
.replace(' ', '_')
.replace('%', 'percent')
.replace('(', '')
.replace(')', '')
for nazwa in df3.columns]
['kategoria', 'produkt', 'serving_size', 'kalorie', 'calories_from_fat', 'totalfat', 'saturated_fat', 'saturated_fat_percent_daily_value', 'trans_fat', 'cholesterol', 'cholesterol_percent_daily_value', 'sodium', 'sodium_percent_daily_value', 'carbohydrates', 'carbohydrates_percent_daily_value', 'dietary_fiber', 'dietary_fiber_percent_daily_value', 'sugars', 'protein', 'vitamin_a_percent_daily_value', 'vitamin_c_percent_daily_value', 'calcium_percent_daily_value', 'iron_percent_daily_value', 'kalorie_nie_z_tłuszczu']
df3.columns = [nazwa.lower()
.replace(' ', '_')
.replace('%', 'percent')
.replace('(', '')
.replace(')', '')
for nazwa in df3.columns]
df3.head(1)
kategoria | produkt | serving_size | kalorie | calories_from_fat | totalfat | saturated_fat | saturated_fat_percent_daily_value | trans_fat | cholesterol | ... | carbohydrates_percent_daily_value | dietary_fiber | dietary_fiber_percent_daily_value | sugars | protein | vitamin_a_percent_daily_value | vitamin_c_percent_daily_value | calcium_percent_daily_value | iron_percent_daily_value | kalorie_nie_z_tłuszczu | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 5.0 | 25 | 0.0 | 260 | ... | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 | 180 |
1 rows × 24 columns
df3.saturated_fat_percent_daily_value.mean()
29.934362934362934
- (If the operation is simple enough) use the str accessor to apply string operations to every element of a Series at once (without writing a loop).
df5 = df.copy()
df5.columns
Index(['Category', 'Item', 'Serving Size', 'Calories', 'Calories from Fat', 'TotalFat', 'Total Fat (% Daily Value)', 'Saturated Fat', 'Saturated Fat (% Daily Value)', 'Trans Fat', 'Cholesterol', 'Cholesterol (% Daily Value)', 'Sodium', 'Sodium (% Daily Value)', 'Carbohydrates', 'Carbohydrates (% Daily Value)', 'Dietary Fiber', 'Dietary Fiber (% Daily Value)', 'Sugars', 'Protein', 'Vitamin A (% Daily Value)', 'Vitamin C (% Daily Value)', 'Calcium (% Daily Value)', 'Iron (% Daily Value)'], dtype='object')
df5.columns.str.upper()
Index(['CATEGORY', 'ITEM', 'SERVING SIZE', 'CALORIES', 'CALORIES FROM FAT', 'TOTALFAT', 'TOTAL FAT (% DAILY VALUE)', 'SATURATED FAT', 'SATURATED FAT (% DAILY VALUE)', 'TRANS FAT', 'CHOLESTEROL', 'CHOLESTEROL (% DAILY VALUE)', 'SODIUM', 'SODIUM (% DAILY VALUE)', 'CARBOHYDRATES', 'CARBOHYDRATES (% DAILY VALUE)', 'DIETARY FIBER', 'DIETARY FIBER (% DAILY VALUE)', 'SUGARS', 'PROTEIN', 'VITAMIN A (% DAILY VALUE)', 'VITAMIN C (% DAILY VALUE)', 'CALCIUM (% DAILY VALUE)', 'IRON (% DAILY VALUE)'], dtype='object')
(df5.columns.str.replace(' ', '_')
.str.replace('%', 'percent')
.str.replace('(', '')
.str.replace(')', ''))
Index(['Category', 'Item', 'Serving_Size', 'Calories', 'Calories_from_Fat', 'TotalFat', 'Total_Fat_percent_Daily_Value', 'Saturated_Fat', 'Saturated_Fat_percent_Daily_Value', 'Trans_Fat', 'Cholesterol', 'Cholesterol_percent_Daily_Value', 'Sodium', 'Sodium_percent_Daily_Value', 'Carbohydrates', 'Carbohydrates_percent_Daily_Value', 'Dietary_Fiber', 'Dietary_Fiber_percent_Daily_Value', 'Sugars', 'Protein', 'Vitamin_A_percent_Daily_Value', 'Vitamin_C_percent_Daily_Value', 'Calcium_percent_Daily_Value', 'Iron_percent_Daily_Value'], dtype='object')
df5.columns = df5.columns.str.replace(' ', '_').str.replace('%', 'Percent').str.replace('(', '').str.replace(')', '')
df5.head(2)
Category | Item | Serving_Size | Calories | Calories_from_Fat | TotalFat | Total_Fat_Percent_Daily_Value | Saturated_Fat | Saturated_Fat_Percent_Daily_Value | Trans_Fat | ... | Carbohydrates | Carbohydrates_Percent_Daily_Value | Dietary_Fiber | Dietary_Fiber_Percent_Daily_Value | Sugars | Protein | Vitamin_A_Percent_Daily_Value | Vitamin_C_Percent_Daily_Value | Calcium_Percent_Daily_Value | Iron_Percent_Daily_Value | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 rows × 24 columns
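The default of the regex parameter of str.replace has differed between pandas versions, and characters such as ( are special in regular expressions. A minimal sketch, assuming literal replacement is what is wanted, passes regex=False explicitly (the empty frame menu and its two column names are made up for illustration):

```python
import pandas as pd

menu = pd.DataFrame(columns=["Serving Size", "Total Fat (% Daily Value)"])

# Each step treats its pattern as plain text, never as a regular expression.
cleaned = (
    menu.columns.str.lower()
    .str.replace(" ", "_", regex=False)
    .str.replace("%", "percent", regex=False)
    .str.replace("(", "", regex=False)
    .str.replace(")", "", regex=False)
)
menu.columns = cleaned
print(list(menu.columns))
```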
Modifying data with dedicated operations
The dedicated operations are:
- replace - replaces specific values with others, like Search & Replace in editors
- fillna - a dedicated variant for replacing empty values, covered in another notebook
- apply - applies an arbitrary function written in Python; a very general, "programmatic" approach
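As a quick orientation, the three operations can be tried on a tiny made-up Series (sizes and its values are assumptions for illustration, not data from the file):

```python
import pandas as pd

sizes = pd.Series(["Small", None, "Large"])

replaced = sizes.replace("Small", "S")   # exact-value substitution
filled = sizes.fillna("Unknown")         # fill missing entries
lengths = filled.apply(len)              # any Python function, element by element

print(replaced.tolist(), filled.tolist(), lengths.tolist())
```

All three calls return new objects here; sizes itself is left unchanged.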
replace
In its basic form, replace swaps cells holding a given value for another value. Called this way it applies to the whole DataFrame, but only to cells whose entire content equals that value exactly (it does not touch fragments of longer text).
This is another operation with an inplace option: calls like the ones below return new tables and do not modify the original.
df
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
255 | Smoothies & Shakes | McFlurry with Oreo Cookies (Small) | 10.1 oz (285 g) | 510 | 150 | 17.0 | 26 | 9.0 | 44 | 0.5 | ... | 80 | 27 | 1 | 4 | 64 | 12 | 15 | 0 | 40 | 8 |
256 | Smoothies & Shakes | McFlurry with Oreo Cookies (Medium) | 13.4 oz (381 g) | 690 | 200 | 23.0 | 35 | 12.0 | 58 | 1.0 | ... | 106 | 35 | 1 | 5 | 85 | 15 | 20 | 0 | 50 | 10 |
257 | Smoothies & Shakes | McFlurry with Oreo Cookies (Snack) | 6.7 oz (190 g) | 340 | 100 | 11.0 | 17 | 6.0 | 29 | 0.0 | ... | 53 | 18 | 1 | 2 | 43 | 8 | 10 | 0 | 25 | 6 |
258 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Medium) | 14.2 oz (403 g) | 810 | 290 | 32.0 | 50 | 15.0 | 76 | 1.0 | ... | 114 | 38 | 2 | 9 | 103 | 21 | 20 | 0 | 60 | 6 |
259 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Snack) | 7.1 oz (202 g) | 410 | 150 | 16.0 | 25 | 8.0 | 38 | 0.0 | ... | 57 | 19 | 1 | 5 | 51 | 10 | 10 | 0 | 30 | 4 |
260 rows × 24 columns
df.replace('Breakfast', 'Śniadanie')
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Śniadanie | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Śniadanie | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Śniadanie | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Śniadanie | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Śniadanie | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
255 | Smoothies & Shakes | McFlurry with Oreo Cookies (Small) | 10.1 oz (285 g) | 510 | 150 | 17.0 | 26 | 9.0 | 44 | 0.5 | ... | 80 | 27 | 1 | 4 | 64 | 12 | 15 | 0 | 40 | 8 |
256 | Smoothies & Shakes | McFlurry with Oreo Cookies (Medium) | 13.4 oz (381 g) | 690 | 200 | 23.0 | 35 | 12.0 | 58 | 1.0 | ... | 106 | 35 | 1 | 5 | 85 | 15 | 20 | 0 | 50 | 10 |
257 | Smoothies & Shakes | McFlurry with Oreo Cookies (Snack) | 6.7 oz (190 g) | 340 | 100 | 11.0 | 17 | 6.0 | 29 | 0.0 | ... | 53 | 18 | 1 | 2 | 43 | 8 | 10 | 0 | 25 | 6 |
258 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Medium) | 14.2 oz (403 g) | 810 | 290 | 32.0 | 50 | 15.0 | 76 | 1.0 | ... | 114 | 38 | 2 | 9 | 103 | 21 | 20 | 0 | 60 | 6 |
259 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Snack) | 7.1 oz (202 g) | 410 | 150 | 16.0 | 25 | 8.0 | 38 | 0.0 | ... | 57 | 19 | 1 | 5 | 51 | 10 | 10 | 0 | 30 | 4 |
260 rows × 24 columns
df.replace(35, 999)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 999 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 999 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
255 | Smoothies & Shakes | McFlurry with Oreo Cookies (Small) | 10.1 oz (285 g) | 510 | 150 | 17.0 | 26 | 9.0 | 44 | 0.5 | ... | 80 | 27 | 1 | 4 | 64 | 12 | 15 | 0 | 40 | 8 |
256 | Smoothies & Shakes | McFlurry with Oreo Cookies (Medium) | 13.4 oz (381 g) | 690 | 200 | 23.0 | 999 | 12.0 | 58 | 1.0 | ... | 106 | 999 | 1 | 5 | 85 | 15 | 20 | 0 | 50 | 10 |
257 | Smoothies & Shakes | McFlurry with Oreo Cookies (Snack) | 6.7 oz (190 g) | 340 | 100 | 11.0 | 17 | 6.0 | 29 | 0.0 | ... | 53 | 18 | 1 | 2 | 43 | 8 | 10 | 0 | 25 | 6 |
258 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Medium) | 14.2 oz (403 g) | 810 | 290 | 32.0 | 50 | 15.0 | 76 | 1.0 | ... | 114 | 38 | 2 | 9 | 103 | 21 | 20 | 0 | 60 | 6 |
259 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Snack) | 7.1 oz (202 g) | 410 | 150 | 16.0 | 25 | 8.0 | 38 | 0.0 | ... | 57 | 19 | 1 | 5 | 51 | 10 | 10 | 0 | 30 | 4 |
260 rows × 24 columns
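In the output above, 999 appears in row 256 in both Total Fat (% Daily Value) and Carbohydrates (% Daily Value): a scalar replace on the whole DataFrame touches every column that contains the value. A minimal sketch, assuming a small made-up frame (demo and its numbers are illustrative, not taken from the file), of counting the matches per column before replacing:

```python
import pandas as pd

# Made-up miniature of the menu data (values are illustrative only)
demo = pd.DataFrame({
    'Total Fat (% Daily Value)': [20, 35, 43],
    'Carbohydrates (% Daily Value)': [10, 35, 35],
    'Protein': [17, 14, 21],
})

# Count the cells equal to 35 in each column
matches = (demo == 35).sum()

# The frame-wide replace changes all of those cells at once
replaced = demo.replace(35, 999)
```

Checking matches first makes it easier to notice when a value would also be replaced in columns where that was not intended.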
When replace is called on a Series, the result is a Series.
df['Total Fat (% Daily Value)'].replace(35, 999)
0       20
1       12
2      999
3       43
4      999
      ...
255     26
256    999
257     17
258     50
259     25
Name: Total Fat (% Daily Value), Length: 260, dtype: int64
To replace data in a single column of a DataFrame, you can, for example, assign the result back to that column:
df5['Total_Fat_Percent_Daily_Value'] = df5['Total_Fat_Percent_Daily_Value'].replace(35, 999)
df5
Category | Item | Serving_Size | Calories | Calories_from_Fat | TotalFat | Total_Fat_Percent_Daily_Value | Saturated_Fat | Saturated_Fat_Percent_Daily_Value | Trans_Fat | ... | Carbohydrates | Carbohydrates_Percent_Daily_Value | Dietary_Fiber | Dietary_Fiber_Percent_Daily_Value | Sugars | Protein | Vitamin_A_Percent_Daily_Value | Vitamin_C_Percent_Daily_Value | Calcium_Percent_Daily_Value | Iron_Percent_Daily_Value | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 999 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 999 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
255 | Smoothies & Shakes | McFlurry with Oreo Cookies (Small) | 10.1 oz (285 g) | 510 | 150 | 17.0 | 26 | 9.0 | 44 | 0.5 | ... | 80 | 27 | 1 | 4 | 64 | 12 | 15 | 0 | 40 | 8 |
256 | Smoothies & Shakes | McFlurry with Oreo Cookies (Medium) | 13.4 oz (381 g) | 690 | 200 | 23.0 | 999 | 12.0 | 58 | 1.0 | ... | 106 | 35 | 1 | 5 | 85 | 15 | 20 | 0 | 50 | 10 |
257 | Smoothies & Shakes | McFlurry with Oreo Cookies (Snack) | 6.7 oz (190 g) | 340 | 100 | 11.0 | 17 | 6.0 | 29 | 0.0 | ... | 53 | 18 | 1 | 2 | 43 | 8 | 10 | 0 | 25 | 6 |
258 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Medium) | 14.2 oz (403 g) | 810 | 290 | 32.0 | 50 | 15.0 | 76 | 1.0 | ... | 114 | 38 | 2 | 9 | 103 | 21 | 20 | 0 | 60 | 6 |
259 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Snack) | 7.1 oz (202 g) | 410 | 150 | 16.0 | 25 | 8.0 | 38 | 0.0 | ... | 57 | 19 | 1 | 5 | 51 | 10 | 10 | 0 | 30 | 4 |
260 rows × 24 columns
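The assignment is needed because replace returns a new object and leaves the original unchanged. A minimal sketch, assuming a small made-up frame (the names demo and Fat are illustrative):

```python
import pandas as pd

# Made-up frame standing in for the menu data
demo = pd.DataFrame({'Item': ['A', 'B', 'C'], 'Fat': [20, 35, 35]})

# replace returns a new Series; demo itself is not modified
replaced = demo['Fat'].replace(35, 999)
before_assignment = demo['Fat'].tolist()

# Assigning the result back stores the change in the frame
demo['Fat'] = replaced
after_assignment = demo['Fat'].tolist()
```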
replace supports several calling modes: there are various ways to specify which values should be replaced in which columns. The functionality is extensive, so we will look only at selected options.
Several different input values are replaced with the same output value:
df.replace(['Desserts', 'Smoothies & Shakes'], 'Desery')
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
255 | Desery | McFlurry with Oreo Cookies (Small) | 10.1 oz (285 g) | 510 | 150 | 17.0 | 26 | 9.0 | 44 | 0.5 | ... | 80 | 27 | 1 | 4 | 64 | 12 | 15 | 0 | 40 | 8 |
256 | Desery | McFlurry with Oreo Cookies (Medium) | 13.4 oz (381 g) | 690 | 200 | 23.0 | 35 | 12.0 | 58 | 1.0 | ... | 106 | 35 | 1 | 5 | 85 | 15 | 20 | 0 | 50 | 10 |
257 | Desery | McFlurry with Oreo Cookies (Snack) | 6.7 oz (190 g) | 340 | 100 | 11.0 | 17 | 6.0 | 29 | 0.0 | ... | 53 | 18 | 1 | 2 | 43 | 8 | 10 | 0 | 25 | 6 |
258 | Desery | McFlurry with Reese's Peanut Butter Cups (Medium) | 14.2 oz (403 g) | 810 | 290 | 32.0 | 50 | 15.0 | 76 | 1.0 | ... | 114 | 38 | 2 | 9 | 103 | 21 | 20 | 0 | 60 | 6 |
259 | Desery | McFlurry with Reese's Peanut Butter Cups (Snack) | 7.1 oz (202 g) | 410 | 150 | 16.0 | 25 | 8.0 | 38 | 0.0 | ... | 57 | 19 | 1 | 5 | 51 | 10 | 10 | 0 | 30 | 4 |
260 rows × 24 columns
You can also pass two lists. The first element of the first list is then replaced with the first element of the second list, the second with the second, and so on: a one-to-one mapping.
df.replace(['Breakfast', 'Desserts', 'Smoothies & Shakes'], ['Śniadanie', 'Deser', 'Szejk'])
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Śniadanie | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Śniadanie | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Śniadanie | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Śniadanie | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Śniadanie | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
255 | Szejk | McFlurry with Oreo Cookies (Small) | 10.1 oz (285 g) | 510 | 150 | 17.0 | 26 | 9.0 | 44 | 0.5 | ... | 80 | 27 | 1 | 4 | 64 | 12 | 15 | 0 | 40 | 8 |
256 | Szejk | McFlurry with Oreo Cookies (Medium) | 13.4 oz (381 g) | 690 | 200 | 23.0 | 35 | 12.0 | 58 | 1.0 | ... | 106 | 35 | 1 | 5 | 85 | 15 | 20 | 0 | 50 | 10 |
257 | Szejk | McFlurry with Oreo Cookies (Snack) | 6.7 oz (190 g) | 340 | 100 | 11.0 | 17 | 6.0 | 29 | 0.0 | ... | 53 | 18 | 1 | 2 | 43 | 8 | 10 | 0 | 25 | 6 |
258 | Szejk | McFlurry with Reese's Peanut Butter Cups (Medium) | 14.2 oz (403 g) | 810 | 290 | 32.0 | 50 | 15.0 | 76 | 1.0 | ... | 114 | 38 | 2 | 9 | 103 | 21 | 20 | 0 | 60 | 6 |
259 | Szejk | McFlurry with Reese's Peanut Butter Cups (Snack) | 7.1 oz (202 g) | 410 | 150 | 16.0 | 25 | 8.0 | 38 | 0.0 | ... | 57 | 19 | 1 | 5 | 51 | 10 | 10 | 0 | 30 | 4 |
260 rows × 24 columns
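One thing to watch with list-based replacement: a value in the first list that matches no cell is skipped without any error, so a misspelled name goes unnoticed. A minimal sketch, assuming a made-up one-column frame:

```python
import pandas as pd

# Made-up frame with three category names
demo = pd.DataFrame({'Category': ['Breakfast', 'Desserts', 'Smoothies & Shakes']})

# 'Dessert' (without the final s) matches no cell,
# so the 'Desserts' row is left as it was and no error is raised
result = demo.replace(['Breakfast', 'Dessert'], ['Śniadanie', 'Deser'])
categories = result['Category'].tolist()
```

Comparing the unique values of the column before and after the call is a simple way to confirm that every intended replacement took place.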
When we pass a dictionary as the first argument together with a replacement value, each key in the dictionary is treated as the name of a column in which the replacement is made.
df.replace({
    'Category': ['Desserts', 'Smoothies & Shakes'],
    'Item': ['McFlurry with Oreo Cookies (Small)'],
}, 'Deser')
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
255 | Deser | Deser | 10.1 oz (285 g) | 510 | 150 | 17.0 | 26 | 9.0 | 44 | 0.5 | ... | 80 | 27 | 1 | 4 | 64 | 12 | 15 | 0 | 40 | 8 |
256 | Deser | McFlurry with Oreo Cookies (Medium) | 13.4 oz (381 g) | 690 | 200 | 23.0 | 35 | 12.0 | 58 | 1.0 | ... | 106 | 35 | 1 | 5 | 85 | 15 | 20 | 0 | 50 | 10 |
257 | Deser | McFlurry with Oreo Cookies (Snack) | 6.7 oz (190 g) | 340 | 100 | 11.0 | 17 | 6.0 | 29 | 0.0 | ... | 53 | 18 | 1 | 2 | 43 | 8 | 10 | 0 | 25 | 6 |
258 | Deser | McFlurry with Reese's Peanut Butter Cups (Medium) | 14.2 oz (403 g) | 810 | 290 | 32.0 | 50 | 15.0 | 76 | 1.0 | ... | 114 | 38 | 2 | 9 | 103 | 21 | 20 | 0 | 60 | 6 |
259 | Deser | McFlurry with Reese's Peanut Butter Cups (Snack) | 7.1 oz (202 g) | 410 | 150 | 16.0 | 25 | 8.0 | 38 | 0.0 | ... | 57 | 19 | 1 | 5 | 51 | 10 | 10 | 0 | 30 | 4 |
260 rows × 24 columns
df.replace({
    'Category': ['Breakfast', 'Desserts', 'Smoothies & Shakes'],
    'Item': ['McFlurry with Oreo Cookies (Small)'],
},
{
    'Category': ['Śniadanie', 'Desery', 'Smufi'],
    'Item': ['Oreo'],
})
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Śniadanie | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Śniadanie | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Śniadanie | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Śniadanie | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Śniadanie | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
255 | Smufi | Oreo | 10.1 oz (285 g) | 510 | 150 | 17.0 | 26 | 9.0 | 44 | 0.5 | ... | 80 | 27 | 1 | 4 | 64 | 12 | 15 | 0 | 40 | 8 |
256 | Smufi | McFlurry with Oreo Cookies (Medium) | 13.4 oz (381 g) | 690 | 200 | 23.0 | 35 | 12.0 | 58 | 1.0 | ... | 106 | 35 | 1 | 5 | 85 | 15 | 20 | 0 | 50 | 10 |
257 | Smufi | McFlurry with Oreo Cookies (Snack) | 6.7 oz (190 g) | 340 | 100 | 11.0 | 17 | 6.0 | 29 | 0.0 | ... | 53 | 18 | 1 | 2 | 43 | 8 | 10 | 0 | 25 | 6 |
258 | Smufi | McFlurry with Reese's Peanut Butter Cups (Medium) | 14.2 oz (403 g) | 810 | 290 | 32.0 | 50 | 15.0 | 76 | 1.0 | ... | 114 | 38 | 2 | 9 | 103 | 21 | 20 | 0 | 60 | 6 |
259 | Smufi | McFlurry with Reese's Peanut Butter Cups (Snack) | 7.1 oz (202 g) | 410 | 150 | 16.0 | 25 | 8.0 | 38 | 0.0 | ... | 57 | 19 | 1 | 5 | 51 | 10 | 10 | 0 | 30 | 4 |
260 rows × 24 columns
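The pandas documentation also describes a nested dictionary form, in which the outer key is a column name and the inner dictionary maps old values to new ones. A minimal sketch, assuming a made-up frame in which the text Breakfast deliberately appears in both columns:

```python
import pandas as pd

# Made-up frame; 'Breakfast' appears in both columns on purpose
demo = pd.DataFrame({
    'Category': ['Breakfast', 'Desserts'],
    'Item': ['Breakfast', 'Hot Fudge Sundae'],
})

# Outer key: column name; inner dict: old value -> new value
result = demo.replace({
    'Category': {'Breakfast': 'Śniadanie', 'Desserts': 'Desery'},
})
```

Only the Category column is changed; the identical text in Item stays as it was.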
df.head(5)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
5 rows × 24 columns
df.replace('Egg', 'Jajo').head(5)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
5 rows × 24 columns
By default, replace compares entire cell values, which is why replacing 'Egg' above changed nothing. The regex=True option means that:
- fragments of text are replaced with the given substitutes,
- the first argument is treated as a regular expression, against which the text values in the table are matched.
df.replace('Egg', 'Jajo', regex=True).head(5)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Jajo McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Jajo White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Jajo | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Jajo Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
5 rows × 24 columns
df.replace('g', 'gram', regex=True)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egramgram McMuffin | 4.8 oz (136 gram) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egramgram White Deligramht | 4.8 oz (135 gram) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausagrame McMuffin | 3.9 oz (111 gram) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausagrame McMuffin with Egramgram | 5.7 oz (161 gram) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausagrame McMuffin with Egramgram Whites | 5.7 oz (161 gram) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
255 | Smoothies & Shakes | McFlurry with Oreo Cookies (Small) | 10.1 oz (285 gram) | 510 | 150 | 17.0 | 26 | 9.0 | 44 | 0.5 | ... | 80 | 27 | 1 | 4 | 64 | 12 | 15 | 0 | 40 | 8 |
256 | Smoothies & Shakes | McFlurry with Oreo Cookies (Medium) | 13.4 oz (381 gram) | 690 | 200 | 23.0 | 35 | 12.0 | 58 | 1.0 | ... | 106 | 35 | 1 | 5 | 85 | 15 | 20 | 0 | 50 | 10 |
257 | Smoothies & Shakes | McFlurry with Oreo Cookies (Snack) | 6.7 oz (190 gram) | 340 | 100 | 11.0 | 17 | 6.0 | 29 | 0.0 | ... | 53 | 18 | 1 | 2 | 43 | 8 | 10 | 0 | 25 | 6 |
258 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Medium) | 14.2 oz (403 gram) | 810 | 290 | 32.0 | 50 | 15.0 | 76 | 1.0 | ... | 114 | 38 | 2 | 9 | 103 | 21 | 20 | 0 | 60 | 6 |
259 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Snack) | 7.1 oz (202 gram) | 410 | 150 | 16.0 | 25 | 8.0 | 38 | 0.0 | ... | 57 | 19 | 1 | 5 | 51 | 10 | 10 | 0 | 30 | 4 |
260 rows × 24 columns
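The output above shows the side effect of a bare pattern: every g is replaced, including those inside Egg and Sausage. One possible way to limit the match to a standalone g is the word-boundary marker \b. A minimal sketch on a made-up two-row frame:

```python
import pandas as pd

# Two made-up rows in the shape of the menu data
demo = pd.DataFrame({
    'Item': ['Egg McMuffin', 'Sausage McMuffin'],
    'Serving Size': ['4.8 oz (136 g)', '3.9 oz (111 g)'],
})

# \b matches only at a word boundary, so a g inside a word is skipped
result = demo.replace(r'\bg\b', 'gram', regex=True)
```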
Examples of simple regular expressions
^ denotes the start of the text. Only those occurrences of Egg that are at the very beginning of a cell are replaced.
df.replace('^Egg', 'Jajo', regex=True).head(5)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Jajo McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Jajo White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
5 rows × 24 columns
$ denotes the end of the text. Only an occurrence of Egg at the very end of a cell is replaced.
df.replace('Egg$', 'Jajo', regex=True).head(5)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Jajo | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
5 rows × 24 columns
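The two anchors can also be combined: ^Egg$ matches only a cell whose entire content is Egg. A minimal sketch comparing the three variants on a made-up Series:

```python
import pandas as pd

# Made-up item names, including one cell that is exactly 'Egg'
items = pd.Series(['Egg', 'Egg McMuffin', 'Sausage McMuffin with Egg'])

start_only = items.replace('^Egg', 'Jajo', regex=True)   # Egg at the start of a cell
end_only = items.replace('Egg$', 'Jajo', regex=True)     # Egg at the end of a cell
whole_cell = items.replace('^Egg$', 'Jajo', regex=True)  # the cell is exactly Egg
```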
\w denotes a "word character", which covers letters, digits and the underscore _. \w+ denotes a non-empty sequence of such characters. Here we are looking for words that begin with Mc.
The letter r in front of a string literal in Python marks it as a raw string: the \ characters inside it have no special meaning to Python itself. They keep their special meaning in the regular-expression language.
df.replace(r'Mc\w+', 'Makcoś', regex=True)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg Makcoś | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage Makcoś | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage Makcoś with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage Makcoś with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
255 | Smoothies & Shakes | Makcoś with Oreo Cookies (Small) | 10.1 oz (285 g) | 510 | 150 | 17.0 | 26 | 9.0 | 44 | 0.5 | ... | 80 | 27 | 1 | 4 | 64 | 12 | 15 | 0 | 40 | 8 |
256 | Smoothies & Shakes | Makcoś with Oreo Cookies (Medium) | 13.4 oz (381 g) | 690 | 200 | 23.0 | 35 | 12.0 | 58 | 1.0 | ... | 106 | 35 | 1 | 5 | 85 | 15 | 20 | 0 | 50 | 10 |
257 | Smoothies & Shakes | Makcoś with Oreo Cookies (Snack) | 6.7 oz (190 g) | 340 | 100 | 11.0 | 17 | 6.0 | 29 | 0.0 | ... | 53 | 18 | 1 | 2 | 43 | 8 | 10 | 0 | 25 | 6 |
258 | Smoothies & Shakes | Makcoś with Reese's Peanut Butter Cups (Medium) | 14.2 oz (403 g) | 810 | 290 | 32.0 | 50 | 15.0 | 76 | 1.0 | ... | 114 | 38 | 2 | 9 | 103 | 21 | 20 | 0 | 60 | 6 |
259 | Smoothies & Shakes | Makcoś with Reese's Peanut Butter Cups (Snack) | 7.1 oz (202 g) | 410 | 150 | 16.0 | 25 | 8.0 | 38 | 0.0 | ... | 57 | 19 | 1 | 5 | 51 | 10 | 10 | 0 | 30 | 4 |
260 rows × 24 columns
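The effect of the r prefix can be checked directly in Python. A minimal sketch using the standard re module (the sample sentence is made up):

```python
import re

# A raw string and a string with a doubled backslash are the same pattern
pattern_raw = r'Mc\w+'
pattern_escaped = 'Mc\\w+'
same_pattern = pattern_raw == pattern_escaped

# Without r, Python turns \n into a single newline character;
# with r, the backslash and the letter n remain two characters
plain_length = len('\n')
raw_length = len(r'\n')

# Words beginning with Mc in a made-up sentence
words = re.findall(pattern_raw, 'Sausage McMuffin with Egg, McFlurry (Small)')
```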
Using "groups" in regular expressions, you can:
- mark a fragment of the pattern with parentheses (on the "left-hand side" of replace),
- insert the text that matched those parentheses on the "right-hand side", using \1, \2, ...
df.replace(r'Mc(\w+)', r'Mak\1', regex=True)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg MakMuffin | 4.8 oz (136 g) | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage MakMuffin | 3.9 oz (111 g) | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage MakMuffin with Egg | 5.7 oz (161 g) | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage MakMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
255 | Smoothies & Shakes | MakFlurry with Oreo Cookies (Small) | 10.1 oz (285 g) | 510 | 150 | 17.0 | 26 | 9.0 | 44 | 0.5 | ... | 80 | 27 | 1 | 4 | 64 | 12 | 15 | 0 | 40 | 8 |
256 | Smoothies & Shakes | MakFlurry with Oreo Cookies (Medium) | 13.4 oz (381 g) | 690 | 200 | 23.0 | 35 | 12.0 | 58 | 1.0 | ... | 106 | 35 | 1 | 5 | 85 | 15 | 20 | 0 | 50 | 10 |
257 | Smoothies & Shakes | MakFlurry with Oreo Cookies (Snack) | 6.7 oz (190 g) | 340 | 100 | 11.0 | 17 | 6.0 | 29 | 0.0 | ... | 53 | 18 | 1 | 2 | 43 | 8 | 10 | 0 | 25 | 6 |
258 | Smoothies & Shakes | MakFlurry with Reese's Peanut Butter Cups (Med... | 14.2 oz (403 g) | 810 | 290 | 32.0 | 50 | 15.0 | 76 | 1.0 | ... | 114 | 38 | 2 | 9 | 103 | 21 | 20 | 0 | 60 | 6 |
259 | Smoothies & Shakes | MakFlurry with Reese's Peanut Butter Cups (Snack) | 7.1 oz (202 g) | 410 | 150 | 16.0 | 25 | 8.0 | 38 | 0.0 | ... | 57 | 19 | 1 | 5 | 51 | 10 | 10 | 0 | 30 | 4 |
260 rows × 24 columns
seria = pd.Series(['Ala ma kota', 'Ola ma psa', 'Adam ma rybki'])
seria
0      Ala ma kota
1       Ola ma psa
2    Adam ma rybki
dtype: object
seria.replace(r'(\w+) ma (\w+)', r'Osoba \1 posiada zwierzę \2', regex=True)
0      Osoba Ala posiada zwierzę kota
1       Osoba Ola posiada zwierzę psa
2    Osoba Adam posiada zwierzę rybki
dtype: object
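Group references do not have to appear in the same order as the groups in the pattern. A minimal sketch that swaps the two captured words (the arrow notation in the replacement is only an illustration):

```python
import pandas as pd

seria = pd.Series(['Ala ma kota', 'Ola ma psa'])

# \2 is the second group (the animal), \1 the first (the owner)
swapped = seria.replace(r'(\w+) ma (\w+)', r'\2 <- \1', regex=True)
```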
Example
Let's extract the product's weight in grams from the Serving Size column.
For now, let's see which pattern matches a fragment such as (333 g):
df.replace(r'\((\d+) g\)', r'\1 gramów', regex=True)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz 136 gramów | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 4.8 oz 135 gramów | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 3.9 oz 111 gramów | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz 161 gramów | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz 161 gramów | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
255 | Smoothies & Shakes | McFlurry with Oreo Cookies (Small) | 10.1 oz 285 gramów | 510 | 150 | 17.0 | 26 | 9.0 | 44 | 0.5 | ... | 80 | 27 | 1 | 4 | 64 | 12 | 15 | 0 | 40 | 8 |
256 | Smoothies & Shakes | McFlurry with Oreo Cookies (Medium) | 13.4 oz 381 gramów | 690 | 200 | 23.0 | 35 | 12.0 | 58 | 1.0 | ... | 106 | 35 | 1 | 5 | 85 | 15 | 20 | 0 | 50 | 10 |
257 | Smoothies & Shakes | McFlurry with Oreo Cookies (Snack) | 6.7 oz 190 gramów | 340 | 100 | 11.0 | 17 | 6.0 | 29 | 0.0 | ... | 53 | 18 | 1 | 2 | 43 | 8 | 10 | 0 | 25 | 6 |
258 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Medium) | 14.2 oz 403 gramów | 810 | 290 | 32.0 | 50 | 15.0 | 76 | 1.0 | ... | 114 | 38 | 2 | 9 | 103 | 21 | 20 | 0 | 60 | 6 |
259 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Snack) | 7.1 oz 202 gramów | 410 | 150 | 16.0 | 25 | 8.0 | 38 | 0.0 | ... | 57 | 19 | 1 | 5 | 51 | 10 | 10 | 0 | 30 | 4 |
260 rows × 24 columns
By also matching and removing all the characters that precede this fragment, we get:
df.replace(r'^.*\((\d+) g\)', r'\1', regex=True)
Category | Item | Serving Size | Calories | Calories from Fat | TotalFat | Total Fat (% Daily Value) | Saturated Fat | Saturated Fat (% Daily Value) | Trans Fat | ... | Carbohydrates | Carbohydrates (% Daily Value) | Dietary Fiber | Dietary Fiber (% Daily Value) | Sugars | Protein | Vitamin A (% Daily Value) | Vitamin C (% Daily Value) | Calcium (% Daily Value) | Iron (% Daily Value) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 136 | 300 | 120 | 13.0 | 20 | 5.0 | 25 | 0.0 | ... | 31 | 10 | 4 | 17 | 3 | 17 | 10 | 0 | 25 | 15 |
1 | Breakfast | Egg White Delight | 135 | 250 | 70 | 8.0 | 12 | 3.0 | 15 | 0.0 | ... | 30 | 10 | 4 | 17 | 3 | 18 | 6 | 0 | 25 | 8 |
2 | Breakfast | Sausage McMuffin | 111 | 370 | 200 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 29 | 10 | 4 | 17 | 2 | 14 | 8 | 0 | 25 | 10 |
3 | Breakfast | Sausage McMuffin with Egg | 161 | 450 | 250 | 28.0 | 43 | 10.0 | 52 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 15 | 0 | 30 | 15 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 161 | 400 | 210 | 23.0 | 35 | 8.0 | 42 | 0.0 | ... | 30 | 10 | 4 | 17 | 2 | 21 | 6 | 0 | 25 | 10 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
255 | Smoothies & Shakes | McFlurry with Oreo Cookies (Small) | 285 | 510 | 150 | 17.0 | 26 | 9.0 | 44 | 0.5 | ... | 80 | 27 | 1 | 4 | 64 | 12 | 15 | 0 | 40 | 8 |
256 | Smoothies & Shakes | McFlurry with Oreo Cookies (Medium) | 381 | 690 | 200 | 23.0 | 35 | 12.0 | 58 | 1.0 | ... | 106 | 35 | 1 | 5 | 85 | 15 | 20 | 0 | 50 | 10 |
257 | Smoothies & Shakes | McFlurry with Oreo Cookies (Snack) | 190 | 340 | 100 | 11.0 | 17 | 6.0 | 29 | 0.0 | ... | 53 | 18 | 1 | 2 | 43 | 8 | 10 | 0 | 25 | 6 |
258 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Medium) | 403 | 810 | 290 | 32.0 | 50 | 15.0 | 76 | 1.0 | ... | 114 | 38 | 2 | 9 | 103 | 21 | 20 | 0 | 60 | 6 |
259 | Smoothies & Shakes | McFlurry with Reese's Peanut Butter Cups (Snack) | 202 | 410 | 150 | 16.0 | 25 | 8.0 | 38 | 0.0 | ... | 57 | 19 | 1 | 5 | 51 | 10 | 10 | 0 | 30 | 4 |
260 rows × 24 columns
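A minimal sketch of what the anchored pattern does on a single column, using a small hypothetical sample instead of the full table. It shows that values without a "(… g)" fragment are left unchanged.

```python
import pandas as pd

# Hypothetical sample imitating the Serving Size column
sample = pd.Series(['4.8 oz (136 g)', '1 cookie (33 g)', '16 fl oz cup'])

# ^.* consumes everything before "(<digits> g)";
# the whole match is then replaced by capture group 1
cleaned = sample.replace(r'^.*\((\d+) g\)', r'\1', regex=True)
print(cleaned.tolist())
```

The last element has no weight in grams, so it does not match and stays as text.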
Going further, we want to obtain a numeric value.
The result of a replace like the one above can be cast to float.
However, the data contains cells that do not match the pattern, so they have to be handled first, e.g. replaced with 0.
The second expression says: replace any text that contains something other than digits with 0.
df['Serving Size']\
.replace(r'^.*\((\d+) g\)', r'\1', regex=True)\
.replace(r'^.*[^\d].*$', '0', regex=True) \
.astype('float32')
0      136.0
1      135.0
2      111.0
3      161.0
4      161.0
       ...
255    285.0
256    381.0
257    190.0
258    403.0
259    202.0
Name: Serving Size, Length: 260, dtype: float32
%%timeit
df['Serving Size']\
.replace(r'^.*\((\d+) g\)', r'\1', regex=True)\
.replace(r'^.*[^\d].*$', '0', regex=True) \
.astype('float32')
2.83 ms ± 128 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
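As an alternative to the double replace, the sketch below uses `Series.str.extract` together with `pd.to_numeric`. This is a different technique from the one used in the notebook, shown on a small hypothetical sample.

```python
import pandas as pd

# Hypothetical sample imitating the Serving Size column
sample = pd.Series(['4.8 oz (136 g)', '1 cookie (33 g)', '16 fl oz cup'])

# str.extract returns the first capture group,
# or a missing value when the pattern does not match
extracted = sample.str.extract(r'\((\d+) g\)', expand=False)

# Convert the digit strings to numbers, then fill missing values with 0
grams = pd.to_numeric(extracted).fillna(0).astype('float32')
print(grams.tolist())
```

Here the non-matching cells become missing values first, so the choice of what to put in their place (0, NaN, something else) is a separate, explicit step.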
The apply operation
A more general way to replace the data in tables/series with "the results of an arbitrary function" is the apply operation.
With def you can define a function that takes a parameter and returns a result.
def funkcja(tekst):
    # Take the first 7 characters and convert them to upper case
    fragment = tekst[:7]
    return fragment.upper()
In plain Python, such a function is simply called:
funkcja('Abrakadabra')
'ABRAKAD'
In pandas, apply lets you apply such a function to every element of a series:
seria
0      Ala ma kota
1       Ola ma psa
2    Adam ma rybki
dtype: object
seria.apply(funkcja)
0    ALA MA
1    OLA MA
2    ADAM MA
dtype: object
df['Item'].apply(funkcja)
0      EGG MCM
1      EGG WHI
2      SAUSAGE
3      SAUSAGE
4      SAUSAGE
        ...
255    MCFLURR
256    MCFLURR
257    MCFLURR
258    MCFLURR
259    MCFLURR
Name: Item, Length: 260, dtype: object
When the function is simple (it can be written briefly), you can define it with a lambda expression instead of def.
df['Item'].apply(lambda x: x[2:10].lower())
0      g mcmuff
1      g white
2      usage mc
3      usage mc
4      usage mc
         ...
255    flurry w
256    flurry w
257    flurry w
258    flurry w
259    flurry w
Name: Item, Length: 260, dtype: object
apply can also be used on numeric columns:
df.Calories.head(5)
0    300
1    250
2    370
3    450
4    400
Name: Calories, dtype: int64
df.Calories.apply(lambda cal: cal * 2 if cal < 400 else cal*10).head(5)
0     600
1     500
2     740
3    4500
4    4000
Name: Calories, dtype: int64
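For a simple numeric condition like this one, the same result can also be obtained without calling a Python function per element. The sketch below uses `numpy.where`, which is a substitute technique and not what the notebook does; the sample values are the first five calorie counts shown above.

```python
import numpy as np
import pandas as pd

# First five values of the Calories column, as shown above
calories = pd.Series([300, 250, 370, 450, 400], name='Calories')

# Element-by-element version, as in the notebook
via_apply = calories.apply(lambda cal: cal * 2 if cal < 400 else cal * 10)

# Vectorized version: the condition selects between two whole columns
via_where = pd.Series(np.where(calories < 400, calories * 2, calories * 10),
                      index=calories.index, name=calories.name)
print(via_where.tolist())
```

apply remains the more general tool, since it accepts any function, while the vectorized form only works when the logic can be expressed as operations on whole columns.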
Let's return to extracting the weight in grams. This time we will do it by defining suitable functions and using apply.
In this part we use the regular expressions supported directly by Python (the re module).
import re
def gramy1(tekst):
    # Look for a fragment such as "(136 g)" anywhere in the text
    m = re.search(r'\((\d+) g\)', tekst)
    if m:
        return float(m[1])  # or m.group(1)
    else:
        return 0.0
df['Serving Size'].apply(gramy1)
0      136.0
1      135.0
2      111.0
3      161.0
4      161.0
       ...
255    285.0
256    381.0
257    190.0
258    403.0
259    202.0
Name: Serving Size, Length: 260, dtype: float64
%%timeit
df['Serving Size'].apply(gramy1)
808 µs ± 34.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
To improve the performance of regular expressions, we should "compile" the expression once and use the resulting pattern object inside the function.
pattern = re.compile(r'\((\d+) g\)')
def gramy2(tekst):
    # Reuse the precompiled pattern instead of passing the expression text each time
    m = pattern.search(tekst)
    return float(m[1]) if m else 0.0
%%timeit
df['Serving Size'].apply(gramy2)
528 µs ± 24.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
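Outside a notebook, where %%timeit is not available, a similar comparison can be sketched with the standard timeit module. The function names and the sample list below are hypothetical, and the measured times will differ between machines. Note that the re module keeps a cache of compiled patterns, so `re.search` does not recompile the expression on every call; the saving comes mainly from skipping that cache lookup.

```python
import re
import timeit

# Hypothetical sample: one matching and one non-matching value, repeated
texts = ['4.8 oz (136 g)', '16 fl oz cup'] * 100

PATTERN = re.compile(r'\((\d+) g\)')

def grams_search(text):
    # Passes the expression text on every call
    m = re.search(r'\((\d+) g\)', text)
    return float(m[1]) if m else 0.0

def grams_compiled(text):
    # Uses the pattern object compiled once above
    m = PATTERN.search(text)
    return float(m[1]) if m else 0.0

t_search = timeit.timeit(lambda: [grams_search(t) for t in texts], number=200)
t_compiled = timeit.timeit(lambda: [grams_compiled(t) for t in texts], number=200)
print(f're.search: {t_search:.4f} s, compiled pattern: {t_compiled:.4f} s')
```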
You can also try to manage without regular expressions, using ordinary string operations such as split, index, find, ...
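def gramy3(napis):
    # Assumes that any text containing 'g' has the form "... (<number> g)"
    if 'g' not in napis:
        return 0.0
    p = napis.index('(')   # position of the opening parenthesis
    k = napis.index(' g')  # position of the space before the unit
    return float(napis[p+1:k])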
%%timeit
df['Serving Size'].apply(gramy3)
512 µs ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
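The version above relies on the shape of this particular data: `index` raises ValueError when the substring is missing. A more defensive sketch, using `find` and a hypothetical function name `grams_find`, could look like this:

```python
def grams_find(text):
    """Return the number inside '(<number> g)', or 0.0 when that form is absent."""
    end = text.find(' g)')             # -1 when there is no " g)" fragment
    if end == -1:
        return 0.0
    start = text.rfind('(', 0, end)    # nearest "(" before the unit
    if start == -1:
        return 0.0
    try:
        return float(text[start + 1:end])
    except ValueError:                 # the fragment was not a number
        return 0.0

print(grams_find('4.8 oz (136 g)'), grams_find('16 fl oz cup'))
```

The extra checks cost some time, so this is a trade-off between speed and tolerance for unexpected input.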
Finally, we add the weight in grams found this way to the table and use it in a calculation.
df['Gramatura'] = df['Serving Size'].apply(gramy2)
df.iloc[:5, [0,1,2,3,-1]]
Category | Item | Serving Size | Calories | Gramatura | |
---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 136.0 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 135.0 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 111.0 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 161.0 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 161.0 |
We compute "calories per gram". For a weight of 0 the result is infinity (or NaN when the calories are also 0) :)
df['Nasycenie kaloriami'] = df.Calories / df.Gramatura
df.iloc[:5, [0,1,2,3,-2, -1]]
Category | Item | Serving Size | Calories | Gramatura | Nasycenie kaloriami | |
---|---|---|---|---|---|---|
0 | Breakfast | Egg McMuffin | 4.8 oz (136 g) | 300 | 136.0 | 2.205882 |
1 | Breakfast | Egg White Delight | 4.8 oz (135 g) | 250 | 135.0 | 1.851852 |
2 | Breakfast | Sausage McMuffin | 3.9 oz (111 g) | 370 | 111.0 | 3.333333 |
3 | Breakfast | Sausage McMuffin with Egg | 5.7 oz (161 g) | 450 | 161.0 | 2.795031 |
4 | Breakfast | Sausage McMuffin with Egg Whites | 5.7 oz (161 g) | 400 | 161.0 | 2.484472 |
df.iloc[150:153, [0,1,2,3,-2, -1]]
Category | Item | Serving Size | Calories | Gramatura | Nasycenie kaloriami | |
---|---|---|---|---|---|---|
150 | Coffee & Tea | Latte (Large) | 20 fl oz cup | 280 | 0.0 | inf |
151 | Coffee & Tea | Caramel Latte (Small) | 12 fl oz cup | 270 | 0.0 | inf |
152 | Coffee & Tea | Caramel Latte (Medium) | 16 fl oz cup | 340 | 0.0 | inf |
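If the inf values are unwanted, one option is to treat a weight of 0 as missing before dividing. The sketch below uses a small hypothetical table with the same column names; it is one possible approach, not the one taken in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row sample with the same column names as above
sample = pd.DataFrame({
    'Calories': [300, 280],
    'Gramatura': [136.0, 0.0],
})

# Treat a weight of 0 as "unknown", so the ratio becomes NaN instead of inf
weight = sample['Gramatura'].replace(0, np.nan)
sample['Nasycenie kaloriami'] = sample['Calories'] / weight
print(sample)
```

NaN values are skipped by most pandas aggregations and are placed at the end by sort_values by default, so they interfere less with later steps than inf does.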
df[df.Gramatura > 0].iloc[:, [0,1,2,3,-2, -1]].sort_values('Nasycenie kaloriami', ascending=False)
Category | Item | Serving Size | Calories | Gramatura | Nasycenie kaloriami | |
---|---|---|---|---|---|---|
104 | Desserts | Chocolate Chip Cookie | 1 cookie (33 g) | 160 | 33.0 | 4.848485 |
105 | Desserts | Oatmeal Raisin Cookie | 1 cookie (33 g) | 150 | 33.0 | 4.545455 |
39 | Breakfast | Cinnamon Melts | 4 oz (114 g) | 460 | 114.0 | 4.035088 |
10 | Breakfast | Sausage Biscuit (Regular Biscuit) | 4.1 oz (117 g) | 430 | 117.0 | 3.675214 |
11 | Breakfast | Sausage Biscuit (Large Biscuit) | 4.6 oz (131 g) | 480 | 131.0 | 3.664122 |
... | ... | ... | ... | ... | ... | ... |
89 | Salads | Premium Southwest Salad with Grilled Chicken | 11.8 oz (335 g) | 290 | 335.0 | 0.865672 |
84 | Salads | Premium Bacon Ranch Salad (without Chicken) | 7.9 oz (223 g) | 140 | 223.0 | 0.627803 |
87 | Salads | Premium Southwest Salad (without Chicken) | 8.1 oz (230 g) | 140 | 230.0 | 0.608696 |
101 | Snacks & Sides | Apple Slices | 1.2 oz (34 g) | 15 | 34.0 | 0.441176 |
100 | Snacks & Sides | Side Salad | 3.1 oz (87 g) | 20 | 87.0 | 0.229885 |
118 rows × 6 columns