Python/Pandas

From Wikibooks

Pandas: A Short Guide to Advanced Usage


MultiIndex (Hierarchical Indexing)

import pandas as pd

# Create a DataFrame with a two-level MultiIndex
arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df = pd.DataFrame({'value': [10, 20, 30, 40]}, index=index)

# Select the row labelled first='A', second='one' (a tuple avoids ambiguity with column labels)
df.loc[('A', 'one')]
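Beyond selecting a single row, a MultiIndex can be queried by one level at a time. The sketch below rebuilds the same example DataFrame and shows three common patterns: selecting an outer-level group, taking a cross-section on the inner level with `xs`, and slicing both levels with `pd.IndexSlice` (which assumes the index is sorted, as it is here).

```python
import pandas as pd

# Rebuild the two-level index from the example above
arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df = pd.DataFrame({'value': [10, 20, 30, 40]}, index=index)

# All rows whose outer level is 'A'; the remaining index is the 'second' level
group_a = df.loc['A']

# Cross-section on the inner level: every row where second == 'one'
ones = df.xs('one', level='second')

# Slice both levels at once (requires a sorted index)
idx = pd.IndexSlice
subset = df.loc[idx['A':'B', 'two'], :]

print(group_a['value'].tolist())  # [10, 20]
print(ones['value'].tolist())     # [10, 30]
print(subset['value'].tolist())   # [20, 40]
```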

Reshaping with `stack` and `unstack`

# Unstack: move the 'second' index level into the columns
wide = df.unstack('second')

# Stack it back to restore the original long format
wide.stack('second')
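A quick way to check that `unstack` and `stack` are inverses is to round-trip a single column. The sketch below unstacks the `value` Series (rather than the whole DataFrame), which yields plain `one`/`two` columns instead of two-level column labels, then stacks it back and compares against the original.

```python
import pandas as pd

arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df = pd.DataFrame({'value': [10, 20, 30, 40]}, index=index)

# Wide format: one row per 'first', one column per 'second'
wide = df['value'].unstack('second')

# Back to long format: the columns become the inner index level again
long = wide.stack()
```

Because there are no missing combinations in this index, the round trip introduces no NaN values and the integer dtype is preserved.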

Time Series Analysis

import numpy as np
import pandas as pd

# Daily time series: five days starting 2024-01-01, indexed by date
date_rng = pd.date_range(start='2024-01-01', periods=5, freq='D')
df = pd.DataFrame(date_rng, columns=['date']).set_index('date')
df['data'] = np.random.randint(0, 100, size=len(date_rng))

# Resample to weekly totals, then add a 3-day rolling mean column
df.resample('W').sum()
df['rolling_mean'] = df['data'].rolling(window=3).mean()
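Random data makes it hard to see what `resample` and `rolling` actually compute. The sketch below swaps in predictable values 1..10 over ten days (an illustrative assumption, not data from the text). Since 2024-01-01 is a Monday and `'W'` means weeks ending on Sunday, the first weekly bin covers seven days and the second covers three.

```python
import pandas as pd

# Ten consecutive days starting Monday 2024-01-01 with values 1..10
dates = pd.date_range(start='2024-01-01', periods=10, freq='D')
ts = pd.DataFrame({'data': range(1, 11)}, index=dates)

# 'W' is an alias for 'W-SUN': bins end on Sunday and are labelled by that date
weekly = ts.resample('W').sum()

# Full 3-day window: the first two rows are NaN because the window is incomplete
ts['rolling_mean'] = ts['data'].rolling(window=3).mean()

# min_periods=1 averages whatever is available at the start instead of returning NaN
ts['rolling_mean_partial'] = ts['data'].rolling(window=3, min_periods=1).mean()
```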

Categorical Data

# Create a categorical column with a fixed set of categories (one value per row of df)
df['category'] = pd.Categorical(['A', 'B', 'A', 'C', 'B'], categories=['A', 'B', 'C'])
# Integer codes backing each value: A -> 0, B -> 1, C -> 2
df['category'].cat.codes
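Categoricals also support an explicit order and keep track of categories that do not occur in the data. The sketch below uses made-up size labels to illustrate ordered comparisons, and shows that `value_counts` reports unused categories with a count of zero.

```python
import pandas as pd

# Unordered categorical: codes are positions in the categories list
s = pd.Series(pd.Categorical(['A', 'B', 'A', 'C', 'B'], categories=['A', 'B', 'C']))
codes = s.cat.codes.tolist()

# Ordered categorical: comparisons follow the declared category order
sizes = pd.Series(pd.Categorical(['small', 'large', 'medium', 'small'],
                                 categories=['small', 'medium', 'large'],
                                 ordered=True))
at_least_medium = sizes[sizes >= 'medium'].tolist()

# Counts include every declared category, even ones that never appear
counts = pd.Series(pd.Categorical(['A', 'A'], categories=['A', 'B', 'C'])).value_counts()
```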

Method Chaining with `pipe`

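# Define a custom function and chain it with other steps
def add_ten(df):
    # Return a new DataFrame instead of modifying the caller's data in place
    return df.assign(value=df['value'] + 10)

df.pipe(add_ten).pipe(lambda d: d[d['value'] > 12])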
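`pipe` also forwards extra positional and keyword arguments to the function, which keeps chains readable without lambdas. In the sketch below, `keep_above` and its `threshold` parameter are hypothetical helpers for illustration, and the input data is made up.

```python
import pandas as pd

def add_ten(df):
    # Return a new DataFrame; the input is left unchanged
    return df.assign(value=df['value'] + 10)

def keep_above(df, threshold):
    # Hypothetical helper: keep rows whose 'value' exceeds threshold
    return df[df['value'] > threshold]

df = pd.DataFrame({'value': [1, 2, 3, 4]})

# Keyword arguments after the function are passed through by .pipe()
result = df.pipe(add_ten).pipe(keep_above, threshold=12)
```

Because `add_ten` uses `assign`, `df` still holds the original values after the chain runs.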

Querying with `query` and `eval`

# Filter rows with a boolean expression, then compute a column from an expression
df.query('A > 2 & B < 14')
df['D'] = df.eval('A + B + C')
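The sketch below runs both methods on small made-up columns `A`, `B`, `C`. Inside `query` and `eval` strings, `&` and `|` bind more loosely than comparisons, so no extra parentheses are needed. Local Python variables are referenced with `@`, and an assignment inside `eval` returns a new DataFrame by default.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5], 'B': [10, 12, 15], 'C': [100, 200, 300]})

# Both conditions must hold: only the row with A=3, B=12 qualifies
filtered = df.query('A > 2 & B < 14')

# Reference a local variable with the @ prefix
limit = 4
small_a = df.query('A < @limit')

# Assignment inside eval returns a new DataFrame (inplace=False by default)
with_d = df.eval('D = A + B + C')
```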

Vectorized String Operations

# String operations applied to a whole column through the `.str` accessor
df['text_upper'] = df['text'].str.upper()
df['text_split'] = df['text'].str.split()
df['contains_pandas'] = df['text'].str.contains('Pandas')
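Real text columns often contain missing values and inconsistent casing. The sketch below uses a made-up `text` column with one `None`. It shows that `.str` methods propagate missing values, that `case=False` and `na=False` make `contains` case-insensitive and return a clean boolean mask, and that `expand=True` splits into separate columns instead of lists.

```python
import pandas as pd

df = pd.DataFrame({'text': ['Learning Pandas', 'pandas is fast', None]})

# Missing values stay missing instead of raising an error
upper = df['text'].str.upper()

# Case-insensitive match; missing text counts as "no match"
has_pandas = df['text'].str.contains('pandas', case=False, na=False)

# One column per word; shorter rows are padded with missing values
words = df['text'].str.split(expand=True)
```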

Handling Large DataFrames with `Dask`

import dask.dataframe as dd

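# Read a large CSV lazily as a Dask DataFrame split into partitions
ddf = dd.read_csv('large_dataset.csv')
# Operations only build a task graph; .compute() runs it and returns a pandas object
ddf.groupby('column_name').sum().compute()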

Efficient Data I/O

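from sqlalchemy import create_engine

# Write the same DataFrame to several formats
df.to_csv('data.csv')
df.to_excel('data.xlsx')                     # needs an Excel writer such as openpyxl
engine = create_engine('sqlite:///data.db')  # example connection URL (SQLite file)
df.to_sql('table_name', con=engine, if_exists='replace')
df.to_parquet('data.parquet')                # needs pyarrow or fastparquet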
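Writing is only half of I/O, so it helps to confirm that data survives a round trip. The sketch below uses only the standard library plus pandas. It writes CSV to an in-memory buffer and SQL to an in-memory SQLite database; pandas accepts a plain `sqlite3` connection as well as a SQLAlchemy engine. The sample data is made up, and `index=False` keeps the default integer index out of the output.

```python
import io
import sqlite3

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3], 'name': ['a', 'b', 'c']})

# CSV round trip through an in-memory text buffer
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
from_csv = pd.read_csv(buffer)

# SQL round trip through an in-memory SQLite database
con = sqlite3.connect(':memory:')
df.to_sql('table_name', con=con, index=False)
from_sql = pd.read_sql('SELECT * FROM table_name', con)
con.close()
```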

Sparse DataFrames

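# Create a sparse DataFrame that stores only the non-zero values
df = pd.DataFrame({'A': [0, 1, 0, 0, 5], 'B': [0, 0, 3, 0, 0]})
df = df.astype(pd.SparseDtype(int, fill_value=0))
# Memory used per column (only stored, non-fill values take space)
df.memory_usage(deep=True)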
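A DataFrame whose columns are all sparse gains a `.sparse` accessor. The sketch below reuses the example data to measure density (the fraction of values actually stored, here 3 of 10), compares memory against the dense version, and converts back with `to_dense` for operations that do not support sparse data.

```python
import pandas as pd

dense = pd.DataFrame({'A': [0, 1, 0, 0, 5], 'B': [0, 0, 3, 0, 0]})
sparse = dense.astype(pd.SparseDtype(int, fill_value=0))

# Fraction of non-fill values that are actually stored
density = sparse.sparse.density

# Column memory (excluding the index) for sparse vs dense storage
sparse_bytes = sparse.memory_usage(index=False).sum()
dense_bytes = dense.memory_usage(index=False).sum()

# Back to ordinary NumPy-backed columns
restored = sparse.sparse.to_dense()
```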

Custom Transformations with `apply`

# Apply a custom function to each row (axis=1 passes rows, not columns)
def custom_transformation(row):
    return row['A'] * row['B']

df['new_column'] = df.apply(custom_transformation, axis=1)
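Row-wise `apply` calls a Python function once per row, which gets slow on large frames. When the transformation is plain arithmetic, an equivalent column expression is usually preferable. The sketch below, on made-up data, checks that both forms agree.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def custom_transformation(row):
    # Each row arrives as a Series indexed by column name
    return row['A'] * row['B']

# One Python call per row
via_apply = df.apply(custom_transformation, axis=1)

# A single vectorized operation over whole columns
vectorized = df['A'] * df['B']
```

`apply` remains useful when the per-row logic cannot be expressed with vectorized operations.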

Data Profiling with `pandas_profiling`

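# The package has been renamed; newer releases use: from ydata_profiling import ProfileReport
from pandas_profiling import ProfileReport

# Build a profiling report and save it as an HTML file
profile = ProfileReport(df, title="Pandas Profiling Report")
profile.to_file("report.html")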

Multiple Aggregations with `GroupBy`

# Group by one column and compute several aggregations per value column
df.groupby('category').agg({'value1': ['sum', 'mean'], 'value2': ['max', 'min']})
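The dictionary-of-lists form produces two-level column labels such as `('value1', 'sum')`. Named aggregation, where each keyword maps to a `(column, function)` pair, gives flat column names instead. The sketch below compares the two on made-up data; the output names like `value1_sum` are arbitrary choices.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['x', 'x', 'y', 'y', 'y'],
    'value1': [1, 2, 3, 4, 5],
    'value2': [10, 20, 30, 40, 50],
})

# Two-level column labels: (column, aggregation)
multi = df.groupby('category').agg({'value1': ['sum', 'mean'], 'value2': ['max', 'min']})

# Named aggregation: output_name=(input_column, aggregation)
named = df.groupby('category').agg(
    value1_sum=('value1', 'sum'),
    value1_mean=('value1', 'mean'),
    value2_max=('value2', 'max'),
    value2_min=('value2', 'min'),
)
```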

Advanced Plotting with `plot`

# Create a bar chart and a histogram (pandas plotting requires matplotlib)
df.plot(kind='bar', x='category', y='values')
df['values'].plot(kind='hist', bins=5)

Custom Rolling-Window Functions

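import numpy as np

# Custom function applied to each 3-value window
def custom_rolling_func(x):
    return np.sum(x) * 0.5

# raw=True passes each window as a NumPy array, which is faster than a Series
df['custom_rolling'] = df['value'].rolling(window=3).apply(custom_rolling_func, raw=True)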
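To confirm what a custom window function computes, it helps to compare it with a built-in equivalent. The sketch below uses a made-up series 1..5. Halving each 3-value window sum should match `rolling(...).sum() * 0.5`, with NaN in the first two positions where the window is incomplete.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], dtype=float)

def custom_rolling_func(x):
    # x is a NumPy array because raw=True below
    return np.sum(x) * 0.5

custom = s.rolling(window=3).apply(custom_rolling_func, raw=True)

# Built-in equivalent used as a cross-check
builtin = s.rolling(window=3).sum() * 0.5
```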

Vectorized Operations with `NumPy`

import numpy as np

# Element-wise multiplication with a NumPy ufunc (same result as df['A'] * df['B'])
df['C'] = np.multiply(df['A'], df['B'])
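NumPy ufuncs applied to pandas Series return Series that keep the original index, so they can be assigned straight back as columns. The sketch below, on made-up data with a string index, also uses `np.where` for a vectorized if/else. The `'big'`/`'small'` labels and the threshold of 8 are arbitrary illustrations.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['r1', 'r2', 'r3'])

# Ufunc on two Series: result is a Series aligned on the index
product = np.multiply(df['A'], df['B'])
df['C'] = product

# Vectorized conditional: no Python loop over rows
df['label'] = np.where(df['C'] > 8, 'big', 'small')

# Another element-wise ufunc applied to a whole column
df['log_C'] = np.log1p(df['C'])
```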