Python/Pandas
Pandas: A Short Guide for Advanced Users
MultiIndex (Hierarchical Indexing)
import pandas as pd
# Create a DataFrame with a two-level MultiIndex
arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df = pd.DataFrame({'value': [10, 20, 30, 40]}, index=index)
# Access a row by its full key; the tuple makes clear that both labels are row keys
df.loc[('A', 'one')]
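Selection on a MultiIndex goes beyond full-key lookups. The sketch below rebuilds the same small frame and shows three common access patterns; the variable names `rows_a`, `rows_one` and `cell` are illustrative and not part of the original example.

```python
import pandas as pd

arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df = pd.DataFrame({'value': [10, 20, 30, 40]}, index=index)

# All rows under the outer key 'A'; the 'first' level is dropped from the result
rows_a = df.loc['A']

# Cross-section on the inner level: every row whose 'second' label is 'one'
rows_one = df.xs('one', level='second')

# A single cell: full row key as a tuple, then the column name
cell = df.loc[('B', 'two'), 'value']
```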
Reshaping with `stack` and `unstack`
# Unstack the MultiIndex DataFrame: the 'second' level moves from the rows to the columns
df.unstack('second')
# Stack it back: the column level returns to the row index
df.unstack('second').stack()
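A small round trip makes the reshaping easier to follow. This is a minimal sketch on an assumed four-row frame; `wide` and `long` are illustrative names.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [['A', 'B'], ['one', 'two']], names=['first', 'second']
)
df = pd.DataFrame({'value': [10, 20, 30, 40]}, index=index)

# Wide form: one row per 'first' label, one column per 'second' label
wide = df.unstack('second')

# Long form again: the 'second' column level moves back into the row index
long = wide.stack(level='second')
```

Depending on the installed pandas version, `stack` may emit a FutureWarning about its new implementation; the result for this gap-free frame is the same either way.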
Time Series Analysis
import numpy as np
# Build a DataFrame indexed by five consecutive days
date_rng = pd.date_range(start='2024-01-01', periods=5, freq='D')
df = pd.DataFrame(date_rng, columns=['date']).set_index('date')
df['data'] = np.random.randint(0, 100, size=len(date_rng))
# Resample to weekly totals, then add a 3-day rolling mean
df.resample('W').sum()
df['rolling_mean'] = df['data'].rolling(window=3).mean()
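Because the example above uses random data, its output changes on every run. The sketch below swaps in fixed values (an assumption made only for illustration) so the effect of `resample` and `rolling` can be checked by hand.

```python
import pandas as pd

idx = pd.date_range(start='2024-01-01', periods=5, freq='D')
ts = pd.DataFrame({'data': [1, 2, 3, 4, 5]}, index=idx)

# 'W' bins end on Sunday, so all five days fall into the week ending 2024-01-07
weekly = ts['data'].resample('W').sum()

# The first two entries are NaN because a full 3-value window is not yet available
rolling = ts['data'].rolling(window=3).mean()
```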
Categorical Data
# Create and use categorical data (df must have exactly five rows here)
df['category'] = pd.Categorical(['A', 'B', 'A', 'C', 'B'], categories=['A', 'B', 'C'])
# Integer code of each value: its position in the categories list
df['category'].cat.codes
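Categories can also carry an order, which changes how sorting and comparisons behave. A minimal sketch, with the `sizes` data invented for illustration:

```python
import pandas as pd

s = pd.Series(pd.Categorical(['A', 'B', 'A', 'C', 'B'], categories=['A', 'B', 'C']))
codes = s.cat.codes  # position of each value in the categories list

# Ordered categories sort by the declared order, not alphabetically
sizes = pd.Series(pd.Categorical(
    ['small', 'large', 'medium'],
    categories=['small', 'medium', 'large'],
    ordered=True,
))
largest = sizes.max()
in_order = sizes.sort_values()
```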
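Method Chaining with `pipe`
# Custom function and method chaining (assumes df has a numeric 'value' column)
# assign returns a new DataFrame, so the caller's df is not modified in place
def add_ten(df): return df.assign(value=df['value'] + 10)
df.pipe(add_ten).pipe(lambda df: df[df['value'] > 12])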
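`pipe` also forwards extra arguments to the function, which avoids wrapping each step in a lambda. The sketch below assumes a hypothetical helper `keep_above` and a small frame named `raw`; neither appears in the original.

```python
import pandas as pd

def add_ten(frame, column='value'):
    # Return a new frame instead of mutating the input
    return frame.assign(**{column: frame[column] + 10})

def keep_above(frame, threshold, column='value'):
    # Hypothetical helper: keep rows whose column exceeds the threshold
    return frame[frame[column] > threshold]

raw = pd.DataFrame({'value': [1, 2, 3, 4]})
result = raw.pipe(add_ten).pipe(keep_above, threshold=12)
```

Since both steps return new objects, `raw` still holds its original values after the chain runs.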
Querying with `query` and `eval`
# Filter rows and evaluate expressions (assumes df has numeric columns A, B and C)
df.query('A > 2 and B < 14')
df['D'] = df.eval('A + B + C')
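The snippet above relies on columns that are not defined on this page. A self-contained sketch with made-up data, also showing how a local Python variable can be referenced with `@`:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 3, 5], 'B': [10, 12, 20], 'C': [100, 200, 300]})

# Rows where both conditions hold
subset = df.query('A > 2 and B < 14')

# Local variables are referenced with an @ prefix inside the expression
limit = 14
same = df.query('A > 2 and B < @limit')

# eval computes a column-wise expression and returns a Series
df['D'] = df.eval('A + B + C')
```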
Vectorized String Operations
# String operations using the `.str` accessor (assumes df has a 'text' column of strings)
df['text_upper'] = df['text'].str.upper()
df['text_split'] = df['text'].str.split()
# The match is case-sensitive by default
df['contains_pandas'] = df['text'].str.contains('Pandas')
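Missing values propagate through `.str` methods, which matters when the result is used as a boolean mask. A minimal sketch on invented sample text:

```python
import pandas as pd

df = pd.DataFrame({'text': ['Pandas is fast', 'numpy arrays', None]})

upper = df['text'].str.upper()   # the missing entry stays missing
tokens = df['text'].str.split()  # splits on whitespace by default

# case=False ignores case; na=False turns missing entries into False
has_pandas = df['text'].str.contains('pandas', case=False, na=False)
```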
Handling Large DataFrames with `Dask`
import dask.dataframe as dd
# Build a lazy Dask DataFrame from a CSV file
ddf = dd.read_csv('large_dataset.csv')
# Nothing is computed until .compute() is called
ddf.groupby('column_name').sum().compute()
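Efficient Data I/O
# Write the DataFrame to files in various formats
df.to_csv('data.csv')
# to_excel needs an Excel writer package such as openpyxl
df.to_excel('data.xlsx')
# sqlalchemy_engine is assumed to be an existing SQLAlchemy engine
df.to_sql('table_name', con=sqlalchemy_engine)
# to_parquet needs pyarrow or fastparquet
df.to_parquet('data.parquet')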
Sparse DataFrames
# Create a sparse DataFrame: only values different from fill_value are stored
df = pd.DataFrame({'A': [0, 1, 0, 0, 5], 'B': [0, 0, 3, 0, 0]})
df = df.astype(pd.SparseDtype(int, fill_value=0))
# Memory used per column, in bytes
df.memory_usage(deep=True)
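On a five-row frame the savings are hard to see. The sketch below uses an assumed column of 1000 values, mostly zeros, to compare dense and sparse storage.

```python
import numpy as np
import pandas as pd

values = np.zeros(1000, dtype='int64')
values[::100] = 1  # 10 non-zero entries out of 1000

dense = pd.DataFrame({'A': values})
sparse = dense.astype(pd.SparseDtype('int64', fill_value=0))

dense_bytes = dense.memory_usage(index=False).sum()
sparse_bytes = sparse.memory_usage(index=False).sum()

# Fraction of entries that are actually stored
density = sparse['A'].sparse.density
```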
Custom Transformations with `apply`
# Apply a custom function to each row (axis=1 passes one row at a time)
def custom_transformation(row): return row['A'] * row['B']
df['new_column'] = df.apply(custom_transformation, axis=1)
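Row-wise `apply` calls the Python function once per row, so for simple arithmetic a column expression usually gives the same result faster. A small comparison on made-up data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def custom_transformation(row):
    return row['A'] * row['B']

# One Python call per row
via_apply = df.apply(custom_transformation, axis=1)

# Same result computed on whole columns at once
vectorized = df['A'] * df['B']
```

`apply` remains useful when the logic per row cannot be expressed as column operations.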
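Data Profiling with `ydata_profiling` (formerly `pandas_profiling`)
# The pandas_profiling package was renamed ydata_profiling; older installs import from pandas_profiling
from ydata_profiling import ProfileReport
# Create a profiling report and save it as HTML
profile = ProfileReport(df, title="Pandas Profiling Report")
profile.to_file("report.html")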
Using `GroupBy` for Multiple Aggregations
# Group by a column and apply several aggregations per column
# (assumes df has 'category', 'value1' and 'value2' columns)
df.groupby('category').agg({'value1': ['sum', 'mean'], 'value2': ['max', 'min']})
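The dictionary form produces MultiIndex columns. Named aggregation is an alternative that yields flat column names. A sketch with invented data; the output names `value1_sum` and `value2_max` are arbitrary:

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['x', 'x', 'y'],
    'value1': [1, 2, 3],
    'value2': [10, 30, 20],
})

# Dictionary form: columns become pairs such as ('value1', 'sum')
multi = df.groupby('category').agg({'value1': ['sum', 'mean'], 'value2': ['max', 'min']})

# Named aggregation: output_name=(input_column, function)
named = df.groupby('category').agg(
    value1_sum=('value1', 'sum'),
    value2_max=('value2', 'max'),
)
```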
Advanced Plotting with `plot`
# Create a bar chart and a histogram
# (requires matplotlib; assumes df has 'category' and 'values' columns)
df.plot(kind='bar', x='category', y='values')
df['values'].plot(kind='hist', bins=5)
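Custom Rolling Window Functions
# Custom rolling window function: half the sum of each window
def custom_rolling_func(x): return np.sum(x) * 0.5
# raw=True passes each window as a NumPy array, which avoids building a Series per window
df['custom_rolling'] = df['value'].rolling(window=3).apply(custom_rolling_func, raw=True)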
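For a function this simple, the same numbers can be obtained from a built-in rolling aggregation. The sketch below checks that on a short, made-up series; `half_sum` is just a renamed copy of the function above.

```python
import numpy as np
import pandas as pd

def half_sum(window):
    return np.sum(window) * 0.5

s = pd.Series([2, 4, 6, 8])

# Custom function applied to each 3-value window
custom = s.rolling(window=3).apply(half_sum, raw=True)

# Equivalent built-in aggregation followed by a scalar multiply
builtin = s.rolling(window=3).sum() * 0.5
```

`apply` is the fallback for window logic that has no built-in counterpart.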
Vectorized Operations with `NumPy`
import numpy as np
# Element-wise multiplication with NumPy (assumes df has numeric columns A and B)
df['C'] = np.multiply(df['A'], df['B'])