3. Advanced Function Application in Pandas

2025-08-12 12:33:02

3.1 Function application

3.1.1 apply

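apply() calls a user-defined function on a Series, or on the rows or columns of a DataFrame, and returns the result, which makes it suitable for complex logic. It behaves differently on the two types: on a Series the function is applied to every element and no axis is needed; on a DataFrame you must choose an axis, and the function is applied along that one direction only (each column for axis=0, each row for axis=1).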

(1) Series

df['score'].apply(lambda x: x-3 if x>90 else x)

(2) DataFrame

# Applied to each column: axis=0
def col(x):
    if x.name == 'score':
        return x + 5
    else:
        return x

df.apply(col, axis=0)

# Applied to each row: axis=1
def row(x):
    if x['subject'] == 'lakers':
        a = 1
    else:
        a = 1.2
    return x['score'] * a

df.apply(row, axis=1)
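To make the Series and DataFrame cases concrete, the following is a minimal runnable sketch. The sample data (three rows with a subject and a score) is an assumption made up for illustration; only the apply calls themselves mirror the text above.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "subject": ["lakers", "math", "english"],
    "score": [95, 88, 92],
})

# Series.apply: the function receives one scalar element at a time.
adjusted = df["score"].apply(lambda x: x - 3 if x > 90 else x)

# DataFrame.apply with axis=0: the function receives each column as a Series.
def col(x):
    return x + 5 if x.name == "score" else x

by_col = df.apply(col, axis=0)

# DataFrame.apply with axis=1: the function receives each row as a Series.
def row(x):
    factor = 1 if x["subject"] == "lakers" else 1.2
    return x["score"] * factor

by_row = df.apply(row, axis=1)

print(adjusted.tolist())
print(by_col["score"].tolist())
print(by_row.tolist())
```

With axis=0 the result is a DataFrame of the same shape, while with axis=1 a function that returns a scalar per row yields a Series with one value per row.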

(3) Passing extra positional arguments

# Extra positional arguments are passed through args
def score_bias(x, bias):
    if x > 90:
        return x + bias
    else:
        return x

bias = 5  # illustrative value (assumed)
df['score'] = df['score'].apply(score_bias, args=(bias,))

(4) Passing keyword arguments

# Every value in df['subject'] must be supplied as a keyword, otherwise kwargs[x] raises KeyError
def subject_map(x, **kwargs):
    return kwargs[x]

df['subject_no'] = df['subject'].apply(subject_map, english=0, math=1)

3.1.2 applymap

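applymap works only on a DataFrame and operates on every element: it takes one scalar and returns one scalar, a point-to-point operation. Note that pandas 2.1 deprecated DataFrame.applymap in favor of DataFrame.map, which does the same thing.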
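As a small runnable sketch of element-wise application, the snippet below applies a scalar-to-scalar function to every cell. The data and the helper name label are assumptions for illustration; the version check simply picks DataFrame.map where it exists (pandas 2.1 and later) and falls back to applymap otherwise.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({"name": ["ann", "bob"], "score": [91, 78]})

def label(x):
    """Prefix strings with 's_'; convert any other scalar to str."""
    if isinstance(x, str):
        return "s_" + x
    return str(x)

# DataFrame.map replaced applymap in pandas 2.1; fall back on older versions.
if hasattr(pd.DataFrame, "map"):
    cooked = df.map(label)
else:
    cooked = df.applymap(label)

print(cooked)
```

Every cell is visited independently, so the output always has the same shape, index, and columns as the input.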

def el_cook(x):
    if isinstance(x, str):
        return 's_' + x
    else:
        return str(x)

df.applymap(el_cook)

3.1.3 map

Series.map works only on a Series.

(1) Dictionary mapping

GENDER_ENCODING = {
    "male": 0,
    "female": 1
}
df['gender_map'] = df['gender'].map(GENDER_ENCODING)
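One detail worth seeing in practice is what happens to values that are missing from the dictionary. The sketch below uses a made-up Series containing an unmapped value; the -1 sentinel is an assumption chosen for illustration, not something the original prescribes.

```python
import pandas as pd

GENDER_ENCODING = {"male": 0, "female": 1}

# Hypothetical input with one value that has no dictionary entry.
gender = pd.Series(["male", "female", "unknown"])

# Values absent from the dictionary become NaN, so the result is float.
encoded = gender.map(GENDER_ENCODING)

# One possible follow-up: replace NaN with a sentinel (-1 is an assumption).
filled = encoded.fillna(-1).astype(int)

print(encoded.tolist())
print(filled.tolist())
```

Checking for NaN after a dictionary map is a cheap way to catch typos or unexpected categories in the source column.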

(2) Function mapping

Unlike apply, map has no args parameter for passing extra arguments to the function, but it is considerably more efficient.

# Existing functions
df['score'].map(np.sqrt)
df['student'].map(list)  # turns each string into a list of characters

# Custom function
df['score'].map(lambda x: x - 3 if x > 90 else x)

3.1.4 transform

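transform works on both Series and DataFrame. For a DataFrame you can choose the axis to process; the default is column-wise (axis=0).

It does not support functions that reduce dimensionality, such as the aggregations min, mean, and std.

The result keeps the same index, and therefore the same length, as the input, so the shape of the original data is not changed.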
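The shape-preserving rule can be checked directly. The sketch below uses a small made-up DataFrame (the column names C_0 and C_1 are assumptions) and shows a single function, a list of functions, and what happens when a reducing function is passed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({"C_0": [1.0, 4.0, 9.0], "C_1": [16.0, 25.0, 36.0]})

# A single element-wise function keeps the shape unchanged.
same_shape = df.transform(np.sqrt)

# A list of functions keeps the row count and adds a second column level.
multi = df.transform([np.square, np.sqrt])

# A reducing function such as mean is rejected by transform.
try:
    df.transform("mean")
    reducer_rejected = False
except ValueError:
    reducer_rejected = True

print(same_shape.shape, multi.shape, reducer_rejected)
```

When an aggregate is actually wanted, agg (or calling mean directly) is the appropriate tool; transform is for results that line up row by row with the input.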

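### Single function
df.transform(np.exp).transform(lambda x: round(x, 2))

### Multiple functions
# List form: square and square root (produces a multi-level column index: level one is the column name, level two is the function name)
df.transform([np.square, np.sqrt]).transform(lambda x: round(x, 2))

# Dict form: apply different functions to specific columns
df.transform({'C_0': np.square, 'C_2': [np.square, np.sqrt]}).transform(lambda x: round(x, 2))

3.1.5 pipe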

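Unlike applymap, which works at the element level, and apply/transform, which work at the row or column level, pipe applies a function to the whole table. It is also known as the pipeline function.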
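A short sketch may help show why pipe is convenient: it lets table-level functions be chained left to right instead of nested. The helper names scale and shift and the sample data are assumptions for illustration. The tuple form (function, keyword) tells pipe which keyword argument should receive the DataFrame when it is not the first parameter.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

def scale(frame, factor):
    """Table-level step: the DataFrame is the first argument."""
    return frame * factor

def shift(offset, frame):
    """Table-level step: the DataFrame is NOT the first argument."""
    return frame + offset

# Chained form: reads in the order the steps are executed.
result = (
    df.pipe(scale, 1.5)
      .pipe((shift, "frame"), 8)  # df is passed as the keyword 'frame'
)

# Equivalent nested form, for comparison.
nested = shift(8, scale(df, 1.5))

print(result)
```

Both forms compute the same table; the chained version simply avoids reading the steps inside out.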

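### Single function
df.pipe(np.exp).pipe(lambda x: round(x, 2))

### Chained calls
pi = (
    df.pipe(np.square)
      .pipe(np.multiply, 1.5)
      .pipe(np.add, 8)
)

### Passing the DataFrame as a non-first argument
def spcl(num, df):
    return df.add(num)

# The tuple names the function and the keyword that receives the DataFrame; 2 is passed as num
df.pipe((spcl, 'df'), 2)

3.2 Expression evaluation

3.2.1 eval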

eval() computes on and parses Series and DataFrame objects by evaluating a string expression.

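eval() has two main advantages:

It is more efficient when operating on large DataFrame objects;

It is faster for complex arithmetic and boolean operations, because the default backend engine is numexpr (when numexpr is installed).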

For small data there is no need for eval(); it is generally recommended as a speed-up only when the data is large, above roughly 10,000 rows.

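eval() supports the following operations:

Arithmetic operations: all arithmetic operators except left shift (<<) and right shift (>>);

Comparison operations: including chained comparisons, for example 2 < df < df2;

Boolean operations: for example df < df2 and df3 < df4 or not df_bool;

List and tuple literals: such as [1, 2] and (1, 2);

Attribute access: such as df.a;

Subscript expressions: such as df[0].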
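A few of the supported operations can be tried with DataFrame.eval, which evaluates the string against the frame's own columns. This is a minimal sketch on made-up data (columns a, b, c are assumptions); on a frame this small there is no speed benefit, so it only demonstrates the syntax.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({"a": [1, 5, 3], "b": [2, 4, 6], "c": [3, 6, 9]})

# Arithmetic on columns, written as a string expression.
weighted = df.eval("a + b * 2")

# Chained comparison: true where a < b and b < c both hold.
ordered = df.eval("a < b < c")

# Boolean operators can be spelled with the keyword 'and' inside the string.
mask = df.eval("a < b and c > 5")

# An assignment inside the expression returns a new DataFrame by default.
with_total = df.eval("total = a + b")

print(weighted.tolist())
print(ordered.tolist())
print(mask.tolist())
print(with_total["total"].tolist())
```

Because the assignment form returns a new DataFrame unless inplace=True is passed, the original df is left without a total column here.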