如何选出A列中元素在对应B列中出现的行?
df.loc[[i in j.split(",") for i, j in zip(df.A, df.B)]]
A | B | |
---|---|---|
0 | w | h,x,m,i,n,j,f,r,c,m |
1 | 2 | 1,w,9,e,k,p,u,b,h,q |
2 | c | q,f,h,7,i,l,4,d,6,w |
3 | 4 | h,i,4,3,7,t,l,h,9,j |
4 | k | w,d,2,p,x,t,d,p,l,e |
如何选出A列中元素在对应B列中出现的行?
df.loc[[i in j.split(",") for i, j in zip(df.A, df.B)]]
A | B | |
---|---|---|
0 | w | h,x,m,i,n,j,f,r,c,m |
1 | 2 | 1,w,9,e,k,p,u,b,h,q |
2 | c | q,f,h,7,i,l,4,d,6,w |
3 | 4 | h,i,4,3,7,t,l,h,9,j |
4 | k | w,d,2,p,x,t,d,p,l,e |
Mario_吴怡:
df[df.apply(lambda x: x[0] in x[1].split(","), axis=1)]
Jan:
df[df.apply(lambda x: x['security_code'] in x['fund_item'].split(","), axis=1)]
GYH:
df.B = df.B.str.split(',')
df_explode = df.explode('B')
check = df_explode.A == df_explode.B
df.loc[check.index[check].drop_duplicates()]
构造一个一般的例子:
n = 10**5
L = list('123456789abcdefghijklmnopqrstuvwxyz')
df = pd.DataFrame({'A':np.random.choice(L, n),
'B':[','.join(np.random.choice(L,10)) for i in range(n)]})
最容易想到的是用逐行的 apply
写出:
df[df.apply(lambda x: x.A in x.B.split(','), 1)]
但在 pandas
中应当尽可能地避免使用 apply
函数,可以利用 explode
和 drop_duplicates
如下写出:
df.B = df.B.str.split(',')
df_explode = df.explode('B')
check = df_explode.A == df_explode.B
df.loc[check.index[check].drop_duplicates()]
附:性能对比
n | apply | apply_std | explode | explode_std |
---|---|---|---|---|
1*10^3 | 0.0137 | 0.000162 | 0.00574 | 0.000238 |
5*10^3 | 0.0643 | 0.00142 | 0.019 | 0.000417 |
1*10^4 | 0.129 | 0.00429 | 0.038 | 0.0013 |
5*10^4 | 0.661 | 0.0201 | 0.181 | 0.0013 |
1*10^5 | 1.31 | 0.0207 | 0.367 | 0.00662 |
5*10^5 | 6.57 | 0.0426 | 1.8 | 0.0162 |
1*10^6 | 13.5 | 0.248 | 3.69 | 0.0565 |