python - Converting pandas DataFrame group iteration into groupby with apply
I need to split a DataFrame into groups and, for those groups that have an odd number of rows, pull out the first row if one of its columns matches a certain condition. I then need to collect all of those first rows (that is, only the ones from odd-sized groups that match the condition). I can do it in a loop (it works), but I cannot get the same thing to work with groupby. Can you help?

Interesting problem. I would solve it by writing a function that you then pass to apply. Suppose you had data like this:

    import pandas as pd
    import random

    df = pd.DataFrame({'key': [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7],
                       'data1': ['A', 'B', 'C', 'A', 'B', 'B', 'B', 'C', 'A', 'B', 'A', 'B', 'C', 'A', 'B', 'B', 'B', 'C'],
                       'data2': [random.random() for x in range(18)]})

Here 'key' is the column you will group on, and the 'data1' column is used to test the condition. In this data there is an odd number of observations for groups 1, 3, 5 and 7. In the first observations of those groups, the 'data1' values are 'A', 'B', 'A', 'B'. Say, for example, you wanted a new DataFrame containing the first observations of those groups, but only where 'data1' in that first row is equal to 'B'. You can write a general function like this:

    def apply_func(df, col, condition):
        # Skip groups with an even number of rows
        if len(df) % 2 == 0:
            return None
        else:
            # Keep the first row only if it matches the condition
            if df.iloc[0][col] == condition:
                return df.iloc[0]
            else:
                return None

And then call it on the groupby as follows:

    df.groupby('key').apply(apply_func, 'data1', 'B').dropna()

which returns the following output:

      data1     data2  key
    2     B  0.980814    3
    6     B  0.428402    7

That is, unless I am mistaken, what you are after. Of course, you do not really need all those branches in the function; I included them to make the logic clear. The shortest way is:

    def apply_func(df, col, condition):
        if len(df) % 2 != 0 and df.iloc[0][col] == condition:
            return df.iloc[0]

Note that when you pass a function to apply, the first argument is the group's DataFrame, and it is supplied automatically. This is why you do not specify the 'df' argument when passing the function to apply. In fact, if you do, you get an error saying that you have given too many arguments. One thing I find strange when passing the function is that the remaining arguments are given after commas rather than in parentheses. It seems misleading to me, but that is how it is.
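As a rough, self-contained sketch of the approach in the answer (not the original poster's exact code), the following runs end to end on a small deterministic dataset. The toy data is made up for illustration, `first_row_if_odd` is a hypothetical name, and the column selection `[['data1', 'data2']]` is an assumption added so that the function sees only the data columns on any pandas version. Rows are accessed by position with `.iloc[0]`.

```python
import pandas as pd

# Toy data, made up for illustration: groups 1, 3, 5 and 7 have an odd
# number of rows, and of those only groups 3 and 7 start with 'B' in 'data1'.
df = pd.DataFrame({
    'key':   [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7],
    'data1': ['A', 'B', 'C', 'B', 'A', 'B', 'B', 'C',
              'A', 'B', 'A', 'B', 'C', 'B', 'B', 'C'],
    'data2': [float(i) for i in range(16)],
})


def first_row_if_odd(group, col, condition):
    """Return the group's first row if the group has an odd number of rows
    and that row's `col` value equals `condition`; otherwise return None."""
    if len(group) % 2 != 0 and group.iloc[0][col] == condition:
        return group.iloc[0]
    return None


# Assumption: select the data columns explicitly so the grouping column
# is not part of what the function receives.
grouped = df.groupby('key')[['data1', 'data2']]

# apply() supplies each group as the first argument; the remaining
# arguments are forwarded to the function, positionally or by keyword.
result = grouped.apply(first_row_if_odd, 'data1', 'B').dropna()
result_kw = grouped.apply(first_row_if_odd, col='data1', condition='B').dropna()

print(result)
```

With this toy data, only groups 3 and 7 should remain after `dropna()`, since every other group either has an even number of rows or starts with a different 'data1' value. The keyword form is shown because it makes the mapping from arguments to parameters explicit, which may read less ambiguously than the bare comma-separated form.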