python - Converting pandas DataFrame group iteration into groupby with apply


I need to split a DataFrame into groups. For the groups that have an odd number of rows, I need to pull the first row whose column matches a certain value, and then collect all of those first rows (so only the odd-sized groups that satisfy the condition contribute). I can do it in a loop, as below, and it works, but I would like to express it with groupby and apply instead. Can you help?

    grp_by_cols = ['A', 'B']
    pieces = []
    for _, group in txn.groupby(grp_by_cols):
        if len(group) % 2 != 0:
            # first row of this odd-sized group whose 'c' column equals 'some'
            pieces.append(group[group['c'] == 'some'].head(1))
    # DataFrame.append was removed in pandas 2.0, so concatenate once at the end
    new_df = pd.concat(pieces) if pieces else pd.DataFrame(columns=grp_by_cols)
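For comparison, the loop's exact logic (the first row that matches within each odd-sized group, not necessarily the group's first row) can also be sketched without a loop or apply. This is a different technique from the apply-based answer: it attaches each row's group size with `transform('size')`, filters, and then takes `groupby(...).head(1)`. The helper name `first_match_in_odd_groups` and the small `txn` frame are hypothetical.

```python
import pandas as pd


def first_match_in_odd_groups(txn, grp_by_cols, col, value):
    """Hypothetical helper: first row with txn[col] == value in each odd-sized group."""
    # Size of the group each row belongs to, aligned with txn's index.
    sizes = txn.groupby(grp_by_cols)[col].transform('size')
    # Keep only rows from odd-sized groups that satisfy the condition.
    candidates = txn[(sizes % 2 != 0) & (txn[col] == value)]
    # First remaining row per group; original row order is preserved.
    return candidates.groupby(grp_by_cols).head(1)


# Hypothetical data: (1, 'x') has 3 rows, (2, 'y') has 2 rows, (3, 'z') has 3 rows.
txn = pd.DataFrame({
    'A': [1, 1, 1, 2, 2, 3, 3, 3],
    'B': ['x', 'x', 'x', 'y', 'y', 'z', 'z', 'z'],
    'c': ['other', 'some', 'some', 'some', 'other', 'other', 'other', 'other'],
})
result = first_match_in_odd_groups(txn, ['A', 'B'], 'c', 'some')
print(result)
```

Only row 1 survives: group (2, 'y') has a match but an even size, and group (3, 'z') is odd-sized but has no match. Unlike the loop, which concatenates groups in sorted key order, this keeps the rows in their original order.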

Interesting problem. I solved it by writing a function that you then pass to apply.

Suppose you had data like this:

    import random
    import pandas as pd

    df = pd.DataFrame({'key': [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7],
                       # first rows of the odd-sized groups (keys 1, 3, 5, 7) are 'A', 'B', 'A', 'B'
                       'data1': ['A', 'B', 'C', 'B', 'A', 'B', 'B', 'C',
                                 'A', 'B', 'A', 'B', 'C', 'B', 'B', 'C'],
                       'data2': [random.random() for _ in range(16)]})

where 'key' is the column you will group on and 'data1' is the column used to test the condition. In this data, groups 1, 3, 5 and 7 have an odd number of observations, and the 'data1' values of their first observations are 'A', 'B', 'A', 'B'. Say, for example, you wanted a new DataFrame with the first observations of those groups, but only where 'data1' in that first row equals 'B'. We can write a general function like this:

    def apply_func(df, col, condition):
        # even-sized groups are skipped
        if len(df) % 2 == 0:
            return None
        else:
            # odd-sized group: keep its first row only if it meets the condition
            if df.iloc[0][col] == condition:
                return df.iloc[0]
            else:
                return None
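To see what the function hands back to apply, here is a small sketch that calls it directly on three hypothetical single-group frames (the names `odd_match`, `odd_miss` and `even` are illustrative). It uses `iloc[0]` to take the first row, since `irow` no longer exists in current pandas.

```python
import pandas as pd


def apply_func(df, col, condition):
    # Even-sized groups are skipped entirely.
    if len(df) % 2 == 0:
        return None
    # Odd-sized group: return its first row only if it satisfies the condition.
    if df.iloc[0][col] == condition:
        return df.iloc[0]
    return None


# Hypothetical frames standing in for the per-group frames groupby passes in.
odd_match = pd.DataFrame({'data1': ['B', 'A', 'B'], 'data2': [0.1, 0.2, 0.3]})
odd_miss = pd.DataFrame({'data1': ['A', 'B', 'A'], 'data2': [0.4, 0.5, 0.6]})
even = pd.DataFrame({'data1': ['B', 'C'], 'data2': [0.7, 0.8]})

print(apply_func(odd_match, 'data1', 'B'))  # the first row, as a Series
print(apply_func(odd_miss, 'data1', 'B'))   # None: first row is 'A'
print(apply_func(even, 'data1', 'B'))       # None: even number of rows
```

Each call returns either one row (a Series) or None; apply then has to stitch those per-group results together.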

Then call it on the grouped data like this:

    df.groupby('key').apply(apply_func, 'data1', 'B').dropna()

returns the following output:

      data1     data2  key
    2     B  0.980814    3
    6     B  0.428402    7
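A self-contained sketch of the whole pipeline, assuming pandas 2.x. The 'data1' letters other than the first rows of the odd-sized groups are assumptions, and 'data2' is a deterministic 0 to 15 in place of random numbers, so the surviving rows are easy to check: the same keys (3 and 7) as in the output above should remain.

```python
import pandas as pd

# Toy data shaped like the answer's: keys 1..7 with group sizes 1, 2, 3, 2, 3, 2, 3.
# Assumed letters; first rows of the odd-sized groups (keys 1, 3, 5, 7) are 'A', 'B', 'A', 'B'.
df = pd.DataFrame({
    'key': [1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7],
    'data1': ['A', 'B', 'C', 'B', 'A', 'B', 'B', 'C',
              'A', 'B', 'A', 'B', 'C', 'B', 'B', 'C'],
    'data2': [float(i) for i in range(16)],  # deterministic stand-in for random values
})


def apply_func(df, col, condition):
    if len(df) % 2 == 0:
        return None
    if df.iloc[0][col] == condition:
        return df.iloc[0]
    return None


# Groups that return None come back as all-NaN rows, which dropna() removes.
# Selecting the non-key columns keeps 'key' out of each group, so it appears
# only as the result's index.
result = df.groupby('key')[['data1', 'data2']].apply(apply_func, 'data1', 'B').dropna()
print(result)
```

Selecting the columns before apply is a design choice for this sketch: pandas 2.2 warns when apply operates on the grouping columns, which the original one-liner does.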

If I understood you correctly, that is what you were after.

Of course, you don't really need all those branches in the function; I included them to make it clear. The shortest way is:

    def apply_func(df, col, condition):
        # implicitly returns None unless both tests pass
        if len(df) % 2 != 0 and df.iloc[0][col] == condition:
            return df.iloc[0]
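The shorter version relies on two Python details: `and` short-circuits, so the first row is only inspected for odd-sized groups, and a function that falls off the end returns None. A small check on a hypothetical three-group frame:

```python
import pandas as pd


def apply_func(df, col, condition):
    # Falls off the end (returning None) unless both tests pass.
    if len(df) % 2 != 0 and df.iloc[0][col] == condition:
        return df.iloc[0]


# Hypothetical data: key 1 is odd and starts with 'B', key 2 is even,
# key 3 is odd but starts with 'A'.
df = pd.DataFrame({'key': [1, 2, 2, 3, 3, 3],
                   'data1': ['B', 'B', 'C', 'A', 'B', 'B']})

result = df.groupby('key')[['data1']].apply(apply_func, 'data1', 'B').dropna()
print(result)
```

Only key 1 survives; keys 2 and 3 both hit the implicit None.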

Note that when you pass a function to apply, its first argument (the DataFrame for each group) is supplied automatically. That is why you must not pass a value for the 'df' argument yourself when handing the function to apply; in fact, if you do, you will get an error saying you gave too many arguments. What I find strange is that the remaining arguments are given after commas, as extra arguments to apply, rather than in parentheses after the function name. It seems misleading to me, but that is how it works.
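A sketch of those argument-passing rules on a hypothetical three-group frame: positional or keyword arguments given after the function are forwarded to it after the group, `functools.partial` or a `lambda` binds them up front instead, and passing a frame for 'df' yourself shifts everything by one and raises a `TypeError`.

```python
import functools

import pandas as pd


def apply_func(df, col, condition):
    if len(df) % 2 != 0 and df.iloc[0][col] == condition:
        return df.iloc[0]


# Hypothetical data: keys 1 and 3 are odd-sized and start with 'B'; key 2 is even.
df = pd.DataFrame({'key': [1, 2, 2, 3, 3, 3],
                   'data1': ['B', 'B', 'C', 'B', 'A', 'A']})
grouped = df.groupby('key')[['data1']]

# Extra positional arguments after the function: pandas calls
# apply_func(group, 'data1', 'B') for each group.
r_pos = grouped.apply(apply_func, 'data1', 'B').dropna()
# Keyword arguments are forwarded the same way.
r_kw = grouped.apply(apply_func, col='data1', condition='B').dropna()
# Binding the extra arguments first, so apply only supplies the group.
r_partial = grouped.apply(functools.partial(apply_func, col='data1', condition='B')).dropna()
r_lambda = grouped.apply(lambda g: apply_func(g, 'data1', 'B')).dropna()

# Supplying 'df' yourself means apply_func receives four positional arguments.
error_message = None
try:
    grouped.apply(apply_func, df, 'data1', 'B')
except TypeError as exc:
    error_message = str(exc)

print(r_pos)
print(error_message)
```

All four successful calls give the same frame (keys 1 and 3); the last call fails with Python's usual "takes 3 positional arguments but 4 were given" message.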
