Questions tagged [pandas-groupby]

1

votes
2

answer
24

Views

pandas how to efficiently split large dataframe into two sets with a grouped condition on datetime

I have a large dataframe (~40 million rows) and I want to split it into two parts. Column 'group' indicates to which group the sample belongs and column 'date' which date the sample occurred. In the following test case, there can be multiple equal samples, but in the original set, this is not the ca...
Skyy2010
0

votes
1

answer
11

Views

How to convert Monthly data into Yearly data in pandas dataframe?

All, my dataframe looks like the following. I am trying to convert my monthly data into yearly data. I am trying to aggregate my dataframe such that I can add the monthly data-points for the year 1997 and display the sum column. I would like to perform this activity for the years 1997-2018. I have al...
Data_is_Power
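For the monthly-to-yearly question above, one possible approach is to group on the year of a datetime column and sum. This is only a sketch: the asker's frame is not visible in the excerpt, so the column names `date` and `value` and the sample figures are assumptions.

```python
import pandas as pd

# Hypothetical monthly data; column names and values are assumptions.
monthly = pd.DataFrame({
    "date": pd.to_datetime(["1997-01-01", "1997-02-01", "1998-01-01"]),
    "value": [1.0, 2.0, 5.0],
})

# Group on the calendar year of each timestamp and sum the monthly values.
yearly = monthly.groupby(monthly["date"].dt.year)["value"].sum()
print(yearly)
```

If the dates are stored as strings, they would need converting with `pd.to_datetime` first, as done here.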
0

votes
2

answer
14

Views

How to efficiently index a Groupby object?

I have a dataframe dfyg which is a Groupby object containing 120,000 groups. What's the best way to select 10,000 of these groups and pass them to the multiprocessing.Pool.map() function? I can think of a for loop which selects 10,000 groups and puts them in a list. I cannot filter the dataframe be...
apkul
2

votes
3

answer
18

Views

Last occurrence of a Groupby object under certain conditions

Let's say I have a DataFrame that looks like this: Categories Values 0 Category 0 1 1 Category 0 0 2 Category 0 -1 3 Category 0 0 4 Category 1 1 5 Category 1 0 6 Category 1 -1 7 Category 1 0 8 Category 2 1 9 Category 2 0...
mathguy
0

votes
2

answer
18

Views

pandas: how to groupby using a string

I have csv file with newline delimiters that I read into a pandas dataframe. df = pd.dataframe("data.csv", delimiter="\n", header=None) This returns something like this marker1 10 20 30 marker2 40 50 marker3 60 70 80 90 100 ..... I want to generate a dataframe as follows marker1 10 marker1 20 marker...
1

votes
2

answer
284

Views

Find minimum daily value using pandas GroupBy or pivot_table

I have a Dataframe obtained from a csv file (after some filtering) that looks like this: df3.head(n = 10) DateTime Det_ID Speed 16956 2014-01-01 07:00:00 1201085 65.0 16962 2014-01-01 07:00:00 1201110 69.5 19377 2014-01-01 08:00:00 1201085 65.0 19383 2014-01-01 08:00:00 1201110 6...
VivianT
1

votes
2

answer
489

Views

pandas dataframe groupby by column position

I have a function that does group by on a pandas dataframe. The problem is my dataframe can have variable number of columns. I want to aggregate: sum the last column by the first column. The name of the last column is different, but, the name of the first column is fixed. How could I achieve the gro...
add787
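For the column-position question above, a minimal sketch is to look the names up from `df.columns` by position and pass them to `groupby`. The frame below is hypothetical; only "sum the last column by the first column" comes from the question.

```python
import pandas as pd

# Hypothetical frame with a variable number of columns in between.
df = pd.DataFrame({
    "key": ["a", "b", "a"],
    "other": [1, 2, 3],
    "amount": [10, 20, 30],
})

# Resolve the column names by position instead of hard-coding them.
first_col = df.columns[0]
last_col = df.columns[-1]

# Sum the last column within each value of the first column.
totals = df.groupby(first_col)[last_col].sum()
print(totals)
```

Looking names up by position keeps the function independent of what the last column happens to be called.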
1

votes
1

answer
40

Views

Pandas compute datetime diff, but for each user

Dataset is related to time user spent on viewing items: user_id item_id view_started 121 160 2015-10-20 17:02:02 231 160 2015-10-18 11:02:29 231 161 2015-10-18 11:05:23 121 166 2015-10-18 11:04:34 231 180 2015-10-18 11:06:...
Null-Hypothesis
1

votes
2

answer
53

Views

Create new columns from aggregated categories

I have a dataframe that looks like: SK_ID_CURR CREDIT_ACTIVE 0 215354 Closed 1 215354 Active 2 215354 Active 3 215354 Active 4 215354 Active 5 215354 Active 6 215354 Active 7 162297 Closed 8 162297 Closed 9 162297 Active I would like to aggregate the number of active and cl...
hk_03
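For the question above about turning category counts into columns, one option is `pd.crosstab`, sketched here on the rows shown in the excerpt. The desired output is cut off in the excerpt, so treating it as one count column per status is an assumption.

```python
import pandas as pd

# Rows copied from the excerpt of the question.
credits = pd.DataFrame({
    "SK_ID_CURR": [215354] * 7 + [162297] * 3,
    "CREDIT_ACTIVE": ["Closed"] + ["Active"] * 6 + ["Closed", "Closed", "Active"],
})

# One row per SK_ID_CURR, one column per status, cells hold the counts.
counts = pd.crosstab(credits["SK_ID_CURR"], credits["CREDIT_ACTIVE"])
print(counts)
```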
1

votes
1

answer
115

Views

How to get rid of nested column names in Pandas from group by aggregation?

I have the following code that finds the total and unique sales for each employee using a group by with Employee_id and aggregation with Customer_id. Sales.groupby('Employee_id').agg({ 'Customer_id': [ ('total_sales', 'count'), ('unique_sales', 'nunique') ]}) It is important to know that I will per...
Jane Sully
1

votes
1

answer
37

Views

groupby one column and convert remaining columns to dictionary

I have a dataframe like this import pandas as pd df = pd.DataFrame({'keyid': ['d1', 'd1', 'd2', 'd2'], 'keys': ['key1', 'key2', 'key1', 'key2'], 'vals': ['val1', 'val2', 'val3', 'val4']}) keyid keys vals 0 d1 key1 val1 1 d1 key2 val2 2 d2 key1 val3 3 d2 key2 val4 which I want t...
Cleb
1

votes
2

answer
41

Views

Aggregating data using pandas python

I have the following data similar to the below: Table 1 Colour Make Red Ford Blue BMW Blue BMW Green Golf Yellow Audi Yellow Audi Yellow Audi Table 2 Colour Make Count Green Ford 5 Blue BMW 1 Green Golf 6 Orange BMW 1 I would like to use pandas to aggregate...
sytup
1

votes
1

answer
36

Views

Grouping on identical column names in pandas

time A1 A1 A2 A2 A2 A3 A3 2017-01 a1 a2 b1 b2 c ..... 2017-02 a3 a4 b3 b4 c 2017-03 a5 a6 b5 b6 c .... There is a dataframe as shown above. How to get mean value of the columns which have the same name( as shown below)?...
xianyuyu
1

votes
3

answer
51

Views

How to merge other rows of data frame to the current row with Python/Pandas

I have a data frame that looks something like this: A1 A2 A3 A4 1001 1002 1003 1004 5001 5002 5003 5004 7001 7002 7003 7004 I would like to merge the other rows to the current row to look like this. For Eg: For the first row the first four columns remain the same but the columns...
Nishant Kumar
1

votes
3

answer
49

Views

groupby("date") - get datetime of min and max

For this pandas DataFrame (that is in reality much longer), I would like to get the value of b and date, where b is minimum and b is maximum for that day. Performance is an issue. b date 0 1 1999-12-29 23:59:12 1 2 1999-12-29 23:59:13 2 3 1999-12-29 23:59:14 3 3 1999-12-30 23:59:1...
user7468395
1

votes
3

answer
100

Views

IndexError when replacing missing values with mode using groupby in pandas

I have a dataset which requires missing value treatment. Column Missing Values Complaint_ID 0 Date_received 0 Transaction_Type 0 Complaint_reason 0 Company_response...
Ashu Grover
1

votes
1

answer
23

Views

Pandas Group by time and count value of column

Let's say I have an array with events and log times, like this: Time Event 01/01/2019 8h00 X 01/01/2019 8h10 Y 01/01/2019 9h10 X 02/01/2019 7h10 Z 02/01/2019 8h10 Y 02/01/2019 9h10 Y ... I want to have an output like this: 01/01/2019 [(X,2), (Y,1)] 02/01/2019 [(Y, 2), (Z,1)] ... For n...
Whysmerhill
1

votes
2

answer
43

Views

How to calculate difference between datetime within a group in Python?

I have a df sorted by AccountID and PurchaseDate. What I want to do is calculate and create a new column of the difference between PurchaseDate values that are in each group of AccountID. AccountID PurchaseDate Price | 113 2018-09-01 22:56:30 13| | 113 2018-09...
IDontKnowAnything
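For the question above about datetime differences within a group, a possible sketch uses `groupby(...).diff()`. Only the column names and the first timestamp come from the excerpt; the other rows and the output column name `SincePrevious` are made up for illustration.

```python
import pandas as pd

# Column names follow the question; all rows except the first are hypothetical.
purchases = pd.DataFrame({
    "AccountID": [113, 113, 114, 114],
    "PurchaseDate": pd.to_datetime([
        "2018-09-01 22:56:30", "2018-09-03 22:56:30",
        "2018-09-02 10:00:00", "2018-09-02 12:00:00",
    ]),
})

# diff() needs the rows ordered within each account.
purchases = purchases.sort_values(["AccountID", "PurchaseDate"])

# Difference to the previous purchase of the same account; NaT for the first one.
purchases["SincePrevious"] = purchases.groupby("AccountID")["PurchaseDate"].diff()
print(purchases)
```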
1

votes
2

answer
30

Views

Why does groupby in Pandas not print all columns?

q = [{"name":"Mike","age":21, "text": "aaa"},{"name":"Jow","age":22, "text": "bbb"},{"name":"Piter","age":22, "text": "ccc"},{"name":"David","age":25, "text": "ddd"}] df = pd.DataFrame(q) result = df["name"].groupby(df['age']).agg(','.join).to_frame() print(df) print('---') print(result) Output: $ a...
Dmitry Bubnenkov
1

votes
1

answer
37

Views

Pandas groupby: treat two columns as one

I have a dataframe, two of the columns are latitude and longitude. Each lat-lon pair represents a single location, and I would like to groupby that location. I could do this groupby operation by converting the two columns into a single column of tuples, and groupby that column. However, my actual d...
natemcintosh
1

votes
3

answer
39

Views

Pandas Sum Diagonal Value with groupby

I would like to sum the diagonal value of each year and residue, grouping by Object. For example for object a will be 1 + 10 + 11 + 12 + 13. Is there any way to do it without splitting the table by object? Note that the number of rows might be different for each object. I have tried: df.groupby('Com...
Teck
1

votes
2

answer
355

Views

Find a first non NaN value in Pandas

I have a Pandas dataframe such that |user_id|value|No| |:-:|:-:|:-:| |id1|100|1| |id1|200|2| |id1|250|3| |id2|NaN|1| |id2|100|2| |id3|400|1| |id3|NaN|2| |id3|200|3| |id4|NaN|1| |id4|NaN|2| |id4|300|3|. Then I want the following dataset: |user_id|value|No|NewNo| |:-:|:-:|:-:|:-:| |id1|100|1|1| |id1|20...
s_narisawa
0

votes
0

answer
5

Views

Pandas dataframe Merge text rows group by ID

I have a dataframe as follows: ID Date Text 1 01/01/2019 abcd 1 01/01/2019 pqrs 2 01/02/2019 abcd 2 01/02/2019 xyze I want to merge Text by ID in Python using group by clause. I want to merge 'Text' columns by grou...
ParagS
5

votes
5

answer
41

Views

Pandas: remove multiple rows based on condition

Below is a subset of a pandas dataframe I have and I am trying to remove multiple rows based on some conditions. code1 code2 grp1 grp2 dist_km 0 M001 M002 AAA AAA 112 1 M001 M003 AAA IHH 275 2 M002 M005 AAA XXY 150 3 M002 M004 AAA AAA 65 4 M003 M443 IHH GRR...
Funkeh-Monkeh
1

votes
1

answer
17

Views

Convert categories in columns into multiple columns coded as 1 or 0 based on the unique key in Python

I have data like this: user reg ind prod A Asia Tele TV A Asia Bank Phone A Japan Tele Book B US Fin Paper B US Data Shop B Asia Tele TV B Africa Invest Book C Asia...
Kshitij Yadav
2

votes
1

answer
21

Views

Pandas top N records in each group sorted by a column's value

import pandas as pd d = { 'resource': [1,2,3,4,5,6,7], 'branch': ['a', 'b', 'c', 'a', 'a', 'c', 'b'], 'utilization': [0.7, 0.76, 0.9, 0.3, 0.55, 0.87, 0.71] } df = pd.DataFrame(data=d) I need to display the top 2 utilized resources by branches Something like this: df.groupby('branch')[['resource',...
DmitrySemenov
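For the top-N question above, one common pattern is to sort first and then take `head(2)` per group. This sketch reuses the data from the question; it is not necessarily the answer that was accepted.

```python
import pandas as pd

# Data taken from the question.
d = {
    "resource": [1, 2, 3, 4, 5, 6, 7],
    "branch": ["a", "b", "c", "a", "a", "c", "b"],
    "utilization": [0.7, 0.76, 0.9, 0.3, 0.55, 0.87, 0.71],
}
df = pd.DataFrame(data=d)

# Sort by utilization (highest first), then keep the first two rows per branch.
top2 = (
    df.sort_values("utilization", ascending=False)
      .groupby("branch")
      .head(2)
)
print(top2.sort_values(["branch", "utilization"], ascending=[True, False]))
```

`head(2)` keeps the original rows and columns, so no re-merge is needed afterwards.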
1

votes
2

answer
30

Views

Pandas: Group by unknown time period

I have a dataset with different time periods. I'd like to group it per id and per time period, but: I don't know, how long each time period is or when it even starts. The one thing I surely know: A new time period starts, when the difference between two timestamps is higher than two minutes. Example...
nanoteilchen
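For the question above, the stated rule (a new period starts when the gap between two timestamps exceeds two minutes) can be sketched with a per-id `diff` followed by a cumulative sum. The column names `id`, `timestamp`, and `period`, and the sample rows, are assumptions, since the example in the excerpt is cut off.

```python
import pandas as pd

# Hypothetical sample; the question's own example is not visible in the excerpt.
events = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2019-01-01 10:00:00", "2019-01-01 10:01:00", "2019-01-01 10:05:00",
        "2019-01-01 10:00:00", "2019-01-01 10:00:30",
    ]),
})
events = events.sort_values(["id", "timestamp"]).reset_index(drop=True)

# True where the gap to the previous row of the same id exceeds two minutes.
new_period = events.groupby("id")["timestamp"].diff() > pd.Timedelta(minutes=2)

# Running count of period starts per id gives a period label for each row.
events["period"] = new_period.astype(int).groupby(events["id"]).cumsum()

# The label can then be used as an ordinary grouping key.
sizes = events.groupby(["id", "period"]).size()
print(sizes)
```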
6

votes
3

answer
79

Views

Cross tabulate counts between pairs of keywords per group with pandas

I have a table with keywords associated with articles, looks like this: article_id keyword 1 A 1 B 1 C 2 A 2 B 2 D 3 E 3 F 3 D I need to get a sort of a pivot table: A B C D E F A - 2 1 1 0...
Ildar Akhmetov
1

votes
1

answer
101

Views

Complex Groupby Pandas Operation to Replace For Loops and If Statements

I have a complex group of a group problem I need help with. I have names of drivers, each of whom has driven several cars over time. Each time they turn on the car and drive, I capture cycles and hours, which are transmitted remotely. What I am trying to do is use grouping to see when the driver...
Ben C.
3

votes
2

answer
20

Views

Pandas - Iterate through lists / dictionaries for calculations

I am new to coding & I am looking for a pythonic way to implement the following code. Here is a sample dataframe with code: np.random.seed(1111) df2 = pd.DataFrame({ 'Product':np.random.choice( ['Prod 1','Prod 2','Prod 3', 'Prod 4','Prod 5','Prod 6','Box 1','Box 2','Box 3'], 10000), 'Transaction_Typ...
keg5038
3

votes
1

answer
22

Views

Getting value of a column where another column is minimum from group

INPUT I have an input dataframe with text, character length and a 'x' value. x text len flag 0 1 hi 2 1 1 1 hello 5 0 2 1 how 3 1 3 2 are 3 1 4 2 you? 4 1 5 2 kiddo 5 1 I want to groupby x and get the text of l...
Vishnudev
3

votes
3

answer
43

Views

How to groupby and sum if the cell value of certain columns fit specific conditions

I feel like what I'm trying to do is quite basic but I can't seem to find a similar post here. Please let me know if my post is indeed a duplicate. The data I have is about transportation crash incidents. The first two columns show the exact number of fatalities and injuries of the incident, but th...
Bowen Liu
4

votes
3

answer
25

Views

Groupby to create new columns

From a dataframe, I want to create a dataframe with new columns if the index is already found BUT I don't know how many columns I will create : pd.DataFrame([["John","guitar"],["Michael","football"],["Andrew","running"],["John","dancing"],["Andrew","cars"]]) and I want : pd.DataFrame([["John","guita...
FFL75
1

votes
1

answer
33

Views

Seaborn swarmplot of grouped dataframe

When I have a dataframe like this: import pandas as pd import seaborn as sns import random random.seed(0) df = pd.DataFrame({"Data":[random.random() for i in range(100)], "Cluster":[random.randint(0,10) for i in range(100)]}) I can easily plot the clusters with seaborn as boxplots: sns.boxplot...
F. Jehn
2

votes
2

answer
36

Views

group by a dataframe by values that are just less than a second off - pandas

Let's say I have a pandas dataframe as below: >>> df=pd.DataFrame({'dt':pd.to_datetime(['2018-12-10 16:35:34.246','2018-12-10 16:36:34.243','2018-12-10 16:38:34.216','2018-12-10 16:42:34.123']),'value':[1,2,3,4]}) >>> df dt value 0 2018-12-10 16:35:34.246 1 1 2018-12-10 16:36:34.243 2 2 2...
U9-Forward
2

votes
2

answer
42

Views

All columns are not passed when we use apply on result of groupby with a custom function

Create a DataFrame, x_df = pd.DataFrame({'a': [1,2,3,4,5,6], 'b': [1,2,1,2,1,2], 'c': ['x','x','y','y','z','z']}) Out[56]: a b c 0 1 1 x 1 2 2 x 2 3 1 y 3 4 2 y 4 5 1 z 5 6 2 z Now I want to use a function on every value of column 'c'. So I use the apply() function on the result...
Gautam Kumar
2

votes
1

answer
25

Views

How to get Top 3 box aggregation in pandas groupby - percentage of scores greater than 7 in 10-point scale?

I have the following Input pandas dataframe: Index respID company month score 0 101 AAA Oct'18 8 1 102 AAA Oct'18 10 2 103 AAA Oct'18 5 3 104 AAA Oct'18 4 4 105 BBB Oct'18 5 5 106 BBB Oct'18 6 6 107 BBB O...
ibarant
2

votes
1

answer
22

Views

Grouping columns by data type in pandas series throws TypeError: data type not understood

I am grouping values by type as follows: groups = frame.columns.to_series().groupby(frame.dtypes).groups but I get the error: TypeError: data type not understood. What would be the right way to go about grouping columns by datatype to prevent such errors? EDIT: Sample input 0 0 0 1985...
YohanRoth
1

votes
2

answer
24

Views

Pandas - Creating new dataframe with date as one df and staff details in another df

I have a Dataframe with a list of all dates in a calendar month. I have another Dataframe that has attendance of staff by day. I am trying to build a new Dataframe that would merge both these Dataframes. Given below is how df1 looks: date 10/1/2018 10/2/2018 10/3/2018 df2 looks as below: date,emp_id 1...
scott martin
0

votes
1

answer
12

Views

Pandas - Trying to merge a grouped Dataframe

I have a Dataframe that is grouped based on a few columns. If I try to merge this Dataframe with another Dataframe, I get an error: ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat. Given below is how my Dataframe looks: emp_id...
scott martin
