Questions tagged [pandas-groupby]

1

votes
3

answer
67

Views

how to group by different columns

I'm trying to group by different columns based on year, apply the grouping only within the same year, and finally store the result in a .csv file. My data and code are: ISO3 Income_Cat_1980 Income_Cat_1985 DWWC1980 DWWC1985 AFG L LM 5 10 AGO LM H...
water77
1

votes
1

answer
28

Views

Turning groupby into single row with new columns

I want to be able to turn a groupby into a single row, with the values of a second column in that groupby aggregated into new columns, or -99 if there isn't sufficient data. After we group by session_id with this input: user_id session_id timestamp step impressions n_clicks 0 0...
winnie
1

votes
1

answer
33

Views

PANDAS: a way to combine rows that are grouped by a field

I have a DataFrame that looks like: test1 = pd.DataFrame( { 'ROUTE' : ['MIA-ORD', 'MIA-AUA', 'ORD-MIA', 'MIA-HOU', 'MIA-JFK', 'JFK-MIA', 'JFK-YYZ'], 'TICKET' : ['123', '345', '123', '678', '456', '345', '456'], 'COUPON' : [1,4,2,1,1,3,2], 'PAX' : ['Jessica', 'Alex', 'Jessica', 'Jamanica', 'Ernest','...
Efrain Valles
1

votes
3

answer
35

Views

Pandas data-frame based on months between date columns and average of value

I am working with a pandas dataframe. Using df.groupby(), I ended up with a frame that includes ['start_date'] and ['end_date'] and a value for a specific id. | id | start_date | end_date |value| |:-----------|------------======|:---------------|-----| | 1 | 02-01...
gizq
0

votes
1

answer
14

Views

Pandas groupby, sort values, and take the top 3 with unique rank in Python?

I have a DataFrame like this: Val1 Val2 0 a 1.0 1 a 1.0 2 a 0.98 3 a 0.78 4 a 0.70 5 b 0.97 6 b 0.67 7 b 0.75 8 b 1.0 I want to group by Val1, arrange Val2 in descending order, and take top un...
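The excerpt is cut off, so the exact output the asker wants is unknown. One possible reading is "keep the rows whose value is among the three highest distinct values in each group". The sketch below assumes that reading and uses a dense rank; the `rank` column name is introduced here only for illustration.

```python
import pandas as pd

# Data copied from the excerpt above.
df = pd.DataFrame({
    "Val1": list("aaaaabbbb"),
    "Val2": [1.0, 1.0, 0.98, 0.78, 0.70, 0.97, 0.67, 0.75, 1.0],
})

# Dense rank within each group: tied values share a rank and no rank is skipped.
df["rank"] = df.groupby("Val1")["Val2"].rank(method="dense", ascending=False)

# Keep rows in the top 3 distinct values per group, highest first.
top3 = df[df["rank"] <= 3].sort_values(["Val1", "Val2"], ascending=[True, False])
print(top3)
```

If ties should be collapsed to one row each, calling `drop_duplicates(["Val1", "Val2"])` before ranking would be one way to do it.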
3

votes
2

answer
21

Views

Dataframe groupby - list of values

I have the following dataframe: driver_id status dttm 9f8f9bf3ee8f4874873288c246bd2d05 free 2018-02-04 00:19 9f8f9bf3ee8f4874873288c246bd2d05 busy 2018-02-04 01:03 8f174ffd446c456eaf3cca0915d0368d free 2018-02-03 15:43 8f174ffd446c456eaf3cca0915d0368d en...
Egor Maksimov
1

votes
2

answer
28

Views

Ordering event by date

I have a Pandas DataFrame of app installs which has one row per user per install - so that a user who has installed multiple apps will have multiple rows. The columns are user name, app name, and install date. A user can install multiple apps on the same day. How can I find the order of occurrence...
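A minimal sketch of one way to number installs per user, assuming that apps installed on the same day should share the same position. The column names (`user`, `app`, `install_date`) and the sample rows are hypothetical, since the question does not show its data.

```python
import pandas as pd

# Hypothetical sample: one row per user per install.
installs = pd.DataFrame({
    "user": ["u1", "u1", "u1", "u2"],
    "app": ["maps", "mail", "chat", "maps"],
    "install_date": pd.to_datetime(
        ["2019-01-02", "2019-01-01", "2019-01-02", "2019-01-05"]
    ),
})

# Dense rank of the install date within each user:
# same-day installs get the same order number.
installs["install_order"] = (
    installs.groupby("user")["install_date"].rank(method="dense").astype(int)
)
print(installs)
```

If same-day installs need distinct positions instead, `method="first"` would break ties by row order.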
0

votes
0

answer
5

Views

Groupby and find the difference

I have a pandas DF: df = pd.DataFrame(np.random.randint(1,10,size=(6,2)),columns = list('AB')) df['A'] = ['1111','2222','1111','1111','2222','1111'] df['B'] = ['2001-01-10','2001-01-02','2001-02-11','2001-03-14','2001-02-01','2001-04-14'] df OP: A B 0 1111 2001-01-10 1 2222 2001-01...
vikky
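The excerpt ends before the desired output, so the following is only a guess at the intent: the difference between consecutive dates in column B within each value of A. The `diff` column name is an assumption made for this sketch.

```python
import pandas as pd

# Columns A and B as assigned in the excerpt.
df = pd.DataFrame({
    "A": ["1111", "2222", "1111", "1111", "2222", "1111"],
    "B": ["2001-01-10", "2001-01-02", "2001-02-11",
          "2001-03-14", "2001-02-01", "2001-04-14"],
})
df["B"] = pd.to_datetime(df["B"])

# Difference from the previous row of the same group; the first row
# of each group has no predecessor and becomes NaT.
df["diff"] = df.groupby("A")["B"].diff()
print(df)
```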
1

votes
1

answer
31

Views

Join strings in each group and assign back to the original DataFrame

I have a dataframe with two columns: user and lang. Each user knows one or more languages: lang user 0 Python Mike 1 Scala Mike 2 R John 3 Julia Michael 4 Java Michael For each row, I need to get all the languages which that user knows. I can do that: df.groupby('us...
Oysiyl
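One common way to join strings per group and write the result back onto every original row is `GroupBy.transform`. This is a sketch under the assumption that a comma-separated string is the wanted format; the `all_langs` column name is made up for illustration.

```python
import pandas as pd

# Data copied from the excerpt above.
df = pd.DataFrame({
    "lang": ["Python", "Scala", "R", "Julia", "Java"],
    "user": ["Mike", "Mike", "John", "Michael", "Michael"],
})

# transform() broadcasts the per-group result back to the group's rows,
# so the output aligns with the original index.
df["all_langs"] = df.groupby("user")["lang"].transform(", ".join)
print(df)
```

Compared with aggregating and merging back, `transform` keeps the original row order and needs no join step.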
1

votes
2

answer
27

Views

Group re-occurring rows and find time difference from single datetime column in Pandas

I have a dataframe with a time column, and then a value column which has repeating A/B values. I need to be able to group these values into pairs and find the timedelta between them. import pandas as pd df = pd.DataFrame() df['time1'] = pd.date_range('2018-01-01', periods=6, freq='H') df['id'] = ra...
mbadd
1

votes
2

answer
19

Views

Expanding sum with group by date

I have a dataframe where I'm trying to do an expanding sum of values and group them by date. Specifically, my data looks like: creationDateTime OK Fail 2017-01-06 21:30:00 4 0 2017-01-06 21:35:00 4 0 2017-01-06 21:36:00 4 0 2017-01-07 21:48:00 3 1 2017-01-07 21:53:00 4 0 2017-01-08...
Radu Gheorghiu
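"Expanding sum grouped by date" can be read in more than one way. The sketch below assumes a running total that restarts each calendar day, and uses only the complete rows visible in the excerpt. The `cum_` prefix is an illustrative choice.

```python
import pandas as pd

# The five complete rows shown in the excerpt.
df = pd.DataFrame({
    "creationDateTime": pd.to_datetime([
        "2017-01-06 21:30:00", "2017-01-06 21:35:00", "2017-01-06 21:36:00",
        "2017-01-07 21:48:00", "2017-01-07 21:53:00",
    ]),
    "OK": [4, 4, 4, 3, 4],
    "Fail": [0, 0, 0, 1, 0],
})

# Group by calendar day (timestamps truncated to midnight) and
# accumulate within each day.
day = df["creationDateTime"].dt.normalize()
running = df.groupby(day)[["OK", "Fail"]].cumsum().add_prefix("cum_")
result = df.join(running)
print(result)
```

If the goal is instead daily totals accumulated across days, summing per day first and then calling `cumsum()` on that result would be the analogous approach.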
1

votes
0

answer
13

Views

Grouping & aggregating large dataset by multiple columns

I'm trying to group my data by multiple columns and then aggregate values in other columns. While I've found numerous examples of this online, I'm running into issues when I attempt to apply the same practices to my DataFrame. I'm thinking it might be due to size (1.5mm+ rows). I have a DataFrame...
Le Chase
2

votes
1

answer
40

Views

Python Pandas: How to group and sort rows by column value?

I am having trouble figuring out how to group and sort rows by column value. My goal is to count the number of UNIQUE 'Package Codes' where column values are orange and blue. There are duplicate 'Package Code' values and all rows with the same 'Package Code' will also have the same country and color...
newuser245
1

votes
2

answer
45

Views

How can I perform a value dependent pivot table/Groupby in Pandas?

I have the following dataframe: Tran ID Category Quantity 0 001 A 5 1 001 B 2 2 001 C 3 3 002 A 4 4 002 C 2 5 003 D 6 I want to transform it into: Tran ID A...
Alex Kinman
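The target layout is truncated in the excerpt, so the "value dependent" part of the question cannot be reconstructed. As a starting point only, here is a plain pivot that puts one column per category, assuming missing combinations should be filled with 0.

```python
import pandas as pd

# Data copied from the excerpt above.
df = pd.DataFrame({
    "Tran ID": ["001", "001", "001", "002", "002", "003"],
    "Category": ["A", "B", "C", "A", "C", "D"],
    "Quantity": [5, 2, 3, 4, 2, 6],
})

# One row per transaction, one column per category.
wide = df.pivot_table(
    index="Tran ID",
    columns="Category",
    values="Quantity",
    aggfunc="sum",
    fill_value=0,  # assumption: absent categories count as 0
)
print(wide)
```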
1

votes
2

answer
27

Views

How to create a 2 level groupby of top n items

I have this dataframe STATE County POP 1 Alabama Autauga County 54571 2 Alabama Baldwin County 182265 3 Alabama Barbour County 27457 4 Alabama Bibb County 22915 5 Alabama Blount Cou...
CW Gan
1

votes
2

answer
45

Views

What is the difference between groupby.first, groupby.nth, groupby.head when as_index=False

Edit: the rookie mistake I made with the string np.nan was pointed out by @coldspeed, @wen-ben, @ALollz. The answers are quite good, so I am not deleting this question, to keep those answers. Original: I have read this question/answer What's the difference between groupby.first() and groupby.head(1)? That an...
Andy L.
1

votes
1

answer
21

Views

Getting an error when calculating standard deviation using Pandas

I am trying to calculate the standard deviation of multiple columns using two variables in the groupby. However, my code throws an error and I am having a hard time figuring it out. I am using https://www.shanelynn.ie/summarising-aggregation-and-grouping-data-in-python-pandas/ as a guide. Below is a...
Lonewolf
1

votes
1

answer
310

Views

Compare timestamps in subsequent records with pandas

I have a large data set of 30000 KB (saved as a 'pandas' dataFrame) of chat conversations between experts and users. Each row represents a message sent by either the expert or the user. I want to measure the time between the second message the user sent and the second response of the expert. (notice...
Sharonio
1

votes
1

answer
507

Views

Create a dataframe based on column values of another dataframe

I have a dataframe of shape 20000 x 50. Two of the columns are Date and Time (represented as hour). The remaining columns hold observations of some parameters at that time. What I am trying to achieve is create a new dataframe which averages all the remaining column values for every 3 hours per day and cre...
Techflu
1

votes
2

answer
27

Views

Pandas dataframe output

I have the following Pandas data frame created. #usr/bin/python import pandas as pd vals = [ 1 , 2 , 3 ] ctry_grp = ['USA', 'USA', 'USA'] state_grp = ['MA' , 'MA' , 'CT' ] country_mean = pd.DataFrame( {'values': vals,'country': ctry_grp,'state': state_grp }).groupby(['country'])....
Maa
1

votes
0

answer
251

Views

Time-dependent rank autocorrelation in pandas

I have a MultiIndex pandas DataFrame of this schematic form (although the real dataframe I'm working with has millions of rows): import pandas as pd df = pd.DataFrame([['Alpha', 'a', 1,10], ['Alpha', 'a', 2,20],['Alpha', 'a', 3,30], ['Alpha', 'b', 1,50],['Alpha', 'b', 2,60],['Alpha', 'b', 3,10], ['A...
Jon
1

votes
1

answer
156

Views

Pandas: Conditional replace on consecutive rows within a group

I am trying to build 'episodes' from a list of transactions organized by group (patient). I used to do this with Stata, but I'm not sure how to do it in Python. In Stata, I would say something like: by patient: replace startDate = startDate[_n-1] if startDate-endDate[_n-1]
EdB65
1

votes
1

answer
36

Views

groupby and apply on two dataframes

I have a pandas dataframe with 3 columns: key1, key2, document. All three columns are text fields with the size of document ranging from 50 characters to 5000 characters. I identify a vocabulary based on minimum frequency from the set of documents for each (key1, key2) for which I am using scikit...
ironv
66

votes
3

answer
35.7k

Views

Multiple aggregations of the same column using pandas GroupBy.agg()

Given the following (totally overkill) data frame example import pandas as pd import datetime as dt df = pd.DataFrame({ 'date' : [dt.date(2012, x, 1) for x in range(1, 11)], 'returns' : 0.05 * np.random.randn(10), 'dummy' : np.repeat(1, 10) }) is there an existing built-in way to apply two...
ely
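As a hedged illustration of what the title asks about: `GroupBy.agg` accepts a list of functions for one column, and pandas 0.25 and later also support named aggregation. The choice of `mean` and `sum` and the output names `ret_mean` and `ret_sum` are assumptions for this sketch, and a seeded generator replaces `np.random.randn` so the example is reproducible.

```python
import datetime as dt

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": [dt.date(2012, x, 1) for x in range(1, 11)],
    "returns": 0.05 * rng.standard_normal(10),
    "dummy": np.repeat(1, 10),
})

# A list of functions yields one output column per function.
summary = df.groupby("dummy")["returns"].agg(["mean", "sum"])

# Named aggregation (pandas >= 0.25) lets you pick the output column names.
named = df.groupby("dummy").agg(
    ret_mean=("returns", "mean"),
    ret_sum=("returns", "sum"),
)
print(summary)
print(named)
```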
0

votes
0

answer
15

Views

Subset Pandas DataFrame with between time function

I am working with a Pandas Data Frame and have a very specific desired result in mind. My Data Frame resembles the following: Date Last Price Volume SMAVG (15) 0 4/18/19 15:59 203.86 3173667 276179.0 1 4/18/19 15:58 203.95 71103 66533.0 2 4/18/19 15:57...
QFII
1

votes
1

answer
24

Views

Finding mean of those values in column B who reside in rows having one of the K largest elements in Column A: Pandas Dataframe GroupBy Object

I have a pandas dataframe, call it df1, with many columns (col1, col2, ...). I want to group the data on two particular columns, say col4 and col7. In each group, I want to find the top K values in col9. Then, I want to find the mean of values in col10, which satisfy the condition of having the top K...
Ayush Soni
1

votes
2

answer
80

Views

How to find min and max time in Chat log conversation using pandas for calculating duration?

I want to calculate the duration for each ID and write it in a separate column. ID Ques Time Expected output ---------------------------------- 11 Hi 11.21 1min 11 Hello 11.22 13 hey 12.11 10mins 13 what 12.22 14 so 01.01 2mins 14 ok 01.03 ----------------------...
RVKNLP
1

votes
1

answer
318

Views

Group consecutive rows in a pandas dataframe by conditioning on hitting max value in another column

I have a pandas dataframe indexed by a time series with columns of GPS latitude and acceleration for a satellite orbiting the Earth. This latitude oscillates between maximum and minimum values with a constant time period as expected. What I want to do is integrate the acceleration column over each o...
Andreas Ioannou
1

votes
0

answer
155

Views

How to do qualitative entropy aggregation in python pandas DataFrame

Let us assume that we have a pandas DataFrame that looks like the following: |-----------|----------|-----| | member_id | group_id | pet | |-----------|----------|-----| | 111 | aaa | cat | | 222 | aaa | dog | | 333 | aaa | cat | | 444 | aaa | rat | | 555...
annievic
1

votes
1

answer
99

Views

Python pandas dataframe reshape

I am new to python and dataframes. I have a dataframe with the following structure: ID |DATE |COLUMN_1|COLUMN_2|COLUMN_3| ID_1 |2017-04-01 |VALA |VALB |VALC | ID_1 |2016-12-31 |VALD |VALE |VALF | ID_1 |2016-09-24 |VALG |VALH |VALI | ID_2 |2008-06-30 |VALJ |VALK...
cristi.calugaru
1

votes
0

answer
64

Views

pandas indexing to multiindex data

I have a dataset where df.columns=MultiIndex(levels=[['Area\n (In sq. km)', 'District Code', 'India/ State/ Union Territory/ District/ Sub-district', 'Name', 'Number of households', 'Number of towns', 'Number of villages', 'Population', 'Population per sq. km.', 'Sub District Code', 'Total/\nRural/\...
SudipM
1

votes
0

answer
136

Views

python resample/group by OHLC data

I have hourly OHLC data that I am trying to regroup so that the span from 9pm to 5am appears in one row, and then do the same for every day. I've tried several ways suggested here, but without success. index_21_09 = eur.index.indexer_between_time('21:00','05:00') df = eur.iloc[index_21_09] With this I filter data...
Milos Jovanovic
1

votes
0

answer
552

Views

Python 3.6, Pandas Import .csv file, filter and analyze columns, save results to new .csv file

I have a .csv file 'Data.csv', and want to import it to Python3.6 using pandas. I want to analyze this data by filtering multiple columns, and save my results to new .csv files with my analyzed data. My 'Data.csv' is separated by ',', and works with loc to filter my columns. If my imported file is c...
Charlie
1

votes
0

answer
46

Views

most frequent count of a column value when the values are comma separated

I am trying to find the most frequent rating watched by gender and age, where the ratings are comma-separated within a single column. I need to get the top one for each combination of gender and age. Data: gender age rating M young pg13, r, nr M adult r,pg13, pg F young nr,r,pg13 M ad...
pylearner
1

votes
0

answer
209

Views

Use Pandas to take Monthly Averages of data in a .csv file and save by Year in new .csv file

Python 3.6, Pandas 0.22.0: I have imported a .csv file called Data.csv. It contains weather data with columns 'NAME' 'DATE' 'SNOW' that reference the location name, the date in MM/DD/YYYY format, and the amount of snowfall on that day. I want to group all rows by 'NAME', then calculate monthly aver...
Charlie
1

votes
0

answer
60

Views

Populate column based on previous row with a twist

I'm struggling with a Pandas problem. I have the following data. +--------+------+---------+---------+-------------+-------------+--------------+------------+-------------+------------+----------+ | symbol | side | status | origQty | executedQty | qty | availableQty | price | boughtVal...
nidkil
1

votes
1

answer
144

Views

missing date columns in pandas dataframe after using groupby

I have a dataframe which I am creating by reading an Excel file: Project Release Name Cycle Name Cycle Start Date Cycle End Date Exec Date Planned Exec Date Available Test Cases Planned Tested Passed Failed Blocked No Run Tester B1 Y1 CM1 2/7/2018 2/20/2018 2/6/2018 2/6/...
novastar
1

votes
1

answer
21

Views

Any method to avoid creating individual files when using groupby and sortvalues in pandas

This is a small part of my dataset, which contains thousands of rows: designation names runs wickets catches batsman brendon mccullum 78 0 12 bowler shane bond 0 3 0 bowler mitchell mcclenaghan 20 1...
Jhonny
1

votes
1

answer
42

Views

standardizing a value, iterating over a groupby object

I need some help iterating over a groupby object in python. I have people nested under a single ID variable, and then under each one of those, they have balances for anywhere from 3 to 6 months. So, printing the groupby object looks, for example, like this: (1, Primary BP Product Rpt Month Cl...
lucretiuss
1

votes
1

answer
156

Views

pandas groupby, cannot apply iloc to grouped objects

Apologies if my question has been answered before, or the answer is obvious. Let's say that in my dataset there are two tasks, 20 different trials each. Now I would like to select only last 6 seconds of each trial for further analysis. The dataset looks sort of like this (+more columns). This sample...
user396156
