Questions tagged [pandas-groupby]

1

votes
1

answer
181

Views

Count following groupby on two columns in Pandas doesn't include groups with a zero count

I am grouping by two columns in a Pandas DataFrame, after which I count the size of each group. This grouped DataFrame will then be filtered and the data plotted in a bar chart. The issue I am having is that if a group has a zero count, it is not shown in the DataFrame and therefore does not appear...
MatthewCarterIO
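
One possible way to keep the zero-count combinations, sketched under the assumption that every pairing of the two grouping columns should appear in the result: count with `size()` and then reindex against the full product of observed values. The column names `a` and `b` and the sample rows are hypothetical, since the question's data is not shown.

```python
import pandas as pd

# Hypothetical sample data; the question's real columns are not shown.
df = pd.DataFrame({"a": ["x", "x", "y"], "b": ["p", "q", "p"]})

# Count rows per (a, b) pair; pairs that never occur are absent here.
counts = df.groupby(["a", "b"]).size()

# Build every possible (a, b) pair and fill the missing ones with 0.
full_index = pd.MultiIndex.from_product(
    [df["a"].unique(), df["b"].unique()], names=["a", "b"]
)
counts = counts.reindex(full_index, fill_value=0)
```

The reindexed Series can then be filtered and plotted like any other, with the empty groups present as explicit zeros.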
1

votes
3

answer
66

Views

How do I group by date with Pandas?

I made a game and got the players' data like this: StartTime Id Rank Score 2018-04-24 08:46:35.684000 aaa 1 280 2018-04-24 23:54:47.742000 bbb 2 176 2018-04-25 15:28:36.050000 ccc 1 223 2018-04-25 00:13:00.120000 aaa 4 79 2018-04-26 04:59:...
Alex Ran
1

votes
1

answer
597

Views

Group by column in DataFrame and create a separate CSV for each group

I have a huge CSV file of 100kb which contains records. An example is below: city employee california jhon delhi kumar us raj california brakers us kroja ... So I want to group them and store each group in a separate CSV file. My output for the above example would be c...
lava kumar
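
A minimal sketch of the idea in this question, assuming the grouping column is named `city` as in the excerpt: iterate over the groups and write each one to its own file. The temporary output directory and the file naming scheme are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Sample rows taken from the question's excerpt.
df = pd.DataFrame({
    "city": ["california", "delhi", "us", "california", "us"],
    "employee": ["jhon", "kumar", "raj", "brakers", "kroja"],
})

# Assumed output location: a fresh temporary directory.
out_dir = Path(tempfile.mkdtemp())

# groupby yields (key, sub-DataFrame) pairs; write one CSV per key.
written = []
for city, group in df.groupby("city"):
    path = out_dir / f"{city}.csv"
    group.to_csv(path, index=False)
    written.append(path)
```

Each file contains only the rows for one city, so `us.csv` would hold the `raj` and `kroja` rows.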
1

votes
0

answer
43

Views

Group By Pandas DataFrame and Get Counts [duplicate]

This question already has an answer here: Get statistics for each group (such as count, mean, etc) using pandas GroupBy? 5 answers Pandas groupby.size vs series.value_counts vs collections.Counter with multiple series 1 answer I have a simple question but somehow I am not getting the results. My d...
Rafael
1

votes
1

answer
28

Views

Assigning values to columns based on groups in pandas

I have a data set that looks approximately like this: data_set = pd.DataFrame([ {'img_type': 'bias', 'CCD-TEMP': -10, 'explen': 0, 'mean': 1023.4234}, {'img_type': 'bias', 'CCD-TEMP': -10, 'explen': 0, 'mean': 1024.4334}, {'img_type': 'bias', 'CCD-TEMP': -15, 'explen': 0, 'mean': 1022.2344}, {'img_t...
schwim
1

votes
1

answer
52

Views

Pandas DataFrame Advanced Indexing

I am looking for some help with pandas DataFrame sorting. I have a Data frame of 8 columns that go like; ['Date' , 'S ID', 'Se ID', 'S #', 'File Size (Mb)', 'HD name', 'Start Time', 'End time'] I've then done a: DataFile.groupby(['HD Name','Date','Se ID','S ID'])['File Size (Mb)'].agg({'Sequenc...
Logan Voorneman
1

votes
0

answer
29

Views

How to plot timeseries data where not all dates present and day extends beyond 24h period?

I have a dataframe with a timestamp index: val reportTime 2017-01-07 00:14:00 49 2017-01-07 00:29:00 46 2017-01-07 00:44:00 49 2017-01-07 00:59:00 46 2017-01-07 01:14:00 49 The data is in 15 minute intervals for some (not all) Saturdays and Sundays in 2017. For each weekend in th...
lovelyzoo
1

votes
0

answer
60

Views

Cryptic ValueError in groupby.diff

I'm encountering an unhelpful ValueError when attempting to do a simple diff on a moderately sized dataframe. I've tracked it down to occurring at a particular line but there's nothing unusual about that line. (It does correspond to a value where there's only one instance for the group_by key, but...
A. Leistra
1

votes
1

answer
255

Views

compare the next row value and change the current row value using pandas python

Is there any way of comparing a row value with the next row value and changing the current row value using pandas? Basically, in the first DataFrame DF1, in the value column, one of the values is '999', so the values of the next rows for that 'user-id' are less than the value '999'. So in this case I want to...
Arav
1

votes
1

answer
48

Views

How to use a combination of column values to filter data and create subsets?

I'm new to python and would like some help in scaling a project I'm working on. I have a data set with 25 columns. I need to filter that data by the unique combinations of 3 particular columns. Then name each of the unique filters as a subset (preferably just the combo of the values in each of the 3...
SoSincere3
1

votes
1

answer
99

Views

Killed worker when aggregating Dask data first over ID then on minutes

My goal is to aggregate NYC Citibike data first over station_id then on minutes of starttime in Dask. The head of the Dask DataFrame looks as follows, df_start.head() displays, starttime start_station_name start_station_id 72 2017-08-15 16:02:02 W 52 St & 11 Ave 72 2017-12-01 09:52:20 W...
Stereo
1

votes
2

answer
28

Views

Group a column when none of the rows are unique into one using pandas

Name Class Marks1 Marks2 AA CC 10 AA CC 33 AA CC 21 AA CC 24 I want to transform data in the above format into Name Class Marks1 Marks2 AA CC 10 33 AA CC 21 24 How should I achieve the result? PS- This is just an example of the dat...
Saumya Pandey
1

votes
2

answer
26

Views

Pandas grouped on a numeric condition

Here is my problem: I have a dataframe of this form: name number A 2 B 10 C 25 D 35 E 45 F 55 and I want to group the names on a numeric condition. In more detail, I want to group by interval: [0,15), [15,40), [40,+inf) so I want the groups (A, B), (C, D), (E, F). Do you kno...
kilag
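
The interval grouping described here can be sketched with `pd.cut`, using the question's own data and bin edges. Passing `right=False` makes each bin closed on the left and open on the right, matching [0,15), [15,40), [40,+inf). The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Data from the question's excerpt.
df = pd.DataFrame({
    "name": list("ABCDEF"),
    "number": [2, 10, 25, 35, 45, 55],
})

# Bin edges for [0, 15), [15, 40), [40, +inf).
bins = [0, 15, 40, np.inf]
intervals = pd.cut(df["number"], bins=bins, right=False)

# Collect the names that fall into each interval.
groups = df.groupby(intervals, observed=True)["name"].apply(list)
```

With this data, `groups` holds one list of names per interval, in bin order.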
1

votes
0

answer
37

Views

Pandas complex GroupBy or Pivot with non numerical data

I have a set of data that I'm trying to filter and group. For each unique ID, I want to get a count of unique users, IPs, and names, then list those things out. I've used this line to filter the data down to what I want, and I think its working: df = df[df.isin(df[df.duplicated(subset=['id'], keep=...
r1ty
1

votes
1

answer
35

Views

Grouping a dataframe and reordering based on date and counts

I have the following dataframe, that is grouped according to the invoice cycle first, then added to a count of clinics in each invoice cycle. Dataframe after groupby function I used the following code to add the count column: df5 = df4.groupby(['Invoice Cycle', 'Clinic']).size().reset_index(name='co...
Ali Javaid
1

votes
1

answer
55

Views

Cannot change Pandas Groupby Object

I'm trying to resample a group in a Pandas object. The resampling works, but somehow the object isn't modified... Do I need to create a new group or something? This is my code: grouped_by_product_comp = competitor_df.sort_values(['history_date']).groupby(['item_id']) for name, group in grouped_by_p...
Muriel
1

votes
1

answer
96

Views

Pandas - groupby - get_group with interval/date range

I'm trying use an interval/date range with the get_group() method. ranges = pd.date_range(start='1/1/1900', periods=12, freq='120M') dates = df.groupby(pd.cut(df['dob'], ranges)) I know typically you can use dates.get_group('groupName'). However, since I'm using a date range, I'm unable to get it to...
csf
1

votes
1

answer
34

Views

How to aggregate count of each unique value per column as a row indexed by column header?

I have a series looking like this: month_1 | month_2 | ... | month_X user_1 | label_1 | label_2 | ... | label_2 user_2 | label_2 | label_3 | ... | label_4 .... user_X | label_4 | label_1 | ... | label_55 I want to convert this into a table looking like this: month_1 | label_1 | count(la...
alm
1

votes
1

answer
41

Views

Pandas Top n % of grouped sum

I work for a company and am trying to calculate which products produced the top 80% of Gross Revenue in different years. Here is a short example of my data: Part_no Revision Gross_Revenue Year 1 a 1 2014 2 a 2 2014 3 c...
SDS
1

votes
1

answer
802

Views

How do I sum unique values per column in Python? [duplicate]

This question already has an answer here: Get statistics for each group (such as count, mean, etc) using pandas GroupBy? 5 answers I am working with weblogs and have data containing account_id and session_id. Multiple sessions can be associated with one account. I want to create a new dataframe con...
Tadas Melnikas
1

votes
0

answer
25

Views

Custom function + groupby Pandas with different conditions on grouped by variables

I want to generate some weights using groupby on a data that originally looks like this : V1 V2 MONTH CHOICES PRIORITY X T1 M1 C1 1 X T1 M1 C2 0 X T1 M1 C3 0 X T2 M1 C1 1 X T2 M1 C5 0 X T2 M1 C6 0 X T2...
mjab
1

votes
2

answer
39

Views

GroupBy dataframe and find out max number of occurrences of another column

I have to use groupby() on a dataframe in python 3.x. Column name is Origin, then based upon the origin, I have to find out the destination with maximum occurrences. Sample df is like: year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time arr_delay origin dest 0...
Sandeep Sharma
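
A hedged sketch of one approach to this question: group by `origin` and, within each group, take the most frequent `dest` via `value_counts()`. The rows below are hypothetical stand-ins, since the question's sample is truncated, and ties between destinations are not handled specially.

```python
import pandas as pd

# Hypothetical rows; only the origin and dest columns matter here.
flights = pd.DataFrame({
    "origin": ["EWR", "EWR", "EWR", "JFK", "JFK", "JFK"],
    "dest": ["IAH", "ORD", "ORD", "MIA", "MIA", "BQN"],
})

# value_counts() counts each dest; idxmax() returns the most common one.
top_dest = flights.groupby("origin")["dest"].agg(
    lambda s: s.value_counts().idxmax()
)
```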
1

votes
1

answer
185

Views

error: unhashable type: 'list'. While using df.groupby.apply

Here's my dataframe: I want to sort my dataframe by airline and then within this group by tweet_created. airline and tweet_created are two columns in my dataframe. I tried the following df.groupby(['airline']).apply(lambda x: x.sort_values(['tweet_created'])).reset_index(drop = True) But got this err...
Justin
1

votes
1

answer
19

Views

pandas dataframe group by next occurrence of column value

Below is my dataframe info date time file msg 0 INFO: 2018-09-12 16:10:10: view.py: phone 1 INFO: 2018-09-12 16:10:10: view.py: asdasd 2 INFO: 2018-09-12 16:10:43: view.py: contact start 3 INFO: 2018-09-12 16:10:43: view.py:...
Athena
1

votes
1

answer
50

Views

Pandas: number of events since last win per id

This is an example of my dataset, which is about online gaming. We have the session id qualifying the bet, the date when the bet occurred and the result of the bet (win-draw-lose): e = {'session': ['1', '3', '1', '1', '3', '1', '2', '2', '1', '3', '3', '3', '3', '3', '2', '3', '3'], 'date': ['2018...
EAMC
1

votes
1

answer
43

Views

Pandas DataFrame grouping

I have a Dataframe that looks like the following: enter image description here The dataframe counts the number of question according to their state: question_count_data.columns = ['date', 'curriculum_name_en', 'concept', 'language', 'concept_name_en', 'concept_name_tc', 'state', 'question_count'] q...
Ismail Hossain
1

votes
1

answer
367

Views

Merge pandas groupBy objects

I have a huge dataset of 292 million rows (6GB) in CSV format. Pandas' read_csv function is not working for such a big file. So I am reading data in small chunks (10 million rows) iteratively using this code : for chunk in pd.read_csv('hugeData.csv', chunksize=10**7): #something ... In the #something...
Pushpendu Ghosh
1

votes
1

answer
33

Views

How to plot grouped dataframe for multiple years and countries?

This is the grouped DataFrame I am working on. I have several variables for several countries, and years. How do I plot a line chart using matplotlib that shows the evolution of the variable 'gdp_share' over time for the different countries? I have grouped the DataFrame using: data = data.groupby(['...
1

votes
2

answer
140

Views

Python Pandas groupby and join

I am fairly new to python pandas and cannot find the answer to my problem in any older posts. I have a simple dataframe that looks something like that: dfA ={'stop':[1,2,3,4,5,1610,1611,1612,1613,1614,2915,...] 'seq':[B, B, D, A, C, C, A, B, A, C, A,...] } Now I want to merge the 'seq' values from e...
Przemko5
1

votes
2

answer
41

Views

How to group a date column into year and sum a spending column according to the year?

I am trying to group my data by year and sum the spending for each year. Here's some sample data: date: spend_amt: 2/1/2014 10000 2/5/2014 98 1/2/2015 5834.2 7/8/2017 561236 9/3/2017 568 28/1/2016 989895.3 My curre...
justalazyguy
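
A minimal sketch of grouping by year, using the sample values from the excerpt. It assumes the dates are day-first (suggested by `28/1/2016`) and that the columns are named `date` and `spend_amt`; both are assumptions about the asker's data.

```python
import pandas as pd

# Sample values from the question's excerpt.
df = pd.DataFrame({
    "date": ["2/1/2014", "2/5/2014", "1/2/2015", "7/8/2017", "9/3/2017", "28/1/2016"],
    "spend_amt": [10000, 98, 5834.2, 561236, 568, 989895.3],
})

# Assumption: dates are written day/month/year.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Group on the year component and sum the spending within each year.
yearly = df.groupby(df["date"].dt.year)["spend_amt"].sum()
```

The result is a Series indexed by year, with one total per year.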
1

votes
1

answer
53

Views

Python Pandas grouping columns

This is a Pandas question - my brain is too tired to figure this out today. Could someone please help me? I have a dataframe with many columns with one column as a category: Category B C D .... Z 1 2 11 1.0 'HOME' .... 1 3 21 1.0 'HOME' .... 1 1 33 .9 'GOPHER' .... 2 4...
old_guy
1

votes
0

answer
78

Views

using melt function in groupby for large data sets in python

I have one data frame with 1782568 distinct groups. So, when I melt that data at the grouping level my kernel gets stuck. So, I decided to melt the data group by group and then combine all of them sequentially. For that I wrote the following function. def split(df,key): df2=pd.DataFrame() fo...
neeraja
1

votes
2

answer
51

Views

Pandas Multi Level Groupby: Pass grouped value range to function

I have a data frame with three columns: 'Company Name', 'Product', 'Spend'. Now I want to do the following: 1) Groupby 'Company Name' and 'Product' to see the money spent per Company and Product. grouped=df.groupby(['Company Name', 'Product']) 2) Iterate only over the 'Company Name' column of groupe...
MaximJ
1

votes
1

answer
16

Views

group id's according to their respective value in pandas panel

In my panda panel I have two columns, 'id' and 'amount'. There are multiple transactions for the same id too. There can be positive and negative values in the 'amount'-column. Now, I want to group all id's where the amount is negative and count them. How can I achieve this?
chetan parmar
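
One reading of this question, sketched with hypothetical data: filter to the rows with a negative `amount`, then count them per `id`. The second expression covers the alternative reading, the number of distinct ids that have at least one negative amount.

```python
import pandas as pd

# Hypothetical transactions; the question gives only the column names.
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 3, 3],
    "amount": [10, -5, 20, -1, -2, 7],
})

# Keep only negative amounts, then count rows per id.
negative = df[df["amount"] < 0]
negative_counts = negative.groupby("id").size()

# Alternative reading: how many distinct ids have any negative amount.
n_ids_with_negative = negative["id"].nunique()
```

Ids with no negative amounts (here, `2`) do not appear in `negative_counts`.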
1

votes
1

answer
23

Views

Counting total values per month while plotting only yearly labels

I have the following DataFrame : H T date date 1990-08-26 11:30:00 38.0 11.6 1990-08-26 1990-08-26 11:30:00 63.0 11.3 1990-08-26 1990-08-26 11:30:00 87.0 10.9 1990-08-26 1990-08-26 11:30:00 111.0 10.6 1990-08-26 1990-08-26 11:30:00 134.0...
Ioana Colfescu
1

votes
1

answer
35

Views

How to merge dictionaries of a pandas dataframe when grouping by rows

I have a dataframe of the form: id date area1 area2 01 20181010 {'a': 10, 'b': 15} {'a': 20, 'c': 13} 01 20181010 {'c': 17} {'b': 12} 02 20180506 {'a': 2, 'b': 3} {'c': 4} 02 20180506 Nan {'a': 18} I would like to group all rows with matching 'id' and 'da...
Juan M. Grados
1

votes
0

answer
16

Views

Spreading quantity over capacity

Have two dataframes: machines and demand as follows: machines import pandas as pd import numpy as np import itertools dates = pd.Series([d.date() for d in pd.date_range('1/1/2018', periods=4, freq='W')]) sites = pd.Series('TH, ID'.split(',')) product = list('AB') machine = ['M1', 'M2'] # can have mu...
reservoirinvest
1

votes
0

answer
31

Views

Line graph not showing with barchart

I'm having difficulty getting this code to generate a bar chart with a line chart on top of it when I filter my dataframe down to the last 7 dates. Currently, I can run each graph individually and they work fine but when I run them both together it gives me the bar chart with no line chart. How do I...
Jack Kenny
1

votes
1

answer
30

Views

Pandas groupby aggregation with variable time windows

I have a dataframe (df) that is like the one below: month-year name a b c start_date end_date 2018-01 X 2 1 4 2018-01-01 2018-01-31 2018-01 Y 1 0 5 2018-01-01 2018-02-31 2018-01 X 1 6 3 2018-01-01 2018-01-31 2018-01...
dukekeith313
1

votes
2

answer
57

Views

Pandas: how to remove duplicate rows, but keep ALL rows with max value [duplicate]

This question already has an answer here: Python : Getting the Row which has the max value in groups using groupby 10 answers How can I remove duplicate rows, but keep ALL rows with the max value. For example, I have a dataframe with 4 rows: data = [{'a': 1, 'b': 2, 'c': 3},{'a': 7, 'b': 10, 'c': 2...
Tuan Anh
