Questions tagged [pandas]

53735 questions
1

votes
1

answer
50

Views

how to convert pandas time data into the format that can be processed by matplotlib

In section 3 of the lecture, I encountered a problem: I could not load any finance data from Yahoo, so I used pandas datareader to load stock info for Microsoft. Here is the code: MS = data.DataReader(name='MSFT', data_source='yahoo', start='2007-07-10', end='2008-12-10') MS.head...
Xiaoyang Wu
1

votes
2

answer
88

Views

Create a new column in a dataframe if the column contains a string from a column of another dataframe

I want to create a new column in my dataframe if the column contains any of the values from a column of a second dataframe. First dataframe WXYnineZAB EFGsixHIJ QRSeightTUV GHItwoJKL YZAfiveBCD EFGsixHIJ MNOthreePQR ABConeDEF MNOthreePQR MNOthreePQR YZAfiveBCD WXYnineZAB GHItwoJKL KLMsevenNOP EFGsix...
Prasad
1

votes
0

answer
1.1k

Views

Pandas Styling HTML with Style.apply [duplicate]

This question already has an answer here: How to use `style` in conjunction with the `to_html` classes on a DataFrame? 1 answer Change the color of text within a pandas dataframe html table python using styles and css 2 answers I am trying to use my highlight_diff function to compare two dataframe...
Tyler Russell
1

votes
0

answer
2.9k

Views

Key Error - Jupyter Notebook

I have started to learn pandas and am using Jupyter Notebook. I imported the test data file (Weather) and read it using pandas; the code is given below. When I try to read the max value of temperature or any other column, I get the error below. Can you please help me solve the issue....
vivek rajagopalan
1

votes
2

answer
85

Views

If logic in Pandas, Python 3

I've got a Pandas Dataframe from which I want to compare two columns and create a new column with a calculation based off the result of the comparison. Logic would be the following: If df['column1']>df['column2'] : df['New column']=(df['column1']+df['column2']) else : df['New column']=(df['column1']...
Salva Tdl
1

votes
1

answer
36

Views

If I extend a class in python, how to automatically return the result as the new class?

Here is a simple example: import pandas as pd class test_pd(pd.DataFrame): def __init__(self): super().__init__() def my_copy(self): return self.copy() if __name__=='__main__': a = test_pd() #a has a.my_copy() b = a.my_copy() #b does not have b.my_copy() I would like the test_pd.my_copy() function t...
fnosdy
1

votes
0

answer
618

Views

Write Pandas Dataframe To Existing Tab in Excel

I have a workbook called 'Pivot Template.xlsx'. I have data from a query in a Pandas dataframe called 'results'. I need to put this data on a tab called 'Pivot Data'. I am using the code below for writing to the file, but the problem is that deletes all of the existing tabs when writing the file....
Eric Shreve
1

votes
1

answer
32

Views

How to separate an array value and put it into a DataFrame in pandas

I have an array which looks like this, [{'interval': '1', 'paramlist': [{'PARAMCODE': 'P7-3-5-2-0', 'UNIT': 'k', 'VALUE': '0'}, {'PARAMCODE': 'P2-1-3-4-0', 'UNIT': 'A', 'VALUE': '0'}]}, {'interval': '2', 'paramlist': [{'PARAMCODE': 'P7-3-5-2-0', 'UNIT': 'k', 'VALUE': '0'}, {'PARAMCODE': 'P2-1-3-4-0...
Noumi
1

votes
1

answer
118

Views

MiniBatchSparsePCA on Text Data

Goal I'm trying to replicate an application described in this paper (section 4.1), where Sparse Principal Component Analysis is applied to a text corpus with the output being K principal components, each displaying a 'structure that is otherwise hidden'. In other words, the principal components shou...
SeánMcK
0

votes
2

answer
17

Views

Concatenate related fields and replace within data frame

I'm in the process of concatenating two related fields throughout a large dataset. I feel like I have most of what I need but can't concat the fields properly. dataframe: id| date1foo| time1bar| date2foo| time2bar| date3foo | time3bar --|---------|---------|---------|---------|----------|--------...
grigs
0

votes
1

answer
20

Views

Do something if df does not contain specific column name

I want to do an if else operation based on non-existence of a specific column name in the df. if a_specific_column_is_NOT_in_the_df: print('not ok') else: print('ok') With the following code I can do the reverse of my task. if [col for col in df.columns if 'A' in col]: print('ok') else: prin...
k.koen
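
A minimal sketch of the check described in the question above, assuming the goal is an exact match on a column label. The frame and the column names `A`, `B`, and `C` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame for illustration; the column names are assumptions.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# `in` / `not in` on df.columns tests for an exact column label.
missing_c = "not ok" if "C" not in df.columns else "ok"
missing_a = "not ok" if "A" not in df.columns else "ok"

# The question's list comprehension matches substrings of column names.
# To negate that substring test instead, `not any(...)` can be used.
no_label_contains_z = not any("Z" in col for col in df.columns)

print(missing_c, missing_a, no_label_contains_z)
```

Which variant fits depends on whether the column name must match exactly or only contain the given text.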
0

votes
0

answer
10

Views

Unable to attach an Excel file created from a pandas dataframe to an email from Python

I am reading a file from Amazon S3, doing some data manipulations in pandas, and then sending an automated email. The filename is always going to remain the same. I am trying to convert the dataframemodelput to an xls file and attach it to the email. How can I fix this? My code: import os import...
Rahul rajan
1

votes
0

answer
33

Views

Python: counting number of coordinates within specified windows

I have two tab delimited files. Each has two columns, one for chromosome and one for column. I want to identify the number of positions in file2 that are within a specified window range of the positions in file1, and then check in the next window, and the next and so forth. So if this is the first r...
spiral01
1

votes
0

answer
46

Views

How to Convert 6s Power Consumption Time Series Data To 1 Hour Data?

I have a Time Series Data-set that looks like the following: Dates Power 09-11-12 23:40 123 09-11-12 23:40 0 09-11-12 23:40 0 09-11-12 23:40 0 09-11-12 23:40 0 09-11-12 23:40 123 09-11-12 23:40 123 09-11-12 23:40 122 09-11-12 23:40 122 09-11-12 23:41 122 09-11-12 23:41 0 09-11-...
Anisul Islam
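
One possible approach to the question above is `resample`, sketched here under several assumptions: the data has a proper `DatetimeIndex` with one reading every 6 seconds, the start time and the sample values are invented, and an hourly mean is the wanted aggregate (a sum or max may fit other uses better).

```python
import pandas as pd

# Assumed sample data: one reading every 6 seconds for two hours,
# alternating between 120 and 0. The start time is an assumption.
index = pd.date_range(
    "2012-11-09 23:40:00", periods=1200, freq=pd.Timedelta(seconds=6)
)
df = pd.DataFrame({"Power": [120, 0] * 600}, index=index)

# Group the readings into clock-hour bins and average each bin.
hourly_mean = df["Power"].resample(pd.Timedelta(hours=1)).mean()

# Count the readings per bin to see how complete each hour is.
hourly_count = df["Power"].resample(pd.Timedelta(hours=1)).count()

print(hourly_mean)
print(hourly_count)
```

The first and last bins are partial hours here, which is why the counts are worth checking before trusting the hourly values.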
1

votes
0

answer
141

Views

Pandas manual label encoding

Hello and Happy new year everybody. I came across some weird behaviour of pandas when tried to manually encode some labels. It would be awesome if anybody could explain why this is happening. So here is my code import numpy as np import pandas as pd from seaborn import load_dataset as data ti...
Marvin Taschenberger
1

votes
0

answer
361

Views

Accessing the columns of pivot table in Python Pandas

I'm using a python pandas pivot. How can I get access the columns of pivot on new data frame? KM_pivot_first = pd.pivot_table(read_sql_KM, values=['IMPRESSIONS','ENGAGEMENTS'],index='PLACEMENT_ID',aggfunc=np.sum) KM_data_summary = KM_pivot_first[['PLACEMENT_ID', 'IMPRESSIONS', 'ENGAGEMENTS']] error:...
dharmendra mishra
1

votes
0

answer
89

Views

Matrix operations on labeled arrays

I have a system with a linearised set of equations, such that the time-update operation can be performed by a matrix multiplication, y' = Ay, but I would also like to be able to index y using the names of the state variables, e.g. y['vel']. Is there a way to index in this way without losing the abil...
Joe H
1

votes
1

answer
136

Views

Pandas — Type error from Pivot

I have the data below. I am trying to create a pivot table, with 'Profile ID' at the side, and 'Booking Type' as the column headers. dfpivot=y.pivot_table(index='Profile Id', columns='Booking Type', aggfunc='count', fill_value=0) But I encounter the error below. TypeError: '>' not supported between...
unclegood
1

votes
0

answer
258

Views

How do I import multiple excel files and merge them using openpyxl

How do I import multiple Excel files and merge them using openpyxl? The code I have so far is this..... from openpyxl import Workbook def Test (filepath, yaml_sheetname): assumed_cell = 'A1' wb = openpyxl.load_workbook('/Users/c1carpenter/Desktop/Test.xlsx') wb = openpyxl.load_workbook('/Users/c1carp...
Carolina Estrella
1

votes
0

answer
221

Views

pd.read_csv killed when Reading large files

I have 30 compressed files, each with 2 GB of data, in an S3 location. I'm trying to decompress the files, convert them to data frames, and then subset the data based on column names. When I run the code it gets Killed (please refer to the screenshot for the error). Please help me out how to...
DPs
1

votes
0

answer
164

Views

Pandas, Postgres, and Daylight Savings Time (DST)

I am using read_sql_query from Pandas version 0.22.0 to pull time series data from a local PostgreSQL database. If I do not parse the date columns, then I get the following data frame: dataid localminute use 0 1642 2012-05-11 19:00:00-05:00 0.827 1 1642 2012-05-11 19:01:0...
davidrpugh
1

votes
0

answer
168

Views

Pandas plot table is not plotting all the columns

I am trying to plot the dataframe which is represented in an image. def count_of_each_categories(self,y_train=None,y_test=None,technology_segment=None): ''' :param y_train: :param y_test: :return: ''' df_train = pd.DataFrame() df_train[technology_segment] = y_train df_test = pd.DataFrame() df_train...
Nitesh kumar
1

votes
1

answer
105

Views

Get special group in pandas multiindex

I have a DataFrame with MultiIndex like this: In [5]: df Out[5]: a b lvl0 lvl1 lvl2 A0 B0 C0 0 1 C1 2 3 C2 4 5 C3 6 7 B1 C0 8 9 C1 10 11 C2 12 13 C3 14 15 A1 B0 C0 16 47 C1 18 49 C2 20 41 C3 22 43 B1 C0 24 25 C1 26 27 C2...
zwordcn
1

votes
2

answer
72

Views

Combine columns and sort the values before creating a new column

I am making a python script that I want to combine several columns of string data and sort them alphabetically before creating the new column. To simplify my example here is a really simple example of the format of the data I am dealing with: Ingredient 1, Ingredient 2, Ingredient 3 pickles, beef, m...
Marcel
1

votes
0

answer
144

Views

Sometimes Getting empty DataFrame while using SQL query through pandas.read_sql_query

The below code is working fine when I run it, but sometimes it returns empty DataFrame. The table contains 2 million rows. conn = sqlite3.connect('somedatabase.db') exp = input('Enter Date') df = pd.read_sql_query('SELECT SYMBOL,TIME_STAMP,OPTION_TYP,STRIKE_PR,OPEN_INT,CONTRACTS,EXPIRY_DT FROM datat...
aditass
1

votes
1

answer
39

Views

Pandas compare values for the same time every day

I have this data frame: date_time value 1/10/2016 0:00:00 28.4 1/10/2016 0:05:00 28.4 1/10/2016 0:10:00 28.4 1/11/2016 0:00:00 27.4 1/11/2016 0:05:00 27.4 1/11/2016 0:10:00 27.4 I want to calculate the difference between two rows in the same timestamp everyday, then add new ca...
Trần Danh Lưu
1

votes
1

answer
27

Views

Creating pandas dataframes from nested json file that has lists

[Image: a picture of how the data looks] I have a JSON file with data. The file is deeply nested; I want to take only the words and create a new dataframe for each post id. Can anyone help with this?
penestia
1

votes
1

answer
939

Views

Python list to .csv

I create a list like this: my_list=[('a','b',1,2),('a1','b',1,2)]. I want to dump it to a .csv file with headers: my_df = pd.DataFrame(dis) my_df.to_csv('E:\list.csv' ,header=['col1','col2','col3','col4'],index=False) but after running my code, the csv file does not have any headers, but instead of it it...
saddle point
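
A minimal sketch for the question above, assuming the aim is a CSV whose first row holds the column names. Naming the columns when the frame is built is one common way to do it. The in-memory buffer stands in for the question's `'E:\list.csv'` path so the example runs anywhere.

```python
import io

import pandas as pd

my_list = [("a", "b", 1, 2), ("a1", "b", 1, 2)]

# Name the columns at construction so to_csv writes them as the header row.
my_df = pd.DataFrame(my_list, columns=["col1", "col2", "col3", "col4"])

# An in-memory buffer replaces the file path; a path string works the same way.
buffer = io.StringIO()
my_df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(csv_text)
```

Note that the question builds the frame from a variable named `dis` rather than `my_list`; if that is a different object, it may explain the unexpected output.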
1

votes
1

answer
216

Views

Unable to fillna a column in dataframe with values from a series

I am trying to fillna in a specific column of the dataframe with the mean of not-null values of the same type (based on the value from another column in the dataframe). Here is the code to reproduce my issue: import numpy as np import pandas as pd df = pd.DataFrame() #Create the DateFrame with a col...
Sachin Myneni
1

votes
1

answer
52

Views

counting number of new values per month in pandas dataframe

I have a huge list(pandas dataframe) that looks like this user userID Date 1/1/2018 Annual 12345 1/3/2018 Annual 12345 1/5/2018 One Time 1/11/2018 One Time 1/12/2018 One Time 1/13/2018 Annual 98765 . . 2/1/2018 Annual 12345 2/3/2018 Annual 12345...
Hirotaka Nakagame
1

votes
1

answer
505

Views

Large data with pivot table using Pandas

I’m currently using a Postgres database to store survey answers. The problem I’m facing is that I need to generate a pivot table from the Postgres database. When the dataset is small, it’s easy to just read the whole data set and use Pandas to produce the pivot table. However, my current database now has a...
Dat Nguyen
1

votes
1

answer
242

Views

Custom PeriodIndex (Python / Pandas equivalent to SAS INTNX)

I have a SAS background and I am new to Python. I would like to know how to use PeriodIndex in a similar way to how we use SAS intervals. This is my problem: We have an official interest rate that is published more or less monthly. This interest rate is valid until the next one is published. My objective i...
Bruno
1

votes
1

answer
310

Views

Compare timestamps in subsequent records with pandas

I have a large data set of 30000 KB (saved as a 'pandas' dataFrame) of chat conversations between experts and users. Each row represents a message sent by either the expert or the user. I want to measure the time between the second message the user sent and the second response of the expert. (notice...
Sharonio
1

votes
1

answer
180

Views

pandas pivot table with multiple information in cells

I am not overly familiar with pandas so this may be a dumb question. I was trying to pivot the following data: df = pd.DataFrame({ 'Country' : ['country1', 'country2', 'country3', 'country4'], 'Industry' : ['industry1:\$20 \n industry4:\$30', 'industry10:\$100', 'industry3:\$2 \n industry4:\$30 \n...
nomore
1

votes
0

answer
26

Views

Pandas - Writing a Change Log Between Multiple Dataframes

I am trying to think of the most efficient way to write a change log for large dataframes. I have thousands of dataframes that have one million rows and 20 column, so efficiency is paramount. I have a couple solutions for checking for differences between two dataframes, but I cannot figure out the...
Tyler Russell
1

votes
1

answer
56

Views

Cleaning date column imported from excel

So I have this data set: 1.0 20/20/1999 2.0 31/2014 3.0 2015 4.0 2008-01-01 00:00:00 5.0 1903-10-31 00:00:00 6.0 1900-01-20 00:00:00 7.0 2011-02-21 00:00:00 8.0 1999-10-11 00:00:00 Those dates imported from e...
MarkoBox
1

votes
1

answer
132

Views

Slice a Pandas Dataframe based on the results of a function on a column

I want to slice a dataframe using a condition based on a DateTime column's month, element by element: Met_Monthly_DF = Metsite_DF.iloc[Metsite_DF['DateTime'].month == Month] I get the error: builtins.AttributeError: 'Series' object has no attribute 'month' It works on an element by element basis if...
Georgina Peach
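
A minimal sketch for the question above, assuming `DateTime` is already a datetime64 column. A Series exposes datetime parts through the `.dt` accessor, and a boolean mask goes with `.loc` (or plain `[]`) rather than `.iloc`. The sample rows and the `Temp` column are invented for illustration.

```python
import pandas as pd

# Hypothetical data; only the 'DateTime' column name comes from the question.
Metsite_DF = pd.DataFrame({
    "DateTime": pd.to_datetime(
        ["2018-01-15", "2018-02-03", "2018-02-20", "2018-03-01"]
    ),
    "Temp": [1.0, 2.5, 3.0, 4.2],
})
Month = 2

# .dt.month gives the month of every row at once; the comparison yields a mask.
Met_Monthly_DF = Metsite_DF.loc[Metsite_DF["DateTime"].dt.month == Month]

print(Met_Monthly_DF)
```

If the column holds strings instead of datetimes, converting it first with `pd.to_datetime` would be needed before `.dt` is available.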
1

votes
1

answer
65

Views

Pandas and CSV Libraries CSV Manipulation

I am building a simple app. I want some values in my CSV to be updated every 15 minutes. I want this part of my app to run in the background to prevent blocking the interface. I couldn't get it to work the way I want to. My code: # I'm using the pandas, sched and time imports here: #INTERFACE @app....
Anne
1

votes
0

answer
37

Views

compute dataframe operations on selected rows only

I have time series based data. time,val 2018-04-25,30 2018-04-26,10 2018-04-27,-30 2018-04-28,0 2018-04-29,60 I need 4 columns to be added for: 1. mean 2. average 3. (shifted)shifted val by 1 4. (diff)1 if val>0 else 0 In first go I calculate these as: df['mean'] = df[val].ewm...
ashwani
1

votes
1

answer
370

Views

Pandas: Convert Month Name to Int + Concat to Column and Convert to Date time

Convert the month name (e.g. October) to an int value. Append this column 'Month' to the beginning of another column 'FY'. Convert the new column 'Month FY' to a date. I've tried to use pandas, calendar, and datetime, but have been unsuccessful. I would like to accomplish this without having to create a dict. End Goal...
G.G.
