Questions tagged [pandas]

34828 questions
1

votes
1

answer
572

Views

Pandas read_table() thousands=',' not working

I'm trying to read in some population data as an exercise to learn pandas: >>> countries = pd.read_table('country_data.txt', thousands=',', header=None, names=["Country Name", "Area (km^2)", "Areami2", "Population", "Densitykm2", "Densitymi2", "Date", "Source"], usecols=["Country Name", "Area (km^2)...
mszep
0

votes
0

answer
37

Views

How to compress rows after groupby in pandas

I have performed a groupby on my dataframe. grouped = data_df.groupby(['Cluster','Visit Number Final'])['Visitor_ID'].count() I am getting the below output : data_df.groupby(['Cluster','Visit Number Final'])['Visitor_ID'].count() Out[81]: Cluster Visit Number Final 0 1...
shreeja7
2

votes
0

answer
15

Views

XGBoost, handling continuous and fixed data for loan dataset

Background: I am using XGBoost to develop a model to predict whether a particular loan will default or not. I have now included time-series data on FICO score, and other variables that change throughout time. Thus I have 13,202 unique loans but with over 300,000 rows with variable and fixed data. Qu...
Wolfy
0

votes
4

answer
56

Views

Python: Assign Labels to values in an array

I have an array which represents some time series data: array([[[-0.59776013], [-0.59776013], [-0.59776013], [-0.31863936], [-0.31863936], [-0.31863936], [-0.31863936], [-0.31863936], [-0.31863936], [ 0.31863936], [ 0.31863936], [ 0.31863936], [-0.31863936], [-0.31863936], [-0.31863936], [-0.3186393...
Murray
1

votes
2

answer
661

Views

How to slice a pandas dataframe by columns using a mix of array of labels and slice of objects?

Is there a way to slice a pandas dataframe mixing an 'array of labels' with a 'slice of objects'. I couldn't find an example here Indexing and Selecting Data A list or array of labels ['a', 'b', 'c'] A slice object with labels 'a':'f' Here is an example of what I am trying to do without just manuall...
IcemanBerlin
1

votes
3

answer
847

Views

Broadcasting a list in Pandas

I have a dataframe (a) , from which I want to subtract a list (b), column-wise: import numpy as np import pandas as pd In:a=pd.DataFrame(np.arange(0,20).reshape(5,4)) print(a) Out: 0 1 2 3 0 0 1 2 3 1 4 5 6 7 2 8 9 10 11 3 12 13 14 15 4 16 17 18 19 In: b=[1,2,3,...
Chris
0

votes
0

answer
22

Views

Efficient way to randomly select all rows from pandas dataframe corresponding to a column value

I have a pandas dataframe containing about 2 Million rows which looks like the following example ID V1 V2 V3 V4 V5 12 0.2 0.3 0.5 0.03 0.9 12 0.5 0.4 0.6 0.7 1.8 01 3.8 2.9 1.1 1.6 1.5 17 0.9 1.2 1.8 2.6 9.0 02 0.2 0.3 0.5 0.03 0.9 12 0.5 0.4 0.6 0.7...
iprof0214
0

votes
1

answer
21

Views

Generate minutely time range in python

Currently I am generating hourly time using the below code import pandas as pd times=[pd.to_datetime(i) for i in '09:14:00','10:15:00','11:15:00','12:15:00','13:15:00','14:15:00','15:15:00', '15:30:00'] I need to have minutely times like '09:14:00','09:15:00' Is there a way to have minutely times wi...
pythonRcpp
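For the minutely range asked about above, a minimal sketch using `pd.date_range` with a one-minute frequency. The calendar date is a placeholder assumption, since the question only gives times of day.

```python
import pandas as pd

# The date is a hypothetical placeholder; only the start and end times come from the question.
times = pd.date_range("2019-01-01 09:14:00", "2019-01-01 15:30:00", freq="min")

# Keep only the time-of-day strings if the date part is not needed.
labels = times.strftime("%H:%M:%S").tolist()
```

`labels` then starts at '09:14:00' and advances one minute per entry up to and including '15:30:00'.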
1

votes
2

answer
1.4k

Views

Pandas: Sum of first N non-missing values per row

I'd like to efficiently sum the first N non-missing values of a pandas DataFrame. For example, if I had dataframe like this: "df" sid 1900 1899 332 855 1285 1413 1063 1768 2320 1117 bid 309 -0.02 -0.03 -0.03...
ssquaxe
1

votes
2

answer
2.2k

Views

Pandas returns NaT when it should not

My DataFrame is time NTCS001G002 NTCS001W005 0 2013-05-30 23:00:00 NaN NaN 1 2013-06-30 23:00:00 249 60 2 2013-07-31 23:00:00 161 2 3 2013-09-01 23:00:00 151 11 4 2013-09-04 23:00:00 14 0 5 2013-...
glennpierce
1

votes
1

answer
554

Views

Define two columns with one map in Pandas DataFrame

I have a function which returns a list of length 2. I would like to apply this function to one column in my dataframe and assign the result to two columns. This actually works: from pandas import * def twonumbers(x): return [2*x, 3*x] df = DataFrame([1,4,11],columns=['v1']) concat([df,DataFrame(df['...
Pekka
0

votes
1

answer
23

Views

problem with condition statement despite using right operator [duplicate]

This question already has an answer here: Logical operators for boolean indexing in Pandas 3 answers I wrote this script to create a specific variable that takes different values according to the number of reports. Count of Report is an integer column. no_audit = df_bei_index['Count of Report'] ==...
Filippo Sebastio
1

votes
0

answer
15

Views

How to compare datetime between dataframes in multi logic statements?

I am having an issue comparing dates between two dataframes from inside a multi logic statement. df1: email datetimecreated [email protected] 2019-02-12 20:47:00 df2: EmailAddress DateTimeCreated [email protected] 2019-02-07 20:47:00 [email protected] 2018-11-13 20:47:00 [email protected] 2018-11-04 20:47:...
RustyShackleford
1

votes
3

answer
20

Views

Group by date range in pandas dataframe

I have time series data in pandas, and I would like to group by a certain time window in each year and calculate its min and max. For example: times = pd.date_range(start = '1/1/2011', end = '1/1/2016', freq = 'D') df = pd.DataFrame(np.random.rand(len(times)), index=times, columns=["value"]) How...
Jason
1

votes
2

answer
24

Views

Python - reshape a dataframe using pandas

I have a .csv file like this ID FirstName LastName Age FirstName LastName Age 1 Sid Than 21 Sidd Thang 26 2 Art Mari 21 Arth Mariap 28 When I read this inside python using pandas the column names automatically change to FirstName_y LastNam...
Sid29
0

votes
1

answer
19

Views

First row not recognized as column headers

I have the following code: import pandas as pd df = pd.read_csv("14_5.csv") print(df.head()) Price,Date,Ticker 104.0,2016-07-01,A 104.87815067615534,2016-07-05,A 104.41190933506331,2016-07-06,A 104.93195657145004,2016-07-07,A 104.42127356374375,2016-07-08,A When I add: prices = df.Price to the code,...
Mattpats
0

votes
0

answer
20

Views

Can we vectorize any function on pandas dataframes?

Assume that we have a pandas dataframe and we want to do a calculation using some columns of this dataframe and one additional data structure and fill a new column to this dataframe. Can we do this in a vectorized way instead of using apply method: snps_df['Context'] = snps_df.apply(getMutationInfo...
burcak
1

votes
1

answer
2.2k

Views

Custom time series resampling in Pandas

I have a df with OHLC data in a 1m frequency: Open High Low Close DateTime 2005-09-06 18:00:00 1230.25 1231.50 1230.25 1230.25 2005-09-06 18:01:00 1230.50 1231.75 1229.25 1230.50 . . 2005-09-07 15:59:00 1234.50 1235.50 1234.25 12...
hernanavella
1

votes
1

answer
51

Views

Aggregate over an index in pandas?

How can I aggregate (sum) over an index which I intend to map to new values? Basically I have a groupby result by two variables where I want to groupby one variable into larger classes. The following code does this operation on s by mapping the first by-variable but seems too complicated: import pa...
Gerenuk
1

votes
1

answer
2.3k

Views

How to filter strings in pandas series index

I'm trying to filter a pandas series by using a boolean expression on its index, which contains strings. For example, in the code below I wish to create a new Series (Sman) by filtering another series (S) for rows where the index items contain the substring 'man': from pandas import Series S = Serie...
dreme
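One way to filter a Series by a substring in its string index is the `.str.contains` accessor on the index. The sketch below keeps the names `S` and `Sman` from the question, but the index labels and values are hypothetical because the original data is truncated.

```python
import pandas as pd

# Hypothetical data: the question's own Series is cut off in the excerpt.
S = pd.Series([1, 2, 3], index=["fireman", "teacher", "postman"])

# Index.str.contains returns a boolean array, which can be used as a mask.
Sman = S[S.index.str.contains("man")]
```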
1

votes
3

answer
5.3k

Views

Replace WhiteSpace with a 0 in Pandas (Python 3)

Simple question here: how do I replace all of the whitespace in a column with a zero? For example: Name Age John 12 Mary Tim 15 into Name Age John 12 Mary 0 Tim 15 I've been trying something like this but I am unsure how Pandas actually reads whi...
user3682157
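A possible sketch for the question above, assuming the blank cells were read in as empty or whitespace-only strings (if they were read as NaN instead, `fillna(0)` would be the tool). The frame mirrors the Name/Age example; the string dtype of `Age` is an assumption.

```python
import pandas as pd

# Assumption: Age arrived as strings, with a whitespace-only cell for Mary.
df = pd.DataFrame({"Name": ["John", "Mary", "Tim"], "Age": ["12", " ", "15"]})

# Replace cells that are empty or contain only whitespace, then convert to integers.
df["Age"] = df["Age"].replace(r"^\s*$", "0", regex=True).astype(int)
```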
1

votes
1

answer
3k

Views

Set pandas datetime index one day forward

Is it possible to shift a datetime index one day forward in pandas? I've got this index mydata.data.index Out[7]: [2013-01-27 22:00:00, ..., 2014-12-16 22:00:00] Length: 500, Freq: None, Timezone: None and it's wrong. The correct date on which the data begin is the day after. That is 2014-12-17 and they e...
Uninvited
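Shifting every timestamp in a datetime index by one day can be sketched by adding a `pd.Timedelta` to the index. The two-row frame and the `value` column are hypothetical stand-ins for the question's 500-row data.

```python
import pandas as pd

# Hypothetical small frame with a DatetimeIndex at 22:00 each day.
idx = pd.to_datetime(["2013-01-27 22:00:00", "2013-01-28 22:00:00"])
df = pd.DataFrame({"value": [1.0, 2.0]}, index=idx)

# Adding a Timedelta moves every timestamp in the index forward by one day.
df.index = df.index + pd.Timedelta(days=1)
```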
0

votes
0

answer
26

Views

Decimal class rounding in Pandas

I have problems rounding Decimal objects inside a Pandas DataFrame. The round() method does not work, and neither does quantize(). I've searched for a solution with no luck so far. round() does nothing; I assume it is meant for float numbers. quantize() won't work because it is not a DataFrame function. An...
Juanito
0

votes
2

answer
17

Views

Pandas - Python Remodel the Date Column

I have a date column like this in my pandas data-frame. My DataFrame looks like this, ID SerialDate 1 2008-1-15 2 T1 3 2008-1-17 4 T1 T1 is the only text that will be found in this column and there won't be any blanks. The dtype of this column is object. I need to change this to look like, E...
Sid29
1

votes
2

answer
19

Views

How to create new file from two other csv files?

I have two .csv files. First: col. names: 'student_id' and 'mark' Second: col. names: 'student_id','name','surname' and I want to create a third .csv file with 'student_id','name', 'surname' where row['mark'] == 'five' or 'four' good_student=[] for index, row in first_file.iterrows(): if row['mark'] == '...
Badum
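A vectorized sketch of the task above, avoiding `iterrows`. Note that a Python condition written literally as `x == 'five' or 'four'` is always truthy, because the non-empty string `'four'` is evaluated on its own; `isin` expresses the intended membership test. The column names come from the question; the rows, frame names, and output filename are hypothetical.

```python
import pandas as pd

# Hypothetical contents standing in for the two CSV files.
marks = pd.DataFrame({"student_id": [1, 2, 3], "mark": ["five", "two", "four"]})
students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cid"],
    "surname": ["A", "B", "C"],
})

# Ids of students whose mark is 'five' or 'four'.
good_ids = marks.loc[marks["mark"].isin(["five", "four"]), "student_id"]

# Keep only those students; writing the result out is left commented.
good_students = students[students["student_id"].isin(good_ids)]
# good_students.to_csv("good_students.csv", index=False)  # hypothetical filename
```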
1

votes
1

answer
77

Views

creating a new dataframe based off if a particular value matches a value in a list

What I have is data in a pandas dataframe. There is one column that contains a customer_id. These are not unique ids. I have a list of selected customer ids (there are no repeating values in the list). What I want to do is create a new dataframe based on the ids in the list. I want all rows fo...
knop
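Selecting all rows whose id appears in a list is commonly done with `Series.isin`. A minimal sketch follows; the `amount` column, the sample rows, and the `selected_ids` name are hypothetical.

```python
import pandas as pd

# Hypothetical data with repeated customer ids.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "amount": [10, 20, 30, 40, 50],
})
selected_ids = [2, 4]

# isin builds a boolean mask that is True wherever the id is in the list.
subset = df[df["customer_id"].isin(selected_ids)]
```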
1

votes
1

answer
3.7k

Views

Calculating similarity between rows of pandas dataframe

Goal is to identify top 10 similar rows for each row in dataframe. I start with following dictionary: import pandas as pd import numpy as np from scipy.spatial.distance import cosine d = {'0001': [('skiing',0.789),('snow',0.65),('winter',0.56)],'0002': [('drama', 0.89),('comedy', 0.678),('action',-0...
Null-Hypothesis
1

votes
1

answer
25

Views

How can I get similar distribution from different groups?

I have to find subgroups in the dataset whose averages for 2 metrics are similar to those of my original group. For example, I'd like to find a city or group of cities with the closest average(metric 1) = 10 and average(metric 2) = 5. Dataset example: How can I do it?
gabriel.almeida
0

votes
1

answer
19

Views

Python Pandas - Replacing relevant texts fails

I am trying to compare two columns - primary column and secondary column. The secondary column might have a(.) or a text like " (On Leave) after the desired string. I learned that to replace ("."), it has to be passed with ("\.") If the secondary column holds a particular value like "NOTAPPLICABLEH...
Sid29
0

votes
0

answer
8

Views

Python scipy or pandas.series

I have some EMG signals and some inertial sensor data to analyse. I used to do things in MATLAB and now I want to try Python. Which library would you recommend: SciPy or pandas.Series?
Laleh
1

votes
3

answer
6.3k

Views

ValueError: index must be monotonic increasing or decreasing

ser3 = Series(['USA','Mexico','Canada'],index = ['0','5','10']) here ranger = range(15) I get an error while using Forward fill in iPython ser3.reindex(ranger,method = 'ffill') /Users/varun/anaconda/lib/python2.7/site-packages/pandas/core/index.pyc in _searchsorted_monotonic(self, label, side) 2395...
Varun
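A likely cause of the error above is that the index labels are strings: '0', '5', '10' are not in sorted order when compared as text ('10' sorts before '5'), and they would not match the integers in `range(15)` anyway. A sketch assuming integer labels were intended:

```python
import pandas as pd

# Assumption: integer labels were intended instead of the strings '0', '5', '10'.
ser3 = pd.Series(["USA", "Mexico", "Canada"], index=[0, 5, 10])

# With a monotonic integer index, forward fill works across the new labels.
filled = ser3.reindex(range(15), method="ffill")
```

Labels 0 to 4 then carry 'USA', 5 to 9 carry 'Mexico', and 10 to 14 carry 'Canada'.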
1

votes
2

answer
2.6k

Views

Python - How to stream large (11 gb) JSON file to be broken up [duplicate]

This question already has an answer here: Opening A large JSON file in Python 3 answers I have a very large JSON (11 gb) file that is too large to read into my memory. I would like to break it up into smaller files to analyze the data. I am currently using Python and Pandas for the analysis and I a...
rgalbo
1

votes
2

answer
5.9k

Views

Pandas series to numpy array conversion error

I have a pandas series with the following value_counts() output: NaN 2741 197 1891 127 188 194 42 195 24 122 21 When I perform describe() on this series, I get: df[col_name].describe() count 2738.000000 mean 172.182250 std 47.387496 min 0.000000 25% 171...
user308827
1

votes
2

answer
1.7k

Views

Add a new column to a Pandas DataFrame by using values in another column to lookup values in a dictionary

How do I add a column to a Pandas DataFrame, by multiplying an existing column by a factor from an external dictionary looked up using values from a second column in the same DataFrame as keys? I have a pd.DataFrame dataframe df roughly of the form code blah... year nominal 0 T.rrr bl...
curlew77
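The lookup described above can be sketched with `Series.map`, which translates each key through a dictionary. The column names `code`, `year`, and `nominal` appear in the question; treating `year` as the lookup key, the `factors` dictionary, the `real` column name, and all values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the question.
df = pd.DataFrame({
    "code": ["T.rrr", "T.sss"],
    "year": [2000, 2001],
    "nominal": [10.0, 20.0],
})

# Hypothetical external dictionary of factors keyed by year.
factors = {2000: 1.5, 2001: 2.0}

# map() looks up each year in the dictionary; the product is stored in a new column.
df["real"] = df["nominal"] * df["year"].map(factors)
```

Keys missing from the dictionary would produce NaN in the new column.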
1

votes
2

answer
620

Views

Read from / write to a specific location in Excel file

Have a real use case for this. Want to be able to do some data aggregation and manipulation with Pandas, envisioned workflow as such: Find in an Excel file a named cell reach the boundary of the cell block (boundary defined by empty column / row) read the cell block into Pandas DataFrame do stuff wi...
PaulDong
0

votes
2

answer
28

Views

unsupported operand type(s) for &: 'str' and 'Timestamp'

Statement : df[df['Symbol'] =="TLT" & df['Date'].max()] Error : unsupported operand type(s) for &: 'str' and 'Timestamp' My pandas dataframe is df. It consists of a trading log. When I filter the df on Symbol and (&) timestamp I get the above error. What did I do incorrectly? I don't want to change...
John John
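The error above follows from operator precedence: in Python `&` binds more tightly than `==`, so the unparenthesized statement first evaluates `"TLT" & df['Date'].max()`. A sketch with each comparison wrapped in parentheses, assuming the goal is the TLT rows on the latest date in the log (the sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical trading log using the column names from the question.
df = pd.DataFrame({
    "Symbol": ["TLT", "TLT", "SPY"],
    "Date": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-02"]),
})

latest = df["Date"].max()

# Each comparison is parenthesized so that & combines two boolean Series.
result = df[(df["Symbol"] == "TLT") & (df["Date"] == latest)]
```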
0

votes
2

answer
22

Views

To summarize difference from present row to the previous

Calculating the difference from present row to the previous, I have a simple data set and codes below: import pandas as pd data = {'Month' : [1,2,3,4,5,6,7,8,9,10,11,12], 'Rainfall': [112,118,132,129,121,135,148,148,136,119,104,118]} df = pd.DataFrame(data) Rainfall = df["Rainfall"] df['Changes'] =...
Mark K
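One common way to compute the change from the previous row is `Series.diff`. The sketch below uses the data given in the question; how the asker intended to summarize the changes is not visible in the truncated excerpt, so only the difference itself is shown.

```python
import pandas as pd

data = {
    "Month": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    "Rainfall": [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118],
}
df = pd.DataFrame(data)

# diff() subtracts the previous row's value; the first row has no predecessor and is NaN.
df["Changes"] = df["Rainfall"].diff()
```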
1

votes
1

answer
192

Views

Pandas pivot_table using a given list of indices and columns

I would like some help to figure out how to pivot a pandas dataframe into a table with a given list of indices and columns (instead of the default behavior where the indices and columns are picked automatically by pandas). Apologies if this is trivial. I am new to python/pandas. Consider the followi...
balaks
1

votes
1

answer
471

Views

Use string.capwords with Pandas column

Given this data frame: df = pd.DataFrame( {'A' : ['''And's one''', 'And two', 'and Three'], 'B' : ['A', 'B', 'A']}) df A B 0 And's one A 1 And two B 2 and Three A I am attempting to capitalize the first letter only (without capitalizing the "s" in "And's"). The desired result...
Dance Party2
1

votes
1

answer
5.5k

Views

Pass Pandas DataFrame to Scipy.optimize.curve_fit

I'd like to know the best way to use Scipy to fit Pandas DataFrame columns. If I have a data table (Pandas DataFrame) with columns (A, B, C, D and Z_real) where Z depends on A, B, C and D, I want to fit a function of each DataFrame row (Series) which makes a prediction for Z (Z_pred). The signature...
Sman789
