# Questions tagged [pandas]

34828 questions

1 vote, 1 answer, 572 views

### Pandas read_table() thousands=',' not working

I'm trying to read in some population data as an exercise to learn pandas:
>>> countries = pd.read_table('country_data.txt',
thousands=',',
header=None,
names=["Country Name", "Area (km^2)", "Areami2",
"Population", "Densitykm2", "Densitymi2",
"Date", "Source"],
usecols=["Country Name", "Area (km^2)...

0 votes, 0 answers, 37 views

### How to compress rows after groupby in pandas

I have performed a groupby on my dataframe.
grouped = data_df.groupby(['Cluster','Visit Number Final'])['Visitor_ID'].count()
I am getting the below output :
data_df.groupby(['Cluster','Visit Number Final'])['Visitor_ID'].count()
Out[81]:
Cluster Visit Number Final
0 1...

2 votes, 0 answers, 15 views

### XGBoost, handling continuous and fixed data for loan dataset

Background:
I am using XGBoost to develop a model to predict whether a particular loan will default or not. I have now included time-series data on FICO score and other variables that change over time. Thus I have 13,202 unique loans but over 300,000 rows with variable and fixed data.
Qu...

0 votes, 4 answers, 56 views

### Python: Assign Labels to values in an array

I have an array which represents some time series data:
array([[[-0.59776013],
[-0.59776013],
[-0.59776013],
[-0.31863936],
[-0.31863936],
[-0.31863936],
[-0.31863936],
[-0.31863936],
[-0.31863936],
[ 0.31863936],
[ 0.31863936],
[ 0.31863936],
[-0.31863936],
[-0.31863936],
[-0.31863936],
[-0.3186393...

1 vote, 2 answers, 661 views

### How to slice a pandas dataframe by columns using a mix of array of labels and slice of objects?

Is there a way to slice a pandas dataframe mixing an 'array of labels' with a 'slice of objects'?
I couldn't find an example here: Indexing and Selecting Data
A list or array of labels ['a', 'b', 'c']
A slice object with labels 'a':'f'
Here is an example of what I am trying to do without just manuall...

1 vote, 3 answers, 847 views

### Broadcasting a list in Pandas

I have a dataframe (a), from which I want to subtract a list (b), column-wise:
import numpy as np
import pandas as pd
In:a=pd.DataFrame(np.arange(0,20).reshape(5,4))
print(a)
Out: 0 1 2 3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15
4 16 17 18 19
In: b=[1,2,3,...

0 votes, 0 answers, 22 views

### Efficient way to randomly select all rows from pandas dataframe corresponding to a column value

I have a pandas dataframe containing about 2 million rows, which looks like the following example:
ID V1 V2 V3 V4 V5
12 0.2 0.3 0.5 0.03 0.9
12 0.5 0.4 0.6 0.7 1.8
01 3.8 2.9 1.1 1.6 1.5
17 0.9 1.2 1.8 2.6 9.0
02 0.2 0.3 0.5 0.03 0.9
12 0.5 0.4 0.6 0.7...

0 votes, 1 answer, 21 views

### Generate minutely time range in python

Currently I am generating hourly time using the below code
import pandas as pd
times = [pd.to_datetime(i) for i in ['09:14:00', '10:15:00', '11:15:00', '12:15:00', '13:15:00', '14:15:00', '15:15:00', '15:30:00']]
I need to have minutely times like '09:14:00','09:15:00'
Is there a way to have minutely times wi...
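A minimal sketch of one way to get minute-by-minute timestamps, assuming the goal is every minute between the first and last time in the list. The calendar date below is hypothetical and is only there to make the example repeatable; `pd.date_range` with `freq="min"` generates one timestamp per minute.

```python
import pandas as pd

# Hypothetical date; only the times of day come from the question.
start = "2019-01-01 09:14:00"
end = "2019-01-01 15:30:00"

# "min" is the pandas frequency alias for one minute; both endpoints are included.
times = pd.date_range(start=start, end=end, freq="min")

# Show the first two entries as plain time strings.
print(times[:2].strftime("%H:%M:%S").tolist())
```

If a plain Python list is needed afterwards, `times.tolist()` converts the index to a list of timestamps.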

1 vote, 2 answers, 1.4k views

### Pandas: Sum of first N non-missing values per row

I'd like to efficiently sum the first N non-missing values of a pandas DataFrame.
For example, if I had dataframe like this:
"df"
sid 1900 1899 332 855 1285 1413 1063 1768 2320 1117
bid
309 -0.02 -0.03 -0.03...
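One possible vectorized approach, sketched on hypothetical data because the frame in the question is cut off: count the non-missing cells cumulatively along each row, keep only the cells whose running count is at most N, and sum what remains. The values, the index labels, and `n` below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the question's own frame is truncated.
df = pd.DataFrame(
    [[1.0, np.nan, 2.0, 3.0],
     [np.nan, 4.0, 5.0, 6.0]],
    index=[309, 310],
)
n = 2  # number of leading non-missing values to sum per row

# Running count of non-missing cells along each row.
running_count = df.notna().cumsum(axis=1)

# Keep cells inside the first n non-missing values, then sum (NaN is skipped).
first_n_sum = df.where(running_count <= n).sum(axis=1)
print(first_n_sum)
```

This avoids a row-wise `apply`, which tends to be slower on wide or long frames.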

1 vote, 2 answers, 2.2k views

### Pandas returns NaT when it should not

My DataFrame is
time NTCS001G002 NTCS001W005
0 2013-05-30 23:00:00 NaN NaN
1 2013-06-30 23:00:00 249 60
2 2013-07-31 23:00:00 161 2
3 2013-09-01 23:00:00 151 11
4 2013-09-04 23:00:00 14 0
5 2013-...

1 vote, 1 answer, 554 views

### Define two columns with one map in Pandas DataFrame

I have a function which returns a list of length 2. I would like to apply this function to one column in my dataframe and assign the result to two columns.
This actually works:
from pandas import *
def twonumbers(x):
    return [2*x, 3*x]
df = DataFrame([1,4,11],columns=['v1'])
concat([df,DataFrame(df['...
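A short sketch of one alternative, assuming the aim is to end up with two new columns next to `v1`. The column names `v2` and `v3` are hypothetical. The lists returned by the function are expanded into a two-column frame and joined back on the original index.

```python
import pandas as pd

def twonumbers(x):
    return [2 * x, 3 * x]

df = pd.DataFrame([1, 4, 11], columns=["v1"])

# Turn the per-row lists into a frame with one column per list element.
# "v2" and "v3" are hypothetical names chosen for this sketch.
expanded = pd.DataFrame(
    df["v1"].map(twonumbers).tolist(),
    index=df.index,
    columns=["v2", "v3"],
)

# Join on the shared index so each row keeps its own pair of values.
result = df.join(expanded)
print(result)
```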

0 votes, 1 answer, 23 views

### problem with condition statement despite using right operator [duplicate]

This question already has an answer here:
Logical operators for boolean indexing in Pandas
3 answers
I wrote this script to create a specific variable that takes different values according to the number of reports. Count of Report is an integer column.
no_audit = df_bei_index['Count of Report'] ==...

1 vote, 0 answers, 15 views

### How to compare datetime between dataframes in multi logic statements?

I am having an issue comparing dates between two dataframes inside a multi-condition logic statement.
df1:
email datetimecreated
[email protected] 2019-02-12 20:47:00
df2:
EmailAddress DateTimeCreated
[email protected] 2019-02-07 20:47:00
[email protected] 2018-11-13 20:47:00
[email protected] 2018-11-04 20:47:...

1 vote, 3 answers, 20 views

### Group by date range in pandas dataframe

I have time series data in pandas, and I would like to group by a certain time window in each year and calculate its min and max.
For example:
times = pd.date_range(start = '1/1/2011', end = '1/1/2016', freq = 'D')
df = pd.DataFrame(np.random.rand(len(times)), index=times, columns=["value"])
How...
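A rough sketch of one way to do this, assuming the window is a fixed span of months that repeats every year. The June to August window is hypothetical, and the random generator is seeded only so the sketch is repeatable. The idea is to filter the rows inside the window first and then group by the year of the index.

```python
import numpy as np
import pandas as pd

times = pd.date_range(start="1/1/2011", end="1/1/2016", freq="D")
rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
df = pd.DataFrame(rng.random(len(times)), index=times, columns=["value"])

# Hypothetical window: June through August of every year.
window = df[(df.index.month >= 6) & (df.index.month <= 8)]

# One row per year with the min and max observed inside the window.
summary = window.groupby(window.index.year)["value"].agg(["min", "max"])
print(summary)
```

For a window that does not align with month boundaries, the same pattern should work with a mask built from `df.index.dayofyear` instead.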

1 vote, 2 answers, 24 views

### Python - reshape a dataframe using pandas

I have a .csv file like this
ID FirstName LastName Age FirstName LastName Age
1 Sid Than 21 Sidd Thang 26
2 Art Mari 21 Arth Mariap 28
When I read this into Python using pandas, the column names automatically change to FirstName_y LastNam...

0 votes, 1 answer, 19 views

### First row not recognized as column headers

I have the following code:
import pandas as pd
df = pd.read_csv("14_5.csv")
print(df.head())
Price,Date,Ticker
104.0,2016-07-01,A
104.87815067615534,2016-07-05,A
104.41190933506331,2016-07-06,A
104.93195657145004,2016-07-07,A
104.42127356374375,2016-07-08,A
When I add:
prices = df.Price
to the code,...

0 votes, 0 answers, 20 views

### Can we vectorize any function on pandas dataframes?

Assume that we have a pandas dataframe and we want to do a calculation using some columns of this dataframe and one additional data structure and fill a new column to this dataframe.
Can we do this in a vectorized way instead of using the apply method:
snps_df['Context'] = snps_df.apply(getMutationInfo...

1 vote, 1 answer, 2.2k views

### Custom time series resampling in Pandas

I have a df with OHLC data at a 1-minute frequency:
Open High Low Close
DateTime
2005-09-06 18:00:00 1230.25 1231.50 1230.25 1230.25
2005-09-06 18:01:00 1230.50 1231.75 1229.25 1230.50
.
.
2005-09-07 15:59:00 1234.50 1235.50 1234.25 12...

1 vote, 1 answer, 51 views

### Aggregate over an index in pandas?

How can I aggregate (sum) over an index which I intend to map to new values? Basically I have a groupby result by two variables where I want to group one variable into larger classes. The following code does this operation on s by mapping the first by-variable but seems too complicated:
import pa...

1 vote, 1 answer, 2.3k views

### How to filter strings in pandas series index

I'm trying to filter a pandas series by using a boolean expression on its index, which contains strings. For example, in the code below I wish to create a new Series (Sman) by filtering another series (S) for rows where the index items contain the substring 'man':
from pandas import Series
S = Serie...
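A minimal sketch of filtering on the index, using hypothetical data because the Series definition in the question is cut off. The `.str` accessor is available on a string index, so `S.index.str.contains` yields a boolean mask that can be used directly.

```python
import pandas as pd

# Hypothetical data; the original Series is truncated in the excerpt.
S = pd.Series([1, 2, 3, 4], index=["batman", "superman", "robin", "joker"])

# Boolean mask over the index labels, then ordinary boolean indexing.
Sman = S[S.index.str.contains("man")]
print(Sman)
```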

1 vote, 3 answers, 5.3k views

### Replace WhiteSpace with a 0 in Pandas (Python 3)

Simple question here: how do I replace all of the whitespace in a column with a zero?
For example:
Name Age
John 12
Mary
Tim 15
into
Name Age
John 12
Mary 0
Tim 15
I've been trying using something like this but I am unsure how Pandas actually reads whi...
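A small sketch of one option, assuming the Age column was read in as strings and the blank cell holds whitespace. How pandas actually stored the blank depends on how the file was read, so this is an assumption. `pd.to_numeric` with `errors="coerce"` turns anything it cannot parse into NaN, which can then be filled with 0.

```python
import pandas as pd

# Assumed representation: Age is a string column and Mary's cell is a blank string.
df = pd.DataFrame({"Name": ["John", "Mary", "Tim"], "Age": ["12", " ", "15"]})

# Unparseable entries (including blank strings) become NaN, then 0.
df["Age"] = pd.to_numeric(df["Age"], errors="coerce").fillna(0).astype(int)
print(df)
```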

1 vote, 1 answer, 3k views

### Set pandas datetime index one day forward

Is it possible to set one day forward a datetime index in pandas?
I've got this index
mydata.data.index
Out[7]:
[2013-01-27 22:00:00, ..., 2014-12-16 22:00:00]
Length: 500, Freq: None, Timezone: None
and it's wrong. The correct date on which the data begin is the day after. That is 2014-12-17 and they e...
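A minimal sketch of shifting a datetime index forward by one day. The two-row series is a hypothetical stand-in for `mydata.data`; adding a `pd.Timedelta` to a `DatetimeIndex` shifts every label by the same amount.

```python
import pandas as pd

# Hypothetical stand-in for mydata.data with a 22:00 daily index.
idx = pd.to_datetime(["2013-01-27 22:00:00", "2013-01-28 22:00:00"])
data = pd.Series([1.0, 2.0], index=idx)

# Move every index label one day forward; the values are untouched.
data.index = data.index + pd.Timedelta(days=1)
print(data.index)
```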

0 votes, 0 answers, 26 views

### Decimal class rounding in Pandas

I have problems rounding Decimal() values inside a pandas DataFrame. The round() method does not work, and neither does quantize(). I've searched for a solution with no luck so far.
round() does nothing; I assume it is meant for float numbers
quantize() won't work because it is not a DataFrame function
An...

0 votes, 2 answers, 17 views

### Pandas - Python Remodel the Date Column

I have a date column like this in my pandas data-frame.
My DataFrame looks like this,
ID SerialDate
1 2008-1-15
2 T1
3 2008-1-17
4 T1
T1 is the only text that will be found in this column and there won't be any blanks. The dtype of this column is object.
I need to change this to look like,
E...

1 vote, 2 answers, 19 views

### How to create new file from two other csv files?

I have two .csv files.
First:
col. names: 'student_id' and 'mark'
Second:
col. names: 'student_id','name','surname'
and I want to create a third .csv file with 'student_id', 'name', 'surname' where row['mark'] == 'five' or 'four'
good_student=[]
for index, row in first_file.iterrows():
if row['mark'] == '...
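A sketch of doing this without `iterrows`, assuming both files are already loaded as DataFrames. The in-memory frames, their values, and the output file name are hypothetical. Rows with a qualifying mark are selected first, and an inner merge on `student_id` pulls in the names.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files.
first_file = pd.DataFrame(
    {"student_id": [1, 2, 3], "mark": ["five", "two", "four"]}
)
second_file = pd.DataFrame(
    {"student_id": [1, 2, 3],
     "name": ["Ann", "Bob", "Cid"],
     "surname": ["Lee", "Ray", "Fox"]}
)

# Students whose mark is 'five' or 'four'.
good = first_file[first_file["mark"].isin(["five", "four"])]

# Inner merge keeps only the students present in both frames.
good_students = second_file.merge(good[["student_id"]], on="student_id", how="inner")

# Hypothetical output name; uncomment to write the third file.
# good_students.to_csv("good_students.csv", index=False)
print(good_students)
```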

1 vote, 1 answer, 77 views

### creating a new dataframe based off if a particular value matches a value in a list

What I have is data in a pandas dataframe. There is one column that contains a customer_id. These are not unique ids. I have a list of selected customer ids (there are no repeating values in the list). What I want to do is create a new dataframe based on the ids in the list. I want all rows fo...
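A minimal sketch using `Series.isin`, with hypothetical data and a hypothetical `amount` column. The mask is True for every row whose `customer_id` appears in the list, so repeated ids all survive the filter.

```python
import pandas as pd

# Hypothetical data; customer_id values repeat on purpose.
df = pd.DataFrame(
    {"customer_id": [10, 11, 10, 12, 13], "amount": [5, 7, 3, 9, 1]}
)
selected_ids = [10, 13]

# Keep every row whose customer_id is in the list; copy to get an independent frame.
selected = df[df["customer_id"].isin(selected_ids)].copy()
print(selected)
```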

1 vote, 1 answer, 3.7k views

### Calculating similarity between rows of pandas dataframe

The goal is to identify the top 10 most similar rows for each row in the dataframe.
I start with the following dictionary:
import pandas as pd
import numpy as np
from scipy.spatial.distance import cosine
d = {'0001': [('skiing',0.789),('snow',0.65),('winter',0.56)],'0002': [('drama', 0.89),('comedy', 0.678),('action',-0...

1 vote, 1 answer, 25 views

### How can I get similar distribution from different groups?

I have to find subgroups in the dataset whose averages for 2 metrics are similar to those of my original group.
For example, I'd like to find a city or group of cities with the closest average(metric 1) = 10 and average(metric 2) = 5.
Dataset example:
How can I do it?

0 votes, 1 answer, 19 views

### Python Pandas - Replacing relevant texts fails

I am trying to compare two columns: a primary column and a secondary column. The secondary column might have a period (.) or text like " (On Leave)" after the desired string.
I learned that to replace ("."), it has to be passed as ("\.")
If the secondary column holds a particular value like "NOTAPPLICABLEH...

0 votes, 0 answers, 8 views

### Python scipy or pandas.series

I have some EMG signals and some inertial sensor data to analyse. I used to do things in MATLAB and now I want to try Python. Which library would you recommend: SciPy or pandas.Series?

1 vote, 3 answers, 6.3k views

### ValueError: index must be monotonic increasing or decreasing

ser3 = Series(['USA','Mexico','Canada'],index = ['0','5','10'])
here ranger = range(15)
I get an error while using forward fill in IPython
ser3.reindex(ranger,method = 'ffill')
/Users/varun/anaconda/lib/python2.7/site-packages/pandas/core/index.pyc in _searchsorted_monotonic(self, label, side)
2395...
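A sketch of one likely cause, offered as an assumption because the traceback is truncated: the index labels are the strings '0', '5', '10', which are not in increasing order as strings ('10' sorts before '5') and do not match the integers in `ranger`. Using integer labels gives forward fill a monotonic index to search.

```python
import pandas as pd

# Integer labels in increasing order instead of the strings '0', '5', '10'.
ser3 = pd.Series(["USA", "Mexico", "Canada"], index=[0, 5, 10])
ranger = range(15)

# Each new label takes the value of the closest earlier label.
filled = ser3.reindex(ranger, method="ffill")
print(filled)
```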

1 vote, 2 answers, 2.6k views

### Python - How to stream large (11 gb) JSON file to be broken up [duplicate]

This question already has an answer here:
Opening A large JSON file in Python
3 answers
I have a very large JSON file (11 GB) that is too large to read into memory.
I would like to break it up into smaller files to analyze the data. I am currently using Python and Pandas for the analysis and I a...

1 vote, 2 answers, 5.9k views

### Pandas series to numpy array conversion error

I have a pandas series with the following value_counts() output:
NaN 2741
197 1891
127 188
194 42
195 24
122 21
When I perform describe() on this series, I get:
df[col_name].describe()
count 2738.000000
mean 172.182250
std 47.387496
min 0.000000
25% 171...

1 vote, 2 answers, 1.7k views

### Add a new column to a Pandas DataFrame by using values in another column to lookup values in a dictionary

How do I add a column to a Pandas DataFrame, by multiplying an existing column by a factor from an external dictionary looked up using values from a second column in the same DataFrame as keys?
I have a pd.DataFrame dataframe df roughly of the form
code blah... year nominal
0 T.rrr bl...
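A minimal sketch of the dictionary lookup, assuming `year` holds the keys and `nominal` is the column to scale. The data, the factor values, and the new column name `real` are all hypothetical. `Series.map` with a dict looks up each value as a key, and the result multiplies element-wise.

```python
import pandas as pd

# Hypothetical data shaped like the truncated example.
df = pd.DataFrame(
    {"code": ["a", "b", "c"],
     "year": [2000, 2001, 2000],
     "nominal": [100.0, 200.0, 300.0]}
)

# Hypothetical external dictionary of per-year factors.
factors = {2000: 1.5, 2001: 2.0}

# Look up each row's factor by year, then multiply; missing keys would give NaN.
df["real"] = df["nominal"] * df["year"].map(factors)
print(df)
```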

1 vote, 2 answers, 620 views

### Read from / write to a specific location in Excel file

Have a real use case for this. Want to be able to do some data aggregation and manipulation with Pandas, envisioned workflow as such:
Find in an Excel file a named cell
reach the boundary of the cell block (boundary defined by empty column / row)
read the cell block into Pandas DataFrame
do stuff wi...

0 votes, 2 answers, 28 views

### unsupported operand type(s) for &: 'str' and 'Timestamp'

Statement:
df[df['Symbol'] =="TLT" & df['Date'].max()]
Error: unsupported operand type(s) for &: 'str' and 'Timestamp'
My pandas dataframe is df. It consists of a trading log.
When I filter df on Symbol and (&) timestamp, I get the above error.
What did I do incorrectly? I don't want to change...
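A short sketch of what is probably going on: in Python, `&` binds more tightly than `==`, so `"TLT" & df['Date'].max()` is evaluated first, which is the `str` and `Timestamp` pair named in the error. Wrapping each comparison in parentheses avoids that. The data below is hypothetical, and the sketch assumes the intent is "rows for TLT on the latest date".

```python
import pandas as pd

# Hypothetical trading log.
df = pd.DataFrame(
    {"Symbol": ["TLT", "SPY", "TLT"],
     "Date": pd.to_datetime(["2019-01-02", "2019-01-03", "2019-01-03"]),
     "Qty": [1, 2, 3]}
)

# Each condition gets its own parentheses so & combines two boolean Series.
latest = df[(df["Symbol"] == "TLT") & (df["Date"] == df["Date"].max())]
print(latest)
```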

0 votes, 2 answers, 22 views

### To summarize difference from present row to the previous

I want to calculate the difference from the present row to the previous one. I have a simple data set and code below:
import pandas as pd
data = {'Month' : [1,2,3,4,5,6,7,8,9,10,11,12],
'Rainfall': [112,118,132,129,121,135,148,148,136,119,104,118]}
df = pd.DataFrame(data)
Rainfall = df["Rainfall"]
df['Changes'] =...
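A minimal sketch using `Series.diff`, which subtracts the previous row from the current one. The question's own assignment is truncated, so this is one possible approach and not necessarily what the asker wrote. The first row has no predecessor and comes out as NaN.

```python
import pandas as pd

data = {"Month": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
        "Rainfall": [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]}
df = pd.DataFrame(data)

# Current row minus previous row; the first entry is NaN.
df["Changes"] = df["Rainfall"].diff()
print(df)
```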

1 vote, 1 answer, 192 views

### Pandas pivot_table using a given list of indices and columns

I would like some help to figure out how to pivot a pandas dataframe into a table with a given list of indices and columns (instead of the default behavior where the indices and columns are picked automatically by pandas). Apologies if this is trivial. I am new to python/pandas.
Consider the followi...

1 vote, 1 answer, 471 views

### Use string.capwords with Pandas column

Given this data frame:
df = pd.DataFrame(
{'A' : ['''And's one''', 'And two', 'and Three'],
'B' : ['A', 'B', 'A']})
df
A B
0 And's one A
1 And two B
2 and Three A
I am attempting to capitalize the first letter only (without capitalizing the "s" in "And's").
The desired result...
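A small sketch of applying `string.capwords` to the column, assuming the goal is to capitalize the first letter of each whitespace-separated word. The desired result in the question is cut off, so that reading is an assumption. Unlike `str.title`, `capwords` splits only on whitespace, so the "s" after the apostrophe stays lowercase.

```python
import string

import pandas as pd

df = pd.DataFrame(
    {"A": ["And's one", "And two", "and Three"],
     "B": ["A", "B", "A"]}
)

# capwords splits on whitespace, capitalizes each word, and rejoins with spaces.
df["A"] = df["A"].map(string.capwords)
print(df)
```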

1 vote, 1 answer, 5.5k views

### Pass Pandas DataFrame to Scipy.optimize.curve_fit

I'd like to know the best way to use Scipy to fit Pandas DataFrame columns. If I have a data table (Pandas DataFrame) with columns (A, B, C, D and Z_real) where Z depends on A, B, C and D, I want to fit a function of each DataFrame row (Series) which makes a prediction for Z (Z_pred).
The signature...