Questions tagged [pandas]

54145 questions
1

votes
1

answer
2.1k

Views

Plot percentiles using matplotlib

I have three dataframes df1, df2 and df3. I combine these into one dataframe df. Now I want to find the min, 5th percentile, 25th percentile, median, 90th percentile and max for each date in the dataframe and plot them (one line per date), where the X axis has the percentiles and the Y axis has the values. df...
Sun
1

votes
1

answer
6.5k

Views

Python - Pandas delete specific rows/columns in excel

I have the following Excel file, and I would like to clean specific rows/columns so that I can process the file further. I have tried this, but I have not managed to remove any of the blank lines; I've only managed to trim those containing data. Here, I was trying to only save the data from the...
onlyf
1

votes
1

answer
269

Views

python pandas non-unique dict keys

I have an Excel file with data like this Fruits Description oranges This is an orange apples This is an apple oranges This is also oranges plum this is a plum plum th...
jumpman8947
1

votes
1

answer
37

Views

How to combine strings in one DataFrame

I am processing inbound user data. I receive DataFrame h that is supposed to contain all float but has some strings: >>> h = pd.DataFrame(np.random.rand(3, 2), columns=['a', 'b']) >>> h.loc[0, 'a'] = 'bad' >>> h.loc[1, 'b'] = 'robot' >>> h a b 0 bad 0.747314 1 0.921919 ro...
Jason Strimpel
1

votes
1

answer
24

Views

Compare every element between two dataframes

Assuming that I have two dataframes: # df1 +-----------------------+ | Name_1 |Age| Location | +-----------------------+ | A | 18 | UK | | B | 19 | US | +-----------------------+ # df2 +-------------------------+ | Name_2 | Age | Location | +-------------------------+ | A | 18...
Old-School
0

votes
0

answer
20

Views

Is there a method similar to lag (in R) or shift (in Python) for the javascript framework data-forge?

I have a code like this: const dataForge = require('data-forge'); var df = new dataForge.DataFrame({columns: {col1: [1,2,3,4,5]}}); console.log(df.toString()); with this result: __index__ col1 --------- ---- 0 1 1 2 2 3 3 4 4 5 in R and in Python like n...
EFA
-1

votes
1

answer
19

Views

how to build function for following simple problem?

I have multiple data frames. I need to merge them all and then take one column at a time from each df. To make it simple: I have multiple lists, like l1=[a,b,c] l2=[d,e,f] l3=[g,h,i]. I want my list to look like the one given below. list=[a,d,g,b,e,h,c,f,i]
Aamir Siddiqui
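The list version of this question can be sketched with `zip`, which takes one element from each list in turn, and `itertools.chain.from_iterable`, which flattens the resulting tuples. This is only a minimal sketch: it assumes the lists have equal length (as in the question) and uses string placeholders for a..i, since the question does not say what the elements are.

```python
from itertools import chain

# Placeholder values standing in for a..i from the question.
l1 = ["a", "b", "c"]
l2 = ["d", "e", "f"]
l3 = ["g", "h", "i"]

# zip yields ("a", "d", "g"), ("b", "e", "h"), ("c", "f", "i");
# chain.from_iterable flattens those tuples into one sequence.
interleaved = list(chain.from_iterable(zip(l1, l2, l3)))
print(interleaved)
```

If the lists could differ in length, `zip` would silently stop at the shortest one, so that case would need different handling.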
0

votes
0

answer
15

Views

How do I dynamically change a dataframe in my site by user input? (for example maybe rearrange the same column by value? from max to min?)

I want to have an option in the table in my site that allows the user to rearrange the table by min or max value in a specific column, but couldn't find anything about this. The dataset I am using is this: https://raw.githubusercontent.com/plotly/datasets/master/bubble_chart_tutorial.csv I want the...
Dat Guy
1

votes
1

answer
811

Views

Computing the difference between first and last values in a rolling window

I am using the Pandas rolling window tool on a one-column dataframe whose index is in datetime form. I would like to compute, for each window, the difference between the first value and the last value of said window. How do I refer to the relative index when giving a lambda function? (in the bracket...
Pandora
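One possible approach to the rolling-window question is to pass `raw=False` to `Rolling.apply`, so that each window arrives as a Series and its ends can be addressed positionally with `.iloc`. The sketch below is only an illustration: the daily data, the `"3D"` window and the choice of last minus first are assumptions, not details from the question.

```python
import pandas as pd

# Made-up daily series with a datetime index (assumed for illustration).
idx = pd.date_range("2019-01-01", periods=5, freq="D")
s = pd.Series([1.0, 4.0, 9.0, 16.0, 25.0], index=idx)

# With raw=False each window is passed as a Series, so .iloc[0] and
# .iloc[-1] refer to the first and last values inside that window.
diff = s.rolling("3D").apply(lambda w: w.iloc[-1] - w.iloc[0], raw=False)
print(diff)
```

For a time-based window such as `"3D"`, the early windows hold fewer points, so the first result is simply a value minus itself.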
1

votes
3

answer
5.7k

Views

How to Open csv file with pandas data frame

There is a CSV file containing a three-column dataframe. The third column has long text. This error message occurred when I tried to open the file using pandas.read_csv: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 0: invalid start byte. But there is no problem openi...
Antenna_
1

votes
5

answer
835

Views

Replace whole string if it contains substring in pandas dataframe

I have a sample dataset. raw_data = { 'categories': ['sweet beverage', 'salty snacks', 'beverage,sweet', 'fruit juice,beverage,', 'salty crackers'], 'product_name': ['coca-cola', 'salted pistachios', 'fruit juice', 'lemon tea', 'roasted peanuts']} df_a = pd.DataFrame(raw_data) I need to iterate th...
Zoozoo
1

votes
3

answer
105

Views

Creating new pandas dataframe from partial string match [duplicate]

This question already has an answer here: Select by partial string from a pandas DataFrame 8 answers I have a relatively simple dataframe that looks like this (see below). One of the columns, 'Book', is a list of strings. My goal is to make new dataframes for each of the three distinct values in '...
Warthog1
1

votes
1

answer
492

Views

Pandas - cut records with the custom percentiles

I have a pandas dataframe with a column of continuous variables. I need to convert them into 3 bins, such that first bin encompasses values 80th percentile. I am trying to achieve it by first getting the bin boundaries for such percentiles and then using pandas cut function. The issue is that I get an...
Maksim Khaitovich
1

votes
1

answer
706

Views

Getting the n largest values for groups [duplicate]

This question already has an answer here: Pandas get topmost n records within each group 2 answers I am looking to isolate the top 2 values per group for the following data. Brand | Product | Rank A | P1 | 1000 | P2 | 1210 | P3 | 2000 | P4 | 600 | P5 | 756 |...
Dys_Lexi_A
1

votes
3

answer
475

Views

Python: dividing each column of panda dataframe by a data series

I have a pandas dataframe in which I want to divide each column by the same data series values for each row. Each column and the data series have the same length. The data series has only float numbers but some cells in the dataframe have NaNs. I tried various things but somehow I cannot get it solve...
Rolf12
1

votes
1

answer
198

Views

How to mark a verb in a sentence, using spaCy? Python

I want to mark verbs in sentences by adding an 'X' at the end of the verb word, like this verbX. SpaCy assigns tags to sentence elements that Python does not index separately. For example, spaCy sees a bracket '(' or full stop behind a word '.' as a separate position, whereas Python does not. As a r...
twhale
0

votes
1

answer
11

Views

Pandas BuildPaths Efficiently

I have a Pandas dataframe like below, which has two arbitrary customers with 2 months' data(there are more months) and ATL_Flag which are marketing channels(there are more of them too): |App_Flag|ATL_Flag|Cust_No|month1|month2 | 0 | TV | 1 | 1 | 0 | 0 | FB | 1 | 0 | 0...
Ahsan
0

votes
0

answer
19

Views

how to handle with continuous values in array

I would like to create a submission file for the problem, but my predictions are continuous values in the array. Please help me solve this. I have array values like this: predictions array([[5.5161709e-01, 4.4297403e-01, 5.3959554e-03, 1.2935511e-05], [5.5161709e-01, 4.4297403e-01, 5.3959554e-03, 1...
suri
1

votes
2

answer
276

Views

using duplicates values from one column to remove entire row in pandas dataframe

I have the data in the CSV file uploaded at the following link: Click here for the data. In this file I have the following columns: Team Group Model SimStage Points GpWinner GpRunnerup 3rd 4th. There will be duplicates in the column Team. Another column is SimStage. SimStage is having a s...
Zephyr
1

votes
1

answer
2k

Views

convert numpy array of arrays to 2d array

I have a pandas series features that has the following values (features.values) array([array([0, 0, 0, ..., 0, 0, 0]), array([0, 0, 0, ..., 0, 0, 0]), array([0, 0, 0, ..., 0, 0, 0]), ..., array([0, 0, 0, ..., 0, 0, 0]), array([0, 0, 0, ..., 0, 0, 0]), array([0, 0, 0, ..., 0, 0, 0])], dtype=object) N...
Nate Stemen
1

votes
2

answer
332

Views

pandas: extend dataframe and increase indices automatically

Given a DataFrame with a monotonically increasing index, e.g. values 100 10 200 9 300 15 400 7 I'd like to extend it by copying the last value, and automatically continue the indices (or perhaps by supplying the step, that's still fine): values 100 10 200 9 300 15 400 7 500...
Ziofil
1

votes
1

answer
2.1k

Views

Pandas DataFrame as an Argument to a Function - Python

Suppose a Pandas DataFrame is passed to a function as an argument. Then, does Python implicitly copy that DataFrame or is the actual DataFrame being passed in? Hence, if I perform operations on the DataFrame within the function, will I be changing the original (because the references are still inta...
WhiteDillPickle
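A short experiment illustrates the behaviour asked about here: Python passes a reference to the same DataFrame object, so in-place changes inside the function are visible to the caller, while rebinding the local name to a new object is not. The function and column names below are hypothetical and chosen only for illustration.

```python
import pandas as pd

def add_column_in_place(frame):
    # Mutates the object the caller passed in; no copy is made.
    frame["b"] = frame["a"] * 2

def rebind_only(frame):
    # assign() returns a new DataFrame; rebinding the local name
    # leaves the caller's object untouched.
    frame = frame.assign(c=0)
    return frame

df = pd.DataFrame({"a": [1, 2, 3]})
add_column_in_place(df)
result = rebind_only(df)

print(list(df.columns))      # the caller's frame gained "b" but not "c"
print(list(result.columns))  # the returned frame has "a", "b" and "c"
```

Calling `frame.copy()` at the top of a function is one way to make sure the caller's data is never modified.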
1

votes
4

answer
110

Views

Best way to match list of words with a list of job descriptions python

Here is my problem (I'm working on python) : I have a Dataframe with columns: Index(['job_title', 'company', 'job_label', 'description'], dtype='object') And I have a list of words that contains 300 skills: keywords = ['C++','Data Analytics','python','R', ............ 'Django'] I need to match tho...
Eddie A.
1

votes
1

answer
172

Views

Pandas compare two columns and copy value of another column if there is a match only for first unique value

I have two different dataframes of which I want to compare two columns. If the value of the first dataframe appears anywhere in the column of the second dataframe, I want to copy the value next to the matching value and copy this to a new column in the first dataframe. The dataframes look like this:...
Stefan
1

votes
1

answer
348

Views

looping through a dictionary of dataframes in python

First of all, I wrote this function: def writing_in_excel(path, df, sheet_name): writer = pd.ExcelWriter(path, datetime_format='m/d/yyyy') df.to_excel(writer, sheet_name = sheet_name, index=False, freeze_panes = (0,1)) writer.save() writer.close() Then, I have a dictionary of dataframes: import pand...
Yun Tae Hwang
1

votes
1

answer
176

Views

write unicode data to mssql with python?

I'm trying to write a table from a .csv file with Hebrew text in it to an SQL Server database. The table is valid and pandas reads the data correctly (it even displays the Hebrew properly in PyCharm), but when I try to write it to a table in the database I get question marks ('???') where the Hebrew shou...
Dror Bogin
1

votes
2

answer
55

Views

Fill missing values based on another column in a pandas DataFrame

I'm working with Pandas and numpy. For the following data frame, let's call it 'data', for the Borough values with data['Borough'] == 'Unspecified', I need to use the zip code in the Incident Zip field to the left of it to do a lookup on the Incident Zip column for the matching zip code and Borough....
Shihan Rehman
1

votes
1

answer
490

Views

Adding a trend line to a matplotlib line plot python

Apologies if this has already been asked but I can't find the answer anywhere. I want to add an overall trend line to a plt plot. Sample data: import pandas as pd data = pd.DataFrame({'year': [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019], 'value': [2, 5, 8, 4, 1, 6, 10, 14, 8]}) import mat...
prmlmu
1

votes
1

answer
21

Views

How to access nested JSON object in Python DataFrame

I have a JSON response that I am converting to a DataFrame in Python. JSON response: [ { 'id': 123456, 'first_name': 'John', 'last_name': 'Doe', 'fields': [ { 'title': 'ABC', 'value': '123' }, { 'title': 'DEF', 'value': '456' } ] } ] When I parse this JSON to a DataFrame, the columns appear as id, f...
Matt
0

votes
1

answer
28

Views

Specifying arguments in a function from a list

I've read up on a number of threads (here and here) and the docs (here and here). However, I can't get this to work. I get an error of AxisError: axis 0 is out of bounds for array of dimension 0 Thanks. import pandas as pd from scipy.stats import levene data = {'A': [1,2,3,4,5,6,7,8], 'B': [9,10,11...
Christopher
1

votes
1

answer
22

Views

Pandas: How to find number of unique elements for one column coming from another column?

I have a dataframe like this: import numpy as np import pandas as pd df = pd.DataFrame({'carrier': ['c1','c1','c1','c2','c2','c2','c3','c4','c5','c5'], 'airport': ['a1','a3','a1','a1','a2','a2','a3','a4','a4','a1'], }) df carrier airport 0 c1 a1 1 c1 a3 2 c1 a1...
astro123
0

votes
0

answer
27

Views

How to count the number of instances between two dates/times

Noobie here, so please bear with me. I'll try to make this as concise as possible. I have two dataframes: df2: Consists of a unique visit number for each person, the time the person arrived at our store, and the time the person departed from our store. df1: Is a subset of visit numbers from df2 (as well...
gwf215
0

votes
1

answer
20

Views

pandas remove parentheses and the inside stuff in a string keep one space inside it

I would like to remove parentheses and their contents from a string in pandas, but I want to keep one space where the '()' was if it is inside the string, e.g. (.)y(...)rfer --> y rfer a(...)ewq() --> a ewq My code: df['a_id'].apply(lambda x: x.replace('\(.*\)', ' ')) does not work. Thanks.
user3448011
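The built-in `str.replace` used in the question treats its pattern as literal text, so a regex never matches there. A possible sketch with pandas' own `Series.str.replace` and `regex=True` follows. It assumes parentheses are never nested and that leading or trailing spaces should be stripped, which is how the two examples in the question read; the column name `a_id` is taken from the question.

```python
import pandas as pd

df = pd.DataFrame({"a_id": ["(.)y(...)rfer", "a(...)ewq()"]})

# \([^)]*\) matches one "(...)" group at a time (no nesting assumed).
# Each group becomes a single space, then the outer spaces are stripped.
cleaned = (
    df["a_id"]
    .str.replace(r"\([^)]*\)", " ", regex=True)
    .str.strip()
)
print(cleaned.tolist())
```

The pattern from the question, `\(.*\)`, is greedy and would swallow everything from the first `(` to the last `)`, which is why the character class `[^)]*` is used instead.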
0

votes
1

answer
18

Views

How do I allow pandas to process quote characters without recognizing them as EOF?

I'm trying to process a CSV file with pandas. One of my fields is book titles. Some of them have a comma in the title. I need to escape the comma with quotes in order for the INSERT statement to execute correctly in postgresql. However, when parsing the file, pandas sees '' in the last line as EOF....
trimonkeytri
0

votes
0

answer
8

Views

Drop pandas rows where multiple_condition is true

I've a df with 950 rows in it. Let's pretend that the columns are timestamp, quantity, event, file. This is a good approximation of df. I want to: select all rows where event is this_event and file is this_file and drop the rows if the row has the same timestamp as a row where file is my_file and th...
uncle-junky
1

votes
1

answer
1.1k

Views

Where is pandas.tools

After installing pandas: idf:~/Documents/python/plot$ pip3 install pandas --user Collecting pandas Using cached https://files.pythonhosted.org/packages/f9/e1/4a63ed31e1b1362d40ce845a5735c717a959bda992669468dae3420af2cd/pandas-0.24.0-cp36-cp36m-manylinux1_x86_64.whl Requirement already satisfied: num...
Ivan
1

votes
1

answer
112

Views

MonthEnd object result in <11 * MonthEnds> instead of number

In my pandas dataframe I want to find the difference between dates in months. The function .dt.to_period('M') results in a MonthEnd object like <11 * MonthEnds> instead of the month number. I tried to change the column type with pd.to_numeric() and to remove the letters with re.sub('[^0-9]', '', 'blablabla123bla')....
Inge
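One way to get plain integers here, sketched under assumptions: subtracting two monthly Period columns yields offset objects such as `<11 * MonthEnds>`, and each offset exposes its multiple through the `.n` attribute. The column names `start` and `end` and the sample dates are made up for illustration.

```python
import pandas as pd

# Hypothetical columns; the question does not show its data.
df = pd.DataFrame({
    "start": pd.to_datetime(["2018-01-15", "2018-03-31"]),
    "end": pd.to_datetime(["2018-12-03", "2018-04-01"]),
})

# Subtracting Period columns gives offsets like <11 * MonthEnds>.
delta = df["end"].dt.to_period("M") - df["start"].dt.to_period("M")

# .n is the integer multiple carried by each offset object.
months = delta.apply(lambda offset: offset.n)
print(months.tolist())
```

Note that this counts calendar-month boundaries crossed, so 2018-03-31 to 2018-04-01 counts as one month even though it is a single day.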
1

votes
2

answer
51

Views

Assign values with for loops to pandas DataFrame columns

I am a Python beginner and have a problem with a for loop. I want to assign a list of numbers to different DataFrame columns. Manually, I can assign my values with the correct code, but copy and paste isn't a good style for programming. The correct manual code looks like this: df = pd.DataFrame(colu...
Tom
1

votes
1

answer
18

Views

Pivot with multi index in Pandas data frame

I'm working on a report, and I need to create a pivot table. Some context: The data has two date columns: The origination date The observation date Each row contains multiple values: Payments Balance ... So, my original dataframe looks something like this (a little sample): obs_date orig_date...
Barranka
1

votes
1

answer
35

Views

Convert Multiple pandas Column into json

My dataframe df is like: col_1 col_2 col_3 A Product 1 B product 2 C Offer 1 D Product 1 What I want is to convert all these columns to JSON, with the condition that each row of col_2 and col_1 should be a key-value pair. I have tried the following: df['col_1_2'] = df.apply(lambd...
sourav khanna
