Questions tagged [dataframe]

25829 questions
1

votes
1

answer
37

Views

How to combine strings in one DataFrame

I am processing inbound user data. I receive DataFrame h that is supposed to contain all float but has some strings: >>> h = pd.DataFrame(np.random.rand(3, 2), columns=['a', 'b']) >>> h.loc[0, 'a'] = 'bad' >>> h.loc[1, 'b'] = 'robot' >>> h a b 0 bad 0.747314 1 0.921919 ro...
Jason Strimpel
0

votes
0

answer
20

Views

Is there a method similar to lag (in R) or shift (in Python) for the javascript framework data-forge?

I have a code like this: const dataForge = require('data-forge'); var df = new dataForge.DataFrame({columns: {col1: [1,2,3,4,5]}}); console.log(df.toString()); with this result: __index__ col1 --------- ---- 0 1 1 2 2 3 3 4 4 5 in R and in Python like n...
EFA
0

votes
0

answer
15

Views

How do I dynamically change a dataframe in my site by user input? (for example maybe rearrange the same column by value? from max to min?)

I want to have an option in the table in my site that allows the user to rearrange the table by min or max value in a specific column, but couldn't find anything about this. The dataset I am using is this: https://raw.githubusercontent.com/plotly/datasets/master/bubble_chart_tutorial.csv I want the...
Dat Guy
0

votes
0

answer
8

Views

How to stack columns of a time series data in python

I have a dataframe containing time series features. I would like to stack 5 features from this dataframe, each of shape (129, 300). To extract one feature alone I use: Feature=df.iloc[:,start:end] #df is the dataframe containing the complete dataset This retrieves that particular channel o...
hakuna_code
1

votes
1

answer
811

Views

Computing the difference between first and last values in a rolling window

I am using the Pandas rolling window tool on a one-column dataframe whose index is in datetime form. I would like to compute, for each window, the difference between the first value and the last value of said window. How do I refer to the relative index when giving a lambda function? (in the bracket...
Pandora
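A minimal sketch for the question above, assuming a time-based window and that "difference" means last value minus first value (swap the operands for the opposite sign). The series `s` and the `"2D"` window are made-up illustration values, not taken from the question.

```python
import pandas as pd

# Hypothetical one-column data with a datetime index.
s = pd.Series(
    [1.0, 3.0, 6.0, 10.0],
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)

# With raw=True each window arrives as a NumPy array, so positions
# 0 and -1 are the first and last values of that window.
diff = s.rolling("2D").apply(lambda w: w[-1] - w[0], raw=True)
```

Inside the lambda the window is addressed by position, not by the original datetime labels, which is what makes a "relative index" possible.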
1

votes
3

answer
5.7k

Views

How to Open csv file with pandas data frame

There is a CSV file with three columns. The third column has long text. This error message occurred when I tried to open the file using pandas.read_csv: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 0: invalid start byte. But there is no problem openi...
Antenna_
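A hedged sketch of the usual workaround for the question above: pass an explicit `encoding` to `pandas.read_csv`. The real encoding of the asker's file is unknown; `latin-1` below is only an assumption for illustration, and the bytes in `raw` are invented sample data.

```python
import io
import pandas as pd

# Invented sample: byte 0xa1 is not a valid UTF-8 start byte,
# but it is a valid character in Latin-1.
raw = b"id,label,text\n1,x,\xa1Hola\n"

# Assumption: the file is Latin-1 encoded. Replace with the file's
# actual encoding once it is known.
df = pd.read_csv(io.BytesIO(raw), encoding="latin-1")
```

If the text comes out garbled, the guessed encoding is wrong even though no error is raised, so it is worth checking a few rows by eye.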
1

votes
3

answer
105

Views

Creating new pandas dataframe from partial string match [duplicate]

This question already has an answer here: Select by partial string from a pandas DataFrame 8 answers I have a relatively simple dataframe that looks like this (see below). One of the columns, 'Book', is a list of strings. My goal is to make new dataframes for each of the three distinct values in '...
Warthog1
1

votes
1

answer
848

Views

EXCEPT on Specific columns Spark 1.6

I'm trying to filter out rows from dfA using dfB. dfA: +----+---+----+------------+-----+ |year|cid|X| Y|Z| +----+---+----+------------+-----+ +----+---+----+------------+-----+. dfB: +----+---+ |year|cid| +----+---+ +----+---+ My goal is to filter all (year, cid) pairs in dfB out of dfA. I see...
RefiPeretz
1

votes
1

answer
492

Views

Pandas - cut records with the custom percentiles

I have a pandas dataframe with a column of continuous variables. I need to convert them into 3 bins, such that first bin encompasses values 80th percentile. I am trying to achieve it by first getting the bin boundaries for such percentiles and then using pandas cut function. The issue is that I get an...
Maksim Khaitovich
1

votes
1

answer
728

Views

How do I select an ambiguous column reference? [duplicate]

This question already has an answer here: Enable case sensitivity for spark.sql globally 1 answer Here's some sample code illustrating what I'm trying to do. There is a dataframe with columns companyid and companyId. I want to select companyId, but the reference is ambiguous. How do I unambiguously...
chris.mclennon
1

votes
1

answer
684

Views

Consuming RESTful API and converting to Dataframe in Apache Spark

I am trying to convert the output of a RESTful API URL directly to a Dataframe in the following way: package trials import org.apache.spark.sql.SparkSession import org.json4s.jackson.JsonMethods.parse import scala.io.Source.fromURL object DEF { implicit val formats = org.json4s.DefaultFormats ca...
Utkarsh Saraf
1

votes
2

answer
332

Views

pandas: extend dataframe and increase indices automatically

Given a DataFrame with a monotonically increasing index, e.g. values 100 10 200 9 300 15 400 7 I'd like to extend it by copying the last value, and automatically continue the indices (or perhaps by supplying the step, that's still fine): values 100 10 200 9 300 15 400 7 500...
Ziofil
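One possible sketch for the question above, assuming the step is the gap between the last two index labels. It uses `reindex` with forward fill; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"values": [10, 9, 15, 7]}, index=[100, 200, 300, 400])

# Assumption: the index is evenly spaced, so the last gap is the step.
step = df.index[-1] - df.index[-2]
new_index = df.index.append(pd.Index([df.index[-1] + step]))

# Forward fill copies the last known value into the new row.
extended = df.reindex(new_index, method="ffill")
```

To add several rows at once, append more labels to `new_index`; forward fill repeats the last value for each of them.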
1

votes
1

answer
2.1k

Views

Pandas DataFrame as an Argument to a Function - Python

Suppose a Pandas DataFrame is passed to a function as an argument. Then, does Python implicitly copy that DataFrame or is the actual DataFrame being passed in? Hence, if I perform operations on the DataFrame within the function, will I be changing the original (because the references are still inta...
WhiteDillPickle
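A small sketch illustrating the question above: Python passes the same DataFrame object into the function, so in-place changes are visible to the caller unless the function copies first. The function names are hypothetical.

```python
import pandas as pd

def add_column_in_place(frame):
    # Mutates the caller's object: no copy is made on the way in.
    frame["doubled"] = frame["x"] * 2

def add_column_on_copy(frame):
    # Rebinding the local name to a copy leaves the caller's object alone.
    frame = frame.copy()
    frame["tripled"] = frame["x"] * 3
    return frame

original = pd.DataFrame({"x": [1, 2, 3]})
add_column_in_place(original)
result = add_column_on_copy(original)
```

After this runs, `original` has the `doubled` column but not `tripled`, while `result` has both.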
1

votes
1

answer
37

Views

create dataframe in python from list

I have extracted multiple data from file and now I want to create the dataframe of my data of interest. I have tried following way: anticodon = re.findall(r'(at.\w\w-\w\w)', line) for line in anticodon: anticod = line.replace('at ', '') import pandas as pd df1 = pd.DataFrame({'id': [m_id], 'cod': [a...
Kritika Rajain
1

votes
2

answer
698

Views

Spark collect_list and limit resulting list

I have a dataframe of the following format: name merged key1 (internalKey1, value1) key1 (internalKey2, value2) ... key2 (internalKey3, value3) ... What I want to do is group the dataframe by the name, collect the list and limit the size of the list. This is how I group by the name...
pirox22
1

votes
1

answer
172

Views

Pandas compare two columns and copy value of another column if there is a match only for first unique value

I have two different dataframes of which I want to compare two columns. If the value of the first dataframe appears anywhere in the column of the second dataframe, I want to copy the value next to the matching value and copy this to a new column in the first dataframe. The dataframes look like this:...
Stefan
1

votes
1

answer
348

Views

looping through a dictionary of dataframes in python

First of all, I wrote this function: def writing_in_excel(path, df, sheet_name): writer = pd.ExcelWriter(path, datetime_format='m/d/yyyy') df.to_excel(writer, sheet_name = sheet_name, index=False, freeze_panes = (0,1)) writer.save() writer.close() Then, I have a dictionary of dataframes: import pand...
Yun Tae Hwang
1

votes
2

answer
55

Views

Fill missing values based on another column in a pandas DataFrame

I'm working with Pandas and numpy, For the following data frame, lets call it 'data', for the Borough values with data['Borough'] == 'Unspecified', I need to use the zip code in the Incident Zip field to the left of it to do a lookup on the Incident Zip column for the matching zip code and Borough....
Shihan Rehman
0

votes
0

answer
27

Views

How to count the number of instances between two dates/times

Noobie here, so please bear with me. I'll try to make this as concise as possible. I have two dataframes: df2: Consists of unique visit number for each person, time the person arrived at our store, time the person departed from our store df1: Is a subset of visit numbers from df2 (as well...
gwf215
1

votes
1

answer
112

Views

MonthEnd object result in <11 * MonthEnds> instead of number

In my pandas dataframe I want to find the difference between dates in months. The function .dt.to_period('M') results in a MonthEnd object like <11 * MonthEnds> instead of the month number. I tried to change the column type with pd.to_numeric() and to remove the letters with re.sub('[^0-9]', '', 'blablabla123bla')....
Inge
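A sketch of one way to get a plain integer month difference for the question above, sidestepping period subtraction entirely. It counts calendar-month boundaries and ignores the day of the month; the column names `start` and `end` are assumptions.

```python
import pandas as pd

dates = pd.DataFrame({
    "start": pd.to_datetime(["2018-01-15", "2018-03-01"]),
    "end": pd.to_datetime(["2018-12-20", "2019-03-31"]),
})

# Whole calendar months between the two dates, as ordinary integers.
dates["months"] = (
    (dates["end"].dt.year - dates["start"].dt.year) * 12
    + (dates["end"].dt.month - dates["start"].dt.month)
)
```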
1

votes
2

answer
51

Views

Assign values with for loops to pandas DataFrame columns

I am a Python beginner and have a problem with a for loop. I want to assign a list of numbers to different DataFrame columns. Manually, I can assign my values with the correct code, but copy and paste isn't a good style for programming. The correct manual code looks like this: df = pd.DataFrame(colu...
Tom
1

votes
2

answer
27

Views

Parse the dataframe

I have the dataframe something like below in csv format: Country Status People_eligible_Count XYZ True 100000 XYZ False 14000 XYZ Not Ap 360000 I want to turn the above dataframe to below format: Country True False Not Ap XYZ 100000 14000 36000
user2277472
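A minimal sketch of the reshaping asked about above, using `DataFrame.pivot`. The frame is rebuilt from the input rows shown in the question; the status values are kept as strings.

```python
import pandas as pd

df = pd.DataFrame({
    "Country": ["XYZ", "XYZ", "XYZ"],
    "Status": ["True", "False", "Not Ap"],
    "People_eligible_Count": [100000, 14000, 360000],
})

# One row per country, one column per status value.
wide = df.pivot(
    index="Country", columns="Status", values="People_eligible_Count"
).reset_index()
wide.columns.name = None  # drop the leftover "Status" axis label
```

`pivot` raises an error if a (Country, Status) pair appears more than once; in that case `pivot_table` with an aggregation function is the usual alternative.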
1

votes
5

answer
61

Views

Subtract rows varying one column but keeping others fixed

I have an experiment where I need to subtract values of two different treatments from the Control (baseline), but these subtractions must correspond to other columns, named block and year sampled. Dummy data frame: df
Lucas
1

votes
2

answer
24

Views

How can I select n rows preceding an index row in a DataFrame?

I have a DataFrame and am trying to select a row (given a particular index) and the n rows preceding it. I've tried something like: last_10 = self.market_data.iloc[index:-10] But this appears to give everything from the index up until the end of the dataframe minus 10 rows. What I'd like to happen i...
Shamoon
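A sketch for the question above, assuming `index` is an integer position. In `iloc[index:-10]` the `-10` is a stop counted from the end of the frame, not a length, which explains the result described. The helper name and the sample data are hypothetical.

```python
import pandas as pd

market_data = pd.DataFrame({"price": range(100, 120)})

def rows_up_to(df, position, n):
    # Start n rows before `position`, clamped at the top of the frame,
    # and include the row at `position` itself.
    start = max(0, position - n)
    return df.iloc[start:position + 1]

last_10 = rows_up_to(market_data, 15, 10)
```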
1

votes
1

answer
36

Views

Add 2 columns and create new column after those 2 (R) [duplicate]

This question already has an answer here: Add (insert) a column between two columns in a data.frame 15 answers Let's say I have a dataframe with columns a,b,c,d,e,f,g,h. I want to add up the values of column d and e and create a column containing the results right after d and e such that it becomes...
TYL
1

votes
1

answer
50

Views

How can I find nearest neighbors of points in a data frame from another data frame

I want to find k nearest neighbors of all points in dataframe A from a dataframe B. How is that doable? It seems sklearn.neighbors.NearestNeighbors takes only one set of data, and just one query point. Like: samples = [[0., 0., 0.], [0., .5, 0.], [1., 1., .5]] from sklearn.neighbors import NearestNe...
No Lie
1

votes
1

answer
90

Views

Merge 3 columns based on unique values?

I am trying to merge 3 columns into a single one. The column values are separated by ';' and the new column needs to split all 3 column values and keep only the unique values. I know how to merge the columns. But I am struggling with splitting the row values in the 3 columns and finding unique...
johnsmith0
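A possible sketch for the question above, assuming the three columns hold ';'-separated strings and that first-seen order should be preserved. The column names `c1`, `c2`, `c3` and the helper are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "c1": ["a;b", "x"],
    "c2": ["b;c", "x;y"],
    "c3": ["a;d", ""],
})

def merge_unique(row):
    seen = []
    for cell in row:
        if pd.isna(cell):
            continue  # skip missing cells
        for item in str(cell).split(";"):
            item = item.strip()
            if item and item not in seen:
                seen.append(item)
    return ";".join(seen)

df["merged"] = df[["c1", "c2", "c3"]].apply(merge_unique, axis=1)
```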
1

votes
2

answer
29

Views

Compare content of two pandas dataframes even if the rows are differently ordered

I have two pandas dataframes whose rows are in different orders but which contain the same columns. My goal is to easily compare the two dataframes and confirm that they both contain the same rows. I have tried the 'equals' function, but there seems to be something I am missing, because the results are...
jotNewie
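A sketch of one common approach to the question above: `equals` also compares the index, so sort both frames by every column and reset the index before comparing. The helper name is hypothetical, and it assumes the column values are sortable.

```python
import pandas as pd

def same_rows(a, b):
    cols = sorted(a.columns)
    if cols != sorted(b.columns):
        return False
    # Put both frames into one canonical row and column order.
    a_sorted = a[cols].sort_values(cols).reset_index(drop=True)
    b_sorted = b[cols].sort_values(cols).reset_index(drop=True)
    return a_sorted.equals(b_sorted)

df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
df2 = df1.iloc[[2, 0, 1]]                 # same rows, shuffled
df3 = df1.assign(name=["a", "b", "z"])    # one value differs
```

Note that `equals` is also strict about dtypes, so an int column and a float column with the same numbers compare as different.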
1

votes
1

answer
31

Views

Pandas: How to pad values for every row that is missing years

I have a table that contains keywords and their occurrences in each year, but if a keyword doesn't occur in some years, those years are missing. I need to pad those years with zero; how can I do it with a Pandas dataframe? My data is like the table below; each keyword should be padded with zeros up to 13 years from...
TomLeung
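A hedged sketch for the question above. The real year range is cut off in the excerpt, so the range 2005 to 2007 and the column names below are purely hypothetical. The idea is to build every (keyword, year) combination and reindex with a fill value of zero.

```python
import pandas as pd

counts = pd.DataFrame({
    "keyword": ["a", "a", "b"],
    "year": [2005, 2007, 2006],
    "occurrence": [3, 1, 4],
})

years = range(2005, 2008)  # hypothetical range, adjust to the real one

# Every keyword paired with every year.
full_index = pd.MultiIndex.from_product(
    [counts["keyword"].unique(), years], names=["keyword", "year"]
)

padded = (
    counts.set_index(["keyword", "year"])
    .reindex(full_index, fill_value=0)
    .reset_index()
)
```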
1

votes
4

answer
28

Views

Extracting dates that exist in another dataframe within each subject (R)

I have 2 dataframes, both with different dates: Dataframe 1 ID Date A 21/1/2018 A 22/1/2018 B 21/1/2018 B 26/2/2018 C 19/9/2019 Dataframe 2 ID Date A 21/1/2018 A 22/1/2018 A 23/1/2018 B 21/1/2018 B 22/1/2018 B 23/1/2018 C 20/1/2018 C 04/5/2018 I want to e...
TYL
1

votes
2

answer
65

Views

Check if a character is in data frame in R

I'm looking for a simple way to check if values in an R data frame have comma (or any character for that matter). Let's suppose I have the following data frame: df
Joy In Data Stuff
1

votes
1

answer
43

Views

Retain index when combining two dataframes

The problem When I merge two dataframes, I lose the rownames. I want to avoid this. Note that some of the rows in the dataframes have different names. I have tried different versions of 'merge' from Pandas without success. Code example: df1 = pd.DataFrame() series1 = pd.Series([1,2]) series1 .rename...
KJA
1

votes
2

answer
59

Views

Sum columns of a Spark dataframe and create another dataframe

I have a dataframe like below - I am trying to create another dataframe from this which has 2 columns - the column name and the sum of values in each column like this - So far, I've tried this (in Spark 2.2.0) but throws a stack trace - val get_count: (String => Long) = (c: String) => { df.groupB...
van_d39
1

votes
2

answer
37

Views

Renaming columns after dataframe names

I am still a newbie in R. I have 24 csv files. I would like to import them (at once without having to call them one by one) as dataframes with shorter dataframe names and, for each dataframe, replace some colnames based on the name of the dataframe (or the .csv file name). Here follows an example wit...
Elixterra
1

votes
2

answer
30

Views

slicing pandas dataframe encounter KeyError: 'n_tokens_content', how to locate the bad rows efficiently?

I am trying to explore this dataset with pandas 0.20.3 in Python 3.6.2. %pylab inline import pandas as pd df = pd.read_csv('OnlineNewsPopularity.csv') df['n_tokens_content'][:9] last line produces error KeyError Traceback (most recent call last) ~/anaconda3/envs/tf11...
brennn
1

votes
1

answer
25

Views

How can I use gather function to manipulate my data frame? [duplicate]

This question already has an answer here: Collapse / concatenate / aggregate a column to a single comma separated string within each group 3 answers I have a data frame as follows: df
yas.f
1

votes
1

answer
32

Views

Convert dataframe into list

I would like to convert a dataframe containing zip codes and relationships between zip codes into a list. I looked at ways to convert dataframes into lists as well as the spread function. The first column (codes1) contains all zip codes of a zone. The second column (codes2) contains the zip codes...
helene
1

votes
2

answer
38

Views

List of Series to Dataframe

I have a list having Pandas Series objects, which I've created by doing something like this: li = [] li.append(input_df.iloc[0]) li.append(input_df.iloc[4]) where input_df is a Pandas Dataframe I want to convert this list of Series objects back to Pandas Dataframe object, and was wondering if there...
Saurabh Verma
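A minimal sketch for the question above: passing the list of Series straight to the `pd.DataFrame` constructor turns each Series into a row, with the Series names (here the original row labels) as the new index. The contents of `input_df` are made up.

```python
import pandas as pd

input_df = pd.DataFrame({"a": range(5), "b": range(10, 15)})

li = []
li.append(input_df.iloc[0])
li.append(input_df.iloc[4])

# Each Series becomes one row; its name becomes the row label.
rebuilt = pd.DataFrame(li)
```

Since the rows come from one frame, `input_df.iloc[[0, 4]]` would select the same rows directly without building the list.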
1

votes
3

answer
96

Views

Converting marks into grades in R

I have a very simple problem. Let's suppose, I have given marks of 100 students like this: set.seed(1234) Marks
Neeraj
