How to modify a column based on the values in another column of a PySpark dataframe? F.when edge case


April 2019




I'd like to go through each row in a pyspark dataframe, and change the value of a column based on the content of another column. The value I am changing it to is also based on the current value of the column to be changed.

Specifically, I have a column which contains DenseVectors, and another column which contains the index of the vector that I need.

Alternatively, I could also replace the DenseVector with the larger of the two values in the DenseVector.

I am mainly trying to use F.when() in conjunction with withColumn(), but I am running into trouble with the second argument of F.when(): I want to pull out the element of the vector at the index stored in the other column, but a column cannot be indexed by the value of another column.

   a                        b  
1  DenseVector([0.1, 0.9])  1.0
2  DenseVector([0.6, 0.4])  0.0

df = df.withColumn('a', F.when(df.b == 0.0, df.a[0]))

0 answers