
I have a specific requirement: I need to check whether a DataFrame is empty and, if it is, populate it with a default value. Here is what I tried, but it does not give me what I want.

def checkNotEmpty(df: org.apache.spark.sql.DataFrame, col: String): org.apache.spark.sql.DataFrame = {
  if (!df.rdd.isEmpty()) df
  else df.na.fill(0, Seq(col))
}

val age = checkNotEmpty(w_feature_md.filter("age='22'").select("age_index"),"age_index")

The idea is to return the DataFrame unchanged if it is not empty, and to fill in a default value of zero if it is empty. This doesn't seem to work. This is what I get:

scala> age.show
+---------+
|age_index|
+---------+
+---------+

Please help..

1 Answer

  def checkNotEmpty(df: org.apache.spark.sql.DataFrame, col: String):org.apache.spark.sql.DataFrame = 
     {
     if (!df.rdd.isEmpty())  df
        else
      df.na.fill(0, Seq(col))
     }

In your method, control goes to the if branch when df is not empty and to the else branch when df is empty.

df.na (org.apache.spark.sql.DataFrameNaFunctions) provides functionality for working with missing data in DataFrames. It replaces null or NaN values in rows that already exist; it does not add rows.
Since you are calling df.na on an empty DataFrame, there is nothing to replace, so the result is always empty.

See this question for more on replacing null values in a DataFrame.
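One way to get a default row when the filter matches nothing is to build a new one-row DataFrame with the same schema, rather than calling df.na.fill on the empty one. The following is only a minimal sketch under stated assumptions: it assumes age_index is a double column, and the helper name orDefaultRow, the local SparkSession, and the sample data are all hypothetical stand-ins, not part of the original question.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// Hypothetical helper: returns df unchanged when it has at least one row;
// otherwise builds a one-row DataFrame with the same schema from `default`.
// The values in `default` must match the column types of df.schema.
def orDefaultRow(df: DataFrame, default: Row): DataFrame =
  if (df.head(1).nonEmpty) df
  else df.sparkSession.createDataFrame(
    java.util.Collections.singletonList(default), df.schema)

// Local session for the sketch only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("default-row-sketch")
  .getOrCreate()
import spark.implicits._

// Stand-in for w_feature_md: column names follow the question, values are made up.
val features = Seq(("21", 1.0), ("23", 2.0)).toDF("age", "age_index")

// No row has age = '22', so the filtered DataFrame is empty and the default is used.
val age = orDefaultRow(features.filter("age = '22'").select("age_index"), Row(0.0))
age.show()
```

Using head(1) avoids converting to an RDD just to test for emptiness. If age_index has a different type, the value in Row(...) would need to change to match it.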


7 Comments

Thanks @p2. Is there a way to fill in a default value of 0 when it is empty?
Thanks again. It is still not working as I expected. I also tried ` def checkNotEmpty(df: org.apache.spark.sql.DataFrame, col: String): org.apache.spark.sql.DataFrame = { if (df.rdd.isEmpty()) { println("here"); df.na.fill(0.0, Seq(col)) } else df } `. The value is not null but empty, so I don't think df.na.fill works in this case.
You can try something like this: df.na.replace("age", Map(35 -> 61, 24 -> 12)).show()
Thanks once more. This did not help either. I was not sure what this does, but I did what you suggested. Here is what I tried:

scala> age.na.replace("age",Map(35->61,24 ->12)).show()
+---------+
|age_index|
+---------+
+---------+

scala> age.na.replace("age_index",Map(35->61,24 ->12)).show()
+---------+
|age_index|
+---------+
+---------+

scala> age.na.replace("",Map(35->61,24 ->12)).show()
+---------+
|age_index|
+---------+
+---------+