
I want to write nested JSON output using PySpark in AWS Glue. These are the steps I have taken so far.

I used the following PySpark code in AWS Glue:

applymapping1 = ApplyMapping.apply(frame = dynJoin, mappings = [
    ("patientid", "decimal(19,0)", "patientid", "decimal(19,0)"),
    ("last_name", "string", "last_name", "string"),
    ("first_name", "string", "first_name", "string"),
    ("middle_name", "string", "middle_name", "string"),
    ("prefix", "string", "prefix", "string"),
    ("suffix", "string", "suffixe", "string"),
    ("street_address_1", "string", "street_address_1", "string"),
    ("street_address_2", "string", "street_address_2", "string"),
    ("zip", "string", "zip", "string"),
    ("city", "string", "city", "string"),
    ("country", "string", "country", "string"),
    ("group_name", "string", "group_name", "string"),
    ("group_id", "string", "group_id", "string"),
    ("current_member_id", "decimal(19,0)", "current_member_id", "decimal(19,0)"),
], transformation_ctx = "applymapping1")

def MergeAddress(rec):

  del rec["street_address_1"]
  del rec["street_address_2"]
  del rec["zip"]
  del rec["city"]
  del rec["country"]
  return rec

mapped_dyF =  Map.apply(frame = applymapping1, f = MergeAddress)

The output is:

{"patientid":8002,"Address":{"Array":["18 Orchard Avenue",null,"19001","Abington",null]},"group_id":"OLRX","group_name":"OLR Executive","current_member_id":1000434787}
{"patientid":8001,"Address":{"Array":["333 Oak Street",null,"34801","Bradenton",null]},"group_id":"OLRX","group_name":"OLR Executive","current_member_id":1222333444}
{"patientid":8001,"Address":{"Array":["102 North Main Street","Suite 41","32801","Orlando",null]},"group_id":"OLRX","group_name":"OLR Executive","current_member_id":1222333444}
{"patientid":8003,"Address":{"Array":[null,null,null,null,null]},"group_id":"OLRX","group_name":"OLR Executive","current_member_id":12288889444}

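However, the output needs to be in the following format, with one record per patient and all of that patient's addresses nested in an array of objects: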

{"patientid":8001,
"Address":
[{"street_address_1":"333 Oak Street","street_address_2":null,"zip":"34801","city":"Bradenton","country":null},
{"street_address_1":"102 North Main Street","street_address_2":"Suite 41","zip":"32801","city":"Orlando","country":null}
]
,"group_id":"OLRX","group_name":"OLR Executive","current_member_id":1222333444
}
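To make the target shape concrete, here is a minimal plain-Python sketch of the reshaping (not Glue-specific, and not a tested Glue job). It assumes rows are grouped by the four non-address fields that appear in the desired output; `ADDRESS_FIELDS`, `KEY_FIELDS`, and `nest_addresses` are hypothetical names chosen for illustration. In Spark, grouping on the key columns and applying `collect_list` over a `struct` of the address columns would be the natural equivalent, though that is not verified against this job.

```python
import json

# Address columns to fold into the nested "Address" list (names from the question).
ADDRESS_FIELDS = ["street_address_1", "street_address_2", "zip", "city", "country"]
# Assumption: these non-address fields together identify one output record.
KEY_FIELDS = ["patientid", "group_id", "group_name", "current_member_id"]


def nest_addresses(rows):
    """Group flat rows by KEY_FIELDS and collect each row's address as a dict."""
    grouped = {}  # dicts keep insertion order, so first-seen key order is preserved
    for row in rows:
        key = tuple(row.get(field) for field in KEY_FIELDS)
        address = {field: row.get(field) for field in ADDRESS_FIELDS}
        grouped.setdefault(key, []).append(address)

    result = []
    for key, addresses in grouped.items():
        # Build the record in the same field order as the desired output.
        record = {"patientid": key[0], "Address": addresses}
        record.update(zip(KEY_FIELDS[1:], key[1:]))
        result.append(record)
    return result


# Two flat rows for patient 8001, taken from the sample data in the question.
rows = [
    {"patientid": 8001, "street_address_1": "333 Oak Street", "street_address_2": None,
     "zip": "34801", "city": "Bradenton", "country": None,
     "group_id": "OLRX", "group_name": "OLR Executive", "current_member_id": 1222333444},
    {"patientid": 8001, "street_address_1": "102 North Main Street", "street_address_2": "Suite 41",
     "zip": "32801", "city": "Orlando", "country": None,
     "group_id": "OLRX", "group_name": "OLR Executive", "current_member_id": 1222333444},
]

nested = nest_addresses(rows)
for record in nested:
    print(json.dumps(record))
```

The key point is that each address becomes an object with named fields rather than a positional array, and rows sharing the same key collapse into a single record.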
  • Are you sure JSON is the format you want to go with? If you're going to the effort of using a Glue job for data science or warehousing purposes, you're better off converting the data to Parquet. Otherwise, depending on your file size and library dependencies, you could just use a Lambda for this. Commented Jun 16, 2020 at 21:41
  • @pkarfs: Thanks for the details, but the requirement is that I use JSON. Can you provide more details on using a Lambda function? Commented Jun 17, 2020 at 2:13
  • Can you update the question with the input data and the required output? Commented Jun 17, 2020 at 5:48
  • stackoverflow.com/questions/53704434/… Commented Jun 23, 2020 at 14:17
  • The link above helped me achieve the result. Commented Jun 23, 2020 at 14:18
