I'm working with a Spark DataFrame and need to group by the employee, designation, and company columns, then turn the remaining column values of the grouped rows into an array of elements in a new column. Example:

Input:

employee | Company Address | designation | company | Home Adress
----------------------------------------------------------------
Micheal  | NY              | Head        | xyz     | YN
Micheal  | NJ              | Head        | xyz     | YM

Output:

employee | designation | company | Address
------------------------------------------
Micheal  | Head        | xyz     | [Company Address : NY , Home Adress : YN], [Company Address : NJ , Home Adress : YM]

Any help is much appreciated!

1
Aditya Seth 3 May 2020, 19:46

2 answers

Best Answer

The solution below, in Spark, produces an array of structs instead of JSON:

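from pyspark.sql.functions import col, collect_list, struct

# Sample data; assumes an active SparkContext `sc` (as in the pyspark shell).
df1 = sc.parallelize([['Micheal','NY','head','XYZ','YN'], ['Micheal','NJ','head','XYZ','YM']]).toDF(("Employee", "Company Address", "designation", "company","Home Adress"))

# Collect each row's address pair as a struct, giving one array per group.
df2 = df1.groupBy("Employee", "designation", "company").agg(collect_list(struct(col("Company Address"),col("Home Adress"))).alias("Address"))

df2.show(1,False)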

Output:

+--------+-----------+-------+--------------------+
|Employee|designation|company|Address             |
+--------+-----------+-------+--------------------+
|Micheal |head       |XYZ    |[[NY, YN], [NJ, YM]]|
+--------+-----------+-------+--------------------+
1
Ajay Kharade 3 May 2020, 22:01

Use groupBy with the collect_list aggregate, combined with the built-in to_json and struct functions.

df.show()
//+--------+---------------+-----------+-------+------------+
//|employee|company_address|designation|company|Home_address|
//+--------+---------------+-----------+-------+------------+
//| Micheal|             NY|       Head|    xyz|          YN|
//| Micheal|             NJ|       Head|    xyz|          YM|
//+--------+---------------+-----------+-------+------------+    

df.groupBy("employee","designation","company").agg(collect_list(to_json(struct("company_address","Home_address"))).alias("Address")).show(false)
//+--------+-----------+-------+--------------------------------------------------------------------------------------------+
//|employee|designation|company|Address                                                                                     |
//+--------+-----------+-------+--------------------------------------------------------------------------------------------+
//|Micheal |Head       |xyz    |[{"company_address":"NY","Home_address":"YN"}, {"company_address":"NJ","Home_address":"YM"}]|
//+--------+-----------+-------+--------------------------------------------------------------------------------------------+
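
For readers who want to see what this aggregation computes without a Spark session, here is a minimal plain-Python sketch of the same idea: group rows by (employee, designation, company) and collect one JSON string per row. It is only an illustration of the semantics, not Spark code; the `rows` list and the `group_addresses` helper are hypothetical names made up for this example. Note that Spark's collect_list does not guarantee element order after a shuffle, whereas this sketch simply keeps input order.

```python
import json
from collections import defaultdict

# Hypothetical sample rows mirroring the DataFrame shown above.
rows = [
    {"employee": "Micheal", "company_address": "NY", "designation": "Head",
     "company": "xyz", "Home_address": "YN"},
    {"employee": "Micheal", "company_address": "NJ", "designation": "Head",
     "company": "xyz", "Home_address": "YM"},
]

def group_addresses(rows):
    """Group rows by (employee, designation, company); collect one JSON string per row."""
    grouped = defaultdict(list)
    for row in rows:
        key = (row["employee"], row["designation"], row["company"])
        # Plays the role of to_json(struct("company_address", "Home_address")).
        pair = {
            "company_address": row["company_address"],
            "Home_address": row["Home_address"],
        }
        # Compact separators match the JSON formatting in the output above.
        grouped[key].append(json.dumps(pair, separators=(",", ":")))
    return dict(grouped)

result = group_addresses(rows)
for key, addresses in result.items():
    print(key, addresses)
```

Dropping the `json.dumps` call and appending the `(company_address, Home_address)` pair directly would correspond to the struct-only variant from the other answer.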
0
Shu 3 May 2020, 17:16