Imagine you have a table called A with this structure:
+----------+-------------+--------+---------+
| location | order_count | gmv    | net_gmv |
+----------+-------------+--------+---------+
| TX       | 1000        | 9000.0 | 8000.0  |
| FL       | 1000        | 9000.0 | 8000.0  |
+----------+-------------+--------+---------+
and another table called B with this structure:
+-------+-------------+--------+---------+
| age   | order_count | gmv    | net_gmv |
+-------+-------------+--------+---------+
| 20-30 | 1000        | 9000.0 | 8000.0  |
| 30-40 | 1000        | 9000.0 | 8000.0  |
| 40-50 | 1000        | 9000.0 | 8000.0  |
+-------+-------------+--------+---------+
Here location and age are the dimensions needed for the report, and eventually we'll need more dimensions. What we do now is develop a separate Spark-SQL job for each table. We don't think this will scale: every time we want to add a new dimension, we have to develop another Spark-SQL job with the same logic but a different GROUP BY column.
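For context, each of these jobs boils down to an aggregation like the following (the source table name `orders` and the exact metric expressions are illustrative; only the GROUP BY column changes between jobs):

```sql
-- One job per dimension; here the dimension is `location`.
SELECT location,
       COUNT(*)     AS order_count,
       SUM(gmv)     AS gmv,
       SUM(net_gmv) AS net_gmv
FROM orders
GROUP BY location
```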
So I'm wondering whether there's a better way to do this. Has anyone run into this kind of problem before? Any pointers on how to do it efficiently? I'm thinking someone could just specify the dimension they need, and a script would automatically generate the new table for that dimension.
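As a minimal sketch of that idea, a small script could template the query from a dimension name and hand it to Spark. Everything here is assumed for illustration: the source table name `orders`, the metric expressions, and the helper name `build_report_sql`.

```python
# Sketch: generate the per-dimension Spark-SQL query instead of
# hand-writing one job per dimension. Table/column names are assumptions.

METRICS = (
    "COUNT(*)     AS order_count,\n"
    "       SUM(gmv)     AS gmv,\n"
    "       SUM(net_gmv) AS net_gmv"
)

def build_report_sql(dimension: str, source_table: str = "orders") -> str:
    """Return the aggregation query for one report dimension."""
    return (
        f"SELECT {dimension},\n"
        f"       {METRICS}\n"
        f"FROM {source_table}\n"
        f"GROUP BY {dimension}"
    )

if __name__ == "__main__":
    # Same logic, different dimension -- no new job needed.
    print(build_report_sql("location"))
    print()
    print(build_report_sql("age"))
```

In the actual pipeline you'd loop over the requested dimensions and run something like `spark.sql(build_report_sql(dim)).write.saveAsTable(...)` for each, so adding a dimension becomes a one-line config change rather than a new job.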
Thanks