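Suppose a table has many columns and every column is stored as a String. The problem is how to infer the real types. One approach is to infer a schema from a sample JSON record with schema_of_json and apply it with from_json.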
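from pyspark.sql.functions import col, from_json, regexp_replace, schema_of_json, to_timestamp

# A representative record; its values drive the schema inference below.
sample_json = '''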
{
"a": "123",
"b": true,
"c": "2022-01-01T18:00:00+08:00[Asia/Shanghai]"
}
'''
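# Infer the schema from the sample JSON. schema_of_json returns a Column
# holding the schema, which from_json accepts as its schema argument.
# Note: quoted values such as "123" are still inferred as strings.
schemas = schema_of_json(sample_json)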
spark = ...
df = ...
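# Parse the JSON string column with the inferred schema and flatten the struct.
# Assumes the raw JSON is held in a string column named "payload_json".
df.select(from_json(col("payload_json"), schemas).alias("data")).select("data.*") \
    .withColumn(
        "created_time",
        # Strip fractional seconds and the bracketed zone ID before parsing.
        # "c" is the timestamp field in the sample record.
        to_timestamp(regexp_replace(col("c"), r'\.\d+|\[.*\]', ''), "yyyy-MM-dd'T'HH:mm:ssXXX"),
    )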
...
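
The timestamp clean-up in the withColumn step can be checked outside Spark. The sketch below is a plain-Python illustration and is not part of the Spark job. It applies the same regular expression to one value and parses the result with the standard library. `normalize_timestamp` is a hypothetical helper name, the sample value with fractional seconds is made up for illustration, and the `strptime` format is only a rough Python counterpart of the Spark pattern `yyyy-MM-dd'T'HH:mm:ssXXX`.

```python
import re
from datetime import datetime, timedelta

# Same pattern as in the Spark snippet: drop fractional seconds and a
# bracketed zone ID such as "[Asia/Shanghai]".
NOISE = re.compile(r'\.\d+|\[.*\]')


def normalize_timestamp(raw: str) -> str:
    """Return raw with fractional seconds and the bracketed zone ID removed."""
    return NOISE.sub('', raw)


# Hypothetical input value, used only to exercise both branches of the pattern.
cleaned = normalize_timestamp("2022-01-01T18:00:00.123+08:00[Asia/Shanghai]")

# %z accepts offsets written with a colon (Python 3.7 and later).
parsed = datetime.strptime(cleaned, "%Y-%m-%dT%H:%M:%S%z")
offset_hours = parsed.utcoffset() / timedelta(hours=1)

print(cleaned)
print(offset_hours)
```

The offset has to be written with a two-digit hour (`+08:00`). A value like `+8:00` does not match the `XXX` pattern letter, so it is worth validating the raw strings before relying on the parsed column.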