databrickslabs/dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) can be used to generate large simulated or synthetic data sets for tests, proofs of concept (POCs), and other uses in Databricks environments, including Delta Live Tables pipelines.
GitHub repository with 470 stars and 97 forks.
Language: Python
Topics: datagen, pyspark, python, data-generation, faker, spark, spark-streaming, delta-live-tables, deltalake, databricks