With batch exports, data can be exported to an S3 bucket.
Creating the batch export
- Subscribe to the data pipelines add-on in your billing settings if you haven't already.
- Click Data pipelines in the navigation and go to the exports tab in your PostHog instance.
- Click "Create export workflow".
- Select S3 as the batch export type.
- Fill in the necessary configuration details.
- Finalize the creation by clicking on "Create".
- Done! The batch export will schedule its first run at the start of the next period.
S3 configuration
Configuring a batch export targeting S3 requires the following S3-specific configuration values:
- Bucket name: The name of the S3 bucket where the data is to be exported.
- Region: The AWS region where the bucket is located.
- Key prefix: A key prefix to use for each S3 object created. This key prefix can include template variables.
- Compression: Select a compression method (like gzip) to use for exported files or no compression.
- Encryption: Select a server-side encryption method (AES256 or aws:kms) for AWS to encrypt data at rest.
- Format: Select a file format to use in the export. See the S3 file formats section for details on which file formats are supported.
- AWS Access Key ID: An AWS access key ID with access to the S3 bucket.
- AWS Secret Access Key: An AWS secret access key with access to the S3 bucket.
- AWS KMS Key ID: The AWS KMS Key ID to use for server-side encryption. Only required when selecting aws:kms encryption.
- Events to exclude: A list of events to omit from the exported data.
- Endpoint URL: Required if exporting to an S3-compatible blob storage.
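The dependency between the encryption method and the KMS key can be sketched as a small check. This is a minimal illustration only: the class name, field names, and function are hypothetical and do not reflect PostHog's API, and it assumes that leaving encryption unset is allowed.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical field names that mirror the settings listed above;
# this is not PostHog's API schema.
@dataclass
class S3ExportSettings:
    bucket_name: str
    region: str
    aws_access_key_id: str
    aws_secret_access_key: str
    prefix: str = ""
    compression: Optional[str] = None   # e.g. "gzip", or None for no compression
    encryption: Optional[str] = None    # "AES256" or "aws:kms"; None is an assumption
    kms_key_id: Optional[str] = None
    endpoint_url: Optional[str] = None  # only for S3-compatible storage


def find_problems(settings: S3ExportSettings) -> List[str]:
    """Return a list of problems; an empty list means the settings look consistent."""
    problems = []
    if settings.encryption not in (None, "AES256", "aws:kms"):
        problems.append("encryption must be AES256 or aws:kms")
    if settings.encryption == "aws:kms" and not settings.kms_key_id:
        problems.append("AWS KMS Key ID is required with aws:kms encryption")
    return problems
```

Running a check like this before filling in the form can catch a missing KMS key early, since that value is only needed for aws:kms encryption.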
S3 key prefix template variables
The key prefix provided for data exporting can include template variables which are formatted at runtime. All template variables are defined between curly brackets (for example {day}). This allows you to partition files in your S3 bucket, such as by date.
Template variables include:
- Date and time variables: year, month, day, hour, minute, second.
- Name of the table exported (for now, only "events"): table.
- Batch export data bounds: data_interval_start, data_interval_end.
So, as an example, setting {year}-{month}-{day}_{table}/ as a key prefix will produce files prefixed with keys like 2023-07-28_events/.
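To preview how a prefix might expand, the substitution can be approximated with Python's str.format. This is a rough sketch, not PostHog's implementation: the function name is hypothetical, and it assumes that the date and time variables are zero-padded and taken from the end of the data interval, and that the data bounds render as ISO 8601 strings.

```python
from datetime import datetime, timezone


def render_key_prefix(template: str, table: str,
                      interval_start: datetime, interval_end: datetime) -> str:
    """Fill the documented template variables into a key prefix."""
    values = {
        # Assumption: date parts come from the interval end, zero-padded.
        "year": f"{interval_end:%Y}",
        "month": f"{interval_end:%m}",
        "day": f"{interval_end:%d}",
        "hour": f"{interval_end:%H}",
        "minute": f"{interval_end:%M}",
        "second": f"{interval_end:%S}",
        "table": table,
        # Assumption: data bounds are rendered in ISO 8601 form.
        "data_interval_start": interval_start.isoformat(),
        "data_interval_end": interval_end.isoformat(),
    }
    return template.format(**values)


start = datetime(2023, 7, 28, 0, 0, tzinfo=timezone.utc)
end = datetime(2023, 7, 28, 1, 0, tzinfo=timezone.utc)
print(render_key_prefix("{year}-{month}-{day}_{table}/", "events", start, end))
# prints: 2023-07-28_events/
```

A template that references an unknown variable raises KeyError here, which makes typos in a prefix easy to spot before saving the export.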
S3 file formats
PostHog S3 batch exports support two file formats for exporting data:
- JSON lines.
- Apache Parquet (only the latest version of the format specification is supported).
The batch export format is selected via a drop-down menu when creating or editing an export.
We intend to add support for other common formats, and format-specific configuration options. You can follow the roadmap to track progress.
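For the JSON lines format, each line of an exported file is a standalone JSON document, so a downloaded file can be read with the standard library alone. The sketch below assumes the file contents are already in memory as bytes and that gzip was the chosen compression; the function name and the sample field names are hypothetical.

```python
import gzip
import json
from typing import Any, Dict, List


def read_json_lines(raw: bytes, compressed: bool = False) -> List[Dict[str, Any]]:
    """Parse JSON lines content, optionally gzip-compressed, into a list of records."""
    if compressed:
        raw = gzip.decompress(raw)
    # Skip blank lines so a trailing newline does not cause a parse error.
    return [json.loads(line) for line in raw.decode("utf-8").splitlines() if line.strip()]


# Made-up sample data standing in for an exported file.
sample = b'{"event": "pageview"}\n{"event": "signup"}\n'
records = read_json_lines(gzip.compress(sample), compressed=True)
print(len(records))
# prints: 2
```

Parquet files need a dedicated reader library, so they are not covered by this sketch.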
S3-compatible blob storage
PostHog S3 batch exports may also export data to an S3-compatible blob storage like MinIO. Simply set the Endpoint URL to your blob storage's host and port, for example: https://my-minio-storage:9000.
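Before saving the export, it can help to confirm that the Endpoint URL parses into a scheme, host, and port. The following is a small sketch using only the standard library; the function name is hypothetical, and requiring an http or https scheme is an assumption rather than a documented PostHog rule.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def split_endpoint_url(url: str) -> Tuple[str, Optional[int]]:
    """Return (host, port) for an endpoint URL, raising ValueError if it looks malformed."""
    parts = urlsplit(url)
    # Assumption: the endpoint is reached over HTTP or HTTPS.
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"expected an http(s) URL, got: {url!r}")
    if not parts.hostname:
        raise ValueError(f"no host found in: {url!r}")
    return parts.hostname, parts.port


print(split_endpoint_url("https://my-minio-storage:9000"))
# prints: ('my-minio-storage', 9000)
```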