The real power of the data warehouse is the ability to combine data from multiple tables in a single query. Joins enable you to do this. They enable you to choose fields that act as connections between PostHog and external sources.
Table joins
You can join external data on existing PostHog schemas and other external data tables. These joins are saved and interpreted anytime they're accessed on the origin table.
To define a join, go to the data warehouse tab, click the three dots next to your source table, and click Add join. Here you define the source table key, joining table, and joining table key as well as how the fields are accessed.
For example, if you import your Stripe data, you can define a join between the events
table's distinct_id
key and the stripe_customer
table's email
key. You can then access the stripe_customer
table through the events
table like SELECT stripe_customer.id FROM events
.
Once joined, source properties can be used in filters, breakdowns, and HogQL expressions.
To edit or delete a table join, click the three dots next to your source table, click View table schema, click the three dots next to your joined table, and select Edit or Delete.
Person joins
Person joins are a special type of table joins. They are joins on the persons
table in PostHog. When you join external data on this table, we enable you to use it like a person filter in insights.
Note: Be sure that your joined keys actually match. For example
persons.id
returns a UUID, even if you use an email as adistinct_id
when capturing events. You might need to add a person property like
Query joins
If you only want to join data together for a single insight or query, you can use SQL commands like WHERE IN
and JOIN
SQL commands.
For example, to get a count of events for your Hubspot contacts you can filter events.distinct_id
by email FROM hubspot_contacts
like this:
SELECT COUNT() AS event_count, distinct_idFROM eventsWHERE distinct_id IN (SELECT email FROM hubspot_contacts)GROUP BY distinct_idORDER BY event_count DESC
You can also use a JOIN
such as INNER JOIN
or LEFT JOIN
to combine data. For example, to get a count of events for your Stripe customers you can INNER JOIN
on distinct_id
and email
like this:
SELECT events.distinct_id, COUNT() AS event_countFROM eventsINNER JOIN prod_stripe_customer ON events.distinct_id = prod_stripe_customer.emailGROUP BY events.distinct_idORDER BY event_count DESC