Skip to main content

Running SQL queries

The queries tool exposes a full featured SQL notebook to query your events. In the backend we use Clickhouse as our data store, and give you full access to query your data using their SQL reference https://clickhouse.com/docs/en/sql-reference/.

You can find the query tool here https://www.graphjson.com/dashboard/queries

What are some example queries to get me started?

Starter query

The following will basically show you what we store behind the scenes at GraphJSON:

SELECT
*
FROM
all_events
LIMIT
10

You'll notice that we are selecting from all_events, this is the collection that contains all the events across all your collections. If you want, you can also replace all_events with a more specific collection. For instance if you had a collection named user_events, you could query that directly with SELECT * FROM user_events

Basic filters

If we want to filter for only events over the past day we can do something like

SELECT
*
FROM
all_events
WHERE
toDateTime(timestamp) > subtractDays(today(), 1)
LIMIT
10

Extracting JSON

If we want to extract JSON, we can use the helper functions JSONExtractString and JSONExtractFloat. For instance if we had JSON logged like this

{
price: 12.99,
product: "book",
}

We could get extract the fields with a query like this

SELECT
timestamp,
JSONExtractString(json, 'product') as product,
JSONExtractFloat(json, 'price') as price,
FROM
all_events
LIMIT 10

Substring filters

We can also do substring filter. For instance if we wanted to look up all events that contain the word error in the message field, we could run something like this

SELECT
*
FROM
all_events
WHERE
JSONExtractString(json, 'message') LIKE '%error%'
LIMIT
10

Grouping / Aggregations

You can construct more complex things. Imagine you were Uber rides, and each ride had a day of week field. You could see how many rides on each day with the following query:

SELECT
COUNT(*) as count,
JSONExtractString(json, 'Day_Of_Week') as day
FROM
all_events
GROUP BY
day
LIMIT
10

Common Table Expressions (CTEs)

It's pretty useful to sometimes assign subquery result to some variable. We can do this with common table expressions. The following query uses a CTE, notice the with clause, to calculate DAUs (daily active users) over time:

WITH activity AS (
SELECT
JSONExtractString(json, 'uid') as uid,
toUnixTimestamp(
toDateTime(
toStartOfDay(toDateTime(timestamp, 'America/Los_Angeles'))
)
) AS timestamp
FROM
all_events
WHERE
toDateTime(timestamp, 'America/Los_Angeles') > subtractDays(today(), 365)
GROUP BY
uid,
timestamp
)
SELECT COUNT(DISTINCT uid) as DAU, timestamp FROM activity GROUP BY timestamp

Joins

You can also do joins! For instance you might run the following query to see which riders took a ride on both Saturday and Sunday:

SELECT
*
FROM
(
SELECT
*
FROM
all_events
WHERE
JSONExtractString(json, 'Day_Of_Week') = 'Saturday'
) AS a
JOIN
(
SELECT
*
FROM
all_events
WHERE
JSONExtractString(json, 'Day_Of_Week') = 'Sunday'
) AS b
ON
JSONExtractString(a.json, 'Rider_Id') =
JSONExtractString(b.json, 'Rider_Id')
LIMIT
10

If you want to do some more advanced things. I would recommend reading the following followup guide https://docs.graphjson.com/Guides/a-b-funnels