Running SQL queries
The queries tool exposes a full featured SQL notebook to query your events. In the backend we use Clickhouse as our data store, and give you full access to query your data using their SQL reference https://clickhouse.com/docs/en/sql-reference/.
You can find the query tool here https://www.graphjson.com/dashboard/queries
What are some example queries to get me started?
Starter query
The following will basically show you what we store behind the scenes at GraphJSON:
SELECT
*
FROM
all_events
LIMIT
10
You'll notice that we are selecting from all_events
, this is the collection that contains all the events across all your collections. If you want, you can also replace all_events
with a more specific collection. For instance if you had a collection named user_events
, you could query that directly with SELECT * FROM user_events
Basic filters
If we want to filter for only events over the past day we can do something like
SELECT
*
FROM
all_events
WHERE
toDateTime(timestamp) > subtractDays(today(), 1)
LIMIT
10
Extracting JSON
If we want to extract JSON, we can use the helper functions JSONExtractString
and JSONExtractFloat
. For instance if we had JSON logged like this
{
price: 12.99,
product: "book",
}
We could get extract the fields with a query like this
SELECT
timestamp,
JSONExtractString(json, 'product') as product,
JSONExtractFloat(json, 'price') as price,
FROM
all_events
LIMIT 10
Substring filters
We can also do substring filter. For instance if we wanted to look up all events that contain the word
error
in the message
field, we could run something like this
SELECT
*
FROM
all_events
WHERE
JSONExtractString(json, 'message') LIKE '%error%'
LIMIT
10
Grouping / Aggregations
You can construct more complex things. Imagine you were Uber rides, and each ride had a day of week field. You could see how many rides on each day with the following query:
SELECT
COUNT(*) as count,
JSONExtractString(json, 'Day_Of_Week') as day
FROM
all_events
GROUP BY
day
LIMIT
10
Common Table Expressions (CTEs)
It's pretty useful to sometimes assign subquery result to some variable. We can do this with
common table expressions. The following query uses a CTE, notice the with
clause,
to calculate DAUs (daily active users) over time:
WITH activity AS (
SELECT
JSONExtractString(json, 'uid') as uid,
toUnixTimestamp(
toDateTime(
toStartOfDay(toDateTime(timestamp, 'America/Los_Angeles'))
)
) AS timestamp
FROM
all_events
WHERE
toDateTime(timestamp, 'America/Los_Angeles') > subtractDays(today(), 365)
GROUP BY
uid,
timestamp
)
SELECT COUNT(DISTINCT uid) as DAU, timestamp FROM activity GROUP BY timestamp
Joins
You can also do joins! For instance you might run the following query to see which riders took a ride on both Saturday and Sunday:
SELECT
*
FROM
(
SELECT
*
FROM
all_events
WHERE
JSONExtractString(json, 'Day_Of_Week') = 'Saturday'
) AS a
JOIN
(
SELECT
*
FROM
all_events
WHERE
JSONExtractString(json, 'Day_Of_Week') = 'Sunday'
) AS b
ON
JSONExtractString(a.json, 'Rider_Id') =
JSONExtractString(b.json, 'Rider_Id')
LIMIT
10
If you want to do some more advanced things. I would recommend reading the following followup guide https://docs.graphjson.com/Guides/a-b-funnels