Skip to main content
Version: 0.9

SQL

GreptimeDB supports full SQL for querying data from a database.

In this document, we will use the monitor table to demonstrate how to query data. For instructions on creating the monitor table and inserting data into it, Please refer to table management and Ingest Data.

Basic query

The query is represented by the SELECT statement. For example, the following query selects all data from the monitor table:

SELECT * FROM monitor;

The query result looks like the following:

+-----------+---------------------+------+--------+
| host | ts | cpu | memory |
+-----------+---------------------+------+--------+
| 127.0.0.1 | 2022-11-03 03:39:57 | 0.1 | 0.4 |
| 127.0.0.1 | 2022-11-03 03:39:58 | 0.5 | 0.2 |
| 127.0.0.2 | 2022-11-03 03:39:58 | 0.2 | 0.3 |
+-----------+---------------------+------+--------+
3 rows in set (0.00 sec)

Functions are also supported in the SELECT field list. For example, you can use the count() function to retrieve the total number of rows in the table:

SELECT count(*) FROM monitor;
+-----------------+
| COUNT(UInt8(1)) |
+-----------------+
| 3 |
+-----------------+

The avg() function returns the average value of a certain field:

SELECT avg(cpu) FROM monitor;
+---------------------+
| AVG(monitor.cpu) |
+---------------------+
| 0.26666666666666666 |
+---------------------+
1 row in set (0.00 sec)

You can also select only the result of a function. For example, you can extract the day of the year from a timestamp. The DOY in the SQL statement is the abbreviation of day of the year:

SELECT date_part('DOY', '2021-07-01 00:00:00');

Output:

+----------------------------------------------------+
| date_part(Utf8("DOY"),Utf8("2021-07-01 00:00:00")) |
+----------------------------------------------------+
| 182 |
+----------------------------------------------------+
1 row in set (0.003 sec)

The parameters and results of date functions align with the SQL client's time zone. For example, when the client's time zone is set to +08:00, the results of the two queries below are the same:

select to_unixtime('2024-01-02 00:00:00');
select to_unixtime('2024-01-02 00:00:00+08:00');

Please refer to SELECT and Functions for more information.

Limit the number of rows returned

Time series data is typically massive. To save bandwidth and improve query performance, you can use the LIMIT clause to restrict the number of rows returned by the SELECT statement.

For example, the following query limits the number of rows returned to 10:

SELECT * FROM monitor LIMIT 10;

Filter data

You can use the WHERE clause to filter the rows returned by the SELECT statement.

Filtering data by tags or time index is efficient and common in time series scenarios. For example, the following query filter data by tag host:

SELECT * FROM monitor WHERE host='127.0.0.1';

The following query filter data by time index ts, and returns the data after 2022-11-03 03:39:57:

SELECT * FROM monitor WHERE ts > '2022-11-03 03:39:57';

You can also use the AND keyword to combine multiple constraints:

SELECT * FROM monitor WHERE host='127.0.0.1' AND ts > '2022-11-03 03:39:57';

Filter by time index

Filtering data by the time index is a crucial feature in time series databases.

When working with Unix time values, the database treats them based on the type of the column value. For instance, if the ts column in the monitor table has a value type of TimestampMillisecond, you can use the following query to filter the data:

The Unix time value 1667446797000 corresponds to the TimestampMillisecond type。

SELECT * FROM monitor WHERE ts > 1667446797000;

When working with a Unix time value that doesn't have the precision of the column value, you need to use the :: syntax to specify the type of the time value. This ensures that the database correctly identifies the type.

For example, 1667446797 represents a timestamp in seconds, which is different from the default millisecond timestamp of the ts column. You need to specify its type as TimestampSecond using the ::TimestampSecond syntax. This informs the database that the value 1667446797 should be treated as a timestamp in seconds.

SELECT * FROM monitor WHERE ts > 1667446797::TimestampSecond;

For the supported time data types, please refer to Data Types.

When using standard RFC3339 or ISO8601 string literals, you can directly use them in the filter condition since the precision is clear.

SELECT * FROM monitor WHERE ts > '2022-07-25 10:32:16.408';

Time and date functions are also supported in the filter condition. For example, use the now() function and the INTERVAL keyword to retrieve data from the last 5 minutes:

SELECT * FROM monitor WHERE ts >= now() - INTERVAL '5 minutes';

For date and time functions, please refer to Functions for more information.

Time zone

A string literal timestamp without time zone information will be interpreted based on the local time zone of the SQL client. For example, the following two queries are equivalent when the client time zone is +08:00:

SELECT * FROM monitor WHERE ts > '2022-07-25 10:32:16.408';
SELECT * FROM monitor WHERE ts > '2022-07-25 10:32:16.408+08:00';

All the timestamp column values in query results are formatted based on the client time zone. For example, the following code shows the same ts value formatted in the client time zones UTC and +08:00 respectively.

+-----------+---------------------+------+--------+
| host | ts | cpu | memory |
+-----------+---------------------+------+--------+
| 127.0.0.1 | 2023-12-31 16:00:00 | 0.5 | 0.1 |
+-----------+---------------------+------+--------+

Order By

The order of the returned data is not guaranteed. You need to use the ORDER BY clause to sort the returned data. For example, in time series scenarios, it is common to use the time index column as the sorting key to arrange the data chronologically:

-- ascending order by ts
SELECT * FROM monitor ORDER BY ts ASC;
-- descending order by ts
SELECT * FROM monitor ORDER BY ts DESC;

Aggregate data by tag

You can use the GROUP BY clause to group rows that have the same values into summary rows. The average memory usage grouped by idc:

SELECT host, avg(cpu) FROM monitor GROUP BY host;
+-----------+------------------+
| host | AVG(monitor.cpu) |
+-----------+------------------+
| 127.0.0.2 | 0.2 |
| 127.0.0.1 | 0.3 |
+-----------+------------------+
2 rows in set (0.00 sec)

Please refer to GROUP BY for more information.

Find the latest data of time series

To find the latest point of each time series, you can use DISTINCT ON together with ORDER BY like in ClickHouse.

SELECT DISTINCT ON (host) * FROM monitor ORDER BY host, ts DESC;
+-----------+---------------------+------+--------+
| host | ts | cpu | memory |
+-----------+---------------------+------+--------+
| 127.0.0.1 | 2022-11-03 03:39:58 | 0.5 | 0.2 |
| 127.0.0.2 | 2022-11-03 03:39:58 | 0.2 | 0.3 |
+-----------+---------------------+------+--------+
2 rows in set (0.00 sec)

Aggregate data by time window

GreptimeDB supports Range Query to aggregate data by time window.

Suppose we have the following data in the monitor table:

+-----------+---------------------+------+--------+
| host | ts | cpu | memory |
+-----------+---------------------+------+--------+
| 127.0.0.1 | 2023-12-13 02:05:41 | 0.5 | 0.2 |
| 127.0.0.1 | 2023-12-13 02:05:46 | NULL | NULL |
| 127.0.0.1 | 2023-12-13 02:05:51 | 0.4 | 0.3 |
| 127.0.0.2 | 2023-12-13 02:05:41 | 0.3 | 0.1 |
| 127.0.0.2 | 2023-12-13 02:05:46 | NULL | NULL |
| 127.0.0.2 | 2023-12-13 02:05:51 | 0.2 | 0.4 |
+-----------+---------------------+------+--------+

The following query returns the average CPU usage in a 10-second time range and calculates it every 5 seconds:

SELECT 
ts,
host,
avg(cpu) RANGE '10s' FILL LINEAR
FROM monitor
ALIGN '5s' TO '2023-12-01T00:00:00' BY (host) ORDER BY ts ASC;
  1. avg(cpu) RANGE '10s' FILL LINEAR is a Range expression. RANGE '10s' specifies that the time range of the aggregation is 10s, and FILL LINEAR specifies that if there is no data within a certain aggregation time, use the LINEAR method to fill it.
  2. ALIGN '5s' specifies the that data statistics should be performed in steps of 5s.
  3. TO '2023-12-01T00:00:00 specifies the origin alignment time. The default value is Unix time 0.
  4. BY (host) specifies the aggregate key. If the BY keyword is omitted, the primary key of the data table is used as the aggregate key by default.
  5. ORDER BY ts ASC specifies the sorting method of the result set. If you do not specify the sorting method, the order of the results is not guaranteed.

The Response is shown below:

+---------------------+-----------+----------------------------------------+
| ts | host | AVG(monitor.cpu) RANGE 10s FILL LINEAR |
+---------------------+-----------+----------------------------------------+
| 2023-12-13 02:05:35 | 127.0.0.1 | 0.5 |
| 2023-12-13 02:05:40 | 127.0.0.1 | 0.5 |
| 2023-12-13 02:05:45 | 127.0.0.1 | 0.4 |
| 2023-12-13 02:05:50 | 127.0.0.1 | 0.4 |
| 2023-12-13 02:05:35 | 127.0.0.2 | 0.3 |
| 2023-12-13 02:05:40 | 127.0.0.2 | 0.3 |
| 2023-12-13 02:05:45 | 127.0.0.2 | 0.2 |
| 2023-12-13 02:05:50 | 127.0.0.2 | 0.2 |
+---------------------+-----------+----------------------------------------+

Time range window

The origin time range window steps forward and backward in the time series to generate all time range windows. In the example above, the origin alignment time is set to 2023-12-01T00:00:00, which is also the end time of the origin time window.

The RANGE option, along with the origin alignment time, defines the origin time range window that starts from origin alignment timestamp and ends at origin alignment timestamp + range.

The ALIGN option defines the query resolution steps. It determines the calculation steps from the origin time window to other time windows. For example, with the origin alignment time 2023-12-01T00:00:00 and ALIGN '5s', the alignment times are 2023-11-30T23:59:55, 2023-12-01T00:00:00, 2023-12-01T00:00:05, 2023-12-01T00:00:10, and so on.

These time windows are left-closed and right-open intervals that satisfy the condition [alignment timestamp, alignment timestamp + range).

The following images can help you understand the time range window more visually:

When the query resolution is greater than the time range window, the metrics data will be calculated for only one time range window.

align > range

When the query resolution is less than the time range window, the metrics data will be calculated for multiple time range windows.

align < range

Align to specific timestamp

The alignment times default based on the time zone of the current SQL client session. You can change the origin alignment time to any timestamp you want. For example, use NOW to align to the current time:

SELECT 
ts,
host,
avg(cpu) RANGE '1w'
FROM monitor
ALIGN '1d' TO NOW BY (host);

Or use a ISO 8601 timestamp to align to a specified time:

SELECT 
ts,
host,
avg(cpu) RANGE '1w'
FROM monitor
ALIGN '1d' TO '2023-12-01T00:00:00+08:00' BY (host);

Fill null values

The FILL option can be used to fill null values in the data. In the above example, the LINEAR method is used to fill null values. Other methods are also supported, such as PREV and a constant value X. For more information, please refer to the FILL OPTION.

Syntax

Please refer to Range Query for more information.

HTTP API

Use POST method to query data:

curl -X POST \
-H 'authorization: Basic {{authorization if exists}}' \
-H 'Content-Type: application/x-www-form-urlencoded' \
-d 'sql=select * from monitor' \
http://localhost:4000/v1/sql?db=public

The result is shown below:

{
"code": 0,
"output": [
{
"records": {
"schema": {
"column_schemas": [
{
"name": "host",
"data_type": "String"
},
{
"name": "ts",
"data_type": "TimestampMillisecond"
},
{
"name": "cpu",
"data_type": "Float64"
},
{
"name": "memory",
"data_type": "Float64"
}
]
},
"rows": [
["127.0.0.1", 1667446797450, 0.1, 0.4],
["127.0.0.1", 1667446798450, 0.5, 0.2],
["127.0.0.2", 1667446798450, 0.2, 0.3]
]
}
}
],
"execution_time_ms": 0
}

For more information about SQL HTTP request, please refer to API document.