Time Series Database
Time series database with SQL
Hyper-fast. Results in milliseconds.
/* Based on device data, this query returns the average
* of the battery level for every hour for each device_id
*/
WITH avg_metrics AS (
SELECT device_id,
DATE_BIN('1 hour'::INTERVAL, time, 0) AS period,
AVG(battery_level) AS avg_battery_level
FROM devices.readings
GROUP BY 1, 2
ORDER BY 1, 2
)
SELECT period,
t.device_id,
manufacturer,
avg_battery_level
FROM avg_metrics t, devices.info i
WHERE t.device_id = i.device_id
AND model = 'mustang'
LIMIT 10;
+---------------+------------+--------------+-------------------+
| period | device_id | manufacturer | avg_battery_level |
+---------------+------------+--------------+-------------------+
| 1480802400000 | demo000001 | iobeam | 49.25757575757576 |
| 1480806000000 | demo000001 | iobeam | 47.375 |
| 1480802400000 | demo000007 | iobeam | 25.53030303030303 |
| 1480806000000 | demo000007 | iobeam | 58.5 |
| 1480802400000 | demo000010 | iobeam | 34.90909090909091 |
| 1480806000000 | demo000010 | iobeam | 32.4 |
| 1480802400000 | demo000016 | iobeam | 36.06060606060606 |
| 1480806000000 | demo000016 | iobeam | 35.45 |
| 1480802400000 | demo000025 | iobeam | 12 |
| 1480806000000 | demo000025 | iobeam | 16.475 |
+---------------+------------+--------------+-------------------+
/* To identify gaps on the readings, the following queries generates a series
* and by joining it with the original data, you can spot any gap */
with avg_battery AS (
SELECT battery_level, time
FROM devices.readings
WHERE device_id = 'demo000007'
AND time > 1480118400000
AND time < 1480301200000
ORDER BY 2
),
all_hours AS (
SELECT generate_series(1480118430000,1480301200000,'30 second'::interval) AS generated_hours
)
SELECT time, generated_hours, battery_level
FROM all_hours
LEFT JOIN avg_battery ON generated_hours = time
ORDER BY 2
LIMIT 20;
+---------------+---------------+---------------+
| time | hours | battery_level |
+---------------+---------------+---------------+
| 1480118430000 | 1480118430000 | 67 |
| 1480118460000 | 1480118460000 | 66 |
| 1480118490000 | 1480118490000 | 66 |
| 1480118520000 | 1480118520000 | 66 |
| 1480118550000 | 1480118550000 | 66 |
| 1480118580000 | 1480118580000 | 66 |
| 1480118610000 | 1480118610000 | 65 |
| 1480118640000 | 1480118640000 | NULL |
| 1480118670000 | 1480118670000 | 65 |
| 1480118700000 | 1480118700000 | 65 |
| 1480118730000 | 1480118730000 | 65 |
| 1480118760000 | 1480118760000 | 65 |
| 1480118790000 | 1480118790000 | 65 |
| 1480118820000 | 1480118820000 | 65 |
| 1480118850000 | 1480118850000 | 65 |
| 1480118880000 | 1480118880000 | 65 |
| 1480118910000 | 1480118910000 | 65 |
| 1480118940000 | 1480118940000 | 65 |
| 1480118970000 | 1480118970000 | NULL |
| 1480119000000 | 1480119000000 | NULL |
+---------------+---------------+---------------+
/* Based on device data, this query returns the number of battery charges
* per day for a given device_id */
WITH aux_charging AS (
SELECT time,
DATE_BIN('P1D'::INTERVAL,time,0) AS day,
battery_status,
LAG(battery_status) OVER (PARTITION BY device_id ORDER BY time) AS prev_battery_status
FROM devices.readings
WHERE device_id = 'demo000001'
ORDER BY time
),
count_start_charging AS (
SELECT day, (case when battery_status <> prev_battery_status then 1 else 0 end) AS start_charging
FROM aux_charging
ORDER BY 1
)
SELECT day, sum(start_charging) as charges_number
FROM count_start_charging
GROUP BY 1
ORDER BY 1;
+---------------+---------------+
| count_charges | day |
+---------------+---------------+
| 2 | 1479168000000 |
| 4 | 1479254400000 |
| 2 | 1479340800000 |
| 10 | 1479427200000 |
| 7 | 1479600000000 |
| 8 | 1479686400000 |
| 6 | 1479772800000 |
| 11 | 1479859200000 |
| 5 | 1480032000000 |
| 7 | 1480118400000 |
| 6 | 1480204800000 |
| 10 | 1480291200000 |
| 3 | 1480464000000 |
| 3 | 1480550400000 |
| 7 | 1480636800000 |
| 2 | 1480723200000 |
+---------------+---------------+
/* Based on device data, this query returns the average of the battery temperature
* for each OS version */
SELECT device_info['os_name'], avg(battery_temperature)
FROM "devices"."readings"
GROUP BY 1
LIMIT 100;
+---------+--------------------------+
| os_name | avg(battery_temperature) |
+---------+--------------------------+
| 4.4.4 | 90.85937893039049 |
| 5.1.0 | 90.86754559738132 |
| 6.0.1 | 90.84230101265824 |
| 5.0.0 | 90.8574802739726 |
+---------+--------------------------+
Webinar
How to Get the Most Out of Your Time Series Data
Columnar storage
High cardinality
CrateDB offers robust support for time partitioning. It enables data to be stored long-term without any need for aggregation or down-sampling. This is crucial because the original, un-aggregated data often contains granular details that may be lost during aggregation. By preserving this level of detail, you get enhanced flexibility to revisit historical data for new insights, or conduct precise forecasting, which is crucial to strategic decision-making processes.
With CrateDB, you can also move your old partitions to slow but cheap spinning disks, while keeping the most recent data on fast SSDs, all while retaining fast query speed for most recent data, and not loosing any details in older data.
Time series functionality
CrateDB includes built-in time-series functionality, such as window functions and time-based indexes. These features make it easier to query and analyze the data, and can improve query performance.
LEAD
andLAG
functions, withIGNORE NULLS
option, to fill and extrapolate missing data.DATE_BIN
function to resample the data and use the same intervals on the time axis.WINDOW
function.JOIN
operator to easily combine time series data in one table and corresponding metadata in another table. This avoids pushing too much data in your application.
Much more than a time series database
CrateDB offers a much broader scope; it is built for any type of data and you can combine them all into the same database: structured, unstructured, time-series, geospatial, BLOB and vector data. This way, you can easily cover most of your needs, without investing in some new costly technology that needs complex maintenance and data synchronization.
SQL interface
Integrations
CrateDB integrates seamlessly with popular modern data visualization tools like Grafana, or libraries like Matplotlib. It also offers compatibility with the Java- and Python-based data ecosystem and corresponding libraries and frameworks such as pandas, Dask, or Spark, to facilitate efficient analysis and visualization of time series data.
Distributed architecture
CrateDB is a distributed database that can scale horizontally across multiple nodes. This makes it an ideal fit for time series workloads, which often involve handling large volumes of data from multiple different sources (sensors, IoT gateways, CRM, ERP...) that need to be ingested, enriched and processed on the fly to serve many simultaneous data consumers in real-time.
Open source
Time Series Video Tutorials
Top 8 Most Asked Questions
Discover some of the common questions around time-series databases, including their advantages over traditional databases, best practices for managing them, and the industries that can benefit the most from their adoption. This white paper contains 8 essential things you need to know about time-series databases.
Guide for Time Series Data Projects
This comprehensive guide covers everything you need to know to get started with a time series data project.
Interested?
In this tutorial, we analyze a dataset that captures weather information from CrateDB offices across the globe.
This collection of tutorials will show you how to generate mock data about the International Space Station (ISS) and write it to CrateDB using the client of your choice.
This tutorial demonstrates the shortcomings of visualizing the non-normalized data and shows you how to address them by normalizing your data using SQL.
Explore the list of all date and time functions you can use when querying CrateDB data.
In this tutorial, we'll explore how you can query time stamped data with CrateDB and what you can do with your Free tier cluster.
Confused about all these terms? We explain the fundamentals, introducing CrateDB Cloud.
The goal of this guide is to support you with building the best sharding and partitioning strategy.
Performance is crucial. Ingest data should be available instantly for querying. As data ages and accumulates, storage cost becomes more important.
Sensor outage and network issues often result in missing data. This article discusses two common interpolation techniques: LOCF and NOCB.
This article explains how to use DATE_BIN to group rows into time buckets and resample the values.
Want to know more?
Efficient time series data analysis is crucial for business success. It can help understand patterns, trends, and causes as data volume grows. CrateDB is a time series database (TSDB), built to collect, ingest and manage massive amounts of data from diverse sources. It leverages SQL, which makes it is easy to learn, easy to integrate, and does not lock you into proprietary data access interfaces, as opposed to most other solutions.
Use cases: successful companies adopting a time series database
Additional resources on time series databases
If you are monitoring or analyzing sensor data, IT system metrics, GPS locations, logs, this webinar will explain how to build a scalable, flexible time series database to manage them.
Choosing the best storage depends on several factors, including data volume, speed, query requirements, and scalability needs.
Given the wide range of options available, making an informed decision can become difficult. So where should you start and what are the most important aspects to consider?
While several proprietary solutions are available on the market, an open-source TSDB offers several advantages that make it an attractive choice for businesses of all sizes.
In this blog post, we explore the considerations when choosing the right database for managing and analyzing time stamped data.
A cloud-based TSDB gives both the capability of analyzing the data and the freedom and flexibility to do so from anywhere, empowering businesses to make data-driven decisions.
101 for Time-Series databases
- What is a time series database?
- Key criteria for selecting a time series database.