SQL
There are three native SQL engines in TractoAI:
- YQL – YTsaurus's own SQL dialect
- CHYT – ClickHouse-compatible SQL engine
- SPYT SQL – Spark's SQL engine
All of them support reading and writing tables.
Comparison
| Property | YQL | CHYT | SPYT SQL |
|---|---|---|---|
| Based on | MapReduce, in-memory computations | ClickHouse columnar in-memory engine | Spark SQL engine with a YTsaurus data source |
| Computation workers | short-lived MapReduce jobs | long-lived ClickHouse clusters called "cliques" | either long-lived Spark clusters running in jobs, or short-lived Spark executors |
| Efficient up to | petabytes of data | hundreds of gigabytes of data | tens of terabytes of data |
| Typical query times | seconds to days | sub-second to tens of minutes | seconds to hours |
| Fault tolerance | highly resilient; handles worker failures thanks to the reliability of YTsaurus MapReduce | does not handle worker failures; the user must retry the query | handles worker failures, but not a failure of the master node |
| YTsaurus documentation | YQL | CHYT | Spark SQL |
Suggested use cases:
- Exploration of smaller datasets (up to hundreds of gigabytes) – CHYT
- Large-scale data processing, production workloads – YQL
- Integrations with third-party data lakes – SPYT SQL
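
Because CHYT does not recover from worker failures on its own, client code that runs CHYT queries usually needs a retry loop. The following is a minimal sketch of such a loop, not part of any TractoAI or YTsaurus API: `run_with_retries` and its parameters are hypothetical names, and `run_query` stands for whatever zero-argument callable submits your query through the client you use.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    run_query: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call run_query until it succeeds or max_attempts is exhausted.

    Hypothetical helper: run_query is any zero-argument callable that
    submits the query and returns its result.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except Exception:
            # Assumption: every exception is treated as transient. In real
            # code, catch only the error type your client raises when a
            # worker is lost, so that syntax errors are not retried.
            if attempt == max_attempts:
                raise
            # Exponential backoff: base, 2x base, 4x base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Retrying the whole query is reasonable here because CHYT queries are typically short. For long-running jobs, YQL or SPYT SQL are the better fit, since they handle worker failures themselves.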