SQL

There are three native SQL engines in TractoAI:

  • YQL – YTsaurus's own SQL dialect
  • CHYT – ClickHouse-compatible SQL engine
  • SPYT SQL – Spark's SQL engine

All of them support reading and writing tables.

Comparison

| Property | YQL | CHYT | SPYT SQL |
| --- | --- | --- | --- |
| Based on | MapReduce, in-memory computations | ClickHouse columnar in-memory engine | Spark SQL engine with YTsaurus data source |
| Computation workers | short-lived MapReduce jobs | long-lived ClickHouse clusters called "cliques" | either long-lived Spark clusters running in jobs, or short-lived Spark executors |
| Efficient up to | petabytes of data | hundreds of gigabytes of data | tens of terabytes of data |
| Typical query times | seconds to days | sub-second to tens of minutes | seconds to hours |
| Fault tolerance | highly resilient; handles worker failures thanks to YTsaurus MapReduce reliability | does not handle worker failures; the user must retry the query | handles worker failures, but not failure of the master node |
| YTsaurus documentation | YQL | CHYT | Spark SQL |
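
Since CHYT leaves retries to the user, a client-side retry wrapper is one way to cope with worker failures. The following is a minimal sketch, not part of any TractoAI or YTsaurus client: `run_with_retries` and its parameters are hypothetical, and `run_query` stands for whatever callable actually submits the query in your setup.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    run_query: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call run_query, retrying with exponential backoff on any exception.

    run_query is a hypothetical zero-argument callable that submits the
    query and returns its result; it is an assumption of this sketch.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts:
                raise
            # Wait base_delay_s, then 2x, 4x, ... before the next attempt.
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

In practice it is probably better to catch only the exception types that indicate a transient failure, so that syntax errors and similar problems are not retried.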

Suggested use cases:

  • Exploration of smaller datasets (up to hundreds of gigabytes) – CHYT
  • Large-scale data processing, production workloads – YQL
  • Integrations with third-party data lakes – SPYT SQL
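
The suggestions above can be read as a simple decision rule. The sketch below encodes it in Python for illustration only: `suggest_engine` is a hypothetical helper, and the 500 GB cutoff is an assumed stand-in for "hundreds of gigabytes", not a documented limit.

```python
GB = 10**9
TB = 10**12

# Assumed cutoff standing in for "up to hundreds of gigabytes".
CHYT_MAX_BYTES = 500 * GB


def suggest_engine(data_size_bytes: int, external_data_lake: bool = False) -> str:
    """Return the engine suggested by the use cases listed above."""
    if external_data_lake:
        # Integrations with third-party data lakes go through Spark.
        return "SPYT SQL"
    if data_size_bytes <= CHYT_MAX_BYTES:
        # Smaller datasets suit interactive exploration in CHYT.
        return "CHYT"
    # Large-scale and production workloads fall back to YQL.
    return "YQL"


if __name__ == "__main__":
    print(suggest_engine(50 * GB))
    print(suggest_engine(5 * TB))
    print(suggest_engine(1 * GB, external_data_lake=True))
```

Real choices will also depend on latency and fault-tolerance requirements from the comparison table, so treat the rule as a starting point rather than a policy.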