Cypress
Cypress is the hierarchical distributed storage system that TractoAI is built on.
Cypress stores various kinds of objects; the most important are:
- Tables
- Files
- Directories
- Notebooks
- Symbolic Links
- Workflows
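Because Cypress is a tree of typed nodes, its contents can be inspected programmatically. Below is a minimal sketch, assuming the YTsaurus Python client (`yt.wrapper.YtClient`), whose `list` method accepts an `attributes` argument. It groups the children of a directory by their `@type` attribute. The helper names are hypothetical. In YTsaurus, directories report the type `map_node` and symbolic links report `link`. The type names of TractoAI-specific objects such as notebooks and workflows are not covered here.

```python
from collections import defaultdict


def group_by_type(entries):
    """Group (name, node_type) pairs into {node_type: [names]}.

    A pure helper, so it can be tested without a cluster.
    """
    grouped = defaultdict(list)
    for name, node_type in entries:
        grouped[node_type].append(name)
    return dict(grouped)


def list_children_with_types(client, path):
    """Yield (name, node_type) for each child of the Cypress directory `path`.

    Assumes `client` is a yt.wrapper.YtClient. Passing attributes=["type"]
    asks the server to attach each child's @type attribute, so no extra
    request per child is needed.
    """
    for child in client.list(path, attributes=["type"]):
        yield str(child), child.attributes["type"]
```

With a real client, `group_by_type(list_children_with_types(client, "//home/my_project"))` returns a dictionary keyed by node type. The path here is hypothetical.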
Tables
Tables are the most common kind of object you will encounter in Cypress. They are used to represent datasets.
Tables scale horizontally and are designed to hide the distributed nature of the storage entirely: you can save a gigabyte-sized dataset as easily as a petabyte-sized one.

Tables are schemaful: you can define a set of columns and their types. Types can be primitive, such as int64, string, or double, or complex, such as list<int64> or tagged<string, 'image/png'>. You can also work with schemaless tables, but they are less convenient for strictly typed data processing engines such as YQL or Spark.
Tables are stored in the internal YTsaurus format, which combines columnar encoding with additional metadata for efficient I/O and querying.
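Below is a minimal sketch of a schema that uses the kinds of types described above. It assumes the YTsaurus Python client and its list-of-columns schema format. Primitive columns use the `type` key, and the 64-bit floating-point type is spelled `double` there. Complex types such as `list<int64>` and `tagged<string, 'image/png'>` are written under the `type_v3` key. Column names and the table path are hypothetical.

```python
def example_schema():
    """Return a YTsaurus-style table schema as a list of column descriptions."""
    return [
        # Primitive types: the `type` key takes a simple type name.
        {"name": "id", "type": "int64"},
        {"name": "caption", "type": "string"},
        {"name": "score", "type": "double"},
        # Complex types are described with `type_v3`.
        # list<int64>:
        {"name": "token_ids", "type_v3": {"type_name": "list", "item": "int64"}},
        # tagged<string, 'image/png'>: a string column carrying a semantic tag.
        {
            "name": "image",
            "type_v3": {"type_name": "tagged", "tag": "image/png", "item": "string"},
        },
    ]


def create_schemaful_table(client, path, schema):
    """Create a table at `path` with the given schema.

    Assumes `client` is a yt.wrapper.YtClient; recursive=True also creates
    missing parent directories.
    """
    client.create("table", path, recursive=True, attributes={"schema": schema})
```

For example, `create_schemaful_table(client, "//tmp/example_table", example_schema())` creates an empty table with these columns. Rows written to it must then match the schema.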
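You can request the columnar layout explicitly when you create a table. The sketch below assumes the YTsaurus table attributes `optimize_for` and `compression_codec`. For `optimize_for`, `"scan"` selects columnar chunks and `"lookup"` selects row-oriented ones. The codec name `zstd_3` is just one possible value and is not a recommendation from this page.

```python
def storage_attributes(columnar=True, compression_codec="zstd_3"):
    """Build table-creation attributes that control the on-disk format.

    optimize_for="scan" selects the columnar chunk format, which suits
    analytical reads of a few columns; "lookup" selects a row-oriented format.
    """
    return {
        "optimize_for": "scan" if columnar else "lookup",
        "compression_codec": compression_codec,
    }
```

These attributes are passed at creation time, for example `client.create("table", "//tmp/t", attributes=storage_attributes())`. The path is hypothetical.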
Related API methods:
- create("table", path)
- read_table(path)
- write_table(path)
Related YTsaurus documentation.
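The table methods listed above combine into a simple round trip. This is a minimal sketch that assumes `client` is a `yt.wrapper.YtClient` and that rows are plain Python dicts. The helper name is hypothetical.

```python
def write_and_read_back(client, path, rows):
    """Write `rows` (a list of dicts) to the table at `path` and read them back.

    Assumes `client` is a yt.wrapper.YtClient.
    """
    # Create the table (and missing parent directories) unless it already exists.
    client.create("table", path, recursive=True, ignore_existing=True)
    # Writing to a plain path replaces the table contents; appending requires
    # an explicit append flag on the path.
    client.write_table(path, rows)
    # read_table returns an iterator over rows; materialize it into a list.
    return list(client.read_table(path))
```

A client can be constructed with `yt.wrapper.YtClient(proxy=...)`, using the address of your cluster. Authentication setup depends on your environment.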
Files
Files can be used to store artifacts of your workflows, such as binaries and model checkpoints.
Files in Cypress are tightly integrated with the system's data processing capabilities. Thanks to the artifact caching subsystem and internal peer-to-peer file distribution, tens of thousands of parallel jobs can use the same artifact without I/O bottlenecks.
Related API methods:
- create("file", path)
- read_file(path)
- write_file(path)
- the file_paths parameter of run_map, run_reduce, run_map_reduce, and other user job specs
Related YTsaurus documentation.
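Below is a minimal sketch of the file workflow described above: upload an artifact, download it, and attach it to a map operation. It assumes `client` is a `yt.wrapper.YtClient` and uses the `file_paths` parameter named in the text. Depending on your client version, the parameter may be exposed under a different name, so check the installed signature. The sketch also assumes that attached files appear in each job's working directory under their base name. All helper names are hypothetical.

```python
import functools
import io
import posixpath


def upload_artifact(client, path, data):
    """Store `data` (bytes) as a Cypress file at `path`."""
    client.create("file", path, recursive=True, ignore_existing=True)
    client.write_file(path, io.BytesIO(data))


def download_artifact(client, path):
    """Return the full contents of the Cypress file at `path` as bytes."""
    return client.read_file(path).read()


@functools.lru_cache(maxsize=None)
def _load_local(name):
    # Runs inside the job: read the delivered artifact once per job process
    # and reuse the cached bytes for every row.
    with open(name, "rb") as f:
        return f.read()


def run_map_with_artifact(client, source, destination, artifact_path):
    """Run a map operation whose jobs can read the artifact at `artifact_path`."""
    # Assumption: files passed via file_paths are placed in the job's
    # working directory under their base name.
    local_name = posixpath.basename(artifact_path)

    def mapper(row):
        # Example use of the artifact: record its size in every output row.
        row["artifact_size"] = len(_load_local(local_name))
        yield row

    client.run_map(mapper, source, destination, file_paths=[artifact_path])
```

The artifact is transferred once per node and cached, so each job opens a local copy instead of reading from Cypress over the network.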
Files vs Tables
Building data processing on top of files is not recommended. TractoAI's distributed processing paradigm is built around tables, while the system treats files as opaque blobs.
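If data has already landed in Cypress as a file, one way to follow this advice is to convert it into a table once and process the table from then on. The sketch assumes the file holds JSON Lines (one JSON object per line) and that `client` is a `yt.wrapper.YtClient`. Both the format and the helper names are illustrative.

```python
import json


def parse_jsonl(data):
    """Parse JSON Lines bytes into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in data.decode("utf-8").splitlines() if line.strip()]


def file_to_table(client, file_path, table_path):
    """Load a JSON Lines file from Cypress and store its records as a table."""
    # Reading the whole file into memory keeps the sketch short; large files
    # should be streamed instead.
    rows = parse_jsonl(client.read_file(file_path).read())
    client.create("table", table_path, recursive=True, ignore_existing=True)
    client.write_table(table_path, rows)
    return len(rows)
```

After conversion, the records can be read, queried, and processed in parallel with the table tooling rather than treated as one opaque blob.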
Comparison table:
| Property | Tables | Files |
|---|---|---|
| API | read_table, write_table, run_query, start_operation | read_file, write_file |
| Scalability | scale up to petabytes | scale up to hundreds of gigabytes |
| Schematization | schemaful | opaque |
| Storage efficiency | columnar codecs + compression + erasure | only compression + erasure |
| Parallel processing | MapReduce, SQL, Spark | manual parallelization via offsets |
| Column-based ACLs | supported | not supported |
| Download/upload formats | JSON, Parquet, CSV, etc. | original blob, no format conversion |
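"Manual parallelization via offsets" means that splitting work over a file is up to you: each worker reads its own byte range. Below is a minimal sketch. It assumes `client.read_file` accepts `offset` and `length` keyword arguments, as in the YTsaurus Python client, and that the total file size is known in advance. Byte ranges ignore record boundaries, which is one reason tables fit parallel processing better.

```python
def byte_ranges(total_size, parts):
    """Split [0, total_size) into at most `parts` contiguous (offset, length) ranges."""
    if total_size < 0 or parts <= 0:
        raise ValueError("total_size must be >= 0 and parts must be > 0")
    base, extra = divmod(total_size, parts)
    ranges = []
    offset = 0
    for i in range(parts):
        # The first `extra` ranges get one additional byte.
        length = base + (1 if i < extra else 0)
        if length == 0:
            # Lengths never increase, so every remaining range would be empty too.
            break
        ranges.append((offset, length))
        offset += length
    return ranges


def read_range(client, path, offset, length):
    """Read `length` bytes of the Cypress file `path` starting at `offset`."""
    return client.read_file(path, offset=offset, length=length).read()
```

For example, `byte_ranges(10, 3)` yields `[(0, 4), (4, 3), (7, 3)]`. Each pair can be handed to a separate worker calling `read_range`.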
Notebooks
Notebooks are a special kind of object that stores the code and results of your experiments in a format compatible with Jupyter.
In TractoAI, Jupyter notebooks can be opened in the UI, and they can be downloaded from or uploaded to the system.