Introduction
TractoAI is a neocloud platform for BigData and AI workloads. Using TractoAI, you can:
- Process massive volumes of data using SQL, Python, Java, Golang, C++ and more.
- Run distributed training or fine-tuning using PyTorch, TensorFlow or other frameworks.
- Use AI models in batch and real-time data pipelines.
- Collaborate with your team members on the same data and models, using notebooks.
- Build and reuse the same containerized environments across development, notebooks, data processing and AI jobs.
- Define and manage multi-step workflows to orchestrate data processing, model training and inference jobs.
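As a sketch of the data-processing flow above, the example below runs a word-count map over a table with the YTsaurus Python client (`yt.wrapper`), which TractoAI is built on. The table paths and the `TRACTO_PROXY` environment variable are hypothetical placeholders, not part of any official TractoAI setup; you need a real cluster endpoint and a `YT_TOKEN` to actually run the operation.

```python
import os


def count_words(row):
    # Pure mapper: for each input row, emit one record per word.
    for word in row["text"].split():
        yield {"word": word, "count": 1}


if os.environ.get("TRACTO_PROXY"):  # only talk to a cluster when one is configured
    import yt.wrapper as yt  # YTsaurus Python client (pip install ytsaurus-client)

    yt.config["proxy"]["url"] = os.environ["TRACTO_PROXY"]
    # Authentication is picked up from the YT_TOKEN environment variable.
    yt.run_map(count_words, "//home/examples/texts", "//home/examples/word_counts")
```

The mapper itself is a plain Python generator, so it can be unit-tested locally before being shipped to the cluster.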
Keep your infrastructure costs under control with purely workload-based pay-as-you-go pricing, suitable for early-stage startups and enterprises alike.
How to get started
As of May 2025, TractoAI is in a closed beta.
There is a publicly available Playground which allows you to try out the platform. Read more about the Playground in the Playground documentation.
If you are interested in joining the beta, please fill out the contact form, or reach us via Discord.
We are aiming to release a public self-service Console in Q2 2025.
How it works
TractoAI is built on top of YTsaurus, an open-source distributed storage and compute platform. We use YTsaurus under the hood and enrich it with additional features.
TractoAI is a managed service: we take care of maintaining the underlying infrastructure.
At a high level, TractoAI consists of the following components:
- Hierarchical distributed storage for tables and files
- Compute runtime with flexible resource management and scheduling
- Distributed training runtime Tractorun
- Data processing frameworks:
  - YQL engine for large-scale SQL queries
  - MapReduce engine
  - SPYT engine on top of Apache Spark
  - CHYT engine on top of ClickHouse
- Integrated notebooks
- Workflow orchestrator
- Container registry integrated with data processing, training, inference and notebooks
At a low level, we operate TractoAI on Nebius bare-metal infrastructure, which eliminates much of the typical cloud provider pricing overhead. You can expect to pay up to 10x less than with hyperscale cloud providers for the same amount of work done.
Pricing
TractoAI is a purely pay-as-you-go platform: you pay for the compute time you use and the storage quota you hold.
In addition, there is a reservation model for medium-sized teams that want to reserve a certain amount of compute time and storage at a discount. Higher tiers of the reservation model include additional features, such as dedicated support from the TractoAI developer team.
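To make the billing model concrete, here is a toy cost calculation. The rates below are invented purely for illustration and are not TractoAI's actual prices.

```python
# Toy pay-as-you-go bill. The rates are made-up illustration values,
# NOT TractoAI's actual prices.
GPU_HOUR_RATE = 2.00          # USD per GPU-hour (assumed)
STORAGE_GB_MONTH_RATE = 0.02  # USD per GB-month of storage quota (assumed)


def monthly_cost(gpu_hours: float, storage_gb: float) -> float:
    """Workload-based bill: compute time used plus storage quota held."""
    return gpu_hours * GPU_HOUR_RATE + storage_gb * STORAGE_GB_MONTH_RATE
```

The point of the model is that an idle month with no workloads and no stored data costs nothing.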
What TractoAI is not
TractoAI is not a general-purpose cloud provider.
- You don't think in terms of VMs, but in terms of workloads (data processing, training, batch inference, queries), and you are charged for workload time.
- You don't use shared FS volumes or FUSE mounts, but the structured Cypress storage.
- You don't store unstructured data as files in object storage, but semi-structured or structured data in the form of tables.
- You don't process data elsewhere and then copy it into TractoAI or read it from another cloud; you process data directly in TractoAI.
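To illustrate the table-centric model, the sketch below writes structured rows to a hierarchical Cypress table path and reads them back with the YTsaurus Python client. The table path and the `TRACTO_PROXY` environment variable are hypothetical placeholders introduced for this example.

```python
import os


def normalize_event(record):
    # Pure helper: coerce a raw record into the row schema of the table below.
    return {"user_id": int(record["user_id"]), "event": str(record["event"])}


if os.environ.get("TRACTO_PROXY"):  # only talk to a cluster when one is configured
    import yt.wrapper as yt  # YTsaurus Python client (pip install ytsaurus-client)

    yt.config["proxy"]["url"] = os.environ["TRACTO_PROXY"]
    rows = [normalize_event({"user_id": "42", "event": "click"})]
    # Tables live at hierarchical Cypress paths, not in object-store buckets.
    yt.write_table("//home/examples/events", rows)
    for row in yt.read_table("//home/examples/events"):
        print(row)
```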
TractoAI is not a cloud application that you can deploy to a cloud provider.
- It does not run on top of S3 or other object storage, but it can import data from there.
- It is not installed in your cloud account; it is a managed service.
TractoAI is not a real-time inference platform.
- TractoAI does not host models or provide an OpenAI-like API, but you can use any ML framework (e.g. vLLM) to run inference code yourself.
- TractoAI does not charge per token for inference; instead, you pay for compute time.