
Together AI
AI Acceleration Cloud designed for rapid inference, fine-tuning, and training of generative AI models.
About Together AI
Together AI provides a comprehensive AI Acceleration Cloud platform that supports the full lifecycle of generative AI development: fast inference, flexible fine-tuning, and scalable training, all through user-friendly APIs on robust infrastructure. Users can run and customize open-source models, deploy large-scale AI solutions on GPU clusters, and optimize for both performance and cost. The platform supports over 200 models across chat, images, code, and other modalities, all accessible through OpenAI-compatible APIs for seamless integration.
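Because the endpoints are OpenAI-compatible, a standard chat-completions request works against Together's API. The sketch below builds (but does not send) such a request using only the Python standard library; the base URL is taken from Together's public documentation, and the model id is an illustrative assumption, not a recommendation.

```python
# Sketch: constructing an OpenAI-compatible chat-completions request
# for Together AI with the Python standard library only.
# BASE_URL and the model id are assumptions; substitute your own.
import json
import os
import urllib.request

BASE_URL = "https://api.together.xyz/v1"  # assumed OpenAI-compatible base URL

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completion POST request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    model="meta-llama/Llama-3-8b-chat-hf",  # hypothetical model id
    prompt="Hello!",
    api_key=os.environ.get("TOGETHER_API_KEY", "sk-demo"),
)
print(req.full_url)  # https://api.together.xyz/v1/chat/completions
# Actually sending it would be: urllib.request.urlopen(req)
```

In practice most users would point the official OpenAI SDK at the same base URL rather than hand-building requests; the point here is that no Together-specific client is required.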
How to Use
Users can access Together AI via simple APIs for serverless inference or deploy models on dedicated hardware endpoints. Fine-tuning is straightforward through command-line tools or the API, which exposes control over hyperparameters. GPU clusters can be requested for intensive training tasks. The platform includes a web UI, API, and CLI for managing endpoints and services, and code execution environments support AI development and testing workflows.
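A fine-tuning job submitted through the API or CLI is essentially a small specification of model, data, and hyperparameters. The field names below (training_file, n_epochs, learning_rate, lora) are illustrative assumptions, not Together's exact schema; consult the official API reference for the real parameter names.

```python
# Sketch of a fine-tuning job specification as it might be passed to a
# fine-tuning API or CLI. All field names here are illustrative
# assumptions, not the exact Together API schema.
fine_tune_job = {
    "model": "meta-llama/Llama-3-8b-chat-hf",  # assumed base model id
    "training_file": "file-abc123",            # id of an uploaded dataset
    "n_epochs": 3,                             # passes over the dataset
    "learning_rate": 1e-5,
    "lora": True,                              # LoRA vs. full fine-tuning
}
print(fine_tune_job["n_epochs"])  # 3
```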
Pricing Plans
Choose the perfect plan for your needs. All plans include 24/7 support and regular updates.
Serverless Inference
Pricing is per token, with separate rates per 1 million tokens for input and output, plus rates for images and other multimodal inputs. Batch inference currently carries a 50% introductory discount. Model prices range from $0.06 to $7.00 per million tokens.
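The per-million-token billing above reduces to simple arithmetic. This sketch estimates a serverless bill; the $0.60 rates are example values inside the quoted $0.06–$7.00 range, not prices for any specific model.

```python
# Back-of-the-envelope serverless inference cost from per-1M-token rates.
# The rates used in the example call are illustrative, not real prices.
def inference_cost(input_tokens: int, output_tokens: int,
                   input_rate: float, output_rate: float) -> float:
    """Cost in USD, given separate per-1M-token rates for input and output."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# e.g. 2M input + 0.5M output tokens at $0.60 / $0.60 per 1M tokens:
print(round(inference_cost(2_000_000, 500_000, 0.60, 0.60), 2))  # 1.5
```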
Dedicated Endpoints
Deploy models on custom GPU endpoints with per-minute billing. Available NVIDIA GPUs include RTX-6000, L40, A100, H100, and H200, with prices starting at $0.025/minute ($1.49/hour) for RTX-6000 and L40, up to $0.083/minute ($4.99/hour) for H200.
Fine-tuning
Pricing varies with model size, dataset size, and number of epochs. Supervised fine-tuning with LoRA costs between $0.48 and $2.90 per million tokens; full fine-tuning ranges from $0.54 to $3.20. DPO and other training methods have their own rates.
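Since fine-tuning is billed per token and the dataset is processed once per epoch, billed tokens are roughly dataset tokens times epochs. The $1.00 rate below is an illustrative mid-range value within the quoted LoRA band, not a published price.

```python
# Rough fine-tuning cost estimate: billed tokens ~= dataset tokens * epochs.
# The rate passed in the example is illustrative, not a published price.
def fine_tune_cost(dataset_tokens: int, epochs: int, rate_per_million: float) -> float:
    """Cost in USD at a given per-1M-token fine-tuning rate."""
    return dataset_tokens * epochs * rate_per_million / 1_000_000

# 10M-token dataset, 3 epochs, $1.00 per 1M tokens:
print(fine_tune_cost(10_000_000, 3, 1.00))  # 30.0
```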
Together GPU Clusters
High-performance clusters equipped with NVIDIA Blackwell and Hopper GPUs, including H200, H100, and A100, optimized for AI training and inference. H200 clusters start at $2.09/hr, H100 at $1.75/hr, and A100 at $1.30/hr. Contact us for pricing on GB200 and B200.
Code Execution
Code Sandbox is billed per vCPU ($0.0446/hour) and per GiB RAM ($0.0149/hour). Code Interpreter sessions cost $0.03 for 60 minutes of execution.
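The Code Sandbox bill follows directly from the two metered rates quoted above ($0.0446 per vCPU-hour, $0.0149 per GiB-hour of RAM); the sandbox size in the example is an arbitrary illustration.

```python
# Estimated Code Sandbox bill from the quoted per-vCPU and per-GiB-RAM
# hourly rates. The 2-vCPU / 4-GiB / 10-hour example is arbitrary.
VCPU_RATE = 0.0446   # USD per vCPU-hour (quoted rate)
RAM_RATE = 0.0149    # USD per GiB-hour (quoted rate)

def sandbox_cost(vcpus: int, ram_gib: int, hours: float) -> float:
    """Cost in USD for a sandbox of the given size running for `hours`."""
    return hours * (vcpus * VCPU_RATE + ram_gib * RAM_RATE)

# 2 vCPUs with 4 GiB RAM for 10 hours:
print(round(sandbox_cost(2, 4, 10), 3))  # 1.488
```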
