
Together AI
AI Acceleration Cloud designed for rapid inference, fine-tuning, and training of generative AI models.
About Together AI
Together AI provides a comprehensive AI Acceleration Cloud platform that supports the full lifecycle of generative AI development. It offers fast inference, flexible fine-tuning, and scalable training with user-friendly APIs and robust infrastructure. Users can run and customize open-source models, deploy large-scale AI solutions on GPU clusters, and optimize performance and costs. The platform supports over 200 models across modalities such as chat, images, code, and more, all served through OpenAI-compatible APIs for seamless integration.
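Because the APIs are OpenAI-compatible, existing OpenAI client code can typically be pointed at Together AI by changing only the base URL. A minimal sketch (the model ID below is illustrative; any model from the library can be substituted):

```python
# Minimal chat-completion sketch using the OpenAI Python SDK against
# Together AI's OpenAI-compatible endpoint. The model ID is illustrative.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",  # Together's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",  # illustrative model ID
    messages=[{"role": "user", "content": "Summarize what an AI Acceleration Cloud is."}],
)
print(response.choices[0].message.content)
```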
How to Use
Users can access Together AI via simple APIs for serverless inference or deploy models on dedicated hardware endpoints. Fine-tuning is managed through command-line tools or API-level control of hyperparameters, and GPU clusters can be requested for intensive training tasks. Endpoints and services are managed through a web UI, API, and CLI, and built-in code execution environments support AI development and testing workflows.
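A hedged sketch of the fine-tuning flow with the `together` Python SDK follows; the model ID and hyperparameter values are illustrative, and exact parameter names should be checked against the current SDK documentation:

```python
# Hedged fine-tuning sketch using the `together` Python SDK.
# Model ID and hyperparameter names/values are illustrative only.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

# 1. Upload a JSONL training file.
train_file = client.files.upload(file="training_data.jsonl")

# 2. Launch a LoRA fine-tuning job with explicit hyperparameters.
job = client.fine_tuning.create(
    training_file=train_file.id,
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",  # illustrative model ID
    n_epochs=3,
    learning_rate=1e-5,
    lora=True,  # set False for full fine-tuning
)
print(job.id)  # poll this ID to track training progress
```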
Features
- NVIDIA GPU clusters available instantly or on reservation, including H100, A100, GB200, and B200
- Advanced management tools with Slurm and Kubernetes support
- Together Chat app for open-source AI interaction
- Code Interpreter for executing AI-generated code
- Flexible fine-tuning options, including LoRA and full fine-tuning
- APIs compatible with OpenAI standards
- Code Sandbox environment for AI development and experimentation
- Extensive library of over 200 generative AI models spanning chat, images, code, and more (see the image-generation sketch after this list)
- High-speed interconnects such as InfiniBand and NVLink
- Optimized software stack featuring FlashAttention-3 and custom CUDA kernels
- Dedicated endpoints for deploying models on custom hardware
- Serverless inference API supporting open-source models
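As a sketch of the multimodal side of the model library, the `together` SDK also exposes image generation; the model ID and response fields below are assumptions to verify against the current docs:

```python
# Hedged image-generation sketch with the `together` Python SDK;
# the model ID and response fields are assumptions to verify.
import base64
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

response = client.images.generate(
    prompt="A data center at sunrise, watercolor style",
    model="black-forest-labs/FLUX.1-schnell",  # illustrative image model
    steps=4,
    n=1,
)

# The response carries base64-encoded image data; decode and save it.
with open("out.png", "wb") as f:
    f.write(base64.b64decode(response.data[0].b64_json))
```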
Use Cases
- Reducing latency and costs for model builders like Arcee AI
- Performing multi-document analysis and personalized data processing
- Enabling production-grade AI applications for businesses
- Automating classification and data extraction tasks
- Executing visual recognition, reasoning, and video understanding
- Powering cybersecurity solutions from companies such as Nexusflow
- Training custom generative AI models from scratch
- Creating scalable AI customer support chatbots for platforms like Zomato
- Generating and debugging code with advanced language models
- Managing complex tool integrations and API-driven workflows
- Developing next-generation text-to-video models such as Pika's
- Accelerating enterprise AI projects for companies like Salesforce and Zoom
Pros
- Provides scalable infrastructure with NVIDIA GPUs for demanding AI workloads
- Full ownership of your models, with no vendor lock-in
- Meets SOC 2 and HIPAA compliance standards for secure enterprise deployment
- Supports a diverse library of over 200 open-source and specialized models
- Easy-to-use, OpenAI-compatible APIs streamline integration
- Fast inference, fine-tuning, and training capabilities empower AI development
- Incorporates cutting-edge optimizations like FlashAttention-3 and custom kernels
- Offers competitive pricing aimed at reducing overall AI deployment costs
- Batch inference available with an introductory discount
- High reliability with a 99.9% uptime SLA for GPU clusters
Cons
- Advanced features like hyperparameter tuning and custom deployment require technical expertise
- Pricing for high-end GPUs such as GB200 and B200 and for large-scale setups is available only on request, so costs are not immediately transparent
Pricing Plans
Choose the perfect plan. All plans include 24/7 support.
Serverless Inference
Pricing is per token, billed per 1 million input and output tokens, with separate rates for images and multimodal inputs. Batch inference carries a 50% introductory discount. Model prices range from $0.06 to $7.00 per million tokens.
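Since billing is per million tokens, estimating serverless costs is simple arithmetic. A sketch with illustrative rates (substitute the actual prices for your chosen model):

```python
# Back-of-the-envelope serverless inference cost estimate.
# Rates are illustrative placeholders within the quoted $0.06-$7.00 range.
INPUT_PRICE_PER_M = 0.60   # USD per 1M input tokens (illustrative)
OUTPUT_PRICE_PER_M = 0.60  # USD per 1M output tokens (illustrative)

def inference_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the rates above."""
    return (input_tokens / 1e6) * INPUT_PRICE_PER_M \
         + (output_tokens / 1e6) * OUTPUT_PRICE_PER_M

# Example: 2,000 input and 500 output tokens per request, 100,000 requests.
per_request = inference_cost(2_000, 500)
print(f"${per_request:.4f} per request, ${per_request * 100_000:.2f} per 100k requests")
```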
Dedicated Endpoints
Deploy models on custom GPU endpoints with per-minute billing. Available NVIDIA GPUs include RTX-6000, L40, A100, H100, and H200, with prices from $1.49/hour (about $0.025/minute) for RTX-6000 and L40 up to $4.99/hour (about $0.083/minute) for H200.
Fine-tuning
Pricing varies with model size, dataset size, and number of epochs. Supervised fine-tuning with LoRA costs between $0.48 and $2.90 per million tokens; full fine-tuning ranges from $0.54 to $3.20. DPO and other training methods have their own per-token rates.
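Assuming the per-token rate applies to every pass over the dataset, total fine-tuning cost is dataset tokens times epochs times the per-million-token rate. A sketch with an illustrative LoRA rate:

```python
# Fine-tuning cost sketch, assuming billing per training token where
# total tokens processed = dataset tokens x epochs. The rate is an
# illustrative value within the quoted LoRA range of $0.48-$2.90 per 1M.
LORA_RATE_PER_M = 1.00  # USD per 1M training tokens (illustrative)

def finetune_cost(dataset_tokens: int, n_epochs: int,
                  rate_per_m: float = LORA_RATE_PER_M) -> float:
    """Return the USD cost of a supervised fine-tuning run."""
    return (dataset_tokens * n_epochs / 1e6) * rate_per_m

# Example: a 50M-token dataset trained for 3 epochs -> 150M billed tokens.
print(f"${finetune_cost(50_000_000, 3):.2f}")  # $150.00 at the rate above
```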
Together GPU Clusters
High-performance clusters equipped with NVIDIA Blackwell and Hopper GPUs, including H200, H100, and A100, optimized for AI training and inference. H200 clusters start at $2.09/hr, H100 at $1.75/hr, and A100 at $1.30/hr. Contact us for pricing on GB200 and B200.
Code Execution
Code Sandbox is billed per vCPU ($0.0446/hour) and per GiB of RAM ($0.0149/hour). Code Interpreter sessions cost $0.03 per 60 minutes of execution.
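Code Sandbox costs scale linearly with vCPUs, memory, and wall-clock hours, so a monthly estimate follows directly from the rates above:

```python
# Code Sandbox cost estimate built from the per-resource rates quoted above.
VCPU_RATE = 0.0446  # USD per vCPU per hour
RAM_RATE = 0.0149   # USD per GiB of RAM per hour

def sandbox_cost(vcpus: int, ram_gib: int, hours: float) -> float:
    """Return the USD cost of running a sandbox of the given size."""
    return (vcpus * VCPU_RATE + ram_gib * RAM_RATE) * hours

# Example: a 2 vCPU / 4 GiB sandbox running 8 hours a day for 22 days.
print(f"${sandbox_cost(2, 4, 8 * 22):.2f} per month")  # ~$26.19
```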