Optimizing LLM Performance: Framework-Agnostic Techniques for Speed, Scalability, and Cost-Efficient Inference Across PyTorch, ONNX, vLLM, and More Book Discussion