AI summary: The article compares the Llama-2-70B and GPT-3.5 language models on cost and latency. It finds Llama-2-70B best suited to prompt-dominated tasks and batch processing jobs, while GPT-3.5 is cheaper and faster for completion-heavy workloads. It also explores how quantization and other techniques could improve the performance of open-source models, and concludes by recommending open-source models for prompt-heavy tasks such as classification or reranking.