Performance testing of Chinese open-source LLMs under vLLM
First, a summary of the differing results across several GPUs.
entrypoint: ["python3.9", "-m", "fastchat.serve.vllm_worker", "--model-names=Baichuan-13B-Chat", "--model-path=/files/huggingface/Baichuan-13B-Chat", "--worker-address=http://fastchat-worker-baichuan:21003", "--controller-address=http://fastchat-controller:21001", "--host=0.0.0.0", "--port=21003", "--trust-remote-code"]
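On pre-Ampere GPUs (compute capability below 8.0), vLLM cannot load the model in bfloat16, which Baichuan-13B-Chat's config requests by default, and the worker fails with the errors shown below. One likely workaround is to force float16 via vLLM's `--dtype` engine flag, which FastChat's `vllm_worker` passes through to the engine. A sketch of the adjusted entrypoint, assuming everything else stays the same:

```yaml
entrypoint: ["python3.9", "-m", "fastchat.serve.vllm_worker", "--model-names=Baichuan-13B-Chat", "--model-path=/files/huggingface/Baichuan-13B-Chat", "--worker-address=http://fastchat-worker-baichuan:21003", "--controller-address=http://fastchat-controller:21001", "--host=0.0.0.0", "--port=21003", "--trust-remote-code", "--dtype=half"]
```

Note that while `--dtype=half` gets the worker running on V100 and P40, the P40 has no fast fp16 path, so throughput there will still be poor.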
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla V100-SXM2-32GB GPU has compute capability 7.0.
ValueError: Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla P40 GPU has compute capability 6.1.
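The two failures above come from vLLM's compute-capability check: bfloat16 requires compute capability >= 8.0 (Ampere or newer), while the Tesla V100 is 7.0 and the Tesla P40 is 6.1. The fallback logic can be sketched as a small helper (`pick_dtype` is a hypothetical name, not part of vLLM's API):

```python
def pick_dtype(major: int, minor: int) -> str:
    """Return a dtype string usable with vLLM's --dtype flag.

    Bfloat16 needs compute capability >= 8.0 (e.g. A100, A10);
    older cards such as the V100 (7.0) or P40 (6.1) must fall
    back to float16.
    """
    return "bfloat16" if (major, minor) >= (8, 0) else "float16"


# In practice the capability tuple would come from
# torch.cuda.get_device_capability(); it is hard-coded here so
# the sketch runs without a GPU.
print(pick_dtype(7, 0))  # Tesla V100 -> float16
print(pick_dtype(6, 1))  # Tesla P40  -> float16
print(pick_dtype(8, 0))  # A100       -> bfloat16
```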