With the system flashed and everything configured, it's time to deploy ollama.

1. Install ollama

On the ollama website, click Download, choose Linux, and you'll be shown an install command:

curl -fsSL https://ollama.com/install.sh | sh

Wait for the script to finish, then check the version.
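
The CLI provides a version flag for this:

ollama --version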

Start the service:

OLLAMA_HOST=192.168.x.x:11434 ollama serve

The port 11434 can be changed as well.
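
For example, to serve on port 8000 instead (an arbitrary choice; any free port works), just change the value in OLLAMA_HOST:

OLLAMA_HOST=192.168.x.x:8000 ollama serve

Every client command below must then use the same host:port.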

2. Download qwen2.5 0.5B

OLLAMA_HOST=192.168.x.x:11434 ollama pull qwen2.5:0.5b

More available models are listed on the ollama site: https://ollama.com/search

For example:

OLLAMA_HOST=192.168.x.x:11434 ollama pull qwen2.5:0.5b-instruct-q8_0

Or the 1.5b, 3b variants, and so on. The 3b model feels a bit sluggish, though (slow as it is, it's still much, much faster than deploying directly via modelscope).
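
To confirm what has been downloaded, the list subcommand works with the same OLLAMA_HOST prefix:

OLLAMA_HOST=192.168.x.x:11434 ollama list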

3. Test

The following script chats with the model through ollama's native /api/chat streaming endpoint:

# -*- coding: utf-8 -*-
import json
import requests

# NOTE: ollama must be running for this to work, start the ollama app or run `ollama serve`
model = "qwen2.5:0.5b"  # TODO: update this for whatever model you wish to use

def chat(messages):
    r = requests.post(
        "http://192.168.x.x:11434/api/chat",
        json={"model": model, "messages": messages,
              "options": {
                  "seed": 101,
                  "top_p": 0.76,
                  "temperature": 0
              },
              "stream": True},
        stream=True
    )
    r.raise_for_status()
    output = ""

    for line in r.iter_lines():
        body = json.loads(line)
        if "error" in body:
            raise Exception(body["error"])
        if body.get("done") is False:
            message = body.get("message", {})
            content = message.get("content", "")
            output += content
            # the response streams one token at a time, print that as we receive it
            print(content, end="", flush=True)

        if body.get("done", False):
            # the final chunk carries no new content; return the assembled reply
            return {"role": "assistant", "content": output}

def main():
    messages1 = [{
        "role": "system",
        "content": "你是人工智能助手纳西妲,请简短的回答我的问题,如果是代码问题,需要详细写出。"
    }]

    while True:
        user_input = input("Input: ")
        if not user_input:
            break  # empty input ends the session
        print()
        messages1.append({"role": "user", "content": user_input})

        # send the conversation and stream back the assistant's reply
        message1 = chat(messages1)
        messages1.append(message1)

        # the reply was already streamed above; uncomment to reprint it
        #print("\nOutput:\n", message1['content'])
        print("\n\n")

if __name__ == "__main__":
    main()
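
Before running the script, a quick sanity check from the shell confirms the endpoint is reachable (this is the same /api/chat route the script uses; the prompt text is arbitrary):

curl http://192.168.x.x:11434/api/chat -d '{"model": "qwen2.5:0.5b", "messages": [{"role": "user", "content": "hello"}]}'

A healthy server streams back one JSON object per line, the last of which has "done": true.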

4. OpenAI-compatible API

ollama also exposes an OpenAI-compatible endpoint under /v1, so the official openai Python client can talk to it directly:

from openai import OpenAI

# model = "qwen2.5:0.5b-instruct-q8_0"
# model = "qwen2.5:0.5b"
model = "qwen2.5:1.5b-instruct-q8_0"

client = OpenAI(
    # defaults to os.environ.get("OPENAI_API_KEY")
    api_key="qwen",  # a key is required by the client but ignored by ollama; any string works
    base_url="http://192.168.1.7:11434/v1",
)
print("client:",client)

# non-streaming response
def qwen_25_api(messages: list):
    """Create a reply for the given conversation messages.

    Args:
        messages (list): the full conversation history
    """
    # completion = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
    completion = client.chat.completions.create(
        model=model,
        messages=messages)
    print(completion.choices[0].message.content)

def qwen_25_api_stream(messages: list):
    """Create a reply for the given conversation messages (streamed).

    Args:
        messages (list): the full conversation history
    """
    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        stream=True,
    )
    for chunk in stream:
        if chunk.choices[0].delta.content is not None:
            print(chunk.choices[0].delta.content, end="")

if __name__ == '__main__':
    # note: the question goes in with role 'system' rather than 'user'
    messages = [{'role': 'system', 'content': 'Who are you?'}]
    # non-streaming call
    # qwen_25_api(messages)
    # streaming call
    qwen_25_api_stream(messages)

Why did it go off the rails? hhhh
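
A likely culprit: the snippet sends the question with role 'system' rather than 'user', and small instruct models tend to handle that badly. A minimal tweak, reusing qwen_25_api_stream from above, usually yields a coherent reply:

messages = [{'role': 'user', 'content': 'Who are you?'}]
qwen_25_api_stream(messages)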

5. Chat web UI (untested)

There are several web front ends for ollama, but I haven't tried any of them.
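
One widely used option is Open WebUI. Its README suggests a Docker one-liner along these lines (untested here; the flags and image tag may have changed, so treat this as a sketch):

docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:main

Once it is up, the UI should be reachable on port 3000 and can be pointed at the ollama endpoint configured above.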

6. System status

While a request is being handled, the CPU is essentially at full load.
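
To see this for yourself, run a process monitor while a request is in flight (top is available on essentially any Linux image):

top

Expect the ollama process near the top of the list, pinned close to full CPU while tokens are being generated.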
