当前位置：首页 > news >正文

软件开发过程模型如何做网站性能优化

news 2026/7/27 14:33:36

软件开发过程模型,如何做网站性能优化,大型网站开发公司,网页设计可以进怎样的公司视频中所出现的代码 Tavily SearchRAG 微调Llama3实现在线搜索引擎和RAG检索增强生成功能#xff01;打造自己的perplexity和GPTs#xff01;用PDF实现本地知识库_哔哩哔哩_bilibili 一.准备工作 1.安装环境 conda create --name unsloth_env python3.10 conda activate …视频中所出现的代码 Tavily SearchRAG 微调Llama3实现在线搜索引擎和RAG检索增强生成功能打造自己的perplexity和GPTs用PDF实现本地知识库_哔哩哔哩_bilibili 一.准备工作 1.安装环境 conda create --name unsloth_env python3.10 conda activate unsloth_envconda install pytorch-cuda12.1 pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformerspip install unsloth[colab-new] githttps://github.com/unslothai/unsloth.gitpip install --no-deps trl peft accelerate bitsandbytes 2.微调代码要先登录一下 huggingface-cli login 点击提示的网页获取token注意要选择可写的 #dataset https://huggingface.co/datasets/shibing624/alpaca-zh/viewerfrom unsloth import FastLanguageModel import torchfrom trl import SFTTrainer from transformers import TrainingArgumentsmax_seq_length 2048 # Choose any! We auto support RoPE Scaling internally! dtype None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere load_in_4bit True # Use 4bit quantization to reduce memory usage. Can be False.# 4bit pre quantized models we support for 4x faster downloading no OOMs. fourbit_models [unsloth/mistral-7b-bnb-4bit,unsloth/mistral-7b-instruct-v0.2-bnb-4bit,unsloth/llama-2-7b-bnb-4bit,unsloth/gemma-7b-bnb-4bit,unsloth/gemma-7b-it-bnb-4bit, # Instruct version of Gemma 7bunsloth/gemma-2b-bnb-4bit,unsloth/gemma-2b-it-bnb-4bit, # Instruct version of Gemma 2bunsloth/llama-3-8b-bnb-4bit, # [NEW] 15 Trillion token Llama-3 ] # More models at https://huggingface.co/unslothmodel, tokenizer FastLanguageModel.from_pretrained(model_name unsloth/llama-3-8b-bnb-4bit,max_seq_length max_seq_length,dtype dtype,load_in_4bit load_in_4bit,# token hf_..., # use one if using gated models like meta-llama/Llama-2-7b-hf )model FastLanguageModel.get_peft_model(model,r 16, # Choose any number 0 ! Suggested 8, 16, 32, 64, 128target_modules [q_proj, k_proj, v_proj, o_proj,gate_proj, up_proj, down_proj,],lora_alpha 16,lora_dropout 0, # Supports any, but 0 is optimizedbias none, # Supports any, but none is optimized# [NEW] unsloth uses 30% less VRAM, fits 2x larger batch sizes!use_gradient_checkpointing unsloth, # True or unsloth for very long contextrandom_state 3407,use_rslora False, # We support rank stabilized LoRAloftq_config None, # And LoftQ )alpaca_prompt Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.### Instruction: {}### Input: {}### Response: {}EOS_TOKEN tokenizer.eos_token # Must add EOS_TOKEN def formatting_prompts_func(examples):instructions examples[instruction]inputs examples[input]outputs examples[output]texts []for instruction, input, output in zip(instructions, inputs, outputs):# Must add EOS_TOKEN, otherwise your generation will go on forever!text alpaca_prompt.format(instruction, input, output) EOS_TOKENtexts.append(text)return { text : texts, } passfrom datasets import load_dataset#file_path /home/Ubuntu/alpaca_gpt4_data_zh.json#dataset load_dataset(json, data_files{train: file_path}, splittrain)dataset load_dataset(yahma/alpaca-cleaned, split train)dataset dataset.map(formatting_prompts_func, batched True,)trainer SFTTrainer(model model,tokenizer tokenizer,train_dataset dataset,dataset_text_field text,max_seq_length max_seq_length,dataset_num_proc 2,packing False, # Can make training 5x faster for short sequences.args TrainingArguments(per_device_train_batch_size 2,gradient_accumulation_steps 4,warmup_steps 5,max_steps 60,learning_rate 2e-4,fp16 not torch.cuda.is_bf16_supported(),bf16 torch.cuda.is_bf16_supported(),logging_steps 1,optim adamw_8bit,weight_decay 0.01,lr_scheduler_type linear,seed 3407,output_dir outputs,), )trainer_stats trainer.train()model.save_pretrained_gguf(llama3, tokenizer, quantization_method q4_k_m) model.save_pretrained_gguf(llama3, tokenizer, quantization_method q8_0) model.save_pretrained_gguf(llama3, tokenizer, quantization_method f16)#to hugging face model.push_to_hub_gguf(leo009/llama3, tokenizer, quantization_method q4_k_m) model.push_to_hub_gguf(leo009/llama3, tokenizer, quantization_method q8_0) model.push_to_hub_gguf(leo009/llama3, tokenizer, quantization_method f16)3.我们选择将hugging face上微调好的模型下载下来https://huggingface.co/leo009/llama3/tree/main 4.模型导入ollama 下载ollama 导入ollama FROM ./downloads/mistrallite.Q4_K_M.gguf ollama create example -f Modelfile 二.实现在线搜索 1.获取Tavily AI API Tavily AI export TAVILY_API_KEYtvly-xxxxxxxxxxx 2.安装对应的python库 install tavily-python pip install phidata pip install ollam 3.运行app.py #app.py import warnings# Suppress only the specific NotOpenSSLWarning warnings.filterwarnings(ignore, messageurllib3 v2 only supports OpenSSL 1.1.1)from phi.assistant import Assistant from phi.llm.ollama import OllamaTools from phi.tools.tavily import TavilyTools# 创建一个Assistant实例配置其使用OllamaTools中的llama3模型并整合Tavily工具 assistant Assistant(llmOllamaTools(modelmymodel3), # 使用OllamaTools的llama3模型tools[TavilyTools()],show_tool_callsTrue, # 设置为True以展示工具调用信息 )# 使用助手实例输出请求的响应并以Markdown格式展示结果 assistant.print_response(Search tavily for GPT-5, markdownTrue)三.实现RAG 1.git clone https://github.com/phidatahq/phidata.git 2.phidata----cookbook----llms---ollama---rag里面有示例和教程修改assigant.py中的14行代码将llama3改为自己微调好的模型另外需要注意的是要将自己的模型名称加入到app.py里面的数组里 streamlit run /home/cxh/phidata/cookbook/llms/ollama/rag/assistant.py

查看全文

http://www.w-s-a.com/news/921465/