当前位置：首页 > news >正文

服装网站建设需求分析报告重庆百度竞价开户

news 2026/4/9 15:05:42

服装网站建设需求分析报告,重庆百度竞价开户,如何防止网站被盗,做蜂蜜上什么网站目录一、引言二、模型简介 2.1 GLM4-9B 模型概述 2.2 GLM4-9B 模型架构三、模型推理 3.1 GLM4-9B-Chat 语言模型 3.1.1 model.generate 3.1.2 model.chat 3.2 GLM-4V-9B 多模态模型 3.2.1 多模态模型概述 3.2.2 多模态模型实践四、总结一、引言… 目录一、引言二、模型简介 2.1 GLM4-9B 模型概述 2.2 GLM4-9B 模型架构三、模型推理 3.1 GLM4-9B-Chat 语言模型 3.1.1 model.generate 3.1.2 model.chat 3.2 GLM-4V-9B 多模态模型 3.2.1 多模态模型概述 3.2.2 多模态模型实践四、总结一、引言周一6.3写完【机器学习】Qwen1.5-14B-Chat大模型训练与推理实战周二6.4首次拿下CSDN热榜第一名周三6.5清华智谱宣布开源GLM-4-9B今天周四6.6马不停蹄开始部署实验码字。自ZHIPU AI于2023年3月14日发布ChatGLM-6B截止目前该系列已经发布了4代ChatGLM-6B、ChatGLM2-6B、ChatGLM3-6B以及最新发布的GLM-4-9B。二、模型简介 2.1 GLM4-9B 模型概述 GLM4-9B相较于上一代ChatGLM3-6B主要有以下几点变更预训练数据量提升3倍在预训练方面引入了大语言模型进入数据筛选流程最终获得了 10T 高质量多语言数据。训练效率提高了 3.5 倍采用了 FP8 技术进行高效的预训练相较于第三代模型训练效率提高了 3.5 倍。模型规模提升至 9B在有限显存的情况下探索了性能的极限并发现 6B 模型性能有限。因此在考虑到大多数用户的显存大小后将模型规模提升至 9B并将预训练计算量增加了 5 倍。综合以上技术升级和其他经验GLM-4-9B 模型具备了更强大的推理性能、更长的上下文处理能力、多语言、多模态和 All Tools 等突出能力。GLM-4-9B 系列模型包括基础版本 GLM-4-9B8K基础版本。对话版本 GLM-4-9B-Chat128K人类偏好对齐的版本。除了能进行多轮对话还具备网页浏览、代码执行、自定义工具调用Function Call和长文本推理支持最大 128K 上下文等高级功能。超长上下文版本 GLM-4-9B-Chat-1M1M支持 1M 上下文长度约 200 万中文字符。多模态版本 GLM-4V-9B-Chat8K 具备 1120 * 1120 高分辨率下的中英双语多轮对话能力。官方能力缩影图如下 2.2 GLM4-9B 模型架构 GLM模型从发布之初最主要的特点是将encoder-decoder相结合自编码随机 MASK 输入中连续跨度的 token自回归基于自回归空白填充的方法重新构建跨度中的内容具体模型这里看一下“原地漫游”大佬在ChatGLM2-6B模型推理流程和模型架构详解中做的GLM架构图架构中包含输入层、Embedding层、GLMBlock*28层、RMS层、输出层以及Residual网络和Rope。其中最核心的在于GLMBlock*28 输入层 Tokenizer将输入的文本序列转换为字或词标记的序列Input_ids将Tokenizer生成的词标记ID化。Embedding层将每个ID映射到一个固定维度的向量生成一个向量序列作为模型的初始输入表示GLMBlock*28重复28次类似qwen1.5中将layer堆叠包含2个大部分 Self-Attention先将输入进行Q、K、V矩阵映射引入RoPE位置网络后再进行attention注意力计算最后线性变换为输入同样的维度。输出后引入残差网络、Dropout、RMSNorm等方法方式过拟合。Feed-Forward Network (MLP)经过两层全连接变换最多扩至13696维度GLM4ChatGLM3均为13696ChatGLM2是27392提升表征能力。激活函数使用Swiglu代替Relu。与self-attention的输出后一样同样引入Dropout、RMSNorm方法。RMSNorm层标准化这里使用RMSNorm均方根标准化代替LayerNorm层标准化具有加速训练和改善模型的泛化能力的效果在实际的推荐系统工作中经常用到BatchNorm批量标准化在神经元激活函数前加上一个BN层使得每个批次的神经元输出遵循标准正态分布解决深度传播过程中随数据分布产生的协变量偏移问题。输出层将将embedding转换会字词编码之后decode为我们看到的文字。Residual Connection残差连接网络在深度学习中经常用到的技巧在神经网络的层与层之间添加一个直接的连接允许输入信号无损地传递到较深的层。这样设计的目的是为了缓解梯度消失和梯度爆炸问题同时促进梯度在深层网络中的流畅传播使得训练更高效模型更容易学习复杂的特征Rotary Position EmbeddingRoPE旋转位置编码Qwen、LLaMA也在用可以更好的学习词之间的位置信息。附GLMBlock官方源码 class GLMBlock(torch.nn.Module):A single transformer layer.Transformer layer takes input with size [s, b, h] and returns anoutput of the same size.def __init__(self, config: ChatGLMConfig, layer_number, deviceNone):super(GLMBlock, self).__init__()self.layer_number layer_numberself.apply_residual_connection_post_layernorm config.apply_residual_connection_post_layernormself.fp32_residual_connection config.fp32_residual_connectionLayerNormFunc RMSNorm if config.rmsnorm else LayerNorm# Layernorm on the input data.self.input_layernorm LayerNormFunc(config.hidden_size, epsconfig.layernorm_epsilon, devicedevice,dtypeconfig.torch_dtype)# Self attention.self.self_attention SelfAttention(config, layer_number, devicedevice)self.hidden_dropout config.hidden_dropout# Layernorm on the attention outputself.post_attention_layernorm LayerNormFunc(config.hidden_size, epsconfig.layernorm_epsilon, devicedevice,dtypeconfig.torch_dtype)# MLPself.mlp MLP(config, devicedevice)def forward(self, hidden_states, attention_mask, rotary_pos_emb, kv_cacheNone, use_cacheTrue,):# hidden_states: [s, b, h]# Layer norm at the beginning of the transformer layer.layernorm_output self.input_layernorm(hidden_states)# Self attention.attention_output, kv_cache self.self_attention(layernorm_output,attention_mask,rotary_pos_emb,kv_cachekv_cache,use_cacheuse_cache)# Residual connection.if self.apply_residual_connection_post_layernorm:residual layernorm_outputelse:residual hidden_stateslayernorm_input torch.nn.functional.dropout(attention_output, pself.hidden_dropout, trainingself.training)layernorm_input residual layernorm_input# Layer norm post the self attention.layernorm_output self.post_attention_layernorm(layernorm_input)# MLP.mlp_output self.mlp(layernorm_output)# Second residual connection.if self.apply_residual_connection_post_layernorm:residual layernorm_outputelse:residual layernorm_inputoutput torch.nn.functional.dropout(mlp_output, pself.hidden_dropout, trainingself.training)output residual outputreturn output, kv_cache 附GLMBlock大图by 原地漫游三、模型推理 3.1 GLM4-9B-Chat 语言模型以为官方样例代码直接就能跑结果由于网络、GPU、依赖包版本问题卡了好久有趣的是GLM卡了太长时间于是先去Qwen1.5官网找了源码调通后平移到GLM。这怎么评价呢网络使用modelscope代替huggingface下载模型GPUtransformers支持多种GPU指定方式这里用到了两种均以字符串cuda:2形式指定 tokenizer或model变量后加.to(cuda:2)方法在from_pretrained里加入device_mapcuda:2参数。pip安装依赖包transformers、mdeolscope、torch2.3.0、torchvision0.18.0最好用腾讯源安装节约很多时间 pip install torch2.3.0 -i https://mirrors.cloud.tencent.com/pypi/simple 3.1.1 model.generate 需要apply_chat_template应用对话模版引入对话messages数组以及设置add_generation_promptTrue对含有对话角色的message输入进行解析处理。大致意思就是将多个对话安装顺序展开成一行并在每个角色对话之间加入“特殊符号”分割区分。具体可以参考如何设置transformers的聊天模板chat_template from modelscope import snapshot_download from transformers import AutoTokenizer, AutoModelForCausalLM model_dir snapshot_download(ZhipuAI/glm-4-9b-chat) import torchdevice cuda:2 # the device to load the model ontotokenizer AutoTokenizer.from_pretrained(model_dir,trust_remote_codeTrue)prompt 介绍一下大语言模型 messages [{role: system, content: 你是一个智能助理.},{role: user, content: prompt} ] text tokenizer.apply_chat_template(messages,tokenizeFalse,add_generation_promptTrue ) model_inputs tokenizer([text], return_tensorspt).to(device)model AutoModelForCausalLM.from_pretrained(model_dir,device_mapcuda:2,trust_remote_codeTrue )gen_kwargs {max_length: 512, do_sample: True, top_k: 1} with torch.no_grad():outputs model.generate(**model_inputs, **gen_kwargs)outputs outputs[:, model_inputs[input_ids].shape[1]:]print(tokenizer.decode(outputs[0], skip_special_tokensTrue)) generated_ids model.generate(model_inputs.input_ids,max_new_tokens512 ) generated_ids [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) ]response tokenizer.batch_decode(generated_ids, skip_special_tokensTrue)[0] print(response) 运行结果如下共计消耗GPU显存18G 3.1.2 model.chat 代码干净简洁好理解并可以轻松实现多轮对话。只需要实例化tokenizer和model就可以了。ChatGLM和Qwen1.0早期均采用model.chat直接生成对话作为样例后来可能系统提示词system prompt太刚需了所以都采用apply_chat_template了。是这样吗 from modelscope import snapshot_download from transformers import AutoTokenizer, AutoModelForCausalLM model_dir snapshot_download(ZhipuAI/glm-4-9b-chat)#from modelscope import AutoModelForCausalLM, AutoTokenizer #from modelscope import GenerationConfigtokenizer AutoTokenizer.from_pretrained(model_dir, trust_remote_codeTrue) model AutoModelForCausalLM.from_pretrained(model_dir, device_mapcuda:2, trust_remote_codeTrue, torch_dtypetorch.bfloat16).eval() #model.generation_config GenerationConfig.from_pretrained(ZhipuAI/glm-4-9b-chat, trust_remote_codeTrue) # 可指定不同的生成长度、top_p等相关超参response, history model.chat(tokenizer, 你好, historyNone) print(response) response, history model.chat(tokenizer, 浙江的省会在哪里, historyhistory) print(response) response, history model.chat(tokenizer, 它有什么好玩的景点, historyhistory) print(response) 多轮对话结果 3.2 GLM-4V-9B 多模态模型同时GLM还发布了图像识别大模型GLM-4V-9B8K 3.2.1 多模态模型概述该模型采用了与CogVLM2相似的架构设计能够处理高达1120 x 1120分辨率的输入并通过降采样技术有效减少了token的开销。为了减小部署与计算开销GLM-4V-9B没有引入额外的视觉专家模块采用了直接混合文本和图片数据的方式进行训练在保持文本性能的同时提升多模态能力。 3.2.2 多模态模型实践上自己调通的代码 from modelscope import snapshot_download from transformers import AutoTokenizer, AutoModelForCausalLM model_dir snapshot_download(ZhipuAI/glm-4v-9b) import torch from PIL import Imagedevice cuda:2 # the device to load the model ontotokenizer AutoTokenizer.from_pretrained(model_dir,trust_remote_codeTrue)prompt 描述一下这张图片 image Image.open(./test_pic.png).convert(RGB) messages [{role: user, image:image,content: prompt} ] text tokenizer.apply_chat_template(messages,tokenizeFalse,add_generation_promptTrue ) model_inputs tokenizer([text], return_tensorspt).to(device)model AutoModelForCausalLM.from_pretrained(model_dir,device_mapcuda:2,trust_remote_codeTrue )gen_kwargs {max_length: 512, do_sample: True, top_k: 1} with torch.no_grad():outputs model.generate(**model_inputs, **gen_kwargs)outputs outputs[:, model_inputs[input_ids].shape[1]:]print(tokenizer.decode(outputs[0], skip_special_tokensTrue)) generated_ids model.generate(model_inputs.input_ids,max_new_tokens512 ) generated_ids [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) ]response tokenizer.batch_decode(generated_ids, skip_special_tokensTrue)[0] print(response) 不过官方表示GLM-4V-9B参数量达到13B之前baichuan2-13B计算过大概需要13*2.532.5G的显存本人使用32B的单卡直接爆显存了。如果官方能看到真希望再优化一丢丢。四、总结本文首先对GLM4-9B的模型特点及原理进行介绍接着分别对GLM4-9B-Chat语言大模型和GLM-4V-9B多模态大模型进行代码实践。之前更多使用LLaMA_Factory、Xinference等框架对模型的Chat、Client及Api进行测试和部署很多框架真的已经封装的非常易用一件部署前端管理transformers原生版的反倒生疏了。最近正在夯实transformers库的知识基础知识扎实在AI智能体开发过程中遇到问题才能游刃有余上限更高。期待您的关注三连您的鼓励让我创作更加充满动力如果您还有时间可以看看我的其他文章《AI—工程篇》 AI智能体研发之路-工程篇一Docker助力AI智能体开发提效 AI智能体研发之路-工程篇二Dify智能体开发平台一键部署 AI智能体研发之路-工程篇三大模型推理服务框架Ollama一键部署 AI智能体研发之路-工程篇四大模型推理服务框架Xinference一键部署 AI智能体研发之路-工程篇五大模型推理服务框架LocalAI一键部署《AI-模型篇》 AI智能体研发之路-模型篇一大模型训练框架LLaMA-Factory在国内网络环境下的安装、部署及使用 AI智能体研发之路-模型篇二DeepSeek-V2-Chat 训练与推理实战 AI智能体研发之路-模型篇三中文大模型开、闭源之争 AI智能体研发之路-模型篇四一文入门pytorch开发 AI智能体研发之路-模型篇五pytorch vs tensorflow框架DNN网络结构源码级对比 AI智能体研发之路-模型篇六【机器学习】基于tensorflow实现你的第一个DNN网络 AI智能体研发之路-模型篇七【机器学习】基于YOLOv10实现你的第一个视觉AI大模型 AI智能体研发之路-模型篇八【机器学习】Qwen1.5-14B-Chat大模型训练与推理实战

查看全文

http://www.w-s-a.com/news/254292/