GPT-2 Introduction
GPT-2 is an autoregressive language model released by OpenAI in 2019. Compared with GPT-1, it is still based on the Transformer Decoder architecture, but it introduces a number of improvements.
Model size:
GPT-1 has 117M parameters and serves as a pretrained model for downstream fine-tuning tasks.
GPT-2 significantly increases the model scale, offering several sizes: 124M, 355M, 774M, and 1.5B parameters.
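For reference, the commonly cited configurations of the four released sizes are roughly as follows (a sketch based on the published GPT-2 configurations, not part of this tutorial's code):

# approximate configurations of the released GPT-2 models
gpt2_configs = {
    'gpt2 (124M)':        {'n_layer': 12, 'n_head': 12, 'n_embd': 768},
    'gpt2-medium (355M)': {'n_layer': 24, 'n_head': 16, 'n_embd': 1024},
    'gpt2-large (774M)':  {'n_layer': 36, 'n_head': 20, 'n_embd': 1280},
    'gpt2-xl (1.5B)':     {'n_layer': 48, 'n_head': 25, 'n_embd': 1600},
}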
Dataset size:
GPT-2 was trained on WebText, a dataset of roughly 40GB of text collected from web pages linked from Reddit posts.
Model architecture:
GPT-2 keeps GPT-1's Decoder-only architecture, but increases the number of Decoder blocks to 48 (in the largest model), with a deeper attention stack, a larger feed-forward dimension, and improved regularization. GPT-2 also uses learnable positional embeddings. Layer Normalization is moved to the front of each sub-block (applied as soon as the block receives its input), and an additional Layer Norm is added after the final self-attention block.
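To make the pre-norm layout concrete, here is a minimal sketch of where the two LayerNorms sit inside a block (my own illustration, with the attention sub-layer left as a placeholder and an assumed feed-forward dimension of 4x the hidden size; this is not the mindnlp implementation):

from mindspore import nn

class GPT2BlockSketch(nn.Cell):
    def __init__(self, hidden_size=768, ffn_dim=3072):
        super().__init__()
        self.ln_1 = nn.LayerNorm((hidden_size,))   # LayerNorm applied before the attention sub-layer
        self.attn = nn.Identity()                  # placeholder for masked multi-head self-attention
        self.ln_2 = nn.LayerNorm((hidden_size,))   # LayerNorm applied before the feed-forward sub-layer
        self.mlp = nn.SequentialCell([
            nn.Dense(hidden_size, ffn_dim),
            nn.GELU(),
            nn.Dense(ffn_dim, hidden_size),
        ])

    def construct(self, x):
        x = x + self.attn(self.ln_1(x))   # residual connection around attention
        x = x + self.mlp(self.ln_2(x))    # residual connection around the MLP
        return x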
Parameter initialization:
GPT-2 uses a special scaled initialization, a variant of Xavier initialization with additional scaling: the weights of residual-path projections are scaled by a factor of 1/√N, where N is the number of residual connections, i.e. twice the number of blocks.
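A minimal sketch of that scaling rule (illustrative only; the variable names and the 0.02 standard deviation are my assumptions, not taken from this tutorial):

import math
import numpy as np

num_blocks = 48                      # number of Decoder blocks
n_residual = 2 * num_blocks          # each block contributes two residual connections
scale = 1.0 / math.sqrt(n_residual)

# a residual-path output projection: normal init, then scaled down
w_proj = np.random.normal(0.0, 0.02, size=(768, 768)) * scale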
Task conditioning
A standard language model is trained with the objective of modeling p(output | input),
but GPT-2 uses the same unsupervised model to perform multiple predefined tasks, so the learning objective becomes p(output | input, task).
This modification is known as task conditioning: given the same input, the model should produce different outputs for different tasks.
Typical cases include translation, text summarization, and other downstream tasks handled via zero-shot or few-shot prompting: text generation, summarization, translation, question answering (QA), and text classification (see the example prompts below).
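Because the task is specified purely in text, the prompt alone can select the behaviour. Two illustrative prompts (my own examples; the "TL;DR:" trigger for summarization follows the GPT-2 paper):

# few-shot translation prompt: the "english: ... french: ..." pattern defines the task
translation_prompt = (
    "english: the cat sat on the mat. french: le chat s'est assis sur le tapis.\n"
    "english: where is the library? french:"
)

# zero-shot summarization prompt: appending "TL;DR:" asks the model to summarize
article_text = "..."  # placeholder for a long article
summarization_prompt = article_text + "\nTL;DR:"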
GPT-2 Practice with MindSpore
A review of Masked Multi-Head Self-Attention
# install the mindnlp 0.4.0 package
!pip install mindnlp==0.4.0
!pip uninstall soundfile -y
!pip install https://ms-release.obs.cn-north-4.myhuaweicloud.com/2.3.1/MindSpore/unified/aarch64/mindspore-2.3.1-cp39-cp39-linux_aarch64.whl --trusted-host ms-release.obs.cn-north-4.myhuaweicloud.com -i https://pypi.tuna.tsinghua.edu.cn/simple
Assume an input with batch size 1, sequence length 10, and feature dimension 768.

# GPT-2 Masked Self-Attention
# assume an input of self-attention x, input_dim is 768
import numpy as np
import mindspore
from mindspore import Tensor

batch_size, seq_len, embed_dim = 1, 10, 768
x = Tensor(np.random.randn(batch_size, seq_len, embed_dim), mindspore.float32)
x.shape
Project the input with a fused weight matrix and split the result into Q, K, and V.

import mindspore.ops as ops
from mindnlp.transformers.ms_utils import Conv1D

# the input will be multiplied by three matrices Wq, Wk, Wv
# concatenating the three matrices gives a matrix of shape (768, 768*3)
# x matmul matrix, so the output is (batch_size, seq_len, 768*3)
c_attn = Conv1D(3 * embed_dim, embed_dim)
output = c_attn(x)

# split the output into q, k, v
query, key, value = ops.Split(axis=2, output_num=3)(output)
query.shape, key.shape, value.shape

Split the attention into multiple heads.
# split self-attention into multi-head attention
def split_heads(tensor, num_heads, attn_head_size):
    """Split the hidden_size dim into num_heads and attn_head_size.

    Args:
        tensor: tensor to split
        num_heads: how many heads to split into
        attn_head_size: hidden size of each head
    Return:
        multi-head tensor
    """
    new_shape = tensor.shape[:-1] + (num_heads, attn_head_size)
    tensor = tensor.view(new_shape)
    return ops.transpose(tensor, (0, 2, 1, 3))

num_heads = 12
attn_head_size = embed_dim // num_heads

query = split_heads(query, num_heads, attn_head_size)
key = split_heads(key, num_heads, attn_head_size)
value = split_heads(value, num_heads, attn_head_size)

query.shape, key.shape, value.shape
Multiply Q by the transpose of K to obtain the attention scores.
# get self-attention score
attn_weights = ops.matmul(query, key.swapaxes(-1, -2))
attn_weights.shape
Add a mask to the attention scores so that the model cannot see "future" tokens.

# get masked attn_weights
max_positions = seq_len
# create a lower-triangular mask matrix
bias = Tensor(np.tril(np.ones((max_positions, max_positions))).reshape(
    (1, 1, max_positions, max_positions)), mindspore.bool_)

# apply the mask matrix to get masked scores
# this normalization helps stabilize gradients
# and is common in scaled dot-product attention mechanisms
attn_weights = attn_weights / ops.sqrt(ops.scalar_to_tensor(value.shape[-1]))
query_length, key_length = query.shape[-2], key.shape[-2]
causal_mask = bias[:, :, key_length - query_length: key_length, :key_length].bool()
mask_value = Tensor(np.finfo(np.float32).min, dtype=attn_weights.dtype)
attn_weights = ops.where(causal_mask, attn_weights, mask_value)

Pass the masked scores through a SoftMax layer to obtain the attention weights.

# get attn scores
attn_weights = ops.softmax(attn_weights, axis=-1)
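As a quick sanity check (not part of the original walkthrough), every row of the masked attention weights should now sum to 1:

# each query position's weights over the keys sum to 1 after softmax
attn_weights.sum(axis=-1)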
Multiply the masked attention weights by V to obtain the attention output.
# get output of Masked Self-Attention
attn_output = ops.matmul(attn_weights, value)
attn_output.shape

Merge the heads back together.
# merge multi heads
def merge_heads(tensor, num_heads, attn_head_size):
    """Merge the num_heads and attn_head_size dims back into hidden_size."""
    tensor = ops.transpose(tensor, (0, 2, 1, 3))
    new_shape = tensor.shape[:-2] + (num_heads * attn_head_size,)
    return tensor.view(new_shape)

attn_output = merge_heads(attn_output, num_heads, attn_head_size)
attn_output.shape
Multiply the merged output by the output projection matrix to obtain the final attention output.
# project the attention results with the output projection matrix
projection = Conv1D(embed_dim, embed_dim)
attn_output = projection(attn_output)
attn_output.shape
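Putting the steps above together, here is a compact sketch of the whole masked self-attention pass, reusing the objects defined earlier (c_attn, projection, split_heads, merge_heads, bias); it is an illustration of the walkthrough, not the mindnlp implementation:

def masked_self_attention(x):
    # project to Q, K, V with the fused weight matrix and split heads
    q, k, v = ops.Split(axis=2, output_num=3)(c_attn(x))
    q = split_heads(q, num_heads, attn_head_size)
    k = split_heads(k, num_heads, attn_head_size)
    v = split_heads(v, num_heads, attn_head_size)

    # scaled dot-product scores with the causal mask applied
    w = ops.matmul(q, k.swapaxes(-1, -2)) / ops.sqrt(ops.scalar_to_tensor(attn_head_size))
    causal = bias[:, :, :q.shape[-2], :k.shape[-2]].bool()
    w = ops.where(causal, w, Tensor(np.finfo(np.float32).min, dtype=w.dtype))
    w = ops.softmax(w, axis=-1)

    # weighted sum of values, merge heads, then output projection
    out = merge_heads(ops.matmul(w, v), num_heads, attn_head_size)
    return projection(out)

masked_self_attention(x).shape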
GPT-2 Text Summarization with MindSpore
Build a simple text summarization model on top of GPT-2.

# data loading and preprocessing
from mindnlp.utils import http_get

# download dataset
url = 'https://download.mindspore.cn/toolkits/mindnlp/dataset/text_generation/nlpcc2017/train_with_summ.txt'
path = http_get(url, './')

from mindspore.dataset import TextFileDataset

# load dataset
dataset = TextFileDataset(str(path), shuffle=False)
dataset.get_dataset_size()

# take a tiny subset to keep the demo fast, then split it into train/test
mini_dataset, _ = dataset.split([0.001, 0.999], randomize=False)
train_dataset, test_dataset = mini_dataset.split([0.9, 0.1], randomize=False)

import json
import numpy as np
def process_dataset(dataset, tokenizer, batch_size=4, max_seq_len=1024, shuffle=False):
    """Preprocess the dataset.

    Raw data format:
        article: [CLS] article_context [SEP]
        summary: [CLS] summary_context [SEP]
    Processed format:
        [CLS] article_context [SEP] summary_context [SEP]
    """
    def read_map(text):
        # sub-function to change the form of the data
        data = json.loads(text.tobytes())
        return np.array(data['article']), np.array(data['summarization'])

    def merge_and_pad(article, summary):
        # tokenization, pad to max_seq_len, only the article will be truncated
        tokenized = tokenizer(text=article, text_pair=summary,
                              padding='max_length', truncation='only_first',
                              max_length=max_seq_len)
        # returns tokenized input ids for both the inputs (input_ids) and the labels
        return tokenized['input_ids'], tokenized['input_ids']

    # read_map turns each raw text line into the 'article' and 'summary' columns
    dataset = dataset.map(read_map, output_columns=['article', 'summary'])
    dataset = dataset.map(merge_and_pad, ['article', 'summary'], ['input_ids', 'labels'])
    dataset = dataset.batch(batch_size)
    if shuffle:
        dataset = dataset.shuffle(batch_size)
    return dataset

from mindnlp.transformers import BertTokenizer

# load the BERT-base-Chinese tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-chinese')

# build the training dataset
train_dataset = process_dataset(train_dataset, tokenizer, batch_size=1)

# model architecture of GPT2ForSummarization
from mindnlp.transformers import GPT2LMHeadModel

class GPT2ForSummarization(GPT2LMHeadModel):
    def forward(self, input_ids=None, attention_mask=None, labels=None):
        outputs = super().forward(input_ids=input_ids, attention_mask=attention_mask)
        # shift logits and labels so that tokens < n predict token n
        shift_logits = outputs.logits[..., :-1, :]
        shift_labels = labels[..., 1:]
        # flatten the tokens and ignore padding when computing the loss
        loss = ops.cross_entropy(shift_logits.view(-1, shift_logits.shape[-1]),
                                 shift_labels.view(-1),
                                 ignore_index=tokenizer.pad_token_id)
        return (loss,)

num_epochs = 1
warmup_steps = 100
lr = 1.5e-4
max_grad_norm = 1.0
num_training_steps = num_epochs * train_dataset.get_dataset_size()

from mindspore import nn
from mindnlp.transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(vocab_size=len(tokenizer))
model = GPT2ForSummarization(config)

from mindnlp.engine import TrainingArguments

training_args = TrainingArguments(
    output_dir='gpt2_summarization',
    save_steps=train_dataset.get_dataset_size(),
    save_total_limit=3,
    logging_steps=1000,
    max_steps=num_training_steps,
    learning_rate=lr,
    max_grad_norm=max_grad_norm,
    warmup_steps=warmup_steps,
)

from mindnlp.engine import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)

trainer.train()

def process_test_dataset(test_dataset, tokenizer, batch_size=1, max_seq_len=1024, max_summary_len=100):
    def read_map(text):
        data = json.loads(text.tobytes())
        return np.array(data['article']), np.array(data['summarization'])

    def pad(article):
        # leave room for the generated summary when truncating the article
        tokenized = tokenizer(text=article, truncation=True,
                              max_length=max_seq_len - max_summary_len)
        return tokenized['input_ids']

    test_dataset = test_dataset.map(read_map, output_columns=['article', 'summary'])
    test_dataset = test_dataset.map(pad, 'article', ['input_ids'])
    test_dataset = test_dataset.batch(batch_size)
    return test_dataset

tokenizer_test = BertTokenizer.from_pretrained('bert-base-chinese')
batched_test_dataset = process_test_dataset(test_dataset, tokenizer_test, batch_size=1)

# load the fine-tuned checkpoint and switch to inference mode
model = GPT2LMHeadModel.from_pretrained('./gpt2_summarization/checkpoint-45', config=config)
model.set_train(False)
model.config.eos_token_id = model.config.sep_token_id

i = 0
for (input_ids, raw_summary) in batched_test_dataset.create_tuple_iterator():
    output_ids = model.generate(input_ids, max_new_tokens=50, num_beams=5, no_repeat_ngram_size=2)
    output_text = tokenizer.decode(output_ids[0].tolist())
    print('input:', tokenizer.decode(input_ids[0].tolist()))
    print()
    print(output_text)
    i += 1
    if i == 1:
        break