

news 2025/12/18 17:18:05

Reference: fastText/python/README.md at main · facebookresearch/fastText (github.com)

fastText module overview

fastText is a library for efficient learning of word representations and sentence classification. This document describes how to use fastText from Python.

Requirements

fastText builds on modern Mac OS and Linux distributions. Because it uses C++11 features, it requires a compiler with good C++11 support. You will need Python (version 2.7 or ≥ 3.4), NumPy and SciPy, and pybind11.

Installation

To install the latest release, run:

    $ pip install fasttext

Or, to get the latest development version of fasttext, install it from the GitHub repository:

    $ git clone https://github.com/facebookresearch/fastText.git
    $ cd fastText
    $ sudo pip install .
    $ # or :
    $ sudo python setup.py install

Usage overview

Word representation model

To learn word vectors, use the fasttext.train_unsupervised function:

    import fasttext

    # Skipgram model :
    model = fasttext.train_unsupervised('data.txt', model='skipgram')

    # or, cbow model :
    model = fasttext.train_unsupervised('data.txt', model='cbow')

Here data.txt is a training file containing UTF-8 encoded text. The returned model object represents the learned model, and you can use it to retrieve information:

    print(model.words)    # list of words in dictionary
    print(model['king'])  # get the vector of the word 'king'

Saving and loading a model object

Call the save_model function to save a trained model object:

    model.save_model("model_filename.bin")

and load it again later with the load_model function:

    model = fasttext.load_model("model_filename.bin")

Text classification model

To train a text classifier, use the fasttext.train_supervised function:

    import fasttext

    model = fasttext.train_supervised('data.train.txt')

Here data.train.txt is a text file containing one training sentence per line, together with its labels. By default, labels are assumed to be words prefixed by the string __label__. Once the model is trained, we can retrieve the lists of words and labels:

    print(model.words)
    print(model.labels)

To evaluate the model by computing precision at 1 (P@1) and recall at 1 (R@1) on a test set, use the test function:

    def print_results(N, p, r):
        print("N\t" + str(N))
        print("P@{}\t{:.3f}".format(1, p))
        print("R@{}\t{:.3f}".format(1, r))

    print_results(*model.test('test.txt'))

We can also predict labels for a specific text:

    model.predict("Which baking dish is best to bake a banana bread ?")
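The result of a predict call is described in the API section as a list of labels and a list of corresponding probabilities. The following is a minimal sketch of how such a result might be turned into readable (label, probability) pairs. It assumes the labels carry the default `__label__` prefix; the helper name `pair_predictions` and the sample values are hypothetical and are not produced by a real model.

```python
def pair_predictions(labels, probabilities, prefix="__label__"):
    """Pair each predicted label (prefix removed) with its probability."""
    pairs = []
    for label, prob in zip(labels, probabilities):
        # Drop the label prefix only when it is actually present.
        name = label[len(prefix):] if label.startswith(prefix) else label
        pairs.append((name, float(prob)))
    return pairs


# Hand-written stand-in for a predict() result; NOT real model output.
example_labels = ("__label__baking", "__label__equipment")
example_probs = (0.75, 0.25)

for name, prob in pair_predictions(example_labels, example_probs):
    print("{}\t{:.2f}".format(name, prob))
```

Keeping the helper independent of the fasttext package means it can be checked without a trained model; with a real model, the two values returned by predict would be passed in place of the sample data.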
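By default, predict returns only one label: the one with the highest probability. You can also predict more than one label by specifying the parameter k:

    model.predict("Which baking dish is best to bake a banana bread ?", k=3)

To predict labels for several sentences at once, pass a list of strings:

    model.predict(["Which baking dish is best to bake a banana bread ?", "Why not put knives in the dishwasher?"], k=3)

As with word representation models, you can also save a classifier to a file and load it back.

Compressing model files with quantization

When you want to save a supervised model file, fastText can compress it to obtain a much smaller model file at the cost of only a small loss in performance.

    # with the previously trained `model` object, call :
    model.quantize(input='data.train.txt', retrain=True)

    # then display results and save the new model :
    print_results(*model.test(valid_data))
    model.save_model("model_filename.ftz")

model_filename.ftz will be much smaller than model_filename.bin.

IMPORTANT: Preprocessing data / encoding conventions

In general, it is important to preprocess your data properly. In particular, the example scripts in the root folder do this.

fastText assumes UTF-8 encoded text. For Python 2, all text must be unicode; for Python 3, all text must be str. The passed text is encoded as UTF-8 by pybind11 before it is handed to the fastText C++ library, so it is important to use UTF-8 encoded text when building a model. On Unix-like systems you can convert text using iconv.

fastText tokenizes text (splits it into pieces) based on the following ASCII characters (bytes). In particular, it is not aware of UTF-8 whitespace. We advise users to convert UTF-8 whitespace and word boundaries into one of the following symbols as appropriate:

- space
- tab
- vertical tab
- carriage return
- formfeed
- the null character

The newline character is used to delimit lines of text. In particular, the EOS token is appended to a line of text when a newline character is encountered. The only exception is when the number of tokens exceeds the MAX_LINE_SIZE constant defined in the Dictionary header. This means that if your text is not separated by newlines, as with the fil9 dataset, it is broken into chunks of MAX_LINE_SIZE tokens and no EOS token is appended.

The length of a token is its number of UTF-8 characters, counted by inspecting the leading two bits of each byte to identify the continuation bytes of a multi-byte sequence. Knowing this is especially important when choosing the minimum and maximum length of subwords. Furthermore, the EOS token (as specified in the Dictionary header) is treated as a single character and is not broken into subwords.

More examples

To get a better understanding of fastText models, see the main README and, in particular, the tutorials on the fastText website. You can find further Python examples in the doc folder.

As with any package, you can get help on any Python function using the help function. For example:

    import fasttext
    help(fasttext.FastText)

    Help on module fasttext.FastText in fasttext:

    NAME
        fasttext.FastText

    DESCRIPTION
        # Copyright (c) 2017-present, Facebook, Inc.
        # All rights reserved.
        #
        # This source code is licensed under the MIT license found in the
        # LICENSE file in the root directory of this source tree.

    FUNCTIONS
        load_model(path)
            Load a model given a filepath and return a model object.

        tokenize(text)
            Given a string of text, tokenize it and return a list of tokens
    [...]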
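The tokenization conventions described in the preprocessing section can be illustrated in plain Python. This is only a sketch of the stated rules (split on the six ASCII separators plus newline; count characters by skipping UTF-8 continuation bytes), not the library's C++ implementation. The names `split_tokens` and `utf8_char_count` are hypothetical, and EOS handling and the MAX_LINE_SIZE limit are deliberately left out.

```python
# The six ASCII separators listed in the text, plus newline as line delimiter.
SEPARATORS = frozenset(" \t\v\r\f\0\n")


def split_tokens(text):
    """Split text on ASCII separators only; Unicode whitespace is NOT a separator."""
    tokens = []
    current = []
    for ch in text:
        if ch in SEPARATORS:
            if current:
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        tokens.append("".join(current))
    return tokens


def utf8_char_count(data):
    """Count UTF-8 characters in bytes by skipping continuation bytes (10xxxxxx)."""
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


# A no-break space (U+00A0) does not split the token; an ASCII tab and space do.
print(split_tokens("hello\tworld\u00a0foo bar"))
# "naïve" is 6 bytes in UTF-8 but 5 characters.
print(utf8_char_count("naïve".encode("utf-8")))
```

The first call shows why converting Unicode whitespace to ASCII separators before training matters: otherwise two words joined by a no-break space are treated as a single token.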
API

train_unsupervised parameters

    input             # training file path (required)
    model             # unsupervised fasttext model {cbow, skipgram} [skipgram]
    lr                # learning rate [0.05]
    dim               # size of word vectors [100]
    ws                # size of the context window [5]
    epoch             # number of epochs [5]
    minCount          # minimal number of word occurrences [5]
    minn              # min length of char ngram [3]
    maxn              # max length of char ngram [6]
    neg               # number of negatives sampled [5]
    wordNgrams        # max length of word ngram [1]
    loss              # loss function {ns, hs, softmax, ova} [ns]
    bucket            # number of buckets [2000000]
    thread            # number of threads [number of cpus]
    lrUpdateRate      # change the rate of updates for the learning rate [100]
    t                 # sampling threshold [0.0001]
    verbose           # verbose [2]

train_supervised parameters

    input             # training file path (required)
    lr                # learning rate [0.1]
    dim               # size of word vectors [100]
    ws                # size of the context window [5]
    epoch             # number of epochs [5]
    minCount          # minimal number of word occurrences [1]
    minCountLabel     # minimal number of label occurrences [1]
    minn              # min length of char ngram [0]
    maxn              # max length of char ngram [0]
    neg               # number of negatives sampled [5]
    wordNgrams        # max length of word ngram [1]
    loss              # loss function {ns, hs, softmax, ova} [softmax]
    bucket            # number of buckets [2000000]
    thread            # number of threads [number of cpus]
    lrUpdateRate      # change the rate of updates for the learning rate [100]
    t                 # sampling threshold [0.0001]
    label             # label prefix ['__label__']
    verbose           # verbose [2]
    pretrainedVectors # pretrained word vectors (.vec file) for supervised learning []

Model object

The train_supervised, train_unsupervised and load_model functions return an instance of the _FastText class, which we generally call the model object.

This object exposes the training arguments as properties: lr, dim, ws, epoch, minCount, minCountLabel, minn, maxn, neg, wordNgrams, loss, bucket, thread, lrUpdateRate, t, label, verbose, pretrainedVectors. So model.wordNgrams gives the maximum length of word ngram used to train this model.

In addition, the object exposes several functions:

    get_dimension           # Get the dimension (size) of a lookup vector (hidden layer).
                            # This is equivalent to `dim` property.
    get_input_vector        # Given an index, get the corresponding vector of the Input Matrix.
    get_input_matrix        # Get a copy of the full input matrix of a Model.
    get_labels              # Get the entire list of labels of the dictionary
                            # This is equivalent to `labels` property.
    get_line                # Split a line of text into words and labels.
    get_output_matrix       # Get a copy of the full output matrix of a Model.
    get_sentence_vector     # Given a string, get a single vector representation. This function
                            # assumes to be given a single line of text. We split words on
                            # whitespace (space, newline, tab, vertical tab) and the control
                            # characters carriage return, formfeed and the null character.
    get_subword_id          # Given a subword, return the index (within input matrix) it hashes to.
    get_subwords            # Given a word, get the subwords and their indices.
    get_word_id             # Given a word, get the word id within the dictionary.
    get_word_vector         # Get the vector representation of word.
    get_words               # Get the entire list of words of the dictionary
                            # This is equivalent to `words` property.
    is_quantized            # whether the model has been quantized
    predict                 # Given a string, get a list of labels and a list of corresponding probabilities.
    quantize                # Quantize the model reducing the size of the model and its memory footprint.
    save_model              # Save the model to the given path
    test                    # Evaluate supervised model using file given by path
    test_label              # Return the precision and recall score for each label.

The properties words and labels return the words and labels from the dictionary:

    model.words         # equivalent to model.get_words()
    model.labels        # equivalent to model.get_labels()

The object overrides the __getitem__ and __contains__ functions to return the representation of a word and to check whether a word is in the vocabulary:

    model['king']       # equivalent to model.get_word_vector('king')
    'king' in model     # equivalent to `'king' in model.get_words()`
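The minn and maxn parameters and the get_subwords function all refer to character n-grams of a word. As a rough illustration of what those lengths mean, here is a minimal sketch that enumerates the character n-grams of a word. It assumes the word is wrapped in `<` and `>` boundary markers, as in fastText's subword model. The function name `char_ngrams` is hypothetical, and the sketch is simplified: it is not the library's implementation, does not hash n-grams into buckets, and does not reproduce the exact output of get_subwords.

```python
def char_ngrams(word, minn=3, maxn=6):
    """List character n-grams of length minn..maxn, with boundary markers added."""
    # Assumption: '<' and '>' mark the beginning and end of the word.
    marked = "<" + word + ">"
    grams = []
    for start in range(len(marked)):
        for n in range(minn, maxn + 1):
            end = start + n
            if end > len(marked):
                break  # longer n-grams from this start would run past the end
            grams.append(marked[start:end])
    return grams


# With minn = maxn = 3, only trigrams are produced.
print(char_ngrams("where", minn=3, maxn=3))
```

Python strings are indexed by character rather than by byte, which matches the convention above that token length is counted in UTF-8 characters. With minn and maxn both set to 0, the train_supervised defaults, the inner loop runs only for n = 0 and the function returns nothing but empty strings, so the sketch is only meaningful for minn of at least 1.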
