Foreword
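When we need to embed a large amount of data for storage in a vector database, and the server has several GPUs at our disposal, we want to use all of the GPUs at once to parallelize the embedding step and speed it up.
Code
It's only a few lines of code, so let's get straight to it.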
from sentence_transformers import SentenceTransformer

# Important: you need to shield your code with if __name__ == "__main__".
# Otherwise, CUDA runs into issues when spawning new processes.
if __name__ == "__main__":
    # Create a large list of 100k sentences
    sentences = ["This is sentence {}".format(i) for i in range(100000)]

    # Define the model
    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Start the multi-process pool on all available CUDA devices
    pool = model.start_multi_process_pool()

    # Compute the embeddings using the multi-process pool
    emb = model.encode_multi_process(sentences, pool)
    print("Embeddings computed. Shape:", emb.shape)

    # Optional: stop the processes in the pool
    model.stop_multi_process_pool(pool)

Note: be sure to include the if __name__ == "__main__": line, otherwise you get the following error:

RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

In fact, the official repository already provides this code; I simply copied and pasted it. Code location: computing_embeddings_multi_gpu.py
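To build some intuition for what a multi-process pool does with the sentence list, here is a minimal pure-Python sketch: split the input into chunks, hand the chunks to parallel workers, and concatenate the results in the original order. Everything in it is an assumption made for illustration: `split_into_chunks`, `fake_encode`, and `encode_in_chunks` are hypothetical names, threads stand in for the per-GPU processes, and the library's real implementation may differ in its details.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def fake_encode(chunk):
    """Hypothetical stand-in for a model: one 1-dimensional 'embedding' per sentence."""
    return [[float(len(sentence))] for sentence in chunk]


def encode_in_chunks(sentences, num_workers, chunk_size):
    """Encode chunks in parallel and concatenate the results in input order."""
    chunks = split_into_chunks(sentences, chunk_size)
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        # executor.map returns results in input order,
        # even when the chunks finish out of order.
        encoded_chunks = list(executor.map(fake_encode, chunks))
    return [embedding for chunk in encoded_chunks for embedding in chunk]


if __name__ == "__main__":
    sentences = ["This is sentence {}".format(i) for i in range(10)]
    embeddings = encode_in_chunks(sentences, num_workers=4, chunk_size=3)
    print(len(embeddings), "embeddings for", len(sentences), "sentences")
```

The point of the sketch is the ordering guarantee: no matter which worker finishes first, the caller gets back one embedding per input sentence, in input order.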
The official examples also include a streaming encode example, which likewise runs in parallel across multiple GPUs:
from sentence_transformers import SentenceTransformer, LoggingHandler
import logging
from datasets import load_dataset
from torch.utils.data import DataLoader
from tqdm import tqdm

logging.basicConfig(
    format="%(asctime)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    level=logging.INFO,
    handlers=[LoggingHandler()],
)

# Important: you need to shield your code with if __name__ == "__main__".
# Otherwise, CUDA runs into issues when spawning new processes.
if __name__ == "__main__":
    # Set params
    data_stream_size = 16384  # Size of the data that is loaded into memory at once
    chunk_size = 1024  # Size of the chunks that are sent to each process
    encode_batch_size = 128  # Batch size of the model

    # Load a large dataset in streaming mode. More info: https://huggingface.co/docs/datasets/stream
    dataset = load_dataset("yahoo_answers_topics", split="train", streaming=True)
    dataloader = DataLoader(dataset.with_format("torch"), batch_size=data_stream_size)

    # Define the model
    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Start the multi-process pool on all available CUDA devices
    pool = model.start_multi_process_pool()

    for i, batch in enumerate(tqdm(dataloader)):
        # Compute the embeddings using the multi-process pool
        sentences = batch["best_answer"]
        batch_emb = model.encode_multi_process(
            sentences, pool, chunk_size=chunk_size, batch_size=encode_batch_size
        )
        print("Embeddings computed for 1 batch. Shape:", batch_emb.shape)

    # Optional: stop the processes in the pool
    model.stop_multi_process_pool(pool)

Official example: computing_embeddings_streaming.py
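The three size parameters in the streaming example nest inside one another, and a little arithmetic makes the relationship concrete. The sketch below assumes, following the comments in the example, that each streamed batch is cut into chunks of `chunk_size` and that a worker encodes each chunk in model batches of `encode_batch_size`. `streaming_plan` is a hypothetical helper written for illustration and is not part of any library.

```python
import math


def streaming_plan(data_stream_size, chunk_size, encode_batch_size):
    """Return (chunks per streamed batch, model batches per chunk)."""
    chunks_per_stream_batch = math.ceil(data_stream_size / chunk_size)
    batches_per_chunk = math.ceil(chunk_size / encode_batch_size)
    return chunks_per_stream_batch, batches_per_chunk


if __name__ == "__main__":
    # Values taken from the streaming example above.
    chunks, batches = streaming_plan(16384, 1024, 128)
    print("chunks per streamed batch:", chunks)
    print("model batches per chunk:", batches)
```

With the values from the example this gives 16 chunks per streamed batch and 8 model batches per chunk, so on an 8-GPU server each worker would receive roughly two chunks per streamed batch. Choosing `data_stream_size` so that it yields at least as many chunks as there are GPUs helps keep every device busy.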
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A800-SXM...  On   | 00000000:23:00.0 Off |                    0 |
| N/A   58C    P0   297W / 400W |  75340MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A800-SXM...  On   | 00000000:29:00.0 Off |                    0 |
| N/A   71C    P0   352W / 400W |  80672MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A800-SXM...  On   | 00000000:52:00.0 Off |                    0 |
| N/A   68C    P0   398W / 400W |  75756MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA A800-SXM...  On   | 00000000:57:00.0 Off |                    0 |
| N/A   58C    P0   341W / 400W |  75994MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA A800-SXM...  On   | 00000000:8D:00.0 Off |                    0 |
| N/A   56C    P0   319W / 400W |  70084MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA A800-SXM...  On   | 00000000:92:00.0 Off |                    0 |
| N/A   70C    P0   354W / 400W |  76314MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA A800-SXM...  On   | 00000000:BF:00.0 Off |                    0 |
| N/A   73C    P0   360W / 400W |  75876MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA A800-SXM...  On   | 00000000:C5:00.0 Off |                    0 |
| N/A   57C    P0   364W / 400W |  80404MiB / 81920MiB |    100%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

Blazing fast!
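If you would rather check GPU utilization from a script than read the table by eye, `nvidia-smi` can also emit CSV through its `--query-gpu` and `--format` options. The sketch below parses that CSV into dictionaries. `parse_gpu_csv` and `query_gpus` are hypothetical helper names, and the sample string is hand-built from the first two GPUs in the table above rather than captured from a real run.

```python
import subprocess

QUERY_FIELDS = "index,utilization.gpu,memory.used,memory.total"


def parse_gpu_csv(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, util, used, total = [field.strip() for field in line.split(",")]
        rows.append({
            "index": int(index),
            "util_percent": int(util),
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
        })
    return rows


def query_gpus():
    """Run nvidia-smi and return one dict per GPU (requires an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + QUERY_FIELDS,
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    # Hand-built sample mirroring GPUs 0 and 1 from the table above.
    sample = "0, 100, 75340, 81920\n1, 100, 80672, 81920\n"
    for gpu in parse_gpu_csv(sample):
        print(gpu)
```

On a machine with GPUs, calling `query_gpus()` in a loop while the encode job runs gives a quick way to confirm that every device is actually being used.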