Contents

Learning goal: how to use Whisper
Learning content 1: Whisper speech-to-text
  1.1 Download and load a model with whisper.load_model()
  1.2 Transcribe a file with the model instance
  1.3 Practice
Learning content 2: Speaker diarization (pyannote.audio)
  Step 1: Install the dependency
  Step 2: Create a key
  Step 3: Test pyannote.audio
Learning content 3: Integration

Learning goal: how to use Whisper

Learning content 1: Whisper speech-to-text

1.1 Download and load a model with whisper.load_model()
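model = whisper.load_model(parameters)

Parameters:
name: the model to load, either one of the official model names listed by whisper.available_models() or a path to a model checkpoint.
device: the device to put the model on. By default, CUDA is used when a GPU is available, otherwise the CPU.
download_root: the root directory for downloaded model files. Defaults to ~/.cache/whisper.
in_memory: whether to preload the model weights into host memory.

Return value:
model (Whisper): the Whisper speech recognition model instance.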
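To see where the model files end up when download_root is not passed, the default-location rule can be sketched with the standard library alone. The function name default_whisper_cache_dir and its env parameter are hypothetical helpers for illustration, not part of the whisper package; the logic mirrors the load_model source quoted in this section.

```python
import os


def default_whisper_cache_dir(env=None):
    """Mirror how load_model picks a download directory when download_root is None.

    `env` is a hypothetical parameter (defaults to os.environ) that makes the
    rule easy to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    # Fallback cache root: ~/.cache
    default = os.path.join(os.path.expanduser("~"), ".cache")
    # XDG_CACHE_HOME, when set, replaces ~/.cache; "whisper" is appended either way
    return os.path.join(env.get("XDG_CACHE_HOME", default), "whisper")


print(default_whisper_cache_dir())
```

On a machine with a small home partition, this is a reason to pass download_root explicitly, or to set XDG_CACHE_HOME before loading a large model.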
def load_model(
    name: str,
    device: Optional[Union[str, torch.device]] = None,
    download_root: str = None,
    in_memory: bool = False,
) -> Whisper:
    """
    Load a Whisper ASR model

    Parameters
    ----------
    name : str
        one of the official model names listed by `whisper.available_models()`, or
        path to a model checkpoint containing the model dimensions and the model state_dict.
    device : Union[str, torch.device]
        the PyTorch device to put the model into
    download_root: str
        path to download the model files; by default, it uses "~/.cache/whisper"
    in_memory: bool
        whether to preload the model weights into host memory

    Returns
    -------
    model : Whisper
        The Whisper ASR model instance
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    if download_root is None:
        default = os.path.join(os.path.expanduser("~"), ".cache")
        download_root = os.path.join(os.getenv("XDG_CACHE_HOME", default), "whisper")

    if name in _MODELS:
        checkpoint_file = _download(_MODELS[name], download_root, in_memory)
        alignment_heads = _ALIGNMENT_HEADS[name]
    elif os.path.isfile(name):
        checkpoint_file = open(name, "rb").read() if in_memory else name
        alignment_heads = None
    else:
        raise RuntimeError(
            f"Model {name} not found; available models = {available_models()}"
        )

    with (
        io.BytesIO(checkpoint_file) if in_memory else open(checkpoint_file, "rb")
    ) as fp:
        checkpoint = torch.load(fp, map_location=device)
    del checkpoint_file

    dims = ModelDimensions(**checkpoint["dims"])
    model = Whisper(dims)
    model.load_state_dict(checkpoint["model_state_dict"])

    if alignment_heads is not None:
        model.set_alignment_heads(alignment_heads)

    return model.to(device)

1.2 Transcribe a file with the model instance

result = model.transcribe(file_path)

def transcribe(
    model: "Whisper",
    audio: Union[str, np.ndarray, torch.Tensor],
    *,
    verbose: Optional[bool] = None,
    temperature: Union[float, Tuple[float, ...]] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,
    condition_on_previous_text: bool = True,
    initial_prompt: Optional[str] = None,
    word_timestamps: bool = False,
    prepend_punctuations: str = "\"'“¿([{-",
    append_punctuations: str = "\"'.。,,!!??::”)]}、",
    **decode_options,
):
    """
    Transcribe audio into text.

    Parameters
    - model: the Whisper model
    - audio: path to an audio file, a NumPy array, or a PyTorch tensor
    - verbose: whether to print detailed information; default None
    - temperature: sampling temperature(s); default (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
    - compression_ratio_threshold: compression ratio threshold; default 2.4
    - logprob_threshold: log probability threshold; default -1.0
    - no_speech_threshold: no-speech threshold; default 0.6
    - condition_on_previous_text: whether to condition decoding on the previous text; default True
    - initial_prompt: initial prompt; default None
    - word_timestamps: whether to return word-level timestamps; default False
    - prepend_punctuations: punctuation merged with the following word; default "\"'“¿([{-"
    - append_punctuations: punctuation merged with the preceding word; default "\"'.。,,!!??::”)]}、"
    - **decode_options: other decoding options

    Returns
    - a dictionary with the transcribed text ("text"), the per-segment details
      ("segments"), and the detected language ("language")
    """

1.3 Practice
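Two recommendations:
- When calling load_model, pass download_root, the root directory for downloaded models (the default is ~/.cache/whisper).
- When calling transcribe, pass word_timestamps=True.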
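To illustrate what word_timestamps=True adds: to my understanding of the openai-whisper output, each entry in result["segments"] then also carries a "words" list whose items have "word", "start" and "end" keys, in seconds from the start of the audio. The sketch below turns those offsets into wall-clock times using only the standard library. The function name absolute_word_times and the sample_result data are hypothetical and hand-written, not real model output.

```python
from datetime import datetime, timedelta


def absolute_word_times(result, recording_start):
    """Convert per-word offsets (seconds) into wall-clock times.

    `result` is assumed to have the shape returned by model.transcribe(...,
    word_timestamps=True); `recording_start` is "YYYY-MM-DD HH:MM:SS".
    """
    base = datetime.strptime(recording_start, "%Y-%m-%d %H:%M:%S")
    rows = []
    for segment in result["segments"]:
        # "words" is only present when word_timestamps=True was requested
        for word in segment.get("words", []):
            start = base + timedelta(seconds=word["start"])
            end = base + timedelta(seconds=word["end"])
            rows.append((
                start.strftime("%H:%M:%S.%f")[:-3],  # keep milliseconds
                end.strftime("%H:%M:%S.%f")[:-3],
                word["word"].strip(),
            ))
    return rows


# Hand-written sample in the assumed result shape (not real model output)
sample_result = {
    "segments": [
        {"start": 0.5, "end": 1.25, "text": " hello world",
         "words": [
             {"word": " hello", "start": 0.5, "end": 1.0},
             {"word": " world", "start": 1.0, "end": 1.25},
         ]},
    ]
}

for row in absolute_word_times(sample_result, "2022-10-24 16:23:00"):
    print(row)
```

Word-level times are useful when the transcript later has to be aligned with speaker turns, because a single segment can span a change of speaker.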
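import whisper
import arrow


# Arguments: model name, audio file path, and the time the recording started
def excute(model_name, file_path, start_time):
    model = whisper.load_model(model_name)
    result = model.transcribe(file_path, word_timestamps=True)
    # Parse the recording start once; segment offsets are seconds from this point
    now = arrow.get(start_time)
    for segment in result["segments"]:
        start = now.shift(seconds=segment["start"]).format("YYYY-MM-DD HH:mm:ss")
        end = now.shift(seconds=segment["end"]).format("YYYY-MM-DD HH:mm:ss")
        print(f"【{start} -> {end}】{segment['text']}")


if __name__ == "__main__":
    excute("large", "/root/autodl-tmp/no/test.mp3", "2022-10-24 16:23:00")

Learning content 2: Speaker diarization (pyannote.audio)

pyannote.audio is an open-source speaker diarization toolkit; its pretrained pipelines are distributed through Hugging Face.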
Step 1: Install the dependency
pip install pyannote.audio
Step 2: Create a key (access token)
https://huggingface.co/settings/tokens
Step 3: Test pyannote.audio
Create an instance: Pipeline.from_pretrained(parameters)
Use GPU acceleration: import torch, then call pipeline.to(torch.device("cuda"))
Run the instance on an audio file: pipeline("test.wav")

from_pretrained(parameters)
cache_dir: Path or str, optional. Path to the model cache directory; when it is not set, a default pyannote cache directory is used.

pipeline(parameters)
file_path: the audio file
num_speakers: the number of speakers (optional)

from pyannote.audio import Pipeline

# Replace YOUR_ACCESS_TOKEN with the key created in step 2
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2.1", use_auth_token="YOUR_ACCESS_TOKEN")

# send pipeline to GPU (when available)
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
pipeline.to(torch.device(device))

# apply pretrained pipeline
diarization = pipeline("test.wav")
print(diarization)

# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...
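Once the turns are available, a common next step is to summarize them per speaker. The sketch below assumes the turns have already been copied out of itertracks into plain (start, end, speaker) tuples; the function name speaking_time is hypothetical, and the sample values are the ones from the example output above.

```python
from collections import defaultdict


def speaking_time(turns):
    """Sum the duration of all turns for each speaker.

    `turns` is an iterable of (start_seconds, end_seconds, speaker_label).
    """
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    # Round to avoid floating-point noise in the printed totals
    return {speaker: round(total, 2) for speaker, total in totals.items()}


# Values taken from the sample output shown above
turns = [
    (0.2, 1.5, "speaker_0"),
    (1.8, 3.9, "speaker_1"),
    (4.2, 5.7, "speaker_0"),
]

for speaker, seconds in speaking_time(turns).items():
    print(f"{speaker}: {seconds}s")
```

With a real pipeline result, the list could be built with a comprehension such as [(turn.start, turn.end, speaker) for turn, _, speaker in diarization.itertracks(yield_label=True)].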
Learning content 3: Integration
Combining the results of the two steps above relies on an open-source project, pyannote-whisper.
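The general idea of such a merge can be sketched as follows: give each transcribed segment the speaker whose turn overlaps it the most. This is only an illustration under that assumption, not necessarily how pyannote-whisper implements it. The names overlap and assign_speakers and the sample data are hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(asr_segments, turns):
    """Label each ASR segment with the speaker whose turn overlaps it the most.

    asr_segments: list of dicts with "start", "end", "text" (Whisper-like shape)
    turns: list of (start, end, speaker) tuples (diarization-like shape)
    """
    labelled = []
    for seg in asr_segments:
        best_speaker, best_overlap = None, 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], t_start, t_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        # best_speaker stays None when no turn overlaps the segment
        labelled.append((seg["start"], seg["end"], best_speaker, seg["text"]))
    return labelled


# Hand-written sample data for illustration
segments = [
    {"start": 0.0, "end": 1.4, "text": "Hello."},
    {"start": 2.0, "end": 3.5, "text": "Hi there."},
]
turns = [(0.2, 1.5, "speaker_0"), (1.8, 3.9, "speaker_1"), (4.2, 5.7, "speaker_0")]

for row in assign_speakers(segments, turns):
    print(row)
```

A segment-level rule like this mislabels segments in which the speaker changes midway, which is one reason word-level timestamps can be worth requesting.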
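If you get the error "No module named pyannote_whisper", clone the project and install its requirements. On the AutoDL platform you can first enable the academic proxy to speed up the download:

source /etc/network_turbo
git clone https://github.com/yinruiqing/pyannote-whisper.git
cd pyannote-whisper
pip install -r requirements.txt

If this step fails, the error may be caused by a missing or incorrectly installed sndfile library. sndfile is a library for handling audio files that supports reading and writing many formats. You can try installing it as follows:

On Ubuntu: sudo apt-get install libsndfile1-dev
On CentOS: sudo yum install libsndfile-devel
On macOS, with Homebrew: brew install libsndfile

Then run the commands above again.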
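Before reinstalling, a rough way to check whether the system linker can see libsndfile is the standard library's ctypes.util.find_library. This is only a heuristic: some Python audio packages bundle their own copy of the library, so a negative result here does not prove that the import will fail. The helper names find_sndfile and describe_sndfile are hypothetical.

```python
import ctypes.util


def find_sndfile():
    """Return the name or path of the system libsndfile, or None if it is not visible."""
    return ctypes.util.find_library("sndfile")


def describe_sndfile(found):
    """Turn the lookup result into a short human-readable message."""
    if found is None:
        return "libsndfile not found by the system linker"
    return f"libsndfile found: {found}"


print(describe_sndfile(find_sndfile()))
```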
After that, write your code inside the cloned project, or copy the code of the pyannote_whisper.utils module into your own project.

import os
import whisper
from pyannote.audio import Pipeline
from pyannote_whisper.utils import diarize_text
import concurrent.futures
import subprocess
import torch

print("Loading the speaker diarization model")
# Replace YOUR_ACCESS_TOKEN with your own Hugging Face key; do not publish a real token
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2.1", use_auth_token="YOUR_ACCESS_TOKEN")
pipeline.to(torch.device("cuda"))
output_dir = "/root/autodl-tmp/no/out"
print("Loading the Whisper model")
model = whisper.load_model("large", device="cuda")


# Convert MP3 to WAV
def convert_to_wav(path):
    # Files that are already WAV are returned unchanged
    if path.lower().endswith(".wav"):
        return path, None
    new_path = os.path.splitext(path)[0] + ".wav"
    try:
        # -y must come before the output file to take effect;
        # -an is dropped because it would disable the audio stream
        subprocess.run(["ffmpeg", "-y", "-i", path, new_path], check=True)
    except (OSError, subprocess.CalledProcessError):
        return path, "Error: Could not convert file to .wav"
    return new_path, None


def process_audio(file_path):
    file_path, retmsg = convert_to_wav(file_path)
    if retmsg:
        print(retmsg)
        return
    print(f"===={file_path}====")
    # The Chinese initial prompt ("speech conversion") is kept from the original
    asr_result = model.transcribe(file_path, initial_prompt="语音转换")
    diarization_result = pipeline(file_path, num_speakers=2)
    final_result = diarize_text(asr_result, diarization_result)
    output_file = os.path.join(output_dir, os.path.basename(file_path)[:-4] + ".txt")
    with open(output_file, "w", encoding="utf-8") as f:
        for seg, spk, sent in final_result:
            line = f"{seg.start:.2f} {seg.end:.2f} {spk} {sent}\n"
            f.write(line)


os.makedirs(output_dir, exist_ok=True)
wave_dir = "/root/autodl-tmp/no"
# Collect all .mp3 files in the directory
wav_files = [os.path.join(wave_dir, file) for file in os.listdir(wave_dir) if file.endswith(".mp3")]
# Process each file
# with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
#     executor.map(process_audio, wav_files)
for wav_file in wav_files:
    process_audio(wav_file)
print("Done")
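The path handling in convert_to_wav can be checked without running ffmpeg by separating "decide the output path and command" from "execute the command". The sketch below does only the first part; build_ffmpeg_command is a hypothetical helper, and the argument list uses only the ffmpeg options -y (overwrite the output) and -i (input file).

```python
from pathlib import Path


def build_ffmpeg_command(src):
    """Return (output_path, command) for converting `src` to WAV.

    `command` is None when the file already has a .wav extension and
    nothing needs to be run.
    """
    src_path = Path(src)
    if src_path.suffix.lower() == ".wav":
        return src_path, None
    # with_suffix replaces only the last extension, so "a.b.mp3" becomes "a.b.wav"
    dst_path = src_path.with_suffix(".wav")
    # Options are placed before the output file so that ffmpeg applies them
    command = ["ffmpeg", "-y", "-i", str(src_path), str(dst_path)]
    return dst_path, command


for name in ["clips/meeting.mp3", "clips/a.b.mp3", "clips/ready.WAV"]:
    print(build_ffmpeg_command(name))
```

The returned list can be passed to subprocess.run(command, check=True) when the command is not None.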