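Features:

- ONNX-based speech recognition engine
- Multilingual support (Chinese, English, Japanese, Korean)
- Model loader (supports SenseVoice/Whisper/Paraformer)
- Audio capture and processing module (VAD, resampling, normalization)
- Text output module (clipboard)
- CLI command-line tool
- Electron GUI
- Windows x64 packaging configuration

Documentation:

- PRD (product requirements document)
- README (project overview)
- Development guide
- Windows build guide

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
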
# Development Guide

## Project Structure

```
impress-asr-input/
├── src/
│   ├── core/                     # Core modules
│   │   ├── audio-recorder.ts     # Audio capture
│   │   ├── audio-processor.ts    # Audio processing (VAD, resampling, etc.)
│   │   ├── speech-recognizer.ts  # ONNX speech recognition engine
│   │   ├── text-output.ts        # Text output
│   │   └── index.ts              # Module exports
│   ├── ui/                       # Electron UI
│   │   └── index.html            # Main window
│   ├── electron-main.ts          # Electron main process
│   ├── preload.ts                # Electron preload script
│   ├── main.ts                   # CLI entry point
│   └── utils/
│       └── config.ts             # Configuration management
├── models/                       # ONNX model files (download separately)
├── scripts/
│   └── postinstall.js            # Post-install script
├── test/
│   └── audio-processor.test.ts   # Unit tests
├── package.json
├── tsconfig.json
└── PRD.md
```

## Development Environment Setup

### Prerequisites

- Node.js >= 20.0.0
- npm >= 9.0.0

### Installation

```bash
# Install dependencies
npm install

# Download the model files (see below)

# Run in development mode
npm run dev

# Run Electron in development mode
npm run dev:electron
```

## Model Download

### Recommended Models

#### 1. SenseVoice (recommended)

```bash
# Download from HuggingFace
# https://huggingface.co/FunAudioLLM/SenseVoice/tree/main

# Or use ModelScope
# https://www.modelscope.cn/models/iic/SenseVoiceSmall
```

Place `model.onnx` in the `models/` directory.

#### 2. Whisper ONNX

```bash
# HuggingFace
# https://huggingface.co/onnx-community/whisper-base

# Direct download
huggingface-cli download onnx-community/whisper-base --local-dir models/
```

#### 3. Paraformer (optimized for Chinese)

```bash
# ModelScope
# https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punct
```

### Model Configuration

Specify the model path in `src/main.ts` or in the settings UI:

```typescript
// Assumes src/core/index.ts re-exports SpeechRecognizer.
import { SpeechRecognizer } from './core';

const recognizer = new SpeechRecognizer({
  modelPath: './models/model.onnx', // path to the downloaded ONNX model
  language: 'zh',
  useVad: true,
  beamSize: 5,
});
```

## Development Commands

```bash
# Compile TypeScript
npm run build

# Run the CLI
npm start -- start

# Run tests
npm test

# Lint the code
npm run lint

# Build the Electron app
npm run build:electron
```

## Core Modules

### AudioRecorder (audio capture)

Captures audio data from the microphone.

```typescript
const recorder = new AudioRecorder({
  sampleRate: 16000,
  channels: 1,
  chunkDuration: 100,
});

recorder.on('data', (chunk: AudioChunk) => {
  // Handle the audio data
});

await recorder.start();
```

**Note**: The current implementation is based on the Web Audio API. In a pure Node.js environment you need a different approach (such as `node-audio` or Electron's audio APIs).

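
To make the `sampleRate` and `chunkDuration` settings concrete, the sketch below converts a chunk duration into a sample count and slices a buffer accordingly. It assumes `chunkDuration` is given in milliseconds, which this guide does not state explicitly. `samplesPerChunk` and `splitIntoChunks` are hypothetical helpers, not part of the project's API.

```typescript
// Hypothetical helpers; assumes chunkDuration is expressed in milliseconds.
function samplesPerChunk(sampleRate: number, chunkDurationMs: number): number {
  return Math.round((sampleRate * chunkDurationMs) / 1000);
}

// Split mono PCM samples into fixed-size chunks; the last chunk may be shorter.
function splitIntoChunks(samples: Float32Array, chunkSize: number): Float32Array[] {
  if (chunkSize <= 0 || Math.floor(chunkSize) !== chunkSize) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks: Float32Array[] = [];
  for (let offset = 0; offset < samples.length; offset += chunkSize) {
    // subarray creates a view without copying and clamps the end index.
    chunks.push(samples.subarray(offset, offset + chunkSize));
  }
  return chunks;
}

const size = samplesPerChunk(16000, 100); // 1600 samples per 100 ms chunk
const chunks = splitIntoChunks(new Float32Array(4000), size);
console.log(size, chunks.map((c) => c.length)); // 1600 [ 1600, 1600, 800 ]
```

At 16 kHz mono, a 100 ms chunk therefore carries 1600 samples, which is the granularity at which downstream stages would see new audio.
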
### SpeechRecognizer (speech recognition)

A speech recognition engine built on ONNX Runtime.

```typescript
const recognizer = new SpeechRecognizer({
  modelPath: './models/model.onnx',
  language: 'zh',
  useVad: true,
});

recognizer.on('result', (result: RecognitionResult) => {
  console.log(result.text);
});

await recognizer.initialize();
recognizer.start();
```

### TextOutput (text output)

Sends recognition results to the clipboard.

```typescript
const output = new TextOutput({
  outputMode: 'clipboard',
});

output.output({ text: '你好', isFinal: true, confidence: 0.95, timestamp: Date.now() });
```

### SimpleVAD (speech endpoint detection)

A simple energy-based VAD implementation.

```typescript
const vad = new SimpleVAD({
  energyThreshold: 0.01,
  silenceDuration: 500,
});

const { isSpeaking, isFinal } = vad.process(audioFrame, 16000);
```

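
For readers unfamiliar with energy-based detection, here is a minimal sketch of the general idea: compute the RMS energy of each frame, compare it with a threshold, and report the end of an utterance once enough trailing silence has accumulated. It is not the project's `SimpleVAD` implementation. `EnergyVadSketch` is a hypothetical class, and it assumes `energyThreshold` is an RMS value on samples in [-1, 1] and `silenceDuration` is in milliseconds.

```typescript
interface VadResult {
  isSpeaking: boolean; // the current frame is at or above the energy threshold
  isFinal: boolean;    // an utterance just ended (enough trailing silence)
}

// Hypothetical sketch of an energy-threshold VAD.
class EnergyVadSketch {
  private inSpeech = false;
  private silenceSamples = 0;
  private energyThreshold: number;
  private silenceDurationMs: number;

  constructor(energyThreshold = 0.01, silenceDurationMs = 500) {
    this.energyThreshold = energyThreshold;
    this.silenceDurationMs = silenceDurationMs;
  }

  process(frame: Float32Array, sampleRate: number): VadResult {
    // Root-mean-square energy of the frame.
    let sumSquares = 0;
    for (let i = 0; i < frame.length; i++) sumSquares += frame[i] * frame[i];
    const rms = frame.length > 0 ? Math.sqrt(sumSquares / frame.length) : 0;

    if (rms >= this.energyThreshold) {
      this.inSpeech = true;
      this.silenceSamples = 0;
      return { isSpeaking: true, isFinal: false };
    }
    if (!this.inSpeech) return { isSpeaking: false, isFinal: false };

    // Count trailing silence in samples to avoid floating-point drift.
    this.silenceSamples += frame.length;
    if (this.silenceSamples * 1000 >= this.silenceDurationMs * sampleRate) {
      this.inSpeech = false;
      this.silenceSamples = 0;
      return { isSpeaking: false, isFinal: true };
    }
    return { isSpeaking: false, isFinal: false };
  }
}

const loud = new Float32Array(1600).fill(0.1); // 100 ms of synthetic "speech"
const quiet = new Float32Array(1600);          // 100 ms of silence
const sketch = new EnergyVadSketch(0.01, 500);
console.log(sketch.process(loud, 16000)); // { isSpeaking: true, isFinal: false }
```

With these settings, five consecutive 100 ms silent frames after speech are needed before `isFinal` becomes `true`.
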
## Adding Support for a New Model

1. Create a model configuration file under `src/core/models/`:

```typescript
// src/core/models/sensevoice.ts
export const senseVoiceConfig = {
  inputShape: [1, 16000],
  outputKeys: ['output', 'logits'],
  // ...
};
```

2. Add the model adaptation logic to `SpeechRecognizer`.

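
One possible way to organize the adaptation logic in step 2 is a small registry that maps a model type to its configuration, so the recognizer can look up the input shape and output keys at load time. This is only a sketch: `ModelType`, `ModelAdapterConfig`, `registerModel`, and `getModelConfig` are hypothetical names, and only the SenseVoice values come from the configuration example in this guide.

```typescript
// Hypothetical types and registry; not the project's actual API.
type ModelType = "sensevoice" | "whisper" | "paraformer";

interface ModelAdapterConfig {
  inputShape: number[];
  outputKeys: string[];
}

const modelRegistry: Partial<Record<ModelType, ModelAdapterConfig>> = {};

function registerModel(type: ModelType, config: ModelAdapterConfig): void {
  modelRegistry[type] = config;
}

function getModelConfig(type: ModelType): ModelAdapterConfig {
  const config = modelRegistry[type];
  if (!config) {
    throw new Error(`No adapter registered for model type "${type}"`);
  }
  return config;
}

// Values copied from the senseVoiceConfig example in this guide.
registerModel("sensevoice", {
  inputShape: [1, 16000],
  outputKeys: ["output", "logits"],
});

console.log(getModelConfig("sensevoice").outputKeys); // [ 'output', 'logits' ]
```

Failing loudly on an unregistered type keeps a missing adapter from surfacing later as a confusing tensor-shape error.
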
## FAQ

### Q: How do I debug audio capture?

```typescript
recorder.on('data', (chunk) => {
  console.log('audio samples:', chunk.data.length, 'sample rate:', chunk.sampleRate);
});
```

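
Beyond length and sample rate, logging the signal level can help: an all-zero buffer often points to a wrong input device or a permissions problem, while peaks near 1.0 suggest clipping. The sketch below assumes `chunk.data` is a `Float32Array` normalized to [-1, 1], which this guide does not confirm. `describeChunk` is a hypothetical helper.

```typescript
interface ChunkStats {
  samples: number;
  durationMs: number;
  rms: number;  // average signal level
  peak: number; // largest absolute sample value
}

// Hypothetical debugging helper; assumes normalized Float32 samples.
function describeChunk(data: Float32Array, sampleRate: number): ChunkStats {
  let sumSquares = 0;
  let peak = 0;
  for (let i = 0; i < data.length; i++) {
    const v = Math.abs(data[i]);
    sumSquares += v * v;
    if (v > peak) peak = v;
  }
  return {
    samples: data.length,
    durationMs: (data.length / sampleRate) * 1000,
    rms: data.length > 0 ? Math.sqrt(sumSquares / data.length) : 0,
    peak,
  };
}

const stats = describeChunk(new Float32Array([0, 0.5, -0.5, 0]), 16000);
console.log(stats);
```

Inside the `data` handler, this could be called as `describeChunk(chunk.data, chunk.sampleRate)`, under the same assumption about the chunk layout.
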
### Q: Recognition latency is too high?

1. Use a quantized model (int8)
2. Reduce `chunkDuration`
3. Enable `useVad` to cut down on unnecessary recognition passes

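
To reason about which setting matters most, a rough additive model can help: the delay between the end of speech and a final result is approximately one chunk of buffering, plus the VAD trailing-silence window, plus inference time. This is a simplification under the assumption that a final result is only emitted after `silenceDuration` of silence. `estimateFinalResultLatency` is a hypothetical helper and the `inferenceMs` values are placeholders, not measurements.

```typescript
// Rough, assumption-based latency model; all inputs are in milliseconds.
interface LatencyInputs {
  chunkDurationMs: number;   // recorder chunk size
  silenceDurationMs: number; // VAD trailing-silence window
  inferenceMs: number;       // inference time you measured for one utterance
}

function estimateFinalResultLatency(inputs: LatencyInputs): number {
  return inputs.chunkDurationMs + inputs.silenceDurationMs + inputs.inferenceMs;
}

// inferenceMs values below are placeholders, not benchmark results.
const baseline = estimateFinalResultLatency({
  chunkDurationMs: 100,
  silenceDurationMs: 500,
  inferenceMs: 300,
});
const tuned = estimateFinalResultLatency({
  chunkDurationMs: 50,
  silenceDurationMs: 500,
  inferenceMs: 150,
});
console.log(baseline, tuned); // 900 700
```

Plugging in your own measured inference time shows whether the model or the buffering settings dominate the delay.
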
### Q: Electron packaging fails?

Check the `build` configuration in `package.json` and make sure the model files are included.

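
If the project is packaged with electron-builder (an assumption based on the `build` key in `package.json`), one common approach is to copy the models with an `extraResources` entry and resolve the path differently in development and in the packaged app. The sketch below takes the runtime values as parameters so it can run outside Electron. `resolveModelPath` and `RuntimeInfo` are hypothetical, and the `models` target folder is an assumed layout.

```typescript
// Assumed electron-builder fragment under the "build" key in package.json:
//   "extraResources": [{ "from": "models", "to": "models" }]

interface RuntimeInfo {
  isPackaged: boolean;   // in Electron: app.isPackaged
  resourcesPath: string; // in Electron: process.resourcesPath
  projectRoot: string;   // repository root during development
}

// Hypothetical helper: pick the models directory for the current runtime.
function resolveModelPath(info: RuntimeInfo, fileName = "model.onnx"): string {
  const base = info.isPackaged ? info.resourcesPath : info.projectRoot;
  // Joined with "/" for brevity; prefer path.join in real code.
  return [base.replace(/[\\/]+$/, ""), "models", fileName].join("/");
}

console.log(
  resolveModelPath({
    isPackaged: false,
    resourcesPath: "",
    projectRoot: "/work/impress-asr-input",
  }),
); // /work/impress-asr-input/models/model.onnx
```

Logging the resolved path at startup makes it easy to confirm whether the packaged app is looking where the model was actually copied.
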
## Contributing

1. Fork the project
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request