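Initial commit: Impress ASR Input project scaffolding

Features:
- ONNX-based speech recognition engine
- Multi-language support (Chinese, English, Japanese, Korean)
- Model loader (supports SenseVoice/Whisper/Paraformer)
- Audio capture and processing modules (VAD, resampling, normalization)
- Text output module (clipboard)
- CLI command-line tool
- Electron GUI
- Windows x64 packaging configuration

Docs:
- PRD (product requirements document)
- README (project overview)
- Development guide
- Windows build guide

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>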
commit
7c51542918
17
.gitignore
vendored
Normal file
@@ -0,0 +1,17 @@
node_modules/
dist/
release/
*.log
.DS_Store
Thumbs.db
.idea/
.vscode/
*.swp
*.swo
*~
.env
.env.local
models/*.onnx
models/*.ort
test/recordings/
coverage/
||||
232
PRD.md
Normal file
@@ -0,0 +1,232 @@
# Impress ASR Input - Product Requirements Document (PRD)

## 1. Document Information

| Item | Details |
|------|------|
| Product name | Impress ASR Input |
| Version | v1.0.0 |
| Created | 2026-05-15 |
| Tech stack | Node.js + ONNX Runtime |

---

## 2. Product Overview

### 2.1 Product Positioning

Impress ASR Input is a desktop speech recognition input tool built on Node.js. It uses the ONNX deep learning inference engine to provide high-accuracy multilingual speech-to-text.

### 2.2 Core Value

- **Runs locally**: no network connection required, protects privacy, no API call costs
- **Multi-language support**: Chinese, English, Japanese, Korean, and other languages
- **Low latency**: near-real-time recognition; short utterances are recognized within 1 second
- **Cross-platform**: supports Windows, macOS, and Linux

---

## 3. Functional Requirements

### 3.1 Core Features

#### F1 - Audio Capture
| Priority | Description |
|--------|------|
| P0 | Capture audio from the system default microphone |
| P0 | Select among different audio input devices |
| P1 | Configure audio parameters (sample rate, channel count) |
| P2 | Support external devices such as USB and Bluetooth headsets |

#### F2 - Speech Recognition Engine
| Priority | Description |
|--------|------|
| P0 | Local inference based on ONNX Runtime |
| P0 | Mandarin Chinese recognition |
| P0 | English recognition |
| P1 | Mixed Chinese/Japanese/English recognition |
| P1 | Voice activity detection (VAD) |
| P2 | More languages (Japanese, Korean, etc.) |

#### F3 - Text Output
| Priority | Description |
|--------|------|
| P0 | Display recognition results in real time |
| P0 | Copy text to the clipboard |
| P1 | Simulated keyboard input (triggered by a global hotkey) |
| P1 | View recognition history |
| P2 | Export to a text file |

#### F4 - Batch Transcription
| Priority | Description |
|--------|------|
| P0 | Import audio files in WAV/MP3/FLAC format |
| P0 | Batch file queue processing |
| P1 | Output SRT/VTT subtitle formats |
| P1 | Speaker separation (multi-channel scenarios) |
| P2 | Progress display and resumable transcription |

### 3.2 Supporting Features

#### F5 - User Interface
| Priority | Description |
|--------|------|
| P0 | Persistent system tray icon |
| P0 | Simple control panel (start/stop/configure) |
| P1 | Real-time waveform visualization during recognition |
| P1 | Dark/light theme switching |
| P2 | Multilingual interface (Chinese/English) |

#### F6 - Configuration Management
| Priority | Description |
|--------|------|
| P0 | Model file path configuration |
| P0 | Hotkey configuration (start/stop recording) |
| P1 | Recognition language selection |
| P1 | Output format configuration |
| P2 | Configuration file import/export |

---

## 4. Technical Architecture

### 4.1 Overall Architecture

```
┌───────────────────────────────────────────────────────────────────────────────┐
│                                   UI Layer                                    │
│  ┌────────────────────┐  ┌────────────────────┐  ┌─────────────────────────┐  │
│  │    System tray     │  │   Control panel    │  │   Recognition results   │  │
│  └────────────────────┘  └────────────────────┘  └─────────────────────────┘  │
└───────────────────────────────────────────────────────────────────────────────┘
                                        │
┌───────────────────────────────────────────────────────────────────────────────┐
│                             Business Logic Layer                              │
│  ┌────────────────────┐  ┌────────────────────┐  ┌─────────────────────────┐  │
│  │   Audio capture    │  │ Recognition engine │  │      Text output /      │  │
│  │       module       │  │       module       │  │ simulated input module  │  │
│  └────────────────────┘  └────────────────────┘  └─────────────────────────┘  │
└───────────────────────────────────────────────────────────────────────────────┘
                                        │
┌───────────────────────────────────────────────────────────────────────────────┐
│                               Core Engine Layer                               │
│  ┌───────────────────────────────────────────────────────────────────────────┐│
│  │                       ONNX Runtime inference engine                       ││
│  │ ┌─────────────────────┐  ┌────────────────┐  ┌──────────────────────────┐ ││
│  │ │ Audio preprocessing │  │ Acoustic model │  │ Language model / decoder │ ││
│  │ └─────────────────────┘  └────────────────┘  └──────────────────────────┘ ││
│  └───────────────────────────────────────────────────────────────────────────┘│
└───────────────────────────────────────────────────────────────────────────────┘
```

### 4.2 Technology Choices

| Module | Technology | Notes |
|------|----------|------|
| Runtime | Node.js 20+ | LTS release with support for recent ES features |
| UI framework | Electron | Cross-platform desktop application |
| ONNX inference | onnxruntime-node | Official Node.js binding |
| Audio capture | node-audio | Cross-platform audio API |
| Keyboard simulation | robotjs / @nut-tree/nut-js | Global hotkeys and text input |
| Build tool | Electron-Builder | Packaging and distribution |

### 4.3 Model Choices

| Model | Source | Notes |
|------|------|------|
| SenseVoice | Alibaba DAMO | Multilingual recognition, high accuracy |
| Whisper | OpenAI | Open-source multilingual model |
| Paraformer | Alibaba DAMO | Optimized for Chinese |

**Recommendation**: prefer the quantized (int8) ONNX versions of SenseVoice or Whisper to balance accuracy and performance.

---

## 5. Non-Functional Requirements

### 5.1 Performance Requirements

| Metric | Target | Notes |
|------|--------|------|
| First-character latency | < 500ms | Time from the start of speech until the first character appears |
| Short-utterance recognition | < 1s | Full recognition time for audio of 5 seconds or less |
| CPU usage | < 30% | Idle state, single-core usage |
| Memory usage | < 500MB | Baseline memory after the model is loaded |
| Model size | < 300MB | Single-language model, after quantization |

### 5.2 Compatibility Requirements

| Platform | Minimum version | Notes |
|------|----------|------|
| Windows | Windows 10 | x64 architecture |
| macOS | macOS 11+ | Intel / Apple Silicon |
| Linux | Ubuntu 20.04+ | glibc 2.31+ |

### 5.3 Security Requirements

- All audio data is processed locally and is never uploaded to the cloud
- No user voice samples are collected
- Configuration files contain no sensitive information

---

## 6. Project Milestones

### Phase 1 - MVP (v0.1.0)
- [ ] Project scaffolding
- [ ] ONNX Runtime integration
- [ ] Single-language (Chinese) recognition demo
- [ ] Basic command-line interface

### Phase 2 - Core Features (v0.5.0)
- [ ] Multi-language support (Chinese and English)
- [ ] Electron GUI
- [ ] Real-time recognition
- [ ] Clipboard output

### Phase 3 - Feature Completion (v1.0.0)
- [ ] Simulated keyboard input
- [ ] Batch file transcription
- [ ] Configuration management UI
- [ ] Installer packaging and distribution

### Phase 4 - Enhancements (v1.5.0+)
- [ ] More languages
- [ ] Speaker separation
- [ ] Custom hotwords
- [ ] Plugin system

---

## 7. Risk Assessment

| Risk | Probability | Impact | Mitigation |
|------|------|------|----------|
| Insufficient ONNX model performance | Medium | High | Prepare quantized models; optimize the inference pipeline |
| Cross-platform audio capture compatibility issues | High | Medium | Fallback: Web Audio API + Electron |
| Model files too large | Medium | Medium | Provide a model downloader with on-demand download |
| Keyboard simulation blocked by security software | Low | High | Provide allowlisting guidance; fall back to the clipboard |

---

## 8. Appendix

### 8.1 References

- [ONNX Runtime](https://onnxruntime.ai/)
- [SenseVoice Model](https://github.com/FunAudioLLM/SenseVoice)
- [Whisper ONNX](https://github.com/guillaumekln/faster-whisper)
- [Electron documentation](https://www.electronjs.org/docs)

### 8.2 Competitive Analysis

| Product | Strengths | Weaknesses |
|------|------|------|
| iFlytek Input Method (讯飞输入法) | High recognition accuracy | Requires network access; privacy concerns |
| Google Voice Typing | Good multi-language support | Requires Chrome; depends on the cloud |
| Whisper Desktop | Runs locally | High performance overhead; bare-bones interface |

---

**Document status**: First draft
**Next update**: After the technical review
||||
172
README.md
Normal file
@@ -0,0 +1,172 @@
# Impress ASR Input

A local speech recognition input tool based on ONNX, with multi-language real-time recognition and audio file transcription.

## Features

- 🎯 **Runs locally** - no network connection required, protects privacy, no API call costs
- 🌍 **Multi-language support** - Chinese, English, Japanese, Korean, and more
- ⚡ **Low latency** - near-real-time recognition; short utterances are recognized within 1 second
- 💻 **Cross-platform** - supports Windows, macOS, and Linux

## Quick Start

### System Requirements

- Node.js >= 20.0.0
- Windows 10+ / macOS 11+ / Ubuntu 20.04+

### Installation

```bash
# Clone the project
git clone <repository-url>
cd impress-asr-input

# Install dependencies
npm install

# Compile TypeScript
npm run build
```

### Add Model Files

Place the ONNX models in the `models/` directory:

```bash
models/
├── sensevoice.onnx      # Recommended: Alibaba DAMO Academy multilingual model
├── whisper.onnx         # OpenAI multilingual model
└── paraformer.onnx      # Alibaba DAMO Academy Chinese model
```

Model downloads:
- SenseVoice: https://huggingface.co/FunAudioLLM/SenseVoice
- Whisper: https://huggingface.co/onnx-community/whisper-base
- Paraformer: https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punct

### Usage

```bash
# CLI mode - start speech recognition
npm start -- start

# Specify a language
npm start -- start -l en

# Transcribe an audio file
npm start -- transcribe input.wav -o output.txt

# Electron GUI mode (requires installing electron first)
npm install electron --save-dev
npm run dev:electron
```

## Windows Build

### Option 1: Build on Windows (recommended)

```powershell
npm run build:win      # Build the installer
npm run build:win:zip  # Build the ZIP package
```

### Option 2: Build on Linux

```bash
export ELECTRON_MIRROR="https://npmmirror.com/mirrors/electron/"
npm run build:win:dir  # Build the unpacked version
```

See [docs/BUILD_WINDOWS.md](docs/BUILD_WINDOWS.md) for details.

## Project Structure

```
impress-asr-input/
├── src/
│   ├── core/
│   │   ├── audio-processor.ts   # Audio processing (VAD, resampling)
│   │   ├── audio-recorder.ts    # Audio capture module
│   │   ├── index.ts             # Module exports
│   │   ├── model-loader.ts      # Model loader
│   │   ├── speech-recognizer.ts # ONNX speech recognition engine
│   │   └── text-output.ts       # Text output module
│   ├── ui/
│   │   └── index.html           # Electron UI
│   ├── electron-main.ts         # Electron main process
│   ├── main.ts                  # CLI entry point
│   ├── preload.ts               # Electron preload script
│   └── utils/
│       └── config.ts            # Configuration management
├── models/                      # ONNX model file directory
│   ├── README.md                # Model notes
│   └── models.config.json       # Model configuration
├── docs/
│   ├── BUILD_WINDOWS.md         # Windows build guide
│   └── DEVELOPMENT.md           # Development guide
├── scripts/
│   ├── postinstall.js           # Post-install script
│   └── prepare-build.js         # Packaging preparation script
├── dist/                        # TypeScript build output
├── release/                     # Windows packaging output
├── test/
│   └── audio-processor.test.ts  # Unit tests
├── package.json
├── tsconfig.json
└── PRD.md                       # Product requirements document
```

## Roadmap

| Version | Status | Scope |
|------|------|------|
| v0.1.0 | ✅ Done | Scaffolding, single-language recognition demo |
| v0.5.0 | 🔄 In progress | Multi-language support, Electron GUI |
| v1.0.0 | ⏳ Planned | Keyboard simulation, batch transcription, packaging and distribution |

## Command-Line Options

```bash
# Start speech recognition
npm start -- start [options]

Options:
  -l, --language <lang>   Recognition language (default: zh)
  -m, --model <path>      Model file path
  -o, --output <mode>     Output mode: clipboard|keyboard|both (default: clipboard)

# Transcribe an audio file
npm start -- transcribe <file> [options]

Options:
  -l, --language <lang>   Recognition language (default: zh)
  -m, --model <path>      Model file path
  -o, --output <file>     Output file path
```

## Tech Stack

| Module | Technology |
|------|------|
| Runtime | Node.js 20+ |
| UI framework | Electron |
| ONNX inference | onnxruntime-web |
| Clipboard | clipboardy |
| Command line | commander |
| Build tool | electron-builder |

## License

MIT

## Contributing

Issues and pull requests are welcome!

## Related Resources

- [ONNX Runtime](https://onnxruntime.ai/)
- [SenseVoice Model](https://github.com/FunAudioLLM/SenseVoice)
- [Whisper ONNX](https://github.com/guillaumekln/faster-whisper)
- [Electron documentation](https://www.electronjs.org/docs)
||||
17
build/BUILD_README.txt
Normal file
@@ -0,0 +1,17 @@
Impress ASR Input - Windows Packaging Notes
===========================================

Build commands:
  npm run build:win      - Create the NSIS installer and ZIP package
  npm run build:win:dir  - Create only the unpacked directory

Output:
  release/Impress ASR Input-0.1.0-win-x64-setup.exe  (installer)
  release/Impress ASR Input-0.1.0-win-x64.zip        (archive)

Model files:
  Place downloaded ONNX models in the models/ directory
  Supported models: sensevoice.onnx, whisper.onnx, paraformer.onnx

Icon file:
  Place icon.ico (256x256) in the build/ directory
||||
122
docs/BUILD_WINDOWS.md
Normal file
@@ -0,0 +1,122 @@
# Windows Build Guide

## Overview

Because of network issues, building the Windows version on Linux may take several attempts. We recommend building directly on Windows, or using one of the methods below.

## Option 1: Build on Windows (recommended)

### 1. Prepare the environment

```powershell
# Install Node.js 20+
# Download the installer from https://nodejs.org

# Clone the project
git clone <repository-url>
cd impress-asr-input

# Install dependencies
npm install
```

### 2. Add model files

Place the ONNX models in the `models/` directory:
- `sensevoice.onnx`
- `whisper.onnx`
- `paraformer.onnx`

### 3. Build

```powershell
# Build the ZIP package (no signing required)
npm run build:win:zip

# Build the NSIS installer
npm run build:win
```

Output directory: `release/`

## Option 2: Build on Linux (requires a good network connection)

```bash
# Set the Electron mirror
export ELECTRON_MIRROR="https://npmmirror.com/mirrors/electron/"

# Build the unpacked directory version (for testing)
npm run build:win:dir

# Build the ZIP package
npm run build:win:zip
```

## Build Output

```
release/
├── win-unpacked/              # Unpacked version (for testing)
│   ├── Impress ASR Input.exe
│   ├── resources/
│   └── ...
├── Impress ASR Input-0.1.0-win-x64.zip        # ZIP archive
└── Impress ASR Input-0.1.0-win-x64-setup.exe  # NSIS installer
```

## FAQ

### 1. Electron download fails

```bash
# Use a mirror in mainland China
export ELECTRON_MIRROR="https://npmmirror.com/mirrors/electron/"
export npm_config_electron_mirror="https://npmmirror.com/mirrors/electron/"
```

### 2. winCodeSign download fails

This is the code-signing tool that electron-builder depends on. You can either:
- Add `"sign": null` to the `build` field in `package.json` to disable signing
- Or download it manually: https://github.com/electron-userland/electron-builder-binaries/releases

### 3. Icon file is missing

The build falls back to the default icon. To use a custom one:
1. Prepare `icon.ico` (256x256)
2. Place it in the `build/` directory
3. Add `"icon": "build/icon.ico"` to `package.json`

## Manual Distribution (without a packaging tool)

If electron-builder cannot be used, you can create the distribution package by hand:

```bash
# 1. Compile TypeScript
npm run build

# 2. Copy the compiled output and resource files
mkdir -p release/my-app/src
cp -r dist/ release/my-app/
cp -r node_modules/ release/my-app/node_modules/
cp -r src/ui/ release/my-app/src/ui/
cp -r models/ release/my-app/models/

# 3. Download Electron and add it to the package
# https://npmmirror.com/mirrors/electron/

# 4. Compress
cd release/
zip -r impress-asr-input-win-x64.zip my-app/
```

## Running the Application

After extracting, run:
```
Impress ASR Input.exe
```

Or use CLI mode:
```
node dist/main.js start
```
||||
225
docs/DEVELOPMENT.md
Normal file
@@ -0,0 +1,225 @@
# Development Guide

## Project Structure

```
impress-asr-input/
├── src/
│   ├── core/                    # Core modules
│   │   ├── audio-recorder.ts    # Audio capture
│   │   ├── audio-processor.ts   # Audio processing (VAD, resampling, etc.)
│   │   ├── speech-recognizer.ts # ONNX speech recognition engine
│   │   ├── text-output.ts       # Text output
│   │   └── index.ts             # Module exports
│   ├── ui/                      # Electron UI
│   │   └── index.html           # Main window
│   ├── electron-main.ts         # Electron main process
│   ├── preload.ts               # Electron preload script
│   ├── main.ts                  # CLI entry point
│   └── utils/
│       └── config.ts            # Configuration management
├── models/                      # ONNX model files (download separately)
├── scripts/
│   └── postinstall.js           # Post-install script
├── test/
│   └── audio-processor.test.ts  # Unit tests
├── package.json
├── tsconfig.json
└── PRD.md
```

## Development Environment Setup

### Prerequisites

- Node.js >= 20.0.0
- npm >= 9.0.0

### Installation Steps

```bash
# Install dependencies
npm install

# Download model files (see below)

# Run in development mode
npm run dev

# Run Electron in development mode
npm run dev:electron
```

## Model Downloads

### Recommended Models

#### 1. SenseVoice (recommended)

```bash
# Download from HuggingFace
# https://huggingface.co/FunAudioLLM/SenseVoice/tree/main

# Or use ModelScope
# https://www.modelscope.cn/models/iic/SenseVoiceSmall
```

Place `model.onnx` in the `models/` directory.

#### 2. Whisper ONNX

```bash
# HuggingFace
# https://huggingface.co/onnx-community/whisper-base

# Download directly
huggingface-cli download onnx-community/whisper-base --local-dir models/
```

#### 3. Paraformer (optimized for Chinese)

```bash
# ModelScope
# https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punct
```

### Model Configuration

Specify the model path in `src/main.ts` or in the settings UI:

```typescript
const recognizer = new SpeechRecognizer({
  modelPath: './models/model.onnx',
  language: 'zh',
  useVad: true,
  beamSize: 5,
});
```

## Development Commands

```bash
# Compile TypeScript
npm run build

# Run the CLI
npm start -- start

# Run tests
npm test

# Lint
npm run lint

# Build the Electron app
npm run build:electron
```

## Core Modules

### AudioRecorder (audio capture)

Captures audio data from the microphone.

```typescript
const recorder = new AudioRecorder({
  sampleRate: 16000,
  channels: 1,
  chunkDuration: 100,
});

recorder.on('data', (chunk: AudioChunk) => {
  // Process the audio data
});

await recorder.start();
```

**Note**: The current implementation is based on the Web Audio API. In a pure Node.js environment, a different approach is required (such as `node-audio` or Electron's audio APIs).
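
The environment check implied by this note might look like the following minimal sketch. The function name `detectAudioBackend` and the backend labels are hypothetical and not part of the project; the check itself mirrors the `window.navigator.mediaDevices` test used to recognize a browser or Electron renderer context.

```typescript
// Hypothetical helper: decide which capture path is available.
type AudioBackend = 'web-audio' | 'external';

function detectAudioBackend(globalObj: unknown = globalThis): AudioBackend {
  // A browser or Electron renderer exposes navigator.mediaDevices;
  // plain Node.js does not, so an external audio source is needed there.
  const nav = (globalObj as { navigator?: { mediaDevices?: unknown } }).navigator;
  return nav?.mediaDevices ? 'web-audio' : 'external';
}

console.log(detectAudioBackend());
```

Passing the global object as a parameter keeps the check easy to exercise in unit tests without a real browser.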

### SpeechRecognizer (speech recognition)

A speech recognition engine based on ONNX Runtime.

```typescript
const recognizer = new SpeechRecognizer({
  modelPath: './models/model.onnx',
  language: 'zh',
  useVad: true,
});

recognizer.on('result', (result: RecognitionResult) => {
  console.log(result.text);
});

await recognizer.initialize();
recognizer.start();
```

### TextOutput (text output)

Writes recognition results to the clipboard.

```typescript
const output = new TextOutput({
  outputMode: 'clipboard',
});

output.output({ text: '你好', isFinal: true, confidence: 0.95, timestamp: Date.now() });
```

### SimpleVAD (voice activity detection)

A simple energy-based VAD implementation.

```typescript
const vad = new SimpleVAD({
  energyThreshold: 0.01,
  silenceDuration: 500,
});

const { isSpeaking, isFinal } = vad.process(audioFrame, 16000);
```
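
As a rough illustration of the energy check, the sketch below computes the root-mean-square (RMS) level of a frame and compares it with a threshold. The helper names `rmsEnergy` and `isSpeechFrame` are illustrative only; the default threshold of 0.01 is taken from the `SimpleVAD` options shown above.

```typescript
// Illustrative helpers, not the project's API.
function rmsEnergy(frame: Float32Array): number {
  if (frame.length === 0) return 0; // avoid dividing by zero on empty frames
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

function isSpeechFrame(frame: Float32Array, energyThreshold = 0.01): boolean {
  return rmsEnergy(frame) > energyThreshold;
}

const silence = new Float32Array(160);         // all zeros
const tone = new Float32Array(160).fill(0.5);  // constant amplitude 0.5
console.log(isSpeechFrame(silence)); // false
console.log(isSpeechFrame(tone));    // true
```

A purely energy-based check like this is cheap but will also fire on loud background noise, so the threshold usually needs tuning per microphone.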

## Adding Support for a New Model

1. Create a model configuration file (the example below uses `src/core/models/`):

```typescript
// src/core/models/sensevoice.ts
export const senseVoiceConfig = {
  inputShape: [1, 16000],
  outputKeys: ['output', 'logits'],
  // ...
};
```

2. Add the model adaptation logic to `SpeechRecognizer`.
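
One way to structure step 2 is a small lookup table keyed by model name, so the recognizer can select tensor shapes and output names without branching on each model. This is only a sketch: the `ModelAdapterConfig` type, the registry, and both functions are hypothetical and do not describe the project's actual adapter code.

```typescript
// Hypothetical adapter registry; names are illustrative.
interface ModelAdapterConfig {
  inputShape: number[];
  outputKeys: string[];
}

const modelRegistry = new Map<string, ModelAdapterConfig>();

function registerModel(name: string, config: ModelAdapterConfig): void {
  modelRegistry.set(name.toLowerCase(), config);
}

function getModelConfig(name: string): ModelAdapterConfig {
  const config = modelRegistry.get(name.toLowerCase());
  if (!config) throw new Error(`Unsupported model: ${name}`);
  return config;
}

// Values copied from the senseVoiceConfig example in step 1.
registerModel('sensevoice', { inputShape: [1, 16000], outputKeys: ['output', 'logits'] });

console.log(getModelConfig('SenseVoice').outputKeys);
```

Failing loudly on an unknown model name surfaces configuration mistakes at startup rather than during inference.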

## FAQ

### Q: How do I debug audio capture?

```typescript
recorder.on('data', (chunk) => {
  console.log('Audio data:', chunk.data.length, 'sample rate:', chunk.sampleRate);
});
```

### Q: Recognition latency is high?

1. Use a quantized (int8) model
2. Reduce `chunkDuration`
3. Enable `useVad` to avoid running recognition on non-speech audio
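
To see why a smaller `chunkDuration` lowers latency, it helps to convert the duration into samples: each chunk has to be captured in full before it can be handed to the recognizer. The helper name `samplesPerChunk` below is made up for illustration.

```typescript
// Illustrative arithmetic only; not part of the project's API.
function samplesPerChunk(sampleRate: number, chunkDurationMs: number): number {
  return Math.floor((sampleRate * chunkDurationMs) / 1000);
}

console.log(samplesPerChunk(16000, 100)); // 1600
console.log(samplesPerChunk(16000, 50));  // 800
```

Halving `chunkDuration` halves the buffering delay per chunk, at the cost of more frequent callbacks.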

### Q: Electron packaging fails?

Check the `build` configuration in `package.json` and make sure the model files are included.

## Contributing

1. Fork the project
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
||||
59
models/README.md
Normal file
@@ -0,0 +1,59 @@
# Model Files

## Supported Models

This project supports the following ONNX speech recognition models:

### 1. SenseVoice (recommended)

- **Source**: Alibaba DAMO Academy, FunAudioLLM
- **Languages**: Chinese, English, Japanese, Korean
- **Sample rate**: 16000 Hz
- **Highlights**: high accuracy, low latency, mixed-language recognition

**Downloads**:
- HuggingFace: https://huggingface.co/FunAudioLLM/SenseVoice
- ModelScope: https://www.modelscope.cn/models/iic/SenseVoiceSmall

### 2. Whisper ONNX

- **Source**: OpenAI
- **Languages**: 90+ languages
- **Sample rate**: 16000 Hz
- **Highlights**: broadest multi-language support, high accuracy

**Downloads**:
- HuggingFace: https://huggingface.co/onnx-community/whisper-base

### 3. Paraformer

- **Source**: Alibaba DAMO Academy
- **Languages**: Chinese
- **Sample rate**: 16000 Hz
- **Highlights**: optimized for Chinese recognition, fast

**Downloads**:
- ModelScope: https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punct

## Installing Models

1. Download the ONNX model files from the links above
2. Place the model files in the `models/` directory
3. Name the model files as follows:
   - SenseVoice: `sensevoice.onnx`
   - Whisper: `whisper.onnx`
   - Paraformer: `paraformer.onnx`

## Model Priority

When several model files are present, the system loads them in the following priority order:

1. sensevoice.onnx (highest priority)
2. whisper.onnx
3. paraformer.onnx (lowest priority)
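
The priority order could be implemented roughly as follows. This is a sketch under the assumption that the loader receives the list of file names found in `models/`; the function `pickModel` is hypothetical and is not taken from `model-loader.ts`.

```typescript
// Hypothetical selection helper based on the priority list above.
const MODEL_PRIORITY = ['sensevoice.onnx', 'whisper.onnx', 'paraformer.onnx'] as const;

function pickModel(availableFiles: string[]): string | undefined {
  const available = new Set(availableFiles);
  // find() returns the first match, so the array order encodes the priority.
  return MODEL_PRIORITY.find((name) => available.has(name));
}

console.log(pickModel(['paraformer.onnx', 'whisper.onnx'])); // whisper.onnx
console.log(pickModel([]));                                   // undefined
```

Returning `undefined` when nothing matches lets the caller decide whether to prompt for a download or exit with an error.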

## Notes

- Model files are large (50MB - 300MB); downloading them separately is recommended
- Model files are not included in the Git repository
- Make sure the model files are in place before the first run
||||
30
models/models.config.json
Normal file
@@ -0,0 +1,30 @@
{
  "models": [
    {
      "name": "SenseVoice",
      "file": "sensevoice.onnx",
      "languages": ["zh", "en", "ja", "ko"],
      "sampleRate": 16000,
      "description": "Alibaba DAMO Academy multilingual speech recognition model (recommended)",
      "downloadUrl": "https://huggingface.co/FunAudioLLM/SenseVoice"
    },
    {
      "name": "Whisper",
      "file": "whisper.onnx",
      "languages": ["zh", "en", "ja", "ko", "de", "fr", "es"],
      "sampleRate": 16000,
      "description": "OpenAI multilingual speech recognition model",
      "downloadUrl": "https://huggingface.co/onnx-community/whisper-base"
    },
    {
      "name": "Paraformer",
      "file": "paraformer.onnx",
      "languages": ["zh"],
      "sampleRate": 16000,
      "description": "Alibaba DAMO Academy Chinese speech recognition model",
      "downloadUrl": "https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punct"
    }
  ],
  "defaultModel": "sensevoice.onnx",
  "modelsDirectory": "./models"
}
||||
6114
package-lock.json
generated
Normal file
File diff suppressed because it is too large
75
package.json
Normal file
@@ -0,0 +1,75 @@
{
  "name": "impress-asr-input",
  "version": "0.1.0",
  "description": "Local speech recognition input tool based on ONNX, with multi-language real-time recognition",
  "main": "dist/main.js",
  "type": "module",
  "scripts": {
    "dev": "tsx watch src/main.ts",
    "build": "tsc",
    "start": "node dist/main.js",
    "dev:electron": "electron .",
    "build:electron": "tsc && electron-builder",
    "build:win": "cross-env ELECTRON_MIRROR=https://npmmirror.com/mirrors/electron/ npm run build:electron -- --win --x64",
    "build:win:zip": "cross-env ELECTRON_MIRROR=https://npmmirror.com/mirrors/electron/ tsc && electron-builder --win zip --x64 --publish=never",
    "build:win:dir": "cross-env ELECTRON_MIRROR=https://npmmirror.com/mirrors/electron/ tsc && electron-builder --win --x64 --dir --publish=never",
    "test": "vitest run",
    "lint": "eslint src --ext .ts"
  },
  "keywords": [
    "asr",
    "speech-to-text",
    "onnx",
    "voice-input",
    "electron"
  ],
  "author": "",
  "license": "MIT",
  "devDependencies": {
    "@types/node": "^20.11.0",
    "cross-env": "^7.0.3",
    "electron": "^28.0.0",
    "electron-builder": "^24.9.0",
    "typescript": "^5.3.0",
    "vitest": "^1.2.0"
  },
  "dependencies": {
    "clipboardy": "^4.0.0",
    "commander": "^12.0.0",
    "onnxruntime-web": "^1.17.0"
  },
  "build": {
    "appId": "com.impress.asr-input",
    "productName": "Impress ASR Input",
    "buildVersion": "1.0.0",
    "publish": null,
    "directories": {
      "output": "release",
      "buildResources": "build"
    },
    "files": [
      "dist/**/*",
      "src/ui/**/*",
      "models/**/*"
    ],
    "extraResources": [
      "models/*.onnx"
    ],
    "win": {
      "target": "zip",
      "requestedExecutionLevel": "asInvoker",
      "artifactName": "${productName}-${version}-win-${arch}.${ext}"
    },
    "mac": {
      "target": ["dmg", "zip"],
      "category": "public.app-category.utilities"
    },
    "linux": {
      "target": ["AppImage", "deb"],
      "category": "Utility"
    }
  },
  "engines": {
    "node": ">=20.0.0"
  }
}
||||
17
scripts/postinstall.js
Normal file
@@ -0,0 +1,17 @@
/**
 * Post-install script
 * Reminds the user to download the model files
 */

import { writeFileSync } from 'fs';
import { join } from 'path';

const modelsDir = join(process.cwd(), 'models');

console.log('\n=== Impress ASR Input installed ===\n');
console.log('Model files must be downloaded separately. Supported models:');
console.log('  - SenseVoice: https://github.com/FunAudioLLM/SenseVoice');
console.log('  - Whisper ONNX: https://huggingface.co/onnx-community/whisper-base');
console.log('  - Paraformer: https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punct');
console.log('\nPlace the downloaded .onnx model files in:', modelsDir);
console.log('\n');
||||
77
scripts/prepare-build.js
Normal file
@@ -0,0 +1,77 @@
/**
 * Build script: prepare for Windows packaging
 */

import { mkdirSync, existsSync, writeFileSync } from 'fs';
import { join } from 'path';

const rootDir = process.cwd();
const buildDir = join(rootDir, 'build');
const modelsDir = join(rootDir, 'models');

console.log('🔧 Preparing Windows packaging...\n');

// Create the build directory
if (!existsSync(buildDir)) {
  mkdirSync(buildDir, { recursive: true });
  console.log('✅ Created the build directory');
}

// Create a placeholder icon file (replace it with a real icon before release)
const icoPath = join(buildDir, 'icon.ico');
if (!existsSync(icoPath)) {
  // Write an empty placeholder file
  writeFileSync(icoPath, '');
  console.log('⚠️  Icon file not found; created a placeholder');
  console.log('   Replace it with a real icon.ico file (256x256 recommended)');
}

// Check the model files
const modelFiles = ['sensevoice.onnx', 'whisper.onnx', 'paraformer.onnx'];
const foundModels = [];
const missingModels = [];

for (const model of modelFiles) {
  const modelPath = join(modelsDir, model);
  if (existsSync(modelPath)) {
    foundModels.push(model);
  } else {
    missingModels.push(model);
  }
}

console.log('📦 Model file check:');
if (foundModels.length > 0) {
  console.log(`   ✅ Found: ${foundModels.join(', ')}`);
}
if (missingModels.length > 0) {
  console.log(`   ⚠️  Missing: ${missingModels.join(', ')}`);
  console.log('   Packaging will continue, but users must download these models themselves');
}

// Write the Windows packaging notes
const readmePath = join(buildDir, 'BUILD_README.txt');
writeFileSync(readmePath, `Impress ASR Input - Windows Packaging Notes
===========================================

Build commands:
  npm run build:win      - Create the NSIS installer and ZIP package
  npm run build:win:dir  - Create only the unpacked directory

Output:
  release/Impress ASR Input-0.1.0-win-x64-setup.exe  (installer)
  release/Impress ASR Input-0.1.0-win-x64.zip        (archive)

Model files:
  Place downloaded ONNX models in the models/ directory
  Supported models: sensevoice.onnx, whisper.onnx, paraformer.onnx

Icon file:
  Place icon.ico (256x256) in the build/ directory
`);

console.log('\n✅ Packaging preparation complete');
console.log('\nNext steps:');
console.log('  1. Place the model files in the models/ directory (optional)');
console.log('  2. Place icon.ico in the build/ directory (optional)');
console.log('  3. Run: npm run build:win');
||||
154
src/core/audio-processor.ts
Normal file
@@ -0,0 +1,154 @@
/**
 * Audio processor
 * Preprocesses audio data, including resampling and normalization
 */

/**
 * Audio normalization
 * Normalizes an Int16Array or Float32Array to [-1, 1]
 */
export function normalizeAudio(
  data: Int16Array | Float32Array,
  inputSampleRate: number,
  outputSampleRate: number = 16000
): Float32Array {
  // Convert to Float32Array and normalize
  let normalized: Float32Array;

  if (data instanceof Int16Array) {
    normalized = new Float32Array(data.length);
    for (let i = 0; i < data.length; i++) {
      normalized[i] = data[i] / 32768.0;
    }
  } else {
    normalized = data;
  }

  // Resample
  if (inputSampleRate !== outputSampleRate) {
    return resample(normalized, inputSampleRate, outputSampleRate);
  }

  return normalized;
}

/**
 * Resampling
 * Resamples using linear interpolation
 */
export function resample(
  data: Float32Array,
  fromSampleRate: number,
  toSampleRate: number
): Float32Array {
  if (fromSampleRate === toSampleRate) {
    return data;
  }

  const ratio = fromSampleRate / toSampleRate;
  const newLength = Math.floor(data.length / ratio);
  const result = new Float32Array(newLength);

  for (let i = 0; i < newLength; i++) {
    const position = i * ratio;
    const index = Math.floor(position);
    const fraction = position - index;

    if (index + 1 < data.length) {
      result[i] = data[index] * (1 - fraction) + data[index + 1] * fraction;
    } else {
      result[i] = data[index];
    }
  }

  return result;
}

/**
 * Voice activity detection (VAD)
 * A simple energy-based implementation
 */
export class SimpleVAD {
  private energyThreshold: number;
  private silenceDuration: number;
  private silenceStart: number | null = null;
  private isSpeaking: boolean = false;

  constructor(options: { energyThreshold?: number; silenceDuration?: number } = {}) {
    this.energyThreshold = options.energyThreshold ?? 0.01;
    this.silenceDuration = options.silenceDuration ?? 500; // ms
  }

  /**
   * Processes an audio frame and reports whether speech was detected
   */
  process(frame: Float32Array, _sampleRate: number): { isSpeaking: boolean; isFinal: boolean } {
    // Compute the energy
    const energy = this.calculateEnergy(frame);

    if (energy > this.energyThreshold) {
      this.isSpeaking = true;
      this.silenceStart = null;
      return { isSpeaking: true, isFinal: false };
    } else if (this.isSpeaking) {
      // Detect silence
      if (this.silenceStart === null) {
        this.silenceStart = Date.now();
      }

      const silenceElapsed = Date.now() - this.silenceStart;

      if (silenceElapsed >= this.silenceDuration) {
        // End of speech
        this.isSpeaking = false;
        this.silenceStart = null;
        return { isSpeaking: false, isFinal: true };
      }

      return { isSpeaking: true, isFinal: false };
    }

    return { isSpeaking: false, isFinal: false };
  }

  /**
   * Computes the audio energy (RMS)
   */
  private calculateEnergy(frame: Float32Array): number {
    let sum = 0;
    for (let i = 0; i < frame.length; i++) {
      sum += frame[i] * frame[i];
    }
    return Math.sqrt(sum / frame.length);
  }

  /**
   * Resets the state
   */
  reset(): void {
    this.isSpeaking = false;
    this.silenceStart = null;
  }
}

/**
 * Framing
 * Splits continuous audio into fixed-length frames
 */
export function frameAudio(
  data: Float32Array,
  frameSize: number,
  hopSize: number
): Float32Array[] {
  const frames: Float32Array[] = [];

  for (let i = 0; i <= data.length - frameSize; i += hopSize) {
    const frame = new Float32Array(frameSize);
    for (let j = 0; j < frameSize; j++) {
      frame[j] = data[i + j];
    }
    frames.push(frame);
  }

  return frames;
}
|
||||
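The framing and energy-based detection above can be combined to cut a recording into utterances. The following is a minimal sketch, not part of the commit: it assumes 16 kHz mono audio and 20 ms non-overlapping frames, and the helpers `rms` and `findUtterances` are hypothetical names that re-implement the energy check inline so the example runs on its own.

```typescript
// Hypothetical helper: root-mean-square energy of one frame.
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) sum += frame[i] * frame[i];
  return Math.sqrt(sum / frame.length);
}

interface Utterance {
  startSample: number; // first sample of the utterance
  endSample: number;   // one past the last voiced sample
}

// Hypothetical helper: an utterance closes once `hangoverMs` of audio time
// has passed without a voiced frame.
function findUtterances(
  data: Float32Array,
  sampleRate: number,
  frameMs = 20,
  threshold = 0.01,
  hangoverMs = 500
): Utterance[] {
  const frameSize = Math.floor((sampleRate * frameMs) / 1000);
  const utterances: Utterance[] = [];
  let start = -1;     // -1 while no utterance is open
  let lastVoiced = 0; // sample index just past the last voiced frame

  for (let i = 0; i + frameSize <= data.length; i += frameSize) {
    const voiced = rms(data.subarray(i, i + frameSize)) > threshold;
    if (voiced) {
      if (start < 0) start = i;
      lastVoiced = i + frameSize;
    } else if (start >= 0) {
      const silenceMs = ((i + frameSize - lastVoiced) / sampleRate) * 1000;
      if (silenceMs >= hangoverMs) {
        utterances.push({ startSample: start, endSample: lastVoiced });
        start = -1;
      }
    }
  }
  // Close an utterance that is still open at the end of the data
  if (start >= 0) utterances.push({ startSample: start, endSample: lastVoiced });
  return utterances;
}

// Synthetic input: 1 s of silence, 1 s of a 440 Hz tone, 1 s of silence.
const sr = 16000;
const signal = new Float32Array(sr * 3);
for (let n = sr; n < 2 * sr; n++) {
  signal[n] = 0.5 * Math.sin((2 * Math.PI * 440 * n) / sr);
}
const found = findUtterances(signal, sr);
console.log(found);
```

Measuring silence in samples keeps the segmentation identical whether the audio arrives live or is read from a file at full speed.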
173
src/core/audio-recorder.ts
Normal file
@ -0,0 +1,173 @@
/**
 * Audio capture module.
 * Captures audio from the microphone and preprocesses it.
 *
 * Note: the current implementation is a framework demo. Real audio capture requires:
 * - Electron: navigator.mediaDevices (in the renderer process)
 * - Node.js: node-audio or a similar library
 */

import { EventEmitter } from 'events';

export interface AudioConfig {
  sampleRate: number; // sample rate, default 16000
  channels: number; // channel count, default 1 (mono)
  chunkDuration: number; // chunk duration (ms), default 100
  deviceId?: string; // audio device ID (optional)
}

export interface AudioChunk {
  data: Float32Array; // audio samples (normalized to [-1, 1])
  sampleRate: number;
  timestamp: number;
}

export class AudioRecorder extends EventEmitter {
  private config: AudioConfig;
  private isRecording: boolean = false;
  private stream: any = null;
  private audioContext: any = null;
  private source: any = null;
  private processor: any = null;
  private demoTimer: ReturnType<typeof setInterval> | null = null;

  constructor(config: Partial<AudioConfig> = {}) {
    super();
    this.config = {
      sampleRate: config.sampleRate ?? 16000,
      channels: config.channels ?? 1,
      chunkDuration: config.chunkDuration ?? 100,
      deviceId: config.deviceId,
    };
  }

  /**
   * Start recording
   */
  async start(): Promise<void> {
    if (this.isRecording) {
      throw new Error('Already recording');
    }

    // Check whether we are in a browser / Electron renderer process
    if (typeof window !== 'undefined' && window.navigator?.mediaDevices) {
      await this.startInBrowser();
    } else {
      // Node.js environment: needs an external audio input
      this.startInNode();
    }
  }

  /**
   * Recording in a browser environment
   */
  private async startInBrowser(): Promise<void> {
    try {
      const constraints = {
        audio: {
          sampleRate: this.config.sampleRate,
          channelCount: this.config.channels,
          deviceId: this.config.deviceId ? { exact: this.config.deviceId } : undefined,
        },
      };

      this.stream = await window.navigator.mediaDevices.getUserMedia(constraints);
      const AudioContextClass = window.AudioContext || (window as any).webkitAudioContext;
      this.audioContext = new AudioContextClass({ sampleRate: this.config.sampleRate });
      this.source = this.audioContext.createMediaStreamSource(this.stream);

      // createScriptProcessor only accepts power-of-two buffer sizes from 256 to 16384,
      // so round the requested chunk length up to the next valid size
      // (1600 samples for 100 ms at 16 kHz becomes 2048).
      const requested = Math.floor(this.config.sampleRate * (this.config.chunkDuration / 1000));
      let bufferSize = 256;
      while (bufferSize < requested && bufferSize < 16384) {
        bufferSize *= 2;
      }
      this.processor = this.audioContext.createScriptProcessor(bufferSize, 1, 1);

      this.processor.onaudioprocess = (event: any) => {
        const inputData = event.inputBuffer.getChannelData(0);
        const chunk: AudioChunk = {
          data: new Float32Array(inputData),
          // Report the rate the context actually runs at
          sampleRate: this.audioContext?.sampleRate ?? this.config.sampleRate,
          timestamp: Date.now(),
        };
        this.emit('data', chunk);
      };

      this.source.connect(this.processor);
      this.processor.connect(this.audioContext.destination);

      this.isRecording = true;
      this.emit('start');
    } catch (error) {
      this.emit('error', error);
      throw error;
    }
  }

  /**
   * Recording in Node.js (demo mode).
   * Real use requires a library such as node-audio.
   */
  private startInNode(): void {
    console.warn('Audio capture in Node.js requires electron or the node-audio library');
    console.warn('Running in demo mode; no audio will be captured');
    this.isRecording = true;
    this.emit('start');

    // Demo: emit silent chunks at a fixed interval.
    // Typed array lengths must be integers, hence Math.floor.
    const samplesPerChunk = Math.floor(this.config.sampleRate * (this.config.chunkDuration / 1000));
    this.demoTimer = setInterval(() => {
      this.emit('data', {
        data: new Float32Array(samplesPerChunk),
        sampleRate: this.config.sampleRate,
        timestamp: Date.now(),
      });
    }, this.config.chunkDuration);
  }

  /**
   * Stop recording
   */
  stop(): void {
    if (!this.isRecording) {
      return;
    }

    // Clear the demo timer here so a quick stop/start cannot leave two timers running
    if (this.demoTimer) {
      clearInterval(this.demoTimer);
      this.demoTimer = null;
    }
    if (this.processor) {
      this.processor.disconnect();
      this.processor = null;
    }
    if (this.source) {
      this.source.disconnect();
      this.source = null;
    }
    if (this.stream) {
      const tracks = this.stream.getTracks?.() || this.stream.tracks || [];
      tracks.forEach((track: any) => track.stop?.());
      this.stream = null;
    }
    if (this.audioContext) {
      this.audioContext.close?.();
      this.audioContext = null;
    }

    this.isRecording = false;
    this.emit('stop');
  }

  /**
   * List available audio input devices (browser environments only)
   */
  static async listDevices(): Promise<any[]> {
    if (typeof window !== 'undefined' && window.navigator?.mediaDevices) {
      const devices = await window.navigator.mediaDevices.enumerateDevices();
      return devices.filter((device: any) => device.kind === 'audioinput');
    }
    return [];
  }

  /**
   * Whether recording is in progress
   */
  get recording(): boolean {
    return this.isRecording;
  }
}
||||
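`createScriptProcessor` only accepts power-of-two buffer sizes between 256 and 16384, so a 100 ms chunk at 16 kHz (1600 samples) cannot be requested directly. One way to still hand fixed-length chunks to downstream code is to capture with a valid buffer size and re-slice the stream. The sketch below illustrates that idea; `validBufferSize` and `Rechunker` are hypothetical names, not part of the commit.

```typescript
// Hypothetical helper: smallest power of two >= n, clamped to 256..16384.
function validBufferSize(n: number): number {
  let size = 256;
  while (size < n && size < 16384) size *= 2;
  return size;
}

// Hypothetical helper: collects arbitrarily sized buffers and returns
// fixed-size chunks, keeping the remainder for the next call.
class Rechunker {
  private pending = new Float32Array(0);
  private chunkSize: number;

  constructor(chunkSize: number) {
    this.chunkSize = chunkSize;
  }

  push(input: Float32Array): Float32Array[] {
    const merged = new Float32Array(this.pending.length + input.length);
    merged.set(this.pending);
    merged.set(input, this.pending.length);

    const chunks: Float32Array[] = [];
    let offset = 0;
    while (offset + this.chunkSize <= merged.length) {
      chunks.push(merged.slice(offset, offset + this.chunkSize));
      offset += this.chunkSize;
    }
    this.pending = merged.slice(offset); // leftover samples
    return chunks;
  }
}

// Four capture callbacks of 2048 samples, re-sliced into 1600-sample chunks.
const bufferSize = validBufferSize(1600);
const rechunker = new Rechunker(1600);
const emitted: number[] = [];
for (let i = 0; i < 4; i++) {
  emitted.push(rechunker.push(new Float32Array(bufferSize)).length);
}
console.log(bufferSize, emitted);
```

In the recorder, `push` would be called from `onaudioprocess`, and each returned chunk emitted as one `data` event.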
9
src/core/index.ts
Normal file
@ -0,0 +1,9 @@
/**
 * Core module index
 */

export { AudioRecorder, type AudioConfig, type AudioChunk } from './audio-recorder.js';
export { SpeechRecognizer, type RecognizerConfig, type RecognitionResult } from './speech-recognizer.js';
export { TextOutput, type TextOutputConfig } from './text-output.js';
export { normalizeAudio, resample, SimpleVAD, frameAudio } from './audio-processor.js';
export { ModelLoader, type ModelConfig, MODEL_CONFIGS } from './model-loader.js';
183
src/core/model-loader.ts
Normal file
@ -0,0 +1,183 @@
/**
 * Model loader.
 * Loads and manages ONNX models.
 */

import { existsSync } from 'fs';
import { join } from 'path';
import * as ort from 'onnxruntime-web';

export interface ModelConfig {
  name: string;
  path: string;
  language: string[];
  sampleRate: number;
  inputShape: number[];
  description: string;
}

// Predefined model configurations
export const MODEL_CONFIGS: Record<string, ModelConfig> = {
  sensevoice: {
    name: 'SenseVoice',
    path: './models/sensevoice.onnx',
    language: ['zh', 'en', 'ja', 'ko'],
    sampleRate: 16000,
    inputShape: [1, 16000],
    description: 'Alibaba DAMO Academy multilingual speech recognition model',
  },
  whisper: {
    name: 'Whisper',
    path: './models/whisper.onnx',
    language: ['zh', 'en', 'ja', 'ko', 'de', 'fr', 'es'],
    sampleRate: 16000,
    inputShape: [1, 480000], // 30 seconds of audio
    description: 'OpenAI multilingual speech recognition model',
  },
  paraformer: {
    name: 'Paraformer',
    path: './models/paraformer.onnx',
    language: ['zh'],
    sampleRate: 16000,
    inputShape: [1, 16000],
    description: 'Alibaba DAMO Academy Chinese speech recognition model',
  },
};

export class ModelLoader {
  private session: ort.InferenceSession | null = null;
  private config: ModelConfig | null = null;

  /**
   * List the predefined models whose files exist on disk
   * (paths are resolved against the current working directory)
   */
  static getAvailableModels(): ModelConfig[] {
    return Object.values(MODEL_CONFIGS).filter((config) =>
      existsSync(config.path)
    );
  }

  /**
   * Check whether the file for a predefined model exists
   */
  static checkModelExists(modelName: string): boolean {
    const config = MODEL_CONFIGS[modelName];
    if (!config) return false;
    return existsSync(config.path);
  }

  /**
   * Load the first usable model found in a directory
   */
  static async loadFromDir(
    modelsDir: string
  ): Promise<{ session: ort.InferenceSession; config: ModelConfig } | null> {
    // Look for models in priority order
    const modelOrder = ['sensevoice.onnx', 'whisper.onnx', 'paraformer.onnx'];

    for (const modelName of modelOrder) {
      const modelPath = join(modelsDir, modelName);
      if (existsSync(modelPath)) {
        try {
          const session = await ort.InferenceSession.create(modelPath);
          const known = Object.values(MODEL_CONFIGS).find((c) =>
            c.path.endsWith(modelName)
          );
          // Record the path that was actually loaded, not the default from MODEL_CONFIGS
          const config: ModelConfig = known
            ? { ...known, path: modelPath }
            : {
                name: modelName.replace('.onnx', ''),
                path: modelPath,
                language: ['zh'],
                sampleRate: 16000,
                inputShape: [1, 16000],
                description: 'Custom model',
              };
          return { session, config };
        } catch (error) {
          console.warn(`Failed to load model ${modelName}:`, error);
        }
      }
    }

    return null;
  }

  /**
   * Load a model by predefined name or by file path
   */
  async load(modelNameOrPath: string): Promise<void> {
    let modelPath: string;
    let modelConfig: ModelConfig | undefined;

    // Check whether this is a predefined model name
    if (MODEL_CONFIGS[modelNameOrPath]) {
      modelConfig = MODEL_CONFIGS[modelNameOrPath];
      modelPath = modelConfig.path;
    } else {
      // Treat the argument as a path
      modelPath = modelNameOrPath;
      modelConfig = {
        name: 'custom',
        path: modelPath,
        language: ['zh'],
        sampleRate: 16000,
        inputShape: [1, 16000],
        description: 'Custom model path',
      };
    }

    if (!existsSync(modelPath)) {
      throw new Error(`Model file not found: ${modelPath}`);
    }

    try {
      const sessionOptions: ort.InferenceSession.SessionOptions = {
        executionProviders: ['cpu'],
        graphOptimizationLevel: 'all',
        intraOpNumThreads: 4,
      };

      const session = await ort.InferenceSession.create(modelPath, sessionOptions);
      // Release any previously loaded model so its session is not leaked
      await this.release();
      this.session = session;
      this.config = modelConfig;

      console.log(`✅ Model loaded: ${modelConfig.name}`);
      console.log(`   Languages: ${modelConfig.language.join(', ')}`);
      console.log(`   Sample rate: ${modelConfig.sampleRate}Hz`);
    } catch (error) {
      throw new Error(`Failed to load model: ${error}`);
    }
  }

  /**
   * Get the configuration of the currently loaded model
   */
  getConfig(): ModelConfig | null {
    return this.config;
  }

  /**
   * Get the inference session
   */
  getSession(): ort.InferenceSession | null {
    return this.session;
  }

  /**
   * Run inference
   */
  async run(feeds: Record<string, ort.Tensor>): Promise<Record<string, ort.Tensor>> {
    if (!this.session) {
      throw new Error('Model not loaded');
    }
    return await this.session.run(feeds);
  }

  /**
   * Release model resources
   */
  async release(): Promise<void> {
    if (this.session) {
      await this.session.release();
      this.session = null;
      this.config = null;
    }
  }
}
||||
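Because each entry in `MODEL_CONFIGS` lists the languages it supports, a caller could choose a model from the requested language instead of a fixed file name. The sketch below is one possible way to do that, under stated assumptions: it uses a trimmed-down `ModelInfo` type, a hypothetical `pickModel` helper, and an injected `exists` predicate so it runs without any model files on disk.

```typescript
// Trimmed-down stand-in for ModelConfig (only the fields this example needs).
interface ModelInfo {
  name: string;
  path: string;
  language: string[];
}

// Same order and language lists as the predefined configurations.
const MODELS: ModelInfo[] = [
  { name: 'SenseVoice', path: './models/sensevoice.onnx', language: ['zh', 'en', 'ja', 'ko'] },
  { name: 'Whisper', path: './models/whisper.onnx', language: ['zh', 'en', 'ja', 'ko', 'de', 'fr', 'es'] },
  { name: 'Paraformer', path: './models/paraformer.onnx', language: ['zh'] },
];

// Hypothetical helper: first model, in list order, that supports the
// language and whose file is present.
function pickModel(
  language: string,
  models: ModelInfo[],
  exists: (path: string) => boolean
): ModelInfo | null {
  for (const model of models) {
    if (model.language.indexOf(language) !== -1 && exists(model.path)) {
      return model;
    }
  }
  return null;
}

// Pretend only two of the three model files were downloaded.
const onDisk = ['./models/whisper.onnx', './models/paraformer.onnx'];
const fileExists = (p: string): boolean => onDisk.indexOf(p) !== -1;

const forGerman = pickModel('de', MODELS, fileExists);
const forChinese = pickModel('zh', MODELS, fileExists);
const forThai = pickModel('th', MODELS, fileExists);
console.log(forGerman?.name, forChinese?.name, forThai);
```

In the real loader, `exists` would be `existsSync` and the list would come from `Object.values(MODEL_CONFIGS)`.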
201
src/core/speech-recognizer.ts
Normal file
@ -0,0 +1,201 @@
/**
 * Speech recognition engine.
 * Runs speech recognition inference on ONNX Runtime.
 */

import * as ort from 'onnxruntime-web';
import { EventEmitter } from 'events';
import { AudioChunk } from './audio-recorder.js';
import { ModelLoader, type ModelConfig } from './model-loader.js';

export interface RecognizerConfig {
  modelPath: string; // model file path
  language: string; // recognition language: 'zh', 'en', 'ja', 'ko', ...
  useVad: boolean; // whether to use voice activity detection
  beamSize: number; // beam search width
}

export interface RecognitionResult {
  text: string; // recognized text
  confidence: number; // confidence score
  isFinal: boolean; // whether this is a final result
  timestamp: number; // timestamp
}

export class SpeechRecognizer extends EventEmitter {
  private config: RecognizerConfig;
  private modelLoader: ModelLoader;
  private isRecognizing: boolean = false;
  private audioBuffer: Float32Array = new Float32Array(0);
  // Sample rate of the most recent chunk, used when flushing on stop
  private lastSampleRate: number = 16000;
  private readonly MAX_BUFFER_SECONDS = 30;

  constructor(config: RecognizerConfig) {
    super();
    this.config = config;
    this.modelLoader = new ModelLoader();
  }

  /**
   * Initialize the engine (load the model)
   */
  async initialize(): Promise<void> {
    try {
      await this.modelLoader.load(this.config.modelPath);
      this.emit('ready');
    } catch (error) {
      this.emit('error', new Error(`Failed to load model: ${error}`));
      throw error;
    }
  }

  /**
   * Process one chunk of audio
   */
  async processAudio(chunk: AudioChunk): Promise<void> {
    if (!this.isRecognizing) {
      return;
    }
    this.lastSampleRate = chunk.sampleRate;

    // Append the chunk to the buffer
    const newBuffer = new Float32Array(this.audioBuffer.length + chunk.data.length);
    newBuffer.set(this.audioBuffer);
    newBuffer.set(chunk.data, this.audioBuffer.length);
    this.audioBuffer = newBuffer;

    // Trim the buffer if it exceeds the maximum length, keeping the newer half.
    // recognize() clears the buffer after each successful run, so this is only a safeguard.
    const maxSamples = this.config.useVad
      ? chunk.sampleRate * this.MAX_BUFFER_SECONDS
      : chunk.sampleRate * 5;

    if (this.audioBuffer.length > maxSamples) {
      const keepStart = Math.floor(this.audioBuffer.length / 2);
      this.audioBuffer = this.audioBuffer.slice(keepStart);
    }

    // Run recognition
    await this.recognize(chunk.sampleRate);
  }

  /**
   * Run recognition on the buffered audio
   */
  private async recognize(sampleRate: number): Promise<void> {
    const modelConfig = this.modelLoader.getConfig();
    if (!modelConfig || this.audioBuffer.length === 0) {
      return;
    }

    try {
      // Resample to the rate the model expects (nearest-neighbor, no anti-aliasing filter)
      let audioData = this.audioBuffer;
      if (sampleRate !== modelConfig.sampleRate) {
        const ratio = sampleRate / modelConfig.sampleRate;
        const newLength = Math.floor(this.audioBuffer.length / ratio);
        audioData = new Float32Array(newLength);
        for (let i = 0; i < newLength; i++) {
          const pos = Math.floor(i * ratio);
          audioData[i] = this.audioBuffer[pos] || 0;
        }
      }

      // Zero-pad or truncate to the model input size
      const inputSize = modelConfig.inputShape[1];
      const inputData = new Float32Array(inputSize);
      const copyLength = Math.min(audioData.length, inputSize);
      inputData.set(audioData.slice(0, copyLength));

      const inputTensor = new ort.Tensor('float32', inputData, [1, inputSize]);

      // Use the input name the model declares; 'input' is only a fallback
      const inputName = this.modelLoader.getSession()?.inputNames[0] ?? 'input';
      const feeds: Record<string, ort.Tensor> = {
        [inputName]: inputTensor,
      };

      const results = await this.modelLoader.run(feeds);

      // Decode the result
      const text = this.decodeOutput(results, modelConfig);

      if (text) {
        const result: RecognitionResult = {
          text,
          confidence: 0.95, // placeholder value, not derived from the model output
          isFinal: true,
          timestamp: Date.now(),
        };
        this.emit('result', result);
      }

      // Clear the buffer
      this.audioBuffer = new Float32Array(0);
    } catch (error) {
      this.emit('error', new Error(`Recognition failed: ${error}`));
    }
  }

  /**
   * Decode the model output
   */
  private decodeOutput(results: Record<string, ort.Tensor>, _modelConfig: ModelConfig): string {
    // Try several common output names
    const outputKeys = ['output', 'logits', 'output_ids', 'token_ids'];
    let output: ort.Tensor | undefined;

    for (const key of outputKeys) {
      if (results[key]) {
        output = results[key];
        break;
      }
    }

    if (!output) {
      // Fall back to the first available output
      const firstKey = Object.keys(results)[0];
      if (firstKey) {
        output = results[firstKey];
      }
    }

    if (!output || !output.data) {
      return '';
    }

    // Simplified: a real implementation must decode with the tokenizer of the specific model.
    // This returns a placeholder string.
    const tokens = Array.from(output.data as Float32Array | Int32Array);
    return `[Recognition result: ${tokens.length} tokens]`;
  }

  /**
   * Start recognition
   */
  start(): void {
    this.isRecognizing = true;
    this.emit('start');
  }

  /**
   * Stop recognition
   */
  stop(): void {
    this.isRecognizing = false;
    if (this.audioBuffer.length > 0) {
      // Flush leftover audio at the rate it was captured at.
      // Fire-and-forget: recognize() reports its own errors via the 'error' event.
      void this.recognize(this.lastSampleRate);
    }
    this.emit('stop');
  }

  /**
   * Unload the model and release resources
   */
  async release(): Promise<void> {
    this.isRecognizing = false;
    if (this.audioBuffer.length > 0) {
      // Wait for the final flush so the session is not released mid-inference
      await this.recognize(this.lastSampleRate);
    }
    this.emit('stop');
    await this.modelLoader.release();
  }

  /**
   * Whether recognition is in progress
   */
  get recognizing(): boolean {
    return this.isRecognizing;
  }
}
||||
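`decodeOutput` above returns a placeholder instead of text. For a model that emits CTC-style logits of shape `[frames, vocabSize]`, greedy decoding takes the best token per frame, collapses consecutive repeats, and drops the blank token. The sketch below shows that procedure only; it assumes a CTC output with blank id 0 and uses a toy three-entry vocabulary. `ctcGreedyDecode` is a hypothetical helper, real models ship their own token tables, and autoregressive models such as Whisper need a different decoding loop.

```typescript
// Hypothetical helper: greedy CTC decoding over row-major logits [frames, vocabSize].
function ctcGreedyDecode(
  logits: Float32Array,
  frames: number,
  vocabSize: number,
  vocab: string[],
  blankId = 0
): string {
  let text = '';
  let previous = -1;
  for (let t = 0; t < frames; t++) {
    // Argmax over the vocabulary for this frame
    let best = 0;
    for (let v = 1; v < vocabSize; v++) {
      if (logits[t * vocabSize + v] > logits[t * vocabSize + best]) best = v;
    }
    // Emit only when the token is not blank and differs from the previous frame
    if (best !== blankId && best !== previous) {
      text += vocab[best] ?? '';
    }
    previous = best;
  }
  return text;
}

// Toy example: per-frame best tokens are h, h, blank, i, i, blank, i.
const vocab = ['<blank>', 'h', 'i'];
const frameIds = [1, 1, 0, 2, 2, 0, 2];
const logits = new Float32Array(frameIds.length * vocab.length);
frameIds.forEach((id, t) => {
  logits[t * vocab.length + id] = 1; // one-hot logits
});
const decoded = ctcGreedyDecode(logits, frameIds.length, vocab.length, vocab);
console.log(decoded);
```

The blank between the two runs of `i` is what allows a repeated character to survive the collapse step.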
121
src/core/text-output.ts
Normal file
@ -0,0 +1,121 @@
/**
 * Text output module.
 * Sends recognition results to the clipboard or simulates keyboard input.
 */

import { EventEmitter } from 'events';
import { RecognitionResult } from './speech-recognizer.js';

export interface TextOutputConfig {
  outputMode: 'clipboard' | 'keyboard' | 'both'; // output mode
  autoPaste: boolean; // whether to paste automatically
  delayMs: number; // delay (ms)
}

export class TextOutput extends EventEmitter {
  private config: TextOutputConfig;
  private lastText: string = '';

  constructor(config: Partial<TextOutputConfig> = {}) {
    super();
    this.config = {
      outputMode: config.outputMode ?? 'clipboard',
      autoPaste: config.autoPaste ?? true,
      delayMs: config.delayMs ?? 50,
    };
  }

  /**
   * Output a recognition result
   */
  async output(result: RecognitionResult): Promise<void> {
    if (!result.isFinal || !result.text) {
      return;
    }

    this.lastText = result.text;

    try {
      switch (this.config.outputMode) {
        case 'clipboard':
          await this.copyToClipboard(result.text);
          break;
        case 'keyboard':
        case 'both':
          // Keyboard mode is implemented by the main process in Electron.
          // In plain Node.js it falls back to the clipboard.
          await this.copyToClipboard(result.text);
          if (this.config.outputMode === 'both') {
            console.log('Note: text copied to the clipboard; paste it manually');
          }
          break;
      }

      this.emit('output', result.text);
    } catch (error) {
      this.emit('error', error);
    }
  }

  /**
   * Copy text to the clipboard
   */
  private async copyToClipboard(text: string): Promise<void> {
    // Try clipboardy first
    try {
      const clipboardy = await import('clipboardy');
      await clipboardy.default.write(text);
      this.emit('clipboard', text);
      return;
    } catch (e) {
      // clipboardy is unavailable; try other methods
    }

    // Electron environment
    const globalObj = typeof globalThis !== 'undefined' ? globalThis : typeof window !== 'undefined' ? window : {};
    if ((globalObj as any).navigator?.clipboard) {
      await (globalObj as any).navigator.clipboard.writeText(text);
      this.emit('clipboard', text);
      return;
    }

    // Fall back to system clipboard commands. The text is written to the command's
    // stdin instead of being interpolated into a shell string, so quotes and shell
    // metacharacters in the recognized text cannot break or inject commands.
    const { spawn } = await import('child_process');
    const candidates: Array<[string, string[]]> =
      process.platform === 'win32'
        ? [['clip', []]]
        : process.platform === 'darwin'
          ? [['pbcopy', []]]
          : [
              // Linux: try several tools
              ['xclip', ['-selection', 'clipboard']],
              ['xsel', ['--clipboard', '--input']],
            ];

    let lastError: unknown = new Error('No clipboard command available');
    for (const [cmd, args] of candidates) {
      try {
        await new Promise<void>((resolve, reject) => {
          const child = spawn(cmd, args, { stdio: ['pipe', 'ignore', 'ignore'] });
          child.on('error', reject); // e.g. the command is not installed
          child.on('exit', (code) =>
            code === 0 ? resolve() : reject(new Error(`${cmd} exited with code ${code}`))
          );
          child.stdin?.on('error', reject);
          child.stdin?.end(text);
        });
        this.emit('clipboard', text);
        return;
      } catch (error) {
        lastError = error;
      }
    }
    throw lastError;
  }

  /**
   * Get the last text that was output
   */
  getLastText(): string {
    return this.lastText;
  }

  /**
   * Clear the history
   */
  clear(): void {
    this.lastText = '';
    this.emit('clear');
  }
}
105
src/electron-main.ts
Normal file
@ -0,0 +1,105 @@
/**
 * Impress ASR Input - Electron main process
 * Note: this file requires the electron dependency. Before running, execute: npm install electron --save-dev
 */

import { app, BrowserWindow, ipcMain, globalShortcut, clipboard } from 'electron';
import { join } from 'path';
import { fileURLToPath } from 'url';

const __dirname = fileURLToPath(new URL('.', import.meta.url));

let mainWindow: BrowserWindow | null = null;

function createWindow() {
  mainWindow = new BrowserWindow({
    width: 400,
    height: 600,
    title: 'Impress ASR Input',
    webPreferences: {
      preload: join(__dirname, 'preload.js'),
      contextIsolation: true,
      nodeIntegration: false,
    },
    resizable: false,
    skipTaskbar: false,
    alwaysOnTop: false,
  });

  // Load the main UI
  if (process.env.NODE_ENV === 'development') {
    mainWindow.loadURL('http://localhost:5173');
  } else {
    mainWindow.loadFile(join(__dirname, '../ui/index.html'));
  }

  mainWindow.on('closed', () => {
    mainWindow = null;
  });
}

// Create the window once the app is ready
app.whenReady().then(() => {
  createWindow();

  // Register global hotkeys
  globalShortcut.register('CommandOrControl+Shift+Space', () => {
    mainWindow?.webContents.send('toggle-recording');
  });

  globalShortcut.register('CommandOrControl+Escape', () => {
    mainWindow?.webContents.send('stop-recording');
  });
});

// IPC handlers
ipcMain.handle('start-recording', async () => {
  // Start recording
  console.log('Recording started');
  return { success: true };
});

ipcMain.handle('stop-recording', async () => {
  // Stop recording
  console.log('Recording stopped');
  return { success: true };
});

ipcMain.handle('copy-to-clipboard', async (_, text: string) => {
  clipboard.writeText(text);
  return { success: true };
});

ipcMain.handle('get-settings', async () => {
  // Get settings
  return {
    language: 'zh',
    outputMode: 'clipboard',
    modelPath: './models/model.onnx',
  };
});

ipcMain.handle('save-settings', async (_event: any, settings: Record<string, unknown>) => {
  // Save settings
  console.log('Saving settings:', settings);
  return { success: true };
});

// Quit when all windows are closed.
// Hotkeys are not unregistered here: on macOS the app keeps running and a window
// can be reopened via 'activate', so the shortcuts must stay registered until 'will-quit'.
app.on('window-all-closed', () => {
  if (process.platform !== 'darwin') {
    app.quit();
  }
});

app.on('activate', () => {
  if (BrowserWindow.getAllWindows().length === 0) {
    createWindow();
  }
});

// Clean up before the app exits
app.on('will-quit', () => {
  globalShortcut.unregisterAll();
});
||||
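The `get-settings` and `save-settings` handlers above return fixed values and only log what they receive. One possible way to back them with a JSON file is sketched below. The names `loadSettings`, `saveSettings`, and `settings.json` are assumptions made for illustration; the demo writes to a temporary directory so it runs in plain Node.js, whereas in Electron the directory would typically come from `app.getPath('userData')`.

```typescript
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from 'fs';
import { tmpdir } from 'os';
import { join } from 'path';

interface Settings {
  language: string;
  outputMode: 'clipboard' | 'keyboard' | 'both';
  modelPath: string;
}

// Same defaults as the hard-coded get-settings response.
const DEFAULTS: Settings = {
  language: 'zh',
  outputMode: 'clipboard',
  modelPath: './models/model.onnx',
};

// Hypothetical helper: read settings, falling back to defaults for a
// missing file, unreadable JSON, or missing keys.
function loadSettings(dir: string): Settings {
  const file = join(dir, 'settings.json');
  if (!existsSync(file)) return { ...DEFAULTS };
  try {
    const parsed = JSON.parse(readFileSync(file, 'utf-8')) as Partial<Settings>;
    return { ...DEFAULTS, ...parsed };
  } catch {
    return { ...DEFAULTS };
  }
}

// Hypothetical helper: merge a partial update into the stored settings.
function saveSettings(dir: string, patch: Partial<Settings>): Settings {
  const next: Settings = { ...loadSettings(dir), ...patch };
  writeFileSync(join(dir, 'settings.json'), JSON.stringify(next, null, 2), 'utf-8');
  return next;
}

// Round trip in a throwaway directory.
const settingsDir = mkdtempSync(join(tmpdir(), 'impress-asr-'));
const initial = loadSettings(settingsDir);
saveSettings(settingsDir, { language: 'en' });
const reloaded = loadSettings(settingsDir);
console.log(initial.language, reloaded.language);
```

Merging over the defaults means a settings file written by an older version still yields a complete object after new keys are added.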
110
src/main.ts
Normal file
@ -0,0 +1,110 @@
/**
 * Impress ASR Input - main entry point
 * Entry file for command-line mode
 */

import { Command } from 'commander';
import { SpeechRecognizer, RecognitionResult } from './core/speech-recognizer.js';
import { TextOutput } from './core/text-output.js';
import { readFileSync } from 'fs';
import { fileURLToPath } from 'url';
import { dirname, join } from 'path';

const __dirname = dirname(fileURLToPath(import.meta.url));
const packageJson = JSON.parse(
  readFileSync(join(__dirname, '../package.json'), 'utf-8')
);

const program = new Command();

program
  .name('impress-asr-input')
  .description('Local speech recognition input tool based on ONNX')
  .version(packageJson.version);

program
  .command('start')
  .description('Start speech recognition')
  .option('-l, --language <lang>', 'recognition language', 'zh')
  .option('-m, --model <path>', 'model file path', join(__dirname, '../models/model.onnx'))
  .option('-o, --output <mode>', 'output mode: clipboard|keyboard|both', 'clipboard')
  .action(async (options) => {
    console.log('🎤 Starting speech recognition...');
    console.log(`   Language: ${options.language}`);
    console.log(`   Model: ${options.model}`);
    console.log(`   Output: ${options.output}`);

    const recognizer = new SpeechRecognizer({
      modelPath: options.model,
      language: options.language,
      useVad: true,
      beamSize: 5,
    });

    const textOutput = new TextOutput({
      outputMode: options.output as 'clipboard' | 'keyboard' | 'both',
      autoPaste: true,
      delayMs: 50,
    });

    // Bind events
    recognizer.on('ready', () => {
      console.log('✅ Model loaded, starting recognition...');
      recognizer.start();
    });

    recognizer.on('result', (result: RecognitionResult) => {
      console.log(`📝 ${result.text}`);
      void textOutput.output(result);
    });

    recognizer.on('error', (error: Error) => {
      console.error('❌ Recognition error:', error.message);
      process.exit(1);
    });

    // TextOutput emits 'error' on failure; without a listener the emit itself would throw
    textOutput.on('error', (error: unknown) => {
      console.error('❌ Output error:', error);
    });

    // Initialize and start
    try {
      await recognizer.initialize();
      // Note: audio capture needs extra work in a plain Node.js environment.
      // This is only a framework demo.
      console.log('⚠️  Running in demo mode; full functionality requires Electron');
    } catch (error) {
      console.error('❌ Failed to start:', error);
      process.exit(1);
    }

    // Graceful shutdown (release() also stops recognition)
    process.on('SIGINT', async () => {
      console.log('\n🛑 Stopping recognition...');
      await recognizer.release();
      process.exit(0);
    });
  });

program
  .command('transcribe')
  .description('Transcribe an audio file')
  .argument('<file>', 'audio file path')
  .option('-l, --language <lang>', 'recognition language', 'zh')
  .option('-m, --model <path>', 'model file path')
  .option('-o, --output <file>', 'output file path')
  .action(async (file, options) => {
    console.log(`🎵 Transcribing file: ${file}`);
    console.log(`   Language: ${options.language}`);

    // TODO: implement file transcription
    console.log('⚠️  File transcription is under development...');
  });

program
  .command('list-devices')
  .description('List available audio devices')
  .action(() => {
    console.log('🎧 Available audio devices:');
    // TODO: implement device listing
    console.log('⚠️  Device listing is under development...');
  });

// parseAsync waits for the async action handlers, so their rejections are caught here
program.parseAsync().catch((error) => {
  console.error(error);
  process.exit(1);
});
||||
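The `start` command casts `options.output` to the output-mode union without checking it, so a typo such as `--output speaker` would pass through silently. A small validator is one way to catch that early. This is a sketch only; `parseOutputMode` is a hypothetical helper and not part of the commit.

```typescript
type OutputMode = 'clipboard' | 'keyboard' | 'both';

const OUTPUT_MODES: readonly OutputMode[] = ['clipboard', 'keyboard', 'both'];

// Hypothetical helper: normalize and validate a user-supplied output mode.
function parseOutputMode(value: string): OutputMode {
  const normalized = value.trim().toLowerCase();
  const match = OUTPUT_MODES.find((mode) => mode === normalized);
  if (!match) {
    throw new Error(
      `Invalid output mode "${value}"; expected one of: ${OUTPUT_MODES.join(', ')}`
    );
  }
  return match;
}

const accepted = parseOutputMode(' Both ');

let rejectedMessage = '';
try {
  parseOutputMode('speaker');
} catch (error) {
  rejectedMessage = (error as Error).message;
}
console.log(accepted, rejectedMessage);
```

A function like this could also be supplied to commander as the custom option-processing callback of `.option()`, so the check runs while arguments are parsed.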
29
src/preload.ts
Normal file
@ -0,0 +1,29 @@
/**
 * Electron preload script
 * Note: this file requires the electron dependency
 */

import { contextBridge, ipcRenderer } from 'electron';

// API exposed to the renderer process
contextBridge.exposeInMainWorld('electronAPI', {
  // Recording control
  startRecording: () => ipcRenderer.invoke('start-recording'),
  stopRecording: () => ipcRenderer.invoke('stop-recording'),

  // Clipboard
  copyToClipboard: (text: string) => ipcRenderer.invoke('copy-to-clipboard', text),

  // Settings
  getSettings: () => ipcRenderer.invoke('get-settings'),
  saveSettings: (settings: Record<string, unknown>) =>
    ipcRenderer.invoke('save-settings', settings),

  // Event listeners
  onToggleRecording: (callback: () => void) => {
    ipcRenderer.on('toggle-recording', () => callback());
  },
  onStopRecording: (callback: () => void) => {
    ipcRenderer.on('stop-recording', () => callback());
  },
});
||||
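The `onToggleRecording` and `onStopRecording` bridges register listeners that can never be removed, which would accumulate if the renderer subscribed more than once. A common pattern is to return an unsubscribe function from the registration call. The sketch below shows the pattern with Node's `EventEmitter` standing in for `ipcRenderer`, so it runs outside Electron; `subscribe` is a hypothetical helper.

```typescript
import { EventEmitter } from 'events';

// Hypothetical helper: register a listener and return a function that removes it.
function subscribe(
  emitter: EventEmitter,
  channel: string,
  callback: () => void
): () => void {
  // Wrap the callback so event arguments are not forwarded to it
  const listener = () => callback();
  emitter.on(channel, listener);
  return () => {
    emitter.removeListener(channel, listener);
  };
}

// Stand-in for ipcRenderer in this demo.
const fakeIpc = new EventEmitter();
let toggles = 0;

const unsubscribe = subscribe(fakeIpc, 'toggle-recording', () => {
  toggles += 1;
});

fakeIpc.emit('toggle-recording'); // counted
unsubscribe();
fakeIpc.emit('toggle-recording'); // ignored, the listener is gone

const remaining = fakeIpc.listenerCount('toggle-recording');
console.log(toggles, remaining);
```

Keeping a reference to the wrapper is what makes removal possible; an inline arrow function passed straight to `on` cannot be removed later.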
315
src/ui/index.html
Normal file
@ -0,0 +1,315 @@
|
||||
<!DOCTYPE html>
|
||||
<html lang="zh-CN">
|
||||
<head>
|
||||
<meta charset="UTF-8">
|
||||
<meta name="viewport" content="width=device-width, initial-scale=1.0">
|
||||
<meta http-equiv="Content-Security-Policy" content="default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'">
|
||||
<title>Impress ASR Input</title>
|
||||
<style>
|
||||
:root {
|
||||
--bg-primary: #1a1a2e;
|
||||
--bg-secondary: #16213e;
|
||||
--accent: #e94560;
|
||||
--accent-hover: #ff6b6b;
|
||||
--text-primary: #ffffff;
|
||||
--text-secondary: #a0a0a0;
|
||||
--success: #00d9a5;
|
||||
--border: #2d3748;
|
||||
}
|
||||
|
||||
* {
|
||||
margin: 0;
|
||||
padding: 0;
|
||||
box-sizing: border-box;
|
||||
}
|
||||
|
||||
body {
|
||||
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
|
||||
background: var(--bg-primary);
|
||||
color: var(--text-primary);
|
||||
min-height: 100vh;
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
}
|
||||
|
||||
.header {
|
||||
padding: 20px;
|
||||
text-align: center;
|
||||
border-bottom: 1px solid var(--border);
|
||||
}
|
||||
|
||||
.header h1 {
|
||||
font-size: 18px;
|
||||
font-weight: 600;
|
||||
}
|
||||
|
||||
.header p {
|
||||
font-size: 12px;
|
||||
color: var(--text-secondary);
|
||||
margin-top: 4px;
|
||||
}
|
||||
|
||||
.main-content {
|
||||
flex: 1;
|
||||
padding: 20px;
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 20px;
|
||||
}
|
||||
|
||||
.status-card {
|
||||
background: var(--bg-secondary);
|
||||
border-radius: 12px;
|
||||
padding: 20px;
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
.status-indicator {
|
||||
width: 80px;
|
||||
height: 80px;
|
||||
border-radius: 50%;
|
||||
background: var(--border);
|
||||
margin: 0 auto 12px;
|
||||
display: flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
font-size: 32px;
|
||||
transition: all 0.3s ease;
|
||||
}
|
||||
|
||||
.status-indicator.recording {
|
||||
background: var(--accent);
|
||||
animation: pulse 1.5s infinite;
|
||||
}
|
||||
|
||||
@keyframes pulse {
|
||||
0%, 100% { transform: scale(1); opacity: 1; }
|
||||
50% { transform: scale(1.1); opacity: 0.8; }
|
||||
}
|
||||
|
||||
.status-text {
|
||||
font-size: 14px;
|
||||
color: var(--text-secondary);
|
||||
}
|
||||
|
||||
.record-btn {
|
||||
width: 100%;
|
||||
padding: 16px;
|
||||
border: none;
|
||||
border-radius: 12px;
|
||||
      background: var(--accent);
      color: white;
      font-size: 16px;
      font-weight: 600;
      cursor: pointer;
      transition: all 0.2s ease;
    }

    .record-btn:hover {
      background: var(--accent-hover);
    }

    .record-btn:active {
      transform: scale(0.98);
    }

    .record-btn.recording {
      background: var(--border);
    }

    .result-area {
      flex: 1;
      background: var(--bg-secondary);
      border-radius: 12px;
      padding: 16px;
      min-height: 150px;
    }

    .result-area h3 {
      font-size: 14px;
      margin-bottom: 12px;
      color: var(--text-secondary);
    }

    .result-text {
      font-size: 14px;
      line-height: 1.6;
      white-space: pre-wrap;
      word-break: break-word;
    }

    .result-text:empty::before {
      content: 'Recognition results will appear here...';
      color: var(--text-secondary);
      font-style: italic;
    }

    .settings-section {
      background: var(--bg-secondary);
      border-radius: 12px;
      padding: 16px;
    }

    .setting-item {
      display: flex;
      justify-content: space-between;
      align-items: center;
      padding: 8px 0;
      border-bottom: 1px solid var(--border);
    }

    .setting-item:last-child {
      border-bottom: none;
    }

    .setting-item label {
      font-size: 14px;
    }

    .setting-item select {
      background: var(--bg-primary);
      border: 1px solid var(--border);
      border-radius: 6px;
      padding: 6px 12px;
      color: var(--text-primary);
      font-size: 13px;
      cursor: pointer;
    }

    .hotkey-hint {
      font-size: 12px;
      color: var(--text-secondary);
      text-align: center;
      padding: 12px;
      background: var(--bg-secondary);
      border-radius: 8px;
    }

    .hotkey-hint kbd {
      background: var(--bg-primary);
      padding: 2px 8px;
      border-radius: 4px;
      border: 1px solid var(--border);
      font-family: monospace;
    }
  </style>
</head>
<body>
  <div class="header">
    <h1>🎤 Impress ASR Input</h1>
    <p>Speech recognition input tool</p>
  </div>

  <div class="main-content">
    <div class="status-card">
      <div class="status-indicator" id="statusIndicator">🎤</div>
      <div class="status-text" id="statusText">Click the button to start recording</div>
    </div>

    <button class="record-btn" id="recordBtn">Start recording</button>

    <div class="result-area">
      <h3>Recognition result</h3>
      <div class="result-text" id="resultText"></div>
    </div>

    <div class="settings-section">
      <div class="setting-item">
        <label>Recognition language</label>
        <select id="languageSelect">
          <option value="zh">中文</option>
          <option value="en">English</option>
          <option value="ja">日本語</option>
          <option value="ko">한국어</option>
        </select>
      </div>
      <div class="setting-item">
        <label>Output mode</label>
        <select id="outputModeSelect">
          <option value="clipboard">Clipboard</option>
          <option value="both">Clipboard + notification</option>
        </select>
      </div>
    </div>

    <div class="hotkey-hint">
      <p>Hotkeys: <kbd>Ctrl+Shift+Space</kbd> start/stop recording</p>
      <p style="margin-top: 6px;"><kbd>Ctrl+Escape</kbd> force stop</p>
    </div>
  </div>

  <script>
    const recordBtn = document.getElementById('recordBtn');
    const statusIndicator = document.getElementById('statusIndicator');
    const statusText = document.getElementById('statusText');
    const resultText = document.getElementById('resultText');
    const languageSelect = document.getElementById('languageSelect');
    const outputModeSelect = document.getElementById('outputModeSelect');

    let isRecording = false;

    // Update the UI to match the current recording state
    function updateUI() {
      if (isRecording) {
        recordBtn.textContent = 'Stop recording';
        recordBtn.classList.add('recording');
        statusIndicator.classList.add('recording');
        statusText.textContent = 'Recording...';
      } else {
        recordBtn.textContent = 'Start recording';
        recordBtn.classList.remove('recording');
        statusIndicator.classList.remove('recording');
        statusText.textContent = 'Click the button to start recording';
      }
    }

    // Record button click
    recordBtn.addEventListener('click', async () => {
      const wasRecording = isRecording;
      isRecording = !wasRecording;
      updateUI();

      try {
        if (isRecording) {
          await window.electronAPI?.startRecording();
        } else {
          await window.electronAPI?.stopRecording();
        }
      } catch (err) {
        // Roll back so the UI does not show a state the main process never entered
        console.error('Failed to toggle recording:', err);
        isRecording = wasRecording;
        updateUI();
      }
    });

    // Listen for global hotkeys
    window.electronAPI?.onToggleRecording(() => {
      isRecording = !isRecording;
      updateUI();
    });

    window.electronAPI?.onStopRecording(() => {
      isRecording = false;
      updateUI();
    });

    // Simulate a recognition result (development only)
    function simulateResult(text) {
      resultText.textContent = text;
      if (text) {
        window.electronAPI?.copyToClipboard(text);
      }
    }

    // Save settings
    languageSelect.addEventListener('change', () => {
      window.electronAPI?.saveSettings({ language: languageSelect.value });
    });

    outputModeSelect.addEventListener('change', () => {
      window.electronAPI?.saveSettings({ outputMode: outputModeSelect.value });
    });

    // Load settings
    window.electronAPI?.getSettings().then(settings => {
      if (settings) {
        languageSelect.value = settings.language || 'zh';
        outputModeSelect.value = settings.outputMode || 'clipboard';
      }
    });
  </script>
</body>
</html>
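The renderer script above changes `isRecording` in three places: the button click, the toggle hotkey, and the force-stop hotkey. One possible way to keep those paths consistent is to route them through a single pure transition function. The sketch below is only an illustration under that assumption; `RecordingEvent`, `reduceRecording`, and `describeRecording` are hypothetical names that do not exist in the project, and the label strings are placeholders.

```typescript
// Hypothetical event names; the project itself does not define these.
type RecordingEvent = 'toggle' | 'forceStop';

interface RecordingView {
  isRecording: boolean;
  buttonLabel: string;
  statusText: string;
}

// Pure state transition: every input path (click, hotkey, force stop) goes through here.
function reduceRecording(isRecording: boolean, event: RecordingEvent): boolean {
  switch (event) {
    case 'toggle':
      return !isRecording;
    case 'forceStop':
      return false;
  }
}

// Derive everything the UI needs from the single boolean flag.
function describeRecording(isRecording: boolean): RecordingView {
  return isRecording
    ? { isRecording, buttonLabel: 'Stop recording', statusText: 'Recording...' }
    : { isRecording, buttonLabel: 'Start recording', statusText: 'Click the button to start recording' };
}

// Example: a click followed by the force-stop hotkey.
let state = false;
state = reduceRecording(state, 'toggle');
state = reduceRecording(state, 'forceStop');
console.log(describeRecording(state).buttonLabel);
```

Because both functions are free of DOM and IPC access, they can be unit-tested without Electron; the event listeners would then only call `reduceRecording` and apply the returned view.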
103
src/utils/config.ts
Normal file
@@ -0,0 +1,103 @@
/**
 * Configuration management module
 */

import { readFileSync, writeFileSync, existsSync } from 'fs';

export interface AppSettings {
  // Recognition settings
  language: string;
  modelPath: string;
  useVad: boolean;

  // Output settings
  outputMode: 'clipboard' | 'keyboard' | 'both';
  autoPaste: boolean;

  // Hotkey settings
  startHotkey: string;
  stopHotkey: string;

  // Audio settings
  audioDeviceId?: string;
  sampleRate: number;
}

export const defaultSettings: AppSettings = {
  language: 'zh',
  modelPath: './models/model.onnx',
  useVad: true,
  outputMode: 'clipboard',
  autoPaste: true,
  startHotkey: 'CommandOrControl+Shift+Space',
  stopHotkey: 'CommandOrControl+Escape',
  sampleRate: 16000,
};

/**
 * Configuration store.
 * In an Electron environment, electron-store is used;
 * in a plain Node.js environment, a JSON file is used.
 */
export class ConfigStore {
  private settings: AppSettings;
  private filePath: string;

  constructor(filePath: string) {
    this.filePath = filePath;
    this.settings = { ...defaultSettings };
    this.load();
  }

  /**
   * Load the configuration from disk
   */
  load(): void {
    try {
      if (existsSync(this.filePath)) {
        const content = readFileSync(this.filePath, 'utf-8');
        const saved = JSON.parse(content);
        this.settings = { ...this.settings, ...saved };
      }
    } catch {
      // The file could not be read or parsed; keep the default settings
    }
  }

  /**
   * Save the configuration to disk
   */
  save(): void {
    writeFileSync(this.filePath, JSON.stringify(this.settings, null, 2));
  }

  /**
   * Get a single setting
   */
  get<K extends keyof AppSettings>(key: K): AppSettings[K] {
    return this.settings[key];
  }

  /**
   * Set a single setting and persist it
   */
  set<K extends keyof AppSettings>(key: K, value: AppSettings[K]): void {
    this.settings[key] = value;
    this.save();
  }

  /**
   * Get a copy of all settings
   */
  getAll(): AppSettings {
    return { ...this.settings };
  }

  /**
   * Reset to the default values
   */
  reset(): void {
    this.settings = { ...defaultSettings };
    this.save();
  }
}
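`ConfigStore.load()` spreads whatever `JSON.parse` returns over the defaults, so a hand-edited file could introduce unknown keys or values of the wrong type. The following is a minimal sketch of a stricter merge, not part of the commit: `mergeSettings`, the trimmed `Settings` interface, and `defaults` are hypothetical names used only for illustration.

```typescript
// Trimmed, hypothetical copy of the settings shape; the real AppSettings has more fields.
interface Settings {
  language: string;
  useVad: boolean;
  sampleRate: number;
}

const defaults: Settings = { language: 'zh', useVad: true, sampleRate: 16000 };

// Keep a saved value only when its key exists in the defaults and its type matches.
function mergeSettings<T extends object>(base: T, saved: unknown): T {
  const result = { ...base };
  if (typeof saved !== 'object' || saved === null) {
    return result; // not an object at all: fall back to the defaults
  }
  const source = saved as Record<string, unknown>;
  for (const key of Object.keys(base) as Array<keyof T & string>) {
    const candidate = source[key];
    if (candidate !== undefined && typeof candidate === typeof base[key]) {
      result[key] = candidate as T[keyof T & string];
    }
  }
  return result;
}

// "sampleRate" has the wrong type and "extra" is unknown, so both are ignored.
const merged = mergeSettings(
  defaults,
  JSON.parse('{"language":"en","sampleRate":"fast","extra":1}'),
);
console.log(merged);
```

One limitation of this approach: optional fields that have no default (such as `audioDeviceId` in `AppSettings`) would be dropped, so a real implementation would need an explicit schema for them.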
90
test/audio-processor.test.ts
Normal file
@@ -0,0 +1,90 @@
/**
 * Audio processor tests
 */

import { describe, it, expect } from 'vitest';
import { normalizeAudio, resample, SimpleVAD, frameAudio } from '../src/core/audio-processor.js';

describe('audio-processor', () => {
  describe('normalizeAudio', () => {
    it('should normalize Int16Array data', () => {
      const input = new Int16Array([32767, -32768, 0, 16384]);
      const result = normalizeAudio(input, 16000);

      expect(result).toBeInstanceOf(Float32Array);
      expect(result.length).toBe(input.length);
      expect(result[0]).toBeCloseTo(1, 3);
      expect(result[1]).toBeCloseTo(-1, 3);
      expect(result[2]).toBe(0);
    });

    it('should handle Float32Array input', () => {
      const input = new Float32Array([1, -1, 0, 0.5]);
      const result = normalizeAudio(input, 16000);

      expect(result).toBe(input);
    });
  });

  describe('resample', () => {
    it('should leave data unchanged when the sample rate is the same', () => {
      const input = new Float32Array([1, 2, 3, 4]);
      const result = resample(input, 16000, 16000);

      expect(result).toBe(input);
    });

    it('should downsample', () => {
      const input = new Float32Array([1, 2, 3, 4, 5, 6, 7, 8]);
      const result = resample(input, 16000, 8000);

      expect(result.length).toBe(4);
    });
  });

  describe('SimpleVAD', () => {
    it('should detect high-energy audio', () => {
      const vad = new SimpleVAD({ energyThreshold: 0.01 });
      const loudFrame = new Float32Array([0.5, 0.6, 0.7, 0.8]);

      const result = vad.process(loudFrame, 16000);

      expect(result.isSpeaking).toBe(true);
    });

    it('should ignore low-energy audio', () => {
      const vad = new SimpleVAD({ energyThreshold: 0.01 });
      const quietFrame = new Float32Array([0.001, 0.002, 0.001, 0]);

      const result = vad.process(quietFrame, 16000);

      expect(result.isSpeaking).toBe(false);
    });

    it('should reset state', () => {
      const vad = new SimpleVAD();
      vad.process(new Float32Array([0.5, 0.6, 0.7]), 16000);
      vad.reset();

      expect(vad.process(new Float32Array([0]), 16000).isSpeaking).toBe(false);
    });
  });

  describe('frameAudio', () => {
    it('should split audio into frames correctly', () => {
      const input = new Float32Array([1, 2, 3, 4, 5, 6, 7, 8]);
      const frames = frameAudio(input, 4, 2);

      // Frames start at samples 0, 2, and 4; a frame starting at 6 would be incomplete
      expect(frames.length).toBe(3);
      expect(frames[0]).toEqual(new Float32Array([1, 2, 3, 4]));
      expect(frames[1]).toEqual(new Float32Array([3, 4, 5, 6]));
    });

    it('should handle input shorter than one frame', () => {
      const input = new Float32Array([1, 2, 3]);
      const frames = frameAudio(input, 4, 2);

      expect(frames.length).toBe(0);
    });
  });
});
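The `frameAudio` tests above rely on overlapping frames (frame size 4, hop size 2) and on short inputs producing no frames. The real implementation lives in `src/core/audio-processor.ts` and is not shown in this part of the diff, so the following is only a hypothetical sketch (`frameAudioSketch` is an invented name) that assumes incomplete trailing frames are dropped rather than zero-padded.

```typescript
// Hypothetical reimplementation for illustration; not the project's actual frameAudio.
function frameAudioSketch(
  samples: Float32Array,
  frameSize: number,
  hopSize: number,
): Float32Array[] {
  if (frameSize <= 0 || hopSize <= 0) {
    throw new RangeError('frameSize and hopSize must be positive');
  }
  const frames: Float32Array[] = [];
  // Stop as soon as a full frame no longer fits; trailing samples are dropped.
  for (let start = 0; start + frameSize <= samples.length; start += hopSize) {
    frames.push(samples.slice(start, start + frameSize));
  }
  return frames;
}

const frames = frameAudioSketch(new Float32Array([1, 2, 3, 4, 5, 6, 7, 8]), 4, 2);
console.log(frames.length); // 3: frames start at samples 0, 2, and 4
```

Under this assumption the frame count is `floor((N - frameSize) / hopSize) + 1` for `N >= frameSize` and 0 otherwise, which gives 3 frames for 8 samples and 0 frames for 3 samples.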
25
tsconfig.json
Normal file
@@ -0,0 +1,25 @@
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "lib": ["ES2022", "DOM"],
    "outDir": "./dist",
    "rootDir": "./src",
    "strict": false,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true,
    "resolveJsonModule": true,
    "declaration": true,
    "declarationMap": true,
    "sourceMap": true,
    "noUnusedLocals": false,
    "noUnusedParameters": false,
    "noImplicitReturns": false,
    "noFallthroughCasesInSwitch": false,
    "allowSyntheticDefaultImports": true
  },
  "include": ["src/**/*.ts"],
  "exclude": ["node_modules", "dist", "test", "src/electron-main.ts", "src/preload.ts"]
}
11
vitest.config.ts
Normal file
@@ -0,0 +1,11 @@
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    include: ['test/**/*.test.ts'],
    exclude: ['node_modules', 'dist'],
  },
  resolve: {
    extensions: ['.ts', '.js'],
  },
});