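Initial commit: base framework for the Impress ASR Input project

Features:
- ONNX-based speech recognition engine
- Multi-language support (Chinese, English, Japanese, Korean)
- Model loader (supports SenseVoice/Whisper/Paraformer)
- Audio capture and processing modules (VAD, resampling, normalization)
- Text output module (clipboard)
- CLI command-line tool
- Electron GUI
- Windows x64 packaging configuration

Documentation:
- PRD (product requirements document)
- README (project overview)
- Development guide
- Windows build guide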

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
impressionyang 2026-05-20 16:10:11 +08:00
commit 7c51542918
26 changed files with 8786 additions and 0 deletions

17
.gitignore vendored Normal file

@ -0,0 +1,17 @@
node_modules/
dist/
release/
*.log
.DS_Store
Thumbs.db
.idea/
.vscode/
*.swp
*.swo
*~
.env
.env.local
models/*.onnx
models/*.ort
test/recordings/
coverage/

232
PRD.md Normal file

@ -0,0 +1,232 @@
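# Impress ASR Input - Product Requirements Document (PRD)
## 1. Document Information
| Item | Value |
|------|------|
| Product name | Impress ASR Input |
| Version | v1.0.0 |
| Created | 2026-05-15 |
| Tech stack | Node.js + ONNX Runtime |
---
## 2. Product Overview
### 2.1 Positioning
Impress ASR Input is a desktop speech-recognition input tool built on Node.js. It uses the ONNX deep-learning inference engine to provide high-accuracy, multilingual speech-to-text.
### 2.2 Core Value
- **Runs locally**: no network connection required, protects privacy, no API call costs
- **Multilingual**: supports Chinese, English, Japanese, Korean, and other languages
- **Low latency**: near-real-time recognition; short utterances are recognized within 1 second
- **Cross-platform**: supports Windows, macOS, and Linux
---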
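## 3. Functional Requirements
### 3.1 Core Features
#### F1 - Audio Capture
| Priority | Description |
|--------|------|
| P0 | Capture audio from the system default microphone |
| P0 | Allow selecting a different audio input device |
| P1 | Configurable audio parameters (sample rate, channel count) |
| P2 | Support external devices such as USB and Bluetooth headsets |
#### F2 - Speech Recognition Engine
| Priority | Description |
|--------|------|
| P0 | Local inference based on ONNX Runtime |
| P0 | Mandarin Chinese recognition |
| P0 | English recognition |
| P1 | Mixed Chinese/Japanese/English recognition |
| P1 | Voice activity detection (VAD) |
| P2 | More languages (Japanese, Korean, etc.) |
#### F3 - Text Output
| Priority | Description |
|--------|------|
| P0 | Display recognition results in real time |
| P0 | Copy text to the clipboard |
| P1 | Simulated keyboard input (triggered by a global hotkey) |
| P1 | Recognition history view |
| P2 | Export to a text file |
#### F4 - Batch Transcription
| Priority | Description |
|--------|------|
| P0 | Import audio files in WAV/MP3/FLAC format |
| P0 | Batch file queue processing |
| P1 | Output SRT/VTT subtitle formats |
| P1 | Speaker separation (multi-channel scenarios) |
| P2 | Progress display and resumable transcription |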
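For the P1 subtitle-export requirement in F4, the sketch below shows one way recognized segments could be serialized as SRT cues. The `Segment` shape and the `formatSrtTime` and `toSrt` helpers are hypothetical names introduced here for illustration; the project does not define them.

```typescript
// Hypothetical segment shape: times are milliseconds from the start of the audio.
interface Segment {
  startMs: number;
  endMs: number;
  text: string;
}

// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  const millis = total % 1000;
  const pad = (n: number, width: number): string => String(n).padStart(width, '0');
  return `${pad(hours, 2)}:${pad(minutes, 2)}:${pad(seconds, 2)},${pad(millis, 3)}`;
}

// Serialize segments as numbered SRT cues separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${formatSrtTime(seg.startMs)} --> ${formatSrtTime(seg.endMs)}\n${seg.text}\n`
    )
    .join('\n');
}
```

WebVTT differs mainly in starting the file with a `WEBVTT` header line and using `.` instead of `,` before the milliseconds, so the same segment list could feed both writers.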
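### 3.2 Auxiliary Features
#### F5 - User Interface
| Priority | Description |
|--------|------|
| P0 | Persistent system tray icon |
| P0 | Simple control panel (start/stop/settings) |
| P1 | Real-time waveform visualization during recognition |
| P1 | Dark/light theme switching |
| P2 | Multilingual UI (Chinese/English) |
#### F6 - Configuration Management
| Priority | Description |
|--------|------|
| P0 | Model file path configuration |
| P0 | Hotkey configuration (start/stop recording) |
| P1 | Recognition language selection |
| P1 | Output format configuration |
| P2 | Configuration file import/export |
---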
## 4. Technical Architecture
### 4.1 Overall Architecture
```
┌─────────────────────────────────────────────────────────────┐
│                          UI Layer                           │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ System tray │  │Control panel│  │ Recognition results │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                       Business Layer                        │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │Audio capture│  │ Recognition │  │ Text output / typing│  │
│  │   module    │  │engine module│  │       module        │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│                         Core Layer                          │
│  ┌─────────────────────────────────────────────────────────┐│
│  │              ONNX Runtime inference engine              ││
│  │  ┌───────────────┐  ┌────────────────┐  ┌─────────────┐ ││
│  │  │ Preprocessing │  │ Acoustic model │  │ LM / decoder│ ││
│  │  └───────────────┘  └────────────────┘  └─────────────┘ ││
│  └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
```
### 4.2 Technology Choices
| Module | Technology | Notes |
|------|----------|------|
| Runtime | Node.js 20+ | LTS release with support for recent ES features |
| UI framework | Electron | Cross-platform desktop application |
| ONNX inference | onnxruntime-node | Official Node.js binding |
| Audio capture | node-audio | Cross-platform audio API |
| Keyboard simulation | robotjs / @nut-tree/nut-js | Global hotkeys and text input |
| Build tool | Electron-Builder | Packaging and distribution |
### 4.3 Model Choices
| Model | Source | Notes |
|------|------|------|
| SenseVoice | Alibaba DAMO | Multilingual recognition, high accuracy |
| Whisper | OpenAI | Open-source multilingual model |
| Paraformer | Alibaba DAMO | Optimized for Chinese |
**Recommendation**: prefer the quantized (int8) ONNX versions of SenseVoice or Whisper to balance accuracy and performance.
---
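## 5. Non-Functional Requirements
### 5.1 Performance
| Metric | Target | Notes |
|------|--------|------|
| First-character latency | < 500ms | Time from the start of speech until the first character appears |
| Short-utterance recognition | < 1s | Full recognition time for audio of up to 5 seconds |
| CPU usage | < 30% | Single-core usage while idle |
| Memory usage | < 500MB | Baseline memory after the model is loaded |
| Model size | < 300MB | Single-language model after quantization |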
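The short-utterance target can also be read as a real-time factor (RTF), that is, processing time divided by audio duration. For a full 5-second clip finished in under 1 second the RTF has to stay below 0.2. A minimal sketch of that check follows; `realTimeFactor` and `meetsShortUtteranceTarget` are hypothetical helper names, not part of the project.

```typescript
// Real-time factor: processing time divided by audio duration (lower is faster).
function realTimeFactor(processingMs: number, audioMs: number): number {
  if (audioMs <= 0) {
    throw new RangeError('audioMs must be positive');
  }
  return processingMs / audioMs;
}

// The PRD target applies to clips of up to 5 s, which must finish in under 1 s.
function meetsShortUtteranceTarget(processingMs: number, audioMs: number): boolean {
  return audioMs <= 5000 && processingMs < 1000;
}
```

Because the budget is an absolute 1 second, shorter clips may run at a higher RTF and still meet the target.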
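### 5.2 Compatibility
| Platform | Minimum version | Notes |
|------|----------|------|
| Windows | Windows 10 | x64 architecture |
| macOS | macOS 11+ | Intel / Apple Silicon |
| Linux | Ubuntu 20.04+ | glibc 2.31+ |
### 5.3 Security
- All audio data is processed locally and never uploaded to the cloud
- No user voice samples are collected
- Configuration files contain no sensitive information
---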
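## 6. Project Milestones
### Phase 1 - MVP (v0.1.0)
- [ ] Set up the base project framework
- [ ] Integrate ONNX Runtime
- [ ] Single-language (Chinese) recognition demo
- [ ] Basic command-line interface
### Phase 2 - Core Features (v0.5.0)
- [ ] Multi-language support (Chinese and English)
- [ ] Electron GUI
- [ ] Real-time recognition
- [ ] Clipboard output
### Phase 3 - Feature Completion (v1.0.0)
- [ ] Simulated keyboard input
- [ ] Batch file transcription
- [ ] Configuration management UI
- [ ] Installer packaging and distribution
### Phase 4 - Enhancements (v1.5.0+)
- [ ] Support for more languages
- [ ] Speaker separation
- [ ] Custom hotwords
- [ ] Plugin system
---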
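## 7. Risk Assessment
| Risk | Probability | Impact | Mitigation |
|------|------|------|----------|
| Insufficient ONNX model performance | Medium | High | Prepare quantized models and optimize the inference pipeline |
| Cross-platform audio capture compatibility issues | High | Medium | Fallback: Web Audio API + Electron |
| Model files too large | Medium | Medium | Provide a model downloader for on-demand download |
| Keyboard simulation blocked by security software | Low | High | Provide allowlisting guidance, with clipboard output as a fallback |
---
## 8. Appendix
### 8.1 References
- [ONNX Runtime](https://onnxruntime.ai/)
- [SenseVoice Model](https://github.com/FunAudioLLM/SenseVoice)
- [Whisper ONNX](https://github.com/guillaumekln/faster-whisper)
- [Electron documentation](https://www.electronjs.org/docs)
### 8.2 Competitive Analysis
| Product | Strengths | Weaknesses |
|------|------|------|
| iFlytek Input Method | High recognition accuracy | Requires a network connection, privacy concerns |
| Google Voice Typing | Good multi-language support | Requires Chrome, depends on the cloud |
| Whisper Desktop | Runs locally | High performance overhead, bare-bones UI |
---
**Document status**: Draft
**Next update**: after the technical review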

172
README.md Normal file

@ -0,0 +1,172 @@
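# Impress ASR Input
A local, ONNX-based speech recognition input tool with multilingual real-time recognition and audio file transcription.
## Features
- 🎯 **Runs locally** - no network connection required, protects privacy, no API call costs
- 🌍 **Multilingual** - Chinese, English, Japanese, Korean, and more
- ⚡ **Low latency** - near-real-time recognition; short utterances are recognized within 1 second
- 💻 **Cross-platform** - supports Windows, macOS, and Linux
## Quick Start
### Requirements
- Node.js >= 20.0.0
- Windows 10+ / macOS 11+ / Ubuntu 20.04+
### Installation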
```bash
# Clone the project
git clone <repository-url>
cd impress-asr-input
# Install dependencies
npm install
# Compile TypeScript
npm run build
```
### Add Model Files
Place the ONNX models in the `models/` directory:
```bash
models/
├── sensevoice.onnx   # Recommended: Alibaba DAMO Academy multilingual model
├── whisper.onnx      # OpenAI multilingual model
└── paraformer.onnx   # Alibaba DAMO Academy Chinese model
```
Model download locations:
- SenseVoice: https://huggingface.co/FunAudioLLM/SenseVoice
- Whisper: https://huggingface.co/onnx-community/whisper-base
- Paraformer: https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punct
### Usage
```bash
# CLI mode - start speech recognition
npm start -- start
# Specify the language
npm start -- start -l en
# Transcribe an audio file
npm start -- transcribe input.wav -o output.txt
# Electron GUI mode (requires electron to be installed first)
npm install electron --save-dev
npm run dev:electron
```
## Windows Build
### Method 1: Build on Windows (recommended)
```powershell
npm run build:win      # Build the installer
npm run build:win:zip  # Build the ZIP package
```
### Method 2: Build on Linux
```bash
export ELECTRON_MIRROR="https://npmmirror.com/mirrors/electron/"
npm run build:win:dir  # Build the unpacked version
```
See [docs/BUILD_WINDOWS.md](docs/BUILD_WINDOWS.md) for details.
## Project Structure
```
impress-asr-input/
├── src/
│   ├── core/
│   │   ├── audio-processor.ts    # Audio processing (VAD, resampling)
│   │   ├── audio-recorder.ts     # Audio capture module
│   │   ├── index.ts              # Module exports
│   │   ├── model-loader.ts       # Model loader
│   │   ├── speech-recognizer.ts  # ONNX speech recognition engine
│   │   └── text-output.ts        # Text output module
│   ├── ui/
│   │   └── index.html            # Electron UI
│   ├── electron-main.ts          # Electron main process
│   ├── main.ts                   # CLI entry point
│   ├── preload.ts                # Electron preload script
│   └── utils/
│       └── config.ts             # Configuration management
├── models/                       # ONNX model directory
│   ├── README.md                 # Model notes
│   └── models.config.json        # Model configuration
├── docs/
│   ├── BUILD_WINDOWS.md          # Windows build guide
│   └── DEVELOPMENT.md            # Development guide
├── scripts/
│   ├── postinstall.js            # Post-install script
│   └── prepare-build.js          # Packaging preparation script
├── dist/                         # TypeScript build output
├── release/                      # Windows packaging output
├── test/
│   └── audio-processor.test.ts   # Unit tests
├── package.json
├── tsconfig.json
└── PRD.md                        # Product requirements document
```
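## Roadmap
| Version | Status | Scope |
|------|------|------|
| v0.1.0 | ✅ Done | Base framework, single-language recognition demo |
| v0.5.0 | 🔄 In progress | Multi-language support, Electron GUI |
| v1.0.0 | ⏳ Planned | Keyboard simulation, batch transcription, packaging and distribution |
## Command-Line Options
```bash
# Start speech recognition
npm start -- start [options]
Options:
  -l, --language <lang>  Recognition language (default: zh)
  -m, --model <path>     Model file path
  -o, --output <mode>    Output mode: clipboard|keyboard|both (default: clipboard)
# Transcribe an audio file
npm start -- transcribe <file> [options]
Options:
  -l, --language <lang>  Recognition language (default: zh)
  -m, --model <path>     Model file path
  -o, --output <file>    Output file path
```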
## Tech Stack
| Module | Technology |
|------|------|
| Runtime | Node.js 20+ |
| UI framework | Electron |
| ONNX inference | onnxruntime-web |
| Clipboard | clipboardy |
| Command line | commander |
| Build tool | electron-builder |
## License
MIT
## Contributing
Issues and pull requests are welcome.
## Related Resources
- [ONNX Runtime](https://onnxruntime.ai/)
- [SenseVoice Model](https://github.com/FunAudioLLM/SenseVoice)
- [Whisper ONNX](https://github.com/guillaumekln/faster-whisper)
- [Electron documentation](https://www.electronjs.org/docs)

17
build/BUILD_README.txt Normal file

@ -0,0 +1,17 @@
Impress ASR Input - Windows Packaging Notes
=====================================
Build commands:
npm run build:win - create the NSIS installer and ZIP package
npm run build:win:dir - create only the unpacked directory
Output locations:
release/Impress ASR Input-0.1.0-win-x64-setup.exe (installer)
release/Impress ASR Input-0.1.0-win-x64.zip (archive)
Model files:
Place the downloaded ONNX models in the models/ directory
Supported models: sensevoice.onnx, whisper.onnx, paraformer.onnx
Icon file:
Place icon.ico (256x256) in the build/ directory

122
docs/BUILD_WINDOWS.md Normal file

@ -0,0 +1,122 @@
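# Windows Build Guide
## Overview
Because of network issues, building the Windows version on Linux may take several attempts. Building directly on Windows is recommended; otherwise use one of the methods below.
## Method 1: Build on Windows (recommended)
### 1. Prepare the Environment
```powershell
# Install Node.js 20+
# Download it from https://nodejs.org
# Clone the project
git clone <repository-url>
cd impress-asr-input
# Install dependencies
npm install
```
### 2. Add Model Files
Place the ONNX models in the `models/` directory:
- `sensevoice.onnx`
- `whisper.onnx`
- `paraformer.onnx`
### 3. Build
```powershell
# Build the ZIP package (no signing required)
npm run build:win:zip
# Build the NSIS installer
npm run build:win
```
Output directory: `release/`
## Method 2: Build on Linux (requires a good network connection)
```bash
# Set the Electron mirror
export ELECTRON_MIRROR="https://npmmirror.com/mirrors/electron/"
# Build the unpacked directory version (for testing)
npm run build:win:dir
# Build the ZIP package
npm run build:win:zip
```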
## Build Output
```
release/
├── win-unpacked/                               # Unpacked version (for testing)
│   ├── Impress ASR Input.exe
│   ├── resources/
│   └── ...
├── Impress ASR Input-0.1.0-win-x64.zip         # ZIP archive
└── Impress ASR Input-0.1.0-win-x64-setup.exe   # NSIS installer
```
## Troubleshooting
### 1. Electron download fails
```bash
# Use a mirror in mainland China
export ELECTRON_MIRROR="https://npmmirror.com/mirrors/electron/"
export npm_config_electron_mirror="https://npmmirror.com/mirrors/electron/"
```
### 2. winCodeSign download fails
This is the signing tool that electron-builder depends on. You can either:
- Add `"sign": null` to the `build` field in `package.json` to disable signing
- Or download it manually: https://github.com/electron-userland/electron-builder-binaries/releases
### 3. Icon file missing
The build falls back to the default icon. To use a custom one:
1. Prepare `icon.ico` (256x256)
2. Place it in the `build/` directory
3. Add `"icon": "build/icon.ico"` to `package.json`
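## Manual Distribution (without a packaging tool)
If electron-builder cannot be used, a distribution package can be assembled by hand:
```bash
# 1. Compile TypeScript
npm run build
# 2. Copy the compiled output and resource files
#    (create the target directories first so the nested copies succeed)
mkdir -p release/my-app/src
cp -r dist release/my-app/dist
cp -r node_modules release/my-app/node_modules
cp -r src/ui release/my-app/src/ui
cp -r models release/my-app/models
# 3. Download Electron and place it in the package
# https://npmmirror.com/mirrors/electron/
# 4. Compress
cd release/
zip -r impress-asr-input-win-x64.zip my-app/
```
## Running the Application
After extracting, run:
```
Impress ASR Input.exe
```
Or use command-line mode:
```
node dist/main.js start
```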

225
docs/DEVELOPMENT.md Normal file

@ -0,0 +1,225 @@
# Development Guide
## Project Structure
```
impress-asr-input/
├── src/
│   ├── core/                     # Core modules
│   │   ├── audio-recorder.ts     # Audio capture
│   │   ├── audio-processor.ts    # Audio processing (VAD, resampling, etc.)
│   │   ├── speech-recognizer.ts  # ONNX speech recognition engine
│   │   ├── text-output.ts        # Text output
│   │   └── index.ts              # Module exports
│   ├── ui/                       # Electron UI
│   │   └── index.html            # Main window
│   ├── electron-main.ts          # Electron main process
│   ├── preload.ts                # Electron preload script
│   ├── main.ts                   # CLI entry point
│   └── utils/
│       └── config.ts             # Configuration management
├── models/                       # ONNX model files (download separately)
├── scripts/
│   └── postinstall.js            # Post-install script
├── test/
│   └── audio-processor.test.ts   # Unit tests
├── package.json
├── tsconfig.json
└── PRD.md
```
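## Development Environment Setup
### Prerequisites
- Node.js >= 20.0.0
- npm >= 9.0.0
### Installation Steps
```bash
# Install dependencies
npm install
# Download the model files (see below)
# Run in development mode
npm run dev
# Run Electron in development mode
npm run dev:electron
```
## Model Download
### Recommended Models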
#### 1. SenseVoice (recommended)
```bash
# Download from HuggingFace
# https://huggingface.co/FunAudioLLM/SenseVoice/tree/main
# Or use ModelScope
# https://www.modelscope.cn/models/iic/SenseVoiceSmall
```
Place `model.onnx` in the `models/` directory.
#### 2. Whisper ONNX
```bash
# HuggingFace
# https://huggingface.co/onnx-community/whisper-base
# Direct download
huggingface-cli download onnx-community/whisper-base --local-dir models/
```
#### 3. Paraformer (optimized for Chinese)
```bash
# ModelScope
# https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punct
```
### Model Configuration
Specify the model path in `src/main.ts` or in the settings UI:
```typescript
const recognizer = new SpeechRecognizer({
  modelPath: './models/model.onnx',
  language: 'zh',
  useVad: true,
  beamSize: 5,
});
```
## Development Commands
```bash
# Compile TypeScript
npm run build
# Run the CLI
npm start -- start
# Run the tests
npm test
# Lint the code
npm run lint
# Build the Electron application
npm run build:electron
```
## Core Modules
### AudioRecorder (audio capture)
Captures audio data from the microphone.
```typescript
const recorder = new AudioRecorder({
  sampleRate: 16000,
  channels: 1,
  chunkDuration: 100,
});
recorder.on('data', (chunk: AudioChunk) => {
  // Handle the audio data
});
await recorder.start();
```
**Note**: the current implementation is based on the Web Audio API. In a pure Node.js environment another approach is needed (such as `node-audio` or Electron's audio APIs).
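One detail worth knowing when the recorder runs on the Web Audio API: `createScriptProcessor` only accepts power-of-two buffer sizes from 256 to 16384, so a `chunkDuration` in milliseconds cannot be passed through as a sample count directly. The sketch below shows one possible mapping; `scriptProcessorBufferSize` is a hypothetical helper name, and rounding up (rather than to the nearest size) is an assumption made here.

```typescript
// Round the requested chunk length up to the nearest buffer size that
// createScriptProcessor accepts (a power of two between 256 and 16384).
function scriptProcessorBufferSize(sampleRate: number, chunkDurationMs: number): number {
  const wantedSamples = (sampleRate * chunkDurationMs) / 1000;
  let size = 256;
  while (size < wantedSamples && size < 16384) {
    size *= 2;
  }
  return size;
}
```

With the defaults shown above (16000 Hz, 100 ms) the requested 1600 samples map to a 2048-sample buffer, so chunks arrive every 128 ms rather than every 100 ms.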
### SpeechRecognizer (speech recognition)
Speech recognition engine based on ONNX Runtime.
```typescript
const recognizer = new SpeechRecognizer({
  modelPath: './models/model.onnx',
  language: 'zh',
  useVad: true,
});
recognizer.on('result', (result: RecognitionResult) => {
  console.log(result.text);
});
await recognizer.initialize();
recognizer.start();
```
### TextOutput (text output)
Writes recognition results to the clipboard.
```typescript
const output = new TextOutput({
  outputMode: 'clipboard',
});
output.output({ text: '你好', isFinal: true, confidence: 0.95, timestamp: Date.now() });
```
### SimpleVAD (voice activity detection)
A simple energy-based VAD implementation.
```typescript
const vad = new SimpleVAD({
  energyThreshold: 0.01,
  silenceDuration: 500,
});
const { isSpeaking, isFinal } = vad.process(audioFrame, 16000);
```
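The energy check behind a VAD of this kind can be sketched as follows. This is an illustrative stand-alone version, not the project's `SimpleVAD`: it computes RMS energy per frame and measures silence from frame lengths instead of wall-clock time, an assumption made here so the same logic also works on pre-recorded audio. `rms` and `detectEndOfSpeech` are hypothetical names.

```typescript
// Root-mean-square energy of one frame of samples in [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) {
    sum += frame[i] * frame[i];
  }
  return Math.sqrt(sum / frame.length);
}

// Returns the index of the frame at which the utterance is considered
// finished (enough trailing silence after speech), or -1 if it never ends.
function detectEndOfSpeech(
  frames: Float32Array[],
  sampleRate: number,
  energyThreshold: number = 0.01,
  silenceDurationMs: number = 500
): number {
  let speaking = false;
  let silenceMs = 0;
  for (let i = 0; i < frames.length; i++) {
    const frame = frames[i];
    if (rms(frame) > energyThreshold) {
      speaking = true;
      silenceMs = 0; // speech resets the silence timer
    } else if (speaking) {
      silenceMs += (frame.length * 1000) / sampleRate;
      if (silenceMs >= silenceDurationMs) return i;
    }
  }
  return -1;
}
```

Counting silence in audio time keeps the result independent of how fast frames are fed in, which matters for batch transcription where frames arrive much faster than real time.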
## Adding Support for a New Model
1. Create a model configuration file under `src/core/models/`:
```typescript
// src/core/models/sensevoice.ts
export const senseVoiceConfig = {
  inputShape: [1, 16000],
  outputKeys: ['output', 'logits'],
  // ...
};
```
2. Add the model adaptation logic in `SpeechRecognizer`.
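As a rough illustration of what the adaptation logic in step 2 might look like, the sketch below keeps one adapter per model in a small registry. Everything here is hypothetical (`ModelAdapter`, `padOrTruncate`, `registerAdapter`, and the `adapters` table are not part of the project); the 16000-sample input length simply mirrors the `inputShape` in the configuration example.

```typescript
// Hypothetical per-model adapter; names and fields are illustrative only.
interface ModelAdapter {
  name: string;
  inputLength: number; // number of samples the model expects per inference
  prepareInput(audio: Float32Array): Float32Array;
}

// Zero-pad or truncate audio to a fixed number of samples.
function padOrTruncate(audio: Float32Array, length: number): Float32Array {
  const out = new Float32Array(length);
  out.set(audio.subarray(0, Math.min(audio.length, length)));
  return out;
}

// Registry keyed by model name.
const adapters: Record<string, ModelAdapter> = {};

function registerAdapter(adapter: ModelAdapter): void {
  adapters[adapter.name] = adapter;
}

registerAdapter({
  name: 'sensevoice',
  inputLength: 16000,
  prepareInput: (audio) => padOrTruncate(audio, 16000),
});
```

A registry like this would let the recognizer look up input preparation by model name instead of branching on it inline.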
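## FAQ
### Q: How do I debug audio capture?
```typescript
recorder.on('data', (chunk) => {
  console.log('audio samples:', chunk.data.length, 'sample rate:', chunk.sampleRate);
});
```
### Q: Recognition latency is high?
1. Use a quantized (int8) model
2. Reduce `chunkDuration`
3. Enable `useVad` to cut down on wasted recognition passes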
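To make the `chunkDuration` trade-off concrete: a chunk holds `sampleRate × chunkDuration / 1000` samples, and the recognizer is invoked `1000 / chunkDuration` times per second. The helpers below are a small sketch of that arithmetic; `samplesPerChunk` and `chunksPerSecond` are hypothetical names.

```typescript
// Number of samples delivered in one chunk of the given duration.
function samplesPerChunk(sampleRate: number, chunkDurationMs: number): number {
  return Math.floor((sampleRate * chunkDurationMs) / 1000);
}

// How many chunks (and therefore recognition passes) arrive each second.
function chunksPerSecond(chunkDurationMs: number): number {
  return 1000 / chunkDurationMs;
}
```

At 16 kHz, going from 100 ms to 20 ms chunks shrinks each chunk from 1600 to 320 samples but raises the call rate from 10 to 50 per second, so shorter chunks only help if a single inference pass fits comfortably inside the chunk interval.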
### Q: Electron packaging fails?
Check the `build` configuration in `package.json` and make sure the model files are included.
## Contributing
1. Fork the project
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push the branch (`git push origin feature/AmazingFeature`)
5. Open a pull request

59
models/README.md Normal file

@ -0,0 +1,59 @@
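# Model Files
## Supported Models
This project supports the following ONNX speech recognition models:
### 1. SenseVoice (recommended)
- **Source**: Alibaba DAMO Academy, FunAudioLLM
- **Languages**: Chinese, English, Japanese, Korean
- **Sample rate**: 16000 Hz
- **Highlights**: high accuracy, low latency, supports mixed-language recognition
**Download**:
- HuggingFace: https://huggingface.co/FunAudioLLM/SenseVoice
- ModelScope: https://www.modelscope.cn/models/iic/SenseVoiceSmall
### 2. Whisper ONNX
- **Source**: OpenAI
- **Languages**: 90+ languages
- **Sample rate**: 16000 Hz
- **Highlights**: broadest language coverage, high accuracy
**Download**:
- HuggingFace: https://huggingface.co/onnx-community/whisper-base
### 3. Paraformer
- **Source**: Alibaba DAMO Academy
- **Languages**: Chinese
- **Sample rate**: 16000 Hz
- **Highlights**: optimized for Chinese recognition, fast
**Download**:
- ModelScope: https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punct
## Installing Models
1. Download the ONNX model file from one of the locations above
2. Place the model file in the `models/` directory
3. Name the model file as follows:
- SenseVoice: `sensevoice.onnx`
- Whisper: `whisper.onnx`
- Paraformer: `paraformer.onnx`
## Model Priority
When several model files are present, the system loads them in this order of priority:
1. sensevoice.onnx (highest priority)
2. whisper.onnx
3. paraformer.onnx (lowest priority)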
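The priority rule above amounts to picking the first file in a fixed order that is actually present. A minimal sketch, assuming the caller already has the list of file names found in `models/` (`MODEL_PRIORITY` and `pickModel` are hypothetical names, not the project's loader API):

```typescript
// Highest priority first, matching the list above.
const MODEL_PRIORITY: string[] = ['sensevoice.onnx', 'whisper.onnx', 'paraformer.onnx'];

// Return the highest-priority model among the available files, or null if none match.
function pickModel(availableFiles: string[]): string | null {
  for (const name of MODEL_PRIORITY) {
    if (availableFiles.indexOf(name) !== -1) {
      return name;
    }
  }
  return null;
}
```

For example, a directory containing only `paraformer.onnx` and `whisper.onnx` would resolve to `whisper.onnx`, and an empty directory to `null`.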
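## Notes
- Model files are large (50MB - 300MB); download them separately
- Model files are not included in the Git repository
- Make sure a model file is in place before the first run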

30
models/models.config.json Normal file

@ -0,0 +1,30 @@
{
  "models": [
    {
      "name": "SenseVoice",
      "file": "sensevoice.onnx",
      "languages": ["zh", "en", "ja", "ko"],
      "sampleRate": 16000,
      "description": "Alibaba DAMO Academy multilingual speech recognition model (recommended)",
      "downloadUrl": "https://huggingface.co/FunAudioLLM/SenseVoice"
    },
    {
      "name": "Whisper",
      "file": "whisper.onnx",
      "languages": ["zh", "en", "ja", "ko", "de", "fr", "es"],
      "sampleRate": 16000,
      "description": "OpenAI multilingual speech recognition model",
      "downloadUrl": "https://huggingface.co/onnx-community/whisper-base"
    },
    {
      "name": "Paraformer",
      "file": "paraformer.onnx",
      "languages": ["zh"],
      "sampleRate": 16000,
      "description": "Alibaba DAMO Academy Chinese speech recognition model",
      "downloadUrl": "https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punct"
    }
  ],
  "defaultModel": "sensevoice.onnx",
  "modelsDirectory": "./models"
}

6114
package-lock.json generated Normal file

File diff suppressed because it is too large Load Diff

75
package.json Normal file

@ -0,0 +1,75 @@
{
"name": "impress-asr-input",
"version": "0.1.0",
"description": "Local ONNX-based speech recognition input tool with multilingual real-time recognition",
"main": "dist/main.js",
"type": "module",
"scripts": {
"dev": "tsx watch src/main.ts",
"build": "tsc",
"start": "node dist/main.js",
"dev:electron": "electron .",
"build:electron": "tsc && electron-builder",
"build:win": "cross-env ELECTRON_MIRROR=https://npmmirror.com/mirrors/electron/ npm run build:electron -- --win --x64",
"build:win:zip": "cross-env-shell ELECTRON_MIRROR=https://npmmirror.com/mirrors/electron/ \"tsc && electron-builder --win zip --x64 --publish=never\"",
"build:win:dir": "cross-env-shell ELECTRON_MIRROR=https://npmmirror.com/mirrors/electron/ \"tsc && electron-builder --win --x64 --dir --publish=never\"",
"test": "vitest run",
"lint": "eslint src --ext .ts"
},
"keywords": [
"asr",
"speech-to-text",
"onnx",
"voice-input",
"electron"
],
"author": "",
"license": "MIT",
"devDependencies": {
"@types/node": "^20.11.0",
"cross-env": "^7.0.3",
"electron": "^28.0.0",
"electron-builder": "^24.9.0",
"typescript": "^5.3.0",
"vitest": "^1.2.0"
},
"dependencies": {
"clipboardy": "^4.0.0",
"commander": "^12.0.0",
"onnxruntime-web": "^1.17.0"
},
"build": {
"appId": "com.impress.asr-input",
"productName": "Impress ASR Input",
"buildVersion": "1.0.0",
"publish": null,
"directories": {
"output": "release",
"buildResources": "build"
},
"files": [
"dist/**/*",
"src/ui/**/*",
"models/**/*"
],
"extraResources": [
"models/*.onnx"
],
"win": {
"target": "zip",
"requestedExecutionLevel": "asInvoker",
"artifactName": "${productName}-${version}-win-${arch}.${ext}"
},
"mac": {
"target": ["dmg", "zip"],
"category": "public.app-category.utilities"
},
"linux": {
"target": ["AppImage", "deb"],
"category": "Utility"
}
},
"engines": {
"node": ">=20.0.0"
}
}

17
scripts/postinstall.js Normal file

@ -0,0 +1,17 @@
/**
 * Post-install script
 * Reminds the user to download the model files
 */
import { join } from 'path';
const modelsDir = join(process.cwd(), 'models');
console.log('\n=== Impress ASR Input installed ===\n');
console.log('Model files must be downloaded separately. Supported models:');
console.log(' - SenseVoice: https://github.com/FunAudioLLM/SenseVoice');
console.log(' - Whisper ONNX: https://huggingface.co/onnx-community/whisper-base');
console.log(' - Paraformer: https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punct');
console.log('\nPlace the downloaded .onnx model files in:', modelsDir);
console.log('\n');

77
scripts/prepare-build.js Normal file

@ -0,0 +1,77 @@
/**
 * Build script: prepare Windows packaging
 */
import { mkdirSync, existsSync, writeFileSync } from 'fs';
import { join } from 'path';
const rootDir = process.cwd();
const buildDir = join(rootDir, 'build');
const modelsDir = join(rootDir, 'models');
console.log('🔧 Preparing Windows packaging...\n');
// Create the build directory
if (!existsSync(buildDir)) {
  mkdirSync(buildDir, { recursive: true });
  console.log('✅ Created build directory');
}
// Create a placeholder icon file (replace it with a real icon for actual use)
const icoPath = join(buildDir, 'icon.ico');
if (!existsSync(icoPath)) {
  // Write an empty placeholder file
  writeFileSync(icoPath, '');
  console.log('⚠️ Icon file not found; created a placeholder');
  console.log('   Replace it with a real icon.ico file (256x256 recommended)');
}
// Check for model files
const modelFiles = ['sensevoice.onnx', 'whisper.onnx', 'paraformer.onnx'];
const foundModels = [];
const missingModels = [];
for (const model of modelFiles) {
  const modelPath = join(modelsDir, model);
  if (existsSync(modelPath)) {
    foundModels.push(model);
  } else {
    missingModels.push(model);
  }
}
console.log('📦 Model file check:');
if (foundModels.length > 0) {
  console.log(`   ✅ Found: ${foundModels.join(', ')}`);
}
if (missingModels.length > 0) {
  console.log(`   ⚠️ Missing: ${missingModels.join(', ')}`);
  console.log('   Missing models are not bundled; users need to download them separately');
}
// Write the Windows packaging notes
const readmePath = join(buildDir, 'BUILD_README.txt');
writeFileSync(readmePath, `Impress ASR Input - Windows Packaging Notes
=====================================
Build commands:
npm run build:win - create the NSIS installer and ZIP package
npm run build:win:dir - create only the unpacked directory
Output locations:
release/Impress ASR Input-0.1.0-win-x64-setup.exe (installer)
release/Impress ASR Input-0.1.0-win-x64.zip (archive)
Model files:
Place the downloaded ONNX models in the models/ directory
Supported models: sensevoice.onnx, whisper.onnx, paraformer.onnx
Icon file:
Place icon.ico (256x256) in the build/ directory
`);
console.log('\n✅ Packaging preparation complete');
console.log('\nNext steps:');
console.log(' 1. Place the model files in the models/ directory (optional)');
console.log(' 2. Place icon.ico in the build/ directory (optional)');
console.log(' 3. Run: npm run build:win');

154
src/core/audio-processor.ts Normal file

@ -0,0 +1,154 @@
/**
 * Audio processing module
 * Normalization, resampling, VAD, and framing helpers
 */
/**
 * Normalize audio
 * Converts Int16Array input to a Float32Array in [-1, 1] and resamples if needed
 */
export function normalizeAudio(
  data: Int16Array | Float32Array,
  inputSampleRate: number,
  outputSampleRate: number = 16000
): Float32Array {
  // Convert to Float32Array and normalize
  let normalized: Float32Array;
  if (data instanceof Int16Array) {
    normalized = new Float32Array(data.length);
    for (let i = 0; i < data.length; i++) {
      normalized[i] = data[i] / 32768.0;
    }
  } else {
    normalized = data;
  }
  // Resample
  if (inputSampleRate !== outputSampleRate) {
    return resample(normalized, inputSampleRate, outputSampleRate);
  }
  return normalized;
}
/**
 * Resample audio
 * Uses linear interpolation
 */
export function resample(
  data: Float32Array,
  fromSampleRate: number,
  toSampleRate: number
): Float32Array {
  if (fromSampleRate === toSampleRate) {
    return data;
  }
  const ratio = fromSampleRate / toSampleRate;
  const newLength = Math.floor(data.length / ratio);
  const result = new Float32Array(newLength);
  for (let i = 0; i < newLength; i++) {
    // Position of output sample i in the input signal
    const position = i * ratio;
    const index = Math.floor(position);
    const fraction = position - index;
    if (index + 1 < data.length) {
      result[i] = data[index] * (1 - fraction) + data[index + 1] * fraction;
    } else {
      result[i] = data[index];
    }
  }
  return result;
}
/**
 * Simple voice activity detection (VAD)
 * Energy-based
 */
export class SimpleVAD {
  private energyThreshold: number;
  private silenceDuration: number;
  private silenceStart: number | null = null;
  private isSpeaking: boolean = false;
  constructor(options: { energyThreshold?: number; silenceDuration?: number } = {}) {
    this.energyThreshold = options.energyThreshold ?? 0.01;
    this.silenceDuration = options.silenceDuration ?? 500; // ms
  }
  /**
   * Process one audio frame
   */
  process(frame: Float32Array, _sampleRate: number): { isSpeaking: boolean; isFinal: boolean } {
    // Compute the frame energy
    const energy = this.calculateEnergy(frame);
    if (energy > this.energyThreshold) {
      this.isSpeaking = true;
      this.silenceStart = null;
      return { isSpeaking: true, isFinal: false };
    } else if (this.isSpeaking) {
      // Track silence (measured in wall-clock time)
      if (this.silenceStart === null) {
        this.silenceStart = Date.now();
      }
      const silenceElapsed = Date.now() - this.silenceStart;
      if (silenceElapsed >= this.silenceDuration) {
        // End of speech
        this.isSpeaking = false;
        this.silenceStart = null;
        return { isSpeaking: false, isFinal: true };
      }
      return { isSpeaking: true, isFinal: false };
    }
    return { isSpeaking: false, isFinal: false };
  }
  /**
   * Compute the RMS energy of a frame
   */
  private calculateEnergy(frame: Float32Array): number {
    let sum = 0;
    for (let i = 0; i < frame.length; i++) {
      sum += frame[i] * frame[i];
    }
    return Math.sqrt(sum / frame.length);
  }
  /**
   * Reset the detector state
   */
  reset(): void {
    this.isSpeaking = false;
    this.silenceStart = null;
  }
}
/**
 * Frame audio
 * Splits audio into fixed-size frames advanced by hopSize samples
 */
export function frameAudio(
  data: Float32Array,
  frameSize: number,
  hopSize: number
): Float32Array[] {
  const frames: Float32Array[] = [];
  for (let i = 0; i <= data.length - frameSize; i += hopSize) {
    const frame = new Float32Array(frameSize);
    for (let j = 0; j < frameSize; j++) {
      frame[j] = data[i + j];
    }
    frames.push(frame);
  }
  return frames;
}

173
src/core/audio-recorder.ts Normal file

@ -0,0 +1,173 @@
/**
 * Audio capture module
 * Captures audio from the microphone
 *
 * Supported environments:
 * - Electron: uses navigator.mediaDevices (renderer process)
 * - Node.js: uses the node-audio library
 */
import { EventEmitter } from 'events';
export interface AudioConfig {
  sampleRate: number; // Sample rate, default 16000
  channels: number; // Channel count, default 1 (mono)
  chunkDuration: number; // Chunk duration (ms), default 100
  deviceId?: string; // Audio device ID (optional)
}
export interface AudioChunk {
  data: Float32Array; // Audio data (normalized to [-1, 1])
  sampleRate: number;
  timestamp: number;
}
export class AudioRecorder extends EventEmitter {
  private config: AudioConfig;
  private isRecording: boolean = false;
  private stream: any = null;
  private audioContext: any = null;
  private source: any = null;
  private processor: any = null;
  constructor(config: Partial<AudioConfig> = {}) {
    super();
    this.config = {
      sampleRate: config.sampleRate ?? 16000,
      channels: config.channels ?? 1,
      chunkDuration: config.chunkDuration ?? 100,
      deviceId: config.deviceId,
    };
  }
  /**
   * Start recording
   */
  async start(): Promise<void> {
    if (this.isRecording) {
      throw new Error('Already recording');
    }
    // Check whether we are in a browser / Electron renderer process
    if (typeof window !== 'undefined' && window.navigator?.mediaDevices) {
      await this.startInBrowser();
    } else {
      // Node.js environment - needs an external audio input
      this.startInNode();
    }
  }
  /**
   * Start recording in a browser / Electron renderer
   */
  private async startInBrowser(): Promise<void> {
    try {
      const constraints = {
        audio: {
          sampleRate: this.config.sampleRate,
          channelCount: this.config.channels,
          deviceId: this.config.deviceId ? { exact: this.config.deviceId } : undefined,
        },
      };
      this.stream = await window.navigator.mediaDevices.getUserMedia(constraints);
      const AudioContextClass = window.AudioContext || (window as any).webkitAudioContext;
      this.audioContext = new AudioContextClass({ sampleRate: this.config.sampleRate });
      this.source = this.audioContext.createMediaStreamSource(this.stream);
      // createScriptProcessor only accepts power-of-two buffer sizes from 256 to 16384,
      // so round the requested chunk length up to the nearest valid size.
      const wantedSamples = this.config.sampleRate * (this.config.chunkDuration / 1000);
      let bufferSize = 256;
      while (bufferSize < wantedSamples && bufferSize < 16384) {
        bufferSize *= 2;
      }
      this.processor = this.audioContext.createScriptProcessor(bufferSize, 1, 1);
      this.processor.onaudioprocess = (event: any) => {
        const inputData = event.inputBuffer.getChannelData(0);
        const chunk: AudioChunk = {
          data: new Float32Array(inputData),
          sampleRate: this.config.sampleRate,
          timestamp: Date.now(),
        };
        this.emit('data', chunk);
      };
      this.source.connect(this.processor);
      this.processor.connect(this.audioContext.destination);
      this.isRecording = true;
      this.emit('start');
    } catch (error) {
      this.emit('error', error);
      throw error;
    }
  }
  /**
   * Start recording in Node.js
   * Requires the node-audio library; currently runs in demo mode
   */
  private startInNode(): void {
    console.warn('Audio capture in Node.js requires electron or the node-audio library');
    console.warn('Running in demo mode; no audio will be captured');
    this.isRecording = true;
    this.emit('start');
    // Demo: periodically emit silent data
    const demoInterval = setInterval(() => {
      if (!this.isRecording) {
        clearInterval(demoInterval);
        return;
      }
      const demoData = new Float32Array(this.config.sampleRate * (this.config.chunkDuration / 1000));
      this.emit('data', {
        data: demoData,
        sampleRate: this.config.sampleRate,
        timestamp: Date.now(),
      });
    }, this.config.chunkDuration);
  }
  /**
   * Stop recording
   */
  stop(): void {
    if (!this.isRecording) {
      return;
    }
    if (this.processor) {
      this.processor.disconnect();
      this.processor = null;
    }
    if (this.source) {
      this.source.disconnect();
      this.source = null;
    }
    if (this.stream) {
      const tracks = this.stream.getTracks?.() || this.stream.tracks || [];
      tracks.forEach((track: any) => track.stop?.());
      this.stream = null;
    }
    if (this.audioContext) {
      this.audioContext.close?.();
      this.audioContext = null;
    }
    this.isRecording = false;
    this.emit('stop');
  }
  /**
   * List the available audio input devices
   */
  static async listDevices(): Promise<any[]> {
    if (typeof window !== 'undefined' && window.navigator?.mediaDevices) {
      const devices = await window.navigator.mediaDevices.enumerateDevices();
      return devices.filter((device: any) => device.kind === 'audioinput');
    }
    return [];
  }
  /**
   * Whether recording is in progress
   */
  get recording(): boolean {
    return this.isRecording;
  }
}

9
src/core/index.ts Normal file

@ -0,0 +1,9 @@
/**
 * Core module exports
 */
export { AudioRecorder, type AudioConfig, type AudioChunk } from './audio-recorder.js';
export { SpeechRecognizer, type RecognizerConfig, type RecognitionResult } from './speech-recognizer.js';
export { TextOutput, type TextOutputConfig } from './text-output.js';
export { normalizeAudio, resample, SimpleVAD, frameAudio } from './audio-processor.js';
export { ModelLoader, type ModelConfig, MODEL_CONFIGS } from './model-loader.js';

183
src/core/model-loader.ts Normal file

@ -0,0 +1,183 @@
/**
 * Model loader
 * Loads and manages ONNX models
 */
import { existsSync } from 'fs';
import { join } from 'path';
import * as ort from 'onnxruntime-web';
export interface ModelConfig {
  name: string;
  path: string;
  language: string[];
  sampleRate: number;
  inputShape: number[];
  description: string;
}
// Predefined model configurations
export const MODEL_CONFIGS: Record<string, ModelConfig> = {
  sensevoice: {
    name: 'SenseVoice',
    path: './models/sensevoice.onnx',
    language: ['zh', 'en', 'ja', 'ko'],
    sampleRate: 16000,
    inputShape: [1, 16000],
    description: 'Alibaba DAMO Academy multilingual speech recognition model',
  },
  whisper: {
    name: 'Whisper',
    path: './models/whisper.onnx',
    language: ['zh', 'en', 'ja', 'ko', 'de', 'fr', 'es'],
    sampleRate: 16000,
    inputShape: [1, 480000], // 30 seconds of audio at 16 kHz
    description: 'OpenAI multilingual speech recognition model',
  },
  paraformer: {
    name: 'Paraformer',
    path: './models/paraformer.onnx',
    language: ['zh'],
    sampleRate: 16000,
    inputShape: [1, 16000],
    description: 'Alibaba DAMO Academy Chinese speech recognition model',
  },
};
export class ModelLoader {
  private session: ort.InferenceSession | null = null;
  private config: ModelConfig | null = null;
  /**
   * List the predefined models whose files exist
   */
  static getAvailableModels(): ModelConfig[] {
    return Object.values(MODEL_CONFIGS).filter((config) =>
      existsSync(config.path)
    );
  }
  /**
   * Check whether the file for a predefined model exists
   */
  static checkModelExists(modelName: string): boolean {
    const config = MODEL_CONFIGS[modelName];
    if (!config) return false;
    return existsSync(config.path);
  }
  /**
   * Load the first available model from a directory
   */
  static async loadFromDir(
    modelsDir: string
  ): Promise<{ session: ort.InferenceSession; config: ModelConfig } | null> {
    // Look for models in priority order
    const modelOrder = ['sensevoice.onnx', 'whisper.onnx', 'paraformer.onnx'];
    for (const modelName of modelOrder) {
      const modelPath = join(modelsDir, modelName);
      if (existsSync(modelPath)) {
        try {
          const session = await ort.InferenceSession.create(modelPath);
          const config = Object.values(MODEL_CONFIGS).find((c) =>
            c.path.endsWith(modelName)
          ) || {
            name: modelName.replace('.onnx', ''),
            path: modelPath,
            language: ['zh'],
            sampleRate: 16000,
            inputShape: [1, 16000],
            description: 'Custom model',
          };
          return { session, config };
        } catch (error) {
          console.warn(`Failed to load model ${modelName}:`, error);
        }
      }
    }
    return null;
  }
  /**
   * Load a model by predefined name or by file path
   */
  async load(modelNameOrPath: string): Promise<void> {
    let modelPath: string;
    let modelConfig: ModelConfig | undefined;
    // Check whether this is a predefined model name
    if (MODEL_CONFIGS[modelNameOrPath]) {
      modelConfig = MODEL_CONFIGS[modelNameOrPath];
      modelPath = modelConfig.path;
    } else {
      // Treat the argument as a path
      modelPath = modelNameOrPath;
      modelConfig = {
        name: 'custom',
        path: modelPath,
        language: ['zh'],
        sampleRate: 16000,
        inputShape: [1, 16000],
        description: 'Custom model path',
      };
    }
    if (!existsSync(modelPath)) {
      throw new Error(`Model file not found: ${modelPath}`);
    }
    try {
      const sessionOptions: ort.InferenceSession.SessionOptions = {
        executionProviders: ['cpu'],
        graphOptimizationLevel: 'all',
        intraOpNumThreads: 4,
      };
      this.session = await ort.InferenceSession.create(modelPath, sessionOptions);
      this.config = modelConfig;
      console.log(`✅ Model loaded: ${modelConfig.name}`);
      console.log(`   Languages: ${modelConfig.language.join(', ')}`);
      console.log(`   Sample rate: ${modelConfig.sampleRate}Hz`);
    } catch (error) {
      throw new Error(`Failed to load model: ${error}`);
    }
  }
  /**
   * Get the configuration of the loaded model
   */
  getConfig(): ModelConfig | null {
    return this.config;
  }
  /**
   * Get the inference session
   */
  getSession(): ort.InferenceSession | null {
    return this.session;
  }
  /**
   * Run inference
   */
  async run(feeds: Record<string, ort.Tensor>): Promise<Record<string, ort.Tensor>> {
    if (!this.session) {
      throw new Error('Model not loaded');
    }
    return await this.session.run(feeds);
  }
  /**
   * Release the model resources
   */
  async release(): Promise<void> {
    if (this.session) {
      await this.session.release();
      this.session = null;
      this.config = null;
    }
  }
}

201
src/core/speech-recognizer.ts Normal file

@ -0,0 +1,201 @@
/**
*
* ONNX Runtime
*/
import * as ort from 'onnxruntime-web';
import { EventEmitter } from 'events';
import { AudioChunk } from './audio-recorder.js';
import { ModelLoader } from './model-loader.js';
export interface RecognizerConfig {
modelPath: string; // 模型文件路径
language: string; // 识别语言:'zh', 'en', 'ja', 'ko' 等
useVad: boolean; // 是否使用语音端点检测
beamSize: number; // 束搜索宽度
}
export interface RecognitionResult {
text: string; // 识别文本
confidence: number; // 置信度
isFinal: boolean; // 是否为最终结果
timestamp: number; // 时间戳
}
export class SpeechRecognizer extends EventEmitter {
private config: RecognizerConfig;
private modelLoader: ModelLoader;
private isRecognizing: boolean = false;
private audioBuffer: Float32Array = new Float32Array(0);
private readonly MAX_BUFFER_SECONDS = 30;
constructor(config: RecognizerConfig) {
super();
this.config = config;
this.modelLoader = new ModelLoader();
}
/**
* Initialize the recognizer by loading the model.
*/
async initialize(): Promise<void> {
try {
await this.modelLoader.load(this.config.modelPath);
this.emit('ready');
} catch (error) {
this.emit('error', new Error(`Failed to load model: ${error}`));
throw error;
}
}
/**
* Append an audio chunk to the buffer and run recognition on it.
*/
async processAudio(chunk: AudioChunk): Promise<void> {
if (!this.isRecognizing) {
return;
}
// Append the chunk to the buffer
const newBuffer = new Float32Array(this.audioBuffer.length + chunk.data.length);
newBuffer.set(this.audioBuffer);
newBuffer.set(chunk.data, this.audioBuffer.length);
this.audioBuffer = newBuffer;
// Cap the buffer: 30 s of audio when VAD is enabled, 5 s otherwise
const maxSamples = this.config.useVad
? chunk.sampleRate * this.MAX_BUFFER_SECONDS
: chunk.sampleRate * 5;
if (this.audioBuffer.length > maxSamples) {
// Over the cap: drop the older half and keep the most recent samples
const keepStart = Math.floor(this.audioBuffer.length / 2);
this.audioBuffer = this.audioBuffer.slice(keepStart);
}
// Run recognition
await this.recognize(chunk.sampleRate);
}
/**
* Run inference on the buffered audio and emit the result.
*/
private async recognize(sampleRate: number): Promise<void> {
const modelConfig = this.modelLoader.getConfig();
if (!modelConfig || this.audioBuffer.length === 0) {
return;
}
try {
// Resample to the model's sample rate (nearest-neighbor, no anti-aliasing filter)
let audioData = this.audioBuffer;
if (sampleRate !== modelConfig.sampleRate) {
const ratio = sampleRate / modelConfig.sampleRate;
const newLength = Math.floor(this.audioBuffer.length / ratio);
audioData = new Float32Array(newLength);
for (let i = 0; i < newLength; i++) {
const pos = Math.floor(i * ratio);
audioData[i] = this.audioBuffer[pos] || 0;
}
}
// Zero-pad or truncate to the model's input size
const inputSize = modelConfig.inputShape[1];
const inputData = new Float32Array(inputSize);
const copyLength = Math.min(audioData.length, inputSize);
inputData.set(audioData.slice(0, copyLength));
const inputTensor = new ort.Tensor('float32', inputData, [1, inputSize]);
const feeds: Record<string, ort.Tensor> = {
input: inputTensor,
};
const results = await this.modelLoader.run(feeds);
// Decode the model output
const text = this.decodeOutput(results, modelConfig);
if (text) {
const result: RecognitionResult = {
text,
confidence: 0.95, // Placeholder: no real confidence score is computed yet
isFinal: true,
timestamp: Date.now(),
};
this.emit('result', result);
}
// Clear the buffer
this.audioBuffer = new Float32Array(0);
} catch (error) {
this.emit('error', new Error(`Recognition failed: ${error}`));
}
}
/**
* Decode the model output into text (placeholder implementation).
*/
private decodeOutput(results: Record<string, ort.Tensor>, _modelConfig: any): string {
// Try several common output key names
const outputKeys = ['output', 'logits', 'output_ids', 'token_ids'];
let output: ort.Tensor | undefined;
for (const key of outputKeys) {
if (results[key]) {
output = results[key];
break;
}
}
if (!output) {
// Fall back to the first available output
const firstKey = Object.keys(results)[0];
if (firstKey) {
output = results[firstKey];
}
}
if (!output || !output.data) {
return '';
}
// Simplified: a real implementation should decode with the model's tokenizer.
// For now, return a placeholder string.
const tokens = Array.from(output.data as Float32Array | Int32Array);
return `[Recognition result: ${tokens.length} tokens]`;
}
/**
* Start recognition.
*/
start(): void {
this.isRecognizing = true;
this.emit('start');
}
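/**
* Stop recognition and flush any buffered audio.
*/
stop(): void {
this.isRecognizing = false;
if (this.audioBuffer.length > 0) {
// Flush the remaining audio. The capture sample rate is not tracked here, so 16 kHz is assumed.
// recognize() reports failures through the 'error' event, so the promise is deliberately not awaited.
void this.recognize(16000);
}
this.emit('stop');
}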
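/**
* Stop recognition, flush buffered audio, and release the model.
*/
async release(): Promise<void> {
this.isRecognizing = false;
if (this.audioBuffer.length > 0) {
// Wait for the final flush so the session is not released mid-inference (16 kHz assumed, as in stop())
await this.recognize(16000);
}
this.emit('stop');
await this.modelLoader.release();
}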
/**
* Whether recognition is currently active.
*/
get recognizing(): boolean {
return this.isRecognizing;
}
}
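
Note on resampling: the loop in `recognize()` copies the nearest earlier sample, which is cheap but throws away everything between sample positions. A minimal sketch of a linear-interpolation alternative is shown here. `resampleLinear` is a hypothetical helper, not part of this commit, and it still applies no low-pass filter, so downsampling can alias.

```typescript
// Hypothetical helper (not in the repository): resample by linear interpolation.
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate || input.length === 0) {
    return input; // nothing to do
  }
  const ratio = fromRate / toRate;
  const outLength = Math.floor(input.length / ratio);
  const out = new Float32Array(outLength);
  for (let i = 0; i < outLength; i++) {
    const pos = i * ratio;                                // fractional position in the input
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);   // clamp at the last sample
    const frac = pos - left;
    out[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return out;
}

// Upsampling 8 kHz -> 16 kHz inserts the midpoint between neighbouring samples.
const upsampled = resampleLinear(new Float32Array([0, 2]), 8000, 16000);
console.log(Array.from(upsampled)); // [0, 1, 2, 2]
```

The last output sample repeats the final input value because there is no sample to its right to interpolate toward.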

121
src/core/text-output.ts Normal file

@ -0,0 +1,121 @@
/**
* Text output module
* Delivers recognized text to the clipboard or keyboard
*/
import { EventEmitter } from 'events';
import { RecognitionResult } from './speech-recognizer.js';
export interface TextOutputConfig {
outputMode: 'clipboard' | 'keyboard' | 'both'; // Output mode
autoPaste: boolean; // Whether to paste automatically
delayMs: number; // Delay in milliseconds
}
export class TextOutput extends EventEmitter {
private config: TextOutputConfig;
private lastText: string = '';
constructor(config: Partial<TextOutputConfig> = {}) {
super();
this.config = {
outputMode: config.outputMode ?? 'clipboard',
autoPaste: config.autoPaste ?? true,
delayMs: config.delayMs ?? 50,
};
}
/**
* Output a final recognition result according to the configured mode.
*/
async output(result: RecognitionResult): Promise<void> {
if (!result.isFinal || !result.text) {
return;
}
this.lastText = result.text;
try {
switch (this.config.outputMode) {
case 'clipboard':
await this.copyToClipboard(result.text);
break;
case 'keyboard':
case 'both':
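// Keyboard mode is implemented by the main process in Electron;
// in plain Node.js, fall back to the clipboard
await this.copyToClipboard(result.text);
if (this.config.outputMode === 'both') {
console.log('Note: the text was copied to the clipboard; please paste it manually');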
}
break;
}
this.emit('output', result.text);
} catch (error) {
this.emit('error', error);
}
}
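/**
* Copy text to the system clipboard, trying several mechanisms in turn.
*/
private async copyToClipboard(text: string): Promise<void> {
// Try clipboardy first
try {
const clipboardy = await import('clipboardy');
await clipboardy.default.write(text);
this.emit('clipboard', text);
return;
} catch (e) {
// clipboardy is unavailable; try the other methods
}
// Electron renderer (or any environment that exposes navigator.clipboard)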
const globalObj = typeof globalThis !== 'undefined' ? globalThis : typeof window !== 'undefined' ? window : {};
if ((globalObj as any).navigator?.clipboard) {
await (globalObj as any).navigator.clipboard.writeText(text);
this.emit('clipboard', text);
return;
}
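// Fall back to platform clipboard commands. The text is written to the
// command's stdin instead of being interpolated into a shell string, so
// quotes and shell metacharacters in the recognized text cannot be executed.
const platform = process.platform;
const { spawn } = await import('child_process');
const candidates: Array<[string, string[]]> =
platform === 'win32' ? [['clip', []]]
: platform === 'darwin' ? [['pbcopy', []]]
: [['xclip', ['-selection', 'clipboard']], ['xsel', ['--clipboard', '--input']]]; // Linux: try several tools
const pipeTo = (cmd: string, args: string[]) =>
new Promise<void>((resolve, reject) => {
const child = spawn(cmd, args, { stdio: ['pipe', 'ignore', 'ignore'] });
child.on('error', reject);
child.on('close', (code) =>
code === 0 ? resolve() : reject(new Error(`${cmd} exited with code ${code}`)));
child.stdin.on('error', () => { /* surfaced through 'error' or 'close' */ });
child.stdin.end(text);
});
let lastError: unknown;
for (const [cmd, args] of candidates) {
try {
await pipeTo(cmd, args);
this.emit('clipboard', text);
return;
} catch (error) {
lastError = error;
}
}
throw lastError;
}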
}
/**
* Return the most recently output text.
*/
getLastText(): string {
return this.lastText;
}
/**
* Clear the stored text.
*/
clear(): void {
this.lastText = '';
this.emit('clear');
}
}

105
src/electron-main.ts Normal file

@ -0,0 +1,105 @@
/**
* Impress ASR Input - Electron main process
* Requires electron: npm install electron --save-dev
*/
import { app, BrowserWindow, ipcMain, globalShortcut, clipboard } from 'electron';
import { join } from 'path';
import { fileURLToPath } from 'url';
const __dirname = fileURLToPath(new URL('.', import.meta.url));
let mainWindow: BrowserWindow | null = null;
function createWindow() {
mainWindow = new BrowserWindow({
width: 400,
height: 600,
title: 'Impress ASR Input',
webPreferences: {
preload: join(__dirname, 'preload.js'),
contextIsolation: true,
nodeIntegration: false,
},
resizable: false,
skipTaskbar: false,
alwaysOnTop: false,
});
// Load the main UI
if (process.env.NODE_ENV === 'development') {
mainWindow.loadURL('http://localhost:5173');
} else {
mainWindow.loadFile(join(__dirname, '../ui/index.html'));
}
mainWindow.on('closed', () => {
mainWindow = null;
});
}
// Create the window once the app is ready
app.whenReady().then(() => {
createWindow();
// Register global hotkeys
globalShortcut.register('CommandOrControl+Shift+Space', () => {
mainWindow?.webContents.send('toggle-recording');
});
globalShortcut.register('CommandOrControl+Escape', () => {
mainWindow?.webContents.send('stop-recording');
});
});
// IPC handlers
ipcMain.handle('start-recording', async () => {
// Start recording (stub)
console.log('Recording started');
return { success: true };
});
ipcMain.handle('stop-recording', async () => {
// Stop recording (stub)
console.log('Recording stopped');
return { success: true };
});
ipcMain.handle('copy-to-clipboard', async (_, text: string) => {
clipboard.writeText(text);
return { success: true };
});
ipcMain.handle('get-settings', async () => {
// Return settings (hard-coded for now)
return {
language: 'zh',
outputMode: 'clipboard',
modelPath: './models/model.onnx',
};
});
ipcMain.handle('save-settings', async (_event: any, settings: Record<string, unknown>) => {
// Save settings (currently only logged, not persisted)
console.log('Saving settings:', settings);
return { success: true };
});
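// Quit when all windows are closed, except on macOS where apps conventionally keep running.
// Hotkeys are unregistered in 'will-quit', so they remain usable on macOS after the window is reopened.
app.on('window-all-closed', () => {
if (process.platform !== 'darwin') {
app.quit();
}
});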
app.on('activate', () => {
if (BrowserWindow.getAllWindows().length === 0) {
createWindow();
}
});
// Clean up before the app quits
app.on('will-quit', () => {
globalShortcut.unregisterAll();
});

110
src/main.ts Normal file

@ -0,0 +1,110 @@
/**
* Impress ASR Input - main entry point
* Command-line interface
*/
import { Command } from 'commander';
import { SpeechRecognizer, RecognitionResult } from './core/speech-recognizer.js';
import { TextOutput } from './core/text-output.js';
import { readFileSync } from 'fs';
import { fileURLToPath } from 'url';
import { dirname, join } from 'path';
const __dirname = dirname(fileURLToPath(import.meta.url));
const packageJson = JSON.parse(
readFileSync(join(__dirname, '../package.json'), 'utf-8')
);
const program = new Command();
program
.name('impress-asr-input')
.description('Local speech recognition input tool based on ONNX')
.version(packageJson.version);
program
.command('start')
.description('Start speech recognition')
.option('-l, --language <lang>', 'Recognition language', 'zh')
.option('-m, --model <path>', 'Path to the model file', join(__dirname, '../models/model.onnx'))
.option('-o, --output <mode>', 'Output mode: clipboard|keyboard|both', 'clipboard')
.action(async (options) => {
console.log('🎤 Starting speech recognition...');
console.log(` Language: ${options.language}`);
console.log(` Model: ${options.model}`);
console.log(` Output: ${options.output}`);
const recognizer = new SpeechRecognizer({
modelPath: options.model,
language: options.language,
useVad: true,
beamSize: 5,
});
const textOutput = new TextOutput({
outputMode: options.output as 'clipboard' | 'keyboard' | 'both',
autoPaste: true,
delayMs: 50,
});
// Bind events
recognizer.on('ready', () => {
console.log('✅ Model loaded; starting recognition...');
recognizer.start();
});
recognizer.on('result', (result: RecognitionResult) => {
console.log(`📝 ${result.text}`);
textOutput.output(result);
});
recognizer.on('error', (error: Error) => {
console.error('❌ Recognition error:', error.message);
process.exit(1);
});
// Initialize and start
try {
await recognizer.initialize();
// Note: audio capture in plain Node.js needs additional handling;
// this is only a framework demo
console.log('⚠️ Running in demo mode; full functionality requires Electron');
} catch (error) {
console.error('❌ Startup failed:', error);
process.exit(1);
}
// Graceful shutdown
process.on('SIGINT', async () => {
console.log('\n🛑 Stopping recognition...');
recognizer.stop();
await recognizer.release();
process.exit(0);
});
});
program
.command('transcribe')
.description('Transcribe an audio file')
.argument('<file>', 'Path to the audio file')
.option('-l, --language <lang>', 'Recognition language', 'zh')
.option('-m, --model <path>', 'Path to the model file')
.option('-o, --output <file>', 'Path to the output file')
.action(async (file, options) => {
console.log(`🎵 Transcribing file: ${file}`);
console.log(` Language: ${options.language}`);
// TODO: implement file transcription
console.log('⚠️ File transcription is under development...');
});
program
.command('list-devices')
.description('List available audio devices')
.action(() => {
console.log('🎧 Available audio devices:');
// TODO: implement device listing
console.log('⚠️ Device listing is under development...');
});
program.parse();

29
src/preload.ts Normal file

@ -0,0 +1,29 @@
/**
* Electron preload script
* Requires electron to be installed
*/
import { contextBridge, ipcRenderer } from 'electron';
// API exposed to the renderer process
contextBridge.exposeInMainWorld('electronAPI', {
// Recording control
startRecording: () => ipcRenderer.invoke('start-recording'),
stopRecording: () => ipcRenderer.invoke('stop-recording'),
// Clipboard
copyToClipboard: (text: string) => ipcRenderer.invoke('copy-to-clipboard', text),
// Settings
getSettings: () => ipcRenderer.invoke('get-settings'),
saveSettings: (settings: Record<string, unknown>) =>
ipcRenderer.invoke('save-settings', settings),
// Event listeners
onToggleRecording: (callback: () => void) => {
ipcRenderer.on('toggle-recording', () => callback());
},
onStopRecording: (callback: () => void) => {
ipcRenderer.on('stop-recording', () => callback());
},
});
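
Note on the event listeners: `onToggleRecording` and `onStopRecording` register a listener on every call and give the renderer no way to remove it. One possible pattern is to return an unsubscribe function. The sketch below uses Node's `EventEmitter` as a stand-in for `ipcRenderer` so that it runs without Electron; `channelBus` and `subscribe` are hypothetical names, and a real preload would presumably wrap `ipcRenderer.on` and `ipcRenderer.removeListener` the same way.

```typescript
import { EventEmitter } from 'node:events';

// Stand-in for ipcRenderer so the sketch runs without Electron (assumption).
const channelBus = new EventEmitter();

// Hypothetical helper: subscribe to a channel and return a function that unsubscribes.
function subscribe(channel: string, callback: () => void): () => void {
  const listener = () => callback();
  channelBus.on(channel, listener);
  return () => {
    channelBus.removeListener(channel, listener);
  };
}

let toggles = 0;
const unsubscribe = subscribe('toggle-recording', () => {
  toggles += 1;
});
channelBus.emit('toggle-recording'); // the callback runs once
unsubscribe();
channelBus.emit('toggle-recording'); // no listener is left, so nothing happens
console.log(toggles); // 1
```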

315
src/ui/index.html Normal file

@ -0,0 +1,315 @@
<!DOCTYPE html>
<html lang="zh-CN">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta http-equiv="Content-Security-Policy" content="default-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'">
<title>Impress ASR Input</title>
<style>
:root {
--bg-primary: #1a1a2e;
--bg-secondary: #16213e;
--accent: #e94560;
--accent-hover: #ff6b6b;
--text-primary: #ffffff;
--text-secondary: #a0a0a0;
--success: #00d9a5;
--border: #2d3748;
}
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
background: var(--bg-primary);
color: var(--text-primary);
min-height: 100vh;
display: flex;
flex-direction: column;
}
.header {
padding: 20px;
text-align: center;
border-bottom: 1px solid var(--border);
}
.header h1 {
font-size: 18px;
font-weight: 600;
}
.header p {
font-size: 12px;
color: var(--text-secondary);
margin-top: 4px;
}
.main-content {
flex: 1;
padding: 20px;
display: flex;
flex-direction: column;
gap: 20px;
}
.status-card {
background: var(--bg-secondary);
border-radius: 12px;
padding: 20px;
text-align: center;
}
.status-indicator {
width: 80px;
height: 80px;
border-radius: 50%;
background: var(--border);
margin: 0 auto 12px;
display: flex;
align-items: center;
justify-content: center;
font-size: 32px;
transition: all 0.3s ease;
}
.status-indicator.recording {
background: var(--accent);
animation: pulse 1.5s infinite;
}
@keyframes pulse {
0%, 100% { transform: scale(1); opacity: 1; }
50% { transform: scale(1.1); opacity: 0.8; }
}
.status-text {
font-size: 14px;
color: var(--text-secondary);
}
.record-btn {
width: 100%;
padding: 16px;
border: none;
border-radius: 12px;
background: var(--accent);
color: white;
font-size: 16px;
font-weight: 600;
cursor: pointer;
transition: all 0.2s ease;
}
.record-btn:hover {
background: var(--accent-hover);
}
.record-btn:active {
transform: scale(0.98);
}
.record-btn.recording {
background: var(--border);
}
.result-area {
flex: 1;
background: var(--bg-secondary);
border-radius: 12px;
padding: 16px;
min-height: 150px;
}
.result-area h3 {
font-size: 14px;
margin-bottom: 12px;
color: var(--text-secondary);
}
.result-text {
font-size: 14px;
line-height: 1.6;
white-space: pre-wrap;
word-break: break-word;
}
.result-text:empty::before {
content: '识别结果将显示在这里...';
color: var(--text-secondary);
font-style: italic;
}
.settings-section {
background: var(--bg-secondary);
border-radius: 12px;
padding: 16px;
}
.setting-item {
display: flex;
justify-content: space-between;
align-items: center;
padding: 8px 0;
border-bottom: 1px solid var(--border);
}
.setting-item:last-child {
border-bottom: none;
}
.setting-item label {
font-size: 14px;
}
.setting-item select {
background: var(--bg-primary);
border: 1px solid var(--border);
border-radius: 6px;
padding: 6px 12px;
color: var(--text-primary);
font-size: 13px;
cursor: pointer;
}
.hotkey-hint {
font-size: 12px;
color: var(--text-secondary);
text-align: center;
padding: 12px;
background: var(--bg-secondary);
border-radius: 8px;
}
.hotkey-hint kbd {
background: var(--bg-primary);
padding: 2px 8px;
border-radius: 4px;
border: 1px solid var(--border);
font-family: monospace;
}
</style>
</head>
<body>
<div class="header">
<h1>🎤 Impress ASR Input</h1>
<p>语音识别输入工具</p>
</div>
<div class="main-content">
<div class="status-card">
<div class="status-indicator" id="statusIndicator">🎤</div>
<div class="status-text" id="statusText">点击按钮开始录音</div>
</div>
<button class="record-btn" id="recordBtn">开始录音</button>
<div class="result-area">
<h3>识别结果</h3>
<div class="result-text" id="resultText"></div>
</div>
<div class="settings-section">
<div class="setting-item">
<label>识别语言</label>
<select id="languageSelect">
<option value="zh">中文</option>
<option value="en">English</option>
<option value="ja">日本語</option>
<option value="ko">한국어</option>
</select>
</div>
<div class="setting-item">
<label>输出模式</label>
<select id="outputModeSelect">
<option value="clipboard">剪贴板</option>
<option value="both">剪贴板 + 提示</option>
</select>
</div>
</div>
<div class="hotkey-hint">
<p>快捷键:<kbd>Ctrl+Shift+Space</kbd> 开始/停止录音</p>
<p style="margin-top: 6px;"><kbd>Ctrl+Escape</kbd> 强制停止</p>
</div>
</div>
<script>
const recordBtn = document.getElementById('recordBtn');
const statusIndicator = document.getElementById('statusIndicator');
const statusText = document.getElementById('statusText');
const resultText = document.getElementById('resultText');
const languageSelect = document.getElementById('languageSelect');
const outputModeSelect = document.getElementById('outputModeSelect');
let isRecording = false;
// Update the UI state
function updateUI() {
if (isRecording) {
recordBtn.textContent = '停止录音';
recordBtn.classList.add('recording');
statusIndicator.classList.add('recording');
statusText.textContent = '正在录音中...';
} else {
recordBtn.textContent = '开始录音';
recordBtn.classList.remove('recording');
statusIndicator.classList.remove('recording');
statusText.textContent = '点击按钮开始录音';
}
}
// Handle clicks on the record button
recordBtn.addEventListener('click', async () => {
isRecording = !isRecording;
updateUI();
if (isRecording) {
await window.electronAPI?.startRecording();
} else {
await window.electronAPI?.stopRecording();
}
});
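// Listen for global hotkeys. The toggle hotkey reuses the button handler so
// the main process is notified as well, not just the UI state.
window.electronAPI?.onToggleRecording(() => {
recordBtn.click();
});
window.electronAPI?.onStopRecording(async () => {
if (!isRecording) return;
isRecording = false;
updateUI();
await window.electronAPI?.stopRecording();
});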
// Simulate a recognition result (for development)
function simulateResult(text) {
resultText.textContent = text;
if (text) {
window.electronAPI?.copyToClipboard(text);
}
}
// Save settings when they change
languageSelect.addEventListener('change', () => {
window.electronAPI?.saveSettings({ language: languageSelect.value });
});
outputModeSelect.addEventListener('change', () => {
window.electronAPI?.saveSettings({ outputMode: outputModeSelect.value });
});
// Load settings
window.electronAPI?.getSettings().then(settings => {
if (settings) {
languageSelect.value = settings.language || 'zh';
outputModeSelect.value = settings.outputMode || 'clipboard';
}
});
</script>
</body>
</html>

103
src/utils/config.ts Normal file

@ -0,0 +1,103 @@
/**
* Configuration management
*/
import { readFileSync, writeFileSync, existsSync } from 'fs';
export interface AppSettings {
// Recognition settings
language: string;
modelPath: string;
useVad: boolean;
// Output settings
outputMode: 'clipboard' | 'keyboard' | 'both';
autoPaste: boolean;
// Hotkey settings
startHotkey: string;
stopHotkey: string;
// Audio settings
audioDeviceId?: string;
sampleRate: number;
}
export const defaultSettings: AppSettings = {
language: 'zh',
modelPath: './models/model.onnx',
useVad: true,
outputMode: 'clipboard',
autoPaste: true,
startHotkey: 'CommandOrControl+Shift+Space',
stopHotkey: 'CommandOrControl+Escape',
sampleRate: 16000,
};
/**
* Configuration store
* In Electron, electron-store can be used instead
* In plain Node.js, settings are stored in a JSON file
*/
export class ConfigStore {
private settings: AppSettings;
private filePath: string;
constructor(filePath: string) {
this.filePath = filePath;
this.settings = { ...defaultSettings };
this.load();
}
/**
* Load settings from disk, merging them over the defaults.
*/
load(): void {
try {
if (existsSync(this.filePath)) {
const content = readFileSync(this.filePath, 'utf-8');
const saved = JSON.parse(content);
this.settings = { ...this.settings, ...saved };
}
} catch {
// The file is missing or could not be parsed; keep the default settings
}
}
/**
* Write the current settings to disk.
*/
save(): void {
writeFileSync(this.filePath, JSON.stringify(this.settings, null, 2));
}
/**
* Get a single setting.
*/
get<K extends keyof AppSettings>(key: K): AppSettings[K] {
return this.settings[key];
}
/**
* Set a single setting and persist it.
*/
set<K extends keyof AppSettings>(key: K, value: AppSettings[K]): void {
this.settings[key] = value;
this.save();
}
/**
* Return a copy of all settings.
*/
getAll(): AppSettings {
return { ...this.settings };
}
/**
* Reset all settings to their defaults and persist them.
*/
reset(): void {
this.settings = { ...defaultSettings };
this.save();
}
}
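
Note on `save()`: it writes straight to the settings file, so a crash partway through the write could leave truncated JSON behind, which `load()` would then silently discard in favour of the defaults. A common mitigation is to write to a temporary file and rename it over the target. The sketch below is one way to do that; `saveJsonAtomically` is a hypothetical helper that is not part of this commit, and the rename is only expected to be atomic when both paths are on the same filesystem.

```typescript
import { mkdtempSync, readFileSync, renameSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper: write JSON to a temporary sibling file, then rename it over the target.
function saveJsonAtomically(filePath: string, data: unknown): void {
  const tmpPath = `${filePath}.${process.pid}.tmp`;
  writeFileSync(tmpPath, JSON.stringify(data, null, 2), 'utf-8');
  renameSync(tmpPath, filePath); // replaces any existing file in a single step
}

// Usage: save twice to a scratch directory and read the latest version back.
const dir = mkdtempSync(join(tmpdir(), 'asr-config-'));
const settingsPath = join(dir, 'settings.json');
saveJsonAtomically(settingsPath, { language: 'zh', sampleRate: 16000 });
saveJsonAtomically(settingsPath, { language: 'en', sampleRate: 16000 });
const restored = JSON.parse(readFileSync(settingsPath, 'utf-8'));
console.log(restored.language); // en
```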


@ -0,0 +1,90 @@
/**
* Audio processor tests
*/
import { describe, it, expect } from 'vitest';
import { normalizeAudio, resample, SimpleVAD, frameAudio } from '../src/core/audio-processor.js';
describe('audio-processor', () => {
describe('normalizeAudio', () => {
it('should normalize Int16Array data', () => {
const input = new Int16Array([32767, -32768, 0, 16384]);
const result = normalizeAudio(input, 16000);
expect(result).toBeInstanceOf(Float32Array);
expect(result.length).toBe(input.length);
expect(result[0]).toBeCloseTo(1, 3);
expect(result[1]).toBeCloseTo(-1, 3);
expect(result[2]).toBe(0);
});
it('should handle Float32Array input', () => {
const input = new Float32Array([1, -1, 0, 0.5]);
const result = normalizeAudio(input, 16000);
expect(result).toBe(input);
});
});
describe('resample', () => {
it('should leave data unchanged when the sample rate is the same', () => {
const input = new Float32Array([1, 2, 3, 4]);
const result = resample(input, 16000, 16000);
expect(result).toBe(input);
});
it('should downsample', () => {
const input = new Float32Array([1, 2, 3, 4, 5, 6, 7, 8]);
const result = resample(input, 16000, 8000);
expect(result.length).toBe(4);
});
});
describe('SimpleVAD', () => {
it('should detect high-energy audio', () => {
const vad = new SimpleVAD({ energyThreshold: 0.01 });
const loudFrame = new Float32Array([0.5, 0.6, 0.7, 0.8]);
const result = vad.process(loudFrame, 16000);
expect(result.isSpeaking).toBe(true);
});
it('should ignore low-energy audio', () => {
const vad = new SimpleVAD({ energyThreshold: 0.01 });
const quietFrame = new Float32Array([0.001, 0.002, 0.001, 0]);
const result = vad.process(quietFrame, 16000);
expect(result.isSpeaking).toBe(false);
});
it('should reset state', () => {
const vad = new SimpleVAD();
vad.process(new Float32Array([0.5, 0.6, 0.7]), 16000);
vad.reset();
expect(vad.process(new Float32Array([0]), 16000).isSpeaking).toBe(false);
});
});
describe('frameAudio', () => {
it('should split audio into frames correctly', () => {
const input = new Float32Array([1, 2, 3, 4, 5, 6, 7, 8]);
const frames = frameAudio(input, 4, 2);
// Assuming frameAudio emits only full frames: floor((8 - 4) / 2) + 1 = 3
expect(frames.length).toBe(3);
expect(frames[0]).toEqual(new Float32Array([1, 2, 3, 4]));
expect(frames[1]).toEqual(new Float32Array([3, 4, 5, 6]));
});
it('should handle input shorter than one frame', () => {
const input = new Float32Array([1, 2, 3]);
const frames = frameAudio(input, 4, 2);
expect(frames.length).toBe(0);
});
});
});

25
tsconfig.json Normal file

@ -0,0 +1,25 @@
{
"compilerOptions": {
"target": "ES2022",
"module": "NodeNext",
"moduleResolution": "NodeNext",
"lib": ["ES2022", "DOM"],
"outDir": "./dist",
"rootDir": "./src",
"strict": false,
"esModuleInterop": true,
"skipLibCheck": true,
"forceConsistentCasingInFileNames": true,
"resolveJsonModule": true,
"declaration": true,
"declarationMap": true,
"sourceMap": true,
"noUnusedLocals": false,
"noUnusedParameters": false,
"noImplicitReturns": false,
"noFallthroughCasesInSwitch": false,
"allowSyntheticDefaultImports": true
},
"include": ["src/**/*.ts"],
"exclude": ["node_modules", "dist", "test", "src/electron-main.ts", "src/preload.ts"]
}

11
vitest.config.ts Normal file

@ -0,0 +1,11 @@
import { defineConfig } from 'vitest/config';
export default defineConfig({
test: {
include: ['test/**/*.test.ts'],
exclude: ['node_modules', 'dist'],
},
resolve: {
extensions: ['.ts', '.js'],
},
});