AI Voice

Speech generation, voice agents, speech-to-text, and real-time audio models.

Latest: 2026-07-31 19:42 UTC. 10 source-backed, 0 watch.

Latest Signals

AI Voice feed

10 on this page, 10 total

2026-07-31 19:42 UTC

Discussion on Best Free Text to Speech Tools on Hacker News

A Hacker News thread on free text-to-speech options had two points and no comments when it was indexed. The submission concerns freely accessible TTS tools available at the time.

Hacker News USEFUL NOW GENERAL

SOURCE-BACKED 78%

2026-07-30 13:57 UTC

Large-Scale Study on AI Voice Agents in Job Interviews

A natural field experiment with 70,000 job applicants compared AI voice agent interviews to human recruiter interviews, with humans making final hiring decisions in both cases. The study examines whether AI can reduce variance in information collection and improve organizational outcomes.

arXiv RESEARCH TECHNICAL

SOURCE-BACKED 95%

2026-07-26 06:58 UTC

Inflect TTS v2+ONNX: 9M/4M Text-to-Speech Models Running in Browser

Inflect TTS v2+ONNX offers text-to-speech models with 9 million and 4 million parameters that run directly in the browser. This allows client-side voice synthesis without a server dependency.

Hacker News BIG MOVE TECHNICAL

SOURCE-BACKED 83%

2026-07-23 19:33 UTC

Production Duplex Speech Model Launched for Revenue Calls

A duplex speech model built for production revenue calls has been introduced, supporting real-time, two-way voice interaction. The launch was discussed on Hacker News in the context of business use.

Hacker News USEFUL NOW GENERAL

SOURCE-BACKED 81%

2026-07-20 17:12 UTC

FlashRT Enables Efficient Deployment of Real-Time Multimodal AI Applications

FlashRT is an agent harness designed to optimize deployment of real-time multimodal applications like voice agents and interactive video generation by managing model pipelines with application-specific placement and parallelism. It addresses limitations of existing serving systems that rely on fixe...

arXiv RESEARCH TECHNICAL

SOURCE-BACKED 95%

2026-07-10 20:54 UTC

AI Advances Enable Scalable Automated Voice Phishing Attacks

New research shows that AI voice synthesis and large language models can automate voice phishing attacks, removing the need for human operators. A large-scale study assessed U.S. ...

arXiv TECHNICAL

SOURCE-BACKED 95%

2026-07-08 23:24 UTC

Reliability of Gemini Models as Audio Judges for Full-Duplex Voice Agents

The study evaluates the reliability of Gemini models (2.5 Flash, 3.5 Flash, 3.1 Pro) as audio judges scoring full-duplex voice agent conversations from raw stereo waveforms. Gemini 2.5 Flash was validated against human raters across 209 sessions on eight production dimensions.

arXiv TECHNICAL

SOURCE-BACKED 95%

2026-07-03 15:27 UTC

DETECT-3B-Omni detects deepfake audio independent of content and demographics

DETECT-3B-Omni is a GDPR-compliant deepfake audio detector that bases its decisions on acoustic artifacts rather than speech content or speaker identity. A large-scale study using 10,240 samples from diverse US English speakers across 30 states and 8 AI voice-cloning systems confirms its semantic i...

arXiv GENERAL

SOURCE-BACKED 95%

2026-07-01 00:00 UTC

Hugging Face and Cerebras launch Gemma 4 for real-time voice AI

Hugging Face and Cerebras have launched Gemma 4 for real-time voice AI applications. The collaboration aims to improve the performance and responsiveness of voice-based AI systems.

Hugging Face Blog GENERAL

SOURCE-BACKED 91%

2026-06-11 02:12 UTC

AI Voice Cloning Challenges the Protection of Vocal Identity

Advanced AI voice cloning technologies raise significant legal and ethical questions about protecting vocal identity. The similarity between a ChatGPT voice shown with OpenAI's GPT-4o and Scarlett Johansson's voice illustrates these concerns.

arXiv GENERAL

SOURCE-BACKED 95%