Comprehensive Review of Offline Voice Control Systems for Smart Homes: Technological Advances, Existing Challenges, and Prospective Developments

Section: Review Articles

Abstract

Offline voice control systems are gaining prominence in smart home environments because they respond quickly, preserve user privacy, and continue to function without network connectivity. However, they have not yet been widely adopted, owing to limited vocabularies, poor robustness to noise, and constrained hardware performance. This paper critically assesses recent developments in offline voice control technology for smart homes. It evaluates datasets, frameworks, embedded platforms, and algorithmic enhancements for resource-limited devices. The paper also examines challenges related to vocabulary growth, audio variability, privacy, and interoperability, and compares offline and online paradigms within a detailed analytical framework. Emerging trends identified as crucial to the evolution of future systems include lightweight keyword spotting, embedded spoken language understanding, neural processing units, and on-device personalization. By synthesizing recent achievements and pinpointing current limitations, the paper summarizes the state of offline voice control technology for smart homes and outlines its future directions.
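Lightweight keyword spotting is named above as a key trend for resource-limited devices. As an illustration only, the sketch below shows one common decision stage used by small-footprint keyword spotters. It averages per-frame keyword probabilities over a short sliding window and fires when the average crosses a threshold. It then ignores further frames for a refractory period so that one utterance does not trigger twice. The function name `detect_keyword`, the window length, the threshold, and the refractory period are hypothetical choices. The per-frame probabilities are assumed to come from some on-device acoustic model that is not shown here. This is not the method of any specific system covered by the review.

```python
from collections import deque
from typing import Deque, Iterable, List, Sequence


def detect_keyword(
    frame_posteriors: Iterable[Sequence[float]],
    keyword_index: int,
    window: int = 5,          # hypothetical smoothing window, in frames
    threshold: float = 0.8,   # hypothetical detection threshold
    refractory: int = 10,     # hypothetical frames to ignore after a detection
) -> List[int]:
    """Return the frame indices at which the keyword is detected.

    frame_posteriors: per-frame class probabilities, assumed to be produced
    by a small on-device acoustic model (not shown).
    """
    history: Deque[float] = deque(maxlen=window)  # last `window` keyword scores
    detections: List[int] = []
    cooldown = 0
    for t, probs in enumerate(frame_posteriors):
        history.append(probs[keyword_index])
        if cooldown > 0:
            # Still inside the refractory period: keep updating history, skip decisions.
            cooldown -= 1
            continue
        smoothed = sum(history) / len(history)
        # Only decide once a full window is available, to avoid early false triggers.
        if len(history) == window and smoothed >= threshold:
            detections.append(t)
            cooldown = refractory
    return detections


if __name__ == "__main__":
    # Synthetic two-class posteriors [background, keyword]:
    # 10 background frames, 8 keyword frames, 12 background frames.
    posteriors = [[0.9, 0.1]] * 10 + [[0.05, 0.95]] * 8 + [[0.9, 0.1]] * 12
    print(detect_keyword(posteriors, keyword_index=1))  # prints [14]
```

Because scores are averaged over the window, a single-frame spike (for example, one frame at 0.99 among frames at 0.1) yields a smoothed score of about 0.28 and does not trigger. This is one simple way such detectors trade a few frames of latency for fewer false activations.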


How to Cite

Comprehensive Review of Offline Voice Control Systems for Smart Homes: Technological Advances, Existing Challenges, and Prospective Developments. (2026). AL-Rafidain Journal of Computer Sciences and Mathematics, 20(1), 1-11. https://doi.org/10.33899/rjcsm.v20i1.60652