A Human-engaging Robotic Interactive Assistant

Abstract

The integration of intelligent robotic systems into human-centric environments, such as laboratories, hospitals, and educational institutions, has become increasingly important due to the growing demand for accessible, context-aware assistants. However, current solutions often lack scalability (for instance, they rely on specialized personnel, such as departmental administrators, to answer the same questions repeatedly) and adaptability to dynamic environments that require real-time situational responses. This study introduces a novel framework for an interactive robotic assistant (Beckerle et al., 2017) designed to assist during laboratory tours and to mitigate the challenges posed by limited human resources in providing comprehensive information to visitors. The proposed system operates in multiple modes, including standby mode and recognition mode, to ensure seamless interaction and adaptability across contexts. In standby mode, the robot signals readiness with a smiling-face animation while patrolling predefined paths, or conserves energy when stationary. LiDAR-based obstacle detection ensures safe navigation in dynamic environments. Recognition mode is activated by gestures or wake words and uses computer vision and real-time speech recognition to identify users. Facial recognition further classifies individuals as known or unknown, providing personalized greetings or context-specific guidance to enhance user engagement. The proposed robot and its 3D design are shown in Figure 1. In interactive mode, the system integrates automatic speech recognition (ASR, Whisper), natural language processing (NLP), and the Llama 3.2 large language model served through Ollama (LLM Predictor, 2025) to provide a user-friendly, context-aware, and adaptable experience.
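The mode logic above can be sketched as a small state machine. This is a minimal Python illustration, not the authors' implementation: the event names (`wake_word`, `gesture`, `user_identified`, `timeout`, `conversation_end`) and the rule that recognition mode hands over to interactive mode once a user is identified are assumptions made for the example, and the face-recognition result is reduced to an optional name.

```python
from enum import Enum, auto
from typing import Optional


class Mode(Enum):
    """Operating modes named in the abstract."""
    STANDBY = auto()
    RECOGNITION = auto()
    INTERACTIVE = auto()


# Hypothetical event vocabulary; the abstract does not define one.
TRANSITIONS = {
    (Mode.STANDBY, "wake_word"): Mode.RECOGNITION,
    (Mode.STANDBY, "gesture"): Mode.RECOGNITION,
    (Mode.RECOGNITION, "user_identified"): Mode.INTERACTIVE,  # assumed hand-over
    (Mode.RECOGNITION, "timeout"): Mode.STANDBY,
    (Mode.INTERACTIVE, "conversation_end"): Mode.STANDBY,
}


def next_mode(mode: Mode, event: str) -> Mode:
    """Return the next mode; events with no matching rule keep the current mode."""
    return TRANSITIONS.get((mode, event), mode)


def greeting(known_name: Optional[str]) -> str:
    """Personalized greeting for a known face, generic guidance for an unknown one."""
    if known_name is not None:
        return f"Welcome back, {known_name}!"
    return "Hello! Would you like a tour of the laboratory?"


if __name__ == "__main__":
    mode = Mode.STANDBY
    for event in ("gesture", "user_identified", "conversation_end"):
        mode = next_mode(mode, event)
        print(event, "->", mode.name)
```

Keeping the transitions in a lookup table makes it easy to add modes or triggers later without touching the dispatch code, which fits the modular design the abstract describes.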
The work is motivated by the need to engage students and promote interest in the RAI department, which receives over 1,000 visitors annually, and it addresses accessibility gaps when human staff are unavailable. With wake-word detection, face and gesture recognition, and LiDAR-based obstacle detection, the robot supports seamless communication in English alongside safe and efficient navigation. The Retrieval-Augmented Generation (RAG) human interaction system communicates with the mobile robot, which is built on ROS1 Noetic, using the MQTT protocol over Ethernet. It publishes navigation goals to the move_base module in ROS, which autonomously handles navigation and obstacle avoidance. The system architecture is shown in Figure 2. The back end combines MongoDB for information storage and retrieval with a RAG mechanism (Thüs et al., 2024) that processes program curriculum documents in PDF form, so that the robot answers user queries accurately and with relevant context. Furthermore, the smiling-face animations and text-to-speech (TTS, BotNoi) enhanced user engagement; engagement metrics were derived from a combination of observational studies and surveys, which indicated significant improvements in user satisfaction and accessibility. This paper also discusses the system's ability to operate in dynamic, human-centric spaces, for example, handling interruptions while navigating during a mission. The modular design allows additional features, such as gesture recognition and hardware upgrades, to be integrated easily, supporting long-term scalability. However, limitations such as high initial setup costs and dependence on specific hardware configurations are acknowledged. Future work will focus on adapting the system to more languages, expanding its use cases, and exploring collaborative interactions between multiple robots.
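As a sketch of the MQTT link described above, the helpers below serialize a 2-D navigation goal as JSON and show how a ROS-side bridge could decode it. The topic name `robot/nav_goal`, the JSON field layout, and the existence of a separate bridge node that forwards the pose to move_base are assumptions; the paper does not specify the message format. Because move_base expects the goal orientation as a quaternion, the heading angle (yaw) is converted with q_z = sin(yaw/2) and q_w = cos(yaw/2).

```python
import json
import math

# Hypothetical topic name; the paper does not state which MQTT topic is used.
NAV_GOAL_TOPIC = "robot/nav_goal"


def yaw_to_quaternion(yaw: float) -> dict:
    """Rotation of `yaw` radians about the z axis as a unit quaternion."""
    return {"x": 0.0, "y": 0.0, "z": math.sin(yaw / 2.0), "w": math.cos(yaw / 2.0)}


def make_goal_payload(x: float, y: float, yaw: float, frame_id: str = "map") -> str:
    """Serialize a 2-D goal pose (metres, radians) as a JSON string for MQTT."""
    goal = {
        "frame_id": frame_id,
        "position": {"x": x, "y": y, "z": 0.0},
        "orientation": yaw_to_quaternion(yaw),
    }
    return json.dumps(goal)


def parse_goal_payload(payload: str) -> tuple:
    """Bridge-side decoding back to (x, y, yaw).

    2 * atan2(q_z, q_w) inverts yaw_to_quaternion for headings between -pi and pi.
    """
    goal = json.loads(payload)
    q = goal["orientation"]
    yaw = 2.0 * math.atan2(q["z"], q["w"])
    return goal["position"]["x"], goal["position"]["y"], yaw


if __name__ == "__main__":
    payload = make_goal_payload(1.5, -2.0, math.pi / 2)
    print(NAV_GOAL_TOPIC, payload)
    print(parse_goal_payload(payload))
```

With a client such as paho-mqtt, the resulting string would be passed to `client.publish(NAV_GOAL_TOPIC, payload)`. JSON keeps the payload readable on both the Python interaction side and the ROS side, at the cost of a small parsing step in the bridge.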
In conclusion, the proposed interactive robotic assistant represents a significant step forward in bridging the gap between human needs and technological advancements. By combining cutting-edge AI technologies with practical hardware solutions, this work offers a scalable, efficient, and user-friendly system that enhances accessibility and user engagement in human-centric spaces.

Objective

This research stems from the growing demand for intelligent assistants in human-centric environments, such as laboratories and educational institutions, which face limited human resources for providing information to visitors and students. Existing solutions often lack scalability and cannot adapt efficiently to changing environments. In addition, conventional assistant systems often rely on specialized personnel, which creates a burden of answering repetitive questions and cannot accommodate a growing number of users. This research therefore aims to develop an interactive robotic assistant that operates autonomously in dynamic environments, using AI and a large language model (LLM Predictor) combined with speech, gesture, and face recognition to increase user engagement and support real-time interaction. The system also reduces staff workload and improves access to information accurately and efficiently, and it supports further development to expand its capabilities and range of applications in the future.

Other Innovations

Faculty of Engineering

Garbage Sorting Systems

The presented project is Garbage Sorting Systems. Its purpose is to study and develop a waste sorting system that automatically detects the type of waste, using a proximity sensor to separate metal from non-metal waste and an ultrasonic sensor to measure the amount of waste in the bin. If the amount of waste exceeds a specified level, the system sends a notification to a connected communication device, such as a smartphone or computer. The system is designed to increase the efficiency of waste management, reduce the burden of manual sorting, and promote recycling. It can be deployed in places such as educational institutions or public spaces to reduce the amount of improperly separated waste and increase opportunities for reuse.
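The sensing logic can be sketched as follows, assuming an inductive proximity sensor that reports `True` when metal is present and an ultrasonic sensor mounted at the top of the bin that returns the echo round-trip time. The 80% notification threshold and all function names are hypothetical; the project summary only states that a notification is sent when a specified amount is exceeded. The distance to the waste surface follows from the round trip: distance = speed of sound × time / 2.

```python
SPEED_OF_SOUND_M_S = 343.0  # approximate value in dry air at about 20 °C
NOTIFY_THRESHOLD_PERCENT = 80.0  # hypothetical "specified amount"


def echo_to_distance_cm(echo_time_s: float) -> float:
    """Sensor-to-waste distance from the ultrasonic round-trip time."""
    return SPEED_OF_SOUND_M_S * echo_time_s / 2.0 * 100.0


def fill_percent(distance_cm: float, bin_depth_cm: float) -> float:
    """Share of the bin depth occupied by waste, clamped to 0..100."""
    filled_cm = bin_depth_cm - distance_cm
    return max(0.0, min(100.0, 100.0 * filled_cm / bin_depth_cm))


def classify_item(metal_detected: bool) -> str:
    """Route an item according to the (assumed inductive) proximity sensor."""
    return "metal" if metal_detected else "non-metal"


def should_notify(percent: float, threshold: float = NOTIFY_THRESHOLD_PERCENT) -> bool:
    """True once the bin reaches the notification threshold."""
    return percent >= threshold


if __name__ == "__main__":
    distance = echo_to_distance_cm(0.0005)  # echo received after 0.5 ms
    level = fill_percent(distance, bin_depth_cm=50.0)
    print(classify_item(True), round(distance, 2), round(level, 1), should_notify(level))
```

Clamping the fill level guards against readings slightly outside the bin geometry, for example when an item sticks up above the rim.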

Faculty of Engineering

SignGen: An LLM-Based Thai Sign Language Generator

The Thai Sign Language Generation System aims to create a comprehensive 3D modeling and animation platform that translates Thai sentences into dynamic and accurate representations of Thai Sign Language (TSL) gestures. The project seeks to improve communication for the Thai deaf community through a landmark-based approach that combines a Vector Quantized Variational Autoencoder (VQVAE) with a Large Language Model (LLM) for sign language generation. The system first trains a VQVAE encoder on landmark data extracted from sign videos, allowing it to learn compact latent representations of TSL gestures. These encoded representations are then used to generate additional landmark-based sign sequences, augmenting the training data drawn from the BigSign ThaiPBS dataset. Once the dataset is augmented, an LLM is trained to output landmark sequences from Thai text inputs; these sequences then drive a 3D model in Blender to produce fluid, natural TSL gestures. The project is implemented in Python, using MediaPipe for landmark extraction, OpenCV for real-time image processing, and Blender’s Python API for 3D animation. By integrating VQVAE-based encoding with LLM-driven landmark generation, the system aims to bridge the communication gap between written Thai text and expressive TSL gestures, providing the Thai deaf community with an interactive, real-time sign language animation platform.
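The vector-quantization step at the core of a VQ-VAE can be illustrated with a plain nearest-neighbour lookup: each continuous latent vector produced by the encoder is replaced by the index of the closest codebook entry under squared Euclidean distance. This sketch omits the encoder, decoder, and codebook training, and the codebook values are invented for illustration. Treating the resulting indices as discrete tokens is one common way to let a language model generate motion; the summary does not say whether SignGen uses exactly this scheme.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Replace each latent vector with the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantize(indices: Sequence[int],
               codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Look the code vectors back up; a decoder would map these to landmarks."""
    return [list(codebook[k]) for k in indices]


if __name__ == "__main__":
    # Invented 2-D codebook and per-frame latents, for illustration only.
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    latents = [[0.1, -0.2], [4.6, 5.2], [0.9, 1.2]]
    tokens = quantize(latents, codebook)
    print(tokens, dequantize(tokens, codebook))
```

In a real model the latents would have many more dimensions and the codebook would be learned jointly with the encoder, but the lookup itself is the same.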

Faculty of Agricultural Technology

Detection of Durian Leaf Diseases Using Image Analysis and Artificial Intelligence

Durian is a crucial economic crop of Thailand and one of the most exported agricultural products in the world. However, producing high-quality durian requires keeping durian trees healthy, strong, and disease-free in order to optimize productivity and minimize damage to both the tree and its fruit. Among the diseases affecting durian, foliar diseases are among the most common and fastest-spreading, directly affecting tree growth and fruit quality. Monitoring and controlling leaf diseases is therefore essential for preserving durian quality. This study applies image analysis combined with artificial intelligence (AI) to classify diseases in durian leaves, enabling farmers to diagnose diseases independently without relying on experts. The classification covers three categories: healthy leaves (H), leaves infected with anthracnose (A), and leaves affected by algal spot (S). To develop the classification model, three convolutional neural network (CNN) architectures, ResNet-50, GoogLeNet, and AlexNet, were employed. Experimental results show classification accuracies of 93.57%, 93.95%, and 68.69% for ResNet-50, GoogLeNet, and AlexNet, respectively.
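For reference, the sketch below computes overall accuracy, the metric reported above, under its standard definition (correct predictions divided by all predictions), together with per-class recall for the three labels H, A, and S. The labels in the usage example are invented and are not data from the study; the per-class breakdown is not reported in the summary, but it can show whether a model's errors concentrate on one disease class.

```python
from collections import Counter
from typing import Dict, Sequence

CLASSES = ("H", "A", "S")  # healthy, anthracnose, algal spot


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Overall accuracy in percent: correct predictions / all predictions."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("expected two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def per_class_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Percentage of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: 100.0 * hits[c] / totals[c] for c in CLASSES if totals[c]}


if __name__ == "__main__":
    # Invented labels, not data from the study.
    y_true = ["H", "H", "A", "A", "S", "S"]
    y_pred = ["H", "A", "A", "A", "S", "H"]
    print(round(accuracy(y_true, y_pred), 2), per_class_recall(y_true, y_pred))
```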