Powering machines with AI to revolutionise HCI

The way humans communicate with machines, known as Human-Computer Interaction (HCI), has changed drastically over the last decades. When personal computers first appeared, keyboards were the only input devices, and interaction was based on typing commands into a black, sometimes intimidating, screen. The introduction of the mouse changed this paradigm, as users could navigate across the screen and click icons to perform particular actions. More recently, with the release of the first iPhone in 2007, touchscreens were introduced as the interface to this pocket-sized personal computer. The question now is: what will be next?

Undoubtedly, a natural interaction interface between humans and computers is the answer to this question, and we are actually not that far from it. Virtual assistants such as Siri, Alexa, Cortana or Bixby have already reached the consumer market and are revolutionising the current HCI paradigm. To keep pace with this trend, sustAGE aims to boost the user experience by powering our system with natural interaction capabilities.

For the successful implementation of such functionalities, the sustAGE system first needs to understand what users are saying. Although this is a relatively easy cognitive task for humans, computers require much more effort to acquire this ability. First of all, a user’s speech needs to be converted into text to generate a transcription of the user’s message. So, “how can computers acquire an ability that takes us, as humans, years of schooling to master?” With Artificial Intelligence (AI). Using AI techniques together with huge amounts of data, researchers have developed methods for training computers to perform this task, which is known as Automatic Speech Recognition (ASR). Next, computers need to infer meaning from these words, which requires a computational understanding based on mathematical analysis. So, “if computers operate with numbers, how can they handle the transcribed words?” By mapping them into word embeddings: numerical representations of each word that capture its linguistic, and sometimes even contextual, meaning. Finally, after understanding the message, the system can determine the most suitable answer to it. Nevertheless, this is not magic! A team of experts designs the dialogues, which tell the computer what to say and when.
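To illustrate the word-embedding idea described above, here is a minimal sketch in Python. It is not the sustAGE pipeline: the three-dimensional vectors are made up for the example, whereas real embeddings are learned from large text corpora by models such as word2vec or GloVe and typically have hundreds of dimensions.

```python
import numpy as np

# Toy, hand-crafted word embeddings (purely illustrative; real systems
# learn these vectors from large amounts of text).
embeddings = {
    "tired":     np.array([0.9, 0.1, 0.0]),
    "exhausted": np.array([0.8, 0.2, 0.1]),
    "happy":     np.array([0.0, 0.9, 0.3]),
}

def cosine_similarity(a, b):
    """Similarity between two word vectors (close to 1.0 = similar meaning)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Words with related meanings end up with similar vectors, so the computer
# can reason about the transcribed message using numbers alone.
print(cosine_similarity(embeddings["tired"], embeddings["exhausted"]))  # high
print(cosine_similarity(embeddings["tired"], embeddings["happy"]))      # low
```

With such a numerical representation, comparing or combining words becomes ordinary vector arithmetic, which is exactly what computers are good at.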

Analysing the linguistic information of a message allows us to understand what is being said. However, how the message is said provides valuable paralinguistic information that complements the linguistic content and helps the system better understand the context of the conveyed message. So, “are you now telling me that computers can understand human states?” Indeed, and, again, this is achieved with AI techniques. For instance, information related to the pitch or loudness (among many other cues) of the user’s speech can be used to infer the user’s emotional state.
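To give a flavour of such paralinguistic cues, the sketch below extracts a pitch track and a loudness (RMS energy) track from a speech recording using the open-source librosa library. This is only an assumption of how such features could be computed, not the actual sustAGE feature set, and the file name is a placeholder.

```python
import numpy as np
import librosa

# "user_message.wav" is a hypothetical recording of a user utterance.
y, sr = librosa.load("user_message.wav", sr=16000)

# Pitch (fundamental frequency) track, estimated frame by frame.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)

# Loudness proxy: root-mean-square energy per frame.
rms = librosa.feature.rms(y=y)[0]

# Simple summary statistics of the kind an emotion recogniser might take
# as input (real systems use many more descriptors than these).
features = {
    "mean_pitch_hz": float(np.nanmean(f0)),
    "pitch_range_hz": float(np.nanmax(f0) - np.nanmin(f0)),
    "mean_loudness": float(rms.mean()),
    "loudness_variation": float(rms.std()),
}
print(features)
```

Feature summaries like these can then be fed to a trained classifier that maps them to emotional states, complementing what the words themselves say.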

Therefore, our team aims to explore the use of automatically inferred linguistic and paralinguistic information in the development of a dialogue management system able to customise the interaction to the current emotional state of the user. This way, we will not only provide the system with natural interaction capabilities but also encourage users to engage with it. If you are curious about how AI can improve HCI, stay tuned for the outcomes of sustAGE.