News and commentary about the Great Frontiers

ISS007-E-10807 (21 July 2003) --- This view of Earth's horizon as the sun sets over the Pacific Ocean was taken by an Expedition 7 crewmember onboard the International Space Station (ISS). Anvil tops of thunderclouds are also visible. Credit: Earth Science and Remote Sensing Unit, NASA Johnson Space Center


The Rise of Human-to-Machine Communication


From our perspective, Web 2.0 has been all about human-to-human communication. Sure, this communication is mediated by increasingly sophisticated machine-to-machine interactions, but in everything from blogs, social networks, and new media to best practices, graphic design, and standards, we treat human-to-human communication as central to the web as we know it today.

This will not be the case for much longer. Direct communication between humans and machines will soon dominate our social activity. Siri, Qwiki, and IBM’s Watson are just the beginning of a transformation that will radically reshape the web, our relationship to it, and our relationships with each other over the next decade. Soon the focus of funding, development, and news coverage will shift away from human socialization toward this new and intimate relationship between humans and our machines. Meanwhile, certain human-to-human interactions will begin to vanish, replaced by a chain of mostly automated interactions that follows this pattern:

human <-> [machine <-> machine]_n <-> human

where n is a variable number of automated machine-to-machine interactions.

For example, consider the predominantly human-to-human interactions involved in making a dinner reservation with a friend. We typically begin with a conversation between friends about dates, times, and locations, and we keep communicating as necessary until all the arrangements are made. This may include a conversation between one of us and someone at the restaurant to make the reservation, or the use of a website or app in place of that conversation.

The latter is something we have been doing more frequently in recent years. We often use our web browsers or apps to find places to dine, read reviews, and even make the actual reservation. What is coming is a refinement of this, pushing machine-to-machine interactions further out on either side of the arrangement until even the human-to-human interactions are optional. This is not going to happen because we are becoming less sociable. Instead, we will have the tools in place to make these arrangements so convenient and immediate that a barely thought-out passing mention in the vicinity of our machines will be translated into a specific arrangement.

Consider instead this near-future scenario: I decide I want to have dinner with a friend. This might be an independent whim or the outcome of a conversation between my friend and me. What matters is that our intent, to go eat dinner together, will be picked up at any stage by the technology around us. In a series of human-to-machine and machine-to-machine interactions, my digital devices and services will consult with my friend’s digital devices and services, and these technologies will check back with each of us as necessary. Once a preliminary plan is in place, these services reach out to other services to finalize the arrangements. In minutes, or even seconds, everything is settled, whether or not my friend and I communicated with each other directly, and without either of us having to talk to anyone else, such as a person taking reservations at the restaurant.

As in the past, the end result is an enjoyable dinner with a friend, but most of the communication required to make these arrangements will shift away from human-to-human interactions toward human-to-machine and machine-to-machine interactions.

Consider another current-day scenario: I am injured at work and end up on short-term disability. I have not received any money from the insurance company and begin to worry about paying my bills. I call and wait on hold to talk to a representative, who, when they finally answer, is not very friendly. They insist that my employer has not faxed in the required information. I hang up and call a representative in my company’s Human Resources department, who insists that all the paperwork has already been sent. With every additional phone call I grow more impatient. By the time the insurance company finally lets me know they have found the paperwork (it was filed in the wrong place!), my stress level has gone through the roof.

In the near future, there will be no telephone calls to HR or the insurance company. I will set an investigation in motion simply by expressing my intent, perhaps through a verbal expression of frustration with no one listening but my digital tools. These technologies, such as intelligent agents (IA), will work with the IA in HR and the IA at the insurance company to resolve the issue. There may still be misfiled paperwork, inefficient systems, and other problems, but the conversations taking place will generally require little, if any, participation by me or other humans.

Humans are inefficient and often unsuccessful when it comes to diving into the depths of data and bureaucracy to recover what is necessary and useful, and to complete tasks in a timely and mutually agreeable fashion. IA will not necessarily eliminate system inefficiencies and bureaucracy, but it will bring the speed, tenacity, and non-human detachment that make it seem as if there were no inefficiencies! As these interactions increase and technological progress shifts further toward human-to-machine and machine-to-machine communication, today’s frustrations will quickly dissolve; they will probably be replaced by new ones, but they will be vanquished nonetheless.

Voice user interfaces (VUI) have been around for years, but they have not been widely adopted outside particular verticals like word processing and customer service. In word processing, the automated conversion of dictation to formatted text is all that is required to complete a task successfully. In customer service, interactive voice response (IVR) has been moderately successful, primarily because of the narrow range of questions asked and possible responses. VUI and IVR are not more widespread because they are just two of a host of technologies that need to come together to be useful. Voice recognition, VUI, IVR, and natural language processing are some of the technologies now coming together with a background intelligence (IA) to enable these new interactions. The combination of these technologies, as recently demonstrated by both Watson and Siri, provides far more capability and highlights the very human-to-machine interactions (conversations) that will soon become the focus of our online activities.

Although we still appreciate search engines today, they have begun to show their age. Search is not an easy conversation. Search requires the use of keyboards, mice, and buttons. Search can result in spam, wrong answers at the top of results, and otherwise unsatisfactory results. With intelligent agents evolving from technologies like Watson and Siri, the distance and required effort between my request and an answer or action will narrow greatly. Questions will be asked and requests will be made naturally, with our voices, body language, and emotions, to the devices in our environment. These devices will divine actionable intention from our expressed hopes and frustrations. The result will be specific answers and actions, tailored to specific situations and environments, all of it in a conversational manner, the very same manner with which we interact with other people.

Search engines like Google, supported primarily by advertising, are on notice now that IA, combined with other technologies, has arrived. In fact, Google has already begun to explore and develop the kinds of technologies that will sweep search engines away and replace them with conversational IA. It has demonstrated technologies that interpret the world as seen through a mobile device’s camera or heard through its microphone. Google’s deep roots in textual search, however, could slow its transition to IA. Apple has bet on a conversational IA future with its deep integration of Siri into iOS 5.0 and the iPhone 4S. Unhampered by a legacy of textual search, and with a voice and an attitude of its own, Siri has already captured the imagination of several million consumers who now use it regularly.

What follows Siri and Watson will be increasingly sophisticated technologies that understand humans without requiring humans to narrow and tailor their communication. These are not technologies to augment search engines; they are technologies to replace them completely. The ease with which I can speak (sign, emote, or otherwise indicate) a question or request will open up a range of questions that dwarfs the limited subset search engines can handle today.

IA will also speed up, or even replace, certain human activities. Siri can already make a dinner reservation or call for a cab more quickly than I can alone. It can also answer trivia questions more quickly than I can look them up in a search engine. Which is faster: (1) holding down a button and asking aloud to be reminded of a task you need to complete, or (2) launching an app and typing in the reminder yourself? You might be a fast typist, but in the time it takes you to launch your app, type your reminder, and set the time and date, Siri will already have entered it for you, waiting for just the right time or location to remind you. As Siri learns more about you, it will be able to do so more efficiently, and eventually it will learn how to prioritize and even complete some of these tasks for you.

Primitive IA were already being marketed at the end of the 20th century, but they flopped because the rest of the infrastructure needed to truly support them, including natural interfaces like voice and gesture, was not yet in place. These early IA (in the guise of parrots, paper clips, and an operating system named “Bob”) were slow on the hardware of the time. There were no web services to query and no cloud data centers to offload processing to. Perhaps most importantly, interaction with these IA was limited to what we could do with a mouse and keyboard.

Some may argue that there is still much development to be completed in IA, and that this technology is years or decades away. The truth is, much of the foundation for IA has already been created and demonstrated. Watson is in some ways even more sophisticated than Siri, but its hardware takes up an entire room. Like all room-sized technologies, however, its infrastructure requirements will rapidly shrink until Watson can fit into consumer electronics. As it shrinks, Watson will also gain new capabilities. IBM already has an aggressive roadmap to bring Watson technology to hospitals and health workers in the next 18 to 24 months.

Meanwhile, mobile platforms have reached a multi-touch plateau. Everything that was initially cool and cutting-edge about multi-touch now seems routine and ubiquitous. Multi-touch has limitations, some of which will be eliminated by updated hardware and others that are inherent to the technology. The next revolution in mobile platforms, then, will be multi-modal interfaces: the integration of voice, gesture, and haptics, among others, into a seamless, adaptive interface.

This infrastructure is falling into place, capable of supporting rapid, efficient, and conversational interactions between humans and machines. Behind the scenes, the machines are getting smarter and improving their interactions with each other to complete increasingly sophisticated tasks. IA are now conversational. They adapt to us, rather than we to them. They beg to become a part of our conversations. Soon we will let them, and those interactions will become some of the most valuable we will ever know.
