1 Introduction

Robotic technologies have recently made significant advances towards the goal of autonomy. Though some fields may not require or even benefit from autonomous robotic systems (e.g., perhaps we will never be comfortable with fully autonomous surgeons), there are significant improvements that can be made in areas such as search and rescue, emergency response, and military operations, which involve high-workload, hazardous scenarios. Putting aside fears of potentially malicious AI (a la Isaac Asimov, Stanley Kubrick, or William Gibson), it may be possible to employ such systems to replace humans in life-risking operations, not only to protect lives but to save them [1,2,3].

As improvements are made to robotic technologies, it is important for researchers to stay ahead of the curve and attend to the potential directions in which autonomous robots may develop. Anticipating near-future advances improves our ability to curb dangerous approaches as well as optimize the way that technologies are leveraged. A promising direction that is currently the focus of a great deal of research is the transition of near-autonomous robots from tools that may be used to accomplish an operator’s goals into teammates that may assist with a human’s goal while also attending to separate objectives [4, 5]. The dream of single operator, multi-robot teams may still be a ways off [6, 7]; however, multi-human teams which incorporate intelligent robotic teammates are very nearly a reality. That said, there are a number of challenges beyond the technological (e.g., computer vision, object recognition, artificial intelligence networks, etc.) that will need to be tackled before humans can effectively partner with robots [8].

Our understanding of human-human teaming has been systematically improved for centuries, yet gaps still exist in our knowledge of how best to optimize team performance; the introduction of non-human intelligences into the mix will necessarily complicate matters further [9,10,11]. Fortunately, we have some understanding of how humans partner with non-humans as a result of human-animal teams, which have proven to be extremely effective in a host of scenarios from law enforcement to search and rescue [12]. Human-dog teams, for instance, have functioned effectively both with and without the assistance of technology (see Bozkurt et al. [13] for an interesting use of technology in this field). A fundamental difference between current human-non-human teams and the vision for the future is that autonomous robots may be designed to communicate with a human far more naturally than animals can (e.g., through speech, graphical depictions, intuitive data arrays) [14]. Though current methods of human-robot communication are primarily limited to video feeds and environmental data transmission [15,16,17], the future of human-robot teams (HRT) will depend on the expansion of communication modalities to support more natural interactions [10, 18,19,20,21] (see Fig. 1).

Fig. 1. A diagrammatic representation of the type of multi-modal communication that will be needed to facilitate future HRT communications: bi-directional auditory communication, visual and touch/button interaction facilitated by an interface, and tactile communication which would likely only need to be one way and is represented here as a tactile belt.

2 Multimodal Communication and Teaming

Currently, the needs of human members of HRTs are not clearly understood [22], but it is known that in high-workload situations human performance benefits from reducing cognitive load as much as possible [23,24,25]. Multi-modal communication (MMC) presents a solution to the danger of overloading human operators by leveraging the fact that the brain is capable of attending to more information if that information is split across modalities [26, 27]. By flexibly utilizing both explicit (e.g., speech, visuals, tactile displays) and implicit communication modes, MMC techniques offload information processing demands to facilitate interactions and improve performance [21].

To emphasize this point further, one should consider the operational environment in which human-robot teaming will take place. For a soldier, the mission space has potential for noise, low visibility, and many dynamic events. Speech may be the primary method of human-human communication, but visual signals (gestures) or touch (a hand on the shoulder) provide signal redundancy to ensure the message is received and understood. Consider the case of an HRT consisting of a dismounted soldier on patrol who is interacting with a robotic teammate that is conducting a search of an area: initial communications may primarily be speech-based commands and acknowledgements, but once the robot begins to encounter objects of interest it must send images, a video feed, or descriptions to its human counterpart. The human, in this case, would already be experiencing visual load as a result of their patrol, so although images may be most effective for communicating findings, it may be better in that situation to generate speech descriptions to which the soldier may attend without reducing the quality of the patrol. The addition of tactile messages may also further emphasize the content of speech [28]. It is presently impossible to wholly determine the optimal ways to support teaming in such a scenario because the teammates and MMC tools that would be needed to run assessments in the field do not exist.

3 Researching Future HRI: Physical and Virtual Simulation

A challenge for current investigations of HRI is the lack of functional autonomous robots with which to test interactions in relevant scenarios. A possible approach to tackling that issue is to create representative tools/teammates which are capable of simulating a given scenario; however, implementing a real-world simulation that gives even the appearance of the needed capability requires significant resources. A far less demanding approach is to virtually simulate capabilities in a laboratory setting, though this approach yields findings that are less applicable to real-world interactions. Fundamentally, the incongruence between the experience of participating in an experiment and actually interacting with a robot in an ongoing mission reduces the degree to which such investigations can accurately predict interaction outcomes and thereby deduce methods for supporting effective teaming. Though it is important to note those drawbacks, simulating near-future HRI is currently the most viable approach to preparing for the arrival of usable technologies, and therefore it is more relevant to consider the type and design of simulations that can most accurately approximate reality.

Physical simulations are a common tool for evaluating human performance, and can be used for nearly any scenario that does not incorporate undue risk (e.g., firefighters may run drills in burning buildings, but running search and rescue drills in the presence of real radioactive fallout is likely not advisable or necessary) [29]. Jentsch et al. (2010) describe the pros and cons of a scaled Model Operations in Urban Terrain (MOUT) physical simulation that makes heavy use of the “wizard-of-oz” technique to simulate interactions between human operators and remote robotic teammates in a military context. Here, the wizard-of-oz technique essentially describes the use of pre-determined events, confederates, and faked information/communications to create the illusion of conducting a real mission in a relevant “environment” without requiring functional autonomous systems or a real-world environment in which to test them. Real-world technologies and spaces are not required by this approach; however, a great deal of effort, preparation, and maintenance is still necessary to employ a physical simulation such as the MOUT, as the faked technologies and environments must still be implemented (see Fig. 2) [30].

Fig. 2. The scaled MOUT environment including a model urban environment, remote control “autonomous” ground vehicles, and a pulley system for controlling “autonomous” aerial vehicles.

Another approach to simulating near-future HRI is to employ virtual reality simulations. Several such simulations have been developed in the last decade, particularly for the investigation of military-focused HRI. The Mixed-Initiative Experimental (MIX) Testbed is one such simulation which provides simulated ground and air robotic systems in a 3D environment as well as an operator control unit (OCU) which allows users to interact with the virtual systems [32]. A benefit of the MIX simulator over physical simulations such as the MOUT is that it requires neither a large room to run nor confederates to operate the simulated autonomous teammates. On the other hand, the use of confederates does allow for a larger degree of flexibility than pre-programmed interactions or capabilities, which constrain interaction possibilities. From an experimental perspective, however, the repeatability of predetermined actions is often better than allowing too much freedom (e.g., two participants who engaged in different interactions may not be directly compared). The MIX testbed has supported several investigations focused on HRT, but did not address the need for MMC which supports verbal communication and virtual multi-tasking (though participants could, of course, be tasked with real-world tasks while interacting with the MIX).

The Virtual Test Bed (VTB) was developed as an extension of the MIX to more effectively study multi-modal communication in HRTs. It not only includes a virtual monitoring task which simulates ongoing activities of a human team member (executed by the user), but also incorporates a prototype Multi-Modal Interface (MMI) developed with the support of the Robotics Collaborative Technology Alliance (RCTA). The RCTA MMI is a real-world device designed explicitly to support multi-modal communication for HRTs in military operations; even though the tasks and events simulated in the VTB are faked, the interactions themselves are identical to what would be experienced in near-future teams (note that the MMI software is presented within the VTB; the physical device itself is not used). The one drawback of the original VTB was that users had to engage in their tasks and interactions through a desktop computer and monitor and did not have a viewpoint which immersed them in the virtual environment. Advances in commercial virtual reality displays (primarily head-mounted displays and motion-tracked controllers) allowed the issue of immersion to be properly addressed and led to another overhaul of the VTB: extension into immersive virtual reality.

The immersive version of the MIX/VTB style HRI testing environment, the VRMIX, relies on an HTC Vive and paired handheld controller to put participants right into a virtually simulated mission environment (see Fig. 3).

Fig. 3. Participant view of a perimeter monitoring task taking place in a simulated urban environment in the VRMIX.

Much like the VTB, the VRMIX has the capacity to simulate autonomous teammates in addition to a variety of tasks and operations for human-in-the-loop experimentation [31]; however, it also has the vital capability of allowing users to interact with a simulated MMI that they can “hold” (insofar as they hold and interact with a controller which is represented in VR) and use to communicate with a remote autonomous teammate. As indicated in Fig. 1, several wearable devices may be required to support MMC with the RCTA MMI, potentially including a wireless headset, a tablet or visual display, and a tactile display such as a tactor belt or vest, depending on need. The VRMIX simulation testbed has the capacity to simulate interaction through each of those required modalities by generating simulated events which transmit real data to most any device in order to elicit natural communication experiences. As such, the VRMIX provides a unique method for evaluating HRT effectiveness in relevant scenarios while supporting environments, tasks, and interactions. Moreover, the VRMIX supports dismounted HRT, while environments like the scaled MOUT and MIX testbed do not.

4 HRI Research Example: Dismounted Soldier-Robot Teams

Although it is known that cognitive overload may be ameliorated by distributing loading between processing modes, the optimal method for balancing loading and especially for implementing a system to execute that balancing is far from being well understood. Accordingly, it is important at this stage in MMC research to address the issues of when, why, and how to make use of each available modality. Here, we describe a possible research approach for investigating the specific performance benefits introduced by the tactile modality as a supplementary avenue of communication for human-robot teams.

Tactile communication is a potentially untapped resource for augmenting robot-to-human communication. The use of haptic cues and feedback is a well-known method for notifications in commercial products like cell phones and gaming consoles. Moreover, efforts are ongoing to deliver hands-free navigation using tactile belts for the U.S. Army, enabling a Soldier to maintain light discipline and keep their “heads-up” while traveling to a pre-defined point [31]. Research to extend a tactile display’s ability to convey content similar or complementary to speech is still early, but shows promise. Barber et al. conducted a series of studies to determine the feasibility of delivering two-word phrases within an HRT task [28]. Results of this effort showed the ability of participants to receive simple reports such as “danger to the north” with high reliability and low response times. Although promising, it is still unclear whether one may benefit from having this additional delivery modality paired with speech or other forms of communication from a robot.

The virtual simulation approach that we propose here for investigating tactile communications within MMC makes use of both the VRMIX environment, and the Cordon and Search (C&S) operational context briefly described in the introduction. Though the focus of such an experiment would be centered on the potential performance augmenting effects of tactile communication, it is important to keep in mind that auditory and visual communication may also be provided and evaluated as comparative standards. The basic approach that we suggest is a systematic combination/evaluation of communication modalities in a mission context that allows for empirical performance evaluation (e.g., assessing performance of the Cordon and Search task with the provision of communication via visual, visual + auditory, visual + tactile, auditory + tactile, etc.).
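As a minimal sketch of what a systematic combination of modalities could look like in practice, the snippet below enumerates every non-empty subset of three modalities as a candidate experimental condition. The modality labels and the function name `modality_conditions` are illustrative assumptions; an actual study would likely retain only the subset of conditions that is operationally meaningful.

```python
from itertools import combinations

# Hypothetical modality labels; the set used in a real study may differ.
MODALITIES = ("visual", "auditory", "tactile")


def modality_conditions(modalities=MODALITIES, min_size=1):
    """Return every combination of modalities with at least min_size members."""
    conditions = []
    for size in range(min_size, len(modalities) + 1):
        conditions.extend(combinations(modalities, size))
    return conditions


# List the candidate conditions, e.g. "visual + tactile".
for condition in modality_conditions():
    print(" + ".join(condition))
```

With three modalities this yields seven candidate conditions, which may be too many for a within-subjects design; restricting `min_size` or hand-picking pairings is one way to keep session length manageable.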

The experimental approach would ideally directly compare each combination of communication modalities with respect to two primary outcomes: performance of the monitoring tasks that are vital to the Cordon aspect of C&S, as well as performance of the Search component of the task. The paradigm that we propose separates the two elements of the task between a human and robotic teammate such that the human (the participant in the context of human subjects research) completes the perimeter monitoring component of the C&S task while also monitoring reports from an autonomous robotic teammate as they ostensibly complete the search component. Using a wizard-of-oz approach, the robot teammate searches the inner cordon area and reports findings using the proposed combinations of modalities. Accordingly, the tasks administered to the human teammate would include the perimeter monitoring task as well as a dedicated robot monitoring or communication task. Evaluation of communication modality effectiveness would therefore be accomplished via investigation of improved or reduced performance of the communication task.

An important a priori decision for such an experiment is the quantification of performance metrics. Performance of a perimeter monitoring task may be quantified well enough through signal detection metrics (see [30, 33]); however, the success and effectiveness of communication exchanges as they relate to team performance is not so easily determined by a generalizable paradigm. We suggest that one effective approach to measuring the effectiveness of communications is to measure the development and maintenance of team situational awareness within a given mission context. Situational awareness (SA) generally refers to one’s understanding of the current state as well as potential future states of a given situation as it relates to a known (or developing) set of objectives or goals. Given the context of a C&S operation, we propose that evaluation of the quality (and therefore measured effectiveness) of SA should follow from the nature of the search at hand. Consider, for example, a C&S operation that is concerned with the identification of potential bomb-making materials in an urban environment. While the human teammate is busy conducting a monitoring task to ensure the safety of the operation, a robotic teammate may sweep the cordoned area and relay information regarding the presence of potentially dangerous materials. Awareness of the robotic teammate’s location, current/past/future actions, findings, and status are all important aspects of communicable information which lend themselves to evaluation in that context.
Accordingly, if the robotic teammate’s actions, findings, and communications are predetermined (as they necessarily would be), then the following performance data may be assessed during the course of the mission and in the context of communication modalities: overall team performance in identifying hazardous materials, the human’s ability to recall and recognize information regarding reports that occur during the mission, self-reported and objective outcomes with respect to the human teammate’s ability to interpret the communications sent by their robotic counterpart, the development and quality of situational awareness as it develops over the course of a mission and in the context of various loading conditions, and the overall performance of the human’s primary perimeter monitoring task. Note also that it may be relevant to consider individual differences in the ways that human teammates interact with their virtual counterpart, as these variations have been shown to have a significant effect on both individual and team behaviors. Particularly, considering the virtual reality administration method and military-focused scenario, some relevant experience should be accounted for to avoid confounding effects. Table 1 below details the possible independent and dependent variables included in the proposed investigative approach.
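To illustrate how signal detection metrics might be computed for the perimeter monitoring task, the following sketch derives sensitivity (d′) and response criterion (c) from counts of hits, misses, false alarms, and correct rejections. The function name and the log-linear correction (adding 0.5 to each count and 1 to each total so that rates of exactly 0 or 1 stay finite) are assumptions made for this example and are not a description of the scoring used in the cited studies.

```python
from statistics import NormalDist


def sdt_metrics(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) for one participant and condition.

    Assumption: a log-linear correction is applied so that perfect or
    empty cells do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


# Hypothetical counts for a single monitoring session.
d, c = sdt_metrics(hits=18, misses=2, false_alarms=3, correct_rejections=17)
print(f"d' = {d:.2f}, c = {c:.2f}")
```

Computing these values per modality condition would allow monitoring performance to be compared across conditions on a common scale.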

Table 1. Design approach for investigating MMC

Additional measures that may be relevant to the evaluation of the effectiveness of MMC pairings/sets include measures of perceived workload, which may include self-report measures (such as the NASA-Task Load Index, see [33]) or physiological response measures such as Heart-Rate Variability (HRV), Inter-Beat Interval, or Galvanic Skin Response (GSR), which have been shown to correlate with workload [18, 34]. Physiological responses may prove especially useful for future implementations of MMC in real-world systems, as they have the potential to provide real-time information that could be used to tailor communication modalities on-the-fly in order to optimize performance.
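As one illustration of how a physiological measure could be derived from recorded data, the sketch below computes RMSSD (the root mean square of successive differences) from a series of inter-beat intervals. RMSSD is a widely used time-domain HRV statistic, but its selection here, along with the function name and sample values, is an assumption for the example; the text above does not specify which HRV metric would be used.

```python
from math import sqrt


def rmssd(ibi_ms):
    """Root mean square of successive differences of inter-beat intervals (ms)."""
    if len(ibi_ms) < 2:
        raise ValueError("at least two inter-beat intervals are required")
    # Differences between each interval and the one that follows it.
    diffs = [later - earlier for earlier, later in zip(ibi_ms, ibi_ms[1:])]
    return sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical inter-beat intervals in milliseconds.
sample = [800, 810, 790, 800]
print(f"RMSSD = {rmssd(sample):.2f} ms")
```

Applied over a sliding window, a statistic of this kind is the sort of real-time signal that an adaptive system could monitor when deciding which modality to use.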

5 Conclusions

The purpose of this paper is to capture current gaps in the development and assessment of multimodal communication in squad-level human-robot teams. Although advances are rapid in the area of machine intelligence, the ability to perform as a cohesive team in environments relevant to the military has not yet been achieved. Therefore, simulation techniques are necessary to explore how well research findings from human-human communication translate to human-robot teams, as well as how to take advantage of the “super-human” capabilities robots can provide. The VRMIX testbed described here provides a platform to emulate future military-relevant scenarios with soldier-robot teams in order to evaluate multimodal communication strategies. Finally, an example experiment focusing on Cordon & Search with a robot team member to investigate tactile communications is proposed, with the goal of advancing our understanding of how tactile messages may improve communication and situation awareness.