Abstract
As robots become more affordable and more common in everyday life, there will be an ever-increasing demand for adaptive behavior that is personalized to the individual needs of users. To accomplish this, robots will need to learn about their users’ unique preferences through interaction. Current preference learning techniques lack the ability to infer long-term, task-independent preferences in realistic, interactive, incomplete-information settings. To address this gap, we introduce a novel preference-inference formulation, inspired by assistive robotics applications, in which a robot must infer these kinds of preferences based only on observing the user’s behavior in various tasks. We then propose a candidate inference algorithm based on maximum-margin methods, and evaluate its performance in the context of robot-assisted prehabilitation. We find that the algorithm learns to predict aspects of the user’s behavior as it is given more data, and that it shows strong convergence properties after a small number of iterations.