User interaction
The grand challenge of user interaction is how to design MIR systems that put the user at the centre of the system. This applies to the whole interaction loop, including visualisation, input devices, manipulation metaphors, and system adaptation to user behaviour. The challenge is relevant because it contributes to both the user's and the researcher's (e.g. system designer's) understanding of the system's features and components, the overall purpose of the system, and the contribution the system can make to the user's activities. Users benefit from more productive workflows and from systems which better serve their needs; researchers benefit from a feedback loop which enables them to fine-tune and develop systems with greater accuracy. Effective user-oriented research will have a major impact on the usability of MIR systems and their wider deployment.
State of the art
In the last decade, Human-Computer Interaction (HCI) research has witnessed a shift in focus from conventional means of controlling and communicating with computers (keyboards, joysticks, mice, knobs, levers, buttons, etc.) to more intuitive uses of non-conventional devices such as gloves, speech recognition, eye trackers, cameras, and tangible user interfaces. As a result of technological advances and the desire to surpass the limitations of WIMP (window, icon, menu, pointing device), interaction research has progressed beyond the desktop and the ubiquitous graphical user interface (GUI) into new physical and social contexts. As terms such as "multi-touch" and gestures like the "two-finger pinch and zoom" have become part of users' daily lives, novel research areas such as "tangible interaction" have finally entered the mainstream. However, aside from the ongoing research explicitly focused on real-time musical performance, which typically falls under the New Interfaces for Musical Expression ([http://www.nime.org NIME]) discipline, little of this research has yet been devoted to novel interface and interaction concepts in the field of MIR.
The Association for Computing Machinery defines human-computer interaction (HCI) as "a discipline concerned with the design, evaluation and implementation of interactive computing systems for human use and with the study of major phenomena surrounding them." (ACM SIGCHI Curricula for Human-Computer Interaction) HCI involves the study, planning, and design of the interaction between people (users) and computers. It is often regarded as the intersection of computer science, behavioural sciences, design and several other fields of study. Interaction between users and computers occurs at the interface which is the result of particular affordances of a given combination of software and hardware. The basic and initial goal of HCI is therefore to improve the interactions between users and computers by making computers more usable and responsive to the user's needs. For decades HCI has mostly focused on making interaction more efficient, though more recently the emphasis has shifted to the user’s Quality of Experience, highlighting the benefits of beauty and fun, and the intrinsic values of the experience and its outcomes [e.g. Norman, 2004; McCarthy and Wright, 2004]. The human component in HCI is therefore highly relevant from the cultural, psychological and physiological perspectives.
MIR could benefit from knowledge inherited from HCI and related disciplines such as User Experience (UX) and Interface and Interaction Design. These methodologies could bring benefits not only to the conception of MIR systems at the early design stages, but also to the evaluation and subsequent iterative refinement of these systems. While the evaluation of MIR systems has traditionally been conceived to provide categorically correct answers (e.g. finding or identifying a known target song), new evaluation challenges are presented by open systems which leave users room for interpretation [e.g. Sengers and Gaver, 2006], include more subjective aspects (e.g. the users' emotions, perceptions and internal states [e.g. Hekkert, 2006]), or encourage contextual engagement [e.g. Hassenzahl and Tractinsky, 2006]. (The current state of the art in the evaluation of MIR research results is covered in the section Evaluation methodologies.) Furthermore, beyond the evaluation of user experience, another area that would directly benefit from HCI-related knowledge is research into open and holistic frameworks for the creation of MIR systems and tools.
Music Search Interaction
Over the past 12 years a number of projects from the MIR community have contributed to the development of interfaces for music search and discovery. In the field of data visualisation, there is an extensive bibliography on the representation of auditory data. In the particular case of the visual organisation of musical data, solutions often consist of extracting feature descriptors from audio files and creating a multidimensional feature space that is then projected onto a 2D surface using dimensionality reduction techniques such as the Self-Organising Map (SOM) [Kohonen, 2001] (e.g. Islands of Music [Pampalk, 2003]; SOMeJB [Lidy and Rauber, 2003]; and [Hlavac et al., 2004]). Beyond 2D views, the topological metaphor has been extended to facilitate users' exploration of large collections in nepTune, an interactively explorable 3D version of Islands of Music which supports spatialised sound playback [Knees et al., 2007], and the Globe of Music, which places a collection on a spherical surface to avoid any edges or discontinuities [Leitich and Topf, 2007]. More recently, MusicGalaxy [Stober and Nürnberger, 2010] implements an adaptive zoomable interface for exploration that makes use of a complex non-linear multi-focal zoom lens and introduces the concept of facet distances representing different aspects of music similarity. Musicream [Goto and Goto, 2005] uses the "search by example" paradigm: songs are represented as dynamic coloured circles which fall from the top of the screen, show their title when selected, and can be used to "fish" for similar ones.
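The common pipeline described above can be sketched in a few lines. The following is a minimal, hypothetical example, assuming the MiniSom library and random stand-in values in place of real audio descriptors (the cited systems each used their own SOM implementations and feature sets); it trains a SOM and reads off the 2D map position of every song:

```python
# Sketch of the feature-space-to-2D-map pipeline: descriptors in,
# map coordinates out. Feature values are random stand-ins.
import numpy as np
from minisom import MiniSom  # pip install minisom

n_songs, n_features = 200, 30            # e.g. timbre/rhythm descriptors per track
features = np.random.rand(n_songs, n_features)

# Normalise each descriptor so no single feature dominates the distance metric.
features = (features - features.mean(axis=0)) / features.std(axis=0)

# Train a 20x20 self-organising map: each unit becomes a cell of the 2D surface.
som = MiniSom(20, 20, n_features, sigma=1.5, learning_rate=0.5, random_seed=42)
som.train_random(features, 5000)

# Project every song onto the map: its best-matching unit gives 2D coordinates.
positions = np.array([som.winner(f) for f in features])
print(positions[:5])                     # (row, col) cell for the first five songs
```

Because neighbouring SOM units respond to similar feature vectors, songs that sound alike end up in nearby map cells, which is what makes metaphors such as "islands" of similar music possible.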
In terms of developing a user-oriented visual language for screen-based music search, the interactive aspect of most commercial music library applications has resorted to the metaphor of spreadsheets (e.g. iTunes), or has relied on searching for music by filling in a set of forms and radio buttons (e.g. SynchTank). Innovative approaches from the MIR community have suggested visually mapping sound clusters onto abstract "islands" [e.g. Pampalk, 2003], collaborative mapping onto real geographical visual references (e.g. Freesound), and tangible tabletop abstract symbols (e.g. SongExplorer [Julià and Jordà, 2009]). Visual references have included control panels used in engineering (e.g. MusicBox [Lillie, 2008]), gaming platforms (Musicream [Goto and Goto, 2005]), lines of notation (e.g. Sonaris and mHashup [Magas et al., 2008]), and turntables (Songle [Goto et al., 2012]).
A few MIR-driven search interfaces have addressed different user contexts. Mediasquare [Dittenbach et al., 2007] addresses social interaction in a 3D virtual space where users are represented by avatars, enabling them to browse and experience multimedia content by literally walking through it. decibel 151 [Magas et al., 2009] enables multi-user social interaction in physical space by turning each user into a "walking playlist", creating search environments for social networking in real time. Specialised visual interfaces have addressed music that is poorly described or less familiar to the user (e.g. field recordings, ethnomusicological collections), aiming both to educate and to allow music discovery in an entertaining way (e.g. Songlines 2010 and [Magas and Proutskova, 2013]). User contexts, however, remain vastly under-researched and constitute a major challenge for the MIR community.
Some of the above interfaces have adopted HCI research methods which consider MIR-driven search systems holistically: not only as visual representations of data, but with a focus on the user's Quality of Experience. This results from a coherent system design approach which creates a feedback loop for an iterative research and innovation process between the interactive front end and the data-processing back end of the application. Further research challenges are presented by a holistic approach to user-oriented MIR system design in the context of novel devices and modalities, real-time networks, collaborative platforms, open systems, physical experiences and tangible interfaces.
Tangible and Tabletop Interaction
Tangible User Interfaces (TUIs), which combine control and representation in a single physical device, emphasise tangibility and materiality, the physical embodiment of data, bodily interaction, and the embedding of systems in real spaces and contexts. Although several implementations predate the concept, the term Tangible User Interface was coined at the MIT Media Lab in 1997 [Ullmer and Ishii, 2001] to define interfaces which augment the real physical world by coupling digital information to everyday physical objects and environments. Such interfaces contribute to the user experience by fusing the representation and control of digital data with physical artefacts, thus allowing users to literally "grasp data" with their own hands.
Within the domain of Tangible Interaction, Tabletop Interaction constitutes a special research field based on the paradigm of a horizontal surface meant to be touched and/or manipulated via the objects placed on it. In contrast to the mouse-and-keyboard interface model, which restricts the user's input to an ordered sequence of events (click, click, double click, etc.), this type of interface allows multiple input events to enter the system at the same time, enabling any action at any time or position, by one or several simultaneous users. The implicit ability of tabletop interfaces to support physically tracked objects with particular volume, shape and weight properties expands the bandwidth and richness of the interaction beyond the simple idea of multi-touch. Such objects can represent abstract concepts or real entities; they can relate to other objects on the surface; they can be moved and turned around on the table surface, and these spatial changes can affect their internal properties and their relationships with neighbouring objects. Open-source, cross-platform computer vision frameworks that combine the tracking of fiducial markers with multi-touch finger tracking (e.g. reacTIVision, developed for the Reactable project [Bencina et al., 2005]) have become widely used in the tabletop developer community (both academic and industrial) and have increased the development of tabletop applications for educational and creative use [e.g. Khandelwal and Mazalek, 2007; Gallardo et al., 2008].
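As an illustration of how such frameworks expose concurrent, multi-user input, the sketch below listens to reacTIVision's output. By default reacTIVision broadcasts the TUIO protocol (built on OSC) to UDP port 3333, with tangible objects reported on the /tuio/2Dobj profile and finger touches on /tuio/2Dcur; the client code here assumes the python-osc package:

```python
# Minimal sketch of receiving tabletop events from reacTIVision via TUIO/OSC.
# pip install python-osc
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

def on_object(address, *args):
    # TUIO 1.1 "set" messages for objects carry: session id, fiducial id,
    # normalised x/y position, rotation angle, then velocities/accelerations.
    if args and args[0] == "set":
        session_id, fiducial_id, x, y, angle = args[1:6]
        print(f"object {fiducial_id}: pos=({x:.2f}, {y:.2f}) angle={angle:.2f}")

def on_cursor(address, *args):
    # Finger "set" messages carry: session id, then normalised x/y position.
    if args and args[0] == "set":
        session_id, x, y = args[1], args[2], args[3]
        print(f"finger {session_id}: pos=({x:.2f}, {y:.2f})")

dispatcher = Dispatcher()
dispatcher.map("/tuio/2Dobj", on_object)
dispatcher.map("/tuio/2Dcur", on_cursor)

# Several objects and fingers produce interleaved events concurrently --
# there is no single ordered input stream as with mouse and keyboard.
BlockingOSCUDPServer(("0.0.0.0", 3333), dispatcher).serve_forever()
```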
There is a growing interest in applying tabletop interfaces to the music domain. From the Audiopad [Patten et al., 2002] to the Reactable [Jordà et al., 2007], music performance and creation has become the most popular and successful application field in the entire lifetime of this interaction paradigm. Tabletop interfaces developed using MIR have specifically focused on interacting with large music collections. The MUSICtable [Stavness et al., 2005] takes a visualisation approach similar to the one chosen in Pampalk's Islands of Music, creating a two-dimensional map that, when projected on a table, is used to make collaborative decisions when generating playlists. Hitchner et al. [2007] use a SOM to build a map visually represented by a low-resolution mosaic, enabling users to redistribute the songs according to their preferences. Audioscapes is a framework enabling innovative ways of interacting with large audio collections using touch-based and gestural controllers [Ness and Tzanetakis, 2009]. The MTG's SongExplorer [Julià and Jordà, 2009] applies high-level song descriptors to N-dimensional navigation on a 2D plane, creating a coherent similarity-based 2D map, with specially designed tangible pucks for more intuitive interaction with the tabletop visual interface. Tests comparing the system with a conventional GUI controlling the same music collection showed that the tabletop implementation was a much more efficient tool for users to discover new music they valued. Thus the specific affordances of tabletop interfaces (support of collaboration and sharing of control; continuous, real-time interaction with multidimensional data; support of complex, expressive and explorative interaction [Jordà, 2008]), together with the more ubiquitous and easily available individual multi-touch devices such as tablets and smartphones, can bring novel approaches to the field of MIR, not only for music browsing but particularly for the more creative aspects related to music creation and performance.
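A toy sketch of the descriptor-space similarity underlying such maps is given below. The descriptor names and values are hypothetical stand-ins, and real systems such as SongExplorer additionally reduce the N-dimensional space to a navigable 2D layout:

```python
# Illustrative sketch: each song is a vector of high-level descriptors,
# and "similar" songs are its nearest neighbours in that space.
import numpy as np

# Hypothetical descriptors: danceability, tempo, brightness, tonal strength.
songs = {
    "song_a": np.array([0.8, 0.7, 0.6, 0.4]),
    "song_b": np.array([0.7, 0.6, 0.5, 0.5]),
    "song_c": np.array([0.1, 0.2, 0.9, 0.8]),
}

def most_similar(query, library, k=2):
    """Rank songs by Euclidean distance to the query in descriptor space."""
    ranked = sorted(library.items(), key=lambda kv: np.linalg.norm(kv[1] - query))
    return [title for title, _ in ranked[:k]]

# Selecting a song on the table surfaces its nearest neighbours.
others = {t: v for t, v in songs.items() if t != "song_a"}
print(most_similar(songs["song_a"], others))   # -> ['song_b', 'song_c']
```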
The physical embodiment of data, bodily interaction and the embedding of systems in real spaces and contexts are particularly present in recent research into gestural and spatial interaction. The Real-Time Musical Interactions team at IRCAM has been working with motion sensors embedded within everyday objects to explore concepts of physical and gestural interaction which integrate performance, gaming and musical experience. Their Interlude project combined interactivity, multimodal modelling, movement tracking and machine learning to explore new means of musical expression [Bevilacqua et al., 2011a; Bevilacqua et al., 2011b; Schnell et al., 2011]. The results included the Urban Musical Game, which breaks down some of the boundaries between audience and musician by producing a sound environment through the introduction of a musical ball; Mogees, which uses piezo sensors coupled with gesture recognition technology for music control, allowing users to easily transform any surface into a musical interface; and MOs (Modular Musical Objects), which represent one of the pioneering attempts to answer the challenges of tangible, behaviour-driven musical objects for music creation. This project has demonstrated the huge potential of research into physical and gestural interfaces for MIR within the context of future internet applications and the Internet of Things.
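By way of illustration, the sketch below classifies a completed accelerometer gesture by dynamic time warping (DTW) against recorded templates. This is a deliberate simplification: IRCAM's gesture-follower work performs probabilistic (HMM-based) real-time alignment so that sound can be controlled while the gesture unfolds, and the gesture templates here are synthetic stand-ins.

```python
# Simplified template-based gesture recognition over 3-axis accelerometer traces.
import numpy as np

def dtw_distance(a, b):
    """DTW distance between two (time, 3) accelerometer traces."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def classify(trace, templates):
    """Return the name of the template gesture closest to the input trace."""
    return min(templates, key=lambda name: dtw_distance(trace, templates[name]))

# Synthetic templates: a fast "shake" and a slow "tilt" gesture.
t = np.linspace(0, 1, 50)
templates = {
    "shake": np.column_stack([np.sin(40 * t), np.zeros_like(t), np.zeros_like(t)]),
    "tilt": np.column_stack([t, np.zeros_like(t), 1 - t]),
}
noisy_shake = templates["shake"] + 0.1 * np.random.randn(50, 3)
print(classify(noisy_shake, templates))   # -> "shake"
```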
References
- [Bencina et al., 2005] R. Bencina, M. Kaltenbrunner, and S. Jordà. Improved Topological Fiducial Tracking in the reacTIVision System. In Proceedings of the IEEE International Workshop on Projector-Camera Systems, 2005.
- [Bevilacqua et al., 2011a] F. Bevilacqua, N. Schnell, and S. Alaoui. Gesture capture: Paradigms in interactive music/dance systems. In Emerging Bodies: The Performance of Worldmaking in Dance and Choreography, pp. 183-193. Transcript Verlag, 2011.
- [Bevilacqua et al., 2011b] F. Bevilacqua, N. Schnell, N. Rasamimanana, B. Zamborlin, and F. Guédy. Online gesture analysis and control of audio processing. In Musical Robots and Interactive Multimodal Systems, Springer Tracts in Advanced Robotics, vol. 74. Springer, 2011.
- [Dittenbach et al., 2007] M. Dittenbach, H. Berger, R. Genswaider, A. Pesenhofer, A. Rauber, T. Lidy, and D. Merkl. Shaping 3D multimedia environments: The MediaSquare. In Proceedings of the 6th ACM International Conference on Image and Video Retrieval (CIVR), Amsterdam, Netherlands, 2007.
- [Gallardo et al., 2008] D. Gallardo, C.F. Julià, and S. Jordà. TurTan: A tangible programming language for creative exploration. In Proceedings of the 3rd IEEE International Workshop on Horizontal Interactive Human Computer Systems, pp. 89-92, 2008.
- [Goto and Goto, 2005] M. Goto and T. Goto. Musicream: New music playback interface for streaming, sticking, sorting, and recalling musical pieces. In Proceedings of the 6th International Conference on Music Information Retrieval, pp. 404-411, London, UK, 2005.
- [Goto et al., 2012] M. Goto, J. Ogata, K. Yoshii, H. Fujihara, M. Mauch, and T. Nakano. PodCastle and Songle: Crowdsourcing-Based Web Services for Spoken Content Retrieval and Active Music Listening. In Proceedings of the 2012 ACM Workshop on Crowdsourcing for Multimedia (CrowdMM 2012), pp. 1-2, 2012.
- [Hassenzahl and Tractinsky, 2006] M. Hassenzahl and N. Tractinsky. User experience – a research agenda. Behaviour & Information Technology, 25(2): 91-97, 2006.
- [Hekkert, 2006] P. Hekkert. Design aesthetics: Principles of pleasure in product design. Psychology Science, 48(2): 157-172, 2006.
- [Hitchner et al., 2007] S. Hitchner, J. Murdoch, and G. Tzanetakis. Music browsing using a tabletop display. In Proceedings of the 8th International Conference on Music Information Retrieval, Vienna, Austria, 2007.
- [Hlavac et al., 2004] P. Hlavac, E. Pampalk, and P. Herrera. Hierarchical organization and visualization of drum sample libraries. In Proceedings of the 7th International Conference on Digital Audio Effects (DAFx'04), pp. 378-383, Naples, Italy, 2004.
- [Jordà, 2008] S. Jordà. On stage: the reacTable and other musical tangibles go real. International Journal of Arts and Technology, 1(3/4): 268-287, 2008.
- [Jordà et al., 2007] S. Jordà, G. Geiger, M. Alonso, and M. Kaltenbrunner. The reacTable: exploring the synergy between live music performance and tabletop tangible interfaces. In Proceedings of the 1st International Conference on Tangible and Embedded Interaction, pp. 139-146, ACM, 2007.
- [Julià and Jordà, 2009] C.F. Julià and S. Jordà. SongExplorer: A tabletop application for exploring large collections of songs. In Proceedings of the 10th International Conference on Music Information Retrieval, Kobe, Japan, 2009.
- [Khandelwal and Mazalek, 2007] M. Khandelwal and A. Mazalek. Teaching Table: a tangible mentor for pre-k math education. In Proceedings of the 1st International Conference on Tangible and Embedded Interaction, pp. 191-194, ACM, 2007.
- [Knees et al., 2007] P. Knees, T. Pohle, M. Schedl, and G. Widmer. Exploring Music Collections in Virtual Landscapes. IEEE MultiMedia, 14(3): 46-54, 2007.
- [Kohonen, 2001] T. Kohonen. Self-Organizing Maps. Springer, 2001.
- [Leitich and Topf, 2007] S. Leitich and M. Topf. Globe of Music: Music Library Visualization Using GEOSOM. In Proceedings of the 8th International Conference on Music Information Retrieval, Vienna, Austria, 2007.
- [Lidy and Rauber, 2003] T. Lidy and A. Rauber. Genre-oriented Organisation of Music Collections using the SOMeJB System: An Analysis of Rhythm Patterns and Other Features. In Proceedings of the DELOS Workshop on Multimedia Contents in Digital Libraries, 2003.
- [Lillie, 2008] A.S. Lillie. MusicBox: Navigating the space of your music. Master's thesis, Massachusetts Institute of Technology, 2008.
- [Magas and Proutskova, 2013] M. Magas and P. Proutskova. A location-tracking interface for ethnomusicological collections. Journal of New Music Research (Special Issue on Computational Musicology), 2013.
- [Magas et al., 2009] M. Magas, R. Stewart, and B. Fields. decibel 151: Collaborative spatial audio interactive environment. In ACM SIGGRAPH, 2009.
- [Magas et al., 2008] M. Magas, M. Casey and C. Rhodes. mHashup: fast visual music discovery via locality sensitive hashing. In ACM SIGGRAPH New Tech Demos, 2008.
- [McCarthy and Wright, 2004] J. McCarthy and P. Wright. Technology as experience. Interactions, 11(5): 42-43, 2004.
- [Ness and Tzanetakis, 2009] S. R. Ness and G. Tzanetakis. Audioscapes: Exploring surface interfaces for music exploration. In Proceedings of the International Computer Music Conference (ICMC), Montreal, Canada, 2009.
- [Norman, 2004] D. Norman. Emotional Design: Why We Love (Or Hate) Everyday Things. Basic Books, 2004.
- [Pampalk, 2003] E. Pampalk. Islands of Music: Analysis, Organization, and Visualization of Music Archives. Journal of the Austrian Society for Artificial Intelligence, 24(4): 20-23, 2003.
- [Patten et al., 2002] J. Patten, B. Recht, and H. Ishii. Audiopad: A tag-based interface for musical performance. In Proceedings of the Conference on New Interfaces for Musical Expression (NIME), pp. 1-6, 2002.
- [Schnell et al., 2011] N. Schnell, F. Bevilacqua, F. Guédy, and N. Rasamimanana. Playing and replaying – sound, gesture and music analysis and re-synthesis for the interactive control and re-embodiment of recorded music. In H. von Loesch and S. Weinzierl, editors, Gemessene Interpretation – Computergestützte Aufführungsanalyse im Kreuzverhör der Disziplinen, Klang und Begriff, volume 4. Schott Verlag, Mainz, 2011.
- [Sengers and Gaver, 2006] P. Sengers and B. Gaver. Staying open to interpretation: Engaging multiple meanings in design and evaluation. In Proceedings of the 6th Conference on Designing Interactive Systems, 2006.
- [Stavness et al., 2005] I. Stavness, J. Gluck, L. Vilhan, and S. Fels. The MUSICtable: A map-based ubiquitous system for social interaction with a digital music collection. Lecture Notes in Computer Science, 3711: 291-302, 2005.
- [Stober and Nürnberger, 2010] S. Stober and A. Nürnberger. MusicGalaxy: A multi-focus zoomable interface for multi-facet exploration of music collections. In S. Ystad, M. Aramaki, R. Kronland-Martinet, and K. Jensen, editors, Exploring Music Contents, volume 6684 of LNCS, pp. 273-302. Springer, 2011.
- [Ullmer and Ishii, 2001] B. Ullmer and H. Ishii. Emerging frameworks for tangible user interfaces. In J.M. Carroll, editor, Human-Computer Interaction in the New Millennium, pp. 579-601. Addison-Wesley, 2001.
Challenges
- Develop open systems which adapt to the user. HCI research has shown that systems which leave users room for interpretation, include more subjective aspects, or encourage contextual engagement contribute to an improved Quality of Experience for the user.
- Design MIR-based systems more holistically. A system design approach must include the user experience, and not focus only on the engine or algorithms of a given system. Front ends and back ends cannot be interchanged without consequences: a given algorithmic mechanism will probably favour a particular type of interface or interaction method.
- Develop interfaces to better address collaborative, co-creative and sharing multi-user applications. Most MIR interfaces have been developed for a single user. In the context of open social networks, multi-user MIR applications present opportunities for enhanced co-creation and sharing of music.
- Develop interfaces which make a broader range of music more accessible and "edutain" audiences. Many users find that new (to them) styles of music are inaccessible. Interfaces which elucidate structure, expression, harmony, etc. can contribute to "enhanced listening" offering both education and entertainment at the same time.
- Expand MIR system interaction beyond the multi-touch paradigm. Physically tracked objects with particular volume, shape and weight properties can considerably expand the bandwidth and richness of MIR interaction.
- Consider the context in the design of MIR systems. MIR methods or applications should take into account the context and device in which they will be used, e.g., a multi-user spatial environment is not simply an augmented geographic interface; interaction methodologies for a tabletop application cannot be simply transferred from those used on a smartphone or mobile screen-based device.
- Design MIR system interfaces for existing and novel modalities for music creation. "Real-time MIR" interface and interaction research can successfully bridge the gap between MIR and NIME (New Interfaces for Musical Expression). Physical and gestural interaction can integrate performance, gaming and musical experience.