Creative tools
Creative practitioners produce, transform and reuse music materials. The MIR challenge is to develop tools that process music information in ways that enhance creative production. Tools for automatically extracting relevant information from audio materials could be developed for purposes such as content-based manipulation, generative processes, synchronisation with other media, or real-time processing. Moreover, the large volume of available data requires efficient data manipulation systems that enable new methods of manipulation for creative purposes. This challenge requires collaborative research between music information researchers and creative practitioners themselves, including artists, performers and creative industries professionals. The impact of this research lies in generating more creative possibilities and improving production efficiency in a variety of creative contexts, including music performance, music and sound production and post-production, sound engineering for audiovisual production, art installations, creative marketing, mobile apps, gaming, commercial installations, environmental installations, and indoor and outdoor events.
Collaborative research between creative practitioners and music information researchers contributes to bridging the gap between the arts and the sciences, and introduces novel practices and methodologies. It extends the paradigm of user-centred research to creative-input research, where the feedback loop between creative practitioner and researcher is an iterative, knowledge-building process, supported by adaptive modelling of research environments and resulting in more versatile creative tools.
State of the art
Music Information Research offers multiple possibilities for supporting musical creation or for inspiring the creation of multimedia art pieces. The creative possibilities of MIR can be studied and classified according to many different criteria: for instance, offline tools (for composition or editing) vs. real-time tools for interaction, which in turn can be divided into applications for live performance and for art installations; or tools designed for professionals vs. tools designed for novice end-users, the latter including all types of applications promoting different models of "active listening".
Content-based sound processing
Content-based sound processing consists in using high-level information extracted from the audio signal in order to process it. It includes processes controlled by high-level parameters, and processes based on decomposing the audio content, through operations such as segmentation, source separation and transcription, into elements that can be processed independently. The goal is to provide expert end-users (e.g. musicians, sound designers) with intuitive tools, controlled through parameters that are relevant from the viewpoint of human cognition of music and sound, and also to enhance the quality of existing processes by selecting appropriate processes and parameter sets according to the nature of the extracted elements. For instance, a better subjective quality when slowing down a sound by time-stretching is obtained if the transient parts are separated from the sustained ones and preserved in the time-scale change process [Roebel, 2003].
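As a minimal illustration of this principle, the sketch below (Python with librosa; the input file name is hypothetical) approximates transient-preserving time-stretching by separating the percussive and sustained components, stretching only the sustained part with a phase vocoder and re-inserting the original attacks at rescaled positions. It is a simplified stand-in for the approach of [Roebel, 2003], not a reimplementation of it.

```python
# Simplified sketch, not the method of [Roebel, 2003]: stretch only the
# sustained component and keep the transient attacks unstretched.
import numpy as np
import librosa

def stretch_preserving_transients(y, sr, rate=0.5):
    """Slow down `y` (rate < 1) while keeping attacks crisp."""
    sustained, transient = librosa.effects.hpss(y)

    # Phase-vocoder stretch of the sustained component only.
    out = librosa.effects.time_stretch(sustained, rate=rate)

    # Re-insert the original (unstretched) attacks at rescaled positions.
    onsets = librosa.onset.onset_detect(y=transient, sr=sr, units="samples")
    win = int(0.05 * sr)  # keep roughly 50 ms around each attack
    for n in onsets:
        pos = int(n / rate)
        if pos >= len(out):
            continue
        seg = transient[n:n + win][: len(out) - pos]
        out[pos:pos + len(seg)] += seg
    return out

y, sr = librosa.load("input.wav", sr=None, mono=True)  # hypothetical file
slowed = stretch_preserving_transients(y, sr, rate=0.5)
```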
Sound editing
Sound editing refers to offline tools using pre-recorded audio content. Some commercial products have started to implement such features. These include Celemony's Melodyne and Roland's R-Mix, which provide studio production tools for pitch recognition and correction, tempo and timing alteration, and spectrum visualisation. IRCAM's AudioSculpt, targeting expert users, enables users to compute various kinds of analyses (segmentation and beat tracking, pitch and spectral envelope analysis) and use them as inputs for high-quality audio processing. Apple's GarageBand is a good example of a content-based processing application aimed at mass-market end-users: it automatically processes the content of an Apple Loop imported into a sequence by adapting its tempo and pitch scale to the sequence's musical context. Most existing tools efficiently implement content-based editing for monophonic signals; however, they also demonstrate the current limitations of the state of the art in the analysis of polyphonic recordings. A significant advance in this direction is the integration of polyphonic transcription (audio to MIDI) in Ableton's Live 9 application, released in early 2013.
Computer-aided composition
Software environments for computer-aided composition such as OpenMusic, Common Music or PWGL are not only used for computing instrumental scores from user-defined algorithms, but also for controlling various kinds of sound synthesis from symbolic music representations [Agon et al., 2011]. In these environments, audio analysis modules that extract musical information in the form of symbolic structures enable composers to elaborate scores whose parameters relate to the content of input sound files, and to control sound synthesis by processing the extracted information at the symbolic level. The general-purpose computing possibilities of these music languages allow expert musicians to adapt all the available analysis parameters to a broad variety of aesthetic approaches.
Use of audio databases for music and sound production
The advancement of audio database technologies enables new applications for sound and music production, not only for content-based management of audio samples but also for the development of new methods for sound synthesis and music composition.
Content-based management of audio samples
MIR techniques can be very convenient for finding suitable loops or sound files to fit a particular composition or mix. The MuscleFish [Wold et al., 1996] and Studio Online [Wöhrmann and Ballet, 1999] systems, developed at the end of the 1990s, were the very first applications of content-based search in audio sample databases, and were further elaborated in the CUIDADO European project [Vinet et al., 2002]. More recently, large public and free sound databases and repositories such as Freesound have become mainstream. Using repositories and APIs such as EchoNest's Remix Python API or MTG's Essentia, developers and hackers are creating a panoply of imaginative remix applications, many of them developed during Music Hack Day events, which have lately proved a very productive venue for MIR-based creation. However, even though the use of large audio sample banks is now mainstream in music production, existing products such as Native Instruments' Kontakt, MOTU's MachFive or the Vienna Symphonic Library do not yet exploit the full potential of MIR technologies for the content-based management of audio databases.
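As a minimal illustration of content-based sample retrieval, the sketch below assumes Essentia's Python bindings, a local folder of loops and a query file (both hypothetical); it indexes each file with a few global descriptors and ranks the library by similarity to the query. The descriptor names follow Essentia's MusicExtractor conventions and may differ between versions.

```python
# Illustrative sketch only; descriptor names and availability depend on the
# Essentia version, and the file paths are hypothetical.
import glob
import numpy as np
import essentia.standard as es

DESCRIPTORS = ["rhythm.bpm", "lowlevel.spectral_centroid.mean",
               "lowlevel.average_loudness"]

def describe(path):
    pool, _ = es.MusicExtractor(lowlevelStats=["mean"])(path)
    return np.array([float(pool[d]) for d in DESCRIPTORS])

library = {f: describe(f) for f in glob.glob("loops/*.wav")}
query = describe("query.wav")

# Z-score each descriptor across the library so that no single unit dominates.
mat = np.array(list(library.values()))
mu, sigma = mat.mean(axis=0), mat.std(axis=0) + 1e-9
ranked = sorted(library,
                key=lambda f: np.linalg.norm((library[f] - mu) / sigma -
                                             (query - mu) / sigma))
print(ranked[:5])  # the five loops most similar to the query
```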
Corpus-based synthesis and musaicing
One of the most obvious MIR applications for real-time music creation is that of "concatenative synthesis" [e.g. Schwarz, 2005; Maestre et al., 2009], "musaicing" [Zils and Pachet, 2001] or mashup. These three terms relate to approximately the same idea: creating new music by concatenating short fragments of sound or music recordings to "approximate" the sound of a target piece. More precisely, an existing music piece or musical fragment is substituted with small, similar-sounding music fragments, leading to a similarly structured result. The duration of these sound units can vary depending on the techniques employed and the desired aesthetic results, but is roughly in the range of 10 milliseconds up to several seconds or several musical bars. While manual procedures can be used with longer fragments (i.e. of several seconds), the use of shorter fragments inevitably calls for automated MIR analysis and retrieval techniques, in which a "target" track or sound is analysed, its descriptors are extracted for every small fragment, and these fragments are substituted with the best candidates from a large database of sound snippets. When using a pre-analysed sound repository and a compact feature representation, these techniques can be efficiently applied in real time. [Janer and de Boer, 2008] describe a method for real-time voice-driven audio mosaicing synthesis. BeatJockey [Molina et al., 2011] is a system aimed at DJs which integrates audio mosaicing, beat tracking and machine learning techniques and brings them into the Reactable musical tabletop. Several commercial tools following this approach (such as Steinberg's LoopMash VST plugin and iOS app) are also already available. These techniques bring new creative possibilities somewhere in between synthesis control and remixing, and open the path to radically novel control interfaces and interaction modalities for music performance.
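The core loop of such systems can be sketched in a few lines. The toy example below (Python with librosa and soundfile; file names are hypothetical) uses fixed-length grains, mean MFCCs as the descriptor and greedy nearest-neighbour selection, omitting the concatenation costs, crossfading and transformations that a real musaicing system would apply.

```python
# Toy musaicing sketch: substitute each grain of the target with the
# closest-sounding grain from a corpus, judged by mean MFCCs.
import numpy as np
import librosa
import soundfile as sf

GRAIN = 4096  # grain length in samples (~90 ms at 44.1 kHz)

def grains(y):
    n = len(y) // GRAIN
    return y[: n * GRAIN].reshape(n, GRAIN)

def grain_features(gs, sr):
    return np.array([librosa.feature.mfcc(y=g, sr=sr, n_mfcc=13).mean(axis=1)
                     for g in gs])

target, sr = librosa.load("target.wav", sr=None, mono=True)   # hypothetical
corpus, _ = librosa.load("corpus.wav", sr=sr, mono=True)      # hypothetical

tg, cg = grains(target), grains(corpus)
tf, cf = grain_features(tg, sr), grain_features(cg, sr)

# Greedy nearest-neighbour selection: one corpus grain per target grain.
idx = [int(np.argmin(np.linalg.norm(cf - t, axis=1))) for t in tf]
sf.write("mosaic.wav", np.concatenate([cg[i] for i in idx]), sr)
```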
Computer-aided orchestration
In comparison to other aspects of musical composition (harmony, rhythm, counterpoint), orchestration has a specific status: understood as the art of selecting and mixing individual instrument timbres to produce a given "colour", it relates more closely to the actual experience of sound from an orchestra. The same chord can produce a very different timbre depending on the instruments selected to perform it, and, despite existing treatises providing recipes for specific cases, orchestration has generally remained an empirical art based on mostly unelicited rules. An original approach recently developed in the framework of computer-aided composition tools concentrates on approximating, in terms of sound similarity, a given sound target by combining elementary note samples from a set of selected instruments, using multiobjective optimisation to manage the combinatorial problem of searching sound sample databases of hundreds of thousands of items. This work has already been used for the composition of numerous contemporary music works and has been implemented as the Orchidée software [Carpentier and Bresson, 2010]. One of its main limitations, however, was that it only considered the static properties of the source; the latest advances in related research have been to design a new search algorithm, named MultiObjective Time Series matching, which efficiently computes similarity distances from the coding of the temporal evolution of multiple descriptors of audio samples, so that the dynamic properties of the target and of the original sound samples are taken into account in the approximation [Esling and Agon, 2013].
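A toy version of the underlying idea can be sketched as a greedy, single-objective spectral approximation (Python with librosa; the sample bank and target file are hypothetical). Orchidée itself relies on a multiobjective search over combinations of samples and descriptors, which this sketch does not attempt.

```python
# Greedy single-objective sketch: iteratively add the instrument sample whose
# average spectrum best improves the approximation of the target's spectrum.
import glob
import numpy as np
import librosa

def avg_spectrum(path, sr=44100, n_fft=4096):
    y, _ = librosa.load(path, sr=sr, mono=True)
    s = np.abs(librosa.stft(y, n_fft=n_fft)).mean(axis=1)
    return s / (np.linalg.norm(s) + 1e-12)

target = avg_spectrum("target_chord.wav")                       # hypothetical
bank = {f: avg_spectrum(f) for f in glob.glob("samples/*.wav")} # hypothetical

def residual(mix):
    m = mix / (np.linalg.norm(mix) + 1e-12)
    return np.linalg.norm(target - m)

mix, chosen = np.zeros_like(target), []
for _ in range(4):  # select up to four instrument samples
    best = min(bank, key=lambda f: residual(mix + bank[f]))
    chosen.append(best)
    mix = mix + bank[best]
print(chosen)
```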
Live performance applications
The applications discussed in the previous parts mainly concern offline processes and composition tools. Processing music information in the context of live performance imposes specific constraints on the audio analysis algorithms, in terms of causality, latency and implementation (computing power, distributed processing vs. real-time performance). Applications concern not only live music, but also theatre, dance and multimedia. The computer music community has produced numerous software environments dedicated to the programming and real-time scheduling of algorithms for audio and music information processing, including, among many others, Max, Pd, SuperCollider and ChucK.
Beat syncing
A widespread use case of automatic beat tracking algorithms is live mixing applications for DJs, such as Native Instruments' Traktor, which manage the transition between tracks in a beat-synchronous way, using time-stretching to handle the tempo evolution between them.
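A minimal version of this behaviour, assuming librosa and two hypothetical track files, estimates the tempo of each deck with a beat tracker and time-stretches the incoming track to the playing track's tempo before the transition. A full DJ tool would also align downbeats and ramp the tempo gradually rather than applying a single global stretch.

```python
# Minimal beat-sync sketch: match deck B's tempo to deck A's by time-stretching.
import librosa

deck_a, sr = librosa.load("deck_a.wav", sr=None, mono=True)  # hypothetical
deck_b, _ = librosa.load("deck_b.wav", sr=sr, mono=True)     # hypothetical

tempo_a, _ = librosa.beat.beat_track(y=deck_a, sr=sr)
tempo_b, _ = librosa.beat.beat_track(y=deck_b, sr=sr)

# rate > 1 speeds up, rate < 1 slows down; align deck B to deck A's tempo.
matched = librosa.effects.time_stretch(deck_b,
                                       rate=float(tempo_a) / float(tempo_b))
```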
Improvisation and interaction using symbolic sequence models
While musaicing or remixing applications mostly rely on low-level signal analysis, the following examples focus on musical knowledge and understanding. [Assayag et al., 2006] describe a multi-agent architecture for an improvisation-oriented musician-machine interaction system that learns in real time from human performers and establishes improvisatory dialogues with them by recycling their own audio material, using a Factor Oracle to encode the multiple relationships between subsequences of music symbols. Recent applications of this model also include interactive arranging and voicing from a learned musical corpus. The Wekinator [Fiebrink et al., 2009] is a real-time machine learning toolkit that can be used in the processes of music composition and performance, as well as to build new musical interfaces. Pachet studies musical style with Constrained Markov Models (CMM): relevant features are extracted from musicians' performances and modelled with CMMs [Pachet and Roy, 2009], an approach that allows systems to improvise in a given style or along with other musicians.
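As a minimal illustration of learning symbolic continuations from a performance, the sketch below trains a first-order Markov model on a hypothetical sequence of MIDI pitches and generates new material in the same style. The systems cited above use substantially richer models (the Factor Oracle in OMax, constrained Markov models in Pachet's work).

```python
# Minimal first-order Markov sketch: learn continuations from what was played,
# then generate new material in that style.
import random
from collections import defaultdict

def train(notes):
    model = defaultdict(list)
    for a, b in zip(notes, notes[1:]):
        model[a].append(b)  # record every observed continuation of `a`
    return model

def generate(model, start, length=16):
    out = [start]
    for _ in range(length - 1):
        nxt = model.get(out[-1])
        out.append(random.choice(nxt) if nxt else random.choice(list(model)))
    return out

performance = [60, 62, 64, 62, 60, 67, 65, 64, 62, 60]  # hypothetical MIDI pitches
print(generate(train(performance), start=60))
```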
Score following and interactive accompaniment
Numerous contemporary music works, known as mixed works, rely on the combination of instrumental parts and electronic sounds produced by real-time synthesis or processing of the instrument sounds. Different strategies exist for synchronising these various parts live in concert; the most straightforward, but least musical, consists in pre-recording a soundtrack and having the performers play along with it. Conversely, score following aims to automatically synchronise computer actions with the performance, by analysing it in real time and comparing it with an internal model of the performed score. The latest advances of research on this subject, implemented in the Antescofo application, include continuous tempo tracking of the performance and the definition of a language for specifying the real-time processes [Cont, 2010]. Another use case of the same algorithms is interactive accompaniment or "music minus one", where a solo performer can train with a pre-recorded accompaniment soundtrack that follows his or her tempo variations.
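Score following proper requires causal, low-latency alignment, but the basic idea of matching a performance against a reference can be illustrated offline with chroma features and dynamic time warping (Python with librosa; file names are hypothetical). Antescofo's coupled, probabilistic real-time architecture goes far beyond this sketch.

```python
# Offline audio-to-reference alignment via chroma DTW; not a real-time
# score follower, only an illustration of the matching principle.
import librosa

perf, sr = librosa.load("performance.wav", sr=None, mono=True)  # hypothetical
ref, _ = librosa.load("score_rendering.wav", sr=sr, mono=True)  # hypothetical

hop = 512
chroma_perf = librosa.feature.chroma_cqt(y=perf, sr=sr, hop_length=hop)
chroma_ref = librosa.feature.chroma_cqt(y=ref, sr=sr, hop_length=hop)

# Dynamic time warping between the two chroma sequences.
D, wp = librosa.sequence.dtw(X=chroma_ref, Y=chroma_perf, metric="cosine")

# The warping path maps each reference frame to a performance frame,
# i.e. where in the performance each score position occurs (in seconds).
times = [(r * hop / sr, p * hop / sr) for r, p in wp[::-1]]
```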
Performance/sound interaction
The NIME community is very active in the design of new performance/sound interaction systems that extend the traditional notion of musical instruments. The main aspects of the field related to MIR technologies are presented in section 3 and will not be developed here.
Art installations
Sound has featured extensively in art installations ever since Luigi Russolo's Futurist manifesto "The Art of Noises" described the sound of the urban industrial landscape and became a source of inspiration for many artists and composers (e.g. Edgard Varèse, John Cage and Pierre Schaeffer) [Russolo, 1913]. Recent music technologies offer increased opportunities for immersive sound art experiences and physical explorations of sound. Art installations offer novel ways of using these technologies and enable novel experiences, particularly by placing the focus on the audience and their context.
Environmental sound installations
Art installations have increasingly been using data from field recordings, or sounds generated through real-time location-based interaction, in order to trigger various behaviours (e.g. Sound Mapping London Tea Houses, an installation at the Victoria and Albert Museum in 2011 by the G-Hack group from Queen Mary, University of London). Artists have explored the integration of unfamiliar sounds into new physical environments (e.g. Bill Fontana's White Sound: An Urban Seascape, 2011). Generative music has also been used in response to the environment: Variable 4, for instance, is an environmental algorithmic weather machine that generates a unique musical composition reflecting the changing atmospheric conditions of its location, while Radioactive Orchestra aims to produce musical sequences from the radioactivity of nuclear isotopes.
Collaborative sound art
Collaborative music making has been expressed through art installations such as Play.Orchestra, which blurs the borders between audience and player, amateur and professional. Atau Tanaka's Global String metaphorically wraps a musical string around the world and, through user engagement, creates a collaborative instrument linking art galleries across the world, exploring the idea of communication via non-linguistic musical interaction and collaboration.
Body generative sound art
Using the human body as an instigator in the generation of sound and music is a growing research area. Sensorband, of which Atau Tanaka was a member, featured performers wearing a combination of MIDIconductor machines that send and receive ultrasound signals measuring the hands' rotational positions and relative distance; gestural interaction with invisible infrared beams; and the BioMuse, a system that tracks muscle signals (EMG), translating electrical signals from the body into digital data. Since around 2006, Daito Manabe has been working on the project Electric Stimulus, which literally plays the body as a sensory network, outputting sound and expression through the application of electrical impulses to particular nerve centres. The Serendiptichord is a wearable instrument for dancers, focused on the act of performing physically and unifying movement and sound.
Public art using MIR
Few public art installations have so far used MIR to enable physical presence to interact with music information. In September 2011, Barcelona's City Council installed an automatic water-and-light choreography generator for the Magic Fountain of Montjuïc (one of the main tourist attractions of the city), based on MIR techniques (more concretely, on the Essentia engine). The system allows the person in charge of creating a choreography for the fountain to pick an mp3 track and set the tendencies of several high-level parameters (such as the average intensity, contrast, speed of change, amount of repetition, or the main colour tonalities of the desired choreography); it then generates automatic, music-controlled choreographies at the push of a button. Another example, decibel 151 [Magas et al., 2009], installed at SIGGRAPH 2009, turns users into "walking playlists" and encourages physical explorations of music. MIR systems can therefore offer novel art installation experiences and have a profound impact on the way we as human beings understand space, time and our own bodies. The arts are also uniquely placed, with a degree of freedom from the commercial sector, to offer test grounds for MIR research into gestural and environmental applications of music data.
Commercial end-user applications
As Mark Mulligan states in his 2011 report "digital and social tools have already transformed the artist-fan relationship, but even greater change is coming…the scene is set for the Mass Customisation of music, heralding in the era of Agile Music" [Mulligan, 2011]. Agile Music is a framework for understanding how artist creativity, industry business models and music products must all undergo a programme of radical, transformational change. In this context MIR offers new opportunities for creative commercial installations, applications and environments (e.g. creative marketing tools, mobile apps, gaming, commercial installations, environmental installations, indoor and outdoor events).
Social music applications
The increased choice of music available to users is currently being explored through creative applications engaging with social media (e.g. the Coke Music 24 hr challenge). As one of the current market leaders in social recommendation, Spotify has grown from 1 million paying users in March 2011 to 3 million by January 2012, partly thanks to its integration with Facebook. The application Serendip creates real-time social music radio, allowing users to independently choose 'DJs' from their followers and to share songs across a range of social media via seamless Twitter integration.
Mobile music applications
Application developers are using the range of sensory information available on mobile devices to create more immersive sonic experiences and music generators (e.g. the Musicity project; RjDj). Many apps allow smart devices such as the iPhone to be transformed into portable musical instruments that engage with the body and allow for spontaneous performances (e.g. the Reactable app; Bloom). With the advent of the Internet of Things (IoT), the communication society is witnessing the generalisation of ubiquitous communication, the diversification of media (radio, TV, social media, etc.), the diversification of devices, software platforms and developer APIs (e.g. iPhone, Android, PDAs, but also Arduinos, open hardware, sensors and electronic tags), and a multiplicity of modalities of interaction. This poses the challenge of facilitating interoperability between devices and combinations between diverse modalities.
Gaming music applications
The musical interaction team at IRCAM has been working with motion sensors embedded within a ball to explore concepts integrating fun, gaming and musical experience in the Urban Musical Game. Joust is a spatial musical gaming system using motion, rhythm and pace as the instigator of the action (Innovation Award, Game Developers Choice Awards 2012). Interactive and immersive musical environments have also been used as a way of making commercial products memorable and fun. By gamifying their services and producing interactive experiences, innovative companies are working to increase their products' core values (e.g. Volkswagen's Fun Theory and Wrigley's Augmented Reality Music Mixer). The Echo Temple at Virgin Mobile FreeFest created a shared experience of making music through the use of motion-tracking cameras and fans branded with special symbols. Gaming is another developing application area for MIR, with various research and commercial possibilities.
References
- [Agon et al., 2011] C. Agon, J. Bresson, and M. Stroppa. OMChroma: Compositional control of sound synthesis. Computer Music Journal, 35(2), 2011.
- [Assayag et al., 2006] G. Assayag, G. Bloch, M. Chemillier, A. Cont, and Sh. Dubnov. OMax brothers: a dynamic topology of agents for improvization learning. In Proceedings of the 1st ACM Workshop on Audio and Music Computing Multimedia, pp. 125-132, 2006.
- [Carpentier and Bresson, 2010] G. Carpentier and J. Bresson. Interacting with symbolic, sound and feature spaces in Orchidée, a computer-aided orchestration environment. Computer Music Journal, 34(1): 10-27, 2010.
- [Cont, 2010] A. Cont. A coupled duration-focused architecture for realtime music to score alignment. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(6), 2010.
- [Esling and Agon, 2013] P. Esling and C. Agon. Intelligent sound samples database with multi-objective time series matching. IEEE Transactions on Audio, Speech, and Language Processing, 2013. To appear.
- [Fiebrink et al., 2009] R. Fiebrink, D. Trueman, and P.R. Cook. A meta-instrument for interactive, on-the-fly machine learning. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME), Pittsburgh, USA, 2009.
- [Janer and de Boer, 2008] J. Janer and M. de Boer. Extending voice-driven synthesis to audio mosaicing. In Proceedings of the 5th Sound and Music Computing Conference, volume 4, Berlin, 2008.
- [Maestre et al., 2009] E. Maestre, R. Ramírez, S. Kersten, and X. Serra. Expressive concatenative synthesis by reusing samples from real performance recordings. Computer Music Journal, 33(4), pp. 23-42, 2009.
- [Magas et al., 2009] M. Magas, R. Stewart, and B. Fields. decibel 151: Collaborative spatial audio interactive environment. In Proceedings of the ACM SIGGRAPH, 2009.
- [Molina et al., 2011] P. Molina, M. Haro, and S. Jordà. BeatJockey: A new tool for enhancing DJ skills. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME), pp. 288-291, Oslo, Norway, 2011.
- [Mulligan, 2011] M. Mulligan. Music formats and artist creativity in the age of media mass communication. A music industry blog report. Accessed at: Agile Music.
- [Pachet and Roy, 2009] F. Pachet and P. Roy. Markov constraints: steerable generation of Markov sequences. Constraints, 16(2): 148-172, 2009.
- [Roebel, 2003] A. Roebel. A new approach to transient processing in the phase vocoder. In Proceedings of the International Conference on Digital Audio Effects (DAFx), 2003.
- [Russolo, 1913] L. Russolo. L'Arte dei Rumori, 1913.
- [Schwarz, 2005] D. Schwarz. Current research in concatenative sound synthesis. In Proceedings of the International Computer Music Conference (ICMC), Barcelona, Spain, 2005.
- [Vinet et al., 2002] H. Vinet, P. Herrera, and F. Pachet. The CUIDADO project. In Proceedings of the 3rd International Conference on Music Information Retrieval, Paris, France, 2002.
- [Wöhrmann and Ballet, 1999] R. Wöhrmann and G. Ballet. Design and architecture of distributed sound processing and database systems for web-based computer music applications. Computer Music Journal, 23(3): 73-84, 1999.
- [Wold et al., 1996] E. Wold, T. Blum, D. Keislar, and J. Wheaton. Content-based classification, search, and retrieval of audio. IEEE MultiMedia, 3(3): 27-36, 1996.
- [Zils and Pachet, 2001] A. Zils and F. Pachet. Musical Mosaicing. In Proceedings of the COST G-6 Conference on Digital Audio Effects (DAFx-01), pp. 39-44, Limerick, Ireland, 2001.
Challenges
- Develop methodologies to take advantage of MIR for artistic applications in close collaboration with creators. New possibilities for music content manipulation resulting from MIR research have the power to transform music creation. The development of tools for artistic applications can only be done with the involvement of the creators in the whole research and development process.
- Develop tools for sound processing based on high-level concepts. New approaches in MIR-related research should provide musicians and sound designers with a high-level content-based processing of sound and music related data. This entails furthering the integration of relevant cognitive models and representations in the creative tools, and also enabling users to implement their own categories by providing them with a direct access to machine learning and automatic classification features.
- Enable tools for direct manipulation of sound and musical content. Significant enhancements are required in polyphonic audio analysis methods (e.g. audio-to-score, blind source separation) in order to build applications allowing content-based manipulation of sound. This is expected to have a major impact on professional and end-user markets.
- Develop new computer languages for managing temporal processes. This will provide more adapted creative tools, not only for music composition and performance, but more generally for temporal media and interactive multimedia.
- Improve integration of audio database management systems in standard audio production tools. These should combine online and offline access to audio materials, and feature content-based search.
- Develop real-time MIR tools for performance. Research real-time issues beyond the "faster search engines" in the use of MIR technologies for music performance, addressing the design of specific algorithms and of potential applications in their entirety, in collaboration with the NIME community.
- Integrate performance modelling and spatial dimensions. The management of sound and music information in creative tools shall not be limited to basic music categories (such as pitch, intensity and timbre) and must in particular integrate the dimensions of performance modelling and sound space. Beyond direct sound/gesture mapping, the design of new electronic instruments requires a better understanding of the specific structures underlying gesture and performance and their relation to the sound content. As for the spatial dimension of sound, new research advances are expected in the automatic extraction of spatial features from mono- and multichannel recordings and in the simulation of virtual acoustic scenes from high-level spatial descriptors, with applications in music production, audiovisual post-production, games and multimedia.
- Develop MIR methods for soundscaping. Immersive music environments and virtual soundscaping are growth areas in the creative industries, particularly in relation to physical spaces. Research may involve knowledge gained from collaborations with specialists in building acoustics, architects, and installation artists.
- Use artistic sound installation environments as MIR research test grounds. Immersive discovery experiences and physical explorations of music presented as art installations can contribute to a better understanding of the user's Quality of Experience (QoE); the potential of using sound as an aid to narrative creation and as a non-linguistic means of communication; the use of the human body as an instigator of the generation of sound; and the active engagement of listeners with their environment.
- Develop creative tools which include data useful to commerce. Research areas uncovered by consulting commercial and industry practices may include, for example, sonic branding, personalisation, interactive media environments, social platforms, and marketing tools between artists and fans.
- Improve data interoperability between devices. An effort is required towards the standardisation of data protocols for a pan-European exchange of music software and hardware modalities. This is especially relevant for music, which is a paradigmatic example of multimodal media, with active communities of developers working with a rich diversity of devices.
- Develop automatic playlist generation and automatic mixing tools for commercial environments. Systems which deliver the appropriate atmosphere for purchase or entertainment require music information research in conjunction with consumer psychology. For example, high level descriptors may be developed to include relationships between music and certain types of product, and music psychology may include field work in commercial environments.