Music distribution applications
From MIReS
Latest revision as of 18:18, 20 April 2013
MIR is fundamental for developing technologies to be used in the music distribution ecosystem. The stakeholders in the music value chain are music services, record companies, performing rights organisations, music tech companies, music device and equipment manufacturers, and mobile carriers. The main challenge is to develop scalable technologies that are relevant both to the services that organise and distribute music and to the services that track what is being distributed. These technologies range from music search and recommendation to audio identification of both recordings and compositions, among others. By fully addressing the challenges of music distribution, the MIR community will establish closer ties with the industry, which will help it access resources (such as actual music data) and alternative sources of funding. For its part, the music distribution industry will gain access to technologies better targeted at actual end-user scenarios, giving it an edge in the global market. Closer ties will also help shorten the innovation cycle from research to development and exploitation, which in turn will have a clear impact on competitiveness and hence on the profitability of music distribution companies.
State of the art
A number of topics concerning the future of electronic music distribution have been addressed. These include search and discovery in music catalogues, technologies related to the music rights industry, and more transversal topics such as scalability and metadata cleaning.
As the last few years have shown, music is being produced and published at a faster rate than ever before: estimates range from 11,000 (non-classical) major-label albums per year, averaging some ten songs per album [Vogel, 2010], up to 97,751 albums released in the United States in 2009, as reported by Nielsen SoundScan.
In the physical world, record shops were de facto intermediaries that preselected music because of the physical constraints of storing records and CDs. Digital technologies have changed this situation in at least two respects. First, digital music distribution channels such as iTunes, Amazon or Spotify can provide quick access to millions of music pieces at very low cost, so their catalogues are far less strictly preselected. Second, with the abandonment of physical records, the granularity shifted from albums to single tracks, making it even harder for potential customers to choose. To fill the gap left by this missing preselection, automatic music recommendation systems supporting search and discovery have been developed, with the aim of providing improved and manageable access to the music of the world.
Amazon suggests albums or songs based on what has been purchased in the same order, or by the same customers, as the items one has searched for or bought. This is a form of collaborative filtering [Herlocker et al., 1999], which assumes that users who have agreed in the past (in their purchase decisions) will also agree in the future (by purchasing the same items). Collaborative filtering generally suffers from two related problems: the cold-start problem and popularity bias. The cold-start problem is that albums that have not yet been purchased by anybody can never be suggested. Popularity bias is the problem that, for any given item, popular albums are more likely than unpopular ones to have been purchased together with it, and so have a better chance of being recommended. Consequently, collaborative filtering alone is incapable of suggesting new music releases. An additional problem specific to Amazon is that users may purchase items for somebody else (e.g. as a present), which can distort the recommendations generated both for them and for other users with supposedly similar taste.
Spotify, a music streaming service, bases its recommendations (see also Erik Bern) on its users' listening behavior, analysing which artists are often played by the same listeners. While this may result in better suggestions than analysing sparse data such as record purchases, it is again subject to the cold-start problem and popularity bias. Furthermore, Spotify only recommends related artists and not songs, which is rather unspecific. Genius is a feature of Apple iTunes that generates playlists and song recommendations by comparing the music libraries, purchase histories and playlists of all its users, possibly integrating external sources of information. Assuming such external information does not play a major role, this system is again based mainly on collaborative filtering.
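The co-purchase idea behind this kind of collaborative filtering can be illustrated with a minimal item-to-item sketch. It is an assumption made for illustration, not Amazon's actual algorithm, and the baskets and album names are hypothetical. The sketch counts how often two items occur in the same purchase history and recommends the most frequent partners. This makes both the cold-start problem and popularity bias visible.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(baskets):
    """Count how often each pair of items appears in the same purchase history."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, item, k=3):
    """Return up to k items most often bought together with `item`."""
    # Items never purchased have no co-occurrence entry: the cold-start problem.
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]


# Hypothetical purchase histories; album_b is the "popular" item.
baskets = [
    ["album_a", "album_b"],
    ["album_a", "album_b", "album_c"],
    ["album_b", "album_c"],
    ["album_a", "album_b"],
]
counts = cooccurrence_counts(baskets)
print(recommend(counts, "album_a"))      # ['album_b', 'album_c']
print(recommend(counts, "new_release"))  # [] (never purchased, so never suggested)
```

Because album_b appears in every basket, it tops the list for every other album. This is popularity bias in miniature.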
Last.fm combines information obtained from users' listening behavior with user-supplied tags (words or short expressions describing a song or artist). Tags can help make recommendations transparent to users: e.g. a user listening to a love song may be recommended other tracks that have frequently been tagged as 'slow' and 'romantic'. However, tags are also inherently noisy, because some users apply them carelessly, and they require a range of countermeasures for data cleaning. Tags are also affected by the cold-start problem and popularity bias. Pandora, another music streaming service, recommends songs from its catalogue based on expert reviews of tracks with respect to a few hundred genre-specific criteria. This allows very accurate suggestions of songs that sound similar to what a user listens to, including sophisticated explanations of why a song was suggested (e.g. a track may be recommended because it is in a 'major key', features 'acoustic rhythm guitars' and 'a subtle use of vocal harmony', and exhibits 'punk influences'). Such expert reviews are costly in time and money, which makes it impossible to extend the catalogue at a rate that keeps up with new releases. This limits the selection of music available to users.
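Tag-based similarity of the kind described for Last.fm can be sketched as follows. Each track's tags are treated as a sparse count vector, and vectors are compared with cosine similarity. The shared tags double as a simple explanation. The tag counts below are hypothetical, and this is not Last.fm's actual method. An untagged track scores zero against everything, which is how the cold-start problem shows up here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse tag-count dictionaries."""
    dot = sum(w * v.get(tag, 0) for tag, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an untagged track cannot be compared (cold start)
    return dot / (norm_u * norm_v)


def recommend_by_tags(tags, query, k=2):
    """Rank other tracks by tag similarity and list the shared tags as an explanation."""
    scored = []
    for track, vec in tags.items():
        if track == query:
            continue
        shared = sorted(set(tags[query]) & set(vec))
        scored.append((cosine(tags[query], vec), track, shared))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(track, shared) for _, track, shared in scored[:k]]


# Hypothetical tag counts (how many users applied each tag).
tags = {
    "love_song": {"slow": 10, "romantic": 8, "piano": 2},
    "ballad": {"slow": 6, "romantic": 5},
    "thrash": {"fast": 9, "metal": 7},
    "new_upload": {},
}
print(recommend_by_tags(tags, "love_song", k=1))  # [('ballad', ['romantic', 'slow'])]
```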
Most approaches described so far rely on some form of meta-information: users' listening or purchasing behavior, statistics about artists and genres in music collections, user-defined tags, etc. Another option is to analyse the audio content itself, trying to model what is important for the perceived similarity between songs: instrumentation, tempo, rhythm, melody, harmony, etc. Many research prototypes of recommendation systems that use content-based audio similarity have been described in the literature [e.g. Pampalk, 2001; Neumayer et al., 2005; Lamere and Eck, 2007; Knees et al., 2007; to name just a few]. However, very little has been reported about the successful adoption of such approaches in real-life scenarios without combining them with other methods. Content-based recommendation is used to some extent by a number of music companies, such as Mufin, The Echo Nest and BMAT. A comprehensive overview of music recommendation systems can be found in [Celma, 2010].
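The following is a minimal sketch of content-based similarity. It assumes each track has already been reduced to a small vector of audio descriptors. The three features and their values are hypothetical; real systems use far richer models of timbre, rhythm and harmony. Features are standardised so that one unit (e.g. BPM) does not dominate, and neighbours are found by Euclidean distance. Because only the audio is needed, a new release can be compared as soon as its features are extracted.

```python
import math
import statistics


def standardise(vectors):
    """Z-score each feature column so that no single unit dominates the distance."""
    columns = list(zip(*vectors.values()))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]  # guard constant columns
    return {
        track: [(x - m) / s for x, m, s in zip(vec, means, stdevs)]
        for track, vec in vectors.items()
    }


def most_similar(vectors, query, k=1):
    """Return the k tracks whose standardised features are closest to `query`."""
    normed = standardise(vectors)
    distances = sorted(
        (math.dist(normed[query], vec), track)
        for track, vec in normed.items()
        if track != query
    )
    return [track for _, track in distances[:k]]


# Hypothetical descriptors: [tempo in BPM, spectral centroid in Hz, percussiveness 0..1]
features = {
    "track_a": [120, 2000, 0.80],
    "track_b": [124, 2100, 0.75],
    "track_c": [70, 900, 0.10],
    "track_d": [72, 1000, 0.15],
}
print(most_similar(features, "track_a"))  # ['track_b']
```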
The music industry is facing difficult times, with income from physical sales shrinking, yet music rights revenues are increasing worldwide. According to the CISAC portal, authors' societies collected €7.5 billion in royalties in 2010 (up 5.5% year on year). [IFPI, 2012] reported that global performance rights revenues reached US$905 million in 2011 (an increase of 4.9% over the previous year). These gains are due to an increase in the number of media paying royalties and to improvements in the societies' collection methods. Hence, it is important to address the needs of the music rights business, i.e. the process of paying the owners of these rights (authors, performers, labels, etc.) for the use of the music they have created and performed.
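As a quick check on the growth figures quoted above, the previous-year amounts they imply can be recovered by dividing by one plus the growth rate $g$. These are derived figures, not ones reported in the sources:

```latex
\begin{align*}
X_{t-1} &= \frac{X_t}{1 + g} \\
\text{Authors' societies, 2009:}\quad X_{2009} &= \frac{7.5\ \text{bn EUR}}{1.055} \approx 7.11\ \text{bn EUR} \\
\text{Performance rights, 2010:}\quad X_{2010} &= \frac{905\ \text{M USD}}{1.049} \approx 862.7\ \text{M USD}
\end{align*}
```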
Rights organisations collect revenue not only from television and radio stations and from businesses built around music, such as clubs and venues, but also from many other companies and associations, from shops and dentists to school plays. Basically, anyone who uses somebody else's musical creations has to pay royalties. In recent years, music rights revenues from the digital world have also grown in importance. All this money is collected through royalty collection societies, which are divided into three kinds depending on the rights they represent: authors', performance and master rights. Most authors' societies worldwide are members of CISAC, while the master rights societies are associated with the IFPI. The societies collect music rights revenues and distribute them among their members. This distribution step is the source of much controversy: societies use different processes for it, and questions are raised about how to make it as fair as possible.
Ideally, every rights owner should be paid for the use of their music. In practice, however, it is difficult and expensive to monitor all media and all venues where music might be used. The solutions found vary with the country, the society and the type of source. Some years ago, societies distributed royalties based on best-selling charts, which created huge inequalities between artists. Later, other systems and technologies appeared:
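The sketch below illustrates why chart-based distribution was considered unfair. It compares two ways of splitting a royalty pool. The first is a hypothetical pro-rata split, proportional to each rights owner's monitored plays. The second is a chart-style split that pays only the top entries. Both rules and all numbers are simplified assumptions for illustration; real societies' distribution rules differ by country and by source.

```python
def pro_rata(pool, plays):
    """Split a royalty pool in proportion to each rights owner's monitored plays."""
    total = sum(plays.values())
    return {owner: pool * n / total for owner, n in plays.items()}


def chart_based(pool, plays, top_n):
    """Split the pool only among the top_n entries, leaving everyone else unpaid."""
    ranked = sorted(plays.items(), key=lambda kv: kv[1], reverse=True)
    top = dict(ranked[:top_n])
    shares = pro_rata(pool, top)
    return {owner: shares.get(owner, 0.0) for owner in plays}


# Hypothetical monitored plays for three rights owners and a pool of 1,000 currency units.
plays = {"star": 600, "mid_act": 300, "indie": 100}
print(pro_rata(1000, plays))        # {'star': 600.0, 'mid_act': 300.0, 'indie': 100.0}
print(chart_based(1000, plays, 2))  # indie receives 0.0; star and mid_act share everything
```

The pro-rata rule depends on accurate usage data. That is why the monitoring technologies listed next matter.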
- Cue sheets: Media companies are obliged to fill in cue sheets, i.e. lists of the music they broadcast and how it was used. However, cue sheets tend to be inaccurate: producing them is a lot of work, while media companies gain nothing from their accuracy.
- Watermarking: An extra signal is embedded into a digital music work so that it can be detected when the work is reproduced. Watermarking requires broadcasters to use watermarked audio references, which is very rare. Moreover, the extra signal can easily be removed from the original audio.
- Fingerprinting: An algorithm extracts the main features of an audio piece to build a so-called fingerprint of the track. This fingerprint can then easily be matched against an audio database, which may comprise recordings from television, radio or internet radio broadcasts [e.g. Cano, 2007; Wang et al., 2003].
- Clubs: Collecting societies track the music played in all types of venues. They either send a specialist who recognises the music and writes down a cue sheet, or install recording stations at DJ booths.
- Online: Some of the most used music channels on the Internet, such as streaming and peer-to-peer services, are extremely difficult to monitor. Today, online music monitoring is based on crawling millions of web pages to detect music usage.
- Social networks: A particular case of online music tracking is finding phylogenetic relationships between music objects spread across social networks. The relationships may include "is the same song as", "contains a snippet of", "includes", "remixes", "is similar to", "is the same song with a different duration", "is the live version of", "is a cover version of", "is a radio edit of" and so forth. This has not been addressed by MIR, but analogous problems have been studied in other fields, e.g. for images and video [Dias et al., 2012; Dias et al., 2011].
- Music vs non-music discrimination: TV channels normally have blanket-fee contracts with performing rights organisations (PROs), under which they pay royalties in proportion to the percentage of music broadcast. As this data is usually inaccurate, PROs tend to outsource a statistical estimation of this percentage, which is also inaccurate. There has been considerable research on speech/music discrimination [e.g. Panagiotakis and Tziritas, 2005; Scheirer and Slaney, 1997]. Nevertheless, generic music vs non-music discrimination that is robust to overlapping speech remains a challenge for the industry.
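The kind of frame-level energy feature used in speech/music discrimination can be sketched in a few lines. The toy detector below computes the RMS energy of fixed-size frames. It flags a signal as speech-like when many frames fall well below the average energy, since speech contains frequent pauses. This is a hypothetical illustration, not the algorithm of either cited paper. The frame size, the 0.5 and 0.3 thresholds and the synthetic test signals are all assumptions. It is exactly the kind of simple rule that breaks down with speech over background music.

```python
import math
import random


def frame_rms(signal, size):
    """RMS energy of consecutive non-overlapping frames of `size` samples."""
    return [
        math.sqrt(sum(x * x for x in signal[i:i + size]) / size)
        for i in range(0, len(signal) - size + 1, size)
    ]


def low_energy_ratio(signal, size=400):
    """Fraction of frames whose RMS is below half the mean frame RMS."""
    energies = frame_rms(signal, size)
    mean = sum(energies) / len(energies)
    return sum(1 for e in energies if e < 0.5 * mean) / len(energies)


def looks_like_music(signal, threshold=0.3):
    """Toy rule: steady energy suggests music, many quiet frames suggest speech."""
    return low_energy_ratio(signal) < threshold


rate = 8000  # samples per second (assumed)
# Synthetic "music": a steady 440 Hz tone lasting two seconds.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(2 * rate)]
# Synthetic "speech": 0.2 s noise bursts alternating with 0.2 s of silence.
rng = random.Random(0)
bursts = [rng.uniform(-0.5, 0.5) if (n // 1600) % 2 == 0 else 0.0 for n in range(2 * rate)]
print(looks_like_music(tone), looks_like_music(bursts))  # True False
```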
The research and engineering problems of simple audio identification use cases have practically been solved. However, there are no robust technical solutions for other real industry use cases, such as detecting background music under speech, in noisy environments or in edited music. In this business niche, a number of players share the market: Tunesat in the USA, BMAT in Spain, kollector in Europe, Monitec in South America, Soundmouse in the UK and yacast in France.
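The core matching idea behind landmark-based fingerprinting, in the spirit of [Wang et al., 2003], can be sketched as follows. Pairs of spectral peaks are hashed as (frequency 1, frequency 2, time difference). A query is then identified by voting for a consistent time offset against a reference index. This is a simplified illustration, not a production algorithm. Peak extraction from the spectrogram is omitted, and the peak lists, the `fan_out` parameter and the track names are hypothetical.

```python
from collections import Counter, defaultdict


def landmark_hashes(peaks, fan_out=3):
    """Pair each (time, freq) peak with the next few peaks: hash = (f1, f2, dt)."""
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1


def build_index(references):
    """Map each hash to the (track, anchor time) pairs where it occurs."""
    index = defaultdict(list)
    for track_id, peaks in references.items():
        for h, t in landmark_hashes(peaks):
            index[h].append((track_id, t))
    return index


def identify(index, query_peaks):
    """Vote for (track, time offset); a true match piles its votes on one offset."""
    votes = Counter()
    for h, t_query in landmark_hashes(query_peaks):
        for track_id, t_ref in index.get(h, []):
            votes[(track_id, t_ref - t_query)] += 1
    if not votes:
        return None
    (track_id, offset), score = votes.most_common(1)[0]
    return track_id, offset, score


# Hypothetical peak constellations as (frame, frequency bin) pairs.
references = {
    "song_x": [(0, 10), (2, 30), (3, 12), (5, 40), (7, 22), (8, 35)],
    "song_y": [(0, 15), (1, 18), (4, 50), (6, 25), (9, 33)],
}
index = build_index(references)
# An excerpt of song_x starting at frame 3, re-timed to 0, plus one spurious peak (bin 99).
query = [(0, 12), (1, 99), (2, 40), (4, 22), (5, 35)]
print(identify(index, query))  # ('song_x', 3, 5)
```

Because votes are grouped by time offset, the spurious peak does not prevent the match. Robustness to heavy noise, speech overlap or edited music requires much more than this.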
A major challenge that a new technology must face when it is applied in viable commercial products is scalability, i.e. the ability of the technology to handle massive amounts of data, and the eventual growth of that data, in a cost-effective manner. The problem is twofold. First, some techniques are simply never deployed or tested, because the size of the datasets makes this computationally infeasible. Second, even when a technique is scalable from a non-functional point of view, applying it to multi-million-item datasets may reveal problems that were not obvious at first. Beyond the problem of handling "big data", granting researchers access to huge music-related datasets may generate beneficial by-products for the music information research world. First, in large collections, certain phenomena may become discernible and lead to novel discoveries. Second, a large dataset can be relatively comprehensive, encompassing various more specialised subsets. Having all subsets within a single universe allows standardised data fields, features, etc. Lastly, a big dataset available to academia greatly promotes the exchange of ideas and results, leading, yet again, to novel discoveries. A good example is the Million Song Dataset [Bertin-Mahieux et al., 2011], which contains user tags provided by Last.fm.
Systems that automatically recommend music (as described above) are among the most commercially relevant outcomes of the MIR community. Such recommender systems must be able to cope with very large, and growing, collections of music. The core technique driving automatic music recommendation is the modelling of music similarity, one of the central notions of MIR. Proper modelling of music similarity is at the heart of every application that automatically organises and processes music databases. Scaling the computation of music similarity sublinearly to millions of tracks is therefore an essential concern of MIR. Scalable music recommendation systems have been the subject of a number of publications. Probably one of the first content-based music recommendation systems working on large collections (over 200,000 songs) was published by [Cano et al., 2005]. Recent results [see e.g. Schnitzer et al., 2012] enable systems to answer music similarity queries in about half a second on a standard desktop CPU for a collection of 2.5 million tracks. Even so, such systems still scale linearly with collection size.
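Sublinear similarity search is commonly approached with approximate indexing. The sketch below is not the method of [Schnitzer et al., 2012]. It swaps in random-hyperplane locality-sensitive hashing (LSH), a generic technique in which each vector is reduced to a short bit signature: the signs of its projections onto random hyperplanes. A query then re-ranks only the tracks sharing its bucket, rather than scanning the whole collection. The dimensionality, the number of bits and the random vectors are assumptions. A single table like this one can miss true neighbours, and practical systems typically use several tables.

```python
import math
import random
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


class HyperplaneLSH:
    """Single-table random-hyperplane LSH: vectors at small angles tend to share a bucket."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets = defaultdict(list)
        self.vectors = {}

    def _signature(self, v):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in self.planes)

    def add(self, track_id, v):
        self.vectors[track_id] = v
        self.buckets[self._signature(v)].append(track_id)

    def candidates(self, v):
        return self.buckets.get(self._signature(v), [])

    def query(self, v, k=1):
        # Exact cosine re-ranking, but only over the query's bucket.
        ranked = sorted(self.candidates(v), key=lambda t: cosine(v, self.vectors[t]), reverse=True)
        return ranked[:k]


# Hypothetical 16-dimensional feature vectors for 2,000 tracks.
rng = random.Random(42)
index = HyperplaneLSH(dim=16)
for i in range(2000):
    index.add(f"track_{i}", [rng.gauss(0.0, 1.0) for _ in range(16)])
target = index.vectors["track_7"]
print(index.query(target), len(index.candidates(target)))
```

With 8 bits there are up to 256 buckets, so on average a query re-ranks only a handful of the 2,000 tracks. More bits shrink buckets further but raise the chance of missing a neighbour.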
Scalability clearly also affects other areas of MIR. These include music identification, covering both pure fingerprinting technologies and cover detection, and multimodal music recommendation and personalisation (using contextual and collaborative filtering information).
References
- [IFPI, 2012] IFPI. Recording industry in numbers, 2012 edition.
- [Bertin-Mahieux et al., 2011] Thierry Bertin-Mahieux, Daniel PW Ellis, Brian Whitman, and Paul Lamere. The million song dataset. In Proceedings of the 12th International Society for Music Information Retrieval Conference, pp. 591-596, Miami, Florida, USA, 2011.
- [Cano, 2007] P. Cano. Content-based audio search from fingerprinting to semantic audio retrieval. PhD thesis, Pompeu Fabra University, Barcelona, Spain, 2007.
- [Cano et al., 2005] P. Cano, M. Koppenberger, and N. Wack. An industrial-strength content-based music recommendation system. In Proceedings of the 28th Annual International Conference on Research and Development in Information Retrieval, p. 673, 2005.
- [Celma, 2010] Oscar Celma. Music recommendation and discovery: The long tail, long fail, and long play in the digital music space. Springer, 2010.
- [Dias et al., 2011] Z. Dias, A. Rocha, and S. Goldenstein. Video phylogeny: Recovering near-duplicate video relationships. In Proceedings of the IEEE International Workshop on Information Forensics and Security (WIFS), pp. 1-6, 2011.
- [Dias et al., 2012] Z. Dias, A. Rocha, and S. Goldenstein. Image phylogeny by minimal spanning trees. IEEE Transactions on Information Forensics and Security, 7(2): 774-788, 2012.
- [Herlocker et al., 1999] J.L. Herlocker, J.A. Konstan, A. Borchers, and J. Riedl. An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd Annual International Conference on Research and Development in Information Retrieval, pp. 230-237, 1999.
- [Knees et al., 2007] Peter Knees, Tim Pohle, Markus Schedl, and Gerhard Widmer. A music search engine built upon audio-based and web-based similarity measures. In Proceedings of the 30th Annual International Conference on Research and Development in Information Retrieval, pp. 447-454, 2007.
- [Lamere and Eck, 2007] P. Lamere and D. Eck. Using 3d visualizations to explore and discover music. In Proceedings of the 8th International Conference on Music Information Retrieval, pp. 173-174, Vienna, Austria, 2007.
- [Neumayer et al., 2005] R. Neumayer, M. Dittenbach, and A. Rauber. Playsom and pocketsomplayer, alternative interfaces to large music collections. In Proceedings of the 6th International Conference on Music Information Retrieval, London, UK, 2005.
- [Pampalk, 2001] E. Pampalk. Islands of music: Analysis, organization, and visualization of music archives. Master's thesis, Vienna University of Technology, Vienna, Austria, 2001.
- [Panagiotakis and Tziritas, 2005] C. Panagiotakis and G. Tziritas. A speech/music discriminator based on rms and zero-crossings. IEEE Transactions on Multimedia, 7(1): 155-166, 2005.
- [Scheirer and Slaney, 1997] E. Scheirer and M. Slaney. Construction and evaluation of a robust multifeature speech/music discriminator. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 2, pp. 1331-1334, 1997.
- [Schnitzer et al., 2012] D. Schnitzer, A. Flexer, and G. Widmer. A fast audio similarity retrieval method for millions of music tracks. Multimedia Tools and Applications, pp. 1-18, 2012.
- [Vogel, 2010] H.L. Vogel. Entertainment industry economics: a guide for financial analysis. Cambridge University Press, 2010.
- [Wang et al., 2003] A. Wang et al. An industrial strength audio search algorithm. In Proceedings of the 4th International Conference on Music Information Retrieval, volume 3, Baltimore, Maryland, USA, 2003.
Challenges
- Demonstrate better exploitation possibilities of MIR technologies. The challenge is to convince stakeholders of the value of the technology provided by the MIR community. It is also to help them find new revenue streams from their digital assets that add to existing revenue channels rather than cannibalising them. To be relevant, these technologies should restore value to the digital music product, help reduce piracy, streamline industry processes and reduce inefficiencies.
- Develop systems that go beyond recommendation, towards discovery. Systems have to go beyond simple recommendation and playlisting by supporting discovery and novelty rather than predictability and familiarity. This is one way of making our systems interesting and engaging for prospective users.
- Develop music similarity methods for particular applications and contexts. Results produced by computers have to be consistent with the human experience of music similarity. It will therefore be necessary to research methods for personalising our systems to individual users in particular contexts, instead of providing one-size-fits-all services.
- Develop systems that cater to the scale of Big Data. The datasets might be songs, users or any other music-related elements. From a non-functional perspective, the algorithms and tools themselves should be fast enough, scaling sublinearly on very large datasets, to readily enable solutions for streaming and subscription services. Beyond raw performance such as processing speed, from a functional perspective, the algorithms have to be designed to handle the organisation of large music catalogues. They must also handle the relevance weighting of rapidly increasing quantities of music data mined from crowd-sourced tagging and social networks. Applying algorithms to such big datasets may reveal problems and new research scenarios that were not obvious at first.
- Develop large-scale robust identification methods for recordings and works. Performing rights organisations and record companies are shifting towards fingerprinting technologies as complete solutions for tracking their affiliates'/partners' music and for managing their music catalogues. Music fingerprinting has been around and widely used for years, but new use cases requiring extensive R&D are arising. Among others, these include copyright enforcement for songs and compositions in noisy and live environments, and music metadata autotagging. Also, finding phylogenetic relationships between songs/performances available on the web, such as "is a remix of" or "is the live version of", may unlock new application scenarios based on music object relationship graphs, such as multimodal trust and influence metering in social networks.
- Develop music metadata cleaning techniques. A common complaint from all industry stakeholders, such as record companies, music services, music distributors and PROs, is the lack of so-called "clean music databases". The absence of clean music databases causes broken links between data from different systems and incorrect editorial metadata for music recordings. This ultimately degrades the perceived end-user quality of applications and services relying on MIR technologies. We encourage the MIR community to address music metadata cleaning using music analysis and fingerprinting methods. Text-based techniques borrowed from neighbouring research fields, such as text information retrieval and data management, are also relevant.
- Develop music detection technology for broadcast audio streams. The media industry lacks the means to accurately detect when music (including background music) has been broadcast, which is needed to handle music royalty payments transparently. This technology should go beyond music vs speech discrimination and address real-life use cases such as properly discriminating music from generic noise.
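Here is one small example of the text-based techniques that could support metadata cleaning. The sketch normalises artist and title strings (case, accents, punctuation and a leading "The"). It then compares them with Python's standard `difflib.SequenceMatcher`. The normalisation rules and the 0.9 threshold are assumptions for illustration. A real cleaning pipeline would combine such string matching with fingerprinting and editorial rules.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case, strip accents and punctuation, collapse spaces, drop a leading 'the'."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    text = " ".join(text.split())
    return re.sub(r"^the ", "", text)


def same_recording(a, b, threshold=0.9):
    """Heuristically decide whether two (artist, title) records describe the same recording."""
    key_a = normalise(a[0]) + " - " + normalise(a[1])
    key_b = normalise(b[0]) + " - " + normalise(b[1])
    return SequenceMatcher(None, key_a, key_b).ratio() >= threshold


print(same_recording(("Beyoncé", "Halo"), ("BEYONCE", "Halo!")))              # True
print(same_recording(("The Beatles", "Let It Be"), ("Beatles", "Let it be")))  # True
print(same_recording(("The Beatles", "Let It Be"), ("Beatles", "Help!")))      # False
```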