Musically relevant data: Challenges
From MIReS
- Identify all relevant types of data sources describing music. We have to consider the experience of music in its full multi-modality, beyond just audio (video, lyrics, scores, symbolic annotations, gesture, tags, diverse metadata from websites and blogs, etc.). To achieve this it will be necessary to work together with experts from the full range of the multimedia community and to organise the data gathering process more systematically than has been done so far.
- Guarantee sufficient quality of data (both audio and metadata). At the moment, the data available to our community stems from a wide range of very different sources and is obtained with very different methods, which are often not sufficiently documented. We will have to come to an agreement on unified data formats and on protocols for documenting the quality of our data. This requires a dialogue within our community, which should also clarify our relation to more general efforts to unify data formats.
- Clarify the legal and ethical concerns regarding data availability as well as its use and exploitation. This concerns both what data we are allowed to have and what data we should have. The various copyright issues will make it indispensable to work together with content owners, copyright holders and other stakeholders. Ethical concerns about privacy have to be resolved. The combination of multiple sources of data poses additional problems in this respect.
- Ascertain what data users are willing to share. One of the central goals of future MIR will be to model the tastes, behaviours and needs of individual users, not just generic ones. Modelling individual users for the personalisation of MIR services raises a whole range of new privacy issues, since it requires handling very detailed and possibly controversial information. This is closely connected to the policies of diverse online systems concerning the privacy of user data. It is also a matter of system acceptance, going far beyond mere legal concerns.
- Make a sufficient amount of data available to the research community, with easy and legal access. Even for audio data, which has been used for research right from the beginning of MIR, the availability of sufficient benchmark data sets usable for evaluation purposes is still not a fully resolved issue. To allow MIR to grow from an audio-centered to a fully multi-modal science, we will need benchmark data for all relevant modalities to allow evaluation and comparison of our results. Hence the already existing problem of data availability will become even more severe.
- Create open access data repositories. It will be of great importance for the advancement of MIR to create and maintain sustainable repositories of diverse forms of music-related data. These repositories should follow open access licensing schemes.