Data processing methodologies: Challenges

From MIReS

(Difference between revisions)
 
Line 5: Line 5:
* '''Adopt recent Machine Learning techniques.''' As exemplified above, MIR makes extensive use of machine learning methodologies. In particular, many tasks are formulated according to a batch learning approach, in which a fixed amount of annotated training data is used to learn models that are then evaluated on similar data. However, music data can now be found in very large amounts (e.g. on the scale of hundreds of thousands of items for music pieces in diverse modalities, or tens of millions in the case of e.g. tags), music increasingly exists in data streams rather than in data ''sets'', and the characterisation of music data can evolve with time (e.g. tag annotations are constantly evolving, sometimes even in an adverse way). These Big Data characteristics (i.e. very large amounts, streaming, non-stationarity) imply a number of challenges for MIR, such as data acquisition, dealing with weakly structured data formats, scalability, online (and real-time) learning, semi-supervised learning, iterative learning and model updates, learning from sparse data, learning with only positive examples, learning with uncertainty, etc. (see e.g. Yahoo! Labs “key scientific challenges” in [http://labs.yahoo.com/ksc/Machine_Learning Machine Learning] and the White Paper [http://cra.org/ccc/docs/init/bigdatawhitepaper.pdf “Challenges and Opportunities with Big Data”] published by the Computing Community Consortium).
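To make the contrast between batch learning and online learning on a non-stationary stream concrete, here is a minimal sketch in Python. It is an illustration only, not a method proposed on this page. The synthetic two-feature stream, the point at which the labelling rule flips, the learning rate and all class and function names are assumptions chosen for the example.

```python
import math
import random


class OnlineLogisticRegression:
    """Binary classifier updated one example at a time (stochastic gradient descent)."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """One gradient step on the log-loss for a single example, y in {0, 1}."""
        error = self.predict_proba(x) - y
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error


def synthetic_stream(n_items, drift_at, seed=0):
    """Hypothetical stream of (features, label); the labelling rule flips at drift_at."""
    rng = random.Random(seed)
    for i in range(n_items):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        label = 1 if x[0] > 0 else 0
        if i >= drift_at:  # simulated non-stationarity (concept drift)
            label = 1 - label
        yield x, label


def run_prequential(n_items=4000, drift_at=2000):
    """Test-then-train evaluation: predict each item before learning from it."""
    model = OnlineLogisticRegression(n_features=2)
    hits = []
    for x, y in synthetic_stream(n_items, drift_at):
        predicted = 1 if model.predict_proba(x) >= 0.5 else 0
        hits.append(int(predicted == y))
        model.update(x, y)  # the model is never retrained from scratch
    return {
        "before_drift": sum(hits[drift_at - 500:drift_at]) / 500,
        "right_after_drift": sum(hits[drift_at:drift_at + 50]) / 50,
        "end_of_stream": sum(hits[-500:]) / 500,
    }


if __name__ == "__main__":
    for window, accuracy in run_prequential().items():
        print(f"{window}: {accuracy:.2f}")
```

A model trained once on the first half of this stream would keep applying the outdated rule, whereas the incremental updates let the sketch recover after the change. Real MIR streams (e.g. evolving tag annotations) would additionally need feature extraction and far more robust handling of drift and label noise.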
-
 
<!--
* So far, inspiration has come from a relatively limited number of external fields, mainly through the work of individuals (e.g. Logan and Ellis for Speech Processing methodologies). One challenge for the MIR community is to identify potentially relevant methodologies from other disciplines more systematically.
Line 15: Line 14:
-->
</onlyinclude>
</onlyinclude>
 +
<!--
<comments />
{{:Talk:{{PAGENAME}}}}__NOEDITSECTION__
-
 
+
-->
[[Data processing methodologies|Back to previous page]]

Latest revision as of 22:55, 25 March 2013


