Chapter 4 Conclusions And Recommendations

There is always a hypothetical possibility that organizations with monopolistic power try to corner the market or make the playing field uneven. Because the music industry, in the absence of live performances due to the pandemic, relies critically on streaming revenues, and streaming is managed by large organizations outside of the music industry, any such problems may not even arise within the music industry itself. In some countries, public bodies, copyright tribunals, and other specialized institutions handle potential payment problems for certain music royalties, but music is mainly a lightly regulated business in which systematic market problems must be resolved through the application of competition law (Antal, Fletcher, and Ormosi 2021).

This is precisely one of the most important ideas behind a permanent music data observatory.

4.1 Ongoing Data Collection and Market Monitoring: The Data Observatory Concept

The music industry requires a permanent market monitoring facility to win fights in competition tribunals, because it is increasingly disputing revenues with the world’s biggest data owners. This was precisely the role of the former CEEMID (Artisjus et al. 2014) program, which was initiated by a few collective management societies after a dropped GESAC project. Starting out from three relatively data-poor countries, where data pooling allowed rightsholders to increase revenues, the CEEMID data collection program was extended to 12 countries by 2019. It was eventually transformed into the Demo Music Observatory in 2020 (Antal 2021), which is now open to any national rightsholder, stakeholder organization, or music research institute. The idea of this observatory was brought to the UK policy debate on music streaming by the observatory’s only (former) British users, via the written evidence submitted by The state51 Music Group to the Economics of music streaming review of the DCMS Committee (state51 Music Group 2020).

“There are instructive initiatives in other industries in which there is perhaps a clearer and longer standing recognition of the role of economic analysis. This sometimes results in initiatives such as ‘Observatories’ like the European Market Observatory for Fisheries and Aquaculture Products or the European construction sector observatory […] These tend to be collaborative endeavours, with a varying mix of government, industry, economists and in some cases funding bodies. […] To date there have been few if any entities or initiatives for music similar to the above-mentioned observatories. We suggest this is something that policy makers can support and encourage, but which ultimately needs to be driven by the industry itself. […] This is one reason we have worked with the economist Daniel Antal and his team, in particular on the Central European Music Industry Report 2020. Economists such as Daniel Antal produce data about the music industry that is consistent with international statistical standards and adhere to rigorous data ethics principles, seeking external validation through data and code repositories for underlying data and methodologies.”

The data observatory concept is derived from the Earth and natural sciences, where many research stakeholders often jointly build large observation facilities, such as the Hubble telescope in space, or CERN. Data observatories are managed by a triangular base of business, scientific, and policy stakeholders.

The EU has been sponsoring about 60 data observatories, i.e. permanent data collection programs for policy, industrial and scientific research in various domains. These include, for example, the functioning of the milk market, homelessness, the development of alternative fuels, among many others. A similar concept has been embraced by the OECD and some UN bodies, too.

Data observatories are usually managed by a consortium of policy, scientific, and consulting users, or by a public body. In our preliminary research, we reviewed about 60 observatories sponsored by various EU organizations, added another dozen data observatories recognized by the Council of Europe, UNESCO, or the OECD, and gathered information on defunct observatories that failed to generate high enough value for money to be continued.

We have identified various problems:

  • Almost no data observatory relies on any form of open data, either under the EU Open Data Directive or under the various definitions of scientific open data. The EU is financing many data gathering initiatives that do not even seem to consider the free, high-quality alternatives to proprietary data.
  • The quality of the data is often questionable, plagued with errors in measurement units, currency translation, and labeling.
  • The data is often not machine readable and does not conform to tidy data principles, so it requires significant investment in re-processing before it can be used in statistical or spreadsheet applications.
  • Data documentation is not standardized, and often does not meet scientific or other use standards. Metadata is often ad hoc and does not follow any international standards or best practices.
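
The tidy data point in the list above can be illustrated with a minimal sketch. It reshapes a hypothetical "wide" table, with one column per year, into a long form with one observation per row, which is the layout that statistical software generally expects. The column names and all figures are invented for illustration and are not taken from any observatory.

```python
# Hypothetical wide table: one row per country, one column per year.
# All values are made up for illustration.
wide_rows = [
    {"country": "SK", "unit": "EUR million", "2018": 10.0, "2019": 12.5},
    {"country": "HU", "unit": "EUR million", "2018": 20.0, "2019": 21.0},
]


def to_tidy(rows, id_columns=("country", "unit")):
    """Return one record per (country, year) observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column in id_columns:
                continue  # identifier columns are copied, not reshaped
            record = {key: row[key] for key in id_columns}
            record["year"] = int(column)  # the column header becomes a variable
            record["value"] = value
            tidy.append(record)
    return tidy


tidy_rows = to_tidy(wide_rows)
```

Keeping the measurement unit as an explicit column in every row is one simple way to make unit and currency errors visible rather than hidden in a table header.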

The EU Open Data Portal is faced with similar problems. In the case of these open data initiatives, metadata quality is usually high, but the usability of the data is low. The data is processed for the public sector and is often incomprehensible, lacking the documentation that would allow it to be re-processed for new public policy, business, or scientific uses.

In the age of big data and AI, the data needs of small and medium enterprises, consulting agencies, universities, and policy-makers are increasing day by day. Yet only the world’s largest companies, Ivy League universities, and rich governments can sustain comprehensive data collection programs. The EU tackles this problem with a financial injection into about 60 data observatories, by providing legal access to the re-use of public sector information, and by setting up various metadata portals to help access these data assets.

Our application aims to address two serious problems: the EU makes valuable data available in a form that makes its use very difficult for the above-mentioned actors; and various EU institutions spend a significant amount of their policy consulting and scientific research budgets on creating data programs that do not even consider open data as a source.

4.2 AI Problems

The simplest recommendation systems just follow the charts: for example, they select from well-known current or perennial greatest hits. Such a system may work well for an amateur DJ at a home party or for a small local radio station that just wants to make sure that the music in its programme will be liked by many people. These systems reinforce existing trends and make already popular songs and their creators even more popular.
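
The reinforcing effect of a chart-following system can be sketched in a few lines. The example below is a toy simulation under strong assumptions: the track names, play counts, and the fixed number of plays gained per recommendation are all hypothetical, and no real service is modelled.

```python
def recommend_from_charts(play_counts, k):
    """Return the k most-played track ids (ties broken by track id)."""
    ranked = sorted(play_counts, key=lambda track: (-play_counts[track], track))
    return ranked[:k]


def simulate_feedback(play_counts, k, rounds, plays_per_recommendation=10):
    """Each round, every recommended track gains a fixed number of plays."""
    counts = dict(play_counts)  # work on a copy, leave the input untouched
    for _ in range(rounds):
        for track in recommend_from_charts(counts, k):
            counts[track] += plays_per_recommendation
    return counts


# Hypothetical starting play counts.
initial = {"hit_a": 100, "hit_b": 90, "niche_c": 20, "niche_d": 5}
after = simulate_feedback(initial, k=2, rounds=5)
```

In this sketch only the tracks that are already at the top of the chart gain plays, so the gap between hits and niche tracks widens with every round.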

Spotify’s recommendation system (Jacobson et al. 2016) is a mix of content-based and collaborative filtering that exploits information about users’ past behaviour (e.g. liked, skipped, and re-listened songs), the behaviour of similar users, as well as data collected from the users’ social media and other online activities, or from blogs. Deezer uses a similar system that is boosted by the acquisition of – big data created from user comments are used to understand the mood of the songs, for example.

YouTube, which plays an even larger role in music discovery, uses a system composed of two neural networks: one for candidate generation and one for ranking. The candidate generation deep neural network works on the basis of collaborative filtering, while the ranking system is based on content-based filtering and a form of utility ranking that takes into consideration the user’s languages, for example (Covington, Adams, and Sargin 2016).
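
The two-stage structure described above can be sketched without any neural network. The example below is only an illustration of the idea of candidate generation followed by ranking: it replaces the deep neural networks with simple set operations and a sort, and all user names, track ids, and language labels are hypothetical.

```python
# Hypothetical listening histories: user -> set of track ids.
histories = {
    "ann": {"t1", "t2", "t3"},
    "bob": {"t1", "t2", "t4"},
    "cid": {"t3", "t5", "t6"},
    "eva": {"t7"},
}
# Hypothetical track metadata used by the ranking stage.
track_language = {
    "t1": "en", "t2": "en", "t3": "sk", "t4": "sk",
    "t5": "en", "t6": "sk", "t7": "en",
}


def generate_candidates(user, histories):
    """Collaborative step: tracks played by users who share a track with `user`."""
    own = histories[user]
    candidates = set()
    for other, tracks in histories.items():
        if other != user and own & tracks:
            candidates |= tracks - own  # only tracks the user has not played
    return candidates


def rank_candidates(candidates, preferred_language, track_language):
    """Ranking step: tracks in the preferred language first, then by track id."""
    return sorted(
        candidates,
        key=lambda track: (track_language[track] != preferred_language, track),
    )


candidates = generate_candidates("ann", histories)
ranking = rank_candidates(candidates, "sk", track_language)
```

Even in this toy version, the ranking stage can only be as good as the language metadata it is given: a track with a missing or wrong language label cannot be ranked correctly.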

These systems offer a high level of personalization and usually reinforce usage trends, in turn discriminating against users (Werner 2020; Kraemer and Holden 2020). Externally validating these systems, or in YouTube’s case even understanding how they work, would be impossible: YouTube’s system uses so many resources and so much data that it cannot be replicated outside Google’s systems. The deep neural networks are black-box deep learning systems that cannot be fully interpreted by humans.

A commonality across these systems is that they maximize the algorithm creators’ corporate key performance indicators. Spotify wants to be ‘your playlist to life’ and increase the amount of music played in the background during work or sports, during travelling, or in active music listening, i.e. it maximizes the number of hours spent using the service. YouTube and Netflix have similar targets. These are in many ways like the targets of commercial radio stations, which want to maximize the time spent listening to the broadcast stream. Radio stations and YouTube, in particular, have similar goals because they are mainly financed through advertising. For Spotify or Netflix, the key financial motivation is to prevent users from canceling their subscriptions or switching to other providers, such as Apple or Amazon.

Local content guidelines in public broadcasting, or local content requirements informed by quotas set for commercial broadcasting, are similar to utility-based or knowledge-based recommendation systems. A utility-based recommendation system that targets Slovakness, for example, would prefer, of two playlist candidates, whichever has more markers of Slovakness in the nationalities of composers, performers, or lyrical content. In this example, a knowledge-based system knows the language or the nationality of a song and creates mixes with a pre-defined Slovak rate.
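
Both ideas can be sketched under simple assumptions. In the example below, "markers of Slovakness" are reduced to membership in a hypothetical set of track ids, the utility is the share of such tracks in a playlist, and the knowledge-based mix fills a fixed share of slots with local tracks. The function names, track ids, and the 50 percent rate are illustrative only.

```python
def local_share(playlist, local_markers):
    """Share of tracks in the playlist that carry a local-content marker."""
    if not playlist:
        return 0.0
    return sum(1 for track in playlist if track in local_markers) / len(playlist)


def prefer_by_utility(playlist_a, playlist_b, local_markers):
    """Utility-based choice: pick the candidate with the higher local share."""
    share_a = local_share(playlist_a, local_markers)
    share_b = local_share(playlist_b, local_markers)
    return playlist_a if share_a >= share_b else playlist_b


def build_mix(local_tracks, other_tracks, length, local_rate):
    """Knowledge-based mix: fill a pre-defined share of slots with local tracks.

    Assumes both lists are long enough to fill their part of the mix.
    """
    n_local = round(length * local_rate)
    return local_tracks[:n_local] + other_tracks[: length - n_local]


# Hypothetical catalogue.
slovak_tracks = ["sk1", "sk2", "sk3"]
other_tracks = ["x1", "x2", "x3"]

mix = build_mix(slovak_tracks, other_tracks, length=4, local_rate=0.5)
chosen = prefer_by_utility(["sk1", "x1"], ["x1", "x2"], set(slovak_tracks))
```

The sketch makes the dependency on metadata explicit: a track can only count towards the local rate if its nationality or language is recorded in the first place.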

In a recommendation system, many bad outcomes may occur that can eventually lead to lower or no payment for a rightsholder. In these cases, we are not talking about unjustified differences in payment as simulated in Chapter 3 (Differences in Earnings: Simulation Results), but about a systematic breach of non-copyright rules that leads to an unjustifiably lower streaming volume, and therefore to a lower royalty payment:

  • It may recommend too few female or small-country artists, or start recommending artists who use hateful language.
  • It may put certain labels’ music in less visible places.
  • It may make the works of major labels easier to find than those of independent labels.
  • It may put fewer local works and recordings on personalized lists than local content guidelines (applied in about 90 territories of the world, see Stein, Brock, and Inc. 2012) would require from competing local television or radio stations.
  • It may fill your personalized list with misleading track or artist names that hunt for accidental streams, much like click-hunting text content in search engines.
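
Several of the outcomes listed above can in principle be monitored by comparing the share of recommendation slots that a group receives with a benchmark share, such as a local content quota. The following is a minimal sketch of such a check; the recommendation log, the label types, and the 50 percent benchmark are hypothetical, and a real audit would need far more data and a justified benchmark.

```python
from collections import Counter


def exposure_shares(recommended, attribute):
    """Share of recommendation slots per attribute value (e.g. label type)."""
    counts = Counter(attribute[track] for track in recommended)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.items()}


def shortfall(shares, group, benchmark):
    """How far a group's share falls below a benchmark share (0 if it meets it)."""
    return max(0.0, benchmark - shares.get(group, 0.0))


# Hypothetical recommendation log and metadata.
recommended = ["t1", "t2", "t3", "t4"]
label_type = {"t1": "major", "t2": "major", "t3": "major", "t4": "independent"}

shares = exposure_shares(recommended, label_type)
gap = shortfall(shares, "independent", benchmark=0.5)
```

The same comparison could be run on gender, country, or language attributes, provided that these are documented in the metadata.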

These potential undesirable outcomes are sometimes illegal, and they may go against non-discrimination or competition law. They may undermine national or EU-level cultural policy goals. They may make Welsh artists earn significantly less than American or English artists.

In our work in Slovakia, we reverse engineered some of these undesirable outcomes. Spotify’s and YouTube’s recommendation systems have at least three major parts, which employ machine learning on sources that include metadata:

  1. The user’s history. Is it the user’s history that is sexist, or might the training metadata database be skewed against women?
  2. The works’ characteristics. Are Henry Purcell’s works as well documented for the algorithm as Taylor Swift’s or Drake’s?
  3. Independent information from the internet. Does the internet write less about women artists?

More often than not, the biggest problem is that the algorithm is learning from data that is historically biased against women or biased in favour of British and American artists, or from data that is only discoverable in the English language about relatively popular works that journalists have been covering. Metadata plays an incredibly important role in supporting or undermining general music education, media policy, copyright policy, or competition rules. If a streaming provider’s algorithm does not know the music that music educators or parents find suitable for teenagers, then it will not recommend that music to your children. Parental control algorithms may filter out content that is harmful for children, but will not play a more proactive role without the deliberate involvement of public bodies in documenting child-friendly works.

Until now, in most cases, it was assumed that it is the artists’ or their representatives’ duty to provide high quality metadata; and in mass uses, like radio or television broadcasting, collective management organizations were taking care of the process with appropriate data and IT knowledge. But streaming services usually bypass (with some exceptions) collective management organizations. Small organizations and individual, self-publishing authors do not have an appropriate level of data literacy to provide relevant metadata. What’s worse, they cannot scale up the production and validation of metadata. Metadata errors can easily be captured by machine learning algorithms if those algorithms are trained on hundreds of thousands of work/recording records, a precious resource that only large labels, publishers, or collective rights management organizations possess (Senftleben et al. 2021).

In some cases, it is well understood that some public investment is needed to maintain metadata records for each new technological innovation. While the works of Henry Purcell are eternal, played and re-recorded centuries after his death, neither he nor his heirs receive copyright royalties. If we want to correctly document his works for newer and newer AI-driven technologies, somebody must make this investment. Investment in the documentation of cultural heritage, for example by specialist public libraries, is not a novel idea. However, extending this public service to the long tail of living artists, or of heirs who still enjoy copyright protection, may be necessary if we want to give them an equal chance on streaming services. As explored in this report, works often earn only pennies over years, which does not even cover the costs of documenting them. If rightsholders believe that works and recordings in the long tail should have an equal chance to be re-discovered, heard, and paid, then considerable public or philanthropic investment may be needed.


Antal, Daniel. 2021. “Launching Our Demo Music Observatory.” Data & Lyrics. Reprex.
Antal, Daniel, Amelia Fletcher, and Peter Ormosi. 2021. “Music Streaming: Is It a Level Playing Field?” CPI Antitrust Chronicle, February.
Artisjus, HDS, SOZA, and Candole Partners. 2014. “Measuring and Reporting Regional Economic Value Added, National Income and Employment by the Music Industry in a Creative Industries Perspective. Memorandum of Understanding to Create a Regional Music Database to Support Professional National Reporting, Economic Valuation and a Regional Music Study.”
Covington, Paul, Jay Adams, and Emre Sargin. 2016. “Deep Neural Networks for YouTube Recommendations.” In Proceedings of the 10th ACM Conference on Recommender Systems, 191–98. RecSys ’16. New York, NY, USA: Association for Computing Machinery.
Jacobson, Kurt, Vidhya Murali, Edward Newett, Brian Whitman, and Romain Yon. 2016. “Music Personalization at Spotify.” In Proceedings of the 10th ACM Conference on Recommender Systems, 373. RecSys ’16. New York, NY, USA: Association for Computing Machinery.
Kraemer, David, and Steve Holden. 2020. “Spotify, Apple Music, Deezer and YouTube Found Recently Hosting Racist Music.” BBC News.
Senftleben, Martin, Thomas Margoni, Daniel Antal, Balázs Bodó, Stef van Gompel, Christian Handke, Martin Kretschmer, Joost Poort, João Quintais, and Sebastian Felix Schwemer. 2021. “Ensuring the Visibility and Accessibility of European Creative Content on the World Market: The Need for Copyright Data Improvement in the Light of New Technologies.” SSRN, February.
state51 Music Group. 2020. “Written Evidence Submitted by The state51 Music Group. Economics of Music Streaming Review. Response to Call for Evidence.” UK Parliament website.
Stein, Shelley, Sacks Brock, and Chaloux Group Inc. 2012. “On Quotas as They Are Found in Broadcasting Music.” Canadian Radio-television and Telecommunications Commission.
Werner, Ann. 2020. “Organizing Music, Organizing Gender: Algorithmic Culture and Spotify Recommendations.” Popular Communication 18 (1): 78–90.