000049144 000__ 03755cam\a22004455i\4500
000049144 001__ 49144
000049144 003__ SzGeWIPO
000049144 005__ 20240322214818.0
000049144 006__ m eo d
000049144 007__ cr bn |||m|||a
000049144 008__ 240321s2024\\\\nyu\\\\\o\\\\\000\0\eng\d
000049144 035__ $$a(OCoLC)1427545890
000049144 040__ $$aSzGeWIPO$$beng$$erda$$cSzGeWIPO$$dCaBNVSL
000049144 041__ $$aeng
000049144 24500 $$aCopyright Law and the Lifecycle of Machine Learning Models.
000049144 264_1 $$aNew York:$$bSpringer LINK,$$c2024
000049144 300__ $$a1 online resource (pages 110–138)
000049144 336__ $$atext$$2rdacontent
000049144 337__ $$acomputer$$2rdamedia
000049144 338__ $$aonline resource$$bcr$$2rdacarrier
000049144 4901_ $$aInternational Review of Intellectual Property Law & Competition Law ;$$vVolume 55, Issue 1
000049144 520__ $$aMachine learning, a subfield of artificial intelligence (AI), relies on large corpora of data as input for learning algorithms, resulting in trained models that can perform a variety of tasks. While data or information are not subject matter within copyright law, almost all materials used to construct corpora for machine learning are protected by copyright law: texts, images, videos, and so on. There are global policy moves to address the copyright implications of machine learning, in particular in the context of so-called “foundation models” that underpin generative AI. This paper takes a step back, exploring empirically three technological settings through detailed case studies. We set out the established industry methodology of a lifecycle of AI (collecting data, organising data, model training, model operation) to arrive at descriptions suitable for legal analysis. This will allow an assessment of the challenges for a harmonisation of rights, exceptions and disclosure under EU copyright law. The three case studies are: 1. Machine learning for scientific purposes, in the context of a study of regional short-term letting markets; 2. Natural Language Processing (NLP), in the context of large language models; 3. Computer vision, in the context of content moderation of images. We find that the nature and quality of data corpora at the input stage is central to the lifecycle of machine learning. Because of the uncertain legal status of data collection and processing, combined with the competitive advantage gained by firms not disclosing technological advances, the inputs of the models deployed are often unknown. Moreover, the “lawful access” requirement of the EU exception for text and data mining may turn the exception into a decision by rightholders to allow machine learning in the context of their decision to allow access. We assess policy interventions at EU level, seeking to clarify the legal status of input data via copyright exceptions, opt-outs or the forced disclosure of copyright materials. We find that the likely result is a fully copyright-licensed environment of machine learning that may have problematic effects for the structure of industry, innovation and scientific research.
000049144 542__ $$fCC BY 4.0
000049144 588__ $$aCrossref
000049144 590__ $$aPublished online: 2024
000049144 650_0 $$aIntellectual property.$$zEuropean Union.
000049144 650_0 $$aData protections.
000049144 650_0 $$aArtificial intelligence.
000049144 650_0 $$aCopyright.
000049144 650_0 $$aIntellectual property.
000049144 650_0 $$aPatents.
000049144 7001_ $$aKretschmer, Martin,$$eauthor.
000049144 7001_ $$aMargoni, Thomas,$$eauthor.
000049144 7001_ $$aPinar, Oruç,$$eauthor.
000049144 7731_ $$tInternational Review of Intellectual Property Law & Competition Law
000049144 7731_ $$wIIC
000049144 830_0 $$aInternational Review of Intellectual Property Law & Competition Law ;$$vVolume 55, Issue 1.
000049144 85641 $$uhttps://link.springer.com/article/10.1007/s40319-023-01419-3$$yonline version
000049144 904__ $$aArticle
000049144 980__ $$aIIC