AI tools like ChatGPT are built on mass copyright infringement
ZAINAB CHOUDHRY
CONTRIBUTED TO THE GLOBE AND MAIL
PUBLISHED MAY 25, 2023
UPDATED MAY 26, 2023
Zainab Choudhry is a startup founder who has worked in law, technology and media in New York and Toronto.
The old world, wherein only human minds had evolved to create stories, art, music and poetry, is no more.
We have now entered the era of generative artificial intelligence, a type of AI that can create new content based on datasets of existing content it has been fed and trained on. In this new era, generative AI developers are building the minds of these machines by training them on content created by humans of the past and present, from Shakespeare to Atwood, Caravaggio to Koons. Thus far, we have marvelled at the creations that generative AI tools such as ChatGPT have produced, but this use of AI raises crucial ethical and legal questions.
It takes enormous amounts of data to train a generative AI program like ChatGPT, and in order to build these tools cheaply and quickly, developers are committing mass copyright infringement. These datasets are largely created by combing and scraping the internet for every type of content, from articles, books and artwork to our photos and tweets. These methods give rise to some big questions: Is the use of our copyright-protected content for training generative AI models legal? Does the use of copyrighted content for training AI fall under fair-use exceptions in the United States and fair dealing in Canada? Do we have a right to compensation when our work is being fed to the machines?
As a former copyright startup founder equipped with a law degree and a long-standing career at the intersection of intellectual property (IP) law, media and tech, I know the rules broadly boil down to one central tenet: To use someone else's original content, you must get their permission, barring some exceptions. In my opinion, using copyrighted content to train a generative AI, without permission, easily falls under copyright infringement. If you train a generative AI model on the content of a particular painter's or poet's work, or even a singer's voice, the AI can do a pretty good job of replicating the exact content and style of those paintings, poems or vocals in the new works it creates. At its lightning speed, generative AI can train on and write a new book based on an author's work long before the human author ever could.
Continued
highplainsdem
(48,968 posts)
bullimiami
(13,084 posts)
there are no original ideas here.
just the ability to scan a vast repository of existing information and repackage it according to programming.
Pisces
(5,599 posts)
But it is scary nonetheless.
bullimiami
(13,084 posts)
just an advancement in programming.
more processing power and availability of data has made it possible.
Bernardo de La Paz
(48,988 posts)
AIs go beyond their algorithms the same way Einstein went beyond Newton: by making connections that other people did not see.
Liberty Belle
(9,534 posts)
liberal N proud
(60,334 posts)
That might be one of the biggest dangers of AI.
bucolic_frolic
(43,128 posts)
If you ask AI detailed questions, you soon find out it is a bit shallow. Don't expect real thinking or interpretation. I've stumped it several times. It just doesn't "know" what to think because it's a rehash of others' thinking.
Bernardo de La Paz
(48,988 posts)
Pisces
(5,599 posts)
Works and develop things from our own library of knowledge taken from others. These works are purchased at some point and then input or taken from the internet where some works are free. I don't like AI or what can come of it in the future, but I'm not sure copyright infringement will apply.
Bernardo de La Paz
(48,988 posts)
It is as if the writer wants artists to never see a Picasso or Van Gogh or Dali.
BootinUp
(47,141 posts)
to a human. Why? They are as much alike as an amoeba is to a rock.
Bernardo de La Paz
(48,988 posts)
By your logic, it is you who wants a tractor to be like a donkey.
Of course humans and AI are different, duh.
BootinUp
(47,141 posts)
It is as if the writer wants artists to never see a Picasso or Van Gogh or Dali.
Is that you think it should be ok to "train" an AI computer on copyrighted works because humans are trained that way.
Bernardo de La Paz
(48,988 posts)
... you have to train someone or something with copyright-protected content.
Technically, though Picasso, VG and Dali did many or all of their important works before 1927, the photographs of their work are copyright-protected content.
If you won't forbid an African American woman from studying Van Gogh, why would you prevent an AI?
Then flip it. How would you expect an AI to have any knowledge or understanding of Van Gogh's impact without studying his paintings?
How would you expect an AI to have ANY level of comprehension of "Imagine" by John Lennon without hearing the copyrighted recording or without reading the copyrighted lyrics? Or do you expect it to learn everything about it only by reading copyright protected reviews in music magazines and copyright protected Ph.D. theses on popular music?
BootinUp
(47,141 posts)
and then I objected to the idea. Seems like we can leave it there.
Effete Snob
(8,387 posts)
Just give the number of years.
Also, do you know how long copyright lasted when you were born?
Do you know the rationale behind making copyright term limited instead of lasting forever?
BootinUp
(47,141 posts)
to what I was discussing. To be relevant, the subject would have to be about advocating for a change to copyright laws. I was not.
Effete Snob
(8,387 posts)
Which means that you are happy with the current terms.
The post to which you responded mentioned several artists, none of whose work is subject to copyright.
The creation of new works requires the use of existing works, and of course authors, artists and others in the creative arts are familiar with their fields and influenced by works and artists which they have studied to achieve mastery in their fields.
Your point is that this is okay, provided that all of the work used either (a) was produced by someone who died at least 70 years ago or (b) is owned by a corporation and was first published more than 95 years ago. In many places outside the US, your position would be that it should have been produced by someone who died at least 50 years ago.
But your position relies on the prevailing copyright term laws to define what you believe is, or is not, material which can or should be used to train AI models.
BootinUp
(47,141 posts)
I could have stated more carefully why it's not relevant, but I am tired.