It's library conference season!  A week or so after the Kraemer Copyright Conference ended, this librarian's feverish mental buzz – a condition a colleague has dubbed “conference brain”, brought on by non-stop exposure to deep and novel topics – has mostly subsided.  Now it's time to reflect and start discussing what it all means.  Here's a brief dive into a topic that touches on AI, copyright, and state policy; it's going to involve dipping our toes into some legalese, the slipperiest of waters.*

The Kraemer Copyright Conference has been running for 11 years now.  It brings together an international group of librarians to discuss the state of copyright in law and society, and how librarians can work within the law to promote their institutions' missions of access, research, and preservation.  This year, one of the presenters was a colleague at K-State, Gwen Sibley.  The informational nugget of gold that formed the basis of her talk?  It is against official state policy for state employees to enter copyrighted material into generative AI.

The policy Gwen cited is KS OITS POLICY AND PROCEDURES MEMORANDUM 8200.00.  This went into effect last summer and applies to KUMC as a state agency.

The relevant section [emphasis in all blockquoted sections courtesy of the post author]:

9.2.5 Material that is copyrighted or the property of another, shall not be entered as input to generative AI.

That's it.  It reads pretty clearly, even bluntly: no entering copyrighted material, or other people's material, into generative AI.

Of further relevance to KUMC-affiliated folks is the following: 

9.2.3 Restricted Use Information (RUI) shall not be provided when interacting with generative AI. Refer to ITEC Policy 7230A Section 9.16 Account Management - RUI

So let's dig up the definition of RUI in ITEC Policy 7230A:  

Restricted Use Information (RUI): Includes PFI [Personal Financial Information], PII [Personally Identifiable Information], and PHI [Personal Health Information] as defined in this Standard, as well as other regulated data (e.g. tax or criminal justice information) or information agencies designate as Restricted-Use  Information due to their confidential or sensitive nature (e.g. physical or logical security information for state agencies and their systems.

As you can see, there are some things the state really doesn't want you to feed into an AI model, and reasonably so.  The blanket restriction in section 9.2.5, though, casts a much wider net.

Thinking about such things almost inevitably steers my focus to these questions: what are users adopting now, and what are we, as KUMC employees and librarians, using now?  (Even if they, or we, are not fully aware that the tools are AI-based.)  What about tools that transform your data as soon as it's out of your mouth?  Otter.ai, Read.ai, Fireflies.ai, Fathom.  These are all relatively new tools that use AI models to, among other things, auto-transcribe and summarize meetings and documents.  Many reviewers and users – Read.ai specifically came up during the conference – have used these tools and report good results.  Other AI-based tools, such as auto-transcription of Zoom or Teams meetings, have been available for years and are already in wide use everywhere.  There is also already a similar AI-enabled tool specifically for doctors: PatientNotes.

This gets to a very particular copyright concept: when does a work become protected by copyright law?  Well, let's take a look at 17 U.S. Code § 102:

(a) Copyright protection subsists, in accordance with this title, in original works of authorship fixed in any tangible medium of expression, now known or later developed, from which they can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device. Works of authorship include the following categories:
(1) literary works;
(2) musical works, including any accompanying words;
(3) dramatic works, including any accompanying music;
(4) pantomimes and choreographic works;
(5) pictorial, graphic, and sculptural works;
(6) motion pictures and other audiovisual works;
(7) sound recordings; and
(8) architectural works.

For the sake of argument, suppose that each attendee in a recorded meeting owns the copyright to their contribution the very second it is recorded, since the recording fixes their words in a tangible medium of expression.  If that's the case, auto-transcription becomes an immediate no-no under section 9.2.5 of the OITS policy: each attendee's contribution is copyrighted material being entered as input to an AI tool.

Perhaps these uses are all covered by a blanket licensing agreement for the software, but that's a risky assumption for any end user.  (When was the last time you closely read any licensing information?)  Of course: IANAL, IANYL, TINLA.  The points, however, are these:

  1. there is a risk that eventually someone may claim their copyright was infringed by the use of such a software tool;
  2. these tools are already in use by state employees;
  3. state policy makes no accommodation for grandfathering in or exempting current uses of AI-enabled software.

Additionally, section 9.2.3 above would preclude use of tools like PatientNotes by state employees, given the necessary use of PII and PHI in any clinical setting.

These tools raise a lot of tantalizing prospects in healthcare, education, and library contexts: comprehensive and accurate note-taking and meeting summarization; data mining and training of AI using diverse, quality sources (beyond The Pile); and new uses we haven't even begun to conceptualize.  They also raise a multitude of questions: Is the data secure from unauthorized access?  Is the data used for training other models?  Is the data being anonymized and sold to insurers or some other business interest?

And how will all these questions play out in our specific context, a university-affiliated health sciences library?

OK, answering that last one is definitely beyond the scope of a single post.  However, this blog will return to this topic in the coming months.  We'll cover: new developments and refinements in the federal and state regulatory environments regarding AI; recent court cases whose rulings shape the regulatory and statutory environment in which AI and copyright exist; current best practices for assessing the usefulness of AI-enabled tools and the risks of using them; particular tools that will be of interest to medical professionals; and the implications of AI for everyone's favorite copyright topic, fair use.

Hope you stay tuned in the weeks and months ahead; there's lots more to come!

* Keep in mind that, with regard to this essay (and others to come), yours truly is none of the following:

  • an AI expert;
  • a copyright expert;
  • a policy wonk;
  • a lawyer;
  • your lawyer;
  • etc.