...
As the modern tech landscape continues to invest in the development of AI tools such as large language models, their presence on CNU's campus becomes more and more likely for students and faculty alike. In particular, with the development of job opportunities centered around the usage of AI at companies as large as Microsoft, knowing how to navigate such usage in a safe and ethical manner becomes increasingly important. With this in mind, CNU's IT department has developed a set of recommendations for implementing the use of AI tools on campus. The first of these recommendations, for faculty and students alike, is to be aware of the potential issues surrounding the rise of AI in classrooms.
Potential Issues
What IS an AI?
Often the first and largest barrier to using AI safely and ethically in a classroom setting is a fundamental misunderstanding of what AI even is. This is understandable, as the term carries many complicated cultural connotations that can lead users to believe that modern-day tools are capable of more than they actually are. Discussions of modern AI models also involve many terms that are either unfamiliar to new users or used in unfamiliar ways.
Artificial General Intelligence (AGI): A shorthand term for an artificial intelligence model that meets or exceeds human abilities on a broad range of cognitive tasks, and can perform those tasks autonomously. In essence, AGI would be considered intelligent in the same way that humans themselves are intelligent. True AGI does not yet exist, although some researchers regard current models as a significant step towards its eventual creation.
Machine Learning: A computer system that is trained on outside data and then makes predictions extrapolated from that data. In general, the more relevant data the system is exposed to, the more accurate its predictions; exposure to more data is how machine learning systems 'learn'.
Generative AI: Another name for an AI model designed to generate 'new' text, images, code, or other content by drawing on the data used to train it. Some experts consider the term to be a misnomer, as the 'new' content is not wholly new, but recombined from other sources.
Large Language Models: AI models designed to be trained on, and output, natural language text. Examples include ChatGPT and Google's Gemini.
Diffusion Models: A type of AI model most commonly used to generate non-text data. Common diffusion models are designed to produce images based on a text prompt. Many models that output audio or video also use diffusion techniques.
Training Data: Any outside information fed to an AI system to 'teach' it how to respond to prompts.
For other definitions, this article may be a useful starting point.
Unintended Biases and Discrimination
A strong general rule when judging an AI model is that the model cannot create anything it hasn't already seen before, meaning that the output of an AI will always be the result of the data that was used to train it. As a result, some key issues in the output of large AI models can go completely unnoticed by users who assume the model is more intelligent than it is. One such issue is that a model trained on biased data will replicate that bias in its results. For example, if a medical AI used to diagnose skin diseases is trained using only data from white patients, it may struggle to properly identify medical issues in patients of color, and produce fewer correct diagnoses on nonwhite skin. Despite this, users are likely to treat the model's output as purely objective data. Uncritical usage of models trained on biased data can feed into deeply entrenched systems of discrimination, and furthermore mask that discrimination by presenting it as the unbiased truth from a model unaffected by personal bigotry.
Misinformation
A term you may hear often in conversations about AI tools is 'hallucinations'. In the case of AI, a hallucination is when a model confidently outputs incorrect information, either by repeating incorrect information found in its training data or by extrapolating a 'best guess' based on data that does not apply to the situation at hand. This happens because AI models are only able to generate outputs based on the training data they are given, and are not able to differentiate between good and bad data when selecting which information to use. Users who are not aware of this possibility may be fooled into thinking the model is providing real facts, and use and spread this misinformation as a result.
Environmental Impact
As AI models grow in size and are trained with ever larger data sets, the energy costs of using AI grow in turn. Larger models using larger sets of training data consume more energy per query, which raises questions about the environmental impact of widespread AI usage. Environmentally conscious users may find that the benefits of using AI models to generate text do not outweigh these costs in the face of an accelerating climate crisis.
Educational Shortcutting
A common concern among educators is that AI models will enable students to hide a lack of knowledge about a subject they were intended to learn, by generating the work needed to pass the class. For example, a student who does not understand a unit about Shakespeare might, instead of revealing that lack of mastery by writing a weak essay, use an AI language model to generate an essay that displays knowledge the student has not retained. Such 'shortcuts', if not properly managed, carry the risk of making students dependent on tech tools they do not understand rather than learning the knowledge and skills they need for themselves. This can have dangerous consequences if those students enter fields where that knowledge is required to perform a job properly and safely.
Training Data, Privacy and Data Governance
The most important part of any AI model is the data set it is trained on, as that data is the source of any possible outputs it could generate. Robust AI models require a large amount of data for training, and their developers actively seek out more data. However, this can lead to ethical and legal issues regarding that data. Many models have been found to be trained on proprietary work 'scraped' from artists who explicitly don't want their work used in this way, potentially violating intellectual property and copyright laws in the process. Other models have been found to violate other, more serious laws in how their data was obtained; well-known examples include AI models trained on confidential personal information, up to and including personal messages, private files and even medical records.
While these issues are still being resolved, using these models creates a non-zero possibility that AI-generated outputs may contain snippets of illegally obtained information, which creates multiple opportunities for faculty and students alike to violate Academic Integrity policies entirely by accident.
Transparency and Oversight
The above issues can be compounded by a lack of transparency on the part of the company hosting an AI tool. Corporations may obscure or outright hide the source of their training data, as well as information such as who is responsible for the AI's development and management. This can make it difficult for users to obtain clear answers on key questions regarding safeguards against misinformation, examination of potential biases, or even legal issues such as data theft. It is the opinion of this department that transparency is necessary in order to ensure any AI tool is being developed ethically and responsibly, and that AI tools that operate with a lack of transparency or clear oversight should be avoided until such transparency and oversight are provided.
Beyond awareness of these issues, we have additional recommendations for both faculty and students considering using AI tools in the classroom.
For Faculty
Create Clear Expectations
...
At this time, there are no tech tools that can reliably detect when a piece of text was generated using an AI tool. As a result, we cannot currently recommend using tools that claim to do so to check student work. We instead recommend that faculty use manual methods to investigate student work, such as checking the accuracy of the text and noting sudden stylistic changes. Depending on the context and submission method of the assignment, the IT department may investigate text submitted into Scholar to provide further evidence, but this may also be inconclusive.
Carefully Vet Suggested Tools
AI models are trained on large sets of outside data, which can come from a variety of places depending on the type of model. This data can be the source of multiple kinds of unethical and even illegal outcomes, including unintended biases, copyright violation, and outright theft of confidential data such as medical records. Multiple organizations are working to enforce transparency regarding the sourcing of data used to train popular AI models such as ChatGPT; this work has included the filing of multiple lawsuits over models accused of relying on data scraped from unethical or illegal sources. To ensure that campus work does not unintentionally replicate these past mistakes, we recommend that all users research the history of the intended AI model's training data, and avoid using models that do not provide information about where that data comes from. We also recommend that faculty warn students about AI models that have either current issues or a record of unethical data sourcing.
For Students
Check Classroom Guidelines
...