Artificial Intelligence and the Mental Health Space: Current Failures and Future Directions

October 31, 2023

Over the last decade, we have seen a steady rise in the use and study of artificial intelligence in the mental health space. Among the algorithms being explored, large language models (LLMs) may have the highest potential to give clinicians and researchers a new way of understanding the onset and progression of symptoms and diagnoses. However, these algorithms have also prompted important conversations about how we approach the treatment of patients and the training of clinicians, and they raise novel complications relating to data privacy and security. A few questions that all AI researchers in the mental health space should consider are:

  • While LLMs can never replace traditional therapy or crisis lines, could they be used as a support tool? If so, could LLM-enhanced therapy be more useful or acceptable in certain settings? 

  • If we employ the aid of LLMs in the therapeutic process, who is ethically responsible for making clinical decisions in situations where AI replaces human decision-making? 

  • In what ways could these models contribute to bridging the gap between mental health needs and available resources in settings and populations with the greatest needs? 

  • Finally, what data privacy issues and model bias concerns need to be addressed before these models can be implemented safely?

While AI, and LLMs in particular, have the potential to be impactful in mental health, it is important for researchers to be mindful of the unique intricacies of psychopathology and its progression within individuals. We must consider where the field stands, where it could go, and what has failed thus far if we have any hope of making a safe impact.

Where the field stands and where we could be going

As of now, the AI space is very focused on trying to create chatboxes¹ to replace traditional therapy. However, these models as they currently stand will never be able to replace clinicians, given the intricacy required to treat and diagnose severe mental illness.

Instead, the strength of these models lies in their usefulness in supporting and complementing traditional therapy. There is a significant need worldwide for properly trained clinical staff. Most clinical programs in the United States are underfunded, which leaves students with the burden of obtaining appropriate training, and licensure laws limit what trainees are allowed to treat. One way in which LLMs could be very helpful is in supplementing the training of clinicians and non-specialists. Often, clinicians are trained by role-playing with their peers as a means of learning how to interact with clients presenting a range of mental health issues. Here, LLM chatboxes could pose as clients to help trainees learn how to respond to and interact with a wide range of individuals and issues. This approach could also be effective for training non-specialists or mandated reporters (e.g., teachers, CPS workers) so they can practice skills to help individuals who may be under their care. LLMs allow these individuals to access a range of complex cases and obtain feedback at their own pace.

Another way in which LLMs may be useful is in pre-screening patients and taking notes for clinicians. Similar to the automated phone calls already in frequent use, LLMs have the potential to streamline the process of collecting data from potential patients in order to screen them and find a suitable provider. At the same time, note-taking and record keeping place a significant burden on clinicians, particularly clinicians in training, preventing them from providing care to additional patients. Using AI tools to streamline this process would give clinicians more time for direct patient care while still maintaining appropriate records. While these models could relieve some of the pressure that the mental healthcare process places on patients and providers, there remains a potential legal issue of how this data would be maintained. Data breaches involving these models could pose a significant risk to providers and patients. Additionally, there are significant issues of model bias, which could be harmful in how these models portray the patients they are interacting with. LLM bias must be thoroughly explored, and the models fine-tuned to reduce it, before such an approach could be implemented safely.

A still untapped area where these models could make a significant impact is research on the etiology, or cause(s), and progression of mental illnesses. For example, using these models to rate interview assessments, like the Structured Clinical Interview for DSM-5, could give a new perspective on how these issues arise and progress throughout the lifespan by directly exploring the language used by patients and subjects rather than relying on an interviewer-defined rating scale. In fact, these models could be used to create new rating scales based on the language people use to describe their symptoms, which, used in conjunction with traditional scales, could give us a clearer understanding of psychopathology. We could also use LLMs in combination with ecological momentary assessment (EMA) vlogs or written text, which are data recorded several times throughout the day, usually 2-4 times, for at least 10-14 consecutive days. EMA text and spoken word give us unique access to real-time information about how symptoms progress, or how new ones develop, on a moment-by-moment basis. By understanding patterns in how individuals speak about their feelings or day-to-day events, we could clarify individual models of psychopathology that help us tailor care to their unique needs.

Finally, these models could also be used to create novel measures of psychopathology commonly used in research to help understand individual and group differences in thoughts, feelings, behaviors, and emotions. There is potential for using language as an alternative to traditional item response theory approaches to questionnaire building to help researchers discover new ways to measure mental health and think about the underlying structure of psychopathology. 

Current examples of AI used for mental health services

Currently, AI work in the mental health space has been wrongly focused on trying to build LLMs that can be used as an alternative to traditional therapy. In fact, there are already several cases of attempts to do this that failed dangerously.

For example, the National Eating Disorders Association (NEDA) had to remove its crisis chatbox Tessa, built by a group of researchers at Washington University with help from a startup, because it was consistently giving harmful information to its users (1). Individuals who used the service for help with their active eating disorders were being given advice that is well known to be dangerous for people dealing with those kinds of mental health issues. Other mental health AIs like Koko, co-founded by Rob Morris and using GPT-3, have also been found to give dangerous and false help. The founder himself gave a long statement on Twitter about the company's mistakes and the dangers of this approach (2). An AI chatbox known as Eliza, built by Chai on the open-source GPT-J model, is being blamed for the suicide of a man in Belgium after it gave him harmful help and even seemed to suggest that he kill himself (3).

All in all, these cases show that AI chatboxes are very far from being useful alternatives to real mental health care, and they will most likely never be fine-tuned enough to become so, given the sheer intricacy of individual experiences with mental illness.

Thinking into the future

Future directions of AI for the mental health world should instead think creatively about how we can support providers rather than replace them. One area where AI could be extremely valuable is the early identification of risk, as well as the study of etiology and of the progression of illnesses and individual symptoms.

As a PhD student in the department of psychology here at Berkeley, my own research focuses on the use of language models to explore the identification of risk and the etiology of externalizing psychopathology, such as personality disorders and substance-related issues. Currently, my work explores the language that individuals with alcohol and cannabis use disorders use to speak and think about their habits, with the aim of creating individual models of substance-use-related language. I hope that my work and the work of my colleagues will inspire other AI researchers to shift away from direct, treatment-oriented research and use their skills to help alleviate the pressures of the mental health process for patients, providers, and clinical researchers in other ways.


  1. McCarthy, Lauren. 2023. “A Wellness Chatbot Is Offline After Its ‘Harmful’ Focus on Weight Loss.” The New York Times.
  3. Lovens, Pierre-François. 2023. “‘Sans ces conversations avec le chatbot Eliza, mon mari serait toujours là.’” La Libre.be.


¹ In the field of psychology, it is usually preferred to refer to these algorithms as chatboxes instead of bots because it seems to make individuals, particularly patients, feel more comfortable with the idea of interacting with these algorithms.