Using Artificial Intelligence to Help Write Code

February 28, 2023

What is ChatGPT?

ChatGPT is a natural language processing (NLP) model that has emerged as a powerful tool for a wide variety of use cases. Created by OpenAI using a deep learning model trained through reinforcement learning from human feedback, ChatGPT is able to understand and respond to questions and prompts in a human-like manner. ChatGPT has been trained on massive datasets of text from the internet, including web pages, books, articles, and the entirety of Wikipedia. Users can also place constraints or specifications on ChatGPT’s responses. Here is an example of ChatGPT responding to a prompt that limits the number of sentences in the reply:

 "ChatGPT is a large language model created by OpenAI, trained on a massive corpus of text to understand and generate human-like language. It can be used to answer..."

How can we use ChatGPT to help us write Stata code?

As a language model, ChatGPT has been trained on a large body of text that includes code written in many programming languages. As such, it has various use cases within programming, such as providing examples of code for specific tasks, correcting syntax errors, and suggesting ways to optimize code. The following sections explore ways that ChatGPT can be used in scenarios that frequently arise when programming in Stata. For these examples, we will use census.dta, a dataset shipped with Stata, which can be loaded as follows:

clear all
sysuse census.dta

This dataset, which is included when Stata is installed, contains the following information for each of the 50 states in the United States:

  • state – state name

  • state2 – two-letter state abbreviation

  • region – census region

  • pop – population

  • poplt5 – population, under 5 years

  • pop5_17 – population, 5 to 17 years

  • pop18p – population, 18 and older

  • pop65p – population, 65 and older

  • popurban – urban population

  • medage – median age

  • death – number of deaths

  • marriage – number of marriages

  • divorce – number of divorces
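
The variable list above can also be produced from within Stata, which gives us text that is easy to paste into a prompt. The following is a minimal sketch using standard Stata commands; the choice of describe and ds is ours, and any command that lists variable names and labels would serve the same purpose.

```stata
* Load the shipped dataset, replacing anything currently in memory
sysuse census, clear

* Show variable names, storage types, and variable labels
describe

* Print only the variable names, which is convenient for pasting into a prompt
ds
```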

To make it easier to ask ChatGPT for code that uses this dataset, we can give it the list of variables as follows:

A user asks ChatGPT to use a set of variables in Stata code, and ChatGPT asks what kind of task or analysis to complete.

Proposing Code for Specific Tasks

For folks learning how to code, getting started on a specific problem can often be the hardest step. ChatGPT can be used to support first attempts at coding for specific tasks. We begin with a relatively simple task: calculating the death rate for each state, which is the number of deaths per 100,000 people:

The user asks for a variable equal to the death rate per 100,000. ChatGPT responds with the Stata code as well as an explanation of each element of code.
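
The reply in the screenshot is not reproduced here, but a minimal sketch of the kind of one-line solution this prompt calls for might look like the following. The variable name death_rate and its label are our own choices for illustration, not necessarily what ChatGPT returned.

```stata
* Load the shipped dataset
sysuse census, clear

* Deaths per 100,000 people: divide deaths by population, then scale
* (death_rate is a hypothetical name chosen for this sketch)
generate death_rate = (death / pop) * 100000
label variable death_rate "Deaths per 100,000 population"

* Quick check of the result
summarize death_rate
```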

In this example, ChatGPT is able to understand the input prompt, consider the variable names that were initially provided, and propose accurate and concise code to accomplish the requested task. If the suggested code does not run or does not accomplish the task, ChatGPT can also be asked to write alternative code:

A user asks ChatGPT to write the code it provided in an alternative way. It does so and explains the elements as well as reasoning for using this alternative method.
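
There are many equivalent ways to write this calculation, and we do not know which one ChatGPT chose in the screenshot. As one plausible alternative, the sketch below computes the rate per person first and then scales it. The names deaths_per_person and death_rate_alt are hypothetical.

```stata
* Load the shipped dataset
sysuse census, clear

* Step 1: deaths per person, stored in double precision
generate double deaths_per_person = death / pop

* Step 2: rescale to deaths per 100,000 people
generate double death_rate_alt = deaths_per_person * 100000
label variable death_rate_alt "Deaths per 100,000 population"
```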

As in the above examples, ChatGPT will not only provide the requested code, but also a description of what each component of said code accomplishes. As such, ChatGPT can also serve as an educational tool for those seeking to further their programming skills and knowledge. 

Correcting Syntax

Another common problem that arises when programming in Stata is determining the correct syntax. Iterating upon the previous example, we can ask ChatGPT to correct code that appears to be incorrect as follows:

The user asks ChatGPT to correct Stata code that accidentally uses an X for multiplication instead of an asterisk. ChatGPT corrects the code appropriately.

As shown in this example, ChatGPT will provide remediated syntax as well as indicate exactly how and why it corrected the line of code that we provided. 
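
To make the correction concrete, here is a sketch of the kind of before-and-after described above. We assume the faulty line was the death rate calculation with an X in place of the asterisk. The capture noisily prefix lets the incorrect line display its error without stopping the rest of the code.

```stata
* Load the shipped dataset
sysuse census, clear

* Incorrect: X is not a Stata operator, so this line produces an error
capture noisily generate death_rate = (death / pop) X 100000
display "Return code from the incorrect line: " _rc

* Corrected: Stata uses an asterisk for multiplication
generate death_rate = (death / pop) * 100000
```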

Optimizing Code

Even functional code can be improved to be more readable and computationally efficient. In the following example, we ask ChatGPT to optimize code that generates the death rate per 100,000:

The user asks ChatGPT to make a pre-written block of Stata code more efficient. ChatGPT replies with a single line of code after complimenting the efficiency of the original code.

The input code above generates the death rate in three distinct steps: it creates an empty variable, assigns a value to that variable, and then generates the death rate by dividing the total number of deaths by the newly created intermediate variable. As in the syntax-correction example, ChatGPT provides the code and explains why its proposed solution is more efficient. Although this is a fairly simple example, it illustrates how ChatGPT can be used to streamline more complex processes, saving valuable computing resources and time.
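
The exact code in the screenshot is not reproduced here. The sketch below shows what a three-step version and its one-line equivalent might look like, assuming the intermediate variable holds the population in units of 100,000. The names pop_100k, death_rate_3step, and death_rate are hypothetical.

```stata
* Load the shipped dataset
sysuse census, clear

* Three-step version: empty variable, fill it, then divide
generate double pop_100k = .
replace pop_100k = pop / 100000
generate double death_rate_3step = death / pop_100k

* One-line version: same result with no intermediate variable
generate double death_rate = death / pop * 100000

* The two versions agree up to floating-point rounding
assert reldif(death_rate_3step, death_rate) < 1e-8
```

The one-line version avoids storing an extra variable, which matters little for 50 observations but adds up in large datasets.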

Iterating on Requests to Fix Erroneous Responses

ChatGPT does not always provide working code on a first pass, so we may need to iterate on our requests to get to the right answer. The following example asks ChatGPT to split the states into two groups and apply a statistical test to determine whether the group means are statistically different from each other:

ChatGPT provides a line of code that will trigger an error message in Stata.

However, when we run this code in Stata, the following error message appears:

A snapshot of Stata returning an error code that says "more than 2 groups found, only 2 allowed, r(520);"

We can then ask ChatGPT to alter its response to account for this error message as well as any additional information we can provide:

A user notifies ChatGPT of the error in the code, providing the wording of the error code from Stata. ChatGPT provides solutions in the form of re-written and additional lines of code.
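
The screenshots do not tell us which outcome variable or grouping ChatGPT used, so the following is only a sketch of this kind of fix. It assumes the test was a t test of median age (medage) and that the two groups are Southern states versus all other states. The names region_name and south are hypothetical.

```stata
* Load the shipped dataset
sysuse census, clear

* This fails because region has more than two categories
capture noisily ttest medage, by(region)

* Build a two-category grouping variable from the region labels
decode region, generate(region_name)
generate byte south = (region_name == "South")
label variable south "State is in the South census region"

* The t test now runs because south takes exactly two values
ttest medage, by(south)
```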

What are the downsides to using ChatGPT?

As with any tool, the utility of ChatGPT is largely determined by its user. Although ChatGPT represents a major advancement in the development and use of artificial intelligence, one must be prepared to work around its various limitations in order to utilize it to its fullest potential. 

First, ChatGPT is not infallible. As shown in an earlier example, it is capable of providing incorrect responses that seem correct at first glance. These erroneous responses can be limited by providing as much information as possible at the outset of the request. In the earlier example, ChatGPT was not aware that region included more than two categories because we neglected to provide that information.
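
One way to gather this kind of information before writing a prompt is to inspect the categorical variables first. The sketch below uses standard Stata commands to show how many categories region has and what they are called; the output can then be pasted into the request.

```stata
* Load the shipped dataset
sysuse census, clear

* Show each category of region and how many states fall in it
tabulate region

* Show the numeric codes behind the value labels
label list
```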

The nature of ChatGPT as a language model also presents structural barriers that contribute to its shortcomings. For instance, because it was trained on data only through 2021, it is not an appropriate source for up-to-date information such as newly developed statistical packages.

Finally, some would argue that overreliance on ChatGPT can be a barrier to learning. Because it synthesizes answers from a large body of text, using ChatGPT extensively for research limits opportunities to engage directly with the source material, which obscures important context and can result in spurious insights. It can also undermine learning outcomes in the classroom: students may be tempted to use ChatGPT to complete assignments, which can disrupt the process of fully understanding the material. One should always inform professors, TAs, and other authority figures when using ChatGPT in an academic setting.

Based on its increasingly widespread adoption in a variety of applications, it appears that ChatGPT and its forthcoming competitors are here to stay. While users should certainly consider it as a tool to assist with writing code and other programming needs, they should take the necessary precautions with this new technology, including fact-checking its output and citing it as a source when appropriate. Additionally, ChatGPT can be used to assist with other programming languages such as R and Python and can also be asked to translate code across languages.