What is ChatGPT?
ChatGPT is a natural language processing (NLP) model that has emerged as a powerful tool for a wide variety of use cases. Built by OpenAI on a deep learning model and fine-tuned with reinforcement learning from human feedback, ChatGPT is able to understand and respond to questions and prompts in a human-like manner. ChatGPT has been trained on massive datasets of text from the internet, including web pages, books, articles, and the entirety of Wikipedia. Users can also place constraints or specifications on ChatGPT’s responses. Here is an example of ChatGPT understanding and responding to a prompt with a constraint on the number of sentences in the reply:
How can we use ChatGPT to help us write Stata code?
ChatGPT’s training data includes a vast body of code written in many programming languages. As such, it has various use cases within programming, such as providing examples of code for specific tasks, correcting syntax errors, and suggesting ways to optimize code. The following sections explore ways that ChatGPT can be used in scenarios that frequently arise when programming in Stata. For these examples, we will use the census.dta dataset that ships with Stata, which can be loaded as follows:
clear all
sysuse census.dta
This dataset, which is included when Stata is installed, contains the following information for 50 states in the United States:
- state – state name
- state2 – two-letter state abbreviation
- region – census region
- pop – population
- poplt5 – population, under 5 years
- pop5_17 – population, 5 to 17 years
- pop18p – population, 18 and older
- pop65p – population, 65 and older
- popurban – urban population
- medage – median age
- death – number of deaths
- marriage – number of marriages
- divorce – number of divorces
To make it easier to ask ChatGPT for help writing code for this dataset, we can give it the list of variables as follows:
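Rather than typing the variable list by hand, we can have Stata print it and paste the output into the prompt. The following is a minimal sketch using built-in commands; the choice of these two forms of describe is ours and is not taken from the original prompt:

```stata
* Load the shipped dataset
sysuse census, clear

* Compact list of variable names only
describe, simple

* Full listing: names, storage types, and variable labels
describe
```

The fuller output includes the variable labels, which gives ChatGPT the same descriptions as the list above.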
Proposing Code for Specific Tasks
For folks learning how to code, getting started with solving a specific problem can often be the hardest step. ChatGPT can be used to support first attempts at coding for specific tasks. We begin with the relatively simple task of calculating the death rate for each state, which is the number of deaths per 100,000 people:
In this example, ChatGPT is able to understand the input prompt, consider the variable names that were initially provided, and propose accurate and concise code to accomplish the requested task. If the suggested code does not run or does not accomplish the task, ChatGPT can also be asked to write alternative code:
As in the above examples, ChatGPT will not only provide the requested code, but also a description of what each component of said code accomplishes. As such, ChatGPT can also serve as an educational tool for those seeking to further their programming skills and knowledge.
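For reference, the death rate calculation itself is a single line in Stata. This is a sketch of one reasonable answer, not a transcript of ChatGPT's response, and the variable name deathrate is our own choice:

```stata
sysuse census, clear

* Deaths per 100,000 residents in each state
generate deathrate = death / pop * 100000
label variable deathrate "Deaths per 100,000 population"

* Inspect the first few states
list state deathrate in 1/5
```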
Correcting Syntax
Another common problem that arises when programming in Stata is determining the correct syntax. Building on the previous example, we can ask ChatGPT to correct code that appears to be incorrect as follows:
As shown in this example, ChatGPT will provide corrected syntax and indicate exactly how and why it changed the line of code that we provided.
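To make the idea concrete, here is a hypothetical broken line of the kind one might submit, followed by a corrected version. The specific mistake (writing == where = is required) is our own illustration and is not necessarily the error from the exchange above:

```stata
sysuse census, clear

* Broken (left commented out so the block runs): == tests equality,
* so generate rejects it where an assignment is expected
* generate deathrate == death / pop * 100000

* Corrected: a single = assigns the expression to the new variable
generate deathrate = death / pop * 100000
```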
Optimizing Code
Even functional code can be improved to be more readable and computationally efficient. In the following example, we ask ChatGPT to optimize code that generates the death rate per 100,000:
The input code above generates the death rate in three distinct steps: it creates an empty variable, assigns a value to that variable, then generates the death rate by dividing the total number of deaths by the newly created intermediate variable. As in the syntax correction example, ChatGPT will provide the code while also explaining exactly why its proposed solution is more efficient. Although this is a fairly simple example, it illustrates how ChatGPT can be used to optimize more complex processes, saving valuable computing resources and time.
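The three-step pattern described above, and a one-line equivalent, might look like the following. This is a sketch reconstructed from the description rather than the exact code from the exchange; the names pop100k, deathrate_long, and deathrate are hypothetical:

```stata
sysuse census, clear

* Three-step version: empty variable, fill it, then divide
generate pop100k = .
replace pop100k = pop / 100000
generate deathrate_long = death / pop100k

* One-step version: same quantity with no intermediate variable
generate deathrate = death / (pop / 100000)

* The two agree up to floating-point rounding
assert reldif(deathrate, deathrate_long) < 1e-5
```

Dropping the intermediate variable also keeps the dataset free of a column that is never used again.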
Iterating on Requests to Fix Erroneous Responses
ChatGPT does not always provide correctly working code on a first pass, so we may need to iterate on our requests to get to the right answer. The following example asks ChatGPT to split the states across two groups and apply a statistical test to determine whether the means of the two groups are statistically different from each other:
However, when inputting this code into Stata, the following error message appears:
We can then ask ChatGPT to alter its response to account for this error message as well as any additional information we can provide:
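A sketch of how such a fix might look, assuming the failure came from passing the four-category region variable to ttest, which compares exactly two groups. The South versus non-South split and the variable names deathrate and south are our assumptions, not the grouping used in the original exchange:

```stata
sysuse census, clear
generate deathrate = death / pop * 100000

* region has more than two categories, so a two-group t test on it fails
* (left commented out so the block runs):
* ttest deathrate, by(region)
tabulate region

* Hypothetical two-group split: South versus all other regions.
* Assumes region code 3 is South; confirm with: label list
generate byte south = (region == 3) if !missing(region)

* Two-sample t test of mean death rate across the two groups
ttest deathrate, by(south)
```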
What are the downsides to using ChatGPT?
As with any tool, the utility of ChatGPT is largely determined by its user. Although ChatGPT represents a major advancement in the development and use of artificial intelligence, one must be prepared to work around its various limitations in order to utilize it to its fullest potential.
First, ChatGPT is not infallible. As shown in an earlier example, it is capable of providing incorrect responses that seem correct at first glance. These erroneous responses can be limited by providing as much information as possible at the outset of the request. In the earlier example, ChatGPT was not aware that region included more than two categories because we neglected to provide that information.
The nature of ChatGPT as a model trained on a fixed body of text also presents structural barriers that contribute to its shortcomings. For instance, because its training data extends only through 2021, it is not an appropriate source for up-to-date information such as newly developed statistical packages.
Finally, some would argue that overreliance on ChatGPT can be a barrier to learning. Because it synthesizes answers from a large body of text, using ChatGPT extensively for research limits opportunities to engage directly with the source material, which obscures important context and can result in spurious insights. It can also hinder learning outcomes in the classroom: students may be tempted to use ChatGPT to complete assignments, which can disrupt the process of fully understanding the material. One should always inform professors, TAs, and other authority figures when using ChatGPT in an academic setting.
Based on its increasingly widespread adoption in a variety of applications, it appears that ChatGPT and its forthcoming competitors are here to stay. While users should certainly consider it as a tool to assist with writing code and other programming needs, they should take the necessary precautions with this new technology, including fact-checking its output and citing it as a source when appropriate. Additionally, ChatGPT can be used to assist with other programming languages such as R and Python and can also be asked to translate code across languages.