8 ChatGPT tools for R programming

ChatGPT can answer questions about a wide range of technology subjects, including how to write R code. That means ChatGPT’s power is available to every R programmer, even those who know little about large language models.

An ecosystem is forming around ChatGPT and R, making it easy to incorporate AI technology into your R language workflow. But before you begin using LLMs and related tools for your R projects, there are a few important things to keep in mind:

  1. Everything you ask using these tools gets sent to OpenAI’s servers. Don’t use ChatGPT tools to process sensitive information.
  2. ChatGPT may confidently return incorrect answers. Even incorrect responses can be a starting point and save you time, but don't assume the code will do exactly what you expect. Kyle Walker (an associate professor at Texas Christian University and author of the popular tidycensus R package) tweeted that ChatGPT can "supercharge your work if you understand a topic well," or it can leave you "exposed for not knowing what you are doing." The difference is in knowing when the AI output isn't right. Always check ChatGPT's responses.
  3. ChatGPT can also generate different responses to the same query—and some answers might be accurate while others aren’t. For instance, when I asked multiple times for a ggplot2 bar chart with blue bars, the code generated a graph with blue bars sometimes but not others, even though I submitted the same request each time. This is less than ideal if you need a reproducible workflow.
  4. LLMs have training data cutoff dates, so if there’s been a recent update to a package you’re using, your tool of choice may not know about it.
  5. Most of the resources in this article require you to have your own OpenAI API key, and the API isn’t free to use. While pricing is low at the moment, especially for GPT-3.5, there’s no guarantee it will stay that way.
  6. Asking ChatGPT for coding help is unlikely to ensnare you in the ethics of AI racial and gender bias. However, there are heated discussions about the wisdom of furnishing OpenAI with yet more data, the ethics of how the training data was scraped and repurposed, and whether it's better to use open source LLMs (such as Mistral's) rather than OpenAI's. Those dilemmas are for every individual and organization to wrangle for themselves. That said, as of this writing, there simply aren't R-specific tools for other LLMs that are comparable to those building up around ChatGPT.
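Because API charges scale with the number of tokens sent and received (point 5), a rough budget is simple arithmetic. The sketch below is a hypothetical helper, not part of any package in this article. It assumes separate per-1,000-token prices for prompt (input) and completion (output) tokens, and the default prices are placeholders rather than OpenAI's actual rates, so pass in current numbers from OpenAI's pricing page.

```r
# Hypothetical helper: estimate the dollar cost of API calls from token counts.
# The default per-1,000-token prices are PLACEHOLDERS, not OpenAI's real rates.
estimate_cost <- function(prompt_tokens, completion_tokens,
                          input_price_per_1k = 0.0015,
                          output_price_per_1k = 0.002) {
  (prompt_tokens / 1000) * input_price_per_1k +
    (completion_tokens / 1000) * output_price_per_1k
}

# Four hypothetical queries, each sending 500 tokens and receiving 500 tokens
total <- sum(estimate_cost(rep(500, 4), rep(500, 4)))
total
```

The function is vectorized, so a log of many requests' token counts can be summed in one call.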

Now, let’s look at some of the most notable R-focused resources for using OpenAI’s GPT-3.5 and GPT-4 LLMs.

8 ChatGPT packages and apps for R

  • The air package
  • TheOpenAIR package
  • RTutor
  • CodeLingo
  • askgpt
  • gptstudio
  • gpttools
  • gptchatteR

The air package

The air R package aims to create GitHub Copilot Chat-like functionality for R users with two functions: howto() and whatis(). You can ask questions about how to write R code for a specific task using howto(), such as howto("How do I rotate x-axis text on a ggplot2 graph?"). Answers show up in your R console rather than in your R script, which is where GitHub Copilot puts them. That might be a bit less convenient. On the other hand, I found the responses quite good when using GPT-4 as my underlying model.

For example, when asking

howto("Create a bar chart in R where the bars are colored steel blue and the bars are ordered by descending Y value")

air responded with


# Generate example data
set.seed(42)
data <- data.frame(
  Category = factor(paste("Category", 1:10)),
  Value = sample(1:100, 10)
)

# Order data by Value in descending order
data$Category <- factor(data$Category, levels = data$Category[order(data$Value, decreasing = TRUE)])

# Load ggplot2 package
library(ggplot2)

# Create bar chart
ggplot(data, aes(x = Category, y = Value, fill = "steelblue")) +
  geom_bar(stat = "identity") +
  scale_fill_identity() +
  theme_minimal()

GitHub Copilot gave me


# Create a bar chart in R where the bars are colored still blue and the bars are ordered by descending Y value
ggplot(data = diamonds, aes(x = cut, fill = cut)) + 
  geom_bar() + 
  scale_fill_manual(values = "steelblue") + 
  coord_flip() + 
  theme_minimal() + 
  theme(legend.position = "none") + 
  labs(x = "Cut", y = "Count", title = "Bar Chart of Diamond Cut")

The GitHub Copilot code did not work: scale_fill_manual() needs one color for each category, and this code supplies a single color for the five categories of diamond cut. GitHub Copilot uses an OpenAI Codex model for its responses. Copilot offers unlimited use for a monthly fee, as does ChatGPT with the GPT-4 model. Using the OpenAI API within an application like air, however, triggers a charge for each query. Running three or four queries cost me less than a penny, but heavy users should keep the potential charges in mind.
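For comparison, here is a minimal corrected sketch that keeps Copilot's choice of the diamonds data set (which ships with ggplot2). It sets the single steel blue fill as a constant in geom_col(), outside aes(), so no manual scale is involved, and it sorts the bars by descending count explicitly with reorder() instead of relying on the factor's level order. Copilot's coord_flip() is dropped to keep vertical bars; add it back if you prefer horizontal ones.

```r
library(ggplot2)

# Count diamonds in each cut category; naming the table dimension "cut"
# yields a data frame with columns cut and Freq
cut_counts <- as.data.frame(table(cut = diamonds$cut))

# One fixed color goes in geom_col(), not in aes(), so there is no
# one-color-per-category requirement. reorder(cut, -Freq) puts the
# largest count first.
p <- ggplot(cut_counts, aes(x = reorder(cut, -Freq), y = Freq)) +
  geom_col(fill = "steelblue") +
  theme_minimal() +
  labs(x = "Cut", y = "Count", title = "Bar Chart of Diamond Cut")

p  # display the chart
```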

The air package has excellent and elegant setup instructions on its GitHub README page, including a secure way to store your OpenAI key. The air::set_key() command triggers a pop-up window for securely storing the key in your system’s key ring. You can also set the OpenAI model you want to use with set_model() if you don’t want to use the gpt-4 default.
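If you'd rather manage the key yourself instead of using a keyring, a common convention (not specific to air) is an OPENAI_API_KEY line in your ~/.Renviron file, which R reads at startup. Keep in mind that .Renviron is plain text and therefore less protected than a system keyring. The variable name is an assumption; check each package's documentation for the name it expects. The helper below is hypothetical and simply reads the variable, failing with a clear message if it is missing.

```r
# Hypothetical helper (not part of air, TheOpenAIR, or askgpt): read an
# OpenAI key from an environment variable, e.g. one set in ~/.Renviron as
#   OPENAI_API_KEY=your-key-here
get_openai_key <- function(var = "OPENAI_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("No API key found in ", var,
         "; add it to ~/.Renviron and restart R.", call. = FALSE)
  }
  key
}
```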

Note that this package is for R-related questions only and will not respond to questions about other programming languages. You don’t have to specify that you want code in R in your questions; I did that in my example to make the question comparable to what I asked GitHub Copilot.

The air package was created by Professor Soumya Ray at the College of Technology Management, National Tsing Hua University in Taiwan. It is available on CRAN.

TheOpenAIR package

TheOpenAIR package is an excellent choice for incorporating ChatGPT technology into your own R applications, such as a Shiny app that sends user input to the OpenAI API. You can register your key with the openai_api_key("YOUR-KEY") function.

Its chat() function gives you three output options: print results to your console with chat("My request"); save results as text with my_results <- chat("My request", output = "message"); or return a complete API response object with my_results_object <- chat("My request", output = "response object").

The response object is a list that also includes information like tokens used.

Other useful functions include count_tokens() to count the number of ChatGPT tokens a character string will cost when sent to the API, extract_r_code() to get R code from a ChatGPT response that includes a text explanation with code, and get_chatlog_id() to get the ID of the current ChatGPT chat log (useful if you want to break up a complex application into smaller functions).
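TheOpenAIR's extract_r_code() does this extraction for you. To illustrate the idea, here is a minimal base-R sketch; it is not the package's implementation, the function name is hypothetical, and it assumes the reply marks code with Markdown fences labeled r or R, as ChatGPT replies often do.

```r
# Hypothetical helper, not TheOpenAIR's extract_r_code(): return the contents
# of every Markdown code fence labeled r or R in a chat reply
extract_r_blocks <- function(reply) {
  fence <- strrep("`", 3)  # three backticks
  # (?s) lets . match newlines; .*? stops at the first closing fence
  pattern <- paste0("(?s)", fence, "[rR][ \\t]*\\n.*?", fence)
  blocks <- regmatches(reply, gregexpr(pattern, reply, perl = TRUE))[[1]]
  # Strip the opening fence line and the closing fence from each match
  blocks <- sub(paste0("^", fence, "[rR][ \\t]*\\n"), "", blocks, perl = TRUE)
  sub(paste0("\\n?", fence, "$"), "", blocks, perl = TRUE)
}

reply <- paste("Try this:", "```r", "x <- 1:3", "mean(x)", "```",
               "It returns 2.", sep = "\n")
code <- extract_r_blocks(reply)
# Review extracted code before running it; ChatGPT output can be wrong
eval(parse(text = code))
```

A reply with no labeled fences returns an empty character vector, so calling code can check length() before doing anything else.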

The package has some general coding functions, as well. For example, write_code("filename") generates a prompt asking for your input and the language you want the code written in, as shown in Figure 1. The refactor() function is R-specific and does what you'd expect.

Figure 1. Select the language for your generated code in TheOpenAIR. Screenshot by Sharon Machlis for IDG.

There are also functions to convert between Python and R or Java and R, although you may end up with a warning message that the conversion from R to Python could result in invalid Python code.

Run help(package = "TheOpenAIR") in your R console to see its many other functions.

TheOpenAIR package was developed by Assistant Professor Ulrich Matter and PhD student Jonathan Chassot at the University of St. Gallen in Switzerland. It is available on CRAN.

RTutor

This app is an elegant and easy way to sample ChatGPT and R. Upload a data set, ask a question, and watch as it generates R code and your results, including graphics. Although it’s named RTutor, the app can also generate Python code.

RTutor is available on the web. It's currently the only app or package listed that doesn't require an OpenAI API key to use, but you're asked to supply your own key for heavy use so that it isn't billed to the creator's account.

Figure 2. Results when asking RTutor to create a sorted bar chart with steel blue bars for values above 0 and red bars for values below 0. Screenshot by Sharon Machlis for IDG.

The app’s About page explains that RTutor’s primary goal is “to help people with some R experience to learn R or be more productive … RTutor can be used to quickly speed up the coding process using R. It gives you a draft code to test and refine. Be wary of bugs and errors.”

The code for RTutor is open source and available on GitHub, so you can install your own local version. However, the license allows only nonprofit or noncommercial use, or commercial testing. RTutor is a personal project of Dr. Steven Ge, a professor of bioinformatics at South Dakota State University.

CodeLingo

This multi-language app "translates" code from one programming language to another. Available languages include Java, Python, JavaScript, C, C++, PHP and more, including R. This is a web application only, available at https://analytica.shinyapps.io/codelingo. You need to enter your OpenAI API key to use it (you may want to regenerate the key after testing).

Figure 3. ChatGPT in the CodeLingo app attempts to translate ggplot2 graph code to Python, using the pandas and matplotlib libraries. Screenshot by Sharon Machlis for IDG.

A request to translate code for a ggplot2 R graph into JavaScript generated output using the rather hard-to-learn D3 JavaScript library, as opposed to something a JavaScript newbie would be more likely to want, such as Observable Plot or Vega-Lite.

The request to translate into Python, shown in Figure 3, was more straightforward and used libraries I’d expect. However, ChatGPT didn’t understand that “Set1” is a ColorBrewer color palette and can’t be used directly in Python. As is the case for many ChatGPT uses, translating code between programming languages may give you a useful starting point, but you will need to know how to fix mistakes.
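One portable way to carry a ColorBrewer palette across languages is to look up its literal hex codes in R and paste them into the translated code, since matplotlib accepts hex color strings. R 4.0.0 and later include the ColorBrewer qualitative palettes in base grDevices, where Set1 is listed as "Set 1" (see palette.pals()). This is a sketch of one workaround, not what CodeLingo or ChatGPT does.

```r
# palette.colors() is in base R's grDevices package from R 4.0.0 on;
# "Set 1" is its name for ColorBrewer's 9-color Set1 palette
set1 <- unname(palette.colors(9, palette = "Set 1"))

# Print the hex codes as a quoted, comma-separated list that can be
# pasted into a Python list literal
cat(paste0('"', set1, '"', collapse = ", "), "\n")
```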

The app was created by Analytica Data Science Solutions.

askgpt

This package, available at https://github.com/JBGruber/askgpt, can be a good starting point for first-time users who want ChatGPT in their console, in part because it gives some instructions upon initial startup. Load the package with library(askgpt) and it responds with:


Hi, this is askgpt ☺.
• To start error logging, run `log_init()` now.
• To see what you can do use `?askgpt()`.
• Or just run `askgpt()` with any question you want!

Use the login() function without first storing a key, and you’ll see a message on how to get an API key:


ℹ It looks like you have not provided an API key yet.
1. Go to <https://platform.openai.com/account/api-keys>
2. (Log into your account if you haven't done so yet)
3. On the site, click the button + Create new secret key to create an API key
4. Copy this key into R/RStudio

You’ll be asked to save your key in your keyring, and then you’re all set for future sessions. If your key is already stored, login() returns no message.

askgpt's default is to store the results of your query as an object so you can save them to a variable like this one:


barchart_instructions <- askgpt("How do I make a bar chart with custom colors with ggplot2?")

Submit a query and you’ll first see:


GPT is thinking ⠴ 

This way, you know your request has been sent and an answer should be forthcoming—better than wondering what’s happening after you hit Submit.

Along with the package's general askgpt() function, there are a few coding-specific functions such as annotate_code(), explain_code(), and test_function(). Using these means copying and pasting responses back into your source code.

For those familiar with the OpenAI API, the package’s chat_api() function allows you to set API parameters such as the model you want to use, maximum tokens you’re willing to spend per request, and your desired response temperature (which I’ll explain shortly).

The chat_api() function returns a list, with the text portion of the response in YourVariableName$choices[[1]]$message$content. Other useful information is stored in the list, as well, such as the number of tokens used.
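To see how that nested path works without spending tokens, here is a hypothetical stand-in list shaped like an OpenAI chat-completion response; a real result has more fields. The usage element and its total_tokens field follow OpenAI's API response format, but whether askgpt's list stores the token count under exactly that name is an assumption, so inspect your own result with str() first.

```r
# Hypothetical stand-in for a chat_api() result; a real response has more fields
response <- list(
  choices = list(
    list(message = list(role = "assistant",
                        content = "Use geom_col(fill = \"steelblue\")."))
  ),
  usage = list(prompt_tokens = 20, completion_tokens = 8, total_tokens = 28)
)

# Reply text: first element of choices, then message, then content
reply <- response$choices[[1]]$message$content

# Token count (field name assumed from OpenAI's response format)
tokens <- response$usage$total_tokens

cat(reply, "\n")
cat("Tokens used:", tokens, "\n")
```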

The askgpt package was created by Johannes Gruber, a post-doc researcher at Vrije Universiteit Amsterdam. It can be installed from CRAN.

