--- title: "Introduction to mergen" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to mergen} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup,include=FALSE} library(mergen) ``` mergen is a package which employs artificial intelligence to convert data analysis questions into executable code, explanations, and algorithms. The self-correction features allow the generated code to be optimized for performance and accuracy. mergen features a user-friendly chat interface, enabling users to interact with the AI agent and extract valuable insights from their data effortlessly. This document introduces you to mergens basic set of tools, and shows you how to apply them to answer data analysis related questions and generate relevant R code. ### Setting up an AI-agent To be able to interact with an AI agent and use this agent for subsequent tasks, mergen contains the `setupAgent` function for setting up a framework for the agent.Mergen allows you to set up an agent for the openai API platform as well as for the replicate API platform. #### Setting up an openai agent For setting up an agent for the openai API platform, you can make use of the `setupAgent` function by setting the `name="openai"` argument. Let's look how to setting up an agent works: ```{r} myAgent <- setupAgent(name="openai",type="chat",model="gpt-4",ai_api_key = "your_key") myAgent ``` the `setupAgent` function returns a list containing all the agent information which can be used by subsequent functions. mportant to note is that the `ai_api_key` should be your OpenAI API key, provided as a string. #### Setting up an agent for replicate `setupAgent` also contains functionality for setting up an agent for replicate AIs. Let's look at how this works: ```{r} myAgent <- setupAgent(name="replicate",type=NULL,model="llama-2-70b-chat",ai_api_key="my_key") myAgent ``` ### Sending a prompt Once you have set up an agent, it is time to ask some questions to your AI model of choice! For this, you can make use of the `sendPrompt` function, or the `selfcorrect` function. The choice of which one to use depends on whether you want possible errors in the answered code to be corrected by sending another request to the model or not. #### Using the sendPrompt function Sending a prompt with the `sendPrompt` function is very easy. The function takes the arguments `agent`, `prompt`, `return.type` and `context`. By default the context is set to rbionfoExp. This tells your model of choice to act as a bioinformatics expert, and return any code as R code in triple backticks. Your prompt must be given as a string, but can contain any question and additional information that you want to send. The return value is a string containing the models answer. ```{r,eval=FALSE} answer <- sendPrompt(myAgent, "how do I perform PCA on data in a file called test.txt?",return.type = "text") answer ``` ```{r,echo = FALSE } answer <- "\n\nThe following R code will read the file called \"test.txt\", normalize the table and do PCA. First, the code will read the file into an R data frame: \n\n```\ndata <- read.table(\"test.txt\", header = TRUE, sep = \"\\t\")\n```\n\nNext, the data will be normalized to the range of 0 to 1:\n\n```\nnormalized.data <- scale(data, center = TRUE, scale = TRUE)\n```\n\nFinally, the normalized data will be used to do a Principal Component Analysis (PCA):\n\n```\npca <- princomp(normalized.data)\n```" print (answer) ``` #### Using the selfcorrect function Sending a prompt with the selfcorrect function, will allow the possible generated code to be optimized for performance and accuracy. If the code that is returned by the model is not excecutable, the selfcorrect function will send the prompt back to the agent together with a list of errors and warnings, so that the code can be optimized. The amount of rounds of possible selfcorrect can be set by the user using the `attempts = n` argument. The return value is a list containing the initial answer of the agent and the final answer after n rounds of selfcorrection. ```{r,include=FALSE} botResponses <- list( "\n\nThe following R code will read the file called \"test.txt\", normalize the table and do PCA. First, the code will read the file into an R data frame: \n\n```R\ndata <- read.table(\"test.txt\", header = TRUE, sep = \"\\t\")\n```\n\nNext, the data will be normalized to the range of 0 to 1:\n\n```{r}\nnormalized.data <- scale(data, center = TRUE, scale = TRUE)\n```\n\nFinally, the normalized data will be used to do a Principal Component Analysis (PCA):\n\n```{R}\npca <- princomp(normalized.data)\n```", "\n\nThe second response.The following R code will read the file called \"test.txt\", normalize the table and do PCA. First, the code will read the file into an R data frame: \n\n```\ndata <- read.table(\"test.txt\", header = TRUE, sep = \"\\t\")\n```\n\nNext, the data will be normalized to the range of 0 to 1:\n\n```\nnormalized.data <- scale(data, center = TRUE, scale = TRUE)\n```\n\nFinally, the normalized data will be used to do a Principal Component Analysis (PCA):\n\n```\npca <- princomp(normalized.data)\n```", "\n\nThe third response.The following R code will read the file called \"test.txt\", normalize the table and do PCA. First, the code will read the file into an R data frame: \n\n```{r}\nplot(1:10)```\n\nNext, the data will be normalized to the range of 0 to 1:\n\n" ) answer <- list(init.response=botResponses[[1]], init.blocks=extractCode(clean_code_blocks(botResponses[[1]])), final.response=botResponses[[3]], final.blocks=extractCode(clean_code_blocks(botResponses[[3]])), code.works=TRUE, exec.result="path/to/html/file", tried.attempts=3) ``` ```{r,eval=FALSE} answer <- selfcorrect(myAgent, prompt="How do I perform PCA?",attempts=3) ``` ```{r} print(answer) ``` ### Running the code #### Extracting code blocks Once you have sent a prompt and recieved the answer, mergen features a function `extractCode` that allows the user to extract code blocks from the given text. Before using this, however, the code blocks need to be cleaned up, as every agent will return its answer in a slightly different way. This can be done with the help of the `clean_code_blocks` function. Below is an example of what `clean_code_blocks` does with the answer returned by our agent above: ```{r} code_cleaned <- clean_code_blocks(answer$final.response) cat(code_cleaned) ``` As you can see above, `clean_code_blocks` ensures that all code is stripped from extra symbols such as {r}, R, r and {R}. This ensures that the function `extractCode` can extract the code blocks properly. The `extractCode` function takes as input a string, and also allows the user to set a delimiter used to enclose the code blocks (default is three backtics). Now lets have a look at what the `extractCode` function returns: ```{r} final_code <- extractCode(code_cleaned,delimiter = "```") print (final_code) ``` As shown above, `extractCode` returns a list containing the actual code and the associated text. The code block can then be tested for execution using the `executeCode` function. #### Running the code mergen features functions that make it easy for the user to run the code returned by an AI agent. Once code blocks are cleaned up and extracted, code blocks can be executed using the `executeCode` function. Before doing that, however, it is advised to run the `extractInstallPkg` function. This function extracts package names and installs any missing packages needed for the code to run. Finally, the `executeCode` function can be used. Lets see what the `executeCode` function does: ```{r} executeCode(final_code$code) ``` As shown above, the code runs as it should! It is important to note that the `executeCode` function will not change the global environment. Any variables that might be created while executing the code will be deleted as the function completes.