Natural Language to Code Generation in Interactive Data Science Notebooks
[Figure: An example of a computational notebook adapted from our dataset, with examples of reading and preprocessing data.]
Data science notebooks are interactive computing environments that are widely used by data scientists for data wrangling and analytic tasks.
To measure the performance of AI pair programmers that automatically synthesize programs for those tasks given natural language (NL) intents from users, we build ARCADE, a benchmark of 1082 code generation problems using the pandas data analysis framework in data science notebooks. ARCADE features multiple rounds of NL-to-code problems from the same notebook.
It requires a model to understand rich multi-modal contexts, such as existing notebook cells and their execution states as well as previous turns of interaction.
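To make the task format concrete, the following is a minimal sketch of what one multi-round problem might look like; the DataFrame, column names, and intents are invented for illustration and are not drawn from ARCADE itself:

    import pandas as pd

    # Notebook context: a DataFrame defined in an earlier cell.
    df = pd.DataFrame({
        "city": ["Austin", "Boston", "Austin", "Boston"],
        "year": [2020, 2020, 2021, 2021],
        "sales": [120, 95, 150, 110],
    })

    # Turn 1 intent: "What are the total sales per city?"
    total_sales = df.groupby("city")["sales"].sum()

    # Turn 2 intent, which depends on the result of turn 1:
    # "Which city has the highest total?"
    best_city = total_sales.idxmax()
    print(best_city)  # Austin

A model answering the second turn must ground "highest total" in the variable produced by the first turn, which is exactly the kind of contextual dependency the benchmark tests.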
To establish a strong baseline on this challenging task, we develop PaChiNCo, a 62B code language model (LM) for Python computational notebooks, which significantly outperforms public code LMs.
Finally, we explore few-shot prompting strategies to elicit better code with step-by-step decomposition and NL explanations, showing the potential to improve the diversity and explainability of model predictions.
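As a sketch of this style of prompting, a few-shot exemplar might decompose the solution into commented sub-steps; the task, data, and comment format below are hypothetical and only illustrate the idea of interleaving NL explanations with code:

    import pandas as pd

    df = pd.DataFrame({
        "name": ["ada", "bob", "cy"],
        "score": [88, 92, 79],
    })

    # Intent: "Get the names of the top-2 scorers, capitalized."
    # Step 1: sort the rows by score in descending order.
    ranked = df.sort_values("score", ascending=False)
    # Step 2: keep the two highest-scoring rows.
    top2 = ranked.head(2)
    # Step 3: capitalize the names and collect them into a list.
    result = top2["name"].str.capitalize().tolist()
    print(result)  # ['Bob', 'Ada']

Decomposing the solution this way gives the model natural points at which to explain each operation, which is what makes the resulting predictions easier to inspect.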
Authors
Pengcheng Yin, Wen-Ding Li, Kefan Xiao, Abhishek Rao, Yeming Wen, Kensen Shi, Joshua Howland, Paige Bailey, Michele Catasta, Henryk Michalewski, Alex Polozov, Charles Sutton
Data science is the process of extracting insights from data, and has become an integral part of decision making and knowledge discovery.
This has motivated research on automating and accelerating the data science workflow in general, with particular interest in data wrangling and exploratory data analysis (EDA) tasks.
Large language models (LLMs) trained on code can assist developers by translating natural language (NL) intents into executable programs, with promising applications in synthesizing code for data wrangling.
Several benchmarks have been proposed to evaluate program synthesis of data science programs from nl intents, but these datasets have several limitations.
Existing datasets usually contain independent tasks with isolated contexts, or only a limited number of contextually dependent problems, rather than multiple related tasks with complex dependencies as found in real notebooks.
Therefore, there is a need for a benchmark with multiple rounds of related problems grounded in rich notebook contexts, so as to better reflect real-world usage by data scientists.
To fill this gap, we present ARCADE, a new benchmark for code generation in computational notebooks.
ARCADE features a series of NL utterances written by professional data scientists with the intention of interacting with an AI pair programmer when working in a notebook, together with high-quality code solutions using the pandas library.
Result
In this paper we present ARCADE, a code generation benchmark for data wrangling and EDA tasks in computational notebooks.
ARCADE features problems with realistic NL intents and rich notebook contexts.
We also develop PaChiNCo, a 62B code LM tailored for data science, and show that it outperforms public code LMs on ARCADE, while being effective in few-shot learning to improve code style and solution diversity.