Pachter et al., 2026 bioRxiv
The work of Pachter et al. focuses on porting the well-established edgeR package to Python (edgePython) using Claude Opus 4.5, 4.6 and Codex. As described in the paper, edgeR is a non-trivial package that supports a range of functions used for differential expression analysis in genomics research. Given Python’s popularity in data science, having edgeR natively available in the Python ecosystem would be highly beneficial for the scientific community. Although alternative solutions exist for combining R and Python and converting objects between the two,1 they can be cumbersome.
We have already seen interest from the single-cell community, where DESeq2 was rewritten from scratch as PyDESeq2, although it does not reproduce the original's results exactly. That being said, there were a few things I enjoyed when reading this article.
- Transparency about how the LLMs were used.
- Using the existing edgeR tests to verify correctness. I do not think asking an LLM to write tests against its own implementation would be a good idea in practice.
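To illustrate the second point, here is a minimal sketch of what checking a port against independent reference values can look like. It is not taken from the edgeR or edgePython test suites: `toy_cpm` and `assert_matches_reference` are hypothetical names, the function is a plain counts-per-million scaling chosen only because it is easy to verify by hand, and the reference numbers are hand-computed stand-ins for values that would normally be exported from the original R package.

```python
import math


def toy_cpm(counts, lib_sizes):
    """Hypothetical ported function: scale counts to counts per million.

    counts is a list of rows (genes), each holding one count per sample;
    lib_sizes holds one library size per sample.
    """
    return [
        [c / lib * 1e6 for c, lib in zip(row, lib_sizes)]
        for row in counts
    ]


def assert_matches_reference(actual, expected, rel_tol=1e-9):
    """Fail loudly if any ported value drifts from the reference value."""
    assert len(actual) == len(expected), "row count differs"
    for i, (row_a, row_e) in enumerate(zip(actual, expected)):
        assert len(row_a) == len(row_e), f"column count differs in row {i}"
        for j, (a, e) in enumerate(zip(row_a, row_e)):
            assert math.isclose(a, e, rel_tol=rel_tol), (
                f"mismatch at ({i}, {j}): got {a}, expected {e}"
            )


# Toy input: 2 genes x 2 samples (illustrative values, not real data).
counts = [[10, 20], [90, 180]]
lib_sizes = [100, 200]

# Reference values computed by hand here; in a real port they would come
# from the original implementation, not from the code under test.
reference = [[100000.0, 100000.0], [900000.0, 900000.0]]

assert_matches_reference(toy_cpm(counts, lib_sizes), reference)
```

The important design choice is where `reference` comes from: because the expected values are produced outside the ported code, a bug in the port cannot quietly leak into its own test.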
I am curious to see whether more tools will in future be ported to other, more optimized programming languages such as Rust, with bindings for interpreted languages (e.g. PyO3 for Python). These types of tasks seem well suited to LLMs, since the objective is not to invent something from scratch (e.g. the gcc compiler) but rather to “translate” from one language to another.
Lior Pachter, Differential analysis of genomics count data with edge. bioRxiv 2026.02.16.706223; doi: https://doi.org/10.64898/2026.02.16.706223