Modern Statistical Computing 2026 Complete Guide - R 4.5 · Posit RStudio · Stan · Pyro · NumPyro · Brms · JAX · Tidyverse · data.table · Polars · Marimo Deep Dive

Prologue — Why statistical computing is hot again in 2026

Statistical computing in 2026 is shaped by two currents that collide yet feed each other. On one side stand R 4.5 and the Posit commercial stack. They have become the default tools for reproducible research, clinical trials, government statistics, and academic papers. On the other side stand JAX, NumPyro, and Polars. They run NUTS samplers on GPUs and TPUs and group tens of gigabytes of dataframes in seconds on Rust backends.

The two currents do not just compete. They complement each other. A Bayesian model is written in R with brms, handed off to Stan through cmdstanr, while heavy data is preprocessed with Polars and pushed through NumPyro for GPU inference. Marimo cuts Jupyter's order-of-execution problem with a reactive notebook, and Quarto binds R, Python, Julia, and Observable into one PDF or website.

The one-line summary is this.

  • R side — R 4.5 (April 2025), Posit (formerly RStudio), Tidyverse, data.table, and Quarto cement the reproducible-research standard.
  • Bayesian side — Stan remains the NUTS standard. brms and rstanarm have made it routine for R users. PyMC 5 and NumPyro split the Python camp.
  • JAX side — On top of Google's JAX sit Flax, Optax, Equinox, and NumPyro, forming the scientific-computing acceleration stack.
  • Data side — Polars 1.x (Rust) and Pandas 2.x (Arrow), DuckDB, and Ibis shake up the analytic-dataframe standard.

This article walks through the whole landscape in one flow.


Chapter 1 · R 4.5 — the April 2025 release and what it means

R 4.5.0 shipped on April 11, 2025, codename "How About a Twenty-Six." Ross Ihaka and Robert Gentleman began work on R at the University of Auckland in New Zealand around 1993, and the R Core Team has maintained it since 1997. For more than thirty years R has been the de facto statistical language across academia and industry.

The user-visible improvements in R 4.5 land where they matter. The new use() function attaches a package while giving a script control over what is added to the search path. tryInvokeRestart() improves error-handling performance. The ALTREP framework has been polished to lower memory usage on large vectors. Integer-overflow warnings are friendlier.

As of May 2026 CRAN holds about 22,000 packages. Bioconductor adds another 2,300 life-science packages. Together these two repositories are the heart of the R ecosystem.

R remains in 2026 the standard language of statistical analysis, clinical trials, financial risk, and government statistics. Python dominates machine learning, but if you want to drop a clean ANOVA into a report, R is still faster.


Chapter 2 · Posit — why RStudio renamed itself

Posit (posit.co) is the result of RStudio Inc. renaming itself in October 2022. Founded in 2009 by JJ Allaire, RStudio effectively standardized the IDE for R users, but as the company expanded into Python, Julia, and VS Code extensions, it dropped the "R-only" image with a new name.

The three core Posit products break down like this.

  • Posit Workbench (formerly RStudio Server Pro) — Enterprise IDE server. Hosts RStudio, VS Code, and JupyterLab on the same server. Kubernetes integration.
  • Posit Connect (formerly RStudio Connect) — Deploys Shiny apps, Quarto docs, Streamlit, FastAPI, Flask, and Plumber APIs in one place with authentication and scheduling.
  • Posit Package Manager (formerly RStudio Package Manager) — Internal CRAN and PyPI mirror. Security audit, license tracking, curated package queues.

On the open-source side RStudio Desktop and Posit Cloud (formerly RStudio Cloud) remain free or low-cost. The point is to keep the on-ramp open for students and individual users.

Several core developers of the R ecosystem work or have worked at Posit. Hadley Wickham (founder of Tidyverse), JJ Allaire (R Markdown and Quarto), and Joe Cheng (Shiny) are Posit employees, and Yihui Xie (knitr and bookdown) was one until he left the company around the start of 2024.


Chapter 3 · Tidyverse — the second standard of R built by Hadley Wickham

Tidyverse (tidyverse.org) is a collection of R packages that share the same "tidy data" philosophy. The individual packages existed before Hadley Wickham unified them under one banner in 2016, and in 2026 the core packages are these.

  • dplyr — Data manipulation (filter, select, mutate, group_by, summarise, join). Together with the pipe operator it changed how readable R code looks.
  • tidyr — Data tidying (pivot, missing, nesting).
  • ggplot2 — Visualization based on Grammar of Graphics. The standard R graphics library.
  • purrr — Functional programming. Map and reduce patterns made consistent.
  • readr — Fast CSV and TSV I/O.
  • stringr — String processing.
  • lubridate — Date and time handling.
  • forcats — Factor (categorical) processing.
  • tibble — A modern data.frame.

One install.packages("tidyverse") installs the whole collection, and library(tidyverse) attaches the nine core packages listed above. The learning curve sits in getting comfortable with both the pipe (base R's |> or magrittr's %>%) and the ggplot2 grammar. Once you are, the expressiveness of R goes up a lot.


Chapter 4 · data.table — the other standard outside the Tidyverse

data.table (r-datatable.com) is an R package by Matt Dowle, first released on CRAN in 2006. Its syntax differs from the Tidyverse. The bracket form DT[i, j, by] expresses row filtering (i), computation (j), and grouping (by) in a single call.
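To make the shape of DT[i, j, by] concrete, here is a minimal Python sketch of the same three-slot idea. The helper name dt_query and the flights rows are hypothetical and only illustrate the semantics. data.table itself is far faster and optimizes these steps internally.

```python
from collections import defaultdict

def dt_query(rows, i=None, j=None, by=None):
    """Mimic the shape of DT[i, j, by]: filter rows (i), group (by), compute (j)."""
    selected = [r for r in rows if i is None or i(r)]
    if by is None:
        return j(selected) if j else selected
    groups = defaultdict(list)
    for r in selected:
        groups[r[by]].append(r)
    return {key: (j(members) if j else members) for key, members in groups.items()}

# Made-up example rows, not real flight data.
flights = [
    {"carrier": "AA", "delay": 10, "month": 1},
    {"carrier": "AA", "delay": 30, "month": 2},
    {"carrier": "UA", "delay": 5, "month": 1},
    {"carrier": "UA", "delay": 25, "month": 1},
]

# Roughly what DT[month == 1, mean(delay), by = carrier] expresses in R.
mean_delay = dt_query(
    flights,
    i=lambda r: r["month"] == 1,
    j=lambda g: sum(r["delay"] for r in g) / len(g),
    by="carrier",
)
print(mean_delay)
```

The point of the design is that filter, aggregate, and group travel together in one expression, so the engine can plan them as a unit instead of materializing intermediate tables.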

The defining qualities are speed and memory efficiency. In benchmarks such as h2oai's db-benchmark and DuckDB's grouped-aggregation tests it often beats dplyr and pandas. Data engineers who process tens of gigabytes on a single node tend to prefer it.

dtplyr (since 2019) is a bridge package. You write dplyr-style code, and it translates internally into data.table operations. The "dplyr readability plus data.table speed" compromise.

In the R ecosystem the Tidyverse versus data.table split sometimes feels like a religious war. Both sides are actively maintained, and users either pick one camp or use dtplyr to bridge.


Chapter 5 · tidymodels — a unified modeling interface for R

tidymodels (tidymodels.org) is an R modeling metapackage led by Max Kuhn (author of caret, now at Posit). It is the successor to caret (2007) and is redesigned to fit the Tidyverse philosophy.

  • parsnip — Unified model interface. Calls backends like glm, ranger, xgboost, lightgbm, keras, and Stan (through rstanarm) with the same function signature.
  • recipes — Preprocessing pipeline. Chains normalization, dummy encoding, missing handling, and polynomial transforms.
  • rsample — Cross-validation, bootstrap, and time-series splits.
  • yardstick — Evaluation metrics (AUC, RMSE, LogLoss, and many more).
  • workflows — Wraps preprocessing, model, and post-processing into one object.
  • tune — Hyperparameter tuning (grid, random, Bayesian).
  • dials — Hyperparameter-space definitions.

It feels like Python's scikit-learn was brought into R. Compared to caret it is more modern and slots naturally into the Tidyverse.
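The resampling idea behind rsample can be sketched in plain Python. This is a minimal illustration of k-fold cross-validation, not rsample's implementation. The function name k_fold_splits is hypothetical, and the "analysis" and "assessment" labels follow rsample's vocabulary for the two halves of each split.

```python
import random

def k_fold_splits(n_rows, k, seed=0):
    """Return k (analysis, assessment) pairs of row indices."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)          # shuffle once, reproducibly
    folds = [indices[f::k] for f in range(k)]     # k near-equal folds
    splits = []
    for f in range(k):
        assessment = folds[f]                     # held-out fold
        analysis = [i for g in range(k) if g != f for i in folds[g]]
        splits.append((analysis, assessment))
    return splits

splits = k_fold_splits(n_rows=10, k=3)
for analysis, assessment in splits:
    print(len(analysis), len(assessment))
```

Every row lands in exactly one assessment set, so each observation is used for evaluation once and for fitting k - 1 times.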


Chapter 6 · CRAN · Bioconductor · R-universe — three package repositories

CRAN (Comprehensive R Archive Network) has been the official R package repository since 1997. It is hosted by the Vienna University of Economics and Business in Austria. About 22,000 packages. A package must pass R CMD check and CRAN's policy review before it is admitted, and updates go through the same checks, including checks against packages that depend on it.

Bioconductor (bioconductor.org) is a life-sciences-only R repository that started in 2001. About 2,300 packages. Tools for sequencing, RNA-seq, single-cell, and imaging live here. Six-month release cycle.

R-universe (r-universe.dev) is the next-generation repository run by rOpenSci. It builds R packages directly from GitHub repositories. Faster updates than CRAN and developer-friendly policies are its strengths. Adoption has grown quickly between 2024 and 2026.

In enterprise environments Posit Package Manager mirrors all three (CRAN, Bioconductor, R-universe) and layers security review on top, providing an internal repository.


Chapter 7 · renv · Quarto — reproducible R environments

renv (rstudio.github.io/renv) is the R virtual-environment tool built by Posit. It plays the same role as Python's venv or conda. It locks package versions per project (renv.lock) and lets renv::restore() recreate the same environment.

It replaces the older packrat, and since 2020 it has effectively become the standard for R reproducibility. Clinical-trial statistical code, academic-paper reproduction bundles, and government statistical reports all ship with an renv.lock alongside.

Quarto (quarto.org) is the next-generation publishing system Posit released in 2022. It is the successor to R Markdown (rmarkdown). The key differences are these.

  • Multi-language — R, Python, Julia, and Observable JavaScript can be mixed in one document.
  • Multi-output — One source produces HTML, PDF, Word, ePub, revealjs slides, websites, and books.
  • Jupyter compatibility — .ipynb and .qmd convert freely.
  • Academic publishing — Official templates for journals like Nature and JAMA started arriving in 2024-2026.

Quarto is broader than R Markdown in every direction. For new projects Quarto is the default choice.


Chapter 8 · Shiny · plumber — web apps and APIs built in R

Shiny (shiny.posit.co) is the R web framework released by RStudio in 2012. It builds interactive dashboards in pure R. It is widely used for internal dashboards, clinical-trial dashboards, and government statistical visualizations.

  • shinydashboard — Dashboard layout template.
  • shinyWidgets, shinyjs, DT — UI extension packages.
  • Shiny for Python (since 2022) — Extends the same model into Python. Led by Posit.
  • Posit Connect — Hosts Shiny apps with authentication and scheduling.

plumber (www.rplumber.io) is a package that turns annotated R functions into a REST API. It is roughly the R equivalent of Python's FastAPI. The standard pattern is: train a model in R, expose it as an API with plumber.


Chapter 9 · Stan — the industry standard NUTS sampler

Stan (mc-stan.org) is the Bayesian probabilistic-programming language that began in 2012 at Andrew Gelman's lab at Columbia University. Core developers include Bob Carpenter, Matt Hoffman, and Daniel Lee, among others.

The two foundational contributions of Stan are these.

  • NUTS (No-U-Turn Sampler) — An automatic-tuning Hamiltonian Monte Carlo that effectively became the standard for Bayesian inference.
  • Stan language — A domain-specific language (DSL) for describing models that compiles down to C++. Supports CPU and GPU backends.
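To show what NUTS automates, here is a minimal sketch of plain Hamiltonian Monte Carlo in one dimension. It is not NUTS and not Stan's implementation: the step size and trajectory length are fixed by hand here, which is exactly the tuning NUTS removes. The target (a standard normal) and all function names are assumptions chosen for illustration.

```python
import math
import random

def leapfrog(q, p, grad_neg_log_prob, step_size, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator."""
    p = p - 0.5 * step_size * grad_neg_log_prob(q)      # initial half step for momentum
    for step in range(n_steps):
        q = q + step_size * p                           # full step for position
        if step < n_steps - 1:
            p = p - step_size * grad_neg_log_prob(q)    # full step for momentum
    p = p - 0.5 * step_size * grad_neg_log_prob(q)      # final half step for momentum
    return q, p

def hmc_step(q, neg_log_prob, grad_neg_log_prob, step_size, n_steps, rng):
    """One HMC transition: resample momentum, integrate, accept or reject."""
    p0 = rng.gauss(0.0, 1.0)
    q_new, p_new = leapfrog(q, p0, grad_neg_log_prob, step_size, n_steps)
    h_old = neg_log_prob(q) + 0.5 * p0 * p0
    h_new = neg_log_prob(q_new) + 0.5 * p_new * p_new
    # Metropolis correction for the integrator's discretization error.
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return q_new
    return q

def neg_log_prob(q):
    return 0.5 * q * q        # standard normal, up to a constant

def grad_neg_log_prob(q):
    return q

rng = random.Random(0)
q = 0.0
draws = []
for _ in range(5000):
    q = hmc_step(q, neg_log_prob, grad_neg_log_prob, step_size=0.2, n_steps=10, rng=rng)
    draws.append(q)

draw_mean = sum(draws) / len(draws)
draw_var = sum((d - draw_mean) ** 2 for d in draws) / len(draws)
print(round(draw_mean, 2), round(draw_var, 2))
```

With a poorly chosen step_size or n_steps the sampler either rejects most proposals or doubles back on itself. NUTS picks the trajectory length adaptively, and Stan tunes the step size during warmup.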

Stan itself is a C++ library together with a compiler (stanc) that translates Stan programs into C++, and users normally interact with it through one of these interfaces.

  • CmdStan — Command-line interface.
  • CmdStanR (R) and CmdStanPy (Python) — Modern wrappers around CmdStan. Recommended in 2026.
  • RStan (R) and PyStan (Python) — Older interfaces. Compilation-dependency issues have pushed the community toward CmdStan-family wrappers.

Stan is the most-cited Bayesian tool in academic papers. It is the standard in clinical trials, epidemiology, physics, and astronomical observation.


Chapter 10 · brms · rstanarm — Stan wrappers for R users

brms (paul-buerkner.github.io/brms) is an R package by Paul-Christian Bürkner (now at TU Dortmund University, Germany, previously at Aalto University, Finland), introduced in a 2017 Journal of Statistical Software paper. You write Bayesian models in R formula syntax, and brms generates the Stan code and runs it for you.

For example, the formula bf(y ~ x1 + x2 + (1|group)), passed to brm(), turns into a multilevel-regression Stan model behind the scenes. Linear, logistic, Poisson, multinomial, survival, time-series, GAM, and multilevel models are all covered.

rstanarm (mc-stan.org/rstanarm) is an R package built directly by the Stan team. It is similar to brms but calls pre-compiled Stan models so users do not have to compile anything themselves. Faster to start, but a narrower set of models than brms.

The choice criterion is this. brms is more expressive but requires you to wait through compilation. rstanarm is fast but locked to its pre-defined models. For an R user starting Bayesian work, brms is the standard recommendation.


Chapter 11 · Pyro · NumPyro — Python Bayesian from Uber

Pyro (pyro.ai) is a Python and PyTorch-based probabilistic-programming library released by Uber AI Labs (now under the Linux Foundation) in 2017. The core developers are Eli Bingham and Noah Goodman (Stanford). It is strong on variational inference (SVI), MCMC, and Bayesian methods combined with neural networks.

NumPyro (num.pyro.ai) is a JAX-backend variant of Pyro built by the same team. It replaces PyTorch's dynamic graph with JAX's functional transforms (jit, vmap, pmap), which made the NUTS sampler much faster. In 2026 NumPyro's NUTS on a GPU often beats Stan in wall time.

Together with PyMC 5 it splits the Python Bayesian camp. PyMC sits closer to general academia and industry, while Pyro and NumPyro hold the edge in deep-learning-coupled Bayesian work and GPU acceleration.


Chapter 12 · PyMC 5 · TensorFlow Probability · Turing.jl — other Bayesian choices

PyMC (www.pymc.io) is the Python Bayesian library Christopher Fonnesbeck started back in 2003. It moved from PyMC3 (Theano backend) to PyMC 4 in mid-2022 and PyMC 5 at the end of 2022. Its backend, PyTensor, descends from Theano by way of the Aesara fork and can compile models to C, JAX, or Numba, with NumPyro's NUTS available as an alternative sampler.

TensorFlow Probability (TFP) (www.tensorflow.org/probability) is the probabilistic modeling library Google released in 2018. It layers distributions, MCMC, and variational inference on top of TensorFlow. Academic adoption is narrower than Stan, PyMC, or Pyro, but inside Google it is the standard.

Edward (2016-2018) and Edward2 (since 2018) are earlier probabilistic-programming libraries built by Dustin Tran (now at Google) and others. They were absorbed into TensorFlow Probability.

Turing.jl (turinglang.org) is a Julia Bayesian library started in 2018 by Hong Ge at the University of Cambridge. It leverages Julia's multiple dispatch to let users freely write custom distributions. Julia is growing in academia, and Turing's adoption is growing with it.

Soss.jl and Gen (MIT) are other probabilistic-programming tools on the Julia side. Gen, led by Vikash Mansinghka at MIT, is strong at programmable inference, where users customize the inference algorithm itself.


Chapter 13 · JAX — the functional numerical-computing base from Google

JAX (jax.readthedocs.io) is the Python numerical-computing library Google Research released in 2018. It follows the NumPy API while automatically providing these four capabilities.

  • Automatic differentiation (successor to autograd) — grad, jacobian, hessian functions.
  • JIT compilation — XLA (Accelerated Linear Algebra) gives GPU and TPU acceleration.
  • Vectorization — vmap adds a batch dimension automatically.
  • Parallelization — pmap distributes across multiple GPUs or TPUs.

The key distinction is that JAX is functional. JAX functions avoid side effects and do not carry state, unlike PyTorch, where gradients accumulate in a tensor's .grad attribute. The constraint is uncomfortable at first but simplifies code once you internalize it.
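The functional style is easiest to see in miniature. The sketch below is not JAX: it is a toy written in plain Python, with stand-in functions named grad and vmap that take a function and return a new function, which is the shape of the JAX API. The toy differentiates with forward-mode dual numbers, whereas jax.grad uses reverse mode and compiles through XLA, so treat the internals as an assumption made for illustration.

```python
class Dual:
    """A number that carries its value and its derivative (forward-mode autodiff)."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def grad(f):
    """Return a new function computing df/dx. f itself is left untouched."""
    def df(x):
        return f(Dual(x, 1.0)).deriv
    return df

def vmap(f):
    """Return a function that applies f across a batch (here, a plain list)."""
    def batched(xs):
        return [f(x) for x in xs]
    return batched

def f(x):
    return 3 * x * x + 2 * x + 1      # f'(x) = 6x + 2

df = grad(f)                 # a function, not a stored attribute
batched_df = vmap(grad(f))   # transforms compose

print(df(2.0))
print(batched_df([0.0, 1.0, 2.0]))
```

Because no gradient is stored on any object, the transforms compose freely. In real JAX the corresponding composition is written jax.vmap(jax.grad(f)).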

Libraries like NumPyro, Flax, Optax, and Equinox have stacked on top of JAX, and by 2026 the JAX stack is the standard accelerated scientific-computing platform. Google DeepMind's AlphaFold, for example, is written in JAX.


Chapter 14 · Flax · Optax · Equinox — JAX neural-network libraries

JAX itself is not a neural-network library. On top of it these libraries stack.

  • Flax (flax.readthedocs.io) — Neural-network library built by Google. Module abstractions that fit the functional style. Adopted as the standard by Google DeepMind.
  • Optax (optax.readthedocs.io) — Optimization library built by Google DeepMind. Expresses optimizers like Adam, AdamW, SGD, Lion, and Adafactor via functional composition.
  • Equinox (docs.kidger.site/equinox) — Neural-network library built by Patrick Kidger. A PyTree-based class model that is more concise than Flax.
  • Haiku — Another neural-network library built by Google DeepMind. It has been in maintenance mode since 2023, with Flax recommended for new projects.
  • RLax — Function bundle for reinforcement learning.
  • Distrax — Probability-distribution library (a functional alternative to TFP).
  • Chex — Testing and validation utilities.

For new JAX projects in 2026 Flax (Google canon) or Equinox (concise) is the standard pick.


Chapter 15 · Polars 1.x — the rise of Rust-backed dataframes

Polars (pola.rs) is a Rust-based dataframe library that Ritchie Vink (Netherlands) started in 2020. Polars 1.0 shipped in 2024 and stabilized into the 1.x line in 2026. A commercial offering called Polars Cloud runs alongside.

Three features shake up the analytics camp.

  • Rust backend — Faster than the pandas NumPy/Python combo. Multi-threaded by default.
  • Lazy evaluation — scan_csv and scan_parquet build the query first and run it all at once. It behaves like a SQL optimizer.
  • Apache Arrow format — Dataframes are in Arrow columnar memory. Exchange data with DuckDB and PyArrow without memory copies.
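The lazy-evaluation idea can be sketched without Polars at all. The toy class below, with the hypothetical name LazyRows, records filter and select steps and touches the data only when collect() is called, fusing the steps into one pass. It shows deferral only. Polars additionally rewrites the plan, for example by pushing filters and column selection down into the file scan.

```python
class LazyRows:
    """Toy lazy query: operations are recorded, nothing runs until collect()."""

    def __init__(self, source, plan=()):
        self._source = source    # zero-argument callable that yields row dicts
        self._plan = plan        # tuple of ("filter", fn) or ("select", columns)

    def filter(self, predicate):
        return LazyRows(self._source, self._plan + (("filter", predicate),))

    def select(self, *columns):
        return LazyRows(self._source, self._plan + (("select", columns),))

    def collect(self):
        out = []
        for row in self._source():               # single pass over the data
            keep = True
            for op, arg in self._plan:
                if op == "filter":
                    if not arg(row):
                        keep = False
                        break
                else:                            # "select": keep named columns
                    row = {c: row[c] for c in arg}
            if keep:
                out.append(row)
        return out

# Made-up rows standing in for a file on disk.
DATA = [
    {"name": "Ada", "city": "Oslo", "age": 36},
    {"name": "Bo", "city": "Rome", "age": 41},
    {"name": "Cy", "city": "Oslo", "age": 29},
]
rows_read = {"count": 0}

def scan():
    for row in DATA:
        rows_read["count"] += 1
        yield row

query = LazyRows(scan).filter(lambda r: r["city"] == "Oslo").select("name")
rows_read_before_collect = rows_read["count"]    # still zero: nothing has run
result = query.collect()
print(rows_read_before_collect, rows_read["count"], result)
```

Seeing the whole query before running it is what gives an engine room to optimize. An eager API has already executed each step by the time the next one is written.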

More data engineers move from pandas to Polars every quarter. The performance gap over pandas widens when processing several to tens of gigabytes on a single node. The interface differs from pandas, so a learning curve applies.


Chapter 16 · Pandas 2.x · PyArrow · Modin · Dask · Ibis — the full Python dataframe landscape

Pandas (pandas.pydata.org) is the standard Python dataframe library that Wes McKinney started in 2008. Pandas 2.0 in 2023 introduced official Arrow-backend support, narrowing the performance gap. Pandas 2.2 to 2.3 is actively maintained in 2026.

PyArrow (arrow.apache.org/docs/python) is the Python binding for Apache Arrow. It supplies the columnar memory format, Parquet I/O, and Flight RPC. Pandas, Polars, and DuckDB exchange data through PyArrow memory.

Modin (modin.readthedocs.io) is a drop-in replacement that covers most of the pandas API and falls back to pandas for the rest, running on Ray or Dask underneath. Changing one line of code (the import) is enough to scale across many cores or many nodes.

Dask (www.dask.org) is the Python distributed-computing library Matthew Rocklin started in 2014. It splits large data into chunks and runs NumPy, Pandas, and Scikit-learn operations across the cluster.

Ibis (ibis-project.org) gives an abstract dataframe interface that looks like SQL, so the same code runs on DuckDB, BigQuery, Snowflake, PostgreSQL, Pandas, or Polars. Wes McKinney drives the project.

DuckDB (duckdb.org) is an embedded OLAP SQL engine. Think of it as SQLite for analytics. It works alongside Polars and Pandas with memory sharing instead of copies.


Chapter 17 · Marimo · Jupyter — the reactive notebook challenge

Jupyter (jupyter.org) is the notebook project that Fernando Pérez and others spun off from IPython in 2014. JupyterLab 4 is the stable version in 2026. It supports many kernels including R, Python, Julia, and Scala.

Jupyter's weakness is that state depends on cell-execution order. The same notebook can yield different results depending on whether you run it top-to-bottom or in some arbitrary order, which undermines reproducibility.

Marimo (marimo.io) is a reactive Python notebook started in 2023 by Akshay Agrawal and Myles Scolnick (formerly at Stanford). It tracks data dependencies between cells automatically — change one cell and every dependent cell reruns. Similar to Excel's automatic recalculation.
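The reactive model can be sketched in a few lines of Python. This is not Marimo's code: the class name ReactiveNotebook is hypothetical, and each cell declares by hand which cells it reads, whereas Marimo infers those dependencies from the variables a cell defines and references. What the sketch does show is the rerun rule: redefine one cell and every downstream cell reruns in dependency order.

```python
class ReactiveNotebook:
    """Toy reactive notebook: editing a cell reruns every cell that depends on it."""

    def __init__(self):
        self.cells = {}    # name -> (function, names of the cells it reads)
        self.values = {}   # name -> latest computed value

    def cell(self, name, func, reads=()):
        """Define or redefine a cell, then rerun it and its dependents."""
        self.cells[name] = (func, tuple(reads))
        for target in self._downstream_in_order(name):
            f, deps = self.cells[target]
            self.values[target] = f(*(self.values[d] for d in deps))

    def _downstream_in_order(self, name):
        """Cells reachable from `name`, in topological (dependency) order."""
        order, seen = [], set()

        def visit(n):
            if n in seen:
                return
            seen.add(n)
            for other, (_, reads) in self.cells.items():
                if n in reads:
                    visit(other)
            order.append(n)

        visit(name)
        return order[::-1]   # reverse post-order puts dependencies first

nb = ReactiveNotebook()
nb.cell("price", lambda: 100)
nb.cell("tax", lambda price: price // 10, reads=["price"])
nb.cell("total", lambda price, tax: price + tax, reads=["price", "tax"])
total_before = nb.values["total"]

nb.cell("price", lambda: 200)      # edit one cell; tax and total rerun on their own
total_after = nb.values["total"]
print(total_before, total_after)
```

There is no way to end up with a stale total here, which is the failure mode the Jupyter paragraph above describes.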

Marimo notebooks are also saved as .py files. Git diffs are human-readable, and IDEs can treat them as normal Python. Adoption has grown quickly in the data-science community between 2024 and 2026.

Observable Notebooks (observablehq.com) is the JavaScript reactive notebook built by Mike Bostock, the creator of D3.js. Marimo carries the same model into Python that Observable first pioneered in JavaScript.

Hex (hex.tech) and Deepnote (deepnote.com) — managed Jupyter hosting. Their strengths are collaboration and data connectors.


Chapter 18 · ggplot2 · matplotlib · seaborn · Plotly — the core visualization tools

ggplot2 (ggplot2.tidyverse.org) is the R implementation of Leland Wilkinson's "Grammar of Graphics," authored by Hadley Wickham since 2005. It is the standard visualization library for R, and its output is the style of statistical graphic you see most often in academic publications.

The ggplot2 extension ecosystem is rich.

  • gghighlight — Group highlighting.
  • patchwork — Multi-panel composition.
  • ggridges — Ridge (joyplot) charts.
  • gganimate — Animation.
  • ggrepel — Label-collision avoidance.
  • ggdist — Distribution representation.
  • ggtext — Markdown and HTML text.

matplotlib (matplotlib.org) is the Python visualization standard that John Hunter started in 2003. It is the Python side of academic publications.

seaborn (seaborn.pydata.org) is a statistical-visualization wrapper over matplotlib built by Michael Waskom. The abstraction is similar to ggplot2.

Plotly Express (plotly.com/python/plotly-express) is a high-level interface for interactive visualization. The same code produces static, interactive, and web-embeddable output.

Bokeh (bokeh.org), Altair (altair-viz.github.io), Vega-Lite (vega.github.io/vega-lite), Apache ECharts (echarts.apache.org), and D3.js (d3js.org) — alternative options for interactive web visualization. Altair is the Python library that calls Vega-Lite.


Chapter 19 · scikit-learn · statsmodels · mlr3 — statistical-learning packages

scikit-learn (scikit-learn.org) is the Python machine-learning standard. It began in 2007 as David Cournapeau's Google Summer of Code project, and INRIA (France) took the lead on development from 2010. Versions 1.5 to 1.6 are actively maintained in 2026. Regression, classification, clustering, dimensionality reduction, preprocessing, and evaluation are all in one package.

statsmodels (www.statsmodels.org) is a Python library that leans harder on statistical tests and regression diagnostics than scikit-learn does. OLS, GLM, time series (ARIMA, VAR), survival analysis, and mixed-effects models cover the R-style statistical-modeling set.

mlr3 (mlr3.mlr-org.com) is an R machine-learning meta-framework built by Bernd Bischl's group at LMU Munich, Germany. It is the successor to mlr (2013). Built on R6 object orientation, it follows a design philosophy different from tidymodels.

caret (topepo.github.io/caret) is the first-generation R machine-learning meta-framework by Max Kuhn. It is still maintained in 2026, but tidymodels is recommended for new projects.

H2O.ai (h2o.ai) is the AutoML platform built by the company of the same name. Callable from R, Python, and Java. Strong enterprise adoption.


Chapter 20 · Causal inference — the 2024-2026 surge

DoWhy (www.pywhy.org/dowhy) is the Python causal-inference library Microsoft Research released in 2018. It puts Pearl's do-calculus into code. In 2023 it moved under the PyWhy foundation.
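One building block behind this kind of tooling is back-door adjustment: when a confounder Z satisfies the back-door criterion, the effect of a treatment is estimated by averaging stratum-level outcome rates over the population distribution of Z, P(Y | do(X)) = sum over z of P(Y | X, z) P(z). The sketch below works through that formula on illustrative counts. It is not DoWhy's API, and the function names are hypothetical.

```python
# (stratum, treatment) -> (successes, total). Illustrative counts only.
counts = {
    ("small", "A"): (81, 87),
    ("small", "B"): (234, 270),
    ("large", "A"): (192, 263),
    ("large", "B"): (55, 80),
}

def naive_rate(treatment):
    """Success rate ignoring the stratum: P(Y | X)."""
    successes = sum(s for (_, t), (s, _) in counts.items() if t == treatment)
    total = sum(n for (_, t), (_, n) in counts.items() if t == treatment)
    return successes / total

def adjusted_rate(treatment):
    """Back-door adjustment: sum over z of P(Y | X, z) * P(z)."""
    grand_total = sum(n for _, n in counts.values())
    strata = {z for z, _ in counts}
    rate = 0.0
    for z in strata:
        p_z = sum(n for (zz, _), (_, n) in counts.items() if zz == z) / grand_total
        s, n = counts[(z, treatment)]
        rate += (s / n) * p_z
    return rate

for t in ("A", "B"):
    print(t, round(naive_rate(t), 3), round(adjusted_rate(t), 3))
```

In these counts treatment A looks worse than B in the raw comparison but better once the stratum is adjusted for, because A was given mostly to the harder cases. Libraries in this chapter add the steps around this formula, such as deciding which variables to adjust for and stress-testing the estimate.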

EconML (www.microsoft.com/en-us/research/project/econml) — A causal machine-learning library from Microsoft. It specializes in treatment-effect estimation. It includes Double ML, Causal Forest, and Meta-Learner.

CausalML (causalml.readthedocs.io) is a similar causal-ML library from Uber. Strong on uplift modeling and marketing-campaign evaluation.

CausalImpact (google.github.io/CausalImpact) is the R package from Google. It uses Bayesian structural time series to estimate the causal effect of marketing or policy interventions.

DAGitty (www.dagitty.net) is a web tool for drawing and analyzing causal diagrams. There is also an R package.

lavaan (lavaan.ugent.be), sem (R), and semopy (Python) — packages for structural equation modeling (SEM). The standard in psychology, sociology, and education research.

Causal inference is the fastest-growing area in statistical computing between 2024 and 2026. Conferences, workshops, and book publications have all grown together.


Chapter 21 · MCMC diagnostics and visualization — the post-fit toolkit

In Bayesian inference, what you do after sampling matters as much as the model. Some core tools.

  • bayesplot (R, mc-stan.org/bayesplot) — Diagnostic-graphics package built by the Stan team. Uses ggplot2. Trace plots, R-hat, ESS, posterior predictive checks.
  • posterior (R) — Standard object for posterior draws. brms, rstan, and cmdstanr all return results in this object.
  • tidybayes (R, mjskay.github.io/tidybayes) — Package built by Matthew Kay. Reshapes posterior draws into tidy data, working naturally with ggplot2.
  • shinystan (R) — Views Stan results as an interactive Shiny dashboard.
  • ArviZ (Python, python.arviz.org) — The Python standard for Bayesian diagnostics. Handles results from PyMC, NumPyro, Pyro, and CmdStanPy.

A run is considered converged when R-hat is below 1.01, ESS is high enough, and trace plots look like fuzzy caterpillars. Skip the check and the results are not trustworthy.
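For intuition about what R-hat measures, here is a minimal sketch of the classic split R-hat in plain Python. It is a simplified version: the diagnostics packages listed above report a rank-normalized variant, and the function name split_rhat and the simulated chains are assumptions made for illustration.

```python
import random
from statistics import mean, variance

def split_rhat(chains):
    """Split R-hat for a list of equal-length chains (lists of floats)."""
    half = len(chains[0]) // 2
    # Split each chain in two so drift within a chain shows up as
    # disagreement between halves.
    halves = []
    for chain in chains:
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    n = half
    within = mean([variance(h) for h in halves])         # W: mean within-chain variance
    between = n * variance([mean(h) for h in halves])    # B: between-chain variance
    var_plus = (n - 1) / n * within + between / n        # pooled variance estimate
    return (var_plus / within) ** 0.5

rng = random.Random(1)
good_chains = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Simulate one chain stuck in a different region of the posterior.
bad_chains = good_chains[:3] + [[x + 3.0 for x in good_chains[3]]]

rhat_good = split_rhat(good_chains)
rhat_bad = split_rhat(bad_chains)
print(round(rhat_good, 3), round(rhat_bad, 3))
```

When all chains explore the same distribution the pooled variance matches the within-chain variance and the ratio sits near 1. A chain that sits elsewhere inflates the between-chain term and pushes R-hat well above the 1.01 threshold.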


Chapter 22 · Survey and sampling packages

Probability-sample data (weights, stratification, clustering) must be handled differently from generic regression. These packages are the standard.

  • survey (R, r-survey.r-forge.r-project.org) — R package by Thomas Lumley (New Zealand). Handles weighting, stratification, and clustering since the early 2000s. Effectively the standard for NHANES, PISA, and KOSIS analysis.
  • srvyr (R) — A dplyr-flavored wrapper around the survey package. Survey analytics in the tidy style.
  • samplingbook and PracTools — Other R survey-sampling packages from book accompaniments.
  • stratasamp — Stratified-sample design.

Public datasets:

  • PISA (OECD Programme for International Student Assessment).
  • PSID (Panel Study of Income Dynamics, University of Michigan).
  • NHANES (US National Health and Nutrition Examination Survey).
  • KOSIS (Korean Statistical Information Service, the portal run by Statistics Korea).
  • e-Stat (Japan's government statistics portal).

Survey statistics is central to government, international organizations, and epidemiology, and the R survey package effectively holds a monopoly.
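Why weights cannot be ignored is easy to show with arithmetic. The sketch below uses a made-up sample in which an urban stratum is oversampled, so each urban respondent represents fewer people than each rural one. The function names are hypothetical, and the code covers only point estimates. Design-based standard errors with strata and clusters, which the survey package handles, are outside this sketch.

```python
def weighted_mean(values, weights):
    """Design-weighted mean: each respondent counts for `weight` people."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)

def kish_effective_n(weights):
    """Kish's approximate effective sample size under unequal weights."""
    return sum(weights) ** 2 / sum(w * w for w in weights)

# Made-up sample: 4 urban respondents (weight 100 each) and
# 2 rural respondents (weight 800 each). 1 = has the trait, 0 = does not.
values = [1, 1, 1, 0, 0, 0]
weights = [100, 100, 100, 100, 800, 800]

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)
n_eff = kish_effective_n(weights)
print(unweighted, weighted, round(n_eff, 2))
```

The unweighted share is 50 percent, but the weighted estimate for the population is 15 percent, and the six interviews carry the information of roughly three equally weighted ones. Running a generic regression on such data without the design information answers a question about the sample, not the population.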


Chapter 23 · Korea's statistical-computing community

Korea's R and statistics community has grown quickly since the mid-2010s.

  • R-Korea — Korean R users group. Active on Facebook and Discord.
  • Seoul R Meetup — Regular in-person meetups for talks, tutorials, and networking.
  • R Korea User Conference — Annual academic conference.
  • Pseudo Lab Korea (pseudo-lab.com) — Korea's self-directed machine-learning study community. Python-heavy but with an R chapter.
  • K-stat — Korean Statistical Society. Journals and conferences.
  • KSA (Korea Statistics Authority) — Government agency for statistics education and certification exams.
  • Korea Data Analysts Association (KDAA) — Industry-side data professional body.

In universities the statistics departments of Seoul National University, Yonsei, Korea University, and KAIST teach R and Python together. R remains the standard in clinical trials, epidemiology, and financial statistics. Industry data roles at large Korean enterprises lean toward Python, but medical, pharma, and government statistics keep R in the lead.


Chapter 24 · Japan's statistical-computing community

Japan's R and statistics community has a longer history than Korea's.

  • Tokyo.R — Tokyo R users group. Founded in 2010, it has held more than 100 meetups to date.
  • R-jp — Japanese R users mailing list.
  • Japan.R — Annual nationwide R conference.
  • Institute of Statistical Mathematics (ISM) — Government statistics research institute founded in 1944. Hub of Bayesian and time-series research.
  • Python and R Statistics Study Group — Multiple study groups that cover both Python and R.
  • DataScience.tokyo — Data-science community events.
  • JAGS-Japan and Stan Study Group — Study circles dedicated to Bayesian tools.

Academic Bayesian research is strong in Japan, with ISM at the center driving work on time series, spatial statistics, and structural equation modeling. The statistics and economics departments at Tokyo, Kyoto, Keio, Waseda, and Osaka universities use R and Stan as standard tools.

On the industry side R&D teams at NTT Data, Recruit, ZOZO, and DeNA use R and Stan as daily tools.


Chapter 25 · Closing — the 2026 statistical-computing map at a glance

To close, here is the 2026 map.

  • R 4.5 and Posit are the standard for reproducible research. Clinical trials, financial statistics, government statistics, and academic publications all converge on R.
  • The Tidyverse and data.table split R data manipulation. dtplyr connects the two.
  • Stan is the industry standard for NUTS sampling, while brms and rstanarm make Bayesian inference routine for R users.
  • Pyro, NumPyro, and PyMC 5 split the Python Bayesian camp. NumPyro is attractive for GPU acceleration on JAX.
  • The JAX stack (Flax, Optax, Equinox) has become the new standard for accelerated scientific computing.
  • Polars 1.x shakes up the dataframe market with a Rust backend. Pandas 2.x answers with an Arrow backend.
  • Marimo solves Jupyter's weakness with a reactive notebook. Quarto standardizes unified publishing across R, Python, and Julia.
  • Causal inference (DoWhy, EconML, CausalML, CausalImpact) is the fastest-growing area between 2024 and 2026.
  • The Korean and Japanese statistics communities both teach R and Python together. Academia leans toward R and Stan, while industry leans toward Python.

By 2027-2028 large language models will be deep into statistical-code generation and result interpretation. But the responsibility for model validation, reproduction, and causal inference still sits with the human analyst. The tools get faster, but statistical thinking matters more.


References