Java vs Rust Benchmark — 10M Transactions
This page compares Spring Batch (Java 25 / Spring Boot 4.x) and Spring Batch RS (Rust) on a realistic ETL pipeline: reading 10 million financial transactions from CSV, storing them in PostgreSQL, exporting to XML, then re-importing from XML into a second PostgreSQL table.
Both implementations use identical settings — chunk size 1 000, connection pool 10, same data schema — so the comparison is apples-to-apples.
Test Environment
| Parameter | Value |
|-----------|-------|
| Machine | Apple Silicon, macOS |
| PostgreSQL | 17-alpine (Docker container, same machine) |
| Java | OpenJDK 25, Spring Boot 4.0.4, Spring Batch 6.0.3 |
| JVM flags | -Xms4g -Xmx4g -XX:+UseG1GC -XX:+AlwaysPreTouch |
| Virtual threads | Enabled (spring.threads.virtual.enabled=true) |
| Java XML | StAX (XMLStreamWriter / XMLStreamReader) — no JAXB |
| Rust | stable, --release + RUSTFLAGS="-C target-cpu=native" |
| Chunk size | 1 000 (both) |
| Pool size | 10 connections (both) |
| DB volume | Fresh Docker volume per run |
Pipeline
```
Generate 10M rows → transactions.csv
        │
        ▼  CsvItemReader / FlatFileItemReader
           TransactionProcessor (USD/GBP → EUR conversion, CANCELLED → FAILED)
        │
        ▼  PostgresItemWriter / JdbcBatchItemWriter (bulk insert, chunk=1000)
PostgreSQL: table transactions
        │
        ▼  RdbcItemReader / JdbcPagingItemReader (paginated, page_size=1000)
        │
        ▼  XmlItemWriter / XMLStreamWriter
transactions_export.xml
        │
        ▼  XmlItemReader / XMLStreamReader (chunk=1000)
        │
        ▼  PostgresItemWriter / JdbcBatchItemWriter (bulk insert, chunk=1000)
PostgreSQL: table transactions_import
```

Total wall-clock time includes CSV generation.
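The analysis further down describes the Rust generator as a simple linear-congruential RNG writing straight into a BufWriter<File>. A minimal sketch of that idea follows; it is not the benchmark's generator. The LCG constants (Knuth's MMIX values), the column order, the fixed timestamp, and the names `Lcg` and `generate_csv` are all assumptions made for illustration.

```rust
use std::io::{self, BufWriter, Write};

/// Minimal linear-congruential generator (constants from Knuth's MMIX).
/// Hypothetical stand-in for the benchmark's RNG, which the page does not show.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }

    /// Value in 0..n, taken from the high bits (the low bits of an LCG are weak).
    fn below(&mut self, n: u64) -> u64 {
        (self.next_u64() >> 33) % n
    }
}

const CURRENCIES: [&str; 3] = ["USD", "EUR", "GBP"];
const STATUSES: [&str; 4] = ["PENDING", "COMPLETED", "FAILED", "CANCELLED"];

/// Writes a header plus `rows` synthetic transactions to `out`.
/// Wrapping `out` in a BufWriter batches the many small writes into large ones.
fn generate_csv<W: Write>(out: W, rows: u64, seed: u64) -> io::Result<()> {
    let mut w = BufWriter::new(out);
    let mut rng = Lcg(seed);
    // Assumed column order; amount_eur is absent because the processor adds it later.
    writeln!(w, "transaction_id,amount,currency,timestamp,account_from,account_to,status")?;
    for i in 1..=rows {
        let cents = rng.below(1_000_000); // amounts from 0.00 to 9999.99
        let currency = CURRENCIES[rng.below(3) as usize];
        let from = rng.below(100_000_000);
        let to = rng.below(100_000_000);
        let status = STATUSES[rng.below(4) as usize];
        writeln!(
            w,
            "TXN-{:010},{}.{:02},{},2024-06-15T12:00:00Z,ACC-{:08},ACC-{:08},{}",
            i,
            cents / 100,
            cents % 100,
            currency,
            from,
            to,
            status
        )?;
    }
    w.flush()
}
```

Passing a `File` as `out` would produce transactions.csv on disk; passing a `&mut Vec<u8>` keeps the output in memory, which makes it easy to check that the same seed yields the same bytes.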
Transaction record
| Field | Type | Example |
|-------|------|---------|
| transaction_id | string | TXN-0000000001 |
| amount | float | 1234.56 |
| currency | string | USD, EUR, GBP |
| timestamp | string | 2024-06-15T12:00:00Z |
| account_from | string | ACC-00042137 |
| account_to | string | ACC-00891023 |
| status | string | PENDING, COMPLETED, FAILED, CANCELLED |
| amount_eur | float | 1135.80 (added by processor) |
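As a rough illustration of the record and of what the processor does to it (USD/GBP → EUR conversion, CANCELLED → FAILED), here is a minimal sketch. It is not the benchmark's `TransactionProcessor`. The exchange rates are hypothetical: the page does not state them, and 0.92 for USD is chosen only because it reproduces the example row above if that row is in USD. Modelling `amount_eur` as an `Option` and rounding to cents are likewise assumptions.

```rust
/// One record, mirroring the field table above.
/// `amount_eur` is None until the processor has run (an assumption of this sketch).
#[derive(Debug, Clone, PartialEq)]
struct Transaction {
    transaction_id: String,
    amount: f64,
    currency: String,
    timestamp: String,
    account_from: String,
    account_to: String,
    status: String,
    amount_eur: Option<f64>,
}

// Hypothetical fixed rates, for illustration only; the benchmark's rates are not shown.
const USD_TO_EUR: f64 = 0.92;
const GBP_TO_EUR: f64 = 1.17;

/// Converts the amount to EUR and folds CANCELLED into FAILED.
fn process(mut t: Transaction) -> Transaction {
    let rate = match t.currency.as_str() {
        "USD" => USD_TO_EUR,
        "GBP" => GBP_TO_EUR,
        _ => 1.0, // EUR passes through unchanged
    };
    // Round to whole cents so the stored value has two decimal places.
    t.amount_eur = Some((t.amount * rate * 100.0).round() / 100.0);
    if t.status == "CANCELLED" {
        t.status = "FAILED".to_string();
    }
    t
}
```

Taking the record by value and returning it avoids cloning the string fields, which matters when the function runs once per row across 10M rows.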
Code Side by Side
Step 2 — PostgreSQL → XML

```rust
// Keyset pagination: WHERE transaction_id > :last ORDER BY transaction_id LIMIT 1000
// O(log n) per page; avoids the O(n²) cost of LIMIT/OFFSET on 10M rows.
let reader = RdbcItemReaderBuilder::<Transaction>::new()
    .postgres(pool.clone())
    .query(
        "SELECT transaction_id, amount, currency, timestamp, \
         account_from, account_to, status, amount_eur \
         FROM transactions",
    )
    .with_page_size(1_000)
    .with_keyset("transaction_id", |t: &Transaction| t.transaction_id.clone())
    .build_postgres();

let writer = XmlItemWriterBuilder::<Transaction>::new()
    .root_tag("transactions")
    .item_tag("transaction")
    .from_path(xml_path)?;
```

```java
// JdbcPagingItemReader uses keyset-based pagination via sortKeys
@Bean
public JdbcPagingItemReader<Transaction> postgresReader(DataSource ds) {
    return new JdbcPagingItemReaderBuilder<Transaction>()
        .name("postgresTransactionReader")
        .dataSource(ds)
        .selectClause("SELECT transaction_id,amount,currency,timestamp,"
                    + "account_from,account_to,status,amount_eur")
        .fromClause("FROM transactions")
        .sortKeys(Map.of("transaction_id", Order.ASCENDING))
        .rowMapper(/* maps columns → Transaction */)
        .pageSize(1_000).build();
}

// StAX writer, no JAXB reflection
@Bean
public TransactionXmlWriter xmlWriter() {
    return new TransactionXmlWriter(xmlPath); // wraps XMLStreamWriter
}
```

Step 3 — XML → PostgreSQL (transactions_import)
```rust
let reader = XmlItemReaderBuilder::<Transaction>::new()
    .tag("transaction")
    .from_path(xml_path)?;

let writer = RdbcItemWriterBuilder::<Transaction>::new()
    .postgres(pool)
    .table("transactions_import")
    .column("transaction_id", |t: &Transaction| t.transaction_id.clone().into())
    .column("amount", |t: &Transaction| t.amount.into())
    .column("currency", |t: &Transaction| t.currency.clone().into())
    .column("timestamp", |t: &Transaction| t.timestamp.clone().into())
    .column("account_from", |t: &Transaction| t.account_from.clone().into())
    .column("account_to", |t: &Transaction| t.account_to.clone().into())
    .column("status", |t: &Transaction| t.status.clone().into())
    .column("amount_eur", |t: &Transaction| t.amount_eur.into())
    .build_postgres();
```

```java
// StAX reader, no JAXB reflection
@Bean
public TransactionXmlReader xmlReader() {
    return new TransactionXmlReader(xmlPath); // wraps XMLStreamReader
}

@Bean
public JdbcBatchItemWriter<Transaction> importWriter(DataSource ds) {
    return new JdbcBatchItemWriterBuilder<Transaction>()
        .dataSource(ds)
        .sql("INSERT INTO transactions_import (...) VALUES (:transactionId, ...)")
        .beanMapped().build();
}
```

Results
Measured on the reference environment described above, with a fresh Docker volume for each run. Total wall-clock time includes CSV generation.
Overall performance
| Metric | Spring Batch RS (Rust) | Spring Batch (Java) | Rust advantage |
|--------|------------------------|---------------------|----------------|
| Total pipeline time | 114.1 s | 199.7 s | 1.75× faster |
| Generate CSV | 1.8 s | 6.7 s | 3.7× |
| Step 1 — CSV → PostgreSQL | 38.6 s | 83.4 s | 2.2× |
| Step 2 — PostgreSQL → XML | 20.8 s | 32.5 s | 1.6× |
| Step 3 — XML → PostgreSQL | 53.0 s | 77.1 s | 1.5× |
Throughput (records/sec)
| Step | Rust | Java | Ratio |
|------|------|------|-------|
| Step 1 — CSV → PostgreSQL | 259 095 | 119 964 | 2.2× |
| Step 2 — PostgreSQL → XML | 481 773 | 307 560 | 1.6× |
| Step 3 — XML → PostgreSQL | 188 594 | 129 671 | 1.5× |
| Average (full pipeline) | 87 610 | 50 071 | 1.75× |
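The throughput and ratio columns follow from the timings by simple division. The sketch below shows that arithmetic; the function names `throughput` and `speedup` are introduced here for illustration and are not part of either code base.

```rust
/// Records processed per second of wall-clock time.
fn throughput(records: u64, seconds: f64) -> f64 {
    records as f64 / seconds
}

/// How many times faster the first duration is than the second.
fn speedup(fast_seconds: f64, slow_seconds: f64) -> f64 {
    slow_seconds / fast_seconds
}
```

For the full pipeline, `throughput(10_000_000, 114.1)` gives roughly 87 642 records/sec and `speedup(114.1, 199.7)` gives roughly 1.75. The small difference from the 87 610 in the table is presumably because the published figures were computed from unrounded timings.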
Analysis
Why is Rust ~1.75× faster overall?
1. CSV generation (3.7× gap).
Rust’s generator uses a simple linear-congruential RNG and writes directly to a
BufWriter<File>. Java’s DataGenerator does the same, but JVM startup, JIT warm-up,
and UTF-16 string handling add measurable overhead on a 10M-row write.
2. CSV → PostgreSQL (Step 1 — 2.2× gap).
Rust uses zero-copy CSV parsing (no intermediate string allocation per field) and
a single sqlx bulk-insert query per chunk. Java’s FlatFileItemReader allocates
a String[] and a Transaction bean per row via bean-wrapper reflection.
3. PostgreSQL → XML (Step 2 — 1.6× gap). Both now use streaming byte-level XML APIs (no reflection). Rust’s advantage comes from lower memory pressure (no GC, no JVM metadata overhead) and tighter CPU cache usage across a 10M-row write.
4. XML → PostgreSQL (Step 3 — 1.5× gap). Both use streaming pull parsers (StAX’s XMLStreamReader on the Java side). Rust’s advantage again comes from GC-free memory management and the absence of Spring AOP / transaction proxies on the write path.
5. No garbage collection. Rust uses RAII — memory is freed the instant a chunk goes out of scope, with zero pauses. Java’s G1GC introduces stop-the-world pauses that accumulate over a 10M-record run.
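6. Keyset pagination (Step 2, both implementations).
Spring Batch RS uses WHERE cursor_col > :last ORDER BY cursor_col LIMIT n, which costs O(log n)
per page when the cursor column is indexed. Java’s JdbcPagingItemReader with sortKeys uses the same
strategy, so pagination itself does not contribute to the gap.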
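The keyset strategy in the last item can be sketched in memory. Here a `BTreeMap` stands in for the B-tree index on the sort key, so this shows the access pattern and not the `RdbcItemReader` or `JdbcPagingItemReader` code; `next_page` and `read_all` are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{Excluded, Unbounded};

/// Returns up to `page_size` rows whose key is strictly greater than `last`.
/// `range` seeks to the start position in O(log n) instead of skipping rows.
fn next_page<'a>(
    index: &'a BTreeMap<String, f64>,
    last: Option<&str>,
    page_size: usize,
) -> Vec<(&'a String, &'a f64)> {
    match last {
        // First page: start at the smallest key.
        None => index.iter().take(page_size).collect(),
        // Later pages: WHERE key > :last ORDER BY key LIMIT page_size
        Some(key) => index
            .range::<str, _>((Excluded(key), Unbounded))
            .take(page_size)
            .collect(),
    }
}

/// Walks the whole index page by page and returns how many rows were seen.
fn read_all(index: &BTreeMap<String, f64>, page_size: usize) -> usize {
    let mut seen = 0;
    let mut last: Option<String> = None;
    loop {
        let page = next_page(index, last.as_deref(), page_size);
        if page.is_empty() {
            return seen;
        }
        seen += page.len();
        // The last key of this page becomes the cursor for the next one.
        last = page.last().map(|(k, _)| (*k).clone());
    }
}
```

With LIMIT/OFFSET the database has to walk past and discard every preceding row for each page, so the total work grows quadratically with the row count. Carrying the last key forward lets each page start with a single index seek.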
The JAXB lesson
The previous iteration of this benchmark showed a 13.2× gap on Step 2 (XML export).
That gap was not a Java vs Rust problem — it was entirely due to JAXB’s reflection-based
marshalling. Switching to XMLStreamWriter brought Java to within 1.6× of Rust on that step.
If you are running Java batch jobs with JAXB-based XML output, switching to StAX is likely the highest-ROI optimisation available.
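To make the streaming idea concrete, here is a minimal hand-rolled sketch of an element-by-element XML writer. It is neither the `XmlItemWriter` from Spring Batch RS nor the Java `TransactionXmlWriter`; the function names, the (tag, value) record shape, and the indentation are assumptions. The point is only that each record is written as it arrives, with no reflection over an object graph and no buffering of the whole document.

```rust
use std::io::{self, Write};

/// Escapes the five XML special characters in text content.
fn escape(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Streams a <transactions> document to `out` one record at a time.
/// Each record is a list of (tag, value) pairs; only the current record is held in memory.
fn write_xml<'a, W, I>(mut out: W, records: I) -> io::Result<()>
where
    W: Write,
    I: IntoIterator<Item = Vec<(&'a str, String)>>,
{
    writeln!(out, "<transactions>")?;
    for record in records {
        writeln!(out, "  <transaction>")?;
        for (tag, value) in record {
            writeln!(out, "    <{0}>{1}</{0}>", tag, escape(&value))?;
        }
        writeln!(out, "  </transaction>")?;
    }
    writeln!(out, "</transactions>")
}
```

Because `records` is any iterator, the caller can feed it pages straight from a database reader, and memory use stays flat regardless of how many rows are exported.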
When to choose Java
- Your team is Java-first and migration cost outweighs performance gains
- You need Spring ecosystem integrations (Spring Data, Spring Cloud Task, Spring Integration)
- Your batch jobs run infrequently and throughput is not the bottleneck
- You require rich operational features: JobRepository, JobExplorer, REST API control
When to choose Rust
- Throughput and latency are business requirements (financial settlement, real-time ETL)
- Memory is constrained (embedded systems, small containers)
- GC pauses would cause SLA violations
- You want a single statically-linked binary with no runtime dependency
- Cold-start time matters (serverless, frequent scheduling)
How to Reproduce
Prerequisites

```bash
# Start PostgreSQL with Docker Compose (from sbrs-java-bench/)
cd sbrs-java-bench
docker compose up -d

# For reproducible results, restart with a fresh volume between runs:
docker compose down -v && docker compose up -d
```

Run the Rust benchmark
```bash
cd sbrs-lib

RUSTFLAGS="-C target-cpu=native" \
cargo run --release --example benchmark_csv_postgres_xml \
  --features csv,xml,rdbc-postgres
```

Run the Java benchmark
```bash
cd sbrs-java-bench

# Build fat JAR once
mvn package -q -DskipTests

# Run
java -Xms4g -Xmx4g -XX:+UseG1GC -XX:+AlwaysPreTouch \
  -jar target/spring-batch-benchmark-1.0.0.jar
```