
Java vs Rust Benchmark — 10M Transactions

This page compares Spring Batch (Java 25 / Spring Boot 4.x) and Spring Batch RS (Rust) on a realistic ETL pipeline: reading 10 million financial transactions from CSV, storing them in PostgreSQL, exporting to XML, then re-importing from XML into a second PostgreSQL table.

Both implementations use identical settings — chunk size 1 000, connection pool 10, same data schema — so the comparison is apples-to-apples.


| Parameter | Value |
|---|---|
| Machine | Apple Silicon, macOS |
| PostgreSQL | 17-alpine (Docker container, same machine) |
| Java | OpenJDK 25, Spring Boot 4.0.4, Spring Batch 6.0.3 |
| JVM flags | -Xms4g -Xmx4g -XX:+UseG1GC -XX:+AlwaysPreTouch |
| Virtual threads | Enabled (spring.threads.virtual.enabled=true) |
| Java XML | StAX (XMLStreamWriter / XMLStreamReader); no JAXB |
| Rust | stable, --release + RUSTFLAGS="-C target-cpu=native" |
| Chunk size | 1 000 (both) |
| Pool size | 10 connections (both) |
| DB volume | Fresh Docker volume per run |

Pipeline overview. Each arrow is labelled with the Rust component first and its Java counterpart second. Step 1 loads the CSV into transactions, Step 2 exports that table to XML, and Step 3 imports the XML into transactions_import.

```text
Generate 10M rows → transactions.csv
        │
        ▼  CsvItemReader / FlatFileItemReader
TransactionProcessor (USD/GBP → EUR conversion, CANCELLED → FAILED)
        │
        ▼  PostgresItemWriter / JdbcBatchItemWriter (bulk insert, chunk=1000)
PostgreSQL: table transactions
        │
        ▼  RdbcItemReader / JdbcPagingItemReader (paginated, page_size=1000)
        ▼  XmlItemWriter / XMLStreamWriter
transactions_export.xml
        │
        ▼  XmlItemReader / XMLStreamReader (chunk=1000)
        ▼  PostgresItemWriter / JdbcBatchItemWriter (bulk insert, chunk=1000)
PostgreSQL: table transactions_import
```

Total wall-clock time includes CSV generation.
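Each stage in the diagram is a chunk-oriented step: items are read and processed one at a time, then handed to the writer in chunks of 1 000. The following is a simplified, std-only model of that loop. `run_chunked_step` and its closure parameters are hypothetical names for illustration, not the Spring Batch RS API.

```rust
/// Simplified model of a chunk-oriented step (illustrative, not the
/// Spring Batch RS API): read one item at a time, process it, and hand the
/// writer a full chunk of `chunk_size` items. The last chunk may be smaller.
/// Returns the number of items written.
fn run_chunked_step<I, O, R, P, W>(reader: R, mut process: P, mut write: W, chunk_size: usize) -> usize
where
    R: IntoIterator<Item = I>,
    P: FnMut(I) -> O,
    W: FnMut(&[O]),
{
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut chunk: Vec<O> = Vec::with_capacity(chunk_size);
    let mut written = 0;
    for item in reader {
        chunk.push(process(item));
        if chunk.len() == chunk_size {
            write(chunk.as_slice()); // one write call per full chunk
            written += chunk.len();
            chunk.clear(); // keep the allocation for the next chunk
        }
    }
    if !chunk.is_empty() {
        write(chunk.as_slice()); // flush the final, partial chunk
        written += chunk.len();
    }
    written
}

fn main() {
    let mut chunk_sizes = Vec::new();
    let total = run_chunked_step(
        1..=2_500u32,
        |x: u32| x * 2,                                // stand-in for the processor
        |chunk: &[u32]| chunk_sizes.push(chunk.len()), // stand-in for the writer
        1_000,
    );
    // prints: wrote 2500 items in chunks of [1000, 1000, 500]
    println!("wrote {total} items in chunks of {chunk_sizes:?}");
}
```

Keeping the write granularity at the chunk (rather than the item) is what lets the writers in this benchmark issue one bulk insert per 1 000 records.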

| Field | Type | Example |
|---|---|---|
| transaction_id | string | TXN-0000000001 |
| amount | float | 1234.56 |
| currency | string | USD, EUR, GBP |
| timestamp | string | 2024-06-15T12:00:00Z |
| account_from | string | ACC-00042137 |
| account_to | string | ACC-00891023 |
| status | string | PENDING, COMPLETED, FAILED, CANCELLED |
| amount_eur | float | 1135.80 (added by processor) |
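The processor's two rules (filling in amount_eur, and mapping CANCELLED to FAILED) can be sketched as a pure function. The exchange rates are assumptions: 0.92 for USD is used only because it reproduces the 1234.56 → 1135.80 example in the table, while the GBP rate and the rounding to cents are placeholders. `process` and `rate_to_eur` are illustrative names, not the benchmark's TransactionProcessor.

```rust
// Sketch of the processor's rules. Exchange rates and rounding are assumptions.

#[allow(dead_code)] // some fields are only carried through, never read here
#[derive(Debug, Clone)]
struct Transaction {
    transaction_id: String,
    amount: f64,
    currency: String,
    timestamp: String,
    account_from: String,
    account_to: String,
    status: String,
    amount_eur: f64,
}

/// Hypothetical fixed EUR rates (placeholders, not the benchmark's values).
fn rate_to_eur(currency: &str) -> Option<f64> {
    match currency {
        "EUR" => Some(1.0),
        "USD" => Some(0.92),
        "GBP" => Some(1.17),
        _ => None,
    }
}

/// Fills in `amount_eur` and maps CANCELLED to FAILED.
fn process(mut t: Transaction) -> Result<Transaction, String> {
    let rate = rate_to_eur(&t.currency)
        .ok_or_else(|| format!("unsupported currency: {}", t.currency))?;
    // Round to cents (an assumption of this sketch).
    t.amount_eur = (t.amount * rate * 100.0).round() / 100.0;
    if t.status == "CANCELLED" {
        t.status = "FAILED".to_string();
    }
    Ok(t)
}

fn main() {
    let t = Transaction {
        transaction_id: "TXN-0000000001".into(),
        amount: 1234.56,
        currency: "USD".into(),
        timestamp: "2024-06-15T12:00:00Z".into(),
        account_from: "ACC-00042137".into(),
        account_to: "ACC-00891023".into(),
        status: "CANCELLED".into(),
        amount_eur: 0.0,
    };
    let out = process(t).expect("known currency");
    // prints: TXN-0000000001 1135.80 FAILED
    println!("{} {:.2} {}", out.transaction_id, out.amount_eur, out.status);
}
```

Returning a `Result` for an unknown currency is a design choice of this sketch; the benchmark's processor may handle that case differently.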

Step 2 — PostgreSQL → XML (transactions_export.xml)

```rust
// Keyset pagination: WHERE transaction_id > :last ORDER BY transaction_id LIMIT 1000.
// With an index on transaction_id, each page costs an O(log n) seek plus the page
// itself, instead of the O(n²) total cost of LIMIT/OFFSET over 10M rows.
let reader = RdbcItemReaderBuilder::<Transaction>::new()
    .postgres(pool.clone())
    .query(
        "SELECT transaction_id, amount, currency, timestamp, \
         account_from, account_to, status, amount_eur \
         FROM transactions",
    )
    .with_page_size(1_000)
    .with_keyset("transaction_id", |t: &Transaction| t.transaction_id.clone())
    .build_postgres();

let writer = XmlItemWriterBuilder::<Transaction>::new()
    .root_tag("transactions")
    .item_tag("transaction")
    .from_path(xml_path)?;
```
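The keyset comment in the reader can be modelled in memory. In this sketch a `BTreeSet<String>` stands in for the database index on transaction_id (an index is assumed; the table DDL is not shown here), and `next_page` is a hypothetical helper. Each call seeks just past the last ID already returned and takes at most one page, mirroring `WHERE transaction_id > :last ORDER BY transaction_id LIMIT 1000`.

```rust
use std::collections::BTreeSet;
use std::ops::Bound;
use std::ops::Bound::{Excluded, Unbounded};

/// Returns the next page of IDs strictly after `last`, in key order.
/// Models `WHERE transaction_id > :last ORDER BY transaction_id LIMIT :page_size`:
/// an O(log n) seek into the ordered index, then at most `page_size` steps.
fn next_page(index: &BTreeSet<String>, last: Option<&str>, page_size: usize) -> Vec<String> {
    let rows = match last {
        None => index.range::<str, _>(..), // first page: start at the smallest key
        Some(k) => {
            let after_last: (Bound<&str>, Bound<&str>) = (Excluded(k), Unbounded);
            index.range::<str, _>(after_last)
        }
    };
    rows.take(page_size).cloned().collect()
}

fn main() {
    // Zero-padded IDs sort lexicographically in numeric order.
    let index: BTreeSet<String> = (1..=2_500u32).map(|i| format!("TXN-{i:010}")).collect();

    let mut last: Option<String> = None; // the keyset cursor
    let mut page_sizes = Vec::new();
    loop {
        let page = next_page(&index, last.as_deref(), 1_000);
        if page.is_empty() {
            break;
        }
        last = page.last().cloned(); // remember the highest ID on this page
        page_sizes.push(page.len());
    }
    println!("{page_sizes:?}"); // [1000, 1000, 500]
}
```

Unlike OFFSET, the cost of fetching a page does not grow with how far into the table the reader already is, because only the cursor value is carried between pages.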

Step 3 — XML → PostgreSQL (transactions_import)

```rust
let reader = XmlItemReaderBuilder::<Transaction>::new()
    .tag("transaction")
    .from_path(xml_path)?;

let writer = RdbcItemWriterBuilder::<Transaction>::new()
    .postgres(pool)
    .table("transactions_import")
    .column("transaction_id", |t: &Transaction| t.transaction_id.clone().into())
    .column("amount", |t: &Transaction| t.amount.into())
    .column("currency", |t: &Transaction| t.currency.clone().into())
    .column("timestamp", |t: &Transaction| t.timestamp.clone().into())
    .column("account_from", |t: &Transaction| t.account_from.clone().into())
    .column("account_to", |t: &Transaction| t.account_to.clone().into())
    .column("status", |t: &Transaction| t.status.clone().into())
    .column("amount_eur", |t: &Transaction| t.amount_eur.into())
    .build_postgres();
```
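The writer's column mappings end up as one multi-row INSERT per chunk. The sketch below shows how such a statement can be assembled with PostgreSQL's numbered placeholders; `bulk_insert_sql` is a hypothetical helper, not the library's code. It also checks the bind-parameter budget: with 8 columns, a 1 000-row chunk needs 8 000 parameters, well under the 65 535 a single PostgreSQL statement can carry.

```rust
/// Commonly cited PostgreSQL limit: the wire protocol carries the number of
/// bind parameters as a 16-bit count, so one statement can bind at most 65,535.
const PG_MAX_BIND_PARAMS: usize = 65_535;

/// Builds `INSERT INTO table (c1, ..) VALUES ($1, ..), ($k, ..)` for `rows` rows.
/// Table and column names are interpolated directly, so they must be trusted
/// constants (as in the writer configuration above), never user input.
fn bulk_insert_sql(table: &str, columns: &[&str], rows: usize) -> Result<String, String> {
    if rows == 0 || columns.is_empty() {
        return Err("need at least one row and one column".to_string());
    }
    let params = rows * columns.len();
    if params > PG_MAX_BIND_PARAMS {
        return Err(format!("{params} bind parameters exceed the limit of {PG_MAX_BIND_PARAMS}"));
    }
    let mut sql = format!("INSERT INTO {table} ({}) VALUES ", columns.join(", "));
    for row in 0..rows {
        if row > 0 {
            sql.push_str(", ");
        }
        // PostgreSQL placeholders are numbered from $1 across the whole statement.
        let placeholders: Vec<String> = (1..=columns.len())
            .map(|c| format!("${}", row * columns.len() + c))
            .collect();
        sql.push('(');
        sql.push_str(&placeholders.join(", "));
        sql.push(')');
    }
    Ok(sql)
}

fn main() {
    let columns = [
        "transaction_id", "amount", "currency", "timestamp",
        "account_from", "account_to", "status", "amount_eur",
    ];
    // INSERT INTO transactions_import (transaction_id, amount, currency) VALUES ($1, $2, $3), ($4, $5, $6)
    println!("{}", bulk_insert_sql("transactions_import", &columns[..3], 2).unwrap());
    // One 1 000-row chunk with all 8 columns: 8 000 parameters, within the limit.
    assert!(bulk_insert_sql("transactions_import", &columns, 1_000).is_ok());
}
```

In sqlx, `QueryBuilder::push_values` offers a ready-made way to build the same kind of statement with the values bound rather than formatted into the SQL.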

Measured on the reference environment described above. Run on a fresh Docker volume each time. Total wall-clock time includes CSV generation.

| Metric | Spring Batch RS (Rust) | Spring Batch (Java) | Rust advantage |
|---|---|---|---|
| Total pipeline time | 114.1 s | 199.7 s | 1.75× |
| Generate CSV | 1.8 s | 6.7 s | 3.7× |
| Step 1 — CSV → PostgreSQL | 38.6 s | 83.4 s | 2.2× |
| Step 2 — PostgreSQL → XML | 20.8 s | 32.5 s | 1.6× |
| Step 3 — XML → PostgreSQL | 53.0 s | 77.1 s | 1.5× |

Throughput (records/s):

| Step | Rust | Java | Ratio |
|---|---|---|---|
| Step 1 — CSV → PostgreSQL | 259 095 | 119 964 | 2.2× |
| Step 2 — PostgreSQL → XML | 481 773 | 307 560 | 1.6× |
| Step 3 — XML → PostgreSQL | 188 594 | 129 671 | 1.5× |
| Overall (10M ÷ total pipeline time) | 87 610 | 50 071 | 1.75× |
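Each throughput figure is the record count divided by the corresponding step time. The last row divides by the total pipeline time (CSV generation included) instead of averaging the per-step rates, which is why it is lower than any single step. For example:

$$
\text{throughput} = \frac{N}{t}, \qquad N = 10^{7}\ \text{records}
$$

$$
\text{Step 1 (Rust)}:\ \frac{10^{7}}{38.6\ \text{s}} \approx 259\,000\ \text{records/s}
\qquad
\text{overall (Rust)}:\ \frac{10^{7}}{114.1\ \text{s}} \approx 87\,600\ \text{records/s}
\qquad
\text{speedup}:\ \frac{199.7\ \text{s}}{114.1\ \text{s}} \approx 1.75
$$

Recomputing from the times rounded to 0.1 s gives slightly different values (259 067 rather than 259 095 records/s for Rust Step 1). The table's figures are consistent with the unrounded timings.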

1. CSV generation (3.7× gap). Rust’s generator uses a simple linear-congruential RNG and writes directly to a BufWriter<File>. Java’s DataGenerator does the same, but JVM startup, JIT warm-up, and UTF-16 string handling add measurable overhead on a 10M-row write.

2. CSV → PostgreSQL (Step 1 — 2.2× gap). Rust uses zero-copy CSV parsing (no intermediate string allocation per field) and a single sqlx bulk-insert query per chunk. Java’s FlatFileItemReader allocates a String[] and a Transaction bean per row via bean-wrapper reflection.

3. PostgreSQL → XML (Step 2 — 1.6× gap). Both now use streaming byte-level XML APIs (no reflection). Rust’s advantage comes from lower memory pressure (no GC, no JVM metadata overhead) and tighter CPU cache usage across a 10M-row write.

4. XML → PostgreSQL (Step 3 — 1.5× gap). Both use streaming pull parsers (StAX's XMLStreamReader on the Java side). Rust's advantage is again GC-free memory and the absence of Spring AOP / transaction proxies on the write path.

5. No garbage collection. Rust uses RAII: a chunk's memory is freed as soon as the chunk goes out of scope, so there are no collector pauses. Java's G1GC introduces stop-the-world pauses that accumulate over a 10M-record run.

6. Keyset pagination (Step 2, both implementations). Spring Batch RS uses WHERE cursor_col > :last ORDER BY cursor_col LIMIT page_size, which costs O(log N) per page on an indexed N-row table. Java's JdbcPagingItemReader with sortKeys uses the same strategy, so pagination does not account for the Step 2 gap.
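Item 1 describes the CSV generator as a linear-congruential RNG writing through a `BufWriter<File>`. A minimal sketch of that approach follows. The LCG constants (Knuth's MMIX multiplier and increment), the CSV column order, and the fixed timestamp are assumptions rather than the benchmark's actual generator; `Lcg` and `generate_csv` are illustrative names.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};

/// Minimal 64-bit linear congruential generator. The constants are Knuth's
/// MMIX multiplier and increment (an assumption: the benchmark's are not shown).
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6_364_136_223_846_793_005)
            .wrapping_add(1_442_695_040_888_963_407);
        self.0
    }

    /// Value in 0..n, taken from the high bits (the low bits of an LCG are weak).
    fn below(&mut self, n: u64) -> u64 {
        (self.next_u64() >> 33) % n
    }
}

const CURRENCIES: [&str; 3] = ["USD", "EUR", "GBP"];
const STATUSES: [&str; 4] = ["PENDING", "COMPLETED", "FAILED", "CANCELLED"];

/// Writes a header and `rows` data lines. Column order follows the schema table
/// minus amount_eur (added later by the processor); the timestamp is fixed for brevity.
fn generate_csv<W: Write>(out: &mut W, rows: u64, seed: u64) -> io::Result<()> {
    let mut rng = Lcg(seed);
    writeln!(out, "transaction_id,amount,currency,timestamp,account_from,account_to,status")?;
    for i in 1..=rows {
        let cents = rng.below(1_000_000); // 0.00 to 9999.99
        let currency = CURRENCIES[rng.below(3) as usize];
        let from = rng.below(100_000_000);
        let to = rng.below(100_000_000);
        let status = STATUSES[rng.below(4) as usize];
        writeln!(
            out,
            "TXN-{i:010},{}.{:02},{currency},2024-06-15T12:00:00Z,ACC-{from:08},ACC-{to:08},{status}",
            cents / 100,
            cents % 100,
        )?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // A BufWriter with a large buffer turns many small writeln! calls into few syscalls.
    let path = std::env::temp_dir().join("transactions.csv");
    let mut out = BufWriter::with_capacity(1 << 20, File::create(&path)?);
    generate_csv(&mut out, 10_000, 42)?; // 10k rows here; the benchmark writes 10M
    out.flush()?;
    println!("wrote {}", path.display());
    Ok(())
}
```

Making `generate_csv` generic over `Write` keeps the buffering decision at the call site and lets the same code target an in-memory `Vec<u8>`.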

The previous iteration of this benchmark showed a 13.2× gap on Step 2 (XML export). Most of that gap was not a Java vs Rust problem: it came from JAXB's reflection-based marshalling. Switching to XMLStreamWriter brought Java to within 1.6× of Rust on that step.

If you are running Java batch jobs with JAXB-based XML output, switching to StAX is likely the highest-ROI optimisation available.
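StAX closes most of the JAXB gap because it streams: each element is written as soon as it is produced, with no reflection and no intermediate object tree. The same idea in Rust, as a std-only sketch (`write_item` and `write_escaped` are hypothetical helpers, not the XmlItemWriter implementation):

```rust
use std::io::{self, BufWriter, Write};

/// Writes `s` with XML's five special characters escaped, copying the
/// unescaped runs straight through (no intermediate String).
fn write_escaped<W: Write>(out: &mut W, s: &str) -> io::Result<()> {
    let bytes = s.as_bytes();
    let mut start = 0;
    for (i, &b) in bytes.iter().enumerate() {
        let entity: &[u8] = match b {
            b'&' => b"&amp;",
            b'<' => b"&lt;",
            b'>' => b"&gt;",
            b'"' => b"&quot;",
            b'\'' => b"&apos;",
            _ => continue,
        };
        out.write_all(&bytes[start..i])?;
        out.write_all(entity)?;
        start = i + 1;
    }
    out.write_all(&bytes[start..])
}

/// Streams one `<item_tag>` element with a child element per field.
fn write_item<W: Write>(out: &mut W, item_tag: &str, fields: &[(&str, &str)]) -> io::Result<()> {
    write!(out, "<{item_tag}>")?;
    for (tag, value) in fields {
        write!(out, "<{tag}>")?;
        write_escaped(out, value)?;
        write!(out, "</{tag}>")?;
    }
    writeln!(out, "</{item_tag}>")
}

fn main() -> io::Result<()> {
    let mut out = BufWriter::new(io::stdout().lock());
    writeln!(out, "<transactions>")?;
    // In a real export this loop would run once per row read from PostgreSQL;
    // memory use stays flat because nothing is accumulated.
    for i in 1..=3u32 {
        let id = format!("TXN-{i:010}");
        write_item(
            &mut out,
            "transaction",
            &[("transaction_id", id.as_str()), ("currency", "EUR"), ("status", "COMPLETED")],
        )?;
    }
    writeln!(out, "</transactions>")?;
    out.flush()
}
```

Because the ASCII characters being escaped never occur inside a multi-byte UTF-8 sequence, scanning bytes instead of chars is safe here.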

Spring Batch (Java) is the better fit when:
  • Your team is Java-first and migration cost outweighs performance gains
  • You need Spring ecosystem integrations (Spring Data, Spring Cloud Task, Spring Integration)
  • Your batch jobs run infrequently and throughput is not the bottleneck
  • You require rich operational features: JobRepository, JobExplorer, REST API control

Spring Batch RS (Rust) is the better fit when:
  • Throughput and latency are business requirements (financial settlement, real-time ETL)
  • Memory is constrained (embedded systems, small containers)
  • GC pauses would cause SLA violations
  • You want a single native binary with no JVM or other runtime to install
  • Cold-start time matters (serverless, frequent scheduling)
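The GC-pause and memory bullets come down to deterministic freeing. This sketch (the `Row` type and the drop counter are hypothetical, for illustration only) shows each chunk's memory being released at the end of its scope, before the next chunk is built. Deallocation still costs CPU time; it simply happens at predictable points instead of in collector pauses.

```rust
use std::cell::Cell;

thread_local! {
    // Counts how many Row values have been dropped (freed) on this thread.
    static DROPPED: Cell<usize> = Cell::new(0);
}

/// Hypothetical row type whose Drop impl records when its memory is released.
struct Row {
    _id: String,
}

impl Drop for Row {
    fn drop(&mut self) {
        DROPPED.with(|d| d.set(d.get() + 1));
    }
}

fn dropped() -> usize {
    DROPPED.with(|d| d.get())
}

/// Builds `total` rows in chunks and returns, after each chunk's scope ends,
/// how many rows have been freed so far.
fn process_chunks(total: usize, chunk_size: usize) -> Vec<usize> {
    assert!(chunk_size > 0);
    let base = dropped();
    let mut freed_after_chunk = Vec::new();
    let mut produced = 0;
    while produced < total {
        let n = chunk_size.min(total - produced);
        {
            let chunk: Vec<Row> = (0..n)
                .map(|i| Row { _id: format!("TXN-{:010}", produced + i + 1) })
                .collect();
            produced += chunk.len();
            // ... write the chunk here ...
        } // `chunk` goes out of scope: every Row (and its String) is freed right here
        freed_after_chunk.push(dropped() - base);
    }
    freed_after_chunk
}

fn main() {
    println!("{:?}", process_chunks(2_500, 1_000)); // [1000, 2000, 2500]
}
```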

Start PostgreSQL:

```bash
# Start PostgreSQL with Docker Compose (from sbrs-java-bench/)
cd sbrs-java-bench
docker compose up -d
# For reproducible results, restart with a fresh volume between runs:
docker compose down -v && docker compose up -d
```

Run the Rust benchmark:

```bash
cd sbrs-lib
RUSTFLAGS="-C target-cpu=native" \
  cargo run --release --example benchmark_csv_postgres_xml \
  --features csv,xml,rdbc-postgres
```

Run the Java benchmark:

```bash
cd sbrs-java-bench
# Build the fat JAR once
mvn package -q -DskipTests
# Run with the same JVM flags as the reference environment
java -Xms4g -Xmx4g -XX:+UseG1GC -XX:+AlwaysPreTouch \
  -jar target/spring-batch-benchmark-1.0.0.jar
```