
Java vs Rust Benchmark — 10M Transactions

This page compares Spring Batch (Java 25 / Spring Boot 4.x) and Spring Batch RS (Rust) on a realistic ETL pipeline: reading 10 million financial transactions from CSV, storing them in PostgreSQL, exporting to XML, then re-importing from XML into a second PostgreSQL table.

Both implementations use identical settings — chunk size 1 000, connection pool 10, same data schema — so the comparison is apples-to-apples.


Test environment:

| Parameter | Value |
|-----------|-------|
| Machine | Apple Silicon, macOS |
| PostgreSQL | 17-alpine (Docker container, same machine) |
| Java | OpenJDK 25, Spring Boot 4.0.4, Spring Batch 6.0.3 |
| JVM flags | -Xms4g -Xmx4g -XX:+UseG1GC -XX:+AlwaysPreTouch |
| Virtual threads | Enabled (spring.threads.virtual.enabled=true) |
| Java XML | StAX (XMLStreamWriter / XMLStreamReader), no JAXB |
| Rust | stable, --release + RUSTFLAGS="-C target-cpu=native" |
| Chunk size | 1 000 (both) |
| Pool size | 10 connections (both) |
| DB volume | Fresh Docker volume per run |


Pipeline (each arrow names the Spring Batch RS component first, then its Spring Batch equivalent):

```
Generate 10M rows → transactions.csv
    ▼ CsvItemReader / FlatFileItemReader
TransactionProcessor
(USD/GBP → EUR conversion, CANCELLED → FAILED)
    ▼ PostgresItemWriter / JdbcBatchItemWriter (bulk insert, chunk=1000)
PostgreSQL: table transactions
    ▼ RdbcItemReader / JdbcPagingItemReader (paginated, page_size=1000)
    ▼ XmlItemWriter / XMLStreamWriter
transactions_export.xml
    ▼ XmlItemReader / XMLStreamReader (chunk=1000)
    ▼ PostgresItemWriter / JdbcBatchItemWriter (bulk insert, chunk=1000)
PostgreSQL: table transactions_import
```

Total wall-clock time includes CSV generation.

Transaction record schema:

| Field | Type | Example |
|-------|------|---------|
| transaction_id | string | TXN-0000000001 |
| amount | float | 1234.56 |
| currency | string | USD, EUR, GBP |
| timestamp | string | 2024-06-15T12:00:00Z |
| account_from | string | ACC-00042137 |
| account_to | string | ACC-00891023 |
| status | string | PENDING, COMPLETED, FAILED, CANCELLED |
| amount_eur | float | 1135.80 (added by processor) |
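
The TransactionProcessor stage can be sketched as a plain Rust function. This is a minimal sketch, not the benchmark's code. The `Transaction` struct only mirrors the schema above. The 0.92 USD rate is inferred from the schema example (1234.56 → 1135.80). The GBP rate and the `process` and `sample_transaction` names are hypothetical.

```rust
/// Mirrors the schema table; a hypothetical stand-in for the benchmark's own type.
#[derive(Debug, Clone, PartialEq)]
pub struct Transaction {
    pub transaction_id: String,
    pub amount: f64,
    pub currency: String,
    pub timestamp: String,
    pub account_from: String,
    pub account_to: String,
    pub status: String,
    pub amount_eur: f64,
}

// ASSUMED rates: 0.92 reproduces the schema example; the GBP rate is illustrative only.
const USD_TO_EUR: f64 = 0.92;
const GBP_TO_EUR: f64 = 1.17;

/// Converts the amount to EUR and maps CANCELLED to FAILED.
pub fn process(mut t: Transaction) -> Transaction {
    let rate = match t.currency.as_str() {
        "USD" => USD_TO_EUR,
        "GBP" => GBP_TO_EUR,
        _ => 1.0, // EUR amounts pass through unchanged
    };
    t.amount_eur = t.amount * rate;
    if t.status == "CANCELLED" {
        t.status = "FAILED".to_string();
    }
    t
}

/// Hypothetical helper: builds a record with fixed placeholder fields.
pub fn sample_transaction(amount: f64, currency: &str, status: &str) -> Transaction {
    Transaction {
        transaction_id: "TXN-0000000001".to_string(),
        amount,
        currency: currency.to_string(),
        timestamp: "2024-06-15T12:00:00Z".to_string(),
        account_from: "ACC-00042137".to_string(),
        account_to: "ACC-00891023".to_string(),
        status: status.to_string(),
        amount_eur: 0.0,
    }
}

fn main() {
    let out = process(sample_transaction(1234.56, "USD", "CANCELLED"));
    // prints: TXN-0000000001 1135.80 FAILED
    println!("{} {:.2} {}", out.transaction_id, out.amount_eur, out.status);
}
```

Keeping the processor a pure function of one record makes it trivially chunk-safe. It holds no state between items, so chunk boundaries do not affect the result.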


Step 2 (PostgreSQL → XML) in Spring Batch RS:

```rust
// Keyset pagination: WHERE transaction_id > :last ORDER BY transaction_id LIMIT 1000
// O(log n) per page; avoids the O(n²) total cost of LIMIT/OFFSET on 10M rows.
let reader = RdbcItemReaderBuilder::<Transaction>::new()
    .postgres(pool.clone())
    .query(
        "SELECT transaction_id, amount, currency, timestamp, \
         account_from, account_to, status, amount_eur \
         FROM transactions",
    )
    .with_page_size(1_000)
    .with_keyset("transaction_id", |t: &Transaction| t.transaction_id.clone())
    .build_postgres();

let writer = XmlItemWriterBuilder::<Transaction>::new()
    .root_tag("transactions")
    .item_tag("transaction")
    .from_path(xml_path)?;
```
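
Keyset pagination itself can be modelled without a database. This sketch uses hypothetical helper names; it is not the RdbcItemReader internals. A sorted slice stands in for the index on transaction_id. Each page seeks past the last key with a binary search (the O(log n) step) instead of skipping OFFSET rows. The sketch also shows the SQL shape for the first and later pages, using PostgreSQL's `$1` placeholder style.

```rust
/// SQL for one keyset page: the first page has no lower bound.
fn keyset_page_sql(base: &str, key: &str, page_size: usize, first_page: bool) -> String {
    if first_page {
        format!("{base} ORDER BY {key} LIMIT {page_size}")
    } else {
        format!("{base} WHERE {key} > $1 ORDER BY {key} LIMIT {page_size}")
    }
}

/// One page from a sorted key list; `last` is the final key of the previous page.
fn keyset_page<'a>(sorted: &'a [String], last: Option<&str>, page_size: usize) -> &'a [String] {
    // Binary search for the first key strictly greater than `last`: O(log n).
    let start = match last {
        Some(k) => sorted.partition_point(|x| x.as_str() <= k),
        None => 0,
    };
    let end = (start + page_size).min(sorted.len());
    &sorted[start..end]
}

/// Walks every page and returns (pages, rows) read.
fn scan_all(sorted: &[String], page_size: usize) -> (usize, usize) {
    let (mut pages, mut rows) = (0, 0);
    let mut last: Option<String> = None;
    loop {
        let page = keyset_page(sorted, last.as_deref(), page_size);
        if page.is_empty() {
            break;
        }
        pages += 1;
        rows += page.len();
        // Remember only the last key; no row count is carried between pages.
        last = page.last().cloned();
    }
    (pages, rows)
}

fn main() {
    // Zero-padded ids, like TXN-0000000001, sort the same as strings and as numbers.
    let ids: Vec<String> = (1..=2_500).map(|i| format!("TXN-{i:010}")).collect();
    println!("{}", keyset_page_sql("SELECT * FROM transactions", "transaction_id", 1_000, false));
    let (pages, rows) = scan_all(&ids, 1_000);
    println!("{pages} pages, {rows} rows");
}
```

The cost of each page depends only on the index depth, not on how far into the table the reader has progressed. That is why keyset paging stays flat across 10M rows while OFFSET paging slows down page by page.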

Step 3 — XML → PostgreSQL (transactions_import)

```rust
let reader = XmlItemReaderBuilder::<Transaction>::new()
    .tag("transaction")
    .from_path(xml_path)?;

let writer = RdbcItemWriterBuilder::<Transaction>::new()
    .postgres(pool)
    .table("transactions_import")
    .column("transaction_id", |t: &Transaction| t.transaction_id.clone().into())
    .column("amount", |t: &Transaction| t.amount.into())
    .column("currency", |t: &Transaction| t.currency.clone().into())
    .column("timestamp", |t: &Transaction| t.timestamp.clone().into())
    .column("account_from", |t: &Transaction| t.account_from.clone().into())
    .column("account_to", |t: &Transaction| t.account_to.clone().into())
    .column("status", |t: &Transaction| t.status.clone().into())
    .column("amount_eur", |t: &Transaction| t.amount_eur.into())
    .build_postgres();
```
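
The `.column(...)` calls above describe one bind parameter per column per row. A multi-row INSERT of that shape can be sketched as follows. `bulk_insert_sql` is a hypothetical helper, and the real writer may build its statement differently. The guard reflects a real PostgreSQL constraint: the extended-query protocol encodes the parameter count in 16 bits, so one statement carries at most 65 535 bind parameters. The benchmark's 8 columns × 1 000 rows = 8 000 parameters per chunk is well under that limit.

```rust
use std::fmt::Write;

/// Upper bound on bind parameters in one PostgreSQL statement (16-bit count).
const PG_MAX_BIND_PARAMS: usize = 65_535;

/// Builds `INSERT INTO table (cols) VALUES ($1, ...), (...)` for `rows` rows.
fn bulk_insert_sql(table: &str, columns: &[&str], rows: usize) -> Result<String, String> {
    let params = columns.len() * rows;
    if params > PG_MAX_BIND_PARAMS {
        return Err(format!("{params} bind parameters exceed {PG_MAX_BIND_PARAMS}"));
    }
    let mut sql = format!("INSERT INTO {table} ({}) VALUES ", columns.join(", "));
    let mut n = 1;
    for row in 0..rows {
        if row > 0 {
            sql.push_str(", ");
        }
        sql.push('(');
        for col in 0..columns.len() {
            if col > 0 {
                sql.push_str(", ");
            }
            // Placeholders are numbered row-major: row 0 gets $1..$k, row 1 continues from $k+1.
            write!(sql, "${n}").expect("writing to a String cannot fail");
            n += 1;
        }
        sql.push(')');
    }
    Ok(sql)
}

fn main() {
    let columns = [
        "transaction_id", "amount", "currency", "timestamp",
        "account_from", "account_to", "status", "amount_eur",
    ];
    println!("{}", bulk_insert_sql("transactions_import", &columns, 2).unwrap());
    // A 1 000-row chunk needs 8 000 parameters; a 10 000-row chunk would not fit.
    println!("{}", bulk_insert_sql("transactions_import", &columns, 1_000).is_ok());
    println!("{}", bulk_insert_sql("transactions_import", &columns, 10_000).is_err());
}
```

Table and column names are trusted constants here. Only the row values travel as bind parameters, so one round trip per chunk carries all 1 000 rows.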

Measured on the reference environment described above, using a fresh Docker volume for each run. Total wall-clock time includes CSV generation.

Wall-clock time:

| Metric | Spring Batch RS (Rust) | Spring Batch (Java) | Rust advantage |
|--------|------------------------|---------------------|----------------|
| Total pipeline time | 114.1 s | 199.7 s | 1.75× faster |
| Generate CSV | 1.8 s | 6.7 s | 3.7× |
| Step 1 — CSV → PostgreSQL | 38.6 s | 83.4 s | 2.2× |
| Step 2 — PostgreSQL → XML | 20.8 s | 32.5 s | 1.6× |
| Step 3 — XML → PostgreSQL | 53.0 s | 77.1 s | 1.5× |

Throughput in records per second (10M records ÷ wall-clock time):

| Step | Rust (records/s) | Java (records/s) | Ratio |
|------|------------------|------------------|-------|
| Step 1 — CSV → PostgreSQL | 259 095 | 119 964 | 2.2× |
| Step 2 — PostgreSQL → XML | 481 773 | 307 560 | 1.6× |
| Step 3 — XML → PostgreSQL | 188 594 | 129 671 | 1.5× |
| Full pipeline (10M ÷ total time) | 87 610 | 50 071 | 1.75× |
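
The throughput figures follow directly from the wall-clock table. Records per second is 10 000 000 divided by the step time, and the Rust advantage is the Java time divided by the Rust time. This small sketch recomputes both from the rounded times shown above. Its results therefore differ from the throughput column by less than 0.1 %: for example, about 259 067 rather than 259 095 records/s for Step 1.

```rust
/// Records processed by every step of the pipeline.
const RECORDS: f64 = 10_000_000.0;

/// Records per second for a step that took `secs` seconds.
fn throughput(secs: f64) -> f64 {
    RECORDS / secs
}

/// Rust advantage: Java time over Rust time (equivalently, Rust rate over Java rate).
fn speedup(rust_secs: f64, java_secs: f64) -> f64 {
    java_secs / rust_secs
}

fn main() {
    // (label, Rust seconds, Java seconds), copied from the wall-clock table.
    let rows = [
        ("Step 1: CSV -> PostgreSQL", 38.6, 83.4),
        ("Step 2: PostgreSQL -> XML", 20.8, 32.5),
        ("Step 3: XML -> PostgreSQL", 53.0, 77.1),
        ("Full pipeline", 114.1, 199.7),
    ];
    for &(label, rust, java) in rows.iter() {
        println!(
            "{label}: Rust {:.0} rec/s, Java {:.0} rec/s, {:.2}x",
            throughput(rust),
            throughput(java),
            speedup(rust, java)
        );
    }
}
```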

Where the differences come from:

1. CSV generation (3.7× gap). Rust’s generator uses a simple linear-congruential RNG and writes directly to a BufWriter<File>. Java’s DataGenerator does the same, but JVM startup, JIT warm-up, and UTF-16 string handling add measurable overhead on a 10M-row write.

2. CSV → PostgreSQL (Step 1 — 2.2× gap). Rust uses zero-copy CSV parsing (no intermediate string allocation per field) and a single sqlx bulk-insert query per chunk. Java’s FlatFileItemReader allocates a String[] and a Transaction bean per row via bean-wrapper reflection.

3. PostgreSQL → XML (Step 2 — 1.6× gap). Both now use streaming byte-level XML APIs (no reflection). Rust’s advantage comes from lower memory pressure (no GC, no JVM metadata overhead) and tighter CPU cache usage across a 10M-row write.

4. XML → PostgreSQL (Step 3 — 1.5× gap). Both use streaming pull parsers (StAX’s XMLStreamReader on the Java side). Rust’s advantage is again GC-free memory and the absence of Spring AOP / transaction proxies on the write path.

5. No garbage collection. Rust uses RAII — memory is freed the instant a chunk goes out of scope, with zero pauses. Java’s G1GC introduces stop-the-world pauses that accumulate over a 10M-record run.

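6. Keyset pagination (Step 2). Spring Batch RS uses WHERE cursor_col > :last ORDER BY cursor_col LIMIT page_size, which costs O(log n) per page when cursor_col is indexed. Java’s JdbcPagingItemReader with sortKeys uses the same strategy, so pagination is not a source of the gap: both readers avoid the growing cost of LIMIT/OFFSET.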
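
Items 1 and 2 can be illustrated together using only the standard library. The sketch below generates CSV rows with a 64-bit linear-congruential generator through a BufWriter. It then reads them back while reusing a single line buffer, so every field is a &str slice borrowed from that buffer rather than a newly allocated String. This follows the same idea as the per-field zero-copy parsing in item 2, but it is not the csv crate's implementation. The LCG constants (Knuth's MMIX multiplier and increment), the CSV column layout and the function names are assumptions for illustration, not the benchmark's generator.

```rust
use std::io::{self, BufRead, BufWriter, Write};

/// 64-bit LCG: state = state * a + c (mod 2^64), with Knuth's MMIX constants (assumed).
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6_364_136_223_846_793_005)
            .wrapping_add(1_442_695_040_888_963_407);
        // The low bits of an LCG are weak, so return the high 31 bits.
        self.0 >> 33
    }
}

const CURRENCIES: [&str; 3] = ["USD", "EUR", "GBP"];
const STATUSES: [&str; 4] = ["PENDING", "COMPLETED", "FAILED", "CANCELLED"];

/// Writes a header plus `n` rows; in the benchmark the sink would be a File.
fn generate<W: Write>(sink: W, n: u64, seed: u64) -> io::Result<()> {
    let mut w = BufWriter::new(sink);
    let mut rng = Lcg(seed);
    writeln!(w, "transaction_id,amount,currency,timestamp,account_from,account_to,status")?;
    for i in 1..=n {
        let cents = rng.next_u64() % 1_000_000; // 0.00 to 9999.99
        let currency = CURRENCIES[(rng.next_u64() % 3) as usize];
        let from = rng.next_u64() % 100_000_000;
        let to = rng.next_u64() % 100_000_000;
        let status = STATUSES[(rng.next_u64() % 4) as usize];
        writeln!(
            w,
            "TXN-{i:010},{}.{:02},{currency},2024-06-15T12:00:00Z,ACC-{from:08},ACC-{to:08},{status}",
            cents / 100,
            cents % 100
        )?;
    }
    w.flush()
}

/// Reads rows through one reused String; returns (row count, sum of amounts).
fn parse_amounts<R: BufRead>(mut input: R) -> io::Result<(u64, f64)> {
    let mut line = String::new();
    let (mut rows, mut total) = (0u64, 0.0f64);
    input.read_line(&mut line)?; // skip the header row
    loop {
        line.clear(); // keep the allocation, drop the contents
        if input.read_line(&mut line)? == 0 {
            break; // EOF
        }
        // `amount` borrows from `line`: no String is allocated per field.
        let amount = line.trim_end().split(',').nth(1).unwrap_or("0");
        total += amount.parse::<f64>().unwrap_or(0.0);
        rows += 1;
    }
    Ok((rows, total))
}

fn main() -> io::Result<()> {
    let mut csv: Vec<u8> = Vec::new();
    generate(&mut csv, 5, 42)?;
    print!("{}", String::from_utf8_lossy(&csv));
    let (rows, total) = parse_amounts(&csv[..])?;
    println!("{rows} rows, total amount {total:.2}");
    Ok(())
}
```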

The previous iteration of this benchmark showed a 13.2× gap on Step 2 (XML export). Most of that gap was not a Java vs Rust problem; it came from JAXB’s reflection-based marshalling. Switching to XMLStreamWriter brought Java to within 1.6× of Rust on that step.

If you are running Java batch jobs with JAXB-based XML output, switching to StAX is likely the highest-ROI optimisation available.
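
The streaming approach that replaced JAXB can be sketched in Rust with only the standard library and hypothetical names. It mirrors what a StAX XMLStreamWriter-based writer does. It writes the root tag once, emits each item as soon as it arrives, escapes text content, and closes the root at the end, so memory use stays flat regardless of record count. This is not XmlItemWriter's implementation; only the root and item tag names are taken from the Step 2 configuration.

```rust
use std::io::{self, Write};

/// Writes `s` as XML text content, escaping the five predefined entities.
fn write_escaped<W: Write>(w: &mut W, s: &str) -> io::Result<()> {
    for c in s.chars() {
        match c {
            '&' => w.write_all(b"&amp;")?,
            '<' => w.write_all(b"&lt;")?,
            '>' => w.write_all(b"&gt;")?,
            '"' => w.write_all(b"&quot;")?,
            '\'' => w.write_all(b"&apos;")?,
            _ => {
                let mut buf = [0u8; 4];
                w.write_all(c.encode_utf8(&mut buf).as_bytes())?;
            }
        }
    }
    Ok(())
}

/// Hypothetical streaming writer: one item in, one element out, nothing retained.
struct StreamingXmlWriter<W: Write> {
    out: W,
    root_tag: &'static str,
    item_tag: &'static str,
}

impl<W: Write> StreamingXmlWriter<W> {
    fn start(mut out: W, root_tag: &'static str, item_tag: &'static str) -> io::Result<Self> {
        writeln!(out, "<?xml version=\"1.0\" encoding=\"UTF-8\"?>")?;
        writeln!(out, "<{root_tag}>")?;
        Ok(Self { out, root_tag, item_tag })
    }

    /// Emits one item, with one child element per (name, value) field.
    fn write_item(&mut self, fields: &[(&str, &str)]) -> io::Result<()> {
        write!(self.out, "  <{}>", self.item_tag)?;
        for &(name, value) in fields {
            write!(self.out, "<{name}>")?;
            write_escaped(&mut self.out, value)?;
            write!(self.out, "</{name}>")?;
        }
        writeln!(self.out, "</{}>", self.item_tag)
    }

    /// Closes the root element, flushes, and hands back the underlying writer.
    fn finish(mut self) -> io::Result<W> {
        writeln!(self.out, "</{}>", self.root_tag)?;
        self.out.flush()?;
        Ok(self.out)
    }
}

fn main() -> io::Result<()> {
    let mut xml = StreamingXmlWriter::start(Vec::<u8>::new(), "transactions", "transaction")?;
    xml.write_item(&[("transaction_id", "TXN-0000000001"), ("currency", "USD")])?;
    // "note" is a hypothetical field, used only to exercise escaping.
    xml.write_item(&[("transaction_id", "TXN-0000000002"), ("note", "a < b & c")])?;
    let bytes = xml.finish()?;
    print!("{}", String::from_utf8_lossy(&bytes));
    Ok(())
}
```

In the benchmark the sink would be a buffered file rather than a Vec. The writer's shape does not change, because it only requires `Write`.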

Choose Spring Batch (Java) when:

  • Your team is Java-first and migration cost outweighs performance gains
  • You need Spring ecosystem integrations (Spring Data, Spring Cloud Task, Spring Integration)
  • Your batch jobs run infrequently and throughput is not the bottleneck
  • You require rich operational features: JobRepository, JobExplorer, REST API control

Choose Spring Batch RS (Rust) when:

  • Throughput and latency are business requirements (financial settlement, real-time ETL)
  • Memory is constrained (embedded systems, small containers)
  • GC pauses would cause SLA violations
  • You want a single statically linked binary with no runtime dependency
  • Cold-start time matters (serverless, frequent scheduling)

To reproduce the results, start PostgreSQL:

```sh
# Start PostgreSQL with Docker Compose (from sbrs-java-bench/)
cd sbrs-java-bench
docker compose up -d
# For reproducible results, restart with a fresh volume between runs:
docker compose down -v && docker compose up -d
```
Run the Rust benchmark:

```sh
cd sbrs-lib
RUSTFLAGS="-C target-cpu=native" \
  cargo run --release --example benchmark_csv_postgres_xml \
  --features csv,xml,rdbc-postgres
```
Run the Java benchmark:

```sh
cd sbrs-java-bench
# Build the fat JAR once
mvn package -q -DskipTests
# Run with the JVM flags from the environment table
java -Xms4g -Xmx4g -XX:+UseG1GC -XX:+AlwaysPreTouch \
  -jar target/spring-batch-benchmark-1.0.0.jar
```