Java vs Rust Benchmark — 10M Transactions

This page compares Spring Batch (Java 25 / Spring Boot 4.x) and Spring Batch RS (Rust) on a realistic ETL pipeline: reading 10 million financial transactions from CSV, storing them in PostgreSQL, exporting to XML, then re-importing from XML into a second PostgreSQL table.

Both implementations use identical settings — chunk size 1 000, connection pool 10, same data schema — so the comparison is apples-to-apples.

Test Environment

| Parameter | Value |
|-----------|-------|
| Machine | Apple Silicon, macOS |
| PostgreSQL | 17-alpine (Docker container, same machine) |
| Java | OpenJDK 25, Spring Boot 4.0.4, Spring Batch 6.0.3 |
| JVM flags | -Xms4g -Xmx4g -XX:+UseG1GC -XX:+AlwaysPreTouch |
| Virtual threads | Enabled (spring.threads.virtual.enabled=true) |
| Java XML | StAX (XMLStreamWriter / XMLStreamReader) — no JAXB |
| Rust | stable, --release + RUSTFLAGS="-C target-cpu=native" |
| Chunk size | 1 000 (both) |
| Pool size | 10 connections (both) |
| DB volume | Fresh Docker volume per run |

Pipeline

Generate 10M rows → transactions.csv
        │
        ▼ CsvItemReader / FlatFileItemReader
  TransactionProcessor
  (USD/GBP → EUR conversion, CANCELLED → FAILED)
        │
        ▼ PostgresItemWriter / JdbcBatchItemWriter  (bulk insert, chunk=1000)
   PostgreSQL: table transactions
        │
        ▼ RdbcItemReader / JdbcPagingItemReader  (paginated, page_size=1000)
        │
        ▼ XmlItemWriter / XMLStreamWriter
  transactions_export.xml
        │
        ▼ XmlItemReader / XMLStreamReader  (chunk=1000)
        │
        ▼ PostgresItemWriter / JdbcBatchItemWriter  (bulk insert, chunk=1000)
   PostgreSQL: table transactions_import
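
Each arrow in the diagram is a chunk-oriented step: items are read one at a time, processed, and handed to the writer 1 000 at a time. The loop below is a minimal sketch of that pattern and assumes nothing about either library's internals; `run_chunked` and its parameters are hypothetical names chosen for illustration.

```rust
/// Reads items from `reader`, applies `process`, and hands each full chunk
/// to `write`. Returns the number of chunks written.
fn run_chunked<I, T, U>(
    reader: I,
    chunk_size: usize,
    process: impl Fn(T) -> U,
    mut write: impl FnMut(&[U]),
) -> usize
where
    I: Iterator<Item = T>,
{
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut buffer: Vec<U> = Vec::with_capacity(chunk_size);
    let mut chunks = 0;
    for item in reader {
        buffer.push(process(item));
        if buffer.len() == chunk_size {
            write(&buffer); // one writer call per full chunk
            buffer.clear(); // keeps the allocation, drops the items
            chunks += 1;
        }
    }
    if !buffer.is_empty() {
        write(&buffer); // flush the final partial chunk
        chunks += 1;
    }
    chunks
}
```

With 2 500 input items and a chunk size of 1 000, the writer is called three times, with 1 000, 1 000, and 500 items.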

Total wall-clock time includes CSV generation.
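
For a sense of what the generation step involves, here is a rough sketch of a CSV generator driven by a linear-congruential RNG and a `BufWriter`. It is not the benchmark's actual generator: the column order, the value ranges, the fixed timestamp, and the names `Lcg` and `generate_csv` are all assumptions made for illustration.

```rust
use std::io::{self, BufWriter, Write};

/// Minimal linear-congruential generator (constants from Knuth's MMIX).
struct Lcg(u64);

impl Lcg {
    fn next(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33 // the high bits have better statistical quality
    }
}

const CURRENCIES: [&str; 3] = ["USD", "EUR", "GBP"];
const STATUSES: [&str; 4] = ["PENDING", "COMPLETED", "FAILED", "CANCELLED"];

/// Writes a header line plus `rows` synthetic transactions to `out`.
/// The same seed always produces the same file.
fn generate_csv<W: Write>(out: W, rows: u64, seed: u64) -> io::Result<()> {
    let mut w = BufWriter::new(out);
    let mut rng = Lcg(seed);
    writeln!(w, "transaction_id,amount,currency,timestamp,account_from,account_to,status")?;
    for i in 1..=rows {
        let cents = rng.next() % 1_000_000; // amounts from 0.00 to 9999.99
        let currency = CURRENCIES[(rng.next() % 3) as usize];
        let from = rng.next() % 100_000_000;
        let to = rng.next() % 100_000_000;
        let status = STATUSES[(rng.next() % 4) as usize];
        writeln!(
            w,
            "TXN-{:010},{}.{:02},{},2024-06-15T12:00:00Z,ACC-{:08},ACC-{:08},{}",
            i,
            cents / 100,
            cents % 100,
            currency,
            from,
            to,
            status
        )?;
    }
    w.flush()
}
```

Passing a `std::fs::File` as `out` writes to disk; passing a `&mut Vec<u8>` keeps the output in memory, which is convenient for tests.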

Transaction record

| Field | Type | Example |
|-------|------|---------|
| transaction_id | string | TXN-0000000001 |
| amount | float | 1234.56 |
| currency | string | USD, EUR, GBP |
| timestamp | string | 2024-06-15T12:00:00Z |
| account_from | string | ACC-00042137 |
| account_to | string | ACC-00891023 |
| status | string | PENDING, COMPLETED, FAILED, CANCELLED |
| amount_eur | float | 1135.80 (added by processor) |
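
The record and the processor rules from the pipeline (USD/GBP → EUR conversion, CANCELLED → FAILED) might look roughly like this in Rust. The struct layout, the `Option<f64>` type for `amount_eur`, and both exchange rates are assumptions. The USD rate of 0.92 was picked only because it is consistent with the example values in the table; the benchmark's real rates are not stated here.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Transaction {
    transaction_id: String,
    amount: f64,
    currency: String,
    timestamp: String,
    account_from: String,
    account_to: String,
    status: String,
    amount_eur: Option<f64>, // filled in by the processor
}

/// Hypothetical fixed rates, for illustration only.
fn rate_to_eur(currency: &str) -> f64 {
    match currency {
        "USD" => 0.92,
        "GBP" => 1.17,
        _ => 1.0, // EUR (and anything unrecognised) passes through unchanged
    }
}

/// Applies both processor rules to one record.
fn process(mut t: Transaction) -> Transaction {
    t.amount_eur = Some(t.amount * rate_to_eur(&t.currency));
    if t.status == "CANCELLED" {
        t.status = "FAILED".to_string();
    }
    t
}
```

Taking the record by value and returning it avoids cloning the string fields on every one of the 10M rows.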

Code Side by Side

Step 2 — PostgreSQL → XML

Rust, then Java:

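// Keyset pagination: WHERE transaction_id > :last ORDER BY transaction_id LIMIT 1000
// Each page is an O(log n) index seek (assuming transaction_id is indexed), whereas
// LIMIT/OFFSET rescans every skipped row, so its total cost over 10M rows grows as O(n²).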
let reader = RdbcItemReaderBuilder::<Transaction>::new()
    .postgres(pool.clone())
    .query(
        "SELECT transaction_id, amount, currency, timestamp, \
         account_from, account_to, status, amount_eur \
         FROM transactions",
    )
    .with_page_size(1_000)
    .with_keyset("transaction_id", |t: &Transaction| t.transaction_id.clone())
    .build_postgres();

let writer = XmlItemWriterBuilder::<Transaction>::new()
    .root_tag("transactions")
    .item_tag("transaction")
    .from_path(xml_path)?;

// JdbcPagingItemReader uses keyset-based pagination via sortKeys
@Bean
public JdbcPagingItemReader<Transaction> postgresReader(DataSource ds) {
    return new JdbcPagingItemReaderBuilder<Transaction>()
        .name("postgresTransactionReader")
        .dataSource(ds)
        .selectClause("SELECT transaction_id,amount,currency,timestamp," +
                      "account_from,account_to,status,amount_eur")
        .fromClause("FROM transactions")
        .sortKeys(Map.of("transaction_id", Order.ASCENDING))
        .rowMapper(/* maps columns → Transaction */)
        .pageSize(1_000).build();
}

// StAX writer — no JAXB reflection
@Bean
public TransactionXmlWriter xmlWriter() {
    return new TransactionXmlWriter(xmlPath); // wraps XMLStreamWriter
}
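
To make "streaming" concrete, the sketch below writes XML element by element with only the standard library. It is not how `XmlItemWriter` or the Java `TransactionXmlWriter` is implemented; `write_xml` and `escape` are hypothetical helpers, and the output omits the XML declaration and any indentation.

```rust
use std::io::{self, Write};

/// Escapes the five XML special characters in text content.
fn escape(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Streams rows as `<root><item><field>value</field>...</item>...</root>`.
/// Each row is a slice of (field name, value) pairs. Nothing is held in
/// memory beyond the current row, so memory use does not grow with row count.
fn write_xml<'a, W, I>(mut out: W, root: &str, item: &str, rows: I) -> io::Result<()>
where
    W: Write,
    I: IntoIterator<Item = &'a [(&'a str, &'a str)]>,
{
    write!(out, "<{}>", root)?;
    for fields in rows {
        write!(out, "<{}>", item)?;
        for (name, value) in fields {
            write!(out, "<{0}>{1}</{0}>", name, escape(value))?;
        }
        write!(out, "</{}>", item)?;
    }
    write!(out, "</{}>", root)
}
```

When the target is a file, wrapping it in a `std::io::BufWriter` before passing it in avoids one system call per element.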

Step 3 — XML → PostgreSQL (transactions_import)

Rust, then Java:

let reader = XmlItemReaderBuilder::<Transaction>::new()
    .tag("transaction")
    .from_path(xml_path)?;

let writer = RdbcItemWriterBuilder::<Transaction>::new()
    .postgres(pool)
    .table("transactions_import")
    .column("transaction_id", |t: &Transaction| t.transaction_id.clone().into())
    .column("amount", |t: &Transaction| t.amount.into())
    .column("currency", |t: &Transaction| t.currency.clone().into())
    .column("timestamp", |t: &Transaction| t.timestamp.clone().into())
    .column("account_from", |t: &Transaction| t.account_from.clone().into())
    .column("account_to", |t: &Transaction| t.account_to.clone().into())
    .column("status", |t: &Transaction| t.status.clone().into())
    .column("amount_eur", |t: &Transaction| t.amount_eur.into())
    .build_postgres();

// StAX reader — no JAXB reflection
@Bean
public TransactionXmlReader xmlReader() {
    return new TransactionXmlReader(xmlPath); // wraps XMLStreamReader
}

@Bean
public JdbcBatchItemWriter<Transaction> importWriter(DataSource ds) {
    return new JdbcBatchItemWriterBuilder<Transaction>()
        .dataSource(ds)
        .sql("INSERT INTO transactions_import (...) VALUES (:transactionId, ...)")
        .beanMapped().build();
}
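
One way to send a whole chunk in a single statement is a multi-row INSERT with numbered placeholders. The helper below only illustrates that idea. How Spring Batch RS actually builds its statement is not shown on this page, and `bulk_insert_sql` is a hypothetical name.

```rust
/// Builds one multi-row INSERT using PostgreSQL-style `$n` placeholders.
/// The caller then binds `columns.len() * rows` values in row-major order.
fn bulk_insert_sql(table: &str, columns: &[&str], rows: usize) -> String {
    let mut sql = format!("INSERT INTO {} ({}) VALUES ", table, columns.join(", "));
    for row in 0..rows {
        if row > 0 {
            sql.push_str(", ");
        }
        // Placeholders are numbered continuously across rows: $1, $2, ...
        let placeholders: Vec<String> = (0..columns.len())
            .map(|col| format!("${}", row * columns.len() + col + 1))
            .collect();
        sql.push('(');
        sql.push_str(&placeholders.join(", "));
        sql.push(')');
    }
    sql
}
```

For the 8-column record and a chunk of 1 000 rows this yields 8 000 bind parameters in one round trip, which stays well under PostgreSQL's limit of 65 535 parameters per statement.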

Results

Measured on the reference environment described above, with a fresh Docker volume for each run. Total wall-clock time includes CSV generation.

Overall performance

| Metric | Spring Batch RS (Rust) | Spring Batch (Java) | Rust advantage |
|--------|------------------------|---------------------|----------------|
| Total pipeline time | 114.1 s | 199.7 s | 1.75× faster |
| Generate CSV | 1.8 s | 6.7 s | 3.7× |
| Step 1 — CSV → PostgreSQL | 38.6 s | 83.4 s | 2.2× |
| Step 2 — PostgreSQL → XML | 20.8 s | 32.5 s | 1.6× |
| Step 3 — XML → PostgreSQL | 53.0 s | 77.1 s | 1.5× |

Throughput (records/sec)

| Step | Rust | Java | Ratio |
|------|------|------|-------|
| Step 1 — CSV → PostgreSQL | 259 095 | 119 964 | 2.2× |
| Step 2 — PostgreSQL → XML | 481 773 | 307 560 | 1.6× |
| Step 3 — XML → PostgreSQL | 188 594 | 129 671 | 1.5× |
| Average (full pipeline) | 87 610 | 50 071 | 1.75× |
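
The throughput and ratio columns follow directly from the timings: records divided by seconds, and the slower time divided by the faster one. The two helpers below (hypothetical names) reproduce that arithmetic. Feeding them the rounded timings from the first table gives values within a fraction of a percent of the published ones, which were presumably computed from unrounded measurements.

```rust
/// Records per second for `records` processed in `seconds`.
fn throughput(records: u64, seconds: f64) -> f64 {
    records as f64 / seconds
}

/// How many times faster the run taking `fast_seconds` is than the one
/// taking `slow_seconds`.
fn speedup(fast_seconds: f64, slow_seconds: f64) -> f64 {
    slow_seconds / fast_seconds
}
```

For example, `throughput(10_000_000, 38.6)` is about 259 067 records/sec for the Rust Step 1, and `speedup(114.1, 199.7)` is about 1.75 for the full pipeline.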

Analysis

Why is Rust ~1.75× faster overall?

1. CSV generation (3.7× gap). Rust’s generator uses a simple linear-congruential RNG and writes directly to a BufWriter<File>. Java’s DataGenerator does the same, but JVM startup, JIT warm-up, and per-row String allocation and encoding add measurable overhead on a 10M-row write.

2. CSV → PostgreSQL (Step 1 — 2.2× gap). Rust uses zero-copy CSV parsing (no intermediate string allocation per field) and a single sqlx bulk-insert query per chunk. Java’s FlatFileItemReader allocates a String[] and a Transaction bean per row via bean-wrapper reflection.

3. PostgreSQL → XML (Step 2 — 1.6× gap). Both now use streaming byte-level XML APIs (no reflection). Rust’s advantage comes from lower memory pressure (no GC, no JVM metadata overhead) and tighter CPU cache usage across a 10M-row write.

4. XML → PostgreSQL (Step 3 — 1.5× gap). Both use streaming pull parsers (StAX’s XMLStreamReader on the Java side). Rust’s advantage is again GC-free memory and the absence of Spring AOP / transaction proxies on the write path.

5. No garbage collection. Rust uses RAII — memory is freed the instant a chunk goes out of scope, with zero pauses. Java’s G1GC introduces stop-the-world pauses that accumulate over a 10M-record run.

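6. Keyset pagination (Step 2, both implementations). Spring Batch RS uses WHERE cursor_col > :last ORDER BY cursor_col LIMIT n, which costs an O(log n) index seek per page. Java’s JdbcPagingItemReader with sortKeys uses the same strategy, so pagination keeps the comparison fair rather than contributing to the gap. Step 3 reads from XML and does not paginate.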
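
The keyset idea can be tried without a database. In this sketch a `BTreeMap` stands in for the indexed table, and `next_page` (a hypothetical helper) mirrors `WHERE key > :last ORDER BY key LIMIT :page_size`: it seeks to the position just after the last key seen, which is O(log n) in a B-tree, and then reads one page sequentially.

```rust
use std::collections::BTreeMap;
use std::ops::Bound::{self, Excluded, Unbounded};

/// Returns up to `page_size` rows whose key is strictly greater than `last`,
/// in key order. `last = None` starts from the beginning of the table.
fn next_page<'a>(
    table: &'a BTreeMap<String, f64>,
    last: Option<&str>,
    page_size: usize,
) -> Vec<(&'a String, &'a f64)> {
    let lower: Bound<&str> = match last {
        Some(key) => Excluded(key), // strictly after the last key seen
        None => Unbounded,
    };
    let bounds: (Bound<&str>, Bound<&str>) = (lower, Unbounded);
    table.range::<str, _>(bounds).take(page_size).collect()
}
```

A caller keeps the key of the last row of each page and passes it as `last` for the next call, stopping when an empty page comes back. Unlike OFFSET, no page ever revisits rows that were already returned.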

The JAXB lesson

The previous iteration of this benchmark showed a 13.2× gap on Step 2 (XML export). That gap was not a Java vs Rust problem — it was entirely due to JAXB’s reflection-based marshalling. Switching to XMLStreamWriter brought Java to within 1.6× of Rust on that step.

If you are running Java batch jobs with JAXB-based XML output, switching to StAX is likely the highest-ROI optimisation available.

When to choose Java

- Your team is Java-first and migration cost outweighs performance gains
- You need Spring ecosystem integrations (Spring Data, Spring Cloud Task, Spring Integration)
- Your batch jobs run infrequently and throughput is not the bottleneck
- You require rich operational features: JobRepository, JobExplorer, REST API control

When to choose Rust

- Throughput and latency are business requirements (financial settlement, real-time ETL)
- Memory is constrained (embedded systems, small containers)
- GC pauses would cause SLA violations
- You want a single statically-linked binary with no runtime dependency
- Cold-start time matters (serverless, frequent scheduling)

How to Reproduce

Prerequisites

# Start PostgreSQL with Docker Compose (from sbrs-java-bench/)
cd sbrs-java-bench
docker compose up -d

# For reproducible results, restart with a fresh volume between runs:
docker compose down -v && docker compose up -d

Run the Rust benchmark

cd sbrs-lib

RUSTFLAGS="-C target-cpu=native" \
cargo run --release --example benchmark_csv_postgres_xml \
  --features csv,xml,rdbc-postgres

Run the Java benchmark

cd sbrs-java-bench

# Build fat JAR once
mvn package -q -DskipTests

# Run
java -Xms4g -Xmx4g -XX:+UseG1GC -XX:+AlwaysPreTouch \
  -jar target/spring-batch-benchmark-1.0.0.jar