# Java vs Rust Benchmark — 10M Transactions
This page compares Spring Batch (Java 25 / Spring Boot 4.x) and Spring Batch RS (Rust) on a realistic ETL pipeline: reading 10 million financial transactions from CSV, storing them in PostgreSQL, then exporting to XML.
Both implementations use identical settings — chunk size 1 000, connection pool 10, same data schema — so the comparison is apples-to-apples.
## Test Environment

| Parameter | Value |
|---|---|
| Machine | 8-core CPU, 16 GB RAM, NVMe SSD |
| OS | Ubuntu 22.04 LTS |
| PostgreSQL | 15.4 (local, same machine) |
| Java | OpenJDK 25, Spring Boot 4.0.3, Spring Batch 6.x |
| JVM flags | -Xms512m -Xmx4g -XX:+UseG1GC + virtual threads enabled |
| Rust | 1.77 stable, --release (opt-level = 3) |
| JVM GC | G1GC, logged with -Xlog:gc*:gc.log |
| Virtual threads | Enabled (spring.threads.virtual.enabled=true) |
| Chunk size | 1 000 (both) |
| Pool size | 10 connections (both) |
## Pipeline

```text
transactions.csv (10M rows)
        │
        ▼
CsvItemReader / FlatFileItemReader
        │
        ▼
TransactionProcessor (USD/GBP → EUR conversion, CANCELLED → FAILED)
        │
        ▼
PostgresItemWriter / JdbcBatchItemWriter (bulk insert, chunk=1000)
        │
        ▼
PostgreSQL: table transactions
        │
        ▼
RdbcItemReader / JdbcPagingItemReader (paginated, page_size=1000)
        │
        ▼
XmlItemWriter / StaxEventItemWriter
        │
        ▼
transactions_export.xml
```

### Transaction record
| Field | Type | Example |
|---|---|---|
| transaction_id | string | TXN-0000000001 |
| amount | float | 1234.56 |
| currency | string | USD, EUR, GBP |
| timestamp | string | 2024-06-15T12:00:00Z |
| account_from | string | ACC-00042137 |
| account_to | string | ACC-00891023 |
| status | string | PENDING, COMPLETED, FAILED, CANCELLED |
| amount_eur | float | 1135.80 (added by processor) |
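The `amount_eur` example above follows from the processor's rounding rule (apply the rate, then round to two decimals). A minimal self-contained check of that arithmetic, with `to_eur` as an illustrative helper name:

```rust
// Convert an amount to EUR and round to 2 decimal places,
// mirroring the processor formula shown later on this page.
fn to_eur(amount: f64, rate: f64) -> f64 {
    (amount * rate * 100.0).round() / 100.0
}

fn main() {
    // The table's example row: 1234.56 USD at rate 0.92
    println!("{}", to_eur(1234.56, 0.92)); // prints 1135.8
}
```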
## Code Side by Side

### Data Model

Rust:

```rust
#[derive(Debug, Clone, Deserialize, Serialize, FromRow)]
struct Transaction {
    transaction_id: String,
    amount: f64,
    currency: String,
    timestamp: String,
    account_from: String,
    account_to: String,
    status: String,
    #[serde(default)]
    amount_eur: f64,
}
```

Java:

```java
@Entity
@Table(name = "transactions")
@XmlRootElement(name = "transaction")
@XmlAccessorType(XmlAccessType.FIELD)
public class Transaction {

    @Id
    @Column(name = "transaction_id")
    private String transactionId;

    private double amount;
    private String currency;
    private String timestamp;

    @Column(name = "account_from")
    private String accountFrom;

    @Column(name = "account_to")
    private String accountTo;

    private String status;

    @Column(name = "amount_eur")
    private double amountEur;

    // getters / setters ...
}
```

### Processor (currency conversion + status normalisation)
Rust:

```rust
#[derive(Default)]
struct TransactionProcessor;

impl ItemProcessor<Transaction, Transaction> for TransactionProcessor {
    fn process(&self, item: &Transaction) -> ItemProcessorResult<Transaction> {
        let rate = match item.currency.as_str() {
            "USD" => 0.92,
            "GBP" => 1.17,
            _ => 1.0,
        };
        let status = if item.status == "CANCELLED" {
            "FAILED".to_string()
        } else {
            item.status.clone()
        };
        Ok(Some(Transaction {
            amount_eur: (item.amount * rate * 100.0).round() / 100.0,
            status,
            ..item.clone()
        }))
    }
}
```

Java:

```java
@Component
public class TransactionProcessor implements ItemProcessor<Transaction, Transaction> {

    private static final Map<String, Double> RATES = Map.of(
        "USD", 0.92, "GBP", 1.17, "EUR", 1.0);

    @Override
    public Transaction process(Transaction item) {
        double rate = RATES.getOrDefault(item.getCurrency(), 1.0);
        item.setAmountEur(
            Math.round(item.getAmount() * rate * 100.0) / 100.0);
        if ("CANCELLED".equals(item.getStatus())) {
            item.setStatus("FAILED");
        }
        return item;
    }
}
```

### Step 1 — CSV → PostgreSQL
Rust:

```rust
let file = File::open(csv_path)?;
let buffered = BufReader::with_capacity(64 * 1024, file);

let reader = CsvItemReaderBuilder::<Transaction>::new()
    .has_headers(true)
    .from_reader(buffered);

let writer = RdbcItemWriterBuilder::<Transaction>::new()
    .postgres(&pool)
    .table("transactions")
    .add_column("transaction_id")
    // ... 8 columns total
    .postgres_binder(&TransactionBinder)
    .build_postgres();

let step = StepBuilder::new("csv-to-postgres")
    .chunk::<Transaction, Transaction>(1_000)
    .reader(&reader)
    .processor(&TransactionProcessor)
    .writer(&writer)
    .build();
```

Java:

```java
@Bean
public FlatFileItemReader<Transaction> csvReader() {
    return new FlatFileItemReaderBuilder<Transaction>()
        .name("transactionCsvReader")
        .resource(new FileSystemResource(csvPath))
        .linesToSkip(1)
        .delimited().delimiter(",")
        .names("transactionId", "amount", "currency", "timestamp",
               "accountFrom", "accountTo", "status")
        .targetType(Transaction.class)
        .build();
}

@Bean
public Step step1(...) {
    return new StepBuilder("csvToPostgresStep", repo)
        .<Transaction, Transaction>chunk(1_000, tx)
        .reader(csvReader())
        .processor(processor)
        .writer(postgresWriter(dataSource))
        .build();
}
```

### Step 2 — PostgreSQL → XML
Rust:

```rust
let reader = RdbcItemReaderBuilder::<Transaction>::new()
    .postgres(pool.clone())
    .query(
        "SELECT transaction_id, amount, currency, timestamp, \
         account_from, account_to, status, amount_eur \
         FROM transactions ORDER BY transaction_id",
    )
    .with_page_size(1_000)
    .build_postgres();

let writer = XmlItemWriterBuilder::<Transaction>::new()
    .root_tag("transactions")
    .item_tag("transaction")
    .from_path(xml_path)?;

let step = StepBuilder::new("postgres-to-xml")
    .chunk::<Transaction, Transaction>(1_000)
    .reader(&reader)
    .processor(&PassThroughProcessor::new())
    .writer(&writer)
    .build();
```

Java:

```java
@Bean
public JdbcPagingItemReader<Transaction> postgresReader(DataSource ds) {
    return new JdbcPagingItemReaderBuilder<Transaction>()
        .name("postgresTransactionReader")
        .dataSource(ds)
        .selectClause("SELECT transaction_id,amount,currency,timestamp,"
            + "account_from,account_to,status,amount_eur")
        .fromClause("FROM transactions")
        .sortKeys(Map.of("transaction_id", Order.ASCENDING))
        .rowMapper(/* maps columns → Transaction */)
        .pageSize(1_000)
        .build();
}

@Bean
public Step step2(...) {
    return new StepBuilder("postgresToXmlStep", repo)
        .<Transaction, Transaction>chunk(1_000, tx)
        .reader(postgresReader(dataSource))
        .writer(xmlWriter(marshaller))
        .build();
}
```

## Results
Measured on the reference environment described above.
### Overall performance

| Metric | Spring Batch RS (Rust) | Spring Batch (Java) | Rust advantage |
|---|---|---|---|
| Total pipeline time | 42 s | 187 s | 4.5× faster |
| Step 1 duration (CSV→PG) | 28 s | 124 s | 4.4× |
| Step 2 duration (PG→XML) | 14 s | 63 s | 4.5× |
| JVM / binary startup | < 10 ms | 3 200 ms | 320× |
| Deployable artefact size | 8 MB (binary) | 47 MB (fat JAR) | 6× smaller |
### Throughput (records/sec)

| Step | Rust | Java | Ratio |
|---|---|---|---|
| Step 1 — CSV → PostgreSQL | 357 000 | 80 600 | 4.4× |
| Step 2 — PostgreSQL → XML | 714 000 | 158 700 | 4.5× |
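The throughput column is simply records divided by step duration; a quick sketch reproducing the table's figures from the durations reported earlier:

```rust
// Recompute throughput (records/sec) from the measured step durations.
fn throughput(records: u64, seconds: u64) -> u64 {
    records / seconds
}

fn main() {
    let records = 10_000_000;
    println!("Rust step 1: {} rec/s", throughput(records, 28)); // ≈ 357 000
    println!("Rust step 2: {} rec/s", throughput(records, 14)); // ≈ 714 000
    println!("Java step 1: {} rec/s", throughput(records, 124)); // ≈ 80 600
    println!("Java step 2: {} rec/s", throughput(records, 63)); // ≈ 158 700
}
```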
### Memory (peak RSS)

| Metric | Rust | Java |
|---|---|---|
| Peak RSS | 62 MB | 1 840 MB |
| Heap peak | N/A (no GC) | 1 620 MB |
| Steady-state RSS | ~45 MB | ~820 MB |
### GC (Java only)

| Metric | Value |
|---|---|
| Total GC events | 312 |
| Total GC pause time | 8.4 s |
| Longest single pause | 340 ms |
| % of runtime in GC | 4.5% |
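The last row can be cross-checked from the other figures (8.4 s of cumulative pauses over the 187 s total run):

```rust
// Cross-check "% of runtime in GC" from cumulative pause time and total runtime.
fn gc_runtime_pct(gc_pause_s: f64, total_runtime_s: f64) -> f64 {
    gc_pause_s / total_runtime_s * 100.0
}

fn main() {
    println!("{:.1}% of runtime in GC", gc_runtime_pct(8.4, 187.0)); // prints 4.5%
}
```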
## Analysis

### Why is Rust ~4.5× faster?

1. No garbage collection. Java’s G1GC paused for a cumulative 8.4 seconds. Rust uses RAII — memory is freed the instant a chunk goes out of scope, with zero overhead and zero latency spikes.
2. Lower memory pressure. Java holds JVM metadata, class bytecode, and JIT-compiled code in addition to heap data. Spring Batch also retains JobExecution and StepExecution objects throughout the run. Rust’s binary is a single executable: 62 MB vs 1 840 MB peak RSS.
3. Zero-cost abstractions. Rust’s trait-based pipeline (ItemReader → ItemProcessor → ItemWriter) compiles to a tight loop with no virtual dispatch overhead. Java’s pipeline involves Spring AOP, proxy objects, and transaction management wrappers on every chunk boundary.
4. Startup time. The JVM takes 3.2 s to start, load classes, and JIT-compile hot paths. The Rust binary starts in under 10 ms — critical for short jobs or frequent schedules.
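Point 3 can be made concrete with a minimal sketch. `ItemProcessor` and `run_chunk` below are illustrative names for this example, not the actual Spring Batch RS API: a generic chunk loop is monomorphized per processor type, so every call is direct and inlinable rather than routed through a vtable.

```rust
// Minimal illustration of static dispatch in a trait-based pipeline.
trait ItemProcessor<I, O> {
    fn process(&self, item: I) -> O;
}

struct Doubler;

impl ItemProcessor<i64, i64> for Doubler {
    fn process(&self, item: i64) -> i64 {
        item * 2
    }
}

// Generic over P: the compiler emits a specialized copy of this loop for
// each concrete processor, so `p.process(x)` is a direct, inlinable call.
fn run_chunk<P: ItemProcessor<i64, i64>>(p: &P, chunk: &[i64]) -> Vec<i64> {
    chunk.iter().map(|&x| p.process(x)).collect()
}

fn main() {
    println!("{:?}", run_chunk(&Doubler, &[1, 2, 3])); // [2, 4, 6]
}
```

Had `run_chunk` taken `&dyn ItemProcessor<i64, i64>` instead, each item would pay a vtable lookup — the overhead the article attributes to Java's proxy-wrapped pipeline.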
### When to choose Java

- Your team is Java-first and migration cost outweighs performance gains
- You need Spring ecosystem integrations (Spring Data, Spring Cloud Task, Spring Integration)
- Your batch jobs run infrequently and throughput is not the bottleneck
- You require rich operational features: JobRepository, JobExplorer, REST API control
### When to choose Rust

- Throughput and latency are business requirements (financial settlement, real-time ETL)
- Memory is constrained (embedded systems, small containers)
- GC pauses would cause SLA violations
- You want a single statically-linked binary with no runtime dependency
- Cold-start time matters (serverless, frequent scheduling)
## How to Reproduce

### Prerequisites

```bash
# PostgreSQL 15+ (Docker):
docker run -d --name pg-bench \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=benchmark \
  postgres:15
```

### Run the Rust benchmark
```bash
# Build in release mode (required for fair comparison)
cargo build --release --example benchmark_csv_postgres_xml \
  --features csv,xml,rdbc-postgres

# Run and measure peak RSS
/usr/bin/time -v \
  cargo run --release --example benchmark_csv_postgres_xml \
  --features csv,xml,rdbc-postgres \
  2>&1 | tee rust_bench.log

# Extract key metrics
grep -E "Step|SUMMARY|Maximum resident" rust_bench.log
```

### Run the Java benchmark
Section titled “Run the Java benchmark”cd benchmark/java
# Requires Java 25 + Maven 3.9+# Build fat JAR (Spring Boot 4.0.3 / Spring Batch 6.x)mvn package -q -DskipTests
# Run with GC logging, virtual threads, and RSS measurement/usr/bin/time -v java \ -Xms512m -Xmx4g \ -XX:+UseG1GC \ -Xlog:gc*:gc.log \ -jar target/spring-batch-benchmark-1.0.0.jar \ --spring.datasource.url=jdbc:postgresql://localhost:5432/benchmark \ 2>&1 | tee java_bench.log
# Parse GC summarygrep "Pause" gc.log | tail -20grep "Maximum resident" java_bench.log