Bulk Submission¶
Submit many samples at once using a TSV sample sheet — similar to ENA's Webin spreadsheet submission.
Overview¶
Bulk submission lets you register dozens or hundreds of samples in one operation by uploading a tab-separated spreadsheet. The system validates metadata, matches files, and creates all entities atomically.
Workflow¶
sequenceDiagram
participant U as User
participant API as SeqDB API
participant S as Staging
U->>API: Download template (GET /bulk-submit/template/{checklist})
U->>U: Fill template with sample data
U->>S: Upload FASTQ files (POST /staging/upload)
U->>API: Upload filled sheet (POST /bulk-submit/validate)
API->>S: Match filenames/MD5s against staged files
API-->>U: Validation report (per-cell status)
U->>API: Confirm (POST /bulk-submit/confirm)
API-->>U: Created accessions (samples, experiments, runs)
Via the Web UI¶
From the main Submit page¶
- Go to Submit → Bulk Submit
- Follow the 4-step wizard: Project → Upload → Sample Sheet → Confirm
From a project page¶
- Go to Browse → click a project
- Click Bulk Upload in the Samples card
- Select checklist, download template, fill and upload
Via the API¶
Step 1: Download template¶
This downloads a TSV file with:
- All checklist columns as headers
- 2 demo rows with realistic example data
- File matching columns (
filename_forward,filename_reverse,md5_forward,md5_reverse) - Sequencing columns (
platform,instrument_model,library_strategy)
Step 2: Upload files to staging¶
# Upload each FASTQ file
curl -X POST http://localhost:8000/api/v1/staging/upload \
-H "Authorization: Bearer $TOKEN" \
-F "file=@SAMPLE_001_R1.fastq.gz"
curl -X POST http://localhost:8000/api/v1/staging/upload \
-H "Authorization: Bearer $TOKEN" \
-F "file=@SAMPLE_001_R2.fastq.gz"
Step 3: Validate¶
curl -X POST http://localhost:8000/api/v1/bulk-submit/validate \
-H "Authorization: Bearer $TOKEN" \
-F "file=@filled_template.tsv" \
-F "checklist_id=ERC000011"
The response includes per-cell validation:
{
"valid": true,
"total_rows": 2,
"headers": ["sample_alias", "organism", "tax_id", ...],
"required_fields": ["organism", "tax_id", "sample_alias"],
"rows": [
{
"row_num": 2,
"sample_alias": "SAMPLE_001",
"cells": {
"organism": {"value": "Camelus dromedarius", "status": "ok"},
"tax_id": {"value": "9838", "status": "ok"},
"collection_date": {"value": "", "status": "empty_optional"}
},
"forward_file": {"filename": "SAMPLE_001_R1.fastq.gz", "md5": "abc123..."},
"reverse_file": {"filename": "SAMPLE_001_R2.fastq.gz", "md5": "def456..."},
"errors": [],
"warnings": []
}
]
}
Step 4: Confirm¶
curl -X POST http://localhost:8000/api/v1/bulk-submit/confirm \
-H "Authorization: Bearer $TOKEN" \
-F "file=@filled_template.tsv" \
-F "project_accession=NFDP-PRJ-000001" \
-F "checklist_id=ERC000011"
Response:
{
"status": "created",
"samples": ["NFDP-SAM-000001", "NFDP-SAM-000002"],
"experiments": ["NFDP-EXP-000001", "NFDP-EXP-000002"],
"runs": ["NFDP-RUN-000001", "NFDP-RUN-000002", "NFDP-RUN-000003", "NFDP-RUN-000004"]
}
Available checklists¶
| ID | Name | Required fields |
|---|---|---|
ERC000011 |
ENA Default | organism, tax_id |
ERC000020 |
Pathogen Clinical/Host | organism, tax_id, isolation_source, host |
ERC000043 |
Virus Pathogen | organism, tax_id, strain, isolation_source |
ERC000055 |
Farm Animal | organism, tax_id, breed |
snpchip_livestock |
SNP Chip Livestock | organism, tax_id, breed |
File matching¶
The system uses a 3-tier strategy to match sample sheet rows to staged files:
- Exact filename match — Looks for
filename_forwardin staged files - MD5 match — If filename not found, searches staged files by
md5_forward - Alias pattern — Falls back to matching
{sample_alias}[._-]R1in filenames
If none match, the system suggests the closest staged filename.
Filename typos
If your filenames have typos (e.g., _R1.fast.gz instead of _R1.fastq.gz), the system will try MD5 and alias matching before failing. The error message will suggest the closest match.
Template columns¶
| Column | Required | Description |
|---|---|---|
sample_alias |
Yes | Unique sample identifier |
organism |
Yes* | Species name |
tax_id |
Yes* | NCBI taxonomy ID |
collection_date |
Depends | Date of sample collection (YYYY-MM-DD) |
geographic_location |
Depends | Where the sample was collected |
breed |
Depends | Animal breed |
host |
No | Host organism |
tissue |
No | Tissue type |
sex |
No | male / female / unknown |
filename_forward |
No | Forward read filename |
filename_reverse |
No | Reverse read filename |
md5_forward |
No | MD5 of forward file |
md5_reverse |
No | MD5 of reverse file |
platform |
No | ILLUMINA (default) |
instrument_model |
No | e.g., Illumina NovaSeq 6000 |
library_strategy |
No | WGS (default) |
*Required fields depend on the selected checklist.
CLI Bulk Submit¶
The seqdb CLI wraps all of the above steps into a single command.
Install and authenticate¶
Download a template¶
Validate before submitting¶
Submit all at once¶
seqdb submit samples.tsv \
--checklist ERC000011 \
--project NFDP-PRJ-000001 \
--files ./reads/ \
--threads 8
The CLI uploads files in parallel (controlled by --threads), validates the sample sheet, and on success prints the created accessions. Add --yes to skip the confirmation prompt for non-interactive use (e.g., in CI pipelines).
Check results¶
See the CLI Reference for full option details.