Submitting Data¶

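This guide walks through the full data submission workflow using the web interface. The same workflow is also available from the command line; see the CLI section at the end of this page.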

Submission overview¶

SeqDB follows the ENA data model hierarchy (project, sample, experiment, run). A submission moves through these steps:

graph TD
    A[1. Create Project] --> B[2. Upload Files]
    B --> C[3. Fill Sample Sheet]
    C --> D[4. Validate]
    D --> E[5. Confirm Submission]
    E --> F[Samples + Experiments + Runs created]

Step 1: Create or select a project¶

A project groups related samples under a single study. Every submission belongs to a project.

Navigate to Submit → Bulk Submit
Choose Create New or Select Existing
For new projects, fill in:
- Title — Short descriptive name (e.g., "Arabian Camel WGS 2026")
- Project Type — Select from: whole_genome_sequencing, metagenomics, transcriptomics, etc.
- Description — Optional but recommended for FAIR compliance
Click Next

FAIR Tip

Adding a description and release date improves your project's Findability score.

Step 2: Upload sequencing files¶

Files must be staged before they can be linked to samples.

Browser upload¶

Click Choose Files to select FASTQ, BAM, or CRAM files
Files are uploaded directly to the staging area
MD5 checksums are computed server-side automatically
Wait for the upload to complete (progress bar shown)

FTP upload (large files)¶

For files larger than 5 GB, use FTP:

Connect to the FTP server: ftp://ftp.nfdp.example.sa
Log in with your SeqDB credentials
Upload files to your user directory
Files appear in the staging area automatically

See File Staging & Upload for more details.
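
For scripted uploads, the FTP steps above can be sketched with Python's standard `ftplib`. This is a minimal sketch, not the official client: it assumes the server accepts plain FTP, that your login lands in your user directory, and that your SeqDB email and password work as FTP credentials. The helper names `upload_to_staging` and `connect_and_upload` are hypothetical.

```python
import ftplib
import os


def upload_to_staging(ftp, paths):
    """Upload local files in binary mode over an already logged-in FTP connection.

    Returns the list of remote filenames that were sent.
    """
    uploaded = []
    for path in paths:
        name = os.path.basename(path)
        with open(path, "rb") as handle:
            # STOR writes the file into the current remote directory
            # (assumed here to be your user directory).
            ftp.storbinary(f"STOR {name}", handle)
        uploaded.append(name)
    return uploaded


def connect_and_upload(host, user, password, paths):
    """Open a connection, log in, upload, and close the connection."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        return upload_to_staging(ftp, paths)
```

A call such as `connect_and_upload("ftp.nfdp.example.sa", "you@example.com", password, ["reads/CAMEL_001_R1.fastq.gz"])` would then send one file. If the server requires TLS, `ftplib.FTP_TLS` is the standard-library alternative.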

Already staged?

If files were uploaded in a previous session, click Skip to proceed directly to the sample sheet step.

Step 3: Fill the sample sheet¶

The sample sheet is a TSV (tab-separated values) file that maps samples to sequencing files.

Select a metadata checklist (e.g., ERC000011 — ENA Default)
Click Download Template to get a pre-filled template with demo data
Open the template in Excel or Google Sheets
Replace the demo rows with your actual sample metadata
Save as .tsv (tab-separated)
Click Upload Filled Sheet

Required columns¶

Every sample sheet must include:

Column	Description	Example
`sample_alias`	Unique identifier per sample	`CAMEL_001`
`organism`	Species name	`Camelus dromedarius`
`tax_id`	NCBI taxonomy ID	`9838`

Additional required fields depend on the selected checklist.
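
If you generate the sample sheet from a script rather than a spreadsheet, the required columns above can be written with Python's `csv` module. This is a minimal sketch that covers only the three columns listed in the table; the function name `write_sample_sheet` is hypothetical, and your selected checklist will likely require more columns.

```python
import csv
import io

# The three columns every sample sheet must include (from the table above).
REQUIRED_COLUMNS = ["sample_alias", "organism", "tax_id"]


def write_sample_sheet(rows, handle, extra_columns=()):
    """Write a list of dicts to `handle` as a tab-separated sample sheet."""
    fieldnames = REQUIRED_COLUMNS + list(extra_columns)
    writer = csv.DictWriter(
        handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Example: one sample, written to an in-memory buffer.
buffer = io.StringIO()
write_sample_sheet(
    [{"sample_alias": "CAMEL_001", "organism": "Camelus dromedarius", "tax_id": "9838"}],
    buffer,
)
sheet_text = buffer.getvalue()
```

To write to disk instead, open the target with `open("my_samples.tsv", "w", newline="")` and pass that handle.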

File matching columns¶

Column	Description
`filename_forward`	Forward read filename (R1)
`filename_reverse`	Reverse read filename (R2)
`md5_forward`	MD5 checksum of forward file (optional)
`md5_reverse`	MD5 checksum of reverse file (optional)
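
To fill the optional `md5_forward` and `md5_reverse` columns, you can compute checksums locally before uploading. The sketch below uses Python's `hashlib` and reads the file in chunks so that large FASTQ files do not need to fit in memory; the function name `md5_of_file` is hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string is the 32-character lowercase hex digest, the same form that tools such as `md5sum` print.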

Smart file matching

The system matches files using a 3-tier fallback:

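Exact filename: matches filename_forward and filename_reverse against the staged files
MD5 checksum: if the filename is not found, matches by md5_forward and md5_reverse (only when those columns are filled in)
Alias pattern: otherwise falls back to {sample_alias}_R1.* / {sample_alias}_R2.*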

If no match is found, the system suggests the closest match: "Did you mean 'SAMPLE_001_R1.fastq.gz'?"
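
The fallback order can be illustrated with a short sketch. This is not the server's actual implementation: it assumes the staging area can be represented as a mapping from filename to MD5 digest, and it uses `difflib` as a stand-in for whatever the system uses to pick the closest match. The function names `match_file` and `suggest` are hypothetical.

```python
import difflib
import fnmatch


def match_file(filename, md5, alias, read, staged):
    """Resolve one read file using the three tiers described above.

    `staged` maps staged filename -> MD5 hex digest; `read` is "R1" or "R2".
    Returns (matched_filename, tier), or (None, None) if nothing matches.
    """
    # Tier 1: exact filename.
    if filename and filename in staged:
        return filename, "exact"
    # Tier 2: MD5 checksum (only possible when the sheet provides one).
    if md5:
        for name, digest in staged.items():
            if digest.lower() == md5.lower():
                return name, "md5"
    # Tier 3: alias pattern, e.g. CAMEL_001_R1.*
    pattern = f"{alias}_{read}.*"
    hits = sorted(name for name in staged if fnmatch.fnmatchcase(name, pattern))
    if hits:
        return hits[0], "alias"
    return None, None


def suggest(filename, staged):
    """Return the staged filename closest to `filename`, or None."""
    close = difflib.get_close_matches(filename, list(staged), n=1)
    return close[0] if close else None
```

For example, with `SAMPLE_001_R1.fastq.gz` staged, `suggest("SAMPLE_001_R1.fastq", staged)` returns `SAMPLE_001_R1.fastq.gz`, which is the kind of hint shown in the "Did you mean" message.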

Step 4: Review validation¶

After uploading the sample sheet, a validation preview appears:

Green cells — Field is filled and valid
Yellow cells — Optional field is empty (OK to proceed)
Red cells — Required field is missing (must fix)
Column headers marked with * are required

Fix any errors in your TSV file and re-upload.
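
You can catch missing required fields before uploading with a local pre-check. The sketch below mirrors the three cell states of the preview, under the assumption that only the three base columns are required; it checks only whether a cell is filled, not whether its value is valid for the checklist. The function name `classify_cells` and the status labels are hypothetical.

```python
import csv
import io

# Assumption: only the base columns are required; add your checklist's fields.
REQUIRED = ("sample_alias", "organism", "tax_id")


def classify_cells(tsv_text, required=REQUIRED):
    """Map (line_number, column) -> 'filled', 'optional_empty' or 'required_missing'."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = list(reader.fieldnames or [])
    # Required columns absent from the header are reported for every row.
    columns = header + [name for name in required if name not in header]
    report = {}
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        for column in columns:
            value = (row.get(column) or "").strip()
            if value:
                status = "filled"            # shown green in the preview
            elif column in required:
                status = "required_missing"  # shown red, must be fixed
            else:
                status = "optional_empty"    # shown yellow, OK to proceed
            report[(line_number, column)] = status
    return report
```

Filtering the result for `required_missing` gives the list of cells to fix before re-uploading.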

Step 5: Confirm¶

Once validation passes:

Review the summary table showing all samples and matched files
Click Confirm & Create All
The system creates:
- One Sample per row
- One Experiment per sample (with platform/library info)
- One Run per matched file (with checksum and file path)

After confirmation, you'll see the created accession numbers and can view them on the project detail page.

After submission¶

View your project at /projects/{accession}
Check FAIR compliance score and suggestions
Download files via the API: GET /api/v1/filereport?accession={project_accession}
Add more samples later via the project's Bulk Upload button
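
The file report request can be scripted with the standard library. This is a minimal sketch with several assumptions: the base URL is borrowed from the CLI login example on this page, the endpoint is treated as readable without extra authentication headers, and the response is returned as raw text because its format is not described here. The function names are hypothetical.

```python
import urllib.request
from urllib.parse import urlencode

# Assumption: same host as used in the CLI login example.
BASE_URL = "https://api.seqdb.nfdp.dev"


def filereport_url(project_accession, base_url=BASE_URL):
    """Build the GET /api/v1/filereport URL for a project accession."""
    query = urlencode({"accession": project_accession})
    return f"{base_url.rstrip('/')}/api/v1/filereport?{query}"


def fetch_filereport(project_accession, base_url=BASE_URL):
    """Fetch the file report and return the response body as text."""
    with urllib.request.urlopen(filereport_url(project_accession, base_url)) as response:
        return response.read().decode("utf-8")
```

For the project used in the CLI examples, `filereport_url("NFDP-PRJ-000001")` yields `https://api.seqdb.nfdp.dev/api/v1/filereport?accession=NFDP-PRJ-000001`.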

CLI Submission¶

The same workflow can be performed entirely from the command line using the seqdb CLI.

Install and authenticate¶

pip install seqdb-cli
seqdb login --url https://api.seqdb.nfdp.dev --email you@example.com

Download a checklist template¶

seqdb template ERC000011 --output my_samples.tsv

Upload files and submit¶

# Upload FASTQ files to staging and submit metadata in one command
seqdb submit my_samples.tsv \
  --checklist ERC000011 \
  --project NFDP-PRJ-000001 \
  --files ./reads/ \
  --threads 8

Add --yes to skip the interactive confirmation prompt.

Check submission status¶

seqdb status NFDP-PRJ-000001

See the CLI Reference for the full list of options.