File Reports API¶

The /filereport endpoint provides ENA-style file metadata and download URLs. Query by any accession type to get all associated files.

Endpoint¶

GET /api/v1/filereport?accession={accession}

No authentication required.

How it works¶

Like ENA's file report, you pass any accession and the system resolves the full hierarchy to return all associated files.

graph LR
    A[Project Accession] --> B[All Samples]
    B --> C[All Experiments]
    C --> D[All Runs/Files]
    E[Sample Accession] --> C
    F[Experiment Accession] --> D
    G[Run Accession] --> D

Query by accession type¶

Project — get all files in a study¶

curl 'http://localhost:8000/api/v1/filereport?accession=NFDP-PRJ-000001'

Sample — get files for one sample¶

curl 'http://localhost:8000/api/v1/filereport?accession=NFDP-SAM-000001'

Experiment — get files for one experiment¶

curl 'http://localhost:8000/api/v1/filereport?accession=NFDP-EXP-000001'

Run — get a single file entry¶

curl 'http://localhost:8000/api/v1/filereport?accession=NFDP-RUN-000001'

Response format¶

[
  {
    "run_accession": "NFDP-RUN-000001",
    "experiment_accession": "NFDP-EXP-000001",
    "sample_accession": "NFDP-SAM-000001",
    "project_accession": "NFDP-PRJ-000001",
    "organism": "Camelus dromedarius",
    "tax_id": 9838,
    "file_type": "FASTQ",
    "filename": "SAMPLE_001_R1.fastq.gz",
    "file_path": "nfdp-raw/NFDP-PRJ-000001/NFDP-SAM-000001/NFDP-RUN-000001/SAMPLE_001_R1.fastq.gz",
    "file_size": 1234567890,
    "checksum_md5": "d41d8cd98f00b204e9800998ecf8427e",
    "checksum_sha256": null,
    "download_url": "/api/v1/runs/NFDP-RUN-000001/download",
    "platform": "ILLUMINA",
    "instrument_model": "Illumina NovaSeq 6000",
    "library_strategy": "WGS",
    "library_layout": "PAIRED"
  }
]

Response fields¶

Field	Description
`run_accession`	Run accession (file-level)
`experiment_accession`	Parent experiment
`sample_accession`	Parent sample
`project_accession`	Parent project
`organism`	Species name from sample
`tax_id`	NCBI taxonomy ID
`file_type`	`FASTQ`, `BAM`, `CRAM`, `VCF`, or `OTHER`
`filename`	Original filename
`file_path`	Storage path (bucket/project/sample/run/file)
`file_size`	Size in bytes
`checksum_md5`	MD5 hex digest
`checksum_sha256`	SHA-256 hex digest (if available)
`download_url`	Relative URL for downloading the file
`platform`	Sequencing platform
`instrument_model`	Instrument model
`library_strategy`	Library strategy (WGS, RNA-Seq, etc.)
`library_layout`	PAIRED or SINGLE

Comparison with ENA¶

ENASeqDB

# Get FASTQ files for study PRJNA123456
curl 'https://www.ebi.ac.uk/ena/portal/api/filereport?\
accession=PRJNA123456&\
result=read_run&\
fields=run_accession,fastq_ftp,fastq_md5,fastq_bytes'

Response: TSV with columns run_accession, fastq_ftp, fastq_md5, fastq_bytes

# Get all files for project NFDP-PRJ-000001
curl 'http://localhost:8000/api/v1/filereport?accession=NFDP-PRJ-000001'

Response: JSON array with full metadata + download URLs

Key differences:

Feature	ENA	SeqDB
Format	TSV (default)	JSON
`result` param	Required (`read_run` or `analysis`)	Not needed (always read_run)
`fields` param	Optional (filter columns)	Not supported (all fields returned)
Download	FTP/Aspera URLs	Presigned URL via `/runs/{acc}/download`

Batch download recipes¶

Download all files for a project¶

Bash + jqPythonParallel (xargs)

curl -s 'http://localhost:8000/api/v1/filereport?accession=NFDP-PRJ-000001' \
  | jq -r '.[].download_url' \
  | while read url; do
      curl -L -O "http://localhost:8000${url}"
    done

import requests

BASE = "http://localhost:8000/api/v1"
files = requests.get(f"{BASE}/filereport", params={"accession": "NFDP-PRJ-000001"}).json()

for f in files:
    print(f"Downloading {f['filename']} ({f['file_size']} bytes)...")
    resp = requests.get(f"{BASE}{f['download_url']}", allow_redirects=True)
    with open(f['filename'], 'wb') as out:
        out.write(resp.content)

curl -s 'http://localhost:8000/api/v1/filereport?accession=NFDP-PRJ-000001' \
  | jq -r '.[].download_url' \
  | xargs -P 4 -I {} curl -L -O 'http://localhost:8000{}'

Verify checksums¶

# Generate expected checksums
curl -s 'http://localhost:8000/api/v1/filereport?accession=NFDP-PRJ-000001' \
  | jq -r '.[] | "\(.checksum_md5)  \(.filename)"' > expected_md5.txt

# Verify downloads
md5sum -c expected_md5.txt

Filter by sample¶

# Get files for a specific sample only
curl -s 'http://localhost:8000/api/v1/filereport?accession=NFDP-SAM-000003'