varona.platypus

High-level module for building a DataFrame from a Platypus-style VCF file.

This module calls functions from the varona.dataframe module to build a DataFrame from a Platypus-style VCF file.

API_DF_SCHEMA = {'alt': String, 'contig': String, 'effect': String, 'gene_id': String, 'gene_name': String, 'pos': UInt32, 'ref': String, 'transcript_id': String, 'type': String}

Polars schema for the API DataFrame.

VCF_DF_SCHEMA = {'alt': String, 'contig': String, 'maf': Float64, 'max_variant_reads': UInt64, 'pos': UInt32, 'ref': String, 'sequence_depth': UInt64, 'variant_read_pct': Float64}

Polars schema for the VCF DataFrame.
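
The two schemas above overlap on four columns. The sketch below copies the documented column names into plain dictionaries (strings stand in for the Polars dtypes, so nothing needs to be installed) and computes that overlap. Treating the shared columns as the key that links a VCF row to its API row is an assumption based on the column names; the source does not state how the two frames are combined.

```python
# Column -> dtype name, copied from the two schemas documented above.
# Plain strings stand in for the Polars dtypes so the sketch needs no imports.
API_DF_SCHEMA = {
    "alt": "String", "contig": "String", "effect": "String",
    "gene_id": "String", "gene_name": "String", "pos": "UInt32",
    "ref": "String", "transcript_id": "String", "type": "String",
}
VCF_DF_SCHEMA = {
    "alt": "String", "contig": "String", "maf": "Float64",
    "max_variant_reads": "UInt64", "pos": "UInt32", "ref": "String",
    "sequence_depth": "UInt64", "variant_read_pct": "Float64",
}

# Columns present in both schemas, in sorted order.
shared = sorted(API_DF_SCHEMA.keys() & VCF_DF_SCHEMA.keys())
print(shared)  # ['alt', 'contig', 'pos', 'ref']

# The shared columns are declared with the same dtype in both schemas.
same_dtype = all(API_DF_SCHEMA[c] == VCF_DF_SCHEMA[c] for c in shared)
```

The remaining columns are disjoint: the VCF schema holds read-depth and frequency fields, while the API schema holds gene and transcript annotations.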

platypus_dataframe(vcf_path: pathlib.Path, maf_method: varona.maf.MafMethod = MafMethod.SAMPLES, timeout: int = 300, genome_assembly: varona.ensembl.Assembly = Assembly.GRCH37, vcf_extractor=<function platypus_vcf_record_extractor>, api_extractor=<function default_vep_response_extractor>, no_vep: bool = False, vep_json_path: pathlib.Path | None = None) -> DataFrame

Read a Platypus VCF file into a DataFrame.

Parameters:
  • vcf_path – The path to the Platypus VCF file.

  • maf_method – The method to use for calculating the MAF.

  • timeout – The timeout (seconds) for the VEP API query.

  • genome_assembly – The genome assembly used in the Ensembl VEP API.

  • vcf_extractor – The function to extract data from the VCF.

  • api_extractor – The function to extract data from the VEP API response.

  • no_vep – Skip querying the VEP API.

  • vep_json_path – Path to the output file from a local VEP run. When given, the VEP API is not queried.

Returns:

A DataFrame with the VCF data.
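
A minimal usage sketch, assuming the function is importable as varona.platypus.platypus_dataframe (taken from the module name at the top of this page). The wrapper name load_platypus_vcf is hypothetical and exists only for illustration. It avoids the network in both branches: with a local VEP output file it passes vep_json_path, otherwise it sets no_vep.

```python
from pathlib import Path
from typing import Optional


def load_platypus_vcf(vcf_path: Path, vep_json_path: Optional[Path] = None):
    """Hypothetical wrapper that never queries the VEP API."""
    # Imported lazily so the sketch can be defined without varona installed.
    from varona.platypus import platypus_dataframe

    if vep_json_path is not None:
        # Use the output of a local VEP run instead of querying the API.
        return platypus_dataframe(vcf_path, vep_json_path=vep_json_path)
    # No local VEP output available: skip the VEP query entirely.
    return platypus_dataframe(vcf_path, no_vep=True)
```

A call would look like load_platypus_vcf(Path("sample.vcf")), where sample.vcf is a placeholder file name. To query the API instead, call platypus_dataframe directly and adjust timeout and genome_assembly as needed.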