varona.platypus
High-level module for building a DataFrame from a Platypus-style VCF file.
This module calls functions from the varona.dataframe module to build the DataFrame.
- API_DF_SCHEMA = {'alt': String, 'contig': String, 'effect': String, 'gene_id': String, 'gene_name': String, 'pos': UInt32, 'ref': String, 'transcript_id': String, 'type': String}
Polars schema for the API DataFrame.
- VCF_DF_SCHEMA = {'alt': String, 'contig': String, 'maf': Float64, 'max_variant_reads': UInt64, 'pos': UInt32, 'ref': String, 'sequence_depth': UInt64, 'variant_read_pct': Float64}
Polars schema for the VCF DataFrame.
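The two schemas share four columns. The sketch below copies the column names and dtype names from the schemas above and computes that overlap. Dtype names are written as plain strings so the snippet runs without Polars installed; in the module itself the values are Polars dtypes. Treating the shared columns as variant-identifying keys is an assumption made here for illustration, since this page does not say how the two DataFrames are combined.

```python
# Column name -> dtype name, copied from the schemas documented above.
# Plain strings stand in for the Polars dtypes so this runs without Polars.
API_DF_SCHEMA = {
    "alt": "String", "contig": "String", "effect": "String",
    "gene_id": "String", "gene_name": "String", "pos": "UInt32",
    "ref": "String", "transcript_id": "String", "type": "String",
}
VCF_DF_SCHEMA = {
    "alt": "String", "contig": "String", "maf": "Float64",
    "max_variant_reads": "UInt64", "pos": "UInt32", "ref": "String",
    "sequence_depth": "UInt64", "variant_read_pct": "Float64",
}

# Columns present in both schemas. Assumption: these identify a variant
# and could serve as join keys; the documentation does not confirm this.
shared_columns = sorted(API_DF_SCHEMA.keys() & VCF_DF_SCHEMA.keys())
print(shared_columns)  # ['alt', 'contig', 'pos', 'ref']

# The shared columns also carry the same dtype in both schemas.
dtypes_match = all(
    API_DF_SCHEMA[name] == VCF_DF_SCHEMA[name] for name in shared_columns
)
print(dtypes_match)  # True
```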
- platypus_dataframe(vcf_path: pathlib.Path, maf_method: varona.maf.MafMethod = MafMethod.SAMPLES, timeout: int = 300, genome_assembly: varona.ensembl.Assembly = Assembly.GRCH37, vcf_extractor=platypus_vcf_record_extractor, api_extractor=default_vep_response_extractor, no_vep: bool = False, vep_json_path: pathlib.Path | None = None) -> DataFrame
Read a Platypus VCF file into a DataFrame.
- Parameters:
vcf_path – The path to the Platypus VCF file.
maf_method – The method to use for calculating the MAF.
timeout – The timeout (seconds) for the VEP API query.
genome_assembly – The genome assembly used in the Ensembl VEP API.
vcf_extractor – The function to extract data from the VCF.
api_extractor – The function to extract data from the VEP API response.
no_vep – Skip querying the VEP API.
vep_json_path – Path to the output file from a local VEP run; when given, the VEP API is not queried.
- Returns:
A DataFrame with the VCF data.
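A minimal usage sketch, based only on the signature and parameter descriptions above. The wrapper name load_platypus_variants and its offline flag are hypothetical and not part of varona; the wrapper just chooses between the three documented modes (query the VEP API, read a local VEP output file, or skip VEP). The import happens inside the function so the sketch can be defined without varona installed.

```python
from pathlib import Path


def load_platypus_variants(vcf_path, vep_json_path=None, offline=False, timeout=300):
    """Hypothetical convenience wrapper around platypus_dataframe."""
    # Imported lazily so this sketch can be defined without varona installed.
    from varona.platypus import platypus_dataframe

    kwargs = {"vcf_path": Path(vcf_path)}
    if vep_json_path is not None:
        # Use the output of a local VEP run instead of querying the API.
        kwargs["vep_json_path"] = Path(vep_json_path)
    elif offline:
        # Skip querying the VEP API.
        kwargs["no_vep"] = True
    else:
        # Query the VEP API, waiting up to `timeout` seconds.
        kwargs["timeout"] = timeout
    return platypus_dataframe(**kwargs)
```

With varona installed, load_platypus_variants("sample.vcf", offline=True) would pass no_vep=True and leave every other parameter at its default. The file name is a placeholder.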