Dataloader
There are two ways to load data: with the generic dataloader, or with the fairseq speech-to-text dataloader.
- class simuleval.data.dataloader.GenericDataloader(source_list: Union[path, str], target_list: List[str])[source]
Load source and target data
usage: [-h] [--source SOURCE] [--target TARGET] [--source-type {text,speech}] [--target-type {text,speech}] [--source-segment-size SOURCE_SEGMENT_SIZE] [--start-index START_INDEX] [--end-index END_INDEX] [--fairseq-data FAIRSEQ_DATA] [--fairseq-config FAIRSEQ_CONFIG] [--fairseq-gen-subset FAIRSEQ_GEN_SUBSET] [--fairseq-manifest FAIRSEQ_MANIFEST]
Named Arguments
- --source
Source file.
- --target
Target file.
- --source-type
Possible choices: text, speech
Source data type to evaluate.
- --target-type
Possible choices: text, speech
Target data type to evaluate.
- --source-segment-size
Source segment size. For text, the unit is the number of tokens; for speech, the unit is milliseconds.
Default: 1
- --start-index
Start index for evaluation.
Default: 0
- --end-index
The last index for evaluation.
Default: -1
- --fairseq-data
Set fairseq data root.
- --fairseq-config
Set fairseq config file.
- --fairseq-gen-subset
Subset to evaluate. Assumes there is a gen_subset.tsv file in fairseq_root.
- --fairseq-manifest
Use input in fairseq manifest (tsv) format.
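As a rough illustration of the core options listed above, the sketch below rebuilds them with plain argparse and applies the start and end indices to a list of sentences. It is not SimulEval's own parser: the helper names build_parser and select_range are hypothetical, the int type for the numeric options is an assumption, and so is the reading of a negative --end-index as "through the last item" with a non-negative value treated as an exclusive upper bound.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Rebuild the documented core options with plain argparse (illustrative only)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--source", type=str, help="Source file.")
    parser.add_argument("--target", type=str, help="Target file.")
    parser.add_argument("--source-type", choices=["text", "speech"],
                        help="Source data type to evaluate.")
    parser.add_argument("--target-type", choices=["text", "speech"],
                        help="Target data type to evaluate.")
    # Assumption: the numeric options are integers; the docs only give defaults.
    parser.add_argument("--source-segment-size", type=int, default=1,
                        help="Tokens for text, milliseconds for speech.")
    parser.add_argument("--start-index", type=int, default=0)
    parser.add_argument("--end-index", type=int, default=-1)
    return parser


def select_range(items, start_index, end_index):
    """Pick the items to evaluate.

    Assumption: a negative end_index means "through the last item", and a
    non-negative end_index is an exclusive upper bound.
    """
    stop = len(items) if end_index < 0 else end_index
    return items[start_index:stop]


# Hypothetical file names and data, for illustration only.
args = build_parser().parse_args(
    ["--source", "src.txt", "--target", "tgt.txt",
     "--source-type", "text", "--end-index", "2"]
)
sentences = ["a", "b", "c", "d"]
subset = select_range(sentences, args.start_index, args.end_index)
print(subset)
```

Keeping the index handling in a separate helper makes the interpretation of the -1 default easy to swap out if the library treats the bound differently.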
- class simuleval.data.dataloader.fairseq_s2t_dataloader.FairseqSpeechToTextDataloader(fairseq_s2t_dataset: SpeechToTextDataset)[source]
Load speech-to-text data in fairseq-s2t format.
usage: [-h] [--fairseq-data FAIRSEQ_DATA] [--fairseq-config FAIRSEQ_CONFIG] [--fairseq-gen-subset FAIRSEQ_GEN_SUBSET] [--fairseq-manifest FAIRSEQ_MANIFEST]
Named Arguments
- --fairseq-data
Set fairseq data root.
- --fairseq-config
Set fairseq config file.
- --fairseq-gen-subset
Subset to evaluate. Assumes there is a gen_subset.tsv file in fairseq_root.
- --fairseq-manifest
Use input in fairseq manifest (tsv) format.
Note
fairseq has to be installed to use this feature.
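To give a feel for the manifest (tsv) input mentioned above, here is a minimal sketch that reads a tab-separated file with a header row using only the standard library. It does not use fairseq or SimulEval, the function name read_manifest is hypothetical, and the column names (id, audio, n_frames, tgt_text) are assumed for illustration; a real manifest may contain different or additional columns.

```python
import csv
import io


def read_manifest(handle):
    """Read a tab-separated manifest with a header row into a list of dicts."""
    # QUOTE_NONE keeps quote characters inside transcripts as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


# Hypothetical manifest content; the column names are assumptions.
example = (
    "id\taudio\tn_frames\ttgt_text\n"
    "utt1\tutt1.wav\t16000\thello world\n"
    "utt2\tutt2.wav\t32000\tgood morning\n"
)

rows = read_manifest(io.StringIO(example))
sources = [row["audio"] for row in rows]
targets = [row["tgt_text"] for row in rows]
print(sources, targets)
```

In practice the handle would come from opening the subset's .tsv file; the in-memory string here only keeps the example self-contained.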