yaset.tools

Submodules

yaset.tools.conll

yaset.tools.conll.check_bioul_labels(input_file: str = None)
yaset.tools.conll.check_labels(input_file: str = None, label_type: str = None)
yaset.tools.conll.convert_labels(input_file: str = None, output_file: str = None, input_label_type: str = None, output_label_type: str = None)

Convert the NER labels of a CoNLL file from one tagging scheme to another

Args:
input_file (str): input CoNLL filepath
output_file (str): output CoNLL filepath
input_label_type (str): source NER tagging scheme
output_label_type (str): target NER tagging scheme
Returns:
None
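
To illustrate what a tagging-scheme conversion involves, here is a minimal sketch that rewrites a label sequence from IOB1 to IOB2. It does not call yaset and is not yaset's implementation: the function name `iob1_to_iob2` is hypothetical, and the scheme names accepted by `input_label_type` and `output_label_type` are not documented here.

```python
def iob1_to_iob2(labels):
    """Convert a list of IOB1 labels to IOB2 (hypothetical helper, not part of yaset)."""
    converted = []
    previous_cat = None  # category of the previous token, None after "O"
    for label in labels:
        if label == "O":
            converted.append("O")
            previous_cat = None
            continue
        tag, _, cat = label.partition("-")
        # In IOB2 every entity starts with B-, so emit B- whenever the
        # token opens a new entity (explicit B- or a change of category).
        if tag == "B" or cat != previous_cat:
            converted.append("B-" + cat)
        else:
            converted.append("I-" + cat)
        previous_cat = cat
    return converted


print(iob1_to_iob2(["I-PER", "I-PER", "O", "I-LOC", "B-LOC", "I-ORG"]))
```

In IOB1 a `B-` tag only appears when two entities of the same category are adjacent, whereas IOB2 marks the first token of every entity with `B-`.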
yaset.tools.conll.convert_sequence(input_sequence: list = None, input_label_type: str = None, output_label_type: str = None)
yaset.tools.conll.convert_spaces_to_tabulations(input_file: str = None, output_file: str = None) → None

Convert a CoNLL file that uses spaces as column separators into a CoNLL file that uses tabs as column separators

Args:
input_file (str): input CoNLL filepath
output_file (str): output CoNLL filepath
Returns:
None
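
The separator conversion described above could look roughly like the sketch below. It is written without yaset and makes its own assumptions: the names `spaces_to_tabs` and `convert_file` are hypothetical, columns are assumed to contain no internal spaces, and blank lines are assumed to separate sentences.

```python
def spaces_to_tabs(lines):
    """Yield each CoNLL line with its columns joined by tabs (hypothetical helper)."""
    for line in lines:
        columns = line.split()  # splits on any run of whitespace
        # A blank line (sentence boundary) yields no columns and stays blank.
        yield "\t".join(columns) + "\n"


def convert_file(input_file, output_file):
    """Apply spaces_to_tabs to a file on disk (assumes UTF-8 encoding)."""
    with open(input_file, encoding="utf-8") as src, \
            open(output_file, "w", encoding="utf-8") as dst:
        dst.writelines(spaces_to_tabs(src))
```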
yaset.tools.conll.extract_entities_iob1(input_labels: list = None)

Extract entity offsets from the labels of a CoNLL file encoded in the CoNLL-2003 (IOB1) tagging scheme

Args:
input_labels (list): source labels
Returns:
list: entity offsets
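
As an illustration of entity extraction from IOB1 labels, the sketch below scans a label list and collects one span per entity. It is not yaset's code: the name `iob1_entities` is hypothetical, and the output shape (token-level `(start, end, category)` tuples with an exclusive end) is an assumption, since the format of the offsets returned by yaset is not specified here.

```python
def iob1_entities(labels):
    """Return (start, end, category) spans from IOB1 labels (hypothetical helper)."""
    entities = []
    start, current = None, None  # start index and category of the open entity
    for i, label in enumerate(labels):
        if label == "O":
            tag, cat = "O", None
        else:
            tag, _, cat = label.partition("-")
        # Close the open entity when this token does not continue it:
        # an "O", an explicit B- tag, or a different category.
        if current is not None and (tag in ("O", "B") or cat != current):
            entities.append((start, i, current))
            start, current = None, None
        # Open a new entity on any non-"O" token when none is open.
        if tag != "O" and current is None:
            start, current = i, cat
    if current is not None:
        entities.append((start, len(labels), current))
    return entities


print(iob1_entities(["I-PER", "I-PER", "O", "I-LOC", "B-LOC", "I-LOC", "I-ORG"]))
```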
yaset.tools.conll.extract_sent_entities(sentence_buffer: list = None)
yaset.tools.conll.extract_tag_cat(label)

Separate tag from category

Args:
label (str): NER label to split
Returns:
(str, str): tag, category
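
Separating a tag from its category can be sketched as follows. This is a standalone illustration, not yaset's implementation: the name `split_label` is hypothetical, and returning `None` as the category of an `O` label is an assumption, since the behaviour of `extract_tag_cat` for that case is not documented here.

```python
def split_label(label):
    """Split an NER label such as 'B-PER' into (tag, category) (hypothetical helper)."""
    if label == "O":
        # Assumption: tokens outside any entity have no category.
        return "O", None
    # Split on the first hyphen only, so categories may contain hyphens.
    tag, _, category = label.partition("-")
    return tag, category


print(split_label("B-PER"))
```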
yaset.tools.conll.load_sentences(input_file: str = None, debug: bool = False)
yaset.tools.conll.split_tag(tag: str = None)