path: root/utils/conll_to_dataset.py
author HsiangNianian <i@jyunko.cn> 2025-12-30 19:14:39 +0800
committer HsiangNianian <i@jyunko.cn> 2025-12-30 19:14:39 +0800
commit 7ac684f1f82023c6284cd7d7efde11b8dc98c149
tree 4ac4e9fb72a4e1e2578d9fb4e9704967b052ec15 /utils/conll_to_dataset.py
parent 12910f3a937633a25aa0de463a6edf756f2b8cdd
feat: Implement TRPG NER training and inference script with robust model path detection and enhanced timestamp/speaker handling
- Added main training and inference logic in main.py, including CoNLL parsing, tokenization, and model training.
- Introduced TRPGParser class for inference with entity aggregation and special handling for timestamps and speakers.
- Developed utility functions for converting word-level CoNLL to char-level and saving datasets in various formats.
- Added ONNX export functionality for the trained model.
- Created a comprehensive requirements.txt and updated pyproject.toml with necessary dependencies.
- Implemented tests for ONNX inference to validate model outputs.
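
The word-level to char-level CoNLL conversion mentioned above could look roughly like the following. This is only an illustrative sketch, not the code from this commit: the function name `word_to_char_level`, the BIO tagging scheme, and the `speaker` label are assumptions made for the example.

```python
from typing import List, Tuple


def word_to_char_level(rows: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Expand (word, BIO-tag) pairs into (char, BIO-tag) pairs.

    Hypothetical helper: assumes BIO tags such as "B-speaker", "I-speaker", "O".
    """
    out: List[Tuple[str, str]] = []
    for word, tag in rows:
        for i, ch in enumerate(word):
            if tag.startswith("B-") and i > 0:
                # Only the first character keeps the B- prefix;
                # the remaining characters continue the same entity.
                out.append((ch, "I-" + tag[2:]))
            else:
                # "O" and "I-*" tags apply unchanged to every character.
                out.append((ch, tag))
    return out


# Example input with an assumed "speaker" label.
chars = word_to_char_level([("Bob", "B-speaker"), ("says", "O")])
```

Under these assumptions, an entity that spans several words still begins with exactly one `B-` character, so span boundaries survive the conversion.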
Diffstat (limited to 'utils/conll_to_dataset.py')
0 files changed, 0 insertions, 0 deletions