Train a model
This document explains how to train a model with YASET.
Quick Start
To train a model, make a copy of the sample configuration file and adjust the parameters to your situation.
$ cp config.ini config-xp.ini
Invoke the YASET command. You can turn on verbose mode with the --debug flag.
$ yaset [--debug] LEARN --config config-xp.ini
Configuration Parameters
The configuration file is divided into four parts:
- general: parameters related to the experiment (see below for further explanations)
- data: parameters related to training instances and word embedding models
- training: parameters related to model training (e.g. learning algorithm, evaluation metrics or mini-batch size)
- <model parameters>: depending on the neural network model you choose (specified in the training section), you can modify the model-specific parameters (e.g. hidden layer sizes or character embedding size)
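Concretely, a configuration file therefore has the following overall shape (a minimal sketch: section contents are detailed below, and the bilstm-char-crf section name corresponds to the model selected in the training section):

[general]
# parameters related to the experiment

[data]
# training instances and word embeddings

[training]
# learning algorithm, evaluation metric, mini-batch size

[bilstm-char-crf]
# model-specific parameters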
general section
batch_mode: bool
- Set this parameter to true if you want to perform multiple runs of the same experiment. This allows you to check the model's robustness to the random seed initial value (Reimers and Gurevych (2017) [3]).
batch_iter: int
- Specify the number of runs to perform. This will be ignored if the value of the parameter batch_mode is false.
experiment_name: str
- Specify the experiment name. The name will be used for directory and file naming.
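For instance, a general section for an experiment repeated five times could look like the following sketch (the experiment name is a placeholder):

[general]
experiment_name = ner-baseline
batch_mode = true
batch_iter = 5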
data section
train_file_path: str
- Specify the training instance file path (absolute or relative). Please refer to the data formatting section of the data document for further information about the file format.
dev_file_use: bool
- Set this parameter to true if you want to use a development instance file, false otherwise.
dev_file_path: str
- Specify the development instance file path (absolute or relative). This parameter will be ignored if the value of the parameter dev_file_use is set to false. Please refer to the data formatting section of the data document for further information about the file format.
dev_random_ratio: float
- Specify the percentage of training instances that should be kept as development instances (float between 0 and 1, e.g. 0.2). This will be ignored if the value of the parameter dev_file_use is true.
dev_random_seed_use: bool
- Set this parameter to true if you want to use a random seed for the train/dev split. This will be ignored if the value of the parameter dev_file_use is true.
dev_random_seed_value: int
- Specify the random seed value (integer). This will be ignored if the value of the parameter dev_file_use is true or if the value of the parameter dev_random_seed_use is false.
preproc_lower_input: bool
- Set this parameter to true if you want YASET to lowercase tokens before token-vector lookup, false otherwise. This is useful if your pre-trained word embeddings were built on a lowercased corpus.
preproc_replace_digits: bool
- Set this parameter to true if you want YASET to replace digits with the digit 0 before token-vector lookup, false otherwise (e.g. “4,5mg” will be changed to “0,0mg”).
embedding_model_type: str
- Specify the format of the pre-trained word embeddings that you want to use to train the system. Two formats are supported.
embedding_model_path: str
- Specify the path of the pre-trained word embedding file (absolute or relative).
embedding_oov_strategy: str
- Specify the strategy for Out-Of-Vocabulary (OOV) tokens. Two strategies are available:
  - map: a vector for OOV tokens is provided in the embedding file. Set embedding_oov_strategy to map and specify the OOV vector ID (embedding_oov_map_token_id parameter).
  - replace: following Lample et al. (2016) [2], an OOV vector will be randomly initialized and trained by randomly replacing singletons in the training instances with this vector. You can adjust the replacement rate by changing the value of the parameter embedding_oov_replace_rate.
embedding_oov_map_token_id: str
- Specify the OOV token ID if you use the strategy map. This will be ignored if the value of the parameter embedding_oov_strategy is not map.
embedding_oov_replace_rate: float
- Specify the replacement rate if you want to use the strategy replace (float between 0 and 1, e.g. 0.2). This will be ignored if the value of the parameter embedding_oov_strategy is not replace.
working_dir: str
- Specify the working directory path where a timestamped working directory will be created for the current run. For instance, if you specify $USER/temp, the directory $USER/temp/yaset-learn-YYYYMMDD will be created.
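As an illustration, here is a sketch of a data section that keeps 20% of the training instances for development (with a fixed seed) and uses the replace OOV strategy; all paths and values are placeholders, and the embedding format name must be one of the two supported formats:

[data]
train_file_path = ./data/train.conll
dev_file_use = false
dev_random_ratio = 0.2
dev_random_seed_use = true
dev_random_seed_value = 42
preproc_lower_input = true
preproc_replace_digits = true
# placeholder: set this to one of the two supported format names
embedding_model_type = gensim
embedding_model_path = ./embeddings/model.bin
embedding_oov_strategy = replace
embedding_oov_replace_rate = 0.2
working_dir = $USER/temp

Since dev_file_use is false, dev_file_path is omitted and the dev_random_* parameters take effect.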
training section
model_type: str
- Specify the neural network model that you want to use. There is only one choice at this time; other models will be implemented in future releases.
  - bilstm-char-crf: implementation of the model presented in Lample et al. (2016) [2]. More information can be found in the original paper. Model parameters can be set in the bilstm-char-crf section of the configuration file.
max_iterations: int
- Specify the maximum number of training iterations. Training will be stopped if the early stopping criterion is not triggered before this iteration number (see the patience parameter).
patience: int
- Specify the number of iterations to wait before early stopping if there is no performance improvement on the validation instances.
dev_metric: str
- Specify the metric used for performance computation on the validation instances:
  - accuracy: standard token accuracy.
  - conll: metric which operates at the entity level. This should be used with an IOB(ES) markup on Named Entity Recognition related tasks. The implementation is largely taken from the Python adaptation by Sampo Pyysalo of the original script developed for the CoNLL-2003 Shared Task (Tjong et al., 2003 [4]).
trainable_word_embeddings: bool
- Set this parameter to true if you want YASET to fine-tune word embeddings during network training, false otherwise.
cpu_cores: int
- Specify the number of CPU cores (upper bound) that should be used during network training.
batch_size: int
- Specify the mini-batch size used during training.
store_matrices_on_gpu: bool
- Set this parameter to true if you want to keep the word embedding matrix in GPU memory, false otherwise.
bucket_use: bool
- Set this parameter to true if you want to bucketize training instances during network training. Bucket boundaries will be automatically computed.
opt_algo: str
- Specify the optimization algorithm used during network training. You can choose between adam (Kingma and Ba, 2014 [1]) and sgd.
opt_lr: float
- Specify the initial learning rate applied during network training.
opt_gc_use: bool
- Set this parameter to true if you want to use gradient clipping during network training, false otherwise.
opt_gc_type: str
- Specify the gradient clipping type (clip_by_norm or clip_by_value). This will be ignored if the value of the parameter opt_gc_use is false.
opt_gs_val: float
- Specify the gradient clipping value. This parameter will be ignored if the value of the parameter opt_gc_use is false.
opt_decay_use: bool
- Set this parameter to true if you want to use learning rate decay during network training, false otherwise.
opt_decay_rate: float
- Specify the decay rate (float between 0 and 1, e.g. 0.2). This parameter will be ignored if the value of the parameter opt_decay_use is false.
opt_decay_iteration: int
- Specify the learning rate decay frequency. If you set the frequency to \(n\), the learning rate \(lr\) will be decayed by the rate specified in the parameter opt_decay_rate every \(n\) iterations.
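To make the interactions between these parameters concrete, here is a sketch of a training section; the values are illustrative, not recommended defaults:

[training]
model_type = bilstm-char-crf
max_iterations = 100
patience = 10
dev_metric = conll
trainable_word_embeddings = true
cpu_cores = 4
batch_size = 32
store_matrices_on_gpu = false
bucket_use = true
opt_algo = adam
opt_lr = 0.001
# opt_gc_type and opt_gs_val are only read because opt_gc_use is true
opt_gc_use = true
opt_gc_type = clip_by_norm
opt_gs_val = 5.0
# the learning rate is decayed at rate 0.9 every 10 iterations
opt_decay_use = true
opt_decay_rate = 0.9
opt_decay_iteration = 10

With these values, training runs for at most 100 iterations and stops earlier if the conll score on the validation instances does not improve for 10 consecutive iterations.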
bilstm-char-crf section
These parameters are related to the neural network model presented in Lample et al. (2016) [2].
hidden_layer_size: int
- Specify the main LSTM hidden layer size.
dropout_rate: float
- Specify the dropout rate to apply to the input embeddings before feeding them to the main LSTM.
use_char_embeddings: bool
- Set this parameter to true if you want to use character embeddings in the model, false otherwise.
char_hidden_layer_size: int
- Specify the character LSTM hidden layer size. This parameter will be ignored if the value of the parameter use_char_embeddings is false.
char_embedding_size: int
- Specify the character embedding size. This parameter will be ignored if the value of the parameter use_char_embeddings is false.
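For example, a bilstm-char-crf section with character embeddings enabled could read as follows (a sketch; the sizes mirror the orders of magnitude used in Lample et al. (2016) [2] and are not recommended defaults):

[bilstm-char-crf]
hidden_layer_size = 100
dropout_rate = 0.5
use_char_embeddings = true
# the char_* parameters are only read because use_char_embeddings is true
char_hidden_layer_size = 25
char_embedding_size = 25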
[1] Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR). 2015.
[2] Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, and Chris Dyer. Neural Architectures for Named Entity Recognition. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 260–270. San Diego, California, 2016. Association for Computational Linguistics.
[3] Nils Reimers and Iryna Gurevych. Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2017.
[4] Erik F. Tjong Kim Sang and Fien De Meulder. Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. In Proceedings of CoNLL-2003, 142–147. Edmonton, Canada, 2003. Association for Computational Linguistics.