## Getting started

### Installing via pip

```bash
pip install tfkit
```

After installation, you can train and evaluate models with the `tfkit-train` and `tfkit-eval` commands.
## Running TFKit on your task

### First step - prepare your dataset
The key to combining different tasks is giving every task the same data format.

Notice:

- All data is in CSV format - tfkit uses CSV for every task. Normally there are two columns: the first column is the model input and the second column is the model output.
- Plain text with no tokenization - there is no need to tokenize text before training, or to recompute anything for tokenization; tfkit handles it for you.
- No header row is needed.

For example, a sentiment classification dataset looks like:

```csv
how dare you,negative
```
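The format above can be sketched with a tiny hand-written dataset (the file name and second row are illustrative):

```bash
# Two columns, no header: input text, then the label.
cat > training_data.csv <<'EOF'
how dare you,negative
thanks a lot,positive
EOF

# Sanity check: each row has exactly two comma-separated fields.
awk -F',' '{ print NF }' training_data.csv
```

Note this simple check assumes the input text itself contains no commas.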
Hint

For the details and example formats of the different tasks, you can check here.
Hint

nlprep is a tool for data splitting/preprocessing/augmentation. It can help you create ready-to-train data for tfkit; check here.
### Second step - model training

Use `tfkit-train` for model training.

Before training a model, there are a few things to clarify:

- `--model`: which model should handle this task? Check here for details of the available models.
- `--config`: which pretrained model do you want to use? You can search https://huggingface.co/models for available pretrained models.
- `--train` and `--test`: paths to the training and testing datasets, in CSV format.
- `--savedir`: the model saving directory; the default is the '/checkpoints' folder.

You can leave the rest at the default configuration, or use `tfkit-train -h` for more options.
An example of training a sentiment classifier:

```bash
tfkit-train \
  --model clas \
  --config xlm-roberta-base \
  --train training_data.csv \
  --test testing_data.csv \
  --lr 4e-5 \
  --maxlen 384 \
  --epoch 10 \
  --savedir roberta_sentiment_classifier
```
### Third step - model evaluation

Use `tfkit-eval` for model evaluation.

- `--model`: path to the saved model.
- `--metric`: the evaluation metric, e.g. emf1, nlg (BLEU/ROUGE), clas (confusion matrix).
- `--valid`: validation data, also in CSV format.
- `--panel`: an input panel for model-specific parameters.

For more configuration details, use `tfkit-eval -h`.
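Putting the flags above together, an evaluation run for a trained sentiment classifier might look like the following sketch. The checkpoint path is an assumption: the directory name follows the training example, and the epoch-numbered `.pt` file name follows the checkpoints used in the examples below.

```bash
tfkit-eval \
  --model roberta_sentiment_classifier/10.pt \
  --valid validation_data.csv \
  --metric clas
```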
After evaluation, the result is printed to your console, and three reports are generated for debugging:

- `*_score.csv`: the overall score; a copy of the console result.
- `*each_data_score.csv`: the score of each example, with three columns - predicted, targets, score - ranked from lowest to highest.
- `*predicted.csv`: a CSV file with three columns - input, predicted, targets.
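Because `*each_data_score.csv` is ranked from lowest to highest, its first rows are the examples the model handled worst, which makes error analysis a one-liner. A minimal sketch, using a hand-made stand-in for the report (the file contents are illustrative; the column order follows the description above):

```bash
# Stand-in for an each_data_score.csv report
# (columns: predicted,targets,score, ranked from lowest to highest).
cat > each_data_score.csv <<'EOF'
negative,positive,0.00
positive,positive,0.87
negative,negative,0.93
EOF

# The first rows are the lowest-scoring (worst-handled) examples.
head -n 1 each_data_score.csv
```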
Hint

nlp2go is a tool for demonstration, with CLI and RESTful interfaces. Check here.
## Example

### Use distilbert to train an NER model

```bash
nlprep --dataset tag_clner --outdir ./clner_row --util s2t
tfkit-train --batch 10 --epoch 3 --lr 5e-6 --train ./clner_row/train --test ./clner_row/test --maxlen 512 --model tag --config distilbert-base-multilingual-cased
nlp2go --model ./checkpoints/3.pt --cli
```
### Use Albert to train a DRCD model

```bash
nlprep --dataset qa_zh --outdir ./zhqa/
tfkit-train --maxlen 512 --savedir ./drcd_qa_model/ --train ./zhqa/drcd-train --test ./zhqa/drcd-test --model qa --config voidful/albert_chinese_small --cache
nlp2go --model ./drcd_qa_model/3.pt --cli
```
### Use Albert to train both DRCD and NER models

```bash
nlprep --dataset tag_clner --outdir ./clner_row --util s2t
nlprep --dataset qa_zh --outdir ./zhqa/
tfkit-train --maxlen 300 --savedir ./mt-qaner --train ./clner_row/train ./zhqa/drcd-train --test ./clner_row/test ./zhqa/drcd-test --model tag qa --config voidful/albert_chinese_small
nlp2go --model ./mt-qaner/3.pt --cli
```
You can also try tfkit in Google Colab.
## Contributing

Thanks for your interest. There are many ways to contribute to this project. Get started here.
## License
## Icons reference

Icons modified from Freepik from www.flaticon.com

Icons modified from Nikita Golubev from www.flaticon.com