
Getting started

Installing via pip

pip install tfkit
  • You can use tfkit for model training and evaluation with tfkit-train and tfkit-eval.
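To confirm that the installation succeeded and that the entry points are on your PATH, you can print their built-in help (the same -h flag mentioned in the steps below):

tfkit-train -h
tfkit-eval -h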

Running TFKit on the task you want

First step - prepare your dataset

The key to combining different tasks is to use the same data format for every task.

Notice

  • All data is in csv format - tfkit uses csv for every task. A file normally has two columns: the first column is the model input and the second column is the model output.
  • Plain text with no tokenization - there is no need to tokenize text before training or to redo tokenization afterwards; tfkit will handle it for you.
  • No header row is needed.

For example, a sentiment classification dataset will look like:

how dare you,negative
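A complete training file is just more rows in the same two-column shape, one example per line (the rows below are made-up illustrations, not from a real dataset):

how dare you,negative
what a wonderful day,positive
this is awful,negative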

Hint

For the details and example formats of the different tasks, you can check here.

Hint

nlprep is a tool for data splitting/preprocessing/augmentation; it can help you create ready-to-train data for tfkit. Check here.

Second step - model training

Use tfkit-train for model training.

Before training a model, there are a few things you need to decide:

  • --model which model should handle this task? Check here for the details of the available models.
  • --config which pretrained model do you want to use? You can search https://huggingface.co/models for available pretrained models.
  • --train and --test the paths of the training and testing datasets, both in csv format.
  • --savedir the model saving directory; it defaults to the '/checkpoints' folder.

You can leave the rest at the default configuration, or use tfkit-train -h to see more configuration options.

An example of training a sentiment classifier:

tfkit-train \
--model clas \
--config xlm-roberta-base \
--train training_data.csv \
--test testing_data.csv \
--lr 4e-5 \
--maxlen 384 \
--epoch 10 \
--savedir roberta_sentiment_classifier

Third step - model evaluation

Use tfkit-eval for model evaluation.
- --model the saved model's path.
- --metric the evaluation metric, eg: emf1, nlg (BLEU/ROUGE), clas (confusion matrix).
- --valid the validation data, also in csv format.
- --panel an input panel for model-specific parameters.
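For example, to evaluate the sentiment classifier trained above (the checkpoint name 10.pt is an assumption, following the epoch-numbered checkpoints used in the examples below):

tfkit-eval \
--model roberta_sentiment_classifier/10.pt \
--metric clas \
--valid validation_data.csv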

For more configuration details, you may use tfkit-eval -h.

After evaluation, it will print the results in your console and also generate three reports for debugging:
- *_score.csv the overall score; it is a copy of the console result.
- *each_data_score.csv the score for each data point, with 3 columns predicted, targets, score, ranked from the lowest to the highest.
- *predicted.csv a csv file with 3 columns input, predicted, targets.

Hint

nlp2go is a tool for demonstration, with both a CLI and a RESTful interface. Check here.

Example

Use DistilBERT to train an NER model

nlprep --dataset tag_clner  --outdir ./clner_row --util s2t
tfkit-train --batch 10 --epoch 3 --lr 5e-6 --train ./clner_row/train --test ./clner_row/test --maxlen 512 --model tag --config distilbert-base-multilingual-cased 
nlp2go --model ./checkpoints/3.pt  --cli     

Use ALBERT to train a DRCD model

nlprep --dataset qa_zh --outdir ./zhqa/   
tfkit-train --maxlen 512 --savedir ./drcd_qa_model/ --train ./zhqa/drcd-train --test ./zhqa/drcd-test --model qa --config voidful/albert_chinese_small  --cache
nlp2go --model ./drcd_qa_model/3.pt --cli 

Use ALBERT to train both a DRCD model and an NER model

nlprep --dataset tag_clner  --outdir ./clner_row --util s2t
nlprep --dataset qa_zh --outdir ./zhqa/ 
tfkit-train --maxlen 300 --savedir ./mt-qaner --train ./clner_row/train ./zhqa/drcd-train --test ./clner_row/test ./zhqa/drcd-test --model tag qa --config voidful/albert_chinese_small
nlp2go --model ./mt-qaner/3.pt --cli 

You can also try tfkit in Google Colab.

Contributing

Thanks for your interest. There are many ways to contribute to this project. Get started here.

License

See the project's PyPI page for license information.

Icons reference

Icons modified from Freepik, www.flaticon.com
Icons modified from Nikita Golubev, www.flaticon.com