Commit 7ec0ab60 authored by Franziska Oschmann

Update of README.md

parent 02c0d772
Merge request !2: Dev train models
## TensorFlow
## PyTorch
```
module load gcc/8.2.0 python_gpu/3.10.4 eth_proxy
python -m venv --system-site-packages pp_env_tf_python310
source pp_env_tf_python310/bin/activate
pip install -r requirements.txt
```
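A quick sanity check after activation (a sketch, not part of the tool itself): the resolved interpreter and `sys.prefix` should both point into the environment directory created above.

```shell
# After activation, the interpreter should resolve inside the env directory.
command -v python
python -c "import sys; print(sys.prefix)"
```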
# Activation of environment
## On Euler
### PyTorch
```
srun --pty --mem-per-cpu=3g --gpus=1 --gres=gpumem:12g bash
module load gcc/8.2.0 python_gpu/3.11.2 eth_proxy
source pp_env_torch/bin/activate
```
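For non-interactive runs, the same resources can be requested through a batch script instead of an interactive `srun` session. This is a sketch only: it assumes the batch job mirrors the interactive flags above, and the training command at the end is an illustrative placeholder.

```shell
#!/bin/bash
#SBATCH --mem-per-cpu=3g
#SBATCH --gpus=1
#SBATCH --gres=gpumem:12g

# Same environment setup as the interactive session above.
module load gcc/8.2.0 python_gpu/3.11.2 eth_proxy
source pp_env_torch/bin/activate

# Illustrative placeholder; replace with the command you want to run.
moderation_classifier --train_bert_torch data/tamedia_for_classifier_v2_preproc.csv
```

Submit with `sbatch job.sh`; resource flags (`--mem-per-cpu`, `--gpus`, `--gres`) carry over unchanged from the `srun` call.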
### TensorFlow
```
srun --pty --mem-per-cpu=3g --gpus=1 --gres=gpumem:12g bash
```
## 2. Model training
### PyTorch
Several options can be chosen for model training:
```
moderation_classifier --train_bert_torch data/tamedia_for_classifier_v2_preproc.csv
Usage: moderation_classifier [OPTIONS] INPUT_DATA
Run moderation classifier.
:param split_data: Binary flag to specify if data should be split.
:param prepare_data: Binary flag to specify if data should be prepared.
:param text_preprocessing: Binary flag to set text preprocessing.
:param newspaper: Name of newspaper selected for training.
:param topic: Topic selected for training.
:param train_mnb: Binary flag to specify whether MNB should be trained.
:param train_bert: Binary flag to specify whether BERT should be trained.
:param eval_mnb: Binary flag to specify whether MNB should be evaluated.
:param eval_bert: Binary flag to specify whether BERT should be evaluated.
:param input_data: Path to input dataframe.
Options:
-s, --split
-p, --prepare_data
-tp, --text_preprocessing
-n, --newspaper TEXT
-t, --topic TEXT
-tm, --train_mnb
-tb, --train_bert
-em, --eval_mnb
-eb, --eval_bert
-tbto, --train_bert_torch
```
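The filter options can be combined with the PyTorch trainer as well. Assuming `--newspaper` and `--topic` behave the same way as for the other trainers listed above, a filtered run would look like:

```shell
# Train the PyTorch BERT model on a single newspaper and topic.
moderation_classifier --newspaper tagesanzeiger --topic Wissen --train_bert_torch INPUT_DATA
```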
### TensorFlow
The most important options during training are the model type (MNB or BERT) and the newspaper and topic selected for training.
### MNB
Training for all newspapers and topics is started with the following command:
```
moderation_classifier --train_mnb INPUT_DATA
```
Training for a single newspaper (here: tagesanzeiger) and a single topic (here: Wissen) is started with the following command:
```
moderation_classifier --newspaper tagesanzeiger --topic Wissen --train_mnb INPUT_DATA
```
After training finishes, a log file with all relevant information (path to the training data, filtering parameters, ...) is stored in `saved_models/MNB_logs`. Only the path to this log file is needed to evaluate the training run. The evaluation is started with:
```
moderation_classifier --eval_mnb LOG_FILE
```
### BERT
Training for all newspapers and topics is started with the following command:
```
moderation_classifier --train_bert INPUT_DATA
```
Training for a single newspaper (here: tagesanzeiger) and a single topic (here: Wissen) is started with the following command:
```
moderation_classifier --newspaper tagesanzeiger --topic Wissen --train_bert INPUT_DATA
```
After training finishes, a log file with all relevant information (path to the training data, filtering parameters, ...) is stored in `saved_models/BERT_logs`. Only the path to this log file is needed to evaluate the training run. The evaluation is started with:
```
moderation_classifier --eval_bert LOG_FILE
```
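Putting the steps together, a minimal end-to-end sketch (assuming log files land in `saved_models/BERT_logs` as described above; the automatic selection of the newest log file is an illustrative convenience, not part of the CLI):

```shell
# Train BERT on the preprocessed data.
moderation_classifier --train_bert data/tamedia_for_classifier_v2_preproc.csv

# Pick the most recently written log file and evaluate that run.
LOG_FILE=$(ls -t saved_models/BERT_logs/* | head -n 1)
moderation_classifier --eval_bert "$LOG_FILE"
```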