Commit 7ec0ab60 authored by Franziska Oschmann

Update of README.md

parent 02c0d772
Merge request !2: Dev train models
## TensorFlow
## PyTorch
```
module load gcc/8.2.0 python_gpu/3.10.4 eth_proxy
python -m venv --system-site-packages pp_env_tf_python310
source pp_env_tf_python310/bin/activate
pip install -r requirements.txt
```
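A quick sanity check after activation (a sketch, not part of the tool itself): the resolved interpreter and `sys.prefix` should both point into the environment directory created above.

```shell
# After activation, the interpreter should resolve inside the env directory.
command -v python
python -c "import sys; print(sys.prefix)"
```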
# Activation of environment
## On Euler
### PyTorch
```
srun --pty --mem-per-cpu=3g --gpus=1 --gres=gpumem:12g bash
module load gcc/8.2.0 python_gpu/3.11.2 eth_proxy
source pp_env_torch/bin/activate
```
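For non-interactive runs, the same resources can be requested through a batch script instead of an interactive `srun` session. This is a sketch only: it assumes the batch job mirrors the interactive flags above, and the training command at the end is an illustrative placeholder.

```shell
#!/bin/bash
#SBATCH --mem-per-cpu=3g
#SBATCH --gpus=1
#SBATCH --gres=gpumem:12g

# Same environment setup as the interactive session above.
module load gcc/8.2.0 python_gpu/3.11.2 eth_proxy
source pp_env_torch/bin/activate

# Illustrative placeholder; replace with the command you want to run.
moderation_classifier --train_bert_torch data/tamedia_for_classifier_v2_preproc.csv
```

Submit with `sbatch job.sh`; resource flags (`--mem-per-cpu`, `--gpus`, `--gres`) carry over unchanged from the `srun` call.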
### TensorFlow
```
srun --pty --mem-per-cpu=3g --gpus=1 --gres=gpumem:12g bash
```
## 2. Model training
### PyTorch
Several options can be chosen for model training:
```
moderation_classifier --train_bert_torch data/tamedia_for_classifier_v2_preproc.csv
Usage: moderation_classifier [OPTIONS] INPUT_DATA
Run moderation classifier.
:param split_data: Binary flag to specify if data should be split.
:param prepare_data: Binary flag to specify if data should be prepared.
:param text_preprocessing: Binary flag to set text preprocessing.
:param newspaper: Name of newspaper selected for training.
:param topic: Topic selected for training.
:param train_mnb: Binary flag to specify whether MNB should be trained.
:param train_bert: Binary flag to specify whether BERT should be trained.
:param eval_mnb: Binary flag to specify whether MNB should be evaluated.
:param eval_bert: Binary flag to specify whether BERT should be evaluated.
:param input_data: Path to input dataframe.
Options:
-s, --split
-p, --prepare_data
-tp, --text_preprocessing
-n, --newspaper TEXT
-t, --topic TEXT
-tm, --train_mnb
-tb, --train_bert
-em, --eval_mnb
-eb, --eval_bert
-tbto, --train_bert_torch
```
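The filter options can be combined with the PyTorch trainer as well. Assuming `--newspaper` and `--topic` behave the same way as for the other trainers listed above, a filtered run would look like:

```shell
# Train the PyTorch BERT model on a single newspaper and topic.
moderation_classifier --newspaper tagesanzeiger --topic Wissen --train_bert_torch INPUT_DATA
```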
### TensorFlow
The most important options during training are the model type (MNB or BERT) and the newspaper and topic selected for training.
### MNB
Training for all newspapers and topics is started with the following command:
```
moderation_classifier --train_mnb INPUT_DATA
```
Training for a single newspaper (here: tagesanzeiger) and a single topic (here: Wissen) is started with the following command:
```
moderation_classifier --newspaper tagesanzeiger --topic Wissen --train_mnb INPUT_DATA
```
After training finishes, a log file with all relevant information (path to the training data, filtering parameters, ...) is stored in `saved_models/MNB_logs`. Only the path to this log file is needed to evaluate the training run. The evaluation is started with:
```
moderation_classifier --eval_mnb LOG_FILE
```
### BERT
Training for all newspapers and topics is started with the following command:
```
moderation_classifier --train_bert INPUT_DATA
```
Training for a single newspaper (here: tagesanzeiger) and a single topic (here: Wissen) is started with the following command:
```
moderation_classifier --newspaper tagesanzeiger --topic Wissen --train_bert INPUT_DATA
```
After training finishes, a log file with all relevant information (path to the training data, filtering parameters, ...) is stored in `saved_models/BERT_logs`. Only the path to this log file is needed to evaluate the training run. The evaluation is started with:
```
moderation_classifier --eval_bert LOG_FILE
```
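Putting the steps together, a minimal end-to-end sketch (assuming log files land in `saved_models/BERT_logs` as described above; the automatic selection of the newest log file is an illustrative convenience, not part of the CLI):

```shell
# Train BERT on the preprocessed data.
moderation_classifier --train_bert data/tamedia_for_classifier_v2_preproc.csv

# Pick the most recently written log file and evaluate that run.
LOG_FILE=$(ls -t saved_models/BERT_logs/* | head -n 1)
moderation_classifier --eval_bert "$LOG_FILE"
```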