diff --git a/README.md b/README.md
index 9d42efacb5b654a9b2e7e13d9f0818323d39fabc..2c9236db006335d3b89e2df0b0614923e2da1085 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,12 @@ pip install -r requirements.txt
 
 ## Tensorflow
 
-## PyTorch
+```
+module load gcc/8.2.0 python_gpu/3.10.4 eth_proxy
+python -m venv --system-site-packages pp_env_tf_python310
+source pp_env_tf_python310/bin/activate
+pip install -r requirements.txt
+```
 
 # Activation of environment
 
@@ -22,13 +27,6 @@ source pp_env/bin/activate
 
 ## On Euler
 
-### PyTorch
-```
-srun --pty --mem-per-cpu=3g --gpus=1 --gres=gpumem:12g bash
-module load gcc/8.2.0 python_gpu/3.11.2 eth_proxy
-source pp_env_torch/bin/activate
-```
-
 ### TensorFlow
 ```
 srun --pty --mem-per-cpu=3g --gpus=1 --gres=gpumem:12g bash
@@ -45,12 +43,79 @@ moderation_classifier --prepare_data path_to_csv
 
 ## 2. Model training
 
-### PyTorch
+For model training, several options can be chosen:
+
 ```
-moderation_classifier --train_bert_torch data/tamedia_for_classifier_v2_preproc.csv
+Usage: moderation_classifier [OPTIONS] INPUT_DATA
+
+  Run moderation classifier.
+  :param split_data: Binary flag to specify if data should be split.
+  :param prepare_data: Binary flag to specify if data should be prepared.
+  :param text_preprocessing: Binary flag to set text preprocessing.
+  :param newspaper: Name of newspaper selected for training.
+  :param topic: Topic selected for training.
+  :param train_mnb: Binary flag to specify whether MNB should be trained.
+  :param train_bert: Binary flag to specify whether BERT should be trained.
+  :param eval_mnb: Binary flag to specify whether MNB should be evaluated.
+  :param eval_bert: Binary flag to specify whether BERT should be evaluated.
+  :param input_data: Path to input dataframe.
+
+Options:
+  -s, --split
+  -p, --prepare_data
+  -tp, --text_preprocessing
+  -n, --newspaper TEXT
+  -t, --topic TEXT
+  -tm, --train_mnb
+  -tb, --train_bert
+  -em, --eval_mnb
+  -eb, --eval_bert
+  -tbto, --train_bert_torch
 ```
 
-### TensorFlow
+The most important options during training are the model type (MNB or BERT) and the newspaper and topic selected for training.
+
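+The commands below show the newspaper and topic filters combined with a model flag; assuming `--text_preprocessing` composes with them in the same way (an assumption, not verified against the CLI), a filtered training run with preprocessing would look like:
+
+```
+moderation_classifier --text_preprocessing --newspaper tagesanzeiger --topic Wissen --train_bert INPUT_DATA
+```
+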
+### MNB
+Training for all newspapers and topics is started with the following command:
+```
+moderation_classifier --train_mnb INPUT_DATA
+```
+
+Training for one newspaper (here: tagesanzeiger) and one topic (here: Wissen) is started with the following command:
+```
+moderation_classifier --newspaper tagesanzeiger --topic Wissen --train_mnb INPUT_DATA
+```
+
+After training has finished, a log file with all relevant information (path to the training data, parameters used for filtering, ...) is stored in `saved_models/MNB_logs`. Only the path to this log file is needed to evaluate the training run. The evaluation is started with:
+```
+moderation_classifier --eval_mnb LOG_FILE
+```
+
+### BERT
+Training for all newspapers and topics is started with the following command:
+```
+moderation_classifier --train_bert INPUT_DATA
+```
+
+Training for one newspaper (here: tagesanzeiger) and one topic (here: Wissen) is started with the following command:
+```
+moderation_classifier --newspaper tagesanzeiger --topic Wissen --train_bert INPUT_DATA
+```
+
+After training has finished, a log file with all relevant information (path to the training data, parameters used for filtering, ...) is stored in `saved_models/BERT_logs`. Only the path to this log file is needed to evaluate the training run. The evaluation is started with:
 ```
-moderation_classifier --train_bert data/tamedia_for_classifier_v2_preproc.csv
+moderation_classifier --eval_bert LOG_FILE
 ```
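+
+Putting the steps together, a sketch of a minimal end-to-end MNB run (assuming the log file produced by the training step is the one passed to evaluation; `path_to_csv`, `INPUT_DATA`, and `LOG_FILE` are placeholders):
+```
+moderation_classifier --prepare_data path_to_csv
+moderation_classifier --train_mnb INPUT_DATA
+moderation_classifier --eval_mnb LOG_FILE
+```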