German-NER-BERT

German NER on Legal Data using BERT


German NER using BERT

This project consists of the following tasks:

  1. Fine-tune German BERT on Legal Data,
  2. Create a minimal front-end that accepts a German sentence and shows its NER analysis.
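Task 1 is a token-classification problem. Every word in a sentence gets an entity tag, but BERT works on WordPiece sub-tokens, so word-level labels have to be aligned with sub-tokens before fine-tuning. Below is a minimal sketch of one common alignment strategy: each word's label is repeated on every sub-token the word splits into. This strategy is an assumption and not necessarily what german_bert_ner.ipynb does. So that the sketch runs without downloading a model, toy_wordpiece is a hypothetical stand-in for the real tokenizer, and the tag B-GRT is only an illustrative label.

```python
from typing import Callable, List, Tuple


def tokenize_and_preserve_labels(
    words: List[str],
    labels: List[str],
    tokenize: Callable[[str], List[str]],
) -> Tuple[List[str], List[str]]:
    """Split each word into sub-tokens and copy the word's label onto every piece."""
    sub_tokens: List[str] = []
    sub_labels: List[str] = []
    for word, label in zip(words, labels):
        pieces = tokenize(word)
        sub_tokens.extend(pieces)
        sub_labels.extend([label] * len(pieces))  # same label for each piece
    return sub_tokens, sub_labels


def toy_wordpiece(word: str, max_len: int = 6) -> List[str]:
    """Hypothetical stand-in for a WordPiece tokenizer: cuts a word into
    fixed-size chunks and marks continuation chunks with '##'.
    Real WordPiece splits by vocabulary, not by length."""
    if not word:
        return []
    chunks = [word[i:i + max_len] for i in range(0, len(word), max_len)]
    return [chunks[0]] + ["##" + c for c in chunks[1:]]


if __name__ == "__main__":
    words = ["Das", "Bundesarbeitsgericht", "ist", "zuständig"]
    labels = ["O", "B-GRT", "O", "O"]  # illustrative tags, not real annotations
    tokens, tags = tokenize_and_preserve_labels(words, labels, toy_wordpiece)
    for tok, tag in zip(tokens, tags):
        print(f"{tok:<10} {tag}")
```

With a real Hugging Face tokenizer, you would pass its tokenize method as the tokenize argument. A common alternative is to label only the first sub-token of each word and exclude the remaining pieces from the loss.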

2. Execute the minimal front-end

To run this project on localhost, follow these simple steps:

  1. Create a virtual environment with conda:
    conda create -n german_bert_ner python=3.9
    
  2. Activate the virtual environment:
    conda activate german_bert_ner
    
  3. Clone this repo:
    git clone https://github.com/harshildarji/German-NER-BERT.git
    
  4. Change into the repository directory:
    cd German-NER-BERT
    
  5. Install required packages using:
    pip3 install -r requirements.txt
    
  6. Next, you need three files: model.pt, tag_values.pkl, and tokenizer.pkl. You can either generate them yourself by running german_bert_ner.ipynb, which takes about 45-60 minutes, or download the latest versions from my Dropbox:
    wget https://www.dropbox.com/s/vos8pqwmlbqe0wf/model.pt
    wget https://www.dropbox.com/s/u2oojgmmprt0a9d/tag_values.pkl
    wget https://www.dropbox.com/s/uj15pab78emefoq/tokenizer.pkl
    
  7. Once the above-mentioned files are generated or downloaded, run app.py:
    python3 app.py
    
  8. Once app.py is running, open http://localhost:5000/ in your browser.

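  9. In the provided text area, enter a German legal sentence, for example:
     1. Das Bundesarbeitsgericht ist gemäß § 9 Abs. 2 Satz 2 ArbGG iVm. § 201 Abs. 1 Satz 2 GVG für die beabsichtigte Klage gegen den Bund zuständig .
     (English: "Pursuant to § 9 para. 2 sentence 2 ArbGG in conjunction with § 201 para. 1 sentence 2 GVG, the Federal Labour Court has jurisdiction over the intended action against the Federation.")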

  10. Final output:

[Screenshot: German BERT NER Example]
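The front-end presents the analysis word by word, but BERT predicts one tag per WordPiece sub-token. Below is a minimal sketch of how such predictions can be turned into displayable entities. It glues '##' continuation pieces back onto the previous word, keeps the tag of each word's first piece, and then groups BIO tags into entity spans. This is an illustration, not the code in app.py. The token and tag lists are hand-written placeholders rather than model output, and the tag names B-GRT, I-GRT, B-GS, and I-GS are assumed labels.

```python
from typing import List, Optional, Tuple


def merge_subtokens(tokens: List[str], tags: List[str]) -> Tuple[List[str], List[str]]:
    """Glue WordPiece continuation pieces ('##...') back onto the previous word.
    Each word keeps the tag predicted for its first piece."""
    words: List[str] = []
    word_tags: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tok.startswith("##") and words:
            words[-1] += tok[2:]
        else:
            words.append(tok)
            word_tags.append(tag)
    return words, word_tags


def group_entities(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse BIO tags into (entity_type, text) spans; 'O' words are skipped."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_words: List[str] = []

    for word, tag in zip(words, tags):
        if tag.startswith("I-") and tag[2:] == current_type:
            current_words.append(word)  # continue the open span
            continue
        # Any other tag closes the open span, if there is one
        if current_type is not None:
            entities.append((current_type, " ".join(current_words)))
        current_type, current_words = None, []
        if tag.startswith(("B-", "I-")):
            # B- starts a span; a stray I- is treated as a start as well
            current_type, current_words = tag[2:], [word]

    if current_type is not None:
        entities.append((current_type, " ".join(current_words)))
    return entities


if __name__ == "__main__":
    # Hand-written placeholder sub-tokens and tags, NOT real model output
    tokens = ["Das", "Bundes", "##arbeits", "##gericht", "ist", "gemäß",
              "§", "9", "ArbGG", "zuständig", "."]
    tags = ["O", "B-GRT", "I-GRT", "I-GRT", "O", "O",
            "B-GS", "I-GS", "I-GS", "O", "O"]
    words, word_tags = merge_subtokens(tokens, tags)
    for etype, text in group_entities(words, word_tags):
        print(f"{etype}: {text}")
    # prints:
    # GRT: Bundesarbeitsgericht
    # GS: § 9 ArbGG
```

A real pipeline would also drop BERT's special tokens, such as [CLS] and [SEP], before merging. It would also map predicted label indices to tag strings through the list stored in tag_values.pkl, which is assumed here to be a plain list of tag names. Using the first piece's tag is a common convention; majority voting over a word's pieces is an alternative.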

References:

  1. Leitner, Elena, Georg Rehm, and Julián Moreno-Schneider. “A dataset of German legal documents for named entity recognition.” arXiv preprint arXiv:2003.13016 (2020).