Gravitee BERT PII (Personally Identifiable Information extraction)
This application uses the gravitee-io/bert-small-pii-detection model for Named Entity Recognition (NER) to detect personally identifiable information. The model uses token classification with BIO tagging to identify predefined entity types including names, addresses, financial information, and more.
The model can detect the following entity types:
Personal Information:
- PERSON (names)
- AGE
- PHONE_NUMBER
- EMAIL_ADDRESS
Location & Address:
- LOCATION
- COORDINATE
Financial:
- CREDIT_CARD
- IBAN_CODE
- FINANCIAL
- US_BANK_NUMBER
Government IDs:
- US_SSN (Social Security Number)
- US_DRIVER_LICENSE
- US_PASSPORT
- US_ITIN
- US_LICENSE_PLATE
- NRP (National Registration Number)
Technical:
- IP_ADDRESS
- MAC_ADDRESS
- URL
- IMEI
- PASSWORD
Other:
- DATE_TIME
- ORGANIZATION
- TITLE
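To make the BIO tagging scheme concrete, here is a minimal sketch of how a label set is typically derived from the entity types listed above: one "O" (outside) label plus a "B-" (begin) and "I-" (inside) label per type. The names ENTITY_TYPES and bio_labels are illustrative, and the ordering is an assumption; the model's actual label-to-id mapping is whatever its configuration (id2label) defines.

```python
# Entity types copied from the list above; the order here is arbitrary.
ENTITY_TYPES = [
    "PERSON", "AGE", "PHONE_NUMBER", "EMAIL_ADDRESS",
    "LOCATION", "COORDINATE",
    "CREDIT_CARD", "IBAN_CODE", "FINANCIAL", "US_BANK_NUMBER",
    "US_SSN", "US_DRIVER_LICENSE", "US_PASSPORT", "US_ITIN",
    "US_LICENSE_PLATE", "NRP",
    "IP_ADDRESS", "MAC_ADDRESS", "URL", "IMEI", "PASSWORD",
    "DATE_TIME", "ORGANIZATION", "TITLE",
]


def bio_labels(entity_types):
    """Return the BIO label set: 'O' plus B-/I- variants of each type."""
    labels = ["O"]  # tokens that are not part of any entity
    for entity in entity_types:
        labels.append(f"B-{entity}")  # first token of an entity
        labels.append(f"I-{entity}")  # continuation token of the same entity
    return labels


LABELS = bio_labels(ENTITY_TYPES)
print(len(LABELS), LABELS[:5])
```

Under this scheme a two-token name such as "John Doe" would be tagged B-PERSON, I-PERSON, and surrounding words would be tagged O.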
Installation
To use this model, install the required dependencies:
pip install transformers "optimum[onnxruntime]" torch
Usage
Load the model using the Optimum library for ONNX Runtime:
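from optimum.onnxruntime import ORTModelForTokenClassification
from transformers import AutoTokenizer
model_path = "gravitee-io/bert-small-pii-detection"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = ORTModelForTokenClassification.from_pretrained(model_path, file_name="model.onnx")
text = "John Doe lives at 123 Main St and his email is john@example.com"
# Tokenize; the offsets map each token back to character positions in `text`
inputs = tokenizer(text, return_tensors="pt", return_offsets_mapping=True)
# offset_mapping is not a model input, so remove it before the forward pass
offset_mapping = inputs.pop("offset_mapping")
outputs = model(**inputs)
# outputs.logits has shape (batch, sequence_length, num_labels)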
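The forward pass returns per-token scores, not entities. A minimal sketch of the remaining step, assuming standard BIO labels: merge consecutive B-/I- tokens of the same type into character spans using the tokenizer offsets. The helper merge_bio_spans is hypothetical, and the offsets and labels in the example are hand-written for illustration, not real model output. With the real model, the labels would come from something like outputs.logits.argmax(-1) mapped through model.config.id2label.

```python
def merge_bio_spans(text, offsets, labels):
    """Merge per-token BIO labels into (entity_type, start, end, surface) tuples."""
    spans = []
    current = None  # [entity_type, start, end] of the span being built

    for (start, end), label in zip(offsets, labels):
        if start == end:
            # Special tokens such as [CLS] and [SEP] have empty offsets.
            continue
        prefix, _, entity = label.partition("-")
        if prefix == "I" and current is not None and current[0] == entity:
            current[2] = end  # extend the open span
            continue
        if current is not None:
            spans.append(tuple(current))  # close the previous span
            current = None
        if prefix in ("B", "I"):
            # A stray I- tag with no matching open span is treated as a start.
            current = [entity, start, end]

    if current is not None:
        spans.append(tuple(current))
    return [(entity, s, e, text[s:e]) for entity, s, e in spans]


# Hand-written example (not model output), including special-token offsets.
example_text = "John Doe emailed john@example.com"
example_offsets = [(0, 0), (0, 4), (5, 8), (9, 16), (17, 33), (0, 0)]
example_labels = ["O", "B-PERSON", "I-PERSON", "O", "B-EMAIL_ADDRESS", "O"]

for entity in merge_bio_spans(example_text, example_offsets, example_labels):
    print(entity)
```

Working with character offsets rather than token strings keeps the extracted text identical to the input, regardless of how the tokenizer splits words into subword pieces.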