Is Named Entity Recognition a “solved problem”?
You know that feeling you get when a computer acts even the slightest bit human? I felt it the first time I realized my computer could recognize the names of people, locations, and companies in text. This is called Named Entity Recognition (NER). The accuracy and reliability of NER vary with the trained language models and the domain context. Some call NER a “solved problem,” while others say it is far from solved. I think it depends on user expectations, the purpose for using it, and the quality of the models used for NER tasks.
Quickly testing NER across multiple domain contexts
I put together this short tutorial as a demonstration of Stanford’s NER Server and Ruby. To quickly test NER tasks across a variety of domain contexts, we’ll use web URLs as data sources for processing.
Getting started
Clone ‘ruby-ner’ from GitHub
$ git clone https://github.com/mblongii/ruby-ner.git
$ cd ruby-ner
Download the Stanford Named Entity Recognizer (NER) software
$ curl -O http://nlp.stanford.edu/software/stanford-ner-2012-04-07.tgz
$ tar xvfz stanford-ner-2012-04-07.tgz
Run NER as a server on port 8080
$ java -mx1000m -cp stanford-ner-2012-04-07/stanford-ner.jar edu.stanford.nlp.ie.NERServer -loadClassifier stanford-ner-2012-04-07/classifiers/english.muc.7class.distsim.crf.ser.gz -port 8080 -outputFormat inlineXML &
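Once the server is up, you can check that it responds before running anything else. Here is a minimal sketch using only Ruby's standard library. It assumes the server reads one line of text per connection, writes back the tagged text, and then closes the connection; the helper name `tag_text` and its keyword arguments are my own and are not part of ruby-ner.

```ruby
require "socket"

# Hypothetical helper (not from ruby-ner): send one line of text to the
# NER server and return whatever it writes back.
# Assumption: the server handles one line per connection and then closes it.
def tag_text(text, host: "localhost", port: 8080)
  socket = TCPSocket.new(host, port)
  # Collapse newlines and repeated whitespace so the text goes out as one line.
  socket.puts(text.gsub(/\s+/, " "))
  # Read until the server closes the connection.
  socket.read
ensure
  socket&.close
end

# Example call (needs the server from the previous step to be running):
# puts tag_text("Mike Long moved to San Francisco on May 29th.")
```

With `-outputFormat inlineXML`, the returned string should be the input text with entities wrapped in tags such as `<PERSON>...</PERSON>`.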
Install the required Ruby gems
$ bundle install
Run the Ruby script
$ ruby get_named_entities.rb
LOCATIONS:
San Francisco
ORGANIZATIONS:
PEOPLE:
Mike Long
DATES:
May 29th
Try passing a URL to the script
$ ruby get_named_entities.rb http://cnn.com
LOCATIONS:
Illinois
ORGANIZATIONS:
FBI
PERCENTS:
-0.59 %
MONEY:
$90M
PEOPLE:
Estrella Carrera
DATES:
Saturday
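If you are curious how a web page might be reduced to plain text before tagging, here is a rough sketch using only the standard library. This is an illustration and not necessarily how get_named_entities.rb does it; `html_to_text` and `fetch_text` are hypothetical names, and the regex-based tag stripping is a shortcut that a real HTML parser would handle more reliably.

```ruby
require "net/http"
require "uri"
require "cgi"

# Hypothetical helper: crude HTML-to-text conversion with regular expressions.
def html_to_text(html)
  # Drop script and style elements together with their contents.
  text = html.gsub(%r{<(script|style)\b.*?</\1>}mi, " ")
  # Replace every remaining tag with a space.
  text = text.gsub(/<[^>]+>/, " ")
  # Decode entities such as &amp; and collapse runs of whitespace.
  CGI.unescapeHTML(text).gsub(/\s+/, " ").strip
end

# Hypothetical helper: download a page and return its visible text.
# Note: Net::HTTP.get does not follow redirects.
def fetch_text(url)
  html_to_text(Net::HTTP.get(URI(url)))
end

# Example call (needs network access):
# puts fetch_text("http://cnn.com")[0, 200]
```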
How it works
The NER server loads the model english.muc.7class.distsim.crf.ser.gz, which was trained on a variety of corpora and is fairly robust across domains. The entity classes in this seven-class model are location, time, organization, percent, money, person, and date.
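Because the server was started with `-outputFormat inlineXML`, each entity comes back wrapped in a tag named after its class. Grouping entities by class can be sketched as below. The helper name `group_entities` is hypothetical, the sample string is hand-written for illustration and is not captured server output, and the uppercase tag names are an assumption based on the seven classes listed above.

```ruby
# Hypothetical helper: collect entities from inlineXML-tagged text,
# grouped by tag name, keeping the first occurrence of each entity.
def group_entities(tagged_text)
  entities = Hash.new { |hash, key| hash[key] = [] }
  # Match <TAG>text</TAG>; \1 forces the closing tag to match the opening one.
  tagged_text.scan(%r{<([A-Z]+)>(.*?)</\1>}m) do |tag, text|
    value = text.strip
    entities[tag] << value unless entities[tag].include?(value)
  end
  entities
end

# Hand-written sample in the inlineXML style (assumed tag names).
sample = "<PERSON>Mike Long</PERSON> moved to " \
         "<LOCATION>San Francisco</LOCATION> on <DATE>May 29th</DATE>."

group_entities(sample).each do |tag, values|
  puts "#{tag}:"
  values.each { |value| puts value }
end
```

A regular expression is enough here because the tags are flat and never nested; anything more complex would be better served by an XML parser.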
Take a look at get_named_entities.rb, and feel free to give feedback or ask questions.
What are some interesting uses of Named Entity Recognition?
Awesome. Tried it out. Worked nicely on a corpus of posts from Facebook. However, Organizations and People did get confused. Great for a beginner, tho.