
Sinatra Drinks Coffee: Server-Sent Events w/ Sinatra, Thin, and CoffeeScript

While experimenting with real-time digital collaboration, I’ve been exploring server-sent events (SSE) with Sinatra, a Ruby DSL for web development, backed by Thin, a lightweight web server. Check out a live demo at sinatra-drinks-coffee.herokuapp.com (open it in multiple browser windows to see the full effect), then explore the code on GitHub. It’s very straightforward, but feel free to ask questions in the comments.

November 1964 --- American singer and actor Frank Sinatra. --- Image by © John Bryson/Sygma/Corbis


People are more likely to stay engaged in digital collaboration if they can observe each other’s actions in real time. For that to happen, objects and their states must rapidly synchronize across all connected and authorized clients. This has traditionally been complicated because the web is stateless: HTTP was designed to treat data as independent pairs of requests and responses, not as continuous bi-directional streams. Real-time collaboration, however, needs something that at least behaves like a persistent connection between the server and its clients.
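
One common way to keep every connected client in sync over SSE is for the server to keep a list of open streams and write each state change to all of them. The sketch below is a hypothetical Broadcaster class, not the code from the repo. It assumes each connection is an object that responds to <<, as the stream object yielded by Sinatra's stream(:keep_open) does. It also guards the list with a Mutex, on the assumption that connections may open and close on different threads.

```ruby
require 'json'

# Hypothetical fan-out helper: remembers every open SSE connection and
# writes each state change to all of them. A connection is anything that
# responds to #<< (assumption: e.g. the `out` object yielded by Sinatra's
# `stream(:keep_open)`).
class Broadcaster
  def initialize
    @connections = []
    @lock = Mutex.new # connections may open and close on different threads
  end

  def subscribe(conn)
    @lock.synchronize { @connections << conn }
  end

  # Compares by identity so two connections with equal contents are
  # never removed together.
  def unsubscribe(conn)
    @lock.synchronize { @connections.delete_if { |c| c.equal?(conn) } }
  end

  def size
    @lock.synchronize { @connections.size }
  end

  # Sends the payload as one SSE message: "data: <json>" plus a blank line.
  def broadcast(payload)
    message = "data: #{JSON.generate(payload)}\n\n"
    targets = @lock.synchronize { @connections.dup } # write outside the lock
    targets.each { |conn| conn << message }
  end
end

# Usage: mutable strings stand in for two browsers' streams.
alice = +''
bob = +''
hub = Broadcaster.new
hub.subscribe(alice)
hub.subscribe(bob)
hub.broadcast({ id: 'cup-1', x: 40, y: 120 })
```

Inside a Sinatra route, a connection would be subscribed within the stream(:keep_open) block and dropped with out.callback { hub.unsubscribe(out) } when the browser disconnects.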

Even more fun than twisting the original architecture of the internet is reconciling the conflicts that arise when more than one person changes a document’s state at the same time. If you have ever used Google Docs, you have used a web service that supports real-time collaboration using operational transformation (OT), in which control (integration) algorithms decide how concurrent edits are transformed and merged.
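
To make operational transformation a little more concrete, here is a deliberately tiny sketch that covers only two concurrent insertions into the same string. It is not Google's algorithm, since real OT systems also handle deletions and many more cases, and the Insert, apply, and transform names are hypothetical. Each client applies its own edit first and then the other user's edit, transformed so it still lands in the intended place. When both inserts target the same position, a site id breaks the tie so both clients make the same choice.

```ruby
# Hypothetical insert operation: put `text` at character index `pos`.
# `site` identifies the user and is only used to break ties.
Insert = Struct.new(:pos, :text, :site)

def apply(doc, op)
  doc.dup.insert(op.pos, op.text)
end

# Rewrites `op` so it can be applied after `other` has already been applied.
# If `other` inserted text before op's position (or at the same position
# with a lower site id), op must shift right by the length of that text.
def transform(op, other)
  if other.pos < op.pos || (other.pos == op.pos && other.site < op.site)
    Insert.new(op.pos + other.text.length, op.text, op.site)
  else
    op
  end
end

doc = 'Sinatra drinks coffee'
a = Insert.new(0, 'Frank ', 1)     # user 1 prepends a name
b = Insert.new(doc.length, '!', 2) # user 2 appends punctuation

# Each client applies its own edit first, then the transformed remote edit.
site1 = apply(apply(doc, a), transform(b, a))
site2 = apply(apply(doc, b), transform(a, b))
```

Both clients converge on the same document even though they applied the two edits in opposite orders, which is the property OT is designed to guarantee.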

Despite the challenges and limitations of real-time collaboration, I anticipate that over the next few years people will expect these tools to support an ever-expanding range of activities, especially in engineering, design, and architecture.

Here’s a good explanation of how Google reconciles conflicts in Google Docs: http://googledocs.blogspot.com/2010/09/whats-different-about-new-google-docs_22.html

Server-Sent Events vs. WebSockets
Comprehensive explanation: http://stackoverflow.com/questions/5195452/websockets-vs-server-sent-events-eventsource/5326159#5326159
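
Part of what makes SSE simple is that, unlike WebSockets, it needs no separate protocol. The server keeps an ordinary HTTP response open with the content type text/event-stream and writes plain-text messages. Each message is a set of "field: value" lines ended by a blank line. The helper below is a hypothetical sketch of that framing. The id, event, and one-data-line-per-payload-line rules come from the event stream format, while the method name and its arguments are illustrative.

```ruby
# Hypothetical helper that formats one server-sent event as text.
# Each payload line gets its own "data:" field, since a raw newline would
# otherwise end the field early; a blank line terminates the event.
def sse_frame(data, event: nil, id: nil)
  lines = []
  lines << "id: #{id}" if id
  lines << "event: #{event}" if event
  data.to_s.split("\n", -1).each { |line| lines << "data: #{line}" }
  lines.join("\n") + "\n\n"
end

frame = sse_frame('{"x":40}', event: 'move', id: 7)
# frame == "id: 7\nevent: move\ndata: {\"x\":40}\n\n"
```

In the browser, an EventSource listening for "move" events would receive {"x":40} as the event's data. On the Sinatra side, frames like this would be written to a stream(:keep_open) response after calling content_type 'text/event-stream'.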

Named Entity Recognition with Stanford NER and Ruby

Is Named Entity Recognition a “solved problem”?

You know that feeling you get when a computer acts even the slightest bit human? I felt it the first time I realized my computer could recognize people’s names, place names, and company names in text. It’s called Named Entity Recognition (NER). The accuracy and reliability of NER vary with the trained models and the domain of the text being processed. Some call NER a “solved problem” and others say it is far from solved. I think that depends on user expectations, the purpose NER is used for, and the quality of the models used for NER tasks.

Quickly testing NER across multiple domain contexts

I put together this short tutorial to demonstrate using Stanford’s NER server from Ruby. To quickly test NER across a variety of domain contexts, we’ll use web pages, given by URL, as data sources.

Getting started

Clone ‘ruby-ner’ from GitHub
$ git clone https://github.com/mblongii/ruby-ner.git
$ cd ruby-ner

Download the Stanford Named Entity Recognizer (NER) software
$ curl -O http://nlp.stanford.edu/software/stanford-ner-2012-04-07.tgz
$ tar xvfz stanford-ner-2012-04-07.tgz

Run NER as a server on port 8080
$ java -mx1000m -cp stanford-ner-2012-04-07/stanford-ner.jar edu.stanford.nlp.ie.NERServer -loadClassifier stanford-ner-2012-04-07/classifiers/english.muc.7class.distsim.crf.ser.gz -port 8080 -outputFormat inlineXML &
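
The NER server listens on a plain TCP socket, so a Ruby client needs nothing beyond the standard library. The sketch below is hypothetical because the repo's own client code isn't reproduced here. It assumes the server reads one line of text per connection, replies with that text tagged in the -outputFormat inlineXML style (for example <PERSON>Mike Long</PERSON>), and then closes the connection. Whitespace, including newlines, is collapsed so a multi-line document is sent as a single line.

```ruby
require 'socket'

# Hypothetical client for Stanford's NERServer. Assumed protocol: send one
# line of text, then read the tagged text until the server closes the socket.
def tag_text(text, host: 'localhost', port: 8080)
  line = text.gsub(/\s+/, ' ').strip # collapse newlines: one line per request
  TCPSocket.open(host, port) do |sock|
    sock.write(line + "\n")
    sock.close_write                  # tell the server nothing more is coming
    sock.read.force_encoding('UTF-8') # assumes the server replies in UTF-8
  end
end

# Usage, with the server from the previous step running on port 8080:
# puts tag_text('Mike Long works at Google in San Francisco.')
```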

Install the required Ruby gems
$ bundle install

Run the Ruby script
$ ruby get_named_entities.rb

LOCATIONS:
San Francisco
ORGANIZATIONS:
Google
PEOPLE:
Mike Long
DATES:
May 29th

Try passing a URL to the script
$ ruby get_named_entities.rb http://cnn.com

LOCATIONS:
Illinois
ORGANIZATIONS:
FBI
PERCENTS:
-0.59 %
MONEY:
$90M
PEOPLE:
Estrella Carrera
DATES:
Saturday
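
Passing a URL means the page has to be turned into plain text before it can be tagged. Below is a hypothetical sketch of that step that uses only the standard library, since the repo's actual approach isn't shown here. Net::HTTP.get downloads the page. Regular expressions then drop script and style blocks and comments, strip the remaining tags, and decode a handful of common entities. Regexes are a crude stand-in for a real HTML parser, but they are enough for quick tests across domains.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helpers: fetch a page and reduce its HTML to visible text.
# Only these few entities are decoded (an assumption that they cover most cases).
ENTITIES = {
  '&amp;' => '&', '&lt;' => '<', '&gt;' => '>',
  '&quot;' => '"', '&#39;' => "'", '&nbsp;' => ' '
}.freeze

def visible_text(html)
  html
    .gsub(%r{<(script|style)\b.*?</\1>}mi, ' ')      # drop scripts and stylesheets
    .gsub(/<!--.*?-->/m, ' ')                         # drop HTML comments
    .gsub(/<[^>]+>/, ' ')                             # strip the remaining tags
    .gsub(/&(?:amp|lt|gt|quot|#39|nbsp);/, ENTITIES)  # decode common entities
    .gsub(/\s+/, ' ')                                 # collapse whitespace
    .strip
end

# Downloads a page (no redirect or error handling) and returns its text.
def text_from_url(url)
  visible_text(Net::HTTP.get(URI(url)))
end
```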

How it works

The NER server loads the model english.muc.7class.distsim.crf.ser.gz, which was trained on a variety of corpora and is fairly robust across domains. The seven entity classes in this model are location, time, organization, percent, money, person, and date.
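
With -outputFormat inlineXML, each recognized entity is wrapped in a tag named after its class, such as <LOCATION>San Francisco</LOCATION> or <MONEY>$90M</MONEY>. Grouping entities by class, as in the lists shown earlier, then takes a single regular expression. This is a hypothetical sketch rather than the code in get_named_entities.rb. It assumes entity tags are never nested, and it de-duplicates repeated mentions.

```ruby
# Hypothetical parser for Stanford NER inlineXML output.
# Matches <TAG>text</TAG> pairs; \1 requires the closing tag to match
# the opening one. Assumes tags are never nested.
ENTITY_PATTERN = %r{<([A-Z]+)>(.*?)</\1>}m

def group_entities(tagged_text)
  tagged_text.scan(ENTITY_PATTERN).each_with_object({}) do |(klass, name), groups|
    entity = name.strip
    list = (groups[klass] ||= [])
    list << entity unless list.include?(entity) # keep each mention once
  end
end

tagged = '<PERSON>Mike Long</PERSON> met <ORGANIZATION>Google</ORGANIZATION> ' \
         'staff in <LOCATION>San Francisco</LOCATION> on <DATE>May 29th</DATE>. ' \
         '<PERSON>Mike Long</PERSON> left.'

# Print one heading per entity class, followed by its entities.
group_entities(tagged).each do |klass, names|
  puts "#{klass}:"
  names.each { |n| puts n }
end
```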

Take a look at get_named_entities.rb and feel free to give feedback or ask questions :)

What are some interesting uses of Named Entity Recognition?