Project: Part of Speech Tagging

Published by duyanh on

Introduction

In this project, you'll use the Pomegranate library to build a hidden Markov model (HMM) for part-of-speech tagging with a universal tagset. Hidden Markov models have achieved >96% tag accuracy with larger tagsets on realistic text corpora. They have also been used for speech recognition and generation, machine translation, gene recognition in bioinformatics, human gesture recognition in computer vision, and more.
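To give a rough feel for what an HMM tagger does, here is a minimal sketch of Viterbi decoding over a toy model in plain Python. It is not the notebook's code and does not use Pomegranate; the three tags, the three-word vocabulary, and every probability below are made-up values chosen only for illustration. In the project, the corresponding counts would be estimated from a tagged corpus.

```python
import math

# Toy HMM: all tags, words, and probabilities here are illustrative assumptions.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {  # TRANS[prev][curr] = P(curr tag | prev tag)
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {  # EMIT[tag][word] = P(word | tag)
    "DET": {"the": 0.9, "dog": 0.05, "barks": 0.05},
    "NOUN": {"the": 0.05, "dog": 0.8, "barks": 0.15},
    "VERB": {"the": 0.05, "dog": 0.15, "barks": 0.8},
}


def viterbi(words):
    """Return the most likely tag sequence for a non-empty list of known words."""
    # Each column maps tag -> (best log-probability ending in tag, previous tag).
    columns = [{
        tag: (math.log(START[tag]) + math.log(EMIT[tag][words[0]]), None)
        for tag in TAGS
    }]
    for word in words[1:]:
        column = {}
        for tag in TAGS:
            # Pick the predecessor tag that maximizes the path score into `tag`.
            prev, score = max(
                ((p, columns[-1][p][0] + math.log(TRANS[p][tag])) for p in TAGS),
                key=lambda item: item[1],
            )
            column[tag] = (score + math.log(EMIT[tag][word]), prev)
        columns.append(column)

    # Backtrack from the best final tag through the stored predecessors.
    tag = max(TAGS, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


print(viterbi(["the", "dog", "barks"]))
```

The sketch works in log space to avoid underflow on longer sentences, and it assumes every input word is in the toy vocabulary; handling unseen words is a separate concern that this example does not address.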

The notebook already contains some code to get you started. You only need to add new functionality in the areas indicated; you will not need to modify the included code beyond what is requested. Sections whose header begins with 'IMPLEMENTATION' require you to provide code in the block that follows. Instructions are provided for each section, and the specifics of the implementation are marked in the code block with a 'TODO' statement. Please be sure to read the instructions carefully!

NOTE: The project files include an optional warmup exercise that introduces the Pomegranate API. To start there, open the HMM warmup (optional).ipynb file first, then complete the hmm tagger.ipynb notebook. (Only the tagger will be submitted for review.)

The project notebooks can be found at http://14.232.166.121:8880 > andy > HMM-tagger

