The Penn Treebank Tagset

22.12.2020


The Penn Treebank part-of-speech tagset is given in the following table:

Tag   Description                               Examples
CC    Coordinating conjunction                  and, but, or
CD    Cardinal number                           one, two
EX    Existential 'there'                       there
FW    Foreign word                              mea culpa
IN    Preposition / subordinating conjunction   of, in, by
JJR   Comparative adjective                     smaller
JJS   Superlative adjective                     nicest
LS    List marker                               1)
NN    Noun, singular or mass                    table
NNS   Noun, plural                              cars
NP    Proper noun, singular                     Martin
NPS   Proper noun, plural                       Vikings
PDT   Predeterminer                             Both the girls
POS   Possessive ending                         friend's
PP    Personal pronoun                          I, he, it
PPZ   Possessive pronoun                        my, his
RB    Adverb                                    however, usually, naturally, here, good
RBR   Adverb, comparative                       better
RBS   Adverb, superlative                       best
RP    Particle                                  give up
SENT  Sentence-break punctuation                . ! ?
TO    Infinitive 'to'                           to go
VB    Verb be, base form                        be
VBD   Verb be, past tense                       was, were
VBG   Verb be, gerund/present participle        being
VBN   Verb be, past participle                  been
VBZ   Verb be, 3rd person sing. present         is
VH    Verb have, base form                      have
VHD   Verb have, past tense                     had
VHG   Verb have, gerund/present participle      having
VHN   Verb have, past participle                had
VHP   Verb have, sing. present, non-3rd person  have
VHZ   Verb have, 3rd person sing. present       has
VV    Verb, base form                           take
VVD   Verb, past tense                          took
VVG   Verb, gerund/present participle           taking
VVN   Verb, past participle                     taken
VVP   Verb, sing. present, non-3rd person       take
VVZ   Verb, 3rd person sing. present            takes
WP    Wh-pronoun                                who, what
WP$   Possessive wh-pronoun                     whose
WRB   Wh-adverb                                 where, when
"     Quotation marks                           ' "
``    Opening quotation marks                   ' "
(     Opening bracket                           ( {
)     Closing bracket                           } )
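
When working with tags in code, it can help to keep the table in a small lookup structure. The following is a minimal sketch that covers only a handful of rows from the table; the names TAG_DESCRIPTIONS and describe_tag are illustrative assumptions and are not part of NLTK or any other library.

```python
# A few rows of the tag table as a dictionary.
# Illustrative subset only, not the full tagset.
TAG_DESCRIPTIONS = {
    "CC": "Coordinating conjunction",
    "CD": "Cardinal number",
    "IN": "Preposition / subordinating conjunction",
    "NN": "Noun, singular or mass",
    "NNS": "Noun, plural",
    "RB": "Adverb",
}


def describe_tag(tag):
    """Return the description of a tag, or a placeholder if it is not in the subset."""
    return TAG_DESCRIPTIONS.get(tag, "unknown tag")


print(describe_tag("NNS"))  # Noun, plural
print(describe_tag("XYZ"))  # unknown tag
```

Returning a placeholder instead of raising an error is a design choice here: a tagger may emit tags that are missing from a partial table.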

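The Penn Treebank tagset is used by the nltk.pos_tag() function, as the example below illustrates. Note that NLTK reports some tags under names that differ from the table above: for example, proper nouns are tagged NNP rather than NP.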

import re
import nltk

# download a tagger
nltk.download('averaged_perceptron_tagger')

# define some sentence
sent1 = "Time flies like an arrow, but fruit flies like a banana."

# define some regex for tokenization
rex1 = "[A-Za-z0-9]+|[,.]"

# tokenize with regular expression
sent1_tok = re.findall(rex1, sent1)

# print the resulting pos tags
print(nltk.pos_tag(sent1_tok))

This results in:

[('Time', 'NNP'), ('flies', 'NNS'), ('like', 'IN'), ('an', 'DT'), ('arrow', 'NN'), (',', ','), ('but', 'CC'), ('fruit', 'JJ'), ('flies', 'NNS'), ('like', 'IN'), ('a', 'DT'), ('banana', 'NN'), ('.', '.')]
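
To inspect a result like this, the (word, tag) pairs can be grouped by tag. The sketch below hard-codes the output shown above instead of calling the tagger, so it runs without NLTK; the names tagged and words_by_tag are illustrative.

```python
from collections import defaultdict

# The tagger output from above, copied in literally (no NLTK call needed).
tagged = [('Time', 'NNP'), ('flies', 'NNS'), ('like', 'IN'), ('an', 'DT'),
          ('arrow', 'NN'), (',', ','), ('but', 'CC'), ('fruit', 'JJ'),
          ('flies', 'NNS'), ('like', 'IN'), ('a', 'DT'), ('banana', 'NN'),
          ('.', '.')]

# Collect the words seen for each tag, in order of appearance.
words_by_tag = defaultdict(list)
for word, tag in tagged:
    words_by_tag[tag].append(word)

print(words_by_tag['NNS'])  # ['flies', 'flies']
print(words_by_tag['NN'])   # ['arrow', 'banana']
```

Grouping makes it easy to see that both occurrences of "flies" received the same tag, NNS, although in the usual reading of the sentence the first one is a verb.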
