
The Penn Treebank Tagset

22.12.2020 | Processing/POS Tagging/Tag Sets



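The table below gives the Penn Treebank part-of-speech tagset in the variant used by TreeTagger. This variant has separate verb tag families for be (VB...), have (VH...) and all other verbs (VV...).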

Tag | Description | Example
----|-------------|--------
CC | Coordinating conjunction | and, but, or
CD | Cardinal number | one, two
DT | Determiner | a, the
EX | Existential "there" | there
FW | Foreign word | mea culpa
IN | Preposition / subordinating conjunction | of, in, by
JJ | Adjective | tall
JJR | Comparative adjective | smaller
JJS | Superlative adjective | nicest
LS | List marker | 1)
MD | Modal | could, will
NN | Noun, singular or mass | table
NNS | Noun, plural | cars
NP | Proper noun, singular | Martin
NPS | Proper noun, plural | Vikings
PDT | Predeterminer | both (as in "both the girls")
POS | Possessive ending | friend's
PP | Personal pronoun | I, he, it
PPZ | Possessive pronoun | my, his
RB | Adverb | however, usually, naturally, here, good
RBR | Adverb, comparative | better
RBS | Adverb, superlative | best
RP | Particle | up (as in "give up")
SENT | Sentence-break punctuation | . ! ?
SYM | Symbol | / [ = *
TO | Infinitive "to" | to go
UH | Interjection | uhhuhhuhh
VB | Verb be, base form | be
VBD | Verb be, past tense | was, were
VBG | Verb be, gerund/present participle | being
VBN | Verb be, past participle | been
VBP | Verb be, sing. present, non-3rd person | am, are
VBZ | Verb be, 3rd person sing. present | is
VH | Verb have, base form | have
VHD | Verb have, past tense | had
VHG | Verb have, gerund/present participle | having
VHN | Verb have, past participle | had
VHP | Verb have, sing. present, non-3rd person | have
VHZ | Verb have, 3rd person sing. present | has
VV | Verb, base form | take
VVD | Verb, past tense | took
VVG | Verb, gerund/present participle | taking
VVN | Verb, past participle | taken
VVP | Verb, sing. present, non-3rd person | take
VVZ | Verb, 3rd person sing. present | takes
WDT | Wh-determiner | which
WP | Wh-pronoun | who, what
WP$ | Possessive wh-pronoun | whose
WRB | Wh-adverb | where, when
# | Number sign | #
$ | Dollar sign | $
" | Quotation marks | ' "
`` | Opening quotation marks | ' "
( | Opening bracket | ( {
) | Closing bracket | ) }
, | Comma | ,
: | Punctuation | - ; : ...
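The table uses tags such as NP, PPZ, SENT and the VH/VV verb families, which the original Penn Treebank tagset (the one nltk.pos_tag() emits) does not have. Converting between the two takes only a few lines. The following is a minimal sketch; `RENAMED` and `to_penn` are hypothetical names, and the mapping only covers the tags listed in the table.

```python
# Hypothetical helper: map tags from the TreeTagger-style table
# to the original Penn Treebank tags emitted by nltk.pos_tag().

# Tags whose names differ outright between the two variants.
RENAMED = {
    "NP": "NNP",    # proper noun, singular
    "NPS": "NNPS",  # proper noun, plural
    "PP": "PRP",    # personal pronoun
    "PPZ": "PRP$",  # possessive pronoun
    "SENT": ".",    # sentence-final punctuation
}


def to_penn(tag: str) -> str:
    """Convert one TreeTagger-style tag to its original Penn Treebank tag."""
    if tag in RENAMED:
        return RENAMED[tag]
    # The original tagset has one verb family (VB, VBD, VBG, VBN, VBP, VBZ),
    # so the 'have' (VH...) and lexical-verb (VV...) families fold into it.
    if tag.startswith(("VH", "VV")):
        return "VB" + tag[2:]
    # Every other tag is returned unchanged.
    return tag


print([to_penn(t) for t in ["NP", "VVZ", "VHD", "PPZ", "NN", "SENT"]])
# ['NNP', 'VBZ', 'VBD', 'PRP$', 'NN', '.']
```

A dictionary handles the one-off renames, while the VH and VV families are rewritten by slicing, since they differ from the VB family only in their first two letters.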

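The nltk.pos_tag() method uses the original Penn Treebank tags, which differ from the table above in a few places: proper nouns are NNP/NNPS rather than NP/NPS, pronouns are PRP/PRP$ rather than PP/PPZ, every verb (including be and have) uses VB, VBD, VBG, VBN, VBP and VBZ, and sentence-final punctuation is tagged "." rather than SENT. For example: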

```python
import re
import nltk

# download a tagger model
# (on NLTK 3.9 and later the resource is named 'averaged_perceptron_tagger_eng')
nltk.download('averaged_perceptron_tagger')

# define some sentence
sent1 = "Time flies like an arrow, but fruit flies like a banana."

# define a regex for tokenization: runs of letters/digits, or a comma/period
rex1 = r"[A-Za-z0-9]+|[,.]"

# tokenize with the regular expression
sent1_tok = re.findall(rex1, sent1)

# print the resulting POS tags
print(nltk.pos_tag(sent1_tok))
```

Running this prints:

[('Time', 'NNP'), ('flies', 'NNS'), ('like', 'IN'), ('an', 'DT'), ('arrow', 'NN'), (',', ','), ('but', 'CC'), ('fruit', 'JJ'), ('flies', 'NNS'), ('like', 'IN'), ('a', 'DT'), ('banana', 'NN'), ('.', '.')]
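Related tags share a prefix (NN, NNS, NNP and NNPS are all nouns), so the tagged output is easy to filter or count. A minimal sketch follows; it copies the list printed above as a literal so it runs without NLTK or a downloaded model, and `tagged`, `nouns` and `counts` are illustrative names.

```python
from collections import Counter

# Output of nltk.pos_tag() from the example above, copied as a literal.
tagged = [('Time', 'NNP'), ('flies', 'NNS'), ('like', 'IN'), ('an', 'DT'),
          ('arrow', 'NN'), (',', ','), ('but', 'CC'), ('fruit', 'JJ'),
          ('flies', 'NNS'), ('like', 'IN'), ('a', 'DT'), ('banana', 'NN'),
          ('.', '.')]

# All noun tags (NN, NNS, NNP, NNPS) start with "NN".
nouns = [word for word, tag in tagged if tag.startswith("NN")]
print(nouns)  # ['Time', 'flies', 'arrow', 'flies', 'banana']

# Count how often each tag occurs in the sentence.
counts = Counter(tag for _, tag in tagged)
print(counts["NNS"], counts["NNP"])  # 2 1
```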