The Penn Treebank part-of-speech tagset is given in the following table:
Tag | Description | Example |
CC | Coordinating conjunction | and, but, or |
CD | Cardinal number | one, two |
DT | Determiner | a, the |
EX | Existential 'there' | there |
FW | Foreign word | mea culpa |
IN | Preposition / subordinating conjunction | of, in, by |
JJ | Adjective | tall |
JJR | Comparative adjective | smaller |
JJS | Superlative adjective | nicest |
LS | List marker | 1) |
MD | Modal | could, will |
NN | Noun, singular or mass | table |
NNS | Noun plural | cars |
NP | Proper noun, singular | Martin |
NPS | Proper noun, plural | Vikings |
PDT | Predeterminer | Both the girls |
POS | Possessive ending | friend's |
PP | Personal pronoun | I, he, it |
PPZ | Possessive pronoun | my, his |
RB | Adverb | however, usually, naturally, here, good |
RBR | Adverb comparative | better |
RBS | Adverb superlative | best |
RP | Particle | give up |
SENT | Sentence-break punctuation | .!? |
SYM | Symbol | /[=* |
TO | Infinitival "to" | to go |
UH | Interjection | uhhuhhuhh |
VB | Verb be, base form | be |
VBD | Verb be, past tense | was, were |
VBG | Verb be, gerund/present participle | being |
VBN | Verb be, past participle | been |
VBZ | Verb be, third person sing. present | is |
VH | Verb have, base form | have |
VHD | Verb have, past tense | had |
VHG | Verb have, gerund/present participle | having |
VHN | Verb have, past participle | had |
VHP | Verb have, sing. present, non-3d | have |
VHZ | Verb have, third person sing. present | has |
VV | Verb, base form | take |
VVD | Verb, past tense | took |
VVG | Verb, gerund/present participle | taking |
VVN | Verb, past participle | taken |
VVP | Verb, sing. present, non-3d | take |
VVZ | Verb, 3rd person sing. present | takes |
WDT | Wh-determiner | which |
WP | Wh-pronoun | who, what |
WP$ | Possessive wh-pronoun | whose |
WRB | Wh-adverb | where, when |
# | # | # |
$ | $ | $ |
" | Quotation marks | '" |
`` | Opening quotation marks | '" |
( | Opening bracket | ({ |
) | Closing bracket | }) |
, | Comma | , |
: | Punctuation | -;:... |
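For programmatic use, rows in the `Tag | Description | Example |` layout above can be loaded into a dictionary. The following is only a minimal sketch: the function name `parse_tag_rows` and the variable names are chosen for illustration and are not part of NLTK, and the sample input is just three rows copied from the table.

```python
def parse_tag_rows(rows):
    """Map each tag to its (description, example) pair."""
    lookup = {}
    for row in rows:
        # Drop the trailing pipe, then split the row into its columns.
        cells = [cell.strip() for cell in row.rstrip().rstrip("|").split("|")]
        if len(cells) != 3 or cells[0] == "Tag":
            continue  # skip the header row and anything malformed
        tag, description, example = cells
        lookup[tag] = (description, example)
    return lookup


# Header plus three rows copied from the table above (illustrative input).
sample_rows = [
    "Tag | Description | Example |",
    "JJ | Adjective | tall |",
    "NN | Noun, singular or mass | table |",
    "NNS | Noun plural | cars |",
]

tag_lookup = parse_tag_rows(sample_rows)
print(tag_lookup["NNS"])  # ('Noun plural', 'cars')
```

A plain dictionary is enough here because each tag appears exactly once in the table.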
The nltk.pos_tag() method returns Penn Treebank tags. A few of its tag names differ from the table above (for example, NNP rather than NP), as illustrated:
import re
import nltk

# download a tagger
# (newer NLTK releases may need 'averaged_perceptron_tagger_eng' instead)
nltk.download('averaged_perceptron_tagger')

# define some sentence
sent1 = "Time flies like an arrow, but fruit flies like a banana."

# define some regex for tokenization
rex1 = "[A-Za-z0-9]+|[,.]"

# tokenize with regular expression
sent1_tok = re.findall(rex1, sent1)

# print the resulting pos tags
print(nltk.pos_tag(sent1_tok))

Resulting in:
[('Time', 'NNP'), ('flies', 'NNS'), ('like', 'IN'), ('an', 'DT'), ('arrow', 'NN'), (',', ','), ('but', 'CC'), ('fruit', 'JJ'), ('flies', 'NNS'), ('like', 'IN'), ('a', 'DT'), ('banana', 'NN'), ('.', '.')]
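In the output above, 'Time' is tagged NNP and the full stop is tagged '.', whereas the table lists proper nouns as NP and sentence-break punctuation as SENT. If the table's names are preferred, the tags that correspond one-to-one can be renamed after tagging. The sketch below is an illustration under stated assumptions: `PENN_TO_TABLE` and `rename_tags` are hypothetical names, not NLTK features, and verb tags are left untouched because the table's VB/VH/VV split depends on which verb is involved.

```python
# Hypothetical one-to-one renames from the tagger's output to the table's names.
PENN_TO_TABLE = {
    "NNP": "NP",    # proper noun, singular
    "NNPS": "NPS",  # proper noun, plural
    "PRP": "PP",    # personal pronoun
    "PRP$": "PPZ",  # possessive pronoun
    ".": "SENT",    # sentence-break punctuation
}


def rename_tags(tagged_tokens, mapping=PENN_TO_TABLE):
    """Return (word, tag) pairs with mapped tags renamed; others unchanged."""
    return [(word, mapping.get(tag, tag)) for word, tag in tagged_tokens]


# The tagger output shown above, pasted in as a literal.
tagged = [('Time', 'NNP'), ('flies', 'NNS'), ('like', 'IN'), ('an', 'DT'),
          ('arrow', 'NN'), (',', ','), ('but', 'CC'), ('fruit', 'JJ'),
          ('flies', 'NNS'), ('like', 'IN'), ('a', 'DT'), ('banana', 'NN'),
          ('.', '.')]

renamed = rename_tags(tagged)
print(renamed[0], renamed[-1])  # ('Time', 'NP') ('.', 'SENT')
```

Tags without an entry in the mapping pass through unchanged, so the result is safe to use even when the tagger emits a tag the mapping does not cover.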