Researchers in the Stanford NLP Group have built Stanza, a multilingual natural language processing toolkit for Python. It is worth a look for anyone working with text from many locales, such as social media. It supports running accurate natural language processing tools on more than 60 languages and accessing the Java Stanford CoreNLP software from Python.
Its tools can be chained in a pipeline that converts a string of human-language text into lists of sentences and words, generates the base forms of those words along with their parts of speech and morphological features, produces a syntactic dependency parse, and recognizes named entities.
Stanza's modules are built on top of the PyTorch library. Its neural network components are highly accurate and also support efficient training and evaluation on your own annotated data. Stanza can run much faster on a GPU-enabled machine.
Key features of Stanza:
- A native Python implementation that requires minimal effort to set up;
- A full neural network pipeline for robust text analytics, including tokenization, multi-word token (MWT) expansion, lemmatization, part-of-speech (POS) and morphological feature tagging, dependency parsing, and named entity recognition;
- Pre-trained neural models supporting 66 (human) languages.
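The pipeline stages listed above correspond to named processors that can be selected when a pipeline is created. The following is a minimal sketch, assuming Stanza is installed and the models for the chosen language have already been downloaded. The feature labels in `FEATURE_TO_PROCESSOR` and the helpers `pipeline_options` and `build_pipeline` are hypothetical names introduced here for illustration. The `processors` and `use_gpu` arguments are passed through to `stanza.Pipeline`, and setting `use_gpu=False` is intended to keep everything on the CPU.

```python
# Hypothetical feature labels mapped to Stanza processor names.
FEATURE_TO_PROCESSOR = {
    "tokenization": "tokenize",
    "mwt": "mwt",
    "pos": "pos",
    "lemmatization": "lemma",
    "dependency parsing": "depparse",
    "ner": "ner",
}


def pipeline_options(lang, features, use_gpu=True):
    """Build keyword arguments for stanza.Pipeline (hypothetical helper)."""
    # Stanza accepts processors as a comma-separated string.
    processors = ",".join(FEATURE_TO_PROCESSOR[name] for name in features)
    return {"lang": lang, "processors": processors, "use_gpu": use_gpu}


def build_pipeline(lang, features, use_gpu=True):
    """Create a pipeline restricted to the requested features."""
    # Imported lazily so pipeline_options() can be used without Stanza installed.
    import stanza

    return stanza.Pipeline(**pipeline_options(lang, features, use_gpu))
```

For example, `build_pipeline("en", ["tokenization", "pos", "lemmatization"], use_gpu=False)` would request only tokenization, tagging, and lemmatization on the CPU, which loads fewer models than the full default pipeline.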
To install Stanza with pip, run:
pip install stanza
To see Stanza’s neural pipeline in action, launch the Python interactive interpreter and try the following commands:
>>> import stanza
>>> stanza.download('en') # download English model
>>> nlp = stanza.Pipeline('en') # initialize English neural pipeline
>>> doc = nlp("Barack Obama was born in Hawaii.") # run annotation over a sentence
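The commands above leave the annotations in `doc` without showing how to read them. Here is a minimal sketch of one way to walk the result, assuming the document exposes `sentences`, `words`, and `ents` with the per-word attributes `text`, `lemma`, `upos`, `head`, and `deprel`, as Stanza's data objects do. The helper names `describe_words`, `describe_entities`, and `demo` are hypothetical and exist only for this illustration.

```python
def describe_words(doc):
    """Return (text, lemma, upos, head, deprel) for every word in the document."""
    rows = []
    for sentence in doc.sentences:
        for word in sentence.words:
            # head is the index of the governing word within the sentence (0 marks the root).
            rows.append((word.text, word.lemma, word.upos, word.head, word.deprel))
    return rows


def describe_entities(doc):
    """Return (text, type) for every named entity found in the document."""
    return [(ent.text, ent.type) for ent in doc.ents]


def demo():
    """Run the article's example sentence through the English pipeline."""
    # Imported here so the helpers above can be used without Stanza installed.
    import stanza

    stanza.download("en")        # download the English models
    nlp = stanza.Pipeline("en")  # initialize the English neural pipeline
    doc = nlp("Barack Obama was born in Hawaii.")
    for row in describe_words(doc):
        print(row)
    print(describe_entities(doc))
```

Calling `demo()` downloads the English models on first use, so it needs network access. The two helper functions only read attributes and can be reused with any pipeline language.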