This chapter will introduce you to the basics of text processing with spaCy. You'll learn about the data structures, how to work with trained pipelines, and how to use them to predict linguistic features in your text.
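As a first taste of what that looks like in practice, here's a minimal sketch. It assumes spaCy is installed and that the small English pipeline `en_core_web_sm` has been downloaded; it loads the pipeline and prints the part-of-speech tags, syntactic dependencies, and named entities predicted for a short example text.

```python
import spacy

# Assumes the small English pipeline has been downloaded, e.g. via:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# Processing a text returns a Doc object holding the predicted annotations
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

# Part-of-speech tag and syntactic dependency predicted for each token
for token in doc:
    print(token.text, token.pos_, token.dep_)

# Named entities predicted for the whole document
for ent in doc.ents:
    print(ent.text, ent.label_)
```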
In this chapter, you'll use your new skills to extract specific information from large volumes of text. You'll learn how to make the most of spaCy's data structures, and how to effectively combine statistical and rule-based approaches for text analysis.
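For a flavor of how rules and predictions can work together, the sketch below (again assuming the `en_core_web_sm` pipeline is available) uses spaCy's `Matcher` with a pattern that mixes a predicted part-of-speech tag with an exact word, one way of combining statistical and rule-based approaches. The pattern and example text are illustrative.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")  # assumes the pipeline is installed
matcher = Matcher(nlp.vocab)

# A rule-based pattern that relies on a statistical prediction:
# a proper noun (predicted POS tag) followed by the exact word "releases"
pattern = [{"POS": "PROPN"}, {"LOWER": "releases"}]
matcher.add("PRODUCT_RELEASE", [pattern])

doc = nlp("Apple releases a new phone every September.")
for match_id, start, end in matcher(doc):
    # Each match is a slice of the Doc, i.e. a Span
    span = doc[start:end]
    print(span.text)
```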
This chapter will show you everything you need to know about spaCy's processing pipeline. You'll learn what goes on under the hood when you process a text, how to write your own components and add them to the pipeline, and how to use custom attributes to add your own metadata to the documents, spans and tokens.
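As a rough preview, the sketch below adds a custom component to the pipeline and writes to a custom `Doc` extension attribute. The component name `length_component` and the attribute `n_tokens` are illustrative placeholders, not part of spaCy itself.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Register a custom extension attribute on the Doc (illustrative name)
Doc.set_extension("n_tokens", default=None)

# A hypothetical pipeline component that writes to the custom attribute
@Language.component("length_component")
def length_component(doc):
    doc._.n_tokens = len(doc)
    return doc

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("length_component", last=True)
print(nlp.pipe_names)  # the custom component now appears in the pipeline

doc = nlp("This text runs through the custom component.")
print(doc._.n_tokens)
```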
In this chapter, you'll learn how to update spaCy's statistical models to customize them for your use case – for example, to predict a new entity type in online comments. You'll train your own model from scratch and learn the basics of how training works, along with tips and tricks that can make your custom NLP projects more successful.
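To hint at what preparing training data can look like, here's a sketch under the assumption that you're using spaCy v3's `DocBin`-based workflow. It builds a couple of made-up examples for a hypothetical `GADGET` entity label and serializes them for use with `spacy train`; the texts, character offsets, and label are purely illustrative.

```python
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")

# Made-up examples for a hypothetical new entity label "GADGET":
# (text, list of (start_char, end_char, label)) – purely illustrative
TRAINING_DATA = [
    ("How to preorder the iPhone X", [(20, 28, "GADGET")]),
    ("iPhone X is coming", [(0, 8, "GADGET")]),
]

doc_bin = DocBin()
for text, annotations in TRAINING_DATA:
    doc = nlp(text)
    # char_span maps character offsets onto token boundaries
    doc.ents = [
        doc.char_span(start, end, label=label)
        for start, end, label in annotations
    ]
    doc_bin.add(doc)

# Serialize the examples to disk for use with `python -m spacy train`
doc_bin.to_disk("./train.spacy")
```

In spaCy v3, the training run itself is driven by a config file passed to the `spacy train` command, with the serialized `.spacy` files supplied as training and development data.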