

layout: post
title: "Document Analysis Notes 1"
date: 2022-7-27 11:58:00 +0800
categories: [Notes, Document Analysis]

Information Retrieval (IR)

Natural Language Processing (NLP)

Extract information from text
Generate new text

Challenges

Typical IR and NLP Pipeline

Sentence Splitting -> Tokenisation -> Stemming -> Parsing -> Semantic Analysis
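
The first three stages of the pipeline above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the function names (`split_sentences`, `tokenise`, `stem`, `pipeline`) and the suffix rules are assumptions made for this example, and the stemmer is a toy suffix stripper rather than a real algorithm such as Porter's. Parsing and semantic analysis are left out because they usually need a dedicated NLP library.

```python
import re


def split_sentences(text):
    # Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenise(sentence):
    # Lowercase the sentence and keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def stem(token):
    # Toy suffix stripper (hypothetical rules, not the Porter algorithm).
    # A suffix is removed only if at least three characters remain.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def pipeline(text):
    # Sentence Splitting -> Tokenisation -> Stemming
    return [[stem(t) for t in tokenise(s)] for s in split_sentences(text)]


print(pipeline("Cats are running. Dogs barked loudly!"))
# [['cat', 'are', 'runn'], ['dog', 'bark', 'loudly']]
```

The output shows why stemming is only an approximation: "running" becomes "runn" rather than "run", which is acceptable for matching terms in IR but not a linguistically correct base form.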