Linguistics
The scientific study of language and its structure, including the study of morphology, syntax, phonetics, and semantics.
Computational
Describes the field concerned with the design and behavior of algorithms that can efficiently manipulate very large amounts of data, and with applying such algorithms to understand and solve real-world problems.
Natural Language
Any language that has evolved naturally among humans through use, whether spoken or written, such as English, Spanish, or French.
Machine Learning
An application of artificial intelligence in which self-improving algorithms let a system learn from input data without being explicitly programmed.
Parsing |
The process of analyzing a sentence into its constituent parts and describing their syntactic roles |
Algorithm |
A set of instructions for performing a specific task, specified as a clear, executable program or mathematical description, that starts from an initial state and additional inputs and produces an output after a finite number of steps
Part-of-Speech Tagging |
The process of assigning a part of speech to a word according to its definition, its use, and its neighboring and related words
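As a minimal sketch of how a word's neighbors can influence its tag, the snippet below combines a dictionary lookup with one context rule. The `LEXICON` entries, the tag names, and the `tag` function are hypothetical and chosen only for illustration; practical taggers are trained on annotated corpora.

```python
# Hypothetical toy lexicon: each word maps to its possible tags, most likely first.
LEXICON = {
    "the": ["DET"],
    "a": ["DET"],
    "can": ["AUX", "NOUN"],
    "dog": ["NOUN"],
    "run": ["VERB", "NOUN"],
    "rusts": ["VERB"],
}

def tag(words):
    """Tag each word using the lexicon plus one neighbor-based rule."""
    tags = []
    for word in words:
        # Assumption: unknown words default to NOUN.
        options = LEXICON.get(word.lower(), ["NOUN"])
        choice = options[0]
        # Context rule: right after a determiner, prefer the NOUN reading.
        if tags and tags[-1] == "DET" and "NOUN" in options:
            choice = "NOUN"
        tags.append(choice)
    return list(zip(words, tags))

print(tag(["the", "can", "rusts"]))
print(tag(["the", "dog", "can", "run"]))
```

Here "can" receives a different tag in each sentence because the preceding word differs, which is the role the definition gives to neighboring words.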
Corpus |
A large and structured set of texts (written or spoken) used for a specific linguistic analysis or study, often held in electronic format to allow computational analysis
Probabilistic |
An approach to modeling and prediction that uses probabilities and statistical techniques to analyze large data sets, often through Bayesian networks, Markov models, or hidden Markov models
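One of the simplest Markov models over text is a bigram model, which estimates the probability of the next word given the current one. The sketch below is illustrative only: the function name `train_bigram_model` and the sample sentence are assumptions, and the estimates are plain relative frequencies with no smoothing.

```python
from collections import Counter, defaultdict

def train_bigram_model(tokens):
    """Estimate P(next | current) from relative frequencies of adjacent pairs."""
    counts = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1
    model = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        model[current] = {word: n / total for word, n in followers.items()}
    return model

# Illustrative input, not taken from a real corpus.
tokens = "the cat sat on the mat".split()
model = train_bigram_model(tokens)
print(model["the"])
```

In this sample "the" is followed once by "cat" and once by "mat", so each continuation is assigned probability 0.5.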
Parse tree
A diagram representing the syntactic structure of a sentence or string according to a formal grammar, showing the constituent phrases or clauses and their hierarchical relationships |
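Such a tree can be held in code as nested tuples. The sketch below assumes a hand-built tree for "the dog barks" with hypothetical labels (S, NP, VP, DET, N, V); the helper names `leaves` and `bracketed` are likewise illustrative.

```python
# A parse tree as nested tuples: (label, child, child, ...); leaves are plain strings.
tree = (
    "S",
    ("NP", ("DET", "the"), ("N", "dog")),
    ("VP", ("V", "barks")),
)

def leaves(node):
    """Return the words of the sentence in left-to-right order."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words

def bracketed(node):
    """Render the tree in bracketed notation."""
    if isinstance(node, str):
        return node
    return "(" + node[0] + " " + " ".join(bracketed(c) for c in node[1:]) + ")"

print(leaves(tree))
print(bracketed(tree))
```

The nesting of the tuples mirrors the hierarchy of phrases: the NP and VP are children of S, and each word hangs under its category label.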
Semantics |
The meaning of words, sentences, or texts, and the study of that meaning
Lexicon |
A collection or dictionary of words, phrases, or terms specific to a language or field, usually with brief definitions or explanations.
Sentence boundary |
The point or mark indicating the beginning or end of a sentence
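A rough way to locate sentence boundaries is a regular expression. The sketch below assumes that a boundary is a period, exclamation mark, or question mark followed by whitespace and an uppercase letter; the `BOUNDARY` pattern and `split_sentences` function are hypothetical, and the rule wrongly splits after abbreviations such as "Dr.".

```python
import re

# Assumption: a boundary is ., ! or ? followed by whitespace and an uppercase letter.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

def split_sentences(text):
    """Split text at each assumed sentence boundary."""
    return [s for s in BOUNDARY.split(text.strip()) if s]

print(split_sentences("It rained. We stayed in! Why not?"))
```

Handling abbreviations, quotations, and numbers reliably usually requires a trained model or a longer rule set.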
Text mining |
The process of exploring and analyzing large amounts of data or texts to extract useful patterns and information using artificial intelligence, statistical methods, and computational linguistics |
Named Entity Recognition |
An application of machine learning in which an algorithm is trained to identify and classify named entities into pre-defined categories, such as person names, organization names, location names, time, quantities, etc. |
Natural Language Generation |
The use of artificial intelligence to convert structured data into spoken or written language that appears to have been written or spoken by a human
Tokenization |
The process of breaking a text into smaller units, called tokens, such as words, phrases, or symbols
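A minimal sketch of word-level tokenization with a regular expression follows. The `TOKEN_PATTERN` rule is an assumption made for illustration: a token is a run of word characters, optionally with one internal apostrophe, or a single punctuation symbol.

```python
import re

# Hypothetical rule: a word (optionally with an internal apostrophe)
# or any single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Split text into word and symbol tokens."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("Don't panic, it's fine."))
```

Keeping punctuation as separate tokens lets later steps, such as sentence boundary detection, see it explicitly.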
Stemming |
The process of reducing words or phrases to their base, root or stem form |
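The idea can be sketched with a naive suffix stripper. The suffix list and the `naive_stem` function below are hypothetical and far cruder than established stemmers such as the Porter stemmer, which apply many more rules and conditions.

```python
# Hypothetical suffix list, checked in order (longest first).
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= min_stem:
            return lower[: -len(suffix)]
    return lower

stems = [naive_stem(w) for w in ["Walking", "walked", "walks", "walk"]]
print(stems)
```

All four forms reduce to the same stem, so they can be counted or matched as one item. The `min_stem` guard keeps short words such as "is" or "sing" from being cut down to fragments.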
Syntactic structure analysis |
The process of automatically determining the syntactic structure of a text and representing it in a standard format such as a phrase structure tree or a dependency parse tree
Text-to-Speech |
The conversion of written or text-based communication into voice or spoken language |