NLP Fundamentals For Absolute Beginners


What Is Natural Language Processing, and What Are NLU & NLG?

Language is the most powerful tool we have: it has made humans more efficient and effective at communicating anything meaningful. We perceive the outside world as images and give expression to those visual memories through speech, using languages we have been learning for ages. That ability is what sets us apart from the other living species on this earth.

Being alive in the 21st century is a real boon, largely because of the advances we have made in voice, text, and image processing. We have now trained our gadgets and machines to understand and carry out human-originated tasks without much human intervention. Natural language processing is one such advance: it has empowered machines to understand and communicate in the language we humans speak, far more effectively and efficiently than before.

So I have decided to cover this very important field of Artificial intelligence and deep learning in the form of the NLP series.

Let’s get started!

What Is NLP?

As we discussed at the start of this article, we humans have an ingenious ability to express anything through language, and we have mastered the art of representing feelings and sentiments through text, which has led to seamless communication between humans. This ability to understand text and respond to it is what our computers and gadgets are now learning through natural language processing techniques.

NLP is a field of computer science concerned with language understanding and language generation between a machine and a human being.

It combines the power of linguistics and computer science to study the rules and structure of language, and create intelligent systems capable of understanding, analyzing, and extracting meaning from text and speech.

It is fundamentally a human-computer interaction mechanism through which our mobile phones and computers have developed the ability to understand what we type or speak and respond with appropriate answers. Their way of engaging still lacks contextual understanding and largely fails to capture human feelings, but with more data at their disposal they are getting better at simulating the human brain's understanding of text- or speech-based natural language.

NLP Subtopics & Components:

NLP can be further categorized into two major streams of text processing:

  • NLU: Natural Language Understanding
  • NLG: Natural Language Generation

NLU is the technique in which the machine reads a given text and tries to comprehend it in order to interpret its meaning.

As per Wikipedia,

Natural-language understanding (NLU) or natural-language interpretation (NLI) is a subtopic of natural-language processing in artificial intelligence that deals with machine reading comprehension. Natural-language understanding is considered an AI-hard problem.

In most text-processing tasks, such as text categorization, analyzing textual content, or aggregating new information, NLU is the very first step.

Some typical real-world applications of NLU:

  • Machine translation
  • Automated reasoning
  • Question answering
  • News-gathering
  • Text categorization
  • Voice-activation
  • Content archiving
  • Large-scale content analysis

NLG converts information from computer databases or semantic intents into readable human language.

In the field of artificial intelligence, NLG is the subtopic of NLP in which machines are trained to generate content as text or speech whenever structured content is fed into them.

“It is the mechanism of producing sentences and phrases out of structured text, simulating the way humans create meaningful sentences and speech.”

NLG can be considered an intelligent software practice that automatically transforms data into plain-English content. The technology can actually tell a story, much as a human analyst would, by writing the sentences and paragraphs for you.

NLG may be viewed as the opposite of natural-language understanding: whereas in natural-language understanding, the system needs to disambiguate the input sentence to produce the machine representation language, in NLG the system needs to make decisions about how to put a concept into words.

NLG generally proceeds in three stages:

1. Text planning: Retrieving the relevant content from the corpus. Here the corpus can comprise vocabulary, sentences, knowledge, sample data, and much more.

2. Sentence planning: Once the content has been extracted from the corpus, choosing the words needed to frame meaningful, grammatically correct sentences.

3. Text realization: Using the output of the previous two stages to generate text that simulates human language.

This ability of AI-powered machines to generate insights at scale from the given data is what makes NLG extremely important and useful. Practical NLG systems span a wide range:

  • Simple template-based systems, such as a mail merge that generates form letters
  • Systems that have a complex understanding of human grammar
  • Statistical models trained with machine learning, typically on a large corpus of human-written texts
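The simplest of these, a template-based system, can be sketched in a few lines of Python. This is only a minimal illustration of the three stages described earlier, not a production approach; the record, the field names, and the function names are all hypothetical.

```python
from string import Template

# Hypothetical structured input, standing in for a database row.
record = {"name": "Asha", "product": "headphones", "days": 3}

def plan_text(data):
    """Text planning: pick the facts worth mentioning."""
    return {key: data[key] for key in ("name", "product", "days")}

def plan_sentence(facts):
    """Sentence planning: choose wording, here singular vs. plural."""
    unit = "day" if facts["days"] == 1 else "days"
    return {**facts, "unit": unit}

def realize(plan):
    """Text realization: fill a fixed template to get the final sentence."""
    template = Template("Hello $name, your $product will arrive in $days $unit.")
    return template.substitute(plan)

def generate(data):
    """Run the three stages in order."""
    return realize(plan_sentence(plan_text(data)))

print(generate(record))
```

A template like this cannot say anything its author did not anticipate, which is why the more advanced systems rely on grammar knowledge or on models trained on large corpora.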

What Are The Key Components of Natural Language Processing?

Natural language processing can be broken down into the following core components:

  • Entity Extraction
  • Syntactical Analysis
  • Semantic Analysis
  • Pragmatic Analysis
  • Sentiment Analysis

Let’s briefly get into each component; we will uncover them in detail in the next part of this NLP series.

Entity extraction consists of identifying the entities in a text and the semantic relationships between two or more of them. Entities can be names, places, organizations, etc., and relationships can be established in a variety of ways.

For example, in the phrase “Pramod lives in Singrauli, UP”, a person (Pramod) is related to a place (Singrauli) by the semantic category “lives in”.

Entity extraction has two important components:

  • Entity type: Person, place, organization, etc.
  • Salience: The importance or centrality of an entity, on a scale of 0 to 1
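To make the idea concrete, here is a toy sketch in Python that finds the entities and the “lives in” relationship in the example sentence above. It assumes tiny hand-written lookup tables (`PEOPLE`, `PLACES`) and a single regular-expression pattern, all hypothetical; real systems learn entity types from data, and salience scoring is left out entirely.

```python
import re

# Hypothetical gazetteers: tiny lookup tables of known entities.
PEOPLE = {"Pramod"}
PLACES = {"Singrauli"}

def extract_entities(text):
    """Return (entity, type) pairs for known names found in the text."""
    entities = []
    for token in re.findall(r"[A-Za-z]+", text):
        if token in PEOPLE:
            entities.append((token, "PERSON"))
        elif token in PLACES:
            entities.append((token, "PLACE"))
    return entities

def extract_lives_in(text):
    """Return (person, 'lives in', place) triples matched by a simple pattern."""
    pattern = re.compile(r"(\w+) lives in (\w+)")
    return [
        (person, "lives in", place)
        for person, place in pattern.findall(text)
        if person in PEOPLE and place in PLACES
    ]

sentence = "Pramod lives in Singrauli, UP"
print(extract_entities(sentence))
print(extract_lives_in(sentence))
```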

Syntactic analysis decodes the syntactic structure of a given sentence to understand its grammar and the relationships between its words.

Syntactic analysis, or parsing, may be defined as the process of analyzing strings of symbols in natural language for conformance to the rules of a formal grammar. The word ‘parsing’ originates from the Latin word ‘pars’, which means ‘part’.

A sentence includes a subject and a predicate, where the subject is a noun phrase and the predicate is a verb phrase.

Let’s take one sentence:

“The cat (noun phrase) went away (verb phrase).”

See how a noun phrase combines with a verb phrase to form a sentence. It is important to note that a sentence can be syntactically correct and still not make sense.

So, basically, in syntactic analysis the words are parsed into ‘parts of speech’ based on the general grammar rules of the language.
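As a rough illustration, the cat sentence can be tagged and split with a hand-written word list. This is a toy sketch: the `LEXICON` table and the "split at the first verb" rule are assumptions made for the example, whereas real parsers use full grammars or trained models.

```python
# Hypothetical mini-lexicon mapping words to parts of speech.
LEXICON = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "dog": "NOUN",
    "went": "VERB", "slept": "VERB",
    "away": "ADV",
}

def tag(sentence):
    """Assign a part of speech to each word, or UNK if the word is unknown."""
    words = sentence.lower().rstrip(".").split()
    return [(word, LEXICON.get(word, "UNK")) for word in words]

def split_phrases(tagged):
    """Split the tagged words into noun phrase and verb phrase at the first verb."""
    for i, (_, pos) in enumerate(tagged):
        if pos == "VERB":
            noun_words = [word for word, _ in tagged[:i]]
            verb_words = [word for word, _ in tagged[i:]]
            return noun_words, verb_words
    raise ValueError("no verb found, so no verb phrase")

tagged = tag("The cat went away.")
noun_phrase, verb_phrase = split_phrases(tagged)
print(tagged)
print(noun_phrase, verb_phrase)
```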

Sometimes the human brain figures out what has been said purely from the hidden meaning and the given context, an unconscious process of understanding language. This ability doesn’t come naturally to computers.

The word “semantic” is a linguistic term and means “related to meaning or logic.”

Semantic analysis is the process of understanding natural language, the way that humans communicate, based on meaning and context.

So, basically, once a sentence has been parsed to extract entities and understand syntax, semantic analysis derives the meaning of the sentence on its own, as an independent sentence without its surrounding context. The meaning inferred this way may not be what the speaker actually intended.

Semantic technology processes the logical structure of sentences to identify the most relevant elements in the text and understand the topic discussed. It also understands the relationships between different concepts in the text.

For example, it understands that a text is about “politics” and “economics” even if it doesn’t contain those actual words, only related concepts such as “election,” “Democrat,” “speaker of the house,” or “budget,” “tax” or “inflation.”
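A very rough way to imitate this behaviour is to map related concepts to topics by hand. The sketch below assumes a hypothetical `TOPIC_CONCEPTS` table built from the single-word examples above (the multi-word phrase “speaker of the house” is left out for simplicity); real semantic technology learns such associations rather than looking them up.

```python
import re
from collections import Counter

# Hypothetical concept lists; a real system would learn these from data.
TOPIC_CONCEPTS = {
    "politics": {"election", "democrat"},
    "economics": {"budget", "tax", "inflation"},
}

def detect_topics(text):
    """Count how many words in the text belong to each topic's concept list."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for topic, concepts in TOPIC_CONCEPTS.items():
        hits = sum(1 for word in words if word in concepts)
        if hits:
            counts[topic] = hits
    return dict(counts)

text = "The Democrat promised a lower tax rate before the election."
print(detect_topics(text))
```

Note that the text is assigned to both topics even though neither “politics” nor “economics” appears in it.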

Sentiment analysis is a machine learning technique that detects polarity (e.g. a positive or negative opinion) in a given sentence. Using it, we try to understand the sentiment behind every sentence.

To understand the sentiment, we rely on two factors:

Polarity: A score from -1 to +1 that rates the sentiment of a given sentence as negative or positive.

Magnitude: A value from 0 to infinity that signifies the weight of the assigned polarity.

In a nutshell, sentiment analysis focuses on polarity (positive, negative, neutral), but also on feelings and emotions (angry, happy, sad, etc.), and even on intentions (e.g. interested vs. not interested).
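Polarity and magnitude can be illustrated with a small word-score lookup. This is a simplified sketch, not how any particular sentiment service computes its scores: the `LEXICON` values and the averaging rule are assumptions chosen so that polarity stays between -1 and +1 while magnitude grows with the amount of emotional wording.

```python
import re

# Hypothetical word scores; real lexicons are far larger.
LEXICON = {
    "good": 1.0, "great": 1.0, "love": 1.0,
    "bad": -1.0, "terrible": -1.0, "hate": -1.0,
}

def analyze(text):
    """Return (polarity, magnitude) for the text using the toy lexicon."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = [LEXICON[word] for word in words if word in LEXICON]
    if not scores:
        return 0.0, 0.0
    polarity = sum(scores) / len(scores)            # average, stays within -1..+1
    magnitude = sum(abs(score) for score in scores)  # total emotional weight, 0 or more
    return polarity, magnitude

print(analyze("I love this phone, the camera is great"))
print(analyze("The screen is great but the battery is terrible"))
```

The second sentence shows why both numbers matter: its polarity averages out to neutral, but its magnitude reveals that strong opinions are present.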

Pragmatic analysis is the most complex part of the NLP analysis process, as it has to deal with the context of the given sentence.

Pragmatic analysis deals with outside world knowledge, which means knowledge that is external to the documents and/or queries.

It reinterprets what was literally said in terms of what was actually meant, drawing on the aspects of language that require heavy real-world knowledge in order to understand the context and thus the meaning.

Pragmatic analysis is intended to analyze a statement in relation to the preceding or succeeding statements, or even the overall paragraph, in order to understand its meaning or context.

“I designed the flower today. But didn’t have the right colors”

In this case, the second sentence is implicitly about the flower. But for a computer to decode what I actually meant by not having the right colors, it has to look at the earlier statement to get the context right.

Applications Of NLP:

Some familiar real-world applications of NLP that we have been privileged to experience are:

  • AI Chatbots: Found in many mobile and web-based applications.
  • Machine Translation: Translates a given text into multiple languages.
  • Auto Spell check: Which you have experienced in Google Docs or any other popular document editor.
  • Auto word suggestion: Another very useful tool for writing documents, where Google Docs automatically suggests the words likely to come next after the previously typed words.
  • Automatic Query answering: Where the machine reads a given passage and attempts to answer a question based on it.
  • Spam Detection in the mail: A very common use case we experience daily, where Google automatically separates authentic mail from spam.
  • Review Analysis On e-Commerce Portal: To understand user sentiment and use it to make better product recommendations to users who visit to shop.
  • Speech recognition: Used in modern voice-based chatbots. Alexa and Cortana are powerful examples of virtual assistants that use speech recognition technology powered by NLP techniques.
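To give a feel for how one of these works, auto word suggestion can be approximated by counting which word most often follows another. The sketch below is a toy bigram counter trained on a made-up four-sentence corpus; it is an assumption for illustration and not how Google Docs or any other product actually implements suggestions.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count, for every word, which words follow it and how often."""
    following = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following

def suggest(following, word, k=2):
    """Return up to k words most frequently seen after the given word."""
    counts = following.get(word.lower(), Counter())
    return [candidate for candidate, _ in counts.most_common(k)]

# Hypothetical training sentences.
corpus = [
    "thank you very much",
    "thank you for coming",
    "see you soon",
    "thank god",
]
model = train_bigrams(corpus)
print(suggest(model, "thank"))
```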

What’s Next In NLP Part-2?

We will look into some practical algorithms and methodologies behind NLP techniques and get our hands dirty performing NLP text pre-processing in a Jupyter notebook using Python. We will cover the following, among others, in more detail:

  • Text cleaning
  • Stemming
  • Lemmatization
  • Tokenization
  • POS: Part Of Speech tagging

No matter how difficult it is to infuse real human emotions into these lifeless computers, they can indeed make our lives super-efficient by saving us time, leaving us more time at our disposal to think and innovate for the larger good of humanity.

Thanks for being there …

Passionate Blogger & Tech Entrepreneur | Founder of FinTech Startup | Write about AIML, DevOps, Product Mgmt & Crypto