The Art of Intelligence

A visual journey into the core concepts of Artificial Intelligence, from ethical considerations to the project lifecycle and the science of data.

Key Topics

  • Foundational Concepts of AI: Understanding human intelligence, decision-making, and the definition of AI is crucial.
  • AI Ethics: Knowledge of ethical concerns like AI bias and data privacy is essential.
  • AI Project Cycle: Mastering the stages of the AI project cycle, including problem scoping, data acquisition, exploration, modeling, and evaluation, is vital.
  • Data Science: Understanding data science concepts, applications, data acquisition, visualization, and exploration, alongside using Python libraries like NumPy, Pandas, and Matplotlib.
  • Computer Vision (CV): Grasping the basics of image representation, feature extraction, object detection, and segmentation in computer vision.
  • Natural Language Processing (NLP): Understanding NLP concepts like text normalization and models such as Bag-of-Words, along with the challenges of machine understanding of human language.
  • Evaluation: Comprehending the role and methods of evaluating AI models, including key terminologies and metrics like accuracy, precision, recall, and the F1 score.
  • Advanced Python: Acquiring introductory Python programming skills, including working with Jupyter Notebook, understanding basic Python programs, and utilizing Python's built-in functions and libraries.

Introduction to Artificial Intelligence (AI)

What is intelligence?

Intelligence is the ‘ability to perceive or infer information, and to retain it as knowledge to be applied towards adaptive behaviours within an environment or context’.

What abilities are involved in intelligence?

The abilities that are involved in intelligence are:

  • Mathematical Logical Reasoning: A person's ability to regulate, measure, and understand numerical symbols, abstraction, and logic.
  • Linguistic Intelligence: Language processing skills, both in understanding language and in expressing it, whether in writing or verbally.
  • Spatial Visual Intelligence: The ability to perceive the visual world and the relationship of one object to another.
  • Kinesthetic Intelligence: The ability to use one's limbs in a skilled manner.
  • Musical Intelligence: A person's ability to recognize and create sounds, rhythms, and sound patterns.
  • Intrapersonal Intelligence: A person's level of self-awareness, from recognizing their own weaknesses and strengths to understanding their own feelings.
  • Existential Intelligence: An additional category of intelligence relating to religious and spiritual awareness.
  • Naturalist Intelligence: An additional category of intelligence relating to the ability to process information on the environment around us.
  • Interpersonal Intelligence: The ability to communicate with others by understanding their feelings and knowing how to influence them.

What is Artificial Intelligence (AI)?

When a machine possesses the ability to mimic human traits, i.e., make decisions, predict the future, learn and improve on its own, it is said to have artificial intelligence. In other words, a machine is artificially intelligent when it can accomplish tasks by itself - collect data, understand it, analyse it, learn from it, and improve it.

How do machines become artificially intelligent?

Machines become intelligent once they are trained with some information which helps them achieve their tasks. AI machines also keep updating their knowledge to optimize their output.

Give some examples of AI applications.

Examples of AI applications include apps that monitor physical and mental health, humanoids like Sophia, biometric security systems, real-time language translators, and weather forecasts.

What differentiates automation from AI?

Any machine that has been trained with data and can make decisions/predictions on its own can be termed AI. A fully automatic washing machine, by contrast, requires human intervention to select the washing parameters and to do the necessary preparation before each wash, which makes it an example of automation, not AI.

What is the relationship between AI, Machine Learning (ML), and Deep Learning (DL)?

Definition:

  • AI: Any technique enabling computers to mimic human intelligence.
  • ML: A subset of AI that enables machines to improve at tasks with experience (data).
  • DL: Enables software to train itself to perform tasks with vast amounts of data. Machines are intelligent enough to develop algorithms for themselves.

Capabilities:

  • AI: Machines can recognize faces, manipulate objects, and understand voice commands.
  • ML: Machines learn by themselves using provided data to make accurate predictions/decisions.
  • DL: Machines train themselves with huge amounts of data and can develop algorithms independently.

Intelligence Level:

  • AI: Covers all concepts and algorithms that mimic human intelligence.
  • ML: Intermediately intelligent.
  • DL: The most advanced form of AI.

Data Requirement:

  • AI: Training data is needed.
  • ML: Requires data to learn and make predictions.
  • DL: Requires vast amounts of data for training.

Relationship:

ML and DL are subsets of AI, but not all Machine Learning is Deep Learning.

Algorithms:

  • AI: The machine thinks algorithmically and intelligently executes what it has been asked to do.
  • DL: Intelligent enough to develop algorithms for themselves.

What are the three main domains of AI based on the type of data used?

  • Data Science: This domain deals with data systems and processes. It involves collecting large amounts of data, maintaining data sets, and deriving meaning from them to make decisions.
  • Computer Vision (CV): This domain enables machines to acquire and analyze visual information and make predictions or decisions based on it. The process includes image acquisition, screening, analysis, and information extraction, helping computers understand visual content. Examples include self-driving cars and facial recognition.
  • Natural Language Processing (NLP): This branch focuses on the interaction between computers and humans using natural language. NLP aims to enable computers to understand and process human languages. Examples include email filters and smart assistants.

What are the ethical concerns in AI?

  • AI Ethics: As AI usage grows, it's crucial to consider ethical practices in AI solution development.
  • Moral Issues: AI developers face moral dilemmas when designing algorithms, as their own moralities can influence machine decision-making.
  • Data Privacy: AI relies heavily on data, raising concerns about how companies collect and use personal data. Users often unknowingly grant apps access to their data.
  • AI Bias: Biases can be unintentionally transferred from developers to AI systems, leading to skewed outcomes. For example, virtual assistants often have female voices. Search results can also reflect biases.
  • AI Access: Unequal access to AI technology can create a gap between those who can afford it and those who cannot.
  • Job Displacement: AI may replace human labor, potentially causing mass unemployment for those in laborious jobs without specialized skills.
  • Over-Reliance on AI: Over-dependence on AI, especially in education, can hinder the development of critical thinking skills in children.

What are the applications of AI around us?

Artificial Intelligence (AI) is integrated into many aspects of daily life:

  • Search Engines: Google uses AI to provide accurate search results quickly and suggests or auto-corrects typed sentences.
  • Voice Assistants: AI powers voice assistants like Alexa, Google Assistant, Cortana, and Siri.
  • Navigation: Apps such as UBER and Google Maps use AI to provide directions.
  • Gaming: AI enhances graphics, creates new difficulty levels, and encourages gamers.
  • Recommendation Systems: Platforms like Netflix, Amazon, and Spotify use AI to provide personalized recommendations based on user preferences.
  • Social Media: AI is used in platforms like Facebook and Instagram to connect users and provide customized notifications.
  • Health Monitoring: AI is used in chatbots and health apps to monitor users' physical and mental health.
  • Humanoids: AI is present in humanoids like Sophia.
  • Security: AI is used in biometric security systems like face locks on phones.
  • Language Translation: AI enables real-time language translation.

The AI Project Cycle

What are the stages of the AI Project Cycle?

The AI Project Cycle mainly has five stages:

  1. Problem Scoping
  2. Data Acquisition
  3. Data Exploration
  4. Modelling
  5. Evaluation

Describe the AI Project Cycle and explain the purpose of each stage.

The AI Project Cycle provides an appropriate framework which can lead towards the goal of developing an AI project. The AI Project Cycle has five stages:

  1. Problem Scoping: This is the initial stage where the goal for the AI project is set by stating the problem that you wish to solve with it. It involves looking at various parameters which affect the problem to get a clearer picture. This stage sets the foundation and direction for the entire project.
  2. Data Acquisition: This stage is about acquiring data for the project. Data can be a piece of information or facts and statistics collected together for reference or analysis. The data acquired will become the base of your project, as it will help in understanding the parameters related to the problem you scoped. For any AI project to be efficient, the training data should be authentic and relevant to the problem statement scoped.
  3. Data Exploration: In this stage, you represent the data visually using different types of representations like graphs, databases, flow charts, maps, etc. This makes it easier to interpret the patterns which your acquired data follows. To analyse the data, you need to visualise it in some user-friendly format so that you can quickly get a sense of the trends, relationships and patterns contained within the data.
  4. Modelling: After exploring the patterns, you can decide upon the type of model you would build to achieve the goal. The ability to mathematically describe the relationship between parameters is the heart of every AI model.
  5. Evaluation: Once the modelling is complete, you need to test your model on some newly fetched data. The results will help you in evaluating your model and improving it. This stage helps in understanding the reliability of an AI model: a test dataset is fed into the model and its outputs are compared with the actual answers.

How does the 4Ws Problem Canvas help in Problem Scoping?

The 4Ws Problem Canvas helps in identifying the key elements related to the problem. It helps to have a deeper understanding around a problem so that the picture becomes clearer while working to solve it. The 4Ws stand for:

  • Who: This block helps in analyzing the people getting affected directly or indirectly due to the problem. It identifies the stakeholders and what is known about them. Stakeholders are the people who face this problem and would benefit from the solution.
  • What: This block determines the nature of the problem. It identifies what the problem is and how you know that it is a problem. It also involves gathering evidence to prove that the problem you have selected actually exists.
  • Where: This block focuses on the context/situation/location of the problem. It helps look into the situation in which the problem arises, the context of it, and the locations where it is prominent.
  • Why: This block focuses on the benefits which the stakeholders would get from the solution and how it will benefit them as well as the society. It helps to understand why you want to solve this problem.

What are some sources for Data Acquisition?

There can be various ways in which you can collect data. Some of them are:

  • Surveys
  • Web Scraping
  • Sensors
  • Cameras
  • Observations
  • API (Application Program Interface)
  • Open-sourced Government Portals
  • Reliable Websites (e.g., Kaggle)
  • Interviews
  • World Organizations’ open-sourced statistical websites

What considerations should be made during Data Acquisition?

While accessing data from any of the data sources, the following points should be kept in mind:

  • Data which is available for public usage only should be taken up.
  • Personal datasets should only be used with the consent of the owner.
  • One should never breach someone’s privacy to collect data.
  • Data should only be taken from reliable sources as the data collected from random sources can be wrong or unusable.
  • Reliable sources of data ensure the authenticity of data which helps in proper training of the AI model.

Explain Data Exploration and how visual representations aid in understanding data patterns.

To analyse the data, we need to visualise it in some user-friendly format so that we can:

  1. Quickly get a sense of the trends, relationships and patterns contained within the data.
  2. Define strategy for which model to use at a later stage.
  3. Communicate the same to others effectively.

To visualise data, various types of visual representations can be used like histogram, pie chart, bar graph, line graph etc.
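
The sketch below shows how such visual representations could be produced with the Matplotlib library mentioned earlier; the monthly sales figures and labels are invented purely for illustration.

```python
# A minimal data exploration sketch using Matplotlib; all values are hypothetical.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May"]
sales = [120, 135, 158, 149, 170]     # invented figures for illustration

plt.bar(months, sales)                # bar graph: compare values across categories
plt.title("Monthly Sales")
plt.xlabel("Month")
plt.ylabel("Units Sold")
plt.show()

plt.plot(months, sales, marker="o")   # line graph: spot the trend over time
plt.title("Sales Trend")
plt.xlabel("Month")
plt.ylabel("Units Sold")
plt.show()
```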

Compare and contrast the Rule-Based Approach and Learning-Based Approach to AI modeling.

  • Rule-Based Approach: This approach involves AI modeling where the rules are defined by the developer. The machine follows the rules or instructions mentioned by the developer and performs its task accordingly. The learning is static: the machine, once trained, does not take into account any changes made to the original training dataset, and the model cannot improve itself based on feedback once trained.
  • Learning-Based Approach: This approach involves AI modeling where the machine learns by itself. The AI model gets trained on the data fed to it and can design a model adaptive to the change in data. The model would modify itself according to the changes that occur in the data so that all the exceptions are handled.

Describe the Learning Based Approach and its types: Supervised and Unsupervised Learning.

The Learning Based Approach refers to AI modelling where the machine learns by itself. The AI model gets trained on the data fed to it and can design a model which is adaptive to the change in data. The model would modify itself according to the changes which occur in the data so that all the exceptions are handled.

Types of Learning-Based Approaches:

  • Supervised Learning: In a supervised learning model, the dataset which is fed to the machine is labelled. The dataset is known to the person who is training the machine, so they are able to label the data. It includes two model types:
    • Classification: Where the data is classified according to the labels. This model works on a discrete dataset.
    • Regression: Such models work on continuous data.
  • Unsupervised Learning: An unsupervised learning model works on an unlabelled dataset. This means that the data fed to the machine is random, and the person training the model does not label it. Unsupervised learning models are used to identify relationships, patterns, and trends in the data fed to them. (A short code sketch of both approaches follows this list.)
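
As a rough illustration of the two approaches, the sketch below uses the scikit-learn library (not introduced in this text); the tiny dataset, labels, and parameters are assumptions made only for demonstration.

```python
# A minimal sketch contrasting supervised and unsupervised learning with scikit-learn.
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

# Supervised learning (classification): the dataset is labelled, so the model
# learns the mapping from features to labels.
X = [[1, 1], [1, 2], [8, 8], [9, 8]]   # invented feature points
y = [0, 0, 1, 1]                       # labels supplied by the person training the model
clf = DecisionTreeClassifier()
clf.fit(X, y)
print(clf.predict([[2, 1]]))           # -> [0]

# Unsupervised learning (clustering): the same points without labels; the model
# groups them purely from the patterns it finds.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
print(km.fit_predict(X))               # two clusters, e.g. [0 0 1 1]
```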

Discuss the importance of the Evaluation stage and explain key evaluation metrics.

Once a model has been made and trained, it needs to go through proper testing so that one can calculate the efficiency and performance of the model. Key evaluation metrics include:

  • Accuracy: Defined as the percentage of correct predictions out of all the observations.
  • Precision: Defined as the percentage of true positive cases versus all the cases where the prediction is true. It takes into account the True Positives and False Positives.
  • Recall: Defined as the fraction of positive cases that are correctly identified. It considers True Positives and False Negatives.
  • F1 Score: Defined as the measure of balance between precision and recall.
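
As a small worked example, the sketch below computes these four metrics from hypothetical counts of True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN); the counts themselves are invented.

```python
# A minimal sketch of the four evaluation metrics; TP/TN/FP/FN are invented counts.
TP, TN, FP, FN = 40, 45, 5, 10

accuracy  = (TP + TN) / (TP + TN + FP + FN)   # correct predictions out of all observations
precision = TP / (TP + FP)                    # true positives out of all positive predictions
recall    = TP / (TP + FN)                    # true positives out of all actual positive cases
f1_score  = 2 * precision * recall / (precision + recall)  # balance of precision and recall

print(accuracy, precision, recall, f1_score)  # 0.85, ~0.889, 0.8, ~0.842
```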

How do you create a Confusion Matrix?

The result of comparing the prediction with reality is recorded in the confusion matrix. It maps each prediction to the corresponding reality so that the four possible outcomes, True Positive, True Negative, False Positive and False Negative, can be counted. The confusion matrix is not an evaluation metric in itself; it is a record that helps in evaluation.
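
The sketch below shows one way such a record could be built, assuming the scikit-learn library is available; the prediction and reality lists are invented for illustration.

```python
# A minimal confusion matrix sketch; the data is hypothetical.
from sklearn.metrics import confusion_matrix

reality    = [1, 0, 1, 1, 0, 0, 1, 0]   # what actually happened
prediction = [1, 0, 0, 1, 0, 1, 1, 0]   # what the model predicted

# With labels 0 and 1, rows correspond to reality and columns to prediction:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(reality, prediction))   # -> [[3 1]
                                               #     [1 3]]
```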

How do you choose between precision and recall?

Choosing between Precision and Recall depends on the condition in which the model has been deployed. In a case like Forest Fire, a False Negative can cost a lot and is risky too. If no alert is given even when there is a Forest Fire, the whole forest might burn down. In that case, Recall is more important. On the other hand, if a model is used to predict the traffic, Precision is preferred over Recall so that the model generates a very low number of false alarms.

Data Science & Python

What is Data Science?

Data science unifies statistics, data analysis, and machine learning to understand phenomena with data. Applications include fraud detection, genomics, internet search, targeted advertising, and optimizing airline operations.

What are data features?

Data features refer to the type of data you want to collect. To determine the data features, look at your problem statement and identify the data required to address the issue. For example, if predicting employee salary, data features might include salary amount, increment percentage, increment period, and bonus.
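
A minimal sketch of how those data features might be organised with the Pandas library is shown below; the column names and values are assumptions made only for illustration.

```python
# Hypothetical salary-prediction data features collected into a Pandas DataFrame.
import pandas as pd

data = pd.DataFrame({
    "salary_amount":           [30000, 45000, 52000],
    "increment_percentage":    [10, 8, 12],
    "increment_period_months": [12, 12, 6],
    "bonus":                   [2000, 3000, 5000],
})
print(data.head())   # a quick look at the acquired features
```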

What are the differences between NumPy arrays and Python lists?

  • NumPy Arrays: A homogeneous collection of data (all elements of one type); cannot be initialized directly and must be created through the NumPy package (e.g., with np.array()); direct numerical (element-wise) operations can be performed; take less memory space.
  • Python Lists: A heterogeneous collection of data (elements of multiple types allowed); can be initialized directly; direct numerical operations are not possible; take more memory space. A short comparison sketch follows below.
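
A minimal sketch of this difference, with invented values:

```python
# Lists versus NumPy arrays: direct numerical operations only work on the array.
import numpy as np

py_list  = [1, 2, 3, 4]             # created directly; may mix data types
np_array = np.array([1, 2, 3, 4])   # created through NumPy; one data type only

print(np_array * 2)   # [2 4 6 8]                 element-wise multiplication
print(py_list * 2)    # [1, 2, 3, 4, 1, 2, 3, 4]  repetition, not arithmetic
```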

What are common data errors?

Types of data errors include:

  • Erroneous Data: Incorrect values and invalid/Null values.
  • Missing Data: Some cells remain empty.
  • Outliers: Data which does not fall within the expected range of a certain element. (A short detection sketch follows this list.)
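
A minimal sketch of spotting missing values and outliers with Pandas; the small marks dataset and the 0-100 range are assumptions for illustration only.

```python
# Checking a hypothetical column of exam marks for data errors.
import pandas as pd

marks = pd.Series([56, 61, None, 58, 950])   # one missing cell and one outlier

print(marks.isnull().sum())   # number of missing cells -> 1
print(marks[marks > 100])     # values outside the expected 0-100 range -> 950.0
```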

What are the basic concepts of statistics?

Common statistical methods include Mean, Median, Mode, Standard Deviation, and Variance. Python packages like NumPy (together with the built-in statistics module) have pre-defined functions to calculate these.
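
A minimal sketch of these calculations, assuming NumPy and the built-in statistics module; the numbers are invented:

```python
# Basic statistics on a hypothetical list of values.
import numpy as np
import statistics

values = [4, 8, 8, 15, 21]

print(np.mean(values))           # Mean
print(np.median(values))         # Median
print(statistics.mode(values))   # Mode (NumPy itself has no mode function)
print(np.std(values))            # Standard Deviation
print(np.var(values))            # Variance
```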

What is the K-Nearest Neighbour (KNN) algorithm?

The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm that can be used to solve both classification and regression problems. The KNN algorithm assumes that similar things exist in close proximity. The KNN prediction model relies on the surrounding points or neighbors to determine its class or group.
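
A minimal KNN sketch using the scikit-learn library (an assumption, as the text does not prescribe a library); the (height, weight) points and labels are invented for illustration.

```python
# Classifying a new point by its 3 nearest neighbours.
from sklearn.neighbors import KNeighborsClassifier

X = [[150, 45], [155, 50], [180, 80], [185, 85]]   # hypothetical (height_cm, weight_kg) points
y = ["small", "small", "large", "large"]           # their known labels

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)
print(knn.predict([[152, 48]]))   # majority of the 3 nearest neighbours -> ['small']
```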

What are the different types of data formats?

  • CSV: CSV stands for comma separated values. It is a simple file format used to store tabular data. Each line of this file is a data record and each record consists of one or more fields which are separated by commas.
  • Spreadsheet: A spreadsheet is a document used for accounting and recording data in rows and columns into which information can be entered. Microsoft Excel is a program which helps in creating spreadsheets.
  • SQL: SQL (Structured Query Language) is a domain-specific programming language designed for managing data held in different kinds of DBMS (Database Management System). It is particularly useful in handling structured data.
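
A minimal sketch of loading each format with Pandas; the file names, table name, and the use of SQLite are placeholders chosen only for illustration.

```python
# Reading a CSV file, a spreadsheet, and an SQL table into DataFrames.
import sqlite3
import pandas as pd

csv_data   = pd.read_csv("records.csv")       # CSV: comma-separated records
excel_data = pd.read_excel("records.xlsx")    # spreadsheet (requires the openpyxl package)

conn = sqlite3.connect("records.db")          # an SQL database file
sql_data = pd.read_sql_query("SELECT * FROM records", conn)
```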