Classification and Clustering


1. Instead of representing knowledge in a relatively declarative, static way (as a bunch of things that are true), rule-based systems represent knowledge in terms of ___________ that tell you what you should do or what you could conclude in different situations.
a) Raw Text
b) A bunch of rules
c) Summarized Text
d) Collection of various Texts
Answer: b
Explanation: None.

2. A rule-based system consists of a bunch of IF-THEN rules.
a) True
b) False
Answer: a
Explanation: None.

3. In a backward chaining system you start with the initial facts, and keep using the rules to draw new conclusions (or take certain actions) given those facts.
a) True
b) False
Answer: b
Explanation: The statement describes forward chaining. A backward chaining system starts from a hypothesis (goal) and works backward through the rules.

4. In a backward chaining system, you start with some hypothesis (or goal) you are trying to prove, and keep looking for rules that would allow you to conclude that hypothesis, perhaps setting new sub-goals to prove as you go.
a) True
b) False
Answer: a
Explanation: None.

5. Forward chaining systems are _____________, whereas backward chaining systems are ___________.
a) Goal-driven, goal-driven
b) Goal-driven, data-driven
c) Data-driven, goal-driven
d) Data-driven, data-driven
Answer: c
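Explanation: Forward chaining starts from the known facts (data) and derives new conclusions from them. Backward chaining starts from the goal and searches for rules and facts that support it.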
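To make the data-driven versus goal-driven contrast concrete, here is a minimal Python sketch over propositional IF-THEN rules, where each rule is a Horn clause with a single conclusion. The rule base and the function names `forward_chain` and `backward_chain` are hypothetical and chosen only for illustration. Real rule engines add conflict resolution, variables, and unification.

```python
# Each rule is (premises, conclusion): IF all premises hold THEN conclude.
# This toy rule base is made up for illustration only.
RULES = [
    ({"has_feathers"}, "is_bird"),
    ({"is_bird", "can_fly"}, "can_migrate"),
    ({"is_bird", "lays_eggs"}, "is_oviparous"),
]


def forward_chain(facts, rules):
    """Data-driven: keep firing rules whose premises are all known until nothing new appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal, facts, rules, seen=None):
    """Goal-driven: prove `goal` by finding a rule that concludes it and
    recursively proving that rule's premises as sub-goals."""
    seen = set() if seen is None else seen
    if goal in facts:
        return True
    if goal in seen:  # guard against looping on circular rules
        return False
    seen = seen | {goal}
    return any(
        conclusion == goal
        and all(backward_chain(p, facts, rules, seen) for p in premises)
        for premises, conclusion in rules
    )


facts = {"has_feathers", "can_fly"}
print(sorted(forward_chain(facts, RULES)))  # ['can_fly', 'can_migrate', 'has_feathers', 'is_bird']
print(backward_chain("can_migrate", facts, RULES))   # True
print(backward_chain("is_oviparous", facts, RULES))  # False
```

Forward chaining derives everything reachable from the facts. Backward chaining only explores rules relevant to the goal it was asked about.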

6. A Horn clause is a clause with _______ positive literal.
a) At least one
b) At most one
c) None
d) All
Answer: b
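Explanation: A Horn clause is a disjunction of literals with at most one positive literal. For example, ¬P ∨ ¬Q ∨ R is equivalent to the rule (P ∧ Q) ⇒ R.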

7. ___________ trees can be used to infer in Horn clause systems.
a) Min/Max Tree
b) And/Or Trees
c) Minimum Spanning Trees
d) Binary Search Trees
Answer: b
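Explanation: Use the analogy with min/max trees in game theory. An OR node succeeds if any one of its children succeeds, like a max node. An AND node succeeds only if all of its children succeed, like a min node.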

8. An expert system is a computer program that contains some of the subject-specific knowledge of one or more human experts.
a) True
b) False
Answer: a
Explanation: None.

9. A knowledge engineer has the job of extracting knowledge from an expert and building the expert system knowledge base.
a) True
b) False
Answer: a
Explanation: None.

10. What is needed to make probabilistic systems feasible in the world?
a) Reliability
b) Crucial robustness
c) Feasibility
d) None of the above
Answer: b
Explanation: Model-based knowledge provides the crucial robustness needed to make probabilistic systems feasible in the real world.

11. How many terms are required for building a bayes model?
a) 1
b) 2
c) 3
d) 4
Answer: c
Explanation: The three required terms are one conditional probability and two unconditional probabilities: P(b | a) = P(a | b) P(b) / P(a).
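As a small worked example of the three terms, the sketch below applies Bayes' rule with illustrative, made-up probabilities. `bayes_rule` is a hypothetical helper name.

```python
def bayes_rule(p_e_given_h, p_h, p_e):
    """Return P(h | e) from one conditional and two unconditional probabilities."""
    if not 0 < p_e <= 1:
        raise ValueError("P(e) must be in (0, 1]")
    return p_e_given_h * p_h / p_e


# Illustrative numbers: P(symptom | disease), P(disease), P(symptom).
posterior = bayes_rule(p_e_given_h=0.7, p_h=1 / 50000, p_e=0.01)
print(round(posterior, 6))  # 0.0014
```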

12. What is needed to make probabilistic systems feasible in the world?
a) Reliability
b) Crucial robustness
c) Feasibility
d) None of the mentioned
Answer: b
Explanation: Model-based knowledge provides the crucial robustness needed to make probabilistic systems feasible in the real world.

13. Where can Bayes' rule be used?
a) Solving queries
b) Increasing complexity
c) Decreasing complexity
d) Answering probabilistic query
Answer: d
Explanation: Bayes' rule can be used to answer probabilistic queries conditioned on one piece of evidence.

14. What does a Bayesian network provide?
a) Complete description of the domain
b) Partial description of the domain
c) Complete description of the problem
d) None of the mentioned
Answer: a
Explanation: A Bayesian network provides a complete description of the domain.

15. How can the entries in the full joint probability distribution be calculated?
a) Using variables
b) Using information
c) Both Using variables & information
d) None of the mentioned
Answer: b
Explanation: Every entry in the full joint probability distribution can be calculated from the information in the network.
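Concretely, each entry of the full joint distribution is the product of the local conditional probabilities stored in the network. The general factorization and its instance for a small hypothetical network, Rain → WetGrass ← Sprinkler, are:

```latex
P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P\bigl(x_i \mid \mathrm{parents}(X_i)\bigr)
\qquad\Longrightarrow\qquad
P(r, s, w) = P(r)\, P(s)\, P(w \mid r, s)
```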

16. How can a Bayesian network be used to answer any query?
a) Full distribution
b) Joint distribution
c) Partial distribution
d) All of the mentioned
Answer: b
Explanation: If a Bayesian network is a representation of the joint distribution, then it can answer any query by summing all the relevant joint entries.
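Here is a minimal sketch of inference by enumeration. It assumes a hypothetical three-node network (Rain → WetGrass ← Sprinkler) with made-up conditional probabilities.

- `joint` builds one entry of the full joint distribution from the network's tables.
- `query` answers P(var = True | evidence) by summing every joint entry consistent with the evidence and normalizing.

Enumeration is exponential in the number of variables, so this only illustrates the semantics.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler (all Boolean, made-up numbers).
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: 0.1, False: 0.9}
P_WET_GIVEN = {  # P(WetGrass = True | Rain, Sprinkler)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.9, (False, False): 0.0,
}


def joint(rain, sprinkler, wet):
    """One full-joint entry: the product of each node's CPT entry given its parents."""
    p_wet = P_WET_GIVEN[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1 - p_wet)


def query(var, evidence):
    """P(var = True | evidence), summing all joint entries consistent with the evidence."""
    names = ("rain", "sprinkler", "wet")
    totals = {True: 0.0, False: 0.0}
    for values in product((True, False), repeat=len(names)):
        world = dict(zip(names, values))
        if all(world[k] == v for k, v in evidence.items()):
            totals[world[var]] += joint(**world)
    return totals[True] / (totals[True] + totals[False])


print(round(query("rain", {"wet": True}), 4))  # 0.7163
```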

17. How can the compactness of a Bayesian network be described?
a) Locally structured
b) Fully structured
c) Partial structure
d) All of the mentioned
Answer: a
Explanation: The compactness of a Bayesian network is an example of a very general property of locally structured systems.

18. With which kind of complexity growth is local structure usually associated?
a) Hybrid
b) Dependent
c) Linear
d) None of the mentioned
Answer: c
Explanation: Local structure is usually associated with linear rather than exponential growth in complexity.
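The contrast between linear and exponential growth can be made concrete by counting parameters. This assumes all variables are Boolean and each node has at most k parents, so each conditional probability table needs at most 2^k numbers. The helper names are hypothetical.

```python
def bn_parameters(n, k):
    """Upper bound on CPT numbers for n Boolean nodes with at most k parents each."""
    return n * 2 ** k


def full_joint_parameters(n):
    """Independent numbers needed for a full joint table over n Boolean variables."""
    return 2 ** n - 1


print(bn_parameters(30, 5))       # 960
print(full_joint_parameters(30))  # 1073741823
```

With k fixed, the network grows linearly in n, while the full joint table grows exponentially.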

19. What is a network called in which every variable is directly influenced by all the others?
a) Partially connected
b) Fully connected
c) Local connected
d) None of the mentioned
Answer: b
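Explanation: When every variable is directly influenced by all the others, the network is fully connected, and specifying its conditional probability tables requires as much information as the full joint distribution.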

20. When constructing a Bayesian network, how is a node related to its other predecessors in the node ordering, given its parents?
a) Functionally dependent
b) Dependent
c) Conditionally independent
d) Both Conditionally dependent & Dependent
Answer: c
Explanation: The semantics used to derive a method for constructing Bayesian networks lead to the consequence that a node is conditionally independent of its other predecessors, given its parents.

21. Which algorithm is used for solving temporal probabilistic reasoning?
a) Hill-climbing search
b) Hidden markov model
c) Depth-first search
d) Breadth-first search
Answer: b
Explanation: The hidden Markov model is used for temporal probabilistic reasoning. It builds on inference algorithms that are independent of the particular form of the transition and sensor models.

22. How is the state of the process described in an HMM?
a) Literal
b) Single random variable
c) Single discrete random variable
d) None of the mentioned
Answer: c
Explanation: An HMM is a temporal probabilistic model in which the state of the process is described by a single discrete random variable.

23. In an HMM, what are the possible values of the state variable?
a) Variables
b) Literals
c) Discrete variable
d) Possible states of the world
Answer: d
Explanation: The possible values of the state variable are the possible states of the world.

24. Where can additional state variables be added while staying within the HMM framework?
a) Temporal model
b) Reality model
c) Probability model
d) All of the mentioned
Answer: a
Explanation: Additional state variables can be added to a temporal model while staying within the HMM framework.

25. Which allows for a simple and elegant matrix implementation of all the basic algorithms?
a) HMM
b) Restricted structure of HMM
c) Temporal model
d) Reality model
Answer: b
Explanation: The restricted structure of the HMM allows for a very simple and elegant matrix implementation of all the basic algorithms.
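As an illustration of the matrix form, here is a minimal NumPy sketch of HMM filtering (the forward algorithm) using the update f_{1:t+1} = α O_{t+1} Tᵀ f_{1:t}. Tᵀ is the transposed transition matrix, O is a diagonal matrix of sensor probabilities, and α normalizes. The two-state "rain / umbrella" model and its numbers are assumptions made for this example.

```python
import numpy as np

# Hypothetical two-state HMM: state 0 = "rain", state 1 = "no rain".
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])          # T[i, j] = P(X_{t+1} = j | X_t = i)
P_UMBRELLA = np.array([0.9, 0.2])   # P(umbrella seen | X_t = i)


def sensor_matrix(umbrella_seen):
    """Diagonal matrix O_t with P(e_t | X_t = i) on the diagonal."""
    return np.diag(P_UMBRELLA if umbrella_seen else 1 - P_UMBRELLA)


def forward(prior, observations):
    """Filtering: repeatedly apply f <- alpha * O_t @ T.T @ f."""
    f = np.asarray(prior, dtype=float)
    for e in observations:
        f = sensor_matrix(e) @ T.T @ f
        f = f / f.sum()  # alpha: normalize to a probability distribution
    return f


print(np.round(forward([0.5, 0.5], [True, True]), 3))  # [0.883 0.117]
```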

26. Where is the hidden Markov model used?
a) Speech recognition
b) Understanding of real world
c) Both Speech recognition & Understanding of real world
d) None of the mentioned
Answer: a
Explanation: None.

27. Which kind of variable gives concrete form to the representation of the transition model?
a) Single variable
b) Discrete state variable
c) Random variable
d) Both Single & Discrete state variable
Answer: d
Explanation: With a single, discrete state variable, we can give concrete form to the representation of the transition model.

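28. Which algorithm works by first running the standard forward pass to compute the final forward message, forgetting the intermediate results?
a) Smoothing
b) Modified smoothing
c) HMM
d) Depth-first search algorithm
Answer: b
Explanation: The modified smoothing algorithm works by first running the standard forward pass to compute the final forward message (forgetting the intermediate results). It then runs the backward pass for the backward and forward messages together.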
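For comparison, here is a sketch of the standard forward-backward smoothing algorithm in matrix form. It stores every forward message, so it is not the constant-space modified variant. It then combines each forward message with a backward message b, updated as b = T O b. The two-state model and its numbers are assumptions made for this example.

```python
import numpy as np

# Hypothetical two-state HMM: state 0 = "rain", state 1 = "no rain".
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])          # T[i, j] = P(X_{t+1} = j | X_t = i)
P_UMBRELLA = np.array([0.9, 0.2])   # P(umbrella seen | X_t = i)


def sensor_matrix(umbrella_seen):
    """Diagonal matrix O_t with P(e_t | X_t = i) on the diagonal."""
    return np.diag(P_UMBRELLA if umbrella_seen else 1 - P_UMBRELLA)


def forward_backward(prior, evidence):
    """Smoothed estimates P(X_k | e_{1:t}) for every time step k."""
    # Forward pass: keep every normalized forward message.
    f = np.asarray(prior, dtype=float)
    forward_msgs = []
    for e in evidence:
        f = sensor_matrix(e) @ T.T @ f
        f = f / f.sum()
        forward_msgs.append(f)
    # Backward pass: b starts as all ones and absorbs evidence from later steps.
    b = np.ones(len(f))
    smoothed = [None] * len(evidence)
    for k in reversed(range(len(evidence))):
        s = forward_msgs[k] * b
        smoothed[k] = s / s.sum()
        b = T @ sensor_matrix(evidence[k]) @ b
    return smoothed


estimates = forward_backward([0.5, 0.5], [True, True])
print(round(float(estimates[0][0]), 3))  # 0.883
```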

29. Which reveals an improvement in online smoothing?
a) Matrix formulation
b) Revelation
c) HMM
d) None of the mentioned
Answer: a
Explanation: Matrix formulation reveals an improvement in online smoothing with a fixed lag.

30. Which suggests the existence of an efficient recursive algorithm for online smoothing?
a) Matrix
b) Constant space
c) Constant time
d) None of the mentioned
Answer: b
Explanation: None.

31. The Expectation Maximization algorithm has been used to identify conserved domains in unaligned proteins only.
a) True
b) False
Answer: b
Explanation: This algorithm has been used to identify both conserved domains in unaligned proteins and protein-binding sites in unaligned DNA sequences (Lawrence and Reilly 1990), including sites that may include gaps (Cardon and Stormo 1992). The input is a set of sequences that are expected to share a common sequence pattern that may not be easily recognizable by eye.

32. Which of the following is untrue regarding the Expectation Maximization algorithm?
a) An initial guess is made as to the location and size of the site of interest in each of the sequences, and these parts of the sequence are aligned
b) The alignment provides an estimate of the base or amino acid composition of each column in the site
c) The column-by-column composition of the site already available is used to estimate the probability of finding the site at any position in each of the sequences
d) The row-by-column composition of the site already available is used to estimate the probability
Answer: d
Explanation: The EM algorithm then consists of two steps, which are repeated consecutively. In step 1, the expectation step, the column-by-column composition of the site already available is used to estimate the probability of finding the site at any position in each of the sequences. These probabilities are used in turn to provide new information as to the expected base or amino acid distribution for each column in the site.

33. Of the two repeated steps in the EM algorithm, step 2 is ________.
a) the maximization step
b) the minimization step
c) the optimization step
d) the normalization step
Answer: a
Explanation: In step 2, the maximization step, the new counts of bases or amino acids for each position in the site found in step 1 are substituted for the previous set. Step 1 is then repeated using these new counts. The cycle is repeated until the algorithm converges on a solution and does not change with further cycles. At that time, the best location of the site in each sequence and the best estimate of the residue composition of each column in the site will be available.
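The two-step cycle can be sketched in Python under simplifying assumptions:

- exactly one site per sequence;
- a uniform background;
- a fixed site width;
- a fixed number of iterations instead of a convergence test.

`em_motif`, `profile`, and `site_odds` are hypothetical names. This is a toy illustration of the EM idea, not the MEME program.

```python
import math

ALPHABET = "ACGT"
BACKGROUND = 0.25  # assumed uniform background frequency for every base


def profile(counts, pseudo):
    """Maximization: turn (expected) per-column base counts into column frequencies."""
    pwm = []
    for col in counts:
        total = sum(col.values()) + pseudo * len(ALPHABET)
        pwm.append({b: (col.get(b, 0.0) + pseudo) / total for b in ALPHABET})
    return pwm


def site_odds(seq, pwm):
    """Odds (site vs. background) of a site starting at each possible position of seq."""
    w = len(pwm)
    return [math.prod(pwm[i][seq[p + i]] / BACKGROUND for i in range(w))
            for p in range(len(seq) - w + 1)]


def em_motif(seqs, width, starts, iterations=20, pseudo=0.1):
    """`starts` is the initial guess (0-based) of where the site begins in each sequence."""
    # Initial guess: align the guessed sites and count the bases in each column.
    counts = [{} for _ in range(width)]
    for seq, st in zip(seqs, starts):
        for i in range(width):
            counts[i][seq[st + i]] = counts[i].get(seq[st + i], 0.0) + 1.0
    for _ in range(iterations):
        pwm = profile(counts, pseudo)
        counts = [{} for _ in range(width)]
        for seq in seqs:
            odds = site_odds(seq, pwm)
            total = sum(odds)
            # Expectation: probability that the site starts at p, used as a weight
            # when accumulating the expected base counts for the next profile.
            for p, o in enumerate(odds):
                for i in range(width):
                    counts[i][seq[p + i]] = counts[i].get(seq[p + i], 0.0) + o / total
    pwm = profile(counts, pseudo)
    best = [max(range(len(s) - width + 1), key=site_odds(s, pwm).__getitem__) for s in seqs]
    return pwm, best


seqs = ["CCCGATTACACCC", "CCCCCGATTACAC", "GATTACACCCCCC"]
pwm, best = em_motif(seqs, width=7, starts=[2, 5, 0])  # first guess is off by one
print(best)  # [3, 5, 0]
```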

34. As an example of the EM algorithm, suppose that there are 10 DNA sequences with very little similarity to each other. Each is about 100 nucleotides long and, based on biochemical and genetic evidence, is thought to contain a binding site near the middle 20 residues. The EM algorithm would then be used to find the most probable location of the binding site in each of the ______ sequences.
a) 30
b) 10
c) 25
d) 20
Answer: b
Explanation: With the EM program MEME, the following do not necessarily have to be known: the size and number of binding sites, their location in each sequence, and whether or not the site is present in each sequence. In the present example, the EM algorithm would find the most probable location of the binding site in each of the 10 sequences.

35. In the initial step of the EM algorithm, the 20-residue-long binding motif patterns in each sequence are aligned as an initial guess of the motif.
a) True
b) False
Answer: a
Explanation: The base composition of each column in the aligned patterns is then determined. The composition of the flanking sequence on each side of the site provides the surrounding base or amino acid composition for comparison. Each sequence is assumed to be the same length and to be aligned by the ends.

36. In the intermediate steps of the EM algorithm, the number of each base in each column is determined and then converted to fractions.
a) True
b) False
Answer: a
Explanation: For example, if there are four Gs in the first column of the 10 sequences, then the frequency of G in the first column of the site is fSG = 4/10 = 0.4. This procedure is repeated for each base and each column.
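The counting-and-dividing step can be sketched as follows. The ten 4-residue sites are made-up data, chosen so that the first column contains four Gs. `column_frequencies` is a hypothetical helper.

```python
from collections import Counter


def column_frequencies(aligned_sites, alphabet="ACGT"):
    """One {base: fraction} dict per column of equally long aligned sites."""
    n = len(aligned_sites)
    width = len(aligned_sites[0])
    freqs = []
    for col in range(width):
        counts = Counter(site[col] for site in aligned_sites)
        freqs.append({base: counts[base] / n for base in alphabet})
    return freqs


sites = ["GACT", "GTCA", "GACT", "GACG", "AACT",
         "TACT", "CACT", "AGCT", "TACA", "CACT"]
print(column_frequencies(sites)[0]["G"])  # 0.4
```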

37. For the 100-residue DNA sequences in this example, there are _______ possible starting sites for a 20-residue-long site.
a) 30
b) 21
c) 81
d) 60
Answer: c
Explanation: For the 100-residue DNA sequences in this example, there are 100 - 20 + 1 = 81 possible starting sites for a 20-residue-long site. The first begins at position 1 and ends at position 20. The last begins at position 81 and ends at position 100; there is not enough sequence for a 20-residue-long site to begin beyond position 81.
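The count follows from listing every start position at which the whole site still fits. `candidate_starts` is a hypothetical helper using 1-based positions, as in the explanation.

```python
def candidate_starts(seq_len, site_len):
    """All 1-based start positions where a site of length site_len fits in the sequence."""
    return list(range(1, seq_len - site_len + 2))


starts = candidate_starts(100, 20)
print(len(starts), starts[0], starts[-1])  # 81 1 81
```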

38. An alternative method is to produce an odds scoring matrix calculated by dividing each base frequency by the background frequency of that base.
a) True
b) False
Answer: a
Explanation: In this method, the probability of each location is then found by multiplying the odds scores from each column. An even simpler method is to use log odds scores in the matrix. The column scores are then simply added. In this case, the log odds scores must be converted to odds scores before position probabilities are calculated.
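The log-odds variant can be sketched as follows. Column log-odds scores are added for each candidate start, then exponentiated back to odds, and the odds are normalized into a probability for each location. The uniform background, the small pseudocount, and the made-up three-column frequencies are all assumptions.

```python
import math

BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform background


def log_odds_matrix(freqs, pseudo=0.01):
    """Column-by-column log(f / background); the pseudocount avoids log(0)."""
    k = len(BACKGROUND)
    return [{base: math.log(((col.get(base, 0.0) + pseudo) / (1 + k * pseudo)) / bg)
             for base, bg in BACKGROUND.items()}
            for col in freqs]


def site_probabilities(seq, matrix):
    """Sum log-odds per start, convert the sums back to odds, then normalize."""
    w = len(matrix)
    odds = [math.exp(sum(matrix[i][seq[p + i]] for i in range(w)))
            for p in range(len(seq) - w + 1)]
    total = sum(odds)
    return [o / total for o in odds]


# Made-up frequencies for a 3-column site that favors "GAT".
freqs = [{"G": 0.7, "A": 0.1, "C": 0.1, "T": 0.1},
         {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
         {"T": 0.7, "A": 0.1, "C": 0.1, "G": 0.1}]
probs = site_probabilities("CCGATCC", log_odds_matrix(freqs))
print(max(range(len(probs)), key=probs.__getitem__))  # 2 (0-based start of "GAT")
```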

39. Which of the following about MEME is untrue?
a) It is a Web resource for performing local MSAs (multiple sequence alignments) by the expectation maximization method described above
b) It stands for Multiple EM for Motif Elicitation
c) It was developed at the University of California at San Diego Supercomputing Center
d) The Web page has multiple versions for searching blocks by an EM algorithm
Answer: d
Explanation: The Web page provides several programs:
- two versions of MEME;
- ParaMEME, a Web program that searches for blocks by an EM algorithm;
- MetaMEME, a similar program that searches for profiles using HMMs;
- the Motif Alignment and Search Tool (MAST), for searching databases for matches to motifs.

40. Which of the following about the Gibbs sampler is untrue?
a) It is a statistical method for finding motifs in sequences
b) It is dissimilar to the principle of the EM method
c) It searches for the statistically most probable motifs
d) It can find the optimal width and number of given motifs in each sequence
Answer: b
Explanation: The Gibbs sampler is another statistical method for finding motifs in sequences. The method is similar in principle to the EM method, but the algorithm is different. A combined approach using the Gibbs sampler and MOTIF may be used to make blocks at the BLOCKS Web site.

41. A Bayesian belief network is also known as a:
A. belief network
B. decision network
C. Bayesian model
D. All of the above
Answer: D
Explanation: A Bayesian belief network is also called a Bayes network, belief network, decision network, or Bayesian model.

42. How many parts does a Bayesian network consist of?
A. 2
B. 3
C. 4
D. 5
Answer: A
Explanation: A Bayesian network can be used for building models from data and expert opinions. It consists of two parts: a directed acyclic graph and a table of conditional probabilities.

43. The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as a(n):
A. Directed Acyclic Graph
B. Table of conditional probabilities
C. Influence diagram
D. None of the above
Answer: C
Explanation: The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an influence diagram.

44. How many components does a Bayesian network have?
A. 2
B. 3
C. 4
D. 5
Answer: A
Explanation: A Bayesian network has two main components: the causal component and the actual numbers.

45. The graph of a Bayesian network does not contain any cycles. Hence, it is known as a:
A. DCG
B. DAG
C. CAG
D. SAG
Answer: B
Explanation: The graph of a Bayesian network does not contain any cycles. Hence, it is known as a directed acyclic graph, or DAG.

46. In a Bayesian network, a variable is:
A. continuous
B. discrete
C. Both A and B
D. None of the above
Answer: C
Explanation: Each node corresponds to a random variable, and a variable can be continuous or discrete.

47. If we have variables x1, x2, x3, ..., xn, then the probabilities of the different combinations of x1, x2, x3, ..., xn are known as the:
A. Table of conditional probabilities
B. Causal Component
C. Actual numbers
D. Joint probability distribution
Answer: D
Explanation: If we have variables x1, x2, x3, ..., xn, then the probabilities of the different combinations of x1, x2, x3, ..., xn are known as the joint probability distribution.

48. The nodes and links form the structure of the Bayesian network, and we call this the:
A. structural specification
B. multi-variable nodes
C. Conditional Linear Gaussian distributions
D. None of the above
Answer: A
Explanation: The nodes and links form the structure of the Bayesian network, and we call this the structural specification.

49. Which of the following are used for modeling time series and sequences?
A. Decision graphs
B. Dynamic Bayesian networks
C. Value of information
D. Parameter tuning
Answer: B
Explanation: Dynamic Bayesian networks (DBNs) are used for modeling time series and sequences.

50. How many terms are required for building a Bayes model?
A. 1
B. 2
C. 3
D. 4
Answer: C
Explanation: The three required terms are one conditional probability and two unconditional probabilities.

51) Which of the following algorithms cannot be used for reducing the dimensionality of data?
A. t-SNE
B. PCA
C. LDA
D. None of these
Answer: (D)

52) [ True or False ] PCA can be used for projecting and visualizing data in lower dimensions.
A. TRUE
B. FALSE
Answer: (A)

53) The most widely used dimensionality reduction algorithm is Principal Component Analysis (PCA). Which of the following is/are true about PCA?
1. PCA is an unsupervised method
2. It searches for the directions in which the data have the largest variance
3. Maximum number of principal components <= number of features
4. All principal components are orthogonal to each other
A. 1 and 2
B. 1 and 3
C. 2 and 3
D. 1, 2 and 3
E. 1,2 and 4
F. All of the above
Answer: (F)
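The four statements can be checked numerically with a small NumPy sketch of PCA. It uses an eigendecomposition of the covariance matrix, which is one standard way to compute PCA; libraries often use an SVD instead. The data are random, and `pca` is a hypothetical helper.

- It uses only X, never labels, so it is unsupervised.
- It ranks directions by variance.
- It can return at most n_features components.
- The components it returns are orthonormal.

```python
import numpy as np


def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                 # center each feature; labels are never used
    cov = np.cov(Xc, rowvar=False)          # d x d covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1][:k]   # largest-variance directions first; at most d
    components = eigvecs[:, order]
    return Xc @ components, components, eigvals[order]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.0, 0.1]])
Z, W, variances = pca(X, 2)
print(Z.shape)                          # (200, 2)
print(np.allclose(W.T @ W, np.eye(2)))  # True: components are orthonormal
```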

54) Suppose we are using dimensionality reduction as a pre-processing technique. Instead of using all the features, we reduce the data to k dimensions with PCA and then use these PCA projections as our features. Which of the following statements is correct?
A. Higher ‘k’ means more regularization
B. Higher ‘k’ means less regularization
C. Can’t Say
Answer: (B)

55) In which of the following scenarios is t-SNE better to use than PCA for dimensionality reduction while working on a local machine with minimal computational power?
A. Dataset with 1 million entries and 300 features
B. Dataset with 100,000 entries and 310 features
C. Dataset with 10,000 entries and 8 features
D. Dataset with 10,000 entries and 200 features
Answer: (C)

56) Which of the following statements is true of the t-SNE cost function?
A. It is asymmetric in nature.
B. It is symmetric in nature.
C. It is same as the cost function for SNE.
Answer: (B)

57) Imagine you are dealing with text data and you represent the words using word embeddings (Word2vec), ending up with 1000 dimensions. Now you want to reduce the dimensionality of this high-dimensional data so that words with similar meanings remain nearest neighbors in the reduced space. In such a case, which of the following algorithms are you most likely to choose?
A. t-SNE
B. PCA
C. LDA
D. None of these
Answer: (A)

58) [True or False] t-SNE learns a non-parametric mapping.
A. TRUE
B. FALSE
Answer: (A)

59) Which of the following statements is correct for t-SNE and PCA?
A. t-SNE is linear whereas PCA is non-linear
B. t-SNE and PCA both are linear
C. t-SNE and PCA both are nonlinear
D. t-SNE is nonlinear whereas PCA is linear
Answer: (D)

60) In the t-SNE algorithm, which of the following hyperparameters can be tuned?
A. Number of dimensions
B. Smooth measure of effective number of neighbours
C. Maximum number of iterations
D. All of the above
Answer: (D)

61) The minimum time complexity for training an SVM is O(n^2). Given this, what sizes of datasets are not well suited for SVMs?
A) Large datasets
B) Small datasets
C) Medium sized datasets
D) Size does not matter
Answer: A

62) The effectiveness of an SVM depends upon:
A) Selection of Kernel
B) Kernel Parameters
C) Soft Margin Parameter C
D) All of the above
Answer: D

63) Support vectors are the data points that lie closest to the decision surface.
A) TRUE
B) FALSE
Answer: A

64) SVMs are less effective when:
A) The data is linearly separable
B) The data is clean and ready to use
C) The data is noisy and contains overlapping points
Answer: C

65) Suppose you are using an RBF kernel in an SVM with a high gamma value. What does this signify?
A) The model would consider even far away points from hyperplane for modeling
B) The model would consider only the points close to the hyperplane for modeling
C) The model would not be affected by distance of points from hyperplane for modeling
D) None of the above
Answer: B
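One way to build intuition is to look at how fast the RBF kernel K(x, z) = exp(-gamma * ||x - z||^2) decays with distance. In this plain-Python sketch, `rbf_kernel` is a hypothetical helper. With a large gamma, the similarity to a point three units away is essentially zero, so each training point influences only its close neighborhood. With a small gamma, distant points stay relevant.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


for gamma in (0.1, 10.0):
    near = rbf_kernel([0.0, 0.0], [0.5, 0.0], gamma)
    far = rbf_kernel([0.0, 0.0], [3.0, 0.0], gamma)
    print(f"gamma={gamma}: near={near:.3f}, far={far:.2e}")
```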

66) The cost parameter in the SVM means:
A) The number of cross-validations to be made
B) The kernel to be used
C) The tradeoff between misclassification and simplicity of the model
D) None of the above
Answer: C
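The trade-off controlled by C shows up directly in the standard primal soft-margin objective: 0.5 * ||w||^2 + C * (sum of hinge losses). The sketch below, a hypothetical helper applied to made-up 1-D data, simply evaluates that objective. With a large C, the single margin violation dominates the cost, pushing an optimizer toward fitting every point. With a small C, the ||w||^2 term dominates, and a simpler, wider-margin model is preferred even at the price of misclassifications.

```python
def soft_margin_objective(w, b, X, y, C):
    """Primal linear soft-margin SVM objective, labels y_i in {+1, -1}:
    0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b))."""
    regularizer = 0.5 * sum(wj * wj for wj in w)
    hinge = sum(
        max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
        for xi, yi in zip(X, y)
    )
    return regularizer + C * hinge


# Made-up 1-D data: the third point lies on the wrong side of w = [1.0], b = 0.
X = [[2.0], [-2.0], [0.5]]
y = [1, -1, -1]
for C in (0.1, 100.0):
    print(C, soft_margin_objective([1.0], 0.0, X, y, C))
```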

67) Suppose you are building an SVM model on data X. The data X can be error-prone, which means that you should not trust any specific data point too much. Now suppose you want to build an SVM model with a quadratic kernel (a polynomial of degree 2) that uses the slack penalty C as one of its hyperparameters. Based on that, answer the following question.
What would happen when you use a very large value of C (C -> infinity)?
Note: For small C, the model was also classifying all data points correctly.
A) We can still classify data correctly for given setting of hyper parameter C
B) We can not classify data correctly for given setting of hyper parameter C
C) Can’t Say
D) None of these
Answer: A

68) What would happen when you use a very small C (C ≈ 0)?
A) Misclassification would happen
B) Data will be correctly classified
C) Can’t say
D) None of these
Answer: A

69) If I am using all features of my dataset and I achieve 100% accuracy on my training set but ~70% on the validation set, what should I look out for?
A) Underfitting
B) Nothing, the model is perfect
C) Overfitting
Answer: C

70) Which of the following are real world applications of the SVM?
A) Text and Hypertext Categorization
B) Image Classification
C) Clustering of News Articles
D) All of the above
Answer: D

Question Context: 71 – 72
Suppose you have trained an SVM with a linear decision boundary, and after training you correctly infer that your SVM model is underfitting.

71) Which of the following options would you be more likely to consider when iterating on the SVM next time?
A) You want to increase your data points
B) You want to decrease your data points
C) You will try to calculate more variables
D) You will try to reduce the features
Answer: C

72) Suppose you gave the correct answer to the previous question. What do you think is actually happening?
1. We are lowering the bias
2. We are lowering the variance
3. We are increasing the bias
4. We are increasing the variance
A) 1 and 2
B) 2 and 3
C) 1 and 4
D) 2 and 4
Answer: C

73) In the above question, suppose you want to change one of the SVM's hyperparameters so that the effect is the same as in the previous questions, i.e., the model no longer underfits. What would you do?
A) We will increase the parameter C
B) We will decrease the parameter C
C) Changing C has no effect
D) None of these
Answer: A

74) We usually apply feature normalization before using the Gaussian kernel in an SVM. What is true about feature normalization?
1. We do feature normalization so that no single feature dominates the others
2. Sometimes, feature normalization is not feasible in the case of categorical variables
3. Feature normalization always helps when we use the Gaussian kernel in an SVM
A) 1
B) 1 and 2
C) 1 and 3
D) 2 and 3
Answer: B

Question Context: 75
Suppose you are dealing with a 4-class classification problem and you want to train an SVM model on the data using the one-vs-all method. Answer the question below.

75) How many times do we need to train our SVM model in such a case?
A) 1
B) 2
C) 3
D) 4
Answer: D
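One-vs-all turns a k-class problem into k binary problems, one per class. This sketch (the helper name and the labels are hypothetical) shows the relabeling, with one binary SVM trained per entry. By comparison, one-vs-one would need k(k - 1)/2 = 6 models for 4 classes.

```python
def one_vs_rest_labels(y, classes):
    """For each class c, binary labels: +1 where y == c, -1 elsewhere.
    Each labeling corresponds to one binary SVM that must be trained."""
    return {c: [1 if label == c else -1 for label in y] for c in classes}


y = ["cat", "dog", "bird", "fish", "dog"]
tasks = one_vs_rest_labels(y, ["cat", "dog", "bird", "fish"])
print(len(tasks))    # 4
print(tasks["dog"])  # [-1, 1, -1, -1, 1]
```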
