Quiz about Certification Exam, Databricks, Databricks Certified Professional Data Scientist

Question 1

Suppose A, B , and C are events. The probability of A given B , relative to P(|C), is the same as the
probability of A given B and C (relative to P ). That is,

Accepted Answer

P(A,B|C) P(B|C) =P(A|B,C)

Answer

P(A,B|C) P(B|C) =P(B|A,C)

Answer

P(A,B|C) P(B|C) =P(C|B,C)

Answer

P(A,B|C) P(B|C) =P(A|C,B)

Question 2

Feature Hashing approach is \"SGD-based classifiers avoid the need to predetermine vector size by
simply picking a reasonable size and shoehorning the training data into vectors of that size" now with
large vectors or with multiple locations per feature in Feature hashing?

Accepted Answer

It is hard to understand what classifier is doing

Answer

Is a problem with accuracy

Answer

It is easy to understand what classifier is doing

Answer

Is a problem with accuracy as well as hard to understand what classifier us doing
FEATURE HASHING
SGD-based classifiers avoid the need to predetermine vector size by simply picking a reasonable size
and shoehorning the training data into vectors of that size. This approach is known as feature
hashing. The shoehorning is done by picking one or more locations by using a hash of the name of
the variable for continuous variables or a hash of the variable name and the category name or word
for categorical, textlike, or word-like data.
This hashed feature approach has the distinct advantage of requiring less memory and one less pass
through the training data, but it can make it much harder to reverse engineer vectors to determine
which original feature mapped to a vector location. This is because multiple features may hash to the
same location. With large vectors or with multiple locations per feature, this isn't a problem for
accuracy but it can make it hard to understand what a classifier is doing.
An additional benefit of feature hashing is that the unknown and unbounded vocabularies typical of
word-like variables aren't a problem.

Question 3

What are the advantages of the Hashing Features?

Accepted Answer

Requires the less memory

Accepted Answer

Less pass through the training data

Answer

Easily reverse engineer vectors to determine which original feature mapped to a vector location
SGD-based classifiers avoid the need to predetermine vector size by simply picking a reasonable size
and shoehorning the training data into vectors of that size. This approach is known as feature
hashing. The shoehorning is done by picking one or more locations by using a hash of the name of
the variable for continuous variables or a hash of the variable name and the category name or word
for categorical, textlike, or word-like data.
This hashed feature approach has the distinct advantage of requiring less memory and one less pass
through the training data, but it can make it much harder to reverse engineer vectors to determine
which original feature mapped to a vector location. This is because multiple features may hash to the
same location. With large vectors or with multiple locations per feature, this isn't a problem for
accuracy but it can make it hard to understand what a classifier is doing.
An additional benefit of feature hashing is that the unknown and unbounded vocabularies typical of
word-like variables aren't a problem.

Question 4

Question-3: In machine learning, feature hashing, also known as the hashing trick (by analogy to the
kernel trick), is a fast and space-efficient way of vectorizing features (such as the words in a
language), i.e., turning arbitrary features into indices in a vector or matrix. It works by applying a
hash function to the features and using their hash values modulo the number of features as indices
directly, rather than looking the indices up in an associative array. So what is the primary reason of
the hashing trick for building classifiers?

Accepted Answer

It requires the lesser memory to store the coefficients for the model

Answer

It creates the smaller models

Answer

It reduces the non-significant features e.g. punctuations

Answer

Noisy features are removed
This hashed feature approach has the distinct advantage of requiring less memory and one less pass
through the training data, but it can make it much harder to reverse engineer vectors to determine
which original feature mapped to a vector location. This is because multiple features may hash to the
same location. With large vectors or with multiple locations per feature, this isn't a problem for
accuracy but it can make it hard to understand what a classifier is doing.
Models always have a coefficient per feature, which are stored in memory during model building.
The hashing trick collapses a high number of features to a small number which reduces the number
of coefficients and thus memory requirements. Noisy features are not removed; they are combined
with other features and so still have an impact.
The validity of this approach depends a lot on the nature of the features and problem domain;
knowledge of the domain is important to understand whether it is applicable or will likely produce
poor results. While hashing features may produce a smaller model, it will be one built from odd
combinations of real-world features, and so will be harder to interpret.
An additional benefit of feature hashing is that the unknown and unbounded vocabularies typical of
word-like variables aren't a problem.

Question 5

What is the considerable difference between L1 and L2 regularization?

Accepted Answer

Size of the model can be much smaller in L1 regularization than that produced by L2-regularization

Answer

L1 regularization has more accuracy of the resulting model

Answer

L2-regularization can be of vital importance when the application is deployed in resource-tight
environments such as cell-phones.

Answer

All of the above are correct
The two most common regularization methods are called L1 and L2 regularization. L1 regularization
penalizes the weight vector for its L1-norm (i.e. the sum of the absolute values of the weights),
whereas L2 regularization uses its L2-norm. There is usually not a considerable difference between
the two methods in terms of the accuracy of the resulting model (Gao et al 2007), but L1
regularization has a significant advantage in practice. Because many of the weights of the features
become zero as a result of L1-regularized training, the size of the model can be much smaller than
that produced by L2-regularization. Compact models require less space on memory and storage, and
enable the application to start up quickly. These merits can be of vital importance when the
application is deployed in resource-tight environments such as cell-phones.
Regularization works by adding the penalty associated with the coefficient values to the error of the
hypothesis. This way, an accurate hypothesis with unlikely coefficients would be penalized whila a
somewhat less accurate but more conservative hypothesis with low coefficients would not be
penalized as much.
81

Question 6

Regularization is a very important technique in machine learning to prevent overfitting.
Mathematically speaking, it adds a regularization term in order to prevent the coefficients to fit so
perfectly to overfit. The difference between the L1 and L2 is...

Accepted Answer

L2 is the sum of the square of the weights, while L1 is just the sum of the weights

Answer

L1 is the sum of the square of the weights, while L2 is just the sum of the weights

Answer

L1 gives Non-sparse output while L2 gives sparse outputs

Answer

None of the above
Regularization is a very important technique in machine learning to prevent overfitting.
Mathematically speaking, it adds a regularization term in order to prevent the coefficients to fit so
perfectly to overfit. The difference between the L1 and L2 is just that L2 is the sum of the square of
the weights, while L1 is just the sum of the weights. As follows: L1 regularization on least squares:
$\"Certification$

Question 7

Select the correct option which applies to L2 regularization

Accepted Answer

Computational efficient due to having analytical solutions

Accepted Answer

Non-sparse outputs

Accepted Answer

No feature selection
Explanation :
The difference between their properties can be promptly summarized as follows:
$\"Certification$

Question 8

Regularization is a very important technique in machine learning to prevent over fitting. And
Optimizing with a L1 regularization term is harder than with an L2 regularization term because

Accepted Answer

The penalty term is not differentiate

Answer

The second derivative is not constant

Answer

The objective function is not convex

Answer

The constraints are quadratic
Regularization is a very important technique in machine learning to prevent overfitting.
Mathematically speaking, it adds a regularization term in order to prevent the coefficients to fit so
perfectly to overfit. The difference between the L1 and L2 is just that L2 is the sum of the square of
the weights, while L1 is just the sum of the weights.
Much of optimization theory has historically focused on convex loss functions because they're much
easier to optimize than non-convex functions: a convex function over a bounded domain is
guaranteed to have a minimum, and it's easy to find that minimum by following the gradient of the
function at each point no matter where you start. For non-convex functions, on the other hand,
where you start matters a great deal; if you start in a bad position and follow the gradient, you're
likely to end up in a local minimum that is not necessarily equal to the global minimum.
You can think of convex functions as cereal bowls: anywhere you start in the cereal bowl, you're likely
to roll down to the bottom. A non-convex function is more like a skate park: lots of ramps, dips, ups
and downs. It's a lot harder to find the lowest point in a skate park than it is a cereal bowl.

Question 9

Logistic regression is a model used for prediction of the probability of occurrence of an event. It
makes use of several variables that may be......

Accepted Answer

Both 1 and 2 are correct

Answer

Numerical

Answer

Categorical

Answer

None of the 1 and 2 are correct
Logistic regression is a model used for prediction of the probability of occurrence of an event. It
makes use of several predictor variables that may be either numerical or categories.

Question 10

Spam filtering of the emails is an example of

Accepted Answer

Supervised learning

Answer

Unsupervised learning

Answer

Clustering

Answer

1 and 3 are correct

Answer

2 and 3 are correct
Clustering is an example of unsupervised learning. The clustering algorithm finds groups within the
data without being told what to look for upfront. This contrasts with classification, an example of
supervised machine learning, which is the process of determining to which class an observation
belongs. A common application of classification is spam filtering. With spam filtering we use labeled
data to train the classifier: e-mails marked as spam or ham.

Quiz Databricks Certified Professional Data Scientist

Free Test - Simulator Databricks Certified Professional Data Scientist

Databricks Certified Professional Data Scientist Practice test unlocks all online simulator questions

What to expect from our Databricks Certified Professional Data Scientist practice tests and how to prepare for any exam?