Data Analysis with Python - Model Evaluation and Refinement
import pandas as pd
import numpy as np
# Import clean data
path = 'https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DA0101EN/module_5_auto.csv'
df = pd.read_csv(path)
df.to_csv('module_5_auto.csv')
First, let's use only the numeric data:
df=df._get_numeric_data()
df.head()
Libraries for plotting
%%capture
! pip install ipywidgets
from IPython.display import display
# IPython.html.widgets was moved to the separate ipywidgets package
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
Functions for plotting
import matplotlib.pyplot as plt
import seaborn as sns

def DistributionPlot(RedFunction, BlueFunction, RedName, BlueName, Title):
    # Overlay the density curves of two series (e.g. actual vs. predicted prices)
    width = 12
    height = 10
    plt.figure(figsize=(width, height))
    # kdeplot replaces the deprecated distplot(hist=False)
    ax1 = sns.kdeplot(RedFunction, color="r", label=RedName)
    ax2 = sns.kdeplot(BlueFunction, color="b", label=BlueName, ax=ax1)
    plt.title(Title)
    plt.xlabel('Price (in dollars)')
    plt.ylabel('Proportion of Cars')
    plt.legend()
    plt.show()
    plt.close()
def PollyPlot(xtrain, xtest, y_train, y_test, lr, poly_transform):
    # xtrain, y_train: training data
    # xtest, y_test: testing data
    # lr: linear regression object
    # poly_transform: polynomial transformation object
    width = 12
    height = 10
    plt.figure(figsize=(width, height))
    xmax = max([xtrain.values.max(), xtest.values.max()])
    xmin = min([xtrain.values.min(), xtest.values.min()])
    x = np.arange(xmin, xmax, 0.1)
    plt.plot(xtrain, y_train, 'ro', label='Training Data')
    plt.plot(xtest, y_test, 'go', label='Test Data')
    plt.plot(x, lr.predict(poly_transform.fit_transform(x.reshape(-1, 1))), label='Predicted Function')
    plt.ylim([-10000, 60000])
    plt.ylabel('Price')
    plt.legend()
Part 1: Training and Testing
y_data = df['price']
x_data=df.drop('price',axis=1)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.15, random_state=1)
print("number of test samples :", x_test.shape[0])
print("number of training samples:",x_train.shape[0])
number of test samples : 31
number of training samples: 170
x_train1, x_test1, y_train1, y_test1 = train_test_split(x_data, y_data, test_size=0.4, random_state=0)
print("number of test samples :", x_test1.shape[0])
print("number of training samples:",x_train1.shape[0])
number of test samples : 81
number of training samples: 120
Let's import LinearRegression from the module linear_model.
from sklearn.linear_model import LinearRegression
We create a Linear Regression object:
lre=LinearRegression()
We fit the model using the feature horsepower:
lre.fit(x_train[['horsepower']], y_train)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
normalize=False)
Let's calculate the R^2 on the test data:
lre.score(x_test[['horsepower']], y_test)
0.707688374146705
We can see that the R^2 is much smaller on the test data than on the training data:
lre.score(x_train[['horsepower']], y_train)
Find the R^2 on the test data when 90% of the data is used for training:
x_train1, x_test1, y_train1, y_test1 = train_test_split(x_data, y_data, test_size=0.1, random_state=0)
lre.fit(x_train1[['horsepower']],y_train1)
lre.score(x_test1[['horsepower']],y_test1)
0.7340722810055448
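The effect of the split size on the training and test scores can be explored with a small sketch. The data here is synthetic (a noisy line standing in for horsepower and price), and the names `rng`, `X`, `y` and `scores` are illustrative assumptions, not part of the lab, so the numbers will not match the outputs above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: one feature with a noisy linear relationship.
rng = np.random.default_rng(0)
X = rng.uniform(50, 250, size=(200, 1))
y = 150.0 * X[:, 0] + rng.normal(0, 3000, size=200)

scores = {}
for test_size in (0.1, 0.15, 0.4):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=0
    )
    model = LinearRegression().fit(X_tr, y_tr)
    # score() returns R^2; keep both the training and the held-out value.
    scores[test_size] = (model.score(X_tr, y_tr), model.score(X_te, y_te))

for test_size, (r2_train, r2_test) in scores.items():
    print(f"test_size={test_size}: train R^2={r2_train:.3f}, test R^2={r2_test:.3f}")
```

A smaller test set leaves more data for training, but it also makes the test R^2 itself a noisier estimate.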
The blog covers various articles and posts on Cloud, Big Data Analytics, Data Science, Machine Learning, DevOps, Full Stack Development, Java and Middleware Technologies
Wednesday, October 16, 2019
Thursday, October 3, 2019
Polynomial Regression
import matplotlib.pyplot as plt
import pandas as pd
import pylab as pl
import numpy as np
%matplotlib inline
!wget -O FuelConsumption.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/FuelConsumptionCo2.csv
Understanding the Data
FuelConsumption.csv:
We have downloaded a fuel consumption dataset, FuelConsumption.csv, which contains model-specific fuel consumption ratings and estimated carbon dioxide emissions for new light-duty vehicles for retail sale in Canada. Dataset source
MODELYEAR e.g. 2014
MAKE e.g. Acura
MODEL e.g. ILX
VEHICLE CLASS e.g. SUV
ENGINE SIZE e.g. 4.7
CYLINDERS e.g. 6
TRANSMISSION e.g. A6
FUEL CONSUMPTION in CITY(L/100 km) e.g. 9.9
FUEL CONSUMPTION in HWY (L/100 km) e.g. 8.9
FUEL CONSUMPTION COMB (L/100 km) e.g. 9.2
CO2 EMISSIONS (g/km) e.g. 182 --> low --> 0
Reading the data in
df = pd.read_csv("FuelConsumption.csv")
# take a look at the dataset
df.head()
   MODELYEAR   MAKE       MODEL  VEHICLECLASS  ENGINESIZE  CYLINDERS  TRANSMISSION  FUELTYPE  FUELCONSUMPTION_CITY  FUELCONSUMPTION_HWY  FUELCONSUMPTION_COMB  FUELCONSUMPTION_COMB_MPG  CO2EMISSIONS
0       2014  ACURA         ILX       COMPACT         2.0          4           AS5         Z                   9.9                  6.7                   8.5                        33           196
1       2014  ACURA         ILX       COMPACT         2.4          4            M6         Z                  11.2                  7.7                   9.6                        29           221
2       2014  ACURA  ILX HYBRID       COMPACT         1.5          4           AV7         Z                   6.0                  5.8                   5.9                        48           136
3       2014  ACURA     MDX 4WD   SUV - SMALL         3.5          6           AS6         Z                  12.7                  9.1                  11.1                        25           255
4       2014  ACURA     RDX AWD   SUV - SMALL         3.5          6           AS6         Z                  12.1                  8.7                  10.6                        27           244
Let's select some features that we want to use for regression.
cdf = df[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB','CO2EMISSIONS']]
cdf.head(9)
ENGINESIZE CYLINDERS FUELCONSUMPTION_COMB CO2EMISSIONS
0 2.0 4 8.5 196
1 2.4 4 9.6 221
2 1.5 4 5.9 136
3 3.5 6 11.1 255
4 3.5 6 10.6 244
5 3.5 6 10.0 230
6 3.5 6 10.1 232
7 3.7 6 11.1 255
8 3.7 6 11.6 267
Let's plot emission values with respect to engine size:
plt.scatter(cdf.ENGINESIZE, cdf.CO2EMISSIONS, color='blue')
plt.xlabel("Engine size")
plt.ylabel("Emission")
plt.show()
Creating train and test dataset
A train/test split divides the dataset into training and testing sets that are mutually exclusive. You then train on the training set and evaluate on the testing set.
msk = np.random.rand(len(df)) < 0.8
train = cdf[msk]
test = cdf[~msk]
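Because the mask is drawn at random, every run produces a different split and slightly different scores. A minimal sketch of a repeatable version, assuming a seeded NumPy generator; the frame `demo` and the names `rng`, `mask`, `train_demo` and `test_demo` are hypothetical stand-ins for `cdf`, `msk`, `train` and `test`:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame standing in for cdf.
demo = pd.DataFrame({
    "ENGINESIZE": np.linspace(1.0, 6.0, 50),
    "CO2EMISSIONS": np.linspace(120.0, 400.0, 50),
})

# A seeded generator makes the random mask repeatable between runs.
rng = np.random.default_rng(42)
mask = rng.random(len(demo)) < 0.8

train_demo = demo[mask]
test_demo = demo[~mask]

# Every row lands in exactly one of the two sets.
print(len(train_demo), len(test_demo))
```

Note that a mask built this way gives roughly, not exactly, 80% of the rows to the training set.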
Polynomial regression
Sometimes the trend in the data is not linear but curved. In that case we can use polynomial regression. Many different regressions can be used to fit the shape of a dataset, such as quadratic, cubic, and higher degrees.
In essence, all of these are polynomial regression, where the relationship between the independent variable x and the dependent variable y is modeled as an nth-degree polynomial in x. Let's say you want a polynomial regression of degree 2:
$y = b + \theta_1 x + \theta_2 x^2$
Now, the question is: how can we fit our data to this equation when we have only x values, such as engine size? We can create a few additional features: 1, $x$, and $x^2$.
The PolynomialFeatures() class in the scikit-learn library derives a new feature set from the original feature set. That is, it generates a matrix consisting of all polynomial combinations of the features with degree less than or equal to the specified degree. For example, say the original feature set has only one feature, ENGINESIZE. If we select a polynomial degree of 2, it generates 3 features: degree=0, degree=1 and degree=2:
from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model
train_x = np.asanyarray(train[['ENGINESIZE']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])
test_x = np.asanyarray(test[['ENGINESIZE']])
test_y = np.asanyarray(test[['CO2EMISSIONS']])
poly = PolynomialFeatures(degree=2)
train_x_poly = poly.fit_transform(train_x)
train_x_poly
array([[ 1. , 2.4 , 5.76],
[ 1. , 1.5 , 2.25],
[ 1. , 3.5 , 12.25],
...,
[ 1. , 3.2 , 10.24],
[ 1. , 3. , 9. ],
[ 1. , 3.2 , 10.24]])
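fit_transform takes our x values and outputs a matrix of our data raised to the powers 0 through 2 (since we set the degree of our polynomial to 2).
$$\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & v_1 & v_1^2 \\ 1 & v_2 & v_2^2 \\ \vdots & \vdots & \vdots \\ 1 & v_n & v_n^2 \end{bmatrix}$$
In our example:
$$\begin{bmatrix} 2.0 \\ 2.4 \\ 1.5 \\ \vdots \end{bmatrix} \longrightarrow \begin{bmatrix} 1 & 2.0 & 4.0 \\ 1 & 2.4 & 5.76 \\ 1 & 1.5 & 2.25 \\ \vdots & \vdots & \vdots \end{bmatrix}$$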
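The mapping can be checked on the three example values directly. This is a small sketch; the names `x_demo`, `poly_demo`, `x_demo_poly` and `manual` are illustrative and not part of the lab:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# The three example engine sizes, as a column vector.
x_demo = np.array([[2.0], [2.4], [1.5]])

poly_demo = PolynomialFeatures(degree=2)
x_demo_poly = poly_demo.fit_transform(x_demo)
print(x_demo_poly)

# Build the same matrix by hand: each row is [1, v, v**2].
manual = np.hstack([np.ones_like(x_demo), x_demo, x_demo ** 2])
print(np.allclose(x_demo_poly, manual))
```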
This looks like a feature set for multiple linear regression, and indeed it is. Polynomial regression is a special case of linear regression; the main idea lies in how you select your features. Just consider replacing $x$ with $x_1$, $x_1^2$ with $x_2$, and so on. Then the degree 2 equation turns into:
$y = b + \theta_1 x_1 + \theta_2 x_2$
Now we can treat it as a linear regression problem. Polynomial regression is therefore considered a special case of traditional multiple linear regression, and you can use the same mechanism as linear regression to solve such problems.
So we can use the LinearRegression() function to solve it:
clf = linear_model.LinearRegression()
clf.fit(train_x_poly, train_y)  # fit() trains in place and returns the estimator, not predictions
# The coefficients
print('Coefficients: ', clf.coef_)
print('Intercept: ', clf.intercept_)
Coefficients: [[ 0. 51.86678532 -1.70694689]]
Intercept: [105.78768144]
As mentioned before, the coefficients and the intercept are the parameters of the fitted curve. Since this is a typical multiple linear regression with 3 parameters, and the parameters are the intercept and the coefficients of a hyperplane, sklearn has estimated them from our new feature set. Let's plot it:
plt.scatter(train.ENGINESIZE, train.CO2EMISSIONS, color='blue')
XX = np.arange(0.0, 10.0, 0.1)
yy = clf.intercept_[0]+ clf.coef_[0][1]*XX+ clf.coef_[0][2]*np.power(XX, 2)
plt.plot(XX, yy, '-r' )
plt.xlabel("Engine size")
plt.ylabel("Emission")
Text(0, 0.5, 'Emission')
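Because polynomial regression is linear regression on expanded features, the two steps can also be chained into one estimator. The following is a minimal sketch, assuming scikit-learn's `make_pipeline` and noise-free synthetic data generated from a known quadratic; `x_syn`, `y_syn`, `quad_model` and `lin` are illustrative names, and the coefficients are chosen for the example, not taken from the fuel consumption data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data from a known quadratic: y = 100 + 50x - 2x^2 (no noise).
x_syn = np.linspace(1.0, 8.0, 40).reshape(-1, 1)
y_syn = 100.0 + 50.0 * x_syn[:, 0] - 2.0 * x_syn[:, 0] ** 2

# The pipeline expands the features, then fits an ordinary linear regression.
quad_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
quad_model.fit(x_syn, y_syn)

# make_pipeline names each step after its lowercased class name.
lin = quad_model.named_steps["linearregression"]
print("Intercept:", lin.intercept_)
print("Coefficients:", lin.coef_)
```

With a pipeline, `quad_model.predict(new_x)` applies the same feature expansion automatically, so the test data never needs to be transformed by hand.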
Evaluation
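from sklearn.metrics import r2_score
# Reuse the transformer that was fitted on the training data
test_x_poly = poly.transform(test_x)
test_y_ = clf.predict(test_x_poly)
print("Mean absolute error: %.2f" % np.mean(np.absolute(test_y_ - test_y)))
# The mean of the squared residuals is the mean squared error (MSE)
print("Residual sum of squares (MSE): %.2f" % np.mean((test_y_ - test_y) ** 2))
# r2_score expects (y_true, y_pred), in that order
print("R2-score: %.2f" % r2_score(test_y, test_y_))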
Mean absolute error: 24.77
Residual sum of squares (MSE): 1073.01
R2-score: 0.63
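The three metrics can be written out from their definitions and compared with the helpers in `sklearn.metrics`. In this sketch the `y_true` values are the first five CO2EMISSIONS rows shown earlier, while the `y_pred` values are made up for illustration and do not come from any fitted model.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([196.0, 221.0, 136.0, 255.0, 244.0])
y_pred = np.array([200.0, 215.0, 150.0, 250.0, 240.0])  # made-up predictions

residuals = y_true - y_pred
mae = np.mean(np.abs(residuals))   # mean absolute error
mse = np.mean(residuals ** 2)      # mean squared error
# R^2 = 1 - (residual sum of squares) / (total sum of squares)
r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)

print(mae, mean_absolute_error(y_true, y_pred))
print(mse, mean_squared_error(y_true, y_pred))
# r2_score is not symmetric: the signature is r2_score(y_true, y_pred).
print(r2, r2_score(y_true, y_pred), r2_score(y_pred, y_true))
```

Swapping the arguments of `r2_score` divides by the variance of the predictions instead of the variance of the true values, which gives a different number.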
Practice
Try polynomial regression on the dataset again, this time with degree three (cubic). Does it result in better accuracy?
# write your code here
poly3 = PolynomialFeatures(degree=3)
train_x_poly3 = poly3.fit_transform(train_x)
clf3 = linear_model.LinearRegression()
clf3.fit(train_x_poly3, train_y)  # fit() trains in place and returns the estimator, not predictions
# The coefficients
print('Coefficients: ', clf3.coef_)
print('Intercept: ', clf3.intercept_)
plt.scatter(train.ENGINESIZE, train.CO2EMISSIONS, color='blue')
XX = np.arange(0.0, 10.0, 0.1)
yy = clf3.intercept_[0]+ clf3.coef_[0][1]*XX + clf3.coef_[0][2]*np.power(XX, 2) + clf3.coef_[0][3]*np.power(XX, 3)
plt.plot(XX, yy, '-r' )
plt.xlabel("Engine size")
plt.ylabel("Emission")
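# Reuse the transformer that was fitted on the training data
test_x_poly3 = poly3.transform(test_x)
test_y3_ = clf3.predict(test_x_poly3)
print("Mean absolute error: %.2f" % np.mean(np.absolute(test_y3_ - test_y)))
# The mean of the squared residuals is the mean squared error (MSE)
print("Residual sum of squares (MSE): %.2f" % np.mean((test_y3_ - test_y) ** 2))
# r2_score expects (y_true, y_pred), in that order
print("R2-score: %.2f" % r2_score(test_y, test_y3_))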
Coefficients: [[ 0. 32.67645817 3.58375078 -0.43783753]]
Intercept: [126.05296936]
Mean absolute error: 24.71
Residual sum of squares (MSE): 1068.14
R2-score: 0.64
Labels:
Data Science
Tuesday, October 1, 2019
Multiple Linear Regression
Importing Needed packages
import matplotlib.pyplot as plt
import pandas as pd
import pylab as pl
import numpy as np
%matplotlib inline
Downloading Data
!wget -O FuelConsumption.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/FuelConsumptionCo2.csv
Understanding the Data
Reading the data in
df = pd.read_csv("FuelConsumption.csv")
# take a look at the dataset
df.head()
Let's select some features that we want to use for regression:
cdf = df[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_CITY','FUELCONSUMPTION_HWY','FUELCONSUMPTION_COMB','CO2EMISSIONS']]
cdf.head(9)
Let's plot emission values with respect to engine size:
plt.scatter(cdf.ENGINESIZE, cdf.CO2EMISSIONS, color='blue')
plt.xlabel("Engine size")
plt.ylabel("Emission")
plt.show()
Creating train and test dataset
msk = np.random.rand(len(df)) < 0.8
train = cdf[msk]
test = cdf[~msk]
Train data distribution
plt.scatter(train.ENGINESIZE, train.CO2EMISSIONS, color='blue')
plt.xlabel("Engine size")
plt.ylabel("Emission")
plt.show()
Multiple Regression Model
from sklearn import linear_model
regr = linear_model.LinearRegression()
x = np.asanyarray(train[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB']])
y = np.asanyarray(train[['CO2EMISSIONS']])
regr.fit(x, y)
# The coefficients
print('Coefficients: ', regr.coef_)
Ordinary Least Squares (OLS)
OLS is a method for estimating the unknown parameters in a linear regression model. OLS chooses the parameters of a linear function of a set of explanatory variables by minimizing the sum of the squares of the differences between the target dependent variable and the values predicted by the linear function. In other words, it tries to minimize the sum of squared errors (SSE), or equivalently the mean squared error (MSE), between the target variable ($y$) and our predicted output ($\hat{y}$) over all samples in the dataset.
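To make the least-squares idea concrete, the same parameters can be obtained by solving the least-squares problem directly and comparing with LinearRegression. This is a sketch on synthetic data, assuming NumPy's `np.linalg.lstsq` as the solver; the coefficients in `true_coef`, the intercept 65.0 and all variable names are invented for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: three features, known coefficients, a little noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
true_coef = np.array([10.0, 7.0, 9.0])
y = 65.0 + X @ true_coef + rng.normal(scale=0.5, size=100)

# Prepend a column of ones so the first parameter acts as the intercept.
X_design = np.column_stack([np.ones(len(X)), X])

# lstsq finds the beta that minimizes the sum of squared residuals.
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

model = LinearRegression().fit(X, y)
print("Intercept:   ", beta[0], model.intercept_)
print("Coefficients:", beta[1:], model.coef_)
```

Both routes minimize the same objective, so the printed values should agree up to floating-point error.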
Prediction
x = np.asanyarray(test[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB']])
y = np.asanyarray(test[['CO2EMISSIONS']])
# Predict from the same array type the model was fitted on
y_hat = regr.predict(x)
# The mean of the squared residuals is the mean squared error (MSE)
print("Residual sum of squares: %.2f"
      % np.mean((y_hat - y) ** 2))
# score() returns the coefficient of determination R^2: 1 is perfect prediction
print('Variance score: %.2f' % regr.score(x, y))
Residual sum of squares: 485.04
Variance score: 0.88
Explained variance regression score:
If y_hat ($\hat{y}$) is the estimated target output, $y$ the corresponding (correct) target output, and Var is the variance (the square of the standard deviation), then the explained variance is estimated as follows:
$$\text{explainedVariance}(y, \hat{y}) = 1 - \frac{\mathrm{Var}\{y - \hat{y}\}}{\mathrm{Var}\{y\}}$$
The best possible score is 1.0; lower values are worse.
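The formula can be evaluated by hand and compared with scikit-learn's `explained_variance_score`. In this sketch `y_true` reuses the first five CO2EMISSIONS values shown in the data preview, and `y_hat` is a made-up set of predictions that is deliberately biased, to show how the explained variance differs from the R^2 returned by `score()`.

```python
import numpy as np
from sklearn.metrics import explained_variance_score, r2_score

y_true = np.array([196.0, 221.0, 136.0, 255.0, 244.0])
y_hat = np.array([206.0, 225.0, 160.0, 260.0, 258.0])  # made-up, biased upward

# explainedVariance(y, y_hat) = 1 - Var{y - y_hat} / Var{y}
ev_manual = 1.0 - np.var(y_true - y_hat) / np.var(y_true)
ev_sklearn = explained_variance_score(y_true, y_hat)

# R^2 also penalizes the mean of the residuals, so it is lower here.
r2 = r2_score(y_true, y_hat)

print(ev_manual, ev_sklearn, r2)
```

The two scores coincide only when the residuals have zero mean; a systematic offset in the predictions lowers R^2 but leaves the explained variance unchanged.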
Labels:
Data Science
Simple Linear Regression
import matplotlib.pyplot as plt
import pandas as pd
import pylab as pl
import numpy as np
%matplotlib inline
!wget -O FuelConsumption.csv https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/ML0101ENv3/labs/FuelConsumptionCo2.csv
df = pd.read_csv("FuelConsumption.csv")
# take a look at the dataset
df.head()
# summarize the data
df.describe()
cdf = df[['ENGINESIZE','CYLINDERS','FUELCONSUMPTION_COMB','CO2EMISSIONS']]
cdf.head(9)
viz = cdf[['CYLINDERS','ENGINESIZE','CO2EMISSIONS','FUELCONSUMPTION_COMB']]
viz.hist()
plt.show()
plt.scatter(cdf.FUELCONSUMPTION_COMB, cdf.CO2EMISSIONS, color='blue')
plt.xlabel("FUELCONSUMPTION_COMB")
plt.ylabel("Emission")
plt.show()
plt.scatter(cdf.ENGINESIZE, cdf.CO2EMISSIONS, color='blue')
plt.xlabel("Engine size")
plt.ylabel("Emission")
plt.show()
plt.scatter(cdf.CYLINDERS, cdf.CO2EMISSIONS, color='blue')
plt.xlabel("Cylinders")
plt.ylabel("Emission")
plt.show()
msk = np.random.rand(len(df)) < 0.8
train = cdf[msk]
test = cdf[~msk]
plt.scatter(train.ENGINESIZE, train.CO2EMISSIONS, color='blue')
plt.xlabel("Engine size")
plt.ylabel("Emission")
plt.show()
from sklearn import linear_model
regr = linear_model.LinearRegression()
train_x = np.asanyarray(train[['ENGINESIZE']])
train_y = np.asanyarray(train[['CO2EMISSIONS']])
regr.fit(train_x, train_y)
# The coefficients
print('Coefficients: ', regr.coef_)
print('Intercept: ', regr.intercept_)
Coefficients: [[39.30843025]]
Intercept: [125.07479893]
plt.scatter(train.ENGINESIZE, train.CO2EMISSIONS, color='blue')
plt.plot(train_x, regr.coef_[0][0]*train_x + regr.intercept_[0], '-r')
plt.xlabel("Engine size")
plt.ylabel("Emission")
Text(0, 0.5, 'Emission')
from sklearn.metrics import r2_score
test_x = np.asanyarray(test[['ENGINESIZE']])
test_y = np.asanyarray(test[['CO2EMISSIONS']])
test_y_hat = regr.predict(test_x)
print("Mean absolute error: %.2f" % np.mean(np.absolute(test_y_hat - test_y)))
# The mean of the squared residuals is the mean squared error (MSE)
print("Residual sum of squares (MSE): %.2f" % np.mean((test_y_hat - test_y) ** 2))
# r2_score expects (y_true, y_pred), in that order
print("R2-score: %.2f" % r2_score(test_y, test_y_hat))
Mean absolute error: 23.61
Residual sum of squares (MSE): 959.87
R2-score: 0.70
Labels:
Data Science
Saturday, September 28, 2019
Major Machine Learning Techniques
* Regression/Estimation
-> Predicting continuous values
Algorithms: Linear Regression, Non-Linear Regression, Multiple Linear Regression
* Classification
-> Predicting the item class/category of a case
Algorithms: K-Nearest Neighbours, Decision Trees, Logistic Regression, Support Vector Machine
* Clustering
-> Finding the structure of data; summarization
Algorithms: k-Means Clustering, Hierarchical Clustering, Density-based Clustering
* Associations
-> Associating frequent co-occurring items/events
* Anomaly Detection
-> Discovering abnormal and unusual cases
* Sequence Mining
-> Predicting next events; click-stream (Markov Model, HMM)
* Dimension Reduction
-> Reducing the size of data (PCA)
* Recommendation Systems
-> Recommending items
Algorithms: Content-based Recommendation Engines, Collaborative Filtering
Data Science Methodology
- Business Understanding
- Analytic Approach
- Data Requirements
- Data Collection
- Data Understanding
- Data Preparation
- Modelling
- Evaluation
- Deployment
- Feedback
Labels:
Data Science
Thursday, September 12, 2019
Docker: Dockerfile
===Dockerfile===
FROM ubuntu:15.04
COPY . /app
RUN make /app
CMD python /app/app.py
Use a .dockerignore file to keep unneeded files out of the build context.
===Working with Instructions===
sudo yum install git -y
mkdir docker_images
cd docker_images
mkdir weather-app
cd weather-app
git clone https://github.com/linuxacademy/content-weather-app.git src
vi Dockerfile
FROM node
LABEL org.label-schema.version=v1.1
RUN mkdir -p /var/node
ADD src/ /var/node
WORKDIR /var/node
RUN npm install
EXPOSE 3000
CMD ./bin/www
docker image build -t linuxacademy/weather-app:v1 .
docker image ls
docker container run -d --name weather-app1 -p 8081:3000 linuxacademy/weather-app:v1
docker container ls
curl localhost:8081
===Environment Variables===
cd docker_images
mkdir env
cd env
git clone https://github.com/linuxacademy/content-weather-app.git src
vi Dockerfile
FROM node
LABEL org.label-schema.version=v1.1
ENV NODE_ENV="development"
ENV PORT=3000
RUN mkdir -p /var/node
ADD src/ /var/node/
WORKDIR /var/node
RUN npm install
EXPOSE $PORT
CMD ./bin/www
docker image build -t linuxacademy/weather-app:v2 .
docker image ls
docker image inspect linuxacademy/weather-app:v2
docker container run -d --name weather-dev -p 8082:3001 --env PORT=3001 linuxacademy/weather-app:v2
docker container ls
curl localhost:8082
docker container inspect weather-dev
docker container run -d --name weather-app2 -p 8083:3001 --env PORT=3001 --env NODE_ENV=production linuxacademy/weather-app:v2
docker container inspect weather-app2
curl localhost:8083
docker container logs weather-app2
docker container run -d --name weather-prod -p 8084:3000 --env NODE_ENV=production linuxacademy/weather-app:v2
docker container logs weather-prod
curl localhost:8084
===Build Args===
cd docker_images
mkdir args
cd args
git clone https://github.com/linuxacademy/content-weather-app.git src
vi Dockerfile
FROM node
LABEL org.label-schema.version=v1.1
ARG SRC_DIR=/var/node
RUN mkdir -p $SRC_DIR
ADD src/ $SRC_DIR
WORKDIR $SRC_DIR
RUN npm install
EXPOSE 3000
CMD ./bin/www
docker image build -t linuxacademy/weather-app:v3 --build-arg SRC_DIR=/var/code .
docker image inspect linuxacademy/weather-app:v3 | grep WorkingDir
docker container run -d --name weather-app3 -p 8085:3000 linuxacademy/weather-app:v3
curl localhost:8085
===Working with Non-privileged User===
cd docker_images
mkdir non-privileged-user
cd non-privileged-user
vi Dockerfile
FROM centos:latest
RUN useradd -ms /bin/bash cloud_user
USER cloud_user
docker image build -t centos7/nonroot:v1 .
docker container run -it --name test-build centos7/nonroot:v1 /bin/bash
bash$ sudo su
bash$ su -
bash$ exit
docker container ls
docker container start test-build
docker container exec -u 0 -it test-build /bin/bash
bash$ whoami
bash$ exit
cd ~/docker_images
mkdir node-non-privileged-user
cd node-non-privileged-user
vi Dockerfile
FROM node
LABEL org.label-schema.version=v1.1
RUN useradd -ms /bin/bash node_user
USER node_user
ADD src/ /home/node_user
WORKDIR /home/node_user
RUN npm install
EXPOSE 3000
CMD ./bin/www
git clone https://github.com/linuxacademy/content-weather-app.git src
docker image build -t linuxacademy/weather-app-nonroot:v1 .
docker container run -d --name weather-app-nonroot -p 8086:3000 linuxacademy/weather-app-nonroot:v1
curl localhost:8086
===Order of Execution===
cd docker_images
mkdir centos-conf
cd centos-conf
vi Dockerfile
FROM centos:latest
RUN mkdir -p ~/new-dir1
RUN useradd -ms /bin/bash cloud_user
RUN mkdir -p /etc/myconf
RUN echo "Some config data" >> /etc/myconf/my.conf
USER cloud_user
RUN mkdir -p ~/new-dir2
docker image build -t centos7/myconf:v1 .
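The point of this Dockerfile is ordering: each RUN executes as the user set by the most recent USER instruction, and as root when no USER has been set yet (assuming the base image does not set one). The sketch below walks the instructions and reports which user each RUN would use; the function run_users is a hypothetical illustration, not a Docker API.

```python
def run_users(dockerfile_lines, default_user="root"):
    """Pair every RUN command with the user it would execute as."""
    user = default_user
    result = []
    for line in dockerfile_lines:
        parts = line.strip().split(None, 1)
        if not parts:
            continue  # skip blank lines
        instruction = parts[0].upper()
        argument = parts[1] if len(parts) > 1 else ""
        if instruction == "USER":
            user = argument          # applies to the instructions that follow
        elif instruction == "RUN":
            result.append((user, argument))
    return result


dockerfile = [
    "FROM centos:latest",
    "RUN mkdir -p ~/new-dir1",
    "RUN useradd -ms /bin/bash cloud_user",
    "RUN mkdir -p /etc/myconf",
    'RUN echo "Some config data" >> /etc/myconf/my.conf',
    "USER cloud_user",
    "RUN mkdir -p ~/new-dir2",
]

for user, command in run_users(dockerfile):
    print(f"{user:<12}{command}")
```

The first mkdir runs as root, so ~ is root's home directory; the last one runs as cloud_user, so ~ is that user's home. The writes under /etc/myconf have to come before USER cloud_user because a non-privileged user cannot create files there.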
===Using the Volume Instruction===
cd docker_images
mkdir volumes
cd volumes
vi Dockerfile
FROM nginx:latest
VOLUME ["/usr/share/nginx/html/"]
docker image build -t linuxacademy/nginx:v1 .
docker container run -d --name nginx-volume linuxacademy/nginx:v1
docker container inspect nginx-volume
docker volume inspect volume-id
sudo ls -la /var/lib/docker/volumes/volume-id/_data
FROM ubuntu:15.04
COPY . /app
RUN make /app
CMD python /app/app.py
Use a .dockerignore file to keep unneeded files out of the build context.
Labels: docker
Wednesday, September 11, 2019
Docker: Container Logging
docker container run --name weather-app -d -p 80:3000 linuxacademycontent/weather-app
docker container ls
docker container logs container_id
docker container logs container_id
docker container run -d --name ghost_blog \
-e database__client=mysql \
-e database__connection__host=mysql \
-e database__connection__user=root \
-e database__connection__password=password \
-e database__connection__database=ghost \
-p 8080:2368 \
ghost:1-alpine
docker container ls
docker container ls -a
docker container logs container_id
===Summary===
Create a container using the weather-app image.
docker container run --name weather-app -d -p 80:3000 linuxacademycontent/weather-app
Show information logged by a running container:
docker container logs [NAME]
Show information logged by all containers participating in a service:
docker service logs [SERVICE]
Logs need to be output to STDOUT and STDERR.
Nginx Example:
RUN ln -sf /dev/stdout /var/log/nginx/access.log \
&& ln -sf /dev/stderr /var/log/nginx/error.log
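For an application you write yourself, the same rule means logging to the standard streams instead of to files, so that docker container logs can pick the output up. A minimal Python sketch using the standard logging module; the names build_logger and MaxLevelFilter are hypothetical, and the split (INFO to STDOUT, WARNING and above to STDERR) is one possible convention, not a Docker requirement:

```python
import logging
import sys


class MaxLevelFilter(logging.Filter):
    """Pass only records below the given level."""

    def __init__(self, max_level):
        super().__init__()
        self.max_level = max_level

    def filter(self, record):
        return record.levelno < self.max_level


def build_logger(name="app", out=None, err=None):
    """Send INFO to STDOUT and WARNING or higher to STDERR."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()      # avoid duplicate handlers on repeat calls
    logger.propagate = False

    formatter = logging.Formatter("%(levelname)s %(message)s")

    stdout_handler = logging.StreamHandler(out)
    stdout_handler.addFilter(MaxLevelFilter(logging.WARNING))
    stdout_handler.setFormatter(formatter)

    stderr_handler = logging.StreamHandler(err)
    stderr_handler.setLevel(logging.WARNING)
    stderr_handler.setFormatter(formatter)

    logger.addHandler(stdout_handler)
    logger.addHandler(stderr_handler)
    return logger


log = build_logger()
log.info("listening on port 3000")
log.error("database connection failed")
```

Run inside a container, both lines would show up in docker container logs [NAME].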
Debug a failed container deploy:
docker container run -d --name ghost_blog \
-e database__client=mysql \
-e database__connection__host=mysql \
-e database__connection__user=root \
-e database__connection__password=P4sSw0rd0! \
-e database__connection__database=ghost \
-p 8080:2368 \
ghost:1-alpine
Labels: docker
Docker: Executing Container Commands
docker container run -d nginx
docker container ls
docker container run -it nginx /bin/bash
# nginx -g 'daemon off;'
docker container ls
docker container inspect container_id
curl 172.17.0.3
# exit
docker container ls
docker container ls -a
docker container exec -it container_id ls /usr/share/nginx/html
docker container exec -it container_id /bin/bash
# apt-get update -y
# exit
docker container prune
docker container rm -f container_id
===Summary===
Executing a command:
- Dockerfile
- During a Docker run
- Using the exec command
Commands can be:
- One and done Commands
- Long running Commands
Start a container with a command:
docker container run [IMAGE] [CMD]
Execute a command on a container:
docker container exec -it [NAME] [CMD]
Example:
docker container run -d -p 8080:80 nginx
docker container ps
docker container exec -it [NAME] /bin/bash
docker container exec -it [NAME] ls /usr/share/nginx/html/
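When these commands are driven from a script, it helps to build the argument list once and reuse it. A small sketch that only assembles the command line and prints it; the helper exec_argv and the container name my_nginx are hypothetical placeholders. To run the command for real, the list could be passed to subprocess.run.

```python
import shlex


def exec_argv(name, command, interactive=True):
    """Build the argument list for `docker container exec`."""
    argv = ["docker", "container", "exec"]
    if interactive:
        argv.append("-it")       # keep STDIN open and allocate a TTY
    argv.append(name)            # container name or ID
    argv.extend(command)         # the command to run inside the container
    return argv


argv = exec_argv("my_nginx", ["ls", "/usr/share/nginx/html/"])
print(shlex.join(argv))
```

Keeping the arguments as a list avoids quoting problems when a command contains spaces.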
Labels: docker
Docker: Exposing and Publishing Container Ports
===Exposing Container Ports===
docker container run -d nginx
docker container ls
curl localhost
docker container inspect container_id
curl 172.17.0.2
docker container run -d --expose 3000 nginx
docker container ls
docker rm -f container_id
docker container run -d --expose 3000 -p 80:3000 nginx
docker container ls
curl localhost:3000
connection refused
docker container rm -f container_id
docker container run -d --expose 3000 -p 8080:80 nginx
curl localhost:8080
docker container run -d -p 8081:80/tcp -p 8081:80/udp nginx
curl localhost:8081
docker container run -d -P nginx
docker container ls
curl localhost:32768
docker container port container_id
===Summary===
Exposing:
- Expose a port or a range of ports
- This does not publish the port
- Use --expose [PORT]
docker container run --expose 1234 [IMAGE]
Publishing:
- Maps a container's port to a host's port
- -p or --publish publishes a container's port(s) to the host
- -P or --publish-all publishes all exposed ports to random ports
docker container run -p [HOST_PORT]:[CONTAINER_PORT] [IMAGE]
docker container run -p [HOST_PORT]:[CONTAINER_PORT]/tcp -p [HOST_PORT]:[CONTAINER_PORT]/udp [IMAGE]
docker container run -P [IMAGE]
Lists all port mappings or a specific mapping for a container:
docker container port [Container_NAME]
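The -p value reads as [HOST_PORT]:[CONTAINER_PORT] with an optional /tcp or /udp suffix, and tcp is used when no protocol is given. A sketch of a parser for that simple form; parse_publish is a hypothetical helper, and it deliberately ignores the other forms Docker accepts, such as a host IP prefix or port ranges:

```python
def parse_publish(spec):
    """Split 'HOST:CONTAINER[/PROTO]' into (host_port, container_port, proto)."""
    ports, _, proto = spec.partition("/")
    host, sep, container = ports.partition(":")
    if not sep:
        raise ValueError(f"expected HOST_PORT:CONTAINER_PORT, got {spec!r}")
    return int(host), int(container), proto or "tcp"


for spec in ["8080:80", "8081:80/tcp", "8081:80/udp", "80:3000"]:
    host_port, container_port, proto = parse_publish(spec)
    print(f"host {host_port} -> container {container_port} ({proto})")
```

The last spec, 80:3000, matches the failing run in the walkthrough: it publishes host port 80 to container port 3000, so curl localhost:3000 is refused because nothing on the host listens on 3000.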
Labels: docker
Docker - Creating Containers
docker container run --help
docker container run busybox
docker container ls
docker container ls -a
docker container run --rm busybox
docker container ls -a
docker container run nginx
docker container run -d nginx
docker container ls
docker container ls -a
docker container run -it busybox
# ls
# exit
docker container prune -f
docker container run --name my_busybox busybox
docker container ls -a
===Summary===
docker container run:
--help                Print usage
--rm                  Automatically remove the container when it exits
-d, --detach          Run container in background and print container ID
-i, --interactive     Keep STDIN open even if not attached
--name string         Assign a name to the container
-p, --publish list    Publish a container's port(s) to the host
-t, --tty             Allocate a pseudo-TTY
-v, --volume list     Mount a volume (the bind type of mount)
--mount mount         Attach a filesystem mount to the container
--network string      Connect a container to a network (default "default")
Create a container and attach to it:
docker container run -it busybox
Create a container and run it in the background:
docker container run -d nginx
Create a container that you name and run it in the background:
docker container run -d --name myContainer busybox
Labels: docker
Docker Commands
docker -h | more
docker image -h
docker image ls -h
docker image ls
docker image pull nginx
docker image ls
docker image inspect image_id
docker container -h
docker container ls
docker container run busybox
docker container ls
docker container ls -a
docker container run -P -d nginx
docker container ps
docker container inspect container_id
curl http://172.17.0.2
docker container inspect container_id
docker container top container_id
docker container ls
docker container attach container_id
docker container ls
docker container ls -a
docker container start container_id
docker container ls
docker container stop container_id
docker container start container_id
docker container logs container_id
docker container ls
curl localhost:32774
docker container logs container_id
docker container ls
docker container stats container_id
docker container exec -it container_id /bin/bash
# ls
# ls /usr/share/nginx/html/
# exit
docker container ls
docker container exec -it container_id ls /usr/share/nginx/html/
docker container ls
docker container pause container_id
docker container ls
docker container unpause container_id
docker container ls -a
docker container rm -f container_id
docker container ls -a
docker container prune
docker container ls -a
docker container prune -h
docker container prune -f
===Summary===
Get a list of all of the Docker commands:
docker -h
Management commands were introduced in Docker Engine v1.13.
Management Commands:
builder      Manage builds
config       Manage Docker configs
container    Manage containers
engine       Manage the docker engine
image        Manage images
network      Manage networks
node         Manage Swarm nodes
plugin       Manage plugins
secret       Manage Docker secrets
service      Manage services
stack        Manage Docker stacks
swarm        Manage Swarm
system       Manage Docker
trust        Manage trust on Docker images
volume       Manage volumes
docker image:
build      Build an image from a Dockerfile
history    Show the history of an image
import     Import the contents from a tarball to create a filesystem image
inspect    Display detailed information on one or more images
load       Load an image from a tar file or STDIN
ls         List images
prune      Remove unused images
pull       Pull an image or a repository from a registry
push       Push an image or a repository to a registry
rm         Remove one or more images
save       Save one or more images to a tar file (streamed to STDOUT by default)
tag        Create a tag TARGET_IMAGE that refers to SOURCE_IMAGE
docker container:
attach     Attach local standard input, output, and error streams to a running container
commit     Create a new image from a container's changes
cp         Copy files/folders between a container and the local filesystem
create     Create a new container
diff       Inspect changes to files or directories on a container's filesystem
exec       Run a command in a running container
export     Export a container's filesystem as a tar archive
inspect    Display detailed information on one or more containers
kill       Kill one or more running containers
logs       Fetch the logs of a container
ls         List containers
pause      Pause all processes within one or more containers
port       List port mappings or a specific mapping for the container
prune      Remove all stopped containers
rename     Rename a container
restart    Restart one or more containers
rm         Remove one or more containers
run        Run a command in a new container
start      Start one or more stopped containers
stats      Display a live stream of container(s) resource usage statistics
stop       Stop one or more running containers
top        Display the running processes of a container
unpause    Unpause all processes within one or more containers
update     Update configuration of one or more containers
wait       Block until one or more containers stop, then print their exit codes
Labels: docker
Friday, March 15, 2019
Kubernetes Reference
Minikube
minikube start
minikube stop
minikube delete
minikube docker-env
minikube ip
---
Kubectl
kubectl get all
Pods, ReplicaSets, Deployments and Services
kubectl apply -f <yaml file>
kubectl apply -f .
kubectl describe pod <name of pod>
kubectl exec -it <pod name> <command>
kubectl get <pod | po | service | svc | rs | replicaset | deployment | deploy>
kubectl get po --show-labels
kubectl get po --show-labels -l {name}={value}
kubectl delete po <pod name>
kubectl delete po --all
---
Deployment Management
kubectl rollout status deploy <name of deployment>
kubectl rollout history deploy <name of deployment>
kubectl rollout undo deploy <name of deployment>
Labels: kubernetes
Docker Reference
Manage images
docker image pull <image name>
docker image ls
docker image build -t <image name> .
docker image push <image name>
docker image tag <image id> <tag name>
---
Manage Containers
docker container run -p <public port>:<container port> <image name>
docker container ls -a
docker container stop <container id>
docker container start <container id>
docker container rm <container id>
docker container prune
docker container run -it <image name>
docker container run -d <image name>
docker container exec -it <container id> <command>
docker container exec -it <container id> bash
docker container logs -f <container id>
docker container commit -a "author" <container id> <image name>
---
Manage your (local) Virtual Machine
docker-machine ip
---
Manage Networks
docker network ls
docker network create <network name>
---
Manage Volumes
docker volume ls
docker volume prune
docker volume inspect <volume name>
docker volume rm <volume name>
---
Docker Compose
docker-compose up
docker-compose up -d
docker-compose logs -f <service name>
docker-compose down
---
Manage a Swarm
docker swarm init (--advertise-addr <ip address>)
docker service create <args>
docker network create --driver overlay <name>
docker service ls
docker node ls
docker service logs -f <service name>
docker service ps <service name>
docker swarm join-token <worker|manager>
---
Manage Stacks
docker stack ls
docker stack deploy -c <compose file> <stack name>
docker stack rm <stack name>
Labels: docker
Thursday, March 14, 2019
ElasticSearch PUT and GET data
--Create an index in Elasticsearch
PUT http://host-1:9200/my_index
{
"settings" : {
"number_of_shards" : 3,
"number_of_replicas" : 1
}
}
--To get information about an index
GET http://host-1:9200/my_index
--Add user to index with id 1
POST http://host-1:9200/my_index/user/1
{
"name": "Deepak",
"age": 36,
"department": "IT",
"address": {
"street": "No.123, XYZ street",
"city": "Singapore",
"country": "Singapore"
}
}
--To fetch document with id 1
GET http://host-1:9200/my_index/user/1
--Add user to index with id 2
POST http://host-1:9200/my_index/user/2
{
"name": "McGiven",
"age": 30,
"department": "Finance"
}
--Add user to index with id 3
POST http://host-1:9200/my_index/user/3
{
"name": "Watson",
"age": 30,
"department": "HR",
"address": {
"street": "No.123, XYZ United street",
"city": "Singapore",
"country": "Singapore"
}
}
--Search documents by name
GET http://host-1:9200/my_index/user/_search?q=name:watson
--Delete an index
DELETE http://host-1:9200/my_index
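The same calls can be issued from a script. The sketch below uses only the Python standard library and builds the requests without sending them; build_request is a hypothetical helper, and host-1:9200 is simply the host used in the examples above.

```python
import json
import urllib.request

BASE_URL = "http://host-1:9200"  # host from the examples above


def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# Create the index with 3 shards and 1 replica.
create_index = build_request(
    "PUT",
    "/my_index",
    {"settings": {"number_of_shards": 3, "number_of_replicas": 1}},
)

# Add the user with id 2.
add_user = build_request(
    "POST",
    "/my_index/user/2",
    {"name": "McGiven", "age": 30, "department": "Finance"},
)

# Fetch the document with id 1.
get_user = build_request("GET", "/my_index/user/1")

for request in (create_index, add_user, get_user):
    print(request.get_method(), request.full_url)

# To send a request against a reachable cluster:
# with urllib.request.urlopen(create_index) as response:
#     print(response.read().decode("utf-8"))
```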
Labels: elasticsearch