HOW DO YOU HANDLE MISSING OR CORRUPTED DATA IN A DATASET?

6 Dec 2017

You can locate the missing or corrupted values in a dataset and then either drop the affected rows or columns or replace the values with something else.

In Pandas, two very useful methods are isnull() and dropna(): isnull() flags the missing entries, and dropna() drops the rows or columns that contain them. If you would rather fill the invalid values with a placeholder (for example, 0), you can use the fillna() method. Corrupted values that are not already stored as NaN or None have to be converted to missing values first before these methods will see them.
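
The three methods can be tried on a tiny table. This is only a minimal sketch: the column names, the sample values, and the placeholder choices below are made up for illustration and do not come from any real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one gap in each column.
df = pd.DataFrame({
    "age": [25, np.nan, 31],
    "city": ["Pune", "Delhi", None],
})

# isnull() marks every missing cell; summing counts the gaps per column.
missing_per_column = df.isnull().sum()

# dropna() keeps only the rows that have no missing values.
rows_without_gaps = df.dropna()

# fillna() replaces missing cells, here with a per-column placeholder.
filled = df.fillna({"age": 0, "city": "unknown"})

print(missing_per_column)
print(rows_without_gaps)
print(filled)
```

Whether dropping or filling is the better choice depends on how much data is missing and on what the placeholder would mean to whatever consumes the data afterwards.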

SUBSCRIBE TO UI STREET

EXPLAIN HOW TO CREATE A BACKUP AND COPY FILES IN JENKINS?

6 Dec 2017

To create a backup, all you need to do is periodically back up your JENKINS_HOME directory. It contains all of your build job configurations, your slave (agent) node configurations, and your build history, so copying this directory backs up your whole Jenkins setup. You can also copy a job directory to clone or replicate a job, or rename the directory to rename the job.
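
Copying the directory can be scripted. The sketch below uses only the Python standard library. The function name backup_jenkins_home and the timestamped folder naming are assumptions made for illustration, not part of Jenkins itself, and the paths in the usage comment are hypothetical.

```python
import shutil
import time
from pathlib import Path


def backup_jenkins_home(jenkins_home, backup_root):
    """Copy the whole JENKINS_HOME tree into a new timestamped folder.

    Both arguments are paths chosen by the caller; nothing here is
    specific to a particular Jenkins installation.
    """
    source = Path(jenkins_home)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"jenkins-home-{stamp}"
    # copytree creates the target (and missing parents) and copies recursively.
    shutil.copytree(source, target)
    return target


# Example call with hypothetical paths:
# backup_jenkins_home("/var/lib/jenkins", "/backups")
```

A copy taken while builds are running may capture half-written files, so a script like this would normally be scheduled for a quiet period.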

WHAT IS THE DIFFERENCE BETWEEN SUPERVISED AND UNSUPERVISED MACHINE LEARNING?

6 Dec 2017

Supervised learning requires labeled training data. For example, to do classification (a supervised learning task), you first need to label the data and then train the model to sort data into your labeled groups. Unsupervised learning, in contrast, does not require explicitly labeled data.
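
The contrast can be shown on six numbers. This is a toy sketch in plain Python, not a production algorithm: the data, the label names, and the helper functions (a nearest-centroid classifier and a two-cluster k-means on one dimension) are all chosen here purely for illustration.

```python
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["small", "small", "small", "large", "large", "large"]


def fit_centroids(xs, ys):
    """Supervised: use the given labels to compute one mean per class."""
    grouped = {}
    for x, y in zip(xs, ys):
        grouped.setdefault(y, []).append(x)
    return {y: sum(vals) / len(vals) for y, vals in grouped.items()}


def predict(centroids, x):
    """Assign x to the class whose mean is closest."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


def two_means(xs, iterations=10):
    """Unsupervised: find two cluster centres without seeing any labels."""
    lo, hi = min(xs), max(xs)
    for _ in range(iterations):
        near_lo = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        near_hi = [x for x in xs if abs(x - lo) > abs(x - hi)]
        lo = sum(near_lo) / len(near_lo)
        if near_hi:  # stays empty only if every point is identical
            hi = sum(near_hi) / len(near_hi)
    return lo, hi


centroids = fit_centroids(points, labels)
print(predict(centroids, 1.1))   # uses the labeled groups
print(two_means(points))         # discovers two groups on its own
```

The supervised half can name its answer ("small" or "large") because a person supplied those names. The unsupervised half only finds that two groups exist.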

WHAT IS RUST?

6 Dec 2017

Rust is a systems programming language sponsored by Mozilla Research. It is designed to be a “safe, concurrent, practical language”, supporting functional and imperative-procedural paradigms. Rust is syntactically similar to C++, but is designed for better memory safety while maintaining performance.

JAVASCRIPT ES8

8 Nov 2017

ECMAScript 8, or ECMAScript 2017, was officially released by TC39 at the end of June 2017. It seems that we have been talking about ECMAScript a lot since 2016. The standard is now to publish a new ES specification version once a year: ES6 was published in 2015 and ES7 in 2016. But does anyone remember when ES5 was released? It happened in 2009, before the magical rise of JavaScript. In this article we discuss ES8 and its new features.

OBJECT.VALUES AND OBJECT.ENTRIES

The Object.values method returns an array of a given object's own enumerable property values, in the same order as that provided by a for...in loop. The declaration of the function is trivial:

Object.values(obj)

The obj parameter is the source object for the operation. It can be an object or an array.

const obj = { x: 'xxx', y: 1 };
Object.values(obj); // ['xxx', 1]

const arr = ['e', 's', '8']; // same as { 0: 'e', 1: 's', 2: '8' }
Object.values(arr); // ['e', 's', '8']

// when the keys are numeric, the values are returned in
// numerical order of the keys
const numeric = { 10: 'xxx', 1: 'yyy', 3: 'zzz' };
Object.values(numeric); // ['yyy', 'zzz', 'xxx']

Object.values('es8'); // ['e', 's', '8']

The Object.entries method returns an array of a given object's own enumerable property [key, value] pairs, in the same order as Object.values. The declaration of the function is just as trivial:

Object.entries(obj)

const obj = { x: 'xxx', y: 1 };
Object.entries(obj); // [['x', 'xxx'], ['y', 1]]

const arr = ['e', 's', '8'];
Object.entries(arr); // [['0', 'e'], ['1', 's'], ['2', '8']]

const numeric = { 10: 'xxx', 1: 'yyy', 3: 'zzz' };
Object.entries(numeric); // [['1', 'yyy'], ['3', 'zzz'], ['10', 'xxx']]

Object.entries('es8'); // [['0', 'e'], ['1', 's'], ['2', '8']]

STRING PADDING

This section of the specification adds two functions to the String object: padStart and padEnd.

As their names suggest, these functions pad the start or the end of a string so that the resulting string reaches the given length. You can pad with a specific character or string, or pad with spaces by default. Here are the function declarations:

str.padStart(targetLength [, padString])
str.padEnd(targetLength [, padString])

As we can see, the first parameter is targetLength, the total length of the resulting string. The second parameter, padString, is optional and is the string used to pad the source string. Its default value is a space.

'es8'.padStart(2); // 'es8'
'es8'.padStart(5); // '  es8'
'es8'.padStart(6, 'woof'); // 'wooes8'
'es8'.padStart(14, 'wow'); // 'wowwowwowwoes8'
'es8'.padStart(7, '0'); // '0000es8'
'es8'.padEnd(2); // 'es8'
'es8'.padEnd(5); // 'es8  '
'es8'.padEnd(6, 'woof'); // 'es8woo'
'es8'.padEnd(14, 'wow'); // 'es8wowwowwowwo'
'es8'.padEnd(7, '6'); // 'es86666'

OBJECT.GETOWNPROPERTYDESCRIPTORS

Before ES8, ECMAScript lacked a method for properly copying properties between two objects. Object.getOwnPropertyDescriptors returns all of an object's own property descriptors, which solves this seemingly simple but actually complex problem that almost every JS toolkit or framework has had to implement at some point. The gap was a stumbling block for efficient immutability and for true composition of ES classes. Closing it also benefits decorators and is generally less surprising than Object.assign.

ASYNC FUNCTIONS

The async function declaration defines an asynchronous function, which returns an AsyncFunction object. Internally, async functions work much like generators, but they are not translated to generator functions.

function fetchTextByPromise() {
  return new Promise(resolve => {
    setTimeout(() => {
      resolve("es8");
    }, 2000);
  });
}

async function sayHello() {
  const externalFetchedText = await fetchTextByPromise();
  console.log(`Hello, ${externalFetchedText}`); // Hello, es8
}

sayHello();

Calling sayHello logs "Hello, es8" after 2 seconds. Now surround the call with two other logs:

console.log(1);
sayHello();
console.log(2);

The output is now:

1 // immediately
2 // immediately
Hello, es8 // after 2 seconds

This is because the function call does not block the flow.

Note that an async function always returns a promise, and that the await keyword may only be used in functions marked with the async keyword.

SHARED MEMORY AND ATOMICS

When memory is shared, multiple threads can read and write the same data in memory. Atomic operations make sure that predictable values are written and read, that operations are completed before the next operation starts, and that operations are not interrupted. This section introduces a new constructor, SharedArrayBuffer, and a namespace object, Atomics, with static methods.

Like Math, the Atomics object only holds static methods, so we cannot use it as a constructor. Examples of static methods in this object are:

add / sub: add a value to, or subtract a value from, the value at a specific position.
and / or / xor: bitwise and / bitwise or / bitwise xor.
load: get the value at a specific position.

THE VERDICT

JavaScript is in production, but it is always being improved. The process of adopting new features into the specification is well organized and deliberate. In the final stage of that process, these features were confirmed by the TC39 committee and implemented by the core developers. Most of them are already available in the TypeScript language, in browsers, or through polyfills, so you can go and try them right now.

REGULAR EXPRESSIONS IN PYTHON

8 Nov 2017

A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. Regular expressions are widely used in the UNIX world. The re module offers full support for Perl-like regular expressions in Python.

In this article we cover the important functions used to handle regular expressions.

IN PYTHON A REGULAR EXPRESSION SEARCH IS TYPICALLY WRITTEN AS

match = re.search(pat, str)

The re.search() method takes a regular expression pattern and a string and searches for that pattern within the string. If the search is successful, search() returns a match object; otherwise it returns None. Therefore, the search is usually followed by an if-statement that tests whether the search succeeded.

import re

text = 'an example word:cat!!'
match = re.search(r'word:\w\w\w', text)
# If-statement after search() tests if it succeeded
if match:
    print('found', match.group())  # 'found word:cat'
else:
    print('did not find')

The code match = re.search(pat, str) stores the search result in a variable named "match". The if-statement then tests the match: if it is true, the search succeeded and match.group() is the matching text (e.g. 'word:cat'). Otherwise match is None, meaning the search did not succeed and there is no matching text.

The 'r' at the start of the pattern string designates a Python "raw" string, which passes backslashes through without change. This is very convenient for regular expressions.
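
What "passes backslashes through without change" means can be checked directly. This is a small sketch. The variable names and the sample sentence are chosen here for illustration.

```python
import re

# Without the r prefix, each backslash has to be doubled.
plain = "\\d+"
raw = r"\d+"
same_pattern = plain == raw

# In a normal string "\n" is one newline character;
# in a raw string it stays as a backslash followed by the letter n.
newline_len = len("\n")
raw_len = len(r"\n")

digits = re.findall(raw, "es8 arrived in 2017")

print(same_pattern, newline_len, raw_len, digits)
```

Both spellings produce the same pattern. The raw form simply avoids the doubled backslashes.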

THE MATCH FUNCTION

The re.match function attempts to match a regular expression pattern at the beginning of a string, with optional flags.

Here is the syntax for this function:
re.match(pattern, string, flags=0)

The re.match function returns a match object on success and None on failure. Use the group(num) or groups() method of the match object to get the matched expression.

#!/usr/bin/env python3
import re

line = "Cats are smarter than dogs"
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)
if matchObj:
    print("matchObj.group() :", matchObj.group())
    print("matchObj.group(1) :", matchObj.group(1))
    print("matchObj.group(2) :", matchObj.group(2))
else:
    print("No match!!")

When the above code is executed, it gives the following result:

matchObj.group() : Cats are smarter than dogs
matchObj.group(1) : Cats
matchObj.group(2) : smarter
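
Because re.match only looks at the beginning of the string while re.search scans the whole string, the two can disagree on the same input. The short sketch below reuses the sentence from the example above. The variable names are chosen here for illustration.

```python
import re

line = "Cats are smarter than dogs"

# "dogs" is not at the start of the line, so match() finds nothing.
at_start = re.match(r"dogs", line)

# search() scans the whole line and finds it.
anywhere = re.search(r"dogs", line)

# groups() returns every captured group at once as a tuple.
m = re.match(r"(.*) are (.*?) .*", line, re.M | re.I)
all_groups = m.groups()

print(at_start)
print(anywhere.group())
print(all_groups)
```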

MACHINE LEARNING

8 Nov 2017

Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence based on the notion that machines should be able to learn and adapt through experience. Machine learning lets computers learn by themselves without being explicitly programmed: when exposed to new data, these programs can learn, grow, change, and develop on their own. While the concept of machine learning has been around for a long time, the ability to apply complex mathematical calculations to big data automatically, iteratively, and quickly has been gaining momentum over the last several years. This article explains what machine learning is and why it matters.

To understand the uses of machine learning, consider a few places where it is applied: the self-driving Google car, cyber fraud detection, and online recommendation engines such as friend suggestions on Facebook.

Machines can help filter out useful pieces of information that lead to major advancements, and we are already seeing this technology implemented in a wide variety of industries. Machine learning has also improved the way data extraction and interpretation are done, using automatic sets of generic methods that have replaced traditional statistical techniques.

THE USES OF MACHINE LEARNING

To understand the idea of machine learning better, let’s consider some more examples: web search results, real-time ads on website pages and mobile devices, email spam filtering, network intrusion detection, and pattern and image recognition.

All these are by-products of applying machine learning to analyse huge volumes of data.

Traditionally, data analysis has been characterized by trial and error, an approach that becomes impossible when data sets are huge and heterogeneous. Machine learning offers a way out of this chaos by proposing smart alternatives for analysing huge volumes of data. By developing fast and efficient algorithms and data-driven models for real-time processing of data, machine learning can produce accurate results and analysis.

TERMS AND TYPES

Machine learning is one of the most important technology trends at present. It underlies so many things we use today without even thinking about them. Speech recognition, Amazon and Netflix recommendations, fraud detection, and financial trading are just a few well-known examples of machine learning commonly used in today's data-driven world.

POPULAR MACHINE LEARNING METHODS

Two popular methods of machine learning are supervised learning and unsupervised learning. It is estimated that about 70 percent of machine learning is supervised learning, while unsupervised learning accounts for 10 to 20 percent. Other, less often used methods are semi-supervised and reinforcement learning.

SUPERVISED LEARNING

Supervised machine learning is the more commonly used of the two. It includes algorithms such as linear and logistic regression, multi-class classification, and support vector machines. Supervised learning requires that the algorithm's possible outputs are already known and that the data used to train the algorithm is already labelled with correct answers. For instance, a classification algorithm will learn to identify animals after being trained on a dataset of images that are properly labelled with the species of the animal and some identifying characteristics.

UNSUPERVISED LEARNING

Unsupervised machine learning is more closely aligned with true artificial intelligence: the idea that a computer can learn to identify complex processes and patterns without a human providing guidance along the way. However, unsupervised learning is prohibitively complex for some simpler enterprise use cases. Examples of unsupervised machine learning algorithms include k-means clustering, principal and independent component analysis, and association rules.

CONCLUSION

Choosing a machine learning method typically depends on factors related to the structure and volume of the user's data and on the use case at hand. Predictive data models help users make decisions across a variety of business challenges.

BLUE – GREEN DEPLOYMENT

8 Nov 2017

If you are around DevOps, or around people who work with deployments in your industry, you must have heard the name blue-green deployment. Many organizations use this technique to minimize downtime for their products. The technique is old but still one of the finest to use. This article explains what blue-green deployment actually is.

WHAT EXACTLY IS THE BLUE-GREEN DEPLOYMENT

A blue-green deployment is a change management strategy for releasing software code. Blue/green deployments, which are also referred to as A/B deployments, require two identical hardware environments that are configured in exactly the same way. While one environment is active and serving end users, the other environment remains idle.

Blue-green deployments are often used for consumer-facing applications and applications with a serious uptime requirement. New code is released to the inactive environment, where it is thoroughly tested. Once the code has been assessed, the team makes the idle environment active, typically by adjusting a router configuration to redirect application traffic. The process is reversed when the next software iteration is ready for release.

THE STEP-BY-STEP PROCESS OF BLUE-GREEN DEPLOYMENT

To demonstrate this concept, the user first needs to set up two server environments, each with a web server installed. In this example, the web server stands in for an entire application stack, which could include a load balancer, multiple web servers, and distributed or replicated databases in the backend. A single web server is used here because it is the smallest environment that can demonstrate the release pattern.

CREATE A LOCAL APPLICATION

We will start by creating our "application". This is just an index page that the web servers can display. It allows the user to demonstrate different "versions" of the app without the overhead of actual development. On the local system, install git using the platform's preferred method. If the local machine is running Ubuntu, the user can install it by typing:

local$ sudo apt-get update
local$ sudo apt-get install git

The user needs to set a few configuration settings in order to commit to a git repository. The user can set a name and email address by typing:

local$ git config --global user.name "Your Name"
local$ git config --global user.email username@email.com

With the configuration set, the user can create a directory for their new application and move into it:

local$ mkdir ~/sample_app
local$ cd ~/sample_app

Initialize a git repository in our application directory by typing:

local$ git init

Now, create the index.html file that represents the application:

local$ nano index.html

Save and close the file when you are finished.

To finish up, the user can add the index.html file to the git staging area and then commit by typing:

local$ git add .
local$ git commit -m "initializing repository with version 1"

CONFIGURE THE BLUE AND GREEN WEB SERVERS

Next, work on setting up green and blue environments with functional web servers. Log into your servers with your sudo user to get started.

HOW DOES BLUE GREEN DEPLOYMENT WORK WITH AWS?

DNS routing is a common method for blue-green deployments. With DNS, the user can easily switch traffic from the blue environment to the green one, and back again if a rollback is needed. Route 53 can be used to implement the switch when bringing up the new "green" environment. The target of the switch could be a single EC2 instance or an entire ELB. The resource record set has to be updated so that it points to the domain or subdomain of the new instance or the new ELB. This works for a wide variety of environment configurations, as long as the endpoint is a DNS name or an IP address.

As an alternative to this DNS approach, the user can also use Route 53 with designated resource record sets. Traffic is switched from the blue environment to the green environment by updating the designated record of the record set, and the user can easily roll back to the blue deployment in case of an error by updating the DNS record again.

Another approach to performing the blue-green switch is weighted distribution with Route 53. Here the user shifts traffic based on the weight given to each environment. Amazon Route 53 enables the user to define a percentage of traffic for the green environment and gradually update the weights until the green environment carries the full production traffic. This method makes it possible to perform canary analysis, in which a small percentage of production traffic is slowly introduced to the new environment.
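
The gradual weight shift can be pictured with a small simulation. This is only a sketch of the idea in plain Python: it does not call Route 53 or any AWS API, and the function names shift_schedule and route_request, as well as the 25-point step, are assumptions made for illustration.

```python
import random


def shift_schedule(step=25):
    """Return (blue_weight, green_weight) pairs from 100/0 to 0/100."""
    return [(100 - green, green) for green in range(0, 101, step)]


def route_request(blue_weight, green_weight, rng=random):
    """Pick an environment for one request in proportion to the weights."""
    return rng.choices(["blue", "green"],
                       weights=[blue_weight, green_weight])[0]


# Walk through the schedule and sample some simulated requests per step.
for blue, green in shift_schedule():
    sample = [route_request(blue, green) for _ in range(1000)]
    print(blue, green, sample.count("green"))
```

Rolling back at any step amounts to setting the weights back to 100 for blue and 0 for green, which is why the early, low-weight steps are a convenient place to watch for errors.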

CLOUD CUSTODIAN

8 Nov 2017

Cloud Custodian is an open source tool that unifies the dozens of tools and scripts most organizations use to manage their AWS accounts. It is a stateless rules engine for policy definition and enforcement, with metrics and detailed reporting for AWS.

Companies can use Custodian to manage their AWS environments by ensuring compliance with security policies and tag policies, garbage-collecting unused resources, and managing costs through off-hours resource management. Custodian policies are written in simple YAML configuration files that specify given resource types and are constructed from a vocabulary of filters and actions.

Cloud computing has made creating and managing web resources very easy. You can now spin up any number of compute, database, and storage resources with the click of a button or the stroke of a return key. However, if you use a company account, you are likely to spin up those resources often for demonstration and testing purposes, without considering the cost or clutter you might be creating along the way.

Cloud Custodian's feature set has grown rapidly along with its popularity, because its maintainers are good at responding to feature requests. It has now grown to the point where there is not much in the AWS world that you cannot do with it. Here is a short list of things you might be surprised Cloud Custodian can handle:

Encryption
Backups
Garbage Collection
Unused Resources
Tag Compliance
SG Compliance

Below is a basic example of a custodian.yml file that stops EC2 instances that carry a Custodian tag.

policies:
  - name: stop-instances
    resource: ec2
    filters:
      - "tag:Custodian": present
    actions:
      - stop
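
The filter-then-act shape of a policy can be mimicked in a few lines of Python. To be clear, this is not Cloud Custodian's code or API. It is a hypothetical stand-in that uses in-memory dictionaries instead of real EC2 instances, and every name in it (tag_present, run_policy, the instance ids) is invented for illustration.

```python
# Hypothetical stand-ins for EC2 instances; not real AWS data.
instances = [
    {"id": "i-001", "tags": {"Custodian": "nightly", "Team": "web"}},
    {"id": "i-002", "tags": {"Team": "data"}},
    {"id": "i-003", "tags": {"Custodian": "weekend"}},
]


def tag_present(resource, key):
    """Filter: true when the resource carries a tag with this key."""
    return key in resource.get("tags", {})


def run_policy(resources, filters, action):
    """Apply the action to every resource that passes all filters."""
    selected = [r for r in resources if all(f(r) for f in filters)]
    return [action(r) for r in selected]


to_stop = run_policy(
    instances,
    filters=[lambda r: tag_present(r, "Custodian")],
    action=lambda r: ("stop", r["id"]),
)
print(to_stop)
```

The real tool reads the YAML policy, queries AWS for the resources, and performs the action. The sketch only shows how filters narrow the set before an action is applied.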

Cloud Custodian is a good fit for mid-sized to large organizations that give a large number of their employees access to the organization's AWS account. Usually, such an account quickly becomes cluttered with dozens of CloudFormation stacks, VPCs, old test instances, and Lambda functions.

Cloud Custodian is very well documented. If you are excited to start taking out the digital trash in your AWS account, its documentation is the best place to start.
