TOP 5 MISTAKES IN HADOOP AND HOW TO AVOID THEM

Hadoop has its strengths as well as its difficulties: businesses need to factor specialized skills and data integration into planning and implementation. Even when they do, a large percentage of Hadoop implementations fail.

To help you avoid the most common pitfalls, this article walks through the top 5 mistakes with Hadoop and how to avoid them.

MISTAKE 1: MIGRATE EVERYTHING BEFORE DEVISING A PLAN

As attractive as it can be to dive head-first into Hadoop, never start without a plan. Migrating everything without a clear strategy will only create long-term issues and expensive ongoing maintenance. With first-time Hadoop implementations, expect plenty of error messages and a steep learning curve.

Successful implementation starts with identifying a business use case. Consider every phase of the process – from data ingestion to data transformation to analytics consumption, and even beyond to other applications and systems where analytics must be embedded. The plan should also clearly determine how Hadoop and big data will create value for the business.

Our advice: Maximize your learning in the least amount of time by taking a holistic approach and starting with smaller test cases.

MISTAKE 2: ASSUME RELATIONAL DATABASE SKILLSETS ARE TRANSFERABLE TO HADOOP

Hadoop is a distributed file system, not a traditional relational database (RDBMS). You can't simply migrate all your relational data and manage it in Hadoop, nor can you expect skillsets to transfer easily between the two.

If the team is lacking Hadoop skills, it doesn't necessarily mean you have to hire all new people. Every situation is different, and there are several options to consider. It might work best to train existing developers. You might be able to plug skills gaps with point solutions in some instances, but growing organizations tend to do better in the long run with an end-to-end data platform that serves a broad spectrum of users.

Our advice: Look for the right combination of software, people, agility, and functionality. Many tools are available that automate some of the repetitive aspects of data ingestion and preparation.

MISTAKE 3: FIGURE OUT SECURITY LATER

High-profile data breaches have pushed most enterprise IT teams to prioritize protecting sensitive data. If you are considering big data, keep in mind that you will be processing sensitive data about your customers and partners. You should never expose card and bank details, or personally identifiable information about clients, customers, or employees. Protection starts with planning ahead.

Our advice: Address security before deploying a big data project. Once a business need for big data has been established, decide who will benefit from the investment and how it will impact the infrastructure.

MISTAKE 4: BRIDGING THE SKILLS GAP WITH TRADITIONAL ETL

Plugging the skills gap can be tricky for organizations trying to solve big data's ETL challenges. Many developers are proficient in Java, Python, and HiveQL but lack the experience to optimize performance on relational databases. When Hadoop and MapReduce are used for large-scale traditional data management workloads such as ETL, this problem is magnified.

Some point solutions can help plug the skills gap, but these tend to work best for experienced developers. If you're dealing with smaller data sets, it might work to hire people trained in both big data and traditional implementations, or to bring in experts to train and guide staff through projects. But if you're dealing with hundreds of terabytes of data, you will need an enterprise-class ETL tool as part of a comprehensive business analytics platform.

Our advice: People, experience, and best practices are essential for successful Hadoop projects. When evaluating an expert or a team of experts, whether as permanent hires or consultants, consider their experience with "traditional" as well as big data integration, the size and complexity of the projects they've worked on, the organizations they've worked with, and the number of successful implementations they've delivered. If you're dealing with large volumes of data, it may be time to evaluate a comprehensive business analytics platform designed to operationalize and simplify Hadoop implementations.

MISTAKE 5: EXPECT ENTERPRISE-LEVEL VALUE ON A SMALL BUDGET

Hadoop's low-cost scalability is one of the main reasons organizations adopt it. But many organizations fail to factor in data replication/compression, skilled resources, and the overall management of integrating big data with the existing ecosystem.

Hadoop is built to process enormous data files that continue to grow. It's essential to do proper sizing up front. This includes having the skills on hand to leverage SQL and BI against data in Hadoop and to compress data at the most granular levels. The compression of data also needs to be balanced with performance expectations for reading and writing data. And because HDFS replicates every block (three copies by default), storing the data may cost 3x more than initially planned.
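
As a back-of-the-envelope illustration, here is a minimal sizing sketch in Python. It assumes HDFS's default replication factor of 3; the raw volume and compression ratio are hypothetical inputs.

# Rough HDFS storage estimate: every block is replicated (3x by default),
# which compression only partially offsets.
raw_tb = 100                # hypothetical raw data volume, in TB
replication_factor = 3      # HDFS default (dfs.replication)
compression_ratio = 0.4     # hypothetical: compressed size / raw size

stored_tb = raw_tb * compression_ratio * replication_factor
print(f"Raw: {raw_tb} TB -> on disk: {stored_tb:.0f} TB")
# Raw: 100 TB -> on disk: 120 TB -- more than the raw volume, even compressed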

Our advice: Understand how the storage, resources, growth rates, and management of big data will factor into your existing ecosystem before you implement.

WHAT IS THE DIFFERENCE BETWEEN A REST WEB SERVICE AND A SOAP WEB SERVICE?

Below are the main differences between REST and SOAP web services (a request sketch in Python follows the list):

  • REST supports different formats like plain text, JSON, and XML; SOAP only supports XML.
  • REST works only over HTTP(S) as its transport; SOAP can be used over different transport protocols (HTTP, SMTP, JMS, etc.).
  • REST works with resources; each unique URL is a representation of some resource. SOAP works with operations, which implement business logic through different interfaces.
  • SOAP-based reads can't be cached; for SOAP we need to provide our own caching. REST-based reads can be cached.
  • SOAP supports SSL security as well as WS-Security (Web Services Security); REST only supports SSL security.
  • SOAP supports ACID (Atomicity, Consistency, Isolation, Durability); REST supports transactions, but it is neither ACID-compliant nor able to provide a two-phase commit.
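
To make the contrast concrete, here is a minimal request sketch in Python using the requests library. The endpoint URLs and the SOAP operation are hypothetical placeholders, not a real API.

import requests

# REST: the URL identifies a resource; the HTTP verb is the operation.
# A GET like this is cacheable and may return JSON, XML, or plain text.
rest_resp = requests.get("https://api.example.com/customers/42")  # hypothetical URL
print(rest_resp.json())

# SOAP: a single endpoint receives an XML envelope naming the operation.
soap_body = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
  <soap:Body>
    <GetCustomer xmlns="http://example.com/ws"><Id>42</Id></GetCustomer>
  </soap:Body>
</soap:Envelope>"""
soap_resp = requests.post(
    "https://api.example.com/soap",  # hypothetical endpoint
    data=soap_body,
    headers={"Content-Type": "application/soap+xml"},
)
print(soap_resp.text)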

WHAT’S NEW IN ANGULAR 4? WHAT ARE THE IMPROVEMENTS IN ANGULAR 4?

Smaller & Faster Apps – Angular 4 applications are smaller and faster in comparison with Angular 2.

View Engine Size Reduced – Changes under the hood to what AOT-generated code looks like in Angular 4 have improved compilation time and reduced the size of the generated code by around 60% in most cases.

Animation Package – Animations now have their own package, i.e. @angular/platform-browser/animations.

Improvements – *ngIf and *ngFor have been improved; *ngIf, for example, now supports an else clause.

NAME SOME OF THE HTTP METHODS COMMONLY USED IN REST BASED ARCHITECTURE?

The following well-known HTTP methods are commonly used in REST-based architecture (a usage sketch follows the list) −

– GET − Provides read-only access to a resource.

– PUT − Used to create a new resource or replace an existing one at a known URI.

– DELETE − Used to remove a resource.

– POST − Used to update an existing resource or create a new one.

– OPTIONS − Used to get the supported operations on a resource.
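
As a quick illustration, the sketch below exercises each method with Python's requests library against a hypothetical /users resource (the base URL and fields are placeholders).

import requests

BASE = "https://api.example.com"  # hypothetical API

requests.get(f"{BASE}/users/1")                       # GET: read a resource
requests.post(f"{BASE}/users", json={"name": "Al"})   # POST: create (or update)
requests.put(f"{BASE}/users/1", json={"name": "Al"})  # PUT: create/replace at a known URI
requests.delete(f"{BASE}/users/1")                    # DELETE: remove a resource
resp = requests.options(f"{BASE}/users")              # OPTIONS: supported operations
print(resp.headers.get("Allow"))                      # e.g. "GET, POST, OPTIONS"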

WHAT IS THE DIFFERENCE BETWEEN MONGODB AND MYSQL?

Although MongoDB and MySQL are both free and open-source databases, they differ considerably in terms of data representation, relationships, transactions, querying, schema design and definition, performance, normalization, and more. Comparing MySQL with MongoDB is essentially a comparison between relational and non-relational databases.
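
The contrast shows up directly in code. Below is a minimal sketch of the same lookup from Python using the mysql-connector-python and pymongo drivers; the connection details and the users table/collection are hypothetical.

import mysql.connector
from pymongo import MongoClient

# MySQL: rows in a fixed-schema table, queried with SQL.
conn = mysql.connector.connect(host="localhost", user="app",
                               password="secret", database="appdb")  # hypothetical
cur = conn.cursor()
cur.execute("SELECT name, age FROM users WHERE age > %s", (30,))
print(cur.fetchall())

# MongoDB: flexible JSON-like documents, queried with operators.
client = MongoClient("mongodb://localhost:27017/")
docs = client.appdb.users.find({"age": {"$gt": 30}}, {"name": 1, "age": 1})
print(list(docs))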

WHAT ARE RESTFUL WEBSERVICES?

Web services based on the REST architecture are known as RESTful web services. These web services use HTTP methods to implement the concept of the REST architecture. A RESTful web service usually defines a base URI (Uniform Resource Identifier), provides resource representations such as JSON, and supports a set of HTTP methods.
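
As a minimal illustration (not part of the original text), here is a tiny RESTful service sketched with Flask; the books resource and its data are hypothetical.

from flask import Flask, jsonify, request

app = Flask(__name__)
books = {1: {"title": "Dune"}}  # hypothetical in-memory resource store

@app.route("/books/<int:book_id>", methods=["GET"])
def get_book(book_id):
    # The URI identifies the resource; JSON is its representation.
    if book_id not in books:
        return jsonify({"error": "not found"}), 404
    return jsonify(books[book_id])

@app.route("/books", methods=["POST"])
def create_book():
    new_id = max(books) + 1
    books[new_id] = request.get_json()
    return jsonify({"id": new_id}), 201

if __name__ == "__main__":
    app.run()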

WHAT IS DATADOG?

Datadog is a monitoring service for cloud-scale applications. It brings together data from servers, databases, tools, and services to present a combined view of an entire stack. These capabilities are provided on a SaaS-based data analytics platform.

Datadog uses a Python-based open-source agent, forked from the original created in 2009 by David Mytton for Server Density. Its backend is built using a number of open and closed source technologies such as D3, Apache Cassandra, Kafka, and PostgreSQL.

Datadog gives development and operations teams visibility into their full infrastructure – cloud, servers, apps, services, metrics, and much more. This includes real-time interactive dashboards that can be customized to a team's specific needs, full-text search across metrics and events, sharing and discussion tools so teams can collaborate on the insights they surface, targeted alerts for critical issues, and API access to accommodate unique infrastructures.

Datadog also integrates with various cloud, enterprise, and developer software tools out of the box, so established team workflows will be unchanged and uninterrupted when adopting Datadog’s service.
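
For example, custom metrics can be submitted from application code through the agent's DogStatsD interface. Below is a minimal sketch using the official datadog Python package; the metric names and tags are hypothetical, and a local agent is assumed to be listening on the default DogStatsD port.

from datadog import initialize, statsd

# Assumes a local Datadog agent with DogStatsD on the default port (8125).
initialize(statsd_host="127.0.0.1", statsd_port=8125)

statsd.increment("checkout.attempts", tags=["env:dev"])     # hypothetical counter
statsd.gauge("checkout.queue_depth", 12, tags=["env:dev"])  # hypothetical gauge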

BENEFITS OF DATADOG

With Datadog, users can:

* Connect and compare metrics and other data from all apps and services, as well as information coming from Amazon EC2, web servers, StatsD, and SQL and NoSQL databases.

* Streamline information analysis so that related processes such as graphing and measuring take far less time.

* Configure filtering so that only the metrics that matter are gathered.

* Set up the system to send alerts or notifications only for issues that require immediate attention.

* Focus on the correct code configurations, significant updates, and scheduled operations.

* Work hand in hand with the team through extensive collaboration features, including comments and annotations.

FEATURES OF DATADOG

* 80+ turn-key integrations for data aggregation.

* Clean graphs of StatsD and other integrations.

* Slice and dice graphs and alerts by tags, roles, and much more.

* Easy-to-use search for hosts, metrics, and tags.

* Alert notifications through e-mail and PagerDuty.

* Full API access.

* Overlay metrics and events across disparate sources.

* Out-of-the-box and customizable monitoring dashboards.

* Easy way to compute rates, ratios, averages, and integrals.

* Can mute all alerts with a click during upgrades and maintenance.

* Tools for team collaboration.

BUILDING MICROSERVICES IN PYTHON

What are Microservices?

Microservices – also known as the microservice architecture – is an architectural style that structures an application as a collection of loosely coupled services that implement business capabilities. The microservice architecture enables the continuous delivery/deployment of large, complex applications. It also enables an organization to evolve its technology stack.

The microservice architecture pattern language is a collection of patterns for applying the microservice architecture. It has two goals:

* The pattern language enables the user to decide whether microservices are a good fit for their application or not.

* The pattern language enables the user to use the microservice architecture successfully.

This article explores how to build microservices in Python.

Microservices design helps ease the problems associated with the monolithic model. Implementing microservices is one of the best ways to improve the productivity of a software engineering team, especially when the following holds:

* You deploy only the component that was changed, which keeps deployments and tests manageable.

* In a monolith, if multiple team members are working on the application, they must wait until everyone is done with development and testing before moving forward. With the microservices model, each piece of functionality can be deployed as soon as it is ready.

* Each microservice runs in its own process space, so if you need to scale for any reason, you can scale the particular microservice that needs more resources instead of scaling the entire application.

* When one microservice needs to communicate with another, it uses a lightweight protocol such as HTTP; a minimal sketch follows this list.
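
For instance, here is a minimal sketch of one service calling another with the requests library; the service URL, port, and payload are hypothetical.

import requests

# Hypothetical: an orders service asks an inventory service about stock.
resp = requests.get(
    "http://inventory-service:8000/api/items/42/stock",  # hypothetical internal URL
    timeout=2,  # keep inter-service calls from hanging indefinitely
)
if resp.status_code == 200:
    print("in stock:", resp.json()["available"] > 0)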

MICROSERVICES DESIGN

Let’s examine how the configuration would look for a Python and Django application that runs behind Nginx on a typical Linux server.

Since the application code is spread across multiple repos, each grouping logically independent code, a typical organization of the application directories on the server looks like this:

[ec2-user@ip-172-31-34-107 www]$ pwd
/opt/www
[ec2-user@ip-172-31-34-107 www]$ ls -lrt
total 8
drwxr-xr-x. 5 root root 4096 Oct 12 14:09 microservice1
drwxr-xr-x. 7 root root 4096 Oct 12 19:00 microservice2
drwxr-xr-x. 5 root root 4096 Oct 12 14:09 microservice3
drwxr-xr-x. 7 root root 4096 Oct 12 19:00 microservice4

Nginx, deployed as a front-end gateway / reverse proxy, will have a configuration along these lines:

[ec2-user@ip-172-31-34-107 www]$ cat /etc/nginx/conf.d/service.conf
upstream django1 {
    server unix:///opt/www/microservice1/uwsgi.sock; # for a file socket
}
upstream django2 {
    server unix:///opt/www/microservice2/uwsgi.sock; # for a file socket
}
upstream django3 {
    server unix:///opt/www/microservice3/uwsgi.sock; # for a file socket
}
upstream django4 {
    server unix:///opt/www/microservice4/uwsgi.sock; # for a file socket
}
server {
    # the port your site will be served on
    listen 80;
    # the domain name it will serve for
    server_name localhost;
    charset utf-8;
    # max upload size
    client_max_body_size 75M; # adjust to taste

    location /api/service1/ {
        uwsgi_pass django1;
        include /etc/nginx/uwsgi_params;
    }
    location /api/service2/ {
        uwsgi_pass django2;
        include /etc/nginx/uwsgi_params;
    }
    location /api/service3/ {
        uwsgi_pass django3;
        include /etc/nginx/uwsgi_params;
    }
    location /api/service4/ {
        uwsgi_pass django4;
        include /etc/nginx/uwsgi_params;
    }
}

Multiple uWSGI processes have to be created, one to serve requests for each microservice:

/usr/bin/uwsgi --socket=/opt/www/microservice1/uwsgi.sock --module=microservice_test.wsgi --master=true --chdir=/opt/www/microservice1
/usr/bin/uwsgi --socket=/opt/www/microservice2/uwsgi.sock --module=microservice_test.wsgi --master=true --chdir=/opt/www/microservice2
/usr/bin/uwsgi --socket=/opt/www/microservice3/uwsgi.sock --module=microservice_test.wsgi --master=true --chdir=/opt/www/microservice3
/usr/bin/uwsgi --socket=/opt/www/microservice4/uwsgi.sock --module=microservice_test.wsgi --master=true --chdir=/opt/www/microservice4
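
With the gateway and the uWSGI workers running, a quick smoke test from Python can confirm that Nginx routes each path prefix to the right backend; the /health/ endpoint used here is a hypothetical convention, not something defined above.

import requests

# Hypothetical smoke test: each prefix should reach its own microservice.
for svc in ("service1", "service2", "service3", "service4"):
    resp = requests.get(f"http://localhost/api/{svc}/health/", timeout=2)
    print(svc, resp.status_code)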