
What is Data Mining? – Simple Definition

The amount of raw data stored in corporate databases is exploding. From trillions of point-of-sale transactions and credit card purchases to pixel-by-pixel images of galaxies, databases are now measured in gigabytes and terabytes. (One terabyte = one trillion bytes. A terabyte is equivalent to about 2 million books!) For instance, every day, Wal-Mart uploads 20 million point-of-sale transactions to an AT&T massively parallel system with 483 processors running a centralized database. Raw data by itself, however, does not provide much information. In today’s fiercely competitive business environment, companies need to rapidly turn these terabytes of raw data into significant insights into their customers and markets to guide their marketing, investment, and management strategies.

What is Data Mining?

Data mining, or knowledge discovery, is the computer-assisted process of digging through and analyzing enormous sets of data and then extracting the meaning of the data. Data mining tools predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. Data mining tools can answer business questions that traditionally were too time consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

Data mining derives its name from the similarities between searching for valuable information in a large database and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material, or intelligently probing it to find where the value resides.

What Can Data Mining Do?

Although data mining is still in its infancy, companies in a wide range of industries – including retail, finance, health care, manufacturing, transportation, and aerospace – are already using data mining tools and techniques to take advantage of historical data. By using pattern recognition technologies and statistical and mathematical techniques to sift through warehoused information, data mining helps analysts recognize significant facts, relationships, trends, patterns, exceptions and anomalies that might otherwise go unnoticed.

For businesses, data mining is used to discover patterns and relationships in the data in order to help make better business decisions. Data mining can help spot sales trends, develop smarter marketing campaigns, and accurately predict customer loyalty. Specific uses of data mining include:

  • Market segmentation – Identify the common characteristics of customers who buy the same products from your company.
  • Customer churn – Predict which customers are likely to leave your company and go to a competitor.
  • Fraud detection – Identify which transactions are most likely to be fraudulent.
  • Direct marketing – Identify which prospects should be included in a mailing list to obtain the highest response rate.
  • Interactive marketing – Predict what each individual accessing a Web site is most likely interested in seeing.
  • Market basket analysis – Understand what products or services are commonly purchased together; e.g., beer and diapers.
  • Trend analysis – Reveal the difference between a typical customer this month and last.
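As a toy illustration of the market basket idea above, the sketch below counts how often item pairs appear in the same basket and reports support (share of baskets) and confidence. The basket data and the 0.3 support threshold are made up purely for illustration; real tools use far larger datasets and richer algorithms such as Apriori.

```python
from itertools import combinations
from collections import Counter

def association_rules(baskets, min_support=0.3):
    """Report (item_a, item_b, support, confidence) for frequent pairs.
    support = fraction of baskets containing both items;
    confidence = P(item_b in basket | item_a in basket)."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # Confidence in one direction only (a -> b), for brevity.
            rules.append((a, b, support, count / item_counts[a]))
    return rules

# Toy point-of-sale data: each inner list is one checkout basket.
baskets = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["milk", "bread"],
    ["beer", "chips"],
]
for a, b, support, confidence in association_rules(baskets):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

On this toy data, the beer-and-diapers pair surfaces with support 0.5, which is exactly the kind of seemingly unrelated co-purchase the technique is meant to reveal.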

Data mining technology can generate new business opportunities by:

Automated prediction of trends and behaviors: Data mining automates the process of finding predictive information in a large database. Questions that traditionally required extensive hands-on analysis can now be directly answered from the data. A typical example of a predictive problem is targeted marketing. Data mining uses data on past promotional mailings to identify the targets most likely to maximize return on investment in future mailings. Other predictive problems include forecasting bankruptcy and other forms of default, and identifying segments of a population likely to respond similarly to given events.
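A minimal sketch of what targeted-marketing scoring might look like: the feature names, weights, and prospect records below are entirely hypothetical stand-ins for a model that would, in practice, be learned from past promotional-mailing responses.

```python
def score_prospects(prospects, weights):
    """Score each prospect from simple response features and keep
    the names whose score clears a (made-up) 0.5 cutoff."""
    scored = [
        (sum(weights[k] * v for k, v in p["features"].items()), p["name"])
        for p in prospects
    ]
    # Highest-scoring prospects first; keep only likely responders.
    return [name for score, name in sorted(scored, reverse=True) if score > 0.5]

# Hypothetical weights one might fit from past mailing results.
weights = {"opened_last_mailing": 0.6, "recent_purchase": 0.4}
prospects = [
    {"name": "A", "features": {"opened_last_mailing": 1, "recent_purchase": 1}},
    {"name": "B", "features": {"opened_last_mailing": 0, "recent_purchase": 1}},
    {"name": "C", "features": {"opened_last_mailing": 0, "recent_purchase": 0}},
]
print(score_prospects(prospects, weights))
```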

Automated discovery of previously unknown patterns: Data mining tools sweep through databases and identify previously hidden patterns. An example of pattern discovery is the analysis of retail sales data to identify seemingly unrelated products that are often purchased together. Other pattern discovery problems include detecting fraudulent credit card transactions and identifying anomalous data that could represent data entry keying errors.
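Anomaly detection of the kind mentioned above can be sketched with a crude z-score filter. The transaction amounts and threshold below are invented for illustration; production fraud screening uses much richer models than a single standard-deviation cutoff.

```python
import statistics

def flag_anomalies(amounts, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the
    mean -- a toy stand-in for anomalous-transaction screening."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)
    return [x for x in amounts if stdev and abs(x - mean) / stdev > threshold]

amounts = [25, 30, 27, 22, 31, 28, 26, 900]  # one suspicious charge
print(flag_anomalies(amounts, threshold=2.0))
```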

HOW IT WORKS (EXAMPLE):

So called because of the manner in which it explores information, data mining is carried out by software applications which employ a variety of statistical and artificial intelligence methods to uncover hidden patterns and relationships among sets of data. For instance, a data mining program might be able to uncover a relationship between high sales volumes and poor weather conditions.
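The weather-and-sales relationship described above can be quantified with a correlation coefficient. The rainfall and sales figures below are made up solely to demonstrate the computation; a real analysis would pull these series from a database.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rainfall_mm = [0, 2, 5, 12, 20, 35]      # hypothetical daily rainfall
umbrella_sales = [3, 5, 9, 18, 30, 52]   # hypothetical units sold
print(round(pearson_r(rainfall_mm, umbrella_sales), 3))
```

A coefficient near 1 would suggest sales rise with rainfall, which is the sort of hidden relationship a data mining program surfaces automatically across thousands of variable pairs.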

 

WHY IT MATTERS:

Data mining software is able to perform complex calculations and analyses on sets of data in a very short time. For this reason, data mining is used by companies in strategic planning.

FIFA World Cup 2018: Drones to be used in data collection

Russia’s National Guard has revealed that it will use video drones to detect unruly fans in crowds at the 2018 FIFA World Cup.

A variety of security measures, including drones and video security systems, will be used to identify “aggressive fans,” Sergey Melikov, the first deputy director of the National Guard, told the Interfax news agency last week.

Meanwhile, the head of the Emergency Situations Ministry has said that rescue crews are studying English and other foreign languages to better communicate with World Cup guests, the state-run RIA Novosti news agency reported last Friday.

Russia will host the 2018 FIFA World Cup from June 14 to July 15 next summer.

What is Data Warehouse? – Simple Definition

WHAT IT IS:

A data warehouse is a federated repository for all the data that an enterprise’s various business systems collect. The repository may be physical or logical. Data warehousing is an electronic method of organizing information.

HOW IT WORKS (EXAMPLE):

A data warehouse essentially combines information from several sources into one comprehensive database. For example, in the business world, a data warehouse might incorporate customer information from a company’s point-of-sale systems (the cash registers), its website, its mailing lists and its comment cards. Alternatively, it might incorporate all the information about employees, including time cards, demographic data, salary information, etc.

By combining all of this information in one place, a company can analyze its customers in a more holistic way, ensuring that it has considered all the information available. Data warehousing also makes data mining possible, which is the task of looking for patterns in the data that could lead to higher sales and profits.

There are different ways to establish a data warehouse and many pieces of software that help different systems “upload” their data to a data warehouse for analysis. However, the basic idea is to first extract data from all the individual sources (cash registers, time clocks, office computers), remove redundant data and organize the data into a consistent format that can be queried.
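The extract/clean/organize steps above can be sketched roughly as follows. The source systems, field names, and records are all hypothetical; real ETL tooling handles many more formats and edge cases.

```python
def etl(sources):
    """Minimal extract-transform-load sketch: pull records from several
    source systems, normalize field names, and de-duplicate on customer
    id before 'loading' them into one consistent store."""
    warehouse = {}
    for records in sources:            # Extract: iterate each source system.
        for rec in records:
            # Transform: tolerate differing field names across systems.
            cust_id = rec.get("customer_id") or rec.get("id")
            name = (rec.get("name") or rec.get("customer_name", "")).strip().title()
            # Load: keyed by id, so redundant rows collapse (last write wins).
            warehouse[cust_id] = {"customer_id": cust_id, "name": name}
    return list(warehouse.values())

# Hypothetical sources: a point-of-sale system and a website database.
pos = [{"id": 1, "customer_name": "ada lovelace"}]
web = [{"customer_id": 1, "name": "Ada Lovelace"},
       {"customer_id": 2, "name": "Alan Turing"}]
print(etl([pos, web]))
```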

WHY IT MATTERS:

Companies with data warehouses can have an advantage in product development, marketing, pricing strategy, production time, historical analysis, forecasting and customer satisfaction. However, data warehouses also can be very expensive to design and implement, and sometimes their construction makes them unwieldy.

What do I need to know about data warehousing?

Data warehouses are typically used to correlate broad business data to provide greater executive insight into corporate performance.

How is a data warehouse different from a regular database?

Data warehouses use a different design from standard operational databases. The latter are optimized to maintain strict accuracy of data in the moment by rapidly updating real-time data. Data warehouses, by contrast, are designed to give a long-range view of data over time. They trade off transaction volume and instead specialize in data aggregation.
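The aggregation specialty can be illustrated with a small roll-up: row-level transactions (the kind an operational database handles one at a time) are summarized into the month-level totals a warehouse-style query would serve. The dates and amounts are made up.

```python
from collections import defaultdict

def rollup_by_month(transactions):
    """Aggregate (date, amount) rows into per-month totals --
    the long-range view a data warehouse is designed to serve."""
    totals = defaultdict(float)
    for date, amount in transactions:
        totals[date[:7]] += amount   # "YYYY-MM-DD" -> "YYYY-MM"
    return dict(totals)

transactions = [
    ("2018-01-03", 19.99), ("2018-01-17", 5.00),
    ("2018-02-02", 12.50),
]
print(rollup_by_month(transactions))
```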

What are data warehouses used for?

Many types of business data are analyzed via data warehouses. The need for a data warehouse often becomes evident when analytic requirements run afoul of the ongoing performance of operational databases. Running a complex query on a database requires the database to enter a temporary fixed state. This is often untenable for transactional databases. A data warehouse is employed to do the analytic work, leaving the transactional database free to focus on transactions.

The other benefits of a data warehouse are the ability to analyze data from multiple sources and to negotiate differences in storage schema using the ETL process.

What are the disadvantages of a data warehouse?

Data warehouses are expensive to scale, and do not excel at handling raw, unstructured, or complex data. However, data warehouses are still an important tool in the big data era.

Resources:

https://www.informatica.com/services-and-training/glossary-of-terms/data-warehousing-definition.html#fbid=85J5EGsXI8z

https://www.coursera.org/specializations/data-warehousing

http://searchsqlserver.techtarget.com/definition/data-warehouse

Power BI: An Introduction

Power BI is a business analytics service provided by Microsoft. It provides interactive visualizations with self-service business intelligence capabilities, where end users can create reports and dashboards by themselves, without having to depend on information technology staff or database administrators.

Simply, Power BI is Microsoft’s cloud-based business intelligence technology that is part of the Office 365 suite, the cloud-based suite of productivity applications.

This application was originally conceived by Thierry D’Hers and Amir Netz of the SQL Server Reporting Services team at Microsoft. It was designed by Ron George in the summer of 2010 and named Project Crescent. Project Crescent became available for public download on July 11, 2011, bundled with SQL Server codenamed Denali. Later renamed Power BI, it was unveiled by Microsoft in September 2013 as Power BI for Office 365. The first release of Power BI was based on the Microsoft Excel–based add-ins Power Query, Power Pivot and Power View. Over time, Microsoft added many additional features such as Questions and Answers, enterprise-level data connectivity, and security options via Power BI Gateways. Power BI was first released to the general public on July 24, 2015. A new feature named content packs was also introduced, which helps companies distribute their own dashboards and reports to their users through an easy-to-discover content gallery.

Data Sagar will be covering basic Power BI tutorials in the future.

R for Data Science – An introduction


R is a free and open-source programming language and software environment for statistical computing and graphics. Distributed under the GNU General Public License version 2, R is an easy language to learn and is commonly used for developing data analysis and statistical software. R compiles and runs on a wide variety of UNIX platforms, Windows and macOS.

R is designed to allow users to add additional functionality by defining new functions. Much of the system is itself written in the R dialect of the S programming language. For computationally-intensive tasks, C, C++ and Fortran code can be linked and called at run time. Advanced users can write C code to manipulate R objects directly.

R was initially written by Robert Gentleman and Ross Ihaka, also known as “R & R” of the Statistics Department of the University of Auckland. Today, R is the result of a collaborative effort with contributions from all over the world.

The R Environment

The R environment is an integrated suite of software facilities for data manipulation, calculation and graphical display. R offers effective data handling and storage facilities, a suite of operators for calculations on arrays, a collection of intermediate tools for data analysis, and graphical facilities for data analysis and display, as well as a simple and effective programming language (a dialect of S) which includes conditionals, loops, user-defined recursive functions, and input and output facilities.

Most programs written in the R programming language are essentially ephemeral, written for a single piece of data analysis. (Source: W. N. Venables, D. M. Smith and the R Core Team; An Introduction to R)

Source: https://www.webopedia.com/TERM/R/r_programming_language.html

Python for Data Science: An Introduction

In technical terms, Python is an object-oriented, high-level programming language with integrated dynamic semantics primarily for web and app development. It is extremely attractive in the field of Rapid Application Development because it offers dynamic typing and dynamic binding options.
Python is relatively simple and easy to learn, since its syntax focuses on readability. Developers can read and translate Python code much more easily than code in many other languages. In turn, this reduces the cost of program maintenance and development because it allows teams to work collaboratively without significant language and experience barriers.
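The dynamic typing mentioned above can be seen in a tiny example: the same function accepts values of any type, because names are bound to objects at run time rather than declared up front. (The function name here is of course arbitrary.)

```python
def describe(value):
    """Dynamic typing in action: one function handles whatever
    type it receives; the type is inspected at run time."""
    return f"{value!r} is a {type(value).__name__}"

# No type declarations needed -- the same call works for every type.
for v in (42, 3.14, "hello", [1, 2, 3]):
    print(describe(v))
```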
Additionally, Python supports the use of modules and packages, which means that programs can be designed in a modular style and code can be reused across a variety of projects. Once you’ve developed a module or package you need, it can be scaled for use in other projects, and it’s easy to import or export these modules.
One of the most promising benefits of Python is that both the standard library and the interpreter are available free of charge, in both binary and source form. There is no exclusivity either, as Python and all the necessary tools are available on all major platforms. Therefore, it is an enticing option for developers who don’t want to worry about paying high development costs.
If this description of Python is over your head, don’t worry. You’ll understand it soon enough. What you need to take away from this section is that Python is a programming language used to develop software on the web and in app form, including mobile. It’s relatively easy to learn, and the necessary tools are available to all free of charge.
That makes Python accessible to almost anyone. If you have the time to learn, you can create some amazing things with the language.

Python interpreters are available for many operating systems. CPython, the reference implementation of Python, is open source software and has a community-based development model, as do nearly all of its variant implementations. CPython is managed by the non-profit Python Software Foundation.

Android One | Introduction


Android One is a standard created by Google for Android systems, mainly targeted at people buying their first smartphone[1] and customers in the developing world. Android One smartphones will run software close to stock Android, without the often extensive vendor-specific modifications that many smartphone vendors apply. Security updates will be handled by Google, avoiding problems some earlier phones have had with lacking security updates.[2] Google also makes a reference hardware design available for Android One, meaning that OEMs just have to manufacture the phone.[3] The first set of Android One devices features MediaTek’s quad-core MT6582 mobile system-on-chip (SoC).[4]

Android One phones will initially roll out in India, Nepal, Indonesia, the Philippines, Sri Lanka and other South Asian countries in 2014.[5] The first Android One smartphones were launched by the Indian brands Micromax, Spice and Karbonn in September 2014, and other manufacturers are going to follow gradually.

Various news articles mention that the Android One standard also dictates minimum hardware requirements,[6][7] but the actual list of minimum requirements doesn’t seem to have been made available on the Internet.


Resource: http://en.wikipedia.org/wiki/Android_One

Images: update-phones.com, s3.india.com

Primary vs Secondary Memory – Key differences

Hi there, welcome to my blog. Here I present you some of the key differences between Primary and Secondary Memory.

1. Connection – Primary memory is any memory device connected directly to the CPU or processor. Secondary memory is not directly connected to the processor; to access it, the processor must go through primary memory.
2. Cost – The per-unit storage cost of primary memory is high, i.e. manufacturing cost is high because of the technology and materials used. The per-unit storage cost of secondary storage devices is cheaper than that of primary memory.
3. Speed – Primary memory has a fast response time, and data/instructions can be accessed quickly; the majority of primary memory uses a random-access mechanism. Secondary memory has a slower response time compared to primary memory and usually deploys a sequential-access mechanism.
4. Access path – Contents of primary memory are accessed by using the data bus. Contents stored in secondary memory are accessed by using input/output (I/O) channels.
5. Capacity – Primary memory storage capacity is low (generally between 1 GB and 32 GB). Secondary memory stores a large volume of data (storage capacity is generally measured in GB or TB, and is usually more than 100 GB).
6. Volatility – Primary memory is volatile in nature, meaning data cannot be retained if electricity is cut off. Secondary storage devices are non-volatile in nature and can hold data even after power is cut off.
7. Use – Primary memory is used to store temporary data of programs that the CPU is currently working on, temporary data of programs that users use frequently, and startup programs that tell the system the startup sequence. Secondary storage devices are used to store permanent content like documents, movies, graphic works, etc.
8. Types – The main types of primary memory are RAM, ROM, cache memory, and registers. The main types of secondary memory are magnetic storage devices (magnetic tapes, magnetic disks/hard disks) and optical storage devices (CD, DVD, Blu-ray discs).

Fig: Difference between Primary and Secondary Memory by DataSagar

#Update 1 (2020):

Also, for updated learning, I would like to recommend the following video to follow:

Thanks and Happy Learning!