
Working with String in Python

Create a String

Strings can be defined simply by using single ( ' ), double ( " ) or triple ( ''' ) quotation marks. Strings enclosed in triple quotes ( ''' ) can span multiple lines. A few things to keep in mind about strings:

  • Strings are immutable in Python, so you cannot change the contents of a string.
  • The len() function returns the length of a string.
  • You can access individual characters using indexes, just as you do with lists.
  • You can use the '+' operator to concatenate two strings.

String = "String elements can also be accessed using index numbers, just like lists"

print(String[0:7])

# The print call above displays "String " (with a trailing space) on screen.

TASKS TO DO

# Create a string str1
str1 = "Introduction with strings"

# Now store the length of string str1 in variable str_len
str_len = len(str1)

str_new = "Machine Learning is awesome!"
# Print the last eight characters of str_new (its length is 28 characters).
print(str_new[20:28])

str2 = "I am doing a course Introduction to Hackathon using "
str3 = "Python"

# Write a line of code to store the concatenation of str2 and str3 in variable str4
str4 = str2 + str3
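The properties listed above (immutability, len(), indexing, concatenation) can be sketched in a few lines; the string value here is just an example:

```python
s = "Python"

print(len(s))        # 6
print(s[0])          # P
print(s[-1])         # n  (negative indexes count from the end)
print(s + " rocks")  # Python rocks

# Strings are immutable: item assignment raises TypeError.
try:
    s[0] = "J"
except TypeError as err:
    print("immutable:", err)
```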


Creating List in Python


Lists are probably the most versatile data structures in Python. A list can be defined by writing comma-separated values in square brackets. Lists may contain items of different types. Python lists are mutable: individual elements of a list can be changed while the list's identity does not change.

Country = ['NEPAL', 'INDIA', 'USA', 'GERMANY', 'UK', 'AUSTRALIA']

Temperature = [22, 44, 28, 20, 18, 25, 45, 67]

We just created two lists, one for Country names (strings) and another one for Temperature data (whole numbers).

Accessing individual elements of a list

  • Individual elements of a list can be accessed by writing an index number in square brackets. The first index of a list is 0 (zero), not 1. For example, Country[0] can be used to access the first element, 'NEPAL'
  • A range of elements can be accessed with a start index and an end index, but the element at the end index is not included. For example, Temperature[1:4] returns three elements, the second through fourth, [44, 28, 20], but not the fifth element
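Both access rules can be tried directly; a minimal sketch using the two lists defined above:

```python
Country = ['NEPAL', 'INDIA', 'USA', 'GERMANY', 'UK', 'AUSTRALIA']
Temperature = [22, 44, 28, 20, 18, 25, 45, 67]

print(Country[0])        # NEPAL  (indexing starts at 0)
print(Country[-1])       # AUSTRALIA
print(Temperature[1:4])  # [44, 28, 20]  (the end index is excluded)

# Lists are mutable: an element can be replaced in place.
Temperature[0] = 23
print(Temperature[0])    # 23
```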

Tasks

# Create a list of squared numbers
squares_list = [0, 1, 4, 9, 16, 25]

# Now write a line of code to create a list of the first five odd numbers and store it in a variable odd_numbers
odd_numbers = [1, 3, 5, 7, 9]

# Print the first element of squares_list
print(squares_list[0])

# Print the second to fourth elements of squares_list
print(squares_list[1:4])


What is Linear Regression?

Linear regression is a basic and commonly used type of predictive analysis.

FYI, predictive analytics involves extracting information from existing data sets to identify trends and patterns, which are then used to predict future outcomes. It is the branch of advanced analytics concerned with making predictions about unknown future events, and it draws on techniques from data mining, statistics, modeling, machine learning, and artificial intelligence to analyze current and historical data. The patterns found in historical and transactional data can be used to identify risks and opportunities for the future. Predictive models capture relationships among many factors to assess the risk associated with a particular set of conditions and assign a score, or weighting.

The overall idea of regression is to examine two things:

(1) Does a set of predictor variables do a good job of predicting an outcome (dependent) variable?

(2) Which variables in particular are significant predictors of the outcome variable, and in what way (indicated by the magnitude and sign of the beta estimates) do they impact the outcome variable?

These regression estimates are used to explain the relationship between one dependent variable and one or more independent variables.  The simplest form of the regression equation with one dependent and one independent variable is defined by the formula y = c + b*x, where y = estimated dependent variable score, c = constant, b = regression coefficient, and x = score on the independent variable.
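As a sketch of the formula y = c + b*x, the snippet below fits c and b by ordinary least squares; the data points are made up purely for illustration:

```python
# Made-up sample data: y grows roughly as 2*x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Ordinary least squares for one predictor:
# b = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2), c = y_mean - b*x_mean
b = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) \
    / sum((x - x_mean) ** 2 for x in xs)
c = y_mean - b * x_mean

print(f"y = {c:.2f} + {b:.2f}*x")
print("prediction at x = 6:", c + b * 6)
```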


There are several types of linear regression analyses available to researchers.

  • Simple linear regression
    1 dependent variable (interval or ratio), 1 independent variable (interval or ratio or dichotomous)

  • Multiple linear regression
    1 dependent variable (interval or ratio), 2+ independent variables (interval or ratio or dichotomous)

  • Logistic regression
    1 dependent variable (dichotomous), 2+ independent variable(s) (interval or ratio or dichotomous)

  • Ordinal regression
    1 dependent variable (ordinal), 1+ independent variable(s) (nominal or dichotomous)

  • Multinomial regression
    1 dependent variable (nominal), 1+ independent variable(s) (interval or ratio or dichotomous)

  • Discriminant analysis
    1 dependent variable (nominal), 1+ independent variable(s) (interval or ratio)

When selecting a model for the analysis, an important consideration is model fit. Adding independent variables to a linear regression model will always increase the explained variance of the model (typically expressed as R²). However, adding too many variables causes overfitting, which reduces the model's generalizability. Occam's razor describes the problem well: a simple model is usually preferable to a more complex one. Statistically, if a model includes a large number of variables, some of the variables will be statistically significant due to chance alone.
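The overfitting point can be demonstrated numerically (a sketch assuming NumPy is available; all data below are randomly generated for illustration): R² never decreases when predictors are added, even pure-noise ones.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=(n, 1))
y = 2.0 * x[:, 0] + rng.normal(size=n)          # the true model uses only x

def r_squared(X, y):
    # Fit OLS with an intercept and return the coefficient of determination.
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

noise = rng.normal(size=(n, 10))                # 10 irrelevant predictors
r2_simple = r_squared(x, y)
r2_padded = r_squared(np.column_stack([x, noise]), y)
print(r2_simple, r2_padded)                     # r2_padded >= r2_simple
```

Adjusted R², which penalizes the number of predictors, is one common guard against reading too much into this mechanical increase.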


Article Source: statisticssolutions

A Guide to Augmented Reality

Augmented Reality turns the environment around you into a digital interface by placing virtual objects in the real world, in real-time. Augmented Reality can be seen through a wide variety of experiences. We distinguish 3 main categories of Augmented Reality tools.

Augmented Reality 3D viewers, like Augment, allow you to place life-size 3D models in your environment with or without the use of trackers. Trackers are simple images that 3D models can be attached to in Augmented Reality.

Augmented Reality browsers enrich your camera display with contextual information. For example, you can point your smartphone at a building to display its history or estimated value.

The last way that Augmented Reality is generally experienced is through gaming, creating immersive gaming experiences that utilize your actual surroundings. Imagine shooting games with zombies walking in your own bedroom! The biggest use of Augmented Reality gaming to-date is Pokémon Go, allowing users to catch virtual Pokémon who are hidden throughout a map of the real world.

Download the free Augmented Reality eBook from augumented.com here to learn more.

Webinar: Enhancing Fraud Detection with Automated Machine Learning and Streaming Analytics

Webinar: Tuesday, February 13, 1:00 pm ET / 10:00 am PT
Register now

Building predictive applications allows companies to respond to new threats and take advantage of developing opportunities. But executing these new applications against high-volume event streams with sub-second latency requires a powerful combination of machine learning and streaming analytics.

In this webinar, you’ll learn how to create and evaluate new machine learning models with DataRobot and deploy them within the SQLstream Blaze streaming analytics engine – so that you can identify risk in real-time and prevent fraud as it happens – rather than after the fact.

In this 45-minute webinar, you'll discover how Automated Machine Learning and Streaming Analytics provide:

– Automated machine learning models that can be created by anyone
– Rapid deployment against incoming, high-volume event streams with extremely low latency
– The ability to update those models seamlessly – with no downtime
– Deep transparency, including prediction reason codes, to enable rapid, targeted investigations

Speakers:
Greg Michaelson, PhD – Head of DataRobot Labs
David Hickman – Senior Director, Product Marketing, SQLstream

Pickle and cPickle in Python

Serializing and de-serializing a Python object structure can be done with the pickle module. Converting Python objects such as lists and dicts into a stream of bytes is called pickling; pickle serializes the object before writing it to a file. The information contained in that stream can then be used to reconstruct the objects in another Python script.

Before getting started, keep in mind that the pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.

Let's get started by pickling a Python list.

First of all, import the pickle module:

import pickle

Now, create the object to pickle and assign it to a variable:

a = ['test value', 'test value 2', 'test value 3']

file_Name = "testfile"

# open the file for writing in binary mode
fileObject = open(file_Name, 'wb')

# this writes the object a to the
# file named 'testfile'
pickle.dump(a, fileObject)

# here we close the fileObject
fileObject.close()

# reopen the file for reading, again in binary mode
fileObject = open(file_Name, 'rb')

# load the object from the file into var b
b = pickle.load(fileObject)
fileObject.close()

print(b)       # ['test value', 'test value 2', 'test value 3']
print(a == b)  # True
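A slightly more idiomatic variant of the same round trip uses with blocks, so the file handles are closed automatically, and opens the file in binary mode for both writing and reading:

```python
import pickle

data = {'values': [1, 2, 3], 'label': 'demo'}

# 'wb'/'rb': pickle files must always be opened in binary mode.
with open('testfile.pkl', 'wb') as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)

with open('testfile.pkl', 'rb') as f:
    restored = pickle.load(f)

print(restored == data)  # True
```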


There is also cPickle, a C implementation of the pickle module available in Python 2; according to the official documentation it can be up to 1000 times faster because it is written in C. (In Python 3, the C implementation is used automatically when available.)

For more on the documentation of pickle and cPickle, click here

Thanks.

MS Excel Statistical Functions for Central Tendency

AVERAGE
Provides an estimate of the mean or expected value of the data. To use it, supply the range
of data you want the average of: "=AVERAGE(data range)." For example, if you wanted the
average of the observations of the data in cells A1 through A10, you might enter
=AVERAGE(A1:A10) into cell A12.

MEDIAN
Returns the median or 50th percentile of the data. To use it, supply the range of data you
want the median of: "=MEDIAN(data range)."

TRIMMEAN
Computing a trimmed mean is a way to temper the influence of extreme values in the estimate. A trimmed mean excludes k% of the data from the calculation, where k is a judgmentally selected proportion of the data to exclude. The calculation excludes the k/2% highest and lowest values. To use it, supply the range of data you want the trimmed mean of
and specify the proportion to exclude. The proportion must be between zero and one:
"=TRIMMEAN(data range, proportion)."
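For readers following along in Python instead of Excel, here is a rough sketch of the same trimming rule (the data values are made up; the number of excluded points is rounded down so that whole points are dropped from each end, mirroring Excel's behavior):

```python
def trimmean(values, proportion):
    # Drop proportion/2 of the points from each end, then average the rest.
    k = int(len(values) * proportion / 2)
    trimmed = sorted(values)[k:len(values) - k]
    return sum(trimmed) / len(trimmed)

data = [1, 2, 3, 4, 5, 100]
print(trimmean(data, 0.4))  # drops 1 low and 1 high point -> 3.5
```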

GEOMEAN
The geometric mean is a measure that is often used for data expressed as rates of change (such as returns on stocks or other investments). In a sample of size n, it returns the nth root of the product of all n sample items and is particularly relevant for data that follow the lognormal distribution: "=GEOMEAN(data range)."
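A quick Python counterpart using the standard library (the growth factors below are made up: +10%, -5%, and +20% expressed as factors):

```python
import statistics

factors = [1.10, 0.95, 1.20]
g = statistics.geometric_mean(factors)  # nth root of the product of n items
print(g)
```

Note that statistics.geometric_mean is available from Python 3.8 onward.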

WEIGHTED AVERAGES
Actuaries often use weighted averages rather than arithmetic averages. For instance, it is common to compute weighted average loss development factors. The rationale behind a weighted average is to produce a more stable estimate than can be obtained from an unweighted average. When using a weighted average, a common practice is to select a weight for an observation that is believed to be inversely proportional to its variance, so that
observations contributing more variability to the estimate are given less weight.

For instance, when computing average severities for a given accident period and development age, severities based on cells with higher claim volumes will be more stable than severities based on cells with lower claim volumes. It therefore makes sense to give more weight to the cells with larger claim volumes when fitting a severity model to the data. This can be accomplished by using claim count as the weight variable. Excel does not have a built-in function for computing weighted averages; however,
certain functions are helpful when computing them:

SUMPRODUCT
Computes the sum of the products of two columns or rows. If one of the columns holds the weight associated with each observation and the other holds the sample values, the weighted average can be computed.

"=SUMPRODUCT(weight variable range, sample value variable range)"

gives the same result as creating a column that is the product of the column with the first variable times the column with the second variable, and then summing the result. Dividing the SUMPRODUCT by the SUM of the weight variable range results in the weighted average.
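The SUMPRODUCT / SUM recipe can be mirrored in a few lines of Python; the severities and claim counts below are invented purely for illustration:

```python
severities = [1000.0, 1200.0, 900.0, 1500.0]
claim_counts = [50, 80, 10, 5]   # higher volume -> more weight

# SUMPRODUCT equivalent: sum of pairwise products.
sumproduct = sum(s * w for s, w in zip(severities, claim_counts))
# Dividing by the SUM of the weights gives the weighted average.
weighted_avg = sumproduct / sum(claim_counts)

unweighted_avg = sum(severities) / len(severities)
print(weighted_avg)    # pulled toward the high-volume cells
print(unweighted_avg)
```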

SUMIF
Calculates a conditional sum of the values in the sum range that meet the specified criterion, using "=SUMIF(criterion range, criterion, sum range)."

For instance, if you wish to sum only the losses at 12 months of maturity for which there are also losses at 24 months of maturity (i.e., drop the most recent period from the sum), using the criterion ">0" along with the 24-month values as the criterion range and the 12-month values as the sum range will accomplish this.

Uni-variate distribution relationships

In statistics, a univariate distribution is a probability distribution of only one random variable. This is in contrast to a multivariate distribution, the probability distribution of a random vector (consisting of multiple random variables). Below is a quick overview of useful univariate relationships for aspiring data science enthusiasts.