To succeed as a Data Scientist, one must possess the right skills and qualities and develop relevant expertise. The first step towards becoming a Data Scientist is therefore to understand what a Data Scientist does every day. A Data Scientist spends about 40% of the time on data-related work: understanding the data, transforming it, visualizing it, doing exploratory analysis, understanding null values, imputing values through suitable rules and logic, and understanding the problem and the business case. A further 40% of the time is spent working through the many available algorithms: reviewing the logical and mathematical basis of the relevant ones, choosing the appropriate algorithm for the problem at hand, and adapting, diagnosing and improving the selected algorithm and model for the best possible solution. The remaining 20% of the time is spent on coding related to modelling.
INTRODUCTORY OFFER - Rs.15,000/- only for LIPSINDIA alumni and Rs.25,000/- for LIPSINDIA alumni referrals
Introduction to R programming, installation of R, the CRAN repository, introduction to the RStudio IDE, installing and using R packages; operators in R programming: assignment, arithmetic, logical and relational operators; data structures in R: scalars, vectors, matrices, data frames, lists, factors; accessing, using and altering each data type.
Conditional statements: if, else, else if, switch; loops, break and next statements; working with data frames and CSV files: importing, operations and exporting; the apply, sapply, lapply, tapply and mapply functions; functions used for exploratory data analysis.
Data representation: types of plots, types of charts and inferences; introduction to the ggplot2 package; statistical problem solving via R; hands-on data manipulation: cleaning, sub-setting, sampling and data transformations.
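The data-manipulation topics above can be previewed with a tiny example. The course itself works in R; as a language-neutral sketch with made-up values, the same imputation-and-subsetting idea looks like this in plain Python:

```python
from statistics import mean

# Toy dataset with a missing value (None), mirroring NA handling in R.
ages = [25, 32, None, 41, 29]

# Impute the missing value with the mean of the observed values.
observed = [a for a in ages if a is not None]
imputed = [a if a is not None else round(mean(observed), 1) for a in ages]

# Subsetting: keep only entries satisfying a condition,
# much like df[df$age > 30, ] on an R data frame.
over_30 = [a for a in imputed if a > 30]

print(imputed)   # [25, 32, 31.8, 41, 29]
print(over_30)   # [32, 31.8, 41]
```

In R the same steps would use `is.na()`, `mean(..., na.rm = TRUE)` and data-frame subsetting; the underlying logic is identical.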
Modelling with linear regression (continuous dependent variable (CDV)), logistic regression (discrete dependent variable (DDV)) and SVM (DDV and CDV); modelling with decision trees (DDV and CDV) and random forests (DDV and CDV); exposure to Naive Bayes and clustering algorithms; implementation, evaluation and improvement of learning algorithms.
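As a flavour of the modelling material, here is a minimal Naive Bayes classifier sketched in plain Python on an invented two-class word dataset (the course implements such models in R):

```python
from collections import Counter

# Toy labelled "documents" (sets of words); the data and the
# 'spam'/'ham' labels are made up purely for illustration.
train = [({"win", "money"}, "spam"), ({"win", "prize"}, "spam"),
         ({"meeting", "today"}, "ham"), ({"project", "today"}, "ham")]

def naive_bayes(words, train, alpha=1.0):
    """Pick the label maximizing P(label) * prod P(word | label)."""
    priors = Counter(label for _, label in train)
    scores = {}
    for label in priors:
        docs = [w for w, y in train if y == label]
        score = priors[label] / len(train)          # prior P(label)
        for word in words:
            present = sum(word in d for d in docs)
            # Laplace-smoothed likelihood of the word in this class.
            score *= (present + alpha) / (len(docs) + 2 * alpha)
        scores[label] = score
    return max(scores, key=scores.get)

print(naive_bayes({"win", "money"}, train))        # spam
print(naive_bayes({"meeting", "project"}, train))  # ham
```

The "naive" assumption is that word occurrences are independent given the class, which is what lets the likelihood factor into a simple product.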
Introduction: types of data, concept of statistics, population, sample, parameter and statistic, uses of statistical concepts, data sources, Introduction: representation of data, types of statistical analyses, sampling methods, distance measures.
Tools of Exploratory Data Analysis: measures of central tendency, statistical estimation; measures of central tendency as moments, covariance, coefficient of correlation.
Combinatorics and Probability: permutations and combinations, probability concepts, collectively exhaustive event sets, joint probability, Bayes' Theorem, probability distribution for a discrete random variable, graphical representation of the Bayes event space; Normal distribution: derivation of population parameters, central limit theorem, Z-score.
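Bayes' Theorem can be illustrated with the classic diagnostic-test calculation; all the numbers below are invented for illustration:

```python
# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B).
p_disease = 0.01            # prior: 1% of the population has the disease
p_pos_given_disease = 0.95  # test sensitivity
p_pos_given_healthy = 0.05  # false-positive rate

# Total probability of a positive test (law of total probability).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior probability of disease given a positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.161
```

Despite the accurate-looking test, the posterior is only about 16%, because the disease is rare: a good example of why the prior matters.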
Distributions: Bernoulli trials, binomial distribution, Poisson distribution, hypergeometric distribution, Student's t-distribution, chi-square distribution, F-distribution.
Hypothesis testing: null vs alternate hypothesis, types of errors, contingency tables, single-parameter and two-parameter testing, single-sided and two-sided testing, p-values, tests and test statistics, problems on hypothesis testing.
Diagnostic tests: goodness of fit, t-test, F-test and chi-square test; contingency tables, degrees of freedom, analysis of variance, problem solving.
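As a small worked example of a test statistic, the one-sample t statistic t = (x̄ − μ₀) / (s / √n) can be computed directly; the sample and the hypothesised mean of 50 are toy values:

```python
from statistics import mean, stdev
from math import sqrt

# One-sample t-test statistic: t = (sample mean - mu0) / (s / sqrt(n)).
sample = [52, 48, 55, 51, 53, 49, 54, 50]   # toy measurements
mu0 = 50                                    # H0: population mean is 50
n = len(sample)

t = (mean(sample) - mu0) / (stdev(sample) / sqrt(n))
print(round(t, 2))  # 1.73; compare against the critical t with n-1 = 7 d.f.
```

Whether to reject H0 then depends on comparing this statistic to the t-distribution with n − 1 degrees of freedom, which is exactly the kind of exercise covered in this module.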
Regression and allied concepts, data transformation; linear and matrix algebra concepts; intuition about limits, derivatives and integration, maxima and minima.
Introduction: concepts in machine learning, types of learning methods, geometry and visualisation of algebraic concepts (contour plots), graphical representation and intuition on machine learning
Linear Regression I: simple one variable regression line, coefficients of the line, assumptions of linear regression, Gradient descent algorithm, cost function, local and global minima, learning rate
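Gradient descent for one-variable linear regression can be sketched in a few lines; the data (generated from y = 2x + 1), the learning rate and the iteration count below are illustrative choices:

```python
# Fit y = w*x + b by gradient descent on the mean squared error cost.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # exactly y = 2x + 1

w, b, lr = 0.0, 0.0, 0.05   # start at zero; lr is the learning rate
for _ in range(2000):
    # Gradients of the cost with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w        # step downhill in each coefficient
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # converges towards w = 2.0, b = 1.0
```

Too large a learning rate makes the updates diverge instead of converge, which is the intuition behind the learning-rate discussion in this module.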
Linear Regression II: matrix representation, Gradient descent for multiple features, feature scaling techniques, polynomial regression, finding coefficients analytically, normal equation (matrix) non-invertibility
Classification problems: graphical representation, logistic regression model, matrix representation, the general sigmoid function and its graphical representation, decision boundary (linear and non-linear), metrics for logistic regression, Receiver-Operating Characteristic (ROC) curve, optimum decision boundary, convexity and non-convexity of data
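The sigmoid function and its 0.5 decision boundary can be seen numerically in a short sketch:

```python
from math import exp

# The sigmoid maps any real score to (0, 1). A logistic regression
# model classifies as positive when sigmoid(score) >= 0.5, which
# happens exactly when the linear score itself is >= 0.
def sigmoid(z):
    return 1 / (1 + exp(-z))

print(round(sigmoid(0), 2))    # 0.5: the decision boundary
print(round(sigmoid(4), 2))    # 0.98: confidently positive
print(round(sigmoid(-4), 2))   # 0.02: confidently negative
```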
Non-linear decision boundary: optimization objective from logistic regression to support vector machines, large margin classifier, kernels, using SVM
Decision trees: concept, impurity or entropy measures, splitting criteria, use of Gini index, Random forests: random forest as a voting committee of decision trees, parameters, using random forests
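The Gini index used as a splitting criterion is simple to compute; here is a sketch with toy label lists:

```python
from collections import Counter

# Gini impurity of a node: 1 - sum of squared class proportions.
# A pure node scores 0; a 50/50 binary node scores the maximum 0.5.
def gini(labels):
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["a", "a", "a", "a"]))            # 0.0 (pure node)
print(gini(["a", "a", "b", "b"]))            # 0.5 (maximally mixed)
print(round(gini(["a", "a", "a", "b"]), 3))  # 0.375
```

A decision tree chooses the split that most reduces this impurity, weighted by the size of each child node.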
Evaluation of learning algorithms: test/validation/train concepts, model selection, diagnosing bias and variance, regularization, learning curves, error analysis and the trade-off between precision and recall, cross-validation concepts; unsupervised learning methods: clustering, K-nearest neighbours and the K-means algorithm, optimization objective, random initialization
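The K-means loop (random initialization, assignment to the nearest centroid, centroid update) can be sketched in one dimension; `kmeans` here is an illustrative helper on invented data, not a library function:

```python
import random

# Minimal 1-D K-means: assign each point to its nearest centroid,
# then recompute each centroid as the mean of its cluster; repeat.
def kmeans(points, k, iters=20, seed=0):
    random.seed(seed)
    centroids = random.sample(points, k)        # random initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious clusters, around 1 and around 10.
print(kmeans([0.9, 1.0, 1.1, 9.9, 10.0, 10.1], k=2))  # near [1.0, 10.0]
```

Because the initialization is random, K-means can converge to different local minima of its optimization objective, which is why random restarts are discussed alongside it.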
Introduction to the tool and platform, Use in exploratory data analysis
Creating calculated fields, Labels, colours and formatting
Types of joins, Filters, aggregations and time-period based calculations
2017-01-14 | 06:30 PM - 09:30 PM | Pune | 47/2, Sankla Arcade, First Floor (opposite BSNL telephone exchange), Nal Stop, Karve Road, Pune, Maharashtra 411004
Rs.15,000/- (regular fee Rs.35,000/-)
The coach for this program, D. Lokhande, is a practicing data scientist. An IIM alumnus, he has extensive experience in applying various Machine Learning techniques to real-world problems. He has mastered the ability to melt business logic and data science techniques in the furnace of problem solving to forge groundbreaking solutions. He has worked on many mission-critical data science projects, which makes him the right person to coach aspiring data scientists.
Data Scientist has been tagged as the sexiest job of the 21st century by Harvard Business Review. From the year 2000 to 2010, there was a boom in the internet-based market. With the internet becoming more and more accessible to the masses through various gadgets, more and more data was generated. Now is the time to analyse this data and introduce efficiencies into existing business processes. This era is thus experiencing a boom in the job market for people who can handle and understand data and bring out interesting insights from it to improve the business. To take advantage of this cycle of the technology boom, one must undergo a good course on Data Science.
Data Science, like other groundbreaking technologies, is a philosophy. It has a deep logical and mathematical basis and background. To effectively use machine learning and related techniques, one must know the underlying theory. Different platforms like R, Python and MATLAB deploy the same theory and offer similar functions for implementing data-science-based solutions and analyses. Thus, if one knows the underlying theory and fundamentals, moving from one platform to another requires only learning the changes in syntax.
One must be prepared to learn a lot of things very quickly. Data Science as a sector is moving at a fast rate and one needs to keep on learning new things to stay up to date. One must have an open mind towards mathematics and statistics and be ready to put in the required effort to understand the crucial concepts to reap benefits from the growth in this sector.
Data Science concepts have been deployed on many platforms. Based on the complexity of the task at hand, one may use tools ranging from MS Excel and Tableau to R, Python, C++ and Java. In practice, some tasks are better executed in MS Excel than in R, and some are better executed in Python than in, say, MS Excel. Sound basic knowledge of data manipulation, transformation and analysis is what is required to use the many available data science platforms in an integrated fashion. Generally speaking, knowing R programming or Python is enough to get one started.
Participants of the LIPS Data Science program can expect to be equipped with the theory of algorithms commonly used in the industry, along with data manipulation, logic, statistics and mathematics, and to gain appropriate hands-on experience in handling the day-to-day work of a typical data scientist.
The job roles one might get after any course depend on many things, including the participant's background, previous experience, the effort put into learning what is taught in the course, and communication skills. To generalise, an average-performing participant of this course may expect roles matching, but not limited to, the profiles of Junior Data Scientist, Data Scientist, Business Analyst, Business Intelligence Officer, Senior Business Analyst, Data Science Programmer and Data Science Developer.
Succeeding in this program and securing the desired job role requires gaining substantial knowledge and appropriate hands-on experience. The program is expected to be of average to high rigour for its participants.
The course duration is three months.