To succeed as a Data Scientist, one must have the right skills and qualities and build relevant expertise. The first step towards becoming a Data Scientist is therefore to understand what a Data Scientist does every day. A Data Scientist spends 40% of the time on data-related work: understanding, transforming and visualizing the data, doing exploratory analysis, examining null values, imputing values through suitable rules and logic, and understanding the problem and the business case. Another 40% of the time is spent going through the many available algorithms, reviewing the logical and mathematical basis of the relevant ones, choosing the appropriate algorithm for the problem at hand, and adopting, diagnosing and improving the selected algorithm and model to reach the best possible solution. The remaining 20% of the time is spent on coding related to modelling.
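As a small illustration of the data-related work described above, the sketch below measures how much of a column is missing and fills the gaps using a simple rule. It is a minimal example in plain Python, not part of the course material; the function name `impute_missing`, the sample `ages` column and the choice of the median as the fill rule are all assumptions made for illustration.

```python
from statistics import median


def impute_missing(values, fill=None):
    """Replace None entries with a fill value.

    If no fill value is given, the median of the observed
    (non-missing) values is used. This rule is an assumption
    chosen for illustration; other rules may suit other data.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if fill is None:
        fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries.
ages = [34, None, 29, 41, None, 38]

# Share of missing values, a typical first check during exploration.
missing_share = sum(v is None for v in ages) / len(ages)

clean_ages = impute_missing(ages)
print(f"missing share: {missing_share:.0%}")
print(clean_ages)
```

Whether the median, the mean or a business-specific rule is the right fill value depends on the data and the problem being solved.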
Introduction to RStudio, mathematical and logical operators in R, data types and data structures, simple operations and programs, matrix operations
Data frames, string operations, factors, handling categorical data, lists and list operations
Loops and conditional statements, switch and break
Apply functions: apply, sapply, lapply, tapply, mapply
Statistical problem solving in R
Visualizations in R
Hands-on data manipulations: cleaning, subsetting, sampling, data transformations and allied data operations
Hands-on: Modelling with linear regression (continuous dependent variable (CDV)), logistic regression (discrete dependent variable (DDV)), SVM (DDV and CDV), decision trees (DDV and CDV), random forests (DDV and CDV), Naïve Bayes and clustering
Evaluation of and improvement in learning algorithms: Evaluation of learning algorithms, test/validation/train concepts, model selection, diagnosing bias and variance, regularization and bias/variance, learning curves, error analysis and trade-off between precision and recall, cross-validation concepts
Concept of statistics, population, sample, parameter and statistic, examples of use of statistic, data sources, representation of data, types of statistical analyses, sampling methods, types of variables, measures of central tendency, statistical estimation: point and interval, covariance, coefficient of correlation, formulae
Permutations and combinations, probability concepts, types of probabilities, collectively exhaustive event set, joint probability, Bayes' Theorem, probability distribution for a discrete random variable, probabilistic view on variance, covariance
Distributions: Bernoulli trial, binomial distribution, Poisson distribution, hypergeometric distribution, Student’s t-distribution, chi-square distribution, F-distribution, normal distribution, explanation of derivation of population parameter through samples and central limit theorem, Z-score
Hypothesis and testing, single-parameter and two-parameter testing, single-sided and two-sided testing, p-value, tests and test statistic and the logic behind them, problems on hypothesis testing, diagnostic tests: goodness of fit, t-test, F-test and chi-square test, contingency table, degrees of freedom, analysis of variance
Regression and allied concepts, data transformation, Linear and Matrix algebra concepts
Supervised, Unsupervised and Reinforcement Learning, geometry (lines, curves and 3D spaces) and visualisation of algebraic concepts
Regression as a concept, simple one variable regression line, coefficients of the line, assumptions of linear regression, Gradient descent algorithm, cost function to find 'beta' values and concept, local and global minima, concept of learning rate
Matrix representation of problem, gradient descent for multiple features, use of feature scaling techniques in gradient descent, types of feature scaling, finding coefficients analytically, normal equation and (matrix) non-invertibility
Logistic regression model, matrix representation, general sigmoid function and graphical representation, decision boundary (linear and non-linear), metrics for logistic regression (accuracy, sensitivity, specificity and related concepts), receiver operating characteristic (ROC) curve, use of ROC curve to find the optimum decision boundary, convexity and non-convexity of a group of points
Optimization objective from logistic regression to support vector machines, large margin classifier, concepts behind large margin classification, kernels (concept, types and graphical explanations), using SVM
Decision trees and random forests: Concept, diagrammatic representation, random forest as a voting committee of decision trees, parameter meaning and explanation.
Naive Bayes: Venn diagrams, Naive Bayes algorithm, application and problems, Naive Bayes learning, Bayesian inference, Retail basket analysis; Concept of boosting and bagging
Unsupervised learning methods/Clustering: K-means algorithm, optimization objective, graphical representation, random initialization, choosing number of clusters
Association rule mining, K-nearest neighbours algorithm.
Text Processing: Term Document Matrix, TF-IDF, Word Cloud, Recommendation Systems.
Sentiment Analysis: Linear classifier, predicting sentiments, positive words, negative words, vocabulary building, scoring, training and evaluating classifier.
|Date||Time (IST)||City||Location||Price*(Tax not included)|
|2017-12-09||11:30 AM - 01:30 PM||Pune||214, 2nd floor, B building, G-O SQUARE, Mankar Chowk, Wakad, Pune, Maharashtra 411057||FREE Introductory Session|
|2017-12-17||11:30 AM - 01:30 PM||Pune||214, 2nd floor, B building, G-O SQUARE, Mankar Chowk, Wakad, Pune, Maharashtra 411057||FREE Introductory Session|
|2018-01-06||10:00 AM - 12:30 PM||Pune||214, 2nd floor, B building, G-O SQUARE, Mankar Chowk, Wakad, Pune, Maharashtra 411057||30000 - 35000 + Tax|
An alumnus of IITB and NITIE, he has worked with TCS, Citibank and Polaris Financial Technology. He has excellent domain knowledge in the banking, investment banking, retail and manufacturing space. He is a passionate trainer and mentor who loves maths and data. He is an industry consultant for Data Science projects and has worked extensively on the R and Python platforms.
The coach for this program, D. Lokhande, is a practicing data scientist. An IIM alumnus, he has extensive experience in applying various Machine Learning techniques to real-world problems. He has mastered the ability to melt business logic and data science techniques in the furnace of problem solving to forge groundbreaking solutions. He has worked on many mission-critical data science projects, which makes him the right person to coach aspiring data scientists.
Data Scientist has been tagged as the sexiest job of the 21st century by Harvard Business Review. From 2000 to 2010, there was a boom in the internet-based market. With the internet becoming more and more accessible to the masses through various gadgets, more and more data is being generated. Now is the time to analyse this data and introduce efficiencies in existing business processes. This era is thus experiencing a boom in the job market for people who can handle and understand data and draw interesting insights from it to improve the business. To take advantage of this cycle of the technology boom, one must undergo a good course on Data Science.
Data Science, like other groundbreaking technologies, is a philosophy. It has a substantial logical and mathematical basis. To use machine learning and related techniques effectively, one must know the underlying theory. Different platforms such as R, Python and MATLAB implement the same theory and provide similar functions for building data science solutions and analyses. Thus, if one knows the underlying theory and fundamentals, moving from one platform to another mainly requires learning the syntax changes.
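To make the point about shared theory concrete, here is a minimal sketch of one-variable linear regression written directly from the textbook least-squares formulas (slope = covariance of x and y divided by variance of x). The function name `fit_line` and the sample data are assumptions for illustration only. In R, the same line would typically be fitted with `lm(y ~ x)`; the theory behind the result is identical and only the syntax differs.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Uses the closed-form solution for a single feature:
    slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    intercept = mean_y - slope * mean_x
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

intercept, slope = fit_line(xs, ys)
print(f"intercept = {intercept}, slope = {slope}")
```

Someone who understands where these two formulas come from can read the output of a regression routine on any platform in the same way.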
One must be prepared to learn a lot of things very quickly. Data Science as a sector is moving at a fast rate and one needs to keep on learning new things to stay up to date. One must have an open mind towards mathematics and statistics and be ready to put in the required effort to understand the crucial concepts to reap benefits from the growth in this sector.
Data Science concepts have been deployed on many platforms. Depending on the complexity of the task at hand, one may use tools ranging from MS Excel and Tableau to R, Python, C++ and Java. In practice, some tasks are better executed in MS Excel than in R, and some are better executed in Python than in, say, MS Excel. Sound basic knowledge of data manipulation, transformation and analysis is what one needs in order to use the many available data science platforms in an integrated fashion. Generally speaking, knowing R programming or Python is enough to get started.
Participants of the LIPS Data Science program can expect to be equipped with the theory of the algorithms commonly used in the industry, along with data manipulation, logic, statistics and mathematics, and to gain appropriate hands-on experience in the day-to-day work of a typical data scientist.
The job role one might get after any course depends on many things, including the participant's background, previous experience, the effort put into learning what is taught in the course, and communication skills. To generalise, an average-performing participant of this course may expect roles matching profiles such as, but not limited to, Junior Data Scientist, Data Scientist, Business Analyst, Business Intelligence Officer, Senior Business Analyst, Data Science Programmer and Data Science Developer.
Succeeding in this program and securing the desired job role requires gaining a lot of knowledge and wisdom along with appropriate hands-on experience. The program is expected to be of moderate to high rigor for its participants.
The course duration is three months.