Algorithms for data science

Steele, Brian, author; Chandler, John, author; Reddy, Swarna, author

This textbook on practical data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. But practical data analytics requires more than the foundations. Problems and data are enormously variable, and only the most elementary algorithms can be used without modification. Programming fluency and experience with real, challenging data are indispensable, so the reader is immersed in Python, R, and real data analysis. By the end of the book, the reader will be able to adapt algorithms to new problems and carry out innovative analyses. The book has three parts:

(a) Data Reduction: Begins with the concepts of data reduction, data maps, and information extraction. The second chapter introduces associative statistics, the mathematical foundation of scalable algorithms and distributed computing. The practical aspects of distributed computing are the subject of the Hadoop and MapReduce chapter.

(b) Extracting Information from Data: Linear regression and data visualization are the principal topics of Part II. The authors dedicate a chapter to the critical domain of healthcare analytics as an extended example of practical data analytics. The algorithms and analytics will interest practitioners who want to use the large and unwieldy data sets of the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System.

(c) Predictive Analytics: Two foundational and widely used algorithms, k-nearest neighbors and naive Bayes, are developed in detail. A chapter is dedicated to forecasting. The last chapter focuses on streaming data, and its tutorials use publicly accessible data streams from the Twitter API and the NASDAQ stock market.

Bibliographic Details
Main Authors:	Steele, Brian, author; Chandler, John, author; Reddy, Swarna, author
Format:	Text (library)
Language:	English
Published:	New York, NY, United States: Springer Science+Business Media, 2016
Subjects:	Algorithms; Mathematical models; Medical care; Cluster analysis
Online Access:	https://link.springer.com/book/10.1007/978-3-319-45797-0

id	KOHA-OAI-ECOSUR:42045
institution	ECOSUR
country	Mexico
access	Online
region	North America
libraryname	Sistema de Información Bibliotecario de ECOSUR (SIBE)

Record datestamp:	2023-03-18T12:26:24Z
Audience:	This book is intended for a one- or two-semester course in data analytics for upper-division undergraduate and graduate students in mathematics, statistics, and computer science. The prerequisites are kept low: students with one or two courses in probability or statistics, some exposure to vectors and matrices, and a programming course will have no difficulty. The core material of every chapter is accessible to anyone with these prerequisites. Chapters often close with innovations of interest to practitioners of data science. Each chapter includes exercises of varying difficulty. The text is well suited to self-study and is an excellent resource for practitioners.
Notes:	Includes bibliography (pages 419-421) and index (pages 423-430).
Contents:
1 Introduction
  1.1 What Is Data Science?
  1.2 Diabetes in America
  1.3 Authors of the Federalist Papers
  1.4 Forecasting NASDAQ Stock Prices
  1.5 Remarks
  1.6 The Book
  1.7 Algorithms
  1.8 Python
  1.9 R
  1.10 Terminology and Notation
    1.10.1 Matrices and Vectors
  1.11 Book Website
Part I Data Reduction
2 Data Mapping and Data Dictionaries
  2.1 Data Reduction
  2.2 Political Contributions
  2.3 Dictionaries
  2.4 Tutorial: Big Contributors
  2.5 Data Reduction
    2.5.1 Notation and Terminology
    2.5.2 The Political Contributions Example
    2.5.3 Mappings
  2.6 Tutorial: Election Cycle Contributions
  2.7 Similarity Measures
    2.7.1 Computation
  2.8 Tutorial: Computing Similarity
  2.9 Concluding Remarks About Dictionaries
  2.10 Exercises
    2.10.1 Conceptual
    2.10.2 Computational
3 Scalable Algorithms and Associative Statistics
  3.1 Introduction
  3.2 Example: Obesity in the United States
  3.3 Associative Statistics
  3.4 Univariate Observations
    3.4.1 Histograms
    3.4.2 Histogram Construction
  3.5 Functions
  3.6 Tutorial: Histogram Construction
    3.6.1 Synopsis
  3.7 Multivariate Data
    3.7.1 Notation and Terminology
    3.7.2 Estimators
    3.7.3 The Augmented Moment Matrix
    3.7.4 Synopsis
  3.8 Tutorial: Computing the Correlation Matrix
    3.8.1 Conclusion
  3.9 Introduction to Linear Regression
    3.9.1 The Linear Regression Model
    3.9.2 The Estimator of β
    3.9.3 Accuracy Assessment
    3.9.4 Computing R²_adjusted
  3.10 Tutorial: Computing β
    3.10.1 Conclusion
  3.11 Exercises
    3.11.1 Conceptual
    3.11.2 Computational
4 Hadoop and MapReduce
  4.1 Introduction
  4.2 The Hadoop Ecosystem
    4.2.1 The Hadoop Distributed File System
    4.2.2 MapReduce
    4.2.3 Mapping
    4.2.4 Reduction
  4.3 Developing a Hadoop Application
  4.4 Medicare Payments
  4.5 The Command Line Environment
  4.6 Tutorial: Programming a MapReduce Algorithm
    4.6.1 The Mapper
    4.6.2 The Reducer
    4.6.3 Synopsis
  4.7 Tutorial: Using Amazon Web Services
    4.7.1 Closing Remarks
  4.8 Exercises
    4.8.1 Conceptual
    4.8.2 Computational
Part II Extracting Information from Data
5 Data Visualization
  5.1 Introduction
  5.2 Principles of Data Visualization
  5.3 Making Good Choices
    5.3.1 Univariate Data
    5.3.2 Bivariate and Multivariate Data
  5.4 Harnessing the Machine
    5.4.1 Building Fig. 5.2
    5.4.2 Building Fig. 5.3
    5.4.3 Building Fig. 5.4
    5.4.4 Building Fig. 5.5
    5.4.5 Building Fig. 5.8
    5.4.6 Building Fig. 5.10
    5.4.7 Building Fig. 5.11
  5.5 Exercises
6 Linear Regression Methods
  6.1 Introduction
  6.2 The Linear Regression Model
    6.2.1 Example: Depression, Fatalism, and Simplicity
    6.2.2 Least Squares
    6.2.3 Confidence Intervals
    6.2.4 Distributional Conditions
    6.2.5 Hypothesis Testing
    6.2.6 Cautionary Remarks
  6.3 Introduction to R
  6.4 Tutorial: R
    6.4.1 Remark
  6.5 Tutorial: Large Data Sets and R
  6.6 Factors
    6.6.1 Interaction
    6.6.2 The Extra Sums-of-Squares F-test
  6.7 Tutorial: Bike Share
    6.7.1 An Incongruous Result
  6.8 Analysis of Residuals
    6.8.1 Linearity
    6.8.2 Example: The Bike Share Problem
    6.8.3 Independence
  6.9 Tutorial: Residual Analysis
    6.9.1 Final Remarks
  6.10 Exercises
    6.10.1 Conceptual
    6.10.2 Computational
7 Healthcare Analytics
  7.1 Introduction
  7.2 The Behavioral Risk Factor Surveillance System
    7.2.1 Estimation of Prevalence
    7.2.2 Estimation of Incidence
  7.3 Tutorial: Diabetes Prevalence and Incidence
  7.4 Predicting At-Risk Individuals
    7.4.1 Sensitivity and Specificity
  7.5 Tutorial: Identifying At-Risk Individuals
  7.6 Unusual Demographic Attribute Vectors
  7.7 Tutorial: Building Neighborhood Sets
    7.7.1 Synopsis
  7.8 Exercises
    7.8.1 Conceptual
    7.8.2 Computational
8 Cluster Analysis
  8.1 Introduction
  8.2 Hierarchical Agglomerative Clustering
  8.3 Comparison of States
  8.4 Tutorial: Hierarchical Clustering of States
    8.4.1 Synopsis
  8.5 The k-Means Algorithm
  8.6 Tutorial: The k-Means Algorithm
    8.6.1 Synopsis
  8.7 Exercises
    8.7.1 Conceptual
    8.7.2 Computational
Part III Predictive Analytics
9 k-Nearest Neighbor Prediction Functions
  9.1 Introduction
    9.1.1 The Prediction Task
  9.2 Notation and Terminology
  9.3 Distance Metrics
  9.4 The k-Nearest Neighbor Prediction Function
  9.5 Exponentially Weighted k-Nearest Neighbors
  9.6 Tutorial: Digit Recognition
    9.6.1 Remarks
  9.7 Accuracy Assessment
    9.7.1 Confusion Matrices
  9.8 k-Nearest Neighbor Regression
  9.9 Forecasting the S&P 500
  9.10 Tutorial: Forecasting by Pattern Recognition
    9.10.1 Remark
  9.11 Cross-Validation
  9.12 Exercises
    9.12.1 Conceptual
    9.12.2 Computational
10 The Multinomial Naïve Bayes Prediction Function
  10.1 Introduction
  10.2 The Federalist Papers
  10.3 The Multinomial Naïve Bayes Prediction Function
    10.3.1 Posterior Probabilities
  10.4 Tutorial: Reducing the Federalist Papers
    10.4.1 Summary
  10.5 Tutorial: Predicting Authorship of the Disputed Federalist Papers
    10.5.1 Remark
  10.6 Tutorial: Customer Segmentation
    10.6.1 Additive Smoothing
    10.6.2 The Data
    10.6.3 Remarks
  10.7 Exercises
    10.7.1 Conceptual
    10.7.2 Computational
11 Forecasting
  11.1 Introduction
  11.2 Tutorial: Working with Time
  11.3 Analytical Methods
    11.3.1 Notation
    11.3.2 Estimation of the Mean and Variance
    11.3.3 Exponential Forecasting
    11.3.4 Autocorrelation
  11.4 Tutorial: Computing ρ_τ
    11.4.1 Remarks
  11.5 Drift and Forecasting
  11.6 Holt-Winters Exponential Forecasting
    11.6.1 Forecasting Error
  11.7 Tutorial: Holt-Winters Forecasting
  11.8 Regression-Based Forecasting of Stock Prices
  11.9 Tutorial: Regression-Based Forecasting
    11.9.1 Remarks
  11.10 Time-Varying Regression Estimators
  11.11 Tutorial: Time-Varying Regression Estimators
    11.11.1 Remarks
  11.12 Exercises
    11.12.1 Conceptual
    11.12.2 Computational
12 Real-time Analytics
  12.1 Introduction
  12.2 Forecasting with a NASDAQ Quotation Stream
    12.2.1 Forecasting Algorithms
  12.3 Tutorial: Forecasting the Apple Inc. Stream
    12.3.1 Remarks
  12.4 The Twitter Streaming API
  12.5 Tutorial: Tapping the Twitter Stream
    12.5.1 Remarks
  12.6 Sentiment Analysis
  12.7 Tutorial: Sentiment Analysis of Hashtag Groups
  12.8 Exercises
A Solutions to Exercises
B Accessing the Twitter API
References
Index
System requirements:	Adobe Acrobat Professional 6.0 or later
Availability:	Available online to ECOSUR users with their access credentials.
ISBN:	3319457950; 9783319457956
