Algorithms for data science

This textbook on practical data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. But practical data analytics requires more than just the foundations. Problems and data are enormously variable, and only the most elementary of algorithms can be used without modification. Programming fluency and experience with real and challenging data are indispensable, and so the reader is immersed in Python and R and real data analysis. By the end of the book, the reader will have gained the ability to adapt algorithms to new problems and carry out innovative analyses. This book has three parts:
(a) Data Reduction: Begins with the concepts of data reduction, data maps, and information extraction. The second chapter introduces associative statistics, the mathematical foundation of scalable algorithms and distributed computing. Practical aspects of distributed computing are the subject of the Hadoop and MapReduce chapter.
(b) Extracting Information from Data: Linear regression and data visualization are the principal topics of Part II. The authors dedicate a chapter to the critical domain of Healthcare Analytics for an extended example of practical data analytics. The algorithms and analytics will be of much interest to practitioners who want to use the large and unwieldy data sets of the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System.
(c) Predictive Analytics: Two foundational and widely used algorithms, k-nearest neighbors and naive Bayes, are developed in detail. A chapter is dedicated to forecasting. The last chapter focuses on streaming data and uses publicly accessible data streams originating from the Twitter API and the NASDAQ stock market in the tutorials.

Bibliographic Details
Main Authors: Steele, Brian (author); Chandler, John (author); Reddy, Swarna (author)
Format: Text, library
Language: eng
Published: New York, New York, United States: Springer Science+Business Media, 2016
Subjects: Algorithms, Mathematical models, Medical care, Cluster analysis
Online Access: https://link.springer.com/book/10.1007/978-3-319-45797-0
id KOHA-OAI-ECOSUR:42045
record_format koha
institution ECOSUR
collection Koha
country Mexico
countrycode MX
component Bibliographic
access Online
databasecode cat-ecosur
tag library
region North America
libraryname Sistema de Información Bibliotecario de ECOSUR (SIBE)
language eng
topic Algorithms
Mathematical models
Medical care
Cluster analysis
spellingShingle Algorithms
Mathematical models
Medical care
Cluster analysis
Steele, Brian (author)
Chandler, John (author)
Reddy, Swarna (author)
Algorithms for data science
format Text
topic_facet Algorithms
Mathematical models
Medical care
Cluster analysis
author Steele, Brian (author)
Chandler, John (author)
Reddy, Swarna (author)
author_facet Steele, Brian (author)
Chandler, John (author)
Reddy, Swarna (author)
author_sort Steele, Brian (author)
title Algorithms for data science
title_short Algorithms for data science
title_full Algorithms for data science
title_fullStr Algorithms for data science
title_full_unstemmed Algorithms for data science
title_sort algorithms for data science
publisher New York, New York, United States Springer Science+Business Media
publishDate 2016
url https://link.springer.com/book/10.1007/978-3-319-45797-0
work_keys_str_mv AT steelebrianautora algorithmsfordatascience
AT chandlerjohnautora algorithmsfordatascience
AT reddyswarnaautora algorithmsfordatascience
_version_ 1762930549249277952
spelling KOHA-OAI-ECOSUR:42045
2023-03-18T12:26:24Z
Algorithms for data science
Steele, Brian (author)
Chandler, John (author)
Reddy, Swarna (author)
text
New York, New York, United States: Springer Science+Business Media
2016
eng
This book is intended for a one- or two-semester course in data analytics for upper-division undergraduate and graduate students in mathematics, statistics, and computer science. The prerequisites are kept low, and students with one or two courses in probability or statistics, an exposure to vectors and matrices, and a programming course will have no difficulty. The core material of every chapter is accessible to all with these prerequisites. The chapters often expand at the close with innovations of interest to practitioners of data science. Each chapter includes exercises of varying levels of difficulty. The text is eminently suitable for self-study and an exceptional resource for practitioners.
Includes bibliography: pages 419-421 and index: pages 423-430
1 Introduction.. 1.1 What Is Data Science?.. 1.2 Diabetes in America.. 1.3 Authors of the Federalist Papers.. 1.4 Forecasting NASDAQ Stock Prices.. 1.5 Remarks.. 1.6 The Book.. 1.7 Algorithms.. 1.8 Python.. 1.9 R.. 1.10 Terminology and Notation.. 1.10.1 Matrices and Vectors.. 1.11 Book Website
Part I Data Reduction
2 Data Mapping and Data Dictionaries.. 2.1 Data Reduction.. 2.2 Political Contributions.. 2.3 Dictionaries.. 2.4 Tutorial: Big Contributors.. 2.5 Data Reduction.. 2.5.1 Notation and Terminology.. 2.5.2 The Political Contributions Example.. 2.5.3 Mappings.. 2.6 Tutorial: Election Cycle Contributions.. 2.7 Similarity Measures.. 2.7.1 Computation.. 2.8 Tutorial: Computing Similarity.. 2.9 Concluding Remarks About Dictionaries.. 2.10 Exercises.. 2.10.1 Conceptual.. 2.10.2 Computational
3 Scalable Algorithms and Associative Statistics.. 3.1 Introduction.. 3.2 Example: Obesity in the United States.. 3.3 Associative Statistics.. 3.4 Univariate Observations.. 3.4.1 Histograms.. 3.4.2 Histogram Construction.. 3.5 Functions.. 3.6 Tutorial: Histogram Construction.. 3.6.1 Synopsis.. 3.7 Multivariate Data.. 3.7.1 Notation and Terminology.. 3.7.2 Estimators.. 3.7.3 The Augmented Moment Matrix.. 3.7.4 Synopsis.. 3.8 Tutorial: Computing the Correlation Matrix.. 3.8.1 Conclusion.. 3.9 Introduction to Linear Regression.. 3.9.1 The Linear Regression Model.. 3.9.2 The Estimator of β.. 3.9.3 Accuracy Assessment.. 3.9.4 Computing R²adjusted.. 3.10 Tutorial: Computing β.. 3.10.1 Conclusion.. 3.11 Exercises.. 3.11.1 Conceptual.. 3.11.2 Computational
4 Hadoop and MapReduce.. 4.1 Introduction.. 4.2 The Hadoop Ecosystem.. 4.2.1 The Hadoop Distributed File System.. 4.2.2 MapReduce.. 4.2.3 Mapping.. 4.2.4 Reduction.. 4.3 Developing a Hadoop Application.. 4.4 Medicare Payments.. 4.5 The Command Line Environment.. 4.6 Tutorial: Programming a MapReduce Algorithm.. 4.6.1 The Mapper.. 4.6.2 The Reducer.. 4.6.3 Synopsis.. 4.7 Tutorial: Using Amazon Web Services.. 4.7.1 Closing Remarks.. 4.8 Exercises.. 4.8.1 Conceptual.. 4.8.2 Computational
Part II Extracting Information from Data
5 Data Visualization.. 5.1 Introduction.. 5.2 Principles of Data Visualization.. 5.3 Making Good Choices.. 5.3.1 Univariate Data.. 5.3.2 Bivariate and Multivariate Data.. 5.4 Harnessing the Machine.. 5.4.1 Building Fig. 5.2.. 5.4.2 Building Fig. 5.3.. 5.4.3 Building Fig. 5.4.. 5.4.4 Building Fig. 5.5.. 5.4.5 Building Fig. 5.8.. 5.4.6 Building Fig. 5.10.. 5.4.7 Building Fig. 5.11.. 5.5 Exercises
6 Linear Regression Methods.. 6.1 Introduction.. 6.2 The Linear Regression Model.. 6.2.1 Example: Depression, Fatalism, and Simplicity.. 6.2.2 Least Squares.. 6.2.3 Confidence Intervals.. 6.2.4 Distributional Conditions.. 6.2.5 Hypothesis Testing.. 6.2.6 Cautionary Remarks.. 6.3 Introduction to R.. 6.4 Tutorial: R.. 6.4.1 Remark.. 6.5 Tutorial: Large Data Sets and R.. 6.6 Factors.. 6.6.1 Interaction.. 6.6.2 The Extra Sums-of-Squares F-test.. 6.7 Tutorial: Bike Share.. 6.7.1 An Incongruous Result.. 6.8 Analysis of Residuals.. 6.8.1 Linearity.. 6.8.2 Example: The Bike Share Problem.. 6.8.3 Independence.. 6.9 Tutorial: Residual Analysis.. 6.9.1 Final Remarks.. 6.10 Exercises.. 6.10.1 Conceptual.. 6.10.2 Computational
7 Healthcare Analytics.. 7.1 Introduction.. 7.2 The Behavioral Risk Factor Surveillance System.. 7.2.1 Estimation of Prevalence.. 7.2.2 Estimation of Incidence.. 7.3 Tutorial: Diabetes Prevalence and Incidence.. 7.4 Predicting At-Risk Individuals.. 7.4.1 Sensitivity and Specificity.. 7.5 Tutorial: Identifying At-Risk Individuals.. 7.6 Unusual Demographic Attribute Vectors.. 7.7 Tutorial: Building Neighborhood Sets.. 7.7.1 Synopsis.. 7.8 Exercises.. 7.8.1 Conceptual.. 7.8.2 Computational
8 Cluster Analysis.. 8.1 Introduction.. 8.2 Hierarchical Agglomerative Clustering.. 8.3 Comparison of States.. 8.4 Tutorial: Hierarchical Clustering of States.. 8.4.1 Synopsis.. 8.5 The k-Means Algorithm.. 8.6 Tutorial: The k-Means Algorithm.. 8.6.1 Synopsis.. 8.7 Exercises.. 8.7.1 Conceptual.. 8.7.2 Computational
Part III Predictive Analytics
9 k-Nearest Neighbor Prediction Functions.. 9.1 Introduction.. 9.1.1 The Prediction Task.. 9.2 Notation and Terminology.. 9.3 Distance Metrics.. 9.4 The k-Nearest Neighbor Prediction Function.. 9.5 Exponentially Weighted k-Nearest Neighbors.. 9.6 Tutorial: Digit Recognition.. 9.6.1 Remarks.. 9.7 Accuracy Assessment.. 9.7.1 Confusion Matrices.. 9.8 k-Nearest Neighbor Regression.. 9.9 Forecasting the S&P 500.. 9.10 Tutorial: Forecasting by Pattern Recognition.. 9.10.1 Remark.. 9.11 Cross-Validation.. 9.12 Exercises.. 9.12.1 Conceptual.. 9.12.2 Computational
10 The Multinomial Naïve Bayes Prediction Function.. 10.1 Introduction.. 10.2 The Federalist Papers.. 10.3 The Multinomial Naïve Bayes Prediction Function.. 10.3.1 Posterior Probabilities.. 10.4 Tutorial: Reducing the Federalist Papers.. 10.4.1 Summary.. 10.5 Tutorial: Predicting Authorship of the Disputed Federalist Papers.. 10.5.1 Remark.. 10.6 Tutorial: Customer Segmentation.. 10.6.1 Additive Smoothing.. 10.6.2 The Data.. 10.6.3 Remarks.. 10.7 Exercises.. 10.7.1 Conceptual.. 10.7.2 Computational
11 Forecasting.. 11.1 Introduction.. 11.2 Tutorial: Working with Time.. 11.3 Analytical Methods.. 11.3.1 Notation.. 11.3.2 Estimation of the Mean and Variance.. 11.3.3 Exponential Forecasting.. 11.3.4 Autocorrelation.. 11.4 Tutorial: Computing ρτ.. 11.4.1 Remarks.. 11.5 Drift and Forecasting.. 11.6 Holt-Winters Exponential Forecasting.. 11.6.1 Forecasting Error.. 11.7 Tutorial: Holt-Winters Forecasting.. 11.8 Regression-Based Forecasting of Stock Prices.. 11.9 Tutorial: Regression-Based Forecasting.. 11.9.1 Remarks.. 11.10 Time-Varying Regression Estimators.. 11.11 Tutorial: Time-Varying Regression Estimators.. 11.11.1 Remarks.. 11.12 Exercises.. 11.12.1 Conceptual.. 11.12.2 Computational
12 Real-time Analytics.. 12.1 Introduction.. 12.2 Forecasting with a NASDAQ Quotation Stream.. 12.2.1 Forecasting Algorithms.. 12.3 Tutorial: Forecasting the Apple Inc. Stream.. 12.3.1 Remarks.. 12.4 The Twitter Streaming API.. 12.5 Tutorial: Tapping the Twitter Stream.. 12.5.1 Remarks.. 12.6 Sentiment Analysis.. 12.7 Tutorial: Sentiment Analysis of Hashtag Groups.. 12.8 Exercises
A Solutions to Exercises.. B Accessing the Twitter API.. References.. Index
Adobe Acrobat Professional 6.0 or later
Algorithms
Mathematical models
Medical care
Cluster analysis
Available online
https://link.springer.com/book/10.1007/978-3-319-45797-0
URN:ISBN:3319457950
URN:ISBN:9783319457956
Available to ECOSUR users with their access credentials