About the Book
Through concrete examples, a minimum of theory, and two production-ready Python frameworks, Scikit-Learn and TensorFlow, author Aurélien Géron helps you gain the concepts and tools you need to build intelligent systems. You will learn a range of techniques, starting with simple linear regression and progressing to deep neural networks. Exercises in each chapter help you apply what you have learned; all you need is some programming experience to get started.
Explore the machine learning landscape, particularly neural networks
Use Scikit-Learn to track an example machine learning project end to end
Explore several training models, including support vector machines, decision trees, random forests, and ensemble methods
Use the TensorFlow library to build and train neural networks
Dive into neural network architectures, including convolutional networks, recurrent networks, and deep reinforcement learning
Learn techniques for training and scaling deep neural networks
Apply practical code examples without needing excessive machine learning theory or algorithm details
About the Author
Aurélien Géron is a machine learning consultant. A former Googler, he led YouTube's video classification team from 2013 to 2016. From 2002 to 2012 he was founder and CTO of Wifirst, a leading wireless ISP in France, and in 2001 he was founder and CTO of Polyconseil, the firm that now manages the electric car-sharing service Autolib'.
Praise for This Book
"This book is a great introduction to the theory and practice of solving problems with neural networks. It covers the key points involved in building efficient applications, as well as the background knowledge needed to understand new techniques. I recommend it to anyone interested in learning practical machine learning."
—— Pete Warden
Mobile Lead for TensorFlow
Table of Contents
Preface
Part I. The Fundamentals of Machine Learning
1. The Machine Learning Landscape
What Is Machine Learning?
Why Use Machine Learning?
Types of Machine Learning Systems
Supervised/Unsupervised Learning
Batch and Online Learning
Instance-Based Versus Model-Based Learning
Main Challenges of Machine Learning
Insufficient Quantity of Training Data
Nonrepresentative Training Data
Poor-Quality Data
Irrelevant Features
Overfitting the Training Data
Underfitting the Training Data
Stepping Back
Testing and Validating
Exercises
2. End-to-End Machine Learning Project
Working with Real Data
Look at the Big Picture
Frame the Problem
Select a Performance Measure
Check the Assumptions
Get the Data
Create the Workspace
Download the Data
Take a Quick Look at the Data Structure
Create a Test Set
Discover and Visualize the Data to Gain Insights
Visualizing Geographical Data
Looking for Correlations
Experimenting with Attribute Combinations
Prepare the Data for Machine Learning Algorithms
Data Cleaning
Handling Text and Categorical Attributes
Custom Transformers
Feature Scaling
Transformation Pipelines
Select and Train a Model
Training and Evaluating on the Training Set
Better Evaluation Using Cross-Validation
Fine-Tune Your Model
Grid Search
Randomized Search
Ensemble Methods
Analyze the Best Models and Their Errors
Evaluate Your System on the Test Set
Launch, Monitor, and Maintain Your System
Try It Out!
Exercises
3. Classification
MNIST
Training a Binary Classifier
Performance Measures
Measuring Accuracy Using Cross-Validation
Confusion Matrix
Precision and Recall
Precision/Recall Tradeoff
The ROC Curve
Multiclass Classification
Error Analysis
Multilabel Classification
Multioutput Classification
……
Part II. Neural Networks and Deep Learning
A. Exercise Solutions
B. Machine Learning Project Checklist
C. SVM Dual Problem
D. Autodiff
E. Other Popular ANN Architectures
Index
Excerpt
From 《Scikit-Learn与TensorFlow机器学习实用指南(影印版)》 (Hands-On Machine Learning with Scikit-Learn and TensorFlow, English reprint):
3. It is quite possible to speed up training of a bagging ensemble by distributing it across multiple servers, since each predictor in the ensemble is independent of the others. The same goes for pasting ensembles and Random Forests, for the same reason. However, each predictor in a boosting ensemble is built based on the previous predictor, so training is necessarily sequential, and you will not gain anything by distributing training across multiple servers. Regarding stacking ensembles, all the predictors in a given layer are independent of each other, so they can be trained in parallel on multiple servers. However, the predictors in one layer can only be trained after the predictors in the previous layer have all been trained.
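To illustrate the parallelism point, here is a minimal sketch using Scikit-Learn's BaggingClassifier on a toy make_moons dataset (the dataset choice and hyperparameter values are illustrative, not taken from the excerpt). Setting n_jobs trains the independent predictors in parallel across CPU cores; true multi-server distribution would need extra infrastructure that this sketch does not show.

```python
# Minimal sketch: bagging predictors are independent, so Scikit-Learn
# can fit them in parallel simply by setting n_jobs.
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative toy dataset, not from the excerpt.
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

# n_jobs=-1 uses all available CPU cores; each tree is trained independently.
bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500,
                            n_jobs=-1, random_state=42)
bag_clf.fit(X, y)
```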
4. With out-of-bag evaluation, each predictor in a bagging ensemble is evaluated using instances that it was not trained on (they were held out). This makes it possible to have a fairly unbiased evaluation of the ensemble without the need for an additional validation set. Thus, you have more instances available for training, and your ensemble can perform slightly better.
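A minimal sketch of out-of-bag evaluation in Scikit-Learn, again on an illustrative make_moons toy dataset: with oob_score=True, the fitted ensemble exposes the held-out accuracy as oob_score_.

```python
# Minimal sketch: each predictor is scored on the training instances
# it never saw, giving a validation-set-free generalization estimate.
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

# bootstrap=True (sampling with replacement) is what leaves instances "out of bag".
bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500,
                            bootstrap=True, oob_score=True, random_state=42)
bag_clf.fit(X, y)
print(bag_clf.oob_score_)  # accuracy estimate without a separate validation set
```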
5. When you are growing a tree in a Random Forest, only a random subset of the features is considered for splitting at each node. This is true as well for Extra-Trees, but they go one step further: rather than searching for the best possible thresholds, like regular Decision Trees do, they use random thresholds for each feature. This extra randomness acts like a form of regularization: if a Random Forest overfits the training data, Extra-Trees might perform better. Moreover, since Extra-Trees don't search for the best possible thresholds, they are much faster to train than Random Forests. However, they are neither faster nor slower than Random Forests when making predictions.
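The difference is easy to see in code: the two classifiers share the same Scikit-Learn API and differ only in their split strategy. A minimal comparison sketch (dataset and settings are illustrative):

```python
# Minimal sketch: identical API, different split strategy (Extra-Trees use
# random thresholds instead of searching for the best one per feature).
from sklearn.datasets import make_moons
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

rnd_clf = RandomForestClassifier(n_estimators=500, random_state=42)
ext_clf = ExtraTreesClassifier(n_estimators=500, random_state=42)

for clf in (rnd_clf, ext_clf):
    scores = cross_val_score(clf, X, y, cv=5)  # 5-fold cross-validation accuracy
    print(clf.__class__.__name__, scores.mean())
```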
6. If your AdaBoost ensemble underfits the training data, you can try increasing the number of estimators or reducing the regularization hyperparameters of the base estimator. You may also try slightly increasing the learning rate.
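A minimal sketch of those three knobs in Scikit-Learn's AdaBoostClassifier; all values here are illustrative, not recommendations from the excerpt.

```python
# Minimal sketch of the anti-underfitting knobs named above (illustrative values):
# more estimators, a less regularized base estimator, a slightly higher learning rate.
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=2),  # deeper than the default depth-1 stump
    n_estimators=500,                     # far more than the default 50
    learning_rate=1.2,                    # slightly above the default 1.0
    random_state=42)
ada_clf.fit(X, y)
```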
……