Calibration Assignment Submission (Azure Machine Learning): Difference between revisions

From Expertiza_Wiki
Latest revision as of 04:32, 9 September 2016

Azure Machine Learning (also known as Azure ML) is a fully cloud-based, end-to-end Microsoft service for big data processing. It covers creating, testing, operationalizing and managing predictive analytics models in the cloud. Azure ML is now part of Microsoft's big data and advanced analytics offering, the 'Cortana Analytics Suite'.<ref name="Introduction to machine learning on Microsoft Azure">"Introduction to machine learning on Microsoft Azure"</ref>

Introduction

Azure ML offers visual composition, a large palette of modules, an extensive library of starting templates, powerful machine learning algorithms, a large collection of built-in transformation tasks, and support for popular data science programming languages such as R and Python. Together these make building common predictive analytics models quick and easy. After developing a model, one can deploy it to Azure Cloud as a scalable and fault-tolerant web service using Azure's Machine Learning API. The web services created through the Machine Learning API are REST APIs, so the model can be accessed from almost anywhere, including websites and customer applications. As patterns in the data change over time, or when the user wants to add a new data source, the deployed model can be retrained and updated programmatically through the Azure ML API. Azure ML provides all of these facilities through an interactive visual workspace called Azure Machine Learning Studio.
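Because each published web service is an ordinary REST endpoint, any HTTP client can call it. The Python sketch below only builds such a call and does not send it. The URL pattern, the input name `input1`, the column names and the API key are hypothetical placeholders. The JSON layout (`Inputs`, `ColumnNames`, `Values`, `GlobalParameters`) is our assumption, based on classic Studio request-response services. The API help page of the actual service is the authority on the real values.

```python
import json
import urllib.request

# Hypothetical placeholders: the real URL and key come from the service's API help page.
SERVICE_URL = ("https://<region>.services.azureml.net/workspaces/<workspace-id>"
               "/services/<service-id>/execute?api-version=2.0&details=true")
API_KEY = "<api-key>"


def build_scoring_request(url, api_key, column_names, rows):
    """Build (but do not send) a POST request asking the web service to score rows.

    Assumed body layout: one named input ("input1") holding column names and row values.
    """
    body = {
        "Inputs": {
            "input1": {
                "ColumnNames": column_names,
                "Values": rows,
            }
        },
        "GlobalParameters": {},
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,  # the API key is sent as a bearer token
    }
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


request = build_scoring_request(SERVICE_URL, API_KEY, ["age", "income"], [["39", "52000"]])
# Sending it would be: json.load(urllib.request.urlopen(request))
```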

Azure Machine Learning workflow

The Azure ML workflow has three main components for building models and operationalizing machine learning solutions.

Data Collection & Management

Azure ML can draw data from blobs and tables (Azure Storage), relational databases (Azure SQL Database), Hadoop (Azure HDInsight) and massive data stores (Azure Data Lake).

Machine Learning Service

Machine learning models are created through the ML Studio web app and published as web services, which can run on a scheduled basis.

Embedded ML Model

Trained machine learning models can be embedded into other applications, such as apps, websites or business intelligence tools.

Build and deploy machine learning models in Azure Machine Learning

Azure Machine Learning has many modules to help you build machine learning models and deploy them to production. The following are the key steps to build and deploy a machine learning model in Azure:

  1. Import raw data
  2. Preprocess the data
  3. Do feature engineering and data labeling (for supervised learning such as classification)
  4. Train, score, and evaluate the model
  5. Model comparison and selection
  6. Save the trained model
  7. Create a predictive experiment
  8. Publish the web service in Azure Machine Learning
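To make the first six steps concrete, here is a small local analogue in Python. It is a sketch under our own assumptions and is not Azure ML code. The toy dataset, the hold-out split and the two candidate models (a majority-class baseline and a nearest-centroid classifier) are hypothetical stand-ins for the corresponding Studio modules. Steps 7 and 8 have no local equivalent because they take place inside Studio.

```python
import csv
import io
import json

# Hypothetical toy dataset standing in for step 1; empty fields are missing values.
RAW_CSV = """hours,visits,label
1.0,2,0
2.0,1,0
,3,0
1.5,2,0
6.0,8,1
7.5,9,1
8.0,,1
6.5,7,1
"""


def import_raw_data(text):
    # Step 1: read the raw CSV text into a list of dictionaries (all values are strings).
    return list(csv.DictReader(io.StringIO(text)))


def preprocess(rows):
    # Step 2: drop rows that contain missing values and convert the rest to numbers.
    return [{k: float(v) for k, v in row.items()}
            for row in rows if all(v != "" for v in row.values())]


def featurize(rows):
    # Step 3: split each row into a feature vector and a class label.
    features = [(row["hours"], row["visits"]) for row in rows]
    labels = [int(row["label"]) for row in rows]
    return features, labels


def train_majority(features, labels):
    # Candidate A: a baseline that always predicts the most common training label.
    return {"type": "majority", "label": max(set(labels), key=labels.count)}


def train_centroid(features, labels):
    # Candidate B: nearest centroid, storing the mean feature vector of each class.
    # Class keys are strings so the model survives a JSON round trip unchanged.
    centroids = {}
    for cls in sorted(set(labels)):
        members = [f for f, y in zip(features, labels) if y == cls]
        centroids[str(cls)] = [sum(col) / len(members) for col in zip(*members)]
    return {"type": "centroid", "centroids": centroids}


def predict(model, x):
    if model["type"] == "majority":
        return model["label"]

    def squared_distance(cls):
        return sum((a - b) ** 2 for a, b in zip(x, model["centroids"][cls]))

    return int(min(model["centroids"], key=squared_distance))


def accuracy(model, features, labels):
    # Step 4: score every example and report the fraction predicted correctly.
    hits = sum(predict(model, x) == y for x, y in zip(features, labels))
    return hits / len(labels)


rows = preprocess(import_raw_data(RAW_CSV))
features, labels = featurize(rows)
# Simple hold-out split: even positions train the models, odd positions test them.
train_x, train_y = features[::2], labels[::2]
test_x, test_y = features[1::2], labels[1::2]

# Step 5: compare the candidates on the test rows and keep the most accurate one.
candidates = [train_majority(train_x, train_y), train_centroid(train_x, train_y)]
scores = {m["type"]: accuracy(m, test_x, test_y) for m in candidates}
best = max(candidates, key=lambda m: scores[m["type"]])

# Step 6: save the trained model, here simply as a JSON string.
saved = json.dumps(best)
print(scores, best["type"])
```

Step 5 only needs another `train_*` function added to `candidates`. The selection keeps whichever model has the highest test accuracy.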

Azure Machine Learning Studio

Azure Machine Learning Studio is an interactive, visual workspace to build, test, and iterate on a predictive analytics model. Following are the components of the Machine Learning Studio:

Experiments

It provides a list of the experiments that the user has created, run, and saved as drafts, along with a set of sample experiments to help jump-start new projects.

Web Services

It provides a list of the experiments that the user has already published as web services. If no experiments have been published, this list is empty.

Datasets

It provides a list of sample datasets that ship with the product along with datasets uploaded by the user.

Trained Models

It provides a list of any trained models that the user has saved from experiments.

Settings

It includes a collection of settings for configuring resources and the user's account.

In Machine Learning Studio, the user constructs a predictive model by dragging datasets and analysis modules onto an interactive canvas. The user then connects them with lines that show how data and parameters flow through the workflow. The result is an experiment, which can be run in ML Studio. Each experiment is therefore a complete workflow containing all the components required to build, test, and evaluate a predictive analytics model. The model design can be refined iteratively by editing the experiment, saving a copy, and running it again.
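An experiment is, in effect, a directed acyclic graph of modules. Running it means executing each module only after the modules that feed it. The Python sketch below illustrates that idea with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The module names are hypothetical labels, and the sketch makes no claim about how ML Studio's own scheduler works.

```python
from graphlib import TopologicalSorter

# Hypothetical experiment: each module lists the modules whose output it consumes,
# playing the role of the connecting lines drawn on the canvas.
EXPERIMENT = {
    "dataset": [],
    "clean_missing_data": ["dataset"],
    "split_data": ["clean_missing_data"],
    "train_model": ["split_data"],
    "score_model": ["train_model", "split_data"],
    "evaluate_model": ["score_model"],
}


def run_order(experiment):
    """Return an order in which every module runs only after all of its inputs."""
    return list(TopologicalSorter(experiment).static_order())


for step, module in enumerate(run_order(EXPERIMENT), start=1):
    print(step, module)
```

If a connection created a loop, `static_order()` would raise `graphlib.CycleError`. This is why a valid data-flow workflow has to be acyclic.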

Cortana Analytics Gallery

The Cortana Analytics Gallery (formerly called the Machine Learning Gallery) provides a gallery of sample machine learning models that one can use to get started quickly. Each model contains an experiment contributed by Microsoft, its partners or individual data scientists.<ref name="Predictive Analytics with Microsoft Azure Machine Learning: Build and Deploy Actionable Solutions in Minutes">"Predictive Analytics with Microsoft Azure Machine Learning: Build and Deploy Actionable Solutions in Minutes", ISBN 978-1-4842-0446-7</ref> The gallery also serves as a platform for users who want to learn from others, start developing their own solutions, or contribute their own work to the advanced analytics community.<ref name="Cortana Analytics Gallery">"Cortana Analytics Gallery"</ref>

Azure Marketplace

Machine Learning Studio lets users configure machine learning models through a graphical user interface. Azure Marketplace, on the other hand, offers pre-packaged models that are ready to use. Some of the services available in the Marketplace are listed below:<ref name="Microsoft Azure Machine Learning Review - RobustTechHouse">"Microsoft Azure Machine Learning Review - RobustTechHouse"</ref>

Text Analytics

The Text Analytics API is a suite of text analytics services built using Azure Machine Learning. It can detect sentiment, extract key phrases, and detect the language of the input text.

Frequently bought together

Website owners can use this API to help their customers discover catalog items that are often bought together. It uses customer purchase history to provide "Frequently bought together" recommendations.
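One simple way to derive such recommendations from purchase history is to count how often each pair of items appears in the same order. The Python sketch below does this. It is an illustrative co-occurrence baseline over hypothetical order data, not the algorithm the Marketplace API uses, which the source does not describe.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical purchase history: one set of item IDs per order.
ORDERS = [
    {"tent", "sleeping_bag", "lantern"},
    {"tent", "sleeping_bag"},
    {"tent", "stove"},
    {"lantern", "batteries"},
    {"sleeping_bag", "tent", "stove"},
]


def co_occurrence(orders):
    """For every item, count how often each other item appears in the same order."""
    counts = defaultdict(Counter)
    for order in orders:
        for a, b in combinations(sorted(order), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def frequently_bought_together(counts, item, k=2):
    """Return up to k items most often bought with `item`; ties are broken alphabetically."""
    ranked = sorted(counts[item].items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:k]]


counts = co_occurrence(ORDERS)
print(frequently_bought_together(counts, "tent"))
```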

Binary classifier API

The Binary Classifier API is an example built with Microsoft Azure Machine Learning. It fits a logistic regression model to data supplied by the user and then outputs a predicted value for each observation in the data.
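Fitting a logistic regression model means finding the weights and bias that minimize the log-loss of the predicted probabilities. The following Python sketch does this with plain batch gradient descent on hypothetical one-feature data. The learning rate, the epoch count and the 0.5 decision threshold are our assumptions. Nothing here describes how the Marketplace API is implemented internally.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(X, y, learning_rate=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (prediction - label).
            error = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict_proba(weights, bias, X):
    """Predicted probability of class 1 for each observation."""
    return [sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) for xi in X]


# Hypothetical, deliberately overlapping data: label 1 becomes more common as x grows.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 1, 0, 1, 1]
weights, bias = fit_logistic_regression(X, y)
probabilities = predict_proba(weights, bias, X)
predicted = [int(p >= 0.5) for p in probabilities]
print(predicted)
```

The data are symmetric around x = 2.5 and not perfectly separable. The fitted decision boundary, `-bias / weights[0]`, should therefore settle close to 2.5, and the weights stay finite.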

Comparison

A comparison of several machine learning platforms is summarized below.<ref name="Machine Learning as a Service - Benchmark">"Machine Learning as a Service - Benchmark"</ref>

{| class="wikitable"
!
!AWS Machine Learning
!Google Prediction API
!MS Azure Machine Learning
|-
|Data sources
|Text file uploaded into S3<br />AWS RDS<br />AWS Redshift<br />AWS S3 table
|Text file uploaded into Google storage<br />Google Spreadsheets<br />HTTPS requests<br />API update calls
|Uploaded text files<br />Azure storage<br />SQL database<br />Web URL<br />Hadoop HiveQL
|-
|Data formats
|CSV file<br />S3 and Redshift database
|Text file<br />Spreadsheet<br />JSON
|CSV and text files<br />Hive SQL tables<br />OData values<br />svmlight<br />arff<br />zip<br />RData
|-
|Dataset maximum size
|100 GB
|Text file: 2.5 GB<br />HTTP request: 2 MB
|10 GB
|-
|Data types
|boolean<br />categorical<br />numeric<br />string
|numeric<br />string
|boolean<br />categorical<br />datetime<br />numeric<br />timespan<br />string
|-
|Data visualization
|yes (table view)
|no
|Table, histogram, statistical summary
|-
|Mathematical transformations
|no
|no
|yes
|-
|Feature normalization
|yes
|no
|yes, using PCA
|-
|Orthonormalization
|no
|no
|yes
|-
|Missing value imputation
|yes, indirectly: chain a feature imputation model with the main prediction model
|yes, automatic: replaces missing strings with "" and missing numbers with 0
|yes: can replace with custom values, or with the mean, median or mode
|}

Terms used in the above comparison

AWS S3: Amazon Simple Storage Service (Amazon S3) provides developers and IT teams with secure, durable, highly scalable object storage.<ref name="Amazon Simple Storage Service (S3) - Object Storage">"Amazon Simple Storage Service (S3) - Object Storage"</ref>

Azure Machine Learning provides more flexibility in the data formats that can be imported into its environment. It also supports more data visualization methods than Amazon Machine Learning. Its missing value imputation can also be customized, for example by replacing missing values with a custom value or with the mean, median or mode.
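The imputation strategies named in the comparison can be sketched in a few lines of Python using the standard `statistics` module. The `impute` helper below is hypothetical and marks missing entries with `None`. It is not the implementation used by any of the three platforms.

```python
from statistics import mean, median, mode


def impute(column, strategy="mean", custom_value=None):
    """Hypothetical helper: replace None entries using a custom value or a column statistic."""
    observed = [v for v in column if v is not None]
    if strategy == "custom":
        fill = custom_value
    elif strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    elif strategy == "mode":
        fill = mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in column]


ages = [10, None, 10, 20, 30, 80]
for strategy in ("mean", "median", "mode"):
    print(strategy, impute(ages, strategy))
print("custom", impute(ages, "custom", custom_value=0))
```

The fixed-value behaviour attributed to the Google Prediction API corresponds to the `custom` strategy: `0` for numeric columns and `""` for string columns.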

Further reading

Videos

References

<references/>