Thursday, August 6, 2020

Azure Databricks

Azure Databricks is an Apache Spark-based analytics platform optimized for Microsoft Azure.



Key features

  1. Spark SQL library
  2. Streaming services for IoT, etc.
  3. MLlib machine learning libraries
  4. Graph computation


The Spark Core API supports

  1. R
  2. SQL
  3. Python
  4. Scala
  5. Java

As a platform, Azure Databricks contains

  1. Databricks Workspace
  2. Databricks Workflows
  3. Databricks Runtime
  4. Databricks I/O
  5. Databricks Serverless
  6. Databricks Enterprise Security (DBES)

Supported storage solutions

  1. Blob Storage
  2. Data Lake
  3. SQL Data Warehouse (SQL DW)
  4. Apache Kafka
  5. Hadoop

Supported applications

  1. ML
  2. Streaming
  3. Data
  4. Power BI
  5. Others

Users

  • Data scientists
  • Data engineers
  • Analysts
  • Others



Azure Data Factory (REST endpoint on one side, storage on the other)

  1. A REST API endpoint to pull data from
  2. Create a Data Lake Storage account
  3. Create an Azure Data Factory
  4. Create an app registration to access data in Data Lake Storage
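The app registration in step 4 acts as a service principal. Below is a minimal Python sketch of the OAuth 2.0 client-credentials request it could use to get an Azure Storage access token, assuming the Microsoft identity platform v2.0 token endpoint and the `https://storage.azure.com/.default` scope. `build_token_request` and the placeholder IDs are hypothetical names for illustration. The sketch only builds the request, so it runs without network access or real credentials.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholders: substitute the values from your own app registration.
TENANT_ID = "<tenant-id>"
CLIENT_ID = "<client-id>"
CLIENT_SECRET = "<client-secret>"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request scoped to Azure Storage."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    # The token endpoint expects a form-encoded POST body.
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://storage.azure.com/.default",
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_token_request(TENANT_ID, CLIENT_ID, CLIENT_SECRET)
    print(req.get_method(), req.full_url)
    # Sending it requires a real registration and network access:
    # with urllib.request.urlopen(req) as resp:
    #     access_token = json.load(resp)["access_token"]
```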


Get the REST endpoint URL.
Test the URL with a tool such as Postman.
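Postman is one way to test the URL; a plain-Python alternative using only the standard library is sketched below. `check_rest_endpoint` is a hypothetical helper, and it assumes the endpoint answers a GET request with a JSON body.

```python
import json
import urllib.request


def check_rest_endpoint(url: str, timeout: float = 10.0) -> dict:
    """GET the endpoint and return its status code and parsed JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx HTTP responses,
    so a failing endpoint surfaces as an exception.
    """
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        status = resp.getcode()
        body = json.loads(resp.read().decode("utf-8"))
    return {"status": status, "body": body}
```

For example, `check_rest_endpoint("https://example.com/api/items")` (a placeholder URL) returns the status code together with the parsed JSON, which is roughly what Postman shows when the request succeeds.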

Create the Data Lake Storage account.
Create the app registration.
Create two linked services (one for the REST endpoint and one for the Data Lake Storage).
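The two linked services can also be defined as JSON instead of through the Data Factory UI. The sketch below builds that JSON in Python under these assumptions: the Data Lake is ADLS Gen2 (linked service type `AzureBlobFS`) and authenticates with the app registration's client ID and secret, and the REST endpoint needs no authentication (type `RestService`, `authenticationType` set to `Anonymous`). The linked service names and placeholder values are hypothetical. Check the property names against the current Data Factory connector documentation before deploying.

```python
import json

# Hypothetical placeholders: substitute your own endpoint, account, and app registration values.
REST_URL = "https://example.com/api"
STORAGE_ACCOUNT = "<storage-account>"
TENANT_ID = "<tenant-id>"
CLIENT_ID = "<client-id>"
CLIENT_SECRET = "<client-secret>"


def rest_linked_service(name: str, url: str) -> dict:
    """Source side: REST linked service, assuming the endpoint needs no auth."""
    return {
        "name": name,
        "properties": {
            "type": "RestService",
            "typeProperties": {
                "url": url,
                "authenticationType": "Anonymous",
            },
        },
    }


def adls_gen2_linked_service(name: str, account: str, tenant_id: str,
                             client_id: str, client_secret: str) -> dict:
    """Sink side: ADLS Gen2 linked service using the app registration (service principal)."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobFS",
            "typeProperties": {
                "url": f"https://{account}.dfs.core.windows.net",
                "servicePrincipalId": client_id,
                "servicePrincipalKey": {"type": "SecureString", "value": client_secret},
                "tenant": tenant_id,
            },
        },
    }


if __name__ == "__main__":
    source = rest_linked_service("RestSourceLinkedService", REST_URL)
    sink = adls_gen2_linked_service("DataLakeSinkLinkedService", STORAGE_ACCOUNT,
                                    TENANT_ID, CLIENT_ID, CLIENT_SECRET)
    for linked_service in (source, sink):
        print(json.dumps(linked_service, indent=2))
```

In a real factory the client secret would usually be referenced from Azure Key Vault rather than embedded inline.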

RStudio Setup

 

Windows

Once R and RStudio are installed, open RStudio to make sure that you don’t get any error messages.