Oracle has stated that its mission is to bring together the right infrastructure, data management, and data science tools to make data science more collaborative, scalable, and powerful for every enterprise.
And indeed, with OCI Data Science you get a fully managed platform built to meet the needs of teams in a modern enterprise. It gives users and development teams a project-driven, collaborative environment in which they can work together on an end-to-end modelling workflow with self-service resources and data access. OCI Data Science uses Jupyter notebooks and supports the latest open source tools such as Python, TensorFlow, Keras, scikit-learn, MXNet and others.
The key features that set OCI Data Science apart from other notebook environments are:
- Projects are the main “containers” that organise your work. Every project can contain multiple notebook sessions.
- JupyterLab notebook sessions provide users with preinstalled Python libraries for data analysis, preprocessing, modelling, etc.
- The Model Catalog lets users store all their machine learning models in a catalog, which makes the models auditable and reproducible.
- The Accelerated Data Science (ADS) SDK is Oracle’s library that makes common data science tasks such as preprocessing, exploratory analysis, model creation and testing, and model deployment faster, easier and less error-prone.
Setting up OCI Data Science
- Set up a Data Scientists group and assign users to the group
- Create a new compartment to own network and data science resources
- Create a virtual cloud network (VCN) and subnets
- Create OCI identity and access management policies
- From the main menu, navigate to Identity > Groups.
- Click Create Group and enter a group name. In my case, I named the new group QubixDataScientists.
The next step is to create the following resources:
- A compartment to own network resources: a virtual cloud network (VCN), a public or private subnet, and other resources such as an internet gateway or service gateway, a route table, and security lists.
- A compartment to own Data Science resources: projects, notebook sessions, models, and work requests.
- Navigate to Identity > Compartments
- Click Create Compartment and give the new compartment a name. In my case, the new compartment is called QubixSlovenia-DataScience. We will create all OCI Data Science resources in this compartment.
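The group and compartment created in the console above can also be scripted. What follows is only a minimal sketch, assuming the `oci` Python SDK is installed and a working `~/.oci/config` profile exists. The descriptions and the choice of the tenancy (root compartment) as the parent are my assumptions, and `build_requests` is a hypothetical helper name, not part of the SDK.

```python
GROUP_NAME = "QubixDataScientists"
COMPARTMENT_NAME = "QubixSlovenia-DataScience"


def build_requests(tenancy_ocid):
    """Return plain dicts describing the group and the compartment to create.

    Assumption: both are created directly under the tenancy (root compartment),
    and the descriptions are illustrative only.
    """
    group = {
        "compartment_id": tenancy_ocid,
        "name": GROUP_NAME,
        "description": "Users of OCI Data Science",
    }
    compartment = {
        "compartment_id": tenancy_ocid,
        "name": COMPARTMENT_NAME,
        "description": "Network and Data Science resources",
    }
    return group, compartment


def create_group_and_compartment():
    """Send both requests through the OCI Python SDK (needs valid credentials)."""
    import oci  # imported lazily so this file still loads without the SDK

    config = oci.config.from_file()  # reads ~/.oci/config, DEFAULT profile
    identity = oci.identity.IdentityClient(config)
    group, compartment = build_requests(config["tenancy"])
    identity.create_group(oci.identity.models.CreateGroupDetails(**group))
    identity.create_compartment(
        oci.identity.models.CreateCompartmentDetails(**compartment)
    )
```

Calling `create_group_and_compartment()` performs the same two actions as the console steps. Assigning users to the group is left out of the sketch.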
To create a notebook session, a VCN that contains a subnet is required. A notebook session is always created within that subnet, and all egress from the notebook session is routed through it. To access data and install additional packages in the notebook session, you must configure the subnet with appropriate access.
- Navigate to Networking (in the Core Infrastructure section of the main menu).
- Create a VCN. In my case, I used Network Quickstart and then chose VCN with Internet Connectivity. This option is much more convenient because the workflow creates everything you need automatically. I am not very good at this type of thing, so it was an ideal solution for me, although I assume system administrators might do a much better job here. In the end, it worked very well for me. The workflow asks for the following:
- VCN name,
- Compartment - pick your compartment from the list of available compartments,
- VCN CIDR block (I left the defaults unchanged)
- Public Subnet CIDR block (I left the defaults unchanged)
- Private Subnet CIDR block (I left the defaults unchanged)
- Use DNS Hostnames in this VCN - this has to be checked!
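If you do change the CIDR defaults, both subnet blocks have to fall inside the VCN block and must not overlap each other. Here is a minimal sketch of that check using Python's standard `ipaddress` module. The three CIDR values are example values I am assuming for illustration, so substitute whatever the workflow shows you. `check_vcn_layout` is a hypothetical helper name.

```python
import ipaddress


def check_vcn_layout(vcn_cidr, subnet_cidrs):
    """Return a list of problems found; an empty list means the layout is consistent."""
    vcn = ipaddress.ip_network(vcn_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    problems = []
    # Every subnet must lie completely inside the VCN block.
    for subnet in subnets:
        if not subnet.subnet_of(vcn):
            problems.append(f"{subnet} is outside {vcn}")
    # No two subnets may share addresses.
    for i, first in enumerate(subnets):
        for second in subnets[i + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")
    return problems


# Example values (assumed for illustration, not read from the console)
print(check_vcn_layout("10.0.0.0/16", ["10.0.0.0/24", "10.0.1.0/24"]))  # prints []
```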
As the last step, before you can start using Oracle Data Science, you have to create a number of Oracle Cloud Infrastructure Identity and Access Management policies to grant access to Data Science-related and network resources:
- Give users access to Data Science-related resources
- Give users access to network resources.
- Give the Data Science service access to network resources.
- Create a new policy QubixDataScientists-manage-access and enter allow group QubixDataScientists to manage data-science-family in compartment QubixSlovenia-DataScience as a policy statement.
- Create a new policy QubixDataScientists-manage-network-access and enter allow group QubixDataScientists to use virtual-network-family in compartment QubixSlovenia-DataScience as a policy statement.
- Create a new policy QubixDataScientists-service-network-access and enter allow service datascience to use virtual-network-family in compartment QubixSlovenia-DataScience as a policy statement.
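The three policy statements above follow one pattern, so they can be generated from the group and compartment names if you use different ones. This is a minimal sketch in plain Python. `build_policies` is a hypothetical helper of mine, and its output is only text to paste into the policy statement field, not a call to any Oracle API.

```python
def build_policies(group, compartment):
    """Map each policy name to its single policy statement."""
    return {
        f"{group}-manage-access":
            f"allow group {group} to manage data-science-family "
            f"in compartment {compartment}",
        f"{group}-manage-network-access":
            f"allow group {group} to use virtual-network-family "
            f"in compartment {compartment}",
        f"{group}-service-network-access":
            f"allow service datascience to use virtual-network-family "
            f"in compartment {compartment}",
    }


policies = build_policies("QubixDataScientists", "QubixSlovenia-DataScience")
for name, statement in policies.items():
    print(f"{name}: {statement}")
```

With the names used in this post, the helper reproduces the three policy names and statements listed above.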