When you build an enterprise ETL/ELT system, you need data integration tools such as Microsoft SQL Server Integration Services, Oracle Data Integrator (ODI), Talend, etc. These tools help you integrate, transform, and load data from the source into the destination.
This article is an Oracle Data Integrator reference that explains how to load multiple data files into an Autonomous Data Warehouse (ADW).
What is Oracle Data Integrator?
Oracle Data Integrator (ODI) is a cross-platform software solution developed for data transformation and data integration. Its primary goal is to simplify and speed up development and maintenance. ODI is powerful and cost-effective, which has made it one of the leading tools in the ELT (Extract, Load, Transform) field.
Being an Oracle product, it integrates seamlessly with other Oracle solutions and works well with large data volumes. Numerous transformation options let users configure and perform the necessary tasks precisely.
In particular, the scope of the ODI product includes the following:
- Automated reverse-engineering of applications and databases
- Development and maintenance of data integration, data flow, and data transformation processes
- Automated generation of documentation
- Adjustment and customization of the generated code
The central point of the software architecture is an open ODI Repository, where the software stores all configuration information, metadata, and execution logs. The master repository allows several separate environments to exchange metadata and scenarios. It also serves as a version control system, with the repository location open to the participating developers.
Other notable components of the software architecture are:
- Designer Navigator, used to design data integrity checks and data transformations
- Operator Navigator, which serves for management and monitoring tasks
- Topology Navigator, which describes the physical and logical architecture of the systems
- Security Navigator, which deals with creating users and profiles and managing rights
The strongest advantages of using the ODI product are the following:
- It checks data quality, detecting faulty records and recycling them automatically, so the faulty data never gets into the target tables. No programming is needed; the function is built in.
- It improves execution performance thanks to the ELT approach: the extracted data gets loaded onto the target server and is transformed there with set-based SQL (see the sketch after this list), which raises throughput significantly.
- It offers a simpler and more efficient architecture by eliminating the additional server that would otherwise transform data between the sources and the target server.
- It supports all major RDBMSs and leading Data Warehousing platforms, such as Oracle, Exadata, Teradata, IBM DB2, Netezza, and Sybase IQ. Technologies such as flat files, ERPs, LDAP, and XML are also supported.
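For instance (an illustrative sketch with hypothetical table names), an ELT transformation runs as one set-based SQL statement inside the target database instead of row-by-row processing on a middle tier:
-- Hypothetical staging and summary tables, for illustration only
CREATE TABLE staging_sales (sale_date DATE, sale_amount NUMBER);
CREATE TABLE target_sales_summary (sale_date DATE, total_amount NUMBER);
-- The transformation is pushed down to the target as a single statement
INSERT INTO target_sales_summary (sale_date, total_amount)
SELECT TRUNC(sale_date), SUM(sale_amount)
FROM staging_sales
GROUP BY TRUNC(sale_date);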
This reference deals with Oracle Data Integrator 12c, the current software version; it is not meant for earlier versions such as Oracle Data Integrator 11g.
In our case study, the system receives Sales data files from a database. Oracle Data Integrator needs to pick up all the data files and load them into the Oracle Autonomous Data Warehouse.
What Do You Need for Oracle ODI Training?
- Download Oracle Data Integrator version 12.2.1.4 or higher from the Oracle site.
- Install and configure the ODI tool following the official guide.
- If you want a prebuilt package that includes ODI and Oracle Database, you can download the Prebuilt Virtual Machine for Oracle Data Integrator 12c. It is an Oracle VM VirtualBox file that you only need to download and import into the Oracle VirtualBox application. This way, you get a testing environment. Note: use it for testing purposes only, never in a production environment.
- You need to have an ADW database instance provisioned (named ADWDEMO in this tutorial).
- Download the ADW Wallet zip file for setting up the connection between ODI and the ADW instance.
Create the ADW Connection in Oracle Data Integrator
Unzip the downloaded ADW Wallet file into a secure folder. Then, set up the user and password for accessing ADW (avoid using the ADMIN user).
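As a minimal sketch, assuming you are connected to the ADWDEMO instance as ADMIN (the user name DEMO is used throughout this tutorial; the password is illustrative):
-- Create the database user that ODI will connect as
CREATE USER demo IDENTIFIED BY "Str0ng#Passw0rd1";
-- DWROLE is the predefined ADW role bundling the common developer privileges
GRANT DWROLE TO demo;
-- Allow the user to create tables in the default DATA tablespace
ALTER USER demo QUOTA UNLIMITED ON data;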
Open the ODI Studio and connect to your Work Repository. Go to Topology > Physical Architecture > Right-Click on Oracle > New Data Server
Enter Name: ADW_DEMO
Set up Connection:
- User: the database user created in ADW instance
- Password: enter the password of the database user
Go to the JDBC tab
- JDBC Driver: oracle.jdbc.OracleDriver
- JDBC URL: unzip the Wallet file and open the tnsnames.ora file; it defines the list of TNS services you can use to connect to the ADW instance.
By default, we use the MEDIUM service for connecting to the ADW instance. Therefore, copy the connection string of MEDIUM:
(description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)(host=adb.us-ashburn-1.oraclecloud.com))(connect_data=(service_name=ghkzzgdddmcvkfb_adwdemo_medium.adb.oraclecloud.com))(security=(ssl_server_cert_dn="CN=adwc.uscom-east-1.oraclecloud.com,OU=Oracle BMCS US,O=Oracle Corporation,L=Redwood City,ST=California,C=US")))
Properties: you need to add two properties:
- oracle.net.wallet_location: the path of the folder where you unzipped the Wallet file. For example:
(SOURCE=(METHOD=file)(METHOD_DATA=(DIRECTORY=C:\instantclient-basic-19-3\ADW_DEMO)))
- oracle.net.ssl_server_dn_match: true
Click Test Connection to ensure the connection is successful:
Click Save > Right-Click on the ADW_DEMO data server created > New Physical Schema
Enter Schema and Work Schema: DEMO
The schema is the database user used for loading and transforming the data
Expand Logical Architecture > Right-click on Oracle > New Logical Schema
Enter the name: L_ADW_DEMO
Use ADW_DEMO.DEMO for all contexts:
The logical schema is used in ODI objects such as Packages, Variables, Data Models, Mappings, Procedures, etc. If you later need to point the ADW connection to a different instance, you will not have to update all those objects; you only change the connection in Physical Architecture.
Create Connection to the Raw Data File in ODI
Go to Topology > Physical Architecture > Right-Click on File > New Data Server. For the definition of the Data Server, provide the following information:
- Name: FILE_GENERIC
- Right-click on the FILE_GENERIC data server > New Physical Schema
- Enter Directory (Schema) where ODI will look for the source and target files and create error files for invalid records detected in the source files. The location is either a local or a shared network directory.
- Enter Directory (Work Schema) where ODI may create temporary files associated with the sources and targets contained in the Data Schema
I am using the same folder for Schema and Work Schema
Expand Logical Architecture > Right-click on File > New Logical Schema
- Enter the name: L_FILE_GENERIC
- Use FILE_GENERIC.C:\Working\Data for all contexts
Create a Log Table
We have to load multiple data files into ADW, so we are going to use a log table to manage the list of data files. Open SQL Developer and connect to the ADWDEMO database instance.
Open a SQL Worksheet and run the following script:
CREATE TABLE ETL_SALES_LOAD_LOG(
"FILE_NAME" VARCHAR2(100 BYTE)
, "STATUS" VARCHAR2(1 BYTE)
, "LOADED_DT" DATE
);
| COLUMN NAME | DATA TYPE | Note |
| --- | --- | --- |
| FILE_NAME | VARCHAR2(100) | The name of a data file that needs to be processed |
| STATUS | VARCHAR2(1) | Status of the process: Y = file processed, N = file not processed yet |
| LOADED_DT | DATE | When the record was inserted into the table |
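For illustration, a row in this table might look as follows (at run-time the ODI package inserts these rows, not a manual script; the file name matches the sample files used later):
-- Hypothetical example row: a file registered but not yet processed
INSERT INTO ETL_SALES_LOAD_LOG (FILE_NAME, STATUS, LOADED_DT)
VALUES ('SALES_20210419.txt', 'N', SYSDATE);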
Create the Staging Sale Table
Open SQL Developer and connect to the ADWDEMO database instance.
Open a SQL Worksheet and run the following script:
CREATE TABLE STG_SALES_DAILY(
SALE_ID VARCHAR2(100 BYTE)
, SALE_DATE DATE
, SALE_AMOUNT NUMBER(22,7)
, LOADED_DT DATE
);
| COLUMN NAME | DATA TYPE | Note |
| --- | --- | --- |
| SALE_ID | VARCHAR2(100) | Sale ID generated in the source database |
| SALE_DATE | DATE | When the sale transaction happened in the source database |
| SALE_AMOUNT | NUMBER(22,7) | Sale amount |
| LOADED_DT | DATE | When the record was inserted into the table |
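For illustration, the row from the sample file created in the next section would land in this table as follows (normally inserted by the ODI mapping; the MM/DD/YYYY date format is an assumption about the sample data):
-- Hypothetical example row matching the sample record AAAAAAA|01/01/2021|10000
INSERT INTO STG_SALES_DAILY (SALE_ID, SALE_DATE, SALE_AMOUNT, LOADED_DT)
VALUES ('AAAAAAA', TO_DATE('01/01/2021', 'MM/DD/YYYY'), 10000, SYSDATE);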
Create Sample Data for Our Tutorial
Create a sample file for the metadata definition with the following content:
SALE_ID|SALE_DATE|SALE_AMOUNT
AAAAAAA|01/01/2021|10000
Save the sample file as SALES_METADATA.txt and store it in C:\Working\Data\Metadata
Create a Data Model in ODI
Go to the Designer panel > Expand Model > Click on the Folder icon > New Model Folder
Enter the name: FILE_GENERIC
Right-click on the FILE_GENERIC folder > New Model
Configure Definition:
- Name: SALES_FILE
- Technology: File
- Logical Schema: L_FILE_GENERIC
Right-click on SALES_FILE model > New Datastore
Definition
- Name: SALES_FILE
- Alias: SALE_FILE
- Datastore Type: Table
- Resource Name: browse to the file SALES_METADATA.txt. We will later replace the Resource Name with a dynamic file name to support loading multiple files. For now, we only want to define the metadata, so we point to the exact sample file.
Files
- File Format: Delimited
- Heading (Number of Lines): 1
- Record Separator: MS-DOS, because our data files are generated on Windows OS
- Field Separator: | (pipe)
Attributes
- Click on Reverse Engineer
- ODI reads the sample file and determines the appropriate data type for each column.
Click Save
Right-click on SALES_FILE model > New Datastore
Enter the name ETL_SALES_FILES
Definition
- Name: ETL_SALES_FILES
- Alias: ETL_SALES_FILES
- Datastore Type: Table
- Resource Name: browse to the file ETL_SALES_FILES.txt. This datastore will contain the list of SALES data files.
Files
- File Format: Delimited
- Heading (Number of Lines): leave empty (this file has no header row)
- Record Separator: MS-DOS, because our data files are generated on Windows OS
- Field Separator: | (pipe)
Attributes
- Click on Add Attribute
- Enter: FILE_NAME
- Type: String
- Physical Length and Length: 100
Click Save
Click on the Folder icon in Models > Create New Model Folder. Enter TG_SALES
Right-click on TG_SALES > New Model
Definition
- Name: TG_SALES
- Technology: Oracle
- Logical Schema: L_ADW_DEMO
Click on the Reverse Engineer icon. ODI will connect to the ADW database instance and list all the tables of the schema:
Expand the TG_SALES model, and you will see the list of tables.
Create an ODI Project
Expand Projects and click on the New project icon > New Project. Enter the name: SALES_DEMO
Right-click on the SALES_DEMO project > New folder > enter SALES
The next step is to create project variables. Right-click on Variables > New Variable. This variable keeps the name of each data file while the data are being processed.
Definition
- Name: ETL_SALES_FILE_NAME
- Datatype: Text
- Keep History: Latest Value
Refreshing
The purpose is to refresh the variable value at run-time.
- Schema: L_ADW_DEMO. As mentioned earlier, we use the logical schema only; whenever the database connection or the directory path changes, our code stays intact.
- Select Query: enter the following SQL query:
SELECT TRIM(FILE_NAME)
FROM <%=odiRef.getSchemaName()%>.ETL_SALES_LOAD_LOG
WHERE STATUS = 'N'
AND TRUNC(LOADED_DT) = (Select trunc(from_tz(cast(sysdate as timestamp), 'GMT') at time zone 'ASIA/KUALA_LUMPUR') from dual)
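The subquery converts the current database time (GMT on ADW by default) to the Asia/Kuala_Lumpur time zone and truncates it to a date. You can run the conversion on its own in SQL Developer to see the value it produces:
-- Shows the current timestamp converted from GMT to the Kuala Lumpur time zone
SELECT from_tz(cast(sysdate as timestamp), 'GMT') at time zone 'ASIA/KUALA_LUMPUR' AS local_time
FROM DUAL;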
Expand Models > FILE_GENERIC folder > SALES_FILE model
Double-click on SALES_FILE and change the Resource Name to #SALES_DEMO.ETL_SALES_FILE_NAME
Continue creating new variables for the SALES_DEMO project. The next variable is used in a looping process.
Definition
- Name: ETL_SALES_FILE_COUNT
- Datatype: Text
- Keep History: Latest Value
Refreshing (to refresh the variable value at run-time)
- Schema: L_ADW_DEMO. As mentioned, we only use the logical schema, so whenever the database connection or directory path changes, our code is not impacted.
- Select Query: enter the following SQL query:
SELECT COUNT(1)
FROM <%=odiRef.getSchemaName()%>.ETL_SALES_LOAD_LOG
WHERE STATUS = 'N' AND TRUNC(LOADED_DT) = (Select trunc(from_tz(cast(sysdate as timestamp), 'GMT') at time zone 'ASIA/KUALA_LUMPUR') from dual)
Create Mapping to Import File Names Into ETL_SALES_LOAD_LOG
This process imports the list of file names into the ETL log table.
Expand the SALES folder in the project, right-click on Mappings > New Mapping
- Enter the name: ETL_SALES_LOAD_FILE_NAME
- An empty mapping is created
- Drag and drop ETL_SALES_FILES from SALES_FILE model into the mapping
- Drag and drop ETL_SALES_LOAD_LOG from TG_SALES model into the mapping
- Create a mapping between ETL_SALES_FILES.FILE_NAME and ETL_SALES_LOAD_LOG.FILE_NAME
- Set the STATUS target expression to 'N' (by default, 'N' means the file has not been processed yet)
- Set the LOADED_DT target expression to the current date:
(Select trunc(from_tz(cast(sysdate as timestamp), 'GMT') at time zone 'ASIA/KUALA_LUMPUR') from dual)
Click on ETL_SALES_LOAD_LOG and set the Target Integration Type: Control Append. This option inserts new records without deleting the existing ones.
Click on the Physical tab and keep the default settings for ETL_SALES_LOAD_LOG. Integration Knowledge Module: IKM Oracle Insert.GLOBAL
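For intuition, Control Append with IKM Oracle Insert boils down to an append-only INSERT ... SELECT executed inside the target database. The sketch below is a rough illustration only; the real statement is generated by the IKM from the mapping at run-time, and the C$ work table name is hypothetical:
INSERT INTO DEMO.ETL_SALES_LOAD_LOG (FILE_NAME, STATUS, LOADED_DT)
SELECT FILE_NAME,
       'N',
       (Select trunc(from_tz(cast(sysdate as timestamp), 'GMT') at time zone 'ASIA/KUALA_LUMPUR') from dual)
FROM DEMO.C$_0ETL_SALES_LOAD_LOG; -- hypothetical C$ work table loaded from the file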
Create Mapping to Import the SALES Data into the STG_SALES_DAILY Table
This mapping imports the data files into the STG_SALES_DAILY table.
Expand the SALES folder in the project and right-click on Mappings > New Mapping
- Enter name: ETL_SALES_LOAD_DAILY
- An empty mapping is created
- Drag and drop SALES_FILE from the SALES_FILE model into the mapping
- Drag and drop STG_SALES_DAILY from the TG_SALES model into the mapping
- Create mappings between the SALES_FILE and STG_SALES_DAILY columns, as shown below:
| SALES_FILE | STG_SALES_DAILY | Expression |
| --- | --- | --- |
| SALE_ID | SALE_ID | |
| SALE_DATE | SALE_DATE | |
| SALE_AMOUNT | SALE_AMOUNT | |
| | LOADED_DT | (Select trunc(from_tz(cast(sysdate as timestamp), 'GMT') at time zone 'ASIA/KUALA_LUMPUR') from dual) |
Click on STG_SALES_DAILY and set the Target Integration Type: Control Append
Create a Procedure
This step creates a PL/SQL procedure that updates the status of a file in the ETL_SALES_LOAD_LOG table once the file has been processed.
Right-click on the Procedure folder > New Procedure. Enter the name: PROC_UPDATE_FILE_NAME_STATUS. Next, go to the Task tab:
- Enter the task name: Update Status to Y
- Target Logical Schema: L_ADW_DEMO
- Enter the target command:
BEGIN
UPDATE <%=odiRef.getSchemaName()%>.ETL_SALES_LOAD_LOG
SET STATUS='Y'
WHERE FILE_NAME='#SALES_DEMO.ETL_SALES_FILE_NAME'
AND TRUNC(LOADED_DT)= (Select trunc(from_tz(cast(sysdate as timestamp), 'GMT') at time zone 'ASIA/KUALA_LUMPUR')
from dual);
END;
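To test the statement outside ODI, substitute the odiRef call and the ODI variable with literals. A sketch assuming the DEMO schema and one of the sample file names used later in this tutorial:
UPDATE DEMO.ETL_SALES_LOAD_LOG
SET STATUS = 'Y'
WHERE FILE_NAME = 'SALES_20210419.txt'; -- literal instead of '#SALES_DEMO.ETL_SALES_FILE_NAME'
COMMIT;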
Create a Package
Expand the SALES folder > Right-click on Package > New Package
Enter the name: ETL_SALES_LOAD_DAILY
Step 1: Drag and drop the OdiOSCommand component into the package. This step scans the data files' location and gets the names of all data files.
General: the step name is Get the list of file names
Command: "CMD /C <Path of Bat file>\Command.bat", in our case:
"CMD /C C:\Working\Data\Command.bat"
You need to create a batch file named Command.bat with the following content:
@echo off
REM Switch to the folder that holds the data files
CD /D C:\Working\Data
REM Write the bare list of SALES data file names into ETL_SALES_FILES.txt
DIR /B SALES*.txt > ETL_SALES_FILES.txt
EXIT
Save the .bat file into the folder <Path of Bat file>. Before executing the command, change the CD path inside the .bat file to point to your data files folder.
As we are running ODI in a Windows environment, the .bat file is a simple way to execute Windows commands.
When you run ODI on Linux, you can execute the equivalent commands directly:
cd <Path to a folder of data files>
ls -l SALES*.txt | awk '{print $9}' > ETL_SALES_FILES.txt
exit
Step 2: Drag and drop the ETL_SALES_LOAD_FILE_NAME mapping into the package to load the list of file names into the ETL_SALES_LOAD_LOG table
Step 3: Drag and drop the ETL_SALES_FILE_COUNT variable
Step name: Get file count
Type: Refresh Variable
The purpose of Step 3 is to get the number of unprocessed data files recorded in the ETL_SALES_LOAD_LOG table
Step 4: Drag and drop the ETL_SALES_FILE_COUNT variable again. At this step, we check whether there are data files to process (file count variable >= 1)
Step name: Check the number of data files that need to be processed
Type: Evaluate Variable
If ETL_SALES_FILE_COUNT >= 1, then continue processing
Else, move the data files to an archive folder
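To sanity-check this evaluation manually, you can count the pending files with a quick query (assuming the DEMO schema):
-- Files logged but not yet processed
SELECT COUNT(1) AS pending_files
FROM DEMO.ETL_SALES_LOAD_LOG
WHERE STATUS = 'N';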
Step 5: Drag and drop the ETL_SALES_LOAD_DAILY mapping into the package to import data from the data file into the ADW Staging table STG_SALES_DAILY
Step 6: Drag and drop the PROC_UPDATE_FILE_NAME_STATUS procedure into the package
Step 7: Drag and drop the ETL_SALES_FILE_COUNT variable
Step name: Get file count
Type: Refresh Variable
This step refreshes the count so that the package loops back and continues with the next data file in the ETL_SALES_LOAD_LOG table
Save the package.
Execute the Package
There are three sample files:
- SALES_20210419.txt
- SALES_20210420.txt
- SALES_20210421.txt
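Each sample file has the same layout as the metadata file defined earlier; for example, SALES_20210419.txt might contain (contents illustrative):
SALE_ID|SALE_DATE|SALE_AMOUNT
S1001|04/19/2021|2500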
Click the play icon to run the package:
After executing the package, you can check the content of ETL_SALES_FILES.txt as shown below. Based on this list, the package processes each file one by one (the looping process):
Verify the data in the STG_SALES_DAILY table. There are three records loaded from our data files:
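You can verify both tables with simple queries in SQL Developer (assuming the DEMO schema):
-- All three files should now be marked as processed (STATUS = 'Y')
SELECT FILE_NAME, STATUS, LOADED_DT
FROM DEMO.ETL_SALES_LOAD_LOG
ORDER BY FILE_NAME;
-- One record per data file is expected here
SELECT SALE_ID, SALE_DATE, SALE_AMOUNT, LOADED_DT
FROM DEMO.STG_SALES_DAILY;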
Conclusion
This article explained the process of building an ETL ODI package that loads multiple data files into an ADW instance. The workflow is basic and quite common; this package represents one of the ETL patterns you are likely to need at some point.
Depending on your requirements and circumstances, you can continue developing other workflows based on this pattern:
- Load the data files from the SFTP server
- Archive data files after they are processed completely
- Write more logging, such as the number of inserted rows, deleted rows, etc.