Oracle Data Integrator Tutorial: How to Import Multiple Raw Files into Oracle Autonomous Data Warehouse

When you build an ETL / ELT enterprise system, you need the Data Integration tools, such as Microsoft SQL Server Integration Service, Oracle Data Integrator (ODI), Talend, etc. These tools help you to integrate, transform, and load data from the source to the destination location.

The current article is the Oracle Data Integrator tutorial, that will explain loading multiple datafiles into an Autonomous Data Warehouse (ADW). This tool is an ELT (extract, load, transform) solution with a graphic interface, built by Oracle. It lets the users build, manage, and maintain the data integration process.

CodingSight - Importing Data into an Oracle Autonomous Data Warehouse using Oracle Data Integrator

In our case study, the system receives the Sales data files from a database. The Oracle Data Integrator needs to pick up all data files to load into the Oracle Autonomous Data Warehouse.  

Before Proceeding to Oracle Data Integrator Tutorial

  • Download the Oracle Data Integrator version 12.2.1.4 or higher from the Oracle site. Then install and configure this tool using the official guideline.
  • In case you want a prebuilt package that includes ODI and Oracle Database, you can download the Prebuilt Virtual Machine for Oracle Data Integrator 12c package. This is the Oracle VM Virtual Box file that you just need to download and import into the Oracle Virtual Box application. This way, you get a testing environment. Be aware that it should be used for testing purposes and not in the production environment
  • You need to have the ADW database instance provisioned (named ADWDEMO)
  • Download the ADW Wallet zip file for setting connection between ODI and ADW instance

Create the ADW Connection in Oracle Data Integrator

Unzip the downloaded ADW Wallet file into the secured folder. Then, you should set up the User and Password to access ADW (avoid using ADMIN user).

Open the ODI Studio and connect to your Work Repository. Go to Topology > Physical Architecture > Right-Click on Oracle > New Data Server

Open the ODI Studio and connect to your Work Repository

Enter Name: ADW_DEMO

Set up Connection:

  • User: the database user created in ADW instance
  • Password: enter the password of the database user
Set up ADW Connection

Go to the JDBC tab

  • JDBC Driver: oracle.jdbc.OracleDriver
  • JDBC URL: unzip the Wallet file and open the tnsnames.ora file – it defines the list of TNS services you can use to connect to the ADW instance.
Go to the JDBC and unzip the Wallet file

By default, we use the MEDIUM service for connecting to the ADW instance. Therefore, copy the connection string of MEDIUM:

(description= (retry_count=20)(retry_delay=3)(address=(protocol=tcps)(port=1522)(host=adb.us-ashburn-1.oraclecloud.com))(connect_data=(service_name=ghkzzgdddmcvkfb_adwdemo_low.adb.oraclecloud.com))(security=(ssl_server_cert_dn="CN=adwc.uscom-east-1.oraclecloud.com,OU=Oracle BMCS US,O=Oracle Corporation,L=Redwood City,ST=California,C=US")))

Properties: you need to add 2 properties:

  • oracle.net.wallet_location is the path of folder where you unzipped the Wallet file. For example:
(SOURCE=(METHOD=file)(METHOD_DATA=(DIRECTORY=C:\instantclient-basic-19-3\ADW_DEMO)))
  • oracle.net.ssl_server_dn_matchtrue
Adding 2 properties

Click Test Connection to ensure the connection is successful:

Test Connection to ensure the connection is successful

Click Save > Right-Click on the ADW_DEMO data server created > New Physical Schema

Creating New Physical Schema

Enter Schema and Work Schema: DEMO

Schema is the database user for loading and transforming the data

Enter Schema and Work Schema: DEMO

Expand Logical Architecture > Right-click on Oracle > New Logical Schema

Enter the name: L_ADW_DEMO

Use ADW_DEMO.DEMO for all contexts:

Use ADW_DEMO.DEMO as a new name for Data Server

The logical schema is used in ODI objects such as Package, Variables, Data Model, Mapping, Procedure, etc. If you need to change the connection of ADW to a different instance, you will not update all objects. Instead, you will only change the connection in Physical Architecture.

Create Connection to the Raw Data File in ODI

Go to Topology > Physical Architecture > Right-Click on File > New Data Server. For the definition of the Data Server, provide the following information:

  • Name: FILE_GENERIC
For the definition of the Data Server, provide the following name - file_generic
  • Right-click on the FILE_GENERIC data server > New Physical Schema
  • Enter Directory (Schema) where ODI will look for the source and target files and create error files for invalid records detected in the source files. The location is either local or shared network directory.
  • Enter Directory (Work Schema) where ODI may create temporary files associated with the sources and targets contained in the Data Schema

I am using the same folder for Schema and Work Schema

using the same folder for Schema and Work Schema

Expand Logical Architecture > Right-click on File > New Logical Schema

  • Enter the name: L_FILE_GENERIC
  • Use FILE_GENERIC.C:\Working\Data for all contexts
Expand Logical Architecture > Right-click on File > New Logical Schema

Create a Log Table

We have to load multiple data files into ADW. Thus, we are going to use a log table to manage the list of data files. Open SQL Developer and connect to the ADW DEMO database instance.

Open SQL Worksheet and run the following script in a command line:

CREATE TABLE ETL_SALES_LOAD_LOG(
    "FILE_NAME"  VARCHAR2(100 BYTE)
    , "STATUS"     VARCHAR2(1 BYTE)
    , "LOADED_DT"  DATE
)
COLUMN NAMEDATA TYPENote
FILE_NAMEVARCHAR(100)The name of a data file that needs to proceed
STATUSVARCHAR(1)Status of the process – Y: file proceeded N: file not proceeded yet
LOADED_DTDATEWhen record inserted into the table

Create the Staging Sale Table

Open SQL Developer and connect to the ADW DEMO database instance.

Open SQL Worksheet and run the following script in a command line:

CREATE TABLE STG_SALES_DAILY(
    SALE_ID  VARCHAR2(100 BYTE)
    , SALE_DATE   DATE
    , SALE_AMOUNT NUMBER(22,7)
    , LOADED_DT  DATE
)
COLUMN NAMEDATA TYPENote
SALE_IDVARCHAR(100)Sale ID generated from the source database
SALE_DATEDATEWhen sale transaction happened in the source database
SALE_AMOUNTNUMBER(22,7)Sales Amount
LOADED_DTDATEWhen a record is inserted into the table

Create Sample Data

Create a sample file for metadata definition:

  • SALE_ID|SALE_DATE|SALE_AMOUNT
  • AAAAAAA|01/01/2021|10000

Save the sample file as SALES_METADATA.txt and store it in C:\Working\Data\Metadata

Create a Data Model in ODI

Go to the Designer panel > Expand Model > Click on the Folder icon > New Model Folder

New Model Folder in Oracle Data Integrator

Enter the name: FILE_GENERIC

Right-click on the FILE_GENERIC folder > New Model

Right-click on the FILE_GENERIC folder > New Model

Configure Definition:

  • Name: SALES_FILE
  • Technology: File
  • Logical Schema: L_FILE_GENERIC
Configure New Model

Right-click on SALES_FILE model > New Datastore

Right-click on SALES_FILE model > New Datastore

Definition

  • Name: SALES_FILE
  • Alias: SALE_FILE
  • Datastore Type: Table
  • Resource Name: browse to file SALES_METADATA.txt. We will update Resource Name with the dynamic file name to support loading multiple files later. Now, we only want to define metadata. Thus, we point to the exact sample file.

Files

  • File Format: Delimited
  • Heading (Number of Lines): 1
  • Record Separator: MS-DOS because our data files are generated in Window OS
  • Field Separator: | (pipe bar)

Attributes

  • Click on Reverse Engineer
Click on Reverse Engineer
  • ODI discovers the data type of each column of the sample file and determines the appropriate data type.
ODI discovers the data type of each column of the sample file and determines the appropriate data type.

Click Save

Right-click on SALES_FILE model > New Datastore

Enter the name ETL_SALES_FILES

Definition

  • Name: ETL_SALES_FILES
  • Alias: ETL_SALES_FILES
  • Datastore Type: Table
  • Resource Name: browse to file ETL_SALES_FILES.txt. This data store will contain the list of SALES data files.

Files

  • File Format: Delimited
  • Heading (Number of Lines): empty
  • Record Separator: MS-DOS because our data files are generated in Window OS
  • Field Separator: | (pipe bar)

Attributes

  • Click on Add Attribute
  • Enter: FILE_NAME
  • Type: String
  • Physical Length and Length: 100
New Database Creation

Click Save

Click on the Folder icon in Models > Create New Model Folder. Enter TG_SALES

Right-click on TG_SALES > New Model

Definition

  • Name: TG_SALES
  • Technology: Oracle
  • Logical Schema: L_ADW_DEMO
Right-click on TG_SALES > New Model

Click on the Reserve Engineer icon. ODI will connect to the ADW database instance and list of all table schemas:

ODI will connect to the ADW database instance and list of all table schemas

Expand the TG_SALES model, and you will see the list of tables.

Expand the TG_SALES model, and you will see the list of tables.

Create an ODI Project

Expand Projects and click on the New project icon > New Project. Enter the name: SALES_DEMO

Expand Projects and click on the New project icon > New Project. Enter the name: SALES_DEMO

Right-click on the SALES_DEMO project > New folder > enter SALES

The next step is to create project variables. Right-click on Variables > New Variable.This variable has to keep the name of each data file when proceeding with data.

Definition

  • Name: ETL_SALES_FILE_NAME
  • Datatype: Text
  • Keep History: Latest Value

Refreshing

The purpose is to refresh a variable value at run-time.

  • Schema: L_ADW_DEMO. As mentioned earlier, we use the Logical Schema only. Whenever the database connection or the directory path is changed, our code stays intact.
  • Select Query: enter the following PL/SQL script:
SELECT TRIM(FILE_NAME)
FROM <%=odiRef.getSchemaName()%>.ETL_SALES_LOAD_LOG
WHERE STATUS = 'N'
AND TRUNC(LOADED_DT) = (Select trunc(from_tz(cast(sysdate as timestamp), 'GMT') at time zone 'ASIA/KUALA_LUMPUR')  from dual) 

Expand Models > FILE_GENERIC folder > SALES_FILE model

Double-click on SALES_FILE and change the Resource Name = #SALES_DEMO. ETL_SALES_FILE_NAME

Double-click on SALES_FILE and change the Resource Name = #SALES_DEMO

Continue creating new variables for the SALES_DEMO project. The next variable is used in a looping process.

Definition

  • Name: ETL_SALES_FILE_COUNT
  • Datatype: Text
  • Keep History: Latest Value

Refreshing (to refresh the variable value at run-time)

  • Schema: L_ADW_DEMO. As I mentioned, we only use Logical Schema so whenever the database connection or directory path is change, our code is not impacted.
  • Select Query: enter the following PL/SQL script
SELECT COUNT(1)
FROM <%=odiRef.getSchemaName()%>.ETL_SALES_LOAD_LOG
WHERE STATUS = 'N' AND TRUNC(LOADED_DT) = (Select trunc(from_tz(cast(sysdate as timestamp), 'GMT') at time zone 'ASIA/KUALA_LUMPUR')  from dual)

Create Mapping to Import File Names Into ETL_SALES_LOAD_LOG

This process targets importing the list of file names into the ETL table.

Expand the SALES folder in the project, right-click on Mappings > New Mapping

Expand the SALES folder in the project, right-click on Mappings > New Mapping
  • Enter name: ETL_SALES_LOAD_FILE_NAME.
  • Empty mapping is created
  • Drag and drop ETL_SALES_FILES from SALES_FILE model into the mapping
  • Drag and drop ETL_SALES_LOAD_LOG from TG_SALES model into the mapping
  • Create a mapping between ETL_SALES_FILES.FILE_NAME and ETL_SALES_LOAD_LOG.FILE_NAME
  • Choose STATUS: Target: ‘N’ (by default, ‘N’ means file not proceeded yet)
  • Choose LOADED_DT column (Current Date).
  • Target:
(Select trunc(from_tz(cast(sysdate as timestamp), 'GMT') at time zone 'ASIA/KUALA_LUMPUR')  from dual)
Create Mapping to import file names into ETL_SALES_LOAD_LOG

Click on ETL_SALES_LOAD_LOG. Target: Control Append. This option allows us to insert records, not delete them.

Click on ETL_SALES_LOAD_LOG. Target: Control Append. This option allows us to insert records, not delete them.

Click on the Physical tab and keep the default settings for ETL_SALES_LOAD_LOG. Integration Knowledge Module: IKM Oracle Insert.GLOBAL

Click on the Physical tab and keep the default settings for ETL_SALES_LOAD_LOG.

Create Mapping to Import the SALES Data into the STG_SALES_DAILY Table

This mapping targets importing the data files into the STG_SALES_DAILY table.

Expand the SALES folder in the project and right-click on Mappings > New Mapping

Expand the SALES folder in the project and right-click on Mappings > New Mapping
  • Enter name: ETL_SALES_LOAD_DAILY
  • Empty mapping is created
  • Drag and drop SALES_FILES from the SALES_FILE model into the mapping
  • Drag and drop STG_SALES_DAILY from the TG_SALES model into the mapping
  • Create mappings between SALES_FILES and STG_SALES_DAILY
SALES_FILESSTG_SALES_DAILYExpression
SALE_IDSALE_ID 
SALE_DATESALE_DATE 
SALE_AMOUNTSALE_AMOUNT 
 LOADED_DT(Select trunc(from_tz(cast(sysdate as timestamp), ‘GMT’) at time zone ‘ASIA/KUALA_LUMPUR’)  from dual)
Create Mapping to import the SALES data into the STG_SALES_DAILY table

Click on STG_SALES_DAILY. Target: Control Append

Click on STG_SALES_DAILY. Target: Control Append

Create a Procedure

This step creates a PL/SQL procedure to update the status of a file when it is processed in the ETL_SALES_LOAD_LOG table.

Right-click on the Procedure folder > New Procedure. Enter the name: PROC_UPDATE_FILE_NAME_STATUS. Next, go to the Task tab:

  • Enter the task name: Update Status to Y
  • Target Logical Schema: L_ADW_DEMO
  • Enter the target command:
BEGIN
UPDATE <%=odiRef.GetSchemaName()%>.ETL_SALES_LOAD_LOG
SET STATUS='N'
WHERE FILE_NAME='#SALES_DEMO.ETL_SALES_FILE_NAME'
AND TRUNC(LOADED_DT)= (Select trunc(from_tz(cast(sysdate as timestamp), 'GMT') at time zone 'ASIA/KUALA_LUMPUR') 
from dual);
END;

Create a Package

Expand the SALES folder > Right-click on Package > New Package

Create a package

Enter the name: ETL_SALES_LOAD_DAILY

Drag and drop the OdiOSCommand component into the package. This step is to scan the data files’ location and get the names of all data files.

General: The step name is Get the list of filenames

Command: “CMD /C  <Path of Bat file>\Command.bat”

"CMD /C  C:\Working\Data\Command.bat"
Drag and drop the OdiOSCommand component into the package.

You need to create a bat file named Command.bat

%echo
CD C:\Working\Data
DIR /B SALES*.txt > ETL_SALES_FILES.txt
EXIT

Save the .bat file into the folder <Path of Bat file>. The content of that .bat file must be changed to the data files folder before executing the command to get the list of file names.

As we are running ODI in a Windows environment, the .bat file is a simple way to execute Windows commands.

When you are running ODI in Linux, you can run the command directly:

cd <Path to a folder of data files> 
s -ll SALES*.csv |awk '{print $9}' > ETL_SALES_FILES.txt
exit

Drag and drop the ETL_SALES_LOAD_FILE_NAME mapping into the package

Step 3: Drag and drop the ETL_SALES_FILE_COUNT variable

Step name: Get file count

Type: Refresh Variable

The purpose of Step 3 is to get the name of the first data file in the ETL_SALES_LOAD_LOG table

Step 4: Drag and drop the ETL_SALES_FILE_COUNT variable. At this step, we have to check if the data file existed (file count variable >=1)

Step name: Check if the number of data files that need to be processed.

Type: Evaluate Variable

    If ETL_SALES_FILE_COUNT >=1 Then continue proceeding

    Else Move data files to an archived folder

Else Move data files to an archived folder

Step 5: Drag and drop the ETL_SALES_LOAD_DAILY mapping into the package to import data from the data file into the ADW Staging table STG_SALES_DAILY

Step 6: Drag and drop the PROC_UPDATE_FILE_NAME_STATUS procedure into the package

Step 7: drag and drop the ETL_SALES_FILE_COUNT variable

    Step name: Get file count

    Type: Refresh Variable

The purpose of this step is to continue fetching the next data file in the log table ETL_SALES_LOAD_LOG

continue fetching the next data file in the log table ETL_SALES_LOAD_LOG

Save the package.

Execute the Package

There are three sample files:

  • SALES_20210419.txt
  • SALES_20210420.txt
  • SALES_20210421.txt
Execute the package

Click the play icon run the package:

Click the play icon run the package

After executing the package, you can check the ETL_SALES_FILES.txt content as shown below. Based on the list, the package proceeds each file one-by-one (looping process):

the package proceeds each file one-by-one (looping process)

Verify the data in the STG_SALES_DAIL table. There are three records loaded from our data files:

Verify the data in the STG_SALES_DAIL table

Conclusion

This article explained the process of building an ETL ODI package that supports loading multiple data files into an ADW instance. The work-flow is basic and quite common, but I believe this package is one of the ETL patterns you have to work on sometimes

Depending on your requirements and circumstances, you can continue developing other work-flows based on this pattern:

  • Load the data files from the SFTP server
  • Archive data files after they are processed completely
  • Write more logging, such as the number of inserted rows, deleted rows, etc.
Dung Dinh

Dung Dinh

BI Specialist, Data Modelling, working as Oracle Consultant in Oracle Cloud.

Leave a Reply

Your email address will not be published. Required fields are marked *