In the previous article, we worked on the Twitter Activity dataset in JSON format with the SQL Developer Web tool in Autonomous Database. However, SQL Developer Web has its limitations. It only allows you to parse the first level of the JSON data. Thus, it won’t support parsing complicated JSON structures.
So, does it mean you can’t deal with the data you need properly? Fortunately, there is an alternative. You can use SQL/JSON functions
json_value, json_query to build complicated query or use dot-notation to access your JSON data.
Import JSON Data
The previous part provided you with the information and the sample Twitter dataset to download. You might need to refer to that article again to refresh your knowledge and re-download the dataset. We are going to reuse it for the scenarios in the current article.
So, please make sure that you have that file with the data in JSON format. But before we start to explore it, we need to edit that dataset for the SQL Developer Web to parse the data appropriately. The detailed instruction is provided in the previous article, but let us briefly recall what to do.
Download and unzip the archived file – you will see two folders there: assets and data. The data files are located in the data folder.
Find the tweet.js file in the data folder and open it with any Text Editor, such as Notepad. In the context of that file, you will see a part of the JSON data including the following row:
Before importing the JSON data into Autonomous, we need to remove that row from the context of the tweet.js file.
Then save the tweet.js file and access SQL Developer Web with the DEMO user. In the top-left corner, click the hamburger icon and navigate to SQL. After that, navigate to the tweet.js file and upload it.
Click Next to go to the Table Definition step. Enter table name – STG_TWITTER_JSON and select the schema in the ADW instance. You will also see the table definition with columns and data type. Here, you can either keep the default definition or change the data type for each column.
Make sure that the data type is CLOB(JSON)
Click Next to go to the Review step and start loading JSON data. In this step, you see the DDL script to create your table. Note that the CHECK constraint is applied to the table. Make sure that the TWEET column data value is JSON.
Click Finish and verify the data.
Now, we have a table with only one column. This column contains context in JSON format. Here we need to parse the context of each row into the tabular structure.
Query JSON Data
Option 1: Simple Dot-Notation Access to JSON Data
Dot notation is designed for common use cases. You can use this option when the JSON structure in the dataset is quite simple, and you can access the elements directly.
The return value for the dot-notation query is always a string (data type VARCHAR2(4000)) representing JSON data. The content of the string depends on the targeted JSON data:
- If a single JSON value is targeted, that value is the string content, whether it is a JSON scalar, object, or array.
- If multiple JSON values are targeted, the string content is a JSON array – its elements are those values.
Let’s open the tweet.js file and have a quick view of it. In the context of entities, it is also JSON format, and urls is a nested child.
How to use the dot notation approach to build the script for parsing the JSON data
The syntax is the following:
SELECT <alias name>.<column name>.<path to level or attribute> FROM <table name> <alias name>
Use the following script:
SELECT T.tweet.retweeted As "retweeted", T.tweet.source "source", T.tweet.entities.hashtag "hashtag", T.tweet.entities.symbols "symbols", T.tweet.entities.user_mentions "user_mentions", T.TWEET.entities.urls.url "url", T.TWEET.entities.urls.expanded_url "expanded_url", T.TWEET.entities.urls.display_url "display_url", T.TWEET.display_text_range "display_text_range", T.TWEET.favorite_count "favorite_count", T.TWEET.id_str "id_str", T.TWEET.truncated "truncated", T.TWEET.retweet_count "retweet_count", T.TWEET.id "id", T.TWEET.possibly_sensitive "possibly_sensitive", T.TWEET.created_at "created_at", T.TWEET.favorited "favorited", T.TWEET.full_text "full_text" FROM STG_TWITTER_JSON T;
As you see, the dot notation approach is quite straightforward. You need to know the JSON structure clearly and understand which levels, elements, or attributes to access.
Try the same query and insert parsed data into a new table:
CREATE TABLE STG_TWEET_DOT_NOTATION AS SELECT T.tweet.retweeted As "retweeted", T.tweet.source "source", T.tweet.entities.hashtag "hashtag", T.tweet.entities.symbols "symbols", T.tweet.entities.user_mentions "user_mentions", T.TWEET.entities.urls.url "url", T.TWEET.entities.urls.expanded_url "expanded_url", T.TWEET.entities.urls.display_url "display_url", T.TWEET.display_text_range "display_text_range", T.TWEET.favorite_count "favorite_count", T.TWEET.id_str "id_str", T.TWEET.truncated "truncated", T.TWEET.retweet_count "retweet_count", T.TWEET.id "id", T.TWEET.possibly_sensitive "possibly_sensitive", T.TWEET.created_at "created_at", T.TWEET.favorited "favorited", T.TWEET.full_text "full_text" FROM STG_TWITTER_JSON T;
Check the structure of the STG_TWEET_DOT_NOTATION table:
Option 2: Using SQL/PL Functions JSON VALUE and JSON QUERY
The behavior of json_value and json_query functions is different from the dot notation approach. they are suitable for more complicated queries. For instance, they can return NULL or an error if the JSON format is incorrect or does not match specific queries. These functions also support returning the data types – you could not do it with the dot-notation query.
To work with json_value and json_query, first get familiar with the JSON path and the SQL/JSON path expression
- json_value function returns a scalar value
- json_query function returns an array of objects or a JSON document
Now, re-write the previously used query with the json_value function. We’ll use RETURNING to return the data type of retweet_count and favorite_count:
SELECT json_value(TWEET,'$.retweeted') As "retweeted", json_value(TWEET,'$.source') "source", json_value(TWEET,'$.entities.hashtag') "hashtag", json_value(TWEET,'$.entities.symbols') "symbols", json_value(TWEET,'$.entities.user_mentions') "user_mentions", json_value(TWEET,'$.entities.urls.url') "url", json_value(TWEET,'$.entities.urls.expanded_url') "expanded_url", json_value(TWEET,'$.entities.urls.display_url') "display_url", json_value(TWEET,'$.display_text_range') "display_text_range", json_value(TWEET,'$.favorite_count' returning number) "favorite_count", json_value(TWEET,'$.id_str') "id_str", json_value(TWEET,'$.truncated') "truncated", json_value(TWEET,'$.retweet_count' returning number) "retweet_count", json_value(TWEET,'$.id') "id", json_value(TWEET,'$.possibly_sensitive') "possibly_sensitive", json_value(TWEET,'$.created_at') "created_at", json_value(TWEET,'$.favorited') "favorited", json_value(TWEET,'$.full_text') "full_text" FROM STG_TWITTER_JSON T;
Check the structure of the STG_TWEET_JSON_VALUE table. The data type of both retweet_count and favorite_count is NUMBER:
Now, let’s try with the json_query function.
The process of json_query is like the JSON value. It is used when you want to return the JSON document or an array of objects. For example, you want to return the array of urls. Then you can write the following query:
SELECT json_query(TWEET, '$.entities.urls' returning VARCHAR2 pretty ) URLS FROM STG_TWITTER_JSON T;
Copy the context of any row and paste it to the Notepad editor. You will see a pretty format:
JSON is the standard data structure for most API and Web services. Oracle specialists have to deal with the data in this format quite frequently. Thus, Oracle provides a set of specialized tools and options to work with JSON data. The SQL/JSON functions are exceptionally helpful when you need to access or modify the JSON data.
- Dot-notation is the function used to access the JSON data directly. It returns a scalar value, and the data type is always a string (VARCHAR2(4000)).
- json_value and json_query functions are used when the element’s length attribute is more than 4000, or you need to access JSON data with more complicated queries.
Note also that the Autonomous Database in Oracle version 18c and higher has introduced lots of changes for work with JSON. If you are still using an older version, it might be worth upgrading to the latest version.
- Prediction of Employee Attrition with Oracle Analytics - July 29, 2021
- Query Google’s Financial Service Using Oracle Data Integrator - July 22, 2021
- Query JSON Data with SQL/JSON Functions & Autonomous Database - July 2, 2021