Query JSON Data with SQL/JSON Functions & Autonomous Database

In the previous article, we worked on the Twitter Activity dataset in JSON format with the SQL Developer Web tool in Autonomous Database. However, SQL Developer Web has its limitations. It only allows you to parse the first level of the JSON data. Thus, it won’t support parsing complicated JSON structures.

So, does it mean you can’t deal with the data you need properly? Fortunately, there is an alternative. You can use SQL/JSON functions json_value, json_query to build complicated query or use dot-notation to access your JSON data.

CodingSight - Query JSON Data with SQL/JSON Functions & Autonomous Database

Import JSON Data

The previous part provided you with the information and the sample Twitter dataset to download. You might need to refer to that article again to refresh your knowledge and re-download the dataset. We are going to reuse it for the scenarios in the current article. 

So, please make sure that you have that file with the data in JSON format. But before we start to explore it, we need to edit that dataset for the SQL Developer Web to parse the data appropriately. The detailed instruction is provided in the previous article, but let us briefly recall what to do.

Download and unzip the archived file – you will see two folders there: assets and data. The data files are located in the data folder.

Find the tweet.js file in the data folder and open it with any Text Editor, such as Notepad. In the context of that file, you will see a part of the JSON data including the following row:

window.YTD.tweet.part0 =

Before importing the JSON data into Autonomous, we need to remove that row from the context of the tweet.js file.

Then save the tweet.js file and access SQL Developer Web with the DEMO user. In the top-left corner, click the hamburger icon and navigate to SQL. After that, navigate to the tweet.js file and upload it.

Import JSON Data

Click Next to go to the Table Definition step. Enter table name – STG_TWITTER_JSON and select the schema in the ADW instance. You will also see the table definition with columns and data type. Here, you can either keep the default definition or change the data type for each column.

Make sure that the data type is CLOB(JSON)

Import JSON Data

Click Next to go to the Review step and start loading JSON data. In this step, you see the DDL script to create your table. Note that the CHECK constraint is applied to the table. Make sure that the TWEET column data value is JSON.

Import JSON Data

Click Finish and verify the data.

Now, we have a table with only one column. This column contains context in JSON format. Here we need to parse the context of each row into the tabular structure.

Query JSON Data

Option 1: Simple Dot-Notation Access to JSON Data

Dot notation is designed for common use cases. You can use this option when the JSON structure in the dataset is quite simple, and you can access the elements directly.

The return value for the dot-notation query is always a string (data type VARCHAR2(4000)) representing JSON data. The content of the string depends on the targeted JSON data:

  • If a single JSON value is targeted, that value is the string content, whether it is a JSON scalar, object, or array.
  • If multiple JSON values are targeted, the string content is a JSON array – its elements are those values.

Let’s open the tweet.js file and have a quick view of it. In the context of entities, it is also JSON format, and urls is a nested child.

Simple Dot-Notation Access to JSON Data

How to use the dot notation approach to build the script for parsing the JSON data

The syntax is the following:

SELECT <alias name>.<column name>.<path to level or attribute>
FROM <table name> <alias name>

Use the following script:

SELECT T.tweet.retweeted As "retweeted",
 T.tweet.source "source",
 T.tweet.entities.hashtag "hashtag",
 T.tweet.entities.symbols "symbols",
 T.tweet.entities.user_mentions "user_mentions",
 T.TWEET.entities.urls.url "url",
 T.TWEET.entities.urls.expanded_url "expanded_url",
 T.TWEET.entities.urls.display_url "display_url",
 T.TWEET.display_text_range "display_text_range",
 T.TWEET.favorite_count "favorite_count",
 T.TWEET.id_str "id_str",
 T.TWEET.truncated "truncated",
 T.TWEET.retweet_count "retweet_count",
 T.TWEET.id "id",
 T.TWEET.possibly_sensitive "possibly_sensitive",
 T.TWEET.created_at "created_at",
 T.TWEET.favorited "favorited",
 T.TWEET.full_text "full_text"
FROM STG_TWITTER_JSON T;
Simple Dot-Notation Access to JSON Data

As you see, the dot notation approach is quite straightforward. You need to know the JSON structure clearly and understand which levels, elements, or attributes to access.

Try the same query and insert parsed data into a new table:

CREATE TABLE STG_TWEET_DOT_NOTATION
AS
SELECT T.tweet.retweeted As "retweeted",
 T.tweet.source "source",
 T.tweet.entities.hashtag "hashtag",
 T.tweet.entities.symbols "symbols",
 T.tweet.entities.user_mentions "user_mentions",
 T.TWEET.entities.urls.url "url",
 T.TWEET.entities.urls.expanded_url "expanded_url",
 T.TWEET.entities.urls.display_url "display_url",
 T.TWEET.display_text_range "display_text_range",
 T.TWEET.favorite_count "favorite_count",
 T.TWEET.id_str "id_str",
 T.TWEET.truncated "truncated",
 T.TWEET.retweet_count "retweet_count",
 T.TWEET.id "id",
 T.TWEET.possibly_sensitive "possibly_sensitive",
 T.TWEET.created_at "created_at",
 T.TWEET.favorited "favorited",
 T.TWEET.full_text "full_text"
FROM STG_TWITTER_JSON T;

Check the structure of the STG_TWEET_DOT_NOTATION table:

Simple Dot-Notation Access to JSON Data

Option 2: Using SQL/PL Functions JSON VALUE and JSON QUERY

The behavior of json_value and json_query functions is different from the dot notation approach. they are suitable for more complicated queries. For instance, they can return NULL or an error if the JSON format is incorrect or does not match specific queries. These functions also support returning the data types – you could not do it with the dot-notation query.

To work with json_value and json_query, first get familiar with the JSON path and the SQL/JSON path expression

  • json_value function returns a scalar value
  • json_query function returns an array of objects or a JSON document

Now, re-write the previously used query with the json_value function. We’ll use RETURNING to return the data type of retweet_count and favorite_count:

SELECT json_value(TWEET,'$.retweeted') As "retweeted",
 json_value(TWEET,'$.source') "source",
 json_value(TWEET,'$.entities.hashtag') "hashtag",
 json_value(TWEET,'$.entities.symbols') "symbols",
 json_value(TWEET,'$.entities.user_mentions') "user_mentions",
 json_value(TWEET,'$.entities.urls.url') "url",
 json_value(TWEET,'$.entities.urls.expanded_url') "expanded_url",
 json_value(TWEET,'$.entities.urls.display_url') "display_url",
 json_value(TWEET,'$.display_text_range') "display_text_range",
 json_value(TWEET,'$.favorite_count' returning number) "favorite_count",
 json_value(TWEET,'$.id_str') "id_str",
 json_value(TWEET,'$.truncated') "truncated",
 json_value(TWEET,'$.retweet_count' returning number) "retweet_count",
 json_value(TWEET,'$.id') "id",
 json_value(TWEET,'$.possibly_sensitive') "possibly_sensitive",
 json_value(TWEET,'$.created_at') "created_at",
 json_value(TWEET,'$.favorited') "favorited",
 json_value(TWEET,'$.full_text') "full_text"
FROM STG_TWITTER_JSON T;
Using SQL/PL Functions JSON VALUE and JSON QUERY

Check the structure of the STG_TWEET_JSON_VALUE table. The data type of both retweet_count and favorite_count is NUMBER:

Using SQL/PL Functions JSON VALUE and JSON QUERY

Now, let’s try with the json_query function.

The process of json_query is like the JSON value. It is used when you want to return the JSON document or an array of objects. For example, you want to return the array of urls. Then you can write the following query:

SELECT 
json_query(TWEET, '$.entities.urls' returning VARCHAR2 pretty ) URLS
FROM STG_TWITTER_JSON T;
Using SQL/PL Functions JSON VALUE and JSON QUERY

Copy the context of any row and paste it to the Notepad editor. You will see a pretty format:

Using SQL/PL Functions JSON VALUE and JSON QUERY

Conclusion

JSON is the standard data structure for most API and Web services. Oracle specialists have to deal with the data in this format quite frequently. Thus, Oracle provides a set of specialized tools and options to work with JSON data. The SQL/JSON functions are exceptionally helpful when you need to access or modify the JSON data.

  • Dot-notation is the function used to access the JSON data directly. It returns a scalar value, and the data type is always a string (VARCHAR2(4000)).
  • json_value and json_query functions are used when the element’s length attribute is more than 4000, or you need to access JSON data with more complicated queries.

Note also that the Autonomous Database in Oracle version 18c and higher has introduced lots of changes for work with JSON. If you are still using an older version, it might be worth upgrading to the latest version.

Dung Dinh

Dung Dinh

BI Specialist, Data Modelling, working as Oracle Consultant in Oracle Cloud.

Leave a Reply

Your email address will not be published. Required fields are marked *