Full-Text: Exploring the Netflix Catalog

In this tutorial, we will explore how to manage a dataset of Netflix titles, making use of CrateDB Cloud’s full-text search capabilities. Each entry in our imaginary dataset will have the following attributes:

show_id

A unique identifier for each show or movie.

type

Specifies whether the title is a movie, TV show, or another format.

title

The title of the movie or show.

director

The name of the director.

cast

An array listing the cast members.

country

The country where the title was produced.

date_added

A timestamp indicating when the title was added to the catalog.

release_year

The year the title was released.

rating

The content rating (e.g., PG, R, etc.).

duration

The duration of the title in minutes or seasons.

listed_in

An array containing genres that the title falls under.

description

A textual description of the title, indexed using full-text search.

To begin, let’s create the schema for this dataset.

Creating the Table

CrateDB uses SQL, the most popular query language for database management. To store the data, create a table with columns tailored to the dataset using the CREATE TABLE command.

Importantly, you will also take advantage of CrateDB’s full-text search capabilities by setting up a full-text index on the description column. This will enable you to perform complex textual queries later on.

CREATE TABLE "netflix_catalog" (
   "show_id" TEXT PRIMARY KEY,
   "type" TEXT,
   "title" TEXT,
   "director" TEXT,
   "cast" ARRAY(TEXT),
   "country" TEXT,
   "date_added" TIMESTAMP,
   "release_year" TEXT,
   "rating" TEXT,
   "duration" TEXT,
   "listed_in"  ARRAY(TEXT),
   "description" TEXT INDEX using fulltext
);

Run the above SQL command in CrateDB to set up your table. With the table ready, you’re now set to insert the dataset.

Inserting Data

Now, insert data into the table you just created, by using the COPY FROM SQL statement.

COPY netflix_catalog
FROM 'https://github.com/crate/cratedb-datasets/raw/main/cloud-tutorials/data_netflix.json.gz'
WITH (format = 'json', compression='gzip');

Run the above SQL command in CrateDB to import the dataset. After this commands finishes, you are now ready to start querying the dataset.