The Wrangler User Interface of Cloud Data Fusion

Photo by Priscilla Du Preez on Unsplash

This Google product was briefly introduced in our previous blog post, Introduction to Cloud Data Fusion. In particular, it was explained the set-up, the cost structure, and the pipeline studio’s plugins elements. This blog post will focus more on introducing the Wrangler User Interface tool.

The Wrangler UI is a handy interface to clean, transform and prepare a dataset. With this tool, you can change a datatype, apply filters, correctly deal with null values, create new fields, etc. Furthermore, it also has an insights section where you can quickly visualise your data.

Wrangler UI — Data

This section will focus on explaining the step required to use the Wrangler UI and the capabilities of this tool. To use it you need to select Wrangler from the top Menu.

Wrangler Section

As a first step, you need to select a table inside a dataset from one of the available sources (i.e. BigQuery, Cloud Storage or another database source). A sample of 1000 rows will be loaded into the interface.

Available Sources

Here you will have the option to choose between Data (to transform the data) or Insights (to rapidly visualise the information), by selecting the right button inside the yellow box of the screenshot below.

On the right side of the Data section, you will see a list of available fields with the relative percentage of non-missing values in the sample (i.e. Completion, green box in the screenshot).

By clicking on the arrow at the left of a column name (i.e. red box), you can modify the data inside that column.

The available functions include Parse (i.e. valid for example, to convert a CSV or JSON file into a table with multiple columns), Change Data Type, Filter, Copy or Delete a Column, Fill Null or Empty Cells, Find and Replace a value, Define a new column or apply a Custom Transformation.

Once you click on a function, you will be able to manually enter the appropriate parameters.

Otherwise, you can also type the function, the required field names and the appropriate parameters manually in the command interface at the bottom (i.e. blue box).

Wrangler UI — Data

Once you are satisfied with your transformations, click on “Create a Pipeline” (orange box) and follow the integration process highlighted in the previous blog post.

Wrangler UI — Insights

In the Insights section of the Wrangler UI you can quickly visualise your data. On the right side, under Data you can select the fields that you want to analyse by dragging them in the Encoding section next to it. If no fields are selected, a Univariate Summary for each variable in the table will be automatically loaded with the count of records for each distinct entry in the data sample (i.e. see screenshot below).

Wrangler UI — Insights

On the other hand, if specific fields are selected, you will see a Specified View according to the parameters given and a series of Related additional Views that will include a related summary showing the mean of other variables, an alternative encoding showing a different bucketization of the underlying data and/or a series of views specific to a Wildcard Field (i.e. a single data type that include for example quantitative, categorical or temporal fields).

At the top of the Encoding section, you can specify the variables visualised in 2D on the x and y-axis.

In the Mark section right below, you can specify the size, the colour, the shape, the detail and the text of the figure inside the plots. A default Mark will be automatically generated if no fields are selected.

Under Facet you can divide the plots into multiple plots according to the distinct values of a selected row or column field and under Wildcard Shelves you can specify a single wildcard field or a combination of them.

In the filter section you can filter the data that is going to be visualised with a custom range.

Inside the plot figures there is an option to sort the data alphabetically, to write notes and to save them into a bookmark for later use.

Specified and Related Views

The best way to understand how the Wrangler UI Insights section works is to play around with your data. Unfortunately, the official documentation available is minimal. If you have access to Qwicklabs, you can also follow this Wrangler UI tutorial.

Once you are satisfied with your visualisations, click on “Create a Pipeline” and integrate a new pipeline or return to the Data section and further transform the data.

Conclusion

The Wrangler UI is handy to clean, transform and quickly visualise a sample of data and to be integrated easily into a code-free pipeline.

A limitation of the Wrangler UI is that you can preview only 1000 random rows. Furthermore, you cannot extract a new sample of data (compared to Cloud Dataprep) and the tool does not support Big Query tables with columns of type STRUCT.

Although this tool seems to be very useful, there is not a lot of official documentation available. Hopefully, it will be added soon in the Data Fusion overall official documentation.

Further Links

This post is part of our new Google Cloud series in the Data School of datadice. We will be sharing the best insights that we’ve learned through years of creating data solutions.

If you want to learn more about using Google Data Studio and take it to the next level in combination with BigQuery, check our Udemy course here.

If you are looking for help to set up a modern and cost-efficient data warehouse or analytical dashboards, send us an email to hello@datadice.io and we will schedule a call.

--

--

--

Agency of Business Intelligence specialists, based in Coburg, DE. Building modern, customized and cloud-based data warehouse solutions. https://www.datadice.io/

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Why you need to learn Python?

How Much Does It Cost To Create An App like Zoom?

Reflecting on the past five years

Laravel or Drupal 8? What Are the Key Differences? Which One Best Fits Your Use Case Scenario?

Laravel or Drupal 8? What Are the Key Differences? Which One Best Fits Your Use Case Scenario?

Welcome to Strapi Academy

WORDPRESS VS JOOMLA — THE UNBIASED COMPARISON

Data Models: 3NF, Star and Vault

5 reasons why IT community engagement is important

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
datadice

datadice

Agency of Business Intelligence specialists, based in Coburg, DE. Building modern, customized and cloud-based data warehouse solutions. https://www.datadice.io/

More from Medium

Making Interactive Report Using Google Data Studio and BigQuery Sandbox (Public Dataset…

Data processing with Dataflow SQL (part 1/2)

BigQuery 🤔

Why I built the python-bigquery-validator package