Data Explorer
The Data Explorer is designed to complement code-first exploration of data, allowing you to display data in a spreadsheet-like grid, temporarily filter and sort data, and provide useful summary statistics directly inside of Positron. The goal of the Data Explorer isn’t to replace code-based workflows, but rather supplement with ephemeral views of the data or summary statistics as you further explore or modify the data via code.
The Data Explorer has three primary components, discussed in greater detail in the sections below:
- Data grid: Spreadsheet-like display of the individual cells and columns, as well as sorting
- Summary panel: Column name, type and missing data percentage for each column
- Filter bar: Ephemeral filters for specific columns
Open a dataframe in the Data Explorer
Each instance of the Data Explorer tool is powered by a language runtime and can display dataframes in Python (pandas
) or R (data.frame
, tibble
, data.table
). We also have experimental support for polars
, and additional Python dataframe libraries will be added in the future.
Each instance of the data explorer will be refreshed with changes made to the underlying data. This allows combined workflows between the UI-centric Data Explorer and a code-first approach.
To open a new instance of the Data Explorer on a specific data frame, use one of the following methods:
- Use the language runtime directly:
- Via Python:
%view dataframe label
- Via R:
View(dataframe, "label")
- Via Python:
- Navigate to the Variables Pane and click on the Data Explorer icon for a specific dataframe object
Data grid
The data grid is the primary display, with a spreadsheet-like cell-by-cell view. It’s intended to scale efficiently to relatively large in-memory datasets, up to millions of rows or columns. Each column header has the column name above the data type, as used in the language runtime. At the top right of each column, there is a context menu to control sorting or to add a filter for the selected column. Resize columns by clicking and dragging the column’s borders.
Row labels default to the observed row index, with a zero-based index in Python and a one-based index in R. Alternatively, pandas
and R users may also have rows with modified indices or string-based labels.
Summary panel
The summary panel displays a vertical scrolling list of all of the column names and an icon representing their respective type. It displays a sparkline histogram of that column’s data, and also displays the amount of missing data as both an inline bar graph and an increasing percentage.
Double clicking on a column name will also bring the column into focus in the data grid, allowing for quickly navigating wider data.
- The summary panel quickly collapsed by dragging it to the edge or clicking on the collapse button after hovering over the grid and summary panel divider.
- The summary panel can also be placed on the left or right side of the Data Explorer via the Layout control.
Filter bar
The filter bar has controls to:
- Add, Show/Hide existing filters, or Clear Filters
- A + button to quickly add a new filter
- The status bar at the bottom of the Data Explorer also displays the percentage and number of remaining rows relative to the original total after applying a filter
When creating a new filter, you will first need to select a column either by scrolling the full list or searching across columns for a specific string. Once a column is selected, the available filters for that column type will be displayed. Alternatively, the context menu in each column label of the data grid also allows for creating filters with the column name pre-populated.
Available filters vary according to the column type. For example, string columns have filter affordances for: contains, starts or ends with, is empty, or exact matches. Alternatively, numeric columns have logical operations such as: is less than or greater than, is equal to, or is inclusively between two values.