matminer.figrecipes.plotly package

Submodules

matminer.figrecipes.plotly.make_plots module

class matminer.figrecipes.plotly.make_plots.PlotlyFig(df=None, plot_mode='offline', plot_title=None, x_title=None, y_title=None, colbar_title='auto', hovermode='closest', filename='auto', show_offline_plot=True, username=None, colorscale='Viridis', api_key=None, textsize=25, ticksize=25, fontfamily='Courier', height=None, width=None, scale=None, margins=100, pad=0, marker_scale=1.0, x_scale='linear', y_scale='linear', hoverinfo='x+y+text')

Bases: object

__init__(df=None, plot_mode='offline', plot_title=None, x_title=None, y_title=None, colbar_title='auto', hovermode='closest', filename='auto', show_offline_plot=True, username=None, colorscale='Viridis', api_key=None, textsize=25, ticksize=25, fontfamily='Courier', height=None, width=None, scale=None, margins=100, pad=0, marker_scale=1.0, x_scale='linear', y_scale='linear', hoverinfo='x+y+text')

Class for making Plotly plots

Args:
df (DataFrame): A pandas dataframe object which can be used to
generate several plots.
plot_mode: (str)
  1. ‘offline’: creates and saves plots on the local disk,
  2. ‘notebook’: embeds plots in an IPython/Jupyter notebook,
  3. ‘online’: saves the plot in your online Plotly account,
  4. ‘static’: saves a static image of the plot locally,
  5. ‘return’: any plotting method returns its Plotly Figure object. Useful for fine-tuning the plot.

NOTE: Both ‘online’ and ‘static’ modes require either the fields ‘username’ and ‘api_key’ or a Plotly credentials file.

plot_title: (str) title of plot
x_title: (str) title of x-axis
y_title: (str) title of y-axis
colbar_title (str or None): the colorbar (z) title. If set to “auto”, the name of the third column (if pd.Series) is chosen.
hovermode: (str) determines the mode of hover interactions. Can be ‘x’/‘y’/‘closest’/False
filename: (str) name/filepath of plot file
show_offline_plot: (bool) automatically open the plot (the plot is saved either way); only applies to ‘offline’ mode.

username: (str) plotly account username
colorscale: (str) Sets the colorscale (colormap). It can be an array containing arrays mapping a normalized value to an rgb, rgba, hex, hsl, hsv, or named color string. At minimum, a mapping for the lowest (0) and highest (1) values is required. Example: [[0, ‘rgb(0,0,255)’], [1, ‘rgb(255,0,0)’]]. Alternatively, it may be a palette name from the following list: Greys, YlGnBu, Greens, YlOrRd, Bluered, RdBu, Reds, Blues, Jet, Picnic, Rainbow, Portland, Hot, Blackbody, Earth, Electric, Viridis
api_key: (str) plotly account API key
textsize: (int) size of text of plot title and axis titles
ticksize: (int) size of ticks
fontfamily: (str) HTML font family - the typeface that will be applied by the web browser. The web browser will only be able to apply a font if it is available on the system on which it operates. Provide multiple font families, separated by commas, to indicate the order of preference in which to apply fonts if they aren’t available on the system. The plotly service (at https://plot.ly or on-premise) generates images on a server, where only a select number of fonts are installed and supported. These include “Arial”, “Balto”, “Courier New”, “Droid Sans”, “Droid Serif”, “Droid Sans Mono”, “Gravitas One”, “Old Standard TT”, “Open Sans”, “Overpass”, “PT Sans Narrow”, “Raleway”, “Times New Roman”.

height: (float) output height (in pixels)
width: (float) output width (in pixels)
scale: (float) Increase the resolution of the image by scale amount, e.g. 3. Only valid for PNG and JPEG images.
margins (float or [float]): Specify the margin (in px) with a list [top, bottom, right, left], or a number which will set all margins.
pad: (float) Sets the amount of padding (in px) between the plotting area and the axis lines
marker_scale (float): scale the size of all markers w.r.t. defaults
x_scale: (str) Sets the x axis scaling type. Select from ‘linear’, ‘log’, ‘date’, ‘category’.
y_scale: (str) Sets the y axis scaling type. Select from ‘linear’, ‘log’, ‘date’, ‘category’.
hoverinfo: (str) Any combination of “x”, “y”, “z”, “text”, “name” joined with a “+” OR “all” or “none” or “skip”. Examples: “x”, “y”, “x+y”, “x+y+z”, “all”. Determines which trace information appears on hover. If “none” or “skip” is set, no information is displayed upon hovering. But, if “none” is set, click and hover events are still fired.

Returns: None
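To make the argument formats above concrete, the following standalone sketch checks a custom colorscale, a hoverinfo string, and a margins value against the shapes documented for the constructor. The helper names (is_valid_colorscale, is_valid_hoverinfo, expand_margins) are hypothetical and are not part of matminer or Plotly; they only mirror the wording of the argument descriptions.

```python
# Illustrative helpers (hypothetical, not part of matminer) that check the
# argument shapes documented for PlotlyFig.__init__.

HOVER_FLAGS = {"x", "y", "z", "text", "name"}
HOVER_SPECIAL = {"all", "none", "skip"}


def is_valid_colorscale(scale):
    """True if scale is a list of [position, color] pairs covering 0 and 1."""
    try:
        positions = [float(pos) for pos, _color in scale]
    except (TypeError, ValueError):
        # Not a sequence of two-item pairs (e.g. a palette name string).
        return False
    in_range = all(0.0 <= p <= 1.0 for p in positions)
    return in_range and 0.0 in positions and 1.0 in positions


def is_valid_hoverinfo(value):
    """True for 'all'/'none'/'skip' or a '+'-joined combination of flags."""
    if value in HOVER_SPECIAL:
        return True
    parts = value.split("+")
    return all(p in HOVER_FLAGS for p in parts) and len(set(parts)) == len(parts)


def expand_margins(margins):
    """Return [top, bottom, right, left] from one number or a 4-item list."""
    if isinstance(margins, (int, float)):
        return [margins] * 4
    if len(margins) != 4:
        raise ValueError("margins must be a number or [top, bottom, right, left]")
    return list(margins)


blue_to_red = [[0, "rgb(0,0,255)"], [1, "rgb(255,0,0)"]]
print(is_valid_colorscale(blue_to_red))
print(is_valid_hoverinfo("x+y+text"))
print(expand_margins(100))
```

A palette name such as ‘Viridis’ is a separate, equally valid form of colorscale; the checker above covers only the explicit array form.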

bar(data=None, cols=None, x=None, y=None, labels=None, barmode='group', colors=None, bargap=None)

Create a bar chart using Plotly.

Can be used with x and y arguments or with a dataframe (passed as ‘data’ or taken from constructor).

Args:
data (DataFrame): The column names will become the ‘x’ axis. The
rows will become sets of bars (e.g., 3 rows = 3 sets of bars for each x point).
cols ([str]): A list of strings specifying columns of a DataFrame
passed into the constructor to be used as data. Should not be used with ‘data’.
x (list or [list]): A list containing ‘x’ axis values. Can be a list
of lists if there is more than one set of bars.
y (list or [list]): A list containing ‘y’ values. Can be a list of
lists if there is more than one set of bars (more than one set of data for each ‘x’ axis value).
labels (str or [str]): Defines the label for each set of bars. If
str, defines the column of the DataFrame to use for labelling. The column’s entry for a row will be the label for that row. If it is a list of strings, should be used with x and y, and defines the label for each set of bars.
barmode: Defines how sets of bars are displayed. Can be set to
“group” or “stack”.
colors ([str]): The list of colors to use for each set of bars.
The length of this list should be equal to the number of rows (sets of bars) present in your data.

bargap (int/float): Separation between bars.

Returns:
A Plotly bar chart object.
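The relationship between x, y, and labels can be sketched without matminer installed. The helper below (bar_traces, a hypothetical name) turns the documented list and list-of-lists inputs into Plotly-style bar trace dictionaries, one per set of bars. It is an illustration of the data layout under these assumptions, not the library's implementation.

```python
# A hand-written illustration of the x / y / labels layout described above.
# bar_traces is a hypothetical helper, not matminer's implementation.

def bar_traces(x, y, labels=None):
    """Build one Plotly-style bar trace dict per set of bars."""
    if not isinstance(y[0], (list, tuple)):
        # A flat list of y values means a single set of bars.
        x, y = [x], [y]
    elif not isinstance(x[0], (list, tuple)):
        # One shared list of x values for every set of bars.
        x = [x] * len(y)
    if labels is None:
        labels = [None] * len(y)
    elif isinstance(labels, str):
        labels = [labels]
    return [
        {"type": "bar", "x": list(xs), "y": list(ys), "name": name}
        for xs, ys, name in zip(x, y, labels)
    ]


traces = bar_traces(
    x=["a", "b", "c"],
    y=[[1, 2, 3], [4, 5, 6]],
    labels=["first set", "second set"],
)
# 'barmode' mirrors the bar() argument of the same name.
figure = {"data": traces, "layout": {"barmode": "group"}}
print(len(figure["data"]))
```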

create_plot(fig)

Creates a plotly plot based on its dictionary representation. The modes of plotting are:

  1. offline: Makes an offline html.
  2. notebook: Embeds in Jupyter notebook
  3. online: Send to Plotly, requires credentials
  4. static: Creates a static image of the plot
  5. return: Returns the dictionary representation of the plot.

Args:
fig: (dictionary) contains data and layout information

Returns:
A Plotly Figure object (if self.plot_mode = ‘return’)
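As a rough illustration of the dictionary representation, the sketch below builds a minimal figure with the two top-level keys named in the description of fig, ‘data’ and ‘layout’, and validates a mode string against the five modes listed above. The check_mode helper and the example values are assumptions made for illustration only.

```python
# A minimal figure dictionary: 'data' holds the traces and 'layout' holds
# titles, axes, and other styling. The values are placeholders.
fig = {
    "data": [
        {"type": "scatter", "mode": "markers", "x": [1, 2, 3], "y": [4, 5, 6]},
    ],
    "layout": {
        "title": "Example figure",
        "xaxis": {"title": "x", "type": "linear"},
        "yaxis": {"title": "y", "type": "log"},
        "hovermode": "closest",
    },
}

# The five plot modes listed above, used here only to validate a mode string.
PLOT_MODES = ("offline", "notebook", "online", "static", "return")


def check_mode(plot_mode):
    """Hypothetical helper: raise ValueError for an undocumented mode."""
    if plot_mode not in PLOT_MODES:
        raise ValueError("unknown plot_mode: {!r}".format(plot_mode))
    return plot_mode


print(sorted(fig))
print(check_mode("return"))
```

With matminer installed, a dictionary like fig would presumably be passed to create_plot on a PlotlyFig instance constructed with plot_mode=‘return’ to get the Figure object back.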

data_from_col(col, df=None)

Try to get data based on a column name in the dataframe, and return an informative error if that fails.

Args:
col (str): column name to look for

Returns (pd.Series or col itself):

heatmap(data=None, cols=None, x_bins=6, y_bins=4, precision=1, annotation='count', annotation_color='black', colorscale=None)
Args:
data: (array) an array of arrays. For example, in case of a pandas dataframe ‘df’, data=df.values.tolist()
cols ([str]): A list of strings specifying the columns of the dataframe (either data or self.df) to use. Currently, only 3 columns are supported. Note that the order in cols matters: the first is considered x, the second y, and the third z (color).
x_bins (int or None): if the number of unique values for x_prop is more than x_bins, x_prop is binned into x_bins bins for better presentation
y_bins (int or None): similar to x_bins
precision (int): number of decimal places used for binning/display
annotation (str or None): mode of annotation. Options are None or:
  “count”: the number of data points available in each cell is displayed
  “value”: the actual value of the cell is displayed, in addition to the colorbar
annotation_color (str): the color of annotation (text inside cells)
colorscale: see the __init__ doc for colorscale

Returns: A Plotly heatmap plot Figure object.
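The effect of x_bins, y_bins, and annotation=‘count’ can be pictured with a simplified stand-in. The sketch below uses plain equal-width binning and counts the points falling in each cell; the function names are hypothetical and matminer's actual binning may differ in its details (for example, in how bin edges are chosen or labelled).

```python
# Equal-width binning with per-cell counts: a simplified, hypothetical
# stand-in for what a binned heatmap with annotation='count' displays.

def bin_index(value, low, high, n_bins):
    """Index of the equal-width bin in [low, high] that contains value."""
    if high == low:
        return 0
    idx = int((value - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)  # the maximum falls in the last bin


def count_grid(xs, ys, x_bins=6, y_bins=4):
    """Return a y_bins-by-x_bins grid with the number of points per cell."""
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    grid = [[0] * x_bins for _ in range(y_bins)]
    for x, y in zip(xs, ys):
        row = bin_index(y, y_lo, y_hi, y_bins)
        col = bin_index(x, x_lo, x_hi, x_bins)
        grid[row][col] += 1
    return grid


# Placeholder values, chosen only to exercise the binning.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0]
for row in count_grid(xs, ys):
    print(row)
```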

heatmap_plot(data, x_labels=None, y_labels=None, colorscale='Viridis', colorscale_range=None, annotations_text=None, annotations_text_size=20, annotations_color='white')

Make a heatmap plot, either using 2D arrays of values, or a dataframe.

Args:

data: (array) an array of arrays. For example, in case of a pandas dataframe ‘df’, data=df.values.tolist()
x_labels: (array) an array of strings to label the heatmap columns
y_labels: (array) an array of strings to label the heatmap rows
colorscale: (str/array) Sets the colorscale. The colorscale must be an array containing arrays mapping a normalized value to an rgb, rgba, hex, hsl, hsv, or named color string. At minimum, a mapping for the lowest (0) and highest (1) values is required. For example, [[0, ‘rgb(0,0,255)’], [1, ‘rgb(255,0,0)’]]. Alternatively, colorscale may be a palette name string from the following list: Greys, YlGnBu, Greens, YlOrRd, Bluered, RdBu, Reds, Blues, Picnic, Rainbow, Portland, Jet, Hot, Blackbody, Earth, Electric, Viridis
colorscale_range: (array) Sets the minimum (first array item) and maximum value (second array item) of the colorscale
annotations_text: (array) an array of arrays, with each value being a string annotation to the corresponding value in ‘data’
annotations_text_size: (int) size of annotation text
annotations_color: (str/array) color of annotation text - accepts similar formats as other color variables

Returns: A Plotly heatmap plot Figure object.
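The arguments of heatmap_plot must agree in shape: annotations_text mirrors data cell by cell, x_labels has one entry per column, and y_labels has one entry per row. The sketch below prepares such inputs from placeholder values; the variable contents are invented for illustration and nothing here calls matminer.

```python
# Placeholder 2 x 2 data; each inner list is one heatmap row.
data = [[1.0, 2.5], [3.5, 4.0]]
x_labels = ["col A", "col B"]  # one label per column
y_labels = ["row 1", "row 2"]  # one label per row

# One string annotation per value in 'data', formatted to one decimal place.
annotations_text = [["{:.1f}".format(v) for v in row] for row in data]

# [minimum, maximum] of the colorscale, here taken from the data itself.
colorscale_range = [
    min(min(row) for row in data),
    max(max(row) for row in data),
]

print(annotations_text)
print(colorscale_range)
```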

histogram(data=None, cols=None, orientation='vertical', histnorm='count', n_bins=None, bins=None, colors=None, bargap=0)

Creates a Plotly histogram. If multiple series of data are available, will create an overlaid histogram.

For n_bins, start, end, size, colors, and bargaps, all defaults are Plotly defaults.

Args:
data (DataFrame or list): A dataframe containing at least
one numerical column. Also accepts lists of numerical values. If None, uses the dataframe passed into the constructor.
cols ([str]): A list of strings specifying the columns of the
dataframe to use. Each column will be represented with its own histogram in the overlay.
orientation (str): Determines whether histogram is oriented
horizontally or vertically. Use “vertical” or “horizontal”.
histnorm: The technique for creating the plot. Can be “probability
density”, “probability”, “density”, or “” (count).
n_bins (int or [int]): The number of bins to include on each plot. If only one number is specified, all histograms will have the same number of bins.
bins (dict or [dict]): specifications of the bins including start, end and size. If n_bins is set, size cannot be set in bins. Also, size is ignored if start or end is not specified. Examples:
  1. bins=None, n_bins=25
  2. bins={‘start’: 0, ‘end’: 50, ‘size’: 2.0}, n_bins=None
colors (str or list): The list of colors for each histogram (if overlaid). If only one series of data is present or all series should have the same color, a single str determines the color of the bins.
bargap (float or list): The gaps between bars for all histograms shown.
Returns:
Plotly histogram figure.
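The interaction between n_bins and bins is easy to get wrong, so here is a small sketch of the two stated rules. The check_bins function is hypothetical and not part of matminer; it assumes that “start or end not specified” means size is dropped unless both are present.

```python
# Hypothetical checker for the n_bins / bins rules stated above; it only
# mirrors the documented constraints and is not matminer's code.

def check_bins(n_bins=None, bins=None):
    """Return the bins dict that would take effect, or raise on a conflict."""
    if bins is None:
        return None
    if n_bins is not None and "size" in bins:
        raise ValueError("size cannot be set in bins when n_bins is set")
    effective = dict(bins)
    # Assumption: size is ignored unless both start and end are given.
    if "size" in effective and not ("start" in effective and "end" in effective):
        del effective["size"]
    return effective


# The two examples from the argument description.
print(check_bins(n_bins=25, bins=None))
print(check_bins(n_bins=None, bins={"start": 0, "end": 50, "size": 2.0}))
```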

parallel_coordinates(data=None, cols=None, line=None, precision=2, colbar=None)

Create a Plotly Parcoords plot from dataframes.

Args:
data (DataFrame or list): A dataframe containing at least one numerical column. Also accepts lists of numerical values. If None, uses the dataframe passed into the constructor.
cols ([str]): A list of strings specifying the columns of the dataframe to use.
line (dict): plotly line dict with keys such as “color” or “width”
precision (int): the number of decimal places for columns with float data type (2 is recommended for a nice visualization)

Returns: a Plotly parallel coordinates plot

scatter_matrix(data=None, cols=None, colbar=None, marker=None, text=None, **kwargs)

Create a scatter matrix plot from dataframes using Plotly.

Args:
data (DataFrame or list): A dataframe containing at least one numerical column. Also accepts lists of numerical values. If None, uses the dataframe passed into the constructor.
cols ([str]): A list of strings specifying the columns of the dataframe to use.
colbar: (str) name of the column used for colorbar
marker (dict): if size is set, it will override the automatic size
text (see PlotlyFig.xy_plot documentation):
**kwargs: keyword arguments of scatterplot. Forbidden args are ‘size’, ‘color’ and ‘colorscale’ in ‘marker’. See example below

Returns: a Plotly scatter matrix plot

Example for more control over markers:

    from matminer.figrecipes.plotly.make_plots import PlotlyFig
    from matminer.datasets.dataframe_loader import load_elastic_tensor
    df = load_elastic_tensor()
    pf = PlotlyFig()
    pf.scatter_matrix(df[['volume', 'G_VRH', 'K_VRH', 'poisson_ratio']],
                      colbar='poisson_ratio', text=df['material_id'],
                      marker={'symbol': 'diamond', 'size': 8,
                              'line': {'width': 1, 'color': 'black'}},
                      colormap='Viridis',
                      title='Elastic Properties Scatter Matrix')

violin(data=None, cols=None, group_col=None, groups=None, title=None, colors=None, use_colorscale=False)

Create a violin plot using Plotly.

Args:
data: (DataFrame or list) A dataframe containing at least one
numerical column. Also accepts lists of numerical values. If None, uses the dataframe passed into the constructor.
cols: ([str]) The labels for the columns of the dataframe to be
included in the plot. Not used if data is passed in as list.
group_col: (str) Name of the column containing the group for each
row, if it exists. Used only if there is one entry in cols.
groups: ([str]) All group names to be included in the violin plot. Used only if there is one entry in cols.
title: (str) Title of the violin plot
colors: (str/tuple/list/dict) either a plotly scale name (Greys, YlGnBu, Greens, etc.), an rgb or hex color, a color tuple, or a list/dict of colors. An rgb color is of the form ‘rgb(x, y, z)’ where x, y and z belong to the interval [0, 255] and a color tuple is a tuple of the form (a, b, c) where a, b and c belong to [0, 1]. If colors is a list, it must contain valid color types as its members. If colors is a dictionary, its keys must represent group names, and corresponding values must be valid color types (str).
use_colorscale: (bool) Only applicable if grouping by another variable. Will implement a colorscale based on the first 2 colors of param colors. This means colors must be a list with at least 2 colors in it (Plotly colorscales are accepted since they map to a list of two rgb colors).

Returns: A Plotly violin plot Figure object.
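The two numeric color forms accepted by colors differ only in scale: an rgb string uses components in [0, 255], while a color tuple uses components in [0, 1]. The sketch below converts one into the other and builds a colors dictionary keyed by group name. The tuple_to_rgb helper and the group names are hypothetical, introduced only for illustration.

```python
# Hypothetical helper relating the two color forms described above.

def tuple_to_rgb(color):
    """Convert a color tuple with components in [0, 1] to 'rgb(x, y, z)'."""
    if len(color) != 3 or not all(0 <= c <= 1 for c in color):
        raise ValueError("expected three components in [0, 1]")
    r, g, b = (int(round(c * 255)) for c in color)
    return "rgb({}, {}, {})".format(r, g, b)


# A colors dict: keys are group names, values are valid color strings.
# 'group A' and 'group B' are placeholder group names.
colors = {
    "group A": tuple_to_rgb((1, 0, 0)),
    "group B": tuple_to_rgb((0, 0, 1)),
}
print(colors)
```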

xy(xy_pairs, colbar=None, colbar_range=None, labels=None, names=None, sizes=None, modes='markers', markers=None, lines=None, colorscale=None, showlegends=None, normalize_size=True)

Make an XY scatter plot, either using arrays of values, or a dataframe.

Args:
xy_pairs (tuple or [tuple]): x & y columns of scatter plots, with possibly different lengths, are extracted from this arg.
  example 1: ([1, 2], [3, 4])
  example 2: [(df[‘x1’], df[‘y1’]), (df[‘x2’], df[‘y2’])]
  example 3: [(‘x1’, ‘y1’), (‘x2’, ‘y2’)]
colbar (list or np.ndarray or pd.Series): set the colorscale for the colorbar (list of numbers); overwrites marker[‘color’]
colbar_range ([min, max]): the range of numbers included in the colorbar. If any number is outside of this range, it will be forced to one of the two bounds. Note that if colbar_range is set, the colorbar ticks will be updated to reflect -min or max+ at the two ends.
labels (list or [list]): annotations for individual scatter points; either the same for all traces or set separately for each trace
names (str or [str]): list of trace names used for the legend. By default, the column name (or trace if NA) is used if a pd.Series is passed
sizes (str, float, [float], [list]). Options:
  str: column name in data with list of numbers used for marker size
  float: a single size used for all traces in xy_pairs
  [float]: list of fixed sizes used for traces (length==len(xy_pairs))
  [list]: list of lists of sizes for each trace in xy_pairs
modes (str or [str]): trace style; can be ‘markers’/‘lines’/‘lines+markers’
markers (dict or [dict]): gives the ability to fine-tune the marker of each scatter plot individually if a list of dicts is passed. Note that the key “size” is forbidden in markers. Use the sizes arg instead.
lines (dict or [dict]): similar to markers, though only if mode==‘lines’
colorscale (str): see the colorscale doc in __init__
showlegends (bool or [bool]): indicates whether to show the legend for each trace (or simply turns it on/off for all if not a list)
normalize_size (bool): if True, normalize the size list.

Returns: A Plotly Scatter plot Figure object.
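Two behaviours described for xy lend themselves to a short sketch: xy_pairs may be a single (x, y) tuple or a list of such tuples, and values outside colbar_range are forced to the nearest bound. Both helpers below are hypothetical illustrations written for this page, not matminer's code.

```python
# Hypothetical helpers illustrating the xy_pairs forms and colbar_range.

def as_pair_list(xy_pairs):
    """Wrap a single (x, y) tuple in a list so both forms iterate alike."""
    return [xy_pairs] if isinstance(xy_pairs, tuple) else list(xy_pairs)


def clip_to_range(values, colbar_range):
    """Force every value outside [low, high] to the nearest bound."""
    low, high = colbar_range
    return [min(max(v, low), high) for v in values]


single_pair = ([1, 2], [3, 4])                    # example 1 above
column_pairs = [("x1", "y1"), ("x2", "y2")]       # example 3 above

print(len(as_pair_list(single_pair)))
print(len(as_pair_list(column_pairs)))
print(clip_to_range([-5, 0.5, 12], [0, 10]))
```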

xy_plot(x_col, y_col, text=None, color='rgba(70, 130, 180, 1)', size=6, colorscale='Viridis', legend=None, showlegend=False, mode='markers', marker='circle', marker_fill='fill', hoverinfo='x+y+text', add_xy_plot=None, marker_outline_width=0, marker_outline_color='black', linedash='solid', linewidth=2, lineshape='linear', error_type=None, error_direction=None, error_array=None, error_value=None, error_symmetric=True, error_arrayminus=None, error_valueminus=None)

Make an XY scatter plot, either using arrays of values, or a dataframe.

Args:

x_col: (array) x-axis values, which can be a list/array/dataframe column
y_col: (array) y-axis values, which can be a list/array/dataframe column
text: (str/array) text to use when hovering over points; a single string, or an array of strings, or a dataframe column containing text strings
color: (str/array) in the format of (i) a color name (e.g. “red”), or (ii) an RGBA string (e.g. “rgba(255, 0, 0, 0.8)”), where the last number represents the marker opacity/transparency, which must be between 0.0 and 1.0, (iii) a hexadecimal code (e.g. “FFBAD2”), or (iv) the name of a dataframe numeric column to set the marker color scale to
size: (int/array) marker size in the format of (i) a constant integer size, or (ii) the name of a dataframe numeric column to set the marker size scale to. In the latter case, scaled Z-scores are used.
colorscale: (str) Sets the colorscale. The colorscale must be an array containing arrays mapping a normalized value to an rgb, rgba, hex, hsl, hsv, or named color string. At minimum, a mapping for the lowest (0) and highest (1) values is required. For example, [[0, ‘rgb(0,0,255)’], [1, ‘rgb(255,0,0)’]]. Alternatively, colorscale may be a palette name string from the following list: Greys, YlGnBu, Greens, YlOrRd, Bluered, RdBu, Reds, Blues, Picnic, Rainbow, Portland, Jet, Hot, Blackbody, Earth, Electric, Viridis
legend: (str) plot legend
mode: (str) marker style; can be ‘markers’/‘lines’/‘lines+markers’
marker: (str) Shape of marker symbol. For all options, please see
marker_fill: (str) Shape fill of marker symbol. Options are “fill”/“open”/“dot”/“open-dot”
hoverinfo: (str) Any combination of “x”, “y”, “z”, “text”, “name” joined with a “+” OR “all” or “none” or “skip”. Examples: “x”, “y”, “x+y”, “x+y+z”, “all”. Default: “x+y+text”. Determines which trace information appears on hover. If “none” or “skip” is set, no information is displayed upon hovering. But, if “none” is set, click and hover events are still fired.
showlegend: (bool) show legend or not
add_xy_plot: (list) of dictionaries, each of which contains additional data to add to the xy plot. Keys are names of arguments to the original xy_plot method - required keys are ‘x_col’, ‘y_col’, ‘text’, ‘mode’, ‘name’, ‘color’, ‘size’. Values are corresponding argument values in the same format as for the original xy_plot. Use None for values not to be set, else a KeyError will be raised. Optional keys are ‘marker’ and ‘marker_fill’ (same format as root keys)

marker_outline_width: (int) thickness of marker outline
marker_outline_color: (str/array) color of marker outline - accepts similar formats as other color variables
linedash: (str) sets the dash style of a line. Options are ‘solid’/‘dash’
linewidth: (int) sets the line width (in px)
lineshape: (str) determines the line shape. With “spline” the lines are drawn using spline interpolation
error_type: (str) Determines the rule used to generate the error bars. Options are:
  1. “data”: bar lengths are set in the variable ‘error_array’/‘error_arrayminus’,
  2. “percent”: bar lengths correspond to a percentage of the underlying data. Set this percentage in the variable ‘error_value’/‘error_valueminus’,
  3. “constant”: bar lengths are of a constant value. Set this constant in the variable ‘error_value’/‘error_valueminus’
error_direction: (str) direction of error bar, “x”/“y”
error_array: (list/array/series) Sets the data corresponding to the length of each error bar. Values are plotted relative to the underlying data
error_value: (float) Sets the value of either the percentage (if error_type is set to “percent”) or the constant (if error_type is set to “constant”) corresponding to the lengths of the error bars.
error_symmetric: (bool) Determines whether or not the error bars have the same length in both directions (top/bottom for vertical bars, left/right for horizontal bars)
error_arrayminus: (list/array/series) Sets the data corresponding to the length of each error bar in the bottom (left) direction for vertical (horizontal) bars. Values are plotted relative to the underlying data.
error_valueminus: (float) Sets the value of either the percentage (if error_type is set to “percent”) or the constant (if error_type is set to “constant”) corresponding to the lengths of the error bars in the bottom (left) direction for vertical (horizontal) bars

Returns: A Plotly Scatter plot Figure object.
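Two parts of xy_plot benefit from a worked sketch: the add_xy_plot entries, which must carry every required key (with None for unset values), and the three error_type rules. The helpers below are hypothetical and written only to illustrate the documented behaviour; in particular, taking the percentage of the absolute data value is an assumption.

```python
# Hypothetical helpers illustrating add_xy_plot entries and error_type rules.

REQUIRED_KEYS = ("x_col", "y_col", "text", "mode", "name", "color", "size")


def extra_trace(x_col, y_col, **options):
    """Build one add_xy_plot entry, filling unset required keys with None."""
    entry = {key: None for key in REQUIRED_KEYS}
    entry.update(x_col=x_col, y_col=y_col)
    entry.update(options)  # may also carry 'marker' and 'marker_fill'
    return entry


def error_lengths(values, error_type, error_value=None, error_array=None):
    """Bar length per point for the three documented error types."""
    if error_type == "data":
        return list(error_array)
    if error_type == "percent":
        # Assumption: the percentage applies to the magnitude of each value.
        return [abs(v) * error_value / 100.0 for v in values]
    if error_type == "constant":
        return [error_value] * len(values)
    raise ValueError("error_type must be 'data', 'percent' or 'constant'")


entry = extra_trace([1, 2], [3, 4], mode="lines", name="fit")
print(sorted(entry))
print(error_lengths([10, 20], "percent", error_value=10))
```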

Module contents