Top 12 Matplotlib Visuals – Data Analysis

    Last Updated on: 30th May 2024, 03:27 pm

    Matplotlib in Python offers a versatile toolkit for creating various types of visualizations. Line plots are ideal for showcasing trends over time, while scatter plots reveal relationships between variables.

    Bar charts excel in comparing discrete categories, histograms display data distributions, and box plots identify central tendency and outliers.

    Pie charts provide a visual representation of proportions, and area plots depict cumulative data or compositions. Violin plots combine box plots and kernel density plots to show data distribution.

    Heatmaps represent data using color gradients, and hexbin plots summarize dense clouds of points in two-dimensional space.

    Each visualization serves distinct purposes, from exploring trends and relationships to analyzing distributions and compositions, catering to diverse data analysis needs.

    Line Plot

    Use Case: Line plots are commonly used to visualize trends over time or to show the relationship between two variables. They are useful for displaying continuous data points.

    import matplotlib.pyplot as plt
    
    # Sample data
    x = [1, 2, 3, 4, 5]
    y = [2, 3, 5, 7, 11]
    
    # plot
    plt.plot(x, y)
    plt.title('Line Plot')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

    Explanation:

    • import matplotlib.pyplot as plt: Imports Matplotlib's pyplot interface under the alias plt.
    • x and y: Define the data to be plotted.
    • plt.plot(x, y): Draws a line through the x and y data points.
    • plt.title('Line Plot'): Sets the title of the plot.
    • plt.xlabel('x') and plt.ylabel('y'): Label the x and y axes.
    • plt.show(): Displays the plot.
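
Trend comparisons usually involve more than one series. The sketch below is one possible extension of the line plot: it adds a second, made-up series (x squared, not part of the original example) and a legend. It selects the non-interactive Agg backend so it runs without a display; that choice is an assumption, so remove that line and call plt.show() to view the figure.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]             # same values as the example above
y_squares = [v ** 2 for v in x]  # hypothetical second series for comparison

fig, ax = plt.subplots()
ax.plot(x, y, marker='o', label='y')                      # solid line with point markers
ax.plot(x, y_squares, linestyle='--', label='x squared')  # dashed line
ax.set_title('Line Plot with Two Series')
ax.set_xlabel('x')
ax.set_ylabel('value')
ax.legend()  # builds the legend from the label given to each plot call
```

Giving every series its own label keeps the legend in sync with the data when more lines are added later.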

    Scatter Plot

    Use Case: Scatter plots are useful for visualizing the relationship between two variables. They help identify correlations or patterns in the data and are particularly effective when dealing with a large number of data points.

    import matplotlib.pyplot as plt
    
    # Sample data
    x = [1, 2, 3, 4, 5]
    y = [2, 3, 5, 7, 11]
    
    # Scatter plot
    plt.scatter(x, y)
    plt.title('Scatter Plot')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

    Explanation:

    • import matplotlib.pyplot as plt: Imports Matplotlib's pyplot interface under the alias plt.
    • x and y: Define the data to be plotted.
    • plt.scatter(x, y): Creates a scatter plot with x and y data points.
    • plt.title('Scatter Plot'): Sets the title of the plot.
    • plt.xlabel('x') and plt.ylabel('y'): Labels the x and y axes.
    • plt.show(): Displays the plot.
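
Since scatter plots are often used to judge correlation, it can help to print the correlation coefficient next to the picture. The following is a minimal sketch, assuming the same sample data; using np.corrcoef for the Pearson coefficient and putting it in the title are illustrative choices, not part of the original example. The Agg backend line is also an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to view interactively
import matplotlib.pyplot as plt
import numpy as np

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Pearson correlation coefficient between x and y
r = np.corrcoef(x, y)[0, 1]

fig, ax = plt.subplots()
ax.scatter(x, y, color='red', marker='o', label='Data Points')
ax.set_title(f'Scatter Plot (r = {r:.2f})')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.legend()
```

A value of r close to 1 or -1 suggests a strong linear relationship; the plot itself remains the better guide for non-linear patterns.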

    Bar Chart

    Use Case: Bar charts are effective for comparing categorical data. They are often used to display discrete data points and show the relative sizes of different categories.

    import matplotlib.pyplot as plt
    
    # Data
    x = ['A', 'B', 'C', 'D', 'E']
    y = [10, 15, 7, 10, 12]
    
    # Plot
    plt.bar(x, y, color='skyblue')
    plt.xlabel('Categories')
    plt.ylabel('Values')
    plt.title('Bar Chart')
    plt.show()

    Explanation:

    • plt.bar(x, y, color='skyblue'): Plots the bar chart with given x categories and corresponding y values. color specifies the color of the bars.
    • Other lines have the same meaning as in the Line Plot.
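
Comparing categories is usually easier when the bars are ordered by value. The sketch below shows one way to sort the sample data before plotting; the variable names are illustrative, and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to view interactively
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D', 'E']
values = [10, 15, 7, 10, 12]

# Sort category/value pairs from largest to smallest value
pairs = sorted(zip(categories, values), key=lambda pair: pair[1], reverse=True)
sorted_categories = [category for category, _ in pairs]
sorted_values = [value for _, value in pairs]

fig, ax = plt.subplots()
ax.bar(sorted_categories, sorted_values, color='skyblue')
ax.set_xlabel('Categories')
ax.set_ylabel('Values')
ax.set_title('Bar Chart (sorted by value)')
```

Python's sort is stable, so categories with equal values (A and D here) keep their original relative order.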

    Nested Bar Plot Comparison

    Use Case: This plot visualizes nested categorical data, showing the relationship between main and sub-categories through bar heights.

    import matplotlib.pyplot as plt
    import numpy as np
    
    # Data
    categories = ['A', 'B', 'C', 'D']
    outer_values = [20, 35, 30, 25]  # Outer bars
    inner_values = [10, 20, 15, 10]  # Inner bars
    
    # Plot
    fig, ax = plt.subplots()
    index = np.arange(len(categories))  # Index for the categories
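    outer_width = 0.6  # Width of the outer bars
    inner_width = 0.3  # Narrower inner bars so the outer bars stay visible
    
    # Plot the outer bars
    ax.bar(index, outer_values, outer_width, label='Outer Bars')
    
    # Plot the inner bars on top, centered on the same positions
    ax.bar(index, inner_values, inner_width, label='Inner Bars')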
    
    # Add labels, title, and legend
    ax.set_xlabel('Categories')
    ax.set_ylabel('Values')
    ax.set_title('Nested Bar Plot')
    ax.set_xticks(index)
    ax.set_xticklabels(categories)
    ax.legend()
    
    # Add value labels on top of each bar
    for i, (outer, inner) in enumerate(zip(outer_values, inner_values)):
        ax.text(i, outer + 1, str(outer), ha='center', va='bottom')
        ax.text(i, inner + 1, str(inner), ha='center', va='bottom')
    
    plt.show()

    Explanation:

    • Define categories and values for outer and inner bars.
    • Create a figure and subplot for plotting.
    • Create an index for the categories and specify the width of the bars.
    • Plot the outer bars on the subplot.
    • Plot the inner bars on the subplot.
    • Set labels for the x-axis and y-axis.
    • Set a title for the plot.
    • Set positions and labels for the x-axis ticks.
    • Add a legend to the plot.
    • Add value labels on top of each bar.
    • Display the plot.
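
When the two series are better read side by side than nested, the same data can be drawn as grouped bars. This is a sketch of that alternative layout, not the nested plot itself: each series is shifted half a bar width to the left or right of the category position. The width of 0.35 and the Agg backend are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to view interactively
import matplotlib.pyplot as plt
import numpy as np

categories = ['A', 'B', 'C', 'D']
outer_values = [20, 35, 30, 25]
inner_values = [10, 20, 15, 10]

index = np.arange(len(categories))
width = 0.35  # assumed bar width; two bars per category fit within one unit

fig, ax = plt.subplots()
ax.bar(index - width / 2, outer_values, width, label='Outer values')  # left bar of each pair
ax.bar(index + width / 2, inner_values, width, label='Inner values')  # right bar of each pair
ax.set_xticks(index)
ax.set_xticklabels(categories)
ax.set_xlabel('Categories')
ax.set_ylabel('Values')
ax.set_title('Grouped Bar Plot')
ax.legend()
```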

    Histogram

    Use Case: Histograms are used to visualize the distribution of a single continuous variable. They display the frequency or count of data points within predefined intervals or bins.

    import matplotlib.pyplot as plt
    import numpy as np
    
    # Data
    data = np.random.randn(1000)
    
    # Plot
    plt.hist(data, bins=30, color='cyan', edgecolor='black')
    plt.xlabel('Value')
    plt.ylabel('Frequency')
    plt.title('Histogram')
    plt.show()

    Explanation:

    • np.random.randn(1000): Generates 1000 random data points from a standard normal distribution.
    • plt.hist(data, bins=30, color='cyan', edgecolor='black'): Plots the histogram with the given data, dividing it into 30 bins. color specifies the color of the bars, and edgecolor specifies the color of the edges of the bars.
    • Other lines have the same meaning as in the Line Plot.
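
To see what the bars actually represent, the counts and bin edges can be computed directly with np.histogram, which plt.hist uses internally (plt.hist also returns the counts and edges as its first two return values). The sketch assumes a seeded generator, np.random.default_rng(0), only so that runs are repeatable; the original example uses np.random.randn.

```python
import numpy as np

rng = np.random.default_rng(0)   # assumption: fixed seed for repeatable output
data = rng.standard_normal(1000)

# 30 bins produce 30 counts and 31 bin edges
counts, edges = np.histogram(data, bins=30)

bin_width = edges[1] - edges[0]
fullest = counts.argmax()  # index of the bin holding the most points

print(f"bin width: {bin_width:.3f}")
print(f"fullest bin: [{edges[fullest]:.2f}, {edges[fullest + 1]:.2f}) with {counts[fullest]} points")
```

Every data point falls into exactly one bin, so the counts add up to the number of samples.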

    Box Plot

    Use Case: Box plots are useful for visualizing the distribution of a continuous variable and identifying outliers. They display key statistical measures such as median, quartiles, and potential outliers.

    import matplotlib.pyplot as plt
    import numpy as np
    
    # Data
    data = np.random.randn(100)
    
    # Plot
    plt.boxplot(data)
    plt.xlabel('Data')
    plt.ylabel('Values')
    plt.title('Box Plot')
    plt.show()
    

    Explanation:

    • np.random.randn(100): Generates 100 random data points from a standard normal distribution.
    • plt.boxplot(data): Plots the box plot of the given data.
    • Other lines have the same meaning as in the Line Plot.
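
The statistics behind a box plot can be reproduced with NumPy. The sketch below computes the quartiles and the 1.5 × IQR fences that Matplotlib uses by default (whis=1.5) to decide which points are drawn as outliers. The seeded generator is an assumption for repeatability.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: fixed seed for repeatable output
data = rng.standard_normal(100)

# Box edges (q1, q3) and the line inside the box (median)
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# Points beyond these fences are drawn as individual outlier markers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"median={median:.2f}, IQR={iqr:.2f}, outliers={len(outliers)}")
```

The whiskers themselves stop at the most extreme data points that still lie inside the fences, so they are usually shorter than the fence positions.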

    Pie Chart

    Use Case: Pie charts are effective for showing the composition of a whole. They are commonly used to represent proportions or percentages of different categories within a dataset.

    import matplotlib.pyplot as plt
    
    # Data
    labels = ['A', 'B', 'C', 'D']
    sizes = [15, 30, 45, 10]
    
    # Plot
    plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=140)
    plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
    plt.title('Pie Chart')
    plt.show()

    Explanation:

    • plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=140): Plots the pie chart with given sizes and labels. autopct specifies the format for displaying the percentages, and startangle rotates the start of the pie chart by the specified angle.
    • plt.axis('equal'): Ensures that the pie chart is drawn as a circle.
    • Other lines have the same meaning as in the Line Plot.
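
The percentages that autopct prints are each slice's share of the total. As a minimal illustration using the same sizes, the values and their '%1.1f%%' formatting can be reproduced in plain Python:

```python
labels = ['A', 'B', 'C', 'D']
sizes = [15, 30, 45, 10]

total = sum(sizes)
# Share of the whole for each slice, in percent
percentages = [100 * size / total for size in sizes]

# Same format string as the autopct argument: one decimal place plus a percent sign
formatted = ['%1.1f%%' % p for p in percentages]

for label, text in zip(labels, formatted):
    print(label, text)
```

Because the sample sizes happen to sum to 100, each percentage equals its raw value; with other data the slices are rescaled so that they still add up to 100%.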

    Donut Plot

    Use Case: Donut plots are similar to pie charts but with a hole in the center. They are used to represent the composition of a whole and are effective for comparing the proportions of different categories within a dataset.

    import matplotlib.pyplot as plt
    
    # Data
    labels = ['A', 'B', 'C', 'D']
    sizes = [20, 30, 40, 10]
    
    # Plot
    fig, ax = plt.subplots()
    ax.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90, colors=['lightblue', 'lightgreen', 'lightcoral', 'lightsalmon'])
    ax.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
    
    # Draw a white circle at the center to create the donut
    centre_circle = plt.Circle((0, 0), 0.7, color='white', linewidth=0)
    ax.add_artist(centre_circle)
    
    plt.title('Donut Plot')
    plt.show()

    Explanation:

    • labels and sizes: Define the labels and corresponding sizes of the sectors in the donut plot.
    • fig, ax = plt.subplots(): Create a figure and axis object.
    • ax.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=90, colors=['lightblue', 'lightgreen', 'lightcoral', 'lightsalmon']): Plots the pie chart with the given sizes, labels, percentage format, start angle, and colors.
    • ax.axis('equal'): Ensures that the pie chart is drawn as a circle.
    • centre_circle = plt.Circle((0, 0), 0.7, color='white', linewidth=0): Creates a white circle at the center to form the donut.
    • ax.add_artist(centre_circle): Adds the white circle to the plot.
    • Other lines have the same meaning as in the Line Plot.
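
An alternative to covering the center with a white circle is to make the wedges themselves ring-shaped through the wedgeprops argument of pie. The sketch below assumes a ring width of 0.3 and moves the percentage labels outward with pctdistance so they sit on the ring; both values are illustrative. The Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to view interactively
import matplotlib.pyplot as plt

labels = ['A', 'B', 'C', 'D']
sizes = [20, 30, 40, 10]

fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    sizes,
    labels=labels,
    autopct='%1.1f%%',
    startangle=90,
    pctdistance=0.85,             # place the percentages on the ring
    wedgeprops=dict(width=0.3),   # ring thickness as a fraction of the radius
)
ax.axis('equal')
ax.set_title('Donut Plot (wedgeprops)')
```

This variant leaves the center transparent, which can matter when the figure background is not white.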

    Area Plot

    Use Case: Area plots are similar to line plots but with the area below the line filled in. They are useful for visualizing cumulative data or for highlighting the magnitude of change over time.

    import matplotlib.pyplot as plt
    
    # Data
    x = [1, 2, 3, 4, 5]
    y1 = [1, 2, 3, 4, 5]
    y2 = [5, 4, 3, 2, 1]
    
    # Plot
    plt.fill_between(x, y1, y2, color='skyblue', alpha=0.5)
    plt.xlabel('X-axis')
    plt.ylabel('Y-axis')
    plt.title('Area Plot')
    plt.show()

    Explanation:

    • plt.fill_between(x, y1, y2, color='skyblue', alpha=0.5): Plots the area between two lines defined by y1 and y2 along the x-axis. color specifies the color of the filled area, and alpha controls the transparency.
    • Other lines have the same meaning as in the Line Plot.
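
For cumulative data or compositions, stacking the series is often clearer than filling the gap between them. The following sketch uses stackplot on the same y1 and y2 values; the labels are made up for illustration, and the Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to view interactively
import matplotlib.pyplot as plt
import numpy as np

x = [1, 2, 3, 4, 5]
y1 = [1, 2, 3, 4, 5]
y2 = [5, 4, 3, 2, 1]

fig, ax = plt.subplots()
# Each series is drawn on top of the previous one
layers = ax.stackplot(x, y1, y2, labels=['y1', 'y2'], alpha=0.5)
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_title('Stacked Area Plot')
ax.legend(loc='upper left')

# The top edge of the stack is the running total of the series
top_edge = np.add(y1, y2)
```

With this data the total is constant, so the plot shows how the composition shifts from y2 to y1 while the overall height stays the same.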

    Violin Plot

    Use Case: Violin plots are used to visualize the distribution of a continuous variable across different categories. They combine aspects of box plots and kernel density estimation plots to provide insights into the data distribution.

    import matplotlib.pyplot as plt
    import numpy as np
    
    # Data
    data = [np.random.normal(0, std, 100) for std in range(1, 4)]
    
    # Plot
    plt.violinplot(data)
    plt.xlabel('Data')
    plt.ylabel('Values')
    plt.title('Violin Plot')
    plt.show()

    Explanation:

    • np.random.normal(0, std, 100): Generates 100 data points from a normal distribution with mean 0 and standard deviation std.
    • plt.violinplot(data): Plots the violin plot of the given data.
    • Other lines have the same meaning as in the Line Plot.
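
By default a violin plot shows only the density shape and the extremes. The sketch below turns on the median marker with showmedians=True and labels each violin with its standard deviation; the tick labels and the seeded generator are assumptions added for readability and repeatability, as is the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to view interactively
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # assumption: fixed seed for repeatable output
stds = [1, 2, 3]
data = [rng.normal(0, std, 100) for std in stds]

fig, ax = plt.subplots()
# violinplot returns a dict of the drawn parts (bodies, medians, extrema lines)
parts = ax.violinplot(data, showmedians=True)

# Violins are placed at positions 1, 2, 3 by default
ax.set_xticks([1, 2, 3])
ax.set_xticklabels([f'std = {std}' for std in stds])
ax.set_ylabel('Values')
ax.set_title('Violin Plot with Medians')
```

The returned dictionary can be used to restyle individual parts, for example by changing the face color of each entry in parts['bodies'].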

    Heatmap

    Use Case: Heatmaps are effective for visualizing the magnitude of values arranged in a two-dimensional grid, such as a matrix of pairwise correlations. They are commonly used in fields such as finance, biology, and social sciences to identify patterns or correlations in large datasets.

    import matplotlib.pyplot as plt
    import numpy as np
    
    # Data
    data = np.random.rand(10, 10)
    
    # Plot
    plt.imshow(data, cmap='hot', interpolation='nearest')
    plt.colorbar()
    plt.xlabel('X-axis')
    plt.ylabel('Y-axis')
    plt.title('Heatmap')
    plt.show()

    Explanation:

    • np.random.rand(10, 10): Generates a 10×10 array of random numbers.
    • plt.imshow(data, cmap='hot', interpolation='nearest'): Displays the heatmap of the data using a colormap (hot in this case). interpolation='nearest' draws each cell as a solid block of color without smoothing between neighboring cells.
    • plt.colorbar(): Adds a color bar to indicate the mapping of values to colors.
    • Other lines have the same meaning as in the Line Plot.
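
A common practical use of a heatmap is a correlation matrix. The sketch below is one way to build it, assuming four columns of made-up random data with hypothetical names; the diverging 'coolwarm' colormap and the fixed color range of -1 to 1 are design choices for this illustration, not part of the original example. The Agg backend is an assumption so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to view interactively
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)          # assumption: fixed seed for repeatable output
samples = rng.normal(size=(200, 4))     # 200 rows, 4 hypothetical variables
names = ['v1', 'v2', 'v3', 'v4']        # hypothetical column names

# Pairwise correlation between columns (rowvar=False treats columns as variables)
corr = np.corrcoef(samples, rowvar=False)

fig, ax = plt.subplots()
im = ax.imshow(corr, cmap='coolwarm', vmin=-1, vmax=1, interpolation='nearest')
fig.colorbar(im, ax=ax)
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names)
ax.set_yticks(range(len(names)))
ax.set_yticklabels(names)
ax.set_title('Correlation Heatmap')

# Write each coefficient inside its cell
for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        ax.text(j, i, f'{corr[i, j]:.2f}', ha='center', va='center')
```

Fixing vmin and vmax keeps the color scale comparable between different datasets, so that the same color always means the same correlation strength.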

    Hexbin Plot

    Use Case: Hexbin plots are used to visualize the distribution of a large number of points in a two-dimensional space. They are particularly useful when dealing with dense datasets and help identify patterns or clusters in the data.

    import matplotlib.pyplot as plt
    import numpy as np
    
    # Data
    x = np.random.randn(1000)
    y = np.random.randn(1000)
    
    # Plot
    plt.hexbin(x, y, gridsize=30, cmap='inferno')
    plt.colorbar()
    plt.xlabel('X-axis')
    plt.ylabel('Y-axis')
    plt.title('Hexbin Plot')
    plt.show()

    Explanation:

    • np.random.randn(1000): Generates 1000 random data points from a standard normal distribution.
    • plt.hexbin(x, y, gridsize=30, cmap='inferno'): Plots the hexbin plot with the given x and y values. gridsize determines the number of hexagons in the x-direction.
    • plt.colorbar(): Adds a color bar to indicate the mapping of values to colors.
    • Other lines have the same meaning as in the Line Plot.
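
The color of each hexagon encodes how many points fall inside it. As a sketch, the counts can be read back from the object that hexbin returns, and mincnt=1 hides hexagons that contain no points. The seeded generator, the colorbar label, and the Agg backend are assumptions for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to view interactively
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # assumption: fixed seed for repeatable output
x = rng.standard_normal(1000)
y = rng.standard_normal(1000)

fig, ax = plt.subplots()
# mincnt=1 draws only hexagons that contain at least one point
hb = ax.hexbin(x, y, gridsize=30, cmap='inferno', mincnt=1)
fig.colorbar(hb, ax=ax, label='points per hexagon')
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_title('Hexbin Plot (empty cells hidden)')

# One count per drawn hexagon
counts = hb.get_array()
print(f"{len(counts)} non-empty hexagons, densest holds {int(counts.max())} points")
```

A smaller gridsize produces larger hexagons and smoother counts; a larger one shows finer detail at the cost of more nearly empty cells.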

    Each of these plots serves a specific purpose and can be chosen based on the type of data being visualized and the insights you want to derive from it.
