How to Analyze Amazon Keywords with Python

For brands, Amazon provides the search term report in Brand Analytics to analyze the search and click behavior of Amazon's customers. This helps to operationalize your Amazon SEO activities.

In this post I want to show you how how to analyze this data using a standard tool like Python, Pandas and Jupyter Notebooks. With some lines of code, we demonstrate how to quickly find the most used search terms, seasonal search terms, a brand's performance, and much more.

To follow along you need:

A working installation of Jupyter Lab. We recommend that you install Anaconda.
Access to Brand Analytics to download the CSV version of the search term report

To make some interesting analyses it makes sense to not only load the current report but e.g. the 52 weeks of reports. You can download those as well in Brand Analytics. Takes some time, but it is doable.

Once you have it, we need to further make some preparations to load and work with it.

Ready? Let's go!

Setup

First things first. We need to load all necessary libraries:

# Setup
import pandas as pd
import numpy as np
import time

import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import seaborn as sns
import datetime
from datetime import timedelta
import glob

%config InlineBackend.figure_format='retina'
plt.rcParams["figure.dpi"] = 200

Let's also define some helper methods. These methods are not necessary if you would like to only analyze the data but we use them here to better show the results in this blog post.

# Get link to an ASIN
def get_ASIN_Link(ASIN, domain="US"):
    if domain == "DE":
      domain = "de"
    elif domain == "US":
      domain = "com"
    link = '<a href="https://www.amazon.' + domain + '/dp/' + ASIN + '" target="_blank">' + ASIN + '</a>'
    return link

# Define a function for colouring (red for negative, changes number format)
def highlight_max(s):
    if s.dtype == np.object:
        is_neg = [False for _ in range(s.shape[0])]
    else:
        is_neg = s < 0
    return ['color: red;' if cell else 'color:black' 
        for cell in is_neg]

Define some variables

First, let's create some variables we can use later on and make changes to our script easier:

# Common
currentWeek  = '33'
previousWeek = '32'
market       = 'US'
yearString   = '2021'

yearWeekCurrent = yearString + '-' + currentWeek
yearWeekPrevious = yearString + '-' + previousWeek

# Variables
pathToProducts = "/your/path/to/products.csv"
columnsInProducts = ['ASIN', 'marketplaceTitle', 'category', 'brand', 'productImageUrl', 'productTitle'] # US, adjust for other markets

pathToReports = "/your/path/to/brandanalyticsdata/" 
columnsInReports = ["Search Term","Search Frequency Rank","#1 Clicked ASIN","#2 Clicked ASIN","#3 Clicked ASIN"] # US, adjust for other markets
columnsInReportsSearchTerm = ["Search Term","Search Frequency Rank"] # US, adjust for other markets

thousandSeparator = "," # US, adjust for other markets

# For two weeks based analysis only:
currentWeekPath = pathToReports + "Amazon-Searchterms_US_2021_" + currentWeek + ".csv"
previousWeekPath = pathToReports + "Amazon-Searchterms_US_2021_" + previousWeek + ".csv"

Our script expects that your files should follow the following naming format:

...
Amazon-Searchterms_US_2021_27.csv
Amazon-Searchterms_US_2021_28.csv
Amazon-Searchterms_US_2021_29.csv
Amazon-Searchterms_US_2021_30.csv
Amazon-Searchterms_US_2021_31.csv
Amazon-Searchterms_US_2021_32.csv
Amazon-Searchterms_US_2021_33.csv
...

This allows us later to extract the marketplace, the year, and the calendar week from the filename. This data will be added later to our data frame to allow for time series analysis.

Here is a quick tip: As Amazon's search term reports are quite large (1 Mio. lines per week) you can shorten those via this quick shell script you can run e.g. in a bash shell:

for file in ./csv/Amazon*.csv; do
    head -n 500000 "$file" > "short/$file"
done

This will reduce each file to the first 500,000 lines and save a short version to the short folder. If you analyze e.g. 1 year of data this will reduce the speed of your analyses significantly.

Importing Brand Analytics search term data

First, we need to load all search term reports into memory. Again, the script expects that these files sit together in a single directory with some strict naming conditions:

# Initialze empty dataframe we'll append data to
dfFinal = pd.DataFrame()

# Get all files
all_files = sorted(glob.glob(pathToReports  + "/*.csv"))
print ('\n'.join(all_files))

# Import files to df
i = 0
for file in all_files:
  i = i + 1
  print(str(i) + ". " + file)

  # Read from CSV file
  dfTemp = pd.read_csv(file, thousands=thousandSeparator, usecols=columnsInReports, engine="python", error_bad_lines=True, encoding='utf-8', skiprows=1,  sep=",")

  # Add week from filename to dataframe as new column
  week = file[-6:][:2] # e.g. 06
  year = file[-11:][:4] # e.g. 2020
  yearWeek = year + '-' + week # e.g. 2020-06
  marketplaceTitle = file[-14:][:2]
  dfTemp['week'] = week
  dfTemp['year'] = year
  dfTemp['yearWeek'] = yearWeek
  dfTemp['marketplaceTitle'] = marketplaceTitle

  # Concat with previous results
  dfFinal = pd.concat([dfTemp, dfFinal])

# We rename ASIN1 into `1` as this becomes handy when we unmelt the report
dfFinal.columns = ['searchterm', 'rank', '1', '2', '3', 'week', 'year', 'yearWeek', 'marketplaceTitle']

# Change data type to int
dfFinal = dfFinal.astype({"week": int, "year": int, "year": int})

Dataframe with search term data

Some of the analyses shown below only require loading the current and the previous week of data. If you want to do that, you can just load the 2 respective files, e.g. like this:

dfCurrent = pd.read_csv(currentWeekPath, thousands=thousandSeparator, usecols=columnsInReportsSearchTerm, engine="python", error_bad_lines=True, encoding='utf-8', skiprows=1,  sep=",")

dfPrevious= pd.read_csv(previousWeekPath, thousands=thousandSeparator, usecols=columnsInReportsSearchTerm, engine="python", error_bad_lines=True, encoding='utf-8', skiprows=1,  sep=",")

The thousandSeparator parameter depends on your locale. It's either the dot or a comma. For US reports it is the comma.

We will need to unmelt all this data. More on that soon.

Before we also load some products data (e.g. the brand) for each ASIN found in all search term report files which will be handy to enrich the data later:

dfProducts = pd.read_csv(pathToProducts, engine="python", error_bad_lines=True, encoding='utf-8', sep=",")

If you don't have this data, you can extract this with some work from the product title of the search term reports. However, this is not covered here.

Re-formatting / shaping the data

Next, we need to bring our data into the right format. Basically, we need to "unpivot" the data we have which will make the following analyses possible.

Pandas provides a very handy function for that called melt (see documentation), which makes this very easy:

# Unmelt dataframe

# Create a copy
dfWideMultiReports = dfFinal

# Unmelt from wide to long
dfWideMultiReportsLong = dfWideMultiReports.melt(id_vars=["searchterm", "rank", "week", "year", "yearWeek", "marketplaceTitle"], var_name="position", value_name="ASIN")

# Add column with a 1 so we can sum by this colum
dfWideMultiReportsLong['count'] = 1

# Make position an int
dfWideMultiReportsLong = dfWideMultiReportsLong.astype({"position": int})

Unmelted data

Now we have one row for each search term, ASIN, week combination.

Next, we enrich our search term report data with the product data to e.g. have a column for the ASIN's brand:

# Left join products data to searchterm data
dfWideMultiReportsLongExtended = pd.merge(left=dfWideMultiReportsLong, right=dfProducts, how='left', left_on=['marketplaceTitle','ASIN'], right_on = ['marketplaceTitle','ASIN'])

# Drop rows where searchterm or brand is NaN
dfWideMultiReportsLongExtended = dfWideMultiReportsLongExtended.dropna(subset=['searchterm', 'ASIN', 'rank', 'week', 'year', 'yearWeek', 'marketplaceTitle', 'position', 'brand', 'productTitle']) # Only dop rows with N/A in specific columns

We also drop rows where we are missing some data using the dropna method.

The final dataframe we are going to work with looks like this now:

Extended data

Analysis of the data

Now as we have everything in place we can start doing some simple analysis. We will start by analyzing changes between the current and the previous week. As shown above we have this data in our dfCurrent and dfPrevious dataframes.

We will look at:

Search terms
Brands
Products

Search term analysis

Prepare the data

We need to merge our 2 dataframes containing the current and the previous week of data for some analyses.

# Change column names
dfCurrent.columns = ['searchterm', 'currentRank']
dfPrevious.columns = ['searchterm', 'previousRank']

# Make currentRank and previousRank an integer
dfCurrent = dfCurrent.astype({"currentRank": int})
dfPrevious = dfPrevious.astype({"previousRank": int})

# Merge current and previous dataframes
dfMerged = pd.merge(left=dfCurrent, right=dfPrevious, how='left', left_on=['searchterm'], right_on = ['searchterm'])
dfPrevious = dfPrevious.astype({"previousRank": int})

# Calculate the changes between current and previous
dfMerged['DeltaAbs']=dfMerged['currentRank']-dfMerged['previousRank']

# Positive value of Change (to see biggest movers in absolute terms)
dfMerged['DeltaAbsPos']=dfMerged['DeltaAbs'].abs()

With that out of the way, let's do some more fun stuff.

Get the top search terms of the current week

Let's start slow with some simple tables. Let's get the top search terms from the current week and show the difference to the previous week. We also add some colors based on the changes using TailwindCSS classes. However, this is not necessary for the sole analysis of the data. It just makes the following tables prettier.

# Get current Top 10 with previous
dfTop10Searchterms = dfMerged
dfTop10Searchterms = dfTop10Searchterms.drop('DeltaAbsPos', 1)

dfTop10Searchterms.columns = ['Searchterm', 'Rank cur.', 'Rank prev.', 'Change']
dfTop10Searchterms['Change'] = dfTop10Searchterms.apply(lambda row: '<span class="text-green-600">↑ ' + str(int(np.nan_to_num(row.Change))) + '</span>' if row.Change < 0 else '<span class="text-red-600">↓ ' + str(int(np.nan_to_num(row.Change))) + '</span>', axis=1)
searchTermsTop = dfTop10Searchterms.head(10).to_markdown(index=False)

The dfTop10Searchterms dataframe looks like this then:

Searchterm	Rank cur.	Rank previous	Change
disposable face masks	1	1	↓ 0
face mask	2	2	↓ 0
pop it	3	3	↓ 0
desk	4	5	↑ -1
iphone 12 pro max case	5	7	↑ -2
earbuds	6	6	↓ 0
n95 mask	7	4	↓ 3
apple watch band	8	11	↑ -3
black disposable face masks	9	24	↑ -15
laptop	10	15	↑ -5

The search term black disposable face masks has quite improved compared to the previous week. Remember, the lower the search frequency rank, the more often this term is searched on Amazon.

Get newcomer search terms

Now we can try to find the top search terms (based on search frequency rank) where there was no search frequency rank in the previous week:

# Get new entries where there was no rank in previous week
dfTopNewcomer = dfMerged[dfMerged['previousRank'].isnull()]
dfTopNewcomer = dfTopNewcomer.drop(['previousRank','DeltaAbs', 'DeltaAbsPos'], 1)
dfTopNewcomer = dfTopNewcomer.sort_values(by=['currentRank'], ascending=True).head(10)

dfTopNewcomer.columns = ['New searchterms', 'Rank cur.']
searchTermsNewcomer=dfTopNewcomer.head(10).to_markdown(index=False)

This will e.g. look like this:

New searchterms	Rank cur.
pixel 5a 5g case	2027
infinity game table	2459
cicy bell womens casual blazer	5234
donsfield lint remover	6945
go oats oatmeal in a ball	9887
ivernectin	10034
go oats	10367
ruthless empire rina kent	10711
cicy bell blazer	11057
pixel 5a 5g screen protector	11222

As you can see these are very often long-tail keywords.

Find search terms with the highest rank change

Now let's look at the "mover's and shakers" of high-traffic search terms. With "high-traffic" we mean that we reduce the result to search terms with a search frequency rank below 1,000. Feel free to play with this parameter to get different results.

# Only include searchterms which are now < 1000
filterWinners = (dfMerged['currentRank'] < 1000)

# Winners
dfWinners = dfMerged[filterWinners].sort_values(by=['DeltaAbs'], ascending=True).head(10)

# Drop DeltaAbsPos column
dfWinners = dfWinners.drop('DeltaAbsPos', 1)

# Rename columns
dfWinners.columns = ['Searchterm', 'Rank CW ' + currentWeek, 'Rank CW ' + previousWeek, 'Delta']

For the winners this will look like this:

Searchterm	Rank CW 33	Rank CW 32	Delta
black american flags	847	14,708	-13,861
paw patrol movie	523	4,817	-4,294
sexy dresses for women	762	2,924	-2,162
headphones for kids for school	869	2,460	-1,591
room decor aesthetic	825	2,373	-1,548
hunger games	996	2,412	-1,416
eyebrow pencil	480	1,859	-1,379
throw pillow covers	355	1,641	-1,286
bentgo kids lunch box	705	1,902	-1,197
respect	927	2,076	-1,149

Top movers

Let's find out for which high-traffic search terms (again with a search frequency rank below 1,000) the search frequency rank has changed most compared to the previous week. We include both positive and negative changes as we sort by the absolute changes here.

# Top Movers within a certain rank threshold
filterMovers = (dfMerged['previousRank'] < 1000) & (dfMerged['currentRank'] < 1000)
dfMerged = dfMerged.astype({"currentRank": float}) # Needs to be there otherwise currentRank column is string?!
dfMovers = dfMerged[filterMovers].sort_values(by=['DeltaAbsPos'], ascending=False).head(10)
dfMovers['DeltaAbs'] = dfMovers.apply(lambda row: '<span class="text-green-600">↑ ' + str(int(np.nan_to_num(row.DeltaAbs))) + '</span>' if row.DeltaAbs < 0 else '<span class="text-red-600">↓ ' + str(int(np.nan_to_num(row.DeltaAbs))) + '</span>', axis=1)

# Drop DeltaAbs column
dfMovers = dfMovers.drop('DeltaAbsPos', 1)

# Rename columns
dfMovers.columns = ['Searchterm', 'Rank CW ' + currentWeek, 'Rank CW ' + previousWeek, 'Delta']

The result looks like this:

Searchterm	Rank CW 33	Rank CW 32	Delta
field of dreams	998	196	↓ 802
friday the 13th	902	210	↓ 692
pop it fidget toys	241	929	↑ -688
n95 masks for virus protection	911	395	↓ 516
wireless earbuds bluetooth	302	785	↑ -483
college essentials	875	454	↓ 421
fall clothes for women	297	678	↑ -381
air conditioner	583	206	↓ 377
usb c cable	618	991	↑ -373
iphone charging cords	949	594	↓ 355

Of course, you can change the thresholds as you like.

Brand analyses

If we have the brand attached to each entry, we can also analyze by brand. Here we use our big data frame instead of the 2 separate weeks.

Let's find out which brands show up most often:

filtCurrentWeek = (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekCurrent)
filtPreviousWeek = (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekPrevious)

dfBrandCurrentWeek = dfWideMultiReportsLongExtended[filtCurrentWeek]
dfBrandPreviousWeek = dfWideMultiReportsLongExtended[filtPreviousWeek]

dfTopBrandsCurrentWeek = dfBrandCurrentWeek['brand'].value_counts().sort_values(ascending=False).to_frame('countCurrent')
dfTopBrandsPreviousWeek = dfBrandPreviousWeek['brand'].value_counts().sort_values(ascending=False).to_frame('countPrevious')

brandsTop = dfTopBrandsCurrentWeek.head(10).to_markdown().replace('countCurrent', 'Count')

The brandsTop dataframe will look like this:

	Count
Amazon Basics	9670
Apple	5620
Nike	5469
365 by Whole Foods Market	3577
adidas	3546
SAMSUNG	3171
Amazon	3079
HP	3002
Generic	2809
LEGO	2737

We just count by a brand's appearance in the search term report. You could include the search frequency rank to give search terms with a low rank a higher weight. This was not done here.

Let's now find out which brands have improved compared to the previous week. Again we do a plain count here:

dfTopBrandsMerged = dfTopBrandsCurrentWeek.merge(dfTopBrandsPreviousWeek, how='left', left_index=True, right_index=True)
dfTopBrandsMerged['DeltaAbs'] = dfTopBrandsMerged['countCurrent'] - dfTopBrandsMerged['countPrevious']
dfTopBrandsMerged['DeltaAbsPos'] = dfTopBrandsMerged['DeltaAbs'].abs()

# Winners and Losers
dfBrandLoosers = dfTopBrandsMerged.sort_values(by=['DeltaAbs'], ascending=True).head(10)

dfBrandWinners = dfTopBrandsMerged.sort_values(by=['DeltaAbs'], ascending=False).head(10)
dfBrandWinners.columns = ['# CW ' + currentWeek, '# CW ' + previousWeek, 'Delta', 'Delta Abs']
dfBrandWinners = dfBrandWinners.drop('Delta Abs', 1)
brandsWinner = dfBrandWinners.to_markdown()

The brandsWinner dataframe will look like this:

	# CW 33	# CW 32	Delta
Amazon	3079	2833	246
SAMSUNG	3171	2939	232
Bentgo	814	638	176
HP	3002	2863	139
Disguise	820	690	130
Texas Instruments	718	589	129
DXLOVER	249	131	118
Generic	2809	2696	113
Logitech	1208	1095	113
Spooktacular Creations	293	182	111

Product analyses

What we did with search terms and brands can also be done with products. Let's look for the products which show up most or more often compared to the previous week. We will use our full data set again here.

First, we'll filter the big dataset, then we'll merge the current and the previous week, and then we'll count the number of occurrences.

filtCurrentWeek = (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekCurrent)
filtPreviousWeek = (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekPrevious)

dfProductCurrentWeek = dfWideMultiReportsLongExtended[filtCurrentWeek]
dfProductPreviousWeek = dfWideMultiReportsLongExtended[filtPreviousWeek]

dfTopProductsCurrentWeek = dfProductCurrentWeek['ASIN'].value_counts().sort_values(ascending=False).to_frame('countCurrent')
dfTopProductsPreviousWeek = dfProductPreviousWeek['ASIN'].value_counts().sort_values(ascending=False).to_frame('countPrevious')

# ASIN is index, create column from it, add marketplaceTitle for making the merge possible
dfTopProductsCurrentWeek['ASIN'] = dfTopProductsCurrentWeek.index
dfTopProductsCurrentWeek['marketplaceTitle'] = market

dfTopProductsPreviousWeek['ASIN'] = dfTopProductsPreviousWeek.index
dfTopProductsPreviousWeek['marketplaceTitle'] = market

# Add product data, e.g. product title
dfTopProductsCurrentWeek = pd.merge(left=dfTopProductsCurrentWeek, right=dfProducts, how='left', left_on = ['marketplaceTitle', 'ASIN'], right_on = ['marketplaceTitle', 'ASIN'])

# Merge current with previous df
dfTopProductsMerged = dfTopProductsCurrentWeek.merge(dfTopProductsPreviousWeek, how='left', left_on = ['marketplaceTitle', 'ASIN'], right_on = ['marketplaceTitle', 'ASIN'])

# Calculate the changes
dfTopProductsMerged['DeltaAbs'] = dfTopProductsMerged['countCurrent'] - dfTopProductsMerged['countPrevious']
dfTopProductsMerged['DeltaAbsPos'] = dfTopProductsMerged['DeltaAbs'].abs()

# Shorten the product title and add ASIN
dfTopProductsMerged['productTitleShort'] = dfTopProductsMerged['productTitle'].astype(str).replace('|', '-').str[0:30] + ' (' + get_ASIN_Link(dfTopProductsMerged['ASIN']) + ')'

# Get rid of some columns and prepare output
dfTopProductsForOutput = dfTopProductsMerged[['productTitleShort', 'countCurrent']]
dfTopProductsForOutput.columns = ['Product title', 'Count']

productsTop = dfTopProductsForOutput.head(10).to_markdown(index = False)

# Winners
dfProductsWinners = dfTopProductsMerged.sort_values(by=['DeltaAbs'], ascending = False)
dfProductsWinners = dfProductsWinners[['productTitleShort', 'DeltaAbs' ]]
dfProductsWinners.columns = ['Product title', 'Delta']
productsWinner = dfProductsWinners.head(10).to_markdown(index=False)

# Loosers
dfProductsLoosers = dfTopProductsMerged.sort_values(by=['DeltaAbs'], ascending = True)
dfProductsLoosers = dfProductsLoosers[['productTitleShort', 'DeltaAbs' ]]
dfProductsLoosers.columns = ['Product title', 'Delta']
productsLoser = dfProductsLoosers.head(10).to_markdown(index=False)

The first rows of productsTop look like this then:

Product title	Count
EnerPlex 3-Ply Reusable Face M (B088C2WD5F)	287
Matein Travel Laptop Backpack, (B06XZTZ7GB)	226
Blink Mini – Compact indoor pl (B07X6C9RMF)	219
Gildan Men's Crew T-Shirts, Mu (B091FNSFN1)	210
Crocs Men's and Women's Classi (B09BBT7T3P)	207
AIHOU 50PC Kids Butterfly Disp (B08LL8H78D)	195
Apple AirPods Pro (B07ZPC9QD4)	195
Beckham Hotel Collection Bed P (B01LYNW421)	179
Apple AirPods with Charging Ca (B07PXGQC1Q)	177
TCP Global Salon World Safety (B08FXLVHTY)	176

Again we only count the appearance. We have not weighted this by the search frequency rank of the respective keywords.

Respectively productsWinner looks like this:

Product title	Delta
KATCHY Indoor Insect and Flyin (B07B6RZP4H)	160
Texas Instruments TI-84 Plus C (B00TFYYWQA)	64
5 Layer Protection Breathable (B08F8Y3X59)	59
Columbia Baby Girls' Benton Sp (B077FNDN8N)	55
Fire 7 Kids Edition Tablet, 7" (B07H936BZT)	54
BISSELL Crosswave All in One W (B01DTYAZO4)	51
$20 PlayStation Store Gift Car (B004RMK4BC)	50
Germ Guardian True HEPA Filter (B004VGIGVY)	50
Texas Instruments TI-84 PLUS C (B00XOLOOPY)	47
SPACE JAM Heroes of Goo JIT Zu (B08S3GWM5W)	44

Let's now find out which are the hottest newcomer products and which products showed high changes. The code looks for this looks like this:

# Best newcoming products
filtBothWeeks = (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekCurrent) | (dfWideMultiReportsLongExtended['yearWeek'] == yearWeekPrevious)
dfProductCurrentWeek = dfWideMultiReportsLongExtended[filtBothWeeks]

df_pivot_current = pd.pivot_table(dfProductCurrentWeek,index=["ASIN"], columns=["yearWeek"], values=["searchterm", "rank"],aggfunc={"searchterm":len,"rank":min})
df_pivot_current['marketplaceTitle'] = market
df_pivot_current.columns= ['minRankPrevious', 'minRankCurrent', 'countPrevious', 'countCurrent', 'marketplaceTitle']
df_pivot_current.reset_index(inplace = True)

df_pivot_current.sort_values(by=['countCurrent'], ascending = False)

df_pivot_current['countDelta'] = df_pivot_current['countCurrent'] - df_pivot_current['countPrevious']
df_pivot_current.sort_values(by=['countDelta'], ascending = False)

# Add product data, e.g. product title
df_pivot_current_ext = pd.merge(left=df_pivot_current, right=dfProducts, how='left', left_on = ['marketplaceTitle', 'ASIN'], right_on = ['marketplaceTitle', 'ASIN'])

# Shorten the product title and add ASIN
df_pivot_current_ext['productTitleShort'] = df_pivot_current_ext['productTitle'].astype(str).replace('|', '-').str[0:30] + ' (' + get_ASIN_Link(df_pivot_current_ext['ASIN'], market) + ')'

# Products with most change between current an prev. week
df_pivot_current_ext_short = df_pivot_current_ext[['productTitleShort', 'countCurrent', 'minRankCurrent', 'countDelta']]
df_pivot_current_ext_short = df_pivot_current_ext_short.sort_values(by=['countDelta'], ascending = False)
df_pivot_current_ext_short.columns = ['Product title', '# searchterms', 'Best rank', 'Delta']

productsNewcomer = dfTopProductNewcomer.head(10).to_markdown(index=False)

Let's look at the movers and shakers first:

Product title	# searchterms	Best rank	Delta
KATCHY Indoor Insect and Flyin (B07B6RZP4H)	174	775	160
Texas Instruments TI-84 Plus C (B00TFYYWQA)	82	1914	64
5 Layer Protection Breathable (B08F8Y3X59)	92	1276	59
Columbia Baby Girls' Benton Sp (B077FNDN8N)	58	2817	55
Fire 7 Kids Edition Tablet, 7" (B07H936BZT)	66	1387	54
BISSELL Crosswave All in One W (B01DTYAZO4)	91	5223	51
Germ Guardian True HEPA Filter (B004VGIGVY)	98	149	50
$20 PlayStation Store Gift Car (B004RMK4BC)	66	6829	50
Texas Instruments TI-84 PLUS C (B00XOLOOPY)	65	1914	47
SPACE JAM Heroes of Goo JIT Zu (B08S3GWM5W)	52	1277	44

Let's figure out which are the best newcomer products:

# Best Newcomer Products (without rankings in previous week)
dfTopProductNewcomer = df_pivot_current_ext[df_pivot_current_ext['countPrevious'].isnull()]
dfTopProductNewcomer = dfTopProductNewcomer.sort_values(by=['countCurrent'], ascending = False)
dfTopProductNewcomer['productTitleShort'] = dfTopProductNewcomer['productTitle'].astype(str).replace('|', '-').str[0:30] + ' (' + get_ASIN_Link(dfTopProductNewcomer['ASIN'], market) + ')'

dfTopProductNewcomer = dfTopProductNewcomer[['productTitleShort', 'countCurrent', 'minRankCurrent']]
dfTopProductNewcomer = dfTopProductNewcomer.sort_values(by=['countCurrent'], ascending = False)
dfTopProductNewcomer.columns = ['Product title', '# searchterms', 'Best rank']

productsNewcomer looks like this:

Product title	# searchterms	Best rank
adidas Unisex-Child Questar Fl (B08TK7P6CN)	160	1966
Kids KN95 Face Mask - 25 Pack (B08Q8J5MNQ)	142	844
Crocs Unisex-Child Kids' Class (B08LDBPB3J)	50	2595
TeeTurtle - The Original Rever (B088X4XFNQ)	47	10620
adidas Women's Grand Court Sne (B09746H9SC)	45	588
Kids Face Mask, Disposable Kid (B08R8CHZN3)	40	17142
Potaroma Cat Wall Toys, 7.7 by (B08T9CL5T9)	31	16802
MIAODAM Face Mask Lanyards, Ma (B09CDDP2Z3)	28	279
KN95 Particulate Respirator - (B0876J4F4Y)	27	235
Earbuds Earphones Wired Stereo (B098L2DSS3)	26	3532

Additional Amazon Keyword and Brand Analyses

We can do so much more with this data. Let me show you some examples:

Single keyword analysis by week

Once we have loaded the data for e.g. the last 52 weeks we can plot the development of a single search term over time:

searchTerm = "pool"

# Filter data
dfSingleKeyword = dfFinal.loc[dfFinal['searchterm'] == searchTerm].sort_values('week')

# Plot the data
kwPlot = dfSingleKeyword.plot(x ='week', y='rank', figsize=(10,6), title="Keyword: " + searchTerm)
kwPlot.set_xlabel("Weeks")
kwPlot.set_ylabel("Rank")
kwPlot

This will give us a better understanding of how the demand for certain products behaves over time.

Here is a graph for the term "pool" shown above.

Graph for the search term "pool" in 2021

It's easy to see when the pool season started and ended in 2021.

Show generic search terms a specific brand ranks for

Here we search for all search terms the brand "OPPO" ranks for but does not include the brand's name. Sorted by rank ascending.

Here we assume that the data for a single brand analytics report sits in the dataframe called df:

# Show searchterms for a given brand and sort by rank ASC
myFilter = (df['brand'] == 'OPPO') & (df["Searchterm"].str.contains('oppo')==False)
df.loc[myFilter].sort_values(by=['Rank'])

This should give us generic search terms.

This of course can also be done for multiple brands across multiple reports:

brands = ["Kasa Smart", "ARENTI", "GW Security", "Blink Home Security"]
brandsLowerCase = [x.lower() for x in brands]

# Get all rows from selected brands
dfBrandTerms = dfWideMultiReportsLongExtended[dfWideMultiReportsLongExtended.brand.isin(brands)]

# Group result
dfBrandTermsGrouped = dfBrandTerms.groupby('searchterm')['rank'].agg(['mean']).sort_values(by=['mean'])

# Reset index
dfBrandTermsGrouped.reset_index(level=0, inplace=True)
dfBrandTermsGrouped = dfBrandTermsGrouped.astype({"mean": int})

The dfBrandTermsGrouped dataframe now looks like this:

Top keywords for selected security camera brands

If we only want to find generic keywords, let's remove the search terms which include a brand's name:

brandKeywordsWithoutBrand = dfBrandTermsGrouped[~dfBrandTermsGrouped.searchterm.str.contains('|'.join(brandsLowerCase))]

Want to export both all and non-brand keywords to Excel? Pandas has you covered:

with pd.ExcelWriter('camera-export-brand-keywords.xlsx') as writer:  
    brandKeywordsWithoutBrand.to_excel(writer, sheet_name='Generic Keywords', float_format="%.2f")
    dfBrandTermsGrouped.to_excel(writer, sheet_name='All Keywords', float_format="%.2f")

Classify search terms

We could even go one step further and try to classify this for every search term in the report using this function.

To do this, we first need to put all brands into a list:

# Get unique list of brands
dfWideMultiReportsLongExtended['brand'].nunique()
allBrands = dfWideMultiReportsLongExtended.brand.unique()

# Lower each name
allBrands = [ x.lower() for x in allBrands if len(str(x)) > 3 ]

Now we can analyze each row in our dataframe and check if the search term contains the brand's brand name, the brand name of a different brand, or is a generic searchterm.

def searchTermType (row):
    brand = str(row['brand']).lower()
    searchterm = str(row['searchterm']).lower()
    
    if (not pd.isnull(brand)) & (brand in searchterm):
        return 'brand'
    if not pd.isnull(brand):
        for brand2 in allBrands:
            if not pd.isnull(brand2):
                if brand2.lower() in searchterm:
                    return 'other-brand'
    return 'generic'

This function checks, if the search term contains any brand's name.

To apply this function to a small sample of dataframe we need to run the following command:

dfPlay = dfWideMultiReportsLongExtended.head(10)
dfPlay['searchTermType'] = dfPlay.apply(searchTermType, axis=1)

Show a brand's performance over time

dfAll = dfWideMultiReportsLongExtended
brandsInScope = ["Nike", "adidas", "Under Armour"]

# Create pivot table
dfPivot = pd.pivot_table(dfAll[dfAll['brand'].isin(brandsInScope)],index=["yearWeek"], columns=["brand"], values=["ASIN"], aggfunc='count', fill_value=0)

# Plot data
plt = dfPivot.plot(fontsize=4, figsize=(4, 2))
plt.legend(loc=7, prop={'size': 3}, bbox_to_anchor=(0.23,0.90))
plt.set_ylabel('Count', fontsize=5)
plt.set_xlabel('Week', fontsize=5)

The chart looks like this:

Development of specific brands over time

Conclusion

And there you have it. Although we covered quite some ground there is so much more you can do with this data.

I hope we could inspire you with some new ideas to try out.

You can find the full source code of the code above on Github. This of course excludes the search term data as this is only availabe to registered brand owners.

Analyze the Amazon Brand Analytics search term report with Python and Jupyter Notebooks