
Using Text Extraction Rules

AddSearch automatically excludes certain page elements, such as sidebars, headers, and footers, from the search index, because these elements usually repeat across many pages and rarely contain content that is relevant to search results.

Text extraction rules let you override these defaults by including elements that AddSearch normally excludes or by excluding additional elements you do not want in your search index. For example, you can include a sidebar with important information, or exclude popups and cookie consent messages that appear after page load.

These rules are configured in the AddSearch dashboard and do not require any changes to your website's source code. Alternatively, you can use special HTML attributes in your page templates to control indexing.


How to configure text extraction rules

Follow these steps to include elements in, or exclude elements from, the AddSearch index:

  1. Identify the CSS selector of the element you want to include or exclude. For guidance, see Identifying CSS selectors.
  2. Create a text extraction rule in the AddSearch dashboard using the identified selector.
  3. Recrawl the affected pages to apply the changes to the search index.
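Before saving a rule, it can help to confirm that the selector from step 1 matches only the elements you intend, because a broad selector such as div in an Exclude rule could remove most of a page. The following Python sketch counts how many elements in a saved HTML page a selector targets. The helper count_matches is hypothetical, and it supports only single simple selectors (tag, .class, #id), not the full CSS syntax the dashboard may accept.

```python
from html.parser import HTMLParser


class SelectorCounter(HTMLParser):
    """Counts elements matching one simple selector: 'tag', '.class', or '#id'."""

    def __init__(self, selector):
        super().__init__()
        self.selector = selector
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser's default handle_startendtag also routes self-closing
        # tags such as <br/> through this method.
        attrs = {name: value or "" for name, value in attrs}
        sel = self.selector
        if sel.startswith("."):
            hit = sel[1:] in attrs.get("class", "").split()
        elif sel.startswith("#"):
            hit = attrs.get("id") == sel[1:]
        else:
            hit = tag == sel.lower()
        if hit:
            self.count += 1


def count_matches(html, selector):
    """Hypothetical helper: how many elements in html the selector would target."""
    counter = SelectorCounter(selector)
    counter.feed(html)
    counter.close()
    return counter.count


page = """
<div class="sidebar"><p>Opening hours</p></div>
<div id="content"><p>Article text</p></div>
<div class="cookie-consent">We use cookies</div>
"""
print(count_matches(page, "div"))              # 3: too broad for an Exclude rule
print(count_matches(page, ".cookie-consent"))  # 1: targets only the banner
```

A count higher than you expect is a sign to make the selector more specific before saving the rule.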

Setting text extraction rules in the AddSearch dashboard

  1. Log in to your AddSearch account.
  2. Navigate to Domains and crawling > Text extraction rules.

To include an element in the index

  1. Click Add new text extraction rule +.
  2. Paste the CSS selector into the input field.
  3. Choose Include from the rule type dropdown.
  4. Click Save to apply the rule.

To exclude an element from the index

  1. Click Add new text extraction rule +.
  2. Paste the CSS selector into the input field.
  3. Choose Exclude from the rule type dropdown.
  4. Click Save to apply the rule.
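The dashboard applies these rules for you, but it can be useful to reason about how Include and Exclude rules interact. The following Python sketch models one possible behavior; it is not AddSearch's implementation. It assumes that the default-excluded elements can be approximated by the header, footer, nav, and aside tags, that an Exclude rule always wins over an Include rule, and that only simple selectors (tag, .class, #id) are used. The names extract_text, matches, and TreeBuilder are hypothetical.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical stand-ins for the headers, footers, and sidebars skipped by
# default; AddSearch's real detection logic is not modeled here.
DEFAULT_EXCLUDED_TAGS = {"header", "footer", "nav", "aside"}
NEVER_INDEXED_TAGS = {"script", "style"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


@dataclass
class Node:
    tag: str
    attrs: dict
    children: list = field(default_factory=list)  # child Nodes and text strings


class TreeBuilder(HTMLParser):
    """Builds a minimal element tree using only the standard library."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", {})
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, {name: value or "" for name, value in attrs})
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)  # void elements never get an end tag

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: record the node without opening it.
        self.stack[-1].children.append(Node(tag, {n: v or "" for n, v in attrs}))

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def matches(node, selector):
    """Supports only single simple selectors: 'tag', '.class', or '#id'."""
    if selector.startswith("."):
        return selector[1:] in node.attrs.get("class", "").split()
    if selector.startswith("#"):
        return node.attrs.get("id") == selector[1:]
    return node.tag == selector.lower()


def extract_text(html, rules):
    """Returns indexable text; rules is a list of (selector, "include" | "exclude")."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    includes = [sel for sel, kind in rules if kind == "include"]
    excludes = [sel for sel, kind in rules if kind == "exclude"]

    def skipped(node):
        if node.tag in NEVER_INDEXED_TAGS:
            return True
        if any(matches(node, sel) for sel in excludes):
            return True  # assumption: an Exclude rule always wins
        if node.tag in DEFAULT_EXCLUDED_TAGS:
            return not any(matches(node, sel) for sel in includes)
        return False

    words = []

    def walk(node):
        for child in node.children:
            if isinstance(child, str):
                words.extend(child.split())
            elif not skipped(child):
                walk(child)  # a skipped element drops its whole subtree

    walk(builder.root)
    return " ".join(words)


page = """
<header>Site name</header>
<main><h1>Pricing</h1><p>Plans start here.</p>
<div class="cookie-consent">We use cookies</div></main>
<aside id="key-facts">Free trial available</aside>
<footer>Copyright</footer>
"""
print(extract_text(page, []))
print(extract_text(page, [("#key-facts", "include"), (".cookie-consent", "exclude")]))
```

With no rules, the sketch indexes "Pricing Plans start here. We use cookies"; with the two rules, the cookie banner disappears and the aside's "Free trial available" is indexed instead. In this model an Include rule has to target the default-excluded element itself, because a skipped parent drops its whole subtree. Either way, the real effect only shows up in the index after the affected pages are recrawled.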

Recrawling to update the index

After adding or modifying text extraction rules, recrawl the affected pages so that the changes take effect in the search index.

Using text extraction rules helps improve your search results by fine-tuning which parts of your pages are indexed and which are ignored.