Crawler

Meet PlexviaBot.

PlexviaBot is the crawler Plexvia uses to read a business’s own public website, only when that business asks us to learn from it. This page explains how it behaves, and how to authorize or block it.

Website ingestion crawler
What it is

A crawler that only reads sites it was invited to.

When someone sets up Plexvia for their business and asks us to learn from their website, PlexviaBot visits that site and reads its public pages, the same pages any visitor can see. It uses what it finds to help Plexvia answer questions grounded in the business’s own knowledge.

PlexviaBot only ever visits a site because the business that owns it requested website ingestion inside Plexvia. It does not crawl the open web, and it does not follow links off to other domains.

How to identify it

One user agent, one address.

PlexviaBot always identifies itself the same way. Use these exact values to recognise it in your logs, or to allow or block it in your security tools.

User agent: PlexviaBot/1.0 (+https://plexvia.com/crawler)
Crawler IP address: 34.198.249.238

PlexviaBot requests only public pages, such as your homepage, robots.txt, and sitemap, over standard HTTPS.
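
As an illustration of how these two values might be used together when reviewing logs, here is a minimal Python sketch. The helper name classify_request and its three labels are hypothetical, and the sketch assumes the single address published above is the only one the crawler uses.

```python
import ipaddress

# Values published on this page.
PLEXVIABOT_UA = "PlexviaBot/1.0 (+https://plexvia.com/crawler)"
PLEXVIABOT_IP = ipaddress.ip_address("34.198.249.238")


def classify_request(remote_ip: str, user_agent: str) -> str:
    """Hypothetical helper: label a logged request by user agent and source IP."""
    claims_bot = user_agent.strip() == PLEXVIABOT_UA
    try:
        from_bot_ip = ipaddress.ip_address(remote_ip.strip()) == PLEXVIABOT_IP
    except ValueError:
        from_bot_ip = False  # malformed address in the log line
    if claims_bot and from_bot_ip:
        return "plexviabot"  # both published values match
    if claims_bot:
        return "unverified"  # right user agent, different address
    return "other"
```

A request that carries the PlexviaBot user agent from a different address is labelled "unverified" rather than trusted, which is why an allow rule is best keyed on both values.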

How it behaves

Gentle by design.

PlexviaBot is built to read enough to be useful and nothing more. It is deliberately quiet.

A light touch

PlexviaBot makes only a few requests per site during setup, then steps away. It is not a continuous or high-volume crawler.

Short, patient timeouts

Each request waits only a few seconds and gives up quietly if a page is slow, so it never lingers or hammers your server.

Respects robots.txt

PlexviaBot reads your robots.txt first and honours any rule you set for it, including a full disallow.

Authorize PlexviaBot

If a security layer is blocking it.

Many sites sit behind a CDN, a web application firewall, or a security plugin that challenges or blocks automated requests. That is good practice, but it can also stop PlexviaBot from reading pages you actually want Plexvia to learn from. To let it through, add an allow rule for both of these:

  • Allow the IP address

    Allowlist 34.198.249.238 in your CDN, WAF, or security plugin.

  • Allow the user agent

    Permit requests identifying as PlexviaBot, and let them reach your public pages.

The exact steps differ by provider, whether you use Cloudflare, AWS, Fastly, Akamai, Sucuri, Wordfence, or another tool. When PlexviaBot is blocked, Plexvia shows you the precise, per-provider instructions right inside the app, tailored to what it detected on your site. You never have to disable security for everyone else; you only allow this one crawler.

You can confirm PlexviaBot reaches your site with a single request:

curl -I -A "PlexviaBot/1.0 (+https://plexvia.com/crawler)" https://yourdomain.com/
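
The same check can be scripted. The sketch below is a rough Python equivalent of the curl command, using only the standard library. The function names build_probe and probe_status and the 5-second timeout are assumptions for illustration. Because the request leaves from your own machine rather than from 34.198.249.238, it exercises only a user-agent rule, not an IP allowlist.

```python
import urllib.error
import urllib.request

PLEXVIABOT_UA = "PlexviaBot/1.0 (+https://plexvia.com/crawler)"


def build_probe(url: str) -> urllib.request.Request:
    """Build a HEAD request carrying the PlexviaBot user agent (nothing is sent yet)."""
    return urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": PLEXVIABOT_UA}
    )


def probe_status(url: str, timeout: float = 5.0) -> int:
    """Send the probe and return the HTTP status code, including 4xx/5xx replies."""
    try:
        with urllib.request.urlopen(build_probe(url), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 403 when a security layer rejects the request
```

Calling probe_status("https://yourdomain.com/") returns the status code your site sends back; a 403 or a challenge response suggests a security layer is still rejecting the user agent.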

Prefer not to be crawled

Blocking PlexviaBot is just as easy.

If you would rather Plexvia did not read your site, you are always in control. The simplest way is a rule in your robots.txt, which PlexviaBot honours:

User-agent: PlexviaBot
Disallow: /
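
To sanity-check the rule before publishing it, you can parse it offline. This minimal sketch uses Python's standard urllib.robotparser; it shows how that parser reads the rule and is not a description of PlexviaBot's own implementation. The helper name is_allowed is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The rule from this page, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: PlexviaBot
Disallow: /
"""

PLEXVIABOT_UA = "PlexviaBot/1.0 (+https://plexvia.com/crawler)"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this rule, is_allowed(ROBOTS_TXT, PLEXVIABOT_UA, "https://yourdomain.com/pricing") is False, while a browser user agent such as "Mozilla/5.0" is still allowed, since the rule names only PlexviaBot.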

You can also block the PlexviaBot user agent, or the IP address 34.198.249.238, in your CDN or WAF. Once website ingestion is blocked, Plexvia simply continues without learning from your site.

Blocking PlexviaBot never affects the rest of Plexvia: your inbox, chat, and team keep working exactly as before.

Everything, in one machine-readable file.

PlexviaBot’s name, user agent, IP address, and contact are also published as JSON, so tools and teams can verify them automatically.

Questions about a request from PlexviaBot? Reach us any time through the contact page.