Knowledge Sources

Train your WhatsApp AI agent using web pages, documentation files, and manual Q&A pairs. Manage how context is ingested, vectorized, and retrieved during customer chats.

Vector Ingestion & Retrieval (RAG)

SyncFlo uses Retrieval-Augmented Generation (RAG) to keep your WhatsApp agent's answers grounded in your own content. When you upload or link a source, SyncFlo automatically splits the text into chunks, computes a high-dimensional vector embedding for each chunk, and indexes the embeddings in a secure vector database.

When a customer asks a question on WhatsApp, the system runs a semantic vector search to retrieve the most relevant text chunks, passes them to the LLM as context, and generates a response from that context. This reduces hallucinations and keeps answers within the content you have provided.
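
The chunk, embed, and retrieve flow described above can be sketched in a few lines. This is a minimal illustration, not SyncFlo's implementation: the function names are hypothetical, and a bag-of-words count vector stands in for a real embedding model so the example runs with only the standard library.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the retrieved chunks and the question into one LLM prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"
```

In a production pipeline the toy `embed` function would be replaced by a learned embedding model, and the sorted list by a vector database query; the overall shape of the flow stays the same.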

Supported Source Types

Files & Documents

Upload document files to train the AI on user manuals, catalogs, terms of service, or company profiles.

  • Formats: PDF, DOCX, TXT, MD.
  • Maximum file size: 10MB per file.
  • Files must contain selectable text. Scanned images need to be run through OCR before upload.
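
If you prepare uploads in bulk, a local pre-check against the limits above can catch problems early. The sketch below is illustrative only: `upload_problems` is a hypothetical helper, and it assumes "10MB" means 10 × 1024 × 1024 bytes. It cannot detect scanned PDFs that lack selectable text.

```python
from pathlib import Path

# Formats and size limit taken from the list above.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}
MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: 10MB is interpreted as 10 MiB


def upload_problems(filename: str, size_bytes: int) -> list[str]:
    """Return a list of reasons the file would be rejected (empty if none)."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds 10MB limit")
    return problems
```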

Websites & URLs

Point the crawler to your web domain to scrape product listings, help centers, blog posts, or landing pages.

  • Automated link discovery.
  • Crawl frequency controls.
  • Domain restrictions to prevent external crawling.
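
To illustrate what a domain restriction does during link discovery, the sketch below keeps only links on the same host as the starting URL. It is an assumption about the general technique, not SyncFlo's crawler logic; `in_scope` is a hypothetical name, and the exact-host rule (subdomains are treated as external) is one possible policy.

```python
from urllib.parse import urljoin, urlparse


def in_scope(link: str, base_url: str) -> bool:
    """True if link resolves to an http(s) URL on the same host as base_url."""
    base_host = urlparse(base_url).hostname
    # Resolve relative links such as "/pricing" against the base URL first.
    target = urlparse(urljoin(base_url, link))
    return target.scheme in ("http", "https") and target.hostname == base_host
```

With `https://docs.syncflo.ai` as the base, a relative link like `/pricing` stays in scope, while links to other hosts or `mailto:` addresses are skipped.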

Q&A Pairs

Manually define exact question-and-answer pairs when you need precise control over how the agent responds.

  • Takes priority over broader file and website content.
  • Matches exact brand voice.
  • Perfect for sensitive policy topics or pricing objections.
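
One way to picture how a Q&A pair takes priority over general content is a lookup that checks curated pairs first and only falls back to retrieval when nothing matches. How SyncFlo actually matches questions to pairs is not documented here, so the normalized exact match below, along with the names `normalize` and `answer_source`, is purely an assumption for illustration.

```python
import re
from typing import Callable


def normalize(question: str) -> str:
    """Lowercase and strip punctuation so trivially different phrasings match."""
    return " ".join(re.findall(r"[a-z0-9]+", question.lower()))


def answer_source(
    question: str,
    qa_pairs: dict[str, str],
    fallback: Callable[[str], str],
) -> tuple[str, str]:
    """Prefer a curated Q&A pair; otherwise defer to general retrieval."""
    key = normalize(question)
    for stored_question, stored_answer in qa_pairs.items():
        if normalize(stored_question) == key:
            return ("qa_pair", stored_answer)
    return ("retrieval", fallback(question))
```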

Configuring Web Crawlers

Web crawler sources allow SyncFlo to regularly fetch text from public URLs to keep knowledge fresh. When setting up a website source, you can define how often it updates and which pages it scans.

Crawl Frequencies

  • Manual Only: Ingestion runs once during creation. Subsequent crawls must be triggered manually using the Refresh button in the source list.
  • Weekly: SyncFlo automatically recrawls and updates vector embeddings every 7 days at midnight.
  • Monthly: SyncFlo recrawls the site every 30 days to pick up text changes and newly added pages.
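
The three frequencies can be expressed as a small scheduling rule. The sketch below is an illustration under stated assumptions: `next_crawl` is a hypothetical helper, the weekly run is assumed to happen at 00:00 on the seventh day after the last crawl, and no time zone is modeled because the text above does not specify one.

```python
from datetime import datetime, timedelta
from typing import Optional

# Intervals taken from the frequency list above.
INTERVAL_DAYS = {"weekly": 7, "monthly": 30}


def next_crawl(last_crawl: datetime, frequency: str) -> Optional[datetime]:
    """Return when the next automatic crawl is due, or None for manual sources."""
    if frequency == "manual":
        return None  # only the Refresh button triggers a recrawl
    due = last_crawl + timedelta(days=INTERVAL_DAYS[frequency])
    if frequency == "weekly":
        # Assumption: weekly crawls are aligned to midnight of the due date.
        due = due.replace(hour=0, minute=0, second=0, microsecond=0)
    return due
```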

Crawl Configuration Best Practices

  1. Provide direct, canonical URLs (e.g. https://docs.syncflo.ai).
  2. Ensure the page is public and does not require cookie agreements or login credentials.
  3. Exclude media-heavy domains to reduce crawler overhead.
  4. Provide a structured sitemap (sitemap.xml) for faster indexing.
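
To check what a crawler would find in your sitemap before adding the source, you can list its URLs locally. The sketch below parses a standard sitemap.xml (sitemaps.org 0.9 namespace) with the Python standard library. `sitemap_urls` is a hypothetical helper, and it does not follow sitemap index files.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> list[str]:
    """Extract every <loc> URL from a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
```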

AI Q&A Generator

Automate Knowledge Base Training

Writing hundreds of manual Q&A pairs can be exhausting. SyncFlo's AI Q&A Generator extracts potential questions and answers directly from your uploaded files or crawled website text.

How It Works:

  1. Upload a detailed document (like an Employee Handbook or Product Spec Sheet) and wait for vector processing.
  2. In the source list, locate the source and click the ✨ Generate action button.
  3. SyncFlo starts a background LLM pipeline that reads the document chunks and drafts 20 to 50 candidate Q&A pairs covering questions customers are likely to ask.
  4. The generated pairs appear in the Q&A Pairs list, pre-marked with a New Suggestion badge.
  5. Review, modify, or approve the suggestions. Approved pairs are instantly saved with 100% retrieval confidence.

Source Status & Lifecycle

Monitor the health and index status of each source from the dashboard:

Active

The source is parsed and vector chunks are fully indexed. The AI will retrieve matching context from this source during conversation lookups.

Inactive

The source is saved but disabled. Chunks remain stored, but the retrieval layer will ignore this source when answering user questions.

Processing

The document is being parsed, cleaned, and split into chunks. Vector embeddings are currently being generated and saved.

Error

Ingestion failed. This usually happens when a PDF is corrupt, password-protected, or contains no readable text (for example, a scanned image), or when the crawler is blocked from the site.
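
The four statuses and their effect on retrieval can be modeled as a simple enumeration. This is a conceptual sketch, not SyncFlo's data model: the class and function names are hypothetical. It captures the rule stated above that only Active sources contribute context to answers.

```python
from dataclasses import dataclass
from enum import Enum


class SourceStatus(Enum):
    ACTIVE = "active"          # indexed and used for retrieval
    INACTIVE = "inactive"      # chunks stored, ignored by retrieval
    PROCESSING = "processing"  # parsing and embedding in progress
    ERROR = "error"            # ingestion failed


@dataclass
class Source:
    name: str
    status: SourceStatus


def retrievable(sources: list[Source]) -> list[Source]:
    """Only Active sources are consulted when answering a customer."""
    return [s for s in sources if s.status is SourceStatus.ACTIVE]
```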

Content Best Practices

DO INCLUDE:

  • Clear Q&A formats for common customer questions.
  • Accurate policies regarding returns, shipping, and refunds.
  • Detailed catalogs, properties, or product parameters.
  • Specific instructions on when to transfer chats to human support agents.

DO NOT INCLUDE:

  • API keys, passwords, or secure network tokens.
  • Outdated pricing packages or expired campaigns.
  • Confidential business files or customer personal information.
  • Unsupported non-text formats (such as images, ZIP archives, or spreadsheets).
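
Before uploading a text file, a quick local scan can flag lines that look like credentials or personal data. The patterns below are illustrative assumptions, not a complete detector and not a SyncFlo feature. `find_sensitive` is a hypothetical helper, and anything it reports should be reviewed by a person.

```python
import re

# Illustrative patterns only; real secrets and personal data take many forms.
SENSITIVE_PATTERNS = {
    "possible secret assignment": re.compile(
        r"(?i)\b(api[_-]?key|password|secret|token)\b\s*[:=]\s*\S+"
    ),
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def find_sensitive(text: str) -> list[tuple[int, str]]:
    """Return (line number, label) for each line matching a sensitive pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings
```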