Yes, and the architecture is straightforward once you stop trying to make Copilot Studio do it. You mirror authoritative documents from SharePoint, OneDrive, or Azure Blob into a vector index the agent owns, then let Microsoft Graph webhooks and Event Grid notifications drive incremental updates. Edits in the source propagate to the agent in single-digit minutes. No re-uploads.
Most teams hit this problem after they have already tried the obvious paths. Copilot Studio's SharePoint connector still does not automatically refresh when files change, a limitation Microsoft confirmed in July 2025 that has not been fixed at the time of writing. Azure AI Search can reach SharePoint through an indexer, but the indexer cannot sit behind Conditional Access, has only basic ACL support in preview, and requires you to build the agent layer yourself. The pattern this guide describes uses Retell AI as the voice layer because its knowledge base API accepts authenticated pushes from your own sync worker, which sidesteps every limitation Microsoft's first-party path imposes.
The end state: a voice agent that answers from a curated mirror of your private corpus, with edits propagating end to end in five to fifteen minutes and a daily reconciliation pass that catches anything the event stream loses.
By the end, your stack will:

- Mirror authoritative documents from SharePoint, OneDrive, or Azure Blob into a knowledge base the agent owns
- Propagate source edits to the live agent in five to fifteen minutes, with no manual re-uploads
- Run a daily reconciliation pass that catches anything the real-time event stream drops
Before you start, you'll need:

- A Microsoft Entra app registration with Graph application permissions granted by a tenant admin (Sites.Selected is preferred, Sites.Read.All is the wider fallback)
- A sync worker with a public HTTPS endpoint that can receive webhook notifications
- A Retell AI account with access to the knowledge base API
- For Azure Blob sources, the Storage Blob Data Reader role on the target container

Because the tools were designed for a different job. Copilot Studio was built to summarize content for a human reading a screen, not to feed a voice agent that has under one second to compose a spoken answer. Azure AI Search's SharePoint indexer was built for enterprise search, where a four-to-six-hour refresh window is fine. Neither product was designed around the assumption that a billing policy edited at 9 a.m. needs to be the answer a caller hears at 9:08 a.m.
Three constraints separate voice from text. The first is latency budget. A retrieval call has to return in under 100 milliseconds during a live conversation, so the index has to be on the same network as the agent, and the chunks have to be small enough to embed inline without blowing the LLM's context window. The second is failure tolerance. When a chatbot returns the wrong link, the user clicks again. When a voice agent quotes a retired policy on a recorded compliance call, you have a problem with audit logs attached to it. The third is freshness expectation. Sales teams adjust pricing on Tuesdays. Support teams ship policy updates after a Friday incident review. The agent that quotes last week's number is worse than no agent at all.
This is why the architecture splits the source of truth from the retrieval layer. SharePoint stays canonical. The vector index is a derived artifact that the sync pipeline keeps current. Failure modes become observable instead of silent.
Pick by where the document is authored, not by where it's easiest to wire up. Each source signals something different about how the content gets maintained.
SharePoint document libraries are the right primary source when ownership is collaborative and content evolves through committee review. Service catalogs, policy manuals, internal wikis, and sales playbooks tend to live here, with version history, comments, and approval flows attached. The cost is metadata sprawl: a typical site contains four versions of the same policy, two retired drafts, and someone's screenshot folder.
OneDrive is rarely the right primary source for a voice agent. Content tied to one person's account walks out the door when that person leaves. Use OneDrive only as a staging area where individual contributors author drafts that get promoted to a SharePoint library after review.
Azure Blob containers are the right source when documents are produced by upstream systems. Generated PDFs from a billing pipeline, contracts dropped by a CLM tool, statements from an export job, and transcripts from a recording system all belong in Blob. The volume is higher, the freshness is harder to fake, and the file naming follows whatever convention the producing system enforces, which makes mirror-and-sync simpler than SharePoint's free-form structure.
Most enterprise teams sync both. SharePoint feeds the policy and product side, Blob feeds the operational and machine-generated side, and the voice agent's knowledge base merges both behind a single retrieval API.
Register an application in Microsoft Entra ID, request application permissions, and have a tenant admin grant consent.
Application permissions run under a service principal. The worker keeps running through the night without a logged-in user, and the token is consistent across calls. Delegated permissions, by contrast, tie access to a real user account and force a re-auth roughly every 75 minutes under the token lifetimes Microsoft's current security libraries enforce. Delegated permissions also cannot preserve document-level ACLs, as Microsoft's own SharePoint indexer documentation notes, which becomes a problem the moment compliance asks who can hear what.
Scope is where most teams overshare. Granting Sites.Read.All lets the app read every site in the tenant. It is fast at setup and indefensible in an audit. The tighter alternative is Sites.Selected, where a tenant admin pre-authorizes the app against specific site IDs only. The worker sees the libraries the agent needs and nothing else. Use Sites.Selected from day one. Backfilling site-level scoping after the fact requires re-consent and usually a security review you didn't budget for.
For Azure Blob, assign the same service principal the Storage Blob Data Reader role on the specific container. Container-level scope beats account-level scope for the same reason.
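A minimal sketch of the app-only token acquisition, assuming the MSAL Python library; TENANT_ID, CLIENT_ID, and CLIENT_SECRET are placeholders for your Entra app registration:

```python
# App-only (client credentials) token for Microsoft Graph via MSAL.
import msal

app = msal.ConfidentialClientApplication(
    client_id="CLIENT_ID",
    authority="https://login.microsoftonline.com/TENANT_ID",
    client_credential="CLIENT_SECRET",
)

# ".default" resolves to whatever application permissions the admin consented to,
# e.g. Sites.Selected against the pre-authorized sites.
result = app.acquire_token_for_client(scopes=["https://graph.microsoft.com/.default"])
if "access_token" not in result:
    raise RuntimeError(result.get("error_description", "token acquisition failed"))
token = result["access_token"]  # MSAL caches and silently renews this for you
```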
Combine two Microsoft Graph primitives. Webhooks tell you when something happened. Delta queries tell you exactly what changed.
A webhook subscription is a POST to /subscriptions with a resource path pointing at the drive (for example, /sites/{site-id}/drive/root), a changeType of updated, and a notificationUrl aimed at your worker's HTTPS endpoint. The notification body is intentionally thin. It carries the resource ID and the change type, nothing more. This is by design. The worker uses the notification as a signal to call the delta endpoint, where the actual payload lives.
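A hedged sketch of that POST using the requests library; the worker URL is a placeholder and `token` is the app-only token from the MSAL sketch above:

```python
import datetime
import requests

# Modest lifetime; your renewal job should extend it at 75% of the
# maximum, as discussed below.
expiry = (datetime.datetime.now(datetime.timezone.utc)
          + datetime.timedelta(days=2)).isoformat()

resp = requests.post(
    "https://graph.microsoft.com/v1.0/subscriptions",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "changeType": "updated",
        "notificationUrl": "https://worker.example.com/graph-hook",  # must echo Graph's validation token at creation
        "resource": "/sites/{site-id}/drive/root",
        "expirationDateTime": expiry,
        "clientState": "opaque-shared-secret",  # echoed in notifications so you can verify the sender
    },
)
resp.raise_for_status()
sub = resp.json()  # persist sub["id"] and sub["expirationDateTime"] for the renewal job
```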
The delta query is /sites/{site-id}/drive/root/delta. On the first run, with no token, you get a full enumeration plus an opaque @odata.deltaLink. Persist that link verbatim. On every subsequent run, replay it and Graph returns only the items added, modified, renamed, or deleted since the last call. Microsoft's scan guidance is explicit: webhooks plus delta is the recommended pattern for large libraries, because pure polling will get you throttled, and pure webhooks will lose data if the endpoint is slow.
Two pieces of folklore worth knowing. The same item can appear more than once in a delta page, by design, because Graph expands folder hierarchies and merges concurrent changes. When duplicates appear, take the last occurrence. Subscriptions also expire. Renew at 75% of the maximum lifetime, not at the deadline. A renewal job that fails silently is the single most common reason a previously-working sync starts drifting onto stale content, and you find out when a customer complains.
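Here is one way the delta replay could look, with naive file-based persistence standing in for whatever store you actually use; run_delta is a hypothetical helper, not a Graph SDK call:

```python
import os
import requests

DELTA_FILE = "delta_link.txt"  # naive persistence; swap for your own store

def load_delta_link() -> str | None:
    return open(DELTA_FILE).read().strip() if os.path.exists(DELTA_FILE) else None

def save_delta_link(link: str) -> None:
    with open(DELTA_FILE, "w") as f:
        f.write(link)

def run_delta(token: str, site_id: str) -> list[dict]:
    # First run: full enumeration. Later runs: replay the stored deltaLink.
    url = (load_delta_link()
           or f"https://graph.microsoft.com/v1.0/sites/{site_id}/drive/root/delta")
    latest: dict[str, dict] = {}
    while url:
        page = requests.get(url, headers={"Authorization": f"Bearer {token}"})
        page.raise_for_status()
        body = page.json()
        for item in body.get("value", []):
            latest[item["id"]] = item  # duplicates are expected; last occurrence wins
        url = body.get("@odata.nextLink")
        if url is None:
            save_delta_link(body["@odata.deltaLink"])  # replay this next run
    return list(latest.values())  # deleted items carry a "deleted" facet
```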
Use Event Grid for real-time notifications and the change feed for batch reconciliation. They solve different problems and the production answer is to run both.
Event Grid pushes events the moment a blob is created, replaced, or deleted. Subscribe at the storage account, filter on eventType for Microsoft.Storage.BlobCreated and Microsoft.Storage.BlobDeleted, and route to your worker. For Azure Data Lake Storage Gen2, add a filter on the FlushWithClose API call. This ensures the event fires only after the blob is fully committed. Skip this and you'll process partial uploads, which produces ingestion errors that look like file corruption but aren't.
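A sketch of the worker's receiving endpoint, using Flask purely for illustration; enqueue_blob_change stands in for your real queue:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def enqueue_blob_change(event_type: str, blob_url: str) -> None:
    print("queued", event_type, blob_url)  # stand-in for your real queue

@app.post("/blob-events")
def blob_events():
    for event in request.get_json():
        # Event Grid validates new subscriptions with a handshake event.
        if event["eventType"] == "Microsoft.EventGrid.SubscriptionValidationEvent":
            return jsonify({"validationResponse": event["data"]["validationCode"]})
        if event["eventType"] in ("Microsoft.Storage.BlobCreated",
                                  "Microsoft.Storage.BlobDeleted"):
            api = event["data"].get("api", "")
            if event["eventType"].endswith("BlobCreated") and api == "CreateFile":
                continue  # ADLS Gen2: skip the open; wait for FlushWithClose
            enqueue_blob_change(event["eventType"], event["data"]["url"])
    return "", 200
```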
The change feed is the ordered, durable log behind the events. Per Microsoft's documentation, it provides a guaranteed transaction log persisted as Avro files in $blobchangefeed/log/, written within a few minutes of each change. Event Grid is best-effort and can drop notifications under load. The change feed cannot. Run a daily job that walks the change feed and reconciles against your knowledge base manifest, and you have a safety net underneath the real-time path.
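A reconciliation sketch, assuming the azure-storage-blob-changefeed package; the enqueue call is the same placeholder as above:

```python
import datetime
from azure.storage.blob.changefeed import ChangeFeedClient

def reconcile(account_url: str, credential) -> None:
    # Replay yesterday's guaranteed log and re-enqueue anything Event Grid
    # dropped. Replays are safe because uploads are keyed on content hash
    # (see the idempotency section below).
    client = ChangeFeedClient(account_url, credential=credential)
    since = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=1)
    for event in client.list_changes(start_time=since):
        enqueue_blob_change(event["eventType"], event["subject"])
```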
The combination matters. Event Grid alone is fast but lossy. The change feed alone is reliable but slow. Together they give you minute-scale freshness on the happy path and full consistency by morning on the unhappy path.
The worker downloads the file, normalizes it, and pushes it to the platform API. Three operations: create, update, delete. Rename collapses to delete-plus-create.
The Retell AI knowledge base accepts a long list of document formats including PDF, DOCX, PPTX, XLSX, CSV, TSV, TXT, MD, HTML, RTF, ODT, EPUB, plus message formats and several image types. The constraints worth knowing: 50MB per file, 25 files per base, and 1,000 rows by 50 columns for spreadsheets. Markdown ingests the cleanest when you control the source format, which is why teams often run a normalization step that converts authored Word documents to Markdown before pushing.
When the file count grows past 25, split bases by domain rather than by department. A billing agent linked to "billing-policies-en", "service-catalog-2026", and "exception-cases" retrieves cleanly across all three because retrieval similarity is computed per chunk, not per base. Splitting by department, on the other hand, creates the wrong boundaries. The same caller question often spans two departments, and the agent will retrieve from only one of them.
For the operational side, idempotency is what saves you when the same notification arrives twice. Hash each file's content and use the hash as the document identifier. A duplicate notification for an unchanged file produces a no-op rather than a duplicate index entry.
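A minimal sketch of that idempotency rule; the manifest dict and the two knowledge base calls are placeholders for your own persistence and platform API:

```python
import hashlib

def delete_from_knowledge_base(doc_id: str) -> None: ...  # placeholder
def push_to_knowledge_base(doc_id: str, path: str, content: bytes) -> None: ...  # placeholder

def sync_file(path: str, content: bytes, manifest: dict[str, str]) -> None:
    doc_id = hashlib.sha256(content).hexdigest()  # content hash as document identity
    if manifest.get(path) == doc_id:
        return  # duplicate notification, unchanged content: no-op
    if path in manifest:
        delete_from_knowledge_base(manifest[path])  # update collapses to delete+create
    push_to_knowledge_base(doc_id, path, content)
    manifest[path] = doc_id
```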
A pure text extractor will silently lose 30 to 40 percent of the meaning in a real PowerPoint or financial workbook. The agent will then confidently quote the surviving 60 percent, including the parts that no longer make sense without the table they came from.
PowerPoints encode information in slide layout, table cells, image-based callouts, and speaker notes. Excel encodes meaning in column headers that span merged cells, in formulas that reference other tabs, and in tab order. Naive text extraction returns a flat string of words with the structure stripped. Two paths survive in production.
The first is layout-aware parsing. Tools like Unstructured, Azure Document Intelligence, or LlamaParse preserve table cells and slide structure as Markdown. They are cheaper per document and predictable in output. The downside is they handle tables well but charts poorly.
The second, which has gained traction since mid-2025, is image-based extraction. Render each slide or sheet to an image and pass it through a vision-capable LLM that returns Markdown. The output recovers tables, charts, and visual callouts that text extractors miss. The cost is higher per document and slower, which makes this the right path for the documents that actually matter and the wrong path for bulk ingestion of everything in a SharePoint site.
The decision rule that holds up: route policy documents through the cheap layout-aware path, route a small set of high-value visual documents (one-pagers, slide decks executives reference, the rate cards your sales team actually uses) through the image-based path. Don't try to use one approach for everything.
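The rule reduces to a few lines; how documents get tagged high-value is up to you, and the tag name here is invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class SourceDoc:
    name: str
    tags: set[str] = field(default_factory=set)  # e.g. set by the content owner

def pick_parser(doc: SourceDoc) -> str:
    # Default everything to the cheap layout-aware path; reserve the
    # slower vision path for documents explicitly tagged as high-value.
    return "vision" if "high-value-visual" in doc.tags else "layout"
```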
Recursive character splitting at 512 tokens with 10 to 20 percent overlap, three chunks retrieved at default similarity. Tune from there based on your own call data.
This is not the popular answer. The popular answer is semantic chunking, which sounds smarter and benchmarks worse. The Vecta benchmark published in early 2026 put recursive 512-token splitting at 69 percent retrieval accuracy and semantic chunking at 54 percent on the same 50-document corpus. NVIDIA's research lands in the same place: factoid queries (the kind a voice agent gets) perform best at 256 to 512 tokens, with 10 to 20 percent overlap to preserve sentence context across boundaries.
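One way to get that split, shown with LangChain's token-aware splitter; the library choice is illustrative, and normalized_markdown stands for the Markdown produced by your parsing step:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",  # count in tokens, not characters
    chunk_size=512,
    chunk_overlap=77,             # ~15%, inside the 10-20% band
)
chunks = splitter.split_text(normalized_markdown)
```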
The practical implication for sync: changing chunk size or embedding model invalidates every existing chunk in the base. If retrieval suddenly performs worse after a tuning pass, do not patch incrementally. Drop the base, re-ingest the source, and accept the few-hour reprocessing cost. The clarity you get on every future answer is worth the redo.
Retrieval threshold tuning happens after you have call data, not before. Pull 50 to 100 calls from post-call analysis, tag the wrong-chunk failures, and adjust either the chunking of the offending source or the similarity threshold. Most teams overtune at first. The default settings are correct for 80 percent of use cases.
Build a closed-loop test that traces from edit to spoken answer with a unique testable phrase. This is the most useful five-minute check in the entire pipeline.
Pick a document and edit a unique numeric value in it. "Premium tier discount: 12.5%" becomes "Premium tier discount: 14.0%". Save in SharePoint. Within five to fifteen minutes (Graph notification, delta processing, embedding), the change should be live in the index. Place a test call asking the question. If the agent says 14.0%, the loop works.
When it doesn't, the failure isolates cleanly. Did the worker receive the webhook? Check your endpoint logs. Did the delta query return the file? Check the worker logs. Did the upload succeed? Check the API response. Did the chunk make it into retrieval? Check the call's retrieval log in post-call analysis. Each layer answers a yes/no question, and you find the broken layer in under ten minutes.
Run this loop after every meaningful pipeline change. New chunking strategy, new embedding model, new source, new sync worker version. If the loop closes, the change is safe to ship. If it doesn't, you have a precise reproduction of the bug.
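A hedged harness for that loop; query_agent_retrieval is a placeholder for however you exercise retrieval in a test, whether a scripted test call or an API hook:

```python
import time

def query_agent_retrieval(question: str) -> str:
    raise NotImplementedError("wire this to your retrieval test hook")

def closed_loop(question: str, expected: str, budget_s: int = 15 * 60) -> bool:
    # Poll until the edited value surfaces or the freshness budget expires.
    deadline = time.time() + budget_s
    while time.time() < deadline:
        if expected in query_agent_retrieval(question):  # e.g. expected="14.0%"
            return True
        time.sleep(60)
    return False  # budget blown: walk the layers described above
```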
Three layers, applied in this order. Prune at the source. Constrain at the prompt. Observe at the call.
Pruning at the source is the layer most teams skip and pay for later. When a policy is retired, move the file out of the synced folder. The cleanest pattern is a synced/ and archive/ directory split inside the same SharePoint library, with the worker only watching synced/. Two indexed versions of the same policy are a recipe for confident contradictions, and you cannot debug your way out of conflicting source documents.
Constraining at the prompt is layer two. Instruct the agent to answer only from retrieved knowledge base context and to escalate when none is available. Public benchmarks show grounded RAG cuts hallucination rates by 26 to 43 percent against ungrounded LLMs. The lift only holds when retrieval surfaces the right document. A "no answer found, transferring you" response is almost always better than a confident wrong one, and an escalation rule that triggers on low-similarity retrieval is one of the highest-leverage settings in the agent.
Observing at the call is layer three and where the loop closes. Tag every miss with a category: missing source, wrong chunk retrieved, outdated content, model misinterpretation. Each category has a different fix. Missing sources go into the next ingestion pass. Wrong chunks usually mean two documents discuss similar topics with different vocabulary, which fixes with metadata tags or by splitting the source. Outdated content traces back to a webhook gap that the daily reconciliation should have caught, and didn't, which is a bug in your reconciliation job.
For a team handling 5,000 calls per month at three minutes each, knowledge base usage is the smallest line item by an order of magnitude.
Retell AI bills $0.07 per minute for the base call cost, with knowledge base usage at $0.005 per minute on top. Each workspace includes 10 free knowledge bases. Additional bases are $8 per month. For 15,000 minutes of monthly call time, knowledge base usage adds $75. The base call cost is $1,050. Compare both to the SDR or front-desk salary the agent is offsetting and the math becomes obvious. Pricing is consistent whether you use one base or seven, and there is no platform fee on top.
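The arithmetic, spelled out:

```python
minutes = 5_000 * 3         # 5,000 calls/month at three minutes each = 15,000 min
kb_cost = minutes * 0.005   # knowledge base usage -> $75.00
base_cost = minutes * 0.07  # base call cost      -> $1,050.00
print(kb_cost, base_cost)   # 75.0 1050.0
```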
The hidden costs are on the Microsoft side, and they are usually small but easy to misconfigure. Microsoft Graph charges per call. Azure Event Grid charges per million operations. Both are pennies at typical sync volume. The way you turn this into a real bill is by polling SharePoint every 30 seconds instead of subscribing to webhooks. A bug like that has spiked Azure consumption charges by a factor of fifty for at least one team I've seen. Webhook plus delta keeps the bill flat regardless of how often the source changes.
These recur often enough across deployments that they deserve a checklist.
Webhook subscription expiration. Subscriptions don't renew themselves. Add expiration to your alerting and renew at 75 percent of the maximum lifetime. Silent expiration is the most common drift cause.
Deletion handling. Most teams ship sync that handles created and updated files and forgets removed ones. The agent then quotes a policy that was retired three months ago. Wire the deleted change type as a first-class case, not an edge case.
Tenant-wide read scope. Granting Sites.Read.All at setup is fast. Six months later when an auditor asks which sites the voice service principal can see, "all of them" is the wrong answer. Use Sites.Selected from the start.
Documents over the 50MB ceiling. Long manuals fail upload silently when they exceed the per-file limit. Pre-process oversize documents by splitting on logical boundaries (chapter, section, product line) and uploading each piece as its own document; a sketch follows this checklist. Keep parent-child metadata so retrieval can stitch them back if needed.
Drift between staging and production knowledge bases. Teams build a sync pipeline against a staging base, copy the agent config to production, and forget the production agent is still pointed at last quarter's manually-uploaded files. Make the knowledge base ID explicit in your deployment config and verify it after every release.
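For the oversize-document item above, a sketch that cuts a Markdown manual at top-level headings; upload_document is a placeholder for your knowledge base API call:

```python
import re

def upload_document(doc_id: str, content: str, metadata: dict) -> None: ...  # placeholder

def split_and_upload(doc_id: str, markdown: str) -> None:
    # Split before H1/H2 headings so each piece is a self-contained section.
    pieces = re.split(r"\n(?=#{1,2} )", markdown)
    for i, piece in enumerate(pieces):
        upload_document(
            doc_id=f"{doc_id}-part{i:03d}",
            content=piece,
            metadata={"parent": doc_id, "order": i},  # lets retrieval stitch parts back
        )
```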
Yes. The architecture is service-principal authentication via Microsoft Entra ID, files pulled over an authenticated Graph channel, and uploads pushed to the knowledge base API over HTTPS. Source documents stay in SharePoint with their existing ACLs. The knowledge base holds an indexed copy used only for retrieval during calls, and you can scope the service principal to a single site if compliance requires it.
Five to fifteen minutes end to end on the happy path. The breakdown: Graph webhook delivery within a few minutes, delta query and download in under a minute, parsing and embedding in one to three minutes depending on document size. Once indexed, retrieval adds under 100 milliseconds during the call itself, so callers don't perceive a pause.
A daily reconciliation job replays the delta query and compares results against the knowledge base manifest, catching anything Event Grid or Graph missed. The combination of real-time webhooks and a daily delta sweep is the standard pattern, recommended in Microsoft's own engineering writing precisely because notifications are best-effort.
Yes. The same worker pattern applies to Google Drive (via the Drive Activity API), Amazon S3 (via S3 Event Notifications), Confluence, Notion, and any source that emits change events. The knowledge base API is source-agnostic. The only piece that changes is the auth and event-listening code.
Copilot Studio's SharePoint connector does not auto-refresh when files change, a limitation Microsoft confirmed in mid-2025 and had not fixed at the time of writing. Workarounds involve Power Automate flows that trigger manual refreshes. Beyond the sync problem, Copilot Studio is built for chat surfaces and does not produce sub-second voice latency. For phone agents, the architecture in this guide is the path.
Retell AI ships with SOC 2 Type II, HIPAA support with a self-service BAA, and GDPR compliance, plus configurable data retention and PII redaction. For healthcare workloads, the standard pattern is to gate the BAA before any PHI touches the index. Compliance posture against your specific regulatory regime is yours to validate. The infrastructure-level certifications cover the platform, not your content classification policy.
Yes. An agent can have multiple knowledge bases linked, and each base can pull from a different source pipeline. A common pattern is one base per logical domain (billing, support, product specs) with sources cleanly separated. Conversation flow nodes can also bind a different base to a specific node when sales and support paths need distinct context.
One engineer who's comfortable with OAuth and webhooks for the sync layer, one ops owner for the agent build and prompt tuning, and a content owner who decides what belongs in the synced folder. The split that works in practice: IT owns the worker and the auth; ops owns the agent; content team owns what's in scope. The architecture supports a single-person setup for proof of concept and scales without re-architecture.
Azure AI Search's SharePoint indexer handles ingestion well but has hard limits worth knowing. It does not support tenants with Conditional Access enabled. ACL preservation is in public preview, not GA. Refresh latency runs in hours, not minutes. And you still need to build the voice agent layer on top. For pure enterprise search, AI Search is fine. For voice, the sync-into-Retell-AI pattern is operationally simpler and faster to deploy.
Recursive 512-token splitting with 10 to 20 percent overlap. This is the benchmark-validated default across 2026 evaluations and outperforms semantic chunking by roughly 15 points on real document corpora. Three chunks retrieved at default similarity. Tune only after you have 50 to 100 real call transcripts to inform the change.
The sync pipeline is the unglamorous half of voice AI, and it's also the half that decides whether the deployment ships or stalls. A pilot that "works on demo data" and falls over the moment a policy changes is the signature failure mode in this space, and the architecture above is the fix.
Once the knowledge base is solid, the same agent expands into adjacent workflows on the same data. AI customer support for inbound questions, lead qualification for outbound, a receptionist that routes to either. The corpus you've curated for one becomes the source of truth for all of them.
Start free with $10 in credit at retellai.com.