You have a SharePoint list. It has 6,000 items. Several columns are Multiple lines of text with enhanced rich text turned on, and you need the values as plain text — without the <div class="ExternalClass..."> wrappers, without the <p> and <br> noise, just the words.
You can’t change the column type. HTML to Text inside an Apply to each takes hours and chokes on connector throttling.
This post walks through the actually-fast solution: SharePoint’s built-in FieldValuesAsText projection, the getitems POST endpoint, and a clean pagination loop. It also covers every gotcha you’ll hit along the way — there are several, and they’re all undocumented or buried in 2017-era GitHub issues.
Why the obvious approaches don’t scale
Before we get to the solution, here’s why the usual suggestions fail at 6,000+ items:
- Apply to each + HTML to Text action. ~1 second per item. 6,000 items = 100 minutes minimum, plus connector throttling. Often times out.
- Apply to each + nested `replace()` calls. Same loop overhead, just with no premium connector cost.
- `xpath(xml(...), 'string(/r)')` in a Select action. Vectorized (one action call for the whole array) and very fast — but breaks on any malformed HTML, like an unclosed `<br>` or `<img>` from someone who pasted out of Word.
- Office Scripts. Works great, but adds an Excel hop and has a daily script-call cap.
- PnP PowerShell. Best-in-class for one-time migrations, but you wanted this in a flow.
The real answer is sitting in the SharePoint REST API and almost no one in the Power Automate community knows about it.
The hidden gem: FieldValuesAsText
Every SharePoint list item exposes a server-computed projection called FieldValuesAsText. Ask for it, and SharePoint returns plain-text versions of every field — rich text, lookups, person/group, choice, you name it. The HTML stripping, entity decoding, lookup resolution, and date formatting all happen server-side. You don’t write a single regex.
The reason most flow builders never encounter it: Power BI’s SharePoint connector uses it transparently (which is why Power BI users never have this problem), but in Power Automate you have to call it yourself via Send an HTTP request to SharePoint.
Here’s the naive call:
GET _api/web/lists/getbytitle('TrackerList')/items(42)/FieldValuesAsText
This returns one item with every field already plain-text. The output looks like:
{
"ID": "42",
"Title": "Quarterly Review",
"Description_x005f_x0020_x005f_Field": "Plain text version of what was rich HTML.",
"Owner": "Jane Doe",
"Status": "Approved",
"Created": "9/21/2024 4:08 PM"
}
Beautiful. But you don’t want one item — you want all 6,000.
Why $expand=FieldValuesAsText doesn’t just work
The natural next step is to fetch the items collection and expand FieldValuesAsText:
GET _api/web/lists/getbytitle('TrackerList')/items?$expand=FieldValuesAsText&$top=5000
This is where you fall into a tar pit of three separate, undocumented behaviors that all look like the same problem (“the field is empty”). They’re not.
Gotcha 1: deferred expansion in verbose mode
With the default Accept: application/json;odata=verbose header, SharePoint returns FieldValuesAsText as a deferred link:
{
"FieldValuesAsText": {
"__metadata": {
"id": "...",
"uri": "..."
}
}
}
The object is empty because OData verbose mode defers expanded entities by default. Switch to nometadata and they get inlined.
Gotcha 2: only system fields come through on the collection endpoint
Even with nometadata, calling $expand=FieldValuesAsText on the items collection (versus a single item) silently filters down to intrinsic SharePoint fields — ID, Created, Modified, FileRef, etc. Your custom columns are simply absent from the response. No error, just missing data.
This is documented in sp-dev-docs issue #6011 as a known divergence between /items(<id>)/FieldValuesAsText and /items?$expand=FieldValuesAsText. Microsoft has acknowledged it but won’t fix it because production code now depends on it.
The workaround is to explicitly project each subfield in $select:
?$select=Id,FieldValuesAsText/Description_x0020_Field,FieldValuesAsText/Notes_x0020_Field&$expand=FieldValuesAsText
Gotcha 3: query string length limit
If you’re projecting 30+ rich-text fields, your URL grows past ~4 KB and SharePoint returns:
The length of the query string for this request exceeds the configured maxQueryStringLength value.
This is the wall that pushes you toward the getitems POST endpoint.
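To see how quickly explicit projection inflates the URL, here's a small Python sketch that builds the `$select` string for a batch of hypothetical internal field names (the list title and field names are placeholders, not from a real tenant):

```python
# Sketch of how explicit FieldValuesAsText projection inflates the query
# string. List title and field names are hypothetical placeholders.
def build_projection_url(list_title, fields):
    select = ",".join(["Id"] + [f"FieldValuesAsText/{f}" for f in fields])
    return (f"_api/web/lists/getbytitle('{list_title}')/items"
            f"?$select={select}&$expand=FieldValuesAsText&$top=5000")

# 35 rich-text columns with space-bearing internal names (_x0020_ is a space)
fields = [f"Long_x0020_Description_x0020_Field_x0020_{i:02d}" for i in range(35)]
url = build_projection_url("TrackerList", fields)
print(len(url))  # grows linearly with column count; long internal names add up fast
```

Each projected column costs the full `FieldValuesAsText/<InternalName>` prefix, so a few dozen columns with encoded spaces in their names is all it takes to hit the server's query string ceiling.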
The fix: getitems with a CAML body
getitems is the lesser-known sibling of the items collection endpoint. It’s a POST that accepts a CAML query in the body — and crucially, it returns all custom fields under FieldValuesAsText even without explicit $select projection. The body has no real size limit, so you can put column lists or filters there without worrying about URL length.
POST _api/web/lists/getbytitle('TrackerList')/getitems?$expand=FieldValuesAsText
Headers:
Accept: application/json;odata=nometadata
Content-Type: application/json;odata=verbose
Body:
{
"query": {
"__metadata": { "type": "SP.CamlQuery" },
"ViewXml": "<View><RowLimit>500</RowLimit></View>"
}
}
That’s it. Every column, every item, all rich text already plain text. The only catch is RowLimit — you’d think 5000 (SharePoint’s hard ceiling) is fine, but the Power Automate HTTP connector has a 100 MB response buffer:
HTTP request failed as there is an error: 'Cannot write more bytes to the buffer than the configured maximum buffer size: 104857600.'
For wide lists with 100+ columns, 5000 rows easily exceeds 100 MB once FieldValuesAsText adds duplicates of every value. Drop RowLimit to 500 or 1000 and paginate. Twelve calls of 500 finish faster than fighting with retries on one giant call anyway.
Pagination with Do-Until
SharePoint paginates getitems differently from the GET items collection. Instead of a __next URL, the response contains a ListItemCollectionPositionNext.PagingInfo token that you echo back in the next request’s body.
The full flow looks like this:
Initialize variables (before the loop)
| Variable | Type | Initial value |
|---|---|---|
| PagingToken | String | empty |
| AllItems | Array | [] |
| HasMore | Boolean | true |
The Do-Until loop
Loop condition: @equals(variables('HasMore'), false)
Limits: Count 60, Timeout PT1H
Inside the loop, six actions:
1. Compose — BuildBody
@{if(
empty(variables('PagingToken')),
'{"query":{"__metadata":{"type":"SP.CamlQuery"},"ViewXml":"<View><RowLimit>500</RowLimit></View>"}}',
concat(
'{"query":{"__metadata":{"type":"SP.CamlQuery"},"ListItemCollectionPosition":{"PagingInfo":"',
variables('PagingToken'),
'"},"ViewXml":"<View><RowLimit>500</RowLimit></View>"}}'
)
)}
First iteration sends a clean body. Every subsequent iteration injects the paging token SharePoint returned last time.
2. Send an HTTP request to SharePoint
In code view, this is what the action looks like:
{
"type": "OpenApiConnection",
"inputs": {
"parameters": {
"dataset": "https://contoso.sharepoint.com/sites/projecttracker/",
"parameters/method": "POST",
"parameters/uri": "_api/web/lists/getbytitle('TrackerList')/getitems?$expand=FieldValuesAsText",
"parameters/headers": {
"Accept": "application/json;odata=nometadata",
"Content-Type": "application/json;odata=verbose"
},
"parameters/body": "@{outputs('BuildBody')}"
},
"host": {
"apiId": "/providers/Microsoft.PowerApps/apis/shared_sharepointonline",
"connection": "shared_sharepointonline-1",
"operationId": "HttpRequest"
}
}
}
Note parameters/body is a string expression referencing the Compose output — not the raw JSON. Power Automate sends whatever string you put there as the request body.
3. Compose — MergedItems
@union(variables('AllItems'), body('Send_an_HTTP_request_to_SharePoint')?['value'])
union() concatenates two arrays and de-duplicates. Since SharePoint never returns the same item twice across pages, this is effectively a flat append.
4. Set variable — AllItems to @outputs('MergedItems')
5. Set variable — PagingToken
@coalesce(body('Send_an_HTTP_request_to_SharePoint')?['ListItemCollectionPositionNext']?['PagingInfo'], '')
When SharePoint has no more pages, this property is null. coalesce() falls back to an empty string.
6. Set variable — HasMore
@not(empty(variables('PagingToken')))
When the token is empty, the loop exits on the next iteration check.
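The whole Do-Until pattern can be sketched in Python with the HTTP call mocked out. Here `fetch_page` stands in for the Send an HTTP request to SharePoint action, and the paging tokens are illustrative values, not real SharePoint output:

```python
import json

# Sketch of the Do-Until pagination pattern with the HTTP call mocked out.
# fetch_page stands in for "Send an HTTP request to SharePoint"; the paging
# tokens below are illustrative, not real SharePoint values.

def build_body(paging_token, row_limit=500):
    """Mirror of the BuildBody Compose: include the token only when we have one."""
    query = {
        "__metadata": {"type": "SP.CamlQuery"},
        "ViewXml": f"<View><RowLimit>{row_limit}</RowLimit></View>",
    }
    if paging_token:
        query["ListItemCollectionPosition"] = {"PagingInfo": paging_token}
    return json.dumps({"query": query})

def collect_all_items(fetch_page):
    """Loop until the response carries no ListItemCollectionPositionNext token."""
    all_items, token, has_more = [], "", True
    while has_more:
        response = fetch_page(build_body(token))
        all_items += response.get("value", [])
        next_pos = response.get("ListItemCollectionPositionNext") or {}
        token = next_pos.get("PagingInfo", "")  # coalesce(..., '') equivalent
        has_more = bool(token)                  # not(empty(...)) equivalent
    return all_items

# Mock three pages to exercise the loop offline.
pages = iter([
    {"value": [{"ID": "1"}, {"ID": "2"}],
     "ListItemCollectionPositionNext": {"PagingInfo": "Paged=TRUE&p_ID=2"}},
    {"value": [{"ID": "3"}],
     "ListItemCollectionPositionNext": {"PagingInfo": "Paged=TRUE&p_ID=3"}},
    {"value": [{"ID": "4"}]},
])
items = collect_all_items(lambda body: next(pages))
print([it["ID"] for it in items])  # ['1', '2', '3', '4']
```

The two one-liners at the bottom of the loop body are the Python equivalents of the `coalesce()` and `not(empty())` expressions in actions 5 and 6.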
After the loop
AllItems now holds the entire list as a single flat array of objects. Feed it into a Select action to shape the output for whatever’s next — Excel, Dataverse, an HTTP push to another system. The downstream Select can address rich-text fields like this:
@item()?['FieldValuesAsText']?['Description_x005f_x0020_x005f_Field']
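As a mental model, that Select shaping is equivalent to the following Python comprehension; the sample item and the double-encoded property name mirror the response examples earlier in the post:

```python
# Sketch of the downstream Select as a Python comprehension. The sample item
# and the double-encoded property name mirror the earlier response examples.
items = [
    {"FieldValuesAsText": {
        "Title": "Quarterly Review",
        "Description_x005f_x0020_x005f_Field": "Plain text version."}},
]
shaped = [
    {"Title": it.get("FieldValuesAsText", {}).get("Title"),
     "Description": it.get("FieldValuesAsText", {})
                      .get("Description_x005f_x0020_x005f_Field")}
    for it in items
]
print(shaped)
# [{'Title': 'Quarterly Review', 'Description': 'Plain text version.'}]
```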
The double-encoding gotcha (subtle but important)
If you ever do go back to using $select for projection, watch for this: SharePoint double-encodes field internal names inside FieldValuesAsText. A column with internal name Description_x0020_Field (where _x0020_ is a space) appears in the URL as:
$select=FieldValuesAsText/Description_x0020_Field
But in the response, it comes back as:
"Description_x005f_x0020_x005f_Field"
Every _ got re-encoded to _x005f_. Get this wrong and you’ll see your field as null in the Select output and assume the projection failed.
Microsoft has tracked this in sp-dev-docs since 2017 and explicitly won’t fix it because too many existing apps depend on the current behavior.
The pragmatic workaround: run your call once with $top=1, paste the response into a JSON viewer, and copy the actual property names you see. Don’t try to reason about the encoding.
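If you'd still rather derive the names than copy them, the rule observed above (every `_` re-encoded to `_x005f_`) is a one-line transform:

```python
# The rule shown above (every _ re-encoded to _x005f_) as a one-line transform.
# Cross-check against a real $top=1 response before relying on it.
def response_property_name(internal_name):
    return internal_name.replace("_", "_x005f_")

print(response_property_name("Description_x0020_Field"))
# Description_x005f_x0020_x005f_Field
```

Copying the actual property names from a live response is still the safer option; treat this helper as a sanity check, not a guarantee.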
Putting it all together — the full solution
For one-time migrations or scheduled syncs of large lists with rich-text columns, here’s the pattern that actually works:
- POST to `getitems` instead of GET on the items collection — bypasses the URL length limit and the silent custom-field filtering.
- Use `$expand=FieldValuesAsText` in the URL and let SharePoint return everything inlined.
- Use `Accept: application/json;odata=nometadata` — never `verbose` for this — to avoid deferred expansion.
- Set `RowLimit` between 500 and 1000 for wide lists. Wider list = smaller page.
- Paginate with a Do-Until loop echoing the `PagingInfo` token back in each body.
- Accumulate with `union()` in a Compose, not Append-to-array.
- Read rich-text values from the nested `FieldValuesAsText` object in your downstream Select, using the double-encoded names.
A list of 6,000 items with 30 rich-text columns processes in roughly 12 HTTP calls and finishes in under a minute. No HTML to Text action ever runs. No regex, no xpath, no Office Script, no PowerShell. SharePoint does all the conversion server-side, and you’ve turned what was a 100-minute Apply-to-each into a 60-second flow.
What about ongoing data?
For lists that keep growing, the durable answer is to add a shadow PlainText column populated by a flow that fires on item-create and item-update. That flow does the conversion once per write, so reads don’t pay the cost. Use the pattern in this post to backfill historical rows, then let the trigger flow handle everything new.
The 6,000-record problem is really a one-time problem. Once you’ve solved it, you don’t solve it again.
If this saved you a few hours, the broader lesson is: when Power Automate feels too slow, it’s almost always because you’re looping in a place where you should be projecting. SharePoint’s REST API is much richer than the connector surfaces, and the difference between “this flow runs in five minutes” and “this flow runs in five hours” is usually a single endpoint choice.