This article covers how to import publishing pages to SharePoint.
This can help if you need to bulk-create pages in SharePoint from data stored in a database or spreadsheet, or if you need to migrate pages from another content management system such as WordPress.

Scenario
We are going to assume that you have your page data in a database. For this example we are going to use MySQL and the WordPress schema. As you will see in the coming steps, this could just as easily be any OLE DB or ODBC data source and any schema.
Import Tool
We will use the free Import for SharePoint toolset to import the pages.
You can download it from here.
When you download and install the import tool you will have full documentation and additional example import configuration files which will help further.
The Source
Column
The source data must contain a column with the HTML mark-up in it.
In our example content this is called ‘PageContent’. Inside it looks a bit like this.
If you are storing digital / scanned images in SharePoint you are most likely storing them as Adobe PDFs.<br/><br/>PDF is a great format to use because it views nicely and it also can accommodate full text searching of text in the image (more on that another time).<br/><br/>If you are now looking to implement records management for those documents this infers a requirement to keep the images for a long time.<br/><br/>This is where PDF/A comes in. It is a reduced feature set of PDF which is guaranteed to be viewable in years to come whereas perhaps a more fully featured PDF document might not be in a couple of decades time.<br/><br/><a href="https://ensentia.com/wp-content/uploads/2016/01/pdf-a1.png"><img class="alignnone size-full wp-image-135" src="https://ensentia.com/wp-content/uploads/2016/01/pdf-a1.png" alt="pdf-a1" /></a><br/><br/>If you are scanning your documents into SharePoint the scanning software should be able to generate PDF/A compliant images for you though there might be an additional cost for this.<br/><br/>If you are migrating scanned images into SharePoint then it is worthwhile ensuring that these are generated as PDF/A compliant.
Select Statement
Now from the database source we need to ‘Select’ the data that will create our pages. Since we are working with WordPress in this example the data is in wp_posts as we see below.
select *,
  concat(post_name,'.aspx') as DestinationFileName,
  replace(post_content,'\n','<br/>') as PageContent,
  'True' as Publish
from wp_posts
where post_type = 'post' and post_status = 'publish'
This select statement is also, cleverly, giving us the destination page name and setting up the page to be automatically published.
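To make the three computed columns concrete, here is a minimal Python sketch that mimics what the select statement derives for a single row. It is only an illustration: the function name `derive_columns` and the sample row are made up for this example, and the real work is done by MySQL, not by this code.

```python
def derive_columns(post: dict) -> dict:
    """Mimic the three computed columns of the select statement for one wp_posts row."""
    return {
        # concat(post_name,'.aspx') as DestinationFileName
        "DestinationFileName": post["post_name"] + ".aspx",
        # replace(post_content,'\n','<br/>') as PageContent
        "PageContent": post["post_content"].replace("\n", "<br/>"),
        # 'True' as Publish
        "Publish": "True",
    }


# Hypothetical sample row, not taken from a real WordPress database.
row = {
    "post_name": "pdf-a-for-records",
    "post_content": "First paragraph.\n\nSecond paragraph.",
}
derived = derive_columns(row)
print(derived["DestinationFileName"])  # pdf-a-for-records.aspx
print(derived["PageContent"])          # First paragraph.<br/><br/>Second paragraph.
```

The page name comes from the WordPress slug, and each newline in the post body becomes a `<br/>` so that paragraph breaks survive as HTML.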
Import Configuration File
This is the file that tells Import for SharePoint how to create the pages in SharePoint.
<?xml version="1.0" encoding="utf-8"?>
<DataSetImportSettings xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Source>
    <!-- The source used is ODBCSelect meaning that DIFS expects to run a SQL select statement against an ODBC data source. This will ignore columns like importstatus -->
    <SourceDataSetType>ODBCSelect</SourceDataSetType>
    <ODBCSourceDataSetSettings>
      <!-- So a system DSN has been created called WordPress, this has in fact been pointed to a MySQL database hosting WordPress -->
      <ConnectionString>DSN=WordPress;Uid=root;Pwd=password;</ConnectionString>
    </ODBCSourceDataSetSettings>
    <ODBCTableSourceDataSetSettings />
    <ODBCSelectSourceDataSetSettings>
      <SelectStatement>select *, concat(post_name,'.aspx') as DestinationFileName, replace(post_content,'\n','&lt;br/&gt;') as PageContent, 'True' as Publish from wp_posts where post_type = 'post' and post_status = 'publish'</SelectStatement>
    </ODBCSelectSourceDataSetSettings>
  </Source>
  <Destination>
    <AuthenticationSettings>
      <AuthenticationType></AuthenticationType>
      <domain />
      <username></username>
      <encryptedpassed></encryptedpassed>
    </AuthenticationSettings>
    <DestinationItemSettings>
      <DestinationItemType>PublishingPage</DestinationItemType>
      <ItemExistsBehaviour>Overwrite</ItemExistsBehaviour>
      <ImportMappings>
        <ImportMapping xsi:type="ImportMapping_String">
          <DestinationField>Title</DestinationField>
          <SourceColumn>post_title</SourceColumn>
        </ImportMapping>
        <ImportMapping xsi:type="ImportMapping_String">
          <DestinationField>Page Content</DestinationField>
          <SourceColumn>PageContent</SourceColumn>
        </ImportMapping>
      </ImportMappings>
    </DestinationItemSettings>
    <DestinationListSettings>
      <DestinationWebUrlRelative>/sites/SPImportHelper</DestinationWebUrlRelative>
      <DestinationFolderUrlRelative>/sites/SPImportHelper/Pages</DestinationFolderUrlRelative>
      <DestinationServerUrl></DestinationServerUrl>
      <DestinationListName>Pages</DestinationListName>
    </DestinationListSettings>
    <SourceColumns>
      <SourceFileNameAndPath>FullName</SourceFileNameAndPath>
      <ContentType>ContentType</ContentType>
      <DestinationSubFolder>DestinationSubDirectories</DestinationSubFolder>
      <DestinationFileName>DestinationFileName</DestinationFileName>
      <Publish>Publish</Publish>
      <CheckInComment>CheckInComment</CheckInComment>
      <PublishComment>PublishComment</PublishComment>
      <PageLayoutASPXName>PageLayoutASPXName</PageLayoutASPXName>
    </SourceColumns>
  </Destination>
</DataSetImportSettings>
The schema is fully explained in the documentation, but the important bits for this exercise are explained here:
DestinationItemType
We want to create publishing pages. We could alternatively create wiki pages, modern SharePoint pages, site pages or blog posts, but let's stick to the most common type (publishing pages) for now.
<DestinationItemType>PublishingPage</DestinationItemType>
PageLayoutASPXName
If your source select statement does not return a column of this name, then “ArticleLeft.aspx” will be used. If you want to use another page layout, ensure your select statement returns a column of this name containing the name of your desired page layout.
<PageLayoutASPXName>PageLayoutASPXName</PageLayoutASPXName>
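As a rough illustration of returning that extra column, the sketch below appends a constant PageLayoutASPXName column to the article's select statement. The helper name `select_with_layout` and the layout name `CustomArticle.aspx` are hypothetical. In practice you could just as well type the extra column into the select statement by hand.

```python
import re

# The select statement from this article, split so one more column can be appended.
SELECT_COLUMNS = (
    "select *, concat(post_name,'.aspx') as DestinationFileName, "
    "replace(post_content,'\\n','<br/>') as PageContent, "
    "'True' as Publish"
)
FROM_WHERE = " from wp_posts where post_type = 'post' and post_status = 'publish'"


def select_with_layout(layout_name: str) -> str:
    """Return the select statement with a constant PageLayoutASPXName column."""
    # Accept only simple file names so the literal cannot break out of its quotes.
    if not re.fullmatch(r"[A-Za-z0-9_\-]+\.aspx", layout_name):
        raise ValueError(f"unexpected page layout name: {layout_name!r}")
    return f"{SELECT_COLUMNS}, '{layout_name}' as PageLayoutASPXName{FROM_WHERE}"


# 'CustomArticle.aspx' is a made-up layout name used only for illustration.
print(select_with_layout("CustomArticle.aspx"))
```

Every row returned by the resulting statement carries the same layout name. A `case` expression in SQL could pick a different layout per row if the source data calls for it.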
ImportMapping
This bit maps your HTML data (in the column PageContent) to the SharePoint page content (field Page Content).
<ImportMapping xsi:type="ImportMapping_String">
  <DestinationField>Page Content</DestinationField>
  <SourceColumn>PageContent</SourceColumn>
</ImportMapping>
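Before running an import it can be handy to list which source column feeds which SharePoint field. The sketch below does that with Python's standard XML parser against a trimmed-down copy of the configuration. The function name `list_mappings` is made up for this example and is not part of Import for SharePoint.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the configuration file: only the parts this check reads.
CONFIG = """
<DataSetImportSettings xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Destination>
    <DestinationItemSettings>
      <ImportMappings>
        <ImportMapping xsi:type="ImportMapping_String">
          <DestinationField>Title</DestinationField>
          <SourceColumn>post_title</SourceColumn>
        </ImportMapping>
        <ImportMapping xsi:type="ImportMapping_String">
          <DestinationField>Page Content</DestinationField>
          <SourceColumn>PageContent</SourceColumn>
        </ImportMapping>
      </ImportMappings>
    </DestinationItemSettings>
  </Destination>
</DataSetImportSettings>
""".strip()


def list_mappings(config_xml: str) -> list:
    """Return (DestinationField, SourceColumn) pairs found in the configuration."""
    root = ET.fromstring(config_xml)
    return [
        (mapping.findtext("DestinationField"), mapping.findtext("SourceColumn"))
        for mapping in root.iter("ImportMapping")
    ]


for destination_field, source_column in list_mappings(CONFIG):
    print(f"{source_column} -> {destination_field}")
```

A failure to parse here usually points to a typo in the file, such as curly quotes pasted in place of straight ones.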
Execution
Ok so rather than re-invent the wheel we’ll let you read the documentation installed with Import for SharePoint on this one.
Result
Ok so originally in WordPress the page looked like this.

And now in out-of-the-box SharePoint it looks like this.

Great, but seems a bit simplistic
Ok so we have shown how to import publishing pages into SharePoint.
Realistically a project is always going to be more complicated than that.
So let's talk about real life.
Targeting Branded SharePoint
Page Layout
So the destination is likely to be branded? That's no problem. We've already talked about PageLayoutASPXName, and custom branded SharePoint really just means using a different page layout.
Content Type Fields
But the destination page has extra fields, like managed metadata “Tags”, a Byline, an Article Date? Again, no problem. You just need more of these ImportMappings to map data from your source into those additional SharePoint fields.
<ImportMapping xsi:type="ImportMapping_String">
  <DestinationField>Title</DestinationField>
  <SourceColumn>post_title</SourceColumn>
</ImportMapping>
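If there are many extra fields, the mapping elements can be generated rather than typed by hand. The sketch below rests on several assumptions: the "Byline" field and the `byline` source column are hypothetical, and `ImportMapping_String` is used only because it is the one mapping type shown in this article. Fields such as managed metadata or dates may need a different `xsi:type`, so check the documentation installed with the tool.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("xsi", XSI)  # serialise the attribute as xsi:type


def build_mappings(field_to_column: dict) -> ET.Element:
    """Build an <ImportMappings> element with one string mapping per field."""
    mappings = ET.Element("ImportMappings")
    for destination_field, source_column in field_to_column.items():
        mapping = ET.SubElement(mappings, "ImportMapping")
        # Assumption: every field here is a plain string field.
        mapping.set(f"{{{XSI}}}type", "ImportMapping_String")
        ET.SubElement(mapping, "DestinationField").text = destination_field
        ET.SubElement(mapping, "SourceColumn").text = source_column
    return mappings


mappings = build_mappings({
    "Title": "post_title",
    "Page Content": "PageContent",
    "Byline": "byline",  # hypothetical field and source column
})
ET.indent(mappings)  # pretty-print; requires Python 3.9 or later
print(ET.tostring(mappings, encoding="unicode"))
```

The printed element can be pasted over the existing ImportMappings section of the configuration file. The root element of that file already declares the `xsi` namespace, so the extra `xmlns:xsi` attribute in the printed fragment is redundant but harmless.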
Data Manipulation
So what if the source data is not in the exact format that SharePoint needs?
No problem, this manipulation can be done in SQL as shown below.
select *,
  concat(post_name,'.aspx') as DestinationFileName,
  replace(post_content,'\n','<br/>') as PageContent,
  'True' as Publish
from wp_posts
where post_type = 'post' and post_status = 'publish'
WordPress was never going to contain a column giving us a file name like “MyPage.Aspx” so we create one on the fly using concat here.
If (when?) your manipulations get too complex to include in the SQL statement on the fly, you can manipulate the source table directly. Just make sure you take precautions, such as working from a copy, if the source data is used by anything else.
So what does this get used for?
We have seen this approach used for the following:
- Legacy Content Management System (CMS) migration.
- Bulk creation of pages from Excel
- Scan to Mark-Up / Republishing – Loading data that has been scanned and OCR’d into pages.
- WordPress to SharePoint Migration
- Drupal to SharePoint Migration
- Joomla to SharePoint Migration
- Custom Intranet to SharePoint Migration
Great, makes more sense now but I’m still a bit unsure
No problem, just get in touch.