Wednesday, July 23, 2014

Why did some backward-compatible 2010 workflows fail intermittently on SharePoint 2013?

After we upgraded SharePoint to 2013, some workflows began failing intermittently. The failures included not only custom workflows but also out-of-the-box (OoB) approval workflows. After debugging the failed OoB workflows, we found that most of them had a due date or duration configured. This points to a problem with the “Bulk workflow task processing” timer job, which should be triggered by the “SharePoint Foundation Workflow Timer Service”.

According to the SharePoint workflow architecture, the “SharePoint Foundation Workflow Timer Service” is the process that triggers workflows with a due date or duration configured. To process those workflows successfully, the service needs access to all of their references, such as assemblies and workflow definitions. In our case, previous consultants had configured the “SharePoint Foundation Workflow Timer Service” on the application servers, not on the web front ends (WFEs). We normally do not deploy solutions and workflows to application servers.

This explains one issue we saw: Muhimbi workflows failing in production. The Muhimbi installation adds many features to the 15 hive, one workflow entry to web.config, and one workflow DLL to the GAC, as shown in the following three screenshots.


If the “SharePoint Foundation Workflow Timer Service” runs on application servers, it cannot find the workflow definitions, related DLLs, or other references, and the workflow fails, as other people have reported for different workflows. The following workflow types might be affected in the same way.

  • Third-party workflows such as the Muhimbi workflow
  • Custom Visual Studio workflow activities
  • Designer workflows
  • OoB workflows with due date or duration

The fix for intermittently failing workflows on SharePoint 2013 is to run the “SharePoint Foundation Workflow Timer Service” on all WFEs instead, as Microsoft recommends. The “Microsoft SharePoint Foundation Workflow Timer Service” should run on servers that also run the “Microsoft SharePoint Foundation Web Application” service, that is, on the WFE servers. Of course, you could run the “Microsoft SharePoint Foundation Workflow Timer Service” on a different server, provided you apply the configuration Microsoft describes for that case. Other SharePoint customers have resolved similar workflow issues by reconfiguring the “SharePoint Foundation Workflow Timer Service” to run on WFEs.
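
To make the placement rule concrete, here is a minimal Python sketch that checks a farm inventory against it. It does not call any SharePoint API: the `Server` record, both helper functions, and the server names are hypothetical, and the inventory is assumed to have been collected by hand from Central Administration. Only the two service display names come from the text above.

```python
from dataclasses import dataclass, field

# Service display names as quoted in the post.
WORKFLOW_TIMER = "Microsoft SharePoint Foundation Workflow Timer Service"
WEB_APPLICATION = "Microsoft SharePoint Foundation Web Application"


@dataclass
class Server:
    """Hypothetical inventory record: one farm server and its started services."""
    name: str
    services: set = field(default_factory=set)


def misplaced_workflow_timers(servers):
    """Names of servers that run the workflow timer service
    without also running the web application service."""
    return [
        s.name
        for s in servers
        if WORKFLOW_TIMER in s.services and WEB_APPLICATION not in s.services
    ]


def wfes_missing_workflow_timer(servers):
    """Names of WFEs (web application service started) that do
    not run the workflow timer service."""
    return [
        s.name
        for s in servers
        if WEB_APPLICATION in s.services and WORKFLOW_TIMER not in s.services
    ]


# Made-up inventory that mirrors the misconfiguration described in the post.
farm = [
    Server("WFE1", {WEB_APPLICATION}),
    Server("WFE2", {WEB_APPLICATION}),
    Server("APP1", {WORKFLOW_TIMER}),
]

print("Timer service on non-WFE servers:", misplaced_workflow_timers(farm))
print("WFEs without the timer service:", wfes_missing_workflow_timer(farm))
```

With this sample inventory, the first list flags APP1 and the second flags both WFEs, which is the situation we started from. Both lists should be empty once the service has been moved.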

In addition, here are some steps you might try when debugging workflow issues.

  • Set all workflow-related log levels to Verbose: under the SharePoint Foundation category, check General, Timer, and Workflow Infrastructure, then select Verbose for “Least critical event to report to the trace log”.
  • Verify that the workflow service account has full access to all content databases and the configuration database.
  • Restart the workflow timer jobs:
      • The job that processes workflow events in the scheduled items table, such as delays. Schedule: 5 minutes.
      • Workflow auto cleanup: Deletes tasks and instances in the workflow instance table for workflows that were marked completed more than n days in the past, where n is specified in the workflow association. Crawls through tasks and the workflow instance table.
      • Workflow failover: Processes events for workflows that have failed and are marked to be retried. Schedule: 15 minutes.
      • Bulk workflow task processing: Processes bulk workflow task completion.

  • Restart the SharePoint timer job.
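
Once verbose logging is on, the trace logs grow quickly, so it helps to filter them down to the three categories named in the first step. The Python sketch below is one way to do that, assuming the usual tab-separated ULS layout (Timestamp, Process, TID, Area, Category, EventID, Level, Message, Correlation). The function names and the sample lines, including their event IDs and messages, are made up for illustration.

```python
# Categories to enable, as listed in the debugging steps above.
WORKFLOW_CATEGORIES = {"General", "Timer", "Workflow Infrastructure"}

# Assumed column order of a tab-separated ULS trace log line.
ULS_COLUMNS = [
    "Timestamp", "Process", "TID", "Area", "Category",
    "EventID", "Level", "Message", "Correlation",
]


def parse_uls_line(line):
    """Split one log line into a dict, or return None if it has too few columns."""
    parts = [p.strip() for p in line.rstrip("\n").split("\t")]
    if len(parts) < len(ULS_COLUMNS):
        return None
    return dict(zip(ULS_COLUMNS, parts))


def workflow_entries(lines, area="SharePoint Foundation",
                     categories=WORKFLOW_CATEGORIES):
    """Yield parsed entries whose area and category match the filter."""
    for line in lines:
        entry = parse_uls_line(line)
        if entry is None:
            continue  # skip lines that do not look like log entries
        if entry["Area"] == area and entry["Category"] in categories:
            yield entry


# Invented sample lines; real logs would be read from the farm's log folder.
sample = [
    "\t".join(["07/23/2014 10:00:00.00", "OWSTIMER.EXE (0x0A10)", "0x0B20",
               "SharePoint Foundation", "Workflow Infrastructure", "0000",
               "Verbose", "Made-up workflow message", ""]),
    "\t".join(["07/23/2014 10:00:01.00", "w3wp.exe (0x0C30)", "0x0D40",
               "SharePoint Server", "Search", "0000",
               "Medium", "Made-up unrelated message", ""]),
    "\t".join(["07/23/2014 10:00:02.00", "OWSTIMER.EXE (0x0A10)", "0x0B20",
               "SharePoint Foundation", "Timer", "0000",
               "Verbose", "Made-up timer message", ""]),
]

for entry in workflow_entries(sample):
    print(entry["Timestamp"], entry["Category"], entry["Message"])
```

To use it on a real log, pass an open file object instead of `sample`. Messages that themselves contain tab characters would shift the trailing columns, so treat the Correlation field as unreliable in this sketch.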

After these changes, you should have a much more stable workflow process.
