Friday, February 21, 2014

SharePoint 2013 "Sorry, something went wrong. An unexpected error has occurred" user login error for web applications with multiple authentications

SharePoint 2010 and 2013 come with claims-based authentication and the option to have multiple authentication providers for the same URL. Since our company stores external users in LDAP and internal users in AD, we have two external web applications that use mixed authentication. External users use forms-based authentication and internal users use Windows authentication to access those two web applications.

After we upgraded from 2010 to 2013, we found that internal users could not consistently log in to the two web applications with mixed authentication. Different internal users hitting different web applications on different web front-end servers at different times might get any of the following results.
  • Login without issue 
  • Sorry, something went wrong. An unexpected error has occurred
  • Stay on Sign In page
  • Sorry this page has not been shared for you
  • Server Error in ‘/’ Application

Here are the screenshots for the different errors.



We have been working with Microsoft support on this critical production SharePoint 2013 issue for several days after go-live without a solution. The issue seems to be related to SharePoint 2013 farms whose web applications use multiple authentication providers (forms and Windows, for example), and it can be reproduced in several environments. The exception from the ULS log is listed below.

Unexpected       System.ArgumentException: Exception of type 'System.ArgumentException' was thrown.  Parameter name: encodedValue
    at Microsoft.SharePoint.Administration.Claims.SPClaimEncodingManager.DecodeClaimFromFormsSuffix(String encodedValue)
    at Microsoft.SharePoint.Administration.Claims.SPClaimProviderManager.GetProviderUserKey(IClaimsIdentity claimsIdentity, String encodedIdentityClaimSuffix)
    at Microsoft.SharePoint.Administration.Claims.SPClaimProviderManager.GetProviderUserKey(String encodedIdentityClaimSuffix)
    at Microsoft.SharePoint.Utilities.SPUtility.GetFullUserKeyFromLoginName(String loginName)
    at Microsoft.SharePoint.ApplicationRuntime.SPHeaderManager.AddIsapiHeaders(HttpContext context, String encodedUrl, NameValueCollection headers)
    at Microsoft.SharePoint.Application...


If you decompile the Microsoft SharePoint assemblies, the exception seems to be thrown when SharePoint tries to decode the user ID and fails to find the "|" inside a claim like "i:0#.w|DOMAIN\username". Since there is no quick solution at this point, we looked at the exception and came up with the workarounds listed below. Here are the details of the reasoning and the adjustments we have applied to production to at least reduce the issue, if not resolve it. We will have to do more research to identify which changes are absolutely necessary.
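
To illustrate the failure mode described above, here is a minimal PowerShell sketch that splits an encoded claim on the "|" separator and rejects a value that does not contain one. It is not SharePoint's actual implementation of DecodeClaimFromFormsSuffix, which we only saw through a decompiler. The function name Split-EncodedClaim and the sample login names are hypothetical.

```powershell
# Hypothetical helper: mimics the observed behavior only, not SharePoint's real code.
function Split-EncodedClaim {
    param([Parameter(Mandatory = $true)][string]$EncodedValue)

    # An encoded claim has the shape "<prefix>|<value>".
    $separatorIndex = $EncodedValue.IndexOf('|')
    if ($separatorIndex -lt 0) {
        # Same exception type and parameter name as in the ULS log entry.
        throw (New-Object System.ArgumentException('No "|" separator found in claim.', 'encodedValue'))
    }

    [pscustomobject]@{
        Prefix = $EncodedValue.Substring(0, $separatorIndex)
        Value  = $EncodedValue.Substring($separatorIndex + 1)
    }
}

# A well-formed Windows claim splits into prefix "i:0#.w" and the account name.
Split-EncodedClaim 'i:0#.w|DOMAIN\username'

# A login name without the claim prefix has no "|" and is rejected.
try {
    Split-EncodedClaim 'DOMAIN\username'
} catch {
    "Decode failed: $($_.Exception.Message)"
}
```

If a cached identity ever reaches the decoder without its claim prefix, a check like this would fail in the same way as the stack trace above, which is why we suspected the cached user claims.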

1. The first change is to configure LdapMembershipProvider to use version 15 (the 2013 version) instead of version 14 (the 2010 version). The reasoning is that SharePoint 2013 might have modified the LdapMembershipProvider implementation, so we may run into authentication issues if we keep using the 2010 version. The updated configuration for the web application web.config is listed below.

<add name="LdapMember" type="Microsoft.Office.Server.Security.LdapMembershipProvider, Microsoft.Office.Server, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c" server="xldap.qualcomm.com" port="636" useSSL="true" connectionUsername="uid=spexovd,ou=People,o=Sharepoint Extranet,o=qualcomm.com" connectionPassword="Qualcomm123" useDNAttribute="false" userDNAttribute="entrydn" userNameAttribute="uid" userContainer="ou=people,o=Corporate Legal,o=qualcomm.com" userObjectClass="person" userFilter="(ObjectClass=person)" scope="Subtree" otherRequiredUserAttributes="sn,givenname,cn" /> 

You still need to modify the Security Token Service web.config and the Central Administration web.config, as discussed in other blogs.
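
Since the same provider entry has to be updated in several web.config files, a quick check for leftover version 14 references can help. The following is a minimal sketch, assuming the membership providers sit under system.web/membership/providers. The sample XML and the function name Find-OldProviderVersion are hypothetical; in practice you would load each real web.config with Get-Content and cast it to [xml].

```powershell
# Hypothetical helper: returns the names of membership providers still bound to Version=14.0.0.0.
function Find-OldProviderVersion {
    param([Parameter(Mandatory = $true)][xml]$Config)

    # Select every <add> element under system.web/membership/providers.
    $providers = $Config.SelectNodes('/configuration/system.web/membership/providers/add')
    foreach ($provider in $providers) {
        if ($provider.GetAttribute('type') -match 'Version=14\.0\.0\.0') {
            $provider.GetAttribute('name')
        }
    }
}

# Made-up sample standing in for a web.config that was not fully updated.
$sample = [xml]@'
<configuration>
  <system.web>
    <membership>
      <providers>
        <add name="LdapMember" type="Microsoft.Office.Server.Security.LdapMembershipProvider, Microsoft.Office.Server, Version=14.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c" />
        <add name="i" type="Microsoft.SharePoint.Administration.Claims.SPClaimsAuthMembershipProvider, Microsoft.SharePoint, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c" />
      </providers>
    </membership>
  </system.web>
</configuration>
'@

Find-OldProviderVersion -Config $sample   # returns LdapMember
```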

2. The second change is that we removed the session state configuration from the web.config. The following three lines have been removed, since the issue seems to be that server-side authentication gets confused by the cached user authentication. If we remove the server-side session and rely on the client cookie, it might reduce the issue.

<sessionState mode="SQLServer" timeout="60" allowCustomSqlDatabase="true" sqlConnectionString="Data Source=SPSQLSTG3;Initial Catalog=SessionStateDatabase;Integrated Security=True;Enlist=False;Pooling=True;Min Pool Size=0;Max Pool Size=100;Connect Timeout=15" />
<remove name="Session" />
<add name="Session" type="System.Web.SessionState.SessionStateModule" />

3. The third change is to disable page session state. The reasoning is the same as for the previous change.

<pages enableSessionState="false" enableViewState="true" enableViewStateMac="true" validateRequest="false" clientIDMode="AutoID" pageParserFilterType="Microsoft.SharePoint.ApplicationRuntime.SPPageParserFilter, Microsoft.SharePoint, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c" asyncTimeout="7">


4. The fourth change is to set the client cookie's persistent session lifetime to one hour, as described in Jalil Sear's blog. Since we now rely on the client cookie instead of the server session to persist the user's information, we do not want the cookie to expire quickly. The change is the persistentSessionLifetime attribute.

<cookieHandler mode="Custom" path="/" persistentSessionLifetime="60">

5. The fifth change is to set the Security Token Service configuration back to its default values.

Get-SPSecurityTokenServiceConfig

Set-SPSecurityTokenServiceConfig -FormsTokenLifetime 600

$sec=Get-SPSecurityTokenServiceConfig
$sec.LogonTokenCacheExpirationWindow=6000000000
$sec.Update()
$sec

$sec=Get-SPSecurityTokenServiceConfig
$sec.CookieLifetime=4320000000000
$sec.Update()
$sec

The reasoning is to make sure we have the correct Security Token Service configuration on SharePoint 2013.
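
The two large numbers assigned above are easier to read once converted. This is a small sketch, assuming they are interpreted as .NET TimeSpan ticks (100 nanoseconds each) and that -FormsTokenLifetime is expressed in minutes. The variable names are ours.

```powershell
# Values taken from the commands above; ticks are 100-nanosecond units.
$logonTokenCacheExpirationWindow = [TimeSpan]::FromTicks(6000000000)
$cookieLifetime                  = [TimeSpan]::FromTicks(4320000000000)
$formsTokenLifetime              = [TimeSpan]::FromMinutes(600)

"LogonTokenCacheExpirationWindow: $($logonTokenCacheExpirationWindow.TotalMinutes) minutes"   # 10
"CookieLifetime: $($cookieLifetime.TotalDays) days"                                           # 5
"FormsTokenLifetime: $($formsTokenLifetime.TotalHours) hours"                                 # 10
```

Under those assumptions the settings amount to a 10 minute logon token cache expiration window, a 5 day cookie lifetime and a 10 hour forms token lifetime.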

6. The sixth change is to fix a SharePoint 2013 distributed cache bug that Jason Warren described in his blog. The issue Jason described is that occasionally a user would click on a link and, instead of receiving the expected page, would unexpectedly be redirected to the sign-in page and prompted to log in again. This is similar to what we experienced.

As Jason explained, when SharePoint tried to retrieve the token from the distributed cache, the connection would time out or no connection would be available, and the comparison would fail. Since it couldn't validate the presented token, SharePoint had no choice but to log the user out and redirect them to the sign-in page.

The fix he provided is summarized below. 
  • Apply AppFabric Cumulative Update 3, AppFabric Cumulative Update 4, or a later AppFabric CU to all servers in the farm
  • Add backgroundGC key to DistributedCacheService.exe.config file on all cache servers
  • Restart AppFabric Windows Service on all cache servers
  • Restart Distributed Cache SharePoint service on all cache servers
  • Reset IIS (IISRESET) on all servers in the farm
If the issue persists, you may need to increase timeout and connection values:
  • Increase distributed cache client settings for affected containers using the Set-SPDistributedCacheClientSetting cmdlet.
  • Increase security token service values with Get-SPSecurityTokenServiceConfig
  • Restart AppFabric, and Distributed Cache on cache servers
We are in the process of applying these great suggestions from Jason and working with Microsoft to resolve this issue.
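
For the step that increases the distributed cache client settings, the commands would look roughly like the sketch below. It has to run in the SharePoint 2013 Management Shell on a farm server. The container type and the numeric values are illustrative assumptions on our part, not values recommended by Jason or Microsoft.

```powershell
# Requires the SharePoint 2013 snap-in; only works on a server in the farm.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Container that holds logon tokens; other containers can be adjusted the same way.
$containerType = 'DistributedLogonTokenCache'

$settings = Get-SPDistributedCacheClientSetting -ContainerType $containerType
$settings   # show the current values before changing anything

# Illustrative values only; the two timeouts are, as far as we know, in milliseconds.
$settings.RequestTimeout         = 3000
$settings.ChannelOpenTimeOut     = 3000
$settings.MaxConnectionsToServer = 100

Set-SPDistributedCacheClientSetting -ContainerType $containerType -DistributedCacheClientSettings $settings
```

After changing the settings, restart AppFabric and the Distributed Cache service on the cache servers as listed above.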

7. The seventh change is to the load balancer for the two web applications with multiple authentications. One VIP URL now points to only one server and the other points to a different server. The thought is based on inside information from a Microsoft DSE that other SharePoint 2013 clients with multiple authentications have a similar login issue. The exception indicates there might be bad user claims in the cache, possibly introduced by the multiple authentications.

8. The eighth change is to remove the sticky session setting from the load balancer for the two web applications with multiple authentications. The original thought was that a user could fail over to another server after encountering an error. However, since we have modified each VIP to point to only one server, this setting is no longer relevant as far as I can see. We will try to add this setting back and verify.

Although we noticed that the login issue has been dramatically reduced, we are still getting the error randomly. If you have a similar issue, please let me know so we can push Microsoft for a final solution.

4 comments:

  1. Harry,
    Thanks for this post.
    We have done the same in our organization.
    We also migrated from SP 2010 to SP 2013, and we also have multiple authentication for the same URL.
    If any user is inactive for more than 30 minutes, they get the error mentioned above, "Sorry, something went wrong".
    It would be very helpful if you could guide us.

  2. the same issue, 7-8, thank you

  3. Did Microsoft ever reach any solution for this issue? Is there a KB article about it?

  4. This comment has been removed by the author.
