Showing posts with label SharePoint 2010 Web Service. Show all posts

Friday, September 14, 2012

Attivio SharePoint 2010 search integration issues and concerns you need to know

If you implement the Attivio SharePoint 2010 search connector, users can search SharePoint content and metadata through a centralized Attivio search interface alongside other content such as wikis, email, file shares, Documentum, and eRoom. After reviewing the Attivio architecture and the SharePoint connector guide, we have some concerns and questions that should be resolved for the SharePoint search integration. Some of them are critical and need to be addressed before going to production.

1. System pages and galleries crawling - The Attivio SharePoint 2010 connector guide indicates that the SharePoint Object Model connector does not support crawling system galleries. However, this limitation is not stated for the SharePoint Web Services connector, which is the one we are using. Since SharePoint system pages and galleries contain many pages, excluding them from crawling is a best practice to reduce the performance impact. The following galleries and system pages should be excluded from crawling. There may be more that need to be excluded; you can check them against the screenshot of the SharePoint site.
  • Web Part Gallery
  • Site Template Gallery
  • List Template Gallery
  • Master Page Gallery
  • Theme Gallery
  • Form Template system files
  • IWConvertedForms
  • Workflow Forms
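
If the crawl targets can be filtered before they are handed to the crawler, the exclusion could look roughly like the sketch below. This is only an illustration: the (title, URL) input shape, the sample URLs, and the function name are assumptions, and the connector guide does not describe such a hook. Only titles that appear verbatim in the list above are used.

```python
# Gallery and system list titles taken from the exclusion list above.
EXCLUDED_TITLES = {
    "Web Part Gallery",
    "Site Template Gallery",
    "List Template Gallery",
    "Master Page Gallery",
    "Theme Gallery",
    "IWConvertedForms",
    "Workflow Forms",
}


def crawlable_lists(lists):
    """Return only the lists whose title is not an excluded gallery or system list.

    `lists` is an iterable of (title, url) pairs. This shape is an assumption
    made for the sketch, not something the connector defines.
    """
    excluded = {title.casefold() for title in EXCLUDED_TITLES}
    return [
        (title, url)
        for title, url in lists
        if title.strip().casefold() not in excluded
    ]


# Hypothetical sample data.
sample = [
    ("Shared Documents", "http://host/site/Shared Documents"),
    ("Master Page Gallery", "http://host/site/_catalogs/masterpage"),
    ("workflow forms", "http://host/site/Workflow Forms"),
]
print(crawlable_lists(sample))
```

Matching on the title is case-insensitive so that a renamed-by-case list does not slip through. A real exclusion rule would probably be safer if it matched on list template type or URL.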
 
2. SharePoint 2010 permissions crawling - The Attivio SharePoint 2010 connector guide indicates that target audiences and audience filtering are not supported. There is no way to return the target audience of an item. As a result, the search will not apply target audience restrictions, and users outside the target audience might be able to search and view the content. This needs to be verified and addressed.

3. SharePoint 2010 content type crawling - The Attivio SharePoint 2010 connector guide indicates that content types are not supported, and we were concerned that content with customized content types might not be indexed. Attivio consultants have confirmed that this is not correct and that all content, regardless of content type, will be indexed and searchable. This needs to be tested.

4. SharePoint 2010 crawling configuration - The Attivio SharePoint 2010 connector guide indicates that the NoCrawl property for lists and sites is not available. As a result, we cannot exclude any lists or sites from Attivio search. We have some secret site collections in the system that we do not expose to anyone except a few restricted users. Owners of these sites might not want to expose any content through another search UI even if permissions have been properly applied. We may need to identify a workaround to address this.

5. SharePoint 2010 MySite crawling - The Attivio SharePoint 2010 connector guide says to pass http://host:port/personal/username rather than http://host:port/MySite, because SharePoint treats MySites as separate repositories. We are not sure whether we need to pass each and every personal MySite URL, of which there are more than 10,000 in our company. This needs to be addressed if MySite content needs to be searchable through Attivio.
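
If it turns out that every personal site does have to be passed individually, the URL list could at least be generated rather than typed. The following is a minimal sketch that assumes the usernames can be exported from somewhere (for example the user profile store); the function name and sample values are made up for illustration.

```python
from urllib.parse import quote


def personal_site_urls(host, port, usernames):
    """Build one http://host:port/personal/username URL per user.

    The URL pattern comes from the connector guide as quoted above; where the
    usernames come from is left open and is an assumption of this sketch.
    """
    base = f"http://{host}:{port}/personal"
    return [f"{base}/{quote(name)}" for name in usernames]


# Hypothetical usernames.
for url in personal_site_urls("host", 80, ["jdoe", "asmith"]):
    print(url)
```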

6. SharePoint 2010 Meeting Workspaces crawling - The Attivio SharePoint 2010 connector guide indicates that crawling Meeting Workspaces causes the server to queue child pages, such as Workspace Pages, that do not exist, which in turn causes an Exception error message during a crawl. This needs to be tested and verified.

7. SharePoint 2010 audit and logs - SharePoint keeps audit logs and other logs inside the content database. At this point, we are not sure whether Attivio will index any of these. We hope they will not be indexed, to avoid performance issues. This needs to be confirmed.

8. SharePoint 2010 entitlement policy - We are implementing the NextLabs SharePoint entitlement solution to deny certain groups of users access to selected site content even when those users have been granted permissions through SharePoint. SharePoint search will be integrated with the NextLabs SharePoint entitlement policies and will block those users from searching or viewing the selected content. However, the Attivio SharePoint 2010 connector is not aware of the NextLabs SharePoint entitlement policies and may expose the selected content to those users. We might need to customize Attivio search to check the NextLabs SharePoint entitlement policies through the NextLabs policy web services before displaying search results to end users.
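
The kind of customization we have in mind is a post-query trimming step. The sketch below shows the idea only: `is_allowed` stands in for a call to the NextLabs policy web services, whose actual interface is not covered here, and `stub_policy` with its URL prefix and user name is entirely hypothetical.

```python
def trim_results(user, result_urls, is_allowed):
    """Keep only the results the policy check allows for this user.

    `is_allowed(user, url)` is a placeholder for a real policy lookup.
    If the check fails for any reason, the result is dropped (fail closed).
    """
    visible = []
    for url in result_urls:
        try:
            allowed = is_allowed(user, url)
        except Exception:
            allowed = False  # never show a result we could not verify
        if allowed:
            visible.append(url)
    return visible


def stub_policy(user, url):
    """Hypothetical policy: one user is denied everything under one site."""
    denied_prefix = "http://host/sites/restricted/"
    return not (user == "blocked_user" and url.startswith(denied_prefix))


results = [
    "http://host/sites/restricted/plan.docx",
    "http://host/sites/open/readme.docx",
]
print(trim_results("blocked_user", results, stub_policy))
print(trim_results("other_user", results, stub_policy))
```

Trimming after the query keeps the index untouched, but it also means one policy call per result, which may matter for large result pages.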

9. SharePoint 2010 retention policy - We are applying retention policies to some site content. For example, if we apply a seven-year retention policy to a site, its content will be deleted automatically after seven years. The SharePoint backup tapes may have a one-year retention policy and will be recycled after one year. The same one-year policy should be applied to the Attivio index backup tapes. In other words, anything deleted from SharePoint should not persist on the Attivio side, not even on backup tapes.

10. Attivio SharePoint 2010 connector web service - This web service contains several interfaces that can not only read but also update and delete SharePoint content. Although this is not a real issue right now, we are surprised that a web service meant for crawling contains update and change interfaces. We need to be careful to grant the Attivio SharePoint crawling account READ-only access, because otherwise it could make use of the following update interfaces.
  • CancelCheckOut
  • CheckOut
  • Checkin
  • CopyItem
  • CreateDocument
  • CreateFolder
  • DeleteItems
  • DeleteVersion
  • MoveItems
  • Promote
  • SetAttachments
  • SetPermissions
  • UpdateItem
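
One way to keep an eye on this is to scan the service WSDL for the operation names listed above, for example after each connector upgrade. The sketch below assumes the WSDL uses the standard WSDL 1.1 namespace; the sample document is a made-up fragment, not the connector's real WSDL.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Update/delete interfaces taken from the list above.
WRITE_OPERATIONS = {
    "CancelCheckOut", "CheckOut", "Checkin", "CopyItem", "CreateDocument",
    "CreateFolder", "DeleteItems", "DeleteVersion", "MoveItems", "Promote",
    "SetAttachments", "SetPermissions", "UpdateItem",
}


def write_operations_in_wsdl(wsdl_text):
    """Return the sorted write-capable operation names found in a WSDL document."""
    root = ET.fromstring(wsdl_text)
    names = {op.get("name") for op in root.iter(f"{{{WSDL_NS}}}operation")}
    return sorted(names & WRITE_OPERATIONS)


# Hypothetical, heavily shortened WSDL used only to exercise the function.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/">
  <portType name="ConnectorSoap">
    <operation name="GetChanges"/>
    <operation name="DeleteItems"/>
    <operation name="CheckOut"/>
  </portType>
</definitions>"""

print(write_operations_in_wsdl(SAMPLE_WSDL))
```

This only reports what the service exposes. The actual protection still has to come from the read-only permission on the crawling account.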
If you find anything else we should be concerned about regarding Attivio SharePoint 2010 search, please share it with us.

Attivio SharePoint 2010 search - what you need to know as SharePoint architect and administrator



As part of an enterprise search initiative, our company started to evaluate the Attivio search engine, which is expected to add world-class content analysis, search and navigation. Attivio’s Active Intelligence Engine™ (AIE) indexes content and metadata, performs advanced linguistic and context analysis, and delivers permission-aware, relevance-ranked search and content navigation. One of the search integration targets is SharePoint 2010. Since there are few architecture diagrams and little documentation on how Attivio integrates with SharePoint and what the impact on SharePoint is, this blog covers the Attivio architecture, the Attivio SharePoint connector, and the limitations of the Attivio SharePoint integration, so that you can refer to it when managing the Attivio integration with SharePoint.

The Attivio architecture has three major layers: the Endpoint API layer (Ingestion) to crawl all the content, the Universal Index layer to create indexes, and the Query API layer to expose search. There are other services as well, including ingestion services and asynchronous workflows for cleansing and enriching content before it is persisted in the Universal Index, system services for backup and logs, and a Transport Layer that enables workflow communication and distribution across one or many nodes. The detailed architecture diagram is shown below.




The Attivio SharePoint connector supports all SharePoint lists, including document libraries, calendars, tasks, issues, and discussion boards, and all SharePoint objects, and it also has a read/write feature. It gives users access to all site collections in a farm, including subsites.

The Attivio SharePoint connector installation is very simple: one wsp solution named entropysoft.sharepoint.webservice.wsp is deployed to the SharePoint farm. Since there is very limited documentation on how the connector works, we will dig into which components are deployed so we can understand how it works.

After the Attivio SharePoint connector installation, the following changes will have been made to the SharePoint farm.


1. One farm solution named entropysoft.sharepoint.webservice.wsp deployed globally

2. Four dll files deployed to the GAC (global assembly cache)
  • Entropysoft.Sharepoint.WebService
  • Entropysoft.WebConfModif
  • log4net.dll
  • Microsoft.Web.Services2.dll
3. Two web service files deployed to the ISAPI folder
  • sharepointConnector.asmx
  • sharepointConnectorwsdl.aspx
4. One web service entry, shown below, added to the configuration section of all web.config files


  <location path="_vti_bin/sharepointConnector.asmx">
    <system.web>
      <authorization>
        <deny users="?" />
      </authorization>
      <webServices>
        <soapExtensionTypes>
          <add type="Entropysoft.Sharepoint.Webservice.ExceptionSoapExtension, Entropysoft.Sharepoint.Webservice, Version=4.5.91.0, Culture=neutral, PublicKeyToken=08ab0f4d3c6ea37b" priority="2" group="0" />
          <add type="Microsoft.Web.Services2.WebServicesExtension, Microsoft.Web.Services2,Version=2.0.3.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" priority="1" group="0" />
        </soapExtensionTypes>
      </webServices>
    </system.web>
  </location>
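
To confirm that the entry really landed in a given web.config, a small check like the one below can be run against the file contents. It is a sketch that assumes the file is plain, un-namespaced XML with the `location` element shaped as shown above; the function name is made up.

```python
import xml.etree.ElementTree as ET

CONNECTOR_PATH = "_vti_bin/sharepointConnector.asmx"


def connector_location_summary(web_config_text):
    """Summarize the connector <location> entry, or return None if it is missing."""
    root = ET.fromstring(web_config_text)
    for location in root.iter("location"):
        if location.get("path") != CONNECTOR_PATH:
            continue
        deny = location.find("system.web/authorization/deny")
        adds = location.findall("system.web/webServices/soapExtensionTypes/add")
        return {
            # <deny users="?"/> blocks anonymous callers.
            "denies_anonymous": deny is not None and deny.get("users") == "?",
            # Keep only the type name, dropping assembly, version and token.
            "soap_extensions": [
                (add.get("type") or "").split(",")[0].strip() for add in adds
            ],
        }
    return None


SAMPLE_CONFIG = """<configuration>
  <location path="_vti_bin/sharepointConnector.asmx">
    <system.web>
      <authorization>
        <deny users="?" />
      </authorization>
      <webServices>
        <soapExtensionTypes>
          <add type="Entropysoft.Sharepoint.Webservice.ExceptionSoapExtension, Entropysoft.Sharepoint.Webservice, Version=4.5.91.0, Culture=neutral, PublicKeyToken=08ab0f4d3c6ea37b" priority="2" group="0" />
          <add type="Microsoft.Web.Services2.WebServicesExtension, Microsoft.Web.Services2,Version=2.0.3.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35" priority="1" group="0" />
        </soapExtensionTypes>
      </webServices>
    </system.web>
  </location>
</configuration>"""

print(connector_location_summary(SAMPLE_CONFIG))
```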



After the connector installation, you can verify the connector through its web service. The URL is http://<servername>/<sites>/_vti_bin/sharepointConnector.asmx. You can view the WSDL by appending ?wsdl, as you normally do for all other SharePoint web services.
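
For scripted checks across many sites, the two URLs can be derived from the site URL. This is a trivial sketch; the function name and the sample site URL are made up.

```python
def connector_urls(site_url):
    """Return (service endpoint URL, WSDL URL) for a given SharePoint site URL."""
    endpoint = site_url.rstrip("/") + "/_vti_bin/sharepointConnector.asmx"
    return endpoint, endpoint + "?wsdl"


endpoint, wsdl = connector_urls("http://servername/sites/team/")
print(endpoint)
print(wsdl)
```

Keep in mind that the web.config entry denies anonymous users, so any request to these URLs has to be authenticated.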


If you add a service account through Central Administration with read access to all site collections in the web application, and pass only the web application's root site collection URL, you have completed the SharePoint-side installation and configuration. The Attivio crawling process will use the web service call GetSiteCollectionsUrls to retrieve all site collections inside the web application and then call web services to index all content and metadata inside each site collection. After the first full crawl, the connector will use the web service call GetChanges to index any future changes.
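
The crawl sequence described above can be pictured with a small in-memory simulation. Everything here is a stand-in: `FakeConnector` and its method signatures are hypothetical and only mirror the names GetSiteCollectionsUrls and GetChanges, whose real parameters are not documented in this post.

```python
class FakeConnector:
    """In-memory stand-in for the connector web service (hypothetical interface)."""

    def __init__(self, sites):
        self._sites = sites   # {site collection URL: [item, ...]}
        self._pending = []    # changes recorded since the last get_changes call

    def get_site_collection_urls(self, webapp_url):
        # Mirrors GetSiteCollectionsUrls: all site collections under the web app.
        return sorted(url for url in self._sites if url.startswith(webapp_url))

    def get_items(self, site_url):
        return list(self._sites[site_url])

    def get_changes(self):
        # Mirrors GetChanges: hand out pending changes once, then reset.
        changes, self._pending = self._pending, []
        return changes

    def add_item(self, site_url, item):
        # Simulates a user adding content after the full crawl.
        self._sites[site_url].append(item)
        self._pending.append(item)


def full_crawl(connector, webapp_url, index):
    """First pass: enumerate every site collection and index all of its items."""
    for site_url in connector.get_site_collection_urls(webapp_url):
        index.update(connector.get_items(site_url))


def incremental_crawl(connector, index):
    """Later passes: index only what changed since the previous pass."""
    index.update(connector.get_changes())


connector = FakeConnector({
    "http://host/": ["a.docx"],
    "http://host/sites/hr": ["b.docx"],
})
index = set()
full_crawl(connector, "http://host/", index)
connector.add_item("http://host/sites/hr", "c.docx")
incremental_crawl(connector, index)
print(sorted(index))
```

The point of the split is that only the first pass touches every site collection, which is why the full crawl is the one to worry about for performance.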


As a SharePoint administrator, you may be concerned about the performance impact on the system, especially during the first FULL crawl. You should run performance tests of the crawling process in a non-production environment and schedule the crawl outside working hours in production.

You should now feel comfortable managing the Attivio SharePoint 2010 connector installation, configuration, and support. We will focus on some of the issues in the next blog.


Thursday, May 17, 2012

NextLabs’ Entitlement Manager Issue #5 - Users can work around the policy to display restricted lists through SharePoint 2010 RPC calls


We are evaluating NextLabs Entitlement Manager to prevent users who belong to certain security groups from accessing selected site collections with sensitive information, even when these users have been granted permission through an individual account, an AD group, or an email list group. One of the test cases is to verify access permissions through SharePoint Foundation RPC Protocol (RPC) methods.

SharePoint Foundation RPC Protocol (RPC) methods can be used in the URL protocol to make HTTP GET requests. Although this is not a very common access method for end users, it is a security concern. The test results are very promising: after we applied a policy to deny access to the list or library, users get an error when they try to access the list or library through SharePoint RPC calls, with only one caveat. Here is the setup and explanation.

We set up one site with one library named lib1 (http://xnetsbx-sp/it/nextlab1/lib1) containing two documents. The list GUID is E757FF25-7CE0-406C-991D-D078FB008B39.



We applied a deny-access policy in NextLabs to both http://xnetsbx-sp/it/nextlab1/** and http://xnetsbx-sp/it/nextlab1/lib1/**.

We set up several RPC test cases based on some references. We got an access-denied error when making the following RPC calls, with an error message that looks like this.


Here are some of the RPC test cases, with the URL syntax embedded as a link.
If you have the permission, you should get the following outputs.

1. Picture 1 - Exports the list as CAML

2. Picture 2 - Display list or library metadata

3. Picture 3 - Opens a view of the document library
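
Since the embedded links do not survive in plain text, here is a sketch of how URLs of this kind are put together. The `dialogview=FileOpen&location=lib1` form is the one used in this post. The `Cmd=ExportList` and `Cmd=Display` forms are assumptions based on the SharePoint Foundation URL protocol and may not be exactly the URLs used for pictures 1 and 2. The helper name is made up.

```python
from urllib.parse import urlencode


def owssvr_url(site_url, **params):
    """Build an owssvr.dll URL-protocol request for the given site."""
    return f"{site_url.rstrip('/')}/_vti_bin/owssvr.dll?{urlencode(params)}"


SITE = "http://xnetsbx-sp/it/nextlab1"
LIST_ID = "{E757FF25-7CE0-406C-991D-D078FB008B39}"

# Assumed mapping for case 1: export the list schema as CAML.
export_url = owssvr_url(SITE, Cmd="ExportList", List=LIST_ID)
# Assumed mapping for case 2: return list data and metadata as XML.
display_url = owssvr_url(SITE, Cmd="Display", List=LIST_ID, XMLDATA="TRUE")
# Case 3: open a view of the document library (form taken from this post).
dialog_url = owssvr_url(SITE, dialogview="FileOpen", location="lib1")

for url in (export_url, display_url, dialog_url):
    print(url)
```

Note that `urlencode` percent-encodes the braces around the GUID, which the server decodes again on receipt.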


Everything seemed perfect, and content was blocked after applying the deny-access policy from NextLabs, until I accidentally typed in a wrong URL to open a view of the document library. Here is the issue.

If you type in the correct URL http://xnetsbx-sp/it/nextlab1/_vti_bin/owssvr.dll?dialogview=FileOpen&location=lib1, you will get an error message.

If you append some other parameters to the URL like this

The good thing is that users still cannot open the documents even though they can view the library. I guess more testing needs to be done.