Shibboleth Identity Provider And User Privacy
CESNET technical report 17/2010
Ivan Novakov
Received 1. 12. 2010
Abstract
This report deals with several topics concerning user data privacy in Shibboleth Identity Provider. It describes the functionality of SAML name identifiers and their implementation in Shibboleth, explains attribute release policies configuration and brings some best practices about creating them.
Keywords: Shibboleth, IDP, SAML, privacy, attributes, attribute release policy, name identifiers
1 Introduction
The identity provider is an entity that provides user authentication and user data to applications and resources that require them. This takes place in a federated environment, a single sign-on system or any other kind of distributed authentication and authorization infrastructure. In every case, some information about the user is handed to entities outside the user’s home organisation. That is why it is important that the identity provider is configured properly and controls what information is released, to whom it is released and under what conditions.
For example, if an identity provider is a part of a federation with well defined policies, there is a certain level of assurance about how service providers treat user data coming from the identity providers. In that case the identity provider relies on the inter-organisational agreements associated with the federation and knows what user data is safe to release.
The same identity provider may become a member of another federation or take part in a bilateral agreement with a third-party service provider. The new agreements may bring a different level of assurance, so the identity provider should revise its release policies accordingly to protect user data.
The article covers several topics concerning user data privacy in Shibboleth Identity Provider. It describes the functionality of SAML name identifiers and their implementation in Shibboleth, explains attribute release policies configuration and brings some best practices about creating them.
The following “constants” are used in the article:
- IDP_SRC – the directory where the Shibboleth Identity Provider sources are located
- IDP_HOME – the directory where Shibboleth Identity Provider is installed
2 User Identity
2.1 Digital identity
User identity is generally a piece of information that allows a user to be identified in a given scope. It can be a name, an email address, a driver’s license number, fingerprints, etc. A user may have various pieces of personally identifiable information [1] associated with them. These are usually referred to as user attributes.
User identity fundamentally requires a user identifier: an attribute which is unique in a given scope [2], such as a domain, community, application or federation. Such an identifier allows the user to be unambiguously identified in the given scope.
2.2 SAML Name Identifiers
In SAML [3] such an identifier is referred to as a name identifier and is used to identify the person the IdP has issued an assertion about [4]. Name identifiers come in different formats, which have different characteristics. All formats have a scope property that defines which IdP provides the identifier.
Name identifiers may be transient or persistent. Transient name identifiers are valid only for a brief period of time (e.g. 5 minutes). Persistent name identifiers, on the other hand, are valid for a long period of time (e.g. years).
Another important characteristic is transparency. Transparent name identifiers contain information that can identify the user. An email address is an example of such an identifier. Opaque name identifiers, on the contrary, cannot be used to determine the user’s identity. They are often represented by a hashed string or a UUID value.
If a name identifier is referred to as targeted, it is designated for a specific relying party or group of relying parties. That means that different relying parties receive different name identifiers for the same user, which reduces the risk of correlation attacks.
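The effect of targeting can be illustrated with a minimal Python sketch. It is not the algorithm used by Shibboleth; the function name targeted_id, the HMAC-SHA256 construction, the secret and the example entityIDs are all assumptions made for illustration. The point is only that the same user yields unrelated, opaque values at different relying parties.

```python
import hashlib
import hmac


def targeted_id(user: str, relying_party: str, secret: bytes) -> str:
    """Derive an opaque identifier that differs per relying party.

    Hypothetical helper: a keyed hash over the relying party and the user.
    """
    message = f"{relying_party}!{user}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


# Illustrative values only; none of them come from a real deployment.
secret = b"hypothetical-secret-kept-at-the-idp"
id_for_sp1 = targeted_id("jdoe", "https://sp1.example.org/shibboleth", secret)
id_for_sp2 = targeted_id("jdoe", "https://sp2.example.org/shibboleth", secret)
```

Two service providers comparing id_for_sp1 and id_for_sp2 cannot tell that both values belong to the same user, while each of them sees a stable value on every visit.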
A name identifier may also be reversible, which means that the IdP is able to translate the identifier back to the precise user within the lifetime of the identifier. Reversibility is essential mostly for supporting back-channel queries, but is not strictly needed for the one-way communication path that SAML 2 service providers use by default.
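A short-lived, opaque and reversible identifier can be sketched in Python as follows. This is a simplified model under stated assumptions, not Shibboleth code: the class TransientIdStore, its in-memory dictionary and the 300-second default lifetime are hypothetical choices for illustration.

```python
import secrets
import time
from typing import Dict, Optional, Tuple


class TransientIdStore:
    """Hypothetical in-memory mapping of random tokens to principals."""

    def __init__(self, lifetime_seconds: float = 300.0) -> None:
        self.lifetime_seconds = lifetime_seconds
        self._entries: Dict[str, Tuple[str, float]] = {}

    def issue(self, principal: str, now: Optional[float] = None) -> str:
        """Create an opaque token; it reveals nothing about the principal."""
        now = time.time() if now is None else now
        token = "_" + secrets.token_hex(16)
        self._entries[token] = (principal, now + self.lifetime_seconds)
        return token

    def resolve(self, token: str, now: Optional[float] = None) -> Optional[str]:
        """Reverse a token to its principal while the token is still valid."""
        now = time.time() if now is None else now
        entry = self._entries.get(token)
        if entry is None:
            return None
        principal, expires_at = entry
        if now >= expires_at:
            # Expired tokens can no longer be reversed.
            del self._entries[token]
            return None
        return principal
```

Within the lifetime, resolve() returns the principal, which is what a back-channel query would need; after expiry it returns None.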
2.3 Shibboleth IdP Name Identifiers
Shibboleth Identity Provider 2.x introduces two built-in name identifier formats, which may cover most of the use cases:
- transient name identifier – short term, opaque, reversible
- persistent name identifier – long term, opaque, reversible, targeted
2.3.1 Transient Name Identifier
The transient name identifier is used by default and is sufficient in most cases. It does not reveal the identity of the user but, if required, it allows retroactive user identification from the log. It is defined as an AttributeDefinition element of the type TransientId (IDP_HOME/conf/attribute-resolver.xml):
<resolver:AttributeDefinition id="transientId" xsi:type="ad:TransientId">
<resolver:AttributeEncoder
xsi:type="enc:SAML1StringNameIdentifier"
nameFormat="urn:mace:shibboleth:1.0:nameIdentifier"/>
<resolver:AttributeEncoder xsi:type="enc:SAML2StringNameID"
nameFormat="urn:oasis:names:tc:SAML:2.0:nameid-format:transient"/>
</resolver:AttributeDefinition>
2.3.2 Persistent Name Identifier
The persistent name identifier lasts for a long time, which allows the IdP to provide some advanced features. For example, the IdP can be used as an attribute authority: it can release attributes in response to queries from service providers independently of the user. This also allows service providers to check regularly whether the user still has all the attributes required for the service and eventually to remove inactive accounts (so-called account checking [7]).
Since the persistent name identifier is valid for a long time, it should be targeted, i.e. each service provider should receive a different value. It should also be revocable, because some situations require the value of the name identifier to be changed.
Maintaining and releasing targeted and revocable name identifiers is a complex task that does not work out of the box. A simplified solution exists, the ComputedId data connector, but it is considered deprecated for its lack of flexibility and reversibility [5].
The most straightforward solution is to use a stored ID data connector [6]. The connector creates and persists unique identifiers for each user/IdP/SP combination. The first generated identifier is similar to those generated by the computed ID data connector for backward compatibility. Every subsequently generated identifier is in the Type 4 UUID format.
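The behaviour described above (one persisted identifier per user/IdP/SP combination, with the option to revoke it) can be modelled with a small Python sketch. It is an illustration under assumptions, not the connector's implementation: the class StoredIdSketch and its in-memory storage are hypothetical, and for simplicity every identifier here is a Type 4 UUID, including the first one.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, str]  # (IdP entityID, SP entityID, local user id)


@dataclass
class Row:
    persistent_id: str
    deactivated: Optional[datetime] = None


class StoredIdSketch:
    """Hypothetical in-memory stand-in for a persistent identifier store."""

    def __init__(self) -> None:
        self._rows: Dict[Key, List[Row]] = {}

    def get(self, idp: str, sp: str, user: str) -> str:
        """Return the active identifier, creating one if none exists."""
        rows = self._rows.setdefault((idp, sp, user), [])
        for row in rows:
            if row.deactivated is None:
                return row.persistent_id
        row = Row(str(uuid.uuid4()))
        rows.append(row)
        return row.persistent_id

    def revoke(self, idp: str, sp: str, user: str) -> None:
        """Deactivate the current identifier; the next get() issues a new one."""
        for row in self._rows.get((idp, sp, user), []):
            if row.deactivated is None:
                row.deactivated = datetime.now(timezone.utc)

    def resolve(self, persistent_id: str) -> Optional[str]:
        """Reverse an active identifier back to the local user id."""
        for (_, _, user), rows in self._rows.items():
            for row in rows:
                if row.persistent_id == persistent_id and row.deactivated is None:
                    return user
        return None
```

Keeping deactivated rows instead of deleting them preserves a history of which identifier was valid for whom.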
The stored ID data connector uses an SQL database to store the generated identifiers. The following table should be created in the database (the example assumes MySQL):
CREATE TABLE `shibpid` (
  `localEntity` varchar(255) NOT NULL,
  `peerEntity` varchar(255) NOT NULL,
  `principalName` varchar(255) NOT NULL,
  `localId` varchar(255) NOT NULL,
  `persistentId` varchar(255) NOT NULL,
  `peerProvidedId` varchar(255) default NULL,
  `creationDate` timestamp NOT NULL,
  `deactivationDate` timestamp NULL default NULL,
  KEY `pid_idx1` (`persistentId`),
  KEY `pid_idx2` (`persistentId`,`deactivationDate`),
  KEY `pid_idx3` (`localEntity`,`peerEntity`,`localId`),
  KEY `pid_idx4` (`localEntity`,`peerEntity`,`localId`,`deactivationDate`)
)
The stored ID data connector is a common DataConnector element with the type StoredId and a unique id attribute. The generatedAttributeID is the name of the attribute produced by this connector. The sourceAttributeID is the ID of the attribute whose first value is used for generating the identifier; typically it is the principal name (uid, eduPersonPrincipalName). The salt attribute holds a string of random data (at least 16 characters). The salt is used in the process of name identifier generation, so it is extremely important to back up its value somewhere else.
The connector element requires exactly one dependency element referencing the value of the sourceAttributeID attribute.
The ApplicationManagedConnection element contains attributes defining the connection to the SQL database holding the generated identifiers. In order to use the database, the corresponding JDBC driver must be placed in the IDP_SRC/lib directory and the Shibboleth IdP installation script must be re-run to export a new WAR file (the servlet container should then be restarted).
<!-- Stored ID connector -->
<resolver:DataConnector xsi:type="dc:StoredId"
id="myStoredId"
generatedAttributeID="persistentId"
sourceAttributeID="uid"
salt="put-in-random-string-here">
<resolver:Dependency ref="uid"/>
<dc:ApplicationManagedConnection
jdbcDriver="com.mysql.jdbc.Driver"
jdbcURL="jdbc:mysql://localhost:3306/storedid?autoReconnect=true"
jdbcUserName="dbuser"
jdbcPassword="dbpass"/>
</resolver:DataConnector>
The values supplied by the stored ID data connector are then released as common attributes. An AttributeDefinition element of the type Simple should be created and the proper attribute encoders for SAML1 and SAML2 name identifiers should be attached to it:
<resolver:AttributeDefinition id="persistentId" xsi:type="ad:Simple">
<resolver:Dependency ref="myStoredId"/>
<resolver:AttributeEncoder xsi:type="enc:SAML1StringNameIdentifier"
nameFormat="urn:oasis:names:tc:SAML:2.0:nameid-format:persistent"/>
<resolver:AttributeEncoder xsi:type="enc:SAML2StringNameID"
nameFormat="urn:oasis:names:tc:SAML:2.0:nameid-format:persistent"/>
</resolver:AttributeDefinition>
An identity provider may produce more than one type of name identifier at the same time. Which one is actually used may depend on the request from the service provider. Shibboleth Service Provider 2.3 and above may specify the required name identifier format within the SAML request. If the SP has not requested a particular format, the IdP searches for supported formats in the SP’s metadata. If more than one format is supported, the IdP chooses one of them at random [8].
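The selection order described above can be summarised in a short Python sketch. It is a simplified model, not the IdP's actual code: the function choose_name_id_format and its parameters are hypothetical, and returning None when nothing matches is an assumption, since that case is not covered here.

```python
import random
from typing import Optional, Sequence

TRANSIENT = "urn:oasis:names:tc:SAML:2.0:nameid-format:transient"
PERSISTENT = "urn:oasis:names:tc:SAML:2.0:nameid-format:persistent"


def choose_name_id_format(
    requested: Optional[str],
    metadata_formats: Sequence[str],
    idp_formats: Sequence[str],
) -> Optional[str]:
    """Pick a name identifier format for one request (illustrative only)."""
    # 1. A format named explicitly in the SAML request takes precedence.
    if requested is not None:
        return requested if requested in idp_formats else None
    # 2. Otherwise fall back to the formats listed in the SP's metadata.
    candidates = [f for f in metadata_formats if f in idp_formats]
    if not candidates:
        return None  # assumption: this case is not described in the text
    # 3. Several candidates: the choice is not predictable.
    return random.choice(candidates)
```

A service provider that depends on one particular format should therefore either request it explicitly or list only that format in its metadata.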
For backward compatibility, or if the IdP has to keep using the transient name identifier for some reason, the persistent ID value may be released as the eduPersonTargetedID attribute:
<resolver:AttributeDefinition xsi:type="ad:SAML2NameID"
id="eduPersonTargetedID"
nameIdFormat="urn:oasis:names:tc:SAML:2.0:nameid-format:persistent"
sourceAttributeID="persistentId">
<resolver:Dependency ref="persistentId"/>
<resolver:AttributeEncoder xsi:type="enc:SAML1XMLObject"
name="urn:oid:1.3.6.1.4.1.5923.1.1.1.10"/>
<resolver:AttributeEncoder xsi:type="enc:SAML2XMLObject"
name="urn:oid:1.3.6.1.4.1.5923.1.1.1.10"
friendlyName="eduPersonTargetedID"/>
</resolver:AttributeDefinition>
3 User Attributes
3.1 What Is Released By The IdP
The released user attributes may contain various types of information:
- preferences – for example, the preferred language
- authorization – group membership, entitlements
- personal information – name, email, address, phone number
Preferences may affect the functionality of the application. Authorization data define what the user is allowed to do in the application. Personal data contain personally identifiable information [1], which includes contact information (email, phone number) or even attributes directly linked to the user’s identity (name, email, personal ID). Such sensitive information should be released very carefully and must not be used for any other purpose.
For example, it has been common practice in the past to provide user identity persistence through the eduPersonPrincipalName attribute. The use case is simple: the application needs to track the user, so that when the user comes back, the application knows it is the same person. There is no reason to use an attribute carrying personal information for that purpose. The persistent name identifier or the eduPersonTargetedID attribute is more appropriate. The eduPersonPrincipalName attribute itself usually contains one of the most sensitive pieces of information, the username of the user’s account, which may eventually lead to the disclosure of other personal information, such as the user’s name or email.
3.2 Attribute Release Policies
In general, the IdP should release only the minimum information needed by the applications to work properly. This is achieved through setting the appropriate attribute release policies (ARP). An ARP describes which attributes are sent to a service provider depending on various conditions [9]. The default ARP file is IDP_HOME/conf/attribute-filter.xml.
3.2.1 Basic Configuration
A policy is defined in an AttributeFilterPolicy element. Each policy must have a unique id attribute defined. Each policy must contain exactly one policy requirement rule – a condition which determines whether the policy is active for the current request. If the condition evaluates to true, the policy is active; otherwise it is not.
A policy requirement rule is defined in a PolicyRequirementRule element. The type of the rule may be any of the supported matching rule types [9]. The matching rule types include options to match a specific relying party or a group of relying parties (a federation). Rules can also be combined into expressions using logical operations (AND, OR, NOT).
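How such logical combinations behave can be shown with a minimal Python sketch. It models rules as plain functions over the requester's entityID; the helper names requester_is, any_of, all_of and negate are hypothetical and only mirror the idea of the OR, AND and NOT rule types.

```python
from typing import Callable

# A rule takes the requester's entityID and says whether it matches.
Rule = Callable[[str], bool]


def requester_is(entity_id: str) -> Rule:
    """Match one specific relying party."""
    return lambda requester: requester == entity_id


def any_of(*rules: Rule) -> Rule:
    """Logical OR over the given rules."""
    return lambda requester: any(rule(requester) for rule in rules)


def all_of(*rules: Rule) -> Rule:
    """Logical AND over the given rules."""
    return lambda requester: all(rule(requester) for rule in rules)


def negate(rule: Rule) -> Rule:
    """Logical NOT of the given rule."""
    return lambda requester: not rule(requester)


# Illustrative entityIDs only.
SP_A = "https://service.example1.edu/shibboleth-sp"
SP_B = "urn:example:org:sp:foo"
either = any_of(requester_is(SP_A), requester_is(SP_B))
only_a = all_of(either, negate(requester_is(SP_B)))
```

Here either is true for both service providers, while only_a is true for the first one alone.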
The attribute policy further contains zero or more attribute rules. Basically, they form a list of attributes affected by the policy if the policy is evaluated as active. An attribute rule is defined in an AttributeRule element with an attributeID attribute, which contains the case-sensitive attribute ID as assigned in the attribute resolver (IDP_HOME/conf/attribute-resolver.xml).
Each attribute rule contains exactly one permit or deny value rule. A permit value rule is defined with the PermitValueRule element and specifies which attribute values are allowed for release. A deny value rule is defined with the DenyValueRule element. A value is released only if it has been permitted and not denied. Each permit or deny value rule must have one of the supported matching rule types, which include options to match strings, scoped strings, regular expressions, etc. [9]
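The "permitted and not denied" logic can be expressed in a few lines of Python. This is a sketch under assumptions rather than the IdP's filtering engine: filter_values and value_string are hypothetical helpers, and the permit and deny rules are assumed to have been collected from all active policies for one attribute.

```python
from typing import Callable, Iterable, List

ValueRule = Callable[[str], bool]


def value_string(expected: str, ignore_case: bool = False) -> ValueRule:
    """Match an attribute value against a fixed string."""
    if ignore_case:
        return lambda value: value.lower() == expected.lower()
    return lambda value: value == expected


def filter_values(
    values: Iterable[str],
    permit_rules: Iterable[ValueRule],
    deny_rules: Iterable[ValueRule],
) -> List[str]:
    """Keep the values that some permit rule matches and no deny rule matches."""
    permit = list(permit_rules)
    deny = list(deny_rules)
    return [
        v
        for v in values
        if any(rule(v) for rule in permit) and not any(rule(v) for rule in deny)
    ]


permit = [value_string("student", ignore_case=True), value_string("staff", ignore_case=True)]
deny = [value_string("staff")]
released = filter_values(["Student", "staff", "guest"], permit, deny)
```

In this example "guest" is dropped because nothing permits it, and "staff" is dropped because a deny rule overrides the permit.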
In the following example only the specified values (faculty, student, staff, alum, member, affiliate, employee, library-walk-in) of the eduPersonAffiliation attribute are released to the service provider with the required entityID (https://service.example1.edu/shibboleth-sp or urn:example:org:sp:foo):
<afp:AttributeFilterPolicy id="affiliationPolicy">
<afp:PolicyRequirementRule xsi:type="basic:OR">
<basic:Rule xsi:type="basic:AttributeRequesterString"
value="https://service.example1.edu/shibboleth-sp"/>
<basic:Rule xsi:type="basic:AttributeRequesterString"
value="urn:example:org:sp:foo"/>
</afp:PolicyRequirementRule>
<afp:AttributeRule attributeID="eduPersonAffiliation">
<afp:PermitValueRule xsi:type="basic:OR">
<basic:Rule xsi:type="basic:AttributeValueString" value="faculty"
ignoreCase="true"/>
<basic:Rule xsi:type="basic:AttributeValueString" value="student"
ignoreCase="true"/>
<basic:Rule xsi:type="basic:AttributeValueString" value="staff"
ignoreCase="true"/>
<basic:Rule xsi:type="basic:AttributeValueString" value="alum"
ignoreCase="true"/>
<basic:Rule xsi:type="basic:AttributeValueString" value="member"
ignoreCase="true"/>
<basic:Rule xsi:type="basic:AttributeValueString" value="affiliate"
ignoreCase="true"/>
<basic:Rule xsi:type="basic:AttributeValueString" value="employee"
ignoreCase="true"/>
<basic:Rule xsi:type="basic:AttributeValueString"
value="library-walk-in" ignoreCase="true"/>
</afp:PermitValueRule>
</afp:AttributeRule>
</afp:AttributeFilterPolicy>
3.2.2 Building An Attribute Release Policy
If we are building an ARP from scratch, the best way is to begin with no policies at all, which means that nothing is allowed to be released. In our first policy we define which attributes are released to anyone. One such attribute may be transientId, which holds the transient name identifier. The policy requirement rule type is set to ANY, as is the permit value rule:
<!-- Release to anyone -->
<afp:AttributeFilterPolicy id="releaseToAnyone">
<afp:PolicyRequirementRule xsi:type="basic:ANY"/>
<afp:AttributeRule attributeID="transientId">
<afp:PermitValueRule xsi:type="basic:ANY"/>
</afp:AttributeRule>
</afp:AttributeFilterPolicy>
Second, we define policies for every federation our IdP is a member of. The policy requirement rule type must be AttributeRequesterInEntityGroup and its groupID attribute must contain the name of the entity group (the Name attribute of the EntitiesDescriptor element in the metadata).
Then we add attribute rules for the corresponding attributes we are willing to release. The selection of attributes released to each federation usually depends on the respective federation policy. Generally, the list should contain only the mandatory attributes required by the federation.
<!-- Release to myFed -->
<afp:AttributeFilterPolicy id="releaseToMyFed">
<afp:PolicyRequirementRule
xsi:type="saml:AttributeRequesterInEntityGroup"
groupID="https://myfed.com/metadata"/>
<!-- attribute rules -->
</afp:AttributeFilterPolicy>
Finally, we define policies for specific service providers that require extra attributes. If a service provider requires sensitive data, a proper agreement should exist to make sure the personal data are treated properly. The policy requirement rule type must be AttributeRequesterString and the value attribute must contain the entityID of the service provider:
<!-- Release to myService -->
<afp:AttributeFilterPolicy id="releaseToMyService">
<afp:PolicyRequirementRule xsi:type="basic:AttributeRequesterString"
value="https://my.service.com/shibboleth"/>
<afp:AttributeRule attributeID="userUniqId">
<afp:PermitValueRule xsi:type="basic:ANY"/>
</afp:AttributeRule>
<afp:AttributeRule attributeID="affiliation">
<afp:PermitValueRule xsi:type="basic:ANY"/>
</afp:AttributeRule>
</afp:AttributeFilterPolicy>
Shibboleth IdP supports various other options for defining advanced policy rules. It is even possible to evaluate a script to determine whether a rule is true or false. But the basic policies a deployer should start with are those shown above: what to release to anyone, what to release to a federation as a whole, and what to release to single relying parties requiring additional attributes.
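The three layers combine additively: every policy whose requirement rule matches the requester contributes its attributes. The following Python sketch models that under stated assumptions. The Requester and Policy classes and the released_attributes function are hypothetical, and eduPersonAffiliation as the federation-wide attribute is chosen only for illustration; the remaining IDs are taken from the examples above.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Requester:
    entity_id: str
    entity_groups: FrozenSet[str] = frozenset()


@dataclass
class Policy:
    policy_id: str
    is_active: Callable[[Requester], bool]
    attribute_ids: Set[str]


def released_attributes(policies: Iterable[Policy], requester: Requester) -> Set[str]:
    """Union of the attribute IDs of all policies active for this requester."""
    released: Set[str] = set()
    for policy in policies:
        if policy.is_active(requester):
            released |= policy.attribute_ids
    return released


MY_FED = "https://myfed.com/metadata"
MY_SERVICE = "https://my.service.com/shibboleth"

policies = [
    Policy("releaseToAnyone", lambda r: True, {"transientId"}),
    # Assumption: the federation requires eduPersonAffiliation.
    Policy("releaseToMyFed", lambda r: MY_FED in r.entity_groups, {"eduPersonAffiliation"}),
    Policy("releaseToMyService", lambda r: r.entity_id == MY_SERVICE, {"userUniqId", "affiliation"}),
]

stranger = Requester("https://unknown.example.org/sp")
fed_member = Requester("https://sp.example.org/shibboleth", frozenset({MY_FED}))
my_service = Requester(MY_SERVICE, frozenset({MY_FED}))
```

An unknown requester receives only transientId, a federation member additionally receives the federation attributes, and the specific service receives its extra attributes on top of both.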
4 Conclusion
Every administrator should be concerned about the safety of sensitive personal information provided by their IdP. Such information should be released to trusted entities only, and only when those entities really need it. Identity persistence should not be provided through attributes like eduPersonPrincipalName. Persistent name identifiers or the eduPersonTargetedID attribute should be used instead. Detailed attribute release policies should be defined where necessary. The IdP should always release the minimum set of attributes possible.
References
| [1] | Personally identifiable information. In Wikipedia, The Free Encyclopedia. [cit. 2010-12-01]. Available online. |
| [2] | Digital identity. In Wikipedia, The Free Encyclopedia. [cit. 2010-12-01]. Available online. |
| [3] | Security Assertion Markup Language. In Wikipedia, The Free Encyclopedia. [cit. 2010-12-01]. Available online. |
| [4] | Name Identifiers. In Internet 2 Wiki. [cit. 2010-12-01]. Available online. |
| [5] | Computed ID Data Connector. In Internet 2 Wiki. [cit. 2010-12-01]. Available online. |
| [6] | Stored ID Data Connector. In Internet 2 Wiki. [cit. 2010-12-01]. Available online. |
| [7] | User Account Checking Using a (persistent) NameID. In Internet 2 Wiki. [cit. 2010-12-01]. Available online. |
| [8] | Supporting a new Name Identifier. In Internet 2 Wiki. [cit. 2010-12-01]. Available online. |
| [9] | Define a New Attribute Filter. In Internet 2 Wiki. [cit. 2010-12-01]. Available online. |