How to use tokenization with Cloud DLP

Our favorite catwalking company has been using Cloud DLP to automatically redact and replace sensitive values in their data. But what about data that they need to protect but still use for things like billing? Stick around and find out how Cloud DLP's got them covered.

Kitty Catwalks has already implemented Cloud DLP to apply redaction and replacement in their issue ticketing system, so they don't inadvertently log any sensitive information from their customers. Their billing department, on the other hand, handles tons of credit card numbers. They need to protect that information, but they also need to access those values to process payments for their premium catwalking service. So while Cloud DLP offers a bunch of options to help you hide and remove sensitive values in your data, there's one technique that leverages encryption to de-identify, and in some cases re-identify, those sensitive values.

Crypto-based tokenization, also referred to as pseudonymization, is a powerful de-identification technique that uses encryption keys to replace sensitive data values with cryptographically generated tokens. This method of de-identification is especially popular in industries like finance and healthcare, where protecting data is of utmost importance but preserving data utility is still desired.

Cloud DLP supports three types of cryptographic methods. The first is deterministic encryption. Here, the detected data is replaced with an encrypted value, prepended with an optional surrogate annotation. This method supports most input types and is authenticated, which makes it the most recommended tokenization solution. The next method is format-preserving encryption (FPE). Like deterministic encryption, FPE replaces the value with an encrypted string, except the result is the same length and uses the same character set as the original value. This is the way to go if you need to retain support for legacy data systems that have strict length or character set requirements. The third method is cryptographic hashing. Here, DLP replaces sensitive data with a hashed value. Unlike the other two methods, cryptographic hashing produces a one-way token, so it can't be reversed. It's the perfect solution if you want each unique value to be transformed into a corresponding unique hash value but don't necessarily want it to be reversible. More on that later.

One huge factor that sets all three of these apart from most other de-identification techniques is that they allow us to securely de-identify data while retaining its referential integrity. Referential integrity allows records to maintain their relationships to one another even after being individually de-identified. For example, if we have a table of data with cryptographic hashing applied to one of its fields using the same cryptographic key, each unique value will have a consistent transformed value. This means that data can be de-identified for security or compliance reasons but still used for business operations and analytical workflows, such as joining across tables or aggregating data.

Another powerful feature of tokenization is the ability to reverse the encryption to re-identify the data. Since deterministic encryption and format-preserving encryption use a symmetric encryption key to tokenize data, the same key can be used to essentially undo the de-identification and reveal the original value. The option to re-identify expands the utility of Cloud DLP so that data can be securely obscured and readable only by systems authorized to access the encryption key and the output value.
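To make the three options more concrete, here is a minimal sketch (not code from the video) of how each method maps onto a `primitiveTransformation` in a Cloud DLP de-identify configuration. The key variables, environment variable names, and the `TOKEN` surrogate name are placeholders, not values from the original example.

```js
// Placeholders: a base64 KMS-wrapped key and the Cloud KMS key used to wrap it.
const wrappedKey = process.env.DLP_WRAPPED_KEY;
const kmsKeyName = process.env.DLP_KMS_KEY_NAME;
const cryptoKey = {kmsWrapped: {wrappedKey, cryptoKeyName: kmsKeyName}};

// 1. Deterministic encryption: reversible, authenticated, optional surrogate annotation.
const deterministic = {
  cryptoDeterministicConfig: {cryptoKey, surrogateInfoType: {name: 'TOKEN'}},
};

// 2. Format-preserving encryption: reversible, output keeps the original length and alphabet.
const formatPreserving = {
  cryptoReplaceFfxFpeConfig: {
    cryptoKey,
    commonAlphabet: 'NUMERIC',
    surrogateInfoType: {name: 'TOKEN'},
  },
};

// 3. Cryptographic hashing: one-way, cannot be re-identified.
const hashing = {
  cryptoHashConfig: {cryptoKey},
};
```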
Let's see how the DLP API handles tokenization by looking at a request from a Node.js app. In this example, we're going to apply format-preserving encryption to the credit card number field of some sample CSV data that we've assigned to the variable myCSV. This is the request object that we'll send to the deidentifyContent function. First, we specify the CSV data that we want to scan. In the de-identification configuration, we'll indicate a few things. The transformations array is where we list each of the transformations we'd like to apply; in this case, we just want to target the infoType CREDIT_CARD_NUMBER and provide an FPE configuration for format-preserving encryption. There we provide a wrapped key, which I've stored in the variable myWrapKey, and the Cloud KMS key we used to wrap it, in the form of a KMS resource name that I put in the variable myKeyName. We'll also specify that we want to transform our data using a numeric alphabet, and provide a surrogate infoType string that will be prepended to each value.

The response of this function will include the data we provided in the request, except with the credit card numbers replaced by an encrypted string prepended with the surrogate infoType we specified. As you can see, each unique value in our original data remains unique after transformation, with its referential integrity intact. In a larger dataset, we'd still be able to analyze the data without accessing the sensitive values.

So how about re-identification? The reidentifyContent function takes a similar request object. It looks much like the deidentifyContent request, except we'll be passing in our de-identified data. As long as the surrogate infoType and the encryption key we provide match what we used to tokenize our data, the result will be our original data with the sensitive values revealed. Hedged sketches of both requests follow below.
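Here is a minimal sketch of what such a de-identification request could look like with the @google-cloud/dlp Node.js client. The project ID, the sample table standing in for myCSV, the environment variable names, and the CC_TOKEN surrogate name are all assumptions for illustration, not the exact values from the video.

```js
const {DlpServiceClient} = require('@google-cloud/dlp');
const dlp = new DlpServiceClient();

const projectId = 'my-project';                   // placeholder project ID
const myWrapKey = process.env.DLP_WRAPPED_KEY;    // base64 KMS-wrapped key (placeholder)
const myKeyName = process.env.DLP_KMS_KEY_NAME;   // Cloud KMS key resource name (placeholder)

// Sample CSV data expressed as a DLP table (a stand-in for the video's myCSV).
const myCsv = {
  headers: [{name: 'customer'}, {name: 'card_number'}],
  rows: [
    {values: [{stringValue: 'Ada'}, {stringValue: '4111111111111111'}]},
    {values: [{stringValue: 'Grace'}, {stringValue: '5500005555555559'}]},
  ],
};

async function deidentifyWithFpe() {
  const request = {
    parent: `projects/${projectId}/locations/global`,
    // Only look for credit card numbers.
    inspectConfig: {infoTypes: [{name: 'CREDIT_CARD_NUMBER'}]},
    // Replace each detected value using format-preserving encryption.
    deidentifyConfig: {
      infoTypeTransformations: {
        transformations: [
          {
            infoTypes: [{name: 'CREDIT_CARD_NUMBER'}],
            primitiveTransformation: {
              cryptoReplaceFfxFpeConfig: {
                cryptoKey: {
                  kmsWrapped: {wrappedKey: myWrapKey, cryptoKeyName: myKeyName},
                },
                commonAlphabet: 'NUMERIC',
                surrogateInfoType: {name: 'CC_TOKEN'}, // prepended to each token
              },
            },
          },
        ],
      },
    },
    item: {table: myCsv},
  };

  const [response] = await dlp.deidentifyContent(request);
  console.log(JSON.stringify(response.item.table, null, 2));
  return response.item.table;
}

deidentifyWithFpe().catch(console.error);
```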
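And a matching re-identification sketch, continuing from the snippet above (same dlp client, projectId, myWrapKey, myKeyName, and assumed CC_TOKEN surrogate). The custom infoType with surrogateType tells DLP how to locate the tokens so they can be decrypted with the same key.

```js
async function reidentifyWithFpe(deidentifiedTable) {
  const request = {
    parent: `projects/${projectId}/locations/global`,
    // Tell DLP how to find the surrogate tokens we created during de-identification.
    inspectConfig: {
      customInfoTypes: [{infoType: {name: 'CC_TOKEN'}, surrogateType: {}}],
    },
    // Reverse the transformation using the same key and surrogate infoType.
    reidentifyConfig: {
      infoTypeTransformations: {
        transformations: [
          {
            infoTypes: [{name: 'CC_TOKEN'}],
            primitiveTransformation: {
              cryptoReplaceFfxFpeConfig: {
                cryptoKey: {
                  kmsWrapped: {wrappedKey: myWrapKey, cryptoKeyName: myKeyName},
                },
                commonAlphabet: 'NUMERIC',
                surrogateInfoType: {name: 'CC_TOKEN'},
              },
            },
          },
        ],
      },
    },
    item: {table: deidentifiedTable},
  };

  const [response] = await dlp.reidentifyContent(request);
  console.log(JSON.stringify(response.item.table, null, 2));
}
```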
Now that Kitty Catwalks has added crypto-based tokenization to their DLP know-how, their security and compliance teams can sleep soundly knowing that their customer billing information is well protected on their systems. Next time, we'll take what they've learned so far and see how they'll use it to construct DLP templates, so they can reuse inspection and de-identification rules across different systems and processes. See you soon.