Oracle GoldenGate replication + AES encryption on the fly – part 1 of 2

Let’s consider a poor man’s data encryption in the database. We’ll use AES128 CBC algorithm to encrypt the data. The application is written in such way that it can encrypt/decrypt the data while doing database operations. This solution is license free, works with no additional database licenses. A new requirement has arrived – the data must be replicated to the target database, which does not use any encryption. The replication and decryption would be performed by Oracle GoldenGate. Let’s see how this can be done and what is the overhead of the decryption process.

As told before, we can’t afford Oracle TDE licenses to use encryption in the database. They are too expensive. We have to encrypt the data by our self. The data will be decrypted during data replication by the Replicat proces. The data will be decrypted in the target database. This solution is good when the source database is in cloud (the data is encrypted) but the target database is on-premise.

The key question is how to decrypt the data during replication by OGG. There are actually two options. Either the decryption can be done using some external SQL calls (SQLEXEC) or using User Exit Function (custom C code).

Let’s test the following 3 scenarios:

  1. Plain text replication without decryption (to compare the speed),
  2. Decryption using Oracle DBMS_CRYPTO function called with SQLEXEC,
  3. Decryption using custom C library using User Exit Function.

1. Environment

The database version used in this test is Oracle 12.1.0.2, and the OGG version is 12.2.0.1.170221.

The data is encrypted using classic AES128 CBC algorithm. In this test only VARCHAR2 data is going to be encrypted. The same technology should also work for other data types. The data is encrypted/decrypted using following PL/SQL functions:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
--
-- DECRYPT (Function)
--
CREATE OR REPLACE FUNCTION GGUSER.decrypt128(ciphertext IN RAW, p_key IN RAW) RETURN VARCHAR2 IS
  plaintext varchar2(8000);
  l_mod NUMBER := dbms_crypto.ENCRYPT_AES128 + dbms_crypto.CHAIN_CBC + dbms_crypto.PAD_PKCS5;
BEGIN
  plaintext := UTL_I18N.RAW_TO_CHAR(dbms_crypto.decrypt(ciphertext, l_mod, p_key), 'AL32UTF8');
  RETURN plaintext;
END;
/
 
--
-- ENCRYPT (Function)
--
CREATE OR REPLACE FUNCTION GGUSER.encrypt128(plaintext IN varchar2, p_key IN raw) RETURN raw IS
  ciphertext raw(4000);
  l_mod NUMBER := dbms_crypto.ENCRYPT_AES128 + dbms_crypto.CHAIN_CBC + dbms_crypto.PAD_PKCS5;
BEGIN
  ciphertext := dbms_crypto.encrypt(UTL_I18N.STRING_TO_RAW(plaintext, 'AL32UTF8'), l_mod, p_key);
  RETURN ciphertext;
END;
/

A simple test shows that the functions are working correct:

1
2
3
4
5
6
7
8
9
10
11
SQL> SELECT GGUSER.encrypt128('text', '123456789ABCDEF0123456789ABCDEF0') FROM dual;
 
GGUSER.ENCRYPT('TEKST','123456789ABCDEF0123456789ABCDEF0
--------------------------------------------------------
31316BED58349EBE609D2FDE7B939366
 
SQL> select GGUSER.decrypt128('31316BED58349EBE609D2FDE7B939366', '123456789ABCDEF0123456789ABCDEF0') from dual;
 
GGUSER.DECRYPT('31316BED58349EBE609D2FDE7B939366','123456789ABCDEF0123456789ABCDEF0
-----------------------------------------------------------------------------------
text

Table schema for the test is:

1
2
3
4
5
CREATE TABLE jan.TESTDATA (
  test_id      NUMERIC       NOT NULL,
  DATA         varchar2(100) NOT NULL
);
ALTER TABLE jan.TESTDATA ADD CONSTRAINT TESTDATA_PK PRIMARY KEY(test_id);

Both tests are performed on 2 hosts:

  • Host 1: Intel x86_64, Linux, SSD disk – for Oracle GoldenGate
  • Host 2: Intel x86_64, Linux, SSD disk – for Oracle target Database

Both hosts are both connected with 1Gbps direct connection. The ping is 0.5ms.

2. Test scenarios

In the test scenario is:

  1. Suspend the Replicat process
  2. Run 4 times 1M INSERT’s in 10k transactions on the source database
  3. Wait till the whole data gets into the trail used by the Replicat process
  4. Start the Replicat process
  5. Calculate the time how long takes to apply the data from the trails to the target database

Options I have used are:

  • BATCHSQL
  • INTEGRATED REPLICAT

The test scenario is:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
SELECT GGUSER.encrypt128('Hello world!', '123456789ABCDEF0123456789ABCDEF0') AS encrypted FROM dual;
-> 5DDA280C07C157843F196E83F21240E8
 
SELECT GGUSER.encrypt128('Rumor on the internets!', '123456789ABCDEF0123456789ABCDEF0') AS encrypted FROM dual;
-> 46A743880FA5B24CA2771426A0C37B420853A25D929C09058C8FA2C25A84A61B
 
SELECT GGUSER.encrypt128('The quick brown fox jumps over the lazy dog!', '123456789ABCDEF0123456789ABCDEF0') AS encrypted FROM dual;
-> B4655970B97C1D1C0D7123CB8C4404CC6B0CCB6F856AF3EB168D83DC6396625E68E40A8AB6BA62B505D44D865AF0DD49
 
SELECT GGUSER.encrypt128('The quick brown fox jumps over the lazy dog and runs away!', '123456789ABCDEF0123456789ABCDEF0') AS encrypted FROM dual;
-> B4655970B97C1D1C0D7123CB8C4404CC6B0CCB6F856AF3EB168D83DC6396625EAC0F57A4C9D8B0C77EA19189831BFC5ABF35D958689250F04DD012563D50D25C
 
 
INSERT INTO jan.MARKER SELECT sysdate FROM dual;
commit;
BEGIN
    FOR i IN 1..100
    LOOP
        FOR j IN 1..10000
        LOOP
            INSERT INTO jan.TESTDATA VALUES (jan.TESTDATA_SEQ1.nextval, '5DDA280C07C157843F196E83F21240E8');
        END LOOP;
        commit;
   END LOOP;
END;
/
INSERT INTO jan.MARKER SELECT sysdate FROM dual;
commit;
BEGIN
    FOR i IN 1..100
    LOOP
        FOR j IN 1..10000
        LOOP
            INSERT INTO jan.TESTDATA VALUES (jan.TESTDATA_SEQ1.nextval, '46A743880FA5B24CA2771426A0C37B420853A25D929C09058C8FA2C25A84A61B');
        END LOOP;
        commit;
   END LOOP;
END;
/
INSERT INTO jan.MARKER SELECT sysdate FROM dual;
commit;
BEGIN
    FOR i IN 1..100
    LOOP
        FOR j IN 1..10000
        LOOP
            INSERT INTO jan.TESTDATA VALUES (jan.TESTDATA_SEQ1.nextval, 'B4655970B97C1D1C0D7123CB8C4404CC6B0CCB6F856AF3EB168D83DC6396625E68E40A8AB6BA62B505D44D865AF0DD49');
        END LOOP;
        commit;
   END LOOP;
END;
/
INSERT INTO jan.MARKER SELECT sysdate FROM dual;
commit;
BEGIN
    FOR i IN 1..100
    LOOP
        FOR j IN 1..10000
        LOOP
            INSERT INTO jan.TESTDATA VALUES (jan.TESTDATA_SEQ1.nextval, 'B4655970B97C1D1C0D7123CB8C4404CC6B0CCB6F856AF3EB168D83DC6396625EAC0F57A4C9D8B0C77EA19189831BFC5ABF35D958689250F04DD012563D50D25C');
        END LOOP;
        commit;
   END LOOP;
END;
/
INSERT INTO jan.MARKER SELECT sysdate FROM dual;
commit;

In the end I will calculate the average speed of data replication of 1M INSERT commands.

3. Test #1 – no decryption

During no decryption the Replicat code is simple – the data on the target would be the same as on the source:

1
MAP jan.TESTDATA, TARGET jan.TESTDATA;

4. Test #2 – Decryption using DBMS_CRYPT and SQLEXEC call

To decrypt the data using DBMS_CRYPTO in the target database an SQLEXEC call has to be made:

1
2
3
4
MAP jan.TESTDATA, &
  TARGET jan.TESTDATA, &
  SQLEXEC (ID DECRYPT, QUERY "select GGUSER.decrypt(:data, :key) as decrypted from dual", PARAMS (DATA = DATA, KEY = '123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0')), &
  COLMAP (USEDEFAULTS, DATA = @GETVAL(DECRYPT.decrypted));

5. Test #3 – Decryption using C code and User Exit Function

The last scenario is using my custom code of C function that is decrypting the data:

1
2
CUSEREXIT ueAES128CBC.so CUSEREXIT
MAP jan.TESTDATA, TARGET jan.TESTDATA;

I will not publish the code of the program since it is not finished and does not handle all the possible errors. The code is simply waiting for EXIT_CALL_PROCESS_RECORD event and when this event is called it checks if the appropriate table and column name appears. If it matches, the data is decrypted and returned to the Replicat process. No additional external calls are being made. No network calls are done. Everything is executed on the OGG host.

6. Results

As one might expect, the test 2 is heavy dependent on network latency. If everything sits on hosts then it might work fast. But if OGG is on a separate host than the database – then every call to decrypt a value will require a round trip and is very dependent on network latency.

Results:

No Test Time Time relative to baseline
1. No decryption 19s 1.00x
2. SQLEXEC and call to DBMS_CRYPTO on the target database 19 m. 09s 60.47x
3. Custom C code and User Exit Function 16s 0.84x

The first obvious conclusion is that running SQLEXEC through the network is horribly slow. It is unacceptable as a solution for replicating large amount of data as the overhead is too big.

What is more surprising, the cost of running a user exit causes a smaller row to be inserted into the database. This is faster then just inserting encrypted data in VARCHAR2. Maybe binary encrypted data would be faster (stored in VARBINARY, not VARVHAR). I would also like to add that the time of encryption is very short in comparison to the time of data processing by the database. Using another encryption algorithm like AES256 would make very little difference on the time of the replication.

This concludes the first part of encryption tests. To be continued.

Leave a Comment

Your email address will not be published. Required fields are marked *