Back in the future 2014

Our customer reported an Oracle error about a wrong flashback date in an expdp command…

In the production environment, all metadata is extracted nightly as a backup by the Oracle expdp tool. The flashback time is SYSDATE.
At the moment the process is started, the timestamp is remembered by the expdp tool.
Before expdp really starts the extract, all syntax, consistency, etc. checks are done. This takes some time.
During this time the Oracle/UNIX server moved to the new time, one hour earlier (the end of daylight saving time).
So expdp got the command to export data with a timestamp about one hour in the future. Oracle would have to be a prophet, so it generates an error message instead.
I can predict the situation will come again in about 365 days šŸ™‚
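One way to avoid the ambiguous hour is to pin the export to an SCN instead of a wall-clock time. A minimal sketch, assuming a DUMP_DIR directory object and made-up file names:

-- capture a consistent point in time as an SCN instead of SYSDATE
select current_scn from v$database;

# pass the SCN to expdp instead of FLASHBACK_TIME
expdp system full=y content=metadata_only directory=DUMP_DIR dumpfile=meta.dmp flashback_scn=<SCN from the query above>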

Lessons learned: Data Pump

As a developer you never know how powerful and dangerous Data Pump is.
During an expdp you will never destroy anything. If something is missing, Oracle will tell you gently.
But the import is very, very powerful, and you can easily destroy the instance, especially if you use the SYS user for the import.
Because of that, the best thing you can do for your system is to back it up.
The errors I have seen are caused by the option TABLE_EXISTS_ACTION. The TRUNCATE and REPLACE values are the dangerous ones, depending on what you are doing with the database.
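For illustration, a hedged sketch of such an import; the schema, directory object, and dump file names are made up. With REPLACE, every existing table in the target schema is dropped and re-created from the dump:

impdp system schemas=APP directory=DUMP_DIR dumpfile=app.dmp table_exists_action=replace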
First scenario: export a user schema and import it into another instance where the schema doesn't exist. You just copy/paste the export command and edit it, but forget to change the SID. Whoops, the existing schema is overwritten. What happened to the data inserted in the meantime is the first question. The really big problem is if you import metadata only; in that case all the data is gone. The backup is really helpful.
The other scenario: a full database export as backup, then an import of data only with TRUNCATE as TABLE_EXISTS_ACTION, started as SYS. Can you follow me? The SYS objects get truncated…
Backup is helpful.
To summarize: do a backup before the import. You never know which mistake will be the next.
And even better, use an RMAN backup as the permanent solution…
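A minimal RMAN sketch, assuming ARCHIVELOG mode and a configured backup destination:

$ rman target /
RMAN> backup database plus archivelog;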

SYSAUX Tablespace maintenance 2

https://deputydba.wordpress.com/2014/09/10/sysaux-tablespace-maintenance/

I have got the results of the installed patch.

The patch works exactly the way the undocumented workaround does:

Oracle splits the existing partition; the new partition is created empty, with only the initial extent allocated. No data movement is initiated, so the whole task takes less than a second. New data is stored in the new partition. The old partition becomes obsolete after the retention period (8 days by default), and the standard maintenance job drops it. The weekend maintenance window is not needed; that was just a coincidence in my first test.

The results of the bugfix: Oracle automatically splits the partitions as described above, and the old partitions are dropped.
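For reference, the daily numbers below come from a simple size check. A hedged sketch: sysaux_size_log is a hypothetical table filled once a day by a job, and PREV_GBSIZE/DIFF are derived with LAG:

-- gbsize is logged daily, e.g. round(sum(bytes)/1024/1024/1024, 2)
-- from dba_data_files where tablespace_name = 'SYSAUX'
select thedate,
       gbsize,
       lag(gbsize) over (order by thedate) prev_gbsize,
       gbsize - lag(gbsize) over (order by thedate) diff
  from sysaux_size_log
 order by thedate;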

THEDATE      GBSIZE  PREV_GBSIZE  DIFF
09.09.2014   2,02
10.09.2014   2,02    2,02         0
11.09.2014   2,03    2,02         0,01
12.09.2014   2,04    2,03         0,01
13.09.2014   2,05    2,04         0,01
14.09.2014   2,06    2,05         0,01
15.09.2014   2,07    2,06         0,01
16.09.2014   0,41    2,07         -1,66

SYSAUX tablespace maintenance

Based on the Metalink note 387914.1:

Problem description

The SYSAUX tablespaces in our IPCC environments are growing permanently. The analysis of the problem shows that the AWR snapshots are deleted after the retention period, as designed. Problematic are the history tables, WRH$_ACTIVE_SESSION_HISTORY and WRH$_EVENT_HISTOGRAM, among others. The older snapshot history is kept by Oracle, and the data is stored in only one single partition. The partition would be split after its size reached a threshold, which never happened in our environment.
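To confirm which component is consuming the space, the standard occupants view can be queried; AWR shows up as the SM/AWR occupant:

select occupant_name, round(space_usage_kbytes/1024/1024, 2) gb
  from v$sysaux_occupants
 order by space_usage_kbytes desc;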

The reason for the problem, as described by Oracle:

Oracle decides what rows need to be purged based on the retention policy. There is a special mechanism which is used in the case of the large AWR tables where we store the snapshot data in partitions. One method of purging data from these tables is by removing partitions that only contain rows that have exceeded the retention criteria. During the nightly purge task, we only drop the partition if all the data in the partition has expired. If the partition contains at least one row which, according to the retention policy shouldn’t be removed, then the partition won’t be dropped and as such the table will contain old data.

If partition splits do not occur (for whatever reason), then we can end up with a situation where we have to wait for the latest entries to expire before the partition that they sit in can be removed. This can mean that some of the older entries can be retained significantly past their expiry date. The result of this is that the data is not purged as expected.

Solutions

The workaround solution suggested by Oracle, not documented officially

A potential solution to this issue is to manually split the partitions of the partitioned AWR objects, so that there is more chance of a split partition being purged. You will still have to wait for all the rows in the new partitions to reach their retention time, but with split partitions there is more chance of this happening. You can manually split the partitions using the following undocumented command:

alter session set "_swrf_test_action" = 72;

to perform a single split of all the AWR partitions.

Test in our environment:

SELECT owner,
  segment_name,
  partition_name,
  segment_type,
  bytes/1024/1024/1024 Size_GB
FROM dba_segments
WHERE segment_name='WRH$_ACTIVE_SESSION_HISTORY';

SEGMENT_NAME                                                                    
------------------------------------------------------------
PARTITION_NAME                 SEGMENT_TYPE          SIZE_GB
------------------------------ ------------------ ----------
WRH$_ACTIVE_SESSION_HISTORY                                                     
WRH$_ACTIVE_3551044952_19447   TABLE PARTITION    .010742188
                                                                                
WRH$_ACTIVE_SESSION_HISTORY                                                     
WRH$_ACTIVE_SES_MXDB_MXSN      TABLE PARTITION    6.1035E-05
                                                                                

2 rows selected.

The table has only two partitions: the MAXVALUE partition and a single data partition containing all the data.

alter session set "_swrf_test_action" = 72;

The execution can take a few minutes. After executing the previous SELECT again, we can see that one more partition has been created:

SEGMENT_NAME
------------------------------------------------------------
PARTITION_NAME                 SEGMENT_TYPE          SIZE_GB
------------------------------ ------------------ ----------
WRH$_ACTIVE_SESSION_HISTORY                                                     
WRH$_ACTIVE_3551044952_19447   TABLE PARTITION    .010742188
                                                                                
WRH$_ACTIVE_SESSION_HISTORY                                                     
WRH$_ACTIVE_3551044952_25147   TABLE PARTITION    6.1035E-05
                                                                                
WRH$_ACTIVE_SESSION_HISTORY                                                     
WRH$_ACTIVE_SES_MXDB_MXSN      TABLE PARTITION    6.1035E-05
                                                                                

3 rows selected.

Note that this command will split partitions for ALL partitioned AWR objects. It also initiates a single split; it does not need to be disabled and will need to be repeated if multiple splits are required.

After the partition split, we can only pray that the partitions will age out and be dropped by the nightly job.
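To check which retention the nightly purge job actually applies, the AWR control settings can be queried:

select snap_interval, retention from dba_hist_wr_control;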

The workaround solution suggested by Oracle, documented officially

We can see that the history tables contain many more, and much older, snapshots than the snapshot table. In the current snapshot table:

select count(*), max(snap_id), min(snap_id) from DBA_HIST_SNAPSHOT;


  COUNT(*) MAX(SNAP_ID) MIN(SNAP_ID)
---------- ------------ ------------
       180        25144        24965

In the history tables:

select count(distinct(snap_id)), max(snap_id), min(snap_id) from WRH$_ACTIVE_SESSION_HISTORY  ;

COUNT(DISTINCT(SNAP_ID)) MAX(SNAP_ID) MIN(SNAP_ID)
------------------------ ------------ ------------
                    5674        25144        19447
1 row selected.

To get rid of the old entries we use the DBMS_WORKLOAD_REPOSITORY package:

begin
  DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE( 
  low_snap_id => 19447, 
  high_snap_id => 24964 
 );
end;
/

After about 32 minutes of processing, the result is:

select count(distinct(snap_id)), max(snap_id), min(snap_id) from WRH$_ACTIVE_SESSION_HISTORY  ;

COUNT(DISTINCT(SNAP_ID)) MAX(SNAP_ID) MIN(SNAP_ID)
------------------------ ------------ ------------
                     180        25145        24965
1 row selected.

For this solution we have to keep in mind:

* The old entries will be deleted from ALL AWR tables.
* The space will not be freed because of the high water mark.
* The growth of the tablespace will only be stopped for a while.

The permanent Oracle bugfix, documented officially

The problem has been reported as a bug in Oracle Metalink, and there is an interim patch for our Linux Oracle version.
The patch number is 14084247 and it can be found in Metalink.
The test results will follow.

Results

* Both options have to be repeated after a reasonable period.
* They can be combined, but they don't have to be.

The workaround with splitting partitions

The workaround with splitting partitions works fine. The partitions were dropped in the weekend maintenance window on Friday evening.
The test database results look as follows:

Y database:

THEDATE      GBSIZE  PREV_GBSIZE  DIFF
02.09.2014   1,86
03.09.2014   1,86    1,86         0
04.09.2014   1,87    1,86         0,01
05.09.2014   0,5     1,87         -1,37
06.09.2014   0,5     0,5          0
07.09.2014   0,51    0,5          0,01
08.09.2014   0,52    0,51         0,01
09.09.2014   0,52    0,52         0

Database X:

THEDATE      GBSIZE  PREV_GBSIZE  DIFF
02.09.2014   1,91
03.09.2014   1,91    1,91         0
04.09.2014   1,92    1,91         0,01
05.09.2014   0,39    1,92         -1,53
06.09.2014   0,4     0,39         0,01
07.09.2014   0,41    0,4          0,01
08.09.2014   0,42    0,41         0,01
09.09.2014   0,42    0,42         0

Workaround with deleting data

Since the stored procedure was run, the SYSAUX tablespace has stopped growing. It was tested on the Y database:

THEDATE      GBSIZE  PREV_GBSIZE  DIFF
01.09.2014   1,99
02.09.2014   1,99    1,99         0
03.09.2014   2,01    1,99         0,02
04.09.2014   2,01    2,01         0
05.09.2014   2,01    2,01         0
06.09.2014   2,01    2,01         0
07.09.2014   2,01    2,01         0
08.09.2014   2,01    2,01         0

Permanent Patch

The test results after the weekend maintenance window will follow.

Extension of the existing mounted filesystem

The problem

Oracle EM Grid Control sent an alert because a file system was almost full. When I had a look, two mount points had only 3% and 4% free space, plus a couple of autoextensible data files. The system could crash at any moment.
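The usual quick check; the mount point names here are placeholders:

# df -h /u02 /u03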

Solution

Extend the existing file system (ext3).

Technical environment

An MSA2312fc Fibre Channel array with 10 HDDs combined into one 1.5 TByte logical volume. RedHat Linux 5.8, Oracle 11.2 on a RedHat cluster.

First step

The first part is very specialized, so I will just mention it without details. We use the HP Storage Management Utility; with it I gave 15 GB extra to every virtual disk.

On the Linux side

  • To make the size changes visible on Linux, we have to rescan the array. We have a special HP tool for that:
# hp_rescan -a
Rescanning /sys/class/scsi_host/host2
Rescanning /sys/class/scsi_host/host1
  • With multipath -l we can list the available virtual disks:
# multipath -l


  • And with pvdisplay we can list the existing physical volumes and their characteristics:
# pvdisplay /dev/mpath/LOG_*
  • With pvresize we extend the needed volume:
# pvresize --setphysicalvolumesize 60G /dev/mpath/LOG_u03
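A safer variant, based on standard LVM behaviour, is to omit the explicit size and let pvresize take the new size from the device itself; this avoids requesting more blocks than the array provides, the mistake described in the troubleshooting section below:

# pvresize /dev/mpath/LOG_u03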

If we were creating a new volume, we would have to work with the volume group, which can be inspected with

# vgdisplay

but in our case that is not necessary.

  • Resize the logical volume together with the file system (the -r flag):
# lvresize /dev/VG_LOG_u03/LogVol00 -l +100%FREE -r
  • Or without the file system resize:
# lvresize /dev/VG_LOG_u03/LogVol00 -l +100%FREE
  • In this case we have to expand the file system to the logical volume size in an additional step:
# resize2fs /dev/VG_LOG_u03/LogVol00

Troubleshooting

We extended the physical volume to a couple of blocks more than the Fibre Channel array provides. As a result, the existing file system was unmounted and unusable.

To fix this, the Fibre Channel virtual disk has to be extended to cover the mistake, and the changes re-scanned under Linux.

If that doesn't help, as in our case, try this:

# vgscan --mknodes
# pvscan
# vgchange -ay
# vgdisplay -v
# lvscan

After a cluster reboot, the clusterware was able to remount the file system and start the database. After that, we extended the file system successfully.