Thursday, August 29, 2013

proving bad CPU


Acting as DBA today, I ran into what looks like bad CPU performance.
The Oracle database is running on an Oracle/SPARC virtualization system.

AWR/Statspack shows:
Event                      Waits   Time(s)  Avg wait (ms)  % DB time  Wait Class
DB CPU                              11,825                     87.51
log file sync             21,933     1,292             59       9.56  Commit
db file parallel read     16,277       268             16       1.98  User I/O

But in the "SQL by CPU Time" section I see no single demanding statement;
all of them seem to use a lot of CPU, evenly.

Working on the system, everything is slow, even 'autotrace' and 'explain plan'.

I strongly suspect it's the system's fault. Maybe some other VM on the machine is hogging all the CPU.
I reported it to the sysadmin team, but I'd like proof.

I thought about writing a CPU-intensive PL/SQL job, but that won't convince the system guys. It may be Oracle's fault for all they know.

So I quickly wrote a Bash prime computation script and compared the results to an old PC at home:

$ cat p.sh
#!/bin/bash

if [ -z "$1" ]; then
  echo usage: $0 MAXNUM; exit 1
fi
nmax=$1

primes=(2)
n=3

while [ $n -le $nmax ]; do
  #echo checking $n
  isprime=1
  for d in ${primes[*]} ; do
        #echo check division by $d
        r=$(( $n % $d ))
        if [ $r -eq 0 ]
        then
          isprime=0
          #echo No: $d divides $n
          break
        fi
  done

  if [ $isprime -eq 1 ]; then
    primes=(${primes[*]} $n)   # primes+=($n) does not work on older bash (Solaris 10)
  fi

  #echo primes = ${primes[*]}
  n=$(( $n + 2 ))
done

echo primes = ${primes[*]}





Results are indeed bad:

customer_system$ time ./p.sh 2000
primes = 2 3 5 7 11 ...
real    0m9.376s
user    0m8.496s
sys     0m1.182s

my_old_pc$ time ./p.sh 2000
primes = 2 3 5 7 11 ...
real    0m2.594s
user    0m2.348s
sys     0m0.226s


A VM on a 5-year-old PC is more than 3x faster! Wow!
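
A single timing can be noisy, so repeating the run makes the comparison harder to dismiss. Here is a minimal sketch, assuming bash with the `time` keyword and `TIMEFORMAT`; the `bench` function name is my own, not part of the scripts above:

```shell
#!/bin/bash
# Hypothetical helper: run a command several times and print the wall-clock
# seconds of each run, so one noisy measurement does not decide the verdict.
bench() {
  local runs=$1; shift
  local i=0
  local TIMEFORMAT='%R'      # make the `time` keyword print only real seconds
  while [ $i -lt $runs ]; do
    # The command's own output is discarded; only the timing line is kept.
    { time "$@" >/dev/null 2>&1; } 2>&1
    i=$(( i + 1 ))
  done
}

# Example: bench 5 ./p.sh 2000
```

It prints one line per run, which can be collected at different times of day on both machines.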

Then I thought: these are T-series SPARC systems. I imagine they might tell me this could be normal, since they are designed for parallel-intensive workloads (still, a 5-year-old PC...).
But to be sure, I also tried some parallel work:

$ cat para.sh
#!/bin/bash

if [ -z "$2" ]; then
  echo usage: $0 parallel_degree primemax; exit 1
fi

max=$1
i=0

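# Note: i steps by 2 and the test is -le, so this starts (max/2)+1 jobs,
# e.g. 6 copies of p.sh for a "parallel_degree" of 10.
while [ $i -le $max ]; do
  ./p.sh $2 &

  i=$(( $i + 2 ))
done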
wait
echo fin
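
If the first argument is meant to be the exact number of concurrent jobs, a counter that steps by 1 does it. This is only a sketch of such a variant (the `run_parallel` name is hypothetical); the timings below were taken with the original para.sh, not with this version:

```shell
#!/bin/bash
# Hypothetical variant of para.sh: start exactly $1 background copies of a command.
run_parallel() {
  local degree=$1; shift
  local i=0
  while [ $i -lt $degree ]; do
    "$@" &                   # launch one copy in the background
    i=$(( i + 1 ))
  done
  wait                       # block until every background job has finished
}

# Example: time run_parallel 10 ./p.sh 500
```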


Results:

my_old_pc$ time ./para.sh 10 500
primes = 2 3 5 7 ...
primes = 2 3 5 7 ...

primes = 2 3 5 7 ...
...
fin
real    0m1.487s
user    0m1.288s
sys     0m0.184s


customer_system$ time ./para.sh 10 500
primes = 2 3 5 7 ...
primes = 2 3 5 7 ...primes = 2 3 5 7 ...
...
fin
real    0m2.427s
user    0m5.027s
sys     0m4.148s




Less bad, but still BAD. There is definitely a weak CPU on this system causing this bad DB performance.
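
To put numbers on it, the ratios can be computed from the timings reported above. A small sketch, assuming an awk that supports `-v` (on Solaris 10 that would be nawk rather than the default awk); the function names are mine:

```shell
# Ratio of two timings, customer system over old PC, to one decimal.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'; }

# Ratio of total CPU time (user + sys) between two runs.
cpu_ratio() {
  awk -v u1="$1" -v s1="$2" -v u2="$3" -v s2="$4" \
      'BEGIN { printf "%.1f\n", (u1 + s1) / (u2 + s2) }'
}

ratio 9.376 2.594                   # single process, wall clock -> 3.6
ratio 2.427 1.487                   # parallel run, wall clock   -> 1.6
cpu_ratio 5.027 4.148 1.288 0.184   # parallel run, user+sys     -> 6.2
```

So the parallel run hides part of the gap in wall-clock time, but the customer system still burns about six times more CPU for the same work.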