
Wednesday, February 15, 2012

DWH problem: updating a table with a checkpoint every 1000 records

Hi,

Currently we're building a metadata-driven data warehouse in SQL
Server 2000. We're investigating how to update tables with an enormous
number of updates and inserts, and the use of checkpoints (for the
simple recovery model) and BACKUP LOG (for the full recovery model).
On several websites people describe the transaction log filling up
because it can't grow fast enough to keep up with the update.
Therefore we want to create a script which flushes the dirty pages to
disk. It's not quite clear to me how this works. The questions we have
are:
* How does the process of updating, inserting and deleting work in SQL
Server 2000 with respect to the log cache, log file, buffer cache, commit,
checkpoint, etc.?
What happens when?
* As far as I can see now, I'm thinking of creating chunks of 1000 records
with a checkpoint after each query. SQL Server has implicit transactions
as its default, so it will not need a commit. Something like this?
* How do I create chunks of 1000 records automatically without creating
an identity field or something? Is there something like SELECT NEXT 1000?

Greetz,

Hennie

> * How does the process of updating, inserting and deleting work in SQL
> Server 2000 with respect to the log cache, log file, buffer cache, commit,
> checkpoint, etc.?
> What happens when?

There's a discussion of this in the Books Online
<architec.chm::/8_ar_sa_8unn.htm>. In summary, all data modifications are
written to both the buffer cache and transaction log buffer cache. The log
buffer is written first. Modified buffer cache pages are written to disk by
either worker threads, the lazy writer or the checkpoint process. Worker
threads write data when they need to wait for a read i/o anyway. The lazy
writer process wakes up periodically and writes data in order to maintain a
reasonable number of free cache buffers. The checkpoint process writes all
dirty pages to disk periodically (about a minute by default) in order to
reduce startup recovery time.

Log buffers are written as they become full, when you issue a COMMIT and
during a checkpoint.
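
If you want to see this behaviour for yourself, a minimal sketch (the table
and column names here are only placeholders) is to look at the log before
and after a modification and a manual CHECKPOINT:

DBCC SQLPERF(LOGSPACE)        -- log size and percent used, per database

UPDATE dbo.MyBigTable         -- placeholder table; one autocommit transaction
SET    SomeFlag = 1
WHERE  LoadDate = '20040209'

DBCC SQLPERF(LOGSPACE)        -- percent used goes up: the change was logged first

CHECKPOINT                    -- force the dirty data pages to disk now; in the
                              -- SIMPLE recovery model this also lets the
                              -- inactive part of the log be reused

DBCC SQLPERF(LOGSPACE)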

> * As far as I can see now, I'm thinking of creating chunks of 1000 records
> with a checkpoint after each query. SQL Server has implicit transactions
> as its default, so it will not need a commit. Something like this?
> * How do I create chunks of 1000 records automatically without creating
> an identity field or something? Is there something like SELECT NEXT 1000?

I'm not sure I understand what you are asking here. In the default
autocommit mode, each statement is a separate transaction.

It seems you are mostly concerned with transaction log space management.
The main consideration is to keep your transactions reasonably sized since
only committed data can be removed from the log during a log backup (FULL or
BULK_LOGGED recovery model) or checkpoint (SIMPLE model).
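
In practice that boils down to one of these two between your batches (the
database name and backup path are just examples):

-- FULL or BULK_LOGGED recovery model: committed log records are freed by a
-- transaction log backup
BACKUP LOG MyDwhDb TO DISK = 'D:\Backup\MyDwhDb_log.bak'

-- SIMPLE recovery model: committed log records are freed at checkpoint, which
-- happens automatically but can also be issued by hand between batches
CHECKPOINT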

--
Hope this helps.

Dan Guzman
SQL Server MVP

"Hennie de Nooijer" <hdenooijer@.hotmail.com> wrote in message
news:191115aa.0402090341.23275aae@.posting.google.c om...
> Hi,
> Currently we're a building a metadatadriven datawarehouse in SQL
> Server 2000. We're investigating the possibility of the updating
> tables with enormeous number of updates and insert and the use of
> checkpoints (for simple recovery and Backup Log for full recovery).
> On several website people speak about full transaction log and the
> pace of growing can't keep up with the update. Therefore we want to
> create a script which flushes the dirty pages to the disk. It's not
> quite clear to me how it works. Questions we have is:
> * How does the process of updating, insert and deleting works with SQL
> Server 2000 with respect to log cache, log file, buffer cache, commit,
> checkpoint, etc?
> What happens when?
> * As far as i can see now: i'm thinking of creating chunks of data of
> 1000 records with a checkpoint after the Query. SQL server has the
> default of implicit transactions and so it will not need a commit.
> Something like this?
> * How do i create chunks of 1000 records automatically without
> creating a identity field or something. Is there something like SELECT
> NEXT 1000?
> Greetz,
> Hennie|||Hennie de Nooijer (hdenooijer@.hotmail.com) writes:
> * How does the process of updating, inserting and deleting work in SQL
> Server 2000 with respect to the log cache, log file, buffer cache, commit,
> checkpoint, etc.?
> What happens when?
> * As far as I can see now, I'm thinking of creating chunks of 1000 records
> with a checkpoint after each query. SQL Server has implicit transactions
> as its default, so it will not need a commit. Something like this?
> * How do I create chunks of 1000 records automatically without creating
> an identity field or something? Is there something like SELECT NEXT 1000?

I hope Dan's reply shed some light on how the log works.

As for breaking up in chunks, there is nothing built-in like:

UPDATE tbl
SET yadayada
...
WITH COMMIT EACH 1000 ROW

For DML statements you will have to find out how to do the batching best
from your data. Note there is no reason to keep a strict batch size, but
if you have some column that divides the data nicely that can be used.
For instance, if you are handling order data, you can take one day or
month at a time.
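
A sketch of that, with invented table and column names:

DECLARE @month datetime
SELECT  @month = MIN(OrderMonth) FROM dbo.OrderStage

WHILE @month IS NOT NULL
BEGIN
   -- one month = one autocommit transaction = one manageable piece of log
   UPDATE o
   SET    o.Amount = s.Amount
   FROM   dbo.Orders o
   JOIN   dbo.OrderStage s ON s.OrderID = o.OrderID
   WHERE  s.OrderMonth = @month

   SELECT @month = MIN(OrderMonth) FROM dbo.OrderStage
   WHERE  OrderMonth > @month
END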

The one case where you can tell SQL Server to commit by each n row is
when you bulk load with BCP or BULK INSERT.
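
For completeness, that looks like this (the file and table names are invented):

BULK INSERT dbo.OrderStage
FROM 'D:\Feeds\orders.txt'
WITH (BATCHSIZE = 100000)   -- each batch of 100000 rows is committed as its own
                            -- transaction, so the log can be truncated between
                            -- batches (checkpoint in SIMPLE, log backup otherwise)

-- the bcp equivalent is the -b switch, for example:
--   bcp MyDwhDb.dbo.OrderStage in D:\Feeds\orders.txt -c -T -b 100000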

I would also like to point out that selecting the batch size is a trade-off
between log size and speed. Particularly if the search condition that
identifies a batch takes time to execute, too small a batch size can be
costly. Depending a little on row size, 100000 rows at a time is not an
unreasonable number; 1000 is by far too small.

--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp

> > * How does the process of updating, inserting and deleting work in SQL
> > Server 2000 with respect to the log cache, log file, buffer cache, commit,
> > checkpoint, etc.?
> > What happens when?
> There's a discussion of this in the Books Online
> <architec.chm::/8_ar_sa_8unn.htm>.
How does this work?

> In summary, all data modifications are
> written to both the buffer cache and transaction log buffer cache. The log
> buffer is written first. Modified buffer cache pages are written to disk by
> either worker threads, the lazy writer or the checkpoint process. Worker
> threads write data when they need to wait for a read i/o anyway. The lazy
> writer process wakes up periodically and writes data in order to maintain a
> reasonable number of free cache buffers. The checkpoint process writes all
> dirty pages to disk periodically (about a minute by default) in order to
> reduce startup recovery time.
Ok clear..

> Log buffers are written as they become full, when you issue a COMMIT and
> during a checkpoint.
I thought that a commit is not necessarily a write to the disk. A
commit is just an end mark of a transaction, as far as I can see in the
documentation. A checkpoint will write the transactions to the
database.

> > * As far as I can see now, I'm thinking of creating chunks of 1000 records
> > with a checkpoint after each query. SQL Server has implicit transactions
> > as its default, so it will not need a commit. Something like this?
> > * How do I create chunks of 1000 records automatically without creating
> > an identity field or something? Is there something like SELECT NEXT 1000?
> I'm not sure I understand what you are asking here. In the default
> autocommit mode, each statement is a separate transaction.
> It seems you are mostly concerned with transaction log space management.
> The main consideration is to keep your transactions reasonably sized since
> only committed data can be removed from the log during a log backup (FULL or
> BULK_LOGGED recovery model) or checkpoint (SIMPLE model).
Yup, this is what I meant. The only question is: what is a reasonable
chunk? Perhaps I will create a parameter for this so I can tune it.

> I hope Dan's reply shed some light on how the log works.
Yup

> As for breaking up in chunks, there is nothing built-in like:
> UPDATE tbl
> SET yadayada
> ...
> WITH COMMIT EACH 1000 ROW
> For DML statements you will have to find out how to do the batching best
> from your data. Note there is no reason to keep a strict batch size, but
> if you have some column that divides the data nicely that can be used.
> For instance, if you are handling order data, you can take one day or
> month at a time.
Yeah, well, I have an identity column which I could use for chunking the
records. The disadvantage is that the implementation then depends on the
functionality of a column in the table. It's not what I want, but I'll
have to deal with it.
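
A rough sketch of that identity-based chunking (all names are placeholders):

DECLARE @lo int, @hi int, @max int, @chunksize int
SELECT  @chunksize = 100000
SELECT  @lo = MIN(RecordID), @max = MAX(RecordID) FROM dbo.FactStage

WHILE @lo <= @max
BEGIN
   SELECT @hi = @lo + @chunksize - 1

   -- one range of ids = one autocommit transaction
   UPDATE f
   SET    f.Amount = s.Amount
   FROM   dbo.FactTable f
   JOIN   dbo.FactStage s ON s.RecordID = f.RecordID
   WHERE  s.RecordID BETWEEN @lo AND @hi

   SELECT @lo = @hi + 1
END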

> The one case where you can tell SQL Server to commit by each n row is
> when you bulk load with BCP or BULK INSERT.
Well, this is not possible. I'm reading and inserting/updating from other
tables in SQL Server, and BCP reads only from files.

> I would also like to point out that selecting the batch size is a trade-off
> between log size and speed. Particularly if the search condition that
> identifies a batch takes time to execute, too small a batch size can be
> costly. Depending a little on row size, 100000 rows at a time is not an
> unreasonable number; 1000 is by far too small.
Well, 1000 was just a hunch (is this written correctly?). I'm thinking of
creating a parameter with which we can tune the chunks.

Thx you all for your answers

Greetz
Hennie

> > There's a discussion of this in the Books Online
> > <architec.chm::/8_ar_sa_8unn.htm>.
> How does this work?

From the Books Online, you can go directly to the referenced topic by
clicking Go-->URL and pasting in the URL 'architec.chm::/8_ar_sa_8unn.htm'.

> > Log buffers are written as they become full, when you issue a COMMIT and
> > during a checkpoint.
> I thought that a commit is not necessarily a write to the disk. A
> commit is just an end mark of a transaction, as far as I can see in the
> documentation. A checkpoint will write the transactions to the
> database.

A COMMIT ensures all committed data are written to the log file so that
committed data are permanently persisted to the database. Modified data may
or may not have been written to data files because dirty data pages are
written separately by worker threads, the lazy writer or checkpoint process.
This is transparent to your application because data are always retrieved
from buffer cache.

Only committed data before the oldest uncommitted transaction can be removed
from the log. This occurs during a log backup in the FULL or BULK_LOGGED
recovery model or during a checkpoint in the SIMPLE model. Rather than
trying to take control of the checkpoint frequency, all you really need to
do is ensure your transactions are reasonably short and small.
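
To see what is currently keeping log records from being removed, you can
for example run:

DBCC SQLPERF(LOGSPACE)   -- log size and percentage in use for every database
DBCC OPENTRAN            -- oldest active transaction in the current database;
                         -- the log cannot be truncated past its starting point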

> The only question is: what is a reasonable
> chunk? Perhaps I will create a parameter for this so I can tune it.

If you are running in the SIMPLE recovery model and have many small
insert/update/delete transactions, I suggest you simply let SQL Server
truncate the log automatically. However, if you modify a lot of data (e.g.
millions of rows) in a single statement or transaction, you can use the
techniques Erland mentioned to keep the log size reasonable. In my
experience, the optimal amount is somewhat less than the amount of memory
allocated to SQL Server.
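
One way in SQL Server 2000 to make the batch size a tunable parameter (not
mentioned above, but a common idiom) is a SET ROWCOUNT loop; a sketch with
invented names:

DECLARE @batchsize int
SET @batchsize = 100000          -- the value to tune

SET ROWCOUNT @batchsize          -- limit how many rows each UPDATE touches
WHILE 1 = 1
BEGIN
   UPDATE dbo.FactTable
   SET    Processed = 1
   WHERE  Processed = 0          -- the predicate must exclude rows already done

   IF @@ROWCOUNT = 0 BREAK       -- nothing left to update

   CHECKPOINT                    -- SIMPLE model: let the log space be reused
                                 -- (use BACKUP LOG instead under FULL)
END
SET ROWCOUNT 0                   -- remove the limit again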

--
Hope this helps.

Dan Guzman
SQL Server MVP

"Hennie de Nooijer" <hdenooijer@.hotmail.com> wrote in message
news:191115aa.0402110050.7797d139@.posting.google.c om...
> > > * How does the process of updating, insert and deleting works with SQL
> > > Server 2000 with respect to log cache, log file, buffer cache, commit,
> > > checkpoint, etc?
> > > What happens when?
> > There's a discussion of this in the Books Online
> > <architec.chm::/8_ar_sa_8unn.htm>.
> How does this works?
> > In summary, all data modifications are
> > written to both the buffer cache and transaction log buffer cache. The
log
> > buffer is written first. Modified buffer cache pages are written to
disk by
> > either worker threads, the lazy writer or the checkpoint process.
Worker
> > threads write data when they need to wait for a read i/o anyway. The
lazy
> > writer process wakes up periodically and writes data in order to
maintain a
> > reasonable number of free cache buffers. The checkpoint process writes
all
> > dirty pages to disk periodically (about a minute by default) in order to
> > reduce startup recovery time.
> Ok clear..
> > Log buffers are written as they become full, when you issue a COMMIT and
> > during a checkpoint.
> It thought that a commit is not neccessary a write to the disk. A
> commit is just a end mark of a transaction as far i can see in the
> documentation. A checkpoint will write the transactions to the
> database.
> > > * As far as i can see now: i'm thinking of creating chunks of data of
> > > 1000 records with a checkpoint after the Query. SQL server has the
> > > default of implicit transactions and so it will not need a commit.
> > > Something like this?
> > > * How do i create chunks of 1000 records automatically without
> > > creating a identity field or something. Is there something like SELECT
> > > NEXT 1000?
> > I'm not sure I understand what you are asking here. In the default
> > autocommit mode, each statement is a separate transaction.
> > It seems you are mostly concerned with transaction log space management.
> > The main consideration is to keep your transactions reasonably sized
since
> > only committed data can be removed from the log during a log backup
(FULL or
> > BULK_LOGGED recovery model) or checkpoint (SIMPLE model).
> Yup this is what i meant. Only the question is what is a reasonable
> chunk? Perhaps i will create parameter for this so i can tune this.|||[posted and mailed, please reply in news]

Hennie de Nooijer (hdenooijer@.hotmail.com) writes:
> Yeah, well, I have an identity column which I could use for chunking the
> records. The disadvantage is that the implementation then depends on the
> functionality of a column in the table. It's not what I want, but I'll
> have to deal with it.

One thing that I should have added is that you should see if the
clustered index is good for chunking. That can speed up the time
it takes for SQL Server to locate the rows quite a bit.
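
You can check which index is the clustered one with sp_helpindex (the table
name is invented):

EXEC sp_helpindex 'dbo.FactTable'
-- if the chunking column (the identity, say) is the clustered key, then a
-- WHERE RecordID BETWEEN @lo AND @hi per batch becomes a clustered index
-- seek over a contiguous range instead of a scan of the whole table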

--
Erland Sommarskog, SQL Server MVP, sommar@.algonet.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp