Those who are
experienced in data warehouse solutions realize that the process of understanding
the data sources, designing the transformations, testing the loading process, and
debugging is often the most time-consuming part of deployment. Transformations
generally remove bogus data (including erroneous and duplicate entries), convert
data items to an agreed-upon format, and filter out data not considered necessary for
the warehouse. These operations not only improve the quality of the data but also
frequently reduce the overall amount of data, which in turn improves data warehouse
performance.
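As a minimal sketch of the three kinds of transformation just described, the following Python function cleans a batch of source rows before loading. The field names (order_id, order_date, amount), the rule that a missing or negative amount marks an erroneous entry, and the choice of ISO 8601 as the agreed-upon date format are all assumptions made for illustration; they do not come from any particular product or source system.

```python
from datetime import datetime

# Hypothetical list of the only columns the warehouse needs.
WAREHOUSE_COLUMNS = ("order_id", "order_date", "amount")


def transform(rows):
    """Clean raw source rows: drop bad and duplicate rows, normalize, filter."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        # Remove erroneous entries (assumed rule: amount missing or negative).
        amount = row.get("amount")
        if amount is None or amount < 0:
            continue
        # Remove duplicate entries, keeping the first row seen for each order_id.
        if row["order_id"] in seen_ids:
            continue
        seen_ids.add(row["order_id"])
        # Convert the date from the assumed source format (MM/DD/YYYY)
        # to the agreed-upon format (ISO 8601, YYYY-MM-DD).
        parsed = datetime.strptime(row["order_date"], "%m/%d/%Y")
        normalized = dict(row, order_date=parsed.date().isoformat())
        # Filter out columns not considered necessary for the warehouse.
        cleaned.append({col: normalized[col] for col in WAREHOUSE_COLUMNS})
    return cleaned
```

Note that the output can contain fewer rows and fewer columns than the input, which is the reduction in data volume mentioned above.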
The frequency of extraction and loading is largely determined by the required timeliness
of the data in the warehouse. Most extraction and loading takes place on a
"batch" basis with a known time delay (today, typically subhourly, hourly, or daily).
Many first-generation warehouses were completely refreshed during the loading process.
As data volumes grew, this became impractical due to the limited time frames
available for loading.
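To illustrate why a complete refresh scales poorly, the sketch below contrasts it with one common alternative, loading only the rows changed since the previous batch. The passage above does not prescribe this approach; the updated_at column, the in-memory list standing in for a warehouse table, and both function names are hypothetical.

```python
from datetime import datetime


def full_refresh(warehouse, source_rows):
    """Replace the entire warehouse table; work grows with total data volume."""
    warehouse.clear()
    warehouse.extend(source_rows)
    return len(source_rows)  # number of rows written


def incremental_load(warehouse, source_rows, last_loaded_at):
    """Append only rows changed after the previous load (assumed updated_at column)."""
    new_rows = [r for r in source_rows if r["updated_at"] > last_loaded_at]
    warehouse.extend(new_rows)
    # Advance the watermark to the newest row loaded, or keep it if nothing was new.
    watermark = max((r["updated_at"] for r in new_rows), default=last_loaded_at)
    return len(new_rows), watermark
```

With a full refresh, every batch rewrites all rows, so the work per load grows with the size of the source. With the incremental variant, the work per load depends only on how many rows changed since the last watermark. This sketch appends changed rows and does not handle updates to rows already loaded or deletions in the source.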