Header Ads Widget

File Organization

It is used to determine an efficient file organization for each base relation. For example, if we want to retrieve student records in alphabetical order of name, sorting the file by student name is a good file organization. However, if we want to retrieve all students whose marks is in a certain range, a file ordered by student name would not be a good file organization. Some file organizations are efficient for bulk loading data into the database but inefficient for retrieve and other activities.

The objective of this selection is to choose an optimal file organization for each relation.

Types of File Organization

In order to make effective selection of file organizations and indexes, here we present the details different types of file Organization. These are:

Heap (unordered) File Organization

An unordered file, sometimes called a heap file, is the simplest type of file organization.
Records are placed in file in the same order as they are inserted. A new record is inserted in the last page of the file; if there is insufficient space in the last page, a new page is added to the file. This makes insertion very efficient. However, as a heap file has no particular ordering with respect to field values, a linear search must be performed to access a record. A linear search involves reading pages from the file until the required is found. This makes retrievals from heap files that have more than a few pages relatively slow, unless the retrieval involves a large proportion of the records in the file.

To delete a record, the required page first has to be retrieved, the record marked as deleted, and the page written back to disk. The space with deleted records is not reused. Consequently, performance progressively deteriorates as deletion occurs. This means that heap files have to be periodically reorganized by the Database Administrator (DBA) to reclaim the unused space of deleted records.

Heap files are one of the best organizations for bulk loading data into a table, as records are inserted at the end of the sequence; there is no overhead of calculating what page the record should go on.

Pros of Heap storage

Heap is a good storage structure in the following situations:

When data is being bulk-loaded into the relation.

The relation is only a few pages long. In this case, the time to locate any tuple is Short, even if the entire relation has been searched serially.

When every tuple in the relation has to be retrieved (in any order) every time the relation is accessed. For example, retrieve the name of all the students.

Cons of Heap storage

Heap files are inappropriate when only selected tuples of a relation are to be accessed.

Hash File Organization

In a hash file, records are not stored sequentially in a file instead a hash function is used to calculate the address of the page in which the record is to be stored.

The field on which hash function is calculated is called as Hash field and if that field acts as the key of the relation then it is called as Hash key. Records are randomly distributed in the file so it is also called as Random or Direct files. Commonly some arithmetic function is applied to the hash field so that records will be evenly distributed throughout the file.

Pros of Hash file organization

Hash is a good storage structure in the following situations:

When tuples are retrieve based on an exact match on the hash field value, particularly if the access order is random. For example, if the STUDENT relation is hashed on Name then retrieval of the tuple with Name equal to “Rahat Bhatia” is efficient.

Cons of Hash file organization

Hash is not a good storage structure in the following situations:

When tuples are retrieved based on a range of values for the hash field. For example, retrieve all students whose name begins with the “R”.

When tuples are retrieved based on a range of values for the hash field. For example, if STUDENT relation has hash filed Roll Number and the query is to retrieve all students with roll numbers in the range of 3000-5000.

When tuples re retrieved based on a field other than the hash field. For example, if the STUDENT relation is hashed on Roll Number, then hashing cannot be used to search for a tuple based on the Class attribute.

When tuples are retrieved based on only part of the hash field. For example, if the STUDENT relation is hashed on Roll Number and Class, then hashing cannot be used to search for a tuple based on the class attribute alone.

When the hash field frequently updated. When a hash field updated, the DBMS must deleted the entire tuple and possible relocate it to a new address (if the has function results in a new address). Thus, frequent updating of the hash field impacts performance.


Post a Comment

0 Comments