Data Input and Output


Data Transformation (1)

  • Name
    Data Transformation
    Type
    Description
    • The process of extracting data stored in a system, transforming it, and loading it into a newly developed system for operation
    • Extraction, Transformation, Loading: ETL process
    • Also known as data migration or data transfer
  • Name
    Data Transformation Plan
    Type
    Description
    • A document that analyzes the objects requiring data transformation and records all plans necessary for data transformation work
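
The extraction, transformation, and loading sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real migration tool: the table names (`legacy_customer`, `customer`), the columns, and the transformation rule (trimming names and upper-casing a status code) are invented for the example, and an in-memory SQLite database stands in for both the old and the new system.

```python
import sqlite3

def extract(conn):
    """Extraction: read every row from the (hypothetical) legacy table."""
    return conn.execute("SELECT id, name, status FROM legacy_customer").fetchall()

def transform(rows):
    """Transformation: trim names and normalize status codes to upper case."""
    return [(row_id, name.strip(), status.upper()) for row_id, name, status in rows]

def load(conn, rows):
    """Loading: insert the transformed rows into the (hypothetical) new table."""
    conn.executemany("INSERT INTO customer (id, name, status) VALUES (?, ?, ?)", rows)
    conn.commit()

def run_etl(conn):
    """Run the three steps in order and return the number of rows moved."""
    rows = extract(conn)
    load(conn, transform(rows))
    return len(rows)

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE legacy_customer (id INTEGER, name TEXT, status TEXT)")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO legacy_customer VALUES (?, ?, ?)",
        [(1, "  Kim ", "a"), (2, "Lee", "x")],
    )
    run_etl(conn)
    return conn.execute("SELECT id, name, status FROM customer ORDER BY id").fetchall()

if __name__ == "__main__":
    print(demo())
```

Keeping the three steps as separate functions makes it possible to log and validate each stage on its own.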

Data Validation (2)

  • Name
    Data Validation
    Type
    Description
    • The process of verifying whether the data transformation process has been executed correctly
    • Data validation can be classified into validation methods and validation stages.
  • Name
    Validation Methods
    Type
    Description
    • Log Validation: Validates extraction, transformation, and loading logs generated in the data transformation process
    • Basic Item Validation: Validates individual validation items other than log validation requirements
    • Application Validation: Ensures the integrity of data transformation through applications
    • Application Data Validation: Validates the integrity of data transformation based on predefined business rules
    • Value Validation: Conducts validation such as total validation of numeric items, range validation of code data, and value validation due to attribute changes
  • Name
    Validation Stages
    Type
    Description
    • Extraction: Ensures the integrity of original system data (log validation)
    • Transformation: Verifies whether the content specified in the definition document is accurately reflected (log validation)
    • DB Loading: Checks for errors and missing data that occur while the SAM files are read and loaded (log validation)
    • After DB Loading: Ensures the integrity of completed loading (basic item validation)
    • After Transformation Completion: Confirms the integrity of data transformation through the validation process (application validation, application data validation)
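
Value validation items such as total validation of numeric items and range validation of code data can be expressed as simple comparisons between source and target rows. The sketch below is an assumption-laden illustration: the function name `validate_transfer`, the fields `amount` and `grade`, and the allowed code set are hypothetical, and the record-count comparison is an extra check added for the example rather than something stated in the text.

```python
from decimal import Decimal

def validate_transfer(source_rows, target_rows, allowed_codes):
    """Return a list of findings; an empty list means no issue was found."""
    findings = []

    # Record-count check (an assumption of this sketch, not taken from the text).
    if len(source_rows) != len(target_rows):
        findings.append(f"row count mismatch: {len(source_rows)} != {len(target_rows)}")

    # Total validation of a numeric item: the sums must match exactly.
    source_total = sum(Decimal(r["amount"]) for r in source_rows)
    target_total = sum(Decimal(r["amount"]) for r in target_rows)
    if source_total != target_total:
        findings.append(f"amount total mismatch: {source_total} != {target_total}")

    # Range validation of code data: every target code must be in the allowed set.
    for r in target_rows:
        if r["grade"] not in allowed_codes:
            findings.append(f"id {r['id']}: invalid grade code {r['grade']!r}")

    return findings

if __name__ == "__main__":
    source = [{"id": 1, "amount": "100.50", "grade": "A"},
              {"id": 2, "amount": "50.00", "grade": "B"}]
    target = [{"id": 1, "amount": "100.50", "grade": "A"},
              {"id": 2, "amount": "50.00", "grade": "Z"}]
    print(validate_transfer(source, target, {"A", "B"}))
```

Amounts are parsed with `Decimal` so that totals are compared exactly rather than with floating-point rounding.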

Measurement and Cleansing of Error Data (3)

  • Name
    Measurement and Cleansing of Error Data
    Type
    Description
    • Work carried out so that high-quality data can be operated and managed

    Process

    1. Data Quality Analysis: Work to verify data integrity to identify error data
    2. Measurement of Error Data: Count the data records and the erroneous records, and create an error management list
    3. Cleansing of Error Data: Analyze each item in the error management list, then cleanse the original data or modify the transformation programs
  • Name
    Error States
    Type
    Description
    • Open: State where errors are reported and not yet analyzed
    • Assigned: State where errors are communicated to developers for impact analysis and correction
    • Fixed: State where developers have corrected the error
    • Closed: State where tests are re-run for the corrected error, and no error is found
    • Deferred: State where the correction of the error is postponed
    • Classified: State where errors are confirmed to not be actual errors
  • Name
    Data Cleansing Request Document
    Type
    Description
    • Documentation of overall content related to data cleansing
    • Create a Data Cleansing Requirements List based on the error management list and create a Data Cleansing Request Document for each item on the list.
  • Name
    Data Cleansing Report
    Type
    Description
    • A document recording the results of verifying whether the original data was cleansed correctly
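
The measurement step and the error states listed above can be modeled with a small data structure. This is a sketch under assumptions: the names `ErrorState`, `ErrorItem`, `measure_errors`, and `check_age`, as well as the age rule itself, are hypothetical and only illustrate how an error management list might be built with every new entry starting in the Open state.

```python
from dataclasses import dataclass
from enum import Enum

class ErrorState(Enum):
    OPEN = "Open"
    ASSIGNED = "Assigned"
    FIXED = "Fixed"
    CLOSED = "Closed"
    DEFERRED = "Deferred"
    CLASSIFIED = "Classified"

@dataclass
class ErrorItem:
    record_id: int
    reason: str
    state: ErrorState = ErrorState.OPEN  # newly reported errors start as Open

def measure_errors(records, check):
    """Count records and build the error management list.

    `check` is a hypothetical rule: it returns None for a good record
    or a short reason string for an erroneous one.
    """
    error_list = []
    for record in records:
        reason = check(record)
        if reason is not None:
            error_list.append(ErrorItem(record["id"], reason))
    counts = {"total": len(records), "errors": len(error_list)}
    return counts, error_list

def check_age(record):
    """Example rule (assumed): age must be an integer between 0 and 150."""
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 150:
        return f"age out of range: {age!r}"
    return None

if __name__ == "__main__":
    sample = [{"id": 1, "age": 30}, {"id": 2, "age": -5}, {"id": 3, "age": None}]
    counts, errors = measure_errors(sample, check_age)
    print(counts)
    for item in errors:
        print(item.record_id, item.state.value, item.reason)
```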

Database Overview (4)

  • Name
    Data Repository
    Type
    Description
    • Arrangement of data into logical structures or physical areas
    • Logical Data Repository: Organizing data and its relationships and constraints into a logical structure
    • Physical Data Repository: Storing the logical data repository on storage devices considering the physical characteristics of the operating environment
  • Name
    Database
    Type
    Description

    Operational data stored on storage devices, integrated without duplication to allow easy access and processing, and always available for use.

    • Integrated Data: A collection of data with duplicates removed
    • Stored Data: Data saved on storage media accessible by computers
    • Operational Data: Data necessary for performing an organization's specific business operations
    • Shared Data: Data shared and maintained by multiple application systems
  • Name
    DBMS
    Type
    Description
    • Software that generates information as per user requests and manages databases
    • Proposed as a solution to the dependence and redundancy issues of existing file systems

    Essential Functions:

    1. Definition Function: Defining data types, structures, usage, and constraints
    2. Manipulation Function: Providing interfaces for data retrieval, update, deletion, insertion, etc.
    3. Control Function: Providing data integrity, security, authorization, and control
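
The three essential functions can be seen in a short session against SQLite through Python's standard `sqlite3` module. This is a minimal sketch: the `account` table and its columns are assumptions made for the example, and only the integrity side of the control function is shown, since SQLite has no user authorization mechanism of its own.

```python
import sqlite3

def dbms_functions_demo():
    conn = sqlite3.connect(":memory:")

    # Definition function: declare types, structure, and constraints.
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " owner TEXT NOT NULL,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )

    # Manipulation function: insert, update, delete, and retrieve data.
    conn.execute("INSERT INTO account (id, owner, balance) VALUES (1, 'Kim', 100)")
    conn.execute("INSERT INTO account (id, owner, balance) VALUES (2, 'Lee', 50)")
    conn.execute("UPDATE account SET balance = balance + 20 WHERE id = 1")
    conn.execute("DELETE FROM account WHERE id = 2")
    conn.commit()

    # Control function (integrity part only): the DBMS rejects an update that
    # violates the CHECK constraint, and the transaction is rolled back.
    rejected = False
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")
    except sqlite3.IntegrityError:
        rejected = True

    rows = conn.execute("SELECT id, owner, balance FROM account").fetchall()
    conn.close()
    return rows, rejected

if __name__ == "__main__":
    print(dbms_functions_demo())
```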
  • Name
    Data Independence
    Type
    Description
    • Exists in logical and physical forms.

    • Logical Independence: Separates application programs from the database, allowing changes to the logical structure without affecting application programs
    • Physical Independence: Separates application programs from physical devices like secondary storage, allowing addition/modification of disks without affecting application programs

  • Name
    Schema
    Type
    Description
    • General specification describing the database's structure and constraints

    • External Schema: Defines the logical structure of the database that users and application programmers require from their perspectives
    • Conceptual Schema: Logical structure of the entire database
    • Internal Schema: Database structure from the viewpoint of physical storage devices
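
The schema levels and the two kinds of data independence can be approximated in SQLite. This is a rough sketch: the `employee` table, the `employee_directory` view, and the index name are assumptions for the example. The base table plays the role of the conceptual schema, the view plays the role of an external schema, and the index stands in for a storage-level decision belonging to the internal schema.

```python
import sqlite3

def schema_levels_demo():
    conn = sqlite3.connect(":memory:")

    # Conceptual level: the logical structure of the whole database.
    conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [(1, "Kim", 5000), (2, "Lee", 4000)],
    )

    # External level: a view exposing only what one group of users needs.
    conn.execute("CREATE VIEW employee_directory AS SELECT id, name FROM employee")
    app_query = "SELECT id, name FROM employee_directory ORDER BY id"
    before = conn.execute(app_query).fetchall()

    # Logical change: add a column to the base table. The view, and therefore
    # the application query, is unaffected (logical independence).
    conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")

    # Physical change: add an index. Access paths change, but the query text
    # and its result do not (physical independence).
    conn.execute("CREATE INDEX idx_employee_name ON employee (name)")

    after = conn.execute(app_query).fetchall()
    conn.close()
    return before, after

if __name__ == "__main__":
    print(schema_levels_demo())
```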


Database Design (5)

  • Name
    Database Design
    Type
    Description
    • Analyzing user requirements, converting them into a structured database that can be stored in a computer, and making it accessible to general users using a DBMS.
  • Name
    Considerations in Database Design
    Type
    Description
    • Integrity: Data stored in the database must always satisfy predefined constraints after operations like insertion, deletion, and update.
    • Consistency: Stored data, and the responses to a specific query, must remain consistent.
    • Recoverability: After a system failure, it must be possible to restore the data to its state before the failure.
    • Security: Protection against unauthorized disclosure, modification, or loss of data.
    • Efficiency: Shortened response time, productivity, space optimization, etc.
    • Scalability: It must be possible to keep adding data continuously.
  • Name
    Steps in Database Design
    Type
    Description
    1. Requirement Analysis: Creating requirement specification documents (understanding the intended use by database users).
    2. Conceptual Design: Creating conceptual schema, transaction modeling, E-R model (representing real-world understanding with abstract concepts).
    3. Logical Design: Designing logical schema suitable for the DBMS, designing transaction interfaces (converting to logical data structures supported by specific DBMS).
    4. Physical Design: Converting data to a suitable physical structure for the DBMS (converting data represented in logical structures to data in physical structures).
    5. Implementation: Creating the database in the target DBMS, writing transactions (generating a database schema in the form of files derived from logical and physical designs).
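
The hand-off from logical design to implementation can be illustrated by generating table definitions from a small model and creating them in a DBMS. This is a simplified sketch under assumptions: the `LOGICAL_MODEL` dictionary, its entities, and the helper names `to_ddl` and `implement` are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical logical design: entity name -> list of (attribute, SQL type, constraint).
LOGICAL_MODEL = {
    "department": [
        ("id", "INTEGER", "PRIMARY KEY"),
        ("name", "TEXT", "NOT NULL"),
    ],
    "employee": [
        ("id", "INTEGER", "PRIMARY KEY"),
        ("name", "TEXT", "NOT NULL"),
        ("department_id", "INTEGER", "REFERENCES department(id)"),
    ],
}

def to_ddl(model):
    """Translate the logical model into CREATE TABLE statements."""
    statements = []
    for table, columns in model.items():
        column_sql = ", ".join(
            f"{name} {sql_type} {constraint}".strip()
            for name, sql_type, constraint in columns
        )
        statements.append(f"CREATE TABLE {table} ({column_sql})")
    return statements

def implement(model):
    """Implementation step: create the database from the generated DDL."""
    conn = sqlite3.connect(":memory:")
    for statement in to_ddl(model):
        conn.execute(statement)
    return conn

if __name__ == "__main__":
    for statement in to_ddl(LOGICAL_MODEL):
        print(statement)
```

A real physical design would also decide on storage layout, indexes, and access paths, which this sketch leaves out.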