Export Oracle database to PostgreSQL using AWS Glue!

AWS Glue is a fully managed ETL (extract, transform, load) service that makes it simple and cost-effective to clean, categorize, and enrich data, and to move it reliably between data stores and data streams.

Architecture diagram of AWS Glue

Data Catalog: The persistent metadata store in AWS Glue. It contains table definitions, job definitions, and other control information used to manage your AWS Glue environment.

Classifier: Determines the schema of your data. Glue provides classifiers for common file types such as CSV, JSON, and XML.

Connection: Contains the properties required to connect to a data store.

Crawler: Connects to a data store and populates the Data Catalog with tables. This is the primary method used by most AWS Glue users.

Database: A database in the Data Catalog is a container that holds tables. Tables are created in the database either manually or by running a crawler.

Data Store: A repository for persistently storing your data (e.g., S3 buckets, RDS).

Job: The business logic required to perform the ETL work. It is composed of a data source, a transformation script, and a data target.

Script: Contains the code (Python, PySpark, or Scala) that extracts the data from the source, transforms it, and loads it into the target.

→ An AWS account.

→ Basic knowledge of Python, PySpark, or Scala, so that you can adapt the script to your use case.

→ An Oracle database and a PostgreSQL database that have already been created.

3. Add the JDBC URL for your connection, along with the username and password and the VPC that contains your data store. Click Next and review the changes.

Step 3: Figure

4. Test the connection.

Follow steps 1 to 4 again to create a connection for PostgreSQL (the target database).
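For readers who prefer scripting to the console, here is a minimal sketch of the same connection setup with boto3. The URL shapes follow the forms AWS Glue documents for Oracle (service name) and PostgreSQL. The host names, credentials, subnet, security group, and connection names are all placeholders I made up for illustration, and the console asks for a VPC whereas the API takes a subnet and security groups inside that VPC.

```python
def oracle_jdbc_url(host: str, port: int, service_name: str) -> str:
    """Build an Oracle JDBC URL in the thin-driver, service-name form."""
    return f"jdbc:oracle:thin://@{host}:{port}/{service_name}"


def postgres_jdbc_url(host: str, port: int, database: str) -> str:
    """Build a PostgreSQL JDBC URL."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def connection_input(name, jdbc_url, username, password,
                     subnet_id, security_group_ids, availability_zone):
    """Assemble the ConnectionInput structure for glue.create_connection."""
    return {
        "Name": name,
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": jdbc_url,
            "USERNAME": username,
            "PASSWORD": password,
        },
        # Network placement so Glue can reach the database inside the VPC.
        "PhysicalConnectionRequirements": {
            "SubnetId": subnet_id,
            "SecurityGroupIdList": list(security_group_ids),
            "AvailabilityZone": availability_zone,
        },
    }


def create_connection(conn_input: dict) -> None:
    """Send the request to AWS Glue (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    boto3.client("glue").create_connection(ConnectionInput=conn_input)


# Placeholder values only; replace them with your own before calling
# create_connection(source_connection).
source_connection = connection_input(
    name="test",
    jdbc_url=oracle_jdbc_url("oracle.example.com", 1521, "ORCL"),
    username="glue_user",
    password="change-me",
    subnet_id="subnet-00000000",
    security_group_ids=["sg-00000000"],
    availability_zone="us-east-1a",
)
```

The PostgreSQL connection is built the same way, swapping in postgres_jdbc_url("postgres.example.com", 5432, "targetdb"). In practice you would not hard-code the password; it is inline here only to keep the sketch short.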

Step 1: Figure

1. Go to Crawlers in the Data Catalog, click Add crawler, and give the crawler a name as shown in the figure below (tags, description, and other settings are optional). Click Next.

Step 1: Figure

2. Specify the crawler source type as shown below and then click Next.

Step 2: Figure

3. Add a data store: choose JDBC as the data store type, select the connection created in the previous steps (e.g., test), and enter an include path for the tables (e.g., MyDatabase/MySchema/%). Here, % is a wildcard, so the crawler picks up every table in the MySchema schema of MyDatabase.

4. Choose the IAM role that was created before starting these steps.

Step 4: Figure

5. Create a schedule for this crawler according to your requirements.

Step 5: Figure

6. Configure the crawler's output. Add the database that was created in the previous steps (e.g., source-database).

Follow the same steps to crawl the target database (if needed).
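The crawler steps above can also be sketched with boto3. This is an illustration under assumptions, not the only way to do it: the crawler name, the IAM role name, and the cron expression are hypothetical, while the connection name (test), include path, and catalog database (source-database) are the examples used in the steps.

```python
def jdbc_crawler_request(name, role, catalog_database,
                         connection_name, include_path, schedule=None):
    """Assemble keyword arguments for glue.create_crawler on a JDBC source."""
    request = {
        "Name": name,
        "Role": role,  # IAM role the crawler assumes
        "DatabaseName": catalog_database,  # Data Catalog database for output
        "Targets": {
            "JdbcTargets": [
                {"ConnectionName": connection_name, "Path": include_path}
            ]
        },
    }
    if schedule:
        # Glue schedules use cron syntax, e.g. "cron(0 2 * * ? *)".
        request["Schedule"] = schedule
    return request


def create_and_start_crawler(request: dict) -> None:
    """Create the crawler and run it once (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helper above works without it

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical crawler and role names; the other values come from the steps.
source_crawler = jdbc_crawler_request(
    name="oracle-source-crawler",
    role="MyGlueServiceRole",
    catalog_database="source-database",
    connection_name="test",
    include_path="MyDatabase/MySchema/%",  # % matches every table in MySchema
)
```

Leaving schedule unset gives an on-demand crawler; passing a cron expression corresponds to choosing a schedule in step 5.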

Step 1: Figure

2. Choose the data source: select the required table from the source database (in our case, source-database).

3. Similarly, choose the data target (PostgreSQL).

4. Verify the mappings created by AWS Glue. You can change the mappings as needed.

5. Save the job. AWS Glue generates the script automatically.

6. Edit the script according to your use case, then click Save and Run!

7. Job startup can take around 10 minutes, so wait until the run finishes. If the job fails, an error message is displayed in the job console/script console. For the detailed error message, you can always go to the logs in the History tab, as shown below:

Step 7: Figure
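For orientation, the script that Glue generates in step 5, and that you edit in step 6, usually has roughly the shape below. Treat this as a sketch rather than the exact generated code: the table, column, connection, and target database names are hypothetical, and the column types should be checked against what the crawler actually recorded. The awsglue imports sit inside the function because that library is only available inside the Glue job environment.

```python
# Hypothetical names; replace them with the ones from your Data Catalog.
SOURCE_CATALOG_DATABASE = "source-database"
SOURCE_TABLE = "mydatabase_myschema_employees"
TARGET_CONNECTION = "postgres-target"
TARGET_DATABASE = "targetdb"
TARGET_TABLE = "public.employees"

# (source column, source type, target column, target type)
MAPPINGS = [
    ("employee_id", "decimal(38,0)", "employee_id", "long"),
    ("first_name", "string", "first_name", "string"),
    ("hire_date", "timestamp", "hire_date", "timestamp"),
]


def target_columns(mappings):
    """Return the column names the target table will receive."""
    return [target for _source, _stype, target, _ttype in mappings]


def run_job(argv):
    """Extract from the catalog table, apply the mapping, load over JDBC."""
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.transforms import ApplyMapping
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Extract: read the Oracle table through its Data Catalog entry.
    source = glue_context.create_dynamic_frame.from_catalog(
        database=SOURCE_CATALOG_DATABASE, table_name=SOURCE_TABLE
    )

    # Transform: rename and cast columns according to MAPPINGS.
    mapped = ApplyMapping.apply(frame=source, mappings=MAPPINGS)

    # Load: write to PostgreSQL through the Glue connection.
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=mapped,
        catalog_connection=TARGET_CONNECTION,
        connection_options={"dbtable": TARGET_TABLE, "database": TARGET_DATABASE},
    )
    job.commit()
```

Inside a Glue job you would finish the script with import sys followed by run_job(sys.argv). If you also crawled the target database, writing through its catalog table is an alternative to the JDBC connection options used here.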

Thank you!

Hope you liked it and found it useful!

For any queries or clarifications, feel free to comment or send an email! I will be happy to help :)
