Brian Roepke · broepke.hashnode.net · Feb 4, 2023 · Featured
How to Setup a Simple ETL Pipeline with AWS Lambda for Data Science
Introduction to ETL with AWS Lambda. When it comes time to build an ETL pipeline, many options exist. You can use a tool like Astronomer or Prefect for orchestration, but you will also need somewhere to run the compute. With this, you have a few optio...
56 likes · 366 reads · Python

Faith Kinkema Oyama · kema.hashnode.net · Sep 14, 2022
4 Best Python Frameworks for Web Scraping
Table of Contents: What is web scraping?; What is the difference between web scraping and web crawling?; Requests library; Beautifulsoup; Scrapy; How to set up and activate a virtual environment; Selenium; Conclusion. What is web scraping? Simply put, web ...
13 likes · 107 reads · Data Science

sohit kumar for CMD-LYNE · toplyne.hashnode.net · Mar 24, 2023
Overcoming the Challenges of Building a Data Platform
Toplyne helps businesses improve their win rates and reduce their sales cycle. We do this by identifying leads that are most likely to convert. Our customers can choose to ingest data from various sources, such as analytics tools, CRM systems, S3 sto...
11 likes · data-engineering

Karl Bolinger · kbolinger.hashnode.net · Apr 24, 2023
Understanding ETL and ELT Workflows in Data Engineering: An Easy Guide with Examples
Data engineering is a complex field where many different technologies, frameworks, and techniques come into play. Two of the most common data processing workflows data engineers use are ETL and ELT. ETL stands for Extract, Transform, and Load, while ...
data-engineering

Islam O. Elgohary · iogohary.hashnode.net · Apr 24, 2023
5 Tips on Data Engineering
Two years ago, I developed an interest in data engineering, and fortunately I recently got a chance to work as a data engineer. In this article, I will write up the takeaways from my experience and what I have learned so far. What Is Data Engineering? Dat...
99 reads · data-engineering

Martijn Sturm · martijn-sturm.hashnode.net · Apr 23, 2023
Data Engineering on AWS: Best Practices Overview
This blog post contains a list of best practices for data engineering on AWS. I will try to update this post regularly with new insights and best practices. Please note that this is not an exhaustive list. Am I missing an important one? Please let...
Series: How to Data Engineering on AWS · data-engineering

Martijn Sturm · martijn-sturm.hashnode.net · Apr 23, 2023
Use your own Python packages in Glue jobs
Many data engineering use cases require you to repeat some ETL logic on different (database) tables or event streams. It is advised to separate the ETL workflows for those tables into separate Glue jobs for multiple reasons: Keeping your ETL runs per ...
Series: How to Data Engineering on AWS · AWS Glue

Madhusudhan Anand · maddymaster.hashnode.net · Apr 23, 2023
How to Be a Top Data Engineer in 2023: A Comprehensive Guide
PS: This is a long post. Bookmark it and read at your ease. I have been hiring in various capacities for data and database-related roles. I have seen the roles and designations evolve along with technology. In the early 2000s Data Administrators were th...
38 reads · data-engineering

Martijn Sturm · martijn-sturm.hashnode.net · Apr 23, 2023
Defining ETL jobs as Infrastructure-as-Code
Using Infrastructure-as-Code (IaC) for deployment of resources to the cloud is a no-brainer nowadays. The learning curve at the start is a bit steeper than with click-ops, but it pays off in the long term. In this post I try to assist in getting ...
Series: How to Data Engineering on AWS · ETL

webbureaucrat · webbureaucrat.hashnode.net · Apr 20, 2023
Parsing JSON in ReScript Part III: Getting to the Point
After having established some requirements and some basic utilities, we're ready for the fun part: putting the pieces together. At the end of this post, we will have our working parser. Writing our pipeline functions. When we use our parsing library, ...
rescript

webbureaucrat · webbureaucrat.hashnode.net · Apr 18, 2023
Parsing JSON in ReScript Part II: Building Blocks
This is the second in a series of articles on how to build one's own general-purpose parsing library. After having established a few expectations in the previous post, we are ready to begin building the utilities for our library. Let's start with so...
rescript

webbureaucrat · webbureaucrat.hashnode.net · Apr 17, 2023
Parsing JSON in ReScript Part I: Prerequisites and Requirements
There are few things more satisfying than a slick, readable, and safe JSON parser. It's one of the joys of functional programming. Using a good JSON parsing pipeline can feel like magic. This series seeks to lift the veil and empower readers (and, im...
rescript

Karl Bolinger · kbolinger.hashnode.net · Apr 17, 2023
Building Data Pipelines with Apache Airflow: A Complete Guide with Examples
Data pipelines are a critical component of modern data infrastructures, allowing organizations to efficiently manage and process large volumes of data. Apache Airflow is an open-source platform that helps developers to create and manage data pipeline...
Series: Data Engineering Basics · data-engineering