Written by Sergio Henrique, February 11, 2025

How to extract images and drawings from PDF with Python

Extracting images and drawings from PDF files can be a challenging task, but with the right tools and techniques, it’s entirely achievable. This blog post explores how to use the PyMuPDF library in Python to extract both images and drawings from PDF documents. We’ll dive into the nuances of handling transparency layers in images and clustering drawings to preserve embedded text. Whether you’re building a PDF summarizer or simply need to extract visual content from PDFs, these methods provide a robust solution to automate the process.

Written by Sergio Henrique, December 10, 2024

Download data from a Kaggle competition and upload it to Azure ML

In some Kaggle competitions, the provided machines cannot handle the volume of data available. In these cases, I think it can be beneficial to train the model somewhere else.

Written by Sergio Henrique, December 2, 2024

ARIMA and Online Learning in Financial Forecasting

I discuss the development of an online learning system using the Jane Street Real-Time Market Data Forecasting challenge as a practice ground for time-series forecasting. The project involves predicting the responder_6 variable using an ARIMA model, with a focus on adapting to new data by re-training the model whenever a new date_id is encountered. This approach leverages multiprocessing to meet strict time constraints.
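The re-training trigger described above can be sketched in a few lines. This is a minimal illustration, not the post's implementation: `OnlineForecaster` and `fit_mean` are hypothetical names, the historical-mean model is only a placeholder where an ARIMA fit would go, and the multiprocessing part is left out.

```python
from statistics import mean
from typing import Callable, List, Optional, Sequence


class OnlineForecaster:
    """Re-fits a model each time a new date_id shows up in the stream."""

    def __init__(self, fit: Callable[[Sequence[float]], Callable[[], float]]):
        self._fit = fit                      # returns a zero-argument predict function
        self._history: List[float] = []      # all target values observed so far
        self._predict: Optional[Callable[[], float]] = None
        self._current_date_id: Optional[int] = None
        self.retrain_count = 0

    def predict(self, date_id: int) -> Optional[float]:
        # A new date_id means a new day of data has arrived: re-train first.
        if date_id != self._current_date_id:
            self._current_date_id = date_id
            if self._history:
                self._predict = self._fit(self._history)
                self.retrain_count += 1
        return self._predict() if self._predict else None

    def observe(self, value: float) -> None:
        # Store the realised target so the next re-training can use it.
        self._history.append(value)


def fit_mean(history: Sequence[float]) -> Callable[[], float]:
    # Placeholder model (assumption): forecast the historical mean.
    # An ARIMA fit would replace this function.
    level = mean(history)
    return lambda: level


forecaster = OnlineForecaster(fit_mean)
stream = [(0, 1.0), (0, 3.0), (1, 5.0), (1, 7.0), (2, 9.0)]  # (date_id, responder value)
predictions = []
for date_id, actual in stream:
    predictions.append(forecaster.predict(date_id))
    forecaster.observe(actual)
```

Because the model is only re-fitted when the date_id changes, every row within the same date reuses the same fitted model, which keeps the per-row cost low.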

Written by Sergio Henrique, November 14, 2024

Walk Forward Validation on Jane Street Real-Time Market Data Forecast

Walk Forward Validation (WFV) involves a training window that moves forward in time, training the model on historical data and then validating it on future, unseen data points. Unlike traditional cross-validation where data is randomly split, WFV respects the sequence of time, making it ideal for datasets with time-dependent features like stock prices, weather patterns, or sales figures.
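As a rough sketch of the moving window, the generator below yields index pairs in which every training index precedes every validation index. The function name `walk_forward_splits` and its parameters are assumptions made for illustration, and the window here has a fixed size (an expanding window is another common variant).

```python
from typing import Iterator, List, Optional, Tuple


def walk_forward_splits(
    n_samples: int,
    train_size: int,
    test_size: int,
    step: Optional[int] = None,
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for a window that slides forward in time."""
    if step is None:
        step = test_size  # by default, validation windows do not overlap
    start = 0
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        train_idx = list(range(start, train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        start += step  # move the whole window forward


splits = list(walk_forward_splits(n_samples=10, train_size=4, test_size=2))
for train_idx, test_idx in splits:
    print("train:", train_idx, "validate:", test_idx)
```

Each fold trains only on observations that come before the ones it is validated on, so no future information leaks into training.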

Written by Sergio Henrique, November 6, 2024

reAct, WESE, Plan-and-Execute and ChatDB architectures applied to question-answer database use case

An overview of the reAct, WESE, Plan-and-Execute and ChatDB architectures applied to the question-answer database use case of the GDSC7 challenge.

Written by Sergio Henrique, October 18, 2024

How to override a method of an instantiated object in Python

In this post, I describe how I overcame AWS login challenges in a coding competition by using a method override trick. By defining a new function for authentication and dynamically replacing the existing method in an instantiated object, I was able to experiment with the Embedchain package without altering its class definition. This technique allowed for seamless integration with AWS services and added a valuable tool to my programming arsenal.
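One common way to do this kind of per-instance override is `types.MethodType`, sketched below. The `Client` class and both function names are hypothetical stand-ins, not Embedchain's actual API, and the post itself may use a different mechanism.

```python
import types


class Client:
    """Hypothetical stand-in for a third-party class whose source we do not edit."""

    def __init__(self, name):
        self.name = name

    def authenticate(self):
        return f"{self.name}: default login"


def custom_authenticate(self):
    # Replacement logic; `self` is the instance the function gets bound to.
    return f"{self.name}: custom login"


client = Client("a")
other = Client("b")

# Bind the new function to this one instance only.
client.authenticate = types.MethodType(custom_authenticate, client)

print(client.authenticate())  # uses the replacement
print(other.authenticate())   # other instances keep the class behaviour
```

The bound method is stored in the instance's own attributes, so it shadows the class method for that object while the class definition stays untouched.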

Written by Sergio Henrique, September 30, 2024

How to use Azure Communication Services to send email in a Django application

I wanted to share my recent experience with setting up email sending in my Django application, which I’m deploying on Azure Web Apps. Since I’m using django-allauth for authentication, I needed to ensure that the email confirmation and password reset workflows were properly configured. After some research, I found a solution that worked well for me.

Written by Sergio Henrique, September 26, 2024

Create dependent dropdown with Django and HTMX

Recently, I faced the challenge of creating dynamic, interdependent form fields in my Django application. After some trial and error, I found a solution using Django and HTMX that I’d like to share. This combination allowed for seamless, server-side updates without full page reloads, significantly enhancing the user experience and performance of my application.

Written by Sergio Henrique, September 13, 2024

Setting up local database to simulate Azure production database

I have an existing SQL database in Azure, created from a scraping project using Azure Functions. My new SaaS needs to access this data. However, I prefer not to modify the existing Azure database during development.

Written by Sergio Henrique, September 11, 2024 / September 13, 2024

How to override Django-Allauth default templates

The Problem:

I chose the ‘django-allauth’ package to handle login management for a SaaS code base that I am building. All my installed packages live in the virtual environment folder (venv) inside my project folder.

I had already created a base layout for the landing page. However, after installing and configuring ‘django-allauth’, I noticed that the login page did not inherit the layout configuration from my base template.

Sergio Henrique

Data Analyst Building Things and Sharing Learning Along the Way

Tag: Python