Strakul's Thoughts: Python


Sunday, June 1, 2025

Data Science: Querying DnD Session Notes with Vector Databases and AI

Last week, we had some guests at work, and one of them presented a multi-agent model for creating database queries with some clever documentation retrieval. It was very exciting and something I've wanted to try, so I decided to familiarize myself with some of the concepts. I learn best by practicing, though, so I set out on a simple hack-day project: a retrieval-augmented generation (RAG) application that takes notes from our multi-year Dungeons and Dragons (DnD) campaign and generates appropriate answers by passing the relevant notes through a large language model (LLM).

Let's go through what I ended up building.
You can also follow along by looking through the GitHub repo for it: https://github.com/dr-rodriguez/ollama-ttrpg-query
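To make the retrieval step concrete, here is a minimal sketch of the RAG flow under my own assumptions. The notes are hypothetical sample text, and the bag-of-words `embed` function is a toy stand-in for a real embedding model and vector database; none of this code comes from the repo. It ranks notes by cosine similarity to the question and pastes the best matches into the prompt that would be sent to the LLM.

```python
import math
from collections import Counter

# Hypothetical session notes; a real setup would use chunks of the campaign notes.
NOTES = [
    "Session 12: The party negotiated with the dragon Vexis in the Ember Caves.",
    "Session 13: Mira the rogue stole the silver key from the baron's vault.",
    "Session 14: The party returned the silver key and earned the baron's trust.",
]


def embed(text):
    """Toy bag-of-words 'embedding'; a real pipeline would call an embedding model."""
    words = [w.strip(".,:;!?'\"").lower() for w in text.split()]
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, notes, k=2):
    """Return the k notes most similar to the question."""
    q = embed(question)
    return sorted(notes, key=lambda n: cosine(q, embed(n)), reverse=True)[:k]


def build_prompt(question, context):
    """Put the retrieved notes into the prompt that would go to the LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only these campaign notes:\n{joined}\n\nQuestion: {question}"
```

Swapping `embed` for a proper embedding model and the list scan for a vector-database query keeps the same shape: embed, rank by similarity, build the prompt.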

Data Science: MongoDB Sky Searches with Geospatial Queries

This is the fourth and, for now, final post in my tutorial series on using MongoDB NoSQL databases for astronomical work. We've created a database of brown dwarfs using Python 3.7's dataclasses, stored header metadata for a variety of FITS files, and written functions to perform cone searches using HEALPix. Today, we're looking again at how to query the sky, but this time using MongoDB's built-in geospatial functionality. As before, I provide a Jupyter notebook where those interested can follow along.
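As a rough illustration of the geospatial approach, the sketch below builds a cone-search filter with MongoDB's `$geoWithin` and `$centerSphere` operators, which take the radius in radians. It assumes positions are stored as GeoJSON points in a field I've called `location` (a hypothetical name) and that RA is shifted into the [-180, 180] longitude range a `2dsphere` index expects; the notebook may handle this differently.

```python
import math


def to_geojson_point(ra_deg, dec_deg):
    """Convert RA/Dec in degrees to a GeoJSON Point.

    Assumption: RA above 180 degrees is shifted down by 360 so it fits the
    [-180, 180] longitude range; angular distances are unchanged by the shift.
    """
    lon = ra_deg - 360.0 if ra_deg > 180.0 else ra_deg
    return {"type": "Point", "coordinates": [lon, dec_deg]}


def cone_query(ra_deg, dec_deg, radius_arcsec, field="location"):
    """Build a $geoWithin/$centerSphere filter; $centerSphere wants radians."""
    center = to_geojson_point(ra_deg, dec_deg)["coordinates"]
    radius_rad = math.radians(radius_arcsec / 3600.0)
    return {field: {"$geoWithin": {"$centerSphere": [center, radius_rad]}}}
```

With pymongo, the filter would be passed to `collection.find(...)`, after creating a `2dsphere` index on the same field.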

Data Science: MongoDB Sky Searches with HEALPix

This is the third blog post in a series about using MongoDB NoSQL databases with astronomical data. Prior posts introduced how to store astronomical objects and how to store FITS header metadata. In today's post, we'll tackle one of the most common tasks in astronomy: the cone search. In other words, how do you search your database for objects in the sky that lie close to your input coordinates? Today we'll solve that problem "from scratch" using HEALPix rather than any built-in functionality. As before, I provide a Jupyter notebook in my GitHub repo for those who want more details or would like to run it on their own.
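A HEALPix-based cone search usually narrows the candidates first, by finding the pixels that overlap the search circle (healpy's `query_disc` does this, for instance) and pulling the objects stored under those pixel IDs, and then discards candidates outside the true radius. The sketch below covers only that final refinement step, using the haversine formula for angular separation; the `ra` and `dec` keys on each candidate are my assumption about how the documents are stored.

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form, stable at small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # min() guards against rounding pushing sqrt(h) slightly above 1.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def cone_filter(candidates, ra, dec, radius_deg):
    """Keep only candidates (dicts with 'ra' and 'dec' in degrees) inside the cone."""
    return [c for c in candidates
            if angular_separation(ra, dec, c["ra"], c["dec"]) <= radius_deg]
```

The pixel lookup is what keeps this cheap: the exact distance is only computed for the handful of objects in nearby pixels rather than the whole collection.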

Data Science: Astronomy FITS Headers in MongoDB

This is the second post in my series on using MongoDB NoSQL databases with astronomical data. If you'd like a refresher on what that means, check out my first post, where I describe how to ingest a custom BrownDwarf class object into this type of database. Today, we're looking at a more general problem: metadata. Metadata is the information that describes the how, when, and where of the data itself: for example, which telescope took the data, at what time of night, for how long, with what filter, and so on. Much of this information is encapsulated in the data files themselves, and currently the most commonly used format in astronomy is the FITS file.

In this post, we'll have a look at how we can extract the metadata from a FITS file and load it into our NoSQL database. As before, I provide a Jupyter notebook if you'd like to run the code yourself.
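To show what extracting header metadata involves, here is a simplified parser for a raw FITS header, which is a sequence of 80-character cards with the keyword in the first eight columns and `= ` in columns 9 and 10 marking a value. It is a sketch under stated assumptions: it skips COMMENT and HISTORY cards and ignores the CONTINUE and HIERARCH conventions, so for real files a library such as `astropy.io.fits` is the safer choice. The resulting dict can be inserted into MongoDB as a document.

```python
def parse_fits_header(raw):
    """Parse a raw FITS header (concatenated 80-character cards) into a dict.

    Simplified sketch: no COMMENT/HISTORY values, no CONTINUE or HIERARCH cards.
    """
    header = {}
    for i in range(0, len(raw), 80):
        card = raw[i:i + 80]
        key = card[:8].strip()
        if key == "END":
            break
        if card[8:10] != "= ":
            continue  # COMMENT, HISTORY, and blank cards carry no value indicator
        value = card[10:].strip()
        if value.startswith("'"):
            # String value: read up to the closing quote; '' is an escaped quote.
            chars, j = [], 1
            while j < len(value):
                if value[j] == "'":
                    if value[j + 1:j + 2] == "'":
                        chars.append("'")
                        j += 2
                        continue
                    break
                chars.append(value[j])
                j += 1
            header[key] = "".join(chars).rstrip()  # trailing blanks are not significant
            continue
        value = value.split("/", 1)[0].strip()  # drop the inline comment
        if value in ("T", "F"):
            header[key] = value == "T"  # FITS logical
        else:
            try:
                header[key] = int(value)
            except ValueError:
                try:
                    header[key] = float(value.replace("D", "E"))  # allow D exponents
                except ValueError:
                    header[key] = value
    return header
```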

Data Science: Python Dataclasses and MongoDB

Over the past few weeks, I've been playing a bit with some NoSQL databases, in particular MongoDB. MongoDB is a document-store database, and it works primarily by saving JSON-formatted 'documents'. While exploring this technology and working on some Python code, I realized how easy it is to convert a standard Python class into a dictionary and how readily dictionaries translate into JSON. With this knowledge in hand, a light bulb went off in my head: I could use the new dataclasses introduced in Python 3.7 to quickly create a working database with minimal code.

In this post, I'll describe some of the ideas I had in mind while working through this and, if you want to try this on your own, I can point you to this Jupyter notebook where I work out this example.
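Here is a minimal sketch of the dataclass-to-document idea. The `BrownDwarf` fields and values are hypothetical placeholders rather than the ones in my notebook; the point is that `dataclasses.asdict` produces a plain dictionary that serializes straight to JSON and can be rebuilt into the class.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class BrownDwarf:
    # Hypothetical fields for illustration; the notebook's class may differ.
    name: str
    ra: float   # degrees
    dec: float  # degrees
    spectral_type: Optional[str] = None
    photometry: List[dict] = field(default_factory=list)


# Illustrative placeholder values, not real measurements.
bd = BrownDwarf(name="Example BD 1", ra=10.5, dec=-20.25, spectral_type="L5")
bd.photometry.append({"band": "J", "magnitude": 14.2})

doc = asdict(bd)            # plain dict, ready for e.g. collection.insert_one(doc)
as_json = json.dumps(doc)   # dicts translate directly into JSON
restored = BrownDwarf(**json.loads(as_json))  # and back into the dataclass
```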

Data Science: What Should I Read Next?

As I wrote about last week, I’ve spent a bit of time looking over my reviews on Goodreads to explore trends in what authors I read, how fast I read, and how I review books. In today’s post, we’ll tackle something a little more ambitious: given the data I can readily access from the Goodreads API, can I predict how I will rate books I haven’t yet read?

Let’s dive right in.
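As one possible starting point, not necessarily the model used in this post, the sketch below fits an ordinary least-squares line that predicts my rating from a single numeric feature, for example a book's average Goodreads rating (an assumed feature choice). Predictions are clipped to the 1 to 5 star range.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted rating, clipped to the 1-5 star range."""
    return min(5.0, max(1.0, intercept + slope * x))
```

A single-feature line is deliberately crude; it gives a baseline that any richer model should beat before it is worth trusting.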

Data Science: My Goodreads Reviews

Followers of my blog will know that I read and review quite a few books throughout the year. I track the books I read, and those I want to read, on Goodreads, and I recently came across their API. I decided to figure out how to access it and see what sort of information I could glean from my Goodreads reading history. This particular post explores trends in my reading and reviewing habits, as well as looking at which authors I've read. Next week’s post will discuss my attempt to create a model to predict the rating I give a particular book. With that model in hand, I can decide which books to read based on my own interests.

Let’s jump right in.
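A per-author summary is a simple way to see which authors I read most and how I rate them. This sketch assumes each review has already been reduced to a dict with hypothetical `author` and `rating` keys; the fields actually returned by the Goodreads API are structured differently.

```python
from collections import defaultdict


def author_summary(reviews):
    """Count books and average my rating per author.

    Returns (author, count, mean_rating) tuples, most-read authors first,
    ties broken alphabetically.
    """
    ratings = defaultdict(list)
    for r in reviews:
        ratings[r["author"]].append(r["rating"])
    summary = [(a, len(rs), sum(rs) / len(rs)) for a, rs in ratings.items()]
    return sorted(summary, key=lambda t: (-t[1], t[0]))
```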

Data Science: Republican & Democratic Conventions

In the past few weeks, the two major political parties in the United States of America held their national conventions. While I couldn't listen to all the speeches, I followed the news and paid attention to the overall scene. After they were done, I decided to grab the speeches of the major speakers and see if I could find any obvious trends in their word choices, similar to what I did with my Twitter project. In this blog post, I'll discuss what I can see in the data. You can find my data and all my scripts at this GitHub repo.
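One way to surface differences in word choice between two sets of speeches is to compare smoothed relative word frequencies. The sketch below is an illustration rather than the scripts in my repo: it tokenizes the text, drops a small hand-picked stopword list, and ranks words by the log ratio of their frequency in group A versus group B, using add-one smoothing so that a word missing from one group doesn't produce log(0).

```python
import math
import re
from collections import Counter

# Small illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "and", "a", "to", "of", "we", "our", "is", "in", "that", "will"}


def word_counts(texts):
    """Lowercase, tokenize on letters and apostrophes, and drop stopwords."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS)
    return counts


def distinctive_words(counts_a, counts_b, top=5):
    """Words most over-represented in A relative to B (both Counters)."""
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values()) + len(vocab)
    total_b = sum(counts_b.values()) + len(vocab)

    def score(w):
        # Add-one smoothing keeps words absent from one group away from log(0).
        return math.log((counts_a[w] + 1) / total_a) - math.log((counts_b[w] + 1) / total_b)

    return sorted(vocab, key=lambda w: (-score(w), w))[:top]
```

Calling `distinctive_words(b, a)` with the arguments swapped gives the words that characterize the other group.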

Data Science: The Divided States of America

In the prior two posts, I have described how I gathered Twitter data from @HillaryClinton and @realDonaldTrump, how I ran a sentiment analysis on the individual tweets, and how I performed a principal component analysis on the most commonly used words. Today, I’ll tie everything together and describe how I created a model to predict which of the two candidates wrote a given tweet.
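To illustrate the kind of model involved, here is a multinomial naive Bayes classifier with add-one smoothing, a common baseline for two-class text problems. It is a swapped-in technique for illustration and not necessarily the model described in the post; the labels "A" and "B" and the training texts are toy examples.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over whitespace tokens."""

    def fit(self, texts, labels):
        self.word_counts = {label: Counter() for label in set(labels)}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            total = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[label] / n_docs)  # log prior
            for w in text.lower().split():
                if w in self.vocab:  # skip words never seen in training
                    score += math.log((counts[w] + 1) / total)
            if score > best_score:
                best, best_score = label, score
        return best


model = NaiveBayes().fit(
    ["make america great again", "great jobs great deals",
     "stronger together", "together for families"],
    ["A", "A", "B", "B"],
)
print(model.predict("great jobs"))  # A
```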

Data Science: Principal Component Analysis of Twitter Data

As described in my last blog post on this topic, I've been tracking tweets from the US presidential candidates, Hillary Clinton and Donald Trump. I've looked at the top words they used and the sentiments expressed in their tweets given their word choice. However, some words are used together almost all the time, a notable example being a slogan like Make America Great Again. As such, it may be beneficial to look at groups of words rather than individual words. To do that, I applied a principal component analysis (PCA). Below I describe what this is, how I used it, and what it reveals. Do note, however, that I'm applying techniques I learned in astronomy to this problem rather than ones from courses specific to text mining. There may be better tools out there than the ones I've used.
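For readers who want to see the mechanics, here is a generic PCA sketch with NumPy (not my original code): center a tweet-by-word count matrix and take its singular value decomposition. The rows of `vt` are the components, each holding one weight per word, and the squared singular values give the variance each component explains. Words that always appear together, like the words of a slogan, end up with equal weights in every component, which is what makes PCA useful for spotting word groups.

```python
import numpy as np


def pca(matrix, n_components=2):
    """PCA via SVD of the mean-centred matrix (rows = tweets, columns = word counts).

    Returns (components, explained_variance_ratio); each component row holds one
    weight per word.
    """
    X = np.asarray(matrix, dtype=float)
    X = X - X.mean(axis=0)  # centre each word column
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    variance = s ** 2
    ratio = variance / variance.sum()
    return vt[:n_components], ratio[:n_components]


# Toy counts for the words ["make", "america", "great", "email"]; "make" and
# "america" always co-occur, so they share weights in every component.
counts = [[1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
components, explained = pca(counts, n_components=2)
```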

Data Science: Presidential Candidates on Twitter

Over the past few months, I've been working on a little hobby data science project to explore Twitter data with regard to the upcoming presidential election in the United States. The project has evolved quite a bit, and detailing it in full is beyond the scope of a single blog post. As such, I've decided to split it into (at least) three posts. This post is the first of the series and goes over the basics of gathering data from Twitter and doing some simple text mining. The second and third posts will discuss more details of the project and show some neat visualizations I've created. I'll release all my code after the third post for any curious coders. For now, let's start seeing what Hillary Clinton's and Donald Trump's Twitter accounts are talking about.
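The simple text mining in this first post boils down to tokenizing tweets and counting words. This sketch assumes the tweets have already been downloaded as plain strings; the URL and @mention stripping and the short stopword list are my own choices for illustration.

```python
import re
from collections import Counter

# Short illustrative stopword list.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "for", "on", "we", "i", "you", "it"}


def tokenize_tweet(text):
    """Lowercase a tweet, drop URLs and @mentions, keep words and #hashtags."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"@\w+", " ", text)
    return [w for w in re.findall(r"#?[a-z']+", text) if w not in STOPWORDS]


def top_words(tweets, n=10):
    """Most common tokens across a list of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize_tweet(tweet))
    return counts.most_common(n)
```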

Bokeh Plots in Blogger

This is a quick post to test if I can add Bokeh plots in my blog.

Coding: A Python Notifier for the NYC Subway

New York City Subway 6 train. Photo by Robert McConnell (Transferred from en.wikipedia to Commons.)

For fun, I created a small Python application that checks the status of the New York City subway system and sends me an email in the morning and afternoon if there are delays on the specific lines I use to get to and from work. Now, sure, something like this already exists and is offered by the MTA, but I wanted to write it myself.

Below, I describe how it works in case you want to create something similar. The code is available at GitHub, should you care to grab it.
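Here is a sketch of the core logic, assuming the subway status has already been fetched and parsed into a dict mapping each line to a status string (a hypothetical shape; the MTA's actual feed format differs). It picks out the lines I ride that aren't reporting good service and composes an email with the standard library's `email.message.EmailMessage`. Sending would be a separate step, for instance with `smtplib.SMTP(...).send_message(msg)`. The `MY_LINES` set is a placeholder.

```python
from email.message import EmailMessage

MY_LINES = {"4", "5", "6", "L"}  # hypothetical: the lines I ride


def delayed_lines(status, my_lines=MY_LINES):
    """Return, sorted, the lines I ride whose status is not 'GOOD SERVICE'.

    Lines missing from the status dict are treated as running normally.
    """
    return sorted(line for line in my_lines
                  if status.get(line, "GOOD SERVICE") != "GOOD SERVICE")


def build_alert(delays, status, sender, recipient):
    """Compose (but do not send) the alert email; return None if nothing is delayed."""
    if not delays:
        return None
    msg = EmailMessage()
    msg["Subject"] = "Subway delays: " + ", ".join(delays)
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(f"{line}: {status[line]}" for line in delays))
    return msg
```

Returning `None` when everything is running keeps the scheduled morning and afternoon checks quiet on good days.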

Data Science: Creating my First Web Application

Over the past month or so, I've dedicated a bit of time each week to developing a web application for the Brown Dwarfs in New York City research team (BDNYC). This week, I was finally able to release it to the public as AstrodbWeb. I'm very proud of what I've made, simple though it is. It's inspired me to continue developing applications and exploring this route a bit more. For this blog post, I want to detail some of what I went through for others who may be thinking of similar projects. I'll provide links to resources that I found useful while developing it.


Sunday, June 1, 2025

Data Science: Querying DnD Session Notes with Vector Databases and AI

Sunday, July 7, 2019

Data Science: MongoDB Sky Searches with Geospatial Queries

Sunday, June 16, 2019

Data Science: MongoDB Sky Searches with HEALPix

Saturday, June 1, 2019

Data Science: Astronomy FITS Headers in MongoDB

Saturday, May 18, 2019

Data Science: Python Dataclasses and MongoDB

Tuesday, September 20, 2016

Data Science: What Should I Read Next?

Tuesday, September 13, 2016

Data Science: My Goodreads Reviews

Monday, August 1, 2016

Data Science: Republican & Democratic Conventions

Friday, July 29, 2016

Data Science: The Divided States of America

Friday, July 22, 2016

Data Science: Principal Component Analysis of Twitter Data

Friday, July 15, 2016

Data Science: Presidential Candidates on Twitter

Friday, June 17, 2016

Bokeh Plots in Blogger

Thursday, May 5, 2016

Coding: A Python Notifier for the NYC Subway

Friday, March 25, 2016

Data Science: Creating my First Web Application
