Showing posts with label Python. Show all posts

Sunday, July 7, 2019

Data Science: MongoDB Sky Searches with Geospatial Queries

This is the fourth and, for now, final post in my tutorial series on using MongoDB NoSQL databases for astronomical work. We've created a database of brown dwarf objects using Python 3.7's dataclasses, stored header metadata for a variety of FITS files, and written functions to perform cone searches using HEALPix. Today, we're looking again at how to query the sky, but this time using MongoDB's built-in geospatial functionality. As before, I provide a Jupyter notebook where those interested can follow along.
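To give a flavor of what a geospatial sky query can look like, here is a minimal sketch. It assumes each document stores its position as a GeoJSON point in a hypothetical `location` field, and it shifts RA by 180 degrees so that it fits the GeoJSON longitude range of -180 to 180. The field name, the helper names, and that RA mapping are assumptions made for illustration and may differ from what the notebook does. The `$geoWithin` and `$centerSphere` operators are MongoDB's own, and `$centerSphere` takes its radius in radians.

```python
import math


def radec_to_geojson(ra_deg, dec_deg):
    """Convert RA/Dec in degrees to a GeoJSON Point.

    Assumption: RA (0 to 360) is shifted by 180 degrees so that it fits
    the GeoJSON longitude range of -180 to 180.
    """
    return {"type": "Point", "coordinates": [ra_deg - 180.0, dec_deg]}


def cone_query(ra_deg, dec_deg, radius_deg, field="location"):
    """Build a MongoDB filter matching points within radius_deg of (RA, Dec)."""
    center = radec_to_geojson(ra_deg, dec_deg)["coordinates"]
    # $centerSphere expects [[longitude, latitude], radius in radians]
    return {field: {"$geoWithin": {"$centerSphere": [center, math.radians(radius_deg)]}}}


# Hypothetical search position and radius
query = cone_query(ra_deg=180.0, dec_deg=-30.0, radius_deg=1.0)
print(query)
```

With pymongo, a filter like this would be passed to `collection.find(query)`. Documents would need to be stored with the same RA shift for the search to line up.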

Sunday, June 16, 2019

Data Science: MongoDB Sky Searches with HEALPix

This is the third blog post in a series about using MongoDB NoSQL databases with astronomical data. Prior posts introduced how to store astronomical objects and how to store FITS header metadata. In today's post, we'll visit one of the most common things we do in astronomy: the cone search. In other words, how do you search your database for objects in the sky that are located close to your input coordinates? Today we'll be tackling that problem "from scratch" using HEALPix rather than any built-in functionality. As before, I provide a Jupyter notebook in my GitHub repo for those who want more details or want to run it on their own.
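For readers new to the idea, the core of any cone search is an angular-separation test. The sketch below is a brute-force version in plain Python using the haversine formula. It is not the HEALPix approach itself: a HEALPix index would first narrow the candidates to nearby pixels, and a check like this one would then confirm the matches. The catalog entries and function names are made up for illustration.

```python
import math


def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, which stays numerically stable for small separations
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def cone_search(objects, ra, dec, radius):
    """Return the objects lying within radius degrees of (ra, dec)."""
    return [obj for obj in objects
            if angular_separation(ra, dec, obj["ra"], obj["dec"]) <= radius]


# Hypothetical catalog entries
catalog = [
    {"name": "A", "ra": 10.0, "dec": 20.0},
    {"name": "B", "ra": 10.5, "dec": 20.2},
    {"name": "C", "ra": 200.0, "dec": -45.0},
]
matches = cone_search(catalog, ra=10.0, dec=20.0, radius=1.0)
print([m["name"] for m in matches])
```

Testing every object this way scales poorly for large catalogs, which is the motivation for indexing the sky first.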

Saturday, June 1, 2019

Data Science: Astronomy FITS Headers in MongoDB

This is my second post about using MongoDB NoSQL databases with astronomical data. If you'd like a refresher about what that means, check out my first post, where I describe how to ingest a custom BrownDwarf class object into this type of database. Today, we're looking at a more general problem: metadata. Metadata is the information that describes the how, when, and where of the data itself. For example, which telescope took the data, at what time of night, for how long, with what filter, etc. A lot of this information is encapsulated in the data files themselves and, currently, the most commonly used format in astronomy is the FITS file.

In this post, we'll have a look at how we can extract the metadata from a FITS file and load it into our NoSQL database. As before, I provide a Jupyter notebook if you'd like to run the code yourself.
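As a rough illustration of the idea, a FITS header is a sequence of 80-character "cards", each holding a keyword, a value, and an optional comment. The sketch below parses a few hand-written cards into a dictionary that could be stored as a document. It is a simplified stand-in using only the standard library: in practice a library such as astropy would read the header, and this parser ignores details like escaped quotes, COMMENT/HISTORY cards, and continued strings. The card contents and function names are hypothetical.

```python
def parse_card(card):
    """Parse one header card into (keyword, value); value is None if absent."""
    key = card[:8].strip()
    if card[8:10] != "= ":
        return key, None  # no value indicator (e.g. END, COMMENT)
    body = card[10:]
    if body.lstrip().startswith("'"):
        # String value: the text between the first pair of single quotes
        start = body.index("'") + 1
        end = body.index("'", start)
        return key, body[start:end].rstrip()
    raw = body.split("/", 1)[0].strip()  # drop the trailing comment
    if raw in ("T", "F"):
        return key, raw == "T"  # FITS logical values
    try:
        return key, int(raw)
    except ValueError:
        return key, float(raw)


def header_to_document(cards):
    """Collect keyword/value pairs until the END card is reached."""
    doc = {}
    for card in cards:
        key, value = parse_card(card)
        if key == "END":
            break
        if key and value is not None:
            doc[key] = value
    return doc


# Hypothetical header cards, padded to the standard 80 characters
cards = [c.ljust(80) for c in (
    "SIMPLE  =                    T / conforms to FITS standard",
    "TELESCOP= 'HST     '           / telescope name",
    "EXPTIME =                120.5 / exposure time in seconds",
    "NAXIS   =                    2 / number of axes",
    "END",
)]
doc = header_to_document(cards)
print(doc)
```

The resulting dictionary has plain string keys and JSON-friendly values, so it could be inserted into a collection as a single document.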

Saturday, May 18, 2019

Data Science: Python Dataclasses and MongoDB

Over the past few weeks, I've been playing a bit with NoSQL databases, in particular with MongoDB. MongoDB is a type of database known as a document store, and it works primarily by saving JSON-formatted 'documents'. While exploring this technology and working on some Python code, I realized how easy it is to convert a standard Python class into a dictionary and how readily dictionaries translate into JSON. With this knowledge in hand, a light bulb went off in my head as I realized I could make use of the new dataclasses implemented as part of Python 3.7 and quickly create a working database with minimal code.

In this post, I'll describe some of the ideas I had in mind while working through this and, if you want to try this on your own, I can point you to this Jupyter notebook where I work out this example.
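The class-to-dictionary-to-JSON chain mentioned above can be sketched in a few lines. The field names and values here are hypothetical and are not the actual BrownDwarf class from the notebook.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BrownDwarf:
    # Hypothetical fields chosen for illustration
    name: str
    ra: float
    dec: float
    spectral_type: str = ""
    aliases: list = field(default_factory=list)


bd = BrownDwarf(name="Example-1", ra=162.3, dec=-53.3, spectral_type="T5")

# asdict() recursively converts the dataclass into a plain dictionary
document = asdict(bd)

# A plain dictionary serializes directly to JSON
print(json.dumps(document, indent=2))
```

With pymongo, a dictionary like `document` could be stored with `collection.insert_one(document)`.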

Tuesday, September 20, 2016

Data Science: What Should I Read Next?


As I wrote about last week, I’ve spent a bit of time looking over my reviews on Goodreads to explore trends in what authors I read, how fast I read, and how I review books. In today’s post, we’ll tackle something a little more ambitious: given the data I can readily access from the Goodreads API, can I predict how I will rate books I haven’t yet read?

Let’s dive right in.

Tuesday, September 13, 2016

Data Science: My Goodreads Reviews

Followers of my blog will know that I read and review quite a few books throughout the year. I track the books I read and those I want to read on Goodreads and recently came across their API. I decided to figure out how to access it and see what sort of information I could glean from my Goodreads reading history. This particular post explores trends in my reading and reviewing habits, and looks at which authors I've read. Next week’s post will discuss my attempt to create a model to predict the rating I would give a particular book. With that model in hand, I can decide what books to read based on my own interests.

Let’s jump right in.

Monday, August 1, 2016

Data Science: Republican & Democratic Conventions


In the past few weeks, the two major political parties in the United States of America held their national conventions. While I couldn't listen to all the speeches, I followed the news and paid attention to the overall scene. After they were done, I decided to grab the speeches of the major speakers and see if I could find any obvious trends in their word choices, similar to what I did with my Twitter project. In this blog post, I'll discuss what I can see in the data. You can find my data and all my scripts at this GitHub repo.

Friday, July 29, 2016

Data Science: The Divided States of America



In the prior two posts, I described how I gathered Twitter data from @HillaryClinton and @realDonaldTrump, how I ran a sentiment analysis on the individual tweets, and how I performed a principal component analysis on the most commonly used words. Today, I’ll tie everything together and describe how I created a model to predict which of the two candidates a given tweet belongs to.

Friday, July 22, 2016

Data Science: Principal Component Analysis of Twitter Data


As described in my last blog post on this topic, I've been tracking tweets from the US presidential candidates, Hillary Clinton and Donald Trump. I've looked at the top words they used and the sentiments expressed in their tweets given their word choice. However, some words almost always appear together, a notable example being a slogan like Make America Great Again. As such, it may be beneficial to look at groups of words rather than individual words. For that, I applied a Principal Component Analysis. Below I describe what this is, how I used it, and what it reveals. Do note, however, that I'm applying things I learned in astronomy to this problem rather than taking courses specific to text mining. It may be that there are better tools out there than what I've used.

Friday, July 15, 2016

Data Science: Presidential Candidates on Twitter


Over the past few months, I've been working on a little hobby data science project to explore Twitter data with regard to the upcoming presidential election in the United States. The project has evolved quite a bit and detailing it in full is beyond the scope of a single blog post. As such, I've decided to split it into (at least) 3 posts. This post is the first of the series and will go over the basics of gathering data from Twitter and doing some simple text mining. The second and third posts will discuss more details of the project and show some neat visualizations I've created. I'll release all my code after the third post for any curious coders. For now, let's get started seeing what Hillary Clinton's and Donald Trump's Twitter accounts are talking about.

Friday, June 17, 2016

Thursday, May 5, 2016

Coding: A Python Notifier for the NYC Subway

New York City Subway 6 train. Photo by Robert McConnell (Transferred from en.wikipedia to Commons.)

For fun, I created a small application in Python that checks the status of the New York City subway system and sends me an email in the morning and afternoon if there are delays in the specific lines I tend to use to get to/from work. Now, sure, something like this already exists and is offered by MTA, but I wanted to go ahead and write this myself.

Below, I describe how it works in case you want to create something similar. The code is available at GitHub, should you care to grab it.
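The overall flow (check the lines I care about, then build an email if any are delayed) can be sketched as follows. This is not the code from the GitHub repo: the status dictionary stands in for whatever the MTA status feed returns, and the status strings, addresses, and function names are hypothetical.

```python
from email.message import EmailMessage


def delayed_lines(statuses, watched):
    """Return the watched lines whose status is reported as delayed."""
    return sorted(line for line in watched
                  if statuses.get(line, "").upper() == "DELAYS")


def build_alert(lines, sender, recipient):
    """Build an email message listing the delayed lines."""
    msg = EmailMessage()
    msg["Subject"] = "Subway delays: " + ", ".join(lines)
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("Delays reported on: " + ", ".join(lines))
    return msg


# Hypothetical statuses; a real run would fetch these from the MTA
statuses = {"6": "DELAYS", "4": "GOOD SERVICE", "L": "DELAYS"}
lines = delayed_lines(statuses, watched={"4", "6"})
if lines:
    alert = build_alert(lines, "me@example.com", "me@example.com")
    print(alert["Subject"])
```

A message built this way could be sent with `smtplib.SMTP(...).send_message(alert)`, and the script could be scheduled to run each morning and afternoon.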

Friday, March 25, 2016

Data Science: Creating my First Web Application

Over the past month or so, I've dedicated a bit of time each week to developing a web application for the Brown Dwarfs in New York City research team (BDNYC). This week, I was finally able to release it to the public as AstrodbWeb. I'm very proud of what I've made, simple though it is. It's inspired me to continue developing applications and exploring this route a bit more. For this blog post, I want to detail some of what I went through for others who may be thinking of similar projects. I'll provide links to resources that I found useful when developing this.