# Hacker News

Resources for retrieving stories from the official Hacker News API into a pandas DataFrame.

## About

| Key     | Value |
| ------- | ----- |
| Source | [Hacker News API: Documentation and Samples for the Official HN API](https://github.com/HackerNews/API) |
| Website | [Y Combinator - Hacker News](https://news.ycombinator.com/) |


## Import Libraries

### Standard and External Libraries

In [1]:
from urllib.request import urlopen
import json
import pandas as pd

## Get Stories

In [2]:
def get_stories(
    category: str = 'topstories',
    number: int = 5,
    include_kids: bool = False
) -> pd.DataFrame:
    """Return a DataFrame of Hacker News stories.

    Args:
        category (str): story list to query: topstories|newstories|beststories.
            Defaults to 'topstories'.
        number (int): maximum number of stories to return. Defaults to 5.
        include_kids (bool, optional): keep the 'kids' column, with one row
            per top-level comment id. Defaults to False.

    Returns:
        pd.DataFrame: table of stories
    """
    # Build the url for the requested list of story ids
    stories_list_base_url = 'https://hacker-news.firebaseio.com/v0/'
    stories_list_url = f'{stories_list_base_url}{category}.json'
    # Get the list of story ids, keeping only the first `number`
    with urlopen(stories_list_url) as response:
        stories_list = json.load(response)[:number]
    # Define url variables for an individual story
    story_base_url = 'https://hacker-news.firebaseio.com/v0/item/'
    story_suffix_url = '.json'
    column_names = [
        'by',
        'descendants',
        'id',
        'kids',
        'score',
        'time',
        'title',
        'type',
        'url'
    ]
    # Retrieve json data for each story
    stories = []
    for story_item_num in stories_list:
        story_url = f'{story_base_url}{story_item_num}{story_suffix_url}'
        # Fetch and parse the item's json into a dict
        with urlopen(story_url) as response:
            stories.append(json.load(response))
    # One row per story; fields missing from an item are left empty
    stories_df = pd.DataFrame(stories, columns=column_names)
    # Remove comment ids unless requested
    if not include_kids:
        return stories_df.drop(columns='kids')
    # Otherwise return one row per top-level comment id
    return stories_df.explode('kids', ignore_index=True)

In [3]:
get_stories('topstories', 3, False)

|   | by | descendants | id | score | time | title | type | url |
| - | -- | ----------- | -- | ----- | ---- | ----- | ---- | --- |
| 0 | Vladimof | 130 | 31382372 | 255 | 1652562097 | Shaped Charges – Sheet of copper going through... | story | https://www.youtube.com/watch?v=K-3cTsvI7ss |
| 1 | penneyd | 31 | 31383725 | 74 | 1652570912 | Spreading rock dust on farms is an overlooked ... | story | https://www.anthropocenemagazine.org/2022/05/t... |
| 2 | sternmere | 258 | 31382096 | 209 | 1652560629 | Long-term benzodiazepine use causes synapse lo... | story | https://scitechdaily.com/long-term-benzodiazep... |
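
The `time` column holds Unix timestamps in seconds, which are hard to read at a glance. Below is a minimal sketch of converting one to a timezone-aware datetime using only the standard library. The helper name `to_utc_datetime` is hypothetical and not part of the notebook's own code.

```python
from datetime import datetime, timezone


def to_utc_datetime(unix_seconds: int) -> datetime:
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime.

    Hypothetical helper for illustration only.
    """
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


# Timestamp of the first story in the table above
print(to_utc_datetime(1652562097).isoformat())
```

For a whole DataFrame returned by `get_stories`, a similar conversion should be possible with `pd.to_datetime(df['time'], unit='s', utc=True)`, where `df` stands for the returned table.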


In [4]:
three_stories = get_stories('topstories', 3, False)

In [5]:
type(three_stories)

pandas.core.frame.DataFrame

In [6]:
three_stories['title'].to_list()

['Shaped Charges – Sheet of copper going through 1ft of solid steel (2010) [video]',
 'Spreading rock dust on farms is an overlooked but tantalizing climate solution',
 'Long-term benzodiazepine use causes synapse loss and cognitive deficits in mice']
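
The `url` column points at the linked article rather than the Hacker News thread. Below is a minimal sketch of building the discussion link from a story's `id`, assuming the `https://news.ycombinator.com/item?id=...` pattern used by the website. `discussion_url` is a hypothetical helper name.

```python
def discussion_url(story_id: int) -> str:
    """Return the Hacker News discussion page url for a story id.

    Hypothetical helper; the url pattern is assumed from the website.
    """
    return f'https://news.ycombinator.com/item?id={story_id}'


# Ids taken from the three stories shown earlier
for story_id in (31382372, 31383725, 31382096):
    print(discussion_url(story_id))
```

Applied to the DataFrame, `three_stories['id'].map(discussion_url)` would give one link per story.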