Slack and Machine Learning: Use real-time Python to elevate work culture

Deephaven is a query engine that excels at working with real-time data. Data scientists and developers use Deephaven to analyze capital markets, blockchains, cryptocurrency, gaming, sports, and e-commerce. Why not use it for addressing ethical issues and improving an organization’s climate as well?

According to the MIT Sloan Management Review, a toxic work culture is the biggest reason people quit their jobs. Their research estimates that it matters 10 times more than salary.

Modern machine learning algorithms make recognizing toxic content in business messaging tools doable. Deephaven’s real-time capabilities make it easy.

Today, we’ll demonstrate how to create a working prototype of a solution that checks if a new message posted to a Slack channel reads as toxic. If so, a bot sends a warning message to the channel.

The process is simple and requires only 3 steps:

1. Receive and store real-time Slack chat messages in a Deephaven table.
2. Calculate the probability of toxicity for each message.
3. Send a notification if a message is classified as toxic.

If you just want to look at some code, this GitHub repository has everything.
For further details, keep reading!

To get messages from Slack, we’ll use Socket Mode. To set up Socket Mode, we need to create an app and generate an app-level token.

After that, we can request a private WebSocket URL:

import os

import requests

SLACK_ENDPOINT = 'https://slack.com/api/apps.connections.open'
APP_TOKEN = os.environ["APP_TOKEN"]  # app-level token (starts with xapp-)

headers = {'Authorization': f'Bearer {APP_TOKEN}', 'Content-type': 'application/x-www-form-urlencoded'}
response = requests.post(SLACK_ENDPOINT, headers=headers)
url = response.json()["url"]
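
Slack Web API responses carry an `ok` flag and, on failure, an `error` string, so it can help to check them before reading `url`. Here is a minimal sketch of that check; the helper name `extract_socket_url` and the sample payloads are ours and purely illustrative, not part of the original project:

```python
def extract_socket_url(payload: dict) -> str:
    """Return the WebSocket URL from an apps.connections.open response body."""
    # Slack reports failures in the body ("ok": false) rather than only via HTTP status.
    if not payload.get("ok"):
        raise RuntimeError(f"Slack API error: {payload.get('error', 'unknown')}")
    url = payload.get("url")
    if not url:
        raise RuntimeError("Slack response did not include a WebSocket URL")
    return url


# Hand-written sample payloads (hypothetical values).
good = {"ok": True, "url": "wss://example.invalid/link"}
bad = {"ok": False, "error": "invalid_auth"}

print(extract_socket_url(good))
try:
    extract_socket_url(bad)
except RuntimeError as err:
    print(err)
```

In the snippet above, this would be called as `extract_socket_url(response.json())`.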

Let’s connect to it! For our example, we want the WebSocket to deliver events only about new messages in a Slack channel:

import json

from websocket import create_connection  # from the websocket-client package

ws = create_connection(url)

BOT_OAUTH_TOKEN = os.environ["BOT_OAUTH_TOKEN"]

ws.send(
    json.dumps(
        {
            "type": "subscribe",
            "token": BOT_OAUTH_TOKEN,
            "event": {
                "type": "message",
                "subtype": None
            }
        }
    )
)
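
Events then arrive as JSON envelopes. As we understand Slack's Socket Mode, each event envelope carries an `envelope_id` that the client is expected to echo back as an acknowledgement, and the event itself sits under `payload.event`. The sketch below shows one way to separate that bookkeeping from the message text. The function name `parse_envelope` and the sample frames are hypothetical, and the field names should be checked against Slack's documentation:

```python
import json


def parse_envelope(raw: str):
    """Return (ack_json, message_text) for one raw Socket Mode frame.

    ack_json is None when the frame has no envelope_id; message_text is
    None when the frame is not a plain user message.
    """
    data = json.loads(raw)
    envelope_id = data.get("envelope_id")
    # Assumption: an acknowledgement is a JSON object echoing the envelope_id.
    ack = json.dumps({"envelope_id": envelope_id}) if envelope_id else None

    event = data.get("payload", {}).get("event", {})
    is_user_message = (
        data.get("type") == "events_api"
        and event.get("type") == "message"
        and event.get("subtype") is None   # edits, joins, etc. carry a subtype
        and "bot_id" not in event          # ignore messages posted by bots
    )
    text = event.get("text") if is_user_message else None
    return ack, text


# Hand-written sample frame (hypothetical values).
frame = json.dumps({
    "type": "events_api",
    "envelope_id": "abc-123",
    "payload": {"event": {"type": "message", "text": "hello team"}},
})
ack, text = parse_envelope(frame)
print(ack, text)
```

A receive loop could send `ack` back with `ws.send(ack)` whenever it is not `None`, and process `text` only when it is set.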

Deephaven’s DynamicTableWriter can help us create a live table to store incoming messages and their integer representations that will be used as features for our ML model:

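import deephaven.dtypes as dht
from deephaven import DynamicTableWriter

# MAX_NUMBER is the padded sequence length; it is defined elsewhere in the project.
# One int32 column per token position, plus the raw message text.
columns = ["Index_" + str(num) for num in range(MAX_NUMBER)]
column_definitions = {col: dht.int32 for col in columns}
column_definitions["message"] = dht.string
dtw = DynamicTableWriter(column_definitions)
table = dtw.table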



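from threading import Thread

# tokenizer (with texts_to_sequences) and pad_sequences come from the Keras
# preprocessing used at training time; they are assumed to be loaded already.

def thread_function():
    while True:
        try:
            data = json.loads(ws.recv())
            event = data["payload"]["event"]
            # Skip Slack's redeliveries and messages posted by bots (including ours).
            if data["retry_attempt"] == 0 and "bot_id" not in event:
                message = event["text"]
                # Encode the text as a fixed-length row of token ids.
                list_tokenized = tokenizer.texts_to_sequences([message])
                row_to_write = pad_sequences(list_tokenized, maxlen=MAX_NUMBER)[0].tolist()
                row_to_write.append(message)
                dtw.write_row(*row_to_write)
        except Exception as e:
            # Frames that are not message events (no payload/event keys) end up here.
            print(e)

thread = Thread(target=thread_function)
thread.start()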
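
To make the "integer representation" concrete, here is a plain-Python sketch of what tokenizing and padding do to a message. It mimics the default behaviour of Keras' `pad_sequences` (zeros added on the left, long inputs trimmed from the left) without needing Keras installed. The vocabulary `WORD_INDEX` and the helper names are hypothetical; the real project uses the tokenizer fitted at training time:

```python
WORD_INDEX = {"you": 1, "are": 2, "great": 3}  # hypothetical vocabulary


def encode(text, word_index):
    """Map known words to their integer ids; unknown words are dropped."""
    return [word_index[w] for w in text.lower().split() if w in word_index]


def pad_pre(sequence, maxlen, value=0):
    """Left-pad or left-truncate a list of ints to exactly maxlen items."""
    trimmed = sequence[-maxlen:] if maxlen > 0 else []
    return [value] * (maxlen - len(trimmed)) + trimmed


row = pad_pre(encode("You are really great", WORD_INDEX), maxlen=6)
print(row)  # [0, 0, 0, 1, 2, 3]
```

Each position of `row` corresponds to one `Index_N` column of the table, which is why every message yields a row of the same width regardless of its length.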

To recognize toxic patterns in incoming Slack messages, we’ll use a basic LSTM model trained on a Kaggle dataset:

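import numpy as np
from deephaven import learn
from deephaven.learn import gather
from tensorflow.keras.models import load_model

model = load_model("/data/model.h5")
model.summary()  # summary() prints on its own and returns None

def predict_with_model(features):
    # One row of class probabilities per input row.
    return model.predict(features)

def table_to_array_int(rows, cols):
    # Gather the selected table cells into a 2D integer NumPy array.
    return gather.table_to_numpy_2d(rows, cols, np_type=np.intc)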


def make_getter(class_index):
    # Bind class_index now; a bare lambda in the loop would only see the final value of i.
    return lambda data, idx: data[idx][class_index]

outputs = [
    learn.Output(toxicity_type, make_getter(i), "double")
    for i, toxicity_type in enumerate(TOXICITY_TYPES)
]


predicted = learn.learn(
    table=table,
    model_func=predict_with_model,
    inputs=[learn.Input(columns, table_to_array_int)],
    outputs=outputs,
    batch_size=100
)
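
One detail to watch when building one getter per toxicity type in a loop: Python closures look up the loop variable when they are called, not when they are defined. The short demonstration below is independent of Deephaven; the label names and probabilities are made up for illustration:

```python
LABELS = ["toxic", "insult", "threat"]  # hypothetical label names


def make_getter(class_index):
    """Return a getter with class_index fixed at creation time."""
    return lambda row: row[class_index]


# Late binding: every lambda reads i after the loop has finished.
late = [lambda row: row[i] for i in range(len(LABELS))]
# Factory: each getter keeps its own index.
bound = [make_getter(i) for i in range(len(LABELS))]

row = [0.9, 0.2, 0.1]
print([f(row) for f in late])   # [0.1, 0.1, 0.1]
print([f(row) for f in bound])  # [0.9, 0.2, 0.1]
```

With late binding, every output column would silently report the probability of the last class, so a factory function (or a default argument) is the safer pattern here.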

The resulting table holds our live predictions, with one probability column per toxicity type.

Now let’s use the Slack Web Client to send back a message to the channel containing the result of the predictions. An alert will trigger if the probability of toxic content is greater than a threshold for at least one of the toxicity types:

from deephaven.table_listener import listen
from slack_sdk import WebClient

client = WebClient(token=BOT_OAUTH_TOKEN)

threshold = 0.5

def predicted_listener(update, is_replay):
    # added() maps each column name to an array holding the newly added rows.
    added_dict = update.added()
    if not added_dict:
        return
    for row in range(len(added_dict[TOXICITY_TYPES[0]])):
        warning = ""
        for toxicity_type in TOXICITY_TYPES:
            probability = added_dict[toxicity_type][row]
            if probability > threshold:
                warning += f'Detected {toxicity_type} with probability {probability:.1f}. '
        if warning != "":
            client.chat_postMessage(channel=CHANNEL, text=warning)

predicted_handler = listen(predicted, predicted_listener)
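
The alert rule itself is easy to try out without Slack or Deephaven. The sketch below applies the same "any class above the threshold" logic to a plain dictionary; the function name `build_warning` and the sample probabilities are ours and only illustrative:

```python
def build_warning(probabilities, threshold=0.5):
    """Return a warning string for every class whose probability exceeds threshold."""
    parts = [
        f"Detected {label} with probability {p:.1f}."
        for label, p in probabilities.items()
        if p > threshold
    ]
    # An empty string means nothing crossed the threshold, so no alert is needed.
    return " ".join(parts)


print(build_warning({"toxic": 0.91, "insult": 0.3}))
print(repr(build_warning({"toxic": 0.1, "insult": 0.2})))
```

Raising `threshold` makes the bot quieter at the cost of missing borderline messages; 0.5 is simply the value used in this prototype.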

Let’s test our bot:

This starter program only scratches the surface of integrating artificial intelligence (AI) into the workplace. But we hope it’ll inspire you to use Deephaven to solve real-life problems!

If you have any questions, comments, or concerns, reach out to us on Slack – no toxicity welcome, of course. We’d love to hear from you!
