59 Commits

Author SHA1 Message Date
e9d58c95be Remove image downloading
The special casing of Wordpress and image downloading was not reliable
for me so I have removed it, and tried to simplify the code in the
process. If you still need this functionality you will want to pin v0.2.1.
2022-02-19 13:38:12 +00:00
f3daed0bfb link to feed-to-activitypub 2021-10-08 13:40:48 -04:00
7a90313f1e a bit more verbose 2021-09-25 16:35:45 +00:00
be69e525b9 guard against content-type http header not being present 2021-09-18 20:44:07 +00:00
fd292f6222 fixed config file error 2021-01-07 21:20:36 +00:00
29f416d7a4 catch http errors when fetching images 2020-11-24 21:45:10 +00:00
fb914c7510 doc fix 2020-10-09 16:35:11 +00:00
e73f405b54 new version 2020-10-09 16:28:41 +00:00
fa175cf9c6 published_parsed is not always available, but updated is 2020-10-09 16:27:50 +00:00
b72a1c63df 3.3 is only required 2020-09-29 12:02:16 -04:00
294bd2969f new release 2020-09-29 11:29:27 -04:00
a2b196bc25 fixed a stray diff boundary and reorganized imports 2020-09-29 11:28:08 -04:00
d75ecf5377 new release and require python3 2020-09-29 11:16:24 -04:00
79be320e06 merged and resolved conflicts #18 2020-09-29 11:13:36 -04:00
5515e7bd0c merged and resolved conflicts #14 2020-09-29 11:03:31 -04:00
46b46ca875 merged and resolved a few conflicts #8 2020-09-29 10:51:29 -04:00
b5ec046f87 Merge pull request #7 from htgoebel/issue-4
Sort entries in reverse published order.
2020-09-29 10:21:01 -04:00
bc593134c4 Merge pull request #6 from htgoebel/issue-5
Add minimal argparse support to get support for `--help`.
2020-09-29 10:19:48 -04:00
350764b352 Merge pull request #2 from stefangrotz/master
added entry.link template for some feeds (e.g. youtube-rss)
2020-09-29 10:19:24 -04:00
2748ac0da6 Fix: Posts published while feediverse is running are not tooted.
Fix race-condition: If a post was published within the short period of
time between fetching the RSS feed and saving the config-file, this
post was not tooted. This was caused by the timestamp in the
config-file having been the time when the file was written, not when
the feed was fetched.

I (hopefully) fixed this by storing the latest post's timestamp in the
config file. This still might cause the same issue if several feeds
are checked using the same config file.
2020-09-25 21:08:36 +02:00
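A minimal sketch of the idea behind this fix, assuming timezone-aware timestamps; `newest_entry_time` and the sample values are hypothetical and not taken from the repository. The checkpoint written to the config is the newest entry timestamp seen, not the wall-clock time of the save:

```python
from datetime import datetime, timezone


def newest_entry_time(last_saved, entry_times):
    """Return the latest timestamp seen, never older than last_saved."""
    newest = last_saved
    for seen in entry_times:
        newest = max(newest, seen)
    return newest


# Hypothetical data: the previously saved checkpoint and two fetched entries.
last_saved = datetime(2020, 9, 25, 12, 0, tzinfo=timezone.utc)
entry_times = [
    datetime(2020, 9, 25, 18, 30, tzinfo=timezone.utc),
    datetime(2020, 9, 25, 13, 15, tzinfo=timezone.utc),
]

# Persist this value rather than datetime.now(): an entry published after the
# fetch but before the save still compares as newer on the next run.
checkpoint = newest_entry_time(last_saved, entry_times)
```

As the commit message notes, a single checkpoint shared by several feeds can still miss entries, because one feed's newest timestamp may be later than an unseen entry in another feed.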
7c7f1c049c With dry-run, print title of post.
This is to ease validating the results.
2020-09-25 21:03:56 +02:00
60d74188c3 Enhance cleanup of fetched texts.
Remove all HTML-elements with a class "read-more" or a class matching
"read-more-*". This will remove the "Read More".
2020-09-25 19:18:47 +02:00
45897295d1 Remove hyphens ("-") from terms to avoid hashtags terminating early 2020-07-11 07:04:06 +01:00
8749618a8a Remove "." from terms to avoid hashtags terminating early 2020-07-05 08:48:24 +01:00
f804a5ea57 Connect multiword terms with underscore instead of splitting the words into separate hashtags to fix #17 2020-06-24 21:40:36 +01:00
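The three hashtag commits above amount to the following normalization. This is a sketch mirroring the `replace` chain in the diff further down; the helper name `term_to_hashtag` is hypothetical:

```python
def term_to_hashtag(term):
    """Turn a feed tag term into a single hashtag."""
    # Spaces become underscores so a multiword term stays one hashtag;
    # dots and hyphens are dropped because they would end the hashtag early.
    cleaned = term.replace(' ', '_').replace('.', '').replace('-', '')
    return '#{}'.format(cleaned)


examples = [term_to_hashtag(t) for t in ("open source", "node.js", "e-mail")]
```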
f280fb0ffc Fix typo to correctly limit toot length to 500 characters, fixes #13 2020-06-15 16:15:28 +01:00
5945a9f9cb Add work-around for verbose-mode on non-unicode terminals. 2019-09-16 14:49:08 +02:00
52cf05c09c Add feed config option generator.
This allows setting or overwriting the generator provided by
the feed.
2019-09-16 14:28:45 +02:00
7df2d306e4 Don't crash if feed does not contain a "generator" element. 2019-09-16 14:28:45 +02:00
17bba74f22 Readme: Add "Special Handling for Different Feed Generators".
I should have added this when adding the special support
for Wordpress around e6a16dbe55.
2019-09-16 14:28:40 +02:00
b57bc48d0d Update readme. 2019-04-23 22:44:50 +02:00
9e1a94d4ca Add cleaning up white-space in fetched texts. 2019-04-23 22:39:37 +02:00
09a3588f71 Document template element '{content}'. 2019-04-23 22:39:25 +02:00
e41073efbc Fix template element '{content}'.
This was the same as '{summary}' and needs more attention, too.
2019-04-23 22:38:42 +02:00
7a5b30aeef Add option -v/--verbose. 2019-04-23 21:59:00 +02:00
8e51b4344d Add command line option -n/--dry-run. 2019-04-23 21:56:04 +02:00
0b65eb8e21 Make adding images into the toot configurable.
Add an option "include_images" into the config file.
2019-04-16 14:02:07 +02:00
2d45df57f1 Minor code cleanup.
Preset config values when reading config file. This is
to ease introducing new options (like the next commit
will do).
2019-04-16 14:02:07 +02:00
e0dde90b7d On setup ask whether existing entries shall be tooted, too. 2019-04-16 14:02:06 +02:00
b0ba30b5f3 Minor code cleanup.
Add and use helper function to ask a yes/no question.
2019-04-16 14:02:06 +02:00
da5486d004 Fix: Mastodon allows posting 4 images max. 2019-04-16 14:02:06 +02:00
2624eed96b Fix: If last-updated is not given in config, no feeds are pushed.
The bug was: If last-updated was not given in the config, the current
date and time were used, inhibiting posting "old" entries.

Todo: Add an option to ask whether "old" entries shall be posted on
first run.
2019-04-16 12:07:42 +02:00
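A sketch of one way to avoid this bug, close to what `read_config` in the diff below does: fall back to the earliest representable timestamp so every entry counts as new. The helper `effective_last_update` is hypothetical, and it uses the standard library's `datetime.fromisoformat` in place of the `dateutil` parser the project uses:

```python
from datetime import datetime, timezone, MINYEAR

# Earliest timezone-aware timestamp; any real feed entry is newer than this.
EARLIEST = datetime(MINYEAR, 1, 1, tzinfo=timezone.utc)


def effective_last_update(cfg):
    """Return the saved checkpoint, or EARLIEST when none was saved."""
    raw = cfg.get("updated")
    if raw is None:
        return EARLIEST
    return datetime.fromisoformat(raw)


first_run = effective_last_update({})
later_run = effective_last_update({"updated": "2019-04-16T12:07:42+00:00"})
```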
e6a16dbe55 For wordpress skip all images provided by a plugin. 2019-04-16 12:07:41 +02:00
d2e57bbc27 Add a work-around for buggy wordpress (urls encoded wrong). 2019-04-16 12:07:41 +02:00
ae78c8c16f Make "content" available in the template.
Depending on the feed, adding the content could be more
appropriate. Leave this choice to the user.
2019-04-16 12:07:41 +02:00
03d48992c7 Add detection of permalink for wordpress-generated feeds. 2019-04-16 12:07:41 +02:00
350f2bca3f Add detection of feed generator and pass it for get_entry().
This allows generator-specific handling of e.g. url.

For example, in wordpress `id` is an ugly url, while the
human-readable permalink is stored in an alternate link.
2019-04-16 12:07:40 +02:00
83ed532680 Add retrieving images from RSS & posting them.
Collects image urls from summary, content and enclosures
(attachments).

This adds urllib3 as a requirement.
2019-04-16 12:04:20 +02:00
5424eb2dd6 Sort entries in reverse published order.
In a feed typically the newest entries are on top, while the older
ones should be posted first. Thus reverse the order, based on
publish date.

Closes #4.
2019-04-04 15:10:50 +02:00
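A small sketch of the ordering described here, assuming each entry carries an `updated_parsed` value of type `time.struct_time` (as feedparser provides). Plain dicts and the helper name `oldest_first` are stand-ins for illustration:

```python
import time


def oldest_first(entries):
    """Return entries sorted so the oldest one is posted first."""
    # struct_time values compare field by field (year, month, day, ...).
    return sorted(entries, key=lambda e: e["updated_parsed"])


# Hypothetical feed order: newest entry on top.
entries = [
    {"title": "newest", "updated_parsed": time.strptime("2019-04-03", "%Y-%m-%d")},
    {"title": "oldest", "updated_parsed": time.strptime("2019-04-01", "%Y-%m-%d")},
    {"title": "middle", "updated_parsed": time.strptime("2019-04-02", "%Y-%m-%d")},
]
titles = [e["title"] for e in oldest_first(entries)]
```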
3f4d051b84 Add minimal argparse support to get support for --help.
Closes #5.
2019-04-04 15:10:39 +02:00
13d1dd2623 Fix deprecation warning when calling yaml.load().
This was the message: YAMLLoadWarning: calling yaml.load() without
Loader=... is deprecated, as the default Loader is unsafe
2019-03-31 00:00:54 +01:00
0b13bbbabe Very small code cleanup. 2019-03-31 00:00:53 +01:00
8886fd5d2d Remove HTML tags from content.
Do this as early as processing the entry so later steps can count
on it (esp. when counting characters)

Also add a new requirement: beautifulsoup4.
2019-03-31 00:00:53 +01:00
fc56be6d70 Filter entries prior to processing any entry.
This saves processing time, esp. since for most installations
there should not be many changes; most of the time there will
be zero entries to post, so there is no need to process them.
2019-03-31 00:00:50 +01:00
e99c18b249 Sort entries in reverse published order.
In a feed typically the newest entries are on top, while the older
ones should be posted first. Thus reverse the order, based on
publish date.

Closes #4.
2019-03-30 23:59:40 +01:00
493c1ad3f3 Add minimal argparse support to get support for --help.
Closes #5.
2019-03-29 21:32:36 +01:00
078f0edbf7 updated readme template section 2019-03-10 10:18:58 +01:00
9d101a0dad added entry.link to support youtube rss
YouTube RSS (e.g. https://www.youtube.com/feeds/videos.xml?channel_id=UCsXVk37bltHxD1rDPwtNM8Q) uses link tags, so I added them to the template.
2019-03-10 10:16:42 +01:00
37aedd9e56 try 499? 2018-10-23 19:45:16 -04:00
4 changed files with 121 additions and 66 deletions


@ -1,6 +1,6 @@
The MIT License (MIT)
Copyright (c) 2018 Ed Summers
Copyright (c) Ed Summers
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal


@ -1,6 +1,6 @@
*feediverse* will read RSS/Atom feeds and send the messages as Mastodon posts.
Please use responsibly! *feediverse* is kind of the same thing as [feed2toot]
but it's just one module that works with Python 3 ... and I was bored.
It's meant to add a little bit of spice to your timeline from other places.
Please use it responsibly.
## Install
@ -18,6 +18,8 @@ Once *feediverse* is configured you can add it to your crontab:
*/15 * * * * /usr/local/bin/feediverse
Run `feediverse --help` to show the command line options.
## Post Format
You can customize the post format by opening the configuration file (default is
@ -32,7 +34,11 @@ like so:
Bookmark: {title} {url} {summary}
`{hashtags}` will look for tags in the feed entry and turn them into a space
separated list of hashtags.
separated list of hashtags. For some feeds (e.g. youtube-rss) you should use `{link}` instead of `{url}`.
`{content}` is the whole content of the feed entry (with html-tags
stripped). Please be aware that this might easily exceed Mastodon's
default limit of 500 characters.
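As a rough illustration of how a template and the length limit interact, here is a sketch assuming the template is filled with `str.format` and then cut by slicing. The helper `render_post`, its `limit` default of 500, and the sample entry are assumptions made for this example:

```python
def render_post(template, entry, limit=500):
    """Fill the template from the entry dict and cut it to the limit."""
    # Placeholders such as {title} or {content} are looked up by key;
    # keys the template does not mention are simply ignored.
    return template.format(**entry)[:limit]


# Hypothetical entry whose content alone exceeds the limit.
entry = {
    "title": "Example",
    "url": "https://example.org/post",
    "summary": "short summary",
    "content": "x" * 1000,
}
short_post = render_post("Bookmark: {title} {url} {summary}", entry)
long_post = render_post("{title} {content}", entry)
```

Slicing cuts the text wherever the limit falls, so a URL placed after `{content}` in the template may be lost; putting `{url}` before long fields avoids that.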
## Multiple Feeds
@ -45,20 +51,3 @@ Since *feeds* is a list you can add additional feeds to watch if you want.
- url: https://example.org/feed/
template: "dot org: {title} {url}"
## Why?
I created *feediverse* because I wanted to send my Pinboard bookmarks to
Mastodon. I've got an IFTTT recipe that does this for Twitter, but IFTTT
doesn't appear to work with Mastodon yet. That being said *feediverse* should
work with any RSS or Atom feed (thanks to [feedparser]).
## Warning!
Please be responsible. Don't fill up Mastodon with tons of junk just because you
can. That kind of toxic behavior is why a lot of people are trying to establish
other forms of social media like Mastodon.
[feed2toot]: https://gitlab.com/chaica/feed2toot/
[feedparser]: http://feedparser.org/


@ -1,16 +1,36 @@
#!/usr/bin/env python3
import os
import re
import sys
import yaml
import argparse
import dateutil.parser
import feedparser
from bs4 import BeautifulSoup
from mastodon import Mastodon
from datetime import datetime, timezone
from datetime import datetime, timezone, MINYEAR
DEFAULT_CONFIG_FILE = os.path.join("~", ".feediverse")
def main():
config_file = get_config_file()
parser = argparse.ArgumentParser()
parser.add_argument("-n", "--dry-run", action="store_true",
help=("perform a trial run with no changes made: "
"don't toot, don't save config"))
parser.add_argument("-v", "--verbose", action="store_true",
help="be verbose")
parser.add_argument("-c", "--config",
help="config file to use",
default=os.path.expanduser(DEFAULT_CONFIG_FILE))
args = parser.parse_args()
config_file = args.config
if args.verbose:
print("using config file", config_file)
if not os.path.isfile(config_file):
setup(config_file)
@ -23,62 +43,101 @@ def main():
access_token=config['access_token']
)
newest_post = config['updated']
for feed in config['feeds']:
if args.verbose:
print(f"fetching {feed['url']} entries since {config['updated']}")
for entry in get_feed(feed['url'], config['updated']):
masto.status_post(feed['template'].format(**entry)[0:500])
newest_post = max(newest_post, entry['updated'])
if args.verbose:
print(entry)
if args.dry_run:
print("trial run, not tooting ", entry["title"][:50])
continue
masto.status_post(feed['template'].format(**entry)[:499])
save_config(config, config_file)
def get_config_file():
if __name__ == "__main__" and len(sys.argv) > 1:
config_file = sys.argv[1]
else:
config_file = os.path.join(os.path.expanduser("~"), ".feediverse")
return config_file
def save_config(config, config_file):
copy = dict(config)
copy['updated'] = datetime.now(tz=timezone.utc).isoformat()
with open(config_file, 'w') as fh:
fh.write(yaml.dump(copy, default_flow_style=False))
def read_config(config_file):
config = {}
with open(config_file) as fh:
config = yaml.load(fh)
if 'updated' in config:
config['updated'] = dateutil.parser.parse(config['updated'])
else:
config['updated'] = datetime.now(tz=timezone.utc)
return config
if not args.dry_run:
config['updated'] = newest_post.isoformat()
save_config(config, config_file)
def get_feed(feed_url, last_update):
new_entries = 0
feed = feedparser.parse(feed_url)
for entry in feed.entries:
e = get_entry(entry)
if last_update is None or e['updated'] > last_update:
new_entries += 1
yield e
return new_entries
if last_update:
entries = [e for e in feed.entries
if dateutil.parser.parse(e['updated']) > last_update]
else:
entries = feed.entries
entries.sort(key=lambda e: e.updated_parsed)
for entry in entries:
yield get_entry(entry)
def get_entry(entry):
hashtags = []
for tag in entry.get('tags', []):
for t in tag['term'].split(' '):
hashtags.append('#{}'.format(t))
t = tag['term'].replace(' ', '_').replace('.', '').replace('-', '')
hashtags.append('#{}'.format(t))
summary = entry.get('summary', '')
content = entry.get('content', '') or ''
if content:
content = cleanup(content[0].get('value', ''))
url = entry.id
return {
'url': entry.id,
'title': entry.title,
'summary': entry.get('summary', ''),
'url': url,
'link': entry.link,
'title': cleanup(entry.title),
'summary': cleanup(summary),
'content': content,
'hashtags': ' '.join(hashtags),
'updated': dateutil.parser.parse(entry['updated']),
'updated': dateutil.parser.parse(entry['updated'])
}
def cleanup(text):
html = BeautifulSoup(text, 'html.parser')
text = html.get_text()
text = re.sub('\xa0+', ' ', text)
text = re.sub(' +', ' ', text)
text = re.sub(' +\n', '\n', text)
text = re.sub('\n\n\n+', '\n\n', text, flags=re.M)
return text.strip()
def find_urls(html):
if not html:
return
urls = []
soup = BeautifulSoup(html, 'html.parser')
for tag in soup.find_all(["a", "img"]):
if tag.name == "a":
url = tag.get("href")
elif tag.name == "img":
url = tag.get("src")
if url and url not in urls:
urls.append(url)
return urls
def yes_no(question):
res = input(question + ' [y/n] ')
return res.strip().lower() in ("y", "1")
def save_config(config, config_file):
copy = dict(config)
with open(config_file, 'w') as fh:
fh.write(yaml.dump(copy, default_flow_style=False))
def read_config(config_file):
config = {
'updated': datetime(MINYEAR, 1, 1, 0, 0, 0, 0, timezone.utc)
}
with open(config_file) as fh:
cfg = yaml.load(fh, yaml.SafeLoader)
if 'updated' in cfg:
cfg['updated'] = dateutil.parser.parse(cfg['updated'])
config.update(cfg)
return config
def setup(config_file):
url = input('What is your Mastodon Instance URL? ')
have_app = input('Do you have your app credentials already? [y/n] ')
if have_app.lower() == 'y':
have_app = yes_no('Do you have your app credentials already?')
if have_app:
name = 'feediverse'
client_id = input('What is your app\'s client id: ')
client_secret = input('What is your client secret: ')
@ -98,6 +157,7 @@ def setup(config_file):
access_token = m.log_in(username, password)
feed_url = input('RSS/Atom feed URL to watch: ')
old_posts = yes_no('Shall already existing entries be tooted, too?')
config = {
'name': name,
'url': url,
@ -108,6 +168,8 @@ def setup(config_file):
{'url': feed_url, 'template': '{title} {url}'}
]
}
if not old_posts:
config['updated'] = datetime.now(tz=timezone.utc).isoformat()
save_config(config, config_file)
print("")
print("Your feediverse configuration has been saved to {}".format(config_file))


@ -5,8 +5,8 @@ with open("README.md") as f:
setup(
name='feediverse',
version='0.0.10',
python_requires='>=2.7',
version='0.3.0',
python_requires='>=3.3',
url='https://github.com/edsu/feediverse',
author='Ed Summers',
author_email='ehs@pobox.com',
@ -14,6 +14,10 @@ setup(
description='Connect an RSS Feed to Mastodon',
long_description=long_description,
long_description_content_type="text/markdown",
install_requires=['feedparser', 'mastodon.py', 'python-dateutil', 'pyyaml'],
install_requires=['beautifulsoup4',
'feedparser',
'mastodon.py',
'python-dateutil',
'pyyaml'],
entry_points={'console_scripts': ['feediverse = feediverse:main']}
)