
Re: [shell-script] Future of this discussion list


From: Rodrigo Tenorio
Subject: Re: [shell-script] Future of this discussion list
Date: Fri, 1 Nov 2019 09:00:34 -0300

If that is what it takes to keep the list and all of its valuable content, I commit to donating the equivalent of two dollars, in reais, toward the group's annual plan.

Unfortunately, I don't have the time to get involved beyond that.

On Fri, 1 Nov 2019 at 06:26, Jorge Barros de Abreu address@hidden [shell-script] <address@hidden> wrote:
 

Has anyone made the backup yet?

If everyone starts backing up at the same time, Yahoo
will take those backup attempts the wrong way.

Below are the READMEs of two backup programs that were suggested
here on the list (I don't remember by whom, sorry).

# YahooGroups-Archiver

#### A simple python script that archives all messages from a public Yahoo Group

YahooGroups-Archiver allows you to make a backup copy of all the messages in a public group. Not only is all the message content downloaded, but also all other raw data that Yahoo uses to display the messages.

Messages are downloaded in a JSON format, with one .json file per message.
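As a rough illustration of working with that layout, the sketch below loads every `.json` file in a directory into a dictionary keyed by file name. The function name `load_messages` and the directory argument are assumptions made for this example; the README does not describe the fields inside each file, so the sketch does not rely on any.

```python
import glob
import json
import os


def load_messages(directory):
    """Load every <name>.json file in `directory` into a dict keyed by <name>.

    Hypothetical helper: it assumes only that each file holds one JSON document.
    """
    messages = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.json"))):
        name = os.path.splitext(os.path.basename(path))[0]
        with open(path) as handle:
            messages[name] = json.load(handle)
    return messages
```

Calling `load_messages("hypercard")` would then return one entry per archived message, assuming the archive was written to a directory of that name.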

There is support for private groups, but this requires that you have a Yahoo groups account that has access to the private groups you want to archive. See the 'Private Groups' section for more info.

Works with both Python 2 and Python 3.

## Usage
**`python archive_group.py <groupName> [options] [nologs]`**
where *`<groupName>`* is the name of the group you wish to archive (e.g. hypercard)

**Options**
* *`update`* - the default. Archive all new messages since the last time the script was run
* *`retry`* - Archive any new messages, and attempt to archive any messages that could not be downloaded last time
* *`restart`* - Delete all previously archived messages and archive again from scratch

Please note that you can only have one *Option*. If you specify more than one, only the first will be used and the others will be ignored.

By default, a log file called `<groupname>.txt` is created and stores information such as which messages could not be received. This is entirely for the benefit of the user: it's not needed at all by the script during any re-runs (although re-runs will append new information to the log file). If you don't want a log file to be created or added to, add the `nologs` keyword when you call the script.
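The command-line rules above (one option, first one wins, `update` as the default, `nologs` to disable logging) can be sketched as follows. This is not the script's actual parsing code, only a minimal illustration of the described behaviour; `parse_args` and `VALID_OPTIONS` are names invented for the example.

```python
VALID_OPTIONS = ("update", "retry", "restart")


def parse_args(argv):
    """Return (group_name, option, write_logs) from a list such as sys.argv[1:].

    Hypothetical helper mirroring the rules stated in the README.
    """
    if not argv:
        raise ValueError("a group name is required")
    group_name = argv[0]
    rest = argv[1:]
    # Only the first recognised option counts; any later ones are ignored.
    option = next((arg for arg in rest if arg in VALID_OPTIONS), "update")
    # Logging stays on unless the `nologs` keyword is present.
    write_logs = "nologs" not in rest
    return group_name, option, write_logs
```

For instance, `parse_args(["hypercard", "retry", "restart", "nologs"])` selects `retry`, ignores `restart`, and turns logging off.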

## Private Groups
It is possible to archive private groups using this tool, but the way to go about doing this is slightly fiddly at the moment. Rather than simply providing your login information for the Yahoo account that has access to the private groups, you need to provide two pieces of information from Yahoo's login cookies (small files created by web browsers to store data for various uses, such as allowing you to log in to websites and then stay logged in for a certain period of time).

Cookie information can be found through the use of a plug-in for your web browser. (I use 'Cookie Manager' on Firefox, although there are many other options for Firefox and other browsers.) The two cookies you are looking for are called *Y* and *T*, and they are linked to the domain *yahoo.com*. Extract the data from these cookies, and paste it into the appropriate variables in the *archive_group.py* script. You should now be able to archive a private group.

Please note that this support is still experimental. One important issue to consider is that a cookie will expire after a certain amount of time, which varies between computers. This means that you may have to re-fetch the *Y* and *T* cookie data every few days, or you will not be able to archive private groups.
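To make the role of the two cookie values more concrete, here is a minimal sketch of how *Y* and *T* could be attached to an HTTP request with Python 3's standard library. The function name, the placeholder URL, and the variable names are assumptions for illustration; the script itself may send the cookies differently, and nothing is sent over the network here.

```python
import urllib.request


def build_request(url, cookie_y, cookie_t):
    """Build (but do not send) a request carrying the Y and T cookie values.

    Hypothetical helper: `cookie_y` and `cookie_t` stand for the data copied
    from the browser's yahoo.com cookies.
    """
    request = urllib.request.Request(url)
    # Both cookies travel together in a single Cookie header.
    request.add_header("Cookie", "Y={}; T={}".format(cookie_y, cookie_t))
    return request


# Placeholder URL and values, for illustration only.
example = build_request("https://example.invalid/group", "y-value", "t-value")
```

Once the cookies expire, the same request would be rejected, which is why the values need to be re-fetched from time to time.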

## Note
Yahoo attempts to block connections that it deems to be "spamming", and so after around 15,000 messages have been downloaded it is highly likely that Yahoo will block you. This is OK: the script will automatically stop, and Yahoo should unblock you after around two hours. Running the script again once you have been unblocked will just continue where it left off. (Unless you run with the *`restart`* *[option]*, of course!)
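The stop-and-resume behaviour described in this note can be sketched as a loop that skips messages already saved on disk and stops cleanly when the server starts refusing requests. This is an illustration under assumptions, not the script's real code: `archive`, `Blocked`, and the `fetch` callback are hypothetical names, and how a block is actually detected is left to the caller.

```python
import json
import os


class Blocked(Exception):
    """Raised by the (hypothetical) fetch callback when the server refuses us."""


def archive(message_ids, fetch, out_dir):
    """Save each message as <id>.json, skipping those already on disk.

    Returns the number of messages newly saved during this run.
    """
    os.makedirs(out_dir, exist_ok=True)
    saved = 0
    for message_id in message_ids:
        path = os.path.join(out_dir, "{}.json".format(message_id))
        if os.path.exists(path):
            continue  # archived on an earlier run, nothing to do
        try:
            data = fetch(message_id)
        except Blocked:
            break  # stop now; a later run picks up from here
        with open(path, "w") as handle:
            json.dump(data, handle)
        saved += 1
    return saved
```

Because progress is recorded simply by which files exist, running the same call again after the block is lifted downloads only the missing messages.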

## Credits
Thanks to the [Archive Team](http://archiveteam.org/) for making [information about the Yahoo Groups API](http://www.archiveteam.org/index.php?title=Yahoo!_Groups) available.

**********************

# yahoo-groups-backup
A Python script to back up the contents of Yahoo! groups, be they private or public.

## Setup/Requirements

The project requires Python 3.5+, Mongo, and a computer with a GUI as Selenium is used for the scraping (to be able to handle private groups).

[virtualenv](https://virtualenv.pypa.io/en/stable/) is recommended.

    git clone https://github.com/csaftoiu/yahoo-groups-backup.git
    cd yahoo-groups-backup
    pip install -r requirements.txt

## Example

To scrape an entire site, say the `concatenative` group:

    ./yahoo-groups-backup.py scrape_messages concatenative

This will shove all the messages into a Mongo database (default `localhost:27017`), into the database of the same name as the group.

To scrape the files as well (though this group has no files):

    ./yahoo-groups-backup.py scrape_files concatenative

To dump the site as a human-friendly, fully static (i.e. viewable from the file system) website:

    ./yahoo-groups-backup.py dump_site concatenative concatenative_static_site

Then simply open `concatenative_static_site/index.html` and browse!

## Full Usage

To see the full usage:

    ./yahoo-groups-backup.py -h

--
Stardate 2458788,887708
http://sites.google.com/site/ficmatinf
I wish you Peace, Long Life and Prosperity.
Messages in plain-text UTF-8 format with accents are welcome.

