Downloading Fanfiction

Shirotsume

Not The Goddamn @dmin
#28
I just found this thread, and I found it amusing, since it's the work of like an hour programming to pull off most of these requests.
 
#30
Whoops, should have read more before posting. That said, does anyone have anything to download forum threads? Would've liked to have something like that when I was grabbing fosfor's stuff. Copy & paste to txt doesn't look all that nice, and it ends up looking weird on my Kindle.
 

PCHeintz72

The Sentient Fanfic Search Engine mk II
#31
nemisis1400 said:
Whoops, should have read more before posting. That said, does anyone have anything to download forum threads? Would've liked to have something like that when I was grabbing fosfor's stuff. Copy & paste to txt doesn't look all that nice, and it ends up looking weird on my Kindle.
I'll make sure to remember that if anyone asks me if TXT looks ok on Kindle.
 
#32
It's not that it's bad, it's that I use the Duokan OS instead of the native Kindle OS. This causes some of the words to show up as Chinese characters. It only happens on the txt files made from copy-pasting forum post stories, and only on some of them for some reason. I'd use the Kindle OS, but Duokan lets me drag and drop manga as zip and rar files instead of compiling it as a PDF to read it.
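
One possible explanation (an assumption, nothing in this thread confirms it) is that copy-pasted forum text carries "smart" quotes and similar typographic characters, and that the reader then guesses the wrong encoding for the file. A minimal Python sketch of a workaround under that assumption: swap those characters for plain ASCII and save the file as UTF-8 with a byte-order mark. The function names and the character table are made up for illustration.

```python
# Assumption: the garbling comes from typographic punctuation plus a wrong
# encoding guess by the reader. This is a guess, not a confirmed diagnosis.
SMART_PUNCTUATION = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
    "\u00a0": " ",    # non-breaking space
}


def normalize_punctuation(text):
    """Replace common typographic characters with plain ASCII equivalents."""
    return text.translate(str.maketrans(SMART_PUNCTUATION))


def save_as_utf8(text, path):
    """Write the normalized text as UTF-8 with a BOM so a reader need not guess."""
    with open(path, "w", encoding="utf-8-sig") as handle:
        handle.write(normalize_punctuation(text))
```

Whether this helps on Duokan specifically is untested; it only removes the characters most likely to be misread.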
 

PCHeintz72

The Sentient Fanfic Search Engine mk II
#33
I figured I would give the big three downloaders another check on their feature sets, at least for the features that matter to me.

Posting it as I figured it might be of interest to others, since this topic keeps coming up from time to time, not just in this thread, but across the forum, and even in some other forums I go to.

Fanfiction Downloader Calibre Plug-in:

- Ability to process multiple links in one run: Yes.

- Sites supported: A large list. No AdultFanFiction.NET though. MediaMiner Support is better than Fanfiction Downloader .NET.

- Multiple chapter as separate file support: No.

- TXT output format: Yes.

- Chapter divider method: Not so good. The issue is it does not have a specific fixed tag method, such as *Chapter #*, like FanFiction Downloader .NET.

- Directory structure: Yes, in the file naming format specification. An oddity I noticed is it tended to place a number after this that seemed to correspond to the number in the database.

- File name structure: Configurable.

- Other: This had an odd bug, uncertain if it was Calibre or the plug-in itself... I installed the plug-in without error, but it refused to show in the toolbars until I shut down and restarted the program a couple of times.

Fanfiction Downloader .NET:

- Ability to process multiple links in one run: Yes. Note this tends to crash or stop on the first bad link.

- Sites supported: A decent list. The big three are on it. MediaMiner support is not as good as in the Calibre plug-in; it does not allow for the Index Link.

- Multiple chapter as separate file support: No.

- TXT output format: Yes.

- Chapter divider method: Nice, at least in the ones checked it had a clear searchable standard tag method *Chapter #*.

- Directory structure: None, though it may be possible to do this in the file naming format specification.

- File name structure: Configurable.

De-FFNet-izer:

- Ability to process multiple links in one run: No.

- Sites supported: Only FF.NET, as far as I know. It uses the story code number, not even the normal URL link.

- Multiple chapter as separate file support: Yes.

- TXT output format: No. Minimal comes closest, but it still has more than I want.

- Chapter divider method: N/A, since by default, at least for me, it split the story into separate chapter files.

- Directory structure: As near as I can tell, it will only make a directory by the story name. This is not so good, as I have stories with the same name in different fandoms I follow.

- File name structure: Fixed.

Were I to adopt these today to change my work flow and reading habits, I would have to:

- Design a custom companion program to further process the files created here, to split them into proper chapters, to re-date the files to match the respective last update, place into subdirectories, and to extract the generated author and direct links for use in browsers and feed readers.

- Use Fanfiction Downloader .NET for FanFiction.NET, FicWad.COM, MediaMiner.ORG, and AdultFanFiction.NET. My MediaMiner.ORG links would need alterations to use the pattern this program is expecting to see, but would in the end be more consistent and allow for easier chapter break parsing than using the Calibre plug-in for it.

- Use Fanfiction Downloader Calibre plug-in for other sites not supported by the .NET program.

- For sites not supported by either of the above that are primarily HTML-based, use PageNest instead (I had used PageNest in the past for some sites like Anime Adventure). While it can be tricky to configure, it can get the job done if it is straight TXT or HTML.
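
The per-site choices in the list above could be captured as a small lookup table, so a batch of links can be sorted into one list per tool before running anything. This is only a sketch in Python: the hostnames are assumed forms of the site names mentioned above, and `pick_tool` and `group_by_tool` are made-up helper names. It cannot tell whether the Calibre plug-in actually supports an unlisted site, so the PageNest cases still need sorting out by hand.

```python
from urllib.parse import urlparse

# Tool choice per site, taken from the list above. The hostnames are assumed
# to be how these sites appear in story links.
TOOL_BY_HOST = {
    "fanfiction.net": "Fanfiction Downloader .NET",
    "ficwad.com": "Fanfiction Downloader .NET",
    "mediaminer.org": "Fanfiction Downloader .NET",
    "adultfanfiction.net": "Fanfiction Downloader .NET",
}
FALLBACK_TOOL = "Fanfiction Downloader Calibre plug-in"


def pick_tool(url):
    """Return the tool name for a story link, falling back to the plug-in."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Match the listed site itself or any subdomain of it.
    for site, tool in TOOL_BY_HOST.items():
        if host == site or host.endswith("." + site):
            return tool
    return FALLBACK_TOOL


def group_by_tool(urls):
    """Sort links into one list per tool, keeping the input order."""
    groups = {}
    for url in urls:
        groups.setdefault(pick_tool(url), []).append(url)
    return groups
```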
 

PCHeintz72

The Sentient Fanfic Search Engine mk II
#34
The maintainers of the Fanfiction Downloader plugin for Calibre responded to some of my bug/enhancement requests...

Basically... they are not happening.

It was brought to my attention that the separate-file-per-chapter functionality apparently works for the EPUB format, though... Which I would not know, since I have no EPUB-compliant gadget-level device, nor a PC Windows/Linux-based reader for it, nor do I care to switch to it.

I do not know enough about Calibre's inner workings to say whether it could really be done or not, so oh well...
 

Draculthemad

Well-Known Member
#35
EPUB is an open container format though.
It should be fairly simple to process a number of files into an epub and then use calibre or something to convert it to something else.

CPAN has a Perl extension to generate epubs, and there is a command-line tool for calibre's converter.

So it should be fairly easy to code something up.
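
For the conversion half of that idea, a rough Python sketch of driving calibre's command-line converter. It assumes the tool is installed as `ebook-convert` on the PATH and that it picks the output format from the target file's extension. The helper names are made up, and building the epub itself is not shown.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target):
    """Build the converter call: input file first, output file second."""
    return ["ebook-convert", str(source), str(target)]


def convert(source, target):
    """Run the conversion, failing loudly if the tool is missing or errors out."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH; is calibre installed?")
    subprocess.run(build_convert_command(source, target), check=True)
```

Something like `convert(Path("story.epub"), Path("story.txt"))` would then produce a plain-text copy next to the epub.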

What's your preferred input type, and destination format?

Also, would it need to work on Linux or Windows?
 

PCHeintz72

The Sentient Fanfic Search Engine mk II
#36
Draculthemad said:
EPUB is an open container format though.
It should be fairly simple to process a number of files into an epub and then use calibre or something to convert it to something else.

CPAN has a Perl extension to generate epubs, and there is a command-line tool for calibre's converter.

So it should be fairly easy to code something up.

What's your preferred input type, and destination format?

Also, would it need to work on Linux or Windows?
Don't trouble yourself... I have in mind toying around with a processing engine to take the text output of Fanfiction Downloader .NET (not the Calibre plug-in), split it into its chapters, add a metadata index file as a false chapter 0, place it all into a subdirectory, re-date the files to match the updated date, and make URL files for the Author and Story links... not too hard... it is mapped in my head how to do it, but finding the time to do it is a bit problematic.
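
A minimal sketch of what such a processing engine might look like in Python. It is not the actual program, and it leans on several assumptions: each chapter starts with a line like `*Chapter 3*` (the exact tag text may differ), whatever comes before the first tag is the metadata and becomes the false chapter 0, and the updated date and the two links have already been pulled out of that metadata. All function and file names are made up.

```python
import os
import re
from datetime import datetime
from pathlib import Path

# Assumption: each chapter starts with a line such as "*Chapter 3*". Adjust the
# pattern to whatever the downloader really writes.
CHAPTER_MARKER = re.compile(r"^\*Chapter (\d+)\*[^\n]*$", re.MULTILINE)


def split_chapters(text):
    """Return {chapter_number: body}; text before the first marker is chapter 0."""
    matches = list(CHAPTER_MARKER.finditer(text))
    first_start = matches[0].start() if matches else len(text)
    chapters = {0: text[:first_start].strip()}
    for index, match in enumerate(matches):
        end = matches[index + 1].start() if index + 1 < len(matches) else len(text)
        chapters[int(match.group(1))] = text[match.end():end].strip()
    return chapters


def write_story(text, story_dir, updated, story_url, author_url):
    """Write one file per chapter plus .url shortcuts, all dated to `updated`."""
    story_dir = Path(story_dir)
    story_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for number, body in sorted(split_chapters(text).items()):
        path = story_dir / f"chapter_{number:03d}.txt"
        path.write_text(body + "\n", encoding="utf-8")
        written.append(path)
    # Windows Internet Shortcut files are small INI-style text files.
    for name, url in (("story", story_url), ("author", author_url)):
        path = story_dir / f"{name}.url"
        path.write_text(f"[InternetShortcut]\nURL={url}\n", encoding="utf-8")
        written.append(path)
    stamp = updated.timestamp()
    for path in written:
        os.utime(path, (stamp, stamp))  # set access and modification times
    return written
```

Calling `write_story(text, "Fandom/Story Name", datetime(2012, 5, 1), story_url, author_url)` would leave a subdirectory holding `chapter_000.txt` (the metadata), one file per chapter, and the two shortcut files, all carrying the update date. Putting the fandom in the path is one way around same-named stories colliding.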

I know Fanfiction Downloader .NET can handle big runs. As a test, I managed to process some 1700 story URLs from AdultFanFiction.NET, FanFiction.NET, MediaMiner.ORG, and FicWad.COM.

The only issue I found is that some MediaMiner information did not get populated, or populated correctly, despite MediaMiner having it:
- The Author URL does not get filled.
- If the story is a one-shot, or was never added to, the date field does not get properly filled.
- The name is the name of the story plus the chapter name, not the name of the story as in the files made from other sites.
 

Draculthemad

Well-Known Member
#37
If you are going that far, it would be rather trivial to simply import those files using the Perl module and construct an epub out of them.

The API described here does support importing plain text files as chapters:

http://search.cpan.org/~oty/EBook-EPUB/lib/EBook/EPUB.pm
 

PCHeintz72

The Sentient Fanfic Search Engine mk II
#38
Draculthemad said:
If you are going that far, it would be rather trivial to simply import those files using the Perl module and construct an epub out of them.

The API described here does support importing plain text files as chapters:

http://search.cpan.org/~oty/EBook-EPUB/lib/EBook/EPUB.pm
I think there has been some form of misunderstanding.

EPUB may be a good format... But I do not want EPUB format stories. At all... I have no gadget level device needing it, I have no files in that format, and until I installed Calibre I had no program to even read them on the PC.

I specifically want/desire/like/prefer TXT format, and specifically want to restore the natural chapter breaks by making each chapter its own independent separate file. I prefer it that way, my reading habits prefer it that way, my preferred program for reading fanfiction is a text editor, and I mentally track stories that way. Combining them into one file makes for nothing more than mental confusion on my part.

The issue: there are downloaders that allow for downloading multiple stories at a time, there are downloaders that allow for TXT format output, and there are downloaders that allow for multiple chapters as separate files. There is not a single currently working downloader I've encountered yet that can give me all three.

That is why up until now to download stories I used copy and paste... probably still will going forward.
 

Draculthemad

Well-Known Member
#39
Ah, I misunderstood. I thought you were looking to convert fanfiction into some portable, compatible format other than epub.

As for yanking just the text out of fanfic sites, I've experimented with doing DOM stuff via PHP. That tends to break whenever the site changes its formatting, however.

FF.net specifically does that all the time. I gave it up as a bad job after they changed the formatting 3 different times on me over the course of 4 hours while I was trying to tweak my downloader.
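
To show why this kind of scraping is so brittle, here is a small sketch of the same idea, in Python rather than PHP, using only the standard library. It collects the text inside one container element picked by its `id`. The id used in the usage note is a stand-in, not something taken from any site here, and the nesting count assumes tags are properly closed. The moment a site renames or restructures that container, the extractor finds nothing.

```python
from html.parser import HTMLParser


class ContainerText(HTMLParser):
    """Collect text inside the first element whose id matches `target_id`."""

    VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self, target_id):
        super().__init__(convert_charrefs=True)
        self.target_id = target_id
        self.depth = 0      # nesting depth inside the target; 0 means outside
        self.done = False   # True once the target element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            if dict(attrs).get("id") == self.target_id:
                self.depth = 1
            return
        if tag in ("p", "br"):
            self.parts.append("\n")  # keep paragraph and line breaks
        if tag not in self.VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth and not self.done:
            self.parts.append(data)


def extract_text(html, target_id):
    """Return the container's text, or raise if nothing was found under that id."""
    parser = ContainerText(target_id)
    parser.feed(html)
    parser.close()
    if not parser.parts:
        raise ValueError(f"no text found under id={target_id!r}; did the layout change?")
    return "".join(parser.parts).strip()
```

Usage would be along the lines of `extract_text(page_html, "storytext")`, with `"storytext"` standing in for whatever id the page really uses that week.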
 

PCHeintz72

The Sentient Fanfic Search Engine mk II
#40
Draculthemad said:
Ah, I misunderstood. I thought you were looking to convert fanfiction into some portable, compatible format other than epub.

As for yanking just the text out of fanfic sites, I've experimented with doing DOM stuff via PHP. That tends to break whenever the site changes its formatting, however.

FF.net specifically does that all the time. I gave it up as a bad job after they changed the formatting 3 different times on me over the course of 4 hours while I was trying to tweak my downloader.
That is the beauty here... I would not be designing the downloader... I'd be using someone else's and making small companion programs to pull from that.

The idea, in concept as I envision it, is to use these downloaders to do a one-time re-grab of the stories I track on, say, the six biggest sites I go to... that covers a huge number of them, I think over 3300, then have a companion program to reprocess them. I could even in theory have it process the dead fanfiction I already have, for a consistent look and feel.

The need is because this would give far more consistent results than all the various copy-pasted source material I've gathered over a period of 14+ years. Authors who changed names, links that have changed, authors who have silently edited stories, etc... potential other items missed would be flattened and eliminated by such a plan.

No fanfiction downloader, or even offline reader like PageNest and HTTrack, works exactly as I like or gives the exact results I want.

Tentatively, based on tests... I plan to:

- Use the Fanfiction Downloader plug-in for Calibre for FicWad.COM and MediaMiner.ORG, as the results are cleanest using that, though the chapter functionality comparatively sucks. I may still use the .NET downloader, which is my preferred downloader of those tested, but the resultant metadata is not as intact using that.

- Use Fanfiction Downloader .NET for FanFiction.NET and AdultFanFiction.NET.

- Use PageNest for Anime Adventure and a couple of other sites, for the bulk of the rest.

- Have a companion program take those various outputs and reprocess them into what I want.

- Merge/replace my old download copies with the above.

There are several end-result advantages to this approach. I would be able to far better document and track the stories deleted over the years by authors, and better align the links for some that go to other sites to be on these six more stable sites.

I also envision being able, for the first time ever, to allow for separate direct and preview links....
 