I found the apk file and played around with Android Studio today, but I was having trouble getting the Nexus device running KitKat or Lollipop not to crash when trying to open the app. Will keep trying different options.
Can you explain more what you mean by “do whatever you want without as much effort”? Is it because text-davinci-003 accepts more tokens for the prompt? Something else?
I was trying to get text-davinci-003 to convert text to SQL, and it worked with a very simple prompt like "convert this text into SQL". I could get all their other models to work too, but they all required a few examples within the prompt.
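For anyone curious, roughly what I ran (a minimal sketch against the legacy OpenAI completions endpoint; the example question, prompt wording, and parameters are just illustrative):

    import openai  # the pre-1.0 Python client current at the time

    openai.api_key = "sk-..."  # your API key

    question = "Show the total order amount per customer for 2022."

    # Zero-shot: text-davinci-003 got this right with no in-prompt examples.
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"Convert this text into SQL:\n\n{question}\n\nSQL:",
        temperature=0,   # deterministic output suits code generation
        max_tokens=200,
    )
    print(response.choices[0].text.strip())

With the older models, the same call only produced usable SQL once I prepended a few example question/SQL pairs to the prompt.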
Ambiki is a web application for pediatric speech, occupational, and physical therapists that will launch later this month. The product has been incubated and built alongside therapists for 3 years at a private practice in Tennessee. Ambiki touches almost everything: teletherapy, billing, scheduling, tracking patient outcomes, and providing resources/tools for therapists to use in their treatment sessions.
Send your CV (PDF) and a short cover letter (PDF) to info@ambiki.com. In your cover letter please answer: What interests you about working on Ambiki?
Nice work, this looks great. I signed up and was writing a post. I was nearly finished and switched to the tutorial post (which I had archived). I clicked the hamburger menu and then "permanently delete" to remove the tutorial post. However, nothing happened. I then clicked "All posts" to go back to the post I was working on, and it had been deleted instead. Is there any way to recover it? I don't see a help email.
Yes, there's a chance I can recover this. Can you shoot me an email (hello@papyrus.dev) with the email you signed up with, and the name of the post you were working on? Happy to look into this.
We give schools and patients access to the therapies and resources they need to improve their communication and overall quality of life. Our experienced team is made up of 100+ specialized therapists in Speech, Occupational and Physical Therapy. Our development team is currently 3 full-time employees.
I did an analysis of different sentence segmentation tools when I was working on my own rule-based segmenter. The results can be found in this README (https://github.com/diasks2/pragmatic_segmenter).
I think this blog post almost hits on the key point in the middle: in my opinion, it is important to test (all of) the edge cases. The problem with most corpora typically used to test segmenters is that 80-90% of the sentences are the same (i.e. a regular sentence ending in a period). Thus, if a segmenter simply split at every period, it would still show an 80-90% accuracy rate. This is why I am trying to develop a standardized set of edge cases: https://github.com/diasks2/pragmatic_segmenter#the-golden-ru...
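To make that concrete, here is a toy version of that naive baseline (not any real segmenter, just an illustration):

    import re

    text = "I'll see you at 5 p.m. tomorrow. Dr. Smith agreed."

    # Naive baseline: split after every period that is followed by a space.
    print(re.split(r'(?<=\.)\s+', text))
    # ["I'll see you at 5 p.m.", 'tomorrow.', 'Dr.', 'Smith agreed.']

Four "sentences" instead of two, yet on a corpus where most sentences are plain declaratives this same rule still scores 80-90%.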
Great work. I love the "Golden Rules" list you compiled. It seems like teams develop their NLP systems without sharing a common training set, which leaves some teams not testing edge cases like "a.m. / p.m.".
See my comment below for some of the reasons I've had issues trying to test the commonly used segmentation corpora. I completely agree it would be great if there were a free (as in both speech and beer) common training set. One key would be that this common training set either provide the exact text to run through the segmenter or exact instructions on how to produce that text (see the issue I mentioned below about the ambiguity around how to actually test the Brown corpus).
For comparability, most people use the Penn Treebank-III WSJ data. Sections 03-06 are test, the remaining sections are train/dev.
Most methods are based on simple feature templates plus machine learning, so they should generalize relatively well to a wide variety of languages, IMO.
If your rules handle an edge case that the above don't, it'd probably be worth suggesting improvements to the Unicode rules or the locale-specific ones.
Have you tried to evaluate your splitter on some other data, i.e. on these "typically used corpora"? The evaluation numbers look too optimistic: 98% / 100% means you made your code work on your own examples, but using only a set of standardized tests you can't check:
* how broad the coverage is: there are other edge cases in the real world, and it may be impossible to cover them all;
* that the splitter doesn't make mistakes on real-world "regular" sentences (the 80-90% of sentences that are "the same").
The example set looks very good, and it looks like a good way to compare other sentence splitters. But it is not fair to report evaluation metrics on the same examples you used to develop your sentence splitter.
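As a sketch of what I mean by a fair evaluation (assuming you represent a segmentation as the set of character offsets where sentences end), scoring on held-out text is just boundary precision/recall:

    def boundary_prf(gold, predicted):
        # gold / predicted: sets of character offsets where a sentence ends
        gold, predicted = set(gold), set(predicted)
        tp = len(gold & predicted)
        p = tp / len(predicted) if predicted else 0.0
        r = tp / len(gold) if gold else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1

    # Run the splitter on a corpus it has never seen, derive the offsets,
    # and compare them against the corpus's gold boundaries.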
Good points. I'd love to test it on some of the typically used corpora. The issues I have are:
1) Most segmentation research papers come out of universities that have access to the Penn Treebank data (WSJ and Brown corpus). However, that data costs $1,700: https://catalog.ldc.upenn.edu/LDC99T42
2) The Brown corpus is available for free in NLTK (http://www.nltk.org/nltk_data/). However, it is the tagged version of the corpus. I've contacted the researchers behind all of the top segmentation libraries but never received an answer to any of the following questions (one plausible preprocessing is sketched after the list):
a) I'm assuming you preprocessed the text by removing the tags. Is this correct? Or did you use the untagged version? If so, do you have a link to it, as I only found the tagged version in the NLTK data?
b) When removing the tags, did you also remove each carriage return and newline so the text was one long string, with each sentence separated by just one whitespace?
c) The download contains 100+ files. Did you analyze each individually or create one combined file? If you combined them, how did you space each individual file within the larger file, and in what order?
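To show how much these choices matter, here is one plausible (entirely assumed) preprocessing using NLTK's copy of the corpus; every line marked "assumed" is exactly the kind of undocumented decision I'm asking about:

    import nltk
    nltk.download('brown')           # NLTK ships the tagged Brown corpus
    from nltk.corpus import brown

    # (a) NLTK's reader can strip the /POS tags for you via brown.words().
    # (b) Assumed: collapse everything into one long string with single
    #     spaces between tokens.
    # (c) Assumed: concatenate all 500 files in fileid order.
    text = ' '.join(
        ' '.join(brown.words(fileids=fid)) for fid in brown.fileids()
    )
    # Note this detokenization also leaves a space before punctuation,
    # yet another choice that changes what a segmenter actually sees.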
So sure, all of these papers use the same data, but we have no idea whether they are actually using that data in the same way, as none of the papers release their code and tests or describe the steps they used to preprocess the corpus.
To test broader coverage of my library, I added the full text of Alice in Wonderland: https://github.com/diasks2/pragmatic_segmenter/blob/master/s.... A grad student from Stanford kindly offered to test my library on the WSJ corpus a few months ago, but I'm still waiting to hear back.
I've found the best way to make upgrading easier is having good test coverage. I think one of the biggest fears when upgrading is that something breaks without you knowing it's broken. Although tests won't catch everything, they are a great start to easing the upgrade process.
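For example (a tiny pytest sketch with made-up names), even a thin smoke test over the critical paths tells you right away whether an upgrade broke something:

    # test_smoke.py: run the suite before and after the upgrade
    from myapp import create_app          # hypothetical app factory
    from myapp.billing import total       # hypothetical function under test

    def test_homepage_renders():
        client = create_app().test_client()
        assert client.get("/").status_code == 200

    def test_invoice_total_unchanged():
        # Pin down behavior you rely on; a failure after the upgrade
        # points straight at the regression.
        assert total([10.00, 2.50]) == 12.50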